Full Code of AlexsJones/llmfit for AI

Repository: AlexsJones/llmfit
Branch: main
Commit: 152afc6e2c2f
Files: 68
Total size: 1.6 MB

Directory structure:
gitextract_gm363zlf/

├── .dockerignore
├── .githooks/
│   └── pre-push
├── .github/
│   ├── dependabot.yml
│   └── workflows/
│       ├── ci.yml
│       ├── docker.yml
│       ├── release-desktop.yml
│       └── release.yml
├── .gitignore
├── .release-please-manifest.json
├── AGENTS.md
├── API.md
├── CHANGELOG.md
├── CNAME
├── Cargo.toml
├── Dockerfile
├── LICENSE
├── MODELS.md
├── Makefile
├── README.md
├── README.zh.md
├── data/
│   └── hf_models.json
├── flake.nix
├── index.html
├── install.sh
├── llmfit-core/
│   ├── Cargo.toml
│   ├── data/
│   │   ├── docker_models.json
│   │   └── hf_models.json
│   └── src/
│       ├── fit.rs
│       ├── hardware.rs
│       ├── lib.rs
│       ├── models.rs
│       ├── plan.rs
│       └── providers.rs
├── llmfit-desktop/
│   ├── Cargo.toml
│   ├── build.rs
│   ├── capabilities/
│   │   └── default.json
│   ├── src/
│   │   └── main.rs
│   ├── tauri.conf.json
│   └── ui/
│       ├── app.js
│       ├── index.html
│       └── styles.css
├── llmfit-tui/
│   ├── Cargo.toml
│   ├── build.rs
│   └── src/
│       ├── display.rs
│       ├── main.rs
│       ├── serve_api.rs
│       ├── theme.rs
│       ├── tui_app.rs
│       ├── tui_events.rs
│       └── tui_ui.rs
├── llmfit-web/
│   ├── README.md
│   ├── index.html
│   ├── package.json
│   ├── src/
│   │   ├── App.jsx
│   │   ├── App.test.jsx
│   │   ├── api.js
│   │   ├── api.test.js
│   │   ├── main.jsx
│   │   ├── styles.css
│   │   └── test-setup.js
│   └── vite.config.js
├── scripts/
│   ├── install-openclaw-skill.sh
│   ├── scrape_docker_models.py
│   ├── scrape_hf_models.py
│   ├── test_api.py
│   ├── update_models.sh
│   └── verify_models.py
└── skills/
    └── llmfit-advisor/
        └── SKILL.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .dockerignore
================================================
# Build artifacts
target/
*.swp
*.swo
*~

# IDE
.vscode/
.idea/
*.iml

# Git
.git/
.gitignore
.githooks/

# Documentation and assets
*.md
!README.md
*.png
*.gif
demo.gif
download.gif
home_laptop.png
moe.png
assets/
.github/

# Scripts (not needed in container)
scripts/
install.sh

# Website files
index.html
CNAME

# Release metadata
.release-please-manifest.json

# Skills (OpenClaw integration, not needed in container)
skills/
AGENTS.md


================================================
FILE: .githooks/pre-push
================================================
#!/usr/bin/env bash
set -e

echo "Running cargo fmt --check..."
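# With `set -e`, a bare failing command aborts the script before any `$?` check runs,
# so test the command directly in the `if` condition.
if ! cargo fmt --check; then
    echo "❌ Formatting issues found. Run 'cargo fmt' to fix."
    exit 1
fi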

echo "✅ Formatting OK"


================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
  - package-ecosystem: cargo
    directory: /
    schedule:
      interval: weekly
    cooldown: # applies only to version-updates (not security-updates)
      default-days: 10
      semver-minor-days: 14 # wait 14 days before applying minor updates
      semver-major-days: 28
  - package-ecosystem: github-actions
    directory: /
    schedule:
      interval: weekly
    cooldown:
      default-days: 10


================================================
FILE: .github/workflows/ci.yml
================================================
name: CI

on:
  push:
    branches:
      - main    # Run after merging into the "main" target
    paths:      # Run only if files relevant to CI have changed (i.e. not install.sh, scripts or .github/workflows)
      - 'llmfit-*/**'
      - 'Cargo.*o*'
  pull_request:
    branches:
      - main    # Run in PRs targeting the "main" branch
    paths:
      - 'llmfit-*/**'
      - 'Cargo.*o*'
    types:      # Avoid low-impact events like "edited" or "labeled"
      - opened
      - synchronize
      - reopened

env:
  CARGO_TERM_COLOR: always
  RUST_BACKTRACE: 1

jobs:
  test:
    name: Test Suite
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        rust: [stable]

    steps:
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: ${{ matrix.rust }}

      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: lts/*
          cache: npm
          cache-dependency-path: llmfit-web/package-lock.json

      - name: Build web dashboard assets
        run: |
          cd llmfit-web
          npm ci
          npm run build

      - name: Run web unit tests
        run: |
          cd llmfit-web
          npm test

      - name: Cache cargo registry
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: ~/.cargo/registry
          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-registry-

      - name: Cache cargo index
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: ~/.cargo/git
          key: ${{ runner.os }}-cargo-index-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-index-

      - name: Cache target directory
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: target
          key: ${{ runner.os }}-cargo-target-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-target-

      - name: Run tests
        run: cargo test --verbose

  fmt:
    name: Rustfmt
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt

      - name: Check formatting
        run: cargo fmt --all -- --check

  clippy:
    name: Clippy
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
        with:
          components: clippy

      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: lts/*
          cache: npm
          cache-dependency-path: llmfit-web/package-lock.json

      - name: Build web dashboard assets
        run: |
          cd llmfit-web
          npm ci
          npm run build

      - name: Cache cargo registry
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: ~/.cargo/registry
          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-registry-

      - name: Cache cargo index
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: ~/.cargo/git
          key: ${{ runner.os }}-cargo-index-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-index-

      - name: Cache target directory
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: target
          key: ${{ runner.os }}-cargo-target-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-target-

      - name: Run clippy
        run: cargo clippy --all-targets --all-features

  check:
    name: Cargo Check
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable

      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: lts/*
          cache: npm
          cache-dependency-path: llmfit-web/package-lock.json

      - name: Build web dashboard assets
        run: |
          cd llmfit-web
          npm ci
          npm run build

      - name: Cache cargo registry
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: ~/.cargo/registry
          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-registry-

      - name: Cache cargo index
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: ~/.cargo/git
          key: ${{ runner.os }}-cargo-index-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-index-

      - name: Cache target directory
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: target
          key: ${{ runner.os }}-cargo-target-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-target-

      - name: Run cargo check
        run: cargo check --all-targets --all-features


================================================
FILE: .github/workflows/docker.yml
================================================
name: Docker Build and Push

on:
  push:
    tags:
      - "v*"
      - "!v*-mac"
  workflow_dispatch: # Allow manual trigger

permissions:
  contents: read
  packages: write

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  docker:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels)
        id: meta
        uses: docker/metadata-action@030e881283bb7a6894de51c315a6bfe6a94e05cf # v6.0.0
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max


================================================
FILE: .github/workflows/release-desktop.yml
================================================
name: Release Desktop (macOS)

on:
  push:
    tags:
      - "v*-mac"

permissions:
  contents: write

env:
  CARGO_TERM_COLOR: always

jobs:
  build-desktop:
    strategy:
      matrix:
        include:
          - target: aarch64-apple-darwin
            os: macos-latest
          - target: x86_64-apple-darwin
            os: macos-latest

    runs-on: ${{ matrix.os }}

    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: ${{ matrix.target }}

      - name: Install Tauri CLI
        run: cargo install tauri-cli --version "^2"

      - name: Build Tauri app bundle
        run: cargo tauri build --target ${{ matrix.target }} --bundles app
        working-directory: llmfit-desktop

      - name: Package .app bundle
        shell: bash
        run: |
          TAG="${GITHUB_REF_NAME}"
          # Search both possible target locations
          for BASE in "target" "llmfit-desktop/target"; do
            APP=$(find "${BASE}/${{ matrix.target }}/release/bundle" -name '*.app' -maxdepth 3 2>/dev/null | head -1)
            [ -n "$APP" ] && break
          done

          if [ -z "$APP" ]; then
            echo "::error::No .app bundle found"
            find target/ llmfit-desktop/target/ -type d -name 'bundle' 2>/dev/null || true
            exit 1
          fi

          echo "Found app bundle: $APP"
          DEST="llmfit-desktop-${TAG}-${{ matrix.target }}.app.tar.gz"
          cd "$(dirname "$APP")"
          tar czf "/tmp/${DEST}" "$(basename "$APP")"
          echo "DESKTOP_ASSET=${DEST}" >> "$GITHUB_ENV"
          echo "DESKTOP_ASSET_PATH=/tmp/${DEST}" >> "$GITHUB_ENV"

      - name: Upload artifact
        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
        with:
          name: ${{ env.DESKTOP_ASSET }}
          path: ${{ env.DESKTOP_ASSET_PATH }}

  release:
    needs: build-desktop
    runs-on: ubuntu-latest

    steps:
      - name: Download all artifacts
        uses: actions/download-artifact@70fc10c6e5e1ce46ad2ea6f2b72d43f7d47b13c3 # v8.0.0
        with:
          path: artifacts

      - name: Create GitHub Release
        uses: softprops/action-gh-release@a06a81a03ee405af7f2048a818ed3f03bbf83c7b # v2.5.0
        with:
          generate_release_notes: true
          files: artifacts/**/*.tar.gz


================================================
FILE: .github/workflows/release.yml
================================================
name: Release

on:
  push:
    tags:
      - "v*"
      - "!v*-mac"

permissions:
  contents: write

env:
  CARGO_TERM_COLOR: always
  BINARY: llmfit

jobs:
  build:
    strategy:
      matrix:
        include:
          # Linux x86_64 (static musl)
          - target: x86_64-unknown-linux-musl
            os: ubuntu-latest
            use-cross: true

          # Linux ARM64 (static musl)
          - target: aarch64-unknown-linux-musl
            os: ubuntu-latest
            use-cross: true

          # Linux x86_64 (glibc)
          - target: x86_64-unknown-linux-gnu
            os: ubuntu-latest
            use-cross: false

          # Linux ARM64 (glibc)
          - target: aarch64-unknown-linux-gnu
            os: ubuntu-latest
            use-cross: true

          # macOS Intel (cross-compiled from ARM64 runner)
          - target: x86_64-apple-darwin
            os: macos-latest
            use-cross: false

          # macOS Apple Silicon
          - target: aarch64-apple-darwin
            os: macos-latest
            use-cross: false

          # Windows x86_64
          - target: x86_64-pc-windows-msvc
            os: windows-latest
            use-cross: false

          # Windows ARM64
          - target: aarch64-pc-windows-msvc
            os: windows-latest
            use-cross: false

    runs-on: ${{ matrix.os }}

    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: ${{ matrix.target }}

      - name: Build web dashboard assets
        run: |
          cd llmfit-web
          npm ci
          npm run build

      - name: Install cross
        if: matrix.use-cross
        run: cargo install cross --version 0.2.5

      - name: Build
        shell: bash
        run: |
          if [ "${{ matrix.use-cross }}" = "true" ]; then
            cross build --release --target ${{ matrix.target }}
          else
            cargo build --release --target ${{ matrix.target }}
          fi

      - name: Package
        shell: bash
        run: |
          TAG="${GITHUB_REF_NAME}"
          ASSET_NAME="${BINARY}-${TAG}-${{ matrix.target }}"
          STAGING="${RUNNER_TEMP}/${ASSET_NAME}"
          mkdir -p "${STAGING}"

          if [[ "${{ matrix.target }}" == *"windows"* ]]; then
            EXE_EXT=".exe"
            ARCHIVE_EXT=".zip"
            COMPRESS_CMD="7z a ${ASSET_NAME}${ARCHIVE_EXT} ${ASSET_NAME}"
          else
            EXE_EXT=""
            ARCHIVE_EXT=".tar.gz"
            COMPRESS_CMD="tar czf ${ASSET_NAME}${ARCHIVE_EXT} ${ASSET_NAME}"
          fi

          cp "target/${{ matrix.target }}/release/${BINARY}${EXE_EXT}" "${STAGING}/"
          cp README.md LICENSE "${STAGING}/" 2>/dev/null || true

          cd "${RUNNER_TEMP}"
          $COMPRESS_CMD

          # Generate per-asset SHA256 checksum file (consistent format for both tools)
          if command -v sha256sum >/dev/null 2>&1; then
            sha256sum "${ASSET_NAME}${ARCHIVE_EXT}" | awk '{print $1 "  " $2}' > "${ASSET_NAME}${ARCHIVE_EXT}.sha256"
          elif command -v shasum >/dev/null 2>&1; then
            shasum -a 256 "${ASSET_NAME}${ARCHIVE_EXT}" | awk '{print $1 "  " $2}' > "${ASSET_NAME}${ARCHIVE_EXT}.sha256"
          fi

          echo "ASSET=${ASSET_NAME}${ARCHIVE_EXT}" >> "$GITHUB_ENV"
          echo "ASSET_PATH=${RUNNER_TEMP}/${ASSET_NAME}${ARCHIVE_EXT}" >> "$GITHUB_ENV"
          echo "CHECKSUM_PATH=${RUNNER_TEMP}/${ASSET_NAME}${ARCHIVE_EXT}.sha256" >> "$GITHUB_ENV"

      - name: Upload artifact
        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
        with:
          name: ${{ env.ASSET }}
          path: |
            ${{ env.ASSET_PATH }}
            ${{ env.CHECKSUM_PATH }}

  release:
    needs: build
    runs-on: ubuntu-latest

    steps:
      - name: Download all artifacts
        uses: actions/download-artifact@70fc10c6e5e1ce46ad2ea6f2b72d43f7d47b13c3 # v8.0.0
        with:
          path: artifacts

      - name: Create GitHub Release
        uses: softprops/action-gh-release@a06a81a03ee405af7f2048a818ed3f03bbf83c7b # v2.5.0
        with:
          generate_release_notes: true
          files: |
            artifacts/**/*.tar.gz
            artifacts/**/*.zip
            artifacts/**/*.sha256

  publish-crate:
    needs: release
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install Rust toolchain
        uses: dtolnay/rust-toolchain@stable

      - name: Publish llmfit-core to crates.io
        run: cargo publish -p llmfit-core --token ${{ secrets.CARGO_REGISTRY_TOKEN }}

      - name: Wait for crates.io index update
        run: sleep 30

      - name: Publish llmfit to crates.io
        run: cargo publish -p llmfit --token ${{ secrets.CARGO_REGISTRY_TOKEN }}

  update-homebrew:
    needs: release
    runs-on: ubuntu-latest

    steps:
      - name: Checkout tap
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          repository: AlexsJones/homebrew-llmfit
          token: ${{ secrets.HOMEBREW_TAP_TOKEN }}
          path: homebrew-tap

      - name: Download release assets and update formula
        env:
          TAG: ${{ github.ref_name }}
        run: |
          VERSION="${TAG#v}"

          # Download the four tarballs and compute SHA256
          declare -A SHAS
          for target in aarch64-apple-darwin x86_64-apple-darwin aarch64-unknown-linux-musl x86_64-unknown-linux-musl; do
            URL="https://github.com/AlexsJones/llmfit/releases/download/${TAG}/llmfit-${TAG}-${target}.tar.gz"
            echo "Downloading ${URL}..."
            SHA=$(curl -fsSL "$URL" | shasum -a 256 | awk '{print $1}')
            SHAS[$target]="$SHA"
            echo "${target}: ${SHA}"
          done

          # Generate the formula
          cat > homebrew-tap/Formula/llmfit.rb << RUBY
          class Llmfit < Formula
            desc "Terminal tool that right-sizes LLM models to your system hardware"
            homepage "https://github.com/AlexsJones/llmfit"
            version "${VERSION}"
            license "MIT"

            on_macos do
              if Hardware::CPU.arm?
                url "https://github.com/AlexsJones/llmfit/releases/download/v#{version}/llmfit-v#{version}-aarch64-apple-darwin.tar.gz"
                sha256 "${SHAS[aarch64-apple-darwin]}"
              else
                url "https://github.com/AlexsJones/llmfit/releases/download/v#{version}/llmfit-v#{version}-x86_64-apple-darwin.tar.gz"
                sha256 "${SHAS[x86_64-apple-darwin]}"
              end
            end

            on_linux do
              if Hardware::CPU.arm?
                url "https://github.com/AlexsJones/llmfit/releases/download/v#{version}/llmfit-v#{version}-aarch64-unknown-linux-musl.tar.gz"
                sha256 "${SHAS[aarch64-unknown-linux-musl]}"
              else
                url "https://github.com/AlexsJones/llmfit/releases/download/v#{version}/llmfit-v#{version}-x86_64-unknown-linux-musl.tar.gz"
                sha256 "${SHAS[x86_64-unknown-linux-musl]}"
              end
            end

            def install
              bin.install "llmfit"
            end

            test do
              assert_match "llmfit", shell_output("#{bin}/llmfit --help")
            end
          end
          RUBY

          # Fix heredoc indentation
          sed -i 's/^          //' homebrew-tap/Formula/llmfit.rb

      - name: Commit and push
        run: |
          cd homebrew-tap
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add Formula/llmfit.rb
          git commit -m "Update llmfit to ${GITHUB_REF_NAME}"
          git push


================================================
FILE: .gitignore
================================================
/target
llmfit
/docs
llmfit-desktop/gen/schemas/*
data/gguf_sources_cache.json
llmfit-web/node_modules/
llmfit-web/dist/


================================================
FILE: .release-please-manifest.json
================================================
{
  ".": "0.3.7"
}


================================================
FILE: AGENTS.md
================================================
# AGENTS.md

Instructions for AI agents contributing to this codebase.

---

## Project overview

`llmfit` is a Rust CLI/TUI tool that matches LLM models against local system hardware (RAM, CPU, GPU). It detects system specs, loads a model database from embedded JSON, scores each model's fit, and presents results in an interactive terminal UI or classic table output.

## Language and toolchain

- Rust, edition 2024.
- Build with `cargo build`. Run with `cargo run`.
- No nightly features required. Stable toolchain only.
- Minimum supported Rust version: whatever edition 2024 requires (1.85+).

## Architecture

```
main.rs          Entrypoint. Parses CLI args via clap. Launches TUI by default,
                 falls back to CLI subcommands (system, list, fit, search, info)
                 or --cli flag for classic table output.

hardware.rs      SystemSpecs::detect() reads RAM/CPU via sysinfo crate.
                 detect_gpu() shells out to nvidia-smi / rocm-smi, and
                 detects Apple Silicon via system_profiler.
                 On unified memory (Apple Silicon), VRAM = system RAM.
                 No async. No unsafe.

models.rs        LlmModel struct. ModelDatabase loads from data/hf_models.json
                 embedded via include_str!() at compile time. No runtime file I/O.

fit.rs           FitLevel enum (Perfect, Good, Marginal, TooTight).
                 RunMode enum (Gpu, CpuOffload, CpuOnly).
                 ModelFit::analyze() compares a model against SystemSpecs,
                 selecting the best available execution path (GPU > CPU offload > CPU).
                 rank_models_by_fit() sorts by fit level, then run mode, then utilization.

display.rs       CLI-mode table rendering using the tabled crate.
                 Only used when --cli flag or subcommands are invoked.

tui_app.rs       TUI application state. Holds all models, filters (search text,
                 provider toggles, fit filter), selection index.
                 All filtering logic is here -- apply_filters() recomputes
                 filtered_fits indices whenever inputs change.

tui_ui.rs        Rendering with ratatui. Four layout regions: system bar,
                 search/filter bar, model table (or detail pane), status bar.
                 Stateless rendering -- reads from App, writes to Frame.

tui_events.rs    Keyboard event handling with crossterm. Two modes: Normal
                 (navigation, filter toggling, quit) and Search (text input).
```
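
The three-key ordering described for `rank_models_by_fit()` can be sketched as below. This is an illustrative stand-in, not the code in `fit.rs`: the `RankedFit` struct and the `rank_fits` function are hypothetical names, and the direction of the utilization tie-break (lower first) is an assumption.

```rust
// Variants are declared best-first, so the derived `Ord` gives
// Perfect < Good < Marginal < TooTight and an ascending sort puts the best fit first.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum FitLevel {
    Perfect,
    Good,
    Marginal,
    TooTight,
}

// Same trick for run modes: Gpu sorts ahead of CpuOffload, which sorts ahead of CpuOnly.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum RunMode {
    Gpu,
    CpuOffload,
    CpuOnly,
}

// Hypothetical row type used only for this sketch.
#[derive(Debug, Clone)]
struct RankedFit {
    name: String,
    fit_level: FitLevel,
    run_mode: RunMode,
    utilization_pct: f64,
}

// Sort by fit level, then run mode, then utilization.
fn rank_fits(fits: &mut [RankedFit]) {
    fits.sort_by(|a, b| {
        a.fit_level
            .cmp(&b.fit_level)
            .then(a.run_mode.cmp(&b.run_mode))
            // Assumption: lower utilization ranks first among otherwise equal rows.
            .then(a.utilization_pct.total_cmp(&b.utilization_pct))
    });
}

fn main() {
    let mut fits = vec![
        RankedFit { name: "cpu-good".to_string(), fit_level: FitLevel::Good, run_mode: RunMode::CpuOnly, utilization_pct: 40.0 },
        RankedFit { name: "gpu-good".to_string(), fit_level: FitLevel::Good, run_mode: RunMode::Gpu, utilization_pct: 90.0 },
        RankedFit { name: "gpu-perfect".to_string(), fit_level: FitLevel::Perfect, run_mode: RunMode::Gpu, utilization_pct: 55.0 },
    ];
    rank_fits(&mut fits);
    for fit in &fits {
        println!("{} {:?} {:?} {:.1}%", fit.name, fit.fit_level, fit.run_mode, fit.utilization_pct);
    }
}
```

Declaring the variants in best-first order keeps the comparator free of hand-written match tables, which is one way to honor the rule that new fit levels must be reflected in the sort logic.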

## Data flow

1. `App::new()` calls `SystemSpecs::detect()` and `ModelDatabase::new()`.
2. Every model is analyzed into a `ModelFit` via `ModelFit::analyze()`.
3. Results are sorted by `rank_models_by_fit()`.
4. `apply_filters()` produces `filtered_fits: Vec<usize>` (indices into `all_fits`).
5. The TUI render loop reads `App` state and draws via `tui_ui::draw()`.
6. `tui_events::handle_events()` mutates `App` state, triggering re-render.
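
Step 4 of the flow above (filtering into a list of indices rather than copying rows) might look roughly like this. It is a minimal sketch, not the code in `tui_app.rs`: the fields `search_text` and `hide_too_tight` and the trimmed-down `ModelFit` struct are assumptions made for illustration.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FitLevel {
    Perfect,
    Good,
    Marginal,
    TooTight,
}

// Trimmed-down stand-in for the real ModelFit.
struct ModelFit {
    name: String,
    provider: String,
    fit_level: FitLevel,
}

struct App {
    all_fits: Vec<ModelFit>,
    // Indices into `all_fits`; rebuilt whenever a filter input changes.
    filtered_fits: Vec<usize>,
    search_text: String,    // hypothetical filter input
    hide_too_tight: bool,   // hypothetical filter input
}

impl App {
    fn apply_filters(&mut self) {
        let needle = self.search_text.to_lowercase();
        let hide_too_tight = self.hide_too_tight;
        let indices: Vec<usize> = self
            .all_fits
            .iter()
            .enumerate()
            .filter(|(_, fit)| {
                // Case-insensitive substring match on name or provider.
                let matches_search = needle.is_empty()
                    || fit.name.to_lowercase().contains(&needle)
                    || fit.provider.to_lowercase().contains(&needle);
                let matches_fit = !(hide_too_tight && fit.fit_level == FitLevel::TooTight);
                matches_search && matches_fit
            })
            .map(|(index, _)| index)
            .collect();
        self.filtered_fits = indices;
    }
}

fn main() {
    let mut app = App {
        all_fits: vec![
            ModelFit { name: "Qwen/Qwen2.5-Coder-7B-Instruct".to_string(), provider: "Qwen".to_string(), fit_level: FitLevel::Good },
            ModelFit { name: "example/large-model".to_string(), provider: "Example".to_string(), fit_level: FitLevel::TooTight },
        ],
        filtered_fits: Vec::new(),
        search_text: String::new(),
        hide_too_tight: true,
    };
    app.apply_filters();
    println!("visible rows: {:?}", app.filtered_fits);
}
```

Storing indices keeps `all_fits` sorted once and lets the selection index refer to a stable backing vector while filters change.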

## Model database

- Source: `data/hf_models.json` (33 models).
- Generated by `scripts/scrape_hf_models.py` (Python, stdlib only, no pip deps).
- Embedded at compile time via `include_str!("../data/hf_models.json")`.
- Schema per entry: name, provider, parameter_count, min_ram_gb, recommended_ram_gb, min_vram_gb, quantization, context_length, use_case.
- `min_vram_gb` is VRAM needed for GPU inference. `min_ram_gb` is system RAM needed for CPU inference. Both are derived from the same parameter count.
- RAM formula: `params * 0.5 bytes (Q4_K_M) / 1024^3 * 1.2 overhead`.
- VRAM formula: `params * 0.5 bytes (Q4_K_M) / 1024^3 * 1.1 activation overhead`.
- Recommended RAM: `model_size * 2.0`.
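
As a worked example of the three formulas above, the following sketch evaluates them for a 7-billion-parameter model. The function names are hypothetical, and it assumes `model_size` in the recommended-RAM formula means the quantized weight size before any overhead factor is applied.

```rust
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;
// Q4_K_M is approximated as 0.5 bytes per parameter, as in the formulas above.
const BYTES_PER_PARAM_Q4: f64 = 0.5;

// Quantized weight size in GB, before overhead (assumed meaning of `model_size`).
fn model_size_gb(params: f64) -> f64 {
    params * BYTES_PER_PARAM_Q4 / GIB
}

// System RAM for CPU inference: weights plus 20% overhead.
fn min_ram_gb(params: f64) -> f64 {
    model_size_gb(params) * 1.2
}

// VRAM for GPU inference: weights plus 10% activation overhead.
fn min_vram_gb(params: f64) -> f64 {
    model_size_gb(params) * 1.1
}

// Recommended RAM: twice the weight size.
fn recommended_ram_gb(params: f64) -> f64 {
    model_size_gb(params) * 2.0
}

fn main() {
    let params = 7.0e9;
    println!("model size:      {:.2} GB", model_size_gb(params));
    println!("min RAM:         {:.2} GB", min_ram_gb(params));
    println!("min VRAM:        {:.2} GB", min_vram_gb(params));
    println!("recommended RAM: {:.2} GB", recommended_ram_gb(params));
}
```

For 7e9 parameters the weight size comes out at roughly 3.26 GB, so the derived figures are about 3.91 GB (min RAM), 3.59 GB (min VRAM), and 6.52 GB (recommended RAM).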

Do not manually edit `hf_models.json`. Regenerate it by running the scraper:

```sh
python3 scripts/scrape_hf_models.py
```

The scraper has hardcoded fallback entries for gated models that require authentication.

## Conventions

- No `unsafe` code.
- No `.unwrap()` on user-facing paths. Use proper error handling or `expect()` with a descriptive message for internal invariants only.
- Fit levels are ordered: Perfect > Good > Marginal > TooTight. Do not add levels without updating `rank_models_by_fit()` sort logic.
- Fit is VRAM-first. GPU inference with sufficient VRAM is the ideal path. CPU inference via system RAM is a fallback. The `RunMode` enum tracks which memory pool is being used (Gpu, CpuOffload, CpuOnly).
- `min_vram_gb` is the VRAM needed to load model weights on GPU. `min_ram_gb` is the system RAM needed for CPU-only inference (same weights, loaded into RAM instead). They represent the same workload on different hardware paths.
- On Apple Silicon (unified memory), VRAM = system RAM. The `CpuOffload` path is skipped because there is no separate RAM pool to spill to. `SystemSpecs::unified_memory` tracks this.
- TUI rendering is stateless. `tui_ui::draw()` must not mutate `App`. Pass `&mut App` only for `TableState` widget requirements -- do not use it to change application state.
- Event handling in `tui_events.rs` is the sole place that mutates `App` in the TUI loop.
- Keep `display.rs` and `tui_*.rs` independent. The CLI path must work without initializing any TUI state.
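
The VRAM-first convention and the unified-memory rule above can be illustrated with a small selection function. This is a sketch under stated assumptions, not the logic of `ModelFit::analyze()`: `pick_run_mode` and `ModelReq` are hypothetical names, and the exact thresholds (a plain "required <= available" comparison for each pool) are assumed for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunMode {
    Gpu,
    CpuOffload,
    CpuOnly,
}

// Reduced stand-in for SystemSpecs; field names follow the `/api/v1/system` response.
struct SystemSpecs {
    available_ram_gb: f64,
    gpu_vram_gb: Option<f64>,
    unified_memory: bool,
}

// Hypothetical container for the two per-model memory requirements.
struct ModelReq {
    min_vram_gb: f64,
    min_ram_gb: f64,
}

// Returns the preferred run mode, or None when no path has enough memory.
fn pick_run_mode(sys: &SystemSpecs, model: &ModelReq) -> Option<RunMode> {
    match sys.gpu_vram_gb {
        // Ideal path: the weights fit entirely in VRAM.
        Some(vram) if model.min_vram_gb <= vram => Some(RunMode::Gpu),
        // Discrete GPU without enough VRAM: spill to system RAM.
        // Skipped on unified memory, where there is no separate pool to spill to.
        Some(_) if !sys.unified_memory && model.min_ram_gb <= sys.available_ram_gb => {
            Some(RunMode::CpuOffload)
        }
        // Fallback: CPU-only inference from system RAM.
        _ if model.min_ram_gb <= sys.available_ram_gb => Some(RunMode::CpuOnly),
        _ => None,
    }
}

fn main() {
    let desktop = SystemSpecs { available_ram_gb: 32.0, gpu_vram_gb: Some(8.0), unified_memory: false };
    let small = ModelReq { min_vram_gb: 3.6, min_ram_gb: 3.9 };
    let large = ModelReq { min_vram_gb: 20.0, min_ram_gb: 22.0 };
    println!("small model: {:?}", pick_run_mode(&desktop, &small));
    println!("large model: {:?}", pick_run_mode(&desktop, &large));
}
```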

## Adding a new model to the database

1. Add the model's HuggingFace repo ID to `TARGET_MODELS` in `scripts/scrape_hf_models.py`.
2. If the model is gated (requires HF auth), add a fallback entry to the `FALLBACK` dict in the same script.
3. Run `python3 scripts/scrape_hf_models.py`.
4. Verify the output in `data/hf_models.json`.
5. Run `cargo build` to verify compilation.

## Adding a new filter

1. Add the filter state to `App` in `tui_app.rs`.
2. Add filtering logic inside `apply_filters()`.
3. Add the keybinding in `tui_events.rs` (Normal mode handler).
4. Add the UI widget in `tui_ui.rs` (`draw_search_and_filters()` function).
5. Update the status bar help text in `draw_status_bar()`.

## Adding a new CLI subcommand

1. Add a variant to the `Commands` enum in `main.rs`.
2. Add the match arm in the `main()` function's command dispatch.
3. Use `display.rs` functions for output, or add new ones as needed.

## Testing

There are no tests yet. When adding tests:

- Unit tests for `fit.rs` logic (given known SystemSpecs and LlmModel values, assert correct FitLevel).
- Unit tests for `models.rs` (verify JSON parsing, search matching).
- Integration tests for CLI subcommands via `assert_cmd` crate.
- TUI is difficult to unit test. Keep rendering stateless and test the state mutations in `tui_app.rs` directly.

## Dependencies policy

- Prefer crates that are well-maintained and have minimal transitive dependencies.
- `sysinfo` is the system detection crate. Do not replace it with raw platform calls.
- `ratatui` + `crossterm` is the TUI stack. Do not mix in `termion` or `ncurses`.
- `clap` with derive feature for CLI parsing. Do not use manual arg parsing.
- The Python scraper uses only stdlib (`urllib`, `json`). Do not add pip dependencies.

## Common tasks

```sh
# Build
cargo build

# Run TUI
cargo run

# Run CLI mode
cargo run -- --cli

# Run specific subcommand
cargo run -- system
cargo run -- fit --perfect -n 5
cargo run -- search "llama"

# Refresh model database
python3 scripts/scrape_hf_models.py && cargo build

# Check for compilation issues
cargo check

# Format code
cargo fmt

# Lint
cargo clippy
```

## Platform notes

- GPU detection shells out to `nvidia-smi` (NVIDIA) and `rocm-smi` (AMD). These are best-effort and fail silently if unavailable.
- Apple Silicon detection uses `system_profiler SPDisplaysDataType`. On unified memory Macs, VRAM is reported as available system RAM (same pool).
- `sysinfo` handles cross-platform RAM/CPU. No conditional compilation needed.
- The TUI uses crossterm which works on Linux, macOS, and Windows terminals.


================================================
FILE: API.md
================================================
# llmfit REST API Guide

This document is for agent/client builders integrating with `llmfit serve`.

## Purpose

`llmfit serve` exposes node-local model fit analysis (same core data used by TUI/CLI) over HTTP and serves a local web dashboard.

Primary use case:
- Query each node in a cluster for top runnable models.
- Aggregate externally (scheduler/controller/UI) for placement decisions.

## Start the server

```sh
llmfit serve --port 8787
```

Global flags still apply:

```sh
llmfit --memory 24G --max-context 8192 serve --port 8787
```

## Base URL

Default local base URL:

```text
http://127.0.0.1:8787
```

To expose outside localhost, pass `--host 0.0.0.0`.

If you are building from source and want the dashboard embedded in `llmfit`, build web assets first:

```sh
cd llmfit-web && npm ci && npm run build
```

## Endpoints

### `GET /`
Web dashboard entrypoint (same-origin UI for fit exploration).

### `GET /health`
Liveness probe.

Example response:

```json
{
  "status": "ok",
  "node": {
    "name": "worker-1",
    "os": "linux"
  }
}
```

---

### `GET /api/v1/system`
Returns node identity + detected hardware.

Example response shape:

```json
{
  "node": {
    "name": "worker-1",
    "os": "linux"
  },
  "system": {
    "total_ram_gb": 62.23,
    "available_ram_gb": 41.08,
    "cpu_cores": 14,
    "cpu_name": "Intel(R) Core(TM) Ultra 7 165U",
    "has_gpu": false,
    "gpu_vram_gb": null,
    "gpu_name": null,
    "gpu_count": 0,
    "unified_memory": false,
    "backend": "CPU (x86)",
    "gpus": []
  }
}
```

---

### `GET /api/v1/models`
Returns filtered/sorted model-fit rows for this node.

Envelope shape:

```json
{
  "node": { "name": "worker-1", "os": "linux" },
  "system": { "...": "..." },
  "total_models": 23,
  "returned_models": 10,
  "filters": { "...": "echo of query state" },
  "models": [
    {
      "name": "Qwen/Qwen2.5-Coder-7B-Instruct",
      "provider": "Qwen",
      "parameter_count": "7B",
      "params_b": 7.0,
      "context_length": 32768,
      "use_case": "Coding",
      "category": "Coding",
      "release_date": "2025-03-14",
      "is_moe": false,
      "fit_level": "good",
      "fit_label": "Good",
      "run_mode": "gpu",
      "run_mode_label": "GPU",
      "score": 86.5,
      "score_components": {
        "quality": 87.0,
        "speed": 81.2,
        "fit": 90.1,
        "context": 88.0
      },
      "estimated_tps": 42.5,
      "runtime": "llamacpp",
      "runtime_label": "llama.cpp",
      "best_quant": "Q5_K_M",
      "memory_required_gb": 5.8,
      "memory_available_gb": 12.0,
      "utilization_pct": 48.3,
      "notes": [],
      "gguf_sources": []
    }
  ]
}
```
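
In the example row above, `utilization_pct` is consistent with `memory_required_gb / memory_available_gb * 100` (5.8 / 12.0 is roughly 48.3%). The sketch below assumes that relationship holds in general, which this guide does not state; the helper name is hypothetical and not part of llmfit.

```python
def utilization_pct(memory_required_gb: float, memory_available_gb: float) -> float:
    """Percentage of available memory a model would occupy (assumed definition)."""
    if memory_available_gb <= 0:
        raise ValueError("memory_available_gb must be positive")
    # Round to one decimal place, matching the precision shown in the example row.
    return round(memory_required_gb / memory_available_gb * 100, 1)


# Values taken from the example row above.
example = utilization_pct(5.8, 12.0)
```

A client could use a check like this to sanity-test responses, but should prefer the server-reported value when the two disagree.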

---

### `GET /api/v1/models/top`
Key scheduling endpoint. Same schema as `/api/v1/models`, but defaults to top 5 runnable entries.

Important behavior:
- Defaults `limit=5`.
- Excludes `too_tight` rows unless explicitly overridden with `include_too_tight=true`; even then, the top endpoint keeps its runnable semantics.

---

### `GET /api/v1/models/{name}`
Path-constrained search. Equivalent to a text search scoped by `{name}`.

Useful for:
- Client-side drilldown after selecting a model family.

## Query parameters

Supported on `/api/v1/models` and `/api/v1/models/top` (also `/api/v1/models/{name}`):

- `limit` (or alias `n`): max rows returned.
- `perfect`: `true|false` (when `true`, only perfect fits).
- `min_fit`: `perfect|good|marginal|too_tight`.
- `runtime`: `any|mlx|llamacpp`.
- `use_case`: `general|coding|reasoning|chat|multimodal|embedding`.
- `provider`: provider substring filter.
- `search`: free-text filter (name/provider/params/use-case/category).
- `sort`: `score|tps|params|mem|ctx|date|use_case`.
- `include_too_tight`: include unrunnable rows (defaults to `true` for `/models`, `false` for `/models/top`).
- `max_context`: per-request context cap used by memory estimation.
- `force_runtime`: `mlx|llamacpp|vllm` — override automatic runtime selection during analysis (e.g. get llama.cpp recommendations on Apple Silicon instead of MLX).
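
The parameters above can be assembled into a request URL on the client side. The following is a minimal sketch using only the Python standard library; `build_models_url` is a hypothetical helper, and the allowed-value sets are copied from the list above rather than from the server, which remains the source of truth.

```python
from urllib.parse import urlencode

# Allowed values copied from the parameter list in this guide (assumption:
# the server may accept more or fewer values than are listed here).
MIN_FIT_VALUES = {"perfect", "good", "marginal", "too_tight"}
SORT_VALUES = {"score", "tps", "params", "mem", "ctx", "date", "use_case"}


def build_models_url(base_url, top=False, **params):
    """Build a /api/v1/models or /api/v1/models/top URL from keyword filters."""
    if "min_fit" in params and params["min_fit"] not in MIN_FIT_VALUES:
        raise ValueError(f"unsupported min_fit: {params['min_fit']}")
    if "sort" in params and params["sort"] not in SORT_VALUES:
        raise ValueError(f"unsupported sort: {params['sort']}")

    path = "/api/v1/models/top" if top else "/api/v1/models"
    # Render booleans as lowercase true/false and drop unset (None) filters.
    encoded = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
        if value is not None
    }
    query = urlencode(encoded)
    return f"{base_url.rstrip('/')}{path}" + (f"?{query}" if query else "")


url = build_models_url(
    "http://127.0.0.1:8787", top=True, limit=5, min_fit="good", use_case="coding"
)
```

Validating locally is optional; it only catches typos before a request is made, and the server's HTTP 400 response is still authoritative.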

## Error handling

Invalid filter values return HTTP 400:

```json
{
  "error": "invalid min_fit value: use perfect|good|marginal|too_tight"
}
```

Server errors return HTTP 500 with `{"error": "..."}`.
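
A client can turn both error shapes into a single exception type. This is a sketch under the assumption that error bodies always follow the `{"error": "..."}` shape shown above; `LlmfitApiError` and `parse_response` are hypothetical names, not part of llmfit.

```python
import json


class LlmfitApiError(Exception):
    """Raised when the server answers with a non-2xx status or an unreadable body."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_response(status, body):
    """Return the decoded JSON payload on success, otherwise raise LlmfitApiError."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        payload = None

    if 200 <= status < 300 and payload is not None:
        return payload

    # Prefer the server-provided message when the body has the documented shape.
    message = payload.get("error") if isinstance(payload, dict) else None
    raise LlmfitApiError(status, message or "unparseable response body")
```

Keeping the status code on the exception lets callers distinguish a bad filter (400, fix the request) from a server fault (500, retry or skip the node).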

## Client integration recommendations

### 1) Polling pattern for schedulers
For each node agent:
1. Call `/health`.
2. Call `/api/v1/system`.
3. Call `/api/v1/models/top?limit=K&min_fit=good`.
4. Attach node metadata and forward to your central scheduler.
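
The steps above can be sketched as a single polling function. This is an illustrative outline, not llmfit's own client: `fetch_json` and `poll_node` are hypothetical names, the `fetch` parameter exists only so the HTTP call can be swapped out, and forwarding the result to a scheduler is left to the caller.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body (urlopen raises on 4xx/5xx)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll_node(base_url, k=5, fetch=fetch_json):
    """Run the health, system, and top-models calls for one node.

    Returns a report dict, or None when the node does not report status "ok".
    """
    base = base_url.rstrip("/")

    # Step 1: liveness probe.
    health = fetch(f"{base}/health")
    if health.get("status") != "ok":
        return None

    # Step 2: node identity and detected hardware.
    system = fetch(f"{base}/api/v1/system")

    # Step 3: top K runnable models with at least a "good" fit.
    top = fetch(f"{base}/api/v1/models/top?limit={k}&min_fit=good")

    # Step 4: attach node metadata; the caller forwards this to its scheduler.
    return {
        "node": system.get("node", health.get("node")),
        "system": system.get("system"),
        "models": top.get("models", []),
    }
```

A caller would typically loop over its node list, call `poll_node("http://<node>:8787")` for each, and skip nodes that return `None` or raise.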

### 2) Conservative placement defaults
For production placement, prefer:

```text
min_fit=good
include_too_tight=false
sort=score
limit=5..20
```

### 3) Per-workload targeting
Examples:
- Coding workloads: `use_case=coding`
- Embedding workloads: `use_case=embedding`
- Runtime constrained to llama.cpp fleet: `runtime=llamacpp`

### 4) Stable parsing
Treat unknown fields as forward-compatible additions:
- Parse required fields you depend on.
- Ignore unknown fields.
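
One way to apply this is to copy only the fields a client needs into a typed record and drop everything else. The sketch below assumes a client that depends on four fields from the model rows shown earlier; `ModelFit` and `parse_model_row` are hypothetical names, and the choice of required fields is an example, not a contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFit:
    """The subset of a model row this example client depends on."""

    name: str
    fit_level: str
    score: float
    memory_required_gb: float


def parse_model_row(row):
    """Extract required fields from a model row, ignoring any unknown ones."""
    try:
        return ModelFit(
            name=str(row["name"]),
            fit_level=str(row["fit_level"]),
            score=float(row["score"]),
            memory_required_gb=float(row["memory_required_gb"]),
        )
    except KeyError as missing:
        # Fail loudly on a missing required field instead of guessing a default.
        raise ValueError(f"model row is missing required field {missing}") from None
```

Because unknown keys are never read, new fields added to the response do not affect this parser.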

## Curl examples

```sh
curl http://127.0.0.1:8787/health
curl http://127.0.0.1:8787/api/v1/system
curl "http://127.0.0.1:8787/api/v1/models?limit=20&min_fit=marginal&sort=score"
curl "http://127.0.0.1:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"
curl "http://127.0.0.1:8787/api/v1/models/Mistral?runtime=any"
```

## Versioning notes

Current API prefix is `v1`.

If you build long-lived clients, pin to `/api/v1/...` and validate behavior with the local test script in `scripts/test_api.py`.


================================================
FILE: CHANGELOG.md
================================================
# Changelog

## [0.3.7](https://github.com/AlexsJones/llmfit/compare/v0.3.6...v0.3.7) (2026-02-21)


### Features

* add --memory flag to override GPU VRAM autodetection ([9a02f6e](https://github.com/AlexsJones/llmfit/commit/9a02f6e1616f59783ccff5b007c25213854f63b9))
* add --memory flag to override GPU VRAM autodetection ([39c5486](https://github.com/AlexsJones/llmfit/commit/39c5486aa3d94f9b9ef36e29642b64d848d0d2b0))
* add 15 popular models from HuggingFace ([128a020](https://github.com/AlexsJones/llmfit/commit/128a020323897a67ed5d12dd397bcf4924a6bf51))
* Add 15 popular models from HuggingFace (33→48 models) ([c45606b](https://github.com/AlexsJones/llmfit/commit/c45606bdb235b6bfe616bb616b1364a97e76f0c1))
* add homebrew tap support and update release workflow ([db09473](https://github.com/AlexsJones/llmfit/commit/db094734288d17a49d9c3c5c99859fe0d7dc976d))
* added arc support ([b5892fc](https://github.com/AlexsJones/llmfit/commit/b5892fc2ff313e71f57b7d793c7444d2aaadc0bd))
* added logo ([c21d416](https://github.com/AlexsJones/llmfit/commit/c21d4168f2bcd6da878848f9a6f97179d558606b))
* added moe ([ac7ffe4](https://github.com/AlexsJones/llmfit/commit/ac7ffe4ed79eb22ec43cf7bc20e8cd8d102d16a9))
* adding release please ([f2bfc7f](https://github.com/AlexsJones/llmfit/commit/f2bfc7fcf2587b74e05d8ad9d1041be6de456e69))
* append (WSL) to RAM label in tui when running under WSL ([e0397cf](https://github.com/AlexsJones/llmfit/commit/e0397cf51025b393b0d4024c4ae67200ee206390))
* caught some unavailable models on ollama ([b9f38da](https://github.com/AlexsJones/llmfit/commit/b9f38da9579040a7c2bada55838c5541474883ca))
* caught some unavailable models on ollama ([c0f7c20](https://github.com/AlexsJones/llmfit/commit/c0f7c20f61cdd9ae692de6ca66344befba2fafa9))
* detect installed Ollama models and support pulling from TUI ([4159aaf](https://github.com/AlexsJones/llmfit/commit/4159aaf304b3b421679f8231cf574465783d5b41))
* first pass ([855ad3d](https://github.com/AlexsJones/llmfit/commit/855ad3d34160cce6200c0ff128c34bcdcb0b922b))
* fixed up skill ([fcb712a](https://github.com/AlexsJones/llmfit/commit/fcb712a98ac785ad83ad689d5300f17cb80a3f1c))
* fixed up skill ([1f7d1de](https://github.com/AlexsJones/llmfit/commit/1f7d1de547a31202b9d34dd62bf543f5a22b2de7))
* fixing vram on apple bug ([5e08754](https://github.com/AlexsJones/llmfit/commit/5e087549c7c1523f4d5df72bd8a915330498a795))
* fixing vram on apple bug ([b3deca1](https://github.com/AlexsJones/llmfit/commit/b3deca1d9eac16283d0e9269c68a1af1dfc871ab))
* fixing vram on apple bug ([92ddb0e](https://github.com/AlexsJones/llmfit/commit/92ddb0e82579c6018d1acb4e3dfbe1df7d582605))
* fixing vram on apple bug ([42b2081](https://github.com/AlexsJones/llmfit/commit/42b2081577bed23176c0f87d1ad0b142cce23872))
* improvements based on [#12](https://github.com/AlexsJones/llmfit/issues/12) ([5428ef8](https://github.com/AlexsJones/llmfit/commit/5428ef8cdd42e88bced1459b55b480aab767637c))
* increased model count ([156b29d](https://github.com/AlexsJones/llmfit/commit/156b29deb077a1d66948254b370597a118fd5daf))
* increment version ([283bebb](https://github.com/AlexsJones/llmfit/commit/283bebb8eca5da2fc7124b665ae773fda48aed93))
* overall to the scoring system ([f475938](https://github.com/AlexsJones/llmfit/commit/f4759381d23b834e0a42a4699d23fb3f858fe677))
* overall to the scoring system ([b0696cf](https://github.com/AlexsJones/llmfit/commit/b0696cf297f1cb11247493355406d8b9c56510db))
* overall to the scoring system ([37e2e10](https://github.com/AlexsJones/llmfit/commit/37e2e10076f450f79165d92541baf04957ec2fe9))
* plumbing 2 ([1c615bb](https://github.com/AlexsJones/llmfit/commit/1c615bb57b7395f9be888245f8157dec2bab8bb4))
* plumbing 2 ([dd6a3ec](https://github.com/AlexsJones/llmfit/commit/dd6a3ec20e09ae72eada1fada73a6392c9673221))
* pull functionality ([923e7e7](https://github.com/AlexsJones/llmfit/commit/923e7e7463dd2bd53b6438ad3c8f2eb1f7a45af4))
* release plumbing ([7d21719](https://github.com/AlexsJones/llmfit/commit/7d217192bc1638f7ff69a22c2467d7d86da96641))
* release plumbing ([3accbb4](https://github.com/AlexsJones/llmfit/commit/3accbb42c99321fb6f8ade9d2f07af0fee93ed9e))
* reworked available models for download ([9adc84f](https://github.com/AlexsJones/llmfit/commit/9adc84f3041dca14fdcdc4437409b2b81eaca5a3))
* support for windows vulkan ([cc0fd61](https://github.com/AlexsJones/llmfit/commit/cc0fd619fa31e01c398c3c23f45aa915005670c8))
* supporting 94 models ([a652be3](https://github.com/AlexsJones/llmfit/commit/a652be31dd0cbe36f89572de7022e2a145fb3788))
* updated build actions ([1e65fdd](https://github.com/AlexsJones/llmfit/commit/1e65fddecb5f183870ddf1aa865dcaddba47523a))
* updated images ([9141109](https://github.com/AlexsJones/llmfit/commit/9141109f753ef38eb2b2eb5c604edb6ee0d7e371))
* updated models ([2d6c1d6](https://github.com/AlexsJones/llmfit/commit/2d6c1d66708186c0a21cb2f082a5b4e2fb03db90))
* updated tui to support multiple providers better and also multiple GPU support ([a3ca0bd](https://github.com/AlexsJones/llmfit/commit/a3ca0bd64647fa958c15bb7038a9e02df175fe67))
* updated urls ([f75ec27](https://github.com/AlexsJones/llmfit/commit/f75ec2750f325ff73725e5b8b194ba854c8579e9))
* updated version ([2cfc73e](https://github.com/AlexsJones/llmfit/commit/2cfc73ebdb6214f801e32880ff6451b2809bbb45))


### Bug Fixes

* correctly estimate VRAM for APU integrated GPUs ([72c8cb0](https://github.com/AlexsJones/llmfit/commit/72c8cb0e7873e0a8bcf4a10aee877bc38555299c))
* correctly estimate VRAM for APU integrated GPUs (Radeon Graphics) ([8da5c2a](https://github.com/AlexsJones/llmfit/commit/8da5c2a0443b73a3ac78ac087b0f08acdba6aaa9)), closes [#25](https://github.com/AlexsJones/llmfit/issues/25)
* update OpenClaw skill to match actual CLI output ([f38a0e5](https://github.com/AlexsJones/llmfit/commit/f38a0e56ef332bde8f3b03f8b06b5982fe90c1cc))
* update OpenClaw skill to match actual CLI output ([e1adbfd](https://github.com/AlexsJones/llmfit/commit/e1adbfd0abd786bc7a99496f20a7f81070bc8fe3))

## [0.3.6](https://github.com/AlexsJones/llmfit/compare/llmfit-v0.3.5...llmfit-v0.3.6) (2026-02-21)


### Features

* release plumbing ([7d21719](https://github.com/AlexsJones/llmfit/commit/7d217192bc1638f7ff69a22c2467d7d86da96641))
* release plumbing ([3accbb4](https://github.com/AlexsJones/llmfit/commit/3accbb42c99321fb6f8ade9d2f07af0fee93ed9e))

## [0.3.5](https://github.com/AlexsJones/llmfit/compare/llmfit-v0.3.4...llmfit-v0.3.5) (2026-02-21)


### Features

* add --memory flag to override GPU VRAM autodetection ([9a02f6e](https://github.com/AlexsJones/llmfit/commit/9a02f6e1616f59783ccff5b007c25213854f63b9))
* add --memory flag to override GPU VRAM autodetection ([39c5486](https://github.com/AlexsJones/llmfit/commit/39c5486aa3d94f9b9ef36e29642b64d848d0d2b0))
* add 15 popular models from HuggingFace ([128a020](https://github.com/AlexsJones/llmfit/commit/128a020323897a67ed5d12dd397bcf4924a6bf51))
* Add 15 popular models from HuggingFace (33→48 models) ([c45606b](https://github.com/AlexsJones/llmfit/commit/c45606bdb235b6bfe616bb616b1364a97e76f0c1))
* add homebrew tap support and update release workflow ([db09473](https://github.com/AlexsJones/llmfit/commit/db094734288d17a49d9c3c5c99859fe0d7dc976d))
* added arc support ([b5892fc](https://github.com/AlexsJones/llmfit/commit/b5892fc2ff313e71f57b7d793c7444d2aaadc0bd))
* added logo ([c21d416](https://github.com/AlexsJones/llmfit/commit/c21d4168f2bcd6da878848f9a6f97179d558606b))
* added moe ([ac7ffe4](https://github.com/AlexsJones/llmfit/commit/ac7ffe4ed79eb22ec43cf7bc20e8cd8d102d16a9))
* adding release please ([f2bfc7f](https://github.com/AlexsJones/llmfit/commit/f2bfc7fcf2587b74e05d8ad9d1041be6de456e69))
* append (WSL) to RAM label in tui when running under WSL ([e0397cf](https://github.com/AlexsJones/llmfit/commit/e0397cf51025b393b0d4024c4ae67200ee206390))
* caught some unavailable models on ollama ([b9f38da](https://github.com/AlexsJones/llmfit/commit/b9f38da9579040a7c2bada55838c5541474883ca))
* caught some unavailable models on ollama ([c0f7c20](https://github.com/AlexsJones/llmfit/commit/c0f7c20f61cdd9ae692de6ca66344befba2fafa9))
* detect installed Ollama models and support pulling from TUI ([4159aaf](https://github.com/AlexsJones/llmfit/commit/4159aaf304b3b421679f8231cf574465783d5b41))
* first pass ([855ad3d](https://github.com/AlexsJones/llmfit/commit/855ad3d34160cce6200c0ff128c34bcdcb0b922b))
* fixed up skill ([fcb712a](https://github.com/AlexsJones/llmfit/commit/fcb712a98ac785ad83ad689d5300f17cb80a3f1c))
* fixed up skill ([1f7d1de](https://github.com/AlexsJones/llmfit/commit/1f7d1de547a31202b9d34dd62bf543f5a22b2de7))
* fixing vram on apple bug ([5e08754](https://github.com/AlexsJones/llmfit/commit/5e087549c7c1523f4d5df72bd8a915330498a795))
* fixing vram on apple bug ([b3deca1](https://github.com/AlexsJones/llmfit/commit/b3deca1d9eac16283d0e9269c68a1af1dfc871ab))
* fixing vram on apple bug ([92ddb0e](https://github.com/AlexsJones/llmfit/commit/92ddb0e82579c6018d1acb4e3dfbe1df7d582605))
* fixing vram on apple bug ([42b2081](https://github.com/AlexsJones/llmfit/commit/42b2081577bed23176c0f87d1ad0b142cce23872))
* improvements based on [#12](https://github.com/AlexsJones/llmfit/issues/12) ([5428ef8](https://github.com/AlexsJones/llmfit/commit/5428ef8cdd42e88bced1459b55b480aab767637c))
* increased model count ([156b29d](https://github.com/AlexsJones/llmfit/commit/156b29deb077a1d66948254b370597a118fd5daf))
* increment version ([283bebb](https://github.com/AlexsJones/llmfit/commit/283bebb8eca5da2fc7124b665ae773fda48aed93))
* overall to the scoring system ([f475938](https://github.com/AlexsJones/llmfit/commit/f4759381d23b834e0a42a4699d23fb3f858fe677))
* overall to the scoring system ([b0696cf](https://github.com/AlexsJones/llmfit/commit/b0696cf297f1cb11247493355406d8b9c56510db))
* overall to the scoring system ([37e2e10](https://github.com/AlexsJones/llmfit/commit/37e2e10076f450f79165d92541baf04957ec2fe9))
* pull functionality ([923e7e7](https://github.com/AlexsJones/llmfit/commit/923e7e7463dd2bd53b6438ad3c8f2eb1f7a45af4))
* reworked available models for download ([9adc84f](https://github.com/AlexsJones/llmfit/commit/9adc84f3041dca14fdcdc4437409b2b81eaca5a3))
* support for windows vulkan ([cc0fd61](https://github.com/AlexsJones/llmfit/commit/cc0fd619fa31e01c398c3c23f45aa915005670c8))
* supporting 94 models ([a652be3](https://github.com/AlexsJones/llmfit/commit/a652be31dd0cbe36f89572de7022e2a145fb3788))
* updated build actions ([1e65fdd](https://github.com/AlexsJones/llmfit/commit/1e65fddecb5f183870ddf1aa865dcaddba47523a))
* updated images ([9141109](https://github.com/AlexsJones/llmfit/commit/9141109f753ef38eb2b2eb5c604edb6ee0d7e371))
* updated models ([2d6c1d6](https://github.com/AlexsJones/llmfit/commit/2d6c1d66708186c0a21cb2f082a5b4e2fb03db90))
* updated tui to support multiple providers better and also multiple GPU support ([a3ca0bd](https://github.com/AlexsJones/llmfit/commit/a3ca0bd64647fa958c15bb7038a9e02df175fe67))
* updated urls ([f75ec27](https://github.com/AlexsJones/llmfit/commit/f75ec2750f325ff73725e5b8b194ba854c8579e9))
* updated version ([2cfc73e](https://github.com/AlexsJones/llmfit/commit/2cfc73ebdb6214f801e32880ff6451b2809bbb45))


### Bug Fixes

* correctly estimate VRAM for APU integrated GPUs ([72c8cb0](https://github.com/AlexsJones/llmfit/commit/72c8cb0e7873e0a8bcf4a10aee877bc38555299c))
* correctly estimate VRAM for APU integrated GPUs (Radeon Graphics) ([8da5c2a](https://github.com/AlexsJones/llmfit/commit/8da5c2a0443b73a3ac78ac087b0f08acdba6aaa9)), closes [#25](https://github.com/AlexsJones/llmfit/issues/25)
* update OpenClaw skill to match actual CLI output ([f38a0e5](https://github.com/AlexsJones/llmfit/commit/f38a0e56ef332bde8f3b03f8b06b5982fe90c1cc))
* update OpenClaw skill to match actual CLI output ([e1adbfd](https://github.com/AlexsJones/llmfit/commit/e1adbfd0abd786bc7a99496f20a7f81070bc8fe3))


================================================
FILE: CNAME
================================================
llmfit.axjns.dev

================================================
FILE: Cargo.toml
================================================
[workspace]
members = ["llmfit-core", "llmfit-tui", "llmfit-desktop"]
default-members = ["llmfit-core", "llmfit-tui"]
resolver = "3"

[workspace.package]
version = "0.8.0"


================================================
FILE: Dockerfile
================================================
# Multi-stage build for llmfit
# Stage 1: Build the Rust binary
FROM rust:1.88-slim AS builder

# Install build dependencies
RUN apt-get update && apt-get install -y \
    pkg-config \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /build

# Copy workspace configuration
COPY Cargo.toml Cargo.lock ./

# Copy all workspace members
COPY llmfit-core/ ./llmfit-core/
COPY llmfit-tui/ ./llmfit-tui/
COPY llmfit-desktop/ ./llmfit-desktop/
COPY data/ ./data/

# Build release binary for llmfit-tui
RUN cargo build --release -p llmfit

# Stage 2: Runtime image
FROM debian:bookworm-slim

# Install runtime dependencies for hardware detection
RUN apt-get update && apt-get install -y \
    pciutils \
    lshw \
    && rm -rf /var/lib/apt/lists/*

# Copy the binary from builder
COPY --from=builder /build/target/release/llmfit /usr/local/bin/llmfit

# Create a non-root user
RUN useradd -m -u 1000 llmfit && \
    chown -R llmfit:llmfit /usr/local/bin/llmfit

USER llmfit

# Set default command to output JSON recommendations
# In Kubernetes, this will run once per node and log results
ENTRYPOINT ["/usr/local/bin/llmfit"]
CMD ["recommend", "--json"]


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2026 Alex Jones

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: MODELS.md
================================================
# Supported Models

llmfit ships with a curated database of 106 LLMs from HuggingFace. All memory estimates assume Q4_K_M quantization (0.5 bytes per parameter) unless noted otherwise.
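
As a rough illustration of the 0.5 bytes-per-parameter figure, the weights of a 7.6B-parameter model come to about 7.6 × 0.5 = 3.8 GB. The sketch below covers weights only and assumes decimal gigabytes (10^9 bytes); it is not llmfit's estimator, which may also account for context (KV cache) and runtime overhead. The function name is hypothetical.

```python
BYTES_PER_PARAM_Q4_K_M = 0.5  # figure stated in this document


def weight_memory_gb(params_billion, bytes_per_param=BYTES_PER_PARAM_Q4_K_M):
    """Approximate weight size in GB (10^9 bytes) for a quantized model."""
    if params_billion < 0:
        raise ValueError("params_billion must be non-negative")
    # (params_billion * 1e9 params) * bytes_per_param / (1e9 bytes per GB):
    # the 1e9 factors cancel, leaving a simple product.
    return params_billion * bytes_per_param


qwen_7b_weights = weight_memory_gb(7.6)  # Qwen/Qwen2.5-7B-Instruct row below
```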

### 01.ai

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) | 6.1B | Q4_K_M | 4k | Instruction following, chat |
| [01-ai/Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 34.4B | Q4_K_M | 4k | Instruction following, chat |

### Alibaba

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | 600M | Q4_K_M | 40k | Lightweight, edge deployment |
| [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) | 873M | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen3.5-0.8B-Base](https://huggingface.co/Qwen/Qwen3.5-0.8B-Base) | 873M | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) | 1.5B | Q4_K_M | 32k | Code generation and completion |
| [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 1.7B | Q4_K_M | 40k | Lightweight, edge deployment |
| [Qwen/Qwen3.5-2B](https://huggingface.co/Qwen/Qwen3.5-2B) | 2.3B | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen3.5-2B-Base](https://huggingface.co/Qwen/Qwen3.5-2B-Base) | 2.3B | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | 3.8B | Q4_K_M | 32k | Multimodal, vision and text |
| [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) | 4.0B | Q4_K_M | 40k | General purpose text generation |
| [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) | 4.7B | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen3.5-4B-Base](https://huggingface.co/Qwen/Qwen3.5-4B-Base) | 4.7B | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) | 7.6B | Q4_K_M | 32k | Instruction following, chat |
| [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) | 7.6B | Q4_K_M | 32k | Code generation and completion |
| [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | 8.2B | Q4_K_M | 40k | General purpose text generation |
| [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 8.3B | Q4_K_M | 32k | Multimodal, vision and text |
| [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) | 9.7B | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen3.5-9B-Base](https://huggingface.co/Qwen/Qwen3.5-9B-Base) | 9.7B | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | 14.8B | Q4_K_M | 128k | Instruction following, chat |
| [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B) | 14.8B | Q4_K_M | 128k | General purpose text generation |
| [Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) | 14.8B | Q4_K_M | 32k | Code generation and completion |
| [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) | 27.8B | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B) | 30.5B (MoE) | Q4_K_M | 40k | Efficient MoE, general purpose |
| [Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) | 36.0B (MoE) | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 32.5B | Q4_K_M | 128k | Instruction following, chat |
| [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | 32.8B | Q4_K_M | 40k | General purpose text generation |
| [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) | 32.8B | Q4_K_M | 32k | Code generation and completion |
| [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) | 72.7B | Q4_K_M | 32k | Instruction following, chat |
| [Qwen/Qwen3.5-122B-A10B](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) | 125.1B (MoE) | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B) | 235B (MoE) | Q4_K_M | 40k | State-of-the-art, MoE architecture |
| [Qwen/Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) | 403.4B (MoE) | Q4_K_M | 256k | Multimodal, vision and text |
| [Qwen/Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct) | 480B (MoE) | Q4_K_M | 256k | Code generation and completion |

### Allen Institute

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [allenai/OLMo-2-0325-32B-Instruct](https://huggingface.co/allenai/OLMo-2-0325-32B-Instruct) | 32B | Q4_K_M | 4k | Fully open-source, instruction following |

### Ant Group

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [inclusionAI/Ling-lite](https://huggingface.co/inclusionAI/Ling-lite) | 16.8B (MoE) | Q4_K_M | 128k | Efficient MoE, general purpose |

### BAAI

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 335M | Q4_K_M | 512 | Text embeddings for RAG |

### Baidu

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [baidu/ERNIE-4.5-300B-A47B-Paddle](https://huggingface.co/baidu/ERNIE-4.5-300B-A47B-Paddle) | 300B (MoE) | Q4_K_M | 128k | Multilingual, reasoning |

### BigCode

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | 7.2B | Q4_K_M | 16k | Code generation and completion |
| [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b) | 15.7B | Q4_K_M | 16k | Code generation and completion |

### BigScience

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [bigscience/bloom](https://huggingface.co/bigscience/bloom) | 176B | Q4_K_M | 2k | Multilingual text generation |

### Cohere

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01) | 35B | Q4_K_M | 128k | RAG, tool use, agents |

### Community

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) | 1.1B | Q4_K_M | 2k | Instruction following, chat |

### DeepSeek

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | 7.6B | Q4_K_M | 128k | Advanced reasoning, chain-of-thought |
| [deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct) | 16B (MoE) | Q4_K_M | 128k | Code generation and completion |
| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 32.8B | Q4_K_M | 128k | Advanced reasoning, chain-of-thought |
| [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) | 671B (MoE) | Q4_K_M | 128k | Advanced reasoning, chain-of-thought |
| [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) | 685B (MoE) | Q4_K_M | 128k | State-of-the-art, MoE architecture |

### Google

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it) | 1B | Q4_K_M | 32k | Lightweight, edge deployment |
| [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) | 2.6B | Q4_K_M | 4k | General purpose text generation |
| [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) | 4B | Q4_K_M | 128k | Lightweight, general purpose |
| [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) | 9.2B | Q4_K_M | 4k | General purpose text generation |
| [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it) | 12B | Q4_K_M | 128k | Multimodal, vision and text |
| [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it) | 27B | Q4_K_M | 128k | General purpose text generation |
| [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) | 27.2B | Q4_K_M | 4k | General purpose text generation |

### HuggingFace

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 7.2B | Q4_K_M | 32k | General purpose text generation |

### IBM

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [ibm-granite/granite-4.0-h-micro](https://huggingface.co/ibm-granite/granite-4.0-h-micro) | 3B | Q4_K_M | 128k | Enterprise, hybrid Mamba/transformer |
| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | 7B (MoE) | Q4_K_M | 128k | Enterprise, hybrid Mamba/transformer |
| [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) | 8.1B | Q4_K_M | 128k | Enterprise, instruction following |
| [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small) | 32B (MoE) | Q4_K_M | 128k | Enterprise, hybrid Mamba/transformer |

### LMSYS

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) | 7.0B | Q4_K_M | 4k | Instruction following, chat |
| [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) | 13.0B | Q4_K_M | 4k | Instruction following, chat |

### Meituan

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [meituan/LongCat-Flash](https://huggingface.co/meituan/LongCat-Flash) | 560B (MoE) | Q4_K_M | 512k | Long context MoE |

### Meta

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) | 1.2B | Q4_K_M | 4k | General purpose text generation |
| [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) | 3.2B | Q4_K_M | 4k | General purpose text generation |
| [meta-llama/CodeLlama-7b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf) | 6.7B | Q4_K_M | 4k | Code generation and completion |
| [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | 8.0B | Q4_K_M | 4k | General purpose text generation |
| [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | 8.0B | Q4_K_M | 4k | Instruction following, chat |
| [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) | 10.7B | Q4_K_M | 4k | Instruction following, chat |
| [meta-llama/CodeLlama-13b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-13b-Instruct-hf) | 13.0B | Q4_K_M | 4k | Code generation and completion |
| [meta-llama/CodeLlama-34b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Instruct-hf) | 33.7B | Q4_K_M | 4k | Code generation and completion |
| [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | 70.6B | Q4_K_M | 4k | Instruction following, chat |
| [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | 70.6B | Q4_K_M | 128k | Instruction following, chat |
| [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | 109B (MoE) | Q4_K_M | 128k | Multimodal, vision and text |
| [meta-llama/Llama-4-Maverick-17B-128E-Instruct](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct) | 400B (MoE) | Q4_K_M | 128k | Multimodal, vision and text |
| [meta-llama/Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) | 405.9B | Q4_K_M | 4k | Instruction following, chat |

### Microsoft

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [microsoft/phi-3-mini-4k-instruct](https://huggingface.co/microsoft/phi-3-mini-4k-instruct) | 3.8B | Q4_K_M | 4k | Lightweight, edge deployment |
| [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | 3.8B | Q4_K_M | 128k | Lightweight, long context |
| [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) | 3.8B | Q4_K_M | 128k | Lightweight, edge deployment |
| [microsoft/Orca-2-7b](https://huggingface.co/microsoft/Orca-2-7b) | 7.0B | Q4_K_M | 4k | Reasoning, step-by-step solutions |
| [microsoft/Orca-2-13b](https://huggingface.co/microsoft/Orca-2-13b) | 13.0B | Q4_K_M | 4k | Reasoning, step-by-step solutions |
| [microsoft/phi-4](https://huggingface.co/microsoft/phi-4) | 14B | Q4_K_M | 16k | Reasoning, STEM, code generation |
| [microsoft/Phi-3-medium-14b-instruct](https://huggingface.co/microsoft/Phi-3-medium-14b-instruct) | 14B | Q4_K_M | 4k | Balanced performance and size |

### Mistral AI

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) | 7.2B | Q4_K_M | 32k | Instruction following, chat |
| [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410) | 8.0B | Q4_K_M | 32k | Instruction following, chat |
| [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) | 12.2B | Q4_K_M | 128k | Instruction following, chat |
| [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) | 24B | Q4_K_M | 32k | Instruction following, chat |
| [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) | 24B | Q4_K_M | 128k | Multimodal, vision and text |
| [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | 46.7B (MoE) | Q4_K_M | 32k | Instruction following, chat |
| [mistralai/Mistral-Large-Instruct-2407](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407) | 123B | Q4_K_M | 128k | Large-scale instruction following |
| [mistralai/Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) | 140.6B (MoE) | Q4_K_M | 64k | Large MoE, instruction following |

### Moonshot

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [moonshotai/Kimi-K2-Instruct](https://huggingface.co/moonshotai/Kimi-K2-Instruct) | 1000B (MoE) | Q4_K_M | 128k | Large MoE, reasoning |

### Nomic

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) | 137M | F16 | 8k | Text embeddings for RAG |

### NousResearch

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO) | 46.7B (MoE) | Q4_K_M | 32k | General purpose text generation |

### OpenChat

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [openchat/openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) | 7.0B | Q4_K_M | 8k | Instruction following, chat |

### Rednote

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [rednote-hilab/dots.llm1.inst](https://huggingface.co/rednote-hilab/dots.llm1.inst) | 142B (MoE) | Q4_K_M | 128k | MoE, general purpose |

### Stability AI

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [stabilityai/stablelm-2-1_6b-chat](https://huggingface.co/stabilityai/stablelm-2-1_6b-chat) | 1.6B | Q4_K_M | 4k | Instruction following, chat |

### TII

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) | 7.2B | Q4_K_M | 4k | Instruction following, chat |
| [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct) | 7.5B | Q4_K_M | 32k | Instruction following, chat |
| [tiiuae/Falcon3-10B-Instruct](https://huggingface.co/tiiuae/Falcon3-10B-Instruct) | 10.3B | Q4_K_M | 32k | Instruction following, chat |
| [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) | 40.0B | Q4_K_M | 2k | Instruction following, chat |
| [tiiuae/falcon-180B-chat](https://huggingface.co/tiiuae/falcon-180B-chat) | 180B | Q4_K_M | 2k | Large-scale instruction following |

### Upstage

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) | 10.7B | Q4_K_M | 4k | High-performance instruction following |

### WizardLM

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [WizardLMTeam/WizardLM-13B-V1.2](https://huggingface.co/WizardLMTeam/WizardLM-13B-V1.2) | 13.0B | Q4_K_M | 4k | Instruction following, chat |
| [WizardLMTeam/WizardCoder-15B-V1.0](https://huggingface.co/WizardLMTeam/WizardCoder-15B-V1.0) | 15.5B | Q4_K_M | 8k | Code generation and completion |

### xAI

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [xai-org/grok-1](https://huggingface.co/xai-org/grok-1) | 314B (MoE) | Q4_K_M | 8k | Large MoE, general purpose |

### Zhipu AI

| Model | Parameters | Quantization | Context | Use Case |
|-------|-----------|--------------|---------|----------|
| [THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) | 9B | Q4_K_M | 128k | Multilingual, instruction following |


================================================
FILE: Makefile
================================================
# Makefile for llmfit
# Convenience commands for building, testing, and updating the model database

.PHONY: help build release clean run test update-models update-docker-models update-catalogs check fmt clippy install

# Default target
help:
	@echo "llmfit - LLM Model Fit Analyzer"
	@echo ""
	@echo "Available targets:"
	@echo "  make build          - Build debug binary"
	@echo "  make release        - Build release binary"
	@echo "  make run            - Run in TUI mode (debug)"
	@echo "  make test           - Run all unit tests"
	@echo "  make update-models  - Fetch latest model data from HuggingFace"
	@echo "  make update-docker-models - Refresh Docker Model Runner catalog"
	@echo "  make update-catalogs - Refresh all catalogs (HF models + Docker) and rebuild"
	@echo "  make check          - Run cargo check"
	@echo "  make fmt            - Format code with rustfmt"
	@echo "  make clippy         - Run clippy linter"
	@echo "  make clean          - Remove build artifacts"
	@echo "  make install        - Install release binary to ~/.cargo/bin"
	@echo ""

# Build debug version
build:
	cargo build

# Build release version
release:
	cargo build --release

# Clean build artifacts
clean:
	cargo clean

# Run in TUI mode
run:
	cargo run

# Run tests
test:
	cargo test

# Update model database from HuggingFace
update-models:
	@./scripts/update_models.sh

# Refresh Docker Model Runner catalog from Docker Hub
update-docker-models:
	python3 scripts/scrape_docker_models.py

# Refresh all catalogs (HF models + Docker) and rebuild
# Runs HF scraper first (via update_models.sh which also rebuilds),
# then Docker scraper (which depends on hf_models.json), then rebuilds again
# to embed the updated Docker catalog.
update-catalogs:
	@./scripts/update_models.sh
	python3 scripts/scrape_docker_models.py
	cargo build --release

# Check compilation without building
check:
	cargo check

# Format code
fmt:
	cargo fmt

# Run clippy
clippy:
	cargo clippy -- -D warnings

# Install to ~/.cargo/bin
install:
	cargo install --path .


================================================
FILE: README.md
================================================
# llmfit

<p align="center">
  <img src="assets/icon.svg" alt="llmfit icon" width="128" height="128">
</p>

<p align="center">
  <b>English</b> ·
  <a href="README.zh.md">中文</a>
</p>

<p align="center">
  <a href="https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml"><img src="https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://crates.io/crates/llmfit"><img src="https://img.shields.io/crates/v/llmfit.svg" alt="Crates.io"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"></a>
</p>

**Hundreds of models & providers. One command to find what runs on your hardware.**

A terminal tool that right-sizes LLM models to your system's RAM, CPU, and GPU. Detects your hardware, scores each model across quality, speed, fit, and context dimensions, and tells you which ones will actually run well on your machine.

Ships with an interactive TUI (default) and a classic CLI mode. Supports multi-GPU setups, MoE architectures, dynamic quantization selection, speed estimation, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).

> **Sister project:** Check out [sympozium](https://github.com/AlexsJones/sympozium/) for managing agents in Kubernetes.

![demo](demo.gif)

---

## Install

### Windows
```sh
scoop install llmfit
```

If Scoop is not installed, follow the [Scoop installation guide](https://scoop.sh/).

### macOS / Linux

#### Homebrew
```sh
brew install llmfit
```

#### Quick install
```sh
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
```

Downloads the latest release binary from GitHub and installs it to `/usr/local/bin`, falling back to `~/.local/bin` when sudo is unavailable.

**Install to `~/.local/bin` without sudo:**
```sh
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
```

### Docker / Podman
```sh
docker run ghcr.io/alexsjones/llmfit
```
This prints the JSON output of the `llmfit recommend` command. The JSON can be queried further with `jq`:
```sh
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
```

### From source
```sh
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# binary is at target/release/llmfit
```

---

## Usage

### TUI (default)

```sh
llmfit
```

Launches the interactive terminal UI. Your system specs (CPU, RAM, GPU name, VRAM, backend) are shown at the top. Models are listed in a scrollable table sorted by composite score. Each row shows the model's score, estimated tok/s, best quantization for your hardware, run mode, memory usage, and use-case category.

| Key                        | Action                                                                |
|----------------------------|-----------------------------------------------------------------------|
| `Up` / `Down` or `j` / `k` | Navigate models                                                       |
| `/`                        | Enter search mode (partial match on name, provider, params, use case) |
| `Esc` or `Enter`           | Exit search mode                                                      |
| `Ctrl-U`                   | Clear search                                                          |
| `f`                        | Cycle fit filter: All, Runnable, Perfect, Good, Marginal              |
| `a`                        | Cycle availability filter: All, GGUF Avail, Installed                 |
| `s`                        | Cycle sort column: Score, Params, Mem%, Ctx, Date, Use Case           |
| `v`                        | Enter Visual mode (select multiple models)                            |
| `V`                        | Enter Select mode (column-based filtering)                            |
| `t`                        | Cycle color theme (saved automatically)                               |
| `p`                        | Open Plan mode for selected model (hardware planning)                 |
| `P`                        | Open provider filter popup                                            |
| `U`                        | Open use-case filter popup                                            |
| `C`                        | Open capability filter popup                                          |
| `m`                        | Mark selected model for compare                                       |
| `c`                        | Open compare view (marked vs selected)                                |
| `x`                        | Clear compare mark                                                    |
| `i`                        | Toggle installed-first sorting (any detected runtime provider)        |
| `d`                        | Download selected model (provider picker when multiple are available) |
| `r`                        | Refresh installed models from runtime providers                       |
| `Enter`                    | Toggle detail view for selected model                                 |
| `PgUp` / `PgDn`            | Scroll by 10                                                          |
| `g` / `G`                  | Jump to top / bottom                                                  |
| `q`                        | Quit                                                                  |

### Vim-like modes

The TUI uses Vim-inspired modes shown in the bottom-left status bar. The current mode determines which keys are active.

#### Normal mode

The default mode. Navigate, search, filter, and open views. All keys in the table above apply here.

#### Visual mode (`v`)

Select a contiguous range of models for bulk comparison. Press `v` to anchor at the current row, then navigate with `j`/`k` or arrow keys to extend the selection. Selected rows are highlighted.

| Key                 | Action                                                 |
|---------------------|--------------------------------------------------------|
| `j` / `k` or arrows | Extend selection up/down                               |
| `c`                 | Compare all selected models (opens multi-compare view) |
| `m`                 | Mark current model for two-model compare               |
| `Esc` or `v`        | Exit Visual mode                                       |

The multi-compare view displays a table where rows are attributes (Score, tok/s, Fit, Mem%, Params, Mode, Context, Quant, etc.) and columns are models. Best values are highlighted. Use `h`/`l` or arrow keys to scroll horizontally if more models are selected than fit on screen.

#### Select mode (`V`)

Column-based filtering. Press `V` (shift-v) to enter Select mode, then use `h`/`l` or arrow keys to move between column headers. The active column is visually highlighted. Press `Enter` or `Space` to activate the appropriate filter for that column:

| Column                        | Filter action                                                             |
|-------------------------------|---------------------------------------------------------------------------|
| Inst                          | Cycle availability filter                                                 |
| Model                         | Enter search mode                                                         |
| Provider                      | Open provider popup                                                       |
| Params                        | Open parameter-size bucket popup (<3B, 3-7B, 7-14B, 14-30B, 30-70B, 70B+) |
| Score, tok/s, Mem%, Ctx, Date | Sort by that column                                                       |
| Quant                         | Open quantization popup                                                   |
| Mode                          | Open run-mode popup (GPU, MoE, CPU+GPU, CPU)                              |
| Fit                           | Cycle fit filter                                                          |
| Use Case                      | Open use-case popup                                                       |

Row navigation (`j`/`k`) still works in Select mode so you can see the effect of filters as you apply them. Press `Esc` to return to Normal mode.

### TUI Plan mode (`p`)

Plan mode inverts normal fit analysis: instead of asking "what fits my hardware?", it estimates "what hardware is needed for this model config?".

Use `p` on a selected row, then:

| Key                    | Action                                                    |
|------------------------|-----------------------------------------------------------|
| `Tab` / `j` / `k`      | Move between editable fields (Context, Quant, Target TPS) |
| `Left` / `Right`       | Move cursor in current field                              |
| Type                   | Edit current field                                        |
| `Backspace` / `Delete` | Remove characters                                         |
| `Ctrl-U`               | Clear current field                                       |
| `Esc` or `q`           | Exit Plan mode                                            |

Plan mode shows estimates for:
- minimum and recommended VRAM/RAM/CPU cores
- feasible run paths (GPU, CPU offload, CPU-only)
- upgrade deltas to reach better fit targets

### Themes

Press `t` to cycle through 10 built-in color themes. Your selection is saved automatically to `~/.config/llmfit/theme` and restored on next launch.

| Theme                    | Description                                       |
|--------------------------|---------------------------------------------------|
| **Default**              | Original llmfit colors                            |
| **Dracula**              | Dark purple background with pastel accents        |
| **Solarized**            | Ethan Schoonover's Solarized Dark palette         |
| **Nord**                 | Arctic, cool blue-gray tones                      |
| **Monokai**              | Monokai Pro warm syntax colors                    |
| **Gruvbox**              | Retro groove palette with warm earth tones        |
| **Catppuccin Latte**     | 🌻 Light theme — harmonious pastel inversion      |
| **Catppuccin Frappé**    | 🪴 Low-contrast dark — muted, subdued aesthetic   |
| **Catppuccin Macchiato** | 🌺 Medium-contrast dark — gentle, soothing tones  |
| **Catppuccin Mocha**     | 🌿 Darkest variant — cozy with color-rich accents |

### Web dashboard

When you run `llmfit` in non-JSON mode, it automatically starts a background web dashboard on `0.0.0.0:8787`. Open it in any browser on the same network:

```
http://<your-machine-ip>:8787
```

Override the host or port with environment variables:

```sh
LLMFIT_DASHBOARD_HOST=0.0.0.0 LLMFIT_DASHBOARD_PORT=9000 llmfit
```

| Variable | Default | Description |
|---|---|---|
| `LLMFIT_DASHBOARD_HOST` | `0.0.0.0` | Interface to bind the dashboard server |
| `LLMFIT_DASHBOARD_PORT` | `8787` | Port to bind the dashboard server |

To disable the auto-started dashboard, pass `--no-dashboard`:

```sh
llmfit --no-dashboard
```

### CLI mode

Use `--cli` or any subcommand to get classic table output:

```sh
# Table of all models ranked by fit
llmfit --cli

# Only perfectly fitting models, top 5
llmfit fit --perfect -n 5

# Show detected system specs
llmfit system

# List all models in the database
llmfit list

# Search by name, provider, or size
llmfit search "llama 8b"

# Detailed view of a single model
llmfit info "Mistral-7B"

# Top 5 recommendations (JSON, for agent/script consumption)
llmfit recommend --json --limit 5

# Recommendations filtered by use case
llmfit recommend --json --use-case coding --limit 3

# Force a specific runtime (bypass automatic MLX selection on Apple Silicon)
llmfit recommend --force-runtime llamacpp
llmfit recommend --force-runtime llamacpp --use-case coding --limit 3

# Plan required hardware for a specific model configuration
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json

# Run as a node-level REST API (for cluster schedulers / aggregators)
llmfit serve --host 0.0.0.0 --port 8787
```

### REST API (`llmfit serve`)

`llmfit serve` starts an HTTP API that exposes the same fit/scoring data used by TUI/CLI, including filtering and top-model selection for a node.

```sh
# Liveness
curl http://localhost:8787/health

# Node hardware info
curl http://localhost:8787/api/v1/system

# Full fit list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"

# Key scheduling endpoint: top runnable models for this node
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"

# Search by model name/provider text
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"
```

Supported query params for `models`/`models/top`:

- `limit` (or `n`): max number of rows returned
- `perfect`: `true|false` (forces perfect-only when `true`)
- `min_fit`: `perfect|good|marginal|too_tight`
- `runtime`: `any|mlx|llamacpp`
- `use_case`: `general|coding|reasoning|chat|multimodal|embedding`
- `provider`: provider text filter (substring)
- `search`: free-text filter across name/provider/size/use-case
- `sort`: `score|tps|params|mem|ctx|date|use_case`
- `include_too_tight`: include non-runnable rows (default `false` on `/top`, `true` on `/models`)
- `max_context`: per-request context cap for memory estimation
- `force_runtime`: `mlx|llamacpp|vllm` — override automatic runtime selection during analysis
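
As a minimal sketch of how a scheduler or script might call the top-models endpoint, the Python snippet below builds the query URL from the parameters listed above and fetches it with the standard library. The helper names `build_top_url` and `fetch_top_models` are hypothetical, and nothing is assumed about the response beyond it being valid JSON.

```python
import json
import urllib.parse
import urllib.request


def build_top_url(base_url, **params):
    """Build a /api/v1/models/top URL, skipping parameters set to None."""
    query = {k: v for k, v in params.items() if v is not None}
    url = base_url.rstrip("/") + "/api/v1/models/top"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return url


def fetch_top_models(base_url, **params):
    """Fetch and decode the JSON body; requires a running `llmfit serve`."""
    url = build_top_url(base_url, **params)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URL only, so this runs without a server.
    print(build_top_url("http://localhost:8787", limit=5, min_fit="good", use_case="coding"))
```

Keeping URL construction separate from the network call makes the query logic easy to check without a live node.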

Validate API behavior locally:

```sh
# spawn server automatically and run endpoint/schema/filter assertions
python3 scripts/test_api.py --spawn

# or test an already-running server
python3 scripts/test_api.py --base-url http://127.0.0.1:8787
```

### GPU memory override

GPU VRAM autodetection can fail on some systems (e.g. broken `nvidia-smi`, VMs, passthrough setups). Use `--memory` to manually specify your GPU's VRAM:

```sh
# Override with 32 GB VRAM
llmfit --memory=32G

# Megabytes also work (32000 MB ≈ 31.25 GB)
llmfit --memory=32000M

# Works with all modes: TUI, CLI, and subcommands
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G system
llmfit --memory=24G info "Llama-3.1-70B"
llmfit --memory=24G recommend --json
```

Accepted suffixes: `G`/`GB`/`GiB` (gigabytes), `M`/`MB`/`MiB` (megabytes), `T`/`TB`/`TiB` (terabytes). Case-insensitive. If no GPU was detected, the override creates a synthetic GPU entry so models are scored for GPU inference.
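
The suffix handling can be illustrated with a short Python sketch. This is not llmfit's actual parser: the function name `parse_memory_gb` is hypothetical, the 1024 MB per GB factor is taken from the `32000M` example above, and the terabyte factor, the acceptance of decimal values, and the rejection of bare numbers are assumptions.

```python
import re

# Multipliers to gigabytes. M uses 1024 MB per GB, matching the
# "32000 MB ≈ 31.25 GB" example above; the T factor is an assumption.
_UNIT_TO_GB = {"m": 1 / 1024, "g": 1.0, "t": 1024.0}


def parse_memory_gb(text):
    """Parse strings such as '32G', '32000MB', or '1TiB' into gigabytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([mgt])(?:i?b)?\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized memory size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_GB[unit]
```

Lower-casing the input first gives the case-insensitive behavior described above, and the optional `i?b` group covers the `GB`/`GiB` style spellings.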

### Context-length cap for estimation

Use `--max-context` to cap context length used for memory estimation (without changing each model's advertised maximum context):

```sh
# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli

# Works with subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5
```

If `--max-context` is not set, llmfit will use `OLLAMA_CONTEXT_LENGTH` when available.
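
The precedence described above can be sketched in Python as follows. The function names are hypothetical, and ignoring a non-numeric `OLLAMA_CONTEXT_LENGTH` value is an assumption rather than documented behavior.

```python
import os


def resolve_context_cap(cli_max_context=None, env=None):
    """Return the context cap in tokens, or None when no cap applies."""
    env = os.environ if env is None else env
    if cli_max_context is not None:
        return cli_max_context  # an explicit --max-context wins
    raw = env.get("OLLAMA_CONTEXT_LENGTH", "").strip()
    # Assumption: a missing or non-numeric value means "no cap".
    return int(raw) if raw.isdigit() else None


def effective_context(model_context, cap):
    """Context used for memory estimation; the model's own maximum is unchanged."""
    return model_context if cap is None else min(model_context, cap)
```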

### JSON output

Add `--json` to any subcommand for machine-readable output:

```sh
llmfit --json system     # Hardware specs as JSON
llmfit --json fit -n 10  # Top 10 fits as JSON
llmfit recommend --json  # Top 5 recommendations (JSON is default for recommend)
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
```

`plan` JSON includes stable fields for:
- request (`context`, `quantization`, `target_tps`)
- estimated minimum/recommended hardware
- per-path feasibility (`gpu`, `cpu_offload`, `cpu_only`)
- upgrade deltas

---

## How it works

1. **Hardware detection** -- Reads total/available RAM via `sysinfo`, counts CPU cores, and probes for GPUs:
   - **NVIDIA** -- Multi-GPU support via `nvidia-smi`. Aggregates VRAM across all detected GPUs. Falls back to VRAM estimation from GPU model name if reporting fails.
   - **AMD** -- Detected via `rocm-smi`.
   - **Intel Arc** -- Discrete VRAM via sysfs, integrated via `lspci`.
   - **Apple Silicon** -- Unified memory via `system_profiler`. VRAM = system RAM.
   - **Ascend** -- Detected via `npu-smi`.
   - **Backend detection** -- Automatically identifies the acceleration backend (CUDA, Metal, ROCm, SYCL, CPU ARM, CPU x86, Ascend) for speed estimation.

2. **Model database** -- Hundreds of models sourced from the HuggingFace API, stored in `data/hf_models.json` and embedded at compile time. Memory requirements are computed from parameter counts across a quantization hierarchy (Q8_0 through Q2_K). VRAM is the primary constraint for GPU inference; system RAM is the fallback for CPU-only execution.

   **MoE support** -- Models with Mixture-of-Experts architectures (Mixtral, DeepSeek-V2/V3) are detected automatically. Only a subset of experts is active per token, so the effective VRAM requirement is much lower than total parameter count suggests. For example, Mixtral 8x7B has 46.7B total parameters but only activates ~12.9B per token, reducing VRAM from 23.9 GB to ~6.6 GB with expert offloading.

3. **Dynamic quantization** -- Instead of assuming a fixed quantization, llmfit selects the highest-quality quantization that fits your hardware. It walks a hierarchy from Q8_0 (best quality) down to Q2_K (most compressed) and picks the first level that fits in available memory. If nothing fits at full context, it retries at half context.

4. **Multi-dimensional scoring** -- Each model is scored across four dimensions (0–100 each):

   | Dimension   | What it measures                                                               |
   |-------------|--------------------------------------------------------------------------------|
   | **Quality** | Parameter count, model family reputation, quantization penalty, task alignment |
   | **Speed**   | Estimated tokens/sec based on backend, params, and quantization                |
   | **Fit**     | Memory utilization efficiency (sweet spot: 50–80% of available memory)         |
   | **Context** | Context window capability vs target for the use case                           |

   Dimensions are combined into a weighted composite score. Weights vary by use-case category (General, Coding, Reasoning, Chat, Multimodal, Embedding). For example, Chat weights Speed higher (0.35) while Reasoning weights Quality higher (0.55). Models are ranked by composite score, with unrunnable models (Too Tight) always at the bottom.

5. **Speed estimation** -- Token generation in LLM inference is memory-bandwidth-bound: each token requires reading the full model weights once from VRAM. When the GPU model is recognized, llmfit uses its actual memory bandwidth to estimate throughput:

   Formula: `(bandwidth_GB_s / model_size_GB) × efficiency_factor`

   The efficiency factor (0.55) accounts for kernel overhead, KV-cache reads, and memory controller effects. This approach is validated against published benchmarks from llama.cpp ([Apple Silicon](https://github.com/ggml-org/llama.cpp/discussions/4167), [NVIDIA T4](https://github.com/ggml-org/llama.cpp/discussions/4225)) and real-world measurements.

   The bandwidth lookup table covers ~80 GPUs across NVIDIA (consumer + datacenter), AMD (RDNA + CDNA), and Apple Silicon families.

   For unrecognized GPUs, llmfit falls back to per-backend speed constants:

   | Backend      | Speed constant |
   |--------------|----------------|
   | CUDA         | 220            |
   | Metal        | 160            |
   | ROCm         | 180            |
   | SYCL         | 100            |
   | CPU (ARM)    | 90             |
   | CPU (x86)    | 70             |
   | NPU (Ascend) | 390            |

   Fallback formula: `K / params_b × quant_speed_multiplier`, with penalties for CPU offload (0.5×), CPU-only (0.3×), and MoE expert switching (0.8×).

6. **Fit analysis** -- Each model is evaluated for memory compatibility:

   **Run modes:**
   - **GPU** -- Model fits in VRAM. Fast inference.
   - **MoE** -- Mixture-of-Experts with expert offloading. Active experts in VRAM, inactive in RAM.
   - **CPU+GPU** -- VRAM insufficient, spills to system RAM with partial GPU offload.
   - **CPU** -- No GPU. Model loaded entirely into system RAM.

   **Fit levels:**
   - **Perfect** -- Recommended memory met on GPU. Requires GPU acceleration.
   - **Good** -- Fits with headroom. Best achievable for MoE offload or CPU+GPU.
   - **Marginal** -- Tight fit, or CPU-only (CPU-only always caps here).
   - **Too Tight** -- Not enough VRAM or system RAM anywhere.
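
Steps 3 and 5 above can be sketched in Python. This is an illustration under stated assumptions, not the code in `fit.rs` or `models.rs`. The efficiency factor, backend constants, and penalties are copied from the text. The intermediate quantization levels, every bytes-per-parameter value, the toy KV-cache term, the dictionary keys, and the choice to multiply penalties together are assumptions.

```python
# Step 3: quantization levels, best quality first. The intermediate levels
# and all bytes-per-parameter values are illustrative assumptions.
QUANT_HIERARCHY = [
    ("Q8_0", 1.05), ("Q6_K", 0.80), ("Q5_K_M", 0.68),
    ("Q4_K_M", 0.58), ("Q3_K_M", 0.48), ("Q2_K", 0.37),
]

EFFICIENCY_FACTOR = 0.55  # from step 5
BACKEND_K = {  # fallback speed constants from the table in step 5
    "cuda": 220, "metal": 160, "rocm": 180, "sycl": 100,
    "cpu_arm": 90, "cpu_x86": 70, "ascend": 390,
}
MODE_PENALTY = {"gpu": 1.0, "cpu_offload": 0.5, "cpu_only": 0.3}
MOE_PENALTY = 0.8


def estimate_memory_gb(params_b, bytes_per_param, context, kv_gb_per_1k=0.1):
    """Toy estimate: weights plus a KV cache that grows linearly with context."""
    return params_b * bytes_per_param + kv_gb_per_1k * context / 1024


def select_quant(params_b, context, available_gb):
    """Return (quant, context) for the best level that fits, else None."""
    for ctx in (context, context // 2):  # full context first, then half
        for name, bpp in QUANT_HIERARCHY:
            if estimate_memory_gb(params_b, bpp, ctx) <= available_gb:
                return name, ctx
    return None


def tps_from_bandwidth(bandwidth_gb_s, model_size_gb):
    """Bandwidth-bound estimate for a recognized GPU."""
    return bandwidth_gb_s / model_size_gb * EFFICIENCY_FACTOR


def tps_fallback(backend, params_b, quant_multiplier=1.0, mode="gpu", moe=False):
    """Fallback estimate: K / params_b * quant multiplier, then penalties."""
    tps = BACKEND_K[backend] / params_b * quant_multiplier
    tps *= MODE_PENALTY[mode]  # assumption: penalties combine multiplicatively
    return tps * MOE_PENALTY if moe else tps
```

With these made-up sizes, `select_quant(8.0, 4096, 5.5)` walks down to `Q4_K_M` at full context, and `tps_fallback("cuda", 8.0, mode="cpu_offload")` evaluates 220 / 8 × 0.5 = 13.75.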

---

## Model database

The model list is generated by `scripts/scrape_hf_models.py`, a standalone Python script (stdlib only, no pip dependencies) that queries the HuggingFace REST API. It covers hundreds of models from providers including Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, IBM Granite, Allen Institute OLMo, xAI Grok, Cohere, BigCode, 01.ai, Upstage, TII Falcon, HuggingFace, Zhipu GLM, Moonshot Kimi, Baidu ERNIE, and more. The scraper automatically detects MoE architectures via model config (`num_local_experts`, `num_experts_per_tok`) and known architecture mappings.

Model categories span general purpose, coding (CodeLlama, StarCoder2, WizardCoder, Qwen2.5-Coder, Qwen3-Coder), reasoning (DeepSeek-R1, Orca-2), multimodal/vision (Llama 3.2 Vision, Llama 4 Scout/Maverick, Qwen2.5-VL), chat, enterprise (IBM Granite), and embedding (nomic-embed, bge).

See [MODELS.md](MODELS.md) for the full list.

To refresh the model database:

```sh
# Automated update (recommended)
make update-models

# Or run the script directly
./scripts/update_models.sh

# Or manually
python3 scripts/scrape_hf_models.py
cargo build --release
```

The scraper writes `data/hf_models.json`, which is baked into the binary via `include_str!`. The automated update script backs up existing data, validates JSON output, and rebuilds the binary.

By default, the scraper enriches models with known GGUF download sources from providers like [unsloth](https://huggingface.co/unsloth) and [bartowski](https://huggingface.co/bartowski). Results are cached in `data/gguf_sources_cache.json` (7-day TTL) to avoid repeated API calls. Use `--no-gguf-sources` to skip enrichment for a faster scrape.
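
A 7-day TTL check of this kind can be sketched in Python as below. This is a hypothetical illustration: the function name `cache_is_fresh` is made up, and judging freshness by file modification time is an assumption, since the scraper may instead store a timestamp inside the cache file.

```python
import os
import time

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day TTL from the text above


def cache_is_fresh(path, now=None, ttl=CACHE_TTL_SECONDS):
    """True if the cache file exists and was modified within the TTL."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) < ttl
    except OSError:
        return False  # a missing or unreadable cache counts as stale
```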

---

## Project structure

```
llmfit-core/src/
  hardware.rs     -- System RAM/CPU/GPU detection (multi-GPU, backend identification)
  models.rs       -- Model database, quantization hierarchy, dynamic quant selection
  fit.rs          -- Multi-dimensional scoring (Q/S/F/C), speed estimation, MoE offloading
  plan.rs         -- Hardware planning for a model configuration (`llmfit plan`, Plan mode)
  providers.rs    -- Runtime provider integration (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio), install detection, pull/download
llmfit-tui/src/
  main.rs         -- CLI argument parsing, entrypoint, TUI launch
  display.rs      -- Classic CLI table rendering + JSON output
  serve_api.rs    -- REST API server (`llmfit serve`)
  theme.rs        -- TUI color themes
  tui_app.rs      -- TUI application state, filters, navigation
  tui_ui.rs       -- TUI rendering (ratatui)
  tui_events.rs   -- TUI keyboard event handling (crossterm)
data/
  hf_models.json  -- Model database (206 models)
skills/
  llmfit-advisor/ -- OpenClaw skill for hardware-aware model recommendations
scripts/
  scrape_hf_models.py        -- HuggingFace API scraper
  update_models.sh            -- Automated database update script
  install-openclaw-skill.sh   -- Install the OpenClaw skill
Makefile           -- Build and maintenance commands
```

---

## Publishing to crates.io

The `Cargo.toml` already includes the required metadata (description, license, repository). To publish:

```sh
# Dry run first to catch issues
cargo publish --dry-run

# Publish for real (requires a crates.io API token)
cargo login
cargo publish
```

Before publishing, make sure:

- The version in `Cargo.toml` is correct (bump with each release).
- A `LICENSE` file exists in the repo root. Create one if missing:

```sh
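# For the MIT license, copy the license text from https://opensource.org/license/MIT
# into LICENSE and fill in the year and copyright holder. Downloading that URL with
# curl saves an HTML page rather than plain license text.
# Or write your own. The Cargo.toml declares license = "MIT".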
```

- `data/hf_models.json` is committed. It is embedded at compile time and must be present in the published crate.
- The `exclude` list in `Cargo.toml` omits `target/`, `scripts/`, and `demo.gif` from the published crate so the download stays small.

To publish updates:

```sh
# Bump version
# Edit Cargo.toml: version = "0.2.0"
cargo publish
```

---

## Dependencies

| Crate                  | Purpose                                          |
|------------------------|--------------------------------------------------|
| `clap`                 | CLI argument parsing with derive macros          |
| `sysinfo`              | Cross-platform RAM and CPU detection             |
| `serde` / `serde_json` | JSON deserialization for model database          |
| `tabled`               | CLI table formatting                             |
| `colored`              | CLI colored output                               |
| `ureq`                 | HTTP client for runtime/provider API integration |
| `ratatui`              | Terminal UI framework                            |
| `crossterm`            | Terminal input/output backend for ratatui        |

---

## Runtime provider integration

llmfit supports multiple local runtime providers:

- **Ollama** (daemon/API based pulls)
- **llama.cpp** (direct GGUF downloads from Hugging Face + local cache detection)
- **MLX** (Apple Silicon / mlx-community model cache + optional server)
- **Docker Model Runner** (Docker Desktop's built-in model serving)
- **LM Studio** (local model server with REST API for model management + downloads)

When more than one compatible provider is available for a model, pressing `d` in the TUI opens a provider picker modal.

### Ollama integration

llmfit integrates with [Ollama](https://ollama.com) to detect which models you already have installed and to download new ones directly from the TUI.

### Requirements

- **Ollama must be installed and running** (`ollama serve` or the Ollama desktop app)
- llmfit connects to `http://localhost:11434` (Ollama's default API port)
- No configuration needed — if Ollama is running, llmfit detects it automatically

### Remote Ollama instances

To connect to Ollama running on a different machine or port, set the `OLLAMA_HOST` environment variable:

```sh
# Connect to Ollama on a specific IP and port
OLLAMA_HOST="http://192.168.1.100:11434" llmfit

# Connect via hostname and a custom port
OLLAMA_HOST="http://ollama-server:666" llmfit

# Works with all TUI and CLI commands
OLLAMA_HOST="http://192.168.1.100:11434" llmfit --cli
OLLAMA_HOST="http://192.168.1.100:11434" llmfit fit --perfect -n 5
```

This is useful for:
- Running llmfit on one machine while Ollama serves from another (e.g., GPU server + laptop client)
- Connecting to Ollama running in Docker containers with custom ports
- Using Ollama behind reverse proxies or load balancers
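
The host selection described above can be sketched as follows. This is a minimal illustration, not llmfit's actual code: the helper names `resolve_ollama_base` and `tags_url_from_env`, the trailing-slash trimming, and the fallback for empty values are assumptions. Only the `OLLAMA_HOST` variable, the default `http://localhost:11434`, and the `/api/tags` path come from this README.

```rust
use std::env;

/// Default Ollama endpoint documented in this README.
const DEFAULT_OLLAMA_HOST: &str = "http://localhost:11434";

/// Hypothetical helper: choose the base URL from an optional override.
/// Empty or whitespace-only values fall back to the default (an assumption).
pub fn resolve_ollama_base(override_host: Option<&str>) -> String {
    match override_host.map(str::trim) {
        Some(host) if !host.is_empty() => host.trim_end_matches('/').to_string(),
        _ => DEFAULT_OLLAMA_HOST.to_string(),
    }
}

/// Reads `OLLAMA_HOST` from the environment and builds the URL for `GET /api/tags`.
pub fn tags_url_from_env() -> String {
    let host = env::var("OLLAMA_HOST").ok();
    format!("{}/api/tags", resolve_ollama_base(host.as_deref()))
}
```

Keeping the resolution logic separate from the environment lookup makes the default and override cases easy to check without touching process state.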

### How it works

On startup, llmfit queries `GET /api/tags` to list your installed Ollama models. Each installed model gets a green **✓** in the **Inst** column of the TUI. The system bar shows `Ollama: ✓ (N installed)`.

When you press `d` on a model, llmfit sends `POST /api/pull` to Ollama to download it. The row is highlighted with an animated indicator that shows download progress in real time. Once the pull completes, the model is immediately available in Ollama.

If Ollama is not running, Ollama-specific operations are skipped; the TUI still supports other providers like llama.cpp where available.
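
As a rough sketch of how a progress indicator could be derived from a pull: Ollama streams one JSON object per line during `POST /api/pull`, and download lines carry `total` and `completed` byte counts. The helper names below are hypothetical and the string scanning is deliberately naive; a real client would use a JSON parser (llmfit lists `serde_json` among its dependencies), and llmfit's own handling may differ.

```rust
/// Naive extraction of an unsigned integer field such as `"total":123` from one
/// JSON line. Illustration only; use a real JSON parser in practice.
pub fn extract_u64(line: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\":", key);
    let start = line.find(&needle)? + needle.len();
    let digits: String = line[start..]
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Percentage for one streamed status line, or `None` when the line carries
/// no byte counts (for example a "pulling manifest" status).
pub fn pull_progress_percent(line: &str) -> Option<u8> {
    let total = extract_u64(line, "total")?;
    let completed = extract_u64(line, "completed")?;
    if total == 0 {
        return None;
    }
    // Clamp to `total` so the result never exceeds 100.
    Some((completed.min(total) as u128 * 100 / total as u128) as u8)
}
```

Status lines without byte counts return `None`, which a UI can render as an indeterminate spinner instead of a percentage.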

### llama.cpp integration

llmfit integrates with [llama.cpp](https://github.com/ggml-org/llama.cpp) as a runtime/download provider in both TUI and CLI.

Requirements:

- `llama-cli` or `llama-server` available in `PATH` (for runtime detection)
- network access to Hugging Face for GGUF downloads

How it works:

- llmfit maps HF models to known GGUF repos (with heuristic fallbacks)
- downloads GGUF files into the local llama.cpp model cache
- marks models installed when matching GGUF files are present locally
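
The local install check in the last step can be sketched like this. The matching rule (a case-insensitive substring match on `.gguf` file names), the function names, and the flat directory layout are assumptions for illustration; the README does not specify how llmfit matches files or where the cache lives.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Returns the `.gguf` files in `cache_dir` whose file name contains `model_slug`
/// (case-insensitive). The matching rule is an assumption for illustration.
pub fn find_local_ggufs(cache_dir: &Path, model_slug: &str) -> Vec<PathBuf> {
    let slug = model_slug.to_lowercase();
    let mut matches = Vec::new();
    // A missing or unreadable directory simply means nothing is installed.
    let entries = match fs::read_dir(cache_dir) {
        Ok(entries) => entries,
        Err(_) => return matches,
    };
    for entry in entries.flatten() {
        let path = entry.path();
        let is_gguf = path
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| ext.eq_ignore_ascii_case("gguf"));
        let name = entry.file_name().to_string_lossy().to_lowercase();
        if is_gguf && name.contains(&slug) {
            matches.push(path);
        }
    }
    matches.sort();
    matches
}

/// A model counts as installed when at least one matching file exists.
pub fn is_installed(cache_dir: &Path, model_slug: &str) -> bool {
    !find_local_ggufs(cache_dir, model_slug).is_empty()
}
```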

### Docker Model Runner integration

llmfit integrates with [Docker Model Runner](https://docs.docker.com/desktop/features/model-runner/), Docker Desktop's built-in model serving feature.

Requirements:

- Docker Desktop with Model Runner enabled
- Default endpoint: `http://localhost:12434`

How it works:

- llmfit queries `GET /engines` to list models available in Docker Model Runner
- models are matched to the HF database using Ollama-style tag mapping (Docker Model Runner uses `ai/<tag>` naming)
- pressing `d` in the TUI pulls via `docker model pull`

### Remote Docker Model Runner instances

To connect to Docker Model Runner on a different host or port, set the `DOCKER_MODEL_RUNNER_HOST` environment variable:

```sh
DOCKER_MODEL_RUNNER_HOST="http://192.168.1.100:12434" llmfit
```

### LM Studio integration

llmfit integrates with [LM Studio](https://lmstudio.ai) as a local model server with built-in model download capabilities.

Requirements:

- LM Studio must be running with its local server enabled
- Default endpoint: `http://127.0.0.1:1234`

How it works:

- llmfit queries `GET /v1/models` to list models available in LM Studio
- pressing `d` in the TUI triggers a download via `POST /api/v1/models/download`
- download progress is tracked by polling `GET /api/v1/models/download-status`
- LM Studio accepts HuggingFace model names directly, so no name mapping is needed

### Remote LM Studio instances

To connect to LM Studio on a different host or port, set the `LMSTUDIO_HOST` environment variable:

```sh
LMSTUDIO_HOST="http://192.168.1.100:1234" llmfit
```

### Model name mapping

llmfit's database uses HuggingFace model names (e.g. `Qwen/Qwen2.5-Coder-14B-Instruct`) while Ollama uses its own naming scheme (e.g. `qwen2.5-coder:14b`). llmfit maintains an accurate mapping table between the two so that install detection and pulls resolve to the correct model. Each mapping is exact — `qwen2.5-coder:14b` maps to the Coder model, not the base `qwen2.5:14b`.
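
A minimal sketch of what an exact mapping implies, assuming a simple lookup function. The names `ollama_tag_for` and `is_installed_in_ollama` are hypothetical, and only the Coder pair is taken from this README; the base-model entry is an illustrative assumption, and the real table is larger.

```rust
/// Exact-match lookup from HuggingFace repo ID to Ollama tag.
pub fn ollama_tag_for(hf_name: &str) -> Option<&'static str> {
    match hf_name {
        "Qwen/Qwen2.5-Coder-14B-Instruct" => Some("qwen2.5-coder:14b"),
        "Qwen/Qwen2.5-14B-Instruct" => Some("qwen2.5:14b"), // assumed entry
        _ => None, // no prefix or fuzzy fallback: unknown names stay unmapped
    }
}

/// Install detection: is the mapped tag among the tags reported by Ollama?
pub fn is_installed_in_ollama(hf_name: &str, installed_tags: &[&str]) -> bool {
    ollama_tag_for(hf_name).map_or(false, |tag| installed_tags.contains(&tag))
}
```

Because the lookup is exact, having `qwen2.5:14b` installed does not mark the Coder model as installed, which is the behavior the paragraph above describes.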

---

## Platform support

- **Linux** -- Full support. GPU detection via `nvidia-smi` (NVIDIA), `rocm-smi` (AMD), sysfs/`lspci` (Intel Arc) and `npu-smi` (Ascend).
- **macOS (Apple Silicon)** -- Full support. Detects unified memory via `system_profiler`. VRAM = system RAM (shared pool). Models run via Metal GPU acceleration.
- **macOS (Intel)** -- RAM and CPU detection works. Discrete GPU detection works if `nvidia-smi` is available.
- **Windows** -- RAM and CPU detection works. NVIDIA GPUs are detected via `nvidia-smi` if it is installed.
- **Android / Termux / PRoot** -- CPU and RAM detection usually work, but GPU autodetection is not currently supported. Mobile GPUs such as Adreno typically are not visible through the desktop/server probing interfaces llmfit uses.
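
To illustrate the `nvidia-smi` path, here is a sketch that queries GPU names and total memory and sums VRAM across devices. The `--query-gpu` and `--format` options are standard `nvidia-smi` flags, but whether llmfit uses exactly this query is an assumption, as are the `GpuInfo` type and the function names.

```rust
use std::process::Command;

/// One GPU as reported by `nvidia-smi`: name and total VRAM in MiB.
#[derive(Debug, PartialEq)]
pub struct GpuInfo {
    pub name: String,
    pub vram_mib: u64,
}

/// Parses CSV lines of the form `NVIDIA GeForce RTX 4090, 24564`.
/// Lines that do not match are skipped.
pub fn parse_nvidia_smi_csv(output: &str) -> Vec<GpuInfo> {
    output
        .lines()
        .filter_map(|line| {
            let (name, mem) = line.rsplit_once(',')?;
            let vram_mib = mem.trim().parse().ok()?;
            Some(GpuInfo { name: name.trim().to_string(), vram_mib })
        })
        .collect()
}

/// Total VRAM across all GPUs, in GiB.
pub fn total_vram_gib(gpus: &[GpuInfo]) -> f64 {
    gpus.iter().map(|g| g.vram_mib as f64).sum::<f64>() / 1024.0
}

/// Runs `nvidia-smi`; returns `None` when the tool is missing or fails.
pub fn query_nvidia_gpus() -> Option<Vec<GpuInfo>> {
    let out = Command::new("nvidia-smi")
        .args(&["--query-gpu=name,memory.total", "--format=csv,noheader,nounits"])
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    Some(parse_nvidia_smi_csv(&String::from_utf8_lossy(&out.stdout)))
}
```

Returning `None` when the tool is absent matches the "if available" wording in the platform list: detection is skipped rather than treated as an error.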

### GPU support

| Vendor                 | Detection method              | VRAM reporting                 |
|------------------------|-------------------------------|--------------------------------|
| NVIDIA                 | `nvidia-smi`                  | Exact dedicated VRAM           |
| AMD                    | `rocm-smi`                    | Detected (VRAM may be unknown) |
| Intel Arc (discrete)   | sysfs (`mem_info_vram_total`) | Exact dedicated VRAM           |
| Intel Arc (integrated) | `lspci`                       | Shared system memory           |
| Apple Silicon          | `system_profiler`             | Unified memory (= system RAM)  |
| Ascend                 | `npu-smi`                     | Detected (VRAM may be unknown) |

If autodetection fails or reports incorrect values, use `--memory=<SIZE>` to override (see [GPU memory override](#gpu-memory-override) above).
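
The `<SIZE>` value takes a unit suffix (`G`/`GB`/`GiB`, `M`/`MB`/`MiB`, `T`/`TB`/`TiB`, case-insensitive), and the documented example treats 32000 MB as about 31.25 GB, which implies a factor of 1024. A parser along those lines might look like the sketch below. The function name `parse_memory_gb` and the rejection of bare numbers are assumptions, not llmfit's confirmed behavior.

```rust
/// Parses sizes such as "32G", "32000M", or "1.5TiB" into gigabytes.
/// Returns `None` for unknown suffixes, missing numbers, or non-positive values.
pub fn parse_memory_gb(input: &str) -> Option<f64> {
    let s = input.trim().to_ascii_lowercase();
    // Split at the first character that cannot be part of the number.
    let split = s
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(s.len());
    let (number, suffix) = s.split_at(split);
    let value: f64 = number.parse().ok()?;
    let gb = match suffix.trim() {
        "g" | "gb" | "gib" => value,
        "m" | "mb" | "mib" => value / 1024.0,
        "t" | "tb" | "tib" => value * 1024.0,
        _ => return None, // a bare number without a unit is rejected here (assumption)
    };
    if gb > 0.0 {
        Some(gb)
    } else {
        None
    }
}
```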

### Android / Termux note

On Android setups such as **Termux + PRoot**, llmfit usually cannot see mobile GPUs through the standard Linux detection paths (`nvidia-smi`, `rocm-smi`, DRM/sysfs, `lspci`, etc.). In those environments, "no GPU detected" is expected with the current implementation.

If you still want GPU-style recommendations on a unified-memory phone or tablet, use a manual memory override:

```sh
llmfit --memory=8G fit -n 20
llmfit recommend --json --memory=8G --limit 10
```

This is a workaround for recommendation/scoring only; it does not provide true Android GPU runtime detection.

---

## Contributing

Contributions are welcome, especially new models.

### Adding a model

1. Add the model's HuggingFace repo ID (e.g., `meta-llama/Llama-3.1-8B`) to the `TARGET_MODELS` list in `scripts/scrape_hf_models.py`.
2. If the model is gated (requires HuggingFace authentication to access metadata), add a fallback entry to the `FALLBACKS` list in the same script with the parameter count and context length.
3. Run the automated update script:
   ```sh
   make update-models
   # or: ./scripts/update_models.sh
   ```
4. Verify the updated model list: `./target/release/llmfit list`
5. Update [MODELS.md](MODELS.md) by running: `python3 << 'EOF' < scripts/...` (see commit history for the generator script)
6. Open a pull request.

See [MODELS.md](MODELS.md) for the current list and [AGENTS.md](AGENTS.md) for architecture details.

---

## OpenClaw integration

llmfit ships as an [OpenClaw](https://github.com/openclaw/openclaw) skill that lets the agent recommend hardware-appropriate local models and auto-configure Ollama/vLLM/LM Studio providers.

### Install the skill

```sh
# From the llmfit repo
./scripts/install-openclaw-skill.sh

# Or manually
cp -r skills/llmfit-advisor ~/.openclaw/skills/
```

Once installed, ask your OpenClaw agent things like:

- "What local models can I run?"
- "Recommend a coding model for my hardware"
- "Set up Ollama with the best models for my GPU"

The agent will call `llmfit recommend --json` under the hood, interpret the results, and offer to configure your `openclaw.json` with optimal model choices.

### How it works

The skill teaches the OpenClaw agent to:

1. Detect your hardware via `llmfit --json system`
2. Get ranked recommendations via `llmfit recommend --json`
3. Map HuggingFace model names to Ollama/vLLM/LM Studio tags
4. Configure `models.providers.ollama.models` in `openclaw.json`

See [skills/llmfit-advisor/SKILL.md](skills/llmfit-advisor/SKILL.md) for the full skill definition.

---

## Alternatives

If you're looking for a different approach, check out [llm-checker](https://github.com/Pavelevich/llm-checker) -- a Node.js CLI tool with Ollama integration that can pull and benchmark models directly. It takes a more hands-on approach by actually running models on your hardware via Ollama, rather than estimating from specs. It is a good choice if you already have Ollama installed and want to test real-world performance. Note that it doesn't support MoE (Mixture-of-Experts) architectures -- all models are treated as dense, so memory estimates for models like Mixtral or DeepSeek-V3 will reflect total parameter count rather than the smaller active subset.

---

## License

MIT


================================================
FILE: README.zh.md
================================================
# llmfit

<p align="center">
  <img src="assets/icon.svg" alt="llmfit icon" width="128" height="128">
</p>

<p align="center">
  <a href="README.md">English</a> ·
  <b>中文</b>
</p>

<p align="center">
  <a href="https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml"><img src="https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://crates.io/crates/llmfit"><img src="https://img.shields.io/crates/v/llmfit.svg" alt="Crates.io"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"></a>
</p>

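**Hundreds of models and providers. One command to find what runs on your hardware.**

A terminal tool that right-sizes LLM models to your system's RAM, CPU, and GPU. It detects your hardware, scores each model across quality, speed, fit, and context dimensions, and tells you which ones will run well on your machine.

Ships with an interactive TUI (default) and a classic CLI mode. Supports multi-GPU setups, MoE (Mixture-of-Experts) architectures, dynamic quantization selection, speed estimation, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).

> **Sister project:** Check out [sympozium](https://github.com/AlexsJones/sympozium/) for managing agents in Kubernetes.

![demo](demo.gif)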

---

## Install

### Windows
```sh
scoop install llmfit
```

If Scoop is not installed yet, see the [Scoop installation guide](https://scoop.sh/).

### macOS / Linux

#### Homebrew
```sh
brew install llmfit
```

#### Quick install
```sh
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
```

Downloads the latest release binary from GitHub and installs it to `/usr/local/bin` (or to `~/.local/bin` if sudo is unavailable).

**Install to `~/.local/bin` (no sudo required):**
```sh
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
```

### Docker / Podman
```sh
docker run ghcr.io/alexsjones/llmfit
```
This prints the JSON output of `llmfit recommend`, which you can query further with `jq`.
```sh
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
```

### From source
```sh
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# The binary is at target/release/llmfit
```

---

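## Usage

### TUI (default)

```sh
llmfit
```

Launches the interactive terminal UI. Your system specs (CPU, RAM, GPU name, VRAM, backend) are shown at the top. Models are listed in a scrollable table sorted by composite score. Each row shows the model's score, estimated tok/s, best quantization, run mode, memory usage, and use-case category.

| Key                        | Action                                                                |
|----------------------------|-----------------------------------------------------------------------|
| `Up` / `Down` or `j` / `k` | Navigate models                                                       |
| `/`                        | Enter search mode (fuzzy match on name, provider, params, use case)   |
| `Esc` or `Enter`           | Exit search mode                                                      |
| `Ctrl-U`                   | Clear search                                                          |
| `f`                        | Cycle fit filter: All, Runnable, Perfect, Good, Marginal              |
| `a`                        | Cycle availability filter: All, GGUF available, Installed             |
| `s`                        | Cycle sort column: Score, Params, Mem%, Ctx, Date, Use Case           |
| `v`                        | Enter Visual mode (select multiple models)                            |
| `V`                        | Enter Select mode (column-based filtering)                            |
| `t`                        | Cycle color theme (saved automatically)                               |
| `p`                        | Open Plan mode (hardware planning)                                    |
| `P`                        | Open provider filter popup                                            |
| `U`                        | Open use-case filter popup                                            |
| `C`                        | Open capability filter popup                                          |
| `m`                        | Mark selected model for compare                                       |
| `c`                        | Open compare view (marked vs. selected)                               |
| `x`                        | Clear compare mark                                                    |
| `i`                        | Toggle installed-first sorting (any detected runtime provider)        |
| `d`                        | Download selected model (provider picker when multiple are available) |
| `r`                        | Refresh installed models from runtime providers                       |
| `Enter`                    | Toggle detail view for selected model                                 |
| `PgUp` / `PgDn`            | Scroll by 10 rows                                                     |
| `g` / `G`                  | Jump to top / bottom                                                  |
| `q`                        | Quit                                                                  |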

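### Vim-like modes

The TUI uses Vim-like modes, and the current mode is shown in the status bar at the bottom left. The current mode determines which keys are active.

#### Normal mode

The default mode. Navigate, search, filter, and open views. All keys in the table above apply in this mode.

#### Visual mode (`v`)

Select a contiguous range of models for bulk comparison. Press `v` to set an anchor at the current row, then extend the selection with `j`/`k` or the arrow keys. Selected rows are highlighted.

| Key                     | Action                                                           |
|-------------------------|------------------------------------------------------------------|
| `j` / `k` or arrow keys | Extend selection up/down                                         |
| `c`                     | Compare all selected models (opens the multi-model compare view) |
| `m`                     | Mark current model for two-model compare                         |
| `Esc` or `v`            | Exit Visual mode                                                 |

The multi-model compare view is a table whose rows are attributes (score, tok/s, fit, mem%, params, mode, context, quant, and so on) and whose columns are models. Best values are highlighted. If the selected models do not fit on screen, scroll horizontally with `h`/`l` or the arrow keys.

#### Select mode (`V`)

Column-based filtering. Press `V` (shift-v) to enter Select mode, then move between column headers with `h`/`l` or the arrow keys. The active column is highlighted. Press `Enter` or `Space` to activate the filter for that column:

| Column                        | Filter action                                                             |
|-------------------------------|---------------------------------------------------------------------------|
| Inst                          | Cycle availability filter                                                 |
| Model                         | Enter search mode                                                         |
| Provider                      | Open provider popup                                                       |
| Params                        | Open parameter-size bucket popup (<3B, 3-7B, 7-14B, 14-30B, 30-70B, 70B+) |
| Score, tok/s, Mem%, Ctx, Date | Sort by that column                                                       |
| Quant                         | Open quantization popup                                                   |
| Mode                          | Open run-mode popup (GPU, MoE, CPU+GPU, CPU)                              |
| Fit                           | Cycle fit filter                                                          |
| Use Case                      | Open use-case popup                                                       |

Row navigation with `j`/`k` still works in Select mode, so you can see the effect of filters as you apply them. Press `Esc` to return to Normal mode.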

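### TUI Plan mode (`p`)

Plan mode inverts the usual fit analysis: instead of asking "what fits my hardware?", it estimates "what hardware does this model configuration need?".

Press `p` on a selected row, then:

| Key                    | Action                                                    |
|------------------------|-----------------------------------------------------------|
| `Tab` / `j` / `k`      | Move between editable fields (context, quant, target TPS) |
| `Left` / `Right`       | Move the cursor within the current field                  |
| Type                   | Edit the current field                                    |
| `Backspace` / `Delete` | Delete characters                                         |
| `Ctrl-U`               | Clear the current field                                   |
| `Esc` or `q`           | Exit Plan mode                                            |

Plan mode shows estimates for:
- Minimum and recommended VRAM/RAM/CPU cores
- Feasible run paths (GPU, CPU offload, CPU-only)
- Upgrade deltas needed to reach better fit targets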

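### Themes

Press `t` to cycle through 10 built-in color themes. Your selection is saved automatically to `~/.config/llmfit/theme` and restored on the next launch.

| Theme                    | Description                                    |
|--------------------------|------------------------------------------------|
| **Default**              | Original llmfit colors                         |
| **Dracula**              | Dark purple background with pastel accents     |
| **Solarized**            | Ethan Schoonover's Solarized Dark palette      |
| **Nord**                 | Arctic style, cool blue-gray tones             |
| **Monokai**              | Monokai Pro warm syntax colors                 |
| **Gruvbox**              | Retro style, warm earthy tones                 |
| **Catppuccin Latte**     | 🌻 Light theme: harmonious pastel inversion    |
| **Catppuccin Frappé**    | 🪴 Low-contrast dark: muted, subdued aesthetic |
| **Catppuccin Macchiato** | 🌺 Medium-contrast dark: gentle, soothing tones |
| **Catppuccin Mocha**     | 🌿 Darkest variant: cozy and color-rich        |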

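### Web dashboard

When you run `llmfit` in non-JSON mode, a web dashboard starts automatically in the background and listens on `0.0.0.0:8787` by default. Open it from any browser on the same network:

```
http://<your-machine-ip>:8787
```

You can also override the host or port with environment variables:

```sh
LLMFIT_DASHBOARD_HOST=0.0.0.0 LLMFIT_DASHBOARD_PORT=9000 llmfit
```

| Variable | Default | Description |
|---|---|---|
| `LLMFIT_DASHBOARD_HOST` | `0.0.0.0` | Interface address the dashboard server binds to |
| `LLMFIT_DASHBOARD_PORT` | `8787` | Port the dashboard server binds to |

To disable the automatic dashboard, add `--no-dashboard`:

```sh
llmfit --no-dashboard
```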

### CLI mode

Use `--cli` or any subcommand to get classic table output:

```sh
# Table of all models ranked by fit
llmfit --cli

# Only perfectly fitting models, top 5
llmfit fit --perfect -n 5

# Show detected system specs
llmfit system

# List all models in the database
llmfit list

# Search by name, provider, or size
llmfit search "llama 8b"

# Detailed view of a single model
llmfit info "Mistral-7B"

# Top 5 recommendations (JSON, for agent/script consumption)
llmfit recommend --json --limit 5

# Recommendations filtered by use case
llmfit recommend --json --use-case coding --limit 3

# Plan required hardware for a specific model configuration
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json

# Run as a node-level REST API (for cluster schedulers/aggregators)
llmfit serve --host 0.0.0.0 --port 8787
```

### REST API (`llmfit serve`)

`llmfit serve` starts an HTTP API that exposes the same fit/scoring data used by the TUI/CLI, including filtering and top-model selection for a node.

```sh
# Liveness
curl http://localhost:8787/health

# Node hardware info
curl http://localhost:8787/api/v1/system

# Full fit list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"

# Key scheduling endpoint: top runnable models for this node
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"

# Search by model name/provider
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"
```

Query parameters supported by `models`/`models/top`:

- `limit` (or `n`): maximum number of rows returned
- `perfect`: `true|false` (when `true`, forces perfect-fit only)
- `min_fit`: `perfect|good|marginal|too_tight`
- `runtime`: `any|mlx|llamacpp`
- `use_case`: `general|coding|reasoning|chat|multimodal|embedding`
- `provider`: provider text filter (substring match)
- `search`: free-text filter across name/provider/params/use case
- `sort`: `score|tps|params|mem|ctx|date|use_case`
- `include_too_tight`: include non-runnable rows (default `false` on `/top`, `true` on `/models`)
- `max_context`: per-request context length cap used for memory estimation

Validate API behavior locally:

```sh
# Spawn the server automatically and run endpoint/schema/filter assertions
python3 scripts/test_api.py --spawn

# Or test an already-running server
python3 scripts/test_api.py --base-url http://127.0.0.1:8787
```

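### GPU memory override

GPU VRAM autodetection can fail on some systems (for example a broken `nvidia-smi`, VMs, or passthrough setups). Use `--memory` to specify GPU memory manually:

```sh
# Override with 32 GB of VRAM
llmfit --memory=32G

# Megabytes also work (32000 MB ≈ 31.25 GB)
llmfit --memory=32000M

# Works with all modes: TUI, CLI, and subcommands
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G system
llmfit --memory=24G info "Llama-3.1-70B"
llmfit --memory=24G recommend --json
```

Accepted suffixes: `G`/`GB`/`GiB` (gigabytes), `M`/`MB`/`MiB` (megabytes), `T`/`TB`/`TiB` (terabytes). Case-insensitive. If no GPU was detected, the override creates a synthetic GPU entry so that models are scored for GPU inference.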

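### Context length cap

Use `--max-context` to cap the context length used for memory estimation (this does not change each model's advertised maximum context):

```sh
# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli

# Works with subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5
```

If `--max-context` is not set, llmfit uses `OLLAMA_CONTEXT_LENGTH` when available.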

### JSON output

Add `--json` to any subcommand for machine-readable output:

```sh
llmfit --json system     # Hardware specs as JSON
llmfit --json fit -n 10  # Top 10 fits as JSON
llmfit recommend --json  # Top 5 recommendations (JSON is the default for recommend)
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
```

The `plan` JSON output includes these stable fields:
- Request parameters (`context`, `quantization`, `target_tps`)
- Estimated minimum/recommended hardware
- Feasibility per path (`gpu`, `cpu_offload`, `cpu_only`)
- Upgrade deltas

---

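## How it works

1. **Hardware detection** -- Reads total/available RAM via `sysinfo`, counts CPU cores, and probes for GPUs:
   - **NVIDIA** -- Multi-GPU support via `nvidia-smi`. Aggregates VRAM across all detected GPUs. If reporting fails, VRAM is estimated from the GPU model name.
   - **AMD** -- Detected via `rocm-smi`.
   - **Intel Arc** -- Discrete VRAM via sysfs, integrated GPUs via `lspci`.
   - **Apple Silicon** -- Unified memory via `system_profiler`. VRAM = system RAM.
   - **Ascend** -- Detected via `npu-smi`.
   - **Backend detection** -- Automatically identifies the acceleration backend (CUDA, Metal, ROCm, SYCL, CPU ARM, CPU x86, Ascend) for speed estimation.

2. **Model database** -- Hundreds of models sourced from the HuggingFace API, stored in `data/hf_models.json` and embedded at compile time. Memory requirements are computed from parameter counts across a quantization hierarchy (Q8_0 through Q2_K). VRAM is the primary constraint for GPU inference; system RAM is the fallback for CPU-only execution.

   **MoE support** -- Models with Mixture-of-Experts architectures (Mixtral, DeepSeek-V2/V3) are detected automatically. Only a subset of experts is active per token, so the effective VRAM requirement is much lower than the total parameter count suggests. For example, Mixtral 8x7B has 46.7B total parameters but activates only about 12.9B per token, reducing VRAM from 23.9 GB to about 6.6 GB with expert offloading.

3. **Dynamic quantization** -- Instead of assuming a fixed quantization, llmfit tries the best-quality quantization that fits your hardware. It walks the hierarchy from Q8_0 (best quality) to Q2_K (most compressed) and picks the highest quality that fits in available memory. If nothing fits at full context, it tries again at half context.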

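4. **Multi-dimensional scoring** -- Each model is scored across four dimensions (0-100 each):

   | Dimension   | What it measures                                                               |
   |-------------|--------------------------------------------------------------------------------|
   | **Quality** | Parameter count, model family reputation, quantization penalty, task alignment |
   | **Speed**   | Estimated tokens/sec based on backend, params, and quantization                |
   | **Fit**     | Memory utilization efficiency (sweet spot: 50-80% of available memory)         |
   | **Context** | Context window capability vs. the target for the use case                      |

   Dimensions are combined into a weighted composite score. Weights vary by use-case category (General, Coding, Reasoning, Chat, Multimodal, Embedding). For example, Chat weights Speed higher (0.35) while Reasoning weights Quality higher (0.55). Models are ranked by composite score, with unrunnable models (Too Tight) always at the bottom.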

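5. **Speed estimation** -- Token generation in LLM inference is memory-bandwidth-bound: each token requires one full read of the model weights from VRAM. When the GPU model is recognized, llmfit uses its actual memory bandwidth to estimate throughput:

   Formula: `(bandwidth_GB_s / model_size_GB) × efficiency_factor`

   The efficiency factor (0.55) accounts for kernel overhead, KV-cache reads, and memory controller effects. The approach has been validated against published llama.cpp benchmarks ([Apple Silicon](https://github.com/ggml-org/llama.cpp/discussions/4167), [NVIDIA T4](https://github.com/ggml-org/llama.cpp/discussions/4225)) and real-world measurements.

   The bandwidth lookup table covers about 80 GPUs across NVIDIA (consumer + datacenter), AMD (RDNA + CDNA), and Apple Silicon families.

   For unrecognized GPUs, llmfit falls back to per-backend speed constants:

   | Backend      | Speed constant |
   |--------------|----------------|
   | CUDA         | 220            |
   | Metal        | 160            |
   | ROCm         | 180            |
   | SYCL         | 100            |
   | CPU (ARM)    | 90             |
   | CPU (x86)    | 70             |
   | NPU (Ascend) | 390            |

   Fallback formula: `K / params_b × quant_speed_multiplier`, with penalties for CPU offload (0.5x), CPU-only (0.3x), and MoE expert switching (0.8x).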

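6. **Fit analysis** -- Each model is evaluated for memory compatibility:

   **Run modes:**
   - **GPU** -- Model fits entirely in VRAM. Fast inference.
   - **MoE** -- Mixture-of-Experts with expert offloading. Active experts stay in VRAM, inactive experts in RAM.
   - **CPU+GPU** -- VRAM is insufficient, so the model spills to system RAM with partial GPU offload.
   - **CPU** -- No GPU. Model is loaded entirely into system RAM.

   **Fit levels:**
   - **Perfect** -- Recommended memory is met on GPU. Requires GPU acceleration.
   - **Good** -- Fits with headroom. Best achievable level for MoE offload or CPU+GPU.
   - **Marginal** -- Tight fit, or CPU-only (CPU-only is always capped at this level).
   - **Too Tight** -- Not enough VRAM or system RAM anywhere.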
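
The two speed formulas from step 5 can be written out as a small sketch. The constants (0.55, and the 0.8/0.5/0.3 penalties) come from the text above; the type and function names are hypothetical, and applying the run-mode penalty only to the fallback path is an assumption based on how the fallback formula is worded.

```rust
/// Efficiency factor quoted above (kernel overhead, KV-cache reads, memory controller effects).
const EFFICIENCY_FACTOR: f64 = 0.55;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RunMode {
    Gpu,
    MoeOffload,
    CpuOffload,
    CpuOnly,
}

/// Penalty multipliers quoted above; pure GPU runs are unpenalized.
fn mode_penalty(mode: RunMode) -> f64 {
    match mode {
        RunMode::Gpu => 1.0,
        RunMode::MoeOffload => 0.8,
        RunMode::CpuOffload => 0.5,
        RunMode::CpuOnly => 0.3,
    }
}

/// Bandwidth-based estimate: (bandwidth_GB_s / model_size_GB) × efficiency_factor.
pub fn tps_from_bandwidth(bandwidth_gb_s: f64, model_size_gb: f64) -> f64 {
    bandwidth_gb_s / model_size_gb * EFFICIENCY_FACTOR
}

/// Fallback estimate: K / params_b × quant_speed_multiplier, then the run-mode penalty.
pub fn tps_fallback(k: f64, params_b: f64, quant_speed_multiplier: f64, mode: RunMode) -> f64 {
    k / params_b * quant_speed_multiplier * mode_penalty(mode)
}
```

For instance, a hypothetical GPU with 1008 GB/s of bandwidth and a 4 GB model gives 1008 / 4 × 0.55 ≈ 138.6 tok/s, while the CUDA fallback for an 8B model with a multiplier of 1.0 gives 220 / 8 = 27.5 tok/s.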

---

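## Model database

The model list is generated by `scripts/scrape_hf_models.py`, a standalone Python script (stdlib only, no pip dependencies) that queries the HuggingFace REST API. It covers hundreds of models and providers, including Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, IBM Granite, Allen Institute OLMo, xAI Grok, Cohere, BigCode, 01.ai, Upstage, TII Falcon, HuggingFace, Zhipu GLM, Moonshot Kimi, Baidu ERNIE, and more. The scraper automatically detects MoE architectures via model config (`num_local_experts`, `num_experts_per_tok`) and known architecture mappings.

Model categories span general purpose, coding (CodeLlama, StarCoder2, WizardCoder, Qwen2.5-Coder, Qwen3-Coder), reasoning (DeepSeek-R1, Orca-2), multimodal/vision (Llama 3.2 Vision, Llama 4 Scout/Maverick, Qwen2.5-VL), chat, enterprise (IBM Granite), and embedding (nomic-embed, bge).

See [MODELS.md](MODELS.md) for the full list.

To refresh the model database:

```sh
# Automated update (recommended)
make update-models

# Or run the script directly
./scripts/update_models.sh

# Or manually
python3 scripts/scrape_hf_models.py
cargo build --release
```

The scraper writes `data/hf_models.json`, which is baked into the binary via `include_str!`. The automated update script backs up existing data, validates JSON output, and rebuilds the binary.

By default, the scraper enriches models with known GGUF download sources from providers like [unsloth](https://huggingface.co/unsloth) and [bartowski](https://huggingface.co/bartowski). Results are cached in `data/gguf_sources_cache.json` (7-day TTL) to avoid repeated API calls. Use `--no-gguf-sources` to skip enrichment for a faster scrape.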

---

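## Project structure

```
src/
  main.rs         -- CLI argument parsing, entrypoint, TUI launch
  hardware.rs     -- System RAM/CPU/GPU detection (multi-GPU, backend identification)
  models.rs       -- Model database, quantization hierarchy, dynamic quant selection
  fit.rs          -- Multi-dimensional scoring (Q/S/F/C), speed estimation, MoE offloading
  providers.rs    -- Runtime provider integration (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio), install detection, pull/download
  display.rs      -- Classic CLI table rendering + JSON output
  tui_app.rs      -- TUI application state, filters, navigation
  tui_ui.rs       -- TUI rendering (ratatui)
  tui_events.rs   -- TUI keyboard event handling (crossterm)
data/
  hf_models.json  -- Model database (206 models)
skills/
  llmfit-advisor/ -- OpenClaw skill for hardware-aware model recommendations
scripts/
  scrape_hf_models.py        -- HuggingFace API scraper
  update_models.sh            -- Automated database update script
  install-openclaw-skill.sh   -- Install the OpenClaw skill
Makefile           -- Build and maintenance commands
```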

---

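## Publishing to crates.io

The `Cargo.toml` already includes the required metadata (description, license, repository). To publish:

```sh
# Dry run first to catch issues
cargo publish --dry-run

# Publish for real (requires a crates.io API token)
cargo login
cargo publish
```

Before publishing, make sure:

- The version in `Cargo.toml` is correct (bump with each release).
- A `LICENSE` file exists in the repo root. Create one if missing:

```sh
# For the MIT license, copy the license text from https://opensource.org/license/MIT
# into LICENSE and fill in the year and copyright holder. Downloading that URL with
# curl saves an HTML page rather than plain license text.
# Or write your own. The Cargo.toml declares license = "MIT".
```

- `data/hf_models.json` is committed. It is embedded at compile time and must be present in the published crate.
- The `exclude` list in `Cargo.toml` omits `target/`, `scripts/`, and `demo.gif` from the published crate so the download stays small.

To publish updates:

```sh
# Bump version
# Edit Cargo.toml: version = "0.2.0"
cargo publish
```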

---

## Dependencies

| Crate                  | Purpose                                          |
|------------------------|--------------------------------------------------|
| `clap`                 | CLI argument parsing with derive macros          |
| `sysinfo`              | Cross-platform RAM and CPU detection             |
| `serde` / `serde_json` | JSON deserialization for model database          |
| `tabled`               | CLI table formatting                             |
| `colored`              | CLI colored output                               |
| `ureq`                 | HTTP client for runtime/provider API integration |
| `ratatui`              | Terminal UI framework                            |
| `crossterm`            | Terminal input/output backend for ratatui        |

---

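## Runtime provider integration

llmfit supports multiple local runtime providers:

- **Ollama** (daemon/API based pulls)
- **llama.cpp** (direct GGUF downloads from Hugging Face + local cache detection)
- **MLX** (Apple Silicon / mlx-community model cache + optional server)
- **Docker Model Runner** (Docker Desktop's built-in model serving)
- **LM Studio** (local model server with REST API for model management + downloads)

When more than one compatible provider is available for a model, pressing `d` in the TUI opens a provider picker modal.

### Ollama integration

llmfit integrates with [Ollama](https://ollama.com) to detect which models you already have installed and to download new ones directly from the TUI.

### Requirements

- **Ollama must be installed and running** (`ollama serve` or the Ollama desktop app)
- llmfit connects to `http://localhost:11434` (Ollama's default API port)
- No configuration needed -- if Ollama is running, llmfit detects it automatically

### Remote Ollama instances

To connect to Ollama running on a different machine or port, set the `OLLAMA_HOST` environment variable:

```sh
# Connect to Ollama on a specific IP and port
OLLAMA_HOST="http://192.168.1.100:11434" llmfit

# Connect via hostname and a custom port
OLLAMA_HOST="http://ollama-server:666" llmfit

# Works with all TUI and CLI commands
OLLAMA_HOST="http://192.168.1.100:11434" llmfit --cli
OLLAMA_HOST="http://192.168.1.100:11434" llmfit fit --perfect -n 5
```

This is useful for:
- Running llmfit on one machine while Ollama serves from another (e.g., GPU server + laptop client)
- Connecting to Ollama running in Docker containers with custom ports
- Using Ollama behind reverse proxies or load balancers

### How it works

On startup, llmfit queries `GET /api/tags` to list your installed Ollama models. Each installed model gets a green **✓** in the **Inst** column of the TUI. The system bar shows `Ollama: ✓ (N installed)`.

When you press `d` on a model, llmfit sends `POST /api/pull` to Ollama to download it. The row is highlighted with an animated indicator that shows download progress in real time. Once the pull completes, the model is immediately available in Ollama.

If Ollama is not running, Ollama-specific operations are skipped; the TUI still supports other providers like llama.cpp where available.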

### llama.cpp integration

llmfit integrates with [llama.cpp](https://github.com/ggml-org/llama.cpp) as a runtime/download provider in both TUI and CLI.

Requirements:

- `llama-cli` or `llama-server` available in `PATH` (for runtime detection)
- network access to Hugging Face for GGUF downloads

How it works:

- llmfit maps HF models to known GGUF repos (with heuristic fallbacks)
- downloads GGUF files into the local llama.cpp model cache
- marks models installed when matching GGUF files are present locally

### Docker Model Runner integration

llmfit integrates with [Docker Model Runner](https://docs.docker.com/desktop/features/model-runner/), Docker Desktop's built-in model serving feature.

Requirements:

- Docker Desktop with Model Runner enabled
- Default endpoint: `http://localhost:12434`

How it works:

- llmfit queries `GET /engines` to list models available in Docker Model Runner
- models are matched to the HF database using Ollama-style tag mapping (Docker Model Runner uses `ai/<tag>` naming)
- pressing `d` in the TUI pulls via `docker model pull`

### Remote Docker Model Runner instances

To connect to Docker Model Runner on a different host or port, set the `DOCKER_MODEL_RUNNER_HOST` environment variable:

```sh
DOCKER_MODEL_RUNNER_HOST="http://192.168.1.100:12434" llmfit
```

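### LM Studio integration

llmfit integrates with [LM Studio](https://lmstudio.ai) as a local model server with built-in model download capabilities.

Requirements:

- LM Studio must be running with its local server enabled
- Default endpoint: `http://127.0.0.1:1234`

How it works:

- llmfit queries `GET /v1/models` to list models available in LM Studio
- pressing `d` in the TUI triggers a download via `POST /api/v1/models/download`
- download progress is tracked by polling `GET /api/v1/models/download-status`
- LM Studio accepts HuggingFace model names directly, so no name mapping is needed

### Remote LM Studio instances

To connect to LM Studio on a different host or port, set the `LMSTUDIO_HOST` environment variable:

```sh
LMSTUDIO_HOST="http://192.168.1.100:1234" llmfit
```

### Model name mapping

llmfit's database uses HuggingFace model names (e.g. `Qwen/Qwen2.5-Coder-14B-Instruct`) while Ollama uses its own naming scheme (e.g. `qwen2.5-coder:14b`). llmfit maintains an exact mapping table between the two so that install detection and pulls resolve to the correct model. Each mapping is exact -- `qwen2.5-coder:14b` maps to the Coder model, not the base `qwen2.5:14b`.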

---

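## Platform support

- **Linux** -- Full support. GPU detection via `nvidia-smi` (NVIDIA), `rocm-smi` (AMD), sysfs/`lspci` (Intel Arc) and `npu-smi` (Ascend).
- **macOS (Apple Silicon)** -- Full support. Detects unified memory via `system_profiler`. VRAM = system RAM (shared pool). Models run via Metal GPU acceleration.
- **macOS (Intel)** -- RAM and CPU detection works. Discrete GPU detection works if `nvidia-smi` is available.
- **Windows** -- RAM and CPU detection works. NVIDIA GPUs are detected via `nvidia-smi` if it is installed.
- **Android / Termux / PRoot** -- CPU and RAM detection usually work, but GPU autodetection is not currently supported. Mobile GPUs such as Adreno typically are not visible through the desktop/server probing interfaces llmfit uses.

### GPU support

| Vendor                 | Detection method              | VRAM reporting                 |
|------------------------|-------------------------------|--------------------------------|
| NVIDIA                 | `nvidia-smi`                  | Exact dedicated VRAM           |
| AMD                    | `rocm-smi`                    | Detected (VRAM may be unknown) |
| Intel Arc (discrete)   | sysfs (`mem_info_vram_total`) | Exact dedicated VRAM           |
| Intel Arc (integrated) | `lspci`                       | Shared system memory           |
| Apple Silicon          | `system_profiler`             | Unified memory (= system RAM)  |
| Ascend                 | `npu-smi`                     | Detected (VRAM may be unknown) |

If autodetection fails or reports incorrect values, use `--memory=<SIZE>` to override (see [GPU memory override](#gpu-memory-override) above).

### Android / Termux note

On Android setups such as **Termux + PRoot**, llmfit usually cannot see mobile GPUs through the standard Linux detection paths (`nvidia-smi`, `rocm-smi`, DRM/sysfs, `lspci`, etc.). In those environments, "no GPU detected" is expected with the current implementation.

If you still want GPU-style recommendations on a unified-memory phone or tablet, use a manual memory override:

```sh
llmfit --memory=8G fit -n 20
llmfit recommend --json --memory=8G --limit 10
```

This is a workaround for recommendation/scoring only; it does not provide true Android GPU runtime detection.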

---

## 贡献

欢迎贡献,特别是添加新模型。

### 添加模型

1. 在 `scripts/scrape_hf_models.py` 的 `TARGET_MODELS` 列表中添加模型的 HuggingFace 仓库 ID(例如 `meta-llama/Llama-3.1-8B`)。
2. 如果模型有访问限制(需要 HuggingFace 身份验证才能访问元数据),在同一脚本的 `FALLBACKS` 列表中添加包含参数量和上下文长度的回退条目。
3. 运行自动更新脚本:
   ```sh
   make update-models
   # 或: ./scripts/update_models.sh
   ```
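4. Verify the updated model list: `./target/release/llmfit list`
5. Update [MODELS.md](MODELS.md) by running: `python3 << 'EOF' < scripts/...` (see the generation script in the commit history)
6. Open a Pull Request.

See [MODELS.md](MODELS.md) for the current list and [AGENTS.md](AGENTS.md) for architecture details.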

---

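## OpenClaw integration

llmfit is available as an [OpenClaw](https://github.com/openclaw/openclaw) skill that lets agents recommend local models suited to your hardware and automatically configure Ollama/vLLM/LM Studio providers.

### Installing the skill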

```sh
# From the llmfit repository
./scripts/install-openclaw-skill.sh

# Or install manually
cp -r skills/llmfit-advisor ~/.openclaw/skills/
```

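Once installed, you can ask your OpenClaw agent:

- "Which local models can I run?"
- "Recommend a coding model for my hardware"
- "Configure Ollama with the model that best fits my GPU"

The agent calls `llmfit recommend --json` in the background, interprets the results, and offers to configure your `openclaw.json` with the best model choices.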

### How it works

The skill teaches the OpenClaw agent to:

1. Detect your hardware via `llmfit --json system`
2. Get ranked recommendations via `llmfit recommend --json`
3. Map HuggingFace model names to Ollama/vLLM/LM Studio tags
4. Configure `models.providers.ollama.models` in `openclaw.json`

See [skills/llmfit-advisor/SKILL.md](skills/llmfit-advisor/SKILL.md) for the full skill definition.
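
Step 4 amounts to setting one nested key in a JSON document. The sketch below shows that update on an in-memory dictionary. It is an illustration under assumptions: `set_ollama_models` is a hypothetical helper, and storing the models as a plain list of tag strings is a guess at the schema; the skill definition is the authority on the real format.

```python
import copy
from typing import Any, Dict, List


def set_ollama_models(config: Dict[str, Any], tags: List[str]) -> Dict[str, Any]:
    """Return a copy of config with models.providers.ollama.models set to tags.

    The key path comes from step 4 of the list; the list-of-strings value
    is an assumption about the openclaw.json schema.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    node = updated
    for key in ("models", "providers", "ollama"):
        node = node.setdefault(key, {})  # create missing levels on the way down
    node["models"] = list(tags)
    return updated
```

Working on a deep copy keeps other providers' settings intact and leaves the original configuration available for comparison before anything is written back to disk.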

---

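## Alternatives

If you are looking for a different approach, check out [llm-checker](https://github.com/Pavelevich/llm-checker) -- a Node.js CLI tool with Ollama integration that can pull and benchmark models directly. It takes a more direct approach, actually running models on your hardware through Ollama instead of estimating from configuration parameters. It is a good choice if you already have Ollama installed and want to test real performance. Note that it does not support MoE (Mixture-of-Experts) architectures -- all models are treated as dense, so memory estimates for models such as Mixtral or DeepSeek-V3 will reflect the total parameter count rather than the smaller active subset.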
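
The gap between counting total and active parameters can be made concrete with a little arithmetic. The sketch below uses the `parameters_raw` and `active_parameters` values of the `TitanML/tiny-mixtral` record in `data/hf_models.json`. The `weight_gb` helper and its 0.5 bytes-per-parameter default are assumptions chosen to approximate a 4-bit quantization; this is not the formula either tool uses, and it ignores KV cache and runtime overhead.

```python
def weight_gb(params: int, bytes_per_param: float = 0.5) -> float:
    """Rough weight footprint in GB; 0.5 bytes/param is an assumed 4-bit figure."""
    return params * bytes_per_param / 1e9


# Values copied from the TitanML/tiny-mixtral record in data/hf_models.json.
TOTAL_PARAMS = 246_961_152
ACTIVE_PARAMS = 71_001_329

dense_estimate = weight_gb(TOTAL_PARAMS)    # every expert counted, as for a dense model
active_estimate = weight_gb(ACTIVE_PARAMS)  # only the active subset counted
ratio = active_estimate / dense_estimate

print(f"dense={dense_estimate:.3f} GB, active={active_estimate:.3f} GB, ratio={ratio:.2f}")
```

The ratio is independent of the bytes-per-parameter assumption, so it shows how far apart the two estimates are for any quantization.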

---

## License

MIT

---

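*This document is translated and maintained by [@JasonYeYuhe](https://github.com/JasonYeYuhe). If you spot a translation problem or need a new feature documented, please open an Issue or get in touch.*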


================================================
FILE: data/hf_models.json
================================================
[
  {
    "name": "echarlaix/tiny-random-PhiForCausalLM",
    "provider": "echarlaix",
    "parameter_count": "80K",
    "parameters_raw": 80074,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 512,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "phi",
    "hf_downloads": 24984,
    "hf_likes": 0,
    "release_date": "2024-03-29",
    "_discovered": true
  },
  {
    "name": "peft-internal-testing/tiny-random-GPT2LMHeadModel",
    "provider": "peft-internal-testing",
    "parameter_count": "83K",
    "parameters_raw": 83161,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 512,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt2",
    "hf_downloads": 37534,
    "hf_likes": 0,
    "release_date": "2025-11-17",
    "_discovered": true
  },
  {
    "name": "peft-internal-testing/tiny-random-gpt2",
    "provider": "peft-internal-testing",
    "parameter_count": "112K",
    "parameters_raw": 111968,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 512,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt2",
    "hf_downloads": 28458,
    "hf_likes": 0,
    "release_date": "2025-11-17",
    "_discovered": true
  },
  {
    "name": "peft-internal-testing/tiny-random-GPTJForCausalLM",
    "provider": "peft-internal-testing",
    "parameter_count": "129K",
    "parameters_raw": 129184,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 512,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gptj",
    "hf_downloads": 38953,
    "hf_likes": 0,
    "release_date": "2025-11-17",
    "_discovered": true
  },
  {
    "name": "allenai/Olmo-3-7B-Instruct",
    "provider": "allenai",
    "parameter_count": "528K",
    "parameters_raw": 528384,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 65536,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "olmo3",
    "hf_downloads": 101787,
    "hf_likes": 118,
    "release_date": "2025-11-19",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "unsloth/Olmo-3-7B-Instruct-GGUF",
        "provider": "unsloth"
      }
    ]
  },
  {
    "name": "allenai/Olmo-3-7B-Think",
    "provider": "allenai",
    "parameter_count": "528K",
    "parameters_raw": 528384,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 65536,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "olmo3",
    "hf_downloads": 44414,
    "hf_likes": 88,
    "release_date": "2025-11-18",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "unsloth/Olmo-3-7B-Think-GGUF",
        "provider": "unsloth"
      }
    ]
  },
  {
    "name": "allenai/Olmo-3-7B-Think-DPO",
    "provider": "allenai",
    "parameter_count": "528K",
    "parameters_raw": 528384,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 65536,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "olmo3",
    "hf_downloads": 21555,
    "hf_likes": 7,
    "release_date": "2025-11-18",
    "_discovered": true
  },
  {
    "name": "MaxJeblick/llama2-0b-unit-test",
    "provider": "maxjeblick",
    "parameter_count": "771K",
    "parameters_raw": 770940,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 1024,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 48409,
    "hf_likes": 2,
    "release_date": "2023-10-25",
    "_discovered": true
  },
  {
    "name": "peft-internal-testing/tiny-random-OPTForCausalLM",
    "provider": "peft-internal-testing",
    "parameter_count": "812K",
    "parameters_raw": 812404,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 100,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "opt",
    "hf_downloads": 388627,
    "hf_likes": 0,
    "release_date": "2025-11-13",
    "_discovered": true
  },
  {
    "name": "hmellor/tiny-random-LlamaForCausalLM",
    "provider": "hmellor",
    "parameter_count": "1M",
    "parameters_raw": 1062992,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 8192,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 1295572,
    "hf_likes": 0,
    "release_date": "2025-04-29",
    "_discovered": true
  },
  {
    "name": "peft-internal-testing/tiny-dummy-qwen2",
    "provider": "peft-internal-testing",
    "parameter_count": "1M",
    "parameters_raw": 1217480,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "qwen2",
    "hf_downloads": 102441,
    "hf_likes": 0,
    "release_date": "2024-07-04",
    "_discovered": true
  },
  {
    "name": "SimpleStories/SimpleStories-1.25M",
    "provider": "simplestories",
    "parameter_count": "1M",
    "parameters_raw": 1245824,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 512,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 86406,
    "hf_likes": 1,
    "release_date": "2025-04-22",
    "_discovered": true
  },
  {
    "name": "optimum-intel-internal-testing/tiny-random-Phi3ForCausalLM",
    "provider": "optimum-intel-internal-testing",
    "parameter_count": "2M",
    "parameters_raw": 2072736,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "phi3",
    "hf_downloads": 22058,
    "hf_likes": 0,
    "release_date": "2025-10-21",
    "_discovered": true
  },
  {
    "name": "llamafactory/tiny-random-qwen3",
    "provider": "llamafactory",
    "parameter_count": "2M",
    "parameters_raw": 2439264,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 47369,
    "hf_likes": 0,
    "release_date": "2026-01-06",
    "_discovered": true
  },
  {
    "name": "tiny-random/qwen3-next-moe",
    "provider": "tiny-random",
    "parameter_count": "3M",
    "parameters_raw": 2839160,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3_next",
    "hf_downloads": 27920,
    "hf_likes": 4,
    "release_date": "2025-09-12",
    "is_moe": true,
    "num_experts": 32,
    "active_experts": 10,
    "active_parameters": 984828,
    "_discovered": true
  },
  {
    "name": "llamafactory/tiny-random-Llama-3",
    "provider": "llamafactory",
    "parameter_count": "4M",
    "parameters_raw": 4112464,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 131072,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 950276,
    "hf_likes": 3,
    "release_date": "2024-06-07",
    "_discovered": true
  },
  {
    "name": "Maykeye/TinyLLama-v0",
    "provider": "maykeye",
    "parameter_count": "5M",
    "parameters_raw": 4621392,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 32384,
    "hf_likes": 43,
    "release_date": "2023-07-08",
    "_discovered": true
  },
  {
    "name": "optimum-intel-internal-testing/tiny-random-gpt-oss-mxfp4",
    "provider": "optimum-intel-internal-testing",
    "parameter_count": "7M",
    "parameters_raw": 6865444,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 131072,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_oss",
    "hf_downloads": 27904,
    "hf_likes": 0,
    "release_date": "2025-10-21",
    "is_moe": true,
    "num_experts": 32,
    "active_experts": 4,
    "active_parameters": 1158540,
    "_discovered": true
  },
  {
    "name": "hmellor/tiny-random-Gemma2ForCausalLM",
    "provider": "hmellor",
    "parameter_count": "8M",
    "parameters_raw": 8438816,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 8192,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gemma2",
    "hf_downloads": 339841,
    "hf_likes": 0,
    "release_date": "2025-04-29",
    "_discovered": true
  },
  {
    "name": "michaelbenayoun/llama-2-tiny-4kv-heads-4layers-random",
    "provider": "michaelbenayoun",
    "parameter_count": "9M",
    "parameters_raw": 8537216,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 52387,
    "hf_likes": 0,
    "release_date": "2024-03-28",
    "_discovered": true
  },
  {
    "name": "tiiuae/falcon-mamba-tiny-dev",
    "provider": "TII",
    "parameter_count": "9M",
    "parameters_raw": 8765056,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "falcon_mamba",
    "hf_downloads": 21730,
    "hf_likes": 2,
    "release_date": "2024-10-13",
    "_discovered": true
  },
  {
    "name": "arnir0/Tiny-LLM",
    "provider": "arnir0",
    "parameter_count": "13M",
    "parameters_raw": 12988992,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 1024,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 54600,
    "hf_likes": 45,
    "release_date": "2024-11-03",
    "_discovered": true
  },
  {
    "name": "EleutherAI/pythia-14m",
    "provider": "eleutherai",
    "parameter_count": "14M",
    "parameters_raw": 14067712,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neox",
    "hf_downloads": 33322,
    "hf_likes": 0,
    "release_date": "2026-02-24",
    "_discovered": true
  },
  {
    "name": "hmellor/tiny-random-BambaForCausalLM",
    "provider": "hmellor",
    "parameter_count": "33M",
    "parameters_raw": 33110760,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "bamba",
    "hf_downloads": 173798,
    "hf_likes": 0,
    "release_date": "2025-04-29",
    "_discovered": true
  },
  {
    "name": "erwanf/gpt2-mini",
    "provider": "erwanf",
    "parameter_count": "39M",
    "parameters_raw": 38604288,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 512,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt2",
    "hf_downloads": 391187,
    "hf_likes": 2,
    "release_date": "2024-06-23",
    "_discovered": true
  },
  {
    "name": "EleutherAI/pythia-14m-deduped",
    "provider": "eleutherai",
    "parameter_count": "39M",
    "parameters_raw": 39233560,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neox",
    "hf_downloads": 69404,
    "hf_likes": 28,
    "release_date": "2023-07-19",
    "_discovered": true
  },
  {
    "name": "hyper-accel/tiny-random-llama",
    "provider": "hyper-accel",
    "parameter_count": "73M",
    "parameters_raw": 73271808,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 131072,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 44649,
    "hf_likes": 0,
    "release_date": "2025-02-10",
    "_discovered": true
  },
  {
    "name": "RedHatAI/SmolLM-135M-Instruct-quantized.w8a16",
    "provider": "redhatai",
    "parameter_count": "83M",
    "parameters_raw": 83356260,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 20835,
    "hf_likes": 0,
    "release_date": "2024-08-22",
    "_discovered": true
  },
  {
    "name": "tiiuae/Falcon-H1-Tiny-90M-Instruct",
    "provider": "TII",
    "parameter_count": "91M",
    "parameters_raw": 91131072,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "falcon_h1",
    "hf_downloads": 301062,
    "hf_likes": 33,
    "release_date": "2026-01-12",
    "_discovered": true
  },
  {
    "name": "EleutherAI/pythia-70m-deduped",
    "provider": "eleutherai",
    "parameter_count": "96M",
    "parameters_raw": 95592496,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neox",
    "hf_downloads": 613928,
    "hf_likes": 27,
    "release_date": "2023-02-13",
    "_discovered": true
  },
  {
    "name": "gratefulasi/lumeleto",
    "provider": "gratefulasi",
    "parameter_count": "124M",
    "parameters_raw": 124439808,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 1024,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt2",
    "hf_downloads": 47679,
    "hf_likes": 1,
    "release_date": "2025-04-24",
    "_discovered": true
  },
  {
    "name": "peft-internal-testing/opt-125m",
    "provider": "peft-internal-testing",
    "parameter_count": "125M",
    "parameters_raw": 125239296,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "opt",
    "hf_downloads": 232784,
    "hf_likes": 0,
    "release_date": "2025-11-19",
    "_discovered": true
  },
  {
    "name": "state-spaces/mamba-130m-hf",
    "provider": "state-spaces",
    "parameter_count": "129M",
    "parameters_raw": 129135360,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "mamba",
    "hf_downloads": 161407,
    "hf_likes": 68,
    "release_date": "2024-03-06",
    "_discovered": true
  },
  {
    "name": "HuggingFaceTB/SmolLM2-135M",
    "provider": "huggingfacetb",
    "parameter_count": "135M",
    "parameters_raw": 134515008,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 8192,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 954486,
    "hf_likes": 168,
    "release_date": "2024-10-31",
    "_discovered": true
  },
  {
    "name": "HuggingFaceTB/SmolLM2-135M-Instruct",
    "provider": "huggingfacetb",
    "parameter_count": "135M",
    "parameters_raw": 134515008,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 8192,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 603656,
    "hf_likes": 295,
    "release_date": "2024-10-31",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "unsloth/SmolLM2-135M-Instruct-GGUF",
        "provider": "unsloth"
      },
      {
        "repo": "bartowski/SmolLM2-135M-Instruct-GGUF",
        "provider": "bartowski"
      }
    ]
  },
  {
    "name": "HuggingFaceTB/SmolLM-135M-Instruct",
    "provider": "huggingfacetb",
    "parameter_count": "135M",
    "parameters_raw": 134515008,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 359214,
    "hf_likes": 133,
    "release_date": "2024-07-15",
    "_discovered": true
  },
  {
    "name": "HuggingFaceTB/SmolLM-135M",
    "provider": "huggingfacetb",
    "parameter_count": "135M",
    "parameters_raw": 134515008,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 156129,
    "hf_likes": 249,
    "release_date": "2024-07-14",
    "_discovered": true
  },
  {
    "name": "nomic-ai/nomic-embed-text-v1.5",
    "provider": "Nomic",
    "parameter_count": "137M",
    "parameters_raw": 137000000,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "F16",
    "context_length": 8192,
    "use_case": "Text embeddings for RAG",
    "pipeline_tag": "feature-extraction",
    "architecture": "nomic_bert",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": null
  },
  {
    "name": "EleutherAI/gpt-neo-125m",
    "provider": "eleutherai",
    "parameter_count": "150M",
    "parameters_raw": 150364416,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neo",
    "hf_downloads": 100060,
    "hf_likes": 227,
    "release_date": "2022-03-02",
    "_discovered": true
  },
  {
    "name": "JackFram/llama-160m",
    "provider": "jackfram",
    "parameter_count": "162M",
    "parameters_raw": 162417792,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 46025,
    "hf_likes": 36,
    "release_date": "2023-05-26",
    "_discovered": true
  },
  {
    "name": "microsoft/DialoGPT-small",
    "provider": "Microsoft",
    "parameter_count": "176M",
    "parameters_raw": 175620096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 1024,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt2",
    "hf_downloads": 58248,
    "hf_likes": 143,
    "release_date": "2022-03-02",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/LFM2.5-1.2B-Instruct-MLX-4bit",
    "provider": "lmstudio-community",
    "parameter_count": "183M",
    "parameters_raw": 182975232,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 441394,
    "hf_likes": 1,
    "release_date": "2026-01-07",
    "_discovered": true
  },
  {
    "name": "AI-Sweden-Models/gpt-sw3-126m",
    "provider": "ai-sweden-models",
    "parameter_count": "186M",
    "parameters_raw": 186112512,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt2",
    "hf_downloads": 115269,
    "hf_likes": 3,
    "release_date": "2022-12-14",
    "_discovered": true
  },
  {
    "name": "rinna/japanese-gpt-neox-small",
    "provider": "rinna",
    "parameter_count": "204M",
    "parameters_raw": 203611008,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neox",
    "hf_downloads": 457560,
    "hf_likes": 15,
    "release_date": "2022-08-31",
    "_discovered": true
  },
  {
    "name": "EleutherAI/pythia-160m-deduped",
    "provider": "eleutherai",
    "parameter_count": "213M",
    "parameters_raw": 212654688,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neox",
    "hf_downloads": 82245,
    "hf_likes": 3,
    "release_date": "2023-02-08",
    "_discovered": true
  },
  {
    "name": "Vamsi/T5_Paraphrase_Paws",
    "provider": "vamsi",
    "parameter_count": "223M",
    "parameters_raw": 222903936,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 512,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "t5",
    "hf_downloads": 83813,
    "hf_likes": 40,
    "release_date": "2022-03-02",
    "_discovered": true
  },
  {
    "name": "TitanML/tiny-mixtral",
    "provider": "titanml",
    "parameter_count": "247M",
    "parameters_raw": 246961152,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 131072,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "mixtral",
    "hf_downloads": 100054,
    "hf_likes": 2,
    "release_date": "2024-04-24",
    "is_moe": true,
    "num_experts": 8,
    "active_experts": 2,
    "active_parameters": 71001329,
    "_discovered": true
  },
  {
    "name": "lmstudio-community/LFM2.5-1.2B-Instruct-MLX-6bit",
    "provider": "lmstudio-community",
    "parameter_count": "256M",
    "parameters_raw": 256113408,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 441834,
    "hf_likes": 4,
    "release_date": "2026-01-07",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Qwen3-1.7B-MLX-4bit",
    "provider": "lmstudio-community",
    "parameter_count": "269M",
    "parameters_raw": 268944384,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 40960,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 25290,
    "hf_likes": 0,
    "release_date": "2025-04-28",
    "_discovered": true
  },
  {
    "name": "google/t5gemma-s-s-prefixlm",
    "provider": "Google",
    "parameter_count": "313M",
    "parameters_raw": 312517632,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "t5gemma",
    "hf_downloads": 41131,
    "hf_likes": 2,
    "release_date": "2025-06-19",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/LFM2.5-1.2B-Instruct-MLX-8bit",
    "provider": "lmstudio-community",
    "parameter_count": "329M",
    "parameters_raw": 329251584,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 449901,
    "hf_likes": 2,
    "release_date": "2026-01-07",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/LFM2-1.2B-MLX-8bit",
    "provider": "lmstudio-community",
    "parameter_count": "329M",
    "parameters_raw": 329251584,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 26421,
    "hf_likes": 4,
    "release_date": "2025-07-14",
    "_discovered": true
  },
  {
    "name": "LiquidAI/LFM2-ColBERT-350M",
    "provider": "Liquid AI",
    "parameter_count": "353M",
    "parameters_raw": 353322752,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Semantic search, sentence similarity",
    "pipeline_tag": "sentence-similarity",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2-350M",
    "provider": "liquidai",
    "parameter_count": "354M",
    "parameters_raw": 354483968,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 41124,
    "hf_likes": 235,
    "release_date": "2025-07-10",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "unsloth/LFM2-350M-GGUF",
        "provider": "unsloth"
      }
    ]
  },
  {
    "name": "HuggingFaceTB/SmolLM2-360M",
    "provider": "huggingfacetb",
    "parameter_count": "362M",
    "parameters_raw": 361821120,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 8192,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 36444,
    "hf_likes": 87,
    "release_date": "2024-10-31",
    "_discovered": true
  },
  {
    "name": "LiquidAI/LFM2-350M-Extract",
    "provider": "Liquid AI",
    "parameter_count": "354M",
    "parameters_raw": 354483968,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Data extraction, structured output",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2-350M-Math",
    "provider": "Liquid AI",
    "parameter_count": "354M",
    "parameters_raw": 354483968,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Math reasoning, chain-of-thought",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2-350M-ENJP-MT",
    "provider": "Liquid AI",
    "parameter_count": "354M",
    "parameters_raw": 354483968,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "English-Japanese translation",
    "pipeline_tag": "translation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2-350M-PII-Extract-JP",
    "provider": "Liquid AI",
    "parameter_count": "354M",
    "parameters_raw": 354483968,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "PII extraction, Japanese",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "lmstudio-community/LFM2-350M-MLX-8bit",
    "provider": "lmstudio-community",
    "parameter_count": "354M",
    "parameters_raw": 354483968,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "mlx-8bit",
    "context_length": 128000,
    "use_case": "Lightweight, edge deployment",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "lmstudio-community/LFM2-350M-MLX-bf16",
    "provider": "lmstudio-community",
    "parameter_count": "354M",
    "parameters_raw": 354483968,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.7,
    "quantization": "BF16",
    "context_length": 128000,
    "use_case": "Lightweight, edge deployment",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "HuggingFaceTB/SmolLM-360M-Instruct",
    "provider": "huggingfacetb",
    "parameter_count": "362M",
    "parameters_raw": 361821120,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 26935,
    "hf_likes": 83,
    "release_date": "2024-07-15",
    "_discovered": true
  },
  {
    "name": "openbmb/MiniCPM4-0.5B",
    "provider": "openbmb",
    "parameter_count": "434M",
    "parameters_raw": 433873920,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "unknown",
    "hf_downloads": 28889,
    "hf_likes": 77,
    "release_date": "2025-06-05",
    "_discovered": true
  },
  {
    "name": "LiquidAI/LFM2-VL-450M",
    "provider": "Liquid AI",
    "parameter_count": "451M",
    "parameters_raw": 450822656,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Multimodal, vision and text",
    "pipeline_tag": "image-text-to-text",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "lmstudio-community/Qwen3-1.7B-MLX-8bit",
    "provider": "lmstudio-community",
    "parameter_count": "484M",
    "parameters_raw": 484000768,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 40960,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 28313,
    "hf_likes": 1,
    "release_date": "2025-04-28",
    "_discovered": true
  },
  {
    "name": "Qwen/Qwen2.5-0.5B-Instruct",
    "provider": "Alibaba",
    "parameter_count": "494M",
    "parameters_raw": 494032768,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Instruction following, chat",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen2",
    "hf_downloads": 6992099,
    "hf_likes": 470,
    "release_date": "2024-09-16",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "bartowski/Qwen2.5-0.5B-Instruct-GGUF",
        "provider": "bartowski"
      }
    ]
  },
  {
    "name": "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "provider": "Alibaba",
    "parameter_count": "494M",
    "parameters_raw": 494032768,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Code generation and completion",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen2",
    "hf_downloads": 1408034,
    "hf_likes": 65,
    "release_date": "2024-11-06",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "unsloth/Qwen2.5-Coder-0.5B-Instruct-GGUF",
        "provider": "unsloth"
      },
      {
        "repo": "bartowski/Qwen2.5-Coder-0.5B-Instruct-GGUF",
        "provider": "bartowski"
      }
    ]
  },
  {
    "name": "Qwen/Qwen2.5-0.5B",
    "provider": "Alibaba",
    "parameter_count": "494M",
    "parameters_raw": 494032768,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen2",
    "hf_downloads": 1200041,
    "hf_likes": 378,
    "release_date": "2024-09-15",
    "_discovered": true
  },
  {
    "name": "Qwen/Qwen2-0.5B-Instruct",
    "provider": "Alibaba",
    "parameter_count": "494M",
    "parameters_raw": 494032768,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "qwen2",
    "hf_downloads": 259334,
    "hf_likes": 200,
    "release_date": "2024-06-03",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "bartowski/Qwen2-0.5B-Instruct-GGUF",
        "provider": "bartowski"
      }
    ]
  },
  {
    "name": "Gensyn/Qwen2.5-0.5B-Instruct",
    "provider": "gensyn",
    "parameter_count": "494M",
    "parameters_raw": 494032768,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Instruction following, chat",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen2",
    "hf_downloads": 106514,
    "hf_likes": 33,
    "release_date": "2025-03-28",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "bartowski/Qwen2.5-0.5B-Instruct-GGUF",
        "provider": "bartowski"
      }
    ]
  },
  {
    "name": "Qwen/Qwen2.5-Coder-0.5B",
    "provider": "Alibaba",
    "parameter_count": "494M",
    "parameters_raw": 494032768,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Code generation and completion",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen2",
    "hf_downloads": 64868,
    "hf_likes": 44,
    "release_date": "2024-11-08",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "bartowski/Qwen2.5-Coder-0.5B-GGUF",
        "provider": "bartowski"
      }
    ]
  },
  {
    "name": "EleutherAI/pythia-410m",
    "provider": "eleutherai",
    "parameter_count": "506M",
    "parameters_raw": 505997504,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neox",
    "hf_downloads": 88847,
    "hf_likes": 36,
    "release_date": "2023-02-13",
    "_discovered": true
  },
  {
    "name": "EleutherAI/pythia-410m-deduped",
    "provider": "eleutherai",
    "parameter_count": "506M",
    "parameters_raw": 505997504,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neox",
    "hf_downloads": 32196,
    "hf_likes": 20,
    "release_date": "2023-02-13",
    "_discovered": true
  },
  {
    "name": "h2oai/h2o-danube3-500m-chat",
    "provider": "h2oai",
    "parameter_count": "514M",
    "parameters_raw": 513590784,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 8192,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 31122,
    "hf_likes": 39,
    "release_date": "2024-07-04",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "bartowski/h2o-danube3-500m-chat-GGUF",
        "provider": "bartowski"
      }
    ]
  },
  {
    "name": "tiiuae/Falcon-H1-0.5B-Base",
    "provider": "TII",
    "parameter_count": "521M",
    "parameters_raw": 521411104,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 16384,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "falcon_h1",
    "hf_downloads": 25562,
    "hf_likes": 16,
    "release_date": "2025-05-01",
    "_discovered": true
  },
  {
    "name": "RedHatAI/Qwen3-30B-A3B-Instruct-2507-speculator.eagle3",
    "provider": "redhatai",
    "parameter_count": "522M",
    "parameters_raw": 522152832,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "Instruction following, chat",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "unknown",
    "hf_downloads": 115085,
    "hf_likes": 1,
    "release_date": "2025-12-12",
    "_discovered": true
  },
  {
    "name": "z-lab/Qwen3-4B-DFlash-b16",
    "provider": "z-lab",
    "parameter_count": "537M",
    "parameters_raw": 537427200,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 40960,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 25679,
    "hf_likes": 22,
    "release_date": "2026-01-04",
    "_discovered": true
  },
  {
    "name": "bigscience/bloomz-560m",
    "provider": "bigscience",
    "parameter_count": "559M",
    "parameters_raw": 559214592,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "bloom",
    "hf_downloads": 1303926,
    "hf_likes": 137,
    "release_date": "2022-10-08",
    "_discovered": true
  },
  {
    "name": "bigscience/bloom-560m",
    "provider": "bigscience",
    "parameter_count": "559M",
    "parameters_raw": 559214592,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "bloom",
    "hf_downloads": 134778,
    "hf_likes": 371,
    "release_date": "2022-05-19",
    "_discovered": true
  },
  {
    "name": "Qwen/Qwen3-4B-MLX-4bit",
    "provider": "Alibaba",
    "parameter_count": "566M",
    "parameters_raw": 565828096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 65536,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 74343,
    "hf_likes": 26,
    "release_date": "2025-05-23",
    "_discovered": true
  },
  {
    "name": "google/t5gemma-b-b-ul2",
    "provider": "Google",
    "parameter_count": "591M",
    "parameters_raw": 591490560,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "t5gemma",
    "hf_downloads": 39788,
    "hf_likes": 2,
    "release_date": "2025-06-19",
    "_discovered": true
  },
  {
    "name": "google/t5gemma-b-b-prefixlm",
    "provider": "Google",
    "parameter_count": "591M",
    "parameters_raw": 591490560,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "General purpose text generation",
    "pipeline_tag": "text-generation",
    "architecture": "t5gemma",
    "hf_downloads": 1187971,
    "hf_likes": 13,
    "release_date": "2025-06-19",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Phi-4-mini-reasoning-MLX-4bit",
    "provider": "lmstudio-community",
    "parameter_count": "600M",
    "parameters_raw": 599546880,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 131072,
    "use_case": "Advanced reasoning, chain-of-thought",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "phi3",
    "hf_downloads": 43404,
    "hf_likes": 3,
    "release_date": "2025-05-01",
    "_discovered": true
  },
  {
    "name": "Qwen/Qwen1.5-0.5B-Chat",
    "provider": "Alibaba",
    "parameter_count": "620M",
    "parameters_raw": 619570176,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "qwen2",
    "hf_downloads": 87380,
    "hf_likes": 92,
    "release_date": "2024-01-31",
    "_discovered": true
  },
  {
    "name": "Qwen/Qwen1.5-0.5B",
    "provider": "Alibaba",
    "parameter_count": "620M",
    "parameters_raw": 619570176,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "qwen2",
    "hf_downloads": 26651,
    "hf_likes": 173,
    "release_date": "2024-01-22",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit",
    "provider": "lmstudio-community",
    "parameter_count": "629M",
    "parameters_raw": 628676096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 95794,
    "hf_likes": 10,
    "release_date": "2025-08-06",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Qwen3-4B-Instruct-2507-MLX-4bit",
    "provider": "lmstudio-community",
    "parameter_count": "629M",
    "parameters_raw": 628676096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "Instruction following, chat",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 66279,
    "hf_likes": 3,
    "release_date": "2025-08-06",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Qwen3-4B-MLX-4bit",
    "provider": "lmstudio-community",
    "parameter_count": "629M",
    "parameters_raw": 628676096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 40960,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 21982,
    "hf_likes": 1,
    "release_date": "2025-04-28",
    "_discovered": true
  },
  {
    "name": "LiquidAI/LFM2-700M",
    "provider": "Liquid AI",
    "parameter_count": "742M",
    "parameters_raw": 742489344,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Lightweight, edge deployment",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "lmstudio-community/LFM2-700M-MLX-8bit",
    "provider": "lmstudio-community",
    "parameter_count": "742M",
    "parameters_raw": 742489344,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.8,
    "quantization": "mlx-8bit",
    "context_length": 128000,
    "use_case": "Lightweight, edge deployment",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "lmstudio-community/LFM2-700M-MLX-bf16",
    "provider": "lmstudio-community",
    "parameter_count": "742M",
    "parameters_raw": 742489344,
    "min_ram_gb": 1.7,
    "recommended_ram_gb": 2.8,
    "min_vram_gb": 1.5,
    "quantization": "BF16",
    "context_length": 128000,
    "use_case": "Lightweight, edge deployment",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "Qwen/Qwen3-0.6B",
    "provider": "Alibaba",
    "parameter_count": "752M",
    "parameters_raw": 751632384,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 40960,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 11310453,
    "hf_likes": 1120,
    "release_date": "2025-04-27",
    "gguf_sources": [
      {
        "repo": "unsloth/Qwen3-0.6B-GGUF",
        "provider": "unsloth"
      }
    ]
  },
  {
    "name": "Qwen/Qwen3Guard-Gen-0.6B",
    "provider": "Alibaba",
    "parameter_count": "752M",
    "parameters_raw": 751632384,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 32768,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 146728,
    "hf_likes": 62,
    "release_date": "2025-09-23",
    "_discovered": true
  },
  {
    "name": "Qwen/Qwen3-0.6B-FP8",
    "provider": "Alibaba",
    "parameter_count": "752M",
    "parameters_raw": 751659264,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 40960,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 1648717,
    "hf_likes": 57,
    "release_date": "2025-04-28",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Qwen3-4B-Instruct-2507-MLX-5bit",
    "provider": "lmstudio-community",
    "parameter_count": "754M",
    "parameters_raw": 754372096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "Instruction following, chat",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 62740,
    "hf_likes": 0,
    "release_date": "2025-08-06",
    "_discovered": true
  },
  {
    "name": "h2oai/h2ovl-mississippi-800m",
    "provider": "h2oai",
    "parameter_count": "826M",
    "parameters_raw": 826295808,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "h2ovl_chat",
    "hf_downloads": 1014882,
    "hf_likes": 39,
    "release_date": "2024-10-16",
    "_discovered": true
  },
  {
    "name": "Qwen/Qwen3.5-0.8B",
    "provider": "Alibaba",
    "parameter_count": "873M",
    "parameters_raw": 873438784,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "General purpose",
    "capabilities": [
      "vision",
      "tool_use"
    ],
    "pipeline_tag": "image-text-to-text",
    "architecture": "qwen3_5",
    "hf_downloads": 93448,
    "hf_likes": 208,
    "release_date": "2026-02-28",
    "gguf_sources": [
      {
        "repo": "unsloth/Qwen3.5-0.8B-GGUF",
        "provider": "unsloth"
      }
    ]
  },
  {
    "name": "Qwen/Qwen3.5-0.8B-Base",
    "provider": "Alibaba",
    "parameter_count": "873M",
    "parameters_raw": 873438784,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "General purpose",
    "capabilities": [
      "vision",
      "tool_use"
    ],
    "pipeline_tag": "image-text-to-text",
    "architecture": "qwen3_5",
    "hf_downloads": 4680,
    "hf_likes": 37,
    "release_date": "2026-02-28"
  },
  {
    "name": "lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit",
    "provider": "lmstudio-community",
    "parameter_count": "880M",
    "parameters_raw": 880068096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 91703,
    "hf_likes": 2,
    "release_date": "2025-08-06",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Qwen3-4B-Instruct-2507-MLX-6bit",
    "provider": "lmstudio-community",
    "parameter_count": "880M",
    "parameters_raw": 880068096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "Instruction following, chat",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 62883,
    "hf_likes": 0,
    "release_date": "2025-08-06",
    "_discovered": true
  },
  {
    "name": "Joaoffg/ELM",
    "provider": "joaoffg",
    "parameter_count": "903M",
    "parameters_raw": 902891520,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 339775,
    "hf_likes": 2,
    "release_date": "2024-05-29",
    "_discovered": true
  },
  {
    "name": "RedHatAI/Qwen3-8B-speculator.eagle3",
    "provider": "redhatai",
    "parameter_count": "1.0B",
    "parameters_raw": 1022037632,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.5,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "unknown",
    "hf_downloads": 76636,
    "hf_likes": 2,
    "release_date": "2025-09-19",
    "_discovered": true
  },
  {
    "name": "EleutherAI/pythia-1b",
    "provider": "eleutherai",
    "parameter_count": "1.1B",
    "parameters_raw": 1078891008,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neox",
    "hf_downloads": 27818,
    "hf_likes": 43,
    "release_date": "2023-03-10",
    "_discovered": true
  },
  {
    "name": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "provider": "Community",
    "parameter_count": "1.1B",
    "parameters_raw": 1100048384,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 1870099,
    "hf_likes": 1538,
    "release_date": "2023-12-30"
  },
  {
    "name": "nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change",
    "provider": "nm-testing",
    "parameter_count": "1.1B",
    "parameters_raw": 1100048692,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "Lightweight, edge deployment",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 31348,
    "hf_likes": 0,
    "release_date": "2024-06-12",
    "_discovered": true
  },
  {
    "name": "bigcode/gpt_bigcode-santacoder",
    "provider": "BigCode",
    "parameter_count": "1.1B",
    "parameters_raw": 1124886528,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "Code generation and completion",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_bigcode",
    "hf_downloads": 49973,
    "hf_likes": 26,
    "release_date": "2023-04-06",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit",
    "provider": "lmstudio-community",
    "parameter_count": "1.1B",
    "parameters_raw": 1131460096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 93477,
    "hf_likes": 7,
    "release_date": "2025-08-06",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Qwen3-4B-Instruct-2507-MLX-8bit",
    "provider": "lmstudio-community",
    "parameter_count": "1.1B",
    "parameters_raw": 1131460096,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 262144,
    "use_case": "Instruction following, chat",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 63832,
    "hf_likes": 1,
    "release_date": "2025-08-06",
    "_discovered": true
  },
  {
    "name": "LiquidAI/LFM2.5-1.2B-Instruct",
    "provider": "Liquid AI",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 116655,
    "hf_likes": 516,
    "release_date": "2026-01-06",
    "_discovered": true,
    "gguf_sources": [
      {
        "repo": "unsloth/LFM2.5-1.2B-Instruct-GGUF",
        "provider": "unsloth"
      }
    ]
  },
  {
    "name": "lmstudio-community/LFM2-1.2B-MLX-bf16",
    "provider": "lmstudio-community",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 26071,
    "hf_likes": 6,
    "release_date": "2025-07-14",
    "_discovered": true
  },
  {
    "name": "LiquidAI/LFM2-1.2B",
    "provider": "Liquid AI",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "General purpose text generation",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2.5-1.2B-Base",
    "provider": "Liquid AI",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "General purpose text generation",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2.5-1.2B-Thinking",
    "provider": "Liquid AI",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Advanced reasoning, chain-of-thought",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2.5-1.2B-JP",
    "provider": "Liquid AI",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Japanese language, multilingual chat",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2-1.2B-Tool",
    "provider": "Liquid AI",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Tool calling, function calling",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2-1.2B-RAG",
    "provider": "Liquid AI",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Retrieval-augmented generation",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "LiquidAI/LFM2-1.2B-Extract",
    "provider": "Liquid AI",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 128000,
    "use_case": "Data extraction, structured output",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "lmstudio-community/LFM2.5-1.2B-Thinking-MLX-8bit",
    "provider": "lmstudio-community",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 1.3,
    "recommended_ram_gb": 2.2,
    "min_vram_gb": 1.2,
    "quantization": "mlx-8bit",
    "context_length": 128000,
    "use_case": "Advanced reasoning, chain-of-thought",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "lmstudio-community/LFM2.5-1.2B-Thinking-MLX-bf16",
    "provider": "lmstudio-community",
    "parameter_count": "1.2B",
    "parameters_raw": 1170340608,
    "min_ram_gb": 2.6,
    "recommended_ram_gb": 4.4,
    "min_vram_gb": 2.4,
    "quantization": "BF16",
    "context_length": 128000,
    "use_case": "Advanced reasoning, chain-of-thought",
    "pipeline_tag": "text-generation",
    "architecture": "lfm2",
    "hf_downloads": 0,
    "hf_likes": 0,
    "release_date": "2025-11-28"
  },
  {
    "name": "allenai/OLMo-1B-hf",
    "provider": "allenai",
    "parameter_count": "1.2B",
    "parameters_raw": 1176764416,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "olmo",
    "hf_downloads": 23538,
    "hf_likes": 26,
    "release_date": "2024-04-12",
    "_discovered": true
  },
  {
    "name": "Zyphra/Zamba2-1.2B-instruct",
    "provider": "zyphra",
    "parameter_count": "1.2B",
    "parameters_raw": 1215064704,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "zamba2",
    "hf_downloads": 72584,
    "hf_likes": 30,
    "release_date": "2024-09-19",
    "_discovered": true
  },
  {
    "name": "meta-llama/Llama-3.2-1B",
    "provider": "Meta",
    "parameter_count": "1.2B",
    "parameters_raw": 1235814400,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "llama",
    "hf_downloads": 1453836,
    "hf_likes": 2306,
    "release_date": "2024-09-18"
  },
  {
    "name": "hmellor/Ilama-3.2-1B",
    "provider": "hmellor",
    "parameter_count": "1.2B",
    "parameters_raw": 1235814400,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "Q4_K_M",
    "context_length": 131072,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "ilama",
    "hf_downloads": 89998,
    "hf_likes": 0,
    "release_date": "2025-07-22",
    "_discovered": true
  },
  {
    "name": "warshanks/Jan-nano-AWQ",
    "provider": "warshanks",
    "parameter_count": "1.3B",
    "parameters_raw": 1264206840,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.6,
    "quantization": "AWQ-4bit",
    "context_length": 40960,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 99084,
    "hf_likes": 3,
    "release_date": "2025-07-12",
    "_discovered": true,
    "format": "awq"
  },
  {
    "name": "LGAI-EXAONE/EXAONE-4.0-1.2B",
    "provider": "lgai-exaone",
    "parameter_count": "1.3B",
    "parameters_raw": 1279391488,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.7,
    "quantization": "Q4_K_M",
    "context_length": 65536,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "exaone4",
    "hf_downloads": 100975,
    "hf_likes": 172,
    "release_date": "2025-07-11"
  },
  {
    "name": "lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit",
    "provider": "lmstudio-community",
    "parameter_count": "1.3B",
    "parameters_raw": 1280062464,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.7,
    "quantization": "Q4_K_M",
    "context_length": 131072,
    "use_case": "Advanced reasoning, chain-of-thought",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 348365,
    "hf_likes": 7,
    "release_date": "2025-05-29",
    "_discovered": true
  },
  {
    "name": "lmstudio-community/Qwen3-8B-MLX-4bit",
    "provider": "lmstudio-community",
    "parameter_count": "1.3B",
    "parameters_raw": 1280062464,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.7,
    "quantization": "Q4_K_M",
    "context_length": 40960,
    "use_case": "General purpose text generation",
    "capabilities": [
      "tool_use"
    ],
    "pipeline_tag": "text-generation",
    "architecture": "qwen3",
    "hf_downloads": 39201,
    "hf_likes": 2,
    "release_date": "2025-04-28",
    "_discovered": true
  },
  {
    "name": "pfnet/plamo-2-1b",
    "provider": "pfnet",
    "parameter_count": "1.3B",
    "parameters_raw": 1291441920,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.7,
    "quantization": "Q4_K_M",
    "context_length": 10485760,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "plamo2",
    "hf_downloads": 63725,
    "hf_likes": 38,
    "release_date": "2025-02-05",
    "_discovered": true
  },
  {
    "name": "EleutherAI/gpt-neo-1.3B",
    "provider": "eleutherai",
    "parameter_count": "1.4B",
    "parameters_raw": 1365907456,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.7,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "gpt_neo",
    "hf_downloads": 48440,
    "hf_likes": 324,
    "release_date": "2022-03-02",
    "_discovered": true
  },
  {
    "name": "microsoft/phi-1_5",
    "provider": "Microsoft",
    "parameter_count": "1.4B",
    "parameters_raw": 1418270720,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.7,
    "quantization": "Q4_K_M",
    "context_length": 2048,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "phi",
    "hf_downloads": 152337,
    "hf_likes": 1355,
    "release_date": "2023-09-10",
    "_discovered": true
  },
  {
    "name": "starvector/starvector-1b-im2svg",
    "provider": "starvector",
    "parameter_count": "1.4B",
    "parameters_raw": 1434095620,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.7,
    "quantization": "Q4_K_M",
    "context_length": 8192,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "starvector",
    "hf_downloads": 38196,
    "hf_likes": 184,
    "release_date": "2025-01-11",
    "_discovered": true
  },
  {
    "name": "allenai/OLMo-2-0425-1B",
    "provider": "allenai",
    "parameter_count": "1.5B",
    "parameters_raw": 1484916736,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.8,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "General purpose text generation",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "olmo2",
    "hf_downloads": 533223,
    "hf_likes": 70,
    "release_date": "2025-04-17",
    "_discovered": true
  },
  {
    "name": "allenai/OLMo-2-0425-1B-Instruct",
    "provider": "allenai",
    "parameter_count": "1.5B",
    "parameters_raw": 1484916736,
    "min_ram_gb": 1.0,
    "recommended_ram_gb": 2.0,
    "min_vram_gb": 0.8,
    "quantization": "Q4_K_M",
    "context_length": 4096,
    "use_case": "Instruction following, chat",
    "capabilities": [],
    "pipeline_tag": "text-generation",
    "architecture": "olmo2",
    "hf_downloads": 38389,
    "hf_likes": 56,
    "release_date": "2025-04-29",
    "_discovered": true,
    "gguf_sources":
SYMBOL INDEX (895 symbols across 23 files)

FILE: llmfit-core/src/fit.rs
  type InferenceRuntime (line 7) | pub enum InferenceRuntime {
    method label (line 14) | pub fn label(&self) -> &'static str {
  type SortColumn (line 25) | pub enum SortColumn {
    method label (line 36) | pub fn label(&self) -> &str {
    method next (line 48) | pub fn next(&self) -> Self {
  type FitLevel (line 64) | pub enum FitLevel {
  type RunMode (line 74) | pub enum RunMode {
  type ScoreComponents (line 83) | pub struct ScoreComponents {
  type ModelFit (line 95) | pub struct ModelFit {
    method analyze (line 114) | pub fn analyze(model: &LlmModel, system: &SystemSpecs) -> Self {
    method analyze_with_context_limit (line 118) | pub fn analyze_with_context_limit(
    method analyze_with_forced_runtime (line 131) | pub fn analyze_with_forced_runtime(
    method analyze_inner (line 140) | fn analyze_inner(
    method fit_emoji (line 369) | pub fn fit_emoji(&self) -> &str {
    method fit_text (line 378) | pub fn fit_text(&self) -> &str {
    method runtime_text (line 387) | pub fn runtime_text(&self) -> &str {
    method run_mode_text (line 391) | pub fn run_mode_text(&self) -> &str {
  function score_fit (line 405) | fn score_fit(
  function cpu_path (line 450) | fn cpu_path(
  function moe_offload_path (line 478) | fn moe_offload_path(
  function moe_memory_for_quant (line 557) | fn moe_memory_for_quant(model: &LlmModel, quant: &str) -> Option<(f64, f...
  function best_quant_for_runtime_budget (line 573) | fn best_quant_for_runtime_budget(
  function backend_compatible (line 599) | pub fn backend_compatible(model: &LlmModel, system: &SystemSpecs) -> bool {
  function rank_models_by_fit (line 622) | pub fn rank_models_by_fit(models: Vec<ModelFit>) -> Vec<ModelFit> {
  function rank_models_by_fit_opts (line 626) | pub fn rank_models_by_fit_opts(models: Vec<ModelFit>, installed_first: b...
  function rank_models_by_fit_opts_col (line 630) | pub fn rank_models_by_fit_opts_col(
  function estimate_tps (line 750) | fn estimate_tps(
  function compute_scores (line 864) | fn compute_scores(
  function quality_score (line 881) | fn quality_score(model: &LlmModel, quant: &str, use_case: UseCase) -> f64 {
  function speed_score (line 958) | fn speed_score(tps: f64, use_case: UseCase) -> f64 {
  function fit_score (line 968) | fn fit_score(required: f64, available: f64) -> f64 {
  function context_score (line 989) | fn context_score(model: &LlmModel, use_case: UseCase) -> f64 {
  function weighted_score (line 1007) | fn weighted_score(sc: ScoreComponents, use_case: UseCase) -> f64 {
  function test_model (line 1029) | fn test_model(param_count: &str, min_ram: f64, min_vram: Option<f64>) ->...
  function test_system (line 1052) | fn test_system(ram: f64, has_gpu: bool, vram: Option<f64>) -> SystemSpecs {
  function test_score_fit_too_tight (line 1082) | fn test_score_fit_too_tight() {
  function test_score_fit_gpu_perfect (line 1089) | fn test_score_fit_gpu_perfect() {
  function test_score_fit_gpu_good (line 1096) | fn test_score_fit_gpu_good() {
  function test_score_fit_gpu_marginal (line 1103) | fn test_score_fit_gpu_marginal() {
  function test_score_fit_cpu_caps_at_marginal (line 1110) | fn test_score_fit_cpu_caps_at_marginal() {
  function test_score_fit_cpu_offload_caps_at_good (line 1117) | fn test_score_fit_cpu_offload_caps_at_good() {
  function test_score_fit_moe_offload (line 1124) | fn test_score_fit_moe_offload() {
  function test_model_fit_gpu_path (line 1139) | fn test_model_fit_gpu_path() {
  function test_model_fit_cpu_only (line 1152) | fn test_model_fit_cpu_only() {
  function test_model_fit_cpu_offload (line 1165) | fn test_model_fit_cpu_offload() {
  function test_model_fit_unified_memory (line 1181) | fn test_model_fit_unified_memory() {
  function test_model_fit_too_tight (line 1194) | fn test_model_fit_too_tight() {
  function test_moe_offload_tries_lower_quantization (line 1205) | fn test_moe_offload_tries_lower_quantization() {
  function test_dense_model_uses_quant_in_path_selection (line 1237) | fn test_dense_model_uses_quant_in_path_selection() {
  function test_model_fit_utilization (line 1270) | fn test_model_fit_utilization() {
  function test_rank_models_by_fit (line 1290) | fn test_rank_models_by_fit() {
  function test_rank_models_separates_runnable_from_too_tight (line 1319) | fn test_rank_models_separates_runnable_from_too_tight() {
  function test_fit_score_sweet_spot (line 1348) | fn test_fit_score_sweet_spot() {
  function test_fit_score_under_utilized (line 1358) | fn test_fit_score_under_utilized() {
  function test_fit_score_tight (line 1366) | fn test_fit_score_tight() {
  function test_fit_score_exceeds_available (line 1374) | fn test_fit_score_exceeds_available() {
  function test_speed_score_normalized (line 1381) | fn test_speed_score_normalized() {
  function test_context_score (line 1396) | fn test_context_score() {
  function test_quality_score_by_params (line 1409) | fn test_quality_score_by_params() {
  function test_quality_score_quant_penalty (line 1424) | fn test_quality_score_quant_penalty() {
  function test_weighted_score_composition (line 1437) | fn test_weighted_score_composition() {
  function test_estimate_tps_mlx_faster_than_llamacpp (line 1460) | fn test_estimate_tps_mlx_faster_than_llamacpp() {
  function test_analyze_selects_mlx_on_apple_silicon (line 1488) | fn test_analyze_selects_mlx_on_apple_silicon() {
  function test_analyze_defaults_llamacpp_on_cuda (line 1501) | fn test_analyze_defaults_llamacpp_on_cuda() {
  function test_analyze_with_context_limit_reduces_memory_estimate (line 1510) | fn test_analyze_with_context_limit_reduces_memory_estimate() {
  function test_estimate_tps_run_mode_penalties (line 1528) | fn test_estimate_tps_run_mode_penalties() {
  function test_estimate_tps_moe_uses_active_parameters (line 1572) | fn test_estimate_tps_moe_uses_active_parameters() {
  function test_estimate_tps_moe_without_active_parameters_falls_back_to_total (line 1599) | fn test_estimate_tps_moe_without_active_parameters_falls_back_to_total() {
  function test_sort_by_tps (line 1630) | fn test_sort_by_tps() {
  function test_sort_by_release_date (line 1651) | fn test_sort_by_release_date() {
  function test_system_with_gpu (line 1685) | fn test_system_with_gpu(ram: f64, vram: f64, gpu_name: &str) -> SystemSp...
  function test_bandwidth_estimation_rtx4090_faster_than_rtx3060 (line 1703) | fn test_bandwidth_estimation_rtx4090_faster_than_rtx3060() {
  function test_bandwidth_estimation_rtx4090_27b_q4_realistic (line 1731) | fn test_bandwidth_estimation_rtx4090_27b_q4_realistic() {
  function test_bandwidth_estimation_t4_7b_f16_realistic (line 1750) | fn test_bandwidth_estimation_t4_7b_f16_realistic() {
  function test_bandwidth_estimation_unknown_gpu_uses_fallback (line 1769) | fn test_bandwidth_estimation_unknown_gpu_uses_fallback() {
  function test_bandwidth_estimation_cpu_only_ignores_bandwidth (line 1788) | fn test_bandwidth_estimation_cpu_only_ignores_bandwidth() {
  function test_prequantized_requires_cuda_or_rocm (line 1817) | fn test_prequantized_requires_cuda_or_rocm() {
  function test_awq_incompatible_on_volta_v100 (line 1847) | fn test_awq_incompatible_on_volta_v100() {
  function test_gptq_incompatible_on_volta_v100 (line 1858) | fn test_gptq_incompatible_on_volta_v100() {
  function test_awq_compatible_on_turing_and_newer (line 1868) | fn test_awq_compatible_on_turing_and_newer() {
  function test_awq_on_rocm_always_compatible (line 1891) | fn test_awq_on_rocm_always_compatible() {
  function test_awq_on_pascal_incompatible (line 1903) | fn test_awq_on_pascal_incompatible() {
  function test_gguf_on_volta_still_compatible (line 1914) | fn test_gguf_on_volta_still_compatible() {

FILE: llmfit-core/src/hardware.rs
  type GpuBackend (line 6) | pub enum GpuBackend {
    method label (line 18) | pub fn label(&self) -> &'static str {
  type GpuInfo (line 34) | pub struct GpuInfo {
  type SystemSpecs (line 43) | pub struct SystemSpecs {
    method detect (line 63) | pub fn detect() -> Self {
    method detect_all_gpus (line 121) | fn detect_all_gpus(total_ram_gb: f64, cpu_name: &str) -> Vec<GpuInfo> {
    method detect_nvidia_gpus (line 258) | fn detect_nvidia_gpus() -> Vec<GpuInfo> {
    method try_nvidia_smi_with_addressing_mode (line 287) | fn try_nvidia_smi_with_addressing_mode() -> Option<Vec<GpuInfo>> {
    method parse_nvidia_smi_extended (line 306) | fn parse_nvidia_smi_extended(text: &str) -> Vec<GpuInfo> {
    method parse_nvidia_smi_list (line 374) | fn parse_nvidia_smi_list(text: &str) -> Vec<GpuInfo> {
    method detect_nvidia_gpu_sysfs_info (line 431) | fn detect_nvidia_gpu_sysfs_info() -> Option<GpuInfo> {
    method detect_amd_gpu_rocm_info (line 513) | fn detect_amd_gpu_rocm_info() -> Option<GpuInfo> {
    method detect_amd_gpu_sysfs_info (line 602) | fn detect_amd_gpu_sysfs_info() -> Option<GpuInfo> {
    method get_amd_gpu_name_lspci (line 671) | fn get_amd_gpu_name_lspci(slot_hints: &[String]) -> Option<String> {
    method get_nvidia_gpu_name_lspci (line 703) | fn get_nvidia_gpu_name_lspci(slot_hints: &[String]) -> Option<String> {
    method lspci_output (line 735) | fn lspci_output() -> Option<String> {
    method extract_model_from_lspci_line (line 757) | fn extract_model_from_lspci_line(line: &str) -> Option<String> {
    method detect_gpu_windows_info (line 793) | fn detect_gpu_windows_info() -> Vec<GpuInfo> {
    method detect_gpu_windows_wmic_list (line 817) | fn detect_gpu_windows_wmic_list() -> Vec<GpuInfo> {
    method parse_windows_gpu_list (line 868) | fn parse_windows_gpu_list(text: &str) -> Vec<GpuInfo> {
    method resolve_wmi_vram (line 906) | fn resolve_wmi_vram(raw_bytes: u64, name: &str) -> Option<f64> {
    method infer_gpu_backend (line 918) | fn infer_gpu_backend(name: &str) -> GpuBackend {
    method detect_intel_gpu (line 942) | fn detect_intel_gpu() -> Option<f64> {
    method detect_apple_gpu (line 1003) | fn detect_apple_gpu(total_ram_gb: f64) -> Option<f64> {
    method has_command (line 1032) | fn has_command(command: &str) -> bool {
    method detect_vulkan_gpu_info (line 1057) | fn detect_vulkan_gpu_info() -> Vec<GpuInfo> {
    method is_same_gpu_name (line 1095) | fn is_same_gpu_name(existing_name: &str, candidate_name: &str) -> bool {
    method normalize_gpu_name_for_dedupe (line 1100) | fn normalize_gpu_name_for_dedupe(name: &str) -> String {
    method parse_vulkan_device_names (line 1117) | fn parse_vulkan_device_names(text: &str) -> Vec<String> {
    method is_software_vulkan_device (line 1151) | fn is_software_vulkan_device(name: &str) -> bool {
    method detect_ascend_npus (line 1160) | fn detect_ascend_npus() -> Vec<GpuInfo> {
    method available_ram_fallback (line 1221) | fn available_ram_fallback(sys: &System, total_bytes: u64, total_gb: f6...
    method available_ram_from_vm_stat (line 1239) | fn available_ram_from_vm_stat() -> Option<f64> {
    method parse_vm_stat_line (line 1283) | fn parse_vm_stat_line(line: &str, key: &str) -> Option<u64> {
    method detect_cpu_name (line 1295) | fn detect_cpu_name(sys: &System) -> String {
    method read_cpu_name_from_proc_cpuinfo (line 1316) | fn read_cpu_name_from_proc_cpuinfo() -> Option<String> {
    method parse_cpu_name_from_cpuinfo (line 1329) | fn parse_cpu_name_from_cpuinfo(text: &str) -> Option<String> {
    method read_android_soc_name (line 1347) | fn read_android_soc_name() -> Option<String> {
    method with_gpu_memory_override (line 1376) | pub fn with_gpu_memory_override(mut self, vram_gb: f64) -> Self {
    method display (line 1411) | pub fn display(&self) {
  function parse_memory_size (line 1482) | pub fn parse_memory_size(s: &str) -> Option<f64> {
  function is_running_in_wsl (line 1507) | pub fn is_running_in_wsl() -> bool {
  function detect_running_in_wsl (line 1512) | fn detect_running_in_wsl() -> bool {
  function is_amd_unified_memory_apu (line 1537) | fn is_amd_unified_memory_apu(cpu_name: &str) -> bool {
  function read_proc_meminfo_total_gb (line 1553) | fn read_proc_meminfo_total_gb() -> Option<f64> {
  function gpu_memory_bandwidth_gbps (line 1579) | pub fn gpu_memory_bandwidth_gbps(name: &str) -> Option<f64> {
  function gpu_compute_capability (line 1878) | pub fn gpu_compute_capability(name: &str) -> Option<(u8, u8)> {
  function quant_min_compute_capability (line 1965) | pub fn quant_min_compute_capability(quantization: &str) -> Option<(u8, u...
  function estimate_vram_from_name (line 1977) | fn estimate_vram_from_name(name: &str) -> f64 {
  function test_parse_nvidia_smi_does_not_sum_multi_gpu_vram (line 2208) | fn test_parse_nvidia_smi_does_not_sum_multi_gpu_vram() {
  function test_parse_nvidia_smi_keeps_distinct_models (line 2222) | fn test_parse_nvidia_smi_keeps_distinct_models() {
  function test_parse_nvidia_smi_gb10_gets_vram_estimate (line 2232) | fn test_parse_nvidia_smi_gb10_gets_vram_estimate() {
  function test_estimate_vram_gb10 (line 2245) | fn test_estimate_vram_gb10() {
  function test_estimate_vram_rtx_professional (line 2251) | fn test_estimate_vram_rtx_professional() {
  function test_parse_extended_discrete_gpu_not_unified (line 2261) | fn test_parse_extended_discrete_gpu_not_unified() {
  function test_parse_extended_tegra_unified_memory (line 2277) | fn test_parse_extended_tegra_unified_memory() {
  function test_parse_extended_multi_gpu_discrete (line 2292) | fn test_parse_extended_multi_gpu_discrete() {
  function test_gpu_bandwidth_known_gpus (line 2303) | fn test_gpu_bandwidth_known_gpus() {
  function test_gpu_bandwidth_apple_silicon (line 2325) | fn test_gpu_bandwidth_apple_silicon() {
  function test_gpu_bandwidth_unknown_returns_none (line 2337) | fn test_gpu_bandwidth_unknown_returns_none() {
  function test_gpu_bandwidth_amd (line 2343) | fn test_gpu_bandwidth_amd() {
  function test_parse_cpu_name_from_cpuinfo_prefers_model_name (line 2355) | fn test_parse_cpu_name_from_cpuinfo_prefers_model_name() {
  function test_parse_cpu_name_from_cpuinfo_uses_hardware_fallback (line 2368) | fn test_parse_cpu_name_from_cpuinfo_uses_hardware_fallback() {
  function test_parse_vulkan_device_names_from_summary_output (line 2380) | fn test_parse_vulkan_device_names_from_summary_output() {
  function test_parse_vulkan_device_names_from_gpu_id_lines (line 2398) | fn test_parse_vulkan_device_names_from_gpu_id_lines() {
  function test_is_software_vulkan_device (line 2414) | fn test_is_software_vulkan_device() {
  function test_is_same_gpu_name_uses_normalized_exact_match (line 2423) | fn test_is_same_gpu_name_uses_normalized_exact_match() {
  function test_normalize_gpu_name_for_dedupe (line 2432) | fn test_normalize_gpu_name_for_dedupe() {
  function test_gpu_backend_labels (line 2442) | fn test_gpu_backend_labels() {
  function test_parse_memory_size_gb (line 2456) | fn test_parse_memory_size_gb() {
  function test_parse_memory_size_mb (line 2465) | fn test_parse_memory_size_mb() {
  function test_parse_memory_size_tb (line 2473) | fn test_parse_memory_size_tb() {
  function test_parse_memory_size_bare_number (line 2481) | fn test_parse_memory_size_bare_number() {
  function test_parse_memory_size_whitespace (line 2486) | fn test_parse_memory_size_whitespace() {
  function test_parse_memory_size_empty (line 2491) | fn test_parse_memory_size_empty() {
  function test_parse_memory_size_invalid_suffix (line 2497) | fn test_parse_memory_size_invalid_suffix() {
  function test_parse_memory_size_fractional (line 2503) | fn test_parse_memory_size_fractional() {
  function make_specs_no_gpu (line 2509) | fn make_specs_no_gpu() -> SystemSpecs {
  function make_specs_with_gpu (line 2526) | fn make_specs_with_gpu() -> SystemSpecs {
  function test_gpu_override_creates_synthetic_gpu_when_none (line 2550) | fn test_gpu_override_creates_synthetic_gpu_when_none() {
  function test_gpu_override_updates_existing_gpu (line 2561) | fn test_gpu_override_updates_existing_gpu() {
  function test_gpu_override_multi_gpu_scales_total (line 2570) | fn test_gpu_override_multi_gpu_scales_total() {
  function test_amd_unified_memory_apu_detection (line 2581) | fn test_amd_unified_memory_apu_detection() {
  function test_bandwidth_rtx_20_series (line 2596) | fn test_bandwidth_rtx_20_series() {
  function test_bandwidth_gtx_16_series (line 2610) | fn test_bandwidth_gtx_16_series() {
  function test_bandwidth_rtx_50_series (line 2624) | fn test_bandwidth_rtx_50_series() {
  function test_bandwidth_amd_rx_6000 (line 2654) | fn test_bandwidth_amd_rx_6000() {
  function test_bandwidth_nvidia_professional (line 2672) | fn test_bandwidth_nvidia_professional() {
  function test_bandwidth_apple_silicon_all (line 2688) | fn test_bandwidth_apple_silicon_all() {
  function test_bandwidth_amd_cdna (line 2720) | fn test_bandwidth_amd_cdna() {
  function test_bandwidth_amd_rdna4 (line 2738) | fn test_bandwidth_amd_rdna4() {
  function test_compute_capability_nvidia_generations (line 2752) | fn test_compute_capability_nvidia_generations() {
  function test_compute_capability_unknown_returns_none (line 2795) | fn test_compute_capability_unknown_returns_none() {
  function test_quant_min_compute_capability (line 2805) | fn test_quant_min_compute_capability() {

FILE: llmfit-core/src/models.rs
  constant QUANT_HIERARCHY (line 7) | pub const QUANT_HIERARCHY: &[&str] = &["Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M...
  constant MLX_QUANT_HIERARCHY (line 10) | pub const MLX_QUANT_HIERARCHY: &[&str] = &["mlx-8bit", "mlx-4bit"];
  function quant_bpp (line 13) | pub fn quant_bpp(quant: &str) -> f64 {
  function quant_speed_multiplier (line 34) | pub fn quant_speed_multiplier(quant: &str) -> f64 {
  function quant_bytes_per_param (line 53) | pub fn quant_bytes_per_param(quant: &str) -> f64 {
  function quant_quality_penalty (line 71) | pub fn quant_quality_penalty(quant: &str) -> f64 {
  type Capability (line 93) | pub enum Capability {
    method label (line 99) | pub fn label(&self) -> &'static str {
    method all (line 106) | pub fn all() -> &'static [Capability] {
    method infer (line 111) | pub fn infer(model: &LlmModel) -> Vec<Capability> {
  type ModelFormat (line 152) | pub enum ModelFormat {
    method is_prequantized (line 164) | pub fn is_prequantized(&self) -> bool {
  type UseCase (line 171) | pub enum UseCase {
    method label (line 181) | pub fn label(&self) -> &'static str {
    method from_model (line 193) | pub fn from_model(model: &LlmModel) -> Self {
  type LlmModel (line 217) | pub struct LlmModel {
    method is_mlx_model (line 263) | pub fn is_mlx_model(&self) -> bool {
    method is_prequantized (line 270) | pub fn is_prequantized(&self) -> bool {
    method quant_bpp (line 275) | fn quant_bpp(&self) -> f64 {
    method params_b (line 280) | pub fn params_b(&self) -> f64 {
    method estimate_memory_gb (line 298) | pub fn estimate_memory_gb(&self, quant: &str, ctx: u32) -> f64 {
    method best_quant_for_budget (line 311) | pub fn best_quant_for_budget(&self, budget_gb: f64, ctx: u32) -> Optio...
    method best_quant_for_budget_with (line 316) | pub fn best_quant_for_budget_with(
    method moe_active_vram_gb (line 344) | pub fn moe_active_vram_gb(&self) -> Option<f64> {
    method is_mlx_only (line 356) | pub fn is_mlx_only(&self) -> bool {
    method moe_offloaded_ram_gb (line 362) | pub fn moe_offloaded_ram_gb(&self) -> Option<f64> {
  type GgufSource (line 252) | pub struct GgufSource {
  type HfModelEntry (line 380) | struct HfModelEntry {
  function parse_parameter_count_hint (line 414) | fn parse_parameter_count_hint(parameter_count: &str) -> Option<u64> {
  function effective_parameters_raw (line 430) | fn effective_parameters_raw(entry: &HfModelEntry) -> Option<u64> {
  function option_max (line 436) | fn option_max<T: PartialOrd + Copy>(left: Option<T>, right: Option<T>) -...
  function hf_entry_rank (line 445) | fn hf_entry_rank(entry: &HfModelEntry) -> (u64, u64, usize, usize, u8, u...
  function merge_exact_name_entries (line 458) | fn merge_exact_name_entries(
  function dedupe_hf_entries (line 517) | fn dedupe_hf_entries(entries: Vec<HfModelEntry>) -> Vec<HfModelEntry> {
  constant HF_MODELS_JSON (line 534) | const HF_MODELS_JSON: &str = include_str!("../data/hf_models.json");
  type ModelDatabase (line 536) | pub struct ModelDatabase {
    method new (line 547) | pub fn new() -> Self {
    method get_all_models (line 584) | pub fn get_all_models(&self) -> &Vec<LlmModel> {
    method find_model (line 588) | pub fn find_model(&self, query: &str) -> Vec<&LlmModel> {
    method models_fitting_system (line 600) | pub fn models_fitting_system(
  method default (line 541) | fn default() -> Self {
  function test_mlx_quant_bpp_values (line 642) | fn test_mlx_quant_bpp_values() {
  function test_best_quant_with_mlx_hierarchy (line 652) | fn test_best_quant_with_mlx_hierarchy() {
  function test_quant_bpp (line 688) | fn test_quant_bpp() {
  function test_quant_speed_multiplier (line 699) | fn test_quant_speed_multiplier() {
  function test_quant_quality_penalty (line 709) | fn test_quant_quality_penalty() {
  function test_params_b_from_raw (line 723) | fn test_params_b_from_raw() {
  function test_params_b_from_string (line 748) | fn test_params_b_from_string() {
  function test_params_b_from_millions (line 773) | fn test_params_b_from_millions() {
  function test_estimate_memory_gb (line 798) | fn test_estimate_memory_gb() {
  function test_best_quant_for_budget (line 831) | fn test_best_quant_for_budget() {
  function test_moe_active_vram_gb (line 869) | fn test_moe_active_vram_gb() {
  function test_moe_offloaded_ram_gb (line 923) | fn test_moe_offloaded_ram_gb() {
  function test_use_case_from_model_coding (line 980) | fn test_use_case_from_model_coding() {
  function test_use_case_from_model_embedding (line 1005) | fn test_use_case_from_model_embedding() {
  function test_use_case_from_model_reasoning (line 1030) | fn test_use_case_from_model_reasoning() {
  function test_model_database_new (line 1059) | fn test_model_database_new() {
  function test_dedupe_hf_entries_merges_duplicate_metadata (line 1067) | fn test_dedupe_hf_entries_merges_duplicate_metadata() {
  function test_model_database_deduplicates_exact_name_collisions (line 1151) | fn test_model_database_deduplicates_exact_name_collisions() {
  function test_find_model (line 1185) | fn test_find_model() {
  function test_models_fitting_system (line 1203) | fn test_models_fitting_system() {
  function test_capability_infer_vision (line 1225) | fn test_capability_infer_vision() {
  function test_capability_infer_tool_use (line 1253) | fn test_capability_infer_tool_use() {
  function test_capability_infer_none (line 1280) | fn test_capability_infer_none() {
  function test_capability_preserves_explicit (line 1306) | fn test_capability_preserves_explicit() {
  function test_awq_gptq_quant_values (line 1333) | fn test_awq_gptq_quant_values() {
  function test_model_format_prequantized (line 1351) | fn test_model_format_prequantized() {
  function test_gguf_source_deserialization (line 1364) | fn test_gguf_source_deserialization() {
  function test_gguf_sources_default_to_empty (line 1372) | fn test_gguf_sources_default_to_empty() {
  function test_catalog_popular_models_have_gguf_sources (line 1389) | fn test_catalog_popular_models_have_gguf_sources() {
  function test_catalog_gguf_sources_have_valid_repos (line 1412) | fn test_catalog_gguf_sources_have_valid_repos() {
  function test_catalog_has_significant_gguf_coverage (line 1438) | fn test_catalog_has_significant_gguf_coverage() {

FILE: llmfit-core/src/plan.rs
  constant SUPPORTED_QUANTS (line 5) | const SUPPORTED_QUANTS: &[&str] = &[
  type PlanRequest (line 25) | pub struct PlanRequest {
  type HardwareEstimate (line 32) | pub struct HardwareEstimate {
  type PlanRunPath (line 40) | pub enum PlanRunPath {
    method label (line 47) | pub fn label(&self) -> &'static str {
    method run_mode (line 55) | fn run_mode(self) -> RunMode {
  type PathEstimate (line 65) | pub struct PathEstimate {
  type UpgradeDelta (line 76) | pub struct UpgradeDelta {
  type PlanCurrentStatus (line 86) | pub struct PlanCurrentStatus {
  type PlanEstimate (line 93) | pub struct PlanEstimate {
  function normalize_quant (line 107) | pub fn normalize_quant(quant: &str) -> Option<String> {
  function estimate_tps (line 143) | fn estimate_tps(
  function estimate_tps_with_gpu (line 156) | fn estimate_tps_with_gpu(
  function fit_level_for (line 224) | fn fit_level_for(
  function minimum_cores_for_target (line 255) | fn minimum_cores_for_target(
  function default_gpu_backend (line 276) | fn default_gpu_backend(system: &SystemSpecs) -> GpuBackend {
  function evaluate_current (line 284) | fn evaluate_current(
  function build_path_estimate (line 389) | fn build_path_estimate(
  function estimate_model_plan (line 522) | pub fn estimate_model_plan(
  function resolve_model_selector (line 670) | pub fn resolve_model_selector<'a>(
  function test_model (line 714) | fn test_model() -> LlmModel {
  function test_specs (line 737) | fn test_specs() -> SystemSpecs {
  function test_normalize_quant (line 755) | fn test_normalize_quant() {
  function test_normalize_quant_all_supported (line 762) | fn test_normalize_quant_all_supported() {
  function test_normalize_quant_whitespace_handling (line 777) | fn test_normalize_quant_whitespace_handling() {
  function test_estimate_model_plan (line 784) | fn test_estimate_model_plan() {
  function test_estimate_model_plan_zero_context_errors (line 798) | fn test_estimate_model_plan_zero_context_errors() {
  function test_estimate_model_plan_negative_tps_errors (line 814) | fn test_estimate_model_plan_negative_tps_errors() {
  function test_estimate_model_plan_invalid_quant_errors (line 830) | fn test_estimate_model_plan_invalid_quant_errors() {
  function test_estimate_model_plan_uses_model_quant_when_none (line 842) | fn test_estimate_model_plan_uses_model_quant_when_none() {
  function test_estimate_model_plan_has_three_run_paths (line 853) | fn test_estimate_model_plan_has_three_run_paths() {
  function test_estimate_model_plan_gpu_path_feasible (line 867) | fn test_estimate_model_plan_gpu_path_feasible() {
  function test_fit_level_for_gpu_perfect (line 884) | fn test_fit_level_for_gpu_perfect() {
  function test_fit_level_for_gpu_good (line 890) | fn test_fit_level_for_gpu_good() {
  function test_fit_level_for_gpu_marginal (line 897) | fn test_fit_level_for_gpu_marginal() {
  function test_fit_level_for_too_tight (line 904) | fn test_fit_level_for_too_tight() {
  function test_fit_level_for_cpu_offload_caps_at_good (line 910) | fn test_fit_level_for_cpu_offload_caps_at_good() {
  function test_fit_level_for_cpu_only_always_marginal (line 916) | fn test_fit_level_for_cpu_only_always_marginal() {
  function test_plan_run_path_labels (line 924) | fn test_plan_run_path_labels() {
  function test_plan_run_path_to_run_mode (line 931) | fn test_plan_run_path_to_run_mode() {
  function test_estimate_tps_gpu_faster_than_cpu (line 940) | fn test_estimate_tps_gpu_faster_than_cpu() {
  function test_estimate_tps_cpu_offload_slower_than_gpu (line 954) | fn test_estimate_tps_cpu_offload_slower_than_gpu() {
  function test_estimate_tps_more_cores_helps (line 968) | fn test_estimate_tps_more_cores_helps() {
  function test_estimate_tps_with_known_gpu_uses_bandwidth (line 976) | fn test_estimate_tps_with_known_gpu_uses_bandwidth() {
  function test_minimum_cores_no_target_returns_default (line 1001) | fn test_minimum_cores_no_target_returns_default() {
  function test_minimum_cores_with_reachable_target (line 1009) | fn test_minimum_cores_with_reachable_target() {
  function test_minimum_cores_unreachable_target_returns_none (line 1023) | fn test_minimum_cores_unreachable_target_returns_none() {
  function test_default_gpu_backend_uses_system_when_gpu (line 1038) | fn test_default_gpu_backend_uses_system_when_gpu() {
  function test_default_gpu_backend_falls_back_to_cuda (line 1044) | fn test_default_gpu_backend_falls_back_to_cuda() {
  function test_evaluate_current_with_gpu (line 1053) | fn test_evaluate_current_with_gpu() {
  function test_evaluate_current_no_gpu_uses_cpu (line 1063) | fn test_evaluate_current_no_gpu_uses_cpu() {
  function test_evaluate_current_too_tight_when_no_memory (line 1075) | fn test_evaluate_current_too_tight_when_no_memory() {
  function test_build_path_estimate_gpu (line 1089) | fn test_build_path_estimate_gpu() {
  function test_build_path_estimate_cpu_offload_on_unified_is_infeasible (line 1100) | fn test_build_path_estimate_cpu_offload_on_unified_is_infeasible() {
  function test_build_path_estimate_cpu_only_no_vram (line 1117) | fn test_build_path_estimate_cpu_only_no_vram() {
  function test_resolve_model_selector (line 1129) | fn test_resolve_model_selector() {
  function test_resolve_model_selector_empty_errors (line 1136) | fn test_resolve_model_selector_empty_errors() {
  function test_resolve_model_selector_not_found (line 1144) | fn test_resolve_model_selector_not_found() {
  function test_resolve_model_selector_ambiguous (line 1152) | fn test_resolve_model_selector_ambiguous() {
  function test_resolve_model_selector_partial_match (line 1164) | fn test_resolve_model_selector_partial_match() {
  function test_plan_has_upgrade_deltas (line 1173) | fn test_plan_has_upgrade_deltas() {
  function test_normalize_awq_gptq_quants (line 1188) | fn test_normalize_awq_gptq_quants() {

FILE: llmfit-core/src/providers.rs
  type ModelProvider (line 14) | pub trait ModelProvider {
    method name (line 16) | fn name(&self) -> &str;
    method is_available (line 19) | fn is_available(&self) -> bool;
    method installed_models (line 23) | fn installed_models(&self) -> HashSet<String>;
    method start_pull (line 27) | fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String>;
    method name (line 195) | fn name(&self) -> &str {
    method is_available (line 199) | fn is_available(&self) -> bool {
    method installed_models (line 208) | fn installed_models(&self) -> HashSet<String> {
    method start_pull (line 213) | fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {
    method name (line 414) | fn name(&self) -> &str {
    method is_available (line 418) | fn is_available(&self) -> bool {
    method installed_models (line 437) | fn installed_models(&self) -> HashSet<String> {
    method start_pull (line 461) | fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {
    method name (line 956) | fn name(&self) -> &str {
    method is_available (line 960) | fn is_available(&self) -> bool {
    method installed_models (line 964) | fn installed_models(&self) -> HashSet<String> {
    method start_pull (line 969) | fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {
    method name (line 1133) | fn name(&self) -> &str {
    method is_available (line 1137) | fn is_available(&self) -> bool {
    method installed_models (line 1146) | fn installed_models(&self) -> HashSet<String> {
    method start_pull (line 1151) | fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {
    method name (line 1337) | fn name(&self) -> &str {
    method is_available (line 1341) | fn is_available(&self) -> bool {
    method installed_models (line 1350) | fn installed_models(&self) -> HashSet<String> {
    method start_pull (line 1355) | fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {
  type PullHandle (line 32) | pub struct PullHandle {
  type PullEvent (line 38) | pub enum PullEvent {
  type OllamaProvider (line 51) | pub struct OllamaProvider {
    method new (line 93) | pub fn new() -> Self {
    method api_url (line 98) | fn api_url(&self, path: &str) -> String {
    method detect_with_installed (line 104) | pub fn detect_with_installed(&self) -> (bool, HashSet<String>, usize) {
    method installed_models_counted (line 132) | pub fn installed_models_counted(&self) -> (HashSet<String>, usize) {
    method has_remote_tag (line 158) | pub fn has_remote_tag(&self, model_tag: &str) -> bool {
  function normalize_ollama_host (line 55) | fn normalize_ollama_host(raw: &str) -> Option<String> {
  method default (line 74) | fn default() -> Self {
  type TagsResponse (line 172) | struct TagsResponse {
  type OllamaModel (line 177) | struct OllamaModel {
  type PullStreamLine (line 183) | struct PullStreamLine {
  type MlxProvider (line 282) | pub struct MlxProvider {
    method new (line 307) | pub fn new() -> Self {
    method detect_with_installed (line 313) | pub fn detect_with_installed(&self) -> (bool, HashSet<String>) {
  method default (line 287) | fn default() -> Self {
  function check_mlx_python (line 345) | fn check_mlx_python() -> bool {
  function is_likely_mlx_repo (line 357) | fn is_likely_mlx_repo(owner: &str, repo: &str) -> bool {
  function scan_hf_cache_for_mlx (line 368) | fn scan_hf_cache_for_mlx() -> HashSet<String> {
  function dirs_hf_cache (line 400) | fn dirs_hf_cache() -> std::path::PathBuf {
  type LlamaCppProvider (line 524) | pub struct LlamaCppProvider {
    method new (line 547) | pub fn new() -> Self {
    method installed_models_counted (line 554) | pub fn installed_models_counted(&self) -> (HashSet<String>, usize) {
    method models_dir (line 571) | pub fn models_dir(&self) -> &std::path::Path {
    method llama_cli_path (line 576) | pub fn llama_cli_path(&self) -> Option<&str> {
    method llama_server_path (line 581) | pub fn llama_server_path(&self) -> Option<&str> {
    method list_gguf_files (line 586) | pub fn list_gguf_files(&self) -> Vec<PathBuf> {
    method search_hf_gguf (line 601) | pub fn search_hf_gguf(query: &str) -> Vec<(String, String)> {
    method list_repo_gguf_files (line 633) | pub fn list_repo_gguf_files(repo_id: &str) -> Vec<(String, u64)> {
    method select_best_gguf (line 652) | pub fn select_best_gguf(files: &[(String, u64)], budget_gb: f64) -> Op...
    method download_gguf (line 687) | pub fn download_gguf(&self, repo_id: &str, filename: &str) -> Result<P...
  method default (line 534) | fn default() -> Self {
  function validate_gguf_filename (line 844) | fn validate_gguf_filename(filename: &str) -> Result<(), String> {
  function is_split_file (line 882) | fn is_split_file(filename: &str) -> bool {
  function parse_repo_gguf_entries (line 887) | fn parse_repo_gguf_entries(entries: Vec<serde_json::Value>) -> Vec<(Stri...
  function llamacpp_models_dir (line 904) | fn llamacpp_models_dir() -> PathBuf {
  function find_binary (line 918) | fn find_binary(name: &str) -> Option<String> {
  function encode (line 938) | pub fn encode(s: &str) -> String {
  type DockerModelRunnerProvider (line 1030) | pub struct DockerModelRunnerProvider {
    method new (line 1072) | pub fn new() -> Self {
    method models_url (line 1076) | fn models_url(&self) -> String {
    method detect_with_installed (line 1082) | pub fn detect_with_installed(&self) -> (bool, HashSet<String>, usize) {
    method installed_models_counted (line 1115) | pub fn installed_models_counted(&self) -> (HashSet<String>, usize) {
  function normalize_docker_mr_host (line 1034) | fn normalize_docker_mr_host(raw: &str) -> Option<String> {
  method default (line 1052) | fn default() -> Self {
  type DockerModelList (line 1122) | struct DockerModelList {
  type DockerEngine (line 1127) | struct DockerEngine {
  type LmStudioProvider (line 1200) | pub struct LmStudioProvider {
    method new (line 1242) | pub fn new() -> Self {
    method models_url (line 1246) | fn models_url(&self) -> String {
    method download_url (line 1250) | fn download_url(&self) -> String {
    method download_status_url (line 1257) | fn download_status_url(&self) -> String {
    method detect_with_installed (line 1266) | pub fn detect_with_installed(&self) -> (bool, HashSet<String>, usize) {
    method installed_models_counted (line 1295) | pub fn installed_models_counted(&self) -> (HashSet<String>, usize) {
  function normalize_lmstudio_host (line 1204) | fn normalize_lmstudio_host(raw: &str) -> Option<String> {
  method default (line 1222) | fn default() -> Self {
  type LmStudioModelList (line 1302) | struct LmStudioModelList {
  type LmStudioModel (line 1307) | struct LmStudioModel {
  type LmStudioDownloadResponse (line 1313) | struct LmStudioDownloadResponse {
  type LmStudioDownloadStatus (line 1325) | struct LmStudioDownloadStatus {
  function hf_name_to_lmstudio_candidates (line 1497) | pub fn hf_name_to_lmstudio_candidates(hf_name: &str) -> Vec<String> {
  function is_model_installed_lmstudio (line 1520) | pub fn is_model_installed_lmstudio(hf_name: &str, installed: &HashSet<St...
  function has_lmstudio_mapping (line 1531) | pub fn has_lmstudio_mapping(hf_name: &str) -> bool {
  function lmstudio_pull_tag (line 1539) | pub fn lmstudio_pull_tag(hf_name: &str) -> Option<String> {
  constant DOCKER_MODELS_JSON (line 1553) | const DOCKER_MODELS_JSON: &str = include_str!("../data/docker_models.jso...
  type DockerModelCatalog (line 1556) | struct DockerModelCatalog {
  type DockerModelEntry (line 1561) | struct DockerModelEntry {
  function docker_mr_catalog (line 1567) | fn docker_mr_catalog() -> &'static [(String, String)] {
  function has_docker_mr_mapping (line 1583) | pub fn has_docker_mr_mapping(hf_name: &str) -> bool {
  function docker_mr_pull_tag (line 1589) | pub fn docker_mr_pull_tag(hf_name: &str) -> Option<String> {
  function hf_name_to_docker_mr_candidates (line 1600) | pub fn hf_name_to_docker_mr_candidates(hf_name: &str) -> Vec<String> {
  function is_model_installed_docker_mr (line 1618) | pub fn is_model_installed_docker_mr(hf_name: &str, installed: &HashSet<S...
  function docker_mr_installed_matches (line 1627) | fn docker_mr_installed_matches(installed_name: &str, candidate: &str) ->...
  function strip_gguf_quant_suffix (line 1641) | fn strip_gguf_quant_suffix(stem: &str) -> Option<String> {
  constant LLAMACPP_GGUF_MAPPINGS (line 1661) | const LLAMACPP_GGUF_MAPPINGS: &[(&str, &str)] = &[
  function lookup_gguf_repo (line 1788) | fn lookup_gguf_repo(hf_name: &str) -> Option<&'static str> {
  function hf_name_to_gguf_candidates (line 1801) | pub fn hf_name_to_gguf_candidates(hf_name: &str) -> Vec<String> {
  function has_gguf_mapping (line 1817) | pub fn has_gguf_mapping(hf_name: &str) -> bool {
  function is_model_installed_llamacpp (line 1822) | pub fn is_model_installed_llamacpp(hf_name: &str, installed: &HashSet<St...
  function gguf_pull_tag (line 1847) | pub fn gguf_pull_tag(hf_name: &str) -> Option<String> {
  function hf_repo_exists (line 1852) | pub fn hf_repo_exists(repo_id: &str) -> bool {
  function first_existing_gguf_repo (line 1863) | pub fn first_existing_gguf_repo(hf_name: &str) -> Option<String> {
  function push_unique_candidate (line 1877) | fn push_unique_candidate(candidates: &mut Vec<String>, candidate: String) {
  function strip_trailing_quant_suffix (line 1883) | fn strip_trailing_quant_suffix(name: &str) -> String {
  function normalize_mlx_repo_base (line 1892) | fn normalize_mlx_repo_base(repo_lower: &str) -> String {
  function strip_trailing_common_model_suffixes (line 1902) | fn strip_trailing_common_model_suffixes(name: &str) -> String {
  function explicit_mlx_repo_id (line 1920) | fn explicit_mlx_repo_id(hf_name: &str) -> Option<String> {
  function hf_name_to_mlx_candidates (line 1935) | pub fn hf_name_to_mlx_candidates(hf_name: &str) -> Vec<String> {
  function is_model_installed_mlx (line 2081) | pub fn is_model_installed_mlx(hf_name: &str, installed: &HashSet<String>...
  function mlx_pull_tag (line 2087) | pub fn mlx_pull_tag(hf_name: &str) -> String {
  constant OLLAMA_MAPPINGS (line 2115) | const OLLAMA_MAPPINGS: &[(&str, &str)] = &[
  function lookup_ollama_tag (line 2252) | fn lookup_ollama_tag(hf_name: &str) -> Option<&'static str> {
  function hf_name_to_ollama_candidates (line 2266) | pub fn hf_name_to_ollama_candidates(hf_name: &str) -> Vec<String> {
  function has_ollama_mapping (line 2275) | pub fn has_ollama_mapping(hf_name: &str) -> bool {
  function ollama_installed_matches_candidate (line 2279) | fn ollama_installed_matches_candidate(installed_name: &str, candidate: &...
  function is_model_installed (line 2296) | pub fn is_model_installed(hf_name: &str, installed: &HashSet<String>) ->...
  function ollama_pull_tag (line 2307) | pub fn ollama_pull_tag(hf_name: &str) -> Option<String> {
  function test_hf_name_to_mlx_candidates (line 2316) | fn test_hf_name_to_mlx_candidates() {
  function test_hf_name_to_mlx_candidates_qwen35 (line 2334) | fn test_hf_name_to_mlx_candidates_qwen35() {
  function test_hf_name_to_mlx_candidates_llama4 (line 2341) | fn test_hf_name_to_mlx_candidates_llama4() {
  function test_hf_name_to_mlx_candidates_gemma3 (line 2348) | fn test_hf_name_to_mlx_candidates_gemma3() {
  function test_hf_name_to_mlx_fallback_generates_mlx_infix_candidates (line 2355) | fn test_hf_name_to_mlx_fallback_generates_mlx_infix_candidates() {
  function test_hf_name_to_mlx_candidates_normalizes_explicit_mlx_repo (line 2364) | fn test_hf_name_to_mlx_candidates_normalizes_explicit_mlx_repo() {
  function test_mlx_pull_tag_prefers_explicit_repo_id (line 2379) | fn test_mlx_pull_tag_prefers_explicit_repo_id() {
  function test_mlx_cache_scan_parsing (line 2388) | fn test_mlx_cache_scan_parsing() {
  function test_is_model_installed_mlx (line 2405) | fn test_is_model_installed_mlx() {
  function test_is_model_installed_mlx_with_owner_prefixed_repo_id (line 2420) | fn test_is_model_installed_mlx_with_owner_prefixed_repo_id() {
  function test_qwen_coder_14b_matches_coder_entry (line 2431) | fn test_qwen_coder_14b_matches_coder_entry() {
  function test_qwen_base_does_not_match_coder (line 2448) | fn test_qwen_base_does_not_match_coder() {
  function test_installed_variant_suffix_matches_ollama_candidate (line 2463) | fn test_installed_variant_suffix_matches_ollama_candidate() {
  function test_candidates_for_coder_model (line 2476) | fn test_candidates_for_coder_model() {
  function test_candidates_for_base_model (line 2482) | fn test_candidates_for_base_model() {
  function test_llama_mapping (line 2488) | fn test_llama_mapping() {
  function test_deepseek_coder_mapping (line 2494) | fn test_deepseek_coder_mapping() {
  function test_normalize_ollama_host_with_scheme (line 2501) | fn test_normalize_ollama_host_with_scheme() {
  function test_normalize_ollama_host_without_scheme (line 2509) | fn test_normalize_ollama_host_without_scheme() {
  function test_normalize_ollama_host_rejects_unsupported_scheme (line 2517) | fn test_normalize_ollama_host_rejects_unsupported_scheme() {
  function test_validate_gguf_filename_valid (line 2525) | fn test_validate_gguf_filename_valid() {
  function test_validate_gguf_filename_traversal (line 2531) | fn test_validate_gguf_filename_traversal() {
  function test_validate_gguf_filename_absolute (line 2538) | fn test_validate_gguf_filename_absolute() {
  function test_validate_gguf_filename_bad_extension (line 2544) | fn test_validate_gguf_filename_bad_extension() {
  function test_validate_gguf_filename_empty (line 2551) | fn test_validate_gguf_filename_empty() {
  function test_validate_gguf_filename_subdirectory (line 2556) | fn test_validate_gguf_filename_subdirectory() {
  function test_validate_gguf_filename_rejects_non_basename_forms (line 2561) | fn test_validate_gguf_filename_rejects_non_basename_forms() {
  function test_parse_repo_gguf_entries_filters_unsafe_paths (line 2570) | fn test_parse_repo_gguf_entries_filters_unsafe_paths() {
  function test_hf_name_to_gguf_candidates_generates_common_patterns (line 2588) | fn test_hf_name_to_gguf_candidates_generates_common_patterns() {
  function test_hf_name_to_gguf_candidates_strips_owner (line 2615) | fn test_hf_name_to_gguf_candidates_strips_owner() {
  function test_lookup_gguf_repo_known_mappings (line 2628) | fn test_lookup_gguf_repo_known_mappings() {
  function test_lookup_gguf_repo_unknown_returns_none (line 2635) | fn test_lookup_gguf_repo_unknown_returns_none() {
  function test_has_gguf_mapping_matches_known_models (line 2640) | fn test_has_gguf_mapping_matches_known_models() {
  function test_gguf_candidates_fallback_covers_major_providers (line 2646) | fn test_gguf_candidates_fallback_covers_major_providers() {
  function test_gguf_candidates_known_mapping_returns_single (line 2657) | fn test_gguf_candidates_known_mapping_returns_single() {
  function test_select_best_gguf_prefers_higher_quality (line 2667) | fn test_select_best_gguf_prefers_higher_quality() {
  function test_select_best_gguf_respects_budget (line 2680) | fn test_select_best_gguf_respects_budget() {
  function test_select_best_gguf_nothing_fits (line 2698) | fn test_select_best_gguf_nothing_fits() {
  function test_select_best_gguf_skips_split_files (line 2705) | fn test_select_best_gguf_skips_split_files() {
  function test_select_best_gguf_empty_list (line 2724) | fn test_select_best_gguf_empty_list() {
  function test_is_split_file (line 2732) | fn test_is_split_file() {
  function test_urlencoding_ascii (line 2741) | fn test_urlencoding_ascii() {
  function test_urlencoding_special_chars (line 2747) | fn test_urlencoding_special_chars() {
  function test_urlencoding_empty (line 2754) | fn test_urlencoding_empty() {
  function test_is_model_installed_llamacpp_exact (line 2761) | fn test_is_model_installed_llamacpp_exact() {
  function test_is_model_installed_llamacpp_stripped_suffixes (line 2771) | fn test_is_model_installed_llamacpp_stripped_suffixes() {
  function test_is_model_installed_llamacpp_not_installed (line 2781) | fn test_is_model_installed_llamacpp_not_installed() {
  function test_gguf_pull_tag_known (line 2792) | fn test_gguf_pull_tag_known() {
  function test_gguf_pull_tag_unknown (line 2799) | fn test_gguf_pull_tag_unknown() {
  function test_has_ollama_mapping_known (line 2806) | fn test_has_ollama_mapping_known() {
  function test_has_ollama_mapping_unknown (line 2812) | fn test_has_ollama_mapping_unknown() {
  function test_ollama_pull_tag_known (line 2819) | fn test_ollama_pull_tag_known() {
  function test_ollama_pull_tag_unknown (line 2825) | fn test_ollama_pull_tag_unknown() {
  function test_mlx_pull_tag_prefers_4bit (line 2832) | fn test_mlx_pull_tag_prefers_4bit() {
  function test_mlx_pull_tag_fallback (line 2838) | fn test_mlx_pull_tag_fallback() {
  function test_ollama_installed_matches_exact (line 2846) | fn test_ollama_installed_matches_exact() {
  function test_ollama_installed_matches_variant_suffix (line 2854) | fn test_ollama_installed_matches_variant_suffix() {
  function test_ollama_installed_no_match (line 2862) | fn test_ollama_installed_no_match() {
  function test_parse_repo_gguf_entries_valid (line 2872) | fn test_parse_repo_gguf_entries_valid() {
  function test_parse_repo_gguf_entries_missing_size_defaults_to_zero (line 2884) | fn test_parse_repo_gguf_entries_missing_size_defaults_to_zero() {
  function test_parse_repo_gguf_entries_skips_non_gguf (line 2892) | fn test_parse_repo_gguf_entries_skips_non_gguf() {
  function test_hf_name_to_mlx_candidates_bare_model_name (line 2906) | fn test_hf_name_to_mlx_candidates_bare_model_name() {
  function test_hf_name_to_mlx_candidates_no_duplicates (line 2913) | fn test_hf_name_to_mlx_candidates_no_duplicates() {
  function test_hf_name_to_ollama_candidates_unknown_returns_empty (line 2927) | fn test_hf_name_to_ollama_candidates_unknown_returns_empty() {
  function test_hf_name_to_ollama_candidates_multiple_models (line 2933) | fn test_hf_name_to_ollama_candidates_multiple_models() {
  function test_docker_mr_catalog_parses (line 2943) | fn test_docker_mr_catalog_parses() {
  function test_has_docker_mr_mapping_known (line 2950) | fn test_has_docker_mr_mapping_known() {
  function test_has_docker_mr_mapping_unknown (line 2956) | fn test_has_docker_mr_mapping_unknown() {
  function test_docker_mr_pull_tag_returns_ai_prefixed (line 2961) | fn test_docker_mr_pull_tag_returns_ai_prefixed() {
  function test_docker_mr_candidates_includes_ai_prefix (line 2968) | fn test_docker_mr_candidates_includes_ai_prefix() {
  function test_docker_mr_candidates_unknown_returns_empty (line 2974) | fn test_docker_mr_candidates_unknown_returns_empty() {
  function test_is_model_installed_docker_mr_exact (line 2980) | fn test_is_model_installed_docker_mr_exact() {
  function test_is_model_installed_docker_mr_variant_suffix (line 2992) | fn test_is_model_installed_docker_mr_variant_suffix() {
  function test_is_model_installed_docker_mr_not_installed (line 3002) | fn test_is_model_installed_docker_mr_not_installed() {
  function test_normalize_docker_mr_host_with_scheme (line 3011) | fn test_normalize_docker_mr_host_with_scheme() {
  function test_normalize_docker_mr_host_without_scheme (line 3019) | fn test_normalize_docker_mr_host_without_scheme() {
  function test_normalize_docker_mr_host_rejects_unsupported_scheme (line 3027) | fn test_normalize_docker_mr_host_rejects_unsupported_scheme() {

FILE: llmfit-desktop/build.rs
  function main (line 1) | fn main() {

FILE: llmfit-desktop/src/main.rs
  type GpuInfoJs (line 12) | struct GpuInfoJs {
  type SystemInfo (line 21) | struct SystemInfo {
  type ModelFitInfo (line 31) | struct ModelFitInfo {
  type PullStatus (line 50) | struct PullStatus {
  type AppState (line 57) | struct AppState {
  function get_system_specs (line 63) | fn get_system_specs() -> Result<SystemInfo, String> {
  function get_model_fits (line 87) | fn get_model_fits() -> Result<Vec<ModelFitInfo>, String> {
  function start_pull (line 135) | fn start_pull(model_tag: String, state: State<'_, AppState>) -> Result<S...
  function poll_pull (line 143) | fn poll_pull(state: State<'_, AppState>) -> Result<PullStatus, String> {
  function is_ollama_available (line 184) | fn is_ollama_available(state: State<'_, AppState>) -> bool {
  function main (line 188) | fn main() {

FILE: llmfit-desktop/ui/app.js
  function esc (line 9) | function esc(s) {
  function loadSpecs (line 15) | async function loadSpecs() {
  function fitClass (line 61) | function fitClass(level) {
  function modeClass (line 70) | function modeClass(mode) {
  function showModal (line 79) | function showModal(fit) {
  function closeModal (line 169) | function closeModal() {
  function pullModel (line 177) | async function pullModel(name) {
  function renderModels (line 219) | function renderModels(fits) {
  function applyFilters (line 249) | function applyFilters() {
  function loadModels (line 263) | async function loadModels() {
  function init (line 287) | async function init() {

FILE: llmfit-tui/build.rs
  function main (line 5) | fn main() {
  function collect_files (line 33) | fn collect_files(dir: &Path, files: &mut Vec<PathBuf>) {
  function generate_assets_from_dist (line 47) | fn generate_assets_from_dist(dist_dir: &Path, files: &[PathBuf]) -> Stri...
  function generate_fallback_assets (line 74) | fn generate_fallback_assets() -> String {
  function content_type_for_path (line 102) | fn content_type_for_path(path: &Path) -> &'static str {

FILE: llmfit-tui/src/display.rs
  type ModelRow (line 9) | struct ModelRow {
  function display_all_models (line 34) | pub fn display_all_models(models: &[LlmModel]) {
  function display_model_fits (line 59) | pub fn display_model_fits(fits: &[ModelFit]) {
  function display_model_detail (line 99) | pub fn display_model_detail(fit: &ModelFit) {
  function display_model_diff (line 214) | pub fn display_model_diff(fits: &[ModelFit], sort_label: &str) {
  function print_metric_row (line 337) | fn print_metric_row(metric: &str, values: Vec<String>, metric_width: usi...
  function format_with_delta (line 345) | fn format_with_delta(value: String, delta: f64) -> String {
  function truncate_to_width (line 352) | fn truncate_to_width(input: &str, width: usize) -> String {
  function display_search_results (line 364) | pub fn display_search_results(models: &[&LlmModel], query: &str) {
  function display_json_system (line 407) | pub fn display_json_system(specs: &SystemSpecs) {
  function display_json_fits (line 418) | pub fn display_json_fits(specs: &SystemSpecs, fits: &[ModelFit]) {
  function display_json_diff_fits (line 431) | pub fn display_json_diff_fits(specs: &SystemSpecs, fits: &[ModelFit]) {
  function system_json (line 447) | fn system_json(specs: &SystemSpecs) -> serde_json::Value {
  function fit_to_json (line 477) | fn fit_to_json(fit: &ModelFit) -> serde_json::Value {
  function round1 (line 511) | fn round1(v: f64) -> f64 {
  function round2 (line 515) | fn round2(v: f64) -> f64 {
  function display_model_plan (line 519) | pub fn display_model_plan(plan: &PlanEstimate) {
  function display_json_plan (line 589) | pub fn display_json_plan(plan: &PlanEstimate) {

FILE: llmfit-tui/src/main.rs
  constant DEFAULT_DASHBOARD_HOST (line 19) | const DEFAULT_DASHBOARD_HOST: &str = "0.0.0.0";
  constant DEFAULT_DASHBOARD_PORT (line 20) | const DEFAULT_DASHBOARD_PORT: u16 = 8787;
  type SortArg (line 23) | enum SortArg {
  method from (line 46) | fn from(value: SortArg) -> Self {
  type FitArg (line 60) | enum FitArg {
  type Cli (line 95) | struct Cli {
  type Commands (line 135) | enum Commands {
  function detect_specs (line 581) | fn detect_specs(memory_override: &Option<String>) -> SystemSpecs {
  function resolve_context_limit (line 599) | fn resolve_context_limit(max_context: Option<u32>) -> Option<u32> {
  function dashboard_pid_path (line 619) | fn dashboard_pid_path() -> std::path::PathBuf {
  function write_dashboard_pid (line 623) | fn write_dashboard_pid(pid: u32) {
  type DashboardGuard (line 627) | struct DashboardGuard {
  method drop (line 632) | fn drop(&mut self) {
  function dashboard_target_from_env (line 638) | fn dashboard_target_from_env() -> (String, u16) {
  function dashboard_reachable (line 662) | fn dashboard_reachable(host: &str, port: u16) -> bool {
  function ensure_dashboard_available (line 672) | fn ensure_dashboard_available(
  function run_fit (line 746) | fn run_fit(
  function fit_matches_filter (line 798) | fn fit_matches_filter(fit: &ModelFit, filter: FitArg) -> bool {
  function find_name_index_by_selector (line 809) | fn find_name_index_by_selector<T>(
  function find_fit_index_by_selector (line 858) | fn find_fit_index_by_selector(fits: &[ModelFit], selector: &str) -> Resu...
  function run_diff (line 862) | fn run_diff(
  function run_tui (line 934) | fn run_tui(memory_override: &Option<String>, context_limit: Option<u32>)...
  function draw_boot_screen (line 978) | fn draw_boot_screen(
  function run_recommend (line 1012) | fn run_recommend(
  function run_download (line 1133) | fn run_download(
  function run_hf_search (line 1314) | fn run_hf_search(query: &str, limit: usize) {
  function run_model (line 1338) | fn run_model(model: &str, server: bool, port: u16, ngl: i32, ctx_size: u...
  function run_plan (line 1440) | fn run_plan(
  function main (line 1469) | fn main() {
  function mock_fit (line 1663) | fn mock_fit(name: &str, fit_level: FitLevel) -> ModelFit {
  function fit_filter_runnable_excludes_too_tight (line 1708) | fn fit_filter_runnable_excludes_too_tight() {
  function selector_prefers_exact_match (line 1716) | fn selector_prefers_exact_match() {
  function selector_errors_on_ambiguous_partial (line 1726) | fn selector_errors_on_ambiguous_partial() {
  function generic_selector_prefers_exact_match_for_models (line 1736) | fn generic_selector_prefers_exact_match_for_models() {

FILE: llmfit-tui/src/serve_api.rs
  type AppState (line 25) | struct AppState {
  type ModelsQuery (line 34) | struct ModelsQuery {
  type NodeInfo (line 51) | struct NodeInfo {
  type ApiEnvelope (line 57) | struct ApiEnvelope {
  type ApiError (line 67) | struct ApiError {
    method bad_request (line 73) | fn bad_request(message: impl Into<String>) -> Self {
    method internal (line 80) | fn internal(message: impl Into<String>) -> Self {
  method into_response (line 89) | fn into_response(self) -> Response {
  type ApiResult (line 100) | type ApiResult<T> = Result<T, ApiError>;
  function run_serve (line 102) | pub fn run_serve(
  function build_router (line 161) | fn build_router(state: Arc<AppState>) -> Router {
  function health (line 174) | async fn health(State(state): State<Arc<AppState>>) -> Json<serde_json::...
  function system (line 184) | async fn system(State(state): State<Arc<AppState>>) -> Json<serde_json::...
  function web_index (line 194) | async fn web_index() -> Response {
  function web_asset (line 198) | async fn web_asset(Path(path): Path<String>) -> Response {
  function spa_fallback (line 203) | async fn spa_fallback(Path(path): Path<String>) -> Response {
  function serve_web_path (line 210) | fn serve_web_path(path: &str) -> Response {
  function find_web_asset (line 230) | fn find_web_asset(path: &str) -> Option<&'static EmbeddedAsset> {
  function models (line 234) | async fn models(
  function top_models (line 261) | async fn top_models(
  function model_by_name (line 288) | async fn model_by_name(
  function filtered_fits (line 319) | fn filtered_fits(
  type RuntimeFilter (line 390) | enum RuntimeFilter {
  function parse_sort (line 397) | fn parse_sort(raw: Option<&str>) -> Result<SortColumn, ApiError> {
  function parse_min_fit (line 416) | fn parse_min_fit(raw: Option<&str>) -> Result<FitLevel, ApiError> {
  function parse_runtime (line 432) | fn parse_runtime(raw: Option<&str>) -> Result<RuntimeFilter, ApiError> {
  function parse_force_runtime (line 451) | fn parse_force_runtime(
  function parse_use_case (line 469) | fn parse_use_case(raw: Option<&str>) -> Result<Option<UseCase>, ApiError> {
  function fit_at_least (line 490) | fn fit_at_least(actual: FitLevel, minimum: FitLevel) -> bool {
  function active_filters_json (line 500) | fn active_filters_json(query: &ModelsQuery, top_only: bool) -> serde_jso...
  function fit_level_code (line 516) | fn fit_level_code(fit_level: FitLevel) -> &'static str {
  function run_mode_code (line 525) | fn run_mode_code(run_mode: llmfit_core::fit::RunMode) -> &'static str {
  function runtime_code (line 534) | fn runtime_code(runtime: InferenceRuntime) -> &'static str {
  function system_json (line 542) | fn system_json(specs: &SystemSpecs) -> serde_json::Value {
  function fit_to_json (line 572) | fn fit_to_json(fit: &ModelFit) -> serde_json::Value {
  function round1 (line 608) | fn round1(v: f64) -> f64 {
  function round2 (line 612) | fn round2(v: f64) -> f64 {
  function detect_specs (line 617) | fn detect_specs(memory_override: &Option<String>) -> SystemSpecs {
  function test_state (line 638) | fn test_state() -> Arc<AppState> {
  function test_router (line 649) | fn test_router() -> Router {
  function find_asset_path_with_ext (line 653) | fn find_asset_path_with_ext(ext: &str) -> Option<&'static EmbeddedAsset> {
  function run_async (line 659) | fn run_async<T>(future: impl Future<Output = T>) -> T {
  function root_serves_index_html (line 668) | fn root_serves_index_html() {
  function assets_route_serves_embedded_file_with_content_type (line 684) | fn assets_route_serves_embedded_file_with_content_type() {
  function unknown_non_api_routes_fallback_to_index (line 712) | fn unknown_non_api_routes_fallback_to_index() {
  function existing_api_route_response_shape_is_preserved (line 733) | fn existing_api_route_response_shape_is_preserved() {
  function unknown_api_paths_do_not_fallback_to_html (line 754) | fn unknown_api_paths_do_not_fallback_to_html() {

FILE: llmfit-tui/src/theme.rs
  type Theme (line 7) | pub enum Theme {
    method label (line 21) | pub fn label(&self) -> &'static str {
    method next (line 36) | pub fn next(&self) -> Self {
    method colors (line 51) | pub fn colors(&self) -> ThemeColors {
    method config_path (line 67) | fn config_path() -> Option<PathBuf> {
    method save (line 80) | pub fn save(&self) {
    method load (line 90) | pub fn load() -> Self {
    method from_label (line 97) | fn from_label(s: &str) -> Self {
  type ThemeColors (line 114) | pub struct ThemeColors {
  function default_colors (line 155) | fn default_colors() -> ThemeColors {
  function dracula_colors (line 194) | fn dracula_colors() -> ThemeColors {
  function solarized_colors (line 231) | fn solarized_colors() -> ThemeColors {
  function nord_colors (line 268) | fn nord_colors() -> ThemeColors {
  function monokai_colors (line 305) | fn monokai_colors() -> ThemeColors {
  function gruvbox_colors (line 342) | fn gruvbox_colors() -> ThemeColors {
  function catppuccin_latte_colors (line 379) | fn catppuccin_latte_colors() -> ThemeColors {
  function catppuccin_frappe_colors (line 417) | fn catppuccin_frappe_colors() -> ThemeColors {
  function catppuccin_macchiato_colors (line 455) | fn catppuccin_macchiato_colors() -> ThemeColors {
  function catppuccin_mocha_colors (line 493) | fn catppuccin_mocha_colors() -> ThemeColors {

FILE: llmfit-tui/src/tui_app.rs
  type InputMode (line 16) | pub enum InputMode {
  type PlanField (line 32) | pub enum PlanField {
    method next (line 39) | fn next(self) -> Self {
    method prev (line 47) | fn prev(self) -> Self {
  type FitFilter (line 57) | pub enum FitFilter {
    method label (line 67) | pub fn label(&self) -> &str {
    method next (line 78) | pub fn next(&self) -> Self {
  type AvailabilityFilter (line 92) | pub enum AvailabilityFilter {
    method label (line 99) | pub fn label(&self) -> &str {
    method next (line 107) | pub fn next(&self) -> Self {
  type DownloadProvider (line 117) | pub enum DownloadProvider {
  type DownloadCapability (line 126) | pub enum DownloadCapability {
  constant DL_OLLAMA (line 132) | pub const DL_OLLAMA: u8 = 0b0001;
  constant DL_LLAMACPP (line 133) | pub const DL_LLAMACPP: u8 = 0b0010;
  constant DL_DOCKER (line 134) | pub const DL_DOCKER: u8 = 0b0100;
  constant DL_LMSTUDIO (line 135) | pub const DL_LMSTUDIO: u8 = 0b1000;
  type ActivePullProvider (line 138) | enum ActivePullProvider {
    method label (line 147) | fn label(self) -> &'static str {
  type App (line 158) | pub struct App {
    method with_specs_and_context (line 278) | pub fn with_specs_and_context(specs: SystemSpecs, context_limit: Optio...
    method apply_filters (line 484) | pub fn apply_filters(&mut self) {
    method selected_fit (line 644) | pub fn selected_fit(&self) -> Option<&ModelFit> {
    method move_up (line 650) | pub fn move_up(&mut self) {
    method move_down (line 658) | pub fn move_down(&mut self) {
    method page_up (line 666) | pub fn page_up(&mut self) {
    method page_down (line 672) | pub fn page_down(&mut self) {
    method half_page_up (line 680) | pub fn half_page_up(&mut self) {
    method half_page_down (line 685) | pub fn half_page_down(&mut self) {
    method home (line 692) | pub fn home(&mut self) {
    method end (line 697) | pub fn end(&mut self) {
    method cycle_fit_filter (line 704) | pub fn cycle_fit_filter(&mut self) {
    method cycle_availability_filter (line 709) | pub fn cycle_availability_filter(&mut self) {
    method cycle_sort_column (line 714) | pub fn cycle_sort_column(&mut self) {
    method cycle_theme (line 720) | pub fn cycle_theme(&mut self) {
    method enter_search (line 725) | pub fn enter_search(&mut self) {
    method exit_search (line 729) | pub fn exit_search(&mut self) {
    method search_input (line 733) | pub fn search_input(&mut self, c: char) {
    method search_backspace (line 739) | pub fn search_backspace(&mut self) {
    method search_delete (line 747) | pub fn search_delete(&mut self) {
    method clear_search (line 754) | pub fn clear_search(&mut self) {
    method toggle_detail (line 760) | pub fn toggle_detail(&mut self) {
    method mark_selected_for_compare (line 766) | pub fn mark_selected_for_compare(&mut self) {
    method clear_compare_mark (line 775) | pub fn clear_compare_mark(&mut self) {
    method selected_compare_pair (line 781) | pub fn selected_compare_pair(&self) -> Option<(&ModelFit, &ModelFit)> {
    method toggle_compare_view (line 791) | pub fn toggle_compare_view(&mut self) {
    method open_plan_mode (line 810) | pub fn open_plan_mode(&mut self) {
    method close_plan_mode (line 829) | pub fn close_plan_mode(&mut self) {
    method plan_next_field (line 837) | pub fn plan_next_field(&mut self) {
    method plan_prev_field (line 842) | pub fn plan_prev_field(&mut self) {
    method plan_cursor_left (line 847) | pub fn plan_cursor_left(&mut self) {
    method plan_cursor_right (line 853) | pub fn plan_cursor_right(&mut self) {
    method plan_input (line 860) | pub fn plan_input(&mut self, c: char) {
    method plan_backspace (line 891) | pub fn plan_backspace(&mut self) {
    method plan_delete (line 904) | pub fn plan_delete(&mut self) {
    method plan_clear_field (line 913) | pub fn plan_clear_field(&mut self) {
    method refresh_plan_estimate (line 919) | pub fn refresh_plan_estimate(&mut self) {
    method plan_model_name (line 977) | pub fn plan_model_name(&self) -> Option<&str> {
    method open_provider_popup (line 983) | pub fn open_provider_popup(&mut self) {
    method close_provider_popup (line 988) | pub fn close_provider_popup(&mut self) {
    method open_use_case_popup (line 992) | pub fn open_use_case_popup(&mut self) {
    method close_use_case_popup (line 997) | pub fn close_use_case_popup(&mut self) {
    method provider_popup_up (line 1001) | pub fn provider_popup_up(&mut self) {
    method provider_popup_down (line 1007) | pub fn provider_popup_down(&mut self) {
    method provider_popup_toggle (line 1013) | pub fn provider_popup_toggle(&mut self) {
    method provider_popup_select_all (line 1021) | pub fn provider_popup_select_all(&mut self) {
    method use_case_popup_up (line 1030) | pub fn use_case_popup_up(&mut self) {
    method use_case_popup_down (line 1036) | pub fn use_case_popup_down(&mut self) {
    method use_case_popup_toggle (line 1042) | pub fn use_case_popup_toggle(&mut self) {
    method use_case_popup_select_all (line 1050) | pub fn use_case_popup_select_all(&mut self) {
    method open_capability_popup (line 1059) | pub fn open_capability_popup(&mut self) {
    method close_capability_popup (line 1063) | pub fn close_capability_popup(&mut self) {
    method capability_popup_up (line 1067) | pub fn capability_popup_up(&mut self) {
    method capability_popup_down (line 1073) | pub fn capability_popup_down(&mut self) {
    method capability_popup_toggle (line 1079) | pub fn capability_popup_toggle(&mut self) {
    method capability_popup_select_all (line 1087) | pub fn capability_popup_select_all(&mut self) {
    method enter_visual_mode (line 1098) | pub fn enter_visual_mode(&mut self) {
    method exit_visual_mode (line 1103) | pub fn exit_visual_mode(&mut self) {
    method visual_range (line 1108) | pub fn visual_range(&self) -> Option<std::ops::RangeInclusive<usize>> {
    method visual_selection_count (line 1115) | pub fn visual_selection_count(&self) -> usize {
    method visual_compare (line 1122) | pub fn visual_compare(&mut self) {
    method close_multi_compare (line 1144) | pub fn close_multi_compare(&mut self) {
    method multi_compare_scroll_left (line 1149) | pub fn multi_compare_scroll_left(&mut self) {
    method multi_compare_scroll_right (line 1155) | pub fn multi_compare_scroll_right(&mut self) {
    method enter_select_mode (line 1165) | pub fn enter_select_mode(&mut self) {
    method exit_select_mode (line 1169) | pub fn exit_select_mode(&mut self) {
    method select_column_left (line 1173) | pub fn select_column_left(&mut self) {
    method select_column_right (line 1179) | pub fn select_column_right(&mut self) {
    method activate_select_column_filter (line 1186) | pub fn activate_select_column_filter(&mut self) {
    method set_or_toggle_sort (line 1218) | fn set_or_toggle_sort(&mut self, col: SortColumn) {
    method close_quant_popup (line 1230) | pub fn close_quant_popup(&mut self) {
    method quant_popup_up (line 1234) | pub fn quant_popup_up(&mut self) {
    method quant_popup_down (line 1240) | pub fn quant_popup_down(&mut self) {
    method quant_popup_toggle (line 1246) | pub fn quant_popup_toggle(&mut self) {
    method quant_popup_select_all (line 1253) | pub fn quant_popup_select_all(&mut self) {
    method close_run_mode_popup (line 1264) | pub fn close_run_mode_popup(&mut self) {
    method run_mode_popup_up (line 1268) | pub fn run_mode_popup_up(&mut self) {
    method run_mode_popup_down (line 1274) | pub fn run_mode_popup_down(&mut self) {
    method run_mode_popup_toggle (line 1280) | pub fn run_mode_popup_toggle(&mut self) {
    method run_mode_popup_select_all (line 1288) | pub fn run_mode_popup_select_all(&mut self) {
    method close_params_bucket_popup (line 1299) | pub fn close_params_bucket_popup(&mut self) {
    method params_bucket_popup_up (line 1303) | pub fn params_bucket_popup_up(&mut self) {
    method params_bucket_popup_down (line 1309) | pub fn params_bucket_popup_down(&mut self) {
    method params_bucket_popup_toggle (line 1315) | pub fn params_bucket_popup_toggle(&mut self) {
    method params_bucket_popup_select_all (line 1323) | pub fn params_bucket_popup_select_all(&mut self) {
    method toggle_installed_first (line 1332) | pub fn toggle_installed_first(&mut self) {
    method re_sort (line 1338) | fn re_sort(&mut self) {
    method start_download (line 1353) | pub fn start_download(&mut self) {
    method format_no_download_message (line 1401) | fn format_no_download_message(
    method start_mlx_download (line 1423) | fn start_mlx_download(&mut self, model_name: String) {
    method start_download_with_provider (line 1444) | fn start_download_with_provider(&mut self, model_name: String, provide...
    method start_ollama_download (line 1454) | fn start_ollama_download(&mut self, model_name: String) {
    method start_llamacpp_download_for_model (line 1474) | fn start_llamacpp_download_for_model(&mut self, model_name: String) {
    method start_docker_mr_download (line 1502) | fn start_docker_mr_download(&mut self, model_name: String) {
    method start_lmstudio_download (line 1521) | fn start_lmstudio_download(&mut self, model_name: String) {
    method tick_pull (line 1541) | pub fn tick_pull(&mut self) {
    method available_download_providers (line 1593) | fn available_download_providers(
    method open_download_provider_popup (line 1623) | fn open_download_provider_popup(&mut self, model_name: String, options...
    method close_download_provider_popup (line 1631) | pub fn close_download_provider_popup(&mut self) {
    method download_provider_popup_up (line 1639) | pub fn download_provider_popup_up(&mut self) {
    method download_provider_popup_down (line 1645) | pub fn download_provider_popup_down(&mut self) {
    method confirm_download_provider_selection (line 1651) | pub fn confirm_download_provider_selection(&mut self) {
    method refresh_installed (line 1673) | pub fn refresh_installed(&mut self) {
    method download_capability_for (line 1707) | pub fn download_capability_for(&self, model_name: &str) -> DownloadCap...
    method enqueue_capability_probes_for_visible (line 1714) | pub fn enqueue_capability_probes_for_visible(&mut self, window: usize) {
    method enqueue_capability_probe (line 1729) | fn enqueue_capability_probe(&mut self, model_name: String, has_catalog...
    method tick_download_capability (line 1771) | fn tick_download_capability(&mut self) {
    method active_plan_input (line 1784) | fn active_plan_input(&self) -> &String {
    method active_plan_input_mut (line 1792) | fn active_plan_input_mut(&mut self) -> &mut String {
  function command_exists (line 1801) | fn command_exists(name: &str) -> bool {

FILE: llmfit-tui/src/tui_events.rs
  function handle_events (line 7) | pub fn handle_events(app: &mut App) -> std::io::Result<bool> {
  function handle_normal_mode (line 37) | fn handle_normal_mode(app: &mut App, key: KeyEvent) {
  function handle_visual_mode (line 140) | fn handle_visual_mode(app: &mut App, key: KeyEvent) {
  function handle_select_mode (line 165) | fn handle_select_mode(app: &mut App, key: KeyEvent) {
  function handle_search_mode (line 185) | fn handle_search_mode(app: &mut App, key: KeyEvent) {
  function handle_provider_popup_mode (line 206) | fn handle_provider_popup_mode(app: &mut App, key: KeyEvent) {
  function handle_plan_mode (line 221) | fn handle_plan_mode(app: &mut App, key: KeyEvent) {
  function handle_use_case_popup_mode (line 238) | fn handle_use_case_popup_mode(app: &mut App, key: KeyEvent) {
  function handle_capability_popup_mode (line 253) | fn handle_capability_popup_mode(app: &mut App, key: KeyEvent) {
  function handle_download_provider_popup_mode (line 268) | fn handle_download_provider_popup_mode(app: &mut App, key: KeyEvent) {
  function handle_quant_popup_mode (line 278) | fn handle_quant_popup_mode(app: &mut App, key: KeyEvent) {
  function handle_run_mode_popup_mode (line 293) | fn handle_run_mode_popup_mode(app: &mut App, key: KeyEvent) {
  function handle_params_bucket_popup_mode (line 308) | fn handle_params_bucket_popup_mode(app: &mut App, key: KeyEvent) {

FILE: llmfit-tui/src/tui_ui.rs
  function draw (line 21) | pub fn draw(frame: &mut Frame, app: &mut App) {
  function draw_system_bar (line 75) | fn draw_system_bar(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeC...
  function draw_search_and_filters (line 238) | fn draw_search_and_filters(frame: &mut Frame, app: &App, area: Rect, tc:...
  function fit_color (line 450) | fn fit_color(level: FitLevel, tc: &ThemeColors) -> Color {
  function fit_indicator (line 459) | fn fit_indicator(level: FitLevel) -> &'static str {
  function pull_indicator (line 469) | fn pull_indicator(percent: Option<f64>, tick: u64) -> String {
  function draw_table (line 490) | fn draw_table(frame: &mut Frame, app: &mut App, area: Rect, tc: &ThemeCo...
  function draw_compare (line 724) | fn draw_compare(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColo...
  type CompareMetrics (line 902) | struct CompareMetrics {
  function compare_badges (line 915) | fn compare_badges(fit: &ModelFit) -> String {
  function render_compare_panel (line 933) | fn render_compare_panel(
  function draw_multi_compare (line 1034) | fn draw_multi_compare(frame: &mut Frame, app: &App, area: Rect, tc: &The...
  function truncate_str (line 1338) | fn truncate_str(s: &str, max_len: usize) -> String {
  function draw_detail (line 1346) | fn draw_detail(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColor...
  function draw_plan (line 1816) | fn draw_plan(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColors) {
  function draw_provider_popup (line 2015) | fn draw_provider_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {
  function draw_use_case_popup (line 2090) | fn draw_use_case_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {
  function draw_capability_popup (line 2173) | fn draw_capability_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {
  function draw_download_provider_popup (line 2256) | fn draw_download_provider_popup(frame: &mut Frame, app: &App, tc: &Theme...
  function status_keys_and_mode (line 2314) | fn status_keys_and_mode(app: &App) -> (String, String) {
  function draw_status_bar (line 2412) | fn draw_status_bar(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeC...
  function draw_quant_popup (line 2466) | fn draw_quant_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {
  function draw_run_mode_popup (line 2537) | fn draw_run_mode_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {
  function draw_params_bucket_popup (line 2612) | fn draw_params_bucket_popup(frame: &mut Frame, app: &App, tc: &ThemeColo...

FILE: llmfit-web/src/App.jsx
  constant THEME_KEY (line 4) | const THEME_KEY = 'llmfit-theme';
  constant FIT_OPTIONS (line 6) | const FIT_OPTIONS = [
  constant RUNTIME_OPTIONS (line 14) | const RUNTIME_OPTIONS = [
  constant USE_CASE_OPTIONS (line 21) | const USE_CASE_OPTIONS = [
  constant LIMIT_OPTIONS (line 31) | const LIMIT_OPTIONS = [
  constant SORT_OPTIONS (line 40) | const SORT_OPTIONS = [
  function initialTheme (line 50) | function initialTheme() {
  function round (line 63) | function round(value, digits = 1) {
  function fitClass (line 70) | function fitClass(code) {
  function modeClass (line 74) | function modeClass(code) {
  function SystemCard (line 78) | function SystemCard({ label, value, detail }) {
  function MetricBar (line 88) | function MetricBar({ label, value }) {
  function fitRank (line 103) | function fitRank(level) {
  function applyClientFitFilter (line 118) | function applyClientFitFilter(models, minFit) {
  function App (line 134) | function App() {

FILE: llmfit-web/src/App.test.jsx
  function jsonResponse (line 4) | function jsonResponse(payload, { ok = true, status = 200 } = {}) {

FILE: llmfit-web/src/api.js
  constant DEFAULT_FILTERS (line 1) | const DEFAULT_FILTERS = {
  function trimOrEmpty (line 11) | function trimOrEmpty(value) {
  function buildModelsQuery (line 15) | function buildModelsQuery(filters) {
  function parseJsonOrThrow (line 61) | async function parseJsonOrThrow(response) {
  function fetchSystemInfo (line 77) | async function fetchSystemInfo(signal) {
  function fetchModels (line 82) | async function fetchModels(filters, signal) {

FILE: scripts/scrape_docker_models.py
  function fetch_docker_hub_models (line 157) | def fetch_docker_hub_models() -> list[str]:
  function fetch_tags_for_model (line 181) | def fetch_tags_for_model(model_name: str) -> list[str]:
  function ollama_tag_to_docker_repo (line 194) | def ollama_tag_to_docker_repo(ollama_tag: str) -> str:
  function lookup_ollama_tag (line 202) | def lookup_ollama_tag(hf_name: str) -> str | None:
  function main (line 212) | def main():

FILE: scripts/scrape_hf_models.py
  function _auth_headers (line 27) | def _auth_headers() -> dict[str, str]:
  function fetch_model_info (line 309) | def fetch_model_info(repo_id: str) -> dict | None:
  function format_param_count (line 328) | def format_param_count(total_params: int) -> str:
  function estimate_ram (line 340) | def estimate_ram(total_params: int, quant: str) -> tuple[float, float]:
  function estimate_vram (line 358) | def estimate_vram(total_params: int, quant: str) -> float:
  function detect_moe (line 367) | def detect_moe(repo_id: str, config: dict | None, architecture: str,
  function estimate_active_params (line 405) | def estimate_active_params(total_params: int, num_experts: int,
  function infer_use_case (line 419) | def infer_use_case(repo_id: str, pipeline_tag: str | None, config: dict ...
  function infer_context_length (line 437) | def infer_context_length(config: dict | None) -> int:
  function fetch_config_json (line 469) | def fetch_config_json(repo_id: str) -> dict | None:
  function extract_provider (line 480) | def extract_provider(repo_id: str) -> str:
  function infer_capabilities (line 509) | def infer_capabilities(repo_id: str, pipeline_tag: str | None, use_case:...
  function detect_quant_format (line 545) | def detect_quant_format(repo_id: str, config: dict | None) -> tuple[str,...
  function _detect_format_from_name (line 594) | def _detect_format_from_name(repo_id: str) -> tuple[str, str]:
  function scrape_model (line 612) | def scrape_model(repo_id: str) -> dict | None:
  function _load_gguf_cache (line 691) | def _load_gguf_cache() -> dict:
  function _save_gguf_cache (line 703) | def _save_gguf_cache(cache: dict):
  function _cache_entry_fresh (line 710) | def _cache_entry_fresh(entry: dict) -> bool:
  function _model_gguf_repo_candidates (line 720) | def _model_gguf_repo_candidates(repo_id: str) -> list[tuple[str, str]]:
  function check_gguf_repo_exists (line 735) | def check_gguf_repo_exists(repo_id: str) -> bool:
  function enrich_gguf_sources (line 748) | def enrich_gguf_sources(models: list[dict]) -> int:
  function discover_trending_models (line 819) | def discover_trending_models(limit: int = 30, min_downloads: int = 10000...
  function main (line 890) | def main():

FILE: scripts/test_api.py
  function _http_json (line 27) | def _http_json(url: str, timeout: float = 10.0) -> Tuple[int, Dict[str, ...
  function _assert (line 44) | def _assert(condition: bool, message: str) -> None:
  function _expect_keys (line 49) | def _expect_keys(obj: Dict[str, Any], keys: List[str], path: str) -> None:
  function test_health (line 54) | def test_health(base_url: str) -> None:
  function test_system (line 63) | def test_system(base_url: str) -> None:
  function test_models_envelope_and_limit (line 74) | def test_models_envelope_and_limit(base_url: str) -> None:
  function test_top_endpoint_excludes_too_tight (line 83) | def test_top_endpoint_excludes_too_tight(base_url: str) -> None:
  function test_filters_runtime_and_use_case (line 91) | def test_filters_runtime_and_use_case(base_url: str) -> None:
  function test_models_shape (line 100) | def test_models_shape(base_url: str) -> None:
  function test_name_lookup (line 129) | def test_name_lookup(base_url: str) -> None:
  function test_invalid_filter_returns_400 (line 153) | def test_invalid_filter_returns_400(base_url: str) -> None:
  function test_sort_score_desc (line 159) | def test_sort_score_desc(base_url: str) -> None:
  function wait_for_health (line 176) | def wait_for_health(base_url: str, timeout_s: float = 30.0) -> None:
  function spawn_server (line 189) | def spawn_server(base_url: str, project_root: str) -> subprocess.Popen:
  function run_all_tests (line 217) | def run_all_tests(base_url: str) -> None:
  function main (line 235) | def main() -> int:

FILE: scripts/verify_models.py
  function check_url (line 30) | def check_url(url: str) -> int:
  function load_hf_models (line 46) | def load_hf_models() -> list[str]:
  function verify_hf (line 53) | def verify_hf(models: list[str]) -> list[str]:
  function parse_ollama_tags (line 73) | def parse_ollama_tags() -> list[str]:
  function verify_ollama (line 100) | def verify_ollama(tags: list[str]) -> list[str]:
  function main (line 120) | def main():
Condensed preview: 68 files, each showing path, character count, and a content snippet (full structured content: 1,787K chars).
[
  {
    "path": ".dockerignore",
    "chars": 441,
    "preview": "# Build artifacts\ntarget/\n*.swp\n*.swo\n*~\n\n# IDE\n.vscode/\n.idea/\n*.iml\n\n# Git\n.git/\n.gitignore\n.githooks/\n\n# Documentatio"
  },
  {
    "path": ".githooks/pre-push",
    "chars": 204,
    "preview": "#!/usr/bin/env bash\nset -e\n\necho \"Running cargo fmt --check...\"\ncargo fmt --check\nif [ $? -ne 0 ]; then\n    echo \"❌ Form"
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 427,
    "preview": "version: 2\nupdates:\n  - package-ecosystem: cargo\n    directory: /\n    schedule:\n      interval: weekly\n    cooldown: # a"
  },
  {
    "path": ".github/workflows/ci.yml",
    "chars": 5932,
    "preview": "name: CI\n\non:\n  push:\n    branches:\n      - main    # Run after merging into the \"main\" target\n    paths:      # Run onl"
  },
  {
    "path": ".github/workflows/docker.yml",
    "chars": 1666,
    "preview": "name: Docker Build and Push\n\non:\n  push:\n    tags:\n      - \"v*\"\n      - \"!v*-mac\"\n  workflow_dispatch: # Allow manual tr"
  },
  {
    "path": ".github/workflows/release-desktop.yml",
    "chars": 2459,
    "preview": "name: Release Desktop (macOS)\n\non:\n  push:\n    tags:\n      - \"v*-mac\"\n\npermissions:\n  contents: write\n\nenv:\n  CARGO_TERM"
  },
  {
    "path": ".github/workflows/release.yml",
    "chars": 7967,
    "preview": "name: Release\n\non:\n  push:\n    tags:\n      - \"v*\"\n      - \"!v*-mac\"\n\npermissions:\n  contents: write\n\nenv:\n  CARGO_TERM_C"
  },
  {
    "path": ".gitignore",
    "chars": 121,
    "preview": "/target\nllmfit\n/docs\nllmfit-desktop/gen/schemas/*\ndata/gguf_sources_cache.json\nllmfit-web/node_modules/\nllmfit-web/dist/"
  },
  {
    "path": ".release-please-manifest.json",
    "chars": 19,
    "preview": "{\n  \".\": \"0.3.7\"\n}\n"
  },
  {
    "path": "AGENTS.md",
    "chars": 7756,
    "preview": "# AGENTS.md\n\nInstructions for AI agents contributing to this codebase.\n\n---\n\n## Project overview\n\n`llmfit` is a Rust CLI"
  },
  {
    "path": "API.md",
    "chars": 5441,
    "preview": "# llmfit REST API Guide\n\nThis document is for agent/client builders integrating with `llmfit serve`.\n\n## Purpose\n\n`llmfi"
  },
  {
    "path": "CHANGELOG.md",
    "chars": 11869,
    "preview": "# Changelog\n\n## [0.3.7](https://github.com/AlexsJones/llmfit/compare/v0.3.6...v0.3.7) (2026-02-21)\n\n\n### Features\n\n* add"
  },
  {
    "path": "CNAME",
    "chars": 16,
    "preview": "llmfit.axjns.dev"
  },
  {
    "path": "Cargo.toml",
    "chars": 172,
    "preview": "[workspace]\nmembers = [\"llmfit-core\", \"llmfit-tui\", \"llmfit-desktop\"]\ndefault-members = [\"llmfit-core\", \"llmfit-tui\"]\nre"
  },
  {
    "path": "Dockerfile",
    "chars": 1183,
    "preview": "# Multi-stage build for llmfit\n# Stage 1: Build the Rust binary\nFROM rust:1.88-slim AS builder\n\n# Install build dependen"
  },
  {
    "path": "LICENSE",
    "chars": 1067,
    "preview": "MIT License\n\nCopyright (c) 2026 Alex Jones\n\nPermission is hereby granted, free of charge, to any person obtaining a copy"
  },
  {
    "path": "MODELS.md",
    "chars": 19006,
    "preview": "# Supported Models\n\nllmfit ships with a curated database of 106 LLM models from HuggingFace. All memory estimates assume"
  },
  {
    "path": "Makefile",
    "chars": 2036,
    "preview": "# Makefile for llmfit\n# Convenience commands for building, testing, and updating the model database\n\n.PHONY: help build "
  },
  {
    "path": "README.md",
    "chars": 34799,
    "preview": "# llmfit\n\n<p align=\"center\">\n  <img src=\"assets/icon.svg\" alt=\"llmfit icon\" width=\"128\" height=\"128\">\n</p>\n\n<p align=\"ce"
  },
  {
    "path": "README.zh.md",
    "chars": 22248,
    "preview": "# llmfit\n\n<p align=\"center\">\n  <img src=\"assets/icon.svg\" alt=\"llmfit 图标\" width=\"128\" height=\"128\">\n</p>\n\n<p align=\"cent"
  },
  {
    "path": "data/hf_models.json",
    "chars": 333011,
    "preview": "[\n  {\n    \"name\": \"echarlaix/tiny-random-PhiForCausalLM\",\n    \"provider\": \"echarlaix\",\n    \"parameter_count\": \"80K\",\n   "
  },
  {
    "path": "flake.nix",
    "chars": 1419,
    "preview": "{\n  description = \"Hundreds of models & providers. One command to find what runs on your hardware.\";\n\n  inputs = {\n    n"
  },
  {
    "path": "index.html",
    "chars": 288,
    "preview": "<html>\n<head>\n  <meta charset=\"utf-8\">\n  <title>llmfit</title>\n</head>\n<body>\n  <h1>llmfit</h1>\n  <p>Match LLM models to"
  },
  {
    "path": "install.sh",
    "chars": 6049,
    "preview": "#!/bin/sh\n# llmfit installer\n# Usage: curl -fsSL https://raw.githubusercontent.com/AlexsJones/llmfit/main/install.sh | s"
  },
  {
    "path": "llmfit-core/Cargo.toml",
    "chars": 607,
    "preview": "[package]\nname = \"llmfit-core\"\nversion.workspace = true\nedition = \"2024\"\nauthors = [\"Alex Jones <alex@example.com>\"]\ndes"
  },
  {
    "path": "llmfit-core/data/docker_models.json",
    "chars": 10606,
    "preview": "{\n  \"generated_by\": \"scrape_docker_models.py\",\n  \"docker_hub_repo_count\": 46,\n  \"matched_model_count\": 35,\n  \"models\": ["
  },
  {
    "path": "llmfit-core/data/hf_models.json",
    "chars": 342482,
    "preview": "[\n  {\n    \"name\": \"echarlaix/tiny-random-PhiForCausalLM\",\n    \"provider\": \"echarlaix\",\n    \"parameter_count\": \"80K\",\n   "
  },
  {
    "path": "llmfit-core/src/fit.rs",
    "chars": 67434,
    "preview": "use crate::hardware::{GpuBackend, SystemSpecs};\nuse crate::models::{self, LlmModel, UseCase};\n\n/// Inference runtime — t"
  },
  {
    "path": "llmfit-core/src/hardware.rs",
    "chars": 92394,
    "preview": "use std::collections::BTreeMap;\nuse sysinfo::System;\n\n/// The acceleration backend for inference speed estimation.\n#[der"
  },
  {
    "path": "llmfit-core/src/lib.rs",
    "chars": 596,
    "preview": "pub mod fit;\npub mod hardware;\npub mod models;\npub mod plan;\npub mod providers;\n\npub use fit::{FitLevel, InferenceRuntim"
  },
  {
    "path": "llmfit-core/src/models.rs",
    "chars": 50545,
    "preview": "use std::collections::HashMap;\n\nuse serde::{Deserialize, Serialize};\n\n/// Quantization levels ordered from best quality "
  },
  {
    "path": "llmfit-core/src/plan.rs",
    "chars": 36460,
    "preview": "use crate::fit::{FitLevel, RunMode};\nuse crate::hardware::{GpuBackend, SystemSpecs};\nuse crate::models::{LlmModel, quant"
  },
  {
    "path": "llmfit-core/src/providers.rs",
    "chars": 105452,
    "preview": "//! Runtime model providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).\n//!\n//! Each provider can list loc"
  },
  {
    "path": "llmfit-desktop/Cargo.toml",
    "chars": 656,
    "preview": "[package]\nname = \"llmfit-desktop\"\nversion.workspace = true\nedition = \"2024\"\nauthors = [\"axjns\"]\ndescription = \"macOS des"
  },
  {
    "path": "llmfit-desktop/build.rs",
    "chars": 39,
    "preview": "fn main() {\n    tauri_build::build()\n}\n"
  },
  {
    "path": "llmfit-desktop/capabilities/default.json",
    "chars": 147,
    "preview": "{\n  \"identifier\": \"default\",\n  \"description\": \"Default permissions for llmfit desktop\",\n  \"windows\": [\"main\"],\n  \"permis"
  },
  {
    "path": "llmfit-desktop/src/main.rs",
    "chars": 6095,
    "preview": "#![cfg_attr(not(debug_assertions), windows_subsystem = \"windows\")]\n\nuse llmfit_core::fit::{FitLevel, InferenceRuntime, M"
  },
  {
    "path": "llmfit-desktop/tauri.conf.json",
    "chars": 437,
    "preview": "{\n  \"productName\": \"llmfit\",\n  \"version\": \"0.4.8\",\n  \"identifier\": \"com.llmfit.desktop\",\n  \"build\": {\n    \"frontendDist\""
  },
  {
    "path": "llmfit-desktop/ui/app.js",
    "chars": 9759,
    "preview": "const invoke = window.__TAURI_INTERNALS__\n  ? window.__TAURI_INTERNALS__.invoke\n  : async (cmd) => { console.warn('Tauri"
  },
  {
    "path": "llmfit-desktop/ui/index.html",
    "chars": 2313,
    "preview": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n  <meta charset=\"UTF-8\">\n  <meta name=\"viewport\" content=\"width=device-width, in"
  },
  {
    "path": "llmfit-desktop/ui/styles.css",
    "chars": 5636,
    "preview": ":root {\n  --bg: #0d1117;\n  --surface: #161b22;\n  --border: #30363d;\n  --text: #e6edf3;\n  --text-dim: #8b949e;\n  --accent"
  },
  {
    "path": "llmfit-tui/Cargo.toml",
    "chars": 973,
    "preview": "[package]\nname = \"llmfit\"\nversion.workspace = true\nedition = \"2024\"\nbuild = \"build.rs\"\nauthors = [\"Alex Jones <alex@exam"
  },
  {
    "path": "llmfit-tui/build.rs",
    "chars": 5181,
    "preview": "use std::env;\nuse std::fs;\nuse std::path::{Path, PathBuf};\n\nfn main() {\n    println!(\"cargo:rerun-if-changed=build.rs\");"
  },
  {
    "path": "llmfit-tui/src/display.rs",
    "chars": 18238,
    "preview": "use colored::*;\nuse llmfit_core::fit::{FitLevel, ModelFit};\nuse llmfit_core::hardware::SystemSpecs;\nuse llmfit_core::mod"
  },
  {
    "path": "llmfit-tui/src/main.rs",
    "chars": 57385,
    "preview": "mod display;\nmod serve_api;\nmod theme;\nmod tui_app;\nmod tui_events;\nmod tui_ui;\n\nuse clap::{Parser, Subcommand};\nuse std"
  },
  {
    "path": "llmfit-tui/src/serve_api.rs",
    "chars": 23708,
    "preview": "use std::collections::HashMap;\nuse std::net::{IpAddr, SocketAddr};\nuse std::sync::{Arc, LazyLock};\n\nuse axum::extract::{"
  },
  {
    "path": "llmfit-tui/src/theme.rs",
    "chars": 17605,
    "preview": "use ratatui::style::Color;\nuse std::fs;\nuse std::path::PathBuf;\n\n/// Available color themes for the TUI.\n#[derive(Debug,"
  },
  {
    "path": "llmfit-tui/src/tui_app.rs",
    "chars": 62699,
    "preview": "use llmfit_core::fit::{FitLevel, ModelFit, SortColumn, backend_compatible};\nuse llmfit_core::hardware::SystemSpecs;\nuse "
  },
  {
    "path": "llmfit-tui/src/tui_events.rs",
    "chars": 11342,
    "preview": "use crossterm::event::{self, Event, KeyCode, KeyEvent, KeyEventKind, KeyModifiers};\nuse std::time::Duration;\n\nuse crate:"
  },
  {
    "path": "llmfit-tui/src/tui_ui.rs",
    "chars": 93286,
    "preview": "use ratatui::{\n    Frame,\n    layout::{Constraint, Direction, Layout, Rect},\n    style::{Color, Modifier, Style},\n    te"
  },
  {
    "path": "llmfit-web/README.md",
    "chars": 343,
    "preview": "# llmfit-web\n\nReact + Vite frontend for the llmfit local web dashboard.\n\n## Development\n\n```sh\nnpm ci\nnpm run dev\n```\n\nT"
  },
  {
    "path": "llmfit-web/index.html",
    "chars": 307,
    "preview": "<!doctype html>\n<html lang=\"en\">\n  <head>\n    <meta charset=\"UTF-8\" />\n    <meta name=\"viewport\" content=\"width=device-w"
  },
  {
    "path": "llmfit-web/package.json",
    "chars": 534,
    "preview": "{\n  \"name\": \"llmfit-web\",\n  \"private\": true,\n  \"version\": \"0.1.0\",\n  \"type\": \"module\",\n  \"scripts\": {\n    \"dev\": \"vite\","
  },
  {
    "path": "llmfit-web/src/App.jsx",
    "chars": 18646,
    "preview": "import { useEffect, useMemo, useState } from 'react';\nimport { DEFAULT_FILTERS, fetchModels, fetchSystemInfo } from './a"
  },
  {
    "path": "llmfit-web/src/App.test.jsx",
    "chars": 6158,
    "preview": "import { fireEvent, render, screen, waitFor } from '@testing-library/react';\nimport App from './App';\n\nfunction jsonResp"
  },
  {
    "path": "llmfit-web/src/api.js",
    "chars": 2247,
    "preview": "export const DEFAULT_FILTERS = {\n  search: '',\n  minFit: 'marginal',\n  runtime: 'any',\n  useCase: 'all',\n  provider: '',"
  },
  {
    "path": "llmfit-web/src/api.test.js",
    "chars": 1940,
    "preview": "import { buildModelsQuery } from './api';\n\ndescribe('buildModelsQuery', () => {\n  it('maps filter state to API query par"
  },
  {
    "path": "llmfit-web/src/main.jsx",
    "chars": 236,
    "preview": "import React from 'react';\nimport ReactDOM from 'react-dom/client';\nimport App from './App';\nimport './styles.css';\n\nRea"
  },
  {
    "path": "llmfit-web/src/styles.css",
    "chars": 10441,
    "preview": ":root {\n  --font-ui: \"Sora\", \"Manrope\", \"Avenir Next\", \"Segoe UI\", sans-serif;\n  --radius-xl: 22px;\n  --radius-lg: 16px;"
  },
  {
    "path": "llmfit-web/src/test-setup.js",
    "chars": 43,
    "preview": "import '@testing-library/jest-dom/vitest';\n"
  },
  {
    "path": "llmfit-web/vite.config.js",
    "chars": 380,
    "preview": "import { defineConfig } from 'vite';\nimport react from '@vitejs/plugin-react';\n\nexport default defineConfig({\n  plugins:"
  },
  {
    "path": "scripts/install-openclaw-skill.sh",
    "chars": 967,
    "preview": "#!/usr/bin/env bash\nset -euo pipefail\n\n# Install the llmfit-advisor skill for OpenClaw\n# Usage: ./scripts/install-opencl"
  },
  {
    "path": "scripts/scrape_docker_models.py",
    "chars": 10464,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nScraper for Docker Model Runner available models.\n\nQueries the Docker Hub API for models in t"
  },
  {
    "path": "scripts/scrape_hf_models.py",
    "chars": 86210,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nScraper for popular LLM models from Hugging Face.\nFetches model metadata and computes RAM/VRA"
  },
  {
    "path": "scripts/test_api.py",
    "chars": 9630,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nLocal API validation tests for llmfit serve.\n\nUsage:\n  # Test an already-running server\n  pyt"
  },
  {
    "path": "scripts/update_models.sh",
    "chars": 3223,
    "preview": "#!/usr/bin/env bash\n# Automated model database update script for llmfit\n# This script:\n# 1. Runs the HuggingFace model s"
  },
  {
    "path": "scripts/verify_models.py",
    "chars": 5264,
    "preview": "#!/usr/bin/env python3\n\"\"\"Verify that all models in hf_models.json exist on HuggingFace and all\nOllama mappings in src/p"
  },
  {
    "path": "skills/llmfit-advisor/SKILL.md",
    "chars": 6974,
    "preview": "---\nname: llmfit-advisor\ndescription: Detect local hardware (RAM, CPU, GPU/VRAM) and recommend the best-fit local LLM mo"
  }
]

About this extraction

This page contains the full source code of the AlexsJones/llmfit GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 68 files (1.6 MB), approximately 495.2k tokens, and a symbol index with 895 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
