[
  {
    "path": ".dockerignore",
    "content": "# Build artifacts\ntarget/\n*.swp\n*.swo\n*~\n\n# IDE\n.vscode/\n.idea/\n*.iml\n\n# Git\n.git/\n.gitignore\n.githooks/\n\n# Documentation and assets\n*.md\n!README.md\n*.png\n*.gif\ndemo.gif\ndownload.gif\nhome_laptop.png\nmoe.png\nassets/\n.github/\n\n# Scripts (not needed in container)\nscripts/\ninstall.sh\n\n# Website files\nindex.html\nCNAME\n\n# Release metadata\n.release-please-manifest.json\n\n# Skills (OpenClaw integration, not needed in container)\nskills/\nAGENTS.md\n"
  },
  {
    "path": ".githooks/pre-push",
    "content": "#!/usr/bin/env bash\nset -e\n\necho \"Running cargo fmt --check...\"\n# Test the command directly: under `set -e`, a separate `$?` check would never run.\nif ! cargo fmt --check; then\n    echo \"❌ Formatting issues found. Run 'cargo fmt' to fix.\"\n    exit 1\nfi\n\necho \"✅ Formatting OK\"\n"
  },
  {
    "path": ".github/dependabot.yml",
    "content": "version: 2\nupdates:\n  - package-ecosystem: cargo\n    directory: /\n    schedule:\n      interval: weekly\n    cooldown: # applies only to version-updates (not security-updates)\n      default-days: 10\n      semver-minor-days: 14 # wait 14 days before applying minor updates\n      semver-major-days: 28\n  - package-ecosystem: github-actions\n    directory: /\n    schedule:\n      interval: weekly\n    cooldown:\n      default-days: 10\n"
  },
  {
    "path": ".github/workflows/ci.yml",
    "content": "name: CI\n\non:\n  push:\n    branches:\n      - main    # Run after merging into the \"main\" target\n    paths:      # Run only if files relevant to CI have changed (i.e. not install.sh, scripts or .github/workflows)\n      - 'llmfit-*/**'\n      - 'Cargo.*o*'\n  pull_request:\n    branches:\n      - main    # Run in PRs targeting the \"main\" branch\n    paths:\n      - 'llmfit-*/**'\n      - 'Cargo.*o*'\n    types:      # Avoid low-impact events like \"edited\" or \"labeled\"\n      - opened\n      - synchronize\n      - reopened\n\nenv:\n  CARGO_TERM_COLOR: always\n  RUST_BACKTRACE: 1\n\njobs:\n  test:\n    name: Test Suite\n    runs-on: ${{ matrix.os }}\n    strategy:\n      matrix:\n        os: [ubuntu-latest, macos-latest, windows-latest]\n        rust: [stable]\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2\n\n      - name: Install Rust toolchain\n        uses: dtolnay/rust-toolchain@stable\n        with:\n          toolchain: ${{ matrix.rust }}\n\n      - name: Set up Node\n        uses: actions/setup-node@v4\n        with:\n          node-version: lts/*\n          cache: npm\n          cache-dependency-path: llmfit-web/package-lock.json\n\n      - name: Build web dashboard assets\n        run: |\n          cd llmfit-web\n          npm ci\n          npm run build\n\n      - name: Run web unit tests\n        run: |\n          cd llmfit-web\n          npm test\n\n      - name: Cache cargo registry\n        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3\n        with:\n          path: ~/.cargo/registry\n          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}\n          restore-keys: |\n            ${{ runner.os }}-cargo-registry-\n\n      - name: Cache cargo index\n        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3\n        with:\n          path: ~/.cargo/git\n          key: ${{ runner.os 
}}-cargo-index-${{ hashFiles('**/Cargo.lock') }}\n          restore-keys: |\n            ${{ runner.os }}-cargo-index-\n\n      - name: Cache target directory\n        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3\n        with:\n          path: target\n          key: ${{ runner.os }}-cargo-target-${{ hashFiles('**/Cargo.lock') }}\n          restore-keys: |\n            ${{ runner.os }}-cargo-target-\n\n      - name: Run tests\n        run: cargo test --verbose\n\n  fmt:\n    name: Rustfmt\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2\n\n      - name: Install Rust toolchain\n        uses: dtolnay/rust-toolchain@stable\n        with:\n          components: rustfmt\n\n      - name: Check formatting\n        run: cargo fmt --all -- --check\n\n  clippy:\n    name: Clippy\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2\n\n      - name: Install Rust toolchain\n        uses: dtolnay/rust-toolchain@stable\n        with:\n          components: clippy\n\n      - name: Set up Node\n        uses: actions/setup-node@v4\n        with:\n          node-version: lts/*\n          cache: npm\n          cache-dependency-path: llmfit-web/package-lock.json\n\n      - name: Build web dashboard assets\n        run: |\n          cd llmfit-web\n          npm ci\n          npm run build\n\n      - name: Cache cargo registry\n        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3\n        with:\n          path: ~/.cargo/registry\n          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}\n          restore-keys: |\n            ${{ runner.os }}-cargo-registry-\n\n      - name: Cache cargo index\n        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3\n        with:\n          path: ~/.cargo/git\n 
         key: ${{ runner.os }}-cargo-index-${{ hashFiles('**/Cargo.lock') }}\n          restore-keys: |\n            ${{ runner.os }}-cargo-index-\n\n      - name: Cache target directory\n        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3\n        with:\n          path: target\n          key: ${{ runner.os }}-cargo-target-${{ hashFiles('**/Cargo.lock') }}\n          restore-keys: |\n            ${{ runner.os }}-cargo-target-\n\n      - name: Run clippy\n        run: cargo clippy --all-targets --all-features\n\n  check:\n    name: Cargo Check\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout code\n        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2\n\n      - name: Install Rust toolchain\n        uses: dtolnay/rust-toolchain@stable\n\n      - name: Set up Node\n        uses: actions/setup-node@v4\n        with:\n          node-version: lts/*\n          cache: npm\n          cache-dependency-path: llmfit-web/package-lock.json\n\n      - name: Build web dashboard assets\n        run: |\n          cd llmfit-web\n          npm ci\n          npm run build\n\n      - name: Cache cargo registry\n        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3\n        with:\n          path: ~/.cargo/registry\n          key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}\n          restore-keys: |\n            ${{ runner.os }}-cargo-registry-\n\n      - name: Cache cargo index\n        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3\n        with:\n          path: ~/.cargo/git\n          key: ${{ runner.os }}-cargo-index-${{ hashFiles('**/Cargo.lock') }}\n          restore-keys: |\n            ${{ runner.os }}-cargo-index-\n\n      - name: Cache target directory\n        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3\n        with:\n          path: target\n          key: ${{ runner.os }}-cargo-target-${{ 
hashFiles('**/Cargo.lock') }}\n          restore-keys: |\n            ${{ runner.os }}-cargo-target-\n\n      - name: Run cargo check\n        run: cargo check --all-targets --all-features\n"
  },
  {
    "path": ".github/workflows/docker.yml",
    "content": "name: Docker Build and Push\n\non:\n  push:\n    tags:\n      - \"v*\"\n      - \"!v*-mac\"\n  workflow_dispatch: # Allow manual trigger\n\npermissions:\n  contents: read\n  packages: write\n\nenv:\n  REGISTRY: ghcr.io\n  IMAGE_NAME: ${{ github.repository }}\n\njobs:\n  docker:\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout\n        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2\n\n      - name: Set up QEMU\n        uses: docker/setup-qemu-action@v4\n\n      - name: Set up Docker Buildx\n        uses: docker/setup-buildx-action@v4\n\n      - name: Log in to GitHub Container Registry\n        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3\n        with:\n          registry: ${{ env.REGISTRY }}\n          username: ${{ github.actor }}\n          password: ${{ secrets.GITHUB_TOKEN }}\n\n      - name: Extract metadata (tags, labels)\n        id: meta\n        uses: docker/metadata-action@030e881283bb7a6894de51c315a6bfe6a94e05cf # v6.0.0\n        with:\n          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}\n          tags: |\n            type=semver,pattern={{version}}\n            type=semver,pattern={{major}}.{{minor}}\n            type=semver,pattern={{major}}\n            type=raw,value=latest,enable={{is_default_branch}}\n\n      - name: Build and push Docker image\n        uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0\n        with:\n          context: .\n          platforms: linux/amd64,linux/arm64\n          push: true\n          tags: ${{ steps.meta.outputs.tags }}\n          labels: ${{ steps.meta.outputs.labels }}\n          cache-from: type=gha\n          cache-to: type=gha,mode=max\n"
  },
  {
    "path": ".github/workflows/release-desktop.yml",
    "content": "name: Release Desktop (macOS)\n\non:\n  push:\n    tags:\n      - \"v*-mac\"\n\npermissions:\n  contents: write\n\nenv:\n  CARGO_TERM_COLOR: always\n\njobs:\n  build-desktop:\n    strategy:\n      matrix:\n        include:\n          - target: aarch64-apple-darwin\n            os: macos-latest\n          - target: x86_64-apple-darwin\n            os: macos-latest\n\n    runs-on: ${{ matrix.os }}\n\n    steps:\n      - name: Checkout\n        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2\n\n      - name: Install Rust toolchain\n        uses: dtolnay/rust-toolchain@stable\n        with:\n          targets: ${{ matrix.target }}\n\n      - name: Install Tauri CLI\n        run: cargo install tauri-cli --version \"^2\"\n\n      - name: Build Tauri app bundle\n        run: cargo tauri build --target ${{ matrix.target }} --bundles app\n        working-directory: llmfit-desktop\n\n      - name: Package .app bundle\n        shell: bash\n        run: |\n          TAG=\"${GITHUB_REF_NAME}\"\n          # Search both possible target locations\n          for BASE in \"target\" \"llmfit-desktop/target\"; do\n            APP=$(find \"${BASE}/${{ matrix.target }}/release/bundle\" -name '*.app' -maxdepth 3 2>/dev/null | head -1)\n            [ -n \"$APP\" ] && break\n          done\n\n          if [ -z \"$APP\" ]; then\n            echo \"::error::No .app bundle found\"\n            find target/ llmfit-desktop/target/ -type d -name 'bundle' 2>/dev/null || true\n            exit 1\n          fi\n\n          echo \"Found app bundle: $APP\"\n          DEST=\"llmfit-desktop-${TAG}-${{ matrix.target }}.app.tar.gz\"\n          cd \"$(dirname \"$APP\")\"\n          tar czf \"/tmp/${DEST}\" \"$(basename \"$APP\")\"\n          echo \"DESKTOP_ASSET=${DEST}\" >> \"$GITHUB_ENV\"\n          echo \"DESKTOP_ASSET_PATH=/tmp/${DEST}\" >> \"$GITHUB_ENV\"\n\n      - name: Upload artifact\n        uses: 
actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0\n        with:\n          name: ${{ env.DESKTOP_ASSET }}\n          path: ${{ env.DESKTOP_ASSET_PATH }}\n\n  release:\n    needs: build-desktop\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Download all artifacts\n        uses: actions/download-artifact@70fc10c6e5e1ce46ad2ea6f2b72d43f7d47b13c3 # v8.0.0\n        with:\n          path: artifacts\n\n      - name: Create GitHub Release\n        uses: softprops/action-gh-release@a06a81a03ee405af7f2048a818ed3f03bbf83c7b # v2.5.0\n        with:\n          generate_release_notes: true\n          files: artifacts/**/*.tar.gz\n"
  },
  {
    "path": ".github/workflows/release.yml",
    "content": "name: Release\n\non:\n  push:\n    tags:\n      - \"v*\"\n      - \"!v*-mac\"\n\npermissions:\n  contents: write\n\nenv:\n  CARGO_TERM_COLOR: always\n  BINARY: llmfit\n\njobs:\n  build:\n    strategy:\n      matrix:\n        include:\n          # Linux x86_64 (static musl)\n          - target: x86_64-unknown-linux-musl\n            os: ubuntu-latest\n            use-cross: true\n\n          # Linux ARM64 (static musl)\n          - target: aarch64-unknown-linux-musl\n            os: ubuntu-latest\n            use-cross: true\n\n          # Linux x86_64 (glibc)\n          - target: x86_64-unknown-linux-gnu\n            os: ubuntu-latest\n            use-cross: false\n\n          # Linux ARM64 (glibc)\n          - target: aarch64-unknown-linux-gnu\n            os: ubuntu-latest\n            use-cross: true\n\n          # macOS Intel (cross-compiled from ARM64 runner)\n          - target: x86_64-apple-darwin\n            os: macos-latest\n            use-cross: false\n\n          # macOS Apple Silicon\n          - target: aarch64-apple-darwin\n            os: macos-latest\n            use-cross: false\n\n          # Windows x86_64\n          - target: x86_64-pc-windows-msvc\n            os: windows-latest\n            use-cross: false\n\n          # Windows ARM64\n          - target: aarch64-pc-windows-msvc\n            os: windows-latest\n            use-cross: false\n\n    runs-on: ${{ matrix.os }}\n\n    steps:\n      - name: Checkout\n        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2\n\n      - name: Install Rust toolchain\n        uses: dtolnay/rust-toolchain@stable\n        with:\n          targets: ${{ matrix.target }}\n\n      - name: Build web dashboard assets\n        run: |\n          cd llmfit-web\n          npm ci\n          npm run build\n\n      - name: Install cross\n        if: matrix.use-cross\n        run: cargo install cross --version 0.2.5\n\n      - name: Build\n        shell: bash\n        run: |\n   
       if [ \"${{ matrix.use-cross }}\" = \"true\" ]; then\n            cross build --release --target ${{ matrix.target }}\n          else\n            cargo build --release --target ${{ matrix.target }}\n          fi\n\n      - name: Package\n        shell: bash\n        run: |\n          TAG=\"${GITHUB_REF_NAME}\"\n          ASSET_NAME=\"${BINARY}-${TAG}-${{ matrix.target }}\"\n          STAGING=\"${RUNNER_TEMP}/${ASSET_NAME}\"\n          mkdir -p \"${STAGING}\"\n\n          if [[ \"${{ matrix.target }}\" == *\"windows\"* ]]; then\n            EXE_EXT=\".exe\"\n            ARCHIVE_EXT=\".zip\"\n            COMPRESS_CMD=\"7z a ${ASSET_NAME}${ARCHIVE_EXT} ${ASSET_NAME}\"\n          else\n            EXE_EXT=\"\"\n            ARCHIVE_EXT=\".tar.gz\"\n            COMPRESS_CMD=\"tar czf ${ASSET_NAME}${ARCHIVE_EXT} ${ASSET_NAME}\"\n          fi\n\n          cp \"target/${{ matrix.target }}/release/${BINARY}${EXE_EXT}\" \"${STAGING}/\"\n          cp README.md LICENSE \"${STAGING}/\" 2>/dev/null || true\n\n          cd \"${RUNNER_TEMP}\"\n          $COMPRESS_CMD\n\n          # Generate per-asset SHA256 checksum file (consistent format for both tools)\n          if command -v sha256sum >/dev/null 2>&1; then\n            sha256sum \"${ASSET_NAME}${ARCHIVE_EXT}\" | awk '{print $1 \"  \" $2}' > \"${ASSET_NAME}${ARCHIVE_EXT}.sha256\"\n          elif command -v shasum >/dev/null 2>&1; then\n            shasum -a 256 \"${ASSET_NAME}${ARCHIVE_EXT}\" | awk '{print $1 \"  \" $2}' > \"${ASSET_NAME}${ARCHIVE_EXT}.sha256\"\n          fi\n\n          echo \"ASSET=${ASSET_NAME}${ARCHIVE_EXT}\" >> \"$GITHUB_ENV\"\n          echo \"ASSET_PATH=${RUNNER_TEMP}/${ASSET_NAME}${ARCHIVE_EXT}\" >> \"$GITHUB_ENV\"\n          echo \"CHECKSUM_PATH=${RUNNER_TEMP}/${ASSET_NAME}${ARCHIVE_EXT}.sha256\" >> \"$GITHUB_ENV\"\n\n      - name: Upload artifact\n        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0\n        with:\n          name: ${{ env.ASSET }}\n          
path: |\n            ${{ env.ASSET_PATH }}\n            ${{ env.CHECKSUM_PATH }}\n\n  release:\n    needs: build\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Download all artifacts\n        uses: actions/download-artifact@70fc10c6e5e1ce46ad2ea6f2b72d43f7d47b13c3 # v8.0.0\n        with:\n          path: artifacts\n\n      - name: Create GitHub Release\n        uses: softprops/action-gh-release@a06a81a03ee405af7f2048a818ed3f03bbf83c7b # v2.5.0\n        with:\n          generate_release_notes: true\n          files: |\n            artifacts/**/*.tar.gz\n            artifacts/**/*.zip\n            artifacts/**/*.sha256\n\n  publish-crate:\n    needs: release\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout\n        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2\n\n      - name: Install Rust toolchain\n        uses: dtolnay/rust-toolchain@stable\n\n      - name: Publish llmfit-core to crates.io\n        run: cargo publish -p llmfit-core --token ${{ secrets.CARGO_REGISTRY_TOKEN }}\n\n      - name: Wait for crates.io index update\n        run: sleep 30\n\n      - name: Publish llmfit to crates.io\n        run: cargo publish -p llmfit --token ${{ secrets.CARGO_REGISTRY_TOKEN }}\n\n  update-homebrew:\n    needs: release\n    runs-on: ubuntu-latest\n\n    steps:\n      - name: Checkout tap\n        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2\n        with:\n          repository: AlexsJones/homebrew-llmfit\n          token: ${{ secrets.HOMEBREW_TAP_TOKEN }}\n          path: homebrew-tap\n\n      - name: Download release assets and update formula\n        env:\n          TAG: ${{ github.ref_name }}\n        run: |\n          VERSION=\"${TAG#v}\"\n\n          # Download the four tarballs and compute SHA256\n          declare -A SHAS\n          for target in aarch64-apple-darwin x86_64-apple-darwin aarch64-unknown-linux-musl x86_64-unknown-linux-musl; do\n            
URL=\"https://github.com/AlexsJones/llmfit/releases/download/${TAG}/llmfit-${TAG}-${target}.tar.gz\"\n            echo \"Downloading ${URL}...\"\n            SHA=$(curl -fsSL \"$URL\" | shasum -a 256 | awk '{print $1}')\n            SHAS[$target]=\"$SHA\"\n            echo \"${target}: ${SHA}\"\n          done\n\n          # Generate the formula\n          cat > homebrew-tap/Formula/llmfit.rb << RUBY\n          class Llmfit < Formula\n            desc \"Terminal tool that right-sizes LLM models to your system hardware\"\n            homepage \"https://github.com/AlexsJones/llmfit\"\n            version \"${VERSION}\"\n            license \"MIT\"\n\n            on_macos do\n              if Hardware::CPU.arm?\n                url \"https://github.com/AlexsJones/llmfit/releases/download/v#{version}/llmfit-v#{version}-aarch64-apple-darwin.tar.gz\"\n                sha256 \"${SHAS[aarch64-apple-darwin]}\"\n              else\n                url \"https://github.com/AlexsJones/llmfit/releases/download/v#{version}/llmfit-v#{version}-x86_64-apple-darwin.tar.gz\"\n                sha256 \"${SHAS[x86_64-apple-darwin]}\"\n              end\n            end\n\n            on_linux do\n              if Hardware::CPU.arm?\n                url \"https://github.com/AlexsJones/llmfit/releases/download/v#{version}/llmfit-v#{version}-aarch64-unknown-linux-musl.tar.gz\"\n                sha256 \"${SHAS[aarch64-unknown-linux-musl]}\"\n              else\n                url \"https://github.com/AlexsJones/llmfit/releases/download/v#{version}/llmfit-v#{version}-x86_64-unknown-linux-musl.tar.gz\"\n                sha256 \"${SHAS[x86_64-unknown-linux-musl]}\"\n              end\n            end\n\n            def install\n              bin.install \"llmfit\"\n            end\n\n            test do\n              assert_match \"llmfit\", shell_output(\"#{bin}/llmfit --help\")\n            end\n          end\n          RUBY\n\n          # Fix heredoc indentation\n          sed -i 's/^     
     //' homebrew-tap/Formula/llmfit.rb\n\n      - name: Commit and push\n        run: |\n          cd homebrew-tap\n          git config user.name \"github-actions[bot]\"\n          git config user.email \"github-actions[bot]@users.noreply.github.com\"\n          git add Formula/llmfit.rb\n          git commit -m \"Update llmfit to ${GITHUB_REF_NAME}\"\n          git push\n"
  },
  {
    "path": ".gitignore",
    "content": "/target\nllmfit\n/docs\nllmfit-desktop/gen/schemas/*\ndata/gguf_sources_cache.json\nllmfit-web/node_modules/\nllmfit-web/dist/\n"
  },
  {
    "path": ".release-please-manifest.json",
    "content": "{\n  \".\": \"0.3.7\"\n}\n"
  },
  {
    "path": "AGENTS.md",
    "content": "# AGENTS.md\n\nInstructions for AI agents contributing to this codebase.\n\n---\n\n## Project overview\n\n`llmfit` is a Rust CLI/TUI tool that matches LLM models against local system hardware (RAM, CPU, GPU). It detects system specs, loads a model database from embedded JSON, scores each model's fit, and presents results in an interactive terminal UI or classic table output.\n\n## Language and toolchain\n\n- Rust, edition 2024.\n- Build with `cargo build`. Run with `cargo run`.\n- No nightly features required. Stable toolchain only.\n- Minimum supported Rust version: whatever edition 2024 requires (1.85+).\n\n## Architecture\n\n```\nmain.rs          Entrypoint. Parses CLI args via clap. Launches TUI by default,\n                 falls back to CLI subcommands (system, list, fit, search, info)\n                 or --cli flag for classic table output.\n\nhardware.rs      SystemSpecs::detect() reads RAM/CPU via sysinfo crate.\n                 detect_gpu() shells out to nvidia-smi / rocm-smi, and\n                 detects Apple Silicon via system_profiler.\n                 On unified memory (Apple Silicon), VRAM = system RAM.\n                 No async. No unsafe.\n\nmodels.rs        LlmModel struct. ModelDatabase loads from data/hf_models.json\n                 embedded via include_str!() at compile time. No runtime file I/O.\n\nfit.rs           FitLevel enum (Perfect, Good, Marginal, TooTight).\n                 RunMode enum (Gpu, CpuOffload, CpuOnly).\n                 ModelFit::analyze() compares a model against SystemSpecs,\n                 selecting the best available execution path (GPU > CPU offload > CPU).\n                 rank_models_by_fit() sorts by fit level, then run mode, then utilization.\n\ndisplay.rs       CLI-mode table rendering using the tabled crate.\n                 Only used when --cli flag or subcommands are invoked.\n\ntui_app.rs       TUI application state. 
Holds all models, filters (search text,\n                 provider toggles, fit filter), selection index.\n                 All filtering logic is here -- apply_filters() recomputes\n                 filtered_fits indices whenever inputs change.\n\ntui_ui.rs        Rendering with ratatui. Four layout regions: system bar,\n                 search/filter bar, model table (or detail pane), status bar.\n                 Stateless rendering -- reads from App, writes to Frame.\n\ntui_events.rs    Keyboard event handling with crossterm. Two modes: Normal\n                 (navigation, filter toggling, quit) and Search (text input).\n```\n\n## Data flow\n\n1. `App::new()` calls `SystemSpecs::detect()` and `ModelDatabase::new()`.\n2. Every model is analyzed into a `ModelFit` via `ModelFit::analyze()`.\n3. Results are sorted by `rank_models_by_fit()`.\n4. `apply_filters()` produces `filtered_fits: Vec<usize>` (indices into `all_fits`).\n5. The TUI render loop reads `App` state and draws via `tui_ui::draw()`.\n6. `tui_events::handle_events()` mutates `App` state, triggering re-render.\n\n## Model database\n\n- Source: `data/hf_models.json` (33 models).\n- Generated by `scripts/scrape_hf_models.py` (Python, stdlib only, no pip deps).\n- Embedded at compile time via `include_str!(\"../data/hf_models.json\")`.\n- Schema per entry: name, provider, parameter_count, min_ram_gb, recommended_ram_gb, min_vram_gb, quantization, context_length, use_case.\n- `min_vram_gb` is VRAM needed for GPU inference. `min_ram_gb` is system RAM needed for CPU inference. Both are derived from the same parameter count.\n- RAM formula: `params * 0.5 bytes (Q4_K_M) / 1024^3 * 1.2 overhead`.\n- VRAM formula: `params * 0.5 bytes (Q4_K_M) / 1024^3 * 1.1 activation overhead`.\n- Recommended RAM: `model_size * 2.0`.\n\nDo not manually edit `hf_models.json`. 
Regenerate it by running the scraper:\n\n```sh\npython3 scripts/scrape_hf_models.py\n```\n\nThe scraper has hardcoded fallback entries for gated models that require authentication.\n\n## Conventions\n\n- No `unsafe` code.\n- No `.unwrap()` on user-facing paths. Use proper error handling or `expect()` with a descriptive message for internal invariants only.\n- Fit levels are ordered: Perfect > Good > Marginal > TooTight. Do not add levels without updating `rank_models_by_fit()` sort logic.\n- Fit is VRAM-first. GPU inference with sufficient VRAM is the ideal path. CPU inference via system RAM is a fallback. The `RunMode` enum tracks which memory pool is being used (Gpu, CpuOffload, CpuOnly).\n- `min_vram_gb` is the VRAM needed to load model weights on GPU. `min_ram_gb` is the system RAM needed for CPU-only inference (same weights, loaded into RAM instead). They represent the same workload on different hardware paths.\n- On Apple Silicon (unified memory), VRAM = system RAM. The `CpuOffload` path is skipped because there is no separate RAM pool to spill to. `SystemSpecs::unified_memory` tracks this.\n- TUI rendering is stateless. `tui_ui::draw()` must not mutate `App`. Pass `&mut App` only for `TableState` widget requirements -- do not use it to change application state.\n- Event handling in `tui_events.rs` is the sole place that mutates `App` in the TUI loop.\n- Keep `display.rs` and `tui_*.rs` independent. The CLI path must work without initializing any TUI state.\n\n## Adding a new model to the database\n\n1. Add the model's HuggingFace repo ID to `TARGET_MODELS` in `scripts/scrape_hf_models.py`.\n2. If the model is gated (requires HF auth), add a fallback entry to the `FALLBACK` dict in the same script.\n3. Run `python3 scripts/scrape_hf_models.py`.\n4. Verify the output in `data/hf_models.json`.\n5. Run `cargo build` to verify compilation.\n\n## Adding a new filter\n\n1. Add the filter state to `App` in `tui_app.rs`.\n2. 
Add filtering logic inside `apply_filters()`.\n3. Add the keybinding in `tui_events.rs` (Normal mode handler).\n4. Add the UI widget in `tui_ui.rs` (`draw_search_and_filters()` function).\n5. Update the status bar help text in `draw_status_bar()`.\n\n## Adding a new CLI subcommand\n\n1. Add a variant to the `Commands` enum in `main.rs`.\n2. Add the match arm in the `main()` function's command dispatch.\n3. Use `display.rs` functions for output, or add new ones as needed.\n\n## Testing\n\nThere are no tests yet. When adding tests:\n\n- Unit tests for `fit.rs` logic (given known SystemSpecs and LlmModel values, assert correct FitLevel).\n- Unit tests for `models.rs` (verify JSON parsing, search matching).\n- Integration tests for CLI subcommands via `assert_cmd` crate.\n- TUI is difficult to unit test. Keep rendering stateless and test the state mutations in `tui_app.rs` directly.\n\n## Dependencies policy\n\n- Prefer crates that are well-maintained and have minimal transitive dependencies.\n- `sysinfo` is the system detection crate. Do not replace it with raw platform calls.\n- `ratatui` + `crossterm` is the TUI stack. Do not mix in `termion` or `ncurses`.\n- `clap` with derive feature for CLI parsing. Do not use manual arg parsing.\n- The Python scraper uses only stdlib (`urllib`, `json`). Do not add pip dependencies.\n\n## Common tasks\n\n```sh\n# Build\ncargo build\n\n# Run TUI\ncargo run\n\n# Run CLI mode\ncargo run -- --cli\n\n# Run specific subcommand\ncargo run -- system\ncargo run -- fit --perfect -n 5\ncargo run -- search \"llama\"\n\n# Refresh model database\npython3 scripts/scrape_hf_models.py && cargo build\n\n# Check for compilation issues\ncargo check\n\n# Format code\ncargo fmt\n\n# Lint\ncargo clippy\n```\n\n## Platform notes\n\n- GPU detection shells out to `nvidia-smi` (NVIDIA) and `rocm-smi` (AMD). These are best-effort and fail silently if unavailable.\n- Apple Silicon detection uses `system_profiler SPDisplaysDataType`. 
On unified memory Macs, VRAM is reported as available system RAM (same pool).\n- `sysinfo` handles cross-platform RAM/CPU. No conditional compilation needed.\n- The TUI uses crossterm which works on Linux, macOS, and Windows terminals.\n"
  },
  {
    "path": "API.md",
    "content": "# llmfit REST API Guide\n\nThis document is for agent/client builders integrating with `llmfit serve`.\n\n## Purpose\n\n`llmfit serve` exposes node-local model fit analysis (same core data used by TUI/CLI) over HTTP and serves a local web dashboard.\n\nPrimary use case:\n- Query each node in a cluster for top runnable models.\n- Aggregate externally (scheduler/controller/UI) for placement decisions.\n\n## Start the server\n\n```sh\nllmfit serve --port 8787\n```\n\nGlobal flags still apply:\n\n```sh\nllmfit --memory 24G --max-context 8192 serve --port 8787\n```\n\n## Base URL\n\nDefault local base URL:\n\n```text\nhttp://127.0.0.1:8787\n```\n\nTo expose outside localhost, pass `--host 0.0.0.0`.\n\nIf you are building from source and want the dashboard embedded in `llmfit`, build web assets first:\n\n```sh\ncd llmfit-web && npm ci && npm run build\n```\n\n## Endpoints\n\n### `GET /`\nWeb dashboard entrypoint (same-origin UI for fit exploration).\n\n### `GET /health`\nLiveness probe.\n\nExample response:\n\n```json\n{\n  \"status\": \"ok\",\n  \"node\": {\n    \"name\": \"worker-1\",\n    \"os\": \"linux\"\n  }\n}\n```\n\n---\n\n### `GET /api/v1/system`\nReturns node identity + detected hardware.\n\nExample response shape:\n\n```json\n{\n  \"node\": {\n    \"name\": \"worker-1\",\n    \"os\": \"linux\"\n  },\n  \"system\": {\n    \"total_ram_gb\": 62.23,\n    \"available_ram_gb\": 41.08,\n    \"cpu_cores\": 14,\n    \"cpu_name\": \"Intel(R) Core(TM) Ultra 7 165U\",\n    \"has_gpu\": false,\n    \"gpu_vram_gb\": null,\n    \"gpu_name\": null,\n    \"gpu_count\": 0,\n    \"unified_memory\": false,\n    \"backend\": \"CPU (x86)\",\n    \"gpus\": []\n  }\n}\n```\n\n---\n\n### `GET /api/v1/models`\nReturns filtered/sorted model-fit rows for this node.\n\nEnvelope shape:\n\n```json\n{\n  \"node\": { \"name\": \"worker-1\", \"os\": \"linux\" },\n  \"system\": { \"...\": \"...\" },\n  \"total_models\": 23,\n  \"returned_models\": 10,\n  \"filters\": { \"...\": 
\"echo of query state\" },\n  \"models\": [\n    {\n      \"name\": \"Qwen/Qwen2.5-Coder-7B-Instruct\",\n      \"provider\": \"Qwen\",\n      \"parameter_count\": \"7B\",\n      \"params_b\": 7.0,\n      \"context_length\": 32768,\n      \"use_case\": \"Coding\",\n      \"category\": \"Coding\",\n      \"release_date\": \"2025-03-14\",\n      \"is_moe\": false,\n      \"fit_level\": \"good\",\n      \"fit_label\": \"Good\",\n      \"run_mode\": \"gpu\",\n      \"run_mode_label\": \"GPU\",\n      \"score\": 86.5,\n      \"score_components\": {\n        \"quality\": 87.0,\n        \"speed\": 81.2,\n        \"fit\": 90.1,\n        \"context\": 88.0\n      },\n      \"estimated_tps\": 42.5,\n      \"runtime\": \"llamacpp\",\n      \"runtime_label\": \"llama.cpp\",\n      \"best_quant\": \"Q5_K_M\",\n      \"memory_required_gb\": 5.8,\n      \"memory_available_gb\": 12.0,\n      \"utilization_pct\": 48.3,\n      \"notes\": [],\n      \"gguf_sources\": []\n    }\n  ]\n}\n```\n\n---\n\n### `GET /api/v1/models/top`\nKey scheduling endpoint. Same schema as `/api/v1/models`, but defaults to top 5 runnable entries.\n\nImportant behavior:\n- Defaults `limit=5`.\n- Excludes `too_tight` rows unless explicitly overridden (and top endpoint still keeps runnable semantics).\n\n---\n\n### `GET /api/v1/models/{name}`\nPath-constrained search. 
Equivalent to a text search scoped by `{name}`.\n\nUseful for:\n- Client-side drilldown after selecting a model family.\n\n## Query parameters\n\nSupported on `/api/v1/models`, `/api/v1/models/top`, and `/api/v1/models/{name}`:\n\n- `limit` (or alias `n`): maximum number of rows returned.\n- `perfect`: `true|false` (when `true`, only perfect fits).\n- `min_fit`: `perfect|good|marginal|too_tight`.\n- `runtime`: `any|mlx|llamacpp`.\n- `use_case`: `general|coding|reasoning|chat|multimodal|embedding`.\n- `provider`: provider substring filter.\n- `search`: free-text filter (name/provider/params/use-case/category).\n- `sort`: `score|tps|params|mem|ctx|date|use_case`.\n- `include_too_tight`: include unrunnable rows (defaults to `true` for `/models`, `false` for `/models/top`).\n- `max_context`: per-request context cap used by memory estimation.\n- `force_runtime`: `mlx|llamacpp|vllm`; overrides automatic runtime selection during analysis (e.g. to get llama.cpp recommendations on Apple Silicon instead of MLX).\n\n## Error handling\n\nInvalid filter values return HTTP 400:\n\n```json\n{\n  \"error\": \"invalid min_fit value: use perfect|good|marginal|too_tight\"\n}\n```\n\nServer errors return HTTP 500 with `{\"error\": \"...\"}`.\n\n## Client integration recommendations\n\n### 1) Polling pattern for schedulers\nFor each node agent:\n1. Call `/health`.\n2. Call `/api/v1/system`.\n3. Call `/api/v1/models/top?limit=K&min_fit=good`.\n4. 
Attach node metadata and forward to your central scheduler.\n\n### 2) Conservative placement defaults\nFor production placement, prefer:\n\n```text\nmin_fit=good\ninclude_too_tight=false\nsort=score\nlimit=5..20\n```\n\n### 3) Per-workload targeting\nExamples:\n- Coding workloads: `use_case=coding`\n- Embedding workloads: `use_case=embedding`\n- Runtime constrained to llama.cpp fleet: `runtime=llamacpp`\n\n### 4) Stable parsing\nTreat unknown fields as forward-compatible additions:\n- Parse required fields you depend on.\n- Ignore unknown fields.\n\n## Curl examples\n\n```sh\ncurl http://127.0.0.1:8787/health\ncurl http://127.0.0.1:8787/api/v1/system\ncurl \"http://127.0.0.1:8787/api/v1/models?limit=20&min_fit=marginal&sort=score\"\ncurl \"http://127.0.0.1:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding\"\ncurl \"http://127.0.0.1:8787/api/v1/models/Mistral?runtime=any\"\n```\n\n## Versioning notes\n\nCurrent API prefix is `v1`.\n\nIf you build long-lived clients, pin to `/api/v1/...` and validate behavior with the local test script in `scripts/test_api.py`.\n"
  },
  {
    "path": "CHANGELOG.md",
    "content": "# Changelog\n\n## [0.3.7](https://github.com/AlexsJones/llmfit/compare/v0.3.6...v0.3.7) (2026-02-21)\n\n\n### Features\n\n* add --memory flag to override GPU VRAM autodetection ([9a02f6e](https://github.com/AlexsJones/llmfit/commit/9a02f6e1616f59783ccff5b007c25213854f63b9))\n* add --memory flag to override GPU VRAM autodetection ([39c5486](https://github.com/AlexsJones/llmfit/commit/39c5486aa3d94f9b9ef36e29642b64d848d0d2b0))\n* add 15 popular models from HuggingFace ([128a020](https://github.com/AlexsJones/llmfit/commit/128a020323897a67ed5d12dd397bcf4924a6bf51))\n* Add 15 popular models from HuggingFace (33→48 models) ([c45606b](https://github.com/AlexsJones/llmfit/commit/c45606bdb235b6bfe616bb616b1364a97e76f0c1))\n* add homebrew tap support and update release workflow ([db09473](https://github.com/AlexsJones/llmfit/commit/db094734288d17a49d9c3c5c99859fe0d7dc976d))\n* added arc support ([b5892fc](https://github.com/AlexsJones/llmfit/commit/b5892fc2ff313e71f57b7d793c7444d2aaadc0bd))\n* added logo ([c21d416](https://github.com/AlexsJones/llmfit/commit/c21d4168f2bcd6da878848f9a6f97179d558606b))\n* added moe ([ac7ffe4](https://github.com/AlexsJones/llmfit/commit/ac7ffe4ed79eb22ec43cf7bc20e8cd8d102d16a9))\n* adding release please ([f2bfc7f](https://github.com/AlexsJones/llmfit/commit/f2bfc7fcf2587b74e05d8ad9d1041be6de456e69))\n* append (WSL) to RAM label in tui when running under WSL ([e0397cf](https://github.com/AlexsJones/llmfit/commit/e0397cf51025b393b0d4024c4ae67200ee206390))\n* caught some unavailable models on ollama ([b9f38da](https://github.com/AlexsJones/llmfit/commit/b9f38da9579040a7c2bada55838c5541474883ca))\n* caught some unavailable models on ollama ([c0f7c20](https://github.com/AlexsJones/llmfit/commit/c0f7c20f61cdd9ae692de6ca66344befba2fafa9))\n* detect installed Ollama models and support pulling from TUI ([4159aaf](https://github.com/AlexsJones/llmfit/commit/4159aaf304b3b421679f8231cf574465783d5b41))\n* first pass 
([855ad3d](https://github.com/AlexsJones/llmfit/commit/855ad3d34160cce6200c0ff128c34bcdcb0b922b))\n* fixed up skill ([fcb712a](https://github.com/AlexsJones/llmfit/commit/fcb712a98ac785ad83ad689d5300f17cb80a3f1c))\n* fixed up skill ([1f7d1de](https://github.com/AlexsJones/llmfit/commit/1f7d1de547a31202b9d34dd62bf543f5a22b2de7))\n* fixing vram on apple bug ([5e08754](https://github.com/AlexsJones/llmfit/commit/5e087549c7c1523f4d5df72bd8a915330498a795))\n* fixing vram on apple bug ([b3deca1](https://github.com/AlexsJones/llmfit/commit/b3deca1d9eac16283d0e9269c68a1af1dfc871ab))\n* fixing vram on apple bug ([92ddb0e](https://github.com/AlexsJones/llmfit/commit/92ddb0e82579c6018d1acb4e3dfbe1df7d582605))\n* fixing vram on apple bug ([42b2081](https://github.com/AlexsJones/llmfit/commit/42b2081577bed23176c0f87d1ad0b142cce23872))\n* improvements based on [#12](https://github.com/AlexsJones/llmfit/issues/12) ([5428ef8](https://github.com/AlexsJones/llmfit/commit/5428ef8cdd42e88bced1459b55b480aab767637c))\n* increased model count ([156b29d](https://github.com/AlexsJones/llmfit/commit/156b29deb077a1d66948254b370597a118fd5daf))\n* increment version ([283bebb](https://github.com/AlexsJones/llmfit/commit/283bebb8eca5da2fc7124b665ae773fda48aed93))\n* overall to the scoring system ([f475938](https://github.com/AlexsJones/llmfit/commit/f4759381d23b834e0a42a4699d23fb3f858fe677))\n* overall to the scoring system ([b0696cf](https://github.com/AlexsJones/llmfit/commit/b0696cf297f1cb11247493355406d8b9c56510db))\n* overall to the scoring system ([37e2e10](https://github.com/AlexsJones/llmfit/commit/37e2e10076f450f79165d92541baf04957ec2fe9))\n* plumbing 2 ([1c615bb](https://github.com/AlexsJones/llmfit/commit/1c615bb57b7395f9be888245f8157dec2bab8bb4))\n* plumbing 2 ([dd6a3ec](https://github.com/AlexsJones/llmfit/commit/dd6a3ec20e09ae72eada1fada73a6392c9673221))\n* pull functionality ([923e7e7](https://github.com/AlexsJones/llmfit/commit/923e7e7463dd2bd53b6438ad3c8f2eb1f7a45af4))\n* 
release plumbing ([7d21719](https://github.com/AlexsJones/llmfit/commit/7d217192bc1638f7ff69a22c2467d7d86da96641))\n* release plumbing ([3accbb4](https://github.com/AlexsJones/llmfit/commit/3accbb42c99321fb6f8ade9d2f07af0fee93ed9e))\n* reworked available models for download ([9adc84f](https://github.com/AlexsJones/llmfit/commit/9adc84f3041dca14fdcdc4437409b2b81eaca5a3))\n* support for windows vulkan ([cc0fd61](https://github.com/AlexsJones/llmfit/commit/cc0fd619fa31e01c398c3c23f45aa915005670c8))\n* supporting 94 models ([a652be3](https://github.com/AlexsJones/llmfit/commit/a652be31dd0cbe36f89572de7022e2a145fb3788))\n* updated build actions ([1e65fdd](https://github.com/AlexsJones/llmfit/commit/1e65fddecb5f183870ddf1aa865dcaddba47523a))\n* updated images ([9141109](https://github.com/AlexsJones/llmfit/commit/9141109f753ef38eb2b2eb5c604edb6ee0d7e371))\n* updated models ([2d6c1d6](https://github.com/AlexsJones/llmfit/commit/2d6c1d66708186c0a21cb2f082a5b4e2fb03db90))\n* updated tui to support multiple providers better and also multiple GPU support ([a3ca0bd](https://github.com/AlexsJones/llmfit/commit/a3ca0bd64647fa958c15bb7038a9e02df175fe67))\n* updated urls ([f75ec27](https://github.com/AlexsJones/llmfit/commit/f75ec2750f325ff73725e5b8b194ba854c8579e9))\n* updated version ([2cfc73e](https://github.com/AlexsJones/llmfit/commit/2cfc73ebdb6214f801e32880ff6451b2809bbb45))\n\n\n### Bug Fixes\n\n* correctly estimate VRAM for APU integrated GPUs ([72c8cb0](https://github.com/AlexsJones/llmfit/commit/72c8cb0e7873e0a8bcf4a10aee877bc38555299c))\n* correctly estimate VRAM for APU integrated GPUs (Radeon Graphics) ([8da5c2a](https://github.com/AlexsJones/llmfit/commit/8da5c2a0443b73a3ac78ac087b0f08acdba6aaa9)), closes [#25](https://github.com/AlexsJones/llmfit/issues/25)\n* update OpenClaw skill to match actual CLI output ([f38a0e5](https://github.com/AlexsJones/llmfit/commit/f38a0e56ef332bde8f3b03f8b06b5982fe90c1cc))\n* update OpenClaw skill to match actual CLI output 
([e1adbfd](https://github.com/AlexsJones/llmfit/commit/e1adbfd0abd786bc7a99496f20a7f81070bc8fe3))\n\n## [0.3.6](https://github.com/AlexsJones/llmfit/compare/llmfit-v0.3.5...llmfit-v0.3.6) (2026-02-21)\n\n\n### Features\n\n* release plumbing ([7d21719](https://github.com/AlexsJones/llmfit/commit/7d217192bc1638f7ff69a22c2467d7d86da96641))\n* release plumbing ([3accbb4](https://github.com/AlexsJones/llmfit/commit/3accbb42c99321fb6f8ade9d2f07af0fee93ed9e))\n\n## [0.3.5](https://github.com/AlexsJones/llmfit/compare/llmfit-v0.3.4...llmfit-v0.3.5) (2026-02-21)\n\n\n### Features\n\n* add --memory flag to override GPU VRAM autodetection ([9a02f6e](https://github.com/AlexsJones/llmfit/commit/9a02f6e1616f59783ccff5b007c25213854f63b9))\n* add --memory flag to override GPU VRAM autodetection ([39c5486](https://github.com/AlexsJones/llmfit/commit/39c5486aa3d94f9b9ef36e29642b64d848d0d2b0))\n* add 15 popular models from HuggingFace ([128a020](https://github.com/AlexsJones/llmfit/commit/128a020323897a67ed5d12dd397bcf4924a6bf51))\n* Add 15 popular models from HuggingFace (33→48 models) ([c45606b](https://github.com/AlexsJones/llmfit/commit/c45606bdb235b6bfe616bb616b1364a97e76f0c1))\n* add homebrew tap support and update release workflow ([db09473](https://github.com/AlexsJones/llmfit/commit/db094734288d17a49d9c3c5c99859fe0d7dc976d))\n* added arc support ([b5892fc](https://github.com/AlexsJones/llmfit/commit/b5892fc2ff313e71f57b7d793c7444d2aaadc0bd))\n* added logo ([c21d416](https://github.com/AlexsJones/llmfit/commit/c21d4168f2bcd6da878848f9a6f97179d558606b))\n* added moe ([ac7ffe4](https://github.com/AlexsJones/llmfit/commit/ac7ffe4ed79eb22ec43cf7bc20e8cd8d102d16a9))\n* adding release please ([f2bfc7f](https://github.com/AlexsJones/llmfit/commit/f2bfc7fcf2587b74e05d8ad9d1041be6de456e69))\n* append (WSL) to RAM label in tui when running under WSL ([e0397cf](https://github.com/AlexsJones/llmfit/commit/e0397cf51025b393b0d4024c4ae67200ee206390))\n* caught some unavailable models on 
ollama ([b9f38da](https://github.com/AlexsJones/llmfit/commit/b9f38da9579040a7c2bada55838c5541474883ca))\n* caught some unavailable models on ollama ([c0f7c20](https://github.com/AlexsJones/llmfit/commit/c0f7c20f61cdd9ae692de6ca66344befba2fafa9))\n* detect installed Ollama models and support pulling from TUI ([4159aaf](https://github.com/AlexsJones/llmfit/commit/4159aaf304b3b421679f8231cf574465783d5b41))\n* first pass ([855ad3d](https://github.com/AlexsJones/llmfit/commit/855ad3d34160cce6200c0ff128c34bcdcb0b922b))\n* fixed up skill ([fcb712a](https://github.com/AlexsJones/llmfit/commit/fcb712a98ac785ad83ad689d5300f17cb80a3f1c))\n* fixed up skill ([1f7d1de](https://github.com/AlexsJones/llmfit/commit/1f7d1de547a31202b9d34dd62bf543f5a22b2de7))\n* fixing vram on apple bug ([5e08754](https://github.com/AlexsJones/llmfit/commit/5e087549c7c1523f4d5df72bd8a915330498a795))\n* fixing vram on apple bug ([b3deca1](https://github.com/AlexsJones/llmfit/commit/b3deca1d9eac16283d0e9269c68a1af1dfc871ab))\n* fixing vram on apple bug ([92ddb0e](https://github.com/AlexsJones/llmfit/commit/92ddb0e82579c6018d1acb4e3dfbe1df7d582605))\n* fixing vram on apple bug ([42b2081](https://github.com/AlexsJones/llmfit/commit/42b2081577bed23176c0f87d1ad0b142cce23872))\n* improvements based on [#12](https://github.com/AlexsJones/llmfit/issues/12) ([5428ef8](https://github.com/AlexsJones/llmfit/commit/5428ef8cdd42e88bced1459b55b480aab767637c))\n* increased model count ([156b29d](https://github.com/AlexsJones/llmfit/commit/156b29deb077a1d66948254b370597a118fd5daf))\n* increment version ([283bebb](https://github.com/AlexsJones/llmfit/commit/283bebb8eca5da2fc7124b665ae773fda48aed93))\n* overall to the scoring system ([f475938](https://github.com/AlexsJones/llmfit/commit/f4759381d23b834e0a42a4699d23fb3f858fe677))\n* overall to the scoring system ([b0696cf](https://github.com/AlexsJones/llmfit/commit/b0696cf297f1cb11247493355406d8b9c56510db))\n* overall to the scoring system 
([37e2e10](https://github.com/AlexsJones/llmfit/commit/37e2e10076f450f79165d92541baf04957ec2fe9))\n* pull functionality ([923e7e7](https://github.com/AlexsJones/llmfit/commit/923e7e7463dd2bd53b6438ad3c8f2eb1f7a45af4))\n* reworked available models for download ([9adc84f](https://github.com/AlexsJones/llmfit/commit/9adc84f3041dca14fdcdc4437409b2b81eaca5a3))\n* support for windows vulkan ([cc0fd61](https://github.com/AlexsJones/llmfit/commit/cc0fd619fa31e01c398c3c23f45aa915005670c8))\n* supporting 94 models ([a652be3](https://github.com/AlexsJones/llmfit/commit/a652be31dd0cbe36f89572de7022e2a145fb3788))\n* updated build actions ([1e65fdd](https://github.com/AlexsJones/llmfit/commit/1e65fddecb5f183870ddf1aa865dcaddba47523a))\n* updated images ([9141109](https://github.com/AlexsJones/llmfit/commit/9141109f753ef38eb2b2eb5c604edb6ee0d7e371))\n* updated models ([2d6c1d6](https://github.com/AlexsJones/llmfit/commit/2d6c1d66708186c0a21cb2f082a5b4e2fb03db90))\n* updated tui to support multiple providers better and also multiple GPU support ([a3ca0bd](https://github.com/AlexsJones/llmfit/commit/a3ca0bd64647fa958c15bb7038a9e02df175fe67))\n* updated urls ([f75ec27](https://github.com/AlexsJones/llmfit/commit/f75ec2750f325ff73725e5b8b194ba854c8579e9))\n* updated version ([2cfc73e](https://github.com/AlexsJones/llmfit/commit/2cfc73ebdb6214f801e32880ff6451b2809bbb45))\n\n\n### Bug Fixes\n\n* correctly estimate VRAM for APU integrated GPUs ([72c8cb0](https://github.com/AlexsJones/llmfit/commit/72c8cb0e7873e0a8bcf4a10aee877bc38555299c))\n* correctly estimate VRAM for APU integrated GPUs (Radeon Graphics) ([8da5c2a](https://github.com/AlexsJones/llmfit/commit/8da5c2a0443b73a3ac78ac087b0f08acdba6aaa9)), closes [#25](https://github.com/AlexsJones/llmfit/issues/25)\n* update OpenClaw skill to match actual CLI output ([f38a0e5](https://github.com/AlexsJones/llmfit/commit/f38a0e56ef332bde8f3b03f8b06b5982fe90c1cc))\n* update OpenClaw skill to match actual CLI output 
([e1adbfd](https://github.com/AlexsJones/llmfit/commit/e1adbfd0abd786bc7a99496f20a7f81070bc8fe3))\n"
  },
  {
    "path": "CNAME",
    "content": "llmfit.axjns.dev"
  },
  {
    "path": "Cargo.toml",
    "content": "[workspace]\nmembers = [\"llmfit-core\", \"llmfit-tui\", \"llmfit-desktop\"]\ndefault-members = [\"llmfit-core\", \"llmfit-tui\"]\nresolver = \"3\"\n\n[workspace.package]\nversion = \"0.8.0\"\n"
  },
  {
    "path": "Dockerfile",
    "content": "# Multi-stage build for llmfit\n# Stage 1: Build the Rust binary\nFROM rust:1.88-slim AS builder\n\n# Install build dependencies\nRUN apt-get update && apt-get install -y \\\n    pkg-config \\\n    libssl-dev \\\n    && rm -rf /var/lib/apt/lists/*\n\n# Set working directory\nWORKDIR /build\n\n# Copy workspace configuration\nCOPY Cargo.toml Cargo.lock ./\n\n# Copy all workspace members\nCOPY llmfit-core/ ./llmfit-core/\nCOPY llmfit-tui/ ./llmfit-tui/\nCOPY llmfit-desktop/ ./llmfit-desktop/\nCOPY data/ ./data/\n\n# Build release binary for llmfit-tui\nRUN cargo build --release -p llmfit\n\n# Stage 2: Runtime image\nFROM debian:bookworm-slim\n\n# Install runtime dependencies for hardware detection\nRUN apt-get update && apt-get install -y \\\n    pciutils \\\n    lshw \\\n    && rm -rf /var/lib/apt/lists/*\n\n# Copy the binary from builder\nCOPY --from=builder /build/target/release/llmfit /usr/local/bin/llmfit\n\n# Create a non-root user\nRUN useradd -m -u 1000 llmfit && \\\n    chown -R llmfit:llmfit /usr/local/bin/llmfit\n\nUSER llmfit\n\n# Set default command to output JSON recommendations\n# In Kubernetes, this will run once per node and log results\nENTRYPOINT [\"/usr/local/bin/llmfit\"]\nCMD [\"recommend\", \"--json\"]\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2026 Alex Jones\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "MODELS.md",
    "content": "# Supported Models\n\nllmfit ships with a curated database of 106 LLM models from HuggingFace. All memory estimates assume Q4_K_M quantization (0.5 bytes per parameter) unless noted otherwise.\n\n### 01.ai\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) | 6.1B | Q4_K_M | 4k | Instruction following, chat |\n| [01-ai/Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) | 34.4B | Q4_K_M | 4k | Instruction following, chat |\n\n### Alibaba\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | 600M | Q4_K_M | 40k | Lightweight, edge deployment |\n| [Qwen/Qwen3.5-0.8B](https://huggingface.co/Qwen/Qwen3.5-0.8B) | 873M | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen3.5-0.8B-Base](https://huggingface.co/Qwen/Qwen3.5-0.8B-Base) | 873M | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) | 1.5B | Q4_K_M | 32k | Code generation and completion |\n| [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) | 1.7B | Q4_K_M | 40k | Lightweight, edge deployment |\n| [Qwen/Qwen3.5-2B](https://huggingface.co/Qwen/Qwen3.5-2B) | 2.3B | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen3.5-2B-Base](https://huggingface.co/Qwen/Qwen3.5-2B-Base) | 2.3B | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) | 3.8B | Q4_K_M | 32k | Multimodal, vision and text |\n| [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) | 4.0B | Q4_K_M | 40k | General purpose text generation |\n| [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) | 4.7B | Q4_K_M | 256k | Multimodal, vision and text |\n| 
[Qwen/Qwen3.5-4B-Base](https://huggingface.co/Qwen/Qwen3.5-4B-Base) | 4.7B | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) | 7.6B | Q4_K_M | 32k | Instruction following, chat |\n| [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) | 7.6B | Q4_K_M | 32k | Code generation and completion |\n| [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | 8.2B | Q4_K_M | 40k | General purpose text generation |\n| [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 8.3B | Q4_K_M | 32k | Multimodal, vision and text |\n| [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) | 9.7B | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen3.5-9B-Base](https://huggingface.co/Qwen/Qwen3.5-9B-Base) | 9.7B | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | 14.8B | Q4_K_M | 128k | Instruction following, chat |\n| [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B) | 14.8B | Q4_K_M | 128k | General purpose text generation |\n| [Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) | 14.8B | Q4_K_M | 32k | Code generation and completion |\n| [Qwen/Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) | 27.8B | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B) | 30.5B (MoE) | Q4_K_M | 40k | Efficient MoE, general purpose |\n| [Qwen/Qwen3.5-35B-A3B](https://huggingface.co/Qwen/Qwen3.5-35B-A3B) | 36.0B (MoE) | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 32.5B | Q4_K_M | 128k | Instruction following, chat |\n| [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | 32.8B | Q4_K_M | 40k | General purpose text generation |\n| 
[Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) | 32.8B | Q4_K_M | 32k | Code generation and completion |\n| [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) | 72.7B | Q4_K_M | 32k | Instruction following, chat |\n| [Qwen/Qwen3.5-122B-A10B](https://huggingface.co/Qwen/Qwen3.5-122B-A10B) | 125.1B (MoE) | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B) | 235B (MoE) | Q4_K_M | 40k | State-of-the-art, MoE architecture |\n| [Qwen/Qwen3.5-397B-A17B](https://huggingface.co/Qwen/Qwen3.5-397B-A17B) | 403.4B (MoE) | Q4_K_M | 256k | Multimodal, vision and text |\n| [Qwen/Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct) | 480B (MoE) | Q4_K_M | 256k | Code generation and completion |\n\n### Allen Institute\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [allenai/OLMo-2-0325-32B-Instruct](https://huggingface.co/allenai/OLMo-2-0325-32B-Instruct) | 32B | Q4_K_M | 4k | Fully open-source, instruction following |\n\n### Ant Group\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [inclusionAI/Ling-lite](https://huggingface.co/inclusionAI/Ling-lite) | 16.8B (MoE) | Q4_K_M | 128k | Efficient MoE, general purpose |\n\n### BAAI\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 335M | Q4_K_M | 512 | Text embeddings for RAG |\n\n### Baidu\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [baidu/ERNIE-4.5-300B-A47B-Paddle](https://huggingface.co/baidu/ERNIE-4.5-300B-A47B-Paddle) | 300B (MoE) | Q4_K_M | 128k | Multilingual, reasoning |\n\n### 
BigCode\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [bigcode/starcoder2-7b](https://huggingface.co/bigcode/starcoder2-7b) | 7.2B | Q4_K_M | 16k | Code generation and completion |\n| [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b) | 15.7B | Q4_K_M | 16k | Code generation and completion |\n\n### BigScience\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [bigscience/bloom](https://huggingface.co/bigscience/bloom) | 176B | Q4_K_M | 2k | Multilingual text generation |\n\n### Cohere\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01) | 35B | Q4_K_M | 128k | RAG, tool use, agents |\n\n### Community\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) | 1.1B | Q4_K_M | 2k | Instruction following, chat |\n\n### DeepSeek\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | 7.6B | Q4_K_M | 128k | Advanced reasoning, chain-of-thought |\n| [deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct) | 16B (MoE) | Q4_K_M | 128k | Code generation and completion |\n| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 32.8B | Q4_K_M | 128k | Advanced reasoning, chain-of-thought |\n| [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) | 671B (MoE) | Q4_K_M | 128k | Advanced 
reasoning, chain-of-thought |\n| [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) | 685B (MoE) | Q4_K_M | 128k | State-of-the-art, MoE architecture |\n\n### Google\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it) | 1B | Q4_K_M | 32k | Lightweight, edge deployment |\n| [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) | 2.6B | Q4_K_M | 4k | General purpose text generation |\n| [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) | 4B | Q4_K_M | 128k | Lightweight, general purpose |\n| [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) | 9.2B | Q4_K_M | 4k | General purpose text generation |\n| [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it) | 12B | Q4_K_M | 128k | Multimodal, vision and text |\n| [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it) | 27B | Q4_K_M | 128k | General purpose text generation |\n| [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) | 27.2B | Q4_K_M | 4k | General purpose text generation |\n\n### HuggingFace\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 7.2B | Q4_K_M | 32k | General purpose text generation |\n\n### IBM\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [ibm-granite/granite-4.0-h-micro](https://huggingface.co/ibm-granite/granite-4.0-h-micro) | 3B | Q4_K_M | 128k | Enterprise, hybrid Mamba/transformer |\n| [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny) | 7B (MoE) | Q4_K_M | 128k | Enterprise, hybrid Mamba/transformer |\n| 
[ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) | 8.1B | Q4_K_M | 128k | Enterprise, instruction following |\n| [ibm-granite/granite-4.0-h-small](https://huggingface.co/ibm-granite/granite-4.0-h-small) | 32B (MoE) | Q4_K_M | 128k | Enterprise, hybrid Mamba/transformer |\n\n### LMSYS\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) | 7.0B | Q4_K_M | 4k | Instruction following, chat |\n| [lmsys/vicuna-13b-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5) | 13.0B | Q4_K_M | 4k | Instruction following, chat |\n\n### Meituan\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [meituan/LongCat-Flash](https://huggingface.co/meituan/LongCat-Flash) | 560B (MoE) | Q4_K_M | 512k | Long context MoE |\n\n### Meta\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) | 1.2B | Q4_K_M | 4k | General purpose text generation |\n| [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) | 3.2B | Q4_K_M | 4k | General purpose text generation |\n| [meta-llama/CodeLlama-7b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf) | 6.7B | Q4_K_M | 4k | Code generation and completion |\n| [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | 8.0B | Q4_K_M | 4k | General purpose text generation |\n| [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | 8.0B | Q4_K_M | 4k | Instruction following, chat |\n| [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) | 10.7B | Q4_K_M | 4k | Instruction following, chat |\n| 
[meta-llama/CodeLlama-13b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-13b-Instruct-hf) | 13.0B | Q4_K_M | 4k | Code generation and completion |\n| [meta-llama/CodeLlama-34b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Instruct-hf) | 33.7B | Q4_K_M | 4k | Code generation and completion |\n| [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | 70.6B | Q4_K_M | 4k | Instruction following, chat |\n| [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | 70.6B | Q4_K_M | 128k | Instruction following, chat |\n| [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | 109B (MoE) | Q4_K_M | 128k | Multimodal, vision and text |\n| [meta-llama/Llama-4-Maverick-17B-128E-Instruct](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct) | 400B (MoE) | Q4_K_M | 128k | Multimodal, vision and text |\n| [meta-llama/Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) | 405.9B | Q4_K_M | 4k | Instruction following, chat |\n\n### Microsoft\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [microsoft/phi-3-mini-4k-instruct](https://huggingface.co/microsoft/phi-3-mini-4k-instruct) | 3.8B | Q4_K_M | 4k | Lightweight, edge deployment |\n| [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) | 3.8B | Q4_K_M | 128k | Lightweight, long context |\n| [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) | 3.8B | Q4_K_M | 128k | Lightweight, edge deployment |\n| [microsoft/Orca-2-7b](https://huggingface.co/microsoft/Orca-2-7b) | 7.0B | Q4_K_M | 4k | Reasoning, step-by-step solutions |\n| [microsoft/Orca-2-13b](https://huggingface.co/microsoft/Orca-2-13b) | 13.0B | Q4_K_M | 4k | Reasoning, step-by-step solutions |\n| 
[microsoft/phi-4](https://huggingface.co/microsoft/phi-4) | 14B | Q4_K_M | 16k | Reasoning, STEM, code generation |\n| [microsoft/Phi-3-medium-14b-instruct](https://huggingface.co/microsoft/Phi-3-medium-14b-instruct) | 14B | Q4_K_M | 4k | Balanced performance and size |\n\n### Mistral AI\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) | 7.2B | Q4_K_M | 32k | Instruction following, chat |\n| [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410) | 8.0B | Q4_K_M | 32k | Instruction following, chat |\n| [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) | 12.2B | Q4_K_M | 128k | Instruction following, chat |\n| [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501) | 24B | Q4_K_M | 32k | Instruction following, chat |\n| [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) | 24B | Q4_K_M | 128k | Multimodal, vision and text |\n| [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | 46.7B (MoE) | Q4_K_M | 32k | Instruction following, chat |\n| [mistralai/Mistral-Large-Instruct-2407](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407) | 123B | Q4_K_M | 128k | Large-scale instruction following |\n| [mistralai/Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) | 140.6B (MoE) | Q4_K_M | 64k | Large MoE, instruction following |\n\n### Moonshot\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [moonshotai/Kimi-K2-Instruct](https://huggingface.co/moonshotai/Kimi-K2-Instruct) | 1000B (MoE) | Q4_K_M | 128k | Large MoE, reasoning 
|\n\n### Nomic\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) | 137M | F16 | 8k | Text embeddings for RAG |\n\n### NousResearch\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO) | 46.7B (MoE) | Q4_K_M | 32k | General purpose text generation |\n\n### OpenChat\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [openchat/openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) | 7.0B | Q4_K_M | 8k | Instruction following, chat |\n\n### Rednote\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [rednote-hilab/dots.llm1.inst](https://huggingface.co/rednote-hilab/dots.llm1.inst) | 142B (MoE) | Q4_K_M | 128k | MoE, general purpose |\n\n### Stability AI\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [stabilityai/stablelm-2-1_6b-chat](https://huggingface.co/stabilityai/stablelm-2-1_6b-chat) | 1.6B | Q4_K_M | 4k | Instruction following, chat |\n\n### TII\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) | 7.2B | Q4_K_M | 4k | Instruction following, chat |\n| [tiiuae/Falcon3-7B-Instruct](https://huggingface.co/tiiuae/Falcon3-7B-Instruct) | 7.5B | Q4_K_M | 32k | Instruction following, chat |\n| [tiiuae/Falcon3-10B-Instruct](https://huggingface.co/tiiuae/Falcon3-10B-Instruct) | 10.3B | Q4_K_M | 32k | Instruction following, chat |\n| 
[tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) | 40.0B | Q4_K_M | 2k | Instruction following, chat |\n| [tiiuae/falcon-180B-chat](https://huggingface.co/tiiuae/falcon-180B-chat) | 180B | Q4_K_M | 2k | Large-scale instruction following |\n\n### Upstage\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) | 10.7B | Q4_K_M | 4k | High-performance instruction following |\n\n### WizardLM\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [WizardLMTeam/WizardLM-13B-V1.2](https://huggingface.co/WizardLMTeam/WizardLM-13B-V1.2) | 13.0B | Q4_K_M | 4k | Instruction following, chat |\n| [WizardLMTeam/WizardCoder-15B-V1.0](https://huggingface.co/WizardLMTeam/WizardCoder-15B-V1.0) | 15.5B | Q4_K_M | 8k | Code generation and completion |\n\n### xAI\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [xai-org/grok-1](https://huggingface.co/xai-org/grok-1) | 314B (MoE) | Q4_K_M | 8k | Large MoE, general purpose |\n\n### Zhipu AI\n\n| Model | Parameters | Quantization | Context | Use Case |\n|-------|-----------|--------------|---------|----------|\n| [THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) | 9B | Q4_K_M | 128k | Multilingual, instruction following |\n"
  },
  {
    "path": "Makefile",
    "content": "# Makefile for llmfit\n# Convenience commands for building, testing, and updating the model database\n\n.PHONY: help build release clean run test update-models update-docker-models update-catalogs check fmt clippy install\n\n# Default target\nhelp:\n\t@echo \"llmfit - LLM Model Fit Analyzer\"\n\t@echo \"\"\n\t@echo \"Available targets:\"\n\t@echo \"  make build          - Build debug binary\"\n\t@echo \"  make release        - Build release binary\"\n\t@echo \"  make run            - Run in TUI mode (debug)\"\n\t@echo \"  make test           - Run all unit tests\"\n\t@echo \"  make update-models  - Fetch latest model data from HuggingFace\"\n\t@echo \"  make update-docker-models - Refresh Docker Model Runner catalog\"\n\t@echo \"  make update-catalogs - Refresh all catalogs (HF models + Docker) and rebuild\"\n\t@echo \"  make check          - Run cargo check\"\n\t@echo \"  make fmt            - Format code with rustfmt\"\n\t@echo \"  make clippy         - Run clippy linter\"\n\t@echo \"  make clean          - Remove build artifacts\"\n\t@echo \"  make install        - Install release binary to ~/.cargo/bin\"\n\t@echo \"\"\n\n# Build debug version\nbuild:\n\tcargo build\n\n# Build release version\nrelease:\n\tcargo build --release\n\n# Clean build artifacts\nclean:\n\tcargo clean\n\n# Run in TUI mode\nrun:\n\tcargo run\n\n# Run tests\ntest:\n\tcargo test\n\n# Update model database from HuggingFace\nupdate-models:\n\t@./scripts/update_models.sh\n\n# Refresh Docker Model Runner catalog from Docker Hub\nupdate-docker-models:\n\tpython3 scripts/scrape_docker_models.py\n\n# Refresh all catalogs (HF models + Docker) and rebuild\n# Runs HF scraper first (via update_models.sh which also rebuilds),\n# then Docker scraper (which depends on hf_models.json), then rebuilds again\n# to embed the updated Docker catalog.\nupdate-catalogs:\n\t@./scripts/update_models.sh\n\tpython3 scripts/scrape_docker_models.py\n\tcargo build --release\n\n# Check compilation without 
building\ncheck:\n\tcargo check\n\n# Format code\nfmt:\n\tcargo fmt\n\n# Run clippy\nclippy:\n\tcargo clippy -- -D warnings\n\n# Install to ~/.cargo/bin\ninstall:\n\tcargo install --path .\n"
  },
  {
    "path": "README.md",
    "content": "# llmfit\n\n<p align=\"center\">\n  <img src=\"assets/icon.svg\" alt=\"llmfit icon\" width=\"128\" height=\"128\">\n</p>\n\n<p align=\"center\">\n  <b>English</b> ·\n  <a href=\"README.zh.md\">中文</a>\n</p>\n\n<p align=\"center\">\n  <a href=\"https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml\"><img src=\"https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"></a>\n  <a href=\"https://crates.io/crates/llmfit\"><img src=\"https://img.shields.io/crates/v/llmfit.svg\" alt=\"Crates.io\"></a>\n  <a href=\"LICENSE\"><img src=\"https://img.shields.io/badge/license-MIT-blue.svg\" alt=\"License\"></a>\n</p>\n\n**Hundreds of models & providers. One command to find what runs on your hardware.**\n\nA terminal tool that right-sizes LLM models to your system's RAM, CPU, and GPU. Detects your hardware, scores each model across quality, speed, fit, and context dimensions, and tells you which ones will actually run well on your machine.\n\nShips with an interactive TUI (default) and a classic CLI mode. 
Supports multi-GPU setups, MoE architectures, dynamic quantization selection, speed estimation, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).\n\n> **Sister project:** Check out [sympozium](https://github.com/AlexsJones/sympozium/) for managing agents in Kubernetes.\n\n![demo](demo.gif)\n\n---\n\n## Install\n\n### Windows\n```sh\nscoop install llmfit\n```\n\nIf Scoop is not installed, follow the [Scoop installation guide](https://scoop.sh/).\n\n### macOS / Linux\n\n#### Homebrew\n```sh\nbrew install llmfit\n```\n\n#### Quick install\n```sh\ncurl -fsSL https://llmfit.axjns.dev/install.sh | sh\n```\n\nDownloads the latest release binary from GitHub and installs it to `/usr/local/bin` (or `~/.local/bin` if sudo is unavailable).\n\n**Install to `~/.local/bin` without sudo:**\n```sh\ncurl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local\n```\n\n### Docker / Podman\n```sh\ndocker run ghcr.io/alexsjones/llmfit\n```\nThis prints JSON from the `llmfit recommend` command. The JSON can be queried further with `jq`:\n```sh\npodman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'\n```\n\n### From source\n```sh\ngit clone https://github.com/AlexsJones/llmfit.git\ncd llmfit\ncargo build --release\n# binary is at target/release/llmfit\n```\n\n---\n\n## Usage\n\n### TUI (default)\n\n```sh\nllmfit\n```\n\nLaunches the interactive terminal UI. Your system specs (CPU, RAM, GPU name, VRAM, backend) are shown at the top. Models are listed in a scrollable table sorted by composite score. 
Each row shows the model's score, estimated tok/s, best quantization for your hardware, run mode, memory usage, and use-case category.\n\n| Key                        | Action                                                                |\n|----------------------------|-----------------------------------------------------------------------|\n| `Up` / `Down` or `j` / `k` | Navigate models                                                       |\n| `/`                        | Enter search mode (partial match on name, provider, params, use case) |\n| `Esc` or `Enter`           | Exit search mode                                                      |\n| `Ctrl-U`                   | Clear search                                                          |\n| `f`                        | Cycle fit filter: All, Runnable, Perfect, Good, Marginal              |\n| `a`                        | Cycle availability filter: All, GGUF Avail, Installed                 |\n| `s`                        | Cycle sort column: Score, Params, Mem%, Ctx, Date, Use Case           |\n| `v`                        | Enter Visual mode (select multiple models)                            |\n| `V`                        | Enter Select mode (column-based filtering)                            |\n| `t`                        | Cycle color theme (saved automatically)                               |\n| `p`                        | Open Plan mode for selected model (hardware planning)                 |\n| `P`                        | Open provider filter popup                                            |\n| `U`                        | Open use-case filter popup                                            |\n| `C`                        | Open capability filter popup                                          |\n| `m`                        | Mark selected model for compare                                       |\n| `c`                        | Open compare view (marked vs selected)                         
       |\n| `x`                        | Clear compare mark                                                    |\n| `i`                        | Toggle installed-first sorting (any detected runtime provider)        |\n| `d`                        | Download selected model (provider picker when multiple are available) |\n| `r`                        | Refresh installed models from runtime providers                       |\n| `Enter`                    | Toggle detail view for selected model                                 |\n| `PgUp` / `PgDn`            | Scroll by 10                                                          |\n| `g` / `G`                  | Jump to top / bottom                                                  |\n| `q`                        | Quit                                                                  |\n\n### Vim-like modes\n\nThe TUI uses Vim-inspired modes shown in the bottom-left status bar. The current mode determines which keys are active.\n\n#### Normal mode\n\nThe default mode. Navigate, search, filter, and open views. All keys in the table above apply here.\n\n#### Visual mode (`v`)\n\nSelect a contiguous range of models for bulk comparison. Press `v` to anchor at the current row, then navigate with `j`/`k` or arrow keys to extend the selection. Selected rows are highlighted.\n\n| Key                 | Action                                                 |\n|---------------------|--------------------------------------------------------|\n| `j` / `k` or arrows | Extend selection up/down                               |\n| `c`                 | Compare all selected models (opens multi-compare view) |\n| `m`                 | Mark current model for two-model compare               |\n| `Esc` or `v`        | Exit Visual mode                                       |\n\nThe multi-compare view displays a table where rows are attributes (Score, tok/s, Fit, Mem%, Params, Mode, Context, Quant, etc.) and columns are models. 
Best values are highlighted. Use `h`/`l` or arrow keys to scroll horizontally if more models are selected than fit on screen.\n\n#### Select mode (`V`)\n\nColumn-based filtering. Press `V` (shift-v) to enter Select mode, then use `h`/`l` or arrow keys to move between column headers. The active column is visually highlighted. Press `Enter` or `Space` to activate the appropriate filter for that column:\n\n| Column                        | Filter action                                                             |\n|-------------------------------|---------------------------------------------------------------------------|\n| Inst                          | Cycle availability filter                                                 |\n| Model                         | Enter search mode                                                         |\n| Provider                      | Open provider popup                                                       |\n| Params                        | Open parameter-size bucket popup (<3B, 3-7B, 7-14B, 14-30B, 30-70B, 70B+) |\n| Score, tok/s, Mem%, Ctx, Date | Sort by that column                                                       |\n| Quant                         | Open quantization popup                                                   |\n| Mode                          | Open run-mode popup (GPU, MoE, CPU+GPU, CPU)                              |\n| Fit                           | Cycle fit filter                                                          |\n| Use Case                      | Open use-case popup                                                       |\n\nRow navigation (`j`/`k`) still works in Select mode so you can see the effect of filters as you apply them. 
Press `Esc` to return to Normal mode.\n\n### TUI Plan mode (`p`)\n\nPlan mode inverts normal fit analysis: instead of asking \"what fits my hardware?\", it estimates \"what hardware is needed for this model config?\".\n\nUse `p` on a selected row, then:\n\n| Key                    | Action                                                    |\n|------------------------|-----------------------------------------------------------|\n| `Tab` / `j` / `k`      | Move between editable fields (Context, Quant, Target TPS) |\n| `Left` / `Right`       | Move cursor in current field                              |\n| Type                   | Edit current field                                        |\n| `Backspace` / `Delete` | Remove characters                                         |\n| `Ctrl-U`               | Clear current field                                       |\n| `Esc` or `q`           | Exit Plan mode                                            |\n\nPlan mode shows estimates for:\n- minimum and recommended VRAM/RAM/CPU cores\n- feasible run paths (GPU, CPU offload, CPU-only)\n- upgrade deltas to reach better fit targets\n\n### Themes\n\nPress `t` to cycle through 10 built-in color themes. 
Your selection is saved automatically to `~/.config/llmfit/theme` and restored on next launch.\n\n| Theme                    | Description                                       |\n|--------------------------|---------------------------------------------------|\n| **Default**              | Original llmfit colors                            |\n| **Dracula**              | Dark purple background with pastel accents        |\n| **Solarized**            | Ethan Schoonover's Solarized Dark palette         |\n| **Nord**                 | Arctic, cool blue-gray tones                      |\n| **Monokai**              | Monokai Pro warm syntax colors                    |\n| **Gruvbox**              | Retro groove palette with warm earth tones        |\n| **Catppuccin Latte**     | 🌻 Light theme — harmonious pastel inversion      |\n| **Catppuccin Frappé**    | 🪴 Low-contrast dark — muted, subdued aesthetic   |\n| **Catppuccin Macchiato** | 🌺 Medium-contrast dark — gentle, soothing tones  |\n| **Catppuccin Mocha**     | 🌿 Darkest variant — cozy with color-rich accents |\n\n### Web dashboard\n\nWhen you run `llmfit` in non-JSON mode, it automatically starts a background web dashboard on `0.0.0.0:8787`. 
Open it in any browser on the same network:\n\n```\nhttp://<your-machine-ip>:8787\n```\n\nOverride the host or port with environment variables:\n\n```sh\nLLMFIT_DASHBOARD_HOST=0.0.0.0 LLMFIT_DASHBOARD_PORT=9000 llmfit\n```\n\n| Variable | Default | Description |\n|---|---|---|\n| `LLMFIT_DASHBOARD_HOST` | `0.0.0.0` | Interface to bind the dashboard server |\n| `LLMFIT_DASHBOARD_PORT` | `8787` | Port to bind the dashboard server |\n\nTo disable the auto-started dashboard, pass `--no-dashboard`:\n\n```sh\nllmfit --no-dashboard\n```\n\n### CLI mode\n\nUse `--cli` or any subcommand to get classic table output:\n\n```sh\n# Table of all models ranked by fit\nllmfit --cli\n\n# Only perfectly fitting models, top 5\nllmfit fit --perfect -n 5\n\n# Show detected system specs\nllmfit system\n\n# List all models in the database\nllmfit list\n\n# Search by name, provider, or size\nllmfit search \"llama 8b\"\n\n# Detailed view of a single model\nllmfit info \"Mistral-7B\"\n\n# Top 5 recommendations (JSON, for agent/script consumption)\nllmfit recommend --json --limit 5\n\n# Recommendations filtered by use case\nllmfit recommend --json --use-case coding --limit 3\n\n# Force a specific runtime (bypass automatic MLX selection on Apple Silicon)\nllmfit recommend --force-runtime llamacpp\nllmfit recommend --force-runtime llamacpp --use-case coding --limit 3\n\n# Plan required hardware for a specific model configuration\nllmfit plan \"Qwen/Qwen3-4B-MLX-4bit\" --context 8192\nllmfit plan \"Qwen/Qwen3-4B-MLX-4bit\" --context 8192 --quant mlx-4bit\nllmfit plan \"Qwen/Qwen3-4B-MLX-4bit\" --context 8192 --target-tps 25 --json\n\n# Run as a node-level REST API (for cluster schedulers / aggregators)\nllmfit serve --host 0.0.0.0 --port 8787\n```\n\n### REST API (`llmfit serve`)\n\n`llmfit serve` starts an HTTP API that exposes the same fit/scoring data used by TUI/CLI, including filtering and top-model selection for a node.\n\n```sh\n# Liveness\ncurl http://localhost:8787/health\n\n# Node 
hardware info\ncurl http://localhost:8787/api/v1/system\n\n# Full fit list with filters\ncurl \"http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20\"\n\n# Key scheduling endpoint: top runnable models for this node\ncurl \"http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding\"\n\n# Search by model name/provider text\ncurl \"http://localhost:8787/api/v1/models/Mistral?runtime=any\"\n```\n\nSupported query params for `models`/`models/top`:\n\n- `limit` (or `n`): max number of rows returned\n- `perfect`: `true|false` (forces perfect-only when `true`)\n- `min_fit`: `perfect|good|marginal|too_tight`\n- `runtime`: `any|mlx|llamacpp`\n- `use_case`: `general|coding|reasoning|chat|multimodal|embedding`\n- `provider`: provider text filter (substring)\n- `search`: free-text filter across name/provider/size/use-case\n- `sort`: `score|tps|params|mem|ctx|date|use_case`\n- `include_too_tight`: include non-runnable rows (default `false` on `/top`, `true` on `/models`)\n- `max_context`: per-request context cap for memory estimation\n- `force_runtime`: `mlx|llamacpp|vllm` — override automatic runtime selection during analysis\n\nValidate API behavior locally:\n\n```sh\n# spawn server automatically and run endpoint/schema/filter assertions\npython3 scripts/test_api.py --spawn\n\n# or test an already-running server\npython3 scripts/test_api.py --base-url http://127.0.0.1:8787\n```\n\n### GPU memory override\n\nGPU VRAM autodetection can fail on some systems (e.g. broken `nvidia-smi`, VMs, passthrough setups). 
Use `--memory` to manually specify your GPU's VRAM:\n\n```sh\n# Override with 32 GB VRAM\nllmfit --memory=32G\n\n# Megabytes also work (32000 MB ≈ 31.25 GB)\nllmfit --memory=32000M\n\n# Works with all modes: TUI, CLI, and subcommands\nllmfit --memory=24G --cli\nllmfit --memory=24G fit --perfect -n 5\nllmfit --memory=24G system\nllmfit --memory=24G info \"Llama-3.1-70B\"\nllmfit --memory=24G recommend --json\n```\n\nAccepted suffixes: `G`/`GB`/`GiB` (gigabytes), `M`/`MB`/`MiB` (megabytes), `T`/`TB`/`TiB` (terabytes). Case-insensitive. If no GPU was detected, the override creates a synthetic GPU entry so models are scored for GPU inference.\n\n### Context-length cap for estimation\n\nUse `--max-context` to cap context length used for memory estimation (without changing each model's advertised maximum context):\n\n```sh\n# Estimate memory fit at 4K context\nllmfit --max-context 4096 --cli\n\n# Works with subcommands\nllmfit --max-context 8192 fit --perfect -n 5\nllmfit --max-context 16384 recommend --json --limit 5\n```\n\nIf `--max-context` is not set, llmfit will use `OLLAMA_CONTEXT_LENGTH` when available.\n\n### JSON output\n\nAdd `--json` to any subcommand for machine-readable output:\n\n```sh\nllmfit --json system     # Hardware specs as JSON\nllmfit --json fit -n 10  # Top 10 fits as JSON\nllmfit recommend --json  # Top 5 recommendations (JSON is default for recommend)\nllmfit plan \"Qwen/Qwen2.5-Coder-0.5B-Instruct\" --context 8192 --json\n```\n\n`plan` JSON includes stable fields for:\n- request (`context`, `quantization`, `target_tps`)\n- estimated minimum/recommended hardware\n- per-path feasibility (`gpu`, `cpu_offload`, `cpu_only`)\n- upgrade deltas\n\n---\n\n## How it works\n\n1. **Hardware detection** -- Reads total/available RAM via `sysinfo`, counts CPU cores, and probes for GPUs:\n   - **NVIDIA** -- Multi-GPU support via `nvidia-smi`. Aggregates VRAM across all detected GPUs. 
Falls back to VRAM estimation from the GPU model name if reporting fails.\n   - **AMD** -- Detected via `rocm-smi`.\n   - **Intel Arc** -- Discrete VRAM via sysfs, integrated via `lspci`.\n   - **Apple Silicon** -- Unified memory via `system_profiler`. VRAM = system RAM.\n   - **Ascend** -- Detected via `npu-smi`.\n   - **Backend detection** -- Automatically identifies the acceleration backend (CUDA, Metal, ROCm, SYCL, CPU ARM, CPU x86, Ascend) for speed estimation.\n\n2. **Model database** -- Hundreds of models sourced from the HuggingFace API, stored in `data/hf_models.json` and embedded at compile time. Memory requirements are computed from parameter counts across a quantization hierarchy (Q8_0 through Q2_K). VRAM is the primary constraint for GPU inference; system RAM is the fallback for CPU-only execution.\n\n   **MoE support** -- Models with Mixture-of-Experts architectures (Mixtral, DeepSeek-V2/V3) are detected automatically. Only a subset of experts is active per token, so the effective VRAM requirement is much lower than the total parameter count suggests. For example, Mixtral 8x7B has 46.7B total parameters but activates only ~12.9B per token, reducing VRAM from 23.9 GB to ~6.6 GB with expert offloading.\n\n3. **Dynamic quantization** -- Instead of assuming a fixed quantization, llmfit walks a hierarchy from Q8_0 (best quality) down to Q2_K (most compressed) and picks the highest-quality level that fits in available memory. If nothing fits at full context, it tries again at half context.\n\n4. 
**Multi-dimensional scoring** -- Each model is scored across four dimensions (0–100 each):\n\n   | Dimension   | What it measures                                                               |\n   |-------------|--------------------------------------------------------------------------------|\n   | **Quality** | Parameter count, model family reputation, quantization penalty, task alignment |\n   | **Speed**   | Estimated tokens/sec based on backend, params, and quantization                |\n   | **Fit**     | Memory utilization efficiency (sweet spot: 50–80% of available memory)         |\n   | **Context** | Context window capability vs target for the use case                           |\n\n   Dimensions are combined into a weighted composite score. Weights vary by use-case category (General, Coding, Reasoning, Chat, Multimodal, Embedding). For example, Chat weights Speed higher (0.35) while Reasoning weights Quality higher (0.55). Models are ranked by composite score, with unrunnable models (Too Tight) always at the bottom.\n\n5. **Speed estimation** -- Token generation in LLM inference is memory-bandwidth-bound: each token requires reading the full model weights once from VRAM. When the GPU model is recognized, llmfit uses its actual memory bandwidth to estimate throughput:\n\n   Formula: `(bandwidth_GB_s / model_size_GB) × efficiency_factor`\n\n   The efficiency factor (0.55) accounts for kernel overhead, KV-cache reads, and memory controller effects. 
This approach is validated against published benchmarks from llama.cpp ([Apple Silicon](https://github.com/ggml-org/llama.cpp/discussions/4167), [NVIDIA T4](https://github.com/ggml-org/llama.cpp/discussions/4225)) and real-world measurements.\n\n   The bandwidth lookup table covers ~80 GPUs across NVIDIA (consumer + datacenter), AMD (RDNA + CDNA), and Apple Silicon families.\n\n   For unrecognized GPUs, llmfit falls back to per-backend speed constants:\n\n   | Backend      | Speed constant |\n   |--------------|----------------|\n   | CUDA         | 220            |\n   | Metal        | 160            |\n   | ROCm         | 180            |\n   | SYCL         | 100            |\n   | CPU (ARM)    | 90             |\n   | CPU (x86)    | 70             |\n   | NPU (Ascend) | 390            |\n\n   Fallback formula: `K / params_b × quant_speed_multiplier`, with penalties for CPU offload (0.5×), CPU-only (0.3×), and MoE expert switching (0.8×).\n\n6. **Fit analysis** -- Each model is evaluated for memory compatibility:\n\n   **Run modes:**\n   - **GPU** -- Model fits in VRAM. Fast inference.\n   - **MoE** -- Mixture-of-Experts with expert offloading. Active experts in VRAM, inactive in RAM.\n   - **CPU+GPU** -- VRAM insufficient, spills to system RAM with partial GPU offload.\n   - **CPU** -- No GPU. Model loaded entirely into system RAM.\n\n   **Fit levels:**\n   - **Perfect** -- Recommended memory met on GPU. Requires GPU acceleration.\n   - **Good** -- Fits with headroom. Best achievable for MoE offload or CPU+GPU.\n   - **Marginal** -- Tight fit, or CPU-only (CPU-only always caps here).\n   - **Too Tight** -- Not enough VRAM or system RAM anywhere.\n\n---\n\n## Model database\n\nThe model list is generated by `scripts/scrape_hf_models.py`, a standalone Python script (stdlib only, no pip dependencies) that queries the HuggingFace REST API. 
The database covers hundreds of models and providers, including Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, IBM Granite, Allen Institute OLMo, xAI Grok, Cohere, BigCode, 01.ai, Upstage, TII Falcon, HuggingFace, Zhipu GLM, Moonshot Kimi, Baidu ERNIE, and more. The scraper automatically detects MoE architectures via model config (`num_local_experts`, `num_experts_per_tok`) and known architecture mappings.\n\nModel categories span general purpose, coding (CodeLlama, StarCoder2, WizardCoder, Qwen2.5-Coder, Qwen3-Coder), reasoning (DeepSeek-R1, Orca-2), multimodal/vision (Llama 3.2 Vision, Llama 4 Scout/Maverick, Qwen2.5-VL), chat, enterprise (IBM Granite), and embedding (nomic-embed, bge).\n\nSee [MODELS.md](MODELS.md) for the full list.\n\nTo refresh the model database:\n\n```sh\n# Automated update (recommended)\nmake update-models\n\n# Or run the script directly\n./scripts/update_models.sh\n\n# Or manually\npython3 scripts/scrape_hf_models.py\ncargo build --release\n```\n\nThe scraper writes `data/hf_models.json`, which is baked into the binary via `include_str!`. The automated update script backs up existing data, validates JSON output, and rebuilds the binary.\n\nBy default, the scraper enriches models with known GGUF download sources from providers like [unsloth](https://huggingface.co/unsloth) and [bartowski](https://huggingface.co/bartowski). Results are cached in `data/gguf_sources_cache.json` (7-day TTL) to avoid repeated API calls. 
Use `--no-gguf-sources` to skip enrichment for a faster scrape.\n\n---\n\n## Project structure\n\n```\nsrc/\n  main.rs         -- CLI argument parsing, entrypoint, TUI launch\n  hardware.rs     -- System RAM/CPU/GPU detection (multi-GPU, backend identification)\n  models.rs       -- Model database, quantization hierarchy, dynamic quant selection\n  fit.rs          -- Multi-dimensional scoring (Q/S/F/C), speed estimation, MoE offloading\n  providers.rs    -- Runtime provider integration (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio), install detection, pull/download\n  display.rs      -- Classic CLI table rendering + JSON output\n  tui_app.rs      -- TUI application state, filters, navigation\n  tui_ui.rs       -- TUI rendering (ratatui)\n  tui_events.rs   -- TUI keyboard event handling (crossterm)\ndata/\n  hf_models.json  -- Model database (206 models)\nskills/\n  llmfit-advisor/ -- OpenClaw skill for hardware-aware model recommendations\nscripts/\n  scrape_hf_models.py        -- HuggingFace API scraper\n  update_models.sh            -- Automated database update script\n  install-openclaw-skill.sh   -- Install the OpenClaw skill\nMakefile           -- Build and maintenance commands\n```\n\n---\n\n## Publishing to crates.io\n\nThe `Cargo.toml` already includes the required metadata (description, license, repository). To publish:\n\n```sh\n# Dry run first to catch issues\ncargo publish --dry-run\n\n# Publish for real (requires a crates.io API token)\ncargo login\ncargo publish\n```\n\nBefore publishing, make sure:\n\n- The version in `Cargo.toml` is correct (bump with each release).\n- A `LICENSE` file exists in the repo root. Create one if missing:\n\n```sh\n# For MIT license:\ncurl -sL https://opensource.org/license/MIT -o LICENSE\n# Or write your own. The Cargo.toml declares license = \"MIT\".\n```\n\n- `data/hf_models.json` is committed. 
It is embedded at compile time and must be present in the published crate.\n- The `exclude` list in `Cargo.toml` keeps `target/`, `scripts/`, and `demo.gif` out of the published crate to keep the download small.\n\nTo publish updates:\n\n```sh\n# Bump version\n# Edit Cargo.toml: version = \"0.2.0\"\ncargo publish\n```\n\n---\n\n## Dependencies\n\n| Crate                  | Purpose                                          |\n|------------------------|--------------------------------------------------|\n| `clap`                 | CLI argument parsing with derive macros          |\n| `sysinfo`              | Cross-platform RAM and CPU detection             |\n| `serde` / `serde_json` | JSON deserialization for model database          |\n| `tabled`               | CLI table formatting                             |\n| `colored`              | CLI colored output                               |\n| `ureq`                 | HTTP client for runtime/provider API integration |\n| `ratatui`              | Terminal UI framework                            |\n| `crossterm`            | Terminal input/output backend for ratatui        |\n\n---\n\n## Runtime provider integration\n\nllmfit supports multiple local runtime providers:\n\n- **Ollama** (daemon/API based pulls)\n- **llama.cpp** (direct GGUF downloads from Hugging Face + local cache detection)\n- **MLX** (Apple Silicon / mlx-community model cache + optional server)\n- **Docker Model Runner** (Docker Desktop's built-in model serving)\n- **LM Studio** (local model server with REST API for model management + downloads)\n\nWhen more than one compatible provider is available for a model, pressing `d` in the TUI opens a provider picker modal.\n\n### Ollama integration\n\nllmfit integrates with [Ollama](https://ollama.com) to detect which models you already have installed and to download new ones directly from the TUI.\n\n### Requirements\n\n- **Ollama must be installed and running** (`ollama serve` or the Ollama desktop app)\n- 
llmfit connects to `http://localhost:11434` (Ollama's default API port)\n- No configuration needed — if Ollama is running, llmfit detects it automatically\n\n### Remote Ollama instances\n\nTo connect to Ollama running on a different machine or port, set the `OLLAMA_HOST` environment variable:\n\n```sh\n# Connect to Ollama on a specific IP and port\nOLLAMA_HOST=\"http://192.168.1.100:11434\" llmfit\n\n# Connect via hostname  \nOLLAMA_HOST=\"http://ollama-server:666\" llmfit\n\n# Works with all TUI and CLI commands\nOLLAMA_HOST=\"http://192.168.1.100:11434\" llmfit --cli\nOLLAMA_HOST=\"http://192.168.1.100:11434\" llmfit fit --perfect -n 5\n```\n\nThis is useful for:\n- Running llmfit on one machine while Ollama serves from another (e.g., GPU server + laptop client)\n- Connecting to Ollama running in Docker containers with custom ports\n- Using Ollama behind reverse proxies or load balancers\n\n### How it works\n\nOn startup, llmfit queries `GET /api/tags` to list your installed Ollama models. Each installed model gets a green **✓** in the **Inst** column of the TUI. The system bar shows `Ollama: ✓ (N installed)`.\n\nWhen you press `d` on a model, llmfit sends `POST /api/pull` to Ollama to download it. The row highlights with an animated progress indicator showing download progress in real-time. 
Once complete, the model is immediately available for use with Ollama.\n\nIf Ollama is not running, Ollama-specific operations are skipped; the TUI still supports other providers like llama.cpp where available.\n\n### llama.cpp integration\n\nllmfit integrates with [llama.cpp](https://github.com/ggml-org/llama.cpp) as a runtime/download provider in both TUI and CLI.\n\nRequirements:\n\n- `llama-cli` or `llama-server` available in `PATH` (for runtime detection)\n- network access to Hugging Face for GGUF downloads\n\nHow it works:\n\n- llmfit maps HF models to known GGUF repos (with heuristic fallbacks)\n- downloads GGUF files into the local llama.cpp model cache\n- marks models installed when matching GGUF files are present locally\n\n### Docker Model Runner integration\n\nllmfit integrates with [Docker Model Runner](https://docs.docker.com/desktop/features/model-runner/), Docker Desktop's built-in model serving feature.\n\nRequirements:\n\n- Docker Desktop with Model Runner enabled\n- Default endpoint: `http://localhost:12434`\n\nHow it works:\n\n- llmfit queries `GET /engines` to list models available in Docker Model Runner\n- models are matched to the HF database using Ollama-style tag mapping (Docker Model Runner uses `ai/<tag>` naming)\n- pressing `d` in the TUI pulls via `docker model pull`\n\n### Remote Docker Model Runner instances\n\nTo connect to Docker Model Runner on a different host or port, set the `DOCKER_MODEL_RUNNER_HOST` environment variable:\n\n```sh\nDOCKER_MODEL_RUNNER_HOST=\"http://192.168.1.100:12434\" llmfit\n```\n\n### LM Studio integration\n\nllmfit integrates with [LM Studio](https://lmstudio.ai) as a local model server with built-in model download capabilities.\n\nRequirements:\n\n- LM Studio must be running with its local server enabled\n- Default endpoint: `http://127.0.0.1:1234`\n\nHow it works:\n\n- llmfit queries `GET /v1/models` to list models available in LM Studio\n- pressing `d` in the TUI triggers a download via `POST 
/api/v1/models/download`\n- download progress is tracked by polling `GET /api/v1/models/download-status`\n- LM Studio accepts HuggingFace model names directly, so no name mapping is needed\n\n### Remote LM Studio instances\n\nTo connect to LM Studio on a different host or port, set the `LMSTUDIO_HOST` environment variable:\n\n```sh\nLMSTUDIO_HOST=\"http://192.168.1.100:1234\" llmfit\n```\n\n### Model name mapping\n\nllmfit's database uses HuggingFace model names (e.g. `Qwen/Qwen2.5-Coder-14B-Instruct`) while Ollama uses its own naming scheme (e.g. `qwen2.5-coder:14b`). llmfit maintains an accurate mapping table between the two so that install detection and pulls resolve to the correct model. Each mapping is exact — `qwen2.5-coder:14b` maps to the Coder model, not the base `qwen2.5:14b`.\n\n---\n\n## Platform support\n\n- **Linux** -- Full support. GPU detection via `nvidia-smi` (NVIDIA), `rocm-smi` (AMD), sysfs/`lspci` (Intel Arc) and `npu-smi` (Ascend).\n- **macOS (Apple Silicon)** -- Full support. Detects unified memory via `system_profiler`. VRAM = system RAM (shared pool). Models run via Metal GPU acceleration.\n- **macOS (Intel)** -- RAM and CPU detection works. Discrete GPU detection if `nvidia-smi` available.\n- **Windows** -- RAM and CPU detection works. NVIDIA GPU detection via `nvidia-smi` if installed.\n- **Android / Termux / PRoot** -- CPU and RAM detection usually work, but GPU autodetection is not currently supported. 
Mobile GPUs such as Adreno typically are not visible through the desktop/server probing interfaces llmfit uses.\n\n### GPU support\n\n| Vendor                 | Detection method              | VRAM reporting                 |\n|------------------------|-------------------------------|--------------------------------|\n| NVIDIA                 | `nvidia-smi`                  | Exact dedicated VRAM           |\n| AMD                    | `rocm-smi`                    | Detected (VRAM may be unknown) |\n| Intel Arc (discrete)   | sysfs (`mem_info_vram_total`) | Exact dedicated VRAM           |\n| Intel Arc (integrated) | `lspci`                       | Shared system memory           |\n| Apple Silicon          | `system_profiler`             | Unified memory (= system RAM)  |\n| Ascend                 | `npu-smi`                     | Detected (VRAM may be unknown) |\n\nIf autodetection fails or reports incorrect values, use `--memory=<SIZE>` to override (see [GPU memory override](#gpu-memory-override) above).\n\n### Android / Termux note\n\nOn Android setups such as **Termux + PRoot**, llmfit usually cannot see mobile GPUs through the standard Linux detection paths (`nvidia-smi`, `rocm-smi`, DRM/sysfs, `lspci`, etc.). In those environments, \"no GPU detected\" is expected with the current implementation.\n\nIf you still want GPU-style recommendations on a unified-memory phone or tablet, use a manual memory override:\n\n```sh\nllmfit --memory=8G fit -n 20\nllmfit recommend --json --memory=8G --limit 10\n```\n\nThis is a workaround for recommendation/scoring only; it does not provide true Android GPU runtime detection.\n\n---\n\n## Contributing\n\nContributions are welcome, especially new models.\n\n### Adding a model\n\n1. Add the model's HuggingFace repo ID (e.g., `meta-llama/Llama-3.1-8B`) to the `TARGET_MODELS` list in `scripts/scrape_hf_models.py`.\n2. 
If the model is gated (requires HuggingFace authentication to access metadata), add a fallback entry to the `FALLBACKS` list in the same script with the parameter count and context length.\n3. Run the automated update script:\n   ```sh\n   make update-models\n   # or: ./scripts/update_models.sh\n   ```\n4. Verify the updated model list: `./target/release/llmfit list`\n5. Update [MODELS.md](MODELS.md) by running: `python3 << 'EOF' < scripts/...` (see commit history for the generator script)\n6. Open a pull request.\n\nSee [MODELS.md](MODELS.md) for the current list and [AGENTS.md](AGENTS.md) for architecture details.\n\n---\n\n## OpenClaw integration\n\nllmfit ships as an [OpenClaw](https://github.com/openclaw/openclaw) skill that lets the agent recommend hardware-appropriate local models and auto-configure Ollama/vLLM/LM Studio providers.\n\n### Install the skill\n\n```sh\n# From the llmfit repo\n./scripts/install-openclaw-skill.sh\n\n# Or manually\ncp -r skills/llmfit-advisor ~/.openclaw/skills/\n```\n\nOnce installed, ask your OpenClaw agent things like:\n\n- \"What local models can I run?\"\n- \"Recommend a coding model for my hardware\"\n- \"Set up Ollama with the best models for my GPU\"\n\nThe agent will call `llmfit recommend --json` under the hood, interpret the results, and offer to configure your `openclaw.json` with optimal model choices.\n\n### How it works\n\nThe skill teaches the OpenClaw agent to:\n\n1. Detect your hardware via `llmfit --json system`\n2. Get ranked recommendations via `llmfit recommend --json`\n3. Map HuggingFace model names to Ollama/vLLM/LM Studio tags\n4. 
Configure `models.providers.ollama.models` in `openclaw.json`\n\nSee [skills/llmfit-advisor/SKILL.md](skills/llmfit-advisor/SKILL.md) for the full skill definition.\n\n---\n\n## Alternatives\n\nIf you're looking for a different approach, check out [llm-checker](https://github.com/Pavelevich/llm-checker) -- a Node.js CLI tool with Ollama integration that can pull and benchmark models directly. It takes a more hands-on approach by actually running models on your hardware via Ollama, rather than estimating from specs. Good if you already have Ollama installed and want to test real-world performance. Note that it doesn't support MoE (Mixture-of-Experts) architectures -- all models are treated as dense, so memory estimates for models like Mixtral or DeepSeek-V3 will reflect total parameter count rather than the smaller active subset.\n\n---\n\n## License\n\nMIT\n"
  },
  {
    "path": "README.zh.md",
    "content": "# llmfit\n\n<p align=\"center\">\n  <img src=\"assets/icon.svg\" alt=\"llmfit 图标\" width=\"128\" height=\"128\">\n</p>\n\n<p align=\"center\">\n  <a href=\"README.md\">English</a> ·\n  <b>中文</b>\n</p>\n\n<p align=\"center\">\n  <a href=\"https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml\"><img src=\"https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"></a>\n  <a href=\"https://crates.io/crates/llmfit\"><img src=\"https://img.shields.io/crates/v/llmfit.svg\" alt=\"Crates.io\"></a>\n  <a href=\"LICENSE\"><img src=\"https://img.shields.io/badge/license-MIT-blue.svg\" alt=\"许可证\"></a>\n</p>\n\n**数百种模型与提供商，一条命令即可找出你的硬件能运行哪些模型。**\n\n一款终端工具，根据你系统的 RAM、CPU 和 GPU 为 LLM 模型匹配合适的规格。自动检测硬件，从质量、速度、适配度和上下文四个维度为每个模型打分，告诉你哪些模型能在你的机器上流畅运行。\n\n内置交互式 TUI（默认）和经典 CLI 模式。支持多 GPU 配置、MoE（混合专家）架构、动态量化选择、速度估算，以及本地运行时提供商（Ollama、llama.cpp、MLX、Docker Model Runner、LM Studio）。\n\n> **姐妹项目：** 欢迎查看 [sympozium](https://github.com/AlexsJones/sympozium/)，用于在 Kubernetes 中管理 Agent。\n\n![演示](demo.gif)\n\n---\n\n## 安装\n\n### Windows\n```sh\nscoop install llmfit\n```\n\n如果尚未安装 Scoop，请参阅 [Scoop 安装指南](https://scoop.sh/)。\n\n### macOS / Linux\n\n#### Homebrew\n```sh\nbrew install llmfit\n```\n\n#### 快速安装\n```sh\ncurl -fsSL https://llmfit.axjns.dev/install.sh | sh\n```\n\n从 GitHub 下载最新发布的二进制文件并安装到 `/usr/local/bin`（如果没有 sudo 则安装到 `~/.local/bin`）。\n\n**安装到 `~/.local/bin`（无需 sudo）：**\n```sh\ncurl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local\n```\n\n### Docker / Podman\n```sh\ndocker run ghcr.io/alexsjones/llmfit\n```\n此命令会输出 `llmfit recommend` 的 JSON 结果，可以用 `jq` 进一步查询。\n```\npodman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'\n```\n\n### 从源码构建\n```sh\ngit clone https://github.com/AlexsJones/llmfit.git\ncd llmfit\ncargo build --release\n# 二进制文件位于 target/release/llmfit\n```\n\n---\n\n## 使用方法\n\n### TUI（默认）\n\n```sh\nllmfit\n```\n\n启动交互式终端 UI。系统配置（CPU、RAM、GPU 
名称、VRAM、后端）显示在顶部。模型按综合评分排序，以可滚动的表格列出。每行显示模型的评分、预估 tok/s、最佳量化方案、运行模式、内存占用和用途分类。\n\n| 按键                       | 操作                                            |\n|----------------------------|-------------------------------------------------|\n| `Up` / `Down` 或 `j` / `k` | 浏览模型                                        |\n| `/`                        | 进入搜索模式（按名称、提供商、参数量、用途模糊匹配） |\n| `Esc` 或 `Enter`           | 退出搜索模式                                    |\n| `Ctrl-U`                   | 清除搜索                                        |\n| `f`                        | 切换适配度过滤：全部、可运行、完美、良好、勉强       |\n| `a`                        | 切换可用性过滤：全部、GGUF 可用、已安装            |\n| `s`                        | 切换排序列：评分、参数量、内存%、上下文、日期、用途   |\n| `v`                        | 进入 Visual 模式（多选模型）                      |\n| `V`                        | 进入 Select 模式（按列过滤）                      |\n| `t`                        | 切换颜色主题（自动保存）                          |\n| `p`                        | 打开 Plan 模式（硬件规划）                        |\n| `P`                        | 打开提供商过滤弹窗                              |\n| `U`                        | 打开用途过滤弹窗                                |\n| `C`                        | 打开能力过滤弹窗                                |\n| `m`                        | 标记选中模型用于对比                            |\n| `c`                        | 打开对比视图（已标记 vs 选中）                    |\n| `x`                        | 清除对比标记                                    |\n| `i`                        | 切换已安装优先排序（任何已检测的运行时提供商）    |\n| `d`                        | 下载选中模型（多个提供商可用时弹出选择器）        |\n| `r`                        | 从运行时提供商刷新已安装模型                    |\n| `Enter`                    | 切换选中模型的详情视图                          |\n| `PgUp` / `PgDn`            | 滚动 10 行                                      |\n| `g` / `G`                  | 跳转到顶部 / 底部                               |\n| `q`                        | 退出                                            |\n\n### 类 Vim 模式\n\nTUI 使用类 Vim 
模式，当前模式显示在左下角状态栏。当前模式决定哪些按键生效。\n\n#### Normal 模式\n\n默认模式。可浏览、搜索、过滤和打开各种视图。上表中的所有按键均在此模式下有效。\n\n#### Visual 模式 (`v`)\n\n选择连续的多个模型进行批量对比。按 `v` 在当前行设置锚点，然后用 `j`/`k` 或方向键扩展选区。选中的行会高亮显示。\n\n| 按键               | 操作                                 |\n|--------------------|--------------------------------------|\n| `j` / `k` 或方向键 | 向上/下扩展选区                      |\n| `c`                | 对比所有选中模型（打开多模型对比视图） |\n| `m`                | 标记当前模型用于双模型对比           |\n| `Esc` 或 `v`       | 退出 Visual 模式                     |\n\n多模型对比视图以表格形式显示，行为属性（评分、tok/s、适配度、内存%、参数量、模式、上下文、量化等），列为模型。最优值会高亮显示。如果选中的模型超出屏幕宽度，可用 `h`/`l` 或方向键水平滚动。\n\n#### Select 模式 (`V`)\n\n按列过滤。按 `V`（shift-v）进入 Select 模式，然后用 `h`/`l` 或方向键在列标题间移动。当前列会高亮显示。按 `Enter` 或 `Space` 激活该列对应的过滤器：\n\n| 列                        | 过滤操作                                              |\n|---------------------------|-------------------------------------------------------|\n| Inst                      | 切换可用性过滤                                        |\n| Model                     | 进入搜索模式                                          |\n| Provider                  | 打开提供商弹窗                                        |\n| Params                    | 打开参数量分组弹窗（<3B、3-7B、7-14B、14-30B、30-70B、70B+） |\n| Score、tok/s、Mem%、Ctx、Date | 按该列排序                                            |\n| Quant                     | 打开量化弹窗                                          |\n| Mode                      | 打开运行模式弹窗（GPU、MoE、CPU+GPU、CPU）                 |\n| Fit                       | 切换适配度过滤                                        |\n| Use Case                  | 打开用途弹窗                                          |\n\n在 Select 模式下仍可用 `j`/`k` 浏览行，以便在应用过滤器时查看效果。按 `Esc` 返回 Normal 模式。\n\n### TUI Plan 模式 (`p`)\n\nPlan 模式与常规适配分析相反：不是问\"我的硬件能跑什么？\"，而是估算\"这个模型配置需要什么硬件？\"。\n\n在选中的行上按 `p`，然后：\n\n| 按键                   | 操作                                     |\n|------------------------|------------------------------------------|\n| `Tab` / `j` / `k`      | 在可编辑字段间移动（上下文、量化、目标 TPS） 
|\n| `Left` / `Right`       | 在当前字段内移动光标                     |\n| 输入                   | 编辑当前字段                             |\n| `Backspace` / `Delete` | 删除字符                                 |\n| `Ctrl-U`               | 清空当前字段                             |\n| `Esc` 或 `q`           | 退出 Plan 模式                           |\n\nPlan 模式显示以下估算：\n- 最低和推荐的 VRAM/RAM/CPU 核心数\n- 可行的运行路径（GPU、CPU 卸载、纯 CPU）\n- 达到更好适配目标所需的升级差距\n\n### 主题\n\n按 `t` 可在 10 种内置颜色主题间切换。选择会自动保存到 `~/.config/llmfit/theme`，下次启动时恢复。\n\n| 主题                     | 描述                                        |\n|--------------------------|---------------------------------------------|\n| **Default**              | llmfit 原始配色                             |\n| **Dracula**              | 深紫色背景搭配柔和色调                      |\n| **Solarized**            | Ethan Schoonover 的 Solarized Dark 配色方案 |\n| **Nord**                 | 极地风格，冷蓝灰色调                         |\n| **Monokai**              | Monokai Pro 暖色语法配色                    |\n| **Gruvbox**              | 复古风格，暖色大地色调                       |\n| **Catppuccin Latte**     | 🌻 浅色主题——和谐的柔和反转配色             |\n| **Catppuccin Frappé**    | 🪴 低对比度深色——柔和、内敛的美学            |\n| **Catppuccin Macchiato** | 🌺 中对比度深色——温柔舒缓的色调             |\n| **Catppuccin Mocha**     | 🌿 最深的暗色变体——温馨且色彩丰富           |\n\n### Web 仪表盘\n\n当你以非 JSON 模式运行 `llmfit` 时，会自动在后台启动 Web 仪表盘，默认监听 `0.0.0.0:8787`。可在同一网络中的任意浏览器打开：\n\n```\nhttp://<你的机器IP>:8787\n```\n\n你也可以通过环境变量覆盖主机或端口：\n\n```sh\nLLMFIT_DASHBOARD_HOST=0.0.0.0 LLMFIT_DASHBOARD_PORT=9000 llmfit\n```\n\n| 变量 | 默认值 | 说明 |\n|---|---|---|\n| `LLMFIT_DASHBOARD_HOST` | `0.0.0.0` | 仪表盘服务绑定的网卡地址 |\n| `LLMFIT_DASHBOARD_PORT` | `8787` | 仪表盘服务绑定的端口 |\n\n如需禁用自动启动仪表盘，添加 `--no-dashboard`：\n\n```sh\nllmfit --no-dashboard\n```\n\n### CLI 模式\n\n使用 `--cli` 或任何子命令获取经典表格输出：\n\n```sh\n# 按适配度排序的所有模型表格\nllmfit --cli\n\n# 仅显示完美适配的模型，前 5 个\nllmfit fit --perfect -n 5\n\n# 显示检测到的系统配置\nllmfit system\n\n# 列出数据库中所有模型\nllmfit list\n\n# 按名称、提供商或参数量搜索\nllmfit search \"llama 8b\"\n\n# 
单个模型的详细信息\nllmfit info \"Mistral-7B\"\n\n# 前 5 个推荐（JSON 格式，供 agent/脚本使用）\nllmfit recommend --json --limit 5\n\n# 按用途过滤推荐\nllmfit recommend --json --use-case coding --limit 3\n\n# 为特定模型配置规划所需硬件\nllmfit plan \"Qwen/Qwen3-4B-MLX-4bit\" --context 8192\nllmfit plan \"Qwen/Qwen3-4B-MLX-4bit\" --context 8192 --quant mlx-4bit\nllmfit plan \"Qwen/Qwen3-4B-MLX-4bit\" --context 8192 --target-tps 25 --json\n\n# 作为节点级 REST API 运行（供集群调度器/聚合器使用）\nllmfit serve --host 0.0.0.0 --port 8787\n```\n\n### REST API (`llmfit serve`)\n\n`llmfit serve` 启动一个 HTTP API，提供与 TUI/CLI 相同的适配/评分数据，包括过滤和节点级最优模型选择。\n\n```sh\n# 健康检查\ncurl http://localhost:8787/health\n\n# 节点硬件信息\ncurl http://localhost:8787/api/v1/system\n\n# 带过滤的完整适配列表\ncurl \"http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20\"\n\n# 关键调度端点：该节点的最佳可运行模型\ncurl \"http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding\"\n\n# 按模型名称/提供商搜索\ncurl \"http://localhost:8787/api/v1/models/Mistral?runtime=any\"\n```\n\n`models`/`models/top` 支持的查询参数：\n\n- `limit`（或 `n`）：返回的最大行数\n- `perfect`：`true|false`（为 `true` 时强制仅显示完美适配）\n- `min_fit`：`perfect|good|marginal|too_tight`\n- `runtime`：`any|mlx|llamacpp`\n- `use_case`：`general|coding|reasoning|chat|multimodal|embedding`\n- `provider`：提供商文本过滤（子字符串匹配）\n- `search`：跨名称/提供商/参数量/用途的全文过滤\n- `sort`：`score|tps|params|mem|ctx|date|use_case`\n- `include_too_tight`：包含不可运行的行（`/top` 默认 `false`，`/models` 默认 `true`）\n- `max_context`：每次请求的上下文长度上限，用于内存估算\n\n本地验证 API 行为：\n\n```sh\n# 自动启动服务器并运行端点/模式/过滤断言\npython3 scripts/test_api.py --spawn\n\n# 或测试已运行的服务器\npython3 scripts/test_api.py --base-url http://127.0.0.1:8787\n```\n\n### GPU 显存覆盖\n\n在某些系统上 GPU VRAM 自动检测可能失败（例如 `nvidia-smi` 故障、虚拟机、直通配置）。使用 `--memory` 手动指定 GPU 显存：\n\n```sh\n# 覆盖为 32 GB VRAM\nllmfit --memory=32G\n\n# 也支持兆字节（32000 MB ≈ 31.25 GB）\nllmfit --memory=32000M\n\n# 适用于所有模式：TUI、CLI 和子命令\nllmfit --memory=24G --cli\nllmfit --memory=24G fit --perfect -n 5\nllmfit --memory=24G system\nllmfit --memory=24G 
info \"Llama-3.1-70B\"\nllmfit --memory=24G recommend --json\n```\n\n支持的后缀：`G`/`GB`/`GiB`（千兆字节）、`M`/`MB`/`MiB`（兆字节）、`T`/`TB`/`TiB`（太字节）。不区分大小写。如果未检测到 GPU，覆盖值会创建一个虚拟 GPU 条目，以便按 GPU 推理对模型评分。\n\n### 上下文长度上限\n\n使用 `--max-context` 限制用于内存估算的上下文长度（不改变每个模型标称的最大上下文）：\n\n```sh\n# 按 4K 上下文估算内存适配\nllmfit --max-context 4096 --cli\n\n# 适用于子命令\nllmfit --max-context 8192 fit --perfect -n 5\nllmfit --max-context 16384 recommend --json --limit 5\n```\n\n如果未设置 `--max-context`，llmfit 会在可用时使用 `OLLAMA_CONTEXT_LENGTH`。\n\n### JSON 输出\n\n在任何子命令后添加 `--json` 获取机器可读输出：\n\n```sh\nllmfit --json system     # 硬件信息（JSON）\nllmfit --json fit -n 10  # 前 10 个适配结果（JSON）\nllmfit recommend --json  # 前 5 个推荐（recommend 默认输出 JSON）\nllmfit plan \"Qwen/Qwen2.5-Coder-0.5B-Instruct\" --context 8192 --json\n```\n\n`plan` 的 JSON 输出包含以下稳定字段：\n- 请求参数（`context`、`quantization`、`target_tps`）\n- 估算的最低/推荐硬件\n- 每条路径的可行性（`gpu`、`cpu_offload`、`cpu_only`）\n- 升级差距\n\n---\n\n## 工作原理\n\n1. **硬件检测** -- 通过 `sysinfo` 读取总计/可用 RAM，统计 CPU 核心数，并探测 GPU：\n   - **NVIDIA** -- 通过 `nvidia-smi` 支持多 GPU。聚合所有检测到的 GPU 的 VRAM。如果报告失败，则根据 GPU 型号名称估算 VRAM。\n   - **AMD** -- 通过 `rocm-smi` 检测。\n   - **Intel Arc** -- 独立显卡通过 sysfs 检测 VRAM，集成显卡通过 `lspci` 检测。\n   - **Apple Silicon** -- 通过 `system_profiler` 检测统一内存。VRAM = 系统 RAM。\n   - **Ascend** -- 通过 `npu-smi` 检测。\n   - **后端检测** -- 自动识别加速后端（CUDA、Metal、ROCm、SYCL、CPU ARM、CPU x86、Ascend）用于速度估算。\n\n2. **模型数据库** -- 数百个模型来源于 HuggingFace API，存储在 `data/hf_models.json` 中并在编译时嵌入。内存需求根据量化层级（Q8_0 到 Q2_K）的参数量计算。VRAM 是 GPU 推理的主要约束；系统 RAM 是纯 CPU 执行的后备方案。\n\n   **MoE 支持** -- 自动检测混合专家架构（Mixtral、DeepSeek-V2/V3）的模型。每个 token 只有部分专家处于活跃状态，因此实际 VRAM 需求远低于总参数量的暗示。例如，Mixtral 8x7B 总参数量为 46.7B，但每个 token 仅激活约 12.9B，通过专家卸载将 VRAM 需求从 23.9 GB 降至约 6.6 GB。\n\n3. **动态量化** -- llmfit 不假设固定量化，而是尝试适配你硬件的最高质量量化。它从 Q8_0（最高质量）到 Q2_K（最高压缩）逐级尝试，选择能装入可用内存的最高质量等级。如果在完整上下文下无法装入，则尝试半上下文。\n\n4. 
**多维评分** -- 每个模型按四个维度评分（每个 0-100）：\n\n   | 维度       | 衡量内容                                 |\n   |------------|------------------------------------------|\n   | **质量**   | 参数量、模型系列声誉、量化惩罚、任务对齐度  |\n   | **速度**   | 基于后端、参数量和量化的预估 tokens/sec   |\n   | **适配度** | 内存利用效率（最佳区间：可用内存的 50-80%） |\n   | **上下文** | 上下文窗口能力与用途目标的对比           |\n\n   各维度通过加权合成为综合评分。权重因用途类别而异（通用、编程、推理、对话、多模态、嵌入）。例如，对话类更侧重速度（0.35），推理类更侧重质量（0.55）。模型按综合评分排序，不可运行的模型（Too Tight）始终排在最后。\n\n5. **速度估算** -- LLM 推理中的 token 生成受内存带宽限制：每个 token 需要从 VRAM 完整读取一次模型权重。当识别出 GPU 型号时，llmfit 使用其实际内存带宽来估算吞吐量：\n\n   公式：`(bandwidth_GB_s / model_size_GB) × efficiency_factor`\n\n   效率因子（0.55）考虑了内核开销、KV 缓存读取和内存控制器效应。该方法已通过 llama.cpp 的公开基准测试验证（[Apple Silicon](https://github.com/ggml-org/llama.cpp/discussions/4167)、[NVIDIA T4](https://github.com/ggml-org/llama.cpp/discussions/4225)）及实际测量数据。\n\n   带宽查找表涵盖约 80 种 GPU，覆盖 NVIDIA（消费级 + 数据中心级）、AMD（RDNA + CDNA）和 Apple Silicon 系列。\n\n   对于未识别的 GPU，llmfit 使用按后端的速度常量作为回退：\n\n   | 后端         | 速度常量 |\n   |--------------|----------|\n   | CUDA         | 220      |\n   | Metal        | 160      |\n   | ROCm         | 180      |\n   | SYCL         | 100      |\n   | CPU (ARM)    | 90       |\n   | CPU (x86)    | 70       |\n   | NPU (Ascend) | 390      |\n\n   回退公式：`K / params_b × quant_speed_multiplier`，对 CPU 卸载（0.5x）、纯 CPU（0.3x）和 MoE 专家切换（0.8x）施加惩罚。\n\n6. 
**适配分析** -- 评估每个模型的内存兼容性：\n\n   **运行模式：**\n   - **GPU** -- 模型完全装入 VRAM。推理速度快。\n   - **MoE** -- 混合专家 + 专家卸载。活跃专家在 VRAM 中，非活跃专家在 RAM 中。\n   - **CPU+GPU** -- VRAM 不足，溢出到系统 RAM 并使用部分 GPU 卸载。\n   - **CPU** -- 无 GPU。模型完全加载到系统 RAM 中。\n\n   **适配等级：**\n   - **Perfect（完美）** -- GPU 上满足推荐内存。需要 GPU 加速。\n   - **Good（良好）** -- 有余量地装入。MoE 卸载或 CPU+GPU 模式的最佳等级。\n   - **Marginal（勉强）** -- 装入紧张，或纯 CPU 运行（纯 CPU 始终封顶在此等级）。\n   - **Too Tight（过紧）** -- VRAM 和系统 RAM 均不足。\n\n---\n\n## 模型数据库\n\n模型列表由 `scripts/scrape_hf_models.py` 生成，这是一个独立的 Python 脚本（仅使用标准库，无需 pip 依赖），通过 HuggingFace REST API 查询。数百个模型和提供商，包括 Meta Llama、Mistral、Qwen、Google Gemma、Microsoft Phi、DeepSeek、IBM Granite、Allen Institute OLMo、xAI Grok、Cohere、BigCode、01.ai、Upstage、TII Falcon、HuggingFace、Zhipu GLM、Moonshot Kimi、Baidu ERNIE 等。爬虫通过模型配置（`num_local_experts`、`num_experts_per_tok`）和已知架构映射自动检测 MoE 架构。\n\n模型类别涵盖通用、编程（CodeLlama、StarCoder2、WizardCoder、Qwen2.5-Coder、Qwen3-Coder）、推理（DeepSeek-R1、Orca-2）、多模态/视觉（Llama 3.2 Vision、Llama 4 Scout/Maverick、Qwen2.5-VL）、对话、企业级（IBM Granite）和嵌入（nomic-embed、bge）。\n\n完整列表请参阅 [MODELS.md](MODELS.md)。\n\n刷新模型数据库：\n\n```sh\n# 自动更新（推荐）\nmake update-models\n\n# 或直接运行脚本\n./scripts/update_models.sh\n\n# 或手动执行\npython3 scripts/scrape_hf_models.py\ncargo build --release\n```\n\n爬虫将结果写入 `data/hf_models.json`，通过 `include_str!` 在编译时嵌入二进制文件。自动更新脚本会备份现有数据、验证 JSON 输出并重新构建二进制文件。\n\n默认情况下，爬虫会使用来自 [unsloth](https://huggingface.co/unsloth) 和 [bartowski](https://huggingface.co/bartowski) 等提供商的已知 GGUF 下载源来丰富模型信息。结果缓存在 `data/gguf_sources_cache.json` 中（7 天 TTL），以避免重复 API 调用。使用 `--no-gguf-sources` 可跳过丰富步骤以加快爬取速度。\n\n---\n\n## 项目结构\n\n```\nsrc/\n  main.rs         -- CLI 参数解析、入口、TUI 启动\n  hardware.rs     -- 系统 RAM/CPU/GPU 检测（多 GPU、后端识别）\n  models.rs       -- 模型数据库、量化层级、动态量化选择\n  fit.rs          -- 多维评分（Q/S/F/C）、速度估算、MoE 卸载\n  providers.rs    -- 运行时提供商集成（Ollama、llama.cpp、MLX、Docker Model Runner、LM Studio）、安装检测、拉取/下载\n  display.rs      -- 经典 CLI 表格渲染 + JSON 输出\n  tui_app.rs      -- TUI 应用状态、过滤器、导航\n  tui_ui.rs       -- TUI 
渲染（ratatui）\n  tui_events.rs   -- TUI 键盘事件处理（crossterm）\ndata/\n  hf_models.json  -- 模型数据库（206 个模型）\nskills/\n  llmfit-advisor/ -- 用于硬件感知模型推荐的 OpenClaw 技能\nscripts/\n  scrape_hf_models.py        -- HuggingFace API 爬虫\n  update_models.sh            -- 自动化数据库更新脚本\n  install-openclaw-skill.sh   -- 安装 OpenClaw 技能\nMakefile           -- 构建和维护命令\n```\n\n---\n\n## 发布到 crates.io\n\n`Cargo.toml` 已包含所需的元数据（描述、许可证、仓库地址）。发布步骤：\n\n```sh\n# 先进行试运行以发现问题\ncargo publish --dry-run\n\n# 正式发布（需要 crates.io API token）\ncargo login\ncargo publish\n```\n\n发布前请确认：\n\n- `Cargo.toml` 中的版本号正确（每次发布时递增）。\n- 仓库根目录存在 `LICENSE` 文件。如果缺失请创建：\n\n```sh\n# MIT 许可证：\ncurl -sL https://opensource.org/license/MIT -o LICENSE\n# 或自行编写。Cargo.toml 声明 license = \"MIT\"。\n```\n\n- `data/hf_models.json` 已提交。它在编译时嵌入，必须存在于发布的 crate 中。\n- `Cargo.toml` 中的 `exclude` 列表将 `target/`、`scripts/` 和 `demo.gif` 排除在发布的 crate 之外，以减小下载体积。\n\n发布更新：\n\n```sh\n# 递增版本号\n# 编辑 Cargo.toml: version = \"0.2.0\"\ncargo publish\n```\n\n---\n\n## 依赖\n\n| Crate                  | 用途                                     |\n|------------------------|------------------------------------------|\n| `clap`                 | 基于 derive 宏的 CLI 参数解析            |\n| `sysinfo`              | 跨平台 RAM 和 CPU 检测                   |\n| `serde` / `serde_json` | 模型数据库的 JSON 反序列化               |\n| `tabled`               | CLI 表格格式化                           |\n| `colored`              | CLI 彩色输出                             |\n| `ureq`                 | 用于运行时/提供商 API 集成的 HTTP 客户端 |\n| `ratatui`              | 终端 UI 框架                             |\n| `crossterm`            | ratatui 的终端输入/输出后端              |\n\n---\n\n## 运行时提供商集成\n\nllmfit 支持多个本地运行时提供商：\n\n- **Ollama**（基于守护进程/API 的拉取）\n- **llama.cpp**（从 Hugging Face 直接下载 GGUF + 本地缓存检测）\n- **MLX**（Apple Silicon / mlx-community 模型缓存 + 可选服务器）\n- **Docker Model Runner**（Docker Desktop 内置的模型服务）\n- **LM Studio**（本地模型服务器，支持 REST API 模型管理和下载）\n\n当某个模型有多个兼容的提供商可用时，在 TUI 中按 `d` 会打开提供商选择弹窗。\n\n### Ollama 集成\n\nllmfit 与 
[Ollama](https://ollama.com) 集成，可检测你已安装的模型并直接从 TUI 下载新模型。\n\n### 要求\n\n- **Ollama 必须已安装且正在运行**（`ollama serve` 或 Ollama 桌面应用）\n- llmfit 连接到 `http://localhost:11434`（Ollama 默认 API 端口）\n- 无需配置 -- 如果 Ollama 正在运行，llmfit 会自动检测到它\n\n### 远程 Ollama 实例\n\n要连接到在其他机器或端口上运行的 Ollama，设置 `OLLAMA_HOST` 环境变量：\n\n```sh\n# 连接到指定 IP 和端口的 Ollama\nOLLAMA_HOST=\"http://192.168.1.100:11434\" llmfit\n\n# 通过主机名连接\nOLLAMA_HOST=\"http://ollama-server:666\" llmfit\n\n# 适用于所有 TUI 和 CLI 命令\nOLLAMA_HOST=\"http://192.168.1.100:11434\" llmfit --cli\nOLLAMA_HOST=\"http://192.168.1.100:11434\" llmfit fit --perfect -n 5\n```\n\n适用场景：\n- 在一台机器上运行 llmfit，而 Ollama 在另一台机器上提供服务（例如 GPU 服务器 + 笔记本客户端）\n- 连接到在 Docker 容器中以自定义端口运行的 Ollama\n- 使用反向代理或负载均衡器后面的 Ollama\n\n### 工作原理\n\n启动时，llmfit 查询 `GET /api/tags` 列出已安装的 Ollama 模型。每个已安装的模型在 TUI 的 **Inst** 列显示绿色 **✓**。系统栏显示 `Ollama: ✓ (N installed)`。\n\n在模型上按 `d` 时，llmfit 向 Ollama 发送 `POST /api/pull` 来下载模型。该行会高亮显示并带有动画进度指示器，实时显示下载进度。下载完成后，模型可立即在 Ollama 中使用。\n\n如果 Ollama 未运行，Ollama 相关操作会被跳过；TUI 仍然支持其他可用的提供商（如 llama.cpp）。\n\n### llama.cpp 集成\n\nllmfit 与 [llama.cpp](https://github.com/ggml-org/llama.cpp) 集成，在 TUI 和 CLI 中均可作为运行时/下载提供商使用。\n\n要求：\n\n- `llama-cli` 或 `llama-server` 在 `PATH` 中可用（用于运行时检测）\n- 需要网络访问 Hugging Face 以下载 GGUF 文件\n\n工作原理：\n\n- llmfit 将 HF 模型映射到已知的 GGUF 仓库（带有启发式回退）\n- 将 GGUF 文件下载到本地 llama.cpp 模型缓存\n- 当本地存在匹配的 GGUF 文件时标记模型为已安装\n\n### Docker Model Runner 集成\n\nllmfit 与 [Docker Model Runner](https://docs.docker.com/desktop/features/model-runner/) 集成，这是 Docker Desktop 内置的模型服务功能。\n\n要求：\n\n- Docker Desktop 已启用 Model Runner\n- 默认端点：`http://localhost:12434`\n\n工作原理：\n\n- llmfit 查询 `GET /engines` 列出 Docker Model Runner 中可用的模型\n- 使用 Ollama 风格的标签映射将模型与 HF 数据库匹配（Docker Model Runner 使用 `ai/<tag>` 命名）\n- 在 TUI 中按 `d` 通过 `docker model pull` 拉取模型\n\n### 远程 Docker Model Runner 实例\n\n要连接到不同主机或端口的 Docker Model Runner，设置 `DOCKER_MODEL_RUNNER_HOST` 环境变量：\n\n```sh\nDOCKER_MODEL_RUNNER_HOST=\"http://192.168.1.100:12434\" llmfit\n```\n\n### LM Studio 集成\n\nllmfit 与 [LM 
Studio](https://lmstudio.ai) 集成，作为本地模型服务器，支持内置模型下载功能。\n\n要求：\n\n- LM Studio 必须运行且本地服务器已启用\n- 默认端点：`http://127.0.0.1:1234`\n\n工作原理：\n\n- llmfit 查询 `GET /v1/models` 列出 LM Studio 中可用的模型\n- 在 TUI 中按 `d` 通过 `POST /api/v1/models/download` 触发下载\n- 通过轮询 `GET /api/v1/models/download-status` 跟踪下载进度\n- LM Studio 直接接受 HuggingFace 模型名称，无需名称映射\n\n### 远程 LM Studio 实例\n\n要连接到不同主机或端口的 LM Studio，设置 `LMSTUDIO_HOST` 环境变量：\n\n```sh\nLMSTUDIO_HOST=\"http://192.168.1.100:1234\" llmfit\n```\n\n### 模型名称映射\n\nllmfit 的数据库使用 HuggingFace 模型名称（例如 `Qwen/Qwen2.5-Coder-14B-Instruct`），而 Ollama 使用自己的命名方案（例如 `qwen2.5-coder:14b`）。llmfit 维护了一个精确的映射表，确保安装检测和拉取操作解析到正确的模型。每个映射都是精确的 -- `qwen2.5-coder:14b` 映射到 Coder 模型，而不是基础的 `qwen2.5:14b`。\n\n---\n\n## 平台支持\n\n- **Linux** -- 完全支持。通过 `nvidia-smi`（NVIDIA）、`rocm-smi`（AMD）、sysfs/`lspci`（Intel Arc）和 `npu-smi`（Ascend）进行 GPU 检测。\n- **macOS (Apple Silicon)** -- 完全支持。通过 `system_profiler` 检测统一内存。VRAM = 系统 RAM（共享池）。模型通过 Metal GPU 加速运行。\n- **macOS (Intel)** -- RAM 和 CPU 检测正常。如果 `nvidia-smi` 可用，可检测独立 GPU。\n- **Windows** -- RAM 和 CPU 检测正常。如果安装了 `nvidia-smi`，可检测 NVIDIA GPU。\n- **Android / Termux / PRoot** -- CPU 和 RAM 检测通常正常，但目前不支持 GPU 自动检测。Adreno 等移动 GPU 通常无法通过 llmfit 使用的桌面/服务器探测接口访问。\n\n### GPU 支持\n\n| 厂商            | 检测方式                      | VRAM 报告             |\n|-----------------|-------------------------------|-----------------------|\n| NVIDIA          | `nvidia-smi`                  | 精确的独立 VRAM       |\n| AMD             | `rocm-smi`                    | 已检测（VRAM 可能未知） |\n| Intel Arc（独立） | sysfs (`mem_info_vram_total`) | 精确的独立 VRAM       |\n| Intel Arc（集成） | `lspci`                       | 共享系统内存          |\n| Apple Silicon   | `system_profiler`             | 统一内存（= 系统 RAM）  |\n| Ascend          | `npu-smi`                     | 已检测（VRAM 可能未知） |\n\n如果自动检测失败或报告的值不正确，使用 `--memory=<SIZE>` 覆盖（参见上方 [GPU 显存覆盖](#gpu-显存覆盖)）。\n\n### Android / Termux 说明\n\n在 **Termux + PRoot** 等 Android 环境中，llmfit 通常无法通过标准 Linux 检测路径（`nvidia-smi`、`rocm-smi`、DRM/sysfs、`lspci` 等）检测到移动 
GPU。在这些环境中，\"未检测到 GPU\"是当前实现的预期行为。\n\n如果你仍希望在统一内存的手机或平板上获得 GPU 风格的推荐，可使用手动内存覆盖：\n\n```sh\nllmfit --memory=8G fit -n 20\nllmfit recommend --json --memory=8G --limit 10\n```\n\n这仅是推荐/评分的变通方案；不提供真正的 Android GPU 运行时检测。\n\n---\n\n## 贡献\n\n欢迎贡献，特别是添加新模型。\n\n### 添加模型\n\n1. 在 `scripts/scrape_hf_models.py` 的 `TARGET_MODELS` 列表中添加模型的 HuggingFace 仓库 ID（例如 `meta-llama/Llama-3.1-8B`）。\n2. 如果模型有访问限制（需要 HuggingFace 身份验证才能访问元数据），在同一脚本的 `FALLBACKS` 列表中添加包含参数量和上下文长度的回退条目。\n3. 运行自动更新脚本：\n   ```sh\n   make update-models\n   # 或: ./scripts/update_models.sh\n   ```\n4. 验证更新后的模型列表：`./target/release/llmfit list`\n5. 运行以下命令更新 [MODELS.md](MODELS.md)：`python3 << 'EOF' < scripts/...`（参见提交历史中的生成脚本）\n6. 提交 Pull Request。\n\n参见 [MODELS.md](MODELS.md) 查看当前列表，[AGENTS.md](AGENTS.md) 查看架构详情。\n\n---\n\n## OpenClaw 集成\n\nllmfit 作为 [OpenClaw](https://github.com/openclaw/openclaw) 技能提供，让 agent 能够推荐适合硬件的本地模型，并自动配置 Ollama/vLLM/LM Studio 提供商。\n\n### 安装技能\n\n```sh\n# 从 llmfit 仓库\n./scripts/install-openclaw-skill.sh\n\n# 或手动安装\ncp -r skills/llmfit-advisor ~/.openclaw/skills/\n```\n\n安装后，可以向 OpenClaw agent 提问：\n\n- \"我能运行哪些本地模型？\"\n- \"为我的硬件推荐一个编程模型\"\n- \"用最适合我 GPU 的模型配置 Ollama\"\n\nAgent 会在后台调用 `llmfit recommend --json`，解读结果，并提议用最优的模型选择配置你的 `openclaw.json`。\n\n### 工作原理\n\n该技能教会 OpenClaw agent：\n\n1. 通过 `llmfit --json system` 检测你的硬件\n2. 通过 `llmfit recommend --json` 获取排序后的推荐\n3. 将 HuggingFace 模型名称映射到 Ollama/vLLM/LM Studio 标签\n4. 配置 `openclaw.json` 中的 `models.providers.ollama.models`\n\n参见 [skills/llmfit-advisor/SKILL.md](skills/llmfit-advisor/SKILL.md) 查看完整技能定义。\n\n---\n\n## 替代方案\n\n如果你在寻找不同的方案，可以看看 [llm-checker](https://github.com/Pavelevich/llm-checker) -- 一个带有 Ollama 集成的 Node.js CLI 工具，可以直接拉取和基准测试模型。它采用更直接的方式，通过 Ollama 在你的硬件上实际运行模型，而不是从配置参数估算。如果你已安装 Ollama 并想测试真实性能，这是个不错的选择。注意它不支持 MoE（混合专家）架构 -- 所有模型都被视为密集模型，因此 Mixtral 或 DeepSeek-V3 等模型的内存估算将反映总参数量而非较小的活跃子集。\n\n---\n\n## 许可证\n\nMIT\n\n---\n\n*本文档由 [@JasonYeYuhe](https://github.com/JasonYeYuhe) 翻译并维护。如果您发现任何翻译问题或需要增加新特性说明，欢迎提交 Issue 或与我联系。*\n"
  },
  {
    "path": "data/hf_models.json",
    "content": "[\n  {\n    \"name\": \"echarlaix/tiny-random-PhiForCausalLM\",\n    \"provider\": \"echarlaix\",\n    \"parameter_count\": \"80K\",\n    \"parameters_raw\": 80074,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi\",\n    \"hf_downloads\": 24984,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-03-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/tiny-random-GPT2LMHeadModel\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"83K\",\n    \"parameters_raw\": 83161,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 37534,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/tiny-random-gpt2\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"112K\",\n    \"parameters_raw\": 111968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 28458,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/tiny-random-GPTJForCausalLM\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": 
\"129K\",\n    \"parameters_raw\": 129184,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gptj\",\n    \"hf_downloads\": 38953,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/Olmo-3-7B-Instruct\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"528K\",\n    \"parameters_raw\": 528384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 101787,\n    \"hf_likes\": 118,\n    \"release_date\": \"2025-11-19\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Olmo-3-7B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"allenai/Olmo-3-7B-Think\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"528K\",\n    \"parameters_raw\": 528384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 44414,\n    \"hf_likes\": 88,\n    \"release_date\": \"2025-11-18\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Olmo-3-7B-Think-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"allenai/Olmo-3-7B-Think-DPO\",\n    
\"provider\": \"allenai\",\n    \"parameter_count\": \"528K\",\n    \"parameters_raw\": 528384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 21555,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-11-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"MaxJeblick/llama2-0b-unit-test\",\n    \"provider\": \"maxjeblick\",\n    \"parameter_count\": \"771K\",\n    \"parameters_raw\": 770940,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1024,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 48409,\n    \"hf_likes\": 2,\n    \"release_date\": \"2023-10-25\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/tiny-random-OPTForCausalLM\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"812K\",\n    \"parameters_raw\": 812404,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 100,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"opt\",\n    \"hf_downloads\": 388627,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"hmellor/tiny-random-LlamaForCausalLM\",\n    \"provider\": \"hmellor\",\n    \"parameter_count\": \"1M\",\n    \"parameters_raw\": 1062992,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 
0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1295572,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/tiny-dummy-qwen2\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"1M\",\n    \"parameters_raw\": 1217480,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 102441,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-07-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"SimpleStories/SimpleStories-1.25M\",\n    \"provider\": \"simplestories\",\n    \"parameter_count\": \"1M\",\n    \"parameters_raw\": 1245824,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 86406,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"optimum-intel-internal-testing/tiny-random-Phi3ForCausalLM\",\n    \"provider\": \"optimum-intel-internal-testing\",\n    \"parameter_count\": \"2M\",\n    \"parameters_raw\": 2072736,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Lightweight, edge deployment\",\n    
\"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 22058,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-10-21\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"llamafactory/tiny-random-qwen3\",\n    \"provider\": \"llamafactory\",\n    \"parameter_count\": \"2M\",\n    \"parameters_raw\": 2439264,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 47369,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-01-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"tiny-random/qwen3-next-moe\",\n    \"provider\": \"tiny-random\",\n    \"parameter_count\": \"3M\",\n    \"parameters_raw\": 2839160,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 27920,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-09-12\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 10,\n    \"active_parameters\": 984828,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"llamafactory/tiny-random-Llama-3\",\n    \"provider\": \"llamafactory\",\n    \"parameter_count\": \"4M\",\n    \"parameters_raw\": 4112464,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 950276,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-06-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Maykeye/TinyLLama-v0\",\n    \"provider\": \"maykeye\",\n    \"parameter_count\": \"5M\",\n    \"parameters_raw\": 4621392,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 32384,\n    \"hf_likes\": 43,\n    \"release_date\": \"2023-07-08\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"optimum-intel-internal-testing/tiny-random-gpt-oss-mxfp4\",\n    \"provider\": \"optimum-intel-internal-testing\",\n    \"parameter_count\": \"7M\",\n    \"parameters_raw\": 6865444,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 27904,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-10-21\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 4,\n    \"active_parameters\": 1158540,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"hmellor/tiny-random-Gemma2ForCausalLM\",\n    \"provider\": \"hmellor\",\n    \"parameter_count\": \"8M\",\n    \"parameters_raw\": 8438816,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    
\"architecture\": \"gemma2\",\n    \"hf_downloads\": 339841,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"michaelbenayoun/llama-2-tiny-4kv-heads-4layers-random\",\n    \"provider\": \"michaelbenayoun\",\n    \"parameter_count\": \"9M\",\n    \"parameters_raw\": 8537216,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 52387,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-03-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"tiiuae/falcon-mamba-tiny-dev\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"9M\",\n    \"parameters_raw\": 8765056,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon_mamba\",\n    \"hf_downloads\": 21730,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-10-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"arnir0/Tiny-LLM\",\n    \"provider\": \"arnir0\",\n    \"parameter_count\": \"13M\",\n    \"parameters_raw\": 12988992,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1024,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 54600,\n    \"hf_likes\": 45,\n    \"release_date\": \"2024-11-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-14m\",\n    
\"provider\": \"eleutherai\",\n    \"parameter_count\": \"14M\",\n    \"parameters_raw\": 14067712,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 33322,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"hmellor/tiny-random-BambaForCausalLM\",\n    \"provider\": \"hmellor\",\n    \"parameter_count\": \"33M\",\n    \"parameters_raw\": 33110760,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bamba\",\n    \"hf_downloads\": 173798,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"erwanf/gpt2-mini\",\n    \"provider\": \"erwanf\",\n    \"parameter_count\": \"39M\",\n    \"parameters_raw\": 38604288,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 391187,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-06-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-14m-deduped\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"39M\",\n    \"parameters_raw\": 39233560,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": 
\"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 69404,\n    \"hf_likes\": 28,\n    \"release_date\": \"2023-07-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"hyper-accel/tiny-random-llama\",\n    \"provider\": \"hyper-accel\",\n    \"parameter_count\": \"73M\",\n    \"parameters_raw\": 73271808,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 44649,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-02-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/SmolLM-135M-Instruct-quantized.w8a16\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"83M\",\n    \"parameters_raw\": 83356260,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 20835,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-08-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"tiiuae/Falcon-H1-Tiny-90M-Instruct\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"91M\",\n    \"parameters_raw\": 91131072,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": 
\"falcon_h1\",\n    \"hf_downloads\": 301062,\n    \"hf_likes\": 33,\n    \"release_date\": \"2026-01-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-70m-deduped\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"96M\",\n    \"parameters_raw\": 95592496,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 613928,\n    \"hf_likes\": 27,\n    \"release_date\": \"2023-02-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"gratefulasi/lumeleto\",\n    \"provider\": \"gratefulasi\",\n    \"parameter_count\": \"124M\",\n    \"parameters_raw\": 124439808,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1024,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 47679,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/opt-125m\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"125M\",\n    \"parameters_raw\": 125239296,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"opt\",\n    \"hf_downloads\": 232784,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"state-spaces/mamba-130m-hf\",\n 
   \"provider\": \"state-spaces\",\n    \"parameter_count\": \"129M\",\n    \"parameters_raw\": 129135360,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mamba\",\n    \"hf_downloads\": 161407,\n    \"hf_likes\": 68,\n    \"release_date\": \"2024-03-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM2-135M\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"135M\",\n    \"parameters_raw\": 134515008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 954486,\n    \"hf_likes\": 168,\n    \"release_date\": \"2024-10-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM2-135M-Instruct\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"135M\",\n    \"parameters_raw\": 134515008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 603656,\n    \"hf_likes\": 295,\n    \"release_date\": \"2024-10-31\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/SmolLM2-135M-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/SmolLM2-135M-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n 
     }\n    ]\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM-135M-Instruct\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"135M\",\n    \"parameters_raw\": 134515008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 359214,\n    \"hf_likes\": 133,\n    \"release_date\": \"2024-07-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM-135M\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"135M\",\n    \"parameters_raw\": 134515008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 156129,\n    \"hf_likes\": 249,\n    \"release_date\": \"2024-07-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nomic-ai/nomic-embed-text-v1.5\",\n    \"provider\": \"Nomic\",\n    \"parameter_count\": \"137M\",\n    \"parameters_raw\": 137000000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"F16\",\n    \"context_length\": 8192,\n    \"use_case\": \"Text embeddings for RAG\",\n    \"pipeline_tag\": \"feature-extraction\",\n    \"architecture\": \"nomic_bert\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"EleutherAI/gpt-neo-125m\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"150M\",\n    \"parameters_raw\": 150364416,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n   
 \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neo\",\n    \"hf_downloads\": 100060,\n    \"hf_likes\": 227,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"JackFram/llama-160m\",\n    \"provider\": \"jackfram\",\n    \"parameter_count\": \"162M\",\n    \"parameters_raw\": 162417792,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 46025,\n    \"hf_likes\": 36,\n    \"release_date\": \"2023-05-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/DialoGPT-small\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"176M\",\n    \"parameters_raw\": 175620096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1024,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 58248,\n    \"hf_likes\": 143,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"183M\",\n    \"parameters_raw\": 182975232,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 441394,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-01-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"AI-Sweden-Models/gpt-sw3-126m\",\n    \"provider\": \"ai-sweden-models\",\n    \"parameter_count\": \"186M\",\n    \"parameters_raw\": 186112512,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 115269,\n    \"hf_likes\": 3,\n    \"release_date\": \"2022-12-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"rinna/japanese-gpt-neox-small\",\n    \"provider\": \"rinna\",\n    \"parameter_count\": \"204M\",\n    \"parameters_raw\": 203611008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 457560,\n    \"hf_likes\": 15,\n    \"release_date\": \"2022-08-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-160m-deduped\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"213M\",\n    \"parameters_raw\": 212654688,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 82245,\n    \"hf_likes\": 3,\n    \"release_date\": \"2023-02-08\",\n    \"_discovered\": true\n  },\n  {\n    
\"name\": \"Vamsi/T5_Paraphrase_Paws\",\n    \"provider\": \"vamsi\",\n    \"parameter_count\": \"223M\",\n    \"parameters_raw\": 222903936,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5\",\n    \"hf_downloads\": 83813,\n    \"hf_likes\": 40,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"TitanML/tiny-mixtral\",\n    \"provider\": \"titanml\",\n    \"parameter_count\": \"247M\",\n    \"parameters_raw\": 246961152,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 100054,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-04-24\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 71001329,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Instruct-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"256M\",\n    \"parameters_raw\": 256113408,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 441834,\n    \"hf_likes\": 4,\n    \"release_date\": \"2026-01-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-1.7B-MLX-4bit\",\n    \"provider\": 
\"lmstudio-community\",\n    \"parameter_count\": \"269M\",\n    \"parameters_raw\": 268944384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 25290,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/t5gemma-s-s-prefixlm\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"313M\",\n    \"parameters_raw\": 312517632,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5gemma\",\n    \"hf_downloads\": 41131,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-06-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"329M\",\n    \"parameters_raw\": 329251584,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 449901,\n    \"hf_likes\": 2,\n    \"release_date\": \"2026-01-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-1.2B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"329M\",\n    \"parameters_raw\": 329251584,\n    \"min_ram_gb\": 1.0,\n    
\"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 26421,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-07-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-ColBERT-350M\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"353M\",\n    \"parameters_raw\": 353322752,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Semantic search, sentence similarity\",\n    \"pipeline_tag\": \"sentence-similarity\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M\",\n    \"provider\": \"liquidai\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 41124,\n    \"hf_likes\": 235,\n    \"release_date\": \"2025-07-10\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/LFM2-350M-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM2-360M\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"362M\",\n    \"parameters_raw\": 361821120,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    
\"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 36444,\n    \"hf_likes\": 87,\n    \"release_date\": \"2024-10-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M-Extract\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Data extraction, structured output\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M-Math\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Math reasoning, chain-of-thought\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M-ENJP-MT\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"English-Japanese translation\",\n    \"pipeline_tag\": \"translation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M-PII-Extract-JP\",\n    \"provider\": \"Liquid AI\",\n    
\"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"PII extraction, Japanese\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-350M-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"mlx-8bit\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-350M-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"BF16\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM-360M-Instruct\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"362M\",\n    \"parameters_raw\": 361821120,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n   
 \"architecture\": \"llama\",\n    \"hf_downloads\": 26935,\n    \"hf_likes\": 83,\n    \"release_date\": \"2024-07-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"openbmb/MiniCPM4-0.5B\",\n    \"provider\": \"openbmb\",\n    \"parameter_count\": \"434M\",\n    \"parameters_raw\": 433873920,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 28889,\n    \"hf_likes\": 77,\n    \"release_date\": \"2025-06-05\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-VL-450M\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"451M\",\n    \"parameters_raw\": 450822656,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-1.7B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"484M\",\n    \"parameters_raw\": 484000768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 28313,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-0.5B-Instruct\",\n    \"provider\": 
\"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 6992099,\n    \"hf_likes\": 470,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-0.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-0.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1408034,\n    \"hf_likes\": 65,\n    \"release_date\": \"2024-11-06\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-0.5B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-0.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-0.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      
\"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1200041,\n    \"hf_likes\": 378,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2-0.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 259334,\n    \"hf_likes\": 200,\n    \"release_date\": \"2024-06-03\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2-0.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Gensyn/Qwen2.5-0.5B-Instruct\",\n    \"provider\": \"gensyn\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 106514,\n    \"hf_likes\": 33,\n    \"release_date\": \"2025-03-28\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-0.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-0.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": 
\"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 64868,\n    \"hf_likes\": 44,\n    \"release_date\": \"2024-11-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-0.5B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"EleutherAI/pythia-410m\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"506M\",\n    \"parameters_raw\": 505997504,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 88847,\n    \"hf_likes\": 36,\n    \"release_date\": \"2023-02-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-410m-deduped\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"506M\",\n    \"parameters_raw\": 505997504,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 32196,\n    \"hf_likes\": 20,\n    \"release_date\": \"2023-02-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"h2oai/h2o-danube3-500m-chat\",\n    \"provider\": \"h2oai\",\n    \"parameter_count\": \"514M\",\n    \"parameters_raw\": 513590784,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    
\"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 31122,\n    \"hf_likes\": 39,\n    \"release_date\": \"2024-07-04\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/h2o-danube3-500m-chat-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"tiiuae/Falcon-H1-0.5B-Base\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"521M\",\n    \"parameters_raw\": 521411104,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon_h1\",\n    \"hf_downloads\": 25562,\n    \"hf_likes\": 16,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Qwen3-30B-A3B-Instruct-2507-speculator.eagle3\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"522M\",\n    \"parameters_raw\": 522152832,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 115085,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-12-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"z-lab/Qwen3-4B-DFlash-b16\",\n    \"provider\": \"z-lab\",\n    \"parameter_count\": \"537M\",\n    \"parameters_raw\": 537427200,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    
\"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 25679,\n    \"hf_likes\": 22,\n    \"release_date\": \"2026-01-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigscience/bloomz-560m\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"559M\",\n    \"parameters_raw\": 559214592,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 1303926,\n    \"hf_likes\": 137,\n    \"release_date\": \"2022-10-08\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigscience/bloom-560m\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"559M\",\n    \"parameters_raw\": 559214592,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 134778,\n    \"hf_likes\": 371,\n    \"release_date\": \"2022-05-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-MLX-4bit\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"566M\",\n    \"parameters_raw\": 565828096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    
\"architecture\": \"qwen3\",\n    \"hf_downloads\": 74343,\n    \"hf_likes\": 26,\n    \"release_date\": \"2025-05-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/t5gemma-b-b-ul2\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"591M\",\n    \"parameters_raw\": 591490560,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5gemma\",\n    \"hf_downloads\": 39788,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-06-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/t5gemma-b-b-prefixlm\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"591M\",\n    \"parameters_raw\": 591490560,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5gemma\",\n    \"hf_downloads\": 1187971,\n    \"hf_likes\": 13,\n    \"release_date\": \"2025-06-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Phi-4-mini-reasoning-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"600M\",\n    \"parameters_raw\": 599546880,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 43404,\n    \"hf_likes\": 3,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-0.5B-Chat\",\n 
   \"provider\": \"Alibaba\",\n    \"parameter_count\": \"620M\",\n    \"parameters_raw\": 619570176,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 87380,\n    \"hf_likes\": 92,\n    \"release_date\": \"2024-01-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-0.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"620M\",\n    \"parameters_raw\": 619570176,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 26651,\n    \"hf_likes\": 173,\n    \"release_date\": \"2024-01-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"629M\",\n    \"parameters_raw\": 628676096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 95794,\n    \"hf_likes\": 10,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Instruct-2507-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"629M\",\n    \"parameters_raw\": 628676096,\n    \"min_ram_gb\": 
1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 66279,\n    \"hf_likes\": 3,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"629M\",\n    \"parameters_raw\": 628676096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 21982,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-700M\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"742M\",\n    \"parameters_raw\": 742489344,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-700M-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"742M\",\n    \"parameters_raw\": 742489344,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"mlx-8bit\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n 
   \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-700M-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"742M\",\n    \"parameters_raw\": 742489344,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.5,\n    \"quantization\": \"BF16\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3-0.6B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"752M\",\n    \"parameters_raw\": 751632384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 11310453,\n    \"hf_likes\": 1120,\n    \"release_date\": \"2025-04-27\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-0.6B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3Guard-Gen-0.6B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"752M\",\n    \"parameters_raw\": 751632384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 146728,\n    \"hf_likes\": 62,\n    
\"release_date\": \"2025-09-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-0.6B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"752M\",\n    \"parameters_raw\": 751659264,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 1648717,\n    \"hf_likes\": 57,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Instruct-2507-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"754M\",\n    \"parameters_raw\": 754372096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 62740,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"h2oai/h2ovl-mississippi-800m\",\n    \"provider\": \"h2oai\",\n    \"parameter_count\": \"826M\",\n    \"parameters_raw\": 826295808,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"h2ovl_chat\",\n    \"hf_downloads\": 1014882,\n    \"hf_likes\": 39,\n    \"release_date\": \"2024-10-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-0.8B\",\n    
\"provider\": \"Alibaba\",\n    \"parameter_count\": \"873M\",\n    \"parameters_raw\": 873438784,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 93448,\n    \"hf_likes\": 208,\n    \"release_date\": \"2026-02-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-0.8B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-0.8B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"873M\",\n    \"parameters_raw\": 873438784,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 4680,\n    \"hf_likes\": 37,\n    \"release_date\": \"2026-02-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"880M\",\n    \"parameters_raw\": 880068096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 91703,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"lmstudio-community/Qwen3-4B-Instruct-2507-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"880M\",\n    \"parameters_raw\": 880068096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 62883,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Joaoffg/ELM\",\n    \"provider\": \"joaoffg\",\n    \"parameter_count\": \"903M\",\n    \"parameters_raw\": 902891520,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 339775,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-05-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Qwen3-8B-speculator.eagle3\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.0B\",\n    \"parameters_raw\": 1022037632,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 76636,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-09-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-1b\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 
1078891008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 27818,\n    \"hf_likes\": 43,\n    \"release_date\": \"2023-03-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"TinyLlama/TinyLlama-1.1B-Chat-v1.0\",\n    \"provider\": \"Community\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1100048384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1870099,\n    \"hf_likes\": 1538,\n    \"release_date\": \"2023-12-30\"\n  },\n  {\n    \"name\": \"nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change\",\n    \"provider\": \"nm-testing\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1100048692,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 31348,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-06-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigcode/gpt_bigcode-santacoder\",\n    \"provider\": \"BigCode\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1124886528,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Code 
generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_bigcode\",\n    \"hf_downloads\": 49973,\n    \"hf_likes\": 26,\n    \"release_date\": \"2023-04-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1131460096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 93477,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Instruct-2507-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1131460096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 63832,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-1.2B-Instruct\",\n    \"provider\": \"liquidai\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 116655,\n    \"hf_likes\": 516,\n    \"release_date\": \"2026-01-06\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/LFM2.5-1.2B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-1.2B-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 26071,\n    \"hf_likes\": 6,\n    \"release_date\": \"2025-07-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-1.2B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-1.2B-Base\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": 
\"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-1.2B-Thinking\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-1.2B-JP\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Japanese language, multilingual chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-1.2B-Tool\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Tool calling, function calling\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-1.2B-RAG\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Retrieval-augmented 
generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-1.2B-Extract\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Data extraction, structured output\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Thinking-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.2,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"mlx-8bit\",\n    \"context_length\": 128000,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Thinking-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.4,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"BF16\",\n    \"context_length\": 128000,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"allenai/OLMo-1B-hf\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"1.2B\",\n    
\"parameters_raw\": 1176764416,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo\",\n    \"hf_downloads\": 23538,\n    \"hf_likes\": 26,\n    \"release_date\": \"2024-04-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Zyphra/Zamba2-1.2B-instruct\",\n    \"provider\": \"zyphra\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1215064704,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"zamba2\",\n    \"hf_downloads\": 72584,\n    \"hf_likes\": 30,\n    \"release_date\": \"2024-09-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.2-1B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1235814400,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1453836,\n    \"hf_likes\": 2306,\n    \"release_date\": \"2024-09-18\"\n  },\n  {\n    \"name\": \"hmellor/Ilama-3.2-1B\",\n    \"provider\": \"hmellor\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1235814400,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    
\"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ilama\",\n    \"hf_downloads\": 89998,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"warshanks/Jan-nano-AWQ\",\n    \"provider\": \"warshanks\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1264206840,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 99084,\n    \"hf_likes\": 3,\n    \"release_date\": \"2025-07-12\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"LGAI-EXAONE/EXAONE-4.0-1.2B\",\n    \"provider\": \"lgai-exaone\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1279391488,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone4\",\n    \"hf_downloads\": 100975,\n    \"hf_likes\": 172,\n    \"release_date\": \"2025-07-11\"\n  },\n  {\n    \"name\": \"lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1280062464,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 348365,\n  
  \"hf_likes\": 7,\n    \"release_date\": \"2025-05-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-8B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1280062464,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 39201,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"pfnet/plamo-2-1b\",\n    \"provider\": \"pfnet\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1291441920,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 10485760,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"plamo2\",\n    \"hf_downloads\": 63725,\n    \"hf_likes\": 38,\n    \"release_date\": \"2025-02-05\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/gpt-neo-1.3B\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"1.4B\",\n    \"parameters_raw\": 1365907456,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neo\",\n    \"hf_downloads\": 48440,\n    \"hf_likes\": 324,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/phi-1_5\",\n    \"provider\": 
\"Microsoft\",\n    \"parameter_count\": \"1.4B\",\n    \"parameters_raw\": 1418270720,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi\",\n    \"hf_downloads\": 152337,\n    \"hf_likes\": 1355,\n    \"release_date\": \"2023-09-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"starvector/starvector-1b-im2svg\",\n    \"provider\": \"starvector\",\n    \"parameter_count\": \"1.4B\",\n    \"parameters_raw\": 1434095620,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starvector\",\n    \"hf_downloads\": 38196,\n    \"hf_likes\": 184,\n    \"release_date\": \"2025-01-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/OLMo-2-0425-1B\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1484916736,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo2\",\n    \"hf_downloads\": 533223,\n    \"hf_likes\": 70,\n    \"release_date\": \"2025-04-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/OLMo-2-0425-1B-Instruct\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1484916736,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    
\"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo2\",\n    \"hf_downloads\": 38389,\n    \"hf_likes\": 56,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/OLMo-2-0425-1B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/Llama-3.2-1B-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1498482912,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 814349,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-09-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1498859520,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1823969,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-09-25\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-Audio-1.5B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1500000000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 
0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Speech-to-speech, ASR, TTS\",\n    \"pipeline_tag\": \"audio-to-audio\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-Audio-1.5B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1500000000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Speech-to-speech, ASR, TTS\",\n    \"pipeline_tag\": \"audio-to-audio\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"EleutherAI/pythia-1.4b\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1515311488,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 27804,\n    \"hf_likes\": 26,\n    \"release_date\": \"2023-02-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-1.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1789513,\n    \"hf_likes\": 107,\n    
\"release_date\": \"2024-09-18\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-1.5B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-1.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-1.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 7037921,\n    \"hf_likes\": 627,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-1.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2-1.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 3508972,\n    \"hf_likes\": 161,\n    \"release_date\": \"2024-06-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Math-1.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    
\"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1064952,\n    \"hf_likes\": 102,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-1.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 431369,\n    \"hf_likes\": 166,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2-1.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 114016,\n    \"hf_likes\": 99,\n    \"release_date\": \"2024-05-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Math-1.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    
\"architecture\": \"qwen2\",\n    \"hf_downloads\": 80310,\n    \"hf_likes\": 54,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Math-1.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/Qwen2-1.5B-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 24030,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-06-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"KiteFishAI/Minnow-Math-1.5B\",\n    \"provider\": \"kitefishai\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1633781760,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 147620,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-VL-1.6B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1584804000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    
\"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-VL-1.6B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1596625904,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-VL-1.6B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1596625904,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"mlx-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-VL-1.6B-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1596625904,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.2,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"mlx-6bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-VL-1.6B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1596625904,\n    \"min_ram_gb\": 1.8,\n    \"recommended_ram_gb\": 3.0,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": 
\"mlx-8bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"stabilityai/stablelm-2-1_6b-chat\",\n    \"provider\": \"Stability AI\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1644515328,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"stablelm\",\n    \"hf_downloads\": 955,\n    \"hf_likes\": 34,\n    \"release_date\": \"2024-04-08\"\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM-1.7B\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1711376384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 63387,\n    \"hf_likes\": 180,\n    \"release_date\": \"2024-07-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM2-1.7B\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1711376384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 25638,\n    \"hf_likes\": 144,\n    
\"release_date\": \"2024-10-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/Nanbeige4.1-3B-AWQ-8bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1717865408,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"AWQ-8bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 49220,\n    \"hf_likes\": 2,\n    \"release_date\": \"2026-02-15\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3-1.7B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1720574976,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 295900,\n    \"hf_likes\": 64,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-1.7B-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1720574976,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 24714,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"bigscience/bloom-1b7\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1722408960,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 38813,\n    \"hf_likes\": 122,\n    \"release_date\": \"2022-05-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-1.5B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1777088000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 727989,\n    \"hf_likes\": 6,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1777088000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 164152,\n    \"hf_likes\": 4,\n    \"release_date\": \"2024-09-20\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2-1.5B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": 
\"1.8B\",\n    \"parameters_raw\": 1777088000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 24850,\n    \"hf_likes\": 9,\n    \"release_date\": \"2024-06-06\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1777675776,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 24724,\n    \"hf_likes\": 5,\n    \"release_date\": \"2024-06-06\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"RedHatAI/Qwen2.5-1.5B-quantized.w8a8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1777733120,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1091974,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-10-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-1.8B-Chat\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1836828672,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    
\"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 72445,\n    \"hf_likes\": 73,\n    \"release_date\": \"2024-01-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"jonathanli/induction-vl2-mdl-fswd7-20000-720p-proj-256-var\",\n    \"provider\": \"jonathanli\",\n    \"parameter_count\": \"1.9B\",\n    \"parameters_raw\": 1940015872,\n    \"min_ram_gb\": 1.1,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 1.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"induction_vl2\",\n    \"hf_downloads\": 24886,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/granite-4.0-h-tiny-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"2.0B\",\n    \"parameters_raw\": 1997098800,\n    \"min_ram_gb\": 1.1,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 1.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"granitemoehybrid\",\n    \"hf_downloads\": 63040,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-10-13\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 277721550,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3-1.7B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"2.0B\",\n    \"parameters_raw\": 2031825920,\n    \"min_ram_gb\": 1.1,\n    \"recommended_ram_gb\": 2.0,\n    
\"min_vram_gb\": 1.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 47050,\n    \"hf_likes\": 35,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"h2oai/h2ovl-mississippi-2b\",\n    \"provider\": \"h2oai\",\n    \"parameter_count\": \"2.2B\",\n    \"parameters_raw\": 2152317440,\n    \"min_ram_gb\": 1.2,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 1.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"h2ovl_chat\",\n    \"hf_downloads\": 1007240,\n    \"hf_likes\": 42,\n    \"release_date\": \"2024-10-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"warshanks/Qwen3-8B-abliterated-AWQ\",\n    \"provider\": \"warshanks\",\n    \"parameter_count\": \"2.2B\",\n    \"parameters_raw\": 2174236152,\n    \"min_ram_gb\": 1.2,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 1.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 25559,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-27\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-2B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2274069824,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General 
purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 46974,\n    \"hf_likes\": 115,\n    \"release_date\": \"2026-02-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-2B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-2B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2274069824,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 3336,\n    \"hf_likes\": 33,\n    \"release_date\": \"2026-02-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/Phi-4-reasoning-plus-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2290897920,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 28622,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2303865856,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": 
\"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 333300,\n    \"hf_likes\": 13,\n    \"release_date\": \"2025-05-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-8B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2303865856,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 37222,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-14B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2307906560,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 46163,\n    \"hf_likes\": 5,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen2.5-Coder-14B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2308527104,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    
\"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 92774,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-11-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-1.1-2b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"2.5B\",\n    \"parameters_raw\": 2506172416,\n    \"min_ram_gb\": 1.4,\n    \"recommended_ram_gb\": 2.3,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma\",\n    \"hf_downloads\": 66616,\n    \"hf_likes\": 171,\n    \"release_date\": \"2024-03-26\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/gemma-1.1-2b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-2.6B\",\n    \"provider\": \"liquidai\",\n    \"parameter_count\": \"2.6B\",\n    \"parameters_raw\": 2569272320,\n    \"min_ram_gb\": 1.4,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 25773,\n    \"hf_likes\": 180,\n    \"release_date\": \"2025-09-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-2.6B-Exp\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"2.6B\",\n    \"parameters_raw\": 2569272320,\n    \"min_ram_gb\": 1.4,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, math, knowledge\",\n    \"pipeline_tag\": \"text-generation\",\n    
\"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-2.6B-Transcript\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"2.6B\",\n    \"parameters_raw\": 2569272320,\n    \"min_ram_gb\": 1.4,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Meeting transcription, summarization\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"google/gemma-2-2b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"2.6B\",\n    \"parameters_raw\": 2614341376,\n    \"min_ram_gb\": 1.5,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/gemma-2-2b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Efficient-Large-Model/gemma-2-2b-it\",\n    \"provider\": \"efficient-large-model\",\n    \"parameter_count\": \"2.6B\",\n    \"parameters_raw\": 2614341888,\n    \"min_ram_gb\": 1.5,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 50419,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-12-12\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"bartowski/gemma-2-2b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"EleutherAI/gpt-neo-2.7B\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"2.7B\",\n    \"parameters_raw\": 2718416384,\n    \"min_ram_gb\": 1.5,\n    \"recommended_ram_gb\": 2.5,\n    \"min_vram_gb\": 1.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neo\",\n    \"hf_downloads\": 23217,\n    \"hf_likes\": 501,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/phi-2\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"2.8B\",\n    \"parameters_raw\": 2779683840,\n    \"min_ram_gb\": 1.6,\n    \"recommended_ram_gb\": 2.6,\n    \"min_vram_gb\": 1.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi\",\n    \"hf_downloads\": 1651432,\n    \"hf_likes\": 3429,\n    \"release_date\": \"2023-12-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"stabilityai/stablelm-3b-4e1t\",\n    \"provider\": \"Stability AI\",\n    \"parameter_count\": \"2.8B\",\n    \"parameters_raw\": 2795443200,\n    \"min_ram_gb\": 1.6,\n    \"recommended_ram_gb\": 2.6,\n    \"min_vram_gb\": 1.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"stablelm\",\n    \"hf_downloads\": 24407,\n    \"hf_likes\": 312,\n    \"release_date\": \"2023-09-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM3-3B\",\n    \"provider\": \"HuggingFace\",\n    \"parameter_count\": 
\"3B\",\n    \"parameters_raw\": 3000000000,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, multilingual reasoning\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"smollm\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-08\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/SmolLM3-3B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-VL-3B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"3.0B\",\n    \"parameters_raw\": 2998975216,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"bigscience/bloom-3b\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"3.0B\",\n    \"parameters_raw\": 3002557440,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 30567,\n    \"hf_likes\": 94,\n    \"release_date\": \"2022-05-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigcode/starcoder2-3b\",\n    \"provider\": \"BigCode\",\n    \"parameter_count\": \"3.0B\",\n    \"parameters_raw\": 3030371328,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    
\"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder2\",\n    \"hf_downloads\": 97310,\n    \"hf_likes\": 216,\n    \"release_date\": \"2023-11-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"TechxGenus/gemma-1.1-2b-it-GPTQ\",\n    \"provider\": \"techxgenus\",\n    \"parameter_count\": \"3.0B\",\n    \"parameters_raw\": 3031170048,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma\",\n    \"hf_downloads\": 20793,\n    \"hf_likes\": 1,\n    \"release_date\": \"2024-04-07\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-3B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 6598470,\n    \"hf_likes\": 409,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-3B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-3B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    
\"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 297679,\n    \"hf_likes\": 172,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-3B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-3B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 126989,\n    \"hf_likes\": 96,\n    \"release_date\": \"2024-11-06\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-3B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-3B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Salesforce/xLAM-2-3b-fc-r\",\n    \"provider\": \"salesforce\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 44516,\n    \"hf_likes\": 16,\n    \"release_date\": \"2025-03-27\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-3B\",\n    
\"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 42540,\n    \"hf_likes\": 40,\n    \"release_date\": \"2024-11-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-3B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.2-3B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"3.2B\",\n    \"parameters_raw\": 3212749824,\n    \"min_ram_gb\": 1.8,\n    \"recommended_ram_gb\": 3.0,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1409393,\n    \"hf_likes\": 702,\n    \"release_date\": \"2024-09-18\"\n  },\n  {\n    \"name\": \"ibm-research/PowerMoE-3b\",\n    \"provider\": \"ibm-research\",\n    \"parameter_count\": \"3.4B\",\n    \"parameters_raw\": 3374286336,\n    \"min_ram_gb\": 1.9,\n    \"recommended_ram_gb\": 3.1,\n    \"min_vram_gb\": 1.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"granitemoe\",\n    \"hf_downloads\": 399266,\n    \"hf_likes\": 17,\n    \"release_date\": \"2024-08-14\",\n    \"is_moe\": true,\n    \"num_experts\": 40,\n    \"active_experts\": 8,\n    \"active_parameters\": 809828716,\n    \"_discovered\": true\n  },\n  
{\n    \"name\": \"Qwen/Qwen2.5-3B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.4B\",\n    \"parameters_raw\": 3397103616,\n    \"min_ram_gb\": 1.9,\n    \"recommended_ram_gb\": 3.2,\n    \"min_vram_gb\": 1.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 38262,\n    \"hf_likes\": 16,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-3B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.4B\",\n    \"parameters_raw\": 3397103616,\n    \"min_ram_gb\": 1.9,\n    \"recommended_ram_gb\": 3.2,\n    \"min_vram_gb\": 1.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 21964,\n    \"hf_likes\": 5,\n    \"release_date\": \"2024-11-09\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"ibm-granite/granite-3b-code-base-2k\",\n    \"provider\": \"ibm-granite\",\n    \"parameter_count\": \"3.5B\",\n    \"parameters_raw\": 3482503680,\n    \"min_ram_gb\": 1.9,\n    \"recommended_ram_gb\": 3.2,\n    \"min_vram_gb\": 1.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 73193,\n    \"hf_likes\": 37,\n    \"release_date\": \"2024-04-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"ibm-research/PowerLM-3b\",\n    \"provider\": \"ibm-research\",\n  
  \"parameter_count\": \"3.5B\",\n    \"parameters_raw\": 3512017152,\n    \"min_ram_gb\": 2.0,\n    \"recommended_ram_gb\": 3.3,\n    \"min_vram_gb\": 1.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"granite\",\n    \"hf_downloads\": 30013,\n    \"hf_likes\": 20,\n    \"release_date\": \"2024-08-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-VL-3B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3754622976,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.5,\n    \"min_vram_gb\": 1.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen2_5_vl\",\n    \"hf_downloads\": 2621650,\n    \"hf_likes\": 623,\n    \"release_date\": \"2025-01-26\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-VL-3B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"microsoft/Phi-tiny-MoE-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3755220288,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.5,\n    \"min_vram_gb\": 1.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phimoe\",\n    \"hf_downloads\": 310211,\n    \"hf_likes\": 31,\n    \"release_date\": \"2025-06-23\",\n    \"is_moe\": true,\n    \"num_experts\": 16,\n    \"active_experts\": 2,\n    \"active_parameters\": 633693422,\n    \"_discovered\": true\n  },\n  
{\n    \"name\": \"llm-jp/llm-jp-3-3.7b-instruct\",\n    \"provider\": \"llm-jp\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3782913024,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.5,\n    \"min_vram_gb\": 1.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 810462,\n    \"hf_likes\": 13,\n    \"release_date\": \"2024-09-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/Phi-4-mini-reasoning\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3800000000,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.5,\n    \"min_vram_gb\": 1.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Lightweight reasoning\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi4\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-01\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Phi-4-mini-reasoning-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"microsoft/phi-3-mini-4k-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3821000000,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/phi-3-mini-4k-instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": 
\"microsoft/Phi-3.5-mini-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3821000000,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, long context\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Phi-3.5-mini-instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zstanjj/HTML-Pruner-Phi-3.8B\",\n    \"provider\": \"zstanjj\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3821079552,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 88805,\n    \"hf_likes\": 18,\n    \"release_date\": \"2024-10-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Sreenington/Phi-3-mini-4k-instruct-AWQ\",\n    \"provider\": \"sreenington\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3821079552,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 40949,\n    \"hf_likes\": 5,\n    \"release_date\": \"2024-05-05\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"numind/NuExtract-1.5\",\n    \"provider\": \"numind\",\n    \"parameter_count\": 
\"3.8B\",\n    \"parameters_raw\": 3821079552,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 31247,\n    \"hf_likes\": 243,\n    \"release_date\": \"2024-09-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"kaitchup/Phi-3-mini-4k-instruct-gptq-4bit\",\n    \"provider\": \"kaitchup\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3822095360,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 881144,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-04-25\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Nanbeige/Nanbeige4.1-3B\",\n    \"provider\": \"nanbeige\",\n    \"parameter_count\": \"3.9B\",\n    \"parameters_raw\": 3933637120,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 417673,\n    \"hf_likes\": 941,\n    \"release_date\": \"2026-02-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-3n-E2B-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"4B\",\n    \"parameters_raw\": 4000000000,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    
\"context_length\": 131072,\n    \"use_case\": \"Multimodal, on-device (effective 2B)\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"gemma3n\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-06-25\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gemma-3n-E2B-it-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.0B\",\n    \"parameters_raw\": 4022468096,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 548989,\n    \"hf_likes\": 81,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.0B\",\n    \"parameters_raw\": 4022468096,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 344398,\n    \"hf_likes\": 25,\n    \"release_date\": \"2025-05-05\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"typhoon-ai/typhoon2.5-qwen3-4b\",\n    \"provider\": \"typhoon-ai\",\n    \"parameter_count\": \"4.0B\",\n    \"parameters_raw\": 4022468096,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": 
\"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 51135,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-09-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"JunHowie/Qwen3-4B-Instruct-2507-GPTQ-Int4\",\n    \"provider\": \"junhowie\",\n    \"parameter_count\": \"4.0B\",\n    \"parameters_raw\": 4022468096,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 36817,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-09-01\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"TIGER-Lab/VLM2Vec-Full\",\n    \"provider\": \"tiger-lab\",\n    \"parameter_count\": \"4.1B\",\n    \"parameters_raw\": 4146621440,\n    \"min_ram_gb\": 2.3,\n    \"recommended_ram_gb\": 3.9,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3_v\",\n    \"hf_downloads\": 64160,\n    \"hf_likes\": 28,\n    \"release_date\": \"2024-10-08\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-14B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"4.2B\",\n    \"parameters_raw\": 4153891840,\n    \"min_ram_gb\": 2.3,\n    \"recommended_ram_gb\": 3.9,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n   
 \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 42084,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen2.5-Coder-14B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"4.2B\",\n    \"parameters_raw\": 4154676224,\n    \"min_ram_gb\": 2.3,\n    \"recommended_ram_gb\": 3.9,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 82050,\n    \"hf_likes\": 1,\n    \"release_date\": \"2024-11-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-SafeRL\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.4B\",\n    \"parameters_raw\": 4411424256,\n    \"min_ram_gb\": 2.5,\n    \"recommended_ram_gb\": 4.1,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 53732,\n    \"hf_likes\": 41,\n    \"release_date\": \"2025-09-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-Instruct-2507-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.4B\",\n    \"parameters_raw\": 4411646016,\n    \"min_ram_gb\": 2.5,\n    \"recommended_ram_gb\": 4.1,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 
507765,\n    \"hf_likes\": 69,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.4B\",\n    \"parameters_raw\": 4411646016,\n    \"min_ram_gb\": 2.5,\n    \"recommended_ram_gb\": 4.1,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 250469,\n    \"hf_likes\": 38,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/Nemotron-H-4B-Base-8K\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"4.5B\",\n    \"parameters_raw\": 4489223040,\n    \"min_ram_gb\": 2.5,\n    \"recommended_ram_gb\": 4.2,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 40602,\n    \"hf_likes\": 5,\n    \"release_date\": \"2025-03-20\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/Nemotron-H-4B-Instruct-128K\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"4.5B\",\n    \"parameters_raw\": 4489223040,\n    \"min_ram_gb\": 2.5,\n    \"recommended_ram_gb\": 4.2,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 38647,\n    \"hf_likes\": 8,\n    \"release_date\": \"2025-04-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ\",\n    
\"provider\": \"stelterlab\",\n    \"parameter_count\": \"4.6B\",\n    \"parameters_raw\": 4605856128,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.3,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 63349,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-07-31\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 503765510,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-4B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.7B\",\n    \"parameters_raw\": 4659865088,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.3,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 99087,\n    \"hf_likes\": 202,\n    \"release_date\": \"2026-02-27\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-4B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-4B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.7B\",\n    \"parameters_raw\": 4659865088,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.3,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 3593,\n    \"hf_likes\": 38,\n 
   \"release_date\": \"2026-02-27\"\n  },\n  {\n    \"name\": \"nvidia/Qwen3-8B-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"4.7B\",\n    \"parameters_raw\": 4717851648,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.4,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 32743,\n    \"hf_likes\": 14,\n    \"release_date\": \"2025-09-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"speakleash/Bielik-4.5B-v3.0-Instruct\",\n    \"provider\": \"speakleash\",\n    \"parameter_count\": \"4.8B\",\n    \"parameters_raw\": 4757260288,\n    \"min_ram_gb\": 2.7,\n    \"recommended_ram_gb\": 4.4,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 43008,\n    \"hf_likes\": 27,\n    \"release_date\": \"2025-04-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"XLabs-AI/xflux_text_encoders\",\n    \"provider\": \"xlabs-ai\",\n    \"parameter_count\": \"4.8B\",\n    \"parameters_raw\": 4762310656,\n    \"min_ram_gb\": 2.7,\n    \"recommended_ram_gb\": 4.4,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5\",\n    \"hf_downloads\": 162123,\n    \"hf_likes\": 21,\n    \"release_date\": \"2024-08-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ\",\n    \"provider\": \"stelterlab\",\n    \"parameter_count\": 
\"5.1B\",\n    \"parameters_raw\": 5053827112,\n    \"min_ram_gb\": 2.8,\n    \"recommended_ram_gb\": 4.7,\n    \"min_vram_gb\": 2.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 38947,\n    \"hf_likes\": 4,\n    \"release_date\": \"2026-01-31\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-32B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"5.1B\",\n    \"parameters_raw\": 5119652864,\n    \"min_ram_gb\": 2.9,\n    \"recommended_ram_gb\": 4.8,\n    \"min_vram_gb\": 2.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 26287,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"5.1B\",\n    \"parameters_raw\": 5120300032,\n    \"min_ram_gb\": 2.9,\n    \"recommended_ram_gb\": 4.8,\n    \"min_vram_gb\": 2.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 44413,\n    \"hf_likes\": 6,\n    \"release_date\": \"2024-11-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/QwQ-32B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"5.1B\",\n    \"parameters_raw\": 5120300032,\n 
   \"min_ram_gb\": 2.9,\n    \"recommended_ram_gb\": 4.8,\n    \"min_vram_gb\": 2.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 32595,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-03-05\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"5.3B\",\n    \"parameters_raw\": 5306567040,\n    \"min_ram_gb\": 3.0,\n    \"recommended_ram_gb\": 4.9,\n    \"min_vram_gb\": 2.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 135548,\n    \"hf_likes\": 40,\n    \"release_date\": \"2025-08-01\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 580405768,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"cyankiwi/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"5.3B\",\n    \"parameters_raw\": 5306567040,\n    \"min_ram_gb\": 3.0,\n    \"recommended_ram_gb\": 4.9,\n    \"min_vram_gb\": 2.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 85989,\n    \"hf_likes\": 30,\n    \"release_date\": \"2025-07-29\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 580405768,\n    \"_discovered\": true,\n    
\"format\": \"awq\"\n  },\n  {\n    \"name\": \"cyankiwi/MiroThinker-v1.5-30B-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"5.3B\",\n    \"parameters_raw\": 5306567040,\n    \"min_ram_gb\": 3.0,\n    \"recommended_ram_gb\": 4.9,\n    \"min_vram_gb\": 2.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 20465,\n    \"hf_likes\": 3,\n    \"release_date\": \"2026-01-06\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 580405768,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"01-ai/Yi-6B-Chat\",\n    \"provider\": \"01.ai\",\n    \"parameter_count\": \"6.1B\",\n    \"parameters_raw\": 6061035520,\n    \"min_ram_gb\": 3.4,\n    \"recommended_ram_gb\": 5.6,\n    \"min_vram_gb\": 3.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 15481,\n    \"hf_likes\": 70,\n    \"release_date\": \"2023-11-22\"\n  },\n  {\n    \"name\": \"arcee-ai/Trinity-Nano-Preview\",\n    \"provider\": \"arcee-ai\",\n    \"parameter_count\": \"6.1B\",\n    \"parameters_raw\": 6120003328,\n    \"min_ram_gb\": 3.4,\n    \"recommended_ram_gb\": 5.7,\n    \"min_vram_gb\": 3.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"afmoe\",\n    \"hf_downloads\": 22294,\n    \"hf_likes\": 67,\n    \"release_date\": \"2025-12-01\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    
\"active_parameters\": 669375358,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/GLM-4.7-Flash-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"6.4B\",\n    \"parameters_raw\": 6407095318,\n    \"min_ram_gb\": 3.6,\n    \"recommended_ram_gb\": 6.0,\n    \"min_vram_gb\": 3.3,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe_lite\",\n    \"hf_downloads\": 217691,\n    \"hf_likes\": 46,\n    \"release_date\": \"2026-01-19\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmsys/vicuna-7b-v1.5\",\n    \"provider\": \"LMSYS\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 6738415616,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"tartuNLP/Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC\",\n    \"provider\": \"tartunlp\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738415616,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 36045,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-02-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-2-7b-hf\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 
6738417664,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 617643,\n    \"hf_likes\": 2272,\n    \"release_date\": \"2023-07-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"huggyllama/llama-7b\",\n    \"provider\": \"huggyllama\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738417664,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 103505,\n    \"hf_likes\": 354,\n    \"release_date\": \"2023-04-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"NousResearch/Llama-2-7b-hf\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738417664,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 81336,\n    \"hf_likes\": 171,\n    \"release_date\": \"2023-07-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"NousResearch/Llama-2-7b-chat-hf\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738417664,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": 
\"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 20573,\n    \"hf_likes\": 194,\n    \"release_date\": \"2023-07-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/CodeLlama-7b-Instruct-hf\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738546688,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 5404,\n    \"hf_likes\": 59,\n    \"release_date\": \"2024-03-13\"\n  },\n  {\n    \"name\": \"codellama/CodeLlama-7b-Instruct-hf\",\n    \"provider\": \"codellama\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738546688,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 65896,\n    \"hf_likes\": 254,\n    \"release_date\": \"2023-08-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"codellama/CodeLlama-7b-hf\",\n    \"provider\": \"codellama\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738546688,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 54518,\n    \"hf_likes\": 375,\n    \"release_date\": 
\"2023-08-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/deepseek-coder-6.7b-instruct\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6740512768,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 97176,\n    \"hf_likes\": 478,\n    \"release_date\": \"2023-10-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/deepseek-coder-6.7b-base\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6740512768,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 28134,\n    \"hf_likes\": 122,\n    \"release_date\": \"2023-10-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/OLMoE-1B-7B-0125\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"6.9B\",\n    \"parameters_raw\": 6919161856,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.4,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmoe\",\n    \"hf_downloads\": 42434,\n    \"hf_likes\": 35,\n    \"release_date\": \"2025-01-21\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 8,\n    \"active_parameters\": 1167608556,\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"allenai/OLMoE-1B-7B-0125-Instruct\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"6.9B\",\n    \"parameters_raw\": 6919161856,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.4,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmoe\",\n    \"hf_downloads\": 35624,\n    \"hf_likes\": 58,\n    \"release_date\": \"2025-01-27\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 8,\n    \"active_parameters\": 1167608556,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-6.9b\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 6991520256,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.5,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 20516,\n    \"hf_likes\": 59,\n    \"release_date\": \"2023-02-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"openchat/openchat-3.5-0106\",\n    \"provider\": \"OpenChat\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 7000000000,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.5,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"XiaomiMiMo/MiMo-7B-RL\",\n    \"provider\": \"Xiaomi\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 7000000000,\n    \"min_ram_gb\": 3.9,\n   
 \"recommended_ram_gb\": 6.5,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Advanced reasoning, math and code\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mimo\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-05-01\"\n  },\n  {\n    \"name\": \"microsoft/Orca-2-7b\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 7016400896,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.5,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Reasoning, step-by-step solutions\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"omni-research/Tarsier-7b\",\n    \"provider\": \"omni-research\",\n    \"parameter_count\": \"7.1B\",\n    \"parameters_raw\": 7063427072,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.6,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llava\",\n    \"hf_downloads\": 49581,\n    \"hf_likes\": 25,\n    \"release_date\": \"2024-07-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigcode/starcoder2-7b\",\n    \"provider\": \"BigCode\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7173923840,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder2\",\n    \"hf_downloads\": 19199,\n    \"hf_likes\": 
208,\n    \"release_date\": \"2024-02-20\"\n  },\n  {\n    \"name\": \"tiiuae/falcon-7b-instruct\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7217189760,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon\",\n    \"hf_downloads\": 47656,\n    \"hf_likes\": 1031,\n    \"release_date\": \"2023-04-25\"\n  },\n  {\n    \"name\": \"HuggingFaceH4/zephyr-7b-beta\",\n    \"provider\": \"HuggingFace\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 107437,\n    \"hf_likes\": 1834,\n    \"release_date\": \"2023-10-26\"\n  },\n  {\n    \"name\": \"mistralai/Mistral-7B-Instruct-v0.2\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 2920309,\n    \"hf_likes\": 3088,\n    \"release_date\": \"2023-12-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"speakleash/Bielik-7B-Instruct-v0.1\",\n    \"provider\": \"speakleash\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    
\"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 101914,\n    \"hf_likes\": 63,\n    \"release_date\": \"2024-03-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"prometheus-eval/prometheus-7b-v2.0\",\n    \"provider\": \"prometheus-eval\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 54661,\n    \"hf_likes\": 100,\n    \"release_date\": \"2024-02-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Salesforce/xLAM-7b-r\",\n    \"provider\": \"salesforce\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 38045,\n    \"hf_likes\": 32,\n    \"release_date\": \"2024-08-28\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/xLAM-7b-r-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Intel/neural-chat-7b-v3-3\",\n    \"provider\": \"intel\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n   
 \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 27068,\n    \"hf_likes\": 80,\n    \"release_date\": \"2023-12-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Featherless-Chat-Models/Mistral-7B-Instruct-v0.2\",\n    \"provider\": \"featherless-chat-models\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 26186,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-05-08\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"augmxnt/shisa-gamma-7b-v1\",\n    \"provider\": \"augmxnt\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 20213,\n    \"hf_likes\": 18,\n    \"release_date\": \"2023-12-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"dphn/dolphin-2.6-mistral-7b\",\n    \"provider\": \"dphn\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241740288,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    
\"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 60305,\n    \"hf_likes\": 105,\n    \"release_date\": \"2023-12-27\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"mistralai/Mistral-7B-Instruct-v0.3\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7248023552,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"unknown\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 1540743,\n    \"hf_likes\": 2447,\n    \"release_date\": \"2024-05-22\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Mistral-7B-Instruct-v0.3-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"allenai/wildguard\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7248031744,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 23686,\n    \"hf_likes\": 38,\n    \"release_date\": \"2024-06-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"dphn/dolphin-2.9.3-mistral-7B-32k\",\n    \"provider\": \"dphn\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7248039936,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 79357,\n    \"hf_likes\": 57,\n    \"release_date\": \"2024-06-25\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/dolphin-2.9.3-mistral-7B-32k-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"thesven/Mistral-7B-Instruct-v0.3-GPTQ\",\n    \"provider\": \"thesven\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7249399808,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 35763,\n    \"hf_likes\": 1,\n    \"release_date\": \"2024-05-22\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"allenai/Olmo-3-7B-Instruct-SFT\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"7.3B\",\n    \"parameters_raw\": 7298011136,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 134834,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-11-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/Olmo-3-1025-7B\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"7.3B\",\n    \"parameters_raw\": 7298011136,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 71128,\n    \"hf_likes\": 54,\n    \"release_date\": \"2025-09-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"TechxGenus/starcoder2-7b-GPTQ\",\n    \"provider\": \"techxgenus\",\n    \"parameter_count\": \"7.4B\",\n    \"parameters_raw\": 7400416256,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.9,\n    \"min_vram_gb\": 3.8,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder2\",\n    \"hf_downloads\": 36955,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-03-22\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"tiiuae/Falcon3-7B-Instruct\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"7.5B\",\n    \"parameters_raw\": 7455550464,\n    \"min_ram_gb\": 4.2,\n    \"recommended_ram_gb\": 6.9,\n    \"min_vram_gb\": 3.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 18394,\n    \"hf_likes\": 76,\n    \"release_date\": \"2024-11-29\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Falcon3-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-7B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 20736120,\n    \"hf_likes\": 1108,\n    \"release_date\": \"2024-09-16\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-7B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1575000,\n    \"hf_likes\": 659,\n    \"release_date\": \"2024-09-17\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-7B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 743941,\n    \"hf_likes\": 797,\n    \"release_date\": \"2025-01-20\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF\",\n        
\"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 2029944,\n    \"hf_likes\": 266,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-7B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1107387,\n    \"hf_likes\": 19,\n    \"release_date\": \"2024-09-20\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1066717,\n    \"hf_likes\": 13,\n    \"release_date\": \"2024-09-20\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  
{\n    \"name\": \"Qwen/Qwen2.5-Math-7B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 318106,\n    \"hf_likes\": 89,\n    \"release_date\": \"2024-09-19\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Math-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2-7B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 310355,\n    \"hf_likes\": 683,\n    \"release_date\": \"2024-06-04\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": 
\"qwen2\",\n    \"hf_downloads\": 240132,\n    \"hf_likes\": 137,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 158122,\n    \"hf_likes\": 29,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Dream-org/Dream-v0-Instruct-7B\",\n    \"provider\": \"dream-org\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"Dream\",\n    \"hf_downloads\": 73949,\n    \"hf_likes\": 154,\n    \"release_date\": \"2025-04-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 70734,\n    \"hf_likes\": 170,\n    \"release_date\": \"2024-06-04\",\n    \"_discovered\": true\n  },\n  {\n    
\"name\": \"Qwen/Qwen2.5-Math-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 68238,\n    \"hf_likes\": 106,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"DeepHat/DeepHat-V1-7B\",\n    \"provider\": \"deephat\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 63374,\n    \"hf_likes\": 111,\n    \"release_date\": \"2025-04-25\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-7B-Instruct-1M\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1010000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 46699,\n    \"hf_likes\": 366,\n    \"release_date\": \"2025-01-23\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-7B-Instruct-1M-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": 
\"Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"GPTQ-Int8\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 30708,\n    \"hf_likes\": 18,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"microsoft/Phi-mini-MoE-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7647632704,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phimoe\",\n    \"hf_downloads\": 69775,\n    \"hf_likes\": 30,\n    \"release_date\": \"2025-06-23\",\n    \"is_moe\": true,\n    \"num_experts\": 16,\n    \"active_experts\": 2,\n    \"active_parameters\": 1290538017,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen-7B-Chat\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.7B\",\n    \"parameters_raw\": 7721324544,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen\",\n    \"hf_downloads\": 195550,\n    \"hf_likes\": 787,\n    \"release_date\": \"2023-08-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen-7B\",\n    \"provider\": 
\"Alibaba\",\n    \"parameter_count\": \"7.7B\",\n    \"parameters_raw\": 7721324544,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen\",\n    \"hf_downloads\": 189346,\n    \"hf_likes\": 396,\n    \"release_date\": \"2023-08-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.7B\",\n    \"parameters_raw\": 7721324544,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 75458,\n    \"hf_likes\": 56,\n    \"release_date\": \"2024-01-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"BSC-LT/salamandra-7b-instruct\",\n    \"provider\": \"bsc-lt\",\n    \"parameter_count\": \"7.8B\",\n    \"parameters_raw\": 7768117248,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 31017,\n    \"hf_likes\": 75,\n    \"release_date\": \"2024-09-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"kmhf/hf-moshiko\",\n    \"provider\": \"kmhf\",\n    \"parameter_count\": \"7.8B\",\n    \"parameters_raw\": 7783880545,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 
3000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"moshi\",\n    \"hf_downloads\": 123900,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-09-27\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"XiaomiMiMo/MiMo-7B-Base\",\n    \"provider\": \"xiaomimimo\",\n    \"parameter_count\": \"7.8B\",\n    \"parameters_raw\": 7833409536,\n    \"min_ram_gb\": 4.4,\n    \"recommended_ram_gb\": 7.3,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mimo\",\n    \"hf_downloads\": 93937,\n    \"hf_likes\": 124,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-3n-E4B-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"8B\",\n    \"parameters_raw\": 8000000000,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Multimodal, on-device (effective 4B)\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"gemma3n\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-06-25\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gemma-3n-E4B-it-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"mistralai/Ministral-8B-Instruct-2410\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    
\"architecture\": \"mistral\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Ministral-8B-Instruct-2410-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"meta-llama/Meta-Llama-3-8B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 2463959,\n    \"hf_likes\": 6473,\n    \"release_date\": \"2024-04-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Meta-Llama-3-8B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1353966,\n    \"hf_likes\": 4391,\n    \"release_date\": \"2024-04-17\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Meta-Llama-3-8B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"NousResearch/Hermes-3-Llama-3.1-8B\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General 
purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 635984,\n    \"hf_likes\": 391,\n    \"release_date\": \"2024-07-28\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Hermes-3-Llama-3.1-8B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"IlyaGusev/saiga_llama3_8b\",\n    \"provider\": \"ilyagusev\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 399621,\n    \"hf_likes\": 137,\n    \"release_date\": \"2024-04-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"NousResearch/Meta-Llama-3.1-8B-Instruct\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 207258,\n    \"hf_likes\": 39,\n    \"release_date\": \"2024-07-24\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Meta-Llama-3.1-8B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"meta-llama/Llama-Guard-3-8B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    
\"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 163719,\n    \"hf_likes\": 272,\n    \"release_date\": \"2024-07-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/Llama-3.1-8B-Instruct-FP8\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 93876,\n    \"hf_likes\": 32,\n    \"release_date\": \"2024-08-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-v1.1\",\n    \"provider\": \"patronusai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 20626,\n    \"hf_likes\": 10,\n    \"release_date\": \"2024-07-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261696,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": 
\"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 684729,\n    \"hf_likes\": 44,\n    \"release_date\": \"2024-07-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-8B-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261696,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 200501,\n    \"hf_likes\": 10,\n    \"release_date\": \"2024-07-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"fdtn-ai/Foundation-Sec-1.1-8B-Instruct\",\n    \"provider\": \"fdtn-ai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030326784,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 53389,\n    \"hf_likes\": 13,\n    \"release_date\": \"2025-11-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmms-lab/llava-onevision-qwen2-7b-ov\",\n    \"provider\": \"lmms-lab\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030348832,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"vision\"\n    ],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llava\",\n    \"hf_downloads\": 133340,\n    \"hf_likes\": 62,\n    \"release_date\": \"2024-06-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8031637504,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 36809,\n    \"hf_likes\": 30,\n    \"release_date\": \"2024-07-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4\",\n    \"provider\": \"hugging-quants\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8031637504,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 27054,\n    \"hf_likes\": 41,\n    \"release_date\": \"2024-07-24\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8031637504,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 21204,\n    \"hf_likes\": 9,\n    \"release_date\": \"2024-07-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"ibm-granite/granite-3.3-8b-instruct\",\n    \"provider\": \"ibm-granite\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8170864640,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"granite\",\n    \"hf_downloads\": 65699,\n    \"hf_likes\": 153,\n    \"release_date\": \"2025-04-09\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/granite-3.3-8b-instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-8B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8190735360,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 790734,\n    \"hf_likes\": 87,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-8B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8190735360,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 327827,\n    \"hf_likes\": 37,\n    \"release_date\": \"2025-05-03\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-0528-Qwen3-8B\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8190735360,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 148562,\n    \"hf_likes\": 1040,\n    \"release_date\": \"2025-05-29\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"huihui-ai/Huihui-Qwen3-8B-abliterated-v2\",\n    \"provider\": \"huihui-ai\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8190735360,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 32025,\n    \"hf_likes\": 34,\n    \"release_date\": \"2025-06-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-8B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8191159296,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n  
  \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 196191,\n    \"hf_likes\": 57,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nytopop/Qwen3-8B.w8a8\",\n    \"provider\": \"nytopop\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8192136192,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 33985,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-VL-7B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"8.3B\",\n    \"parameters_raw\": 8292166656,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.7,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen2_5_vl\",\n    \"hf_downloads\": 4008802,\n    \"hf_likes\": 1462,\n    \"release_date\": \"2025-01-26\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-VL-7B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-8B-A1B\",\n    \"provider\": \"liquidai\",\n    \"parameter_count\": \"8.3B\",\n    \"parameters_raw\": 8339929856,\n    \"min_ram_gb\": 4.7,\n    \"recommended_ram_gb\": 7.8,\n    \"min_vram_gb\": 4.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text 
generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 47242,\n    \"hf_likes\": 328,\n    \"release_date\": \"2025-10-07\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 4,\n    \"active_parameters\": 1407363160,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/LFM2-8B-A1B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/Mistral-NeMo-Minitron-8B-Instruct\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.4B\",\n    \"parameters_raw\": 8414105600,\n    \"min_ram_gb\": 4.7,\n    \"recommended_ram_gb\": 7.8,\n    \"min_vram_gb\": 4.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 55809,\n    \"hf_likes\": 82,\n    \"release_date\": \"2024-10-02\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Mistral-NeMo-Minitron-8B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"01-ai/Yi-1.5-9B-Chat\",\n    \"provider\": \"01.ai\",\n    \"parameter_count\": \"8.8B\",\n    \"parameters_raw\": 8829407232,\n    \"min_ram_gb\": 4.9,\n    \"recommended_ram_gb\": 8.2,\n    \"min_vram_gb\": 4.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 19975,\n    \"hf_likes\": 148,\n    \"release_date\": \"2024-05-10\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Yi-1.5-9B-Chat-GGUF\",\n        \"provider\": 
\"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-Nano-9B-v2-Base\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.9B\",\n    \"parameters_raw\": 8888227328,\n    \"min_ram_gb\": 5.0,\n    \"recommended_ram_gb\": 8.3,\n    \"min_vram_gb\": 4.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 165722,\n    \"hf_likes\": 43,\n    \"release_date\": \"2025-08-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.9B\",\n    \"parameters_raw\": 8888227328,\n    \"min_ram_gb\": 5.0,\n    \"recommended_ram_gb\": 8.3,\n    \"min_vram_gb\": 4.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 24028,\n    \"hf_likes\": 121,\n    \"release_date\": \"2026-02-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-Nano-9B-v2-FP8\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.9B\",\n    \"parameters_raw\": 8888227432,\n    \"min_ram_gb\": 5.0,\n    \"recommended_ram_gb\": 8.3,\n    \"min_vram_gb\": 4.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 70791,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-09-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-Nano-9B-v2\",\n    \"provider\": \"NVIDIA\",\n    \"parameter_count\": 
\"9B\",\n    \"parameters_raw\": 9000000000,\n    \"min_ram_gb\": 5.0,\n    \"recommended_ram_gb\": 8.4,\n    \"min_vram_gb\": 4.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Hybrid Mamba2, reasoning\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-06-01\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-32B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"9.2B\",\n    \"parameters_raw\": 9214833664,\n    \"min_ram_gb\": 5.1,\n    \"recommended_ram_gb\": 8.6,\n    \"min_vram_gb\": 4.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 24718,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"9.2B\",\n    \"parameters_raw\": 9215644672,\n    \"min_ram_gb\": 5.1,\n    \"recommended_ram_gb\": 8.6,\n    \"min_vram_gb\": 4.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 41754,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-11-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/QwQ-32B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"9.2B\",\n    \"parameters_raw\": 9215644672,\n    \"min_ram_gb\": 5.1,\n    \"recommended_ram_gb\": 8.6,\n    \"min_vram_gb\": 4.7,\n    
\"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 32269,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-03-05\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-2-9b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"9.2B\",\n    \"parameters_raw\": 9241705984,\n    \"min_ram_gb\": 5.2,\n    \"recommended_ram_gb\": 8.6,\n    \"min_vram_gb\": 4.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 180627,\n    \"hf_likes\": 775,\n    \"release_date\": \"2024-06-24\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/gemma-2-9b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zai-org/glm-4-9b-chat-hf\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"9.4B\",\n    \"parameters_raw\": 9399951360,\n    \"min_ram_gb\": 5.3,\n    \"recommended_ram_gb\": 8.8,\n    \"min_vram_gb\": 4.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm\",\n    \"hf_downloads\": 22553,\n    \"hf_likes\": 24,\n    \"release_date\": \"2024-10-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"THUDM/glm-4-9b-chat\",\n    \"provider\": \"thudm\",\n    \"parameter_count\": \"9.4B\",\n    \"parameters_raw\": 9399951392,\n    \"min_ram_gb\": 5.3,\n    \"recommended_ram_gb\": 8.8,\n    \"min_vram_gb\": 4.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n 
   \"capabilities\": [],\n    \"pipeline_tag\": \"unknown\",\n    \"architecture\": \"chatglm\",\n    \"hf_downloads\": 190092,\n    \"hf_likes\": 702,\n    \"release_date\": \"2024-06-04\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/glm-4-9b-chat-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zai-org/glm-4-9b\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"9.4B\",\n    \"parameters_raw\": 9399951392,\n    \"min_ram_gb\": 5.3,\n    \"recommended_ram_gb\": 8.8,\n    \"min_vram_gb\": 4.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"chatglm\",\n    \"hf_downloads\": 23550,\n    \"hf_likes\": 143,\n    \"release_date\": \"2024-06-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-9B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"9.7B\",\n    \"parameters_raw\": 9653104368,\n    \"min_ram_gb\": 5.4,\n    \"recommended_ram_gb\": 9.0,\n    \"min_vram_gb\": 4.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 172298,\n    \"hf_likes\": 345,\n    \"release_date\": \"2026-02-27\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-9B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-9B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"9.7B\",\n    \"parameters_raw\": 9653104368,\n    \"min_ram_gb\": 5.4,\n    \"recommended_ram_gb\": 9.0,\n    \"min_vram_gb\": 4.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General 
purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 5324,\n    \"hf_likes\": 38,\n    \"release_date\": \"2026-02-26\"\n  },\n  {\n    \"name\": \"solidrust/gemma-2-9b-it-AWQ\",\n    \"provider\": \"solidrust\",\n    \"parameter_count\": \"10.2B\",\n    \"parameters_raw\": 10159209984,\n    \"min_ram_gb\": 5.7,\n    \"recommended_ram_gb\": 9.5,\n    \"min_vram_gb\": 5.2,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 32664,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-09-03\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"11.0B\",\n    \"parameters_raw\": 10665463808,\n    \"min_ram_gb\": 6.0,\n    \"recommended_ram_gb\": 9.9,\n    \"min_vram_gb\": 5.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"upstage/SOLAR-10.7B-Instruct-v1.0\",\n    \"provider\": \"Upstage\",\n    \"parameter_count\": \"10.7B\",\n    \"parameters_raw\": 10700000000,\n    \"min_ram_gb\": 6.0,\n    \"recommended_ram_gb\": 10.0,\n    \"min_vram_gb\": 5.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"High-performance instruction following\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": 
\"naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B\",\n    \"provider\": \"naver-hyperclovax\",\n    \"parameter_count\": \"10.7B\",\n    \"parameters_raw\": 10741664520,\n    \"min_ram_gb\": 6.0,\n    \"recommended_ram_gb\": 10.0,\n    \"min_vram_gb\": 5.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"vlm\",\n    \"hf_downloads\": 102546,\n    \"hf_likes\": 181,\n    \"release_date\": \"2025-12-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"speakleash/Bielik-11B-v3.0-Instruct\",\n    \"provider\": \"speakleash\",\n    \"parameter_count\": \"11.2B\",\n    \"parameters_raw\": 11168796672,\n    \"min_ram_gb\": 6.2,\n    \"recommended_ram_gb\": 10.4,\n    \"min_vram_gb\": 5.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 232376,\n    \"hf_likes\": 55,\n    \"release_date\": \"2025-11-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cjvt/GaMS3-12B-Instruct\",\n    \"provider\": \"cjvt\",\n    \"parameter_count\": \"11.8B\",\n    \"parameters_raw\": 11766034176,\n    \"min_ram_gb\": 6.6,\n    \"recommended_ram_gb\": 11.0,\n    \"min_vram_gb\": 6.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma3_text\",\n    \"hf_downloads\": 26653,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-12-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-12b\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"12.0B\",\n    \"parameters_raw\": 11997067840,\n    \"min_ram_gb\": 6.7,\n    
\"recommended_ram_gb\": 11.2,\n    \"min_vram_gb\": 6.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 43453,\n    \"hf_likes\": 144,\n    \"release_date\": \"2023-02-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-3-12b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"12B\",\n    \"parameters_raw\": 12000000000,\n    \"min_ram_gb\": 6.7,\n    \"recommended_ram_gb\": 11.2,\n    \"min_vram_gb\": 6.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gemma-3-12b-it-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"mistralai/Mistral-Nemo-Instruct-2407\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"12.2B\",\n    \"parameters_raw\": 12247076864,\n    \"min_ram_gb\": 6.8,\n    \"recommended_ram_gb\": 11.4,\n    \"min_vram_gb\": 6.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Mistral-Nemo-Instruct-2407-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Mistral-Nemo-Instruct-2407-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"casperhansen/mistral-nemo-instruct-2407-awq\",\n    \"provider\": \"casperhansen\",\n    
\"parameter_count\": \"12.2B\",\n    \"parameters_raw\": 12247782400,\n    \"min_ram_gb\": 6.8,\n    \"recommended_ram_gb\": 11.4,\n    \"min_vram_gb\": 6.3,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 1024000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 189490,\n    \"hf_likes\": 12,\n    \"release_date\": \"2024-07-23\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"m8than/Mistral-Nemo-Instruct-2407-lenient-chatfix\",\n    \"provider\": \"m8than\",\n    \"parameter_count\": \"12.2B\",\n    \"parameters_raw\": 12247782400,\n    \"min_ram_gb\": 6.8,\n    \"recommended_ram_gb\": 11.4,\n    \"min_vram_gb\": 6.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 25879,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-05-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"mixtao/MixTAO-7Bx2-MoE-v8.1\",\n    \"provider\": \"mixtao\",\n    \"parameter_count\": \"12.9B\",\n    \"parameters_raw\": 12879138816,\n    \"min_ram_gb\": 7.2,\n    \"recommended_ram_gb\": 12.0,\n    \"min_vram_gb\": 6.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 20213,\n    \"hf_likes\": 55,\n    \"release_date\": \"2024-02-26\",\n    \"is_moe\": true,\n    \"num_experts\": 2,\n    \"active_experts\": 2,\n    \"active_parameters\": 12879138816,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/Orca-2-13b\",\n    \"provider\": 
\"Microsoft\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13015864320,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Reasoning, step-by-step solutions\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"lmsys/vicuna-13b-v1.5\",\n    \"provider\": \"LMSYS\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13015864320,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"WizardLMTeam/WizardLM-13B-V1.2\",\n    \"provider\": \"WizardLM\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13015864320,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"cais/HarmBench-Llama-2-13b-cls\",\n    \"provider\": \"cais\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13015864320,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    
\"hf_downloads\": 30370,\n    \"hf_likes\": 27,\n    \"release_date\": \"2024-02-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/CodeLlama-13b-Instruct-hf\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13016028160,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 6450,\n    \"hf_likes\": 27,\n    \"release_date\": \"2024-03-13\"\n  },\n  {\n    \"name\": \"microsoft/phi-4\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"14B\",\n    \"parameters_raw\": 14000000000,\n    \"min_ram_gb\": 7.8,\n    \"recommended_ram_gb\": 13.0,\n    \"min_vram_gb\": 7.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Reasoning, STEM, code generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/phi-4-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/phi-4-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"microsoft/Phi-3-medium-14b-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"14B\",\n    \"parameters_raw\": 14000000000,\n    \"min_ram_gb\": 7.8,\n    \"recommended_ram_gb\": 13.0,\n    \"min_vram_gb\": 7.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Balanced performance and size\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": 
\"microsoft/Phi-4-reasoning\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"14B\",\n    \"parameters_raw\": 14000000000,\n    \"min_ram_gb\": 7.8,\n    \"recommended_ram_gb\": 13.0,\n    \"min_vram_gb\": 7.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Advanced reasoning, math and code\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi4\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-01\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Phi-4-reasoning-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"microsoft/Phi-4-multimodal-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"14B\",\n    \"parameters_raw\": 14000000000,\n    \"min_ram_gb\": 7.8,\n    \"recommended_ram_gb\": 13.0,\n    \"min_vram_gb\": 7.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Multimodal, vision and audio\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"phi4\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-01\"\n  },\n  {\n    \"name\": \"Qwen/Qwen-14B-Chat-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.2B\",\n    \"parameters_raw\": 14168796160,\n    \"min_ram_gb\": 7.9,\n    \"recommended_ram_gb\": 13.2,\n    \"min_vram_gb\": 7.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen\",\n    \"hf_downloads\": 45732,\n    \"hf_likes\": 100,\n    \"release_date\": \"2023-09-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-MoE-A2.7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.3B\",\n    \"parameters_raw\": 14315784192,\n    \"min_ram_gb\": 8.0,\n    
\"recommended_ram_gb\": 13.3,\n    \"min_vram_gb\": 7.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2_moe\",\n    \"hf_downloads\": 59931,\n    \"hf_likes\": 220,\n    \"release_date\": \"2024-02-29\",\n    \"is_moe\": true,\n    \"num_experts\": 60,\n    \"active_experts\": 4,\n    \"active_parameters\": 1622455541,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bullpoint/Qwen3-Coder-Next-AWQ-4bit\",\n    \"provider\": \"bullpoint\",\n    \"parameter_count\": \"14.4B\",\n    \"parameters_raw\": 14444722944,\n    \"min_ram_gb\": 8.1,\n    \"recommended_ram_gb\": 13.5,\n    \"min_vram_gb\": 7.4,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 1226868,\n    \"hf_likes\": 14,\n    \"release_date\": \"2026-02-03\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 990253467,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"stelterlab/phi-4-AWQ\",\n    \"provider\": \"stelterlab\",\n    \"parameter_count\": \"14.7B\",\n    \"parameters_raw\": 14659507200,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 13.7,\n    \"min_vram_gb\": 7.5,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 16384,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 55064,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-01-11\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": 
\"cyankiwi/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"14.7B\",\n    \"parameters_raw\": 14736242944,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 13.7,\n    \"min_vram_gb\": 7.5,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 192744,\n    \"hf_likes\": 61,\n    \"release_date\": \"2025-09-12\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 1010238527,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"cyankiwi/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"14.7B\",\n    \"parameters_raw\": 14736242944,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 13.7,\n    \"min_vram_gb\": 7.5,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 168561,\n    \"hf_likes\": 22,\n    \"release_date\": \"2025-09-12\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 1010238527,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3-14B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14768307200,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 258163,\n    \"hf_likes\": 57,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"OpenPipe/Qwen3-14B-Instruct\",\n    \"provider\": \"openpipe\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14768307200,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 207053,\n    \"hf_likes\": 12,\n    \"release_date\": \"2025-10-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Goekdeniz-Guelmez/Josiefied-Qwen3-14B-abliterated-v3\",\n    \"provider\": \"goekdeniz-guelmez\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14768307200,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 55059,\n    \"hf_likes\": 24,\n    \"release_date\": \"2025-05-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-14B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14768307200,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": 
\"qwen3\",\n    \"hf_downloads\": 50835,\n    \"hf_likes\": 49,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770000000,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 13.7,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-14B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-14B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770000000,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 13.7,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-14B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-14B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 491583,\n 
   \"hf_likes\": 142,\n    \"release_date\": \"2024-11-06\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-14B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-14B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1077036,\n    \"hf_likes\": 27,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 761474,\n    \"hf_likes\": 608,\n    \"release_date\": \"2025-01-20\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-14B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    
\"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 168345,\n    \"hf_likes\": 16,\n    \"release_date\": \"2024-11-09\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 100307,\n    \"hf_likes\": 144,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 93325,\n    \"hf_likes\": 26,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct-1M\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 
14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1010000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 54355,\n    \"hf_likes\": 334,\n    \"release_date\": \"2025-01-23\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-14B-Instruct-1M-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"OpenDFM/ChemDFM-R-14B\",\n    \"provider\": \"opendfm\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 41195,\n    \"hf_likes\": 6,\n    \"release_date\": \"2025-10-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"GPTQ-Int8\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 37961,\n    \"hf_likes\": 21,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-14B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": 
\"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 27181,\n    \"hf_likes\": 66,\n    \"release_date\": \"2024-11-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-14B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"WizardLMTeam/WizardCoder-15B-V1.0\",\n    \"provider\": \"WizardLM\",\n    \"parameter_count\": \"15.5B\",\n    \"parameters_raw\": 15515334656,\n    \"min_ram_gb\": 8.7,\n    \"recommended_ram_gb\": 14.5,\n    \"min_vram_gb\": 7.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Code generation and completion\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"nvidia/Qwen3-30B-A3B-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"15.6B\",\n    \"parameters_raw\": 15583623168,\n    \"min_ram_gb\": 8.7,\n    \"recommended_ram_gb\": 14.5,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 63897,\n    \"hf_likes\": 24,\n    \"release_date\": \"2025-07-08\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 1704458782,\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"NVFP4/Qwen3-Coder-30B-A3B-Instruct-FP4\",\n    \"provider\": \"nvfp4\",\n    \"parameter_count\": \"15.6B\",\n    \"parameters_raw\": 15583623168,\n    \"min_ram_gb\": 8.7,\n    \"recommended_ram_gb\": 14.5,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 25920,\n    \"hf_likes\": 11,\n    \"release_date\": \"2025-08-05\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 1704458782,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigcode/starcoder2-15b\",\n    \"provider\": \"BigCode\",\n    \"parameter_count\": \"15.7B\",\n    \"parameters_raw\": 15700000000,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"16B\",\n    \"parameters_raw\": 15700000000,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Code generation and completion\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2400000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V2-Lite-Chat\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"15.7B\",\n    \"parameters_raw\": 15706484224,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 330400,\n    \"hf_likes\": 134,\n    \"release_date\": \"2024-05-15\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2184182961,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V2-Lite\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"15.7B\",\n    \"parameters_raw\": 15706484224,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 194737,\n    \"hf_likes\": 167,\n    \"release_date\": \"2024-05-15\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2184182961,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"15.7B\",\n    \"parameters_raw\": 15706484224,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 53780,\n    \"hf_likes\": 9,\n    \"release_date\": \"2024-07-17\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2184182961,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"moonshotai/Moonlight-16B-A3B\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"16.0B\",\n    \"parameters_raw\": 15960111936,\n    \"min_ram_gb\": 8.9,\n    \"recommended_ram_gb\": 14.9,\n    \"min_vram_gb\": 8.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 45835,\n    \"hf_likes\": 108,\n    \"release_date\": \"2025-02-22\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 6,\n    \"active_parameters\": 1153367458,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"moonshotai/Moonlight-16B-A3B-Instruct\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"16.0B\",\n    \"parameters_raw\": 15960111936,\n    \"min_ram_gb\": 8.9,\n    \"recommended_ram_gb\": 14.9,\n    \"min_vram_gb\": 8.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 38514,\n    \"hf_likes\": 192,\n    \"release_date\": \"2025-02-22\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 6,\n    \"active_parameters\": 1153367458,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"inclusionAI/LLaDA2.1-mini\",\n    \"provider\": \"inclusionai\",\n    \"parameter_count\": \"16.3B\",\n    \"parameters_raw\": 16255643392,\n    \"min_ram_gb\": 9.1,\n    \"recommended_ram_gb\": 
15.1,\n    \"min_vram_gb\": 8.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llada2_moe\",\n    \"hf_downloads\": 21824,\n    \"hf_likes\": 94,\n    \"release_date\": \"2026-02-09\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 1295371577,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/deepseek-moe-16b-base\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"16.4B\",\n    \"parameters_raw\": 16375728128,\n    \"min_ram_gb\": 9.2,\n    \"recommended_ram_gb\": 15.3,\n    \"min_vram_gb\": 8.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek\",\n    \"hf_downloads\": 22326,\n    \"hf_likes\": 139,\n    \"release_date\": \"2024-01-08\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"inclusionAI/Ling-lite\",\n    \"provider\": \"inclusionai\",\n    \"parameter_count\": \"16.8B\",\n    \"parameters_raw\": 16801974272,\n    \"min_ram_gb\": 9.4,\n    \"recommended_ram_gb\": 15.6,\n    \"min_vram_gb\": 8.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bailing_moe\",\n    \"hf_downloads\": 388,\n    \"hf_likes\": 78,\n    \"release_date\": \"2025-02-28\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2336524543\n  },\n  {\n    \"name\": \"nvidia/Qwen3-32B-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"17.2B\",\n    \"parameters_raw\": 17159312384,\n    \"min_ram_gb\": 9.6,\n    
\"recommended_ram_gb\": 16.0,\n    \"min_vram_gb\": 8.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 26285,\n    \"hf_likes\": 11,\n    \"release_date\": \"2025-09-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"18.2B\",\n    \"parameters_raw\": 18237772608,\n    \"min_ram_gb\": 10.2,\n    \"recommended_ram_gb\": 17.0,\n    \"min_vram_gb\": 9.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 490404,\n    \"hf_likes\": 105,\n    \"release_date\": \"2025-12-20\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/GLM-4.5-Air-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"18.6B\",\n    \"parameters_raw\": 18626406504,\n    \"min_ram_gb\": 10.4,\n    \"recommended_ram_gb\": 17.3,\n    \"min_vram_gb\": 9.5,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe\",\n    \"hf_downloads\": 260177,\n    \"hf_likes\": 27,\n    \"release_date\": \"2025-07-29\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"QuantTrio/GLM-4.5-Air-GPTQ-Int4-Int8Mix\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"19.8B\",\n    \"parameters_raw\": 19809102592,\n    \"min_ram_gb\": 11.1,\n    \"recommended_ram_gb\": 18.4,\n    \"min_vram_gb\": 10.1,\n    \"quantization\": 
\"GPTQ-Int4\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe\",\n    \"hf_downloads\": 24759,\n    \"hf_likes\": 10,\n    \"release_date\": \"2025-07-30\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"internlm/internlm2-chat-20b\",\n    \"provider\": \"internlm\",\n    \"parameter_count\": \"19.9B\",\n    \"parameters_raw\": 19861149696,\n    \"min_ram_gb\": 11.1,\n    \"recommended_ram_gb\": 18.5,\n    \"min_vram_gb\": 10.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"internlm2\",\n    \"hf_downloads\": 20010,\n    \"hf_likes\": 88,\n    \"release_date\": \"2024-01-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"openai/gpt-oss-20b\",\n    \"provider\": \"openai\",\n    \"parameter_count\": \"21.5B\",\n    \"parameters_raw\": 21511953984,\n    \"min_ram_gb\": 12.0,\n    \"recommended_ram_gb\": 20.0,\n    \"min_vram_gb\": 11.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 7049150,\n    \"hf_likes\": 4421,\n    \"release_date\": \"2025-08-04\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 4,\n    \"active_parameters\": 3630142231,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gpt-oss-20b-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/gpt-oss-20b\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"21.5B\",\n    \"parameters_raw\": 21511953984,\n    \"min_ram_gb\": 
12.0,\n    \"recommended_ram_gb\": 20.0,\n    \"min_vram_gb\": 11.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 20506,\n    \"hf_likes\": 5,\n    \"release_date\": \"2025-09-04\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 4,\n    \"active_parameters\": 3630142231,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gpt-oss-20b-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/ERNIE-4.5-21B-A3B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"21.8B\",\n    \"parameters_raw\": 21825436160,\n    \"min_ram_gb\": 12.2,\n    \"recommended_ram_gb\": 20.3,\n    \"min_vram_gb\": 11.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ernie4_5_moe\",\n    \"hf_downloads\": 24749,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-07-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/ERNIE-4.5-21B-A3B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"21.8B\",\n    \"parameters_raw\": 21825436160,\n    \"min_ram_gb\": 12.2,\n    \"recommended_ram_gb\": 20.3,\n    \"min_vram_gb\": 11.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ernie4_5_moe\",\n    \"hf_downloads\": 24612,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-07-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"lmstudio-community/ERNIE-4.5-21B-A3B-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"21.8B\",\n    \"parameters_raw\": 21825436160,\n    \"min_ram_gb\": 12.2,\n    \"recommended_ram_gb\": 20.3,\n    \"min_vram_gb\": 11.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ernie4_5_moe\",\n    \"hf_downloads\": 24573,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-07-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"solidrust/Codestral-22B-v0.1-hf-AWQ\",\n    \"provider\": \"solidrust\",\n    \"parameter_count\": \"22.2B\",\n    \"parameters_raw\": 22247282688,\n    \"min_ram_gb\": 12.4,\n    \"recommended_ram_gb\": 20.7,\n    \"min_vram_gb\": 11.4,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 84893,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-05-30\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"stelterlab/Mistral-Small-24B-Instruct-2501-AWQ\",\n    \"provider\": \"stelterlab\",\n    \"parameter_count\": \"23.6B\",\n    \"parameters_raw\": 23572403200,\n    \"min_ram_gb\": 13.2,\n    \"recommended_ram_gb\": 22.0,\n    \"min_vram_gb\": 12.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 266172,\n    \"hf_likes\": 26,\n    \"release_date\": \"2025-01-30\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": 
\"lmstudio-community/Devstral-Small-2507-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.6B\",\n    \"parameters_raw\": 23572403200,\n    \"min_ram_gb\": 13.2,\n    \"recommended_ram_gb\": 22.0,\n    \"min_vram_gb\": 12.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 19891,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-07-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-24B-A2B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843659008,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 207367,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-23\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 2607900202,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-24B-A2B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843659008,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 205544,\n    \"hf_likes\": 2,\n    \"release_date\": \"2026-02-23\",\n    \"is_moe\": true,\n    
\"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 2607900202,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-24B-A2B-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843659008,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 204884,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-23\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 2607900202,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-24B-A2B-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843659008,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 204308,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-23\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 2607900202,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-24B-A2B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843661440,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Agentic tasks, RAG, summarization\",\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 4,\n    \"active_parameters\": 2300000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"mistralai/Mistral-Small-24B-Instruct-2501\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"24B\",\n    \"parameters_raw\": 24000000000,\n    \"min_ram_gb\": 13.4,\n    \"recommended_ram_gb\": 22.4,\n    \"min_vram_gb\": 12.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Mistral-Small-24B-Instruct-2501-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Mistral-Small-24B-Instruct-2501-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"google/gemma-2-27b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"27.2B\",\n    \"parameters_raw\": 27227128320,\n    \"min_ram_gb\": 15.2,\n    \"recommended_ram_gb\": 25.4,\n    \"min_vram_gb\": 13.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 409260,\n    \"hf_likes\": 560,\n    \"release_date\": \"2024-06-24\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/gemma-2-27b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"google/gemma-3-27b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"27.4B\",\n    \"parameters_raw\": 27432406640,\n    \"min_ram_gb\": 
15.3,\n    \"recommended_ram_gb\": 25.5,\n    \"min_vram_gb\": 14.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"gemma3\",\n    \"hf_downloads\": 1520563,\n    \"hf_likes\": 1905,\n    \"release_date\": \"2025-03-01\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gemma-3-27b-it-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-27B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"27.8B\",\n    \"parameters_raw\": 27781427952,\n    \"min_ram_gb\": 15.5,\n    \"recommended_ram_gb\": 25.9,\n    \"min_vram_gb\": 14.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 406808,\n    \"hf_likes\": 565,\n    \"release_date\": \"2026-02-24\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-27B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/GLM-4.7-Flash-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"29.9B\",\n    \"parameters_raw\": 29943393920,\n    \"min_ram_gb\": 16.7,\n    \"recommended_ram_gb\": 27.9,\n    \"min_vram_gb\": 15.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe_lite\",\n    \"hf_downloads\": 1001623,\n    \"hf_likes\": 9,\n    \"release_date\": \"2026-01-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/GLM-4.7-Flash-MLX-6bit\",\n    
\"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"29.9B\",\n    \"parameters_raw\": 29943393920,\n    \"min_ram_gb\": 16.7,\n    \"recommended_ram_gb\": 27.9,\n    \"min_vram_gb\": 15.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe_lite\",\n    \"hf_downloads\": 991211,\n    \"hf_likes\": 8,\n    \"release_date\": \"2026-01-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-30B-A3B-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 226311,\n    \"hf_likes\": 47,\n    \"release_date\": \"2025-05-05\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 191895,\n    \"hf_likes\": 14,\n    \"release_date\": \"2025-07-31\",\n    
\"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 185814,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-08-01\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 181127,\n    \"hf_likes\": 12,\n    \"release_date\": \"2025-07-31\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    
\"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 179804,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-07-31\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-30B-A3B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 83458,\n    \"hf_likes\": 69,\n    \"release_date\": \"2025-04-28\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"typhoon-ai/typhoon2.5-qwen3-30b-a3b\",\n    \"provider\": \"typhoon-ai\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 53587,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-09-23\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 
3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 46035,\n    \"hf_likes\": 6,\n    \"release_date\": \"2025-08-01\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 45854,\n    \"hf_likes\": 6,\n    \"release_date\": \"2025-07-29\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    
\"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 44199,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-07-29\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 43483,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-29\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Alibaba-NLP/Tongyi-DeepResearch-30B-A3B\",\n    \"provider\": \"alibaba-nlp\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 26559,\n    \"hf_likes\": 802,\n    \"release_date\": \"2025-09-16\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"Qwen/Qwen3-30B-A3B-Instruct-2507-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30533947392,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 957458,\n    \"hf_likes\": 115,\n    \"release_date\": \"2025-07-28\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339650489,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30533947392,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 265519,\n    \"hf_likes\": 164,\n    \"release_date\": \"2025-07-31\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339650489,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"31.1B\",\n    \"parameters_raw\": 31070754032,\n    \"min_ram_gb\": 17.4,\n    \"recommended_ram_gb\": 28.9,\n    \"min_vram_gb\": 15.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"qwen3_vl_moe\",\n    \"hf_downloads\": 301353,\n    \"hf_likes\": 40,\n    \"release_date\": \"2025-10-04\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 2475950709,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"QuantTrio/GLM-4.7-Flash-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"31.2B\",\n    \"parameters_raw\": 31221488576,\n    \"min_ram_gb\": 17.4,\n    \"recommended_ram_gb\": 29.1,\n    \"min_vram_gb\": 16.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe_lite\",\n    \"hf_downloads\": 103703,\n    \"hf_likes\": 7,\n    \"release_date\": \"2026-01-21\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmstudio-community/NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577935872,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 195432,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-12-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577935872,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    
\"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 190541,\n    \"hf_likes\": 3,\n    \"release_date\": \"2025-12-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577935872,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 188175,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-12-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577935872,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 188130,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-12-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577937344,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 1025721,\n    \"hf_likes\": 648,\n    \"release_date\": \"2025-12-04\"\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577937344,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 65364,\n    \"hf_likes\": 109,\n    \"release_date\": \"2025-12-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"OpenResearcher/OpenResearcher-30B-A3B\",\n    \"provider\": \"openresearcher\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577937344,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 23630,\n    \"hf_likes\": 59,\n    \"release_date\": \"2026-02-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577946256,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 1412797,\n    \"hf_likes\": 289,\n    
\"release_date\": \"2025-12-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LGAI-EXAONE/EXAONE-4.0-32B\",\n    \"provider\": \"LG AI\",\n    \"parameter_count\": \"32B\",\n    \"parameters_raw\": 32000000000,\n    \"min_ram_gb\": 17.9,\n    \"recommended_ram_gb\": 29.8,\n    \"min_vram_gb\": 16.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Hybrid reasoning, multilingual\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-15\"\n  },\n  {\n    \"name\": \"LGAI-EXAONE/EXAONE-4.0.1-32B\",\n    \"provider\": \"lgai-exaone\",\n    \"parameter_count\": \"32.0B\",\n    \"parameters_raw\": 32003216384,\n    \"min_ram_gb\": 17.9,\n    \"recommended_ram_gb\": 29.8,\n    \"min_vram_gb\": 16.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone4\",\n    \"hf_downloads\": 186516,\n    \"hf_likes\": 24,\n    \"release_date\": \"2025-07-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LGAI-EXAONE/EXAONE-4.0-32B-FP8\",\n    \"provider\": \"lgai-exaone\",\n    \"parameter_count\": \"32.0B\",\n    \"parameters_raw\": 32005105664,\n    \"min_ram_gb\": 17.9,\n    \"recommended_ram_gb\": 29.8,\n    \"min_vram_gb\": 16.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone4\",\n    \"hf_downloads\": 20430,\n    \"hf_likes\": 17,\n    \"release_date\": \"2025-07-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/OLMo-2-0325-32B-Instruct\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"32.2B\",\n    \"parameters_raw\": 32234279936,\n   
 \"min_ram_gb\": 18.0,\n    \"recommended_ram_gb\": 30.0,\n    \"min_vram_gb\": 16.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo2\",\n    \"hf_downloads\": 2979,\n    \"hf_likes\": 148,\n    \"release_date\": \"2025-03-12\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/OLMo-2-0325-32B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.5B\",\n    \"parameters_raw\": 32510000000,\n    \"min_ram_gb\": 18.2,\n    \"recommended_ram_gb\": 30.3,\n    \"min_vram_gb\": 16.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-32B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-32B-Chat\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.5B\",\n    \"parameters_raw\": 32512218112,\n    \"min_ram_gb\": 18.2,\n    \"recommended_ram_gb\": 30.3,\n    \"min_vram_gb\": 16.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 25041,\n    \"hf_likes\": 109,\n    \"release_date\": \"2024-04-03\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen1.5-32B-Chat-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": 
\"nn-tech/MetalGPT-1\",\n    \"provider\": \"nn-tech\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32759593984,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 20663,\n    \"hf_likes\": 38,\n    \"release_date\": \"2025-12-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-32B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32762123264,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 552811,\n    \"hf_likes\": 129,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-32B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 858975,\n    \"hf_likes\": 2000,\n    \"release_date\": \"2024-11-06\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-32B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": 
\"bartowski/Qwen2.5-Coder-32B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 873156,\n    \"hf_likes\": 1525,\n    \"release_date\": \"2025-01-20\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1643600,\n    \"hf_likes\": 94,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text 
generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1453252,\n    \"hf_likes\": 173,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-32B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 973260,\n    \"hf_likes\": 33,\n    \"release_date\": \"2024-11-09\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/QwQ-32B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 280279,\n    \"hf_likes\": 133,\n    \"release_date\": \"2025-03-05\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 191251,\n    \"hf_likes\": 40,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"baichuan-inc/Baichuan-M2-32B\",\n    \"provider\": \"baichuan-inc\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 152016,\n    \"hf_likes\": 118,\n    \"release_date\": \"2025-08-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"GPTQ-Int8\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 105034,\n    \"hf_likes\": 14,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-32B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": 
\"qwen2\",\n    \"hf_downloads\": 43109,\n    \"hf_likes\": 142,\n    \"release_date\": \"2024-11-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-32B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"meta-llama/CodeLlama-34b-Instruct-hf\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"33.7B\",\n    \"parameters_raw\": 33743970304,\n    \"min_ram_gb\": 18.9,\n    \"recommended_ram_gb\": 31.4,\n    \"min_vram_gb\": 17.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 950,\n    \"hf_likes\": 19,\n    \"release_date\": \"2024-03-14\"\n  },\n  {\n    \"name\": \"01-ai/Yi-34B-Chat\",\n    \"provider\": \"01.ai\",\n    \"parameter_count\": \"34.4B\",\n    \"parameters_raw\": 34386780160,\n    \"min_ram_gb\": 19.2,\n    \"recommended_ram_gb\": 32.0,\n    \"min_vram_gb\": 17.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Multilingual, Chinese/English chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"yi\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"dphn/dolphin-2.9.1-yi-1.5-34b\",\n    \"provider\": \"dphn\",\n    \"parameter_count\": \"34.4B\",\n    \"parameters_raw\": 34388917248,\n    \"min_ram_gb\": 19.2,\n    \"recommended_ram_gb\": 32.0,\n    \"min_vram_gb\": 17.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 4650971,\n    \"hf_likes\": 56,\n    \"release_date\": \"2024-05-18\",\n    \"_discovered\": 
true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/dolphin-2.9.1-yi-1.5-34b-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"CohereForAI/c4ai-command-r-v01\",\n    \"provider\": \"Cohere\",\n    \"parameter_count\": \"35B\",\n    \"parameters_raw\": 35000000000,\n    \"min_ram_gb\": 19.5,\n    \"recommended_ram_gb\": 32.6,\n    \"min_vram_gb\": 17.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"RAG, tool use, agents\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"cohere\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/c4ai-command-r-v01-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-35B-A3B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"36.0B\",\n    \"parameters_raw\": 35951822704,\n    \"min_ram_gb\": 20.1,\n    \"recommended_ram_gb\": 33.5,\n    \"min_vram_gb\": 18.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5_moe\",\n    \"hf_downloads\": 769032,\n    \"hf_likes\": 905,\n    \"release_date\": \"2026-02-24\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 3000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-35B-A3B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/Seed-OSS-36B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"36.2B\",\n    \"parameters_raw\": 36151104512,\n    \"min_ram_gb\": 20.2,\n    \"recommended_ram_gb\": 33.7,\n    \"min_vram_gb\": 18.5,\n    
\"quantization\": \"Q4_K_M\",\n    \"context_length\": 524288,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"seed_oss\",\n    \"hf_downloads\": 46944,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-08-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Seed-OSS-36B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"36.2B\",\n    \"parameters_raw\": 36151104512,\n    \"min_ram_gb\": 20.2,\n    \"recommended_ram_gb\": 33.7,\n    \"min_vram_gb\": 18.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 524288,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"seed_oss\",\n    \"hf_downloads\": 45348,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Seed-OSS-36B-Instruct-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"36.2B\",\n    \"parameters_raw\": 36151104512,\n    \"min_ram_gb\": 20.2,\n    \"recommended_ram_gb\": 33.7,\n    \"min_vram_gb\": 18.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 524288,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"seed_oss\",\n    \"hf_downloads\": 45061,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-08-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Seed-OSS-36B-Instruct-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"36.2B\",\n    \"parameters_raw\": 36151104512,\n    \"min_ram_gb\": 20.2,\n    \"recommended_ram_gb\": 33.7,\n    \"min_vram_gb\": 18.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 524288,\n    \"use_case\": 
\"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"seed_oss\",\n    \"hf_downloads\": 44971,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/MiniMax-M2.1-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"36.8B\",\n    \"parameters_raw\": 36811839984,\n    \"min_ram_gb\": 20.6,\n    \"recommended_ram_gb\": 34.3,\n    \"min_vram_gb\": 18.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 36114,\n    \"hf_likes\": 16,\n    \"release_date\": \"2025-12-27\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 2933443495,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"cyankiwi/MiniMax-M2.5-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"36.8B\",\n    \"parameters_raw\": 36811839984,\n    \"min_ram_gb\": 20.6,\n    \"recommended_ram_gb\": 34.3,\n    \"min_vram_gb\": 18.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 24338,\n    \"hf_likes\": 6,\n    \"release_date\": \"2026-02-15\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 2933443495,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"mratsim/MiniMax-M2.5-BF16-INT4-AWQ\",\n    \"provider\": \"mratsim\",\n    \"parameter_count\": \"39.1B\",\n    \"parameters_raw\": 39115692032,\n    \"min_ram_gb\": 21.9,\n    \"recommended_ram_gb\": 
36.4,\n    \"min_vram_gb\": 20.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 46268,\n    \"hf_likes\": 29,\n    \"release_date\": \"2026-02-14\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 3117031705,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"tiiuae/falcon-40b-instruct\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"40.0B\",\n    \"parameters_raw\": 40000000000,\n    \"min_ram_gb\": 22.4,\n    \"recommended_ram_gb\": 37.3,\n    \"min_vram_gb\": 20.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"mistralai/Mixtral-8x7B-Instruct-v0.1\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"46.7B\",\n    \"parameters_raw\": 46702792704,\n    \"min_ram_gb\": 26.1,\n    \"recommended_ram_gb\": 43.5,\n    \"min_vram_gb\": 23.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"unknown\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 787218,\n    \"hf_likes\": 4641,\n    \"release_date\": \"2023-12-10\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 12900000000\n  },\n  {\n    \"name\": \"Salesforce/xLAM-8x7b-r\",\n    \"provider\": \"salesforce\",\n    \"parameter_count\": \"46.7B\",\n    \"parameters_raw\": 46702792704,\n    \"min_ram_gb\": 26.1,\n    \"recommended_ram_gb\": 
43.5,\n    \"min_vram_gb\": 23.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 25430,\n    \"hf_likes\": 15,\n    \"release_date\": \"2024-08-28\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 13427052901,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/xLAM-8x7b-r-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"46.7B\",\n    \"parameters_raw\": 46702809088,\n    \"min_ram_gb\": 26.1,\n    \"recommended_ram_gb\": 43.5,\n    \"min_vram_gb\": 23.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 9050,\n    \"hf_likes\": 453,\n    \"release_date\": \"2024-01-11\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 12900000000\n  },\n  {\n    \"name\": \"moonshotai/Kimi-Linear-48B-A3B-Instruct\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"49.1B\",\n    \"parameters_raw\": 49122681728,\n    \"min_ram_gb\": 27.4,\n    \"recommended_ram_gb\": 45.7,\n    \"min_vram_gb\": 25.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"kimi_linear\",\n    \"hf_downloads\": 35486,\n    \"hf_likes\": 546,\n    \"release_date\": \"2025-10-30\",\n    \"_discovered\": 
true\n  },\n  {\n    \"name\": \"nvidia/Llama-3_3-Nemotron-Super-49B-v1_5\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"49.9B\",\n    \"parameters_raw\": 49867145216,\n    \"min_ram_gb\": 27.9,\n    \"recommended_ram_gb\": 46.4,\n    \"min_vram_gb\": 25.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron-nas\",\n    \"hf_downloads\": 105079,\n    \"hf_likes\": 226,\n    \"release_date\": \"2025-07-25\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Llama-3_3-Nemotron-Super-49B-v1_5-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/Llama-3_3-Nemotron-Super-49B-v1\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"49.9B\",\n    \"parameters_raw\": 49867145216,\n    \"min_ram_gb\": 27.9,\n    \"recommended_ram_gb\": 46.4,\n    \"min_vram_gb\": 25.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron-nas\",\n    \"hf_downloads\": 23805,\n    \"hf_likes\": 320,\n    \"release_date\": \"2025-03-16\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Llama-3_3-Nemotron-Super-49B-v1-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"txn545/Qwen3.5-122B-A10B-NVFP4\",\n    \"provider\": \"txn545\",\n    \"parameter_count\": \"64.4B\",\n    \"parameters_raw\": 64354266864,\n    \"min_ram_gb\": 36.0,\n    \"recommended_ram_gb\": 59.9,\n    \"min_vram_gb\": 33.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    
],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_5_moe\",\n    \"hf_downloads\": 37707,\n    \"hf_likes\": 6,\n    \"release_date\": \"2026-02-24\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 5128230639,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.1-70B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 801189,\n    \"hf_likes\": 894,\n    \"release_date\": \"2024-07-16\"\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.3-70B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Llama-3.3-70B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Llama-3.3-70B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"casperhansen/llama-3.3-70b-instruct-awq\",\n    \"provider\": \"casperhansen\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    
\"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 674865,\n    \"hf_likes\": 39,\n    \"release_date\": \"2024-12-06\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"kosbu/Llama-3.3-70B-Instruct-AWQ\",\n    \"provider\": \"kosbu\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 505688,\n    \"hf_likes\": 10,\n    \"release_date\": \"2024-12-06\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4\",\n    \"provider\": \"ibnzterrell\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 138353,\n    \"hf_likes\": 30,\n    \"release_date\": \"2024-12-07\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    
\"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 116205,\n    \"hf_likes\": 32,\n    \"release_date\": \"2024-07-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.1-70B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 75498,\n    \"hf_likes\": 408,\n    \"release_date\": \"2024-07-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Meta-Llama-3-70B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 61023,\n    \"hf_likes\": 1506,\n    \"release_date\": \"2024-04-17\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Meta-Llama-3-70B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3\",\n    \"provider\": \"tokyotech-llm\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n  
  \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 35321,\n    \"hf_likes\": 14,\n    \"release_date\": \"2024-12-25\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-70B-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553707616,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 39962,\n    \"hf_likes\": 50,\n    \"release_date\": \"2024-07-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70560423936,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 42062,\n    \"hf_likes\": 14,\n    \"release_date\": \"2024-12-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70560423936,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    
\"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 26238,\n    \"hf_likes\": 10,\n    \"release_date\": \"2025-02-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LLM360/K2-Think-V2\",\n    \"provider\": \"llm360\",\n    \"parameter_count\": \"72.6B\",\n    \"parameters_raw\": 72550195200,\n    \"min_ram_gb\": 40.5,\n    \"recommended_ram_gb\": 67.6,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 53839,\n    \"hf_likes\": 23,\n    \"release_date\": \"2026-01-08\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-72B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 558153,\n    \"hf_likes\": 916,\n    \"release_date\": \"2024-09-16\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-72B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-72B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 
131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 45193,\n    \"hf_likes\": 89,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2-72B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 40930,\n    \"hf_likes\": 719,\n    \"release_date\": \"2024-05-28\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2-72B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2-72B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 34455,\n    \"hf_likes\": 200,\n    \"release_date\": \"2024-05-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"huihui-ai/Qwen2.5-72B-Instruct-abliterated\",\n    \"provider\": \"huihui-ai\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    
\"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 20754,\n    \"hf_likes\": 35,\n    \"release_date\": \"2024-10-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-72B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"73.0B\",\n    \"parameters_raw\": 72957861888,\n    \"min_ram_gb\": 40.8,\n    \"recommended_ram_gb\": 67.9,\n    \"min_vram_gb\": 37.4,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 922364,\n    \"hf_likes\": 75,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"73.0B\",\n    \"parameters_raw\": 72957861888,\n    \"min_ram_gb\": 40.8,\n    \"recommended_ram_gb\": 67.9,\n    \"min_vram_gb\": 37.4,\n    \"quantization\": \"GPTQ-Int8\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 42593,\n    \"hf_likes\": 28,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"NexVeridian/Qwen3-Coder-Next-8bit\",\n    \"provider\": \"nexveridian\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and 
completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 300258,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-03\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Next-80B-A3B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 48644,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-09-15\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Next-80B-A3B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 48355,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-09-15\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"lmstudio-community/Qwen3-Next-80B-A3B-Instruct-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 47109,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-09-15\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Next-80B-A3B-Instruct-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 47029,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-09-15\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-Next\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674391296,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": 
\"qwen3_next\",\n    \"hf_downloads\": 484455,\n    \"hf_likes\": 976,\n    \"release_date\": \"2026-01-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-Next-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79679212800,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 398505,\n    \"hf_likes\": 100,\n    \"release_date\": \"2026-02-01\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462383530,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-Next\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"80B\",\n    \"parameters_raw\": 80000000000,\n    \"min_ram_gb\": 44.8,\n    \"recommended_ram_gb\": 74.6,\n    \"min_vram_gb\": 41.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation, agentic coding\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 3000000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-01-30\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-Coder-Next-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Next-80B-A3B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"81.3B\",\n    \"parameters_raw\": 81324862720,\n    \"min_ram_gb\": 45.4,\n    \"recommended_ram_gb\": 75.7,\n    \"min_vram_gb\": 41.7,\n    \"quantization\": \"Q4_K_M\",\n    
\"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 1224711,\n    \"hf_likes\": 945,\n    \"release_date\": \"2025-09-09\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5575200546,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Next-80B-A3B-Instruct-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"81.3B\",\n    \"parameters_raw\": 81329784384,\n    \"min_ram_gb\": 45.4,\n    \"recommended_ram_gb\": 75.7,\n    \"min_vram_gb\": 41.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 148887,\n    \"hf_likes\": 82,\n    \"release_date\": \"2025-09-22\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5575537949,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-110B-Chat-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"111.2B\",\n    \"parameters_raw\": 111209914368,\n    \"min_ram_gb\": 62.1,\n    \"recommended_ram_gb\": 103.6,\n    \"min_vram_gb\": 57.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 320397,\n    \"hf_likes\": 9,\n    \"release_date\": \"2024-04-27\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  
},\n  {\n    \"name\": \"lmstudio-community/gpt-oss-120b-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"116.8B\",\n    \"parameters_raw\": 116829154368,\n    \"min_ram_gb\": 65.3,\n    \"recommended_ram_gb\": 108.8,\n    \"min_vram_gb\": 59.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 61730,\n    \"hf_likes\": 12,\n    \"release_date\": \"2025-08-05\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 4,\n    \"active_parameters\": 9309823238,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"axolotl-ai-co/gpt-oss-120b-dequantized\",\n    \"provider\": \"axolotl-ai-co\",\n    \"parameter_count\": \"116.8B\",\n    \"parameters_raw\": 116829156672,\n    \"min_ram_gb\": 65.3,\n    \"recommended_ram_gb\": 108.8,\n    \"min_vram_gb\": 59.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 34254,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-07\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 4,\n    \"active_parameters\": 9309823421,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"openai/gpt-oss-120b\",\n    \"provider\": \"openai\",\n    \"parameter_count\": \"120.4B\",\n    \"parameters_raw\": 120412337472,\n    \"min_ram_gb\": 67.3,\n    \"recommended_ram_gb\": 112.1,\n    \"min_vram_gb\": 61.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 
4194966,\n    \"hf_likes\": 4542,\n    \"release_date\": \"2025-08-04\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 4,\n    \"active_parameters\": 9595358141,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gpt-oss-120b-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-122B-A10B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"125.1B\",\n    \"parameters_raw\": 125086497008,\n    \"min_ram_gb\": 69.9,\n    \"recommended_ram_gb\": 116.5,\n    \"min_vram_gb\": 64.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5_moe\",\n    \"hf_downloads\": 171055,\n    \"hf_likes\": 389,\n    \"release_date\": \"2026-02-24\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 10000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-122B-A10B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"mistralai/Mixtral-8x22B-Instruct-v0.1\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"140.6B\",\n    \"parameters_raw\": 140630071296,\n    \"min_ram_gb\": 78.6,\n    \"recommended_ram_gb\": 131.0,\n    \"min_vram_gb\": 72.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"unknown\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 15022,\n    \"hf_likes\": 746,\n    \"release_date\": \"2024-04-16\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 39100000000\n  },\n  {\n    \"name\": 
\"MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ\",\n    \"provider\": \"maziyarpanahi\",\n    \"parameter_count\": \"140.6B\",\n    \"parameters_raw\": 140630071296,\n    \"min_ram_gb\": 78.6,\n    \"recommended_ram_gb\": 131.0,\n    \"min_vram_gb\": 72.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 40221,\n    \"hf_likes\": 13,\n    \"release_date\": \"2024-04-18\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 40431145496,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"rednote-hilab/dots.llm1.inst\",\n    \"provider\": \"rednote-hilab\",\n    \"parameter_count\": \"142.8B\",\n    \"parameters_raw\": 142774381696,\n    \"min_ram_gb\": 79.8,\n    \"recommended_ram_gb\": 133.0,\n    \"min_vram_gb\": 73.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"dots1\",\n    \"hf_downloads\": 5040,\n    \"hf_likes\": 175,\n    \"release_date\": \"2025-05-14\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/dots.llm1.inst-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"bigscience/bloom\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"176.2B\",\n    \"parameters_raw\": 176247271424,\n    \"min_ram_gb\": 98.5,\n    \"recommended_ram_gb\": 164.1,\n    \"min_vram_gb\": 90.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 4896,\n    \"hf_likes\": 
4986,\n    \"release_date\": \"2022-05-19\"\n  },\n  {\n    \"name\": \"tiiuae/falcon-180B-chat\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"179.5B\",\n    \"parameters_raw\": 179522565120,\n    \"min_ram_gb\": 100.3,\n    \"recommended_ram_gb\": 167.2,\n    \"min_vram_gb\": 92.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon\",\n    \"hf_downloads\": 65,\n    \"hf_likes\": 545,\n    \"release_date\": \"2023-09-04\"\n  },\n  {\n    \"name\": \"stepfun-ai/Step-3.5-Flash\",\n    \"provider\": \"stepfun-ai\",\n    \"parameter_count\": \"199.4B\",\n    \"parameters_raw\": 199384301376,\n    \"min_ram_gb\": 111.4,\n    \"recommended_ram_gb\": 185.7,\n    \"min_vram_gb\": 102.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"step3p5\",\n    \"hf_downloads\": 327178,\n    \"hf_likes\": 674,\n    \"release_date\": \"2026-02-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/MiniMax-M2.5-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689748992,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 112426,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-13\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223714369,\n    \"_discovered\": true\n  },\n  {\n    
\"name\": \"lmstudio-community/MiniMax-M2.5-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689748992,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 105419,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-13\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223714369,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/MiniMax-M2.5-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689748992,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 103821,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-13\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223714369,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/MiniMax-M2-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689748992,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": 
\"minimax\",\n    \"hf_downloads\": 19959,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-10-29\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223714369,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"QuantTrio/MiniMax-M2-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689764864,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 586558,\n    \"hf_likes\": 8,\n    \"release_date\": \"2025-10-28\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223715635,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"QuantTrio/MiniMax-M2.5-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689764864,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 45340,\n    \"hf_likes\": 10,\n    \"release_date\": \"2026-02-15\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223715635,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"MiniMaxAI/MiniMax-M2.7\",\n    \"provider\": \"minimaxai\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228703644928,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    
\"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Latest flagship with enhanced reasoning and coding\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-03-18\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 10000000000\n  },\n  {\n    \"name\": \"MiniMaxAI/MiniMax-M2.5\",\n    \"provider\": \"minimaxai\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228703644928,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 343848,\n    \"hf_likes\": 1080,\n    \"release_date\": \"2026-02-12\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 10000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/MiniMax-M2.5-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"MiniMaxAI/MiniMax-M2\",\n    \"provider\": \"minimaxai\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228703644928,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 275243,\n    \"hf_likes\": 1485,\n    \"release_date\": \"2025-10-22\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    
\"active_parameters\": 18224821702,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/MiniMax-M2-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"MiniMaxAI/MiniMax-M2.1\",\n    \"provider\": \"minimaxai\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228703644928,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 72189,\n    \"hf_likes\": 1257,\n    \"release_date\": \"2025-12-20\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18224821702,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/MiniMax-M2.1-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-235B-A22B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"235.1B\",\n    \"parameters_raw\": 235093634560,\n    \"min_ram_gb\": 131.4,\n    \"recommended_ram_gb\": 218.9,\n    \"min_vram_gb\": 120.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 684371,\n    \"hf_likes\": 1077,\n    \"release_date\": \"2025-04-27\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 22000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-235B-A22B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-235B-A22B-Instruct-2507-FP8\",\n   
 \"provider\": \"Alibaba\",\n    \"parameter_count\": \"235.1B\",\n    \"parameters_raw\": 235107904512,\n    \"min_ram_gb\": 131.4,\n    \"recommended_ram_gb\": 219.0,\n    \"min_vram_gb\": 120.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 802366,\n    \"hf_likes\": 146,\n    \"release_date\": \"2025-07-21\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 25714927049,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-235B-A22B-Thinking-2507-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"235.1B\",\n    \"parameters_raw\": 235107904512,\n    \"min_ram_gb\": 131.4,\n    \"recommended_ram_gb\": 219.0,\n    \"min_vram_gb\": 120.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 77936,\n    \"hf_likes\": 83,\n    \"release_date\": \"2025-07-25\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 25714927049,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-235B-A22B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"235.1B\",\n    \"parameters_raw\": 235107904512,\n    \"min_ram_gb\": 131.4,\n    \"recommended_ram_gb\": 219.0,\n    \"min_vram_gb\": 120.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    
\"hf_downloads\": 32322,\n    \"hf_likes\": 90,\n    \"release_date\": \"2025-04-28\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 25714927049,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"casperhansen/deepseek-coder-v2-instruct-awq\",\n    \"provider\": \"casperhansen\",\n    \"parameter_count\": \"235.7B\",\n    \"parameters_raw\": 235741434880,\n    \"min_ram_gb\": 131.7,\n    \"recommended_ram_gb\": 219.6,\n    \"min_vram_gb\": 120.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 163840,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 155456,\n    \"hf_likes\": 11,\n    \"release_date\": \"2024-07-03\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 32782793288,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V2.5\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"235.7B\",\n    \"parameters_raw\": 235741434880,\n    \"min_ram_gb\": 131.7,\n    \"recommended_ram_gb\": 219.6,\n    \"min_vram_gb\": 120.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 84805,\n    \"hf_likes\": 733,\n    \"release_date\": \"2024-09-05\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 32782793288,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/DeepSeek-V2.5-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/DeepSeek-V2.5-1210-FP8\",\n    \"provider\": \"redhatai\",\n    
\"parameter_count\": \"235.7B\",\n    \"parameters_raw\": 235741492480,\n    \"min_ram_gb\": 131.7,\n    \"recommended_ram_gb\": 219.6,\n    \"min_vram_gb\": 120.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 54313,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-01-04\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 32782801298,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LGAI-EXAONE/K-EXAONE-236B-A23B\",\n    \"provider\": \"lgai-exaone\",\n    \"parameter_count\": \"237.1B\",\n    \"parameters_raw\": 237099669632,\n    \"min_ram_gb\": 132.5,\n    \"recommended_ram_gb\": 220.8,\n    \"min_vram_gb\": 121.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone_moe\",\n    \"hf_downloads\": 23695,\n    \"hf_likes\": 549,\n    \"release_date\": \"2025-12-26\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 25932776361,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"baidu/ERNIE-4.5-300B-A47B-Paddle\",\n    \"provider\": \"baidu\",\n    \"parameter_count\": \"300.5B\",\n    \"parameters_raw\": 300474051776,\n    \"min_ram_gb\": 167.9,\n    \"recommended_ram_gb\": 279.8,\n    \"min_vram_gb\": 153.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ernie4_5_moe\",\n    \"hf_downloads\": 332,\n    \"hf_likes\": 12,\n    \"release_date\": \"2025-06-28\"\n  },\n  {\n    \"name\": 
\"XiaomiMiMo/MiMo-V2-Flash\",\n    \"provider\": \"xiaomimimo\",\n    \"parameter_count\": \"309.8B\",\n    \"parameters_raw\": 309785318400,\n    \"min_ram_gb\": 173.1,\n    \"recommended_ram_gb\": 288.5,\n    \"min_vram_gb\": 158.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mimo_v2_flash\",\n    \"hf_downloads\": 536830,\n    \"hf_likes\": 636,\n    \"release_date\": \"2025-12-16\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/MiMo-V2-Flash-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zai-org/GLM-4.6\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"356.8B\",\n    \"parameters_raw\": 356785898816,\n    \"min_ram_gb\": 199.4,\n    \"recommended_ram_gb\": 332.3,\n    \"min_vram_gb\": 182.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe\",\n    \"hf_downloads\": 81982,\n    \"hf_likes\": 1204,\n    \"release_date\": \"2025-09-29\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/GLM-4.6-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zai-org/GLM-4.5\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"358.3B\",\n    \"parameters_raw\": 358337791296,\n    \"min_ram_gb\": 200.2,\n    \"recommended_ram_gb\": 333.7,\n    \"min_vram_gb\": 183.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe\",\n    \"hf_downloads\": 42566,\n    \"hf_likes\": 1396,\n    \"release_date\": 
\"2025-07-20\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/GLM-4.5-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-R1-0528-NVFP4-v2\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"393.6B\",\n    \"parameters_raw\": 393632819968,\n    \"min_ram_gb\": 220.0,\n    \"recommended_ram_gb\": 366.6,\n    \"min_vram_gb\": 201.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 142525,\n    \"hf_likes\": 16,\n    \"release_date\": \"2025-07-21\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 31367615334,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-V3.1-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"393.6B\",\n    \"parameters_raw\": 393632819968,\n    \"min_ram_gb\": 220.0,\n    \"recommended_ram_gb\": 366.6,\n    \"min_vram_gb\": 201.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 37723,\n    \"hf_likes\": 13,\n    \"release_date\": \"2025-11-21\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 31367615334,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-V3.2-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"394.5B\",\n    \"parameters_raw\": 394498304256,\n    \"min_ram_gb\": 220.4,\n    \"recommended_ram_gb\": 367.4,\n    \"min_vram_gb\": 202.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": 
\"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v32\",\n    \"hf_downloads\": 21598,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-12-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-V3-0324-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"396.8B\",\n    \"parameters_raw\": 396767013632,\n    \"min_ram_gb\": 221.7,\n    \"recommended_ram_gb\": 369.5,\n    \"min_vram_gb\": 203.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 84851,\n    \"hf_likes\": 14,\n    \"release_date\": \"2025-05-03\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 31617371393,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-R1-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"396.8B\",\n    \"parameters_raw\": 396767013632,\n    \"min_ram_gb\": 221.7,\n    \"recommended_ram_gb\": 369.5,\n    \"min_vram_gb\": 203.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 43986,\n    \"hf_likes\": 271,\n    \"release_date\": \"2025-02-21\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 31617371393,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-4-Maverick-17B-128E-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"401.6B\",\n    \"parameters_raw\": 401583781376,\n    \"min_ram_gb\": 224.4,\n    \"recommended_ram_gb\": 374.0,\n    
\"min_vram_gb\": 205.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"vision\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"llama4\",\n    \"hf_downloads\": 6341,\n    \"hf_likes\": 466,\n    \"release_date\": \"2025-04-01\",\n    \"is_moe\": true,\n    \"num_experts\": 16,\n    \"active_experts\": 1,\n    \"active_parameters\": 17000000000\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-397B-A17B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"403.4B\",\n    \"parameters_raw\": 403397928944,\n    \"min_ram_gb\": 225.4,\n    \"recommended_ram_gb\": 375.7,\n    \"min_vram_gb\": 206.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5_moe\",\n    \"hf_downloads\": 1291825,\n    \"hf_likes\": 1214,\n    \"release_date\": \"2026-02-16\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 17000000000\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.1-405B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"405.9B\",\n    \"parameters_raw\": 405853388800,\n    \"min_ram_gb\": 226.8,\n    \"recommended_ram_gb\": 378.0,\n    \"min_vram_gb\": 207.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 173410,\n    \"hf_likes\": 592,\n    \"release_date\": \"2024-07-16\"\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.1-405B-Instruct-FP8\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"405.9B\",\n    \"parameters_raw\": 
405868625920,\n    \"min_ram_gb\": 226.8,\n    \"recommended_ram_gb\": 378.0,\n    \"min_vram_gb\": 207.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 22040,\n    \"hf_likes\": 193,\n    \"release_date\": \"2024-07-20\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-480B-A35B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"480.2B\",\n    \"parameters_raw\": 480154875392,\n    \"min_ram_gb\": 268.3,\n    \"recommended_ram_gb\": 447.2,\n    \"min_vram_gb\": 245.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 75486,\n    \"hf_likes\": 1304,\n    \"release_date\": \"2025-07-22\",\n    \"is_moe\": true,\n    \"num_experts\": 160,\n    \"active_experts\": 8,\n    \"active_parameters\": 35000000000\n  },\n  {\n    \"name\": \"meituan-longcat/LongCat-Flash-Chat\",\n    \"provider\": \"meituan-longcat\",\n    \"parameter_count\": \"561.9B\",\n    \"parameters_raw\": 561862880256,\n    \"min_ram_gb\": 314.0,\n    \"recommended_ram_gb\": 523.3,\n    \"min_vram_gb\": 287.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 30116,\n    \"hf_likes\": 526,\n    \"release_date\": \"2025-08-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"684.5B\",\n    \"parameters_raw\": 
684531386000,\n    \"min_ram_gb\": 382.5,\n    \"recommended_ram_gb\": 637.5,\n    \"min_vram_gb\": 350.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 1026085,\n    \"hf_likes\": 13108,\n    \"release_date\": \"2025-01-20\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 37000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/DeepSeek-R1-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-0528\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"684.5B\",\n    \"parameters_raw\": 684531386000,\n    \"min_ram_gb\": 382.5,\n    \"recommended_ram_gb\": 637.5,\n    \"min_vram_gb\": 350.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 1050237,\n    \"hf_likes\": 2403,\n    \"release_date\": \"2025-05-28\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 54548594820,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V3-0324\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"684.5B\",\n    \"parameters_raw\": 684531386000,\n    \"min_ram_gb\": 382.5,\n    \"recommended_ram_gb\": 637.5,\n    \"min_vram_gb\": 350.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 270362,\n    \"hf_likes\": 3088,\n    \"release_date\": \"2025-03-24\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 54548594820,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V3\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"685B\",\n    \"parameters_raw\": 685000000000,\n    \"min_ram_gb\": 382.8,\n    \"recommended_ram_gb\": 638.0,\n    \"min_vram_gb\": 351.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"State-of-the-art, MoE architecture\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 37000000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V3.2-Speciale\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"685B\",\n    \"parameters_raw\": 685000000000,\n    \"min_ram_gb\": 383.2,\n    \"recommended_ram_gb\": 638.7,\n    \"min_vram_gb\": 351.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 37000000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-12-01\"\n  },\n  {\n    \"name\": \"QuantTrio/DeepSeek-V3.2-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"685.0B\",\n    \"parameters_raw\": 685011996928,\n    \"min_ram_gb\": 382.8,\n    \"recommended_ram_gb\": 638.0,\n    \"min_vram_gb\": 350.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 163840,\n    
\"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v32\",\n    \"hf_downloads\": 103286,\n    \"hf_likes\": 11,\n    \"release_date\": \"2025-12-03\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V3.2\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"685.4B\",\n    \"parameters_raw\": 685396921376,\n    \"min_ram_gb\": 383.0,\n    \"recommended_ram_gb\": 638.3,\n    \"min_vram_gb\": 351.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v32\",\n    \"hf_downloads\": 362520,\n    \"hf_likes\": 1280,\n    \"release_date\": \"2025-12-01\"\n  },\n  {\n    \"name\": \"zai-org/GLM-5\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"753.9B\",\n    \"parameters_raw\": 753864139008,\n    \"min_ram_gb\": 421.3,\n    \"recommended_ram_gb\": 702.1,\n    \"min_vram_gb\": 386.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm_moe_dsa\",\n    \"hf_downloads\": 205187,\n    \"hf_likes\": 1698,\n    \"release_date\": \"2026-02-11\"\n  },\n  {\n    \"name\": \"moonshotai/Kimi-K2-Instruct\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"1026.5B\",\n    \"parameters_raw\": 1026470731056,\n    \"min_ram_gb\": 573.6,\n    \"recommended_ram_gb\": 956.0,\n    \"min_vram_gb\": 525.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"kimi_k2\",\n    \"hf_downloads\": 
151155,\n    \"hf_likes\": 2324,\n    \"release_date\": \"2025-07-11\"\n  },\n  {\n    \"name\": \"moonshotai/Kimi-K2-Instruct-0905\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"1026.5B\",\n    \"parameters_raw\": 1026470735448,\n    \"min_ram_gb\": 573.6,\n    \"recommended_ram_gb\": 956.0,\n    \"min_vram_gb\": 525.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"kimi_k2\",\n    \"hf_downloads\": 28801,\n    \"hf_likes\": 683,\n    \"release_date\": \"2025-09-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"moonshotai/Kimi-K2.5\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"1058.6B\",\n    \"parameters_raw\": 1058589420528,\n    \"min_ram_gb\": 591.5,\n    \"recommended_ram_gb\": 985.9,\n    \"min_vram_gb\": 542.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"kimi_k25\",\n    \"hf_downloads\": 1899549,\n    \"hf_likes\": 2220,\n    \"release_date\": \"2026-01-01\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Kimi-K2.5-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  }\n]\n"
  },
  {
    "path": "flake.nix",
    "content": "{\n  description = \"Hundreds of models & providers. One command to find what runs on your hardware.\";\n\n  inputs = {\n    nixpkgs.url = \"github:NixOS/nixpkgs/nixos-unstable\";\n    flake-utils.url = \"github:numtide/flake-utils\";\n  };\n\n  outputs = { self, nixpkgs, flake-utils }:\n    flake-utils.lib.eachDefaultSystem (system:\n      let\n        pkgs = nixpkgs.legacyPackages.${system};\n        version = (builtins.fromTOML (builtins.readFile ./Cargo.toml)).workspace.package.version;\n      in\n      {\n        packages.default = pkgs.rustPlatform.buildRustPackage {\n          pname = \"llmfit\";\n          inherit version;\n\n          src = ./.;\n\n          cargoLock.lockFile = ./Cargo.lock;\n\n          # Build only the TUI binary (default workspace members exclude desktop)\n          cargoBuildFlags = [ \"--package\" \"llmfit\" ];\n          cargoTestFlags = [ \"--package\" \"llmfit\" \"--package\" \"llmfit-core\" ];\n\n          meta = with pkgs.lib; {\n            description = \"Matches LLM models to your hardware capabilities\";\n            homepage = \"https://github.com/AlexsJones/llmfit\";\n            license = licenses.mit;\n            maintainers = [ ];\n            mainProgram = \"llmfit\";\n          };\n        };\n\n        devShells.default = pkgs.mkShell {\n          buildInputs = with pkgs; [\n            rustc\n            cargo\n            rust-analyzer\n            clippy\n            rustfmt\n          ];\n        };\n      });\n}\n"
  },
  {
    "path": "index.html",
    "content": "<html>\n<head>\n  <meta charset=\"utf-8\">\n  <title>llmfit</title>\n</head>\n<body>\n  <h1>llmfit</h1>\n  <p>Match LLM models to your hardware.</p>\n  <pre>curl -fsSL https://llmfit.axjns.dev/install.sh | sh</pre>\n  <p><a href=\"https://github.com/AlexsJones/llmfit\">GitHub</a></p>\n</body>\n</html>\n"
  },
  {
    "path": "install.sh",
    "content": "#!/bin/sh\n# llmfit installer\n# Usage: curl -fsSL https://raw.githubusercontent.com/AlexsJones/llmfit/main/install.sh | sh\n#        curl -fsSL ... | sh -s -- --local   # Install to ~/.local/bin (no sudo)\n#\n# Downloads the latest llmfit release from GitHub and installs\n# the binary to /usr/local/bin (or ~/.local/bin with --local or if no sudo).\n# Supports piped execution: sudo prompts read from /dev/tty when stdin is a pipe.\n\nset -e\n\nREPO=\"AlexsJones/llmfit\"\nBINARY=\"llmfit\"\nLOCAL_INSTALL=\"\"\n\n# --- helpers ---\n\ninfo() { printf '  \\033[1;34m>\\033[0m %s\\n' \"$*\"; }\nwarn() { printf '  \\033[1;33m>\\033[0m %s\\n' \"$*\"; }\nerr()  { printf '  \\033[1;31m!\\033[0m %s\\n' \"$*\" >&2; exit 1; }\n\nneed() {\n    command -v \"$1\" >/dev/null 2>&1 || err \"Required tool '$1' not found. Please install it and try again.\"\n}\n\n# --- parse arguments ---\n\nparse_args() {\n    while [ $# -gt 0 ]; do\n        case \"$1\" in\n            --local|-l)\n                LOCAL_INSTALL=\"1\"\n                ;;\n            --help|-h)\n                echo \"Usage: install.sh [OPTIONS]\"\n                echo \"\"\n                echo \"Options:\"\n                echo \"  --local, -l    Install to ~/.local/bin (no sudo required)\"\n                echo \"  --help, -h     Show this help message\"\n                exit 0\n                ;;\n            *)\n                warn \"Unknown option: $1\"\n                ;;\n        esac\n        shift\n    done\n}\n\n# --- detect platform ---\n\ndetect_platform() {\n    OS=\"$(uname -s)\"\n    ARCH=\"$(uname -m)\"\n\n    case \"$OS\" in\n        Linux)  OS=\"unknown-linux-musl\" ;;\n        Darwin) OS=\"apple-darwin\" ;;\n        *)      err \"Unsupported OS: $OS\" ;;\n    esac\n\n    case \"$ARCH\" in\n        x86_64|amd64)   ARCH=\"x86_64\" ;;\n        aarch64|arm64)  ARCH=\"aarch64\" ;;\n        *)              err \"Unsupported architecture: $ARCH\" ;;\n    esac\n\n    
PLATFORM=\"${ARCH}-${OS}\"\n}\n\n# --- fetch latest release ---\n\nfetch_latest_tag() {\n    need curl\n    need tar\n\n    # Use the releases redirect instead of the API to avoid GitHub's\n    # 60-request/hour rate limit on unauthenticated API calls (403).\n    TAG=\"$(curl -fsSI \"https://github.com/${REPO}/releases/latest\" 2>/dev/null \\\n        | grep -i '^location:' \\\n        | head -1 \\\n        | sed 's|.*/tag/||' \\\n        | tr -d '\\r\\n')\"\n\n    [ -n \"$TAG\" ] || err \"Could not determine latest release. Check https://github.com/${REPO}/releases\"\n}\n\n# --- checksum verification ---\n\nverify_checksum() {\n    CHECKSUM_FILE=\"${TMPDIR}/${ASSET}.sha256\"\n\n    # Attempt to download the checksum file (-f exits non-zero on HTTP 4xx/5xx)\n    if ! curl -fsSL --max-time 10 \"${URL}.sha256\" -o \"$CHECKSUM_FILE\" 2>/dev/null; then\n        warn \"No checksum file found for this release — skipping integrity check\"\n        return\n    fi\n\n    info \"Verifying checksum...\"\n    if command -v sha256sum >/dev/null 2>&1; then\n        (cd \"$TMPDIR\" && sha256sum -c \"${ASSET}.sha256\" --quiet) \\\n            || err \"Checksum verification failed. The download may be corrupted or tampered with.\"\n    elif command -v shasum >/dev/null 2>&1; then\n        (cd \"$TMPDIR\" && shasum -a 256 -q -c \"${ASSET}.sha256\") \\\n            || err \"Checksum verification failed. The download may be corrupted or tampered with.\"\n    else\n        warn \"Neither sha256sum nor shasum available — skipping integrity check\"\n    fi\n}\n\n# --- download and install ---\n\ninstall() {\n    ASSET=\"${BINARY}-${TAG}-${PLATFORM}.tar.gz\"\n    URL=\"https://github.com/${REPO}/releases/download/${TAG}/${ASSET}\"\n\n    TMPDIR=\"$(mktemp -d)\"\n    trap 'rm -rf \"$TMPDIR\"' EXIT\n\n    info \"Downloading ${BINARY} ${TAG} for ${PLATFORM}...\"\n    curl -fsSL \"$URL\" -o \"${TMPDIR}/${ASSET}\" \\\n        || err \"Download failed. 
Asset '${ASSET}' may not exist for your platform. Check: https://github.com/${REPO}/releases/tag/${TAG}\"\n\n    verify_checksum\n\n    info \"Extracting...\"\n    tar -xzf \"${TMPDIR}/${ASSET}\" -C \"$TMPDIR\"\n\n    # Find the binary in the extracted contents\n    BIN=\"$(find \"$TMPDIR\" -name \"$BINARY\" -type f | head -1)\"\n    [ -n \"$BIN\" ] || err \"Binary not found in archive. Release asset may have an unexpected layout.\"\n    chmod +x \"$BIN\"\n\n    # Determine install directory\n    if [ -n \"$LOCAL_INSTALL\" ]; then\n        # User explicitly requested local install\n        INSTALL_DIR=\"${HOME}/.local/bin\"\n        mkdir -p \"$INSTALL_DIR\"\n        info \"Installing to ${INSTALL_DIR} (--local mode)...\"\n    elif [ -w /usr/local/bin ]; then\n        # /usr/local/bin is writable without sudo\n        INSTALL_DIR=\"/usr/local/bin\"\n    elif command -v sudo >/dev/null 2>&1; then\n        # sudo is available; use /dev/tty for the password prompt when stdin is a pipe\n        info \"Installing to /usr/local/bin (requires sudo)...\"\n        # Record success with '&&' so that a failed sudo does not abort the script under 'set -e'\n        SUDO_OK=\"\"\n        if [ -t 0 ]; then\n            SUDO_ASKPASS=\"\" sudo mv \"$BIN\" \"/usr/local/bin/${BINARY}\" && SUDO_OK=\"1\"\n        elif [ -e /dev/tty ]; then\n            SUDO_ASKPASS=\"\" sudo mv \"$BIN\" \"/usr/local/bin/${BINARY}\" </dev/tty && SUDO_OK=\"1\"\n        fi\n        if [ -n \"$SUDO_OK\" ]; then\n            info \"Installed ${BINARY} to /usr/local/bin/${BINARY}\"\n            return\n        else\n            warn \"sudo failed, falling back to ~/.local/bin\"\n            INSTALL_DIR=\"${HOME}/.local/bin\"\n            mkdir -p \"$INSTALL_DIR\"\n        fi\n    else\n        # No write access and no interactive sudo, use local install\n        INSTALL_DIR=\"${HOME}/.local/bin\"\n        mkdir -p \"$INSTALL_DIR\"\n        info \"Installing to ${INSTALL_DIR} (no sudo available)...\"\n    fi\n\n    mv \"$BIN\" \"${INSTALL_DIR}/${BINARY}\"\n    info \"Installed ${BINARY} to ${INSTALL_DIR}/${BINARY}\"\n\n    # Check if install dir is in PATH\n    case \":$PATH:\" in\n        *\":${INSTALL_DIR}:\"*) ;;\n        *)\n            warn \"Add ${INSTALL_DIR} to your PATH to use '${BINARY}' directly:\"\n            echo \"\"\n            echo \"    export PATH=\\\"${INSTALL_DIR}:\\$PATH\\\"\"\n            echo \"\"\n            ;;\n    esac\n}\n\n# --- main ---\n\nmain() {\n    parse_args \"$@\"\n    info \"llmfit installer\"\n    detect_platform\n    fetch_latest_tag\n    install\n    info \"Done. Run '${BINARY}' to get started.\"\n}\n\nmain \"$@\"\n"
  },
  {
    "path": "llmfit-core/Cargo.toml",
    "content": "[package]\nname = \"llmfit-core\"\nversion.workspace = true\nedition = \"2024\"\nauthors = [\"Alex Jones <alex@example.com>\"]\ndescription = \"Core library for llmfit — hardware detection, model fitting, and provider integration\"\nlicense = \"MIT\"\nrepository = \"https://github.com/AlexsJones/llmfit\"\nhomepage = \"https://github.com/AlexsJones/llmfit\"\nreadme = \"../README.md\"\nkeywords = [\"llm\", \"hardware\", \"inference\", \"models\", \"gpu\"]\ncategories = [\"hardware-support\"]\n\n[dependencies]\nserde = { version = \"1.0\", features = [\"derive\"] }\nserde_json = \"1.0\"\nsysinfo = \"0.38\"\nureq = { version = \"3.2\", features = [\"json\"] }\n"
  },
  {
    "path": "llmfit-core/data/docker_models.json",
    "content": "{\n  \"generated_by\": \"scrape_docker_models.py\",\n  \"docker_hub_repo_count\": 46,\n  \"matched_model_count\": 35,\n  \"models\": [\n    {\n      \"hf_name\": \"HuggingFaceTB/SmolLM2-135M\",\n      \"docker_tag\": \"ai/smollm2:135m\",\n      \"docker_repo\": \"ai/smollm2\",\n      \"available_tags\": [\n        \"latest\",\n        \"135M-Q4_K_M\",\n        \"360M-Q4_K_M\",\n        \"135M-Q2_K\",\n        \"360M-F16\",\n        \"360M-Q4_0\",\n        \"135M-Q4_0\",\n        \"135M-F16\"\n      ]\n    },\n    {\n      \"hf_name\": \"HuggingFaceTB/SmolLM2-135M-Instruct\",\n      \"docker_tag\": \"ai/smollm2:135m\",\n      \"docker_repo\": \"ai/smollm2\",\n      \"available_tags\": [\n        \"latest\",\n        \"135M-Q4_K_M\",\n        \"360M-Q4_K_M\",\n        \"135M-Q2_K\",\n        \"360M-F16\",\n        \"360M-Q4_0\",\n        \"135M-Q4_0\",\n        \"135M-F16\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-0.5B-Instruct\",\n      \"docker_tag\": \"ai/qwen2.5:0.5b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-0.5B\",\n      \"docker_tag\": \"ai/qwen2.5:0.5b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Gensyn/Qwen2.5-0.5B-Instruct\",\n      \"docker_tag\": \"ai/qwen2.5:0.5b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n    
  ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen3-0.6B\",\n      \"docker_tag\": \"ai/qwen3:0.6b\",\n      \"docker_repo\": \"ai/qwen3\",\n      \"available_tags\": [\n        \"latest\",\n        \"4B-F16\",\n        \"4B-UD-Q8_K_XL\",\n        \"4B-UD-Q4_K_XL\",\n        \"14B-Q6_K\",\n        \"30B-A3B-F16\",\n        \"30B-A3B-Q4_K_M\",\n        \"0.6B-Q4_0\",\n        \"0.6B-F16\",\n        \"0.6B-Q4_K_M\",\n        \"8B-Q4_0\",\n        \"8B-F16\",\n        \"8B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"meta-llama/Llama-3.2-1B\",\n      \"docker_tag\": \"ai/llama3.2:1b\",\n      \"docker_repo\": \"ai/llama3.2\",\n      \"available_tags\": [\n        \"latest\",\n        \"3B-Q4_0\",\n        \"1B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1B-F16\",\n        \"1B-Q8_0\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-1.5B-Instruct\",\n      \"docker_tag\": \"ai/qwen2.5:1.5b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-1.5B\",\n      \"docker_tag\": \"ai/qwen2.5:1.5b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen3-1.7B-Base\",\n      \"docker_tag\": \"ai/qwen3:1.7b\",\n      \"docker_repo\": \"ai/qwen3\",\n      \"available_tags\": [\n        \"latest\",\n        \"4B-F16\",\n        \"4B-UD-Q8_K_XL\",\n        \"4B-UD-Q4_K_XL\",\n        \"14B-Q6_K\",\n        \"30B-A3B-F16\",\n        \"30B-A3B-Q4_K_M\",\n        \"0.6B-Q4_0\",\n        \"0.6B-F16\",\n        \"0.6B-Q4_K_M\",\n        \"8B-Q4_0\",\n  
      \"8B-F16\",\n        \"8B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-3B-Instruct\",\n      \"docker_tag\": \"ai/qwen2.5:3b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"meta-llama/Llama-3.2-3B\",\n      \"docker_tag\": \"ai/llama3.2:3b\",\n      \"docker_repo\": \"ai/llama3.2\",\n      \"available_tags\": [\n        \"latest\",\n        \"3B-Q4_0\",\n        \"1B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1B-F16\",\n        \"1B-Q8_0\"\n      ]\n    },\n    {\n      \"hf_name\": \"google/gemma-3n-E2B-it\",\n      \"docker_tag\": \"ai/gemma3n:e2b\",\n      \"docker_repo\": \"ai/gemma3n\",\n      \"available_tags\": [\n        \"latest\",\n        \"2B-F16\",\n        \"2B-Q4_K_M\",\n        \"4B-F16\",\n        \"4B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"mistralai/Mistral-7B-Instruct-v0.2\",\n      \"docker_tag\": \"ai/mistral:7b\",\n      \"docker_repo\": \"ai/mistral\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Featherless-Chat-Models/Mistral-7B-Instruct-v0.2\",\n      \"docker_tag\": \"ai/mistral:7b\",\n      \"docker_repo\": \"ai/mistral\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"mistralai/Mistral-7B-Instruct-v0.3\",\n      \"docker_tag\": \"ai/mistral:7b\",\n      \"docker_repo\": \"ai/mistral\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-7B-Instruct\",\n      \"docker_tag\": 
\"ai/qwen2.5:7b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-7B\",\n      \"docker_tag\": \"ai/qwen2.5:7b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"google/gemma-3n-E4B-it\",\n      \"docker_tag\": \"ai/gemma3n:e4b\",\n      \"docker_repo\": \"ai/gemma3n\",\n      \"available_tags\": [\n        \"latest\",\n        \"2B-F16\",\n        \"2B-Q4_K_M\",\n        \"4B-F16\",\n        \"4B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"google/gemma-3-12b-it\",\n      \"docker_tag\": \"ai/gemma3:12b\",\n      \"docker_repo\": \"ai/gemma3\",\n      \"available_tags\": [\n        \"latest\",\n        \"4B-Q4_K_M\",\n        \"4B-F16\",\n        \"270M-UD-IQ2_XXS\",\n        \"4B\",\n        \"270M-UD-Q4_K_XL\",\n        \"270M-F16\",\n        \"4B-Q4_0\",\n        \"1B-F16\",\n        \"1B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"mistralai/Mistral-Nemo-Instruct-2407\",\n      \"docker_tag\": \"ai/mistral-nemo\",\n      \"docker_repo\": \"ai/mistral-nemo\",\n      \"available_tags\": [\n        \"latest\",\n        \"12B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"microsoft/phi-4\",\n      \"docker_tag\": \"ai/phi4\",\n      \"docker_repo\": \"ai/phi4\",\n      \"available_tags\": [\n        \"latest\",\n        \"14B-Q4_0\",\n        \"14B-F16\",\n        \"14B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-14B-Instruct\",\n      \"docker_tag\": \"ai/qwen2.5:14b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      
\"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen3-14B\",\n      \"docker_tag\": \"ai/qwen3:14b\",\n      \"docker_repo\": \"ai/qwen3\",\n      \"available_tags\": [\n        \"latest\",\n        \"4B-F16\",\n        \"4B-UD-Q8_K_XL\",\n        \"4B-UD-Q4_K_XL\",\n        \"14B-Q6_K\",\n        \"30B-A3B-F16\",\n        \"30B-A3B-Q4_K_M\",\n        \"0.6B-Q4_0\",\n        \"0.6B-F16\",\n        \"0.6B-Q4_K_M\",\n        \"8B-Q4_0\",\n        \"8B-F16\",\n        \"8B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen3.5-27B\",\n      \"docker_tag\": \"ai/qwen3.5\",\n      \"docker_repo\": \"ai/qwen3.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"397B\",\n        \"397B-UD-Q4_K_XL\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-32B-Instruct\",\n      \"docker_tag\": \"ai/qwen2.5:32b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen3.5-35B-A3B\",\n      \"docker_tag\": \"ai/qwen3.5:35b\",\n      \"docker_repo\": \"ai/qwen3.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"397B\",\n        \"397B-UD-Q4_K_XL\"\n      ]\n    },\n    {\n      \"hf_name\": \"meta-llama/Llama-3.1-70B-Instruct\",\n      \"docker_tag\": \"ai/llama3.1:70b\",\n      \"docker_repo\": \"ai/llama3.1\",\n      \"available_tags\": [\n        \"latest\",\n        \"8B-F16\",\n        \"8B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"meta-llama/Llama-3.3-70B-Instruct\",\n      \"docker_tag\": \"ai/llama3.3:70b\",\n      \"docker_repo\": \"ai/llama3.3\",\n      
\"available_tags\": [\n        \"latest\",\n        \"70B-Q4_0\",\n        \"70B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen2.5-72B-Instruct\",\n      \"docker_tag\": \"ai/qwen2.5:72b\",\n      \"docker_repo\": \"ai/qwen2.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"7B-Q4_0\",\n        \"3B-F16\",\n        \"3B-Q4_K_M\",\n        \"1.5B-F16\",\n        \"0.5B-F16\",\n        \"7B-F16\",\n        \"7B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen3-Coder-Next\",\n      \"docker_tag\": \"ai/qwen3-coder-next\",\n      \"docker_repo\": \"ai/qwen3-coder-next\",\n      \"available_tags\": [\n        \"latest\",\n        \"80B\",\n        \"80B-Q8_0\",\n        \"80B-Q4_K_M\",\n        \"80B-Q5_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen3-Coder-Next\",\n      \"docker_tag\": \"ai/qwen3-coder-next\",\n      \"docker_repo\": \"ai/qwen3-coder-next\",\n      \"available_tags\": [\n        \"latest\",\n        \"80B\",\n        \"80B-Q8_0\",\n        \"80B-Q4_K_M\",\n        \"80B-Q5_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen3.5-122B-A10B\",\n      \"docker_tag\": \"ai/qwen3.5:122b\",\n      \"docker_repo\": \"ai/qwen3.5\",\n      \"available_tags\": [\n        \"latest\",\n        \"397B\",\n        \"397B-UD-Q4_K_XL\"\n      ]\n    },\n    {\n      \"hf_name\": \"Qwen/Qwen3-235B-A22B\",\n      \"docker_tag\": \"ai/qwen3:235b\",\n      \"docker_repo\": \"ai/qwen3\",\n      \"available_tags\": [\n        \"latest\",\n        \"4B-F16\",\n        \"4B-UD-Q8_K_XL\",\n        \"4B-UD-Q4_K_XL\",\n        \"14B-Q6_K\",\n        \"30B-A3B-F16\",\n        \"30B-A3B-Q4_K_M\",\n        \"0.6B-Q4_0\",\n        \"0.6B-F16\",\n        \"0.6B-Q4_K_M\",\n        \"8B-Q4_0\",\n        \"8B-F16\",\n        \"8B-Q4_K_M\"\n      ]\n    },\n    {\n      \"hf_name\": \"meta-llama/Llama-3.1-405B-Instruct\",\n      \"docker_tag\": \"ai/llama3.1:405b\",\n      \"docker_repo\": \"ai/llama3.1\",\n      
\"available_tags\": [\n        \"latest\",\n        \"8B-F16\",\n        \"8B-Q4_K_M\"\n      ]\n    }\n  ]\n}\n"
  },
  {
    "path": "llmfit-core/data/hf_models.json",
    "content": "[\n  {\n    \"name\": \"echarlaix/tiny-random-PhiForCausalLM\",\n    \"provider\": \"echarlaix\",\n    \"parameter_count\": \"80K\",\n    \"parameters_raw\": 80074,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi\",\n    \"hf_downloads\": 24984,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-03-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/tiny-random-GPT2LMHeadModel\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"83K\",\n    \"parameters_raw\": 83161,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 37534,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/tiny-random-gpt2\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"112K\",\n    \"parameters_raw\": 111968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 28458,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-17\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/tiny-random-gpt2-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  
{\n    \"name\": \"peft-internal-testing/tiny-random-GPTJForCausalLM\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"129K\",\n    \"parameters_raw\": 129184,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gptj\",\n    \"hf_downloads\": 38953,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/Olmo-3-7B-Instruct\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"528K\",\n    \"parameters_raw\": 528384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 101787,\n    \"hf_likes\": 118,\n    \"release_date\": \"2025-11-19\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Olmo-3-7B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"allenai/Olmo-3-7B-Think\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"528K\",\n    \"parameters_raw\": 528384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 44414,\n    \"hf_likes\": 88,\n    \"release_date\": \"2025-11-18\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"unsloth/Olmo-3-7B-Think-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"allenai/Olmo-3-7B-Think-DPO\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"528K\",\n    \"parameters_raw\": 528384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 21555,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-11-18\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Olmo-3-7B-Think-DPO-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"MaxJeblick/llama2-0b-unit-test\",\n    \"provider\": \"maxjeblick\",\n    \"parameter_count\": \"771K\",\n    \"parameters_raw\": 770940,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1024,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 48409,\n    \"hf_likes\": 2,\n    \"release_date\": \"2023-10-25\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/tiny-random-OPTForCausalLM\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"812K\",\n    \"parameters_raw\": 812404,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 100,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"opt\",\n    \"hf_downloads\": 388627,\n    \"hf_likes\": 0,\n    
\"release_date\": \"2025-11-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"hmellor/tiny-random-LlamaForCausalLM\",\n    \"provider\": \"hmellor\",\n    \"parameter_count\": \"1M\",\n    \"parameters_raw\": 1062992,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1295572,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/tiny-dummy-qwen2\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"1M\",\n    \"parameters_raw\": 1217480,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 102441,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-07-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"SimpleStories/SimpleStories-1.25M\",\n    \"provider\": \"simplestories\",\n    \"parameter_count\": \"1M\",\n    \"parameters_raw\": 1245824,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 86406,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"optimum-intel-internal-testing/tiny-random-Phi3ForCausalLM\",\n    \"provider\": 
\"optimum-intel-internal-testing\",\n    \"parameter_count\": \"2M\",\n    \"parameters_raw\": 2072736,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 22058,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-10-21\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/tiny-random-Phi3ForCausalLM-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"llamafactory/tiny-random-qwen3\",\n    \"provider\": \"llamafactory\",\n    \"parameter_count\": \"2M\",\n    \"parameters_raw\": 2439264,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 47369,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-01-06\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/tiny-random-qwen3-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"tiny-random/qwen3-next-moe\",\n    \"provider\": \"tiny-random\",\n    \"parameter_count\": \"3M\",\n    \"parameters_raw\": 2839160,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 
27920,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-09-12\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 10,\n    \"active_parameters\": 984828,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"llamafactory/tiny-random-Llama-3\",\n    \"provider\": \"llamafactory\",\n    \"parameter_count\": \"4M\",\n    \"parameters_raw\": 4112464,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 950276,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-06-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Maykeye/TinyLLama-v0\",\n    \"provider\": \"maykeye\",\n    \"parameter_count\": \"5M\",\n    \"parameters_raw\": 4621392,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 32384,\n    \"hf_likes\": 43,\n    \"release_date\": \"2023-07-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/TinyLLama-v0-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"optimum-intel-internal-testing/tiny-random-gpt-oss-mxfp4\",\n    \"provider\": \"optimum-intel-internal-testing\",\n    \"parameter_count\": \"7M\",\n    \"parameters_raw\": 6865444,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 27904,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-10-21\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 4,\n    \"active_parameters\": 1158540,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"hmellor/tiny-random-Gemma2ForCausalLM\",\n    \"provider\": \"hmellor\",\n    \"parameter_count\": \"8M\",\n    \"parameters_raw\": 8438816,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 339841,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"michaelbenayoun/llama-2-tiny-4kv-heads-4layers-random\",\n    \"provider\": \"michaelbenayoun\",\n    \"parameter_count\": \"9M\",\n    \"parameters_raw\": 8537216,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 52387,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-03-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"tiiuae/falcon-mamba-tiny-dev\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"9M\",\n    \"parameters_raw\": 8765056,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": 
\"falcon_mamba\",\n    \"hf_downloads\": 21730,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-10-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"arnir0/Tiny-LLM\",\n    \"provider\": \"arnir0\",\n    \"parameter_count\": \"13M\",\n    \"parameters_raw\": 12988992,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1024,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 54600,\n    \"hf_likes\": 45,\n    \"release_date\": \"2024-11-03\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Tiny-LLM-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"EleutherAI/pythia-14m\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"14M\",\n    \"parameters_raw\": 14067712,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 33322,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-24\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/pythia-14m-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"hmellor/tiny-random-BambaForCausalLM\",\n    \"provider\": \"hmellor\",\n    \"parameter_count\": \"33M\",\n    \"parameters_raw\": 33110760,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n 
   \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bamba\",\n    \"hf_downloads\": 173798,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"erwanf/gpt2-mini\",\n    \"provider\": \"erwanf\",\n    \"parameter_count\": \"39M\",\n    \"parameters_raw\": 38604288,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 391187,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-06-23\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/gpt2-mini-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"EleutherAI/pythia-14m-deduped\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"39M\",\n    \"parameters_raw\": 39233560,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 69404,\n    \"hf_likes\": 28,\n    \"release_date\": \"2023-07-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"hyper-accel/tiny-random-llama\",\n    \"provider\": \"hyper-accel\",\n    \"parameter_count\": \"73M\",\n    \"parameters_raw\": 73271808,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n  
  \"hf_downloads\": 44649,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-02-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/SmolLM-135M-Instruct-quantized.w8a16\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"83M\",\n    \"parameters_raw\": 83356260,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 20835,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-08-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"tiiuae/Falcon-H1-Tiny-90M-Instruct\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"91M\",\n    \"parameters_raw\": 91131072,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon_h1\",\n    \"hf_downloads\": 301062,\n    \"hf_likes\": 33,\n    \"release_date\": \"2026-01-12\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Falcon-H1-Tiny-90M-Instruct-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"EleutherAI/pythia-70m-deduped\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"96M\",\n    \"parameters_raw\": 95592496,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 613928,\n 
   \"hf_likes\": 27,\n    \"release_date\": \"2023-02-13\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/pythia-70m-deduped-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"gratefulasi/lumeleto\",\n    \"provider\": \"gratefulasi\",\n    \"parameter_count\": \"124M\",\n    \"parameters_raw\": 124439808,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1024,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 47679,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"peft-internal-testing/opt-125m\",\n    \"provider\": \"peft-internal-testing\",\n    \"parameter_count\": \"125M\",\n    \"parameters_raw\": 125239296,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"opt\",\n    \"hf_downloads\": 232784,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"state-spaces/mamba-130m-hf\",\n    \"provider\": \"state-spaces\",\n    \"parameter_count\": \"129M\",\n    \"parameters_raw\": 129135360,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mamba\",\n    \"hf_downloads\": 161407,\n    \"hf_likes\": 68,\n    \"release_date\": 
\"2024-03-06\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/mamba-130m-hf-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM2-135M\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"135M\",\n    \"parameters_raw\": 134515008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 954486,\n    \"hf_likes\": 168,\n    \"release_date\": \"2024-10-31\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/SmolLM2-135M-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM2-135M-Instruct\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"135M\",\n    \"parameters_raw\": 134515008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 603656,\n    \"hf_likes\": 295,\n    \"release_date\": \"2024-10-31\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/SmolLM2-135M-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/SmolLM2-135M-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM-135M-Instruct\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"135M\",\n    \"parameters_raw\": 134515008,\n    
\"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 359214,\n    \"hf_likes\": 133,\n    \"release_date\": \"2024-07-15\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/SmolLM-135M-Instruct-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM-135M\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"135M\",\n    \"parameters_raw\": 134515008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 156129,\n    \"hf_likes\": 249,\n    \"release_date\": \"2024-07-14\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/SmolLM-135M-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nomic-ai/nomic-embed-text-v1.5\",\n    \"provider\": \"Nomic\",\n    \"parameter_count\": \"137M\",\n    \"parameters_raw\": 137000000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"F16\",\n    \"context_length\": 8192,\n    \"use_case\": \"Text embeddings for RAG\",\n    \"pipeline_tag\": \"feature-extraction\",\n    \"architecture\": \"nomic_bert\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/nomic-embed-text-v1.5-GGUF\",\n        \"provider\": \"mradermacher\"\n      
}\n    ]\n  },\n  {\n    \"name\": \"EleutherAI/gpt-neo-125m\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"150M\",\n    \"parameters_raw\": 150364416,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neo\",\n    \"hf_downloads\": 100060,\n    \"hf_likes\": 227,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"JackFram/llama-160m\",\n    \"provider\": \"jackfram\",\n    \"parameter_count\": \"162M\",\n    \"parameters_raw\": 162417792,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 46025,\n    \"hf_likes\": 36,\n    \"release_date\": \"2023-05-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/DialoGPT-small\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"176M\",\n    \"parameters_raw\": 175620096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1024,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 58248,\n    \"hf_likes\": 143,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"183M\",\n    \"parameters_raw\": 182975232,\n    
\"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 441394,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-01-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"AI-Sweden-Models/gpt-sw3-126m\",\n    \"provider\": \"ai-sweden-models\",\n    \"parameter_count\": \"186M\",\n    \"parameters_raw\": 186112512,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt2\",\n    \"hf_downloads\": 115269,\n    \"hf_likes\": 3,\n    \"release_date\": \"2022-12-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"rinna/japanese-gpt-neox-small\",\n    \"provider\": \"rinna\",\n    \"parameter_count\": \"204M\",\n    \"parameters_raw\": 203611008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 457560,\n    \"hf_likes\": 15,\n    \"release_date\": \"2022-08-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-160m-deduped\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"213M\",\n    \"parameters_raw\": 212654688,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text 
generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 82245,\n    \"hf_likes\": 3,\n    \"release_date\": \"2023-02-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/pythia-160m-deduped-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Vamsi/T5_Paraphrase_Paws\",\n    \"provider\": \"vamsi\",\n    \"parameter_count\": \"223M\",\n    \"parameters_raw\": 222903936,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 512,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5\",\n    \"hf_downloads\": 83813,\n    \"hf_likes\": 40,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"TitanML/tiny-mixtral\",\n    \"provider\": \"titanml\",\n    \"parameter_count\": \"247M\",\n    \"parameters_raw\": 246961152,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 100054,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-04-24\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 71001329,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Instruct-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"256M\",\n    \"parameters_raw\": 256113408,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": 
\"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 441834,\n    \"hf_likes\": 4,\n    \"release_date\": \"2026-01-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-1.7B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"269M\",\n    \"parameters_raw\": 268944384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 25290,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/t5gemma-s-s-prefixlm\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"313M\",\n    \"parameters_raw\": 312517632,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5gemma\",\n    \"hf_downloads\": 41131,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-06-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"329M\",\n    \"parameters_raw\": 329251584,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 449901,\n    \"hf_likes\": 2,\n    \"release_date\": \"2026-01-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-1.2B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"329M\",\n    \"parameters_raw\": 329251584,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 26421,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-07-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-ColBERT-350M\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"353M\",\n    \"parameters_raw\": 353322752,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Semantic search, sentence similarity\",\n    \"pipeline_tag\": \"sentence-similarity\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M\",\n    \"provider\": \"liquidai\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 41124,\n    \"hf_likes\": 235,\n    \"release_date\": \"2025-07-10\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"unsloth/LFM2-350M-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM2-360M\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"362M\",\n    \"parameters_raw\": 361821120,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 36444,\n    \"hf_likes\": 87,\n    \"release_date\": \"2024-10-31\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/SmolLM2-360M-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M-Extract\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Data extraction, structured output\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/LFM2-350M-Extract-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M-Math\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Math reasoning, chain-of-thought\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    
\"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/LFM2-350M-Math-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M-ENJP-MT\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"English-Japanese translation\",\n    \"pipeline_tag\": \"translation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/LFM2-350M-ENJP-MT-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-350M-PII-Extract-JP\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"PII extraction, Japanese\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/LFM2-350M-PII-Extract-JP-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-350M-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"mlx-8bit\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-350M-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"354M\",\n    \"parameters_raw\": 354483968,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"BF16\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM-360M-Instruct\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"362M\",\n    \"parameters_raw\": 361821120,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 26935,\n    \"hf_likes\": 83,\n    \"release_date\": \"2024-07-15\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/SmolLM-360M-Instruct-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"openbmb/MiniCPM4-0.5B\",\n    \"provider\": \"openbmb\",\n    \"parameter_count\": \"434M\",\n    \"parameters_raw\": 433873920,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 28889,\n    \"hf_likes\": 77,\n    \"release_date\": 
\"2025-06-05\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/MiniCPM4-0.5B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-VL-450M\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"451M\",\n    \"parameters_raw\": 450822656,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"ggml-org/LFM2-VL-450M-GGUF\",\n        \"provider\": \"ggml-org\"\n      },\n      {\n        \"repo\": \"mradermacher/LFM2-VL-450M-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-1.7B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"484M\",\n    \"parameters_raw\": 484000768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 28313,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-0.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    
\"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 6992099,\n    \"hf_likes\": 470,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-0.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-0.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1408034,\n    \"hf_likes\": 65,\n    \"release_date\": \"2024-11-06\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-0.5B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-0.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-0.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1200041,\n    \"hf_likes\": 378,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"mradermacher/Qwen2.5-0.5B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2-0.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 259334,\n    \"hf_likes\": 200,\n    \"release_date\": \"2024-06-03\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2-0.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Gensyn/Qwen2.5-0.5B-Instruct\",\n    \"provider\": \"gensyn\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 106514,\n    \"hf_likes\": 33,\n    \"release_date\": \"2025-03-28\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-0.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-0.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"494M\",\n    \"parameters_raw\": 494032768,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      
\"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 64868,\n    \"hf_likes\": 44,\n    \"release_date\": \"2024-11-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-0.5B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"EleutherAI/pythia-410m\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"506M\",\n    \"parameters_raw\": 505997504,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 88847,\n    \"hf_likes\": 36,\n    \"release_date\": \"2023-02-13\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/pythia-410m-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"EleutherAI/pythia-410m-deduped\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"506M\",\n    \"parameters_raw\": 505997504,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 32196,\n    \"hf_likes\": 20,\n    \"release_date\": \"2023-02-13\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/pythia-410m-deduped-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"h2oai/h2o-danube3-500m-chat\",\n    \"provider\": \"h2oai\",\n    \"parameter_count\": \"514M\",\n    
\"parameters_raw\": 513590784,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 31122,\n    \"hf_likes\": 39,\n    \"release_date\": \"2024-07-04\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/h2o-danube3-500m-chat-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"tiiuae/Falcon-H1-0.5B-Base\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"521M\",\n    \"parameters_raw\": 521411104,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon_h1\",\n    \"hf_downloads\": 25562,\n    \"hf_likes\": 16,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Falcon-H1-0.5B-Base-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/Qwen3-30B-A3B-Instruct-2507-speculator.eagle3\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"522M\",\n    \"parameters_raw\": 522152832,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 115085,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-12-12\",\n    
\"_discovered\": true\n  },\n  {\n    \"name\": \"z-lab/Qwen3-4B-DFlash-b16\",\n    \"provider\": \"z-lab\",\n    \"parameter_count\": \"537M\",\n    \"parameters_raw\": 537427200,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 25679,\n    \"hf_likes\": 22,\n    \"release_date\": \"2026-01-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigscience/bloomz-560m\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"559M\",\n    \"parameters_raw\": 559214592,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 1303926,\n    \"hf_likes\": 137,\n    \"release_date\": \"2022-10-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/bloomz-560m-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"bigscience/bloom-560m\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"559M\",\n    \"parameters_raw\": 559214592,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 134778,\n    \"hf_likes\": 371,\n    \"release_date\": \"2022-05-19\",\n    \"_discovered\": true,\n    
\"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/bloom-560m-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-MLX-4bit\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"566M\",\n    \"parameters_raw\": 565828096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 74343,\n    \"hf_likes\": 26,\n    \"release_date\": \"2025-05-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/t5gemma-b-b-ul2\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"591M\",\n    \"parameters_raw\": 591490560,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5gemma\",\n    \"hf_downloads\": 39788,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-06-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/t5gemma-b-b-prefixlm\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"591M\",\n    \"parameters_raw\": 591490560,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5gemma\",\n    \"hf_downloads\": 1187971,\n    \"hf_likes\": 13,\n    \"release_date\": \"2025-06-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"lmstudio-community/Phi-4-mini-reasoning-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"600M\",\n    \"parameters_raw\": 599546880,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 43404,\n    \"hf_likes\": 3,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-0.5B-Chat\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"620M\",\n    \"parameters_raw\": 619570176,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 87380,\n    \"hf_likes\": 92,\n    \"release_date\": \"2024-01-31\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Qwen1.5-0.5B-Chat-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-0.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"620M\",\n    \"parameters_raw\": 619570176,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 26651,\n    \"hf_likes\": 173,\n    \"release_date\": \"2024-01-22\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"mradermacher/Qwen1.5-0.5B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"629M\",\n    \"parameters_raw\": 628676096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 95794,\n    \"hf_likes\": 10,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Instruct-2507-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"629M\",\n    \"parameters_raw\": 628676096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 66279,\n    \"hf_likes\": 3,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"629M\",\n    \"parameters_raw\": 628676096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 21982,\n    \"hf_likes\": 1,\n    
\"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-700M\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"742M\",\n    \"parameters_raw\": 742489344,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/LFM2-700M-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-700M-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"742M\",\n    \"parameters_raw\": 742489344,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"mlx-8bit\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-700M-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"742M\",\n    \"parameters_raw\": 742489344,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.5,\n    \"quantization\": \"BF16\",\n    \"context_length\": 128000,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3-0.6B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"752M\",\n    \"parameters_raw\": 751632384,\n    
\"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 11310453,\n    \"hf_likes\": 1120,\n    \"release_date\": \"2025-04-27\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-0.6B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3Guard-Gen-0.6B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"752M\",\n    \"parameters_raw\": 751632384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 146728,\n    \"hf_likes\": 62,\n    \"release_date\": \"2025-09-23\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Qwen3Guard-Gen-0.6B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-0.6B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"752M\",\n    \"parameters_raw\": 751659264,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 1648717,\n    \"hf_likes\": 57,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"lmstudio-community/Qwen3-4B-Instruct-2507-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"754M\",\n    \"parameters_raw\": 754372096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 62740,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"h2oai/h2ovl-mississippi-800m\",\n    \"provider\": \"h2oai\",\n    \"parameter_count\": \"826M\",\n    \"parameters_raw\": 826295808,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"h2ovl_chat\",\n    \"hf_downloads\": 1014882,\n    \"hf_likes\": 39,\n    \"release_date\": \"2024-10-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-0.8B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"873M\",\n    \"parameters_raw\": 873438784,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 93448,\n    \"hf_likes\": 208,\n    \"release_date\": \"2026-02-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-0.8B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": 
\"Qwen/Qwen3.5-0.8B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"873M\",\n    \"parameters_raw\": 873438784,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 4680,\n    \"hf_likes\": 37,\n    \"release_date\": \"2026-02-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Thinking-2507-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"880M\",\n    \"parameters_raw\": 880068096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 91703,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Instruct-2507-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"880M\",\n    \"parameters_raw\": 880068096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 62883,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Joaoffg/ELM\",\n    \"provider\": \"joaoffg\",\n    \"parameter_count\": \"903M\",\n    
\"parameters_raw\": 902891520,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 339775,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-05-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Qwen3-8B-speculator.eagle3\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.0B\",\n    \"parameters_raw\": 1022037632,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 76636,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-09-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-1b\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1078891008,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 27818,\n    \"hf_likes\": 43,\n    \"release_date\": \"2023-03-10\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/pythia-1b-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"TinyLlama/TinyLlama-1.1B-Chat-v1.0\",\n    \"provider\": \"Community\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 
1100048384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1870099,\n    \"hf_likes\": 1538,\n    \"release_date\": \"2023-12-30\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF\",\n        \"provider\": \"TheBloke\"\n      },\n      {\n        \"repo\": \"mradermacher/TinyLlama-1.1B-Chat-v1.0-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change\",\n    \"provider\": \"nm-testing\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1100048692,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 31348,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-06-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigcode/gpt_bigcode-santacoder\",\n    \"provider\": \"BigCode\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1124886528,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_bigcode\",\n    \"hf_downloads\": 49973,\n    \"hf_likes\": 26,\n    \"release_date\": \"2023-04-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1131460096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 93477,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-4B-Instruct-2507-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.1B\",\n    \"parameters_raw\": 1131460096,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 63832,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-1.2B-Instruct\",\n    \"provider\": \"liquidai\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 116655,\n    \"hf_likes\": 516,\n    \"release_date\": \"2026-01-06\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/LFM2.5-1.2B-Instruct-GGUF\",\n        
\"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-1.2B-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 26071,\n    \"hf_likes\": 6,\n    \"release_date\": \"2025-07-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-1.2B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/LFM2-1.2B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-1.2B-Base\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/LFM2.5-1.2B-Base-GGUF\",\n        \"provider\": \"mradermacher\"\n   
   }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-1.2B-Thinking\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/LFM2.5-1.2B-Thinking-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"mradermacher/LFM2.5-1.2B-Thinking-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-1.2B-JP\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Japanese language, multilingual chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-1.2B-Tool\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Tool calling, function calling\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-1.2B-RAG\",\n    \"provider\": \"Liquid AI\",\n    
\"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Retrieval-augmented generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-1.2B-Extract\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Data extraction, structured output\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Thinking-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.2,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"mlx-8bit\",\n    \"context_length\": 128000,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-1.2B-Thinking-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1170340608,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.4,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"BF16\",\n    \"context_length\": 128000,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"allenai/OLMo-1B-hf\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1176764416,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo\",\n    \"hf_downloads\": 23538,\n    \"hf_likes\": 26,\n    \"release_date\": \"2024-04-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Zyphra/Zamba2-1.2B-instruct\",\n    \"provider\": \"zyphra\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1215064704,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"zamba2\",\n    \"hf_downloads\": 72584,\n    \"hf_likes\": 30,\n    \"release_date\": \"2024-09-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.2-1B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"1.2B\",\n    \"parameters_raw\": 1235814400,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1453836,\n    \"hf_likes\": 2306,\n    \"release_date\": \"2024-09-18\"\n  },\n  {\n    \"name\": \"hmellor/Ilama-3.2-1B\",\n    \"provider\": \"hmellor\",\n    \"parameter_count\": 
\"1.2B\",\n    \"parameters_raw\": 1235814400,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ilama\",\n    \"hf_downloads\": 89998,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"warshanks/Jan-nano-AWQ\",\n    \"provider\": \"warshanks\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1264206840,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 99084,\n    \"hf_likes\": 3,\n    \"release_date\": \"2025-07-12\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"LGAI-EXAONE/EXAONE-4.0-1.2B\",\n    \"provider\": \"lgai-exaone\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1279391488,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone4\",\n    \"hf_downloads\": 100975,\n    \"hf_likes\": 172,\n    \"release_date\": \"2025-07-11\"\n  },\n  {\n    \"name\": \"lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1280062464,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": 
\"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 348365,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-05-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-8B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1280062464,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 39201,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"pfnet/plamo-2-1b\",\n    \"provider\": \"pfnet\",\n    \"parameter_count\": \"1.3B\",\n    \"parameters_raw\": 1291441920,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 10485760,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"plamo2\",\n    \"hf_downloads\": 63725,\n    \"hf_likes\": 38,\n    \"release_date\": \"2025-02-05\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/gpt-neo-1.3B\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"1.4B\",\n    \"parameters_raw\": 1365907456,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neo\",\n    \"hf_downloads\": 48440,\n    \"hf_likes\": 324,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/phi-1_5\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"1.4B\",\n    \"parameters_raw\": 1418270720,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi\",\n    \"hf_downloads\": 152337,\n    \"hf_likes\": 1355,\n    \"release_date\": \"2023-09-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"starvector/starvector-1b-im2svg\",\n    \"provider\": \"starvector\",\n    \"parameter_count\": \"1.4B\",\n    \"parameters_raw\": 1434095620,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starvector\",\n    \"hf_downloads\": 38196,\n    \"hf_likes\": 184,\n    \"release_date\": \"2025-01-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/OLMo-2-0425-1B\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1484916736,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo2\",\n    \"hf_downloads\": 533223,\n    \"hf_likes\": 70,\n    \"release_date\": \"2025-04-17\",\n    \"_discovered\": true\n  },\n  
{\n    \"name\": \"allenai/OLMo-2-0425-1B-Instruct\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1484916736,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo2\",\n    \"hf_downloads\": 38389,\n    \"hf_likes\": 56,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/OLMo-2-0425-1B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/Llama-3.2-1B-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1498482912,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 814349,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-09-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1498859520,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1823969,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-09-25\",\n    \"_discovered\": 
true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-Audio-1.5B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1500000000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Speech-to-speech, ASR, TTS\",\n    \"pipeline_tag\": \"audio-to-audio\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-Audio-1.5B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1500000000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Speech-to-speech, ASR, TTS\",\n    \"pipeline_tag\": \"audio-to-audio\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"EleutherAI/pythia-1.4b\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1515311488,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 27804,\n    \"hf_likes\": 26,\n    \"release_date\": \"2023-02-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-1.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    
\"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1789513,\n    \"hf_likes\": 107,\n    \"release_date\": \"2024-09-18\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-1.5B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-1.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-1.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 7037921,\n    \"hf_likes\": 627,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-1.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2-1.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 3508972,\n    \"hf_likes\": 161,\n    \"release_date\": \"2024-06-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Math-1.5B\",\n    \"provider\": 
\"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1064952,\n    \"hf_likes\": 102,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-1.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 431369,\n    \"hf_likes\": 166,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2-1.5B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 114016,\n    \"hf_likes\": 99,\n    \"release_date\": \"2024-05-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Math-1.5B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    
\"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 80310,\n    \"hf_likes\": 54,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Math-1.5B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/Qwen2-1.5B-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.5B\",\n    \"parameters_raw\": 1543714304,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 24030,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-06-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"KiteFishAI/Minnow-Math-1.5B\",\n    \"provider\": \"kitefishai\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1633781760,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 147620,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-VL-1.6B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1584804000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    
\"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2.5-VL-1.6B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1596625904,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-VL-1.6B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1596625904,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"mlx-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2.5-VL-1.6B-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1596625904,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.2,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"mlx-6bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": 
\"lmstudio-community/LFM2.5-VL-1.6B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1596625904,\n    \"min_ram_gb\": 1.8,\n    \"recommended_ram_gb\": 3.0,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"mlx-8bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"stabilityai/stablelm-2-1_6b-chat\",\n    \"provider\": \"Stability AI\",\n    \"parameter_count\": \"1.6B\",\n    \"parameters_raw\": 1644515328,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"stablelm\",\n    \"hf_downloads\": 955,\n    \"hf_likes\": 34,\n    \"release_date\": \"2024-04-08\"\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM-1.7B\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1711376384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 63387,\n    \"hf_likes\": 180,\n    \"release_date\": \"2024-07-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM2-1.7B\",\n    \"provider\": \"huggingfacetb\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1711376384,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": 
\"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 25638,\n    \"hf_likes\": 144,\n    \"release_date\": \"2024-10-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/Nanbeige4.1-3B-AWQ-8bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1717865408,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"AWQ-8bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 49220,\n    \"hf_likes\": 2,\n    \"release_date\": \"2026-02-15\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3-1.7B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1720574976,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 295900,\n    \"hf_likes\": 64,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-1.7B-MLX-bf16\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1720574976,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n    
  \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 24714,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigscience/bloom-1b7\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"1.7B\",\n    \"parameters_raw\": 1722408960,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 38813,\n    \"hf_likes\": 122,\n    \"release_date\": \"2022-05-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-1.5B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1777088000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 727989,\n    \"hf_likes\": 6,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1777088000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    
\"hf_downloads\": 164152,\n    \"hf_likes\": 4,\n    \"release_date\": \"2024-09-20\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2-1.5B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1777088000,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 24850,\n    \"hf_likes\": 9,\n    \"release_date\": \"2024-06-06\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1777675776,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 24724,\n    \"hf_likes\": 5,\n    \"release_date\": \"2024-06-06\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"RedHatAI/Qwen2.5-1.5B-quantized.w8a8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1777733120,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1091974,\n    \"hf_likes\": 2,\n    \"release_date\": 
\"2024-10-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-1.8B-Chat\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"1.8B\",\n    \"parameters_raw\": 1836828672,\n    \"min_ram_gb\": 1.0,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 0.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 72445,\n    \"hf_likes\": 73,\n    \"release_date\": \"2024-01-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"jonathanli/induction-vl2-mdl-fswd7-20000-720p-proj-256-var\",\n    \"provider\": \"jonathanli\",\n    \"parameter_count\": \"1.9B\",\n    \"parameters_raw\": 1940015872,\n    \"min_ram_gb\": 1.1,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 1.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"induction_vl2\",\n    \"hf_downloads\": 24886,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/granite-4.0-h-tiny-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"2.0B\",\n    \"parameters_raw\": 1997098800,\n    \"min_ram_gb\": 1.1,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 1.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"granitemoehybrid\",\n    \"hf_downloads\": 63040,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-10-13\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 277721550,\n    \"_discovered\": 
true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3-1.7B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"2.0B\",\n    \"parameters_raw\": 2031825920,\n    \"min_ram_gb\": 1.1,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 1.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 47050,\n    \"hf_likes\": 35,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"h2oai/h2ovl-mississippi-2b\",\n    \"provider\": \"h2oai\",\n    \"parameter_count\": \"2.2B\",\n    \"parameters_raw\": 2152317440,\n    \"min_ram_gb\": 1.2,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 1.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"h2ovl_chat\",\n    \"hf_downloads\": 1007240,\n    \"hf_likes\": 42,\n    \"release_date\": \"2024-10-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"warshanks/Qwen3-8B-abliterated-AWQ\",\n    \"provider\": \"warshanks\",\n    \"parameter_count\": \"2.2B\",\n    \"parameters_raw\": 2174236152,\n    \"min_ram_gb\": 1.2,\n    \"recommended_ram_gb\": 2.0,\n    \"min_vram_gb\": 1.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 25559,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-27\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-2B\",\n    \"provider\": \"Alibaba\",\n    
\"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2274069824,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 46974,\n    \"hf_likes\": 115,\n    \"release_date\": \"2026-02-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-2B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-2B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2274069824,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 3336,\n    \"hf_likes\": 33,\n    \"release_date\": \"2026-02-28\"\n  },\n  {\n    \"name\": \"lmstudio-community/Phi-4-reasoning-plus-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2290897920,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 28622,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-8bit\",\n    \"provider\": 
\"lmstudio-community\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2303865856,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 333300,\n    \"hf_likes\": 13,\n    \"release_date\": \"2025-05-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-8B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2303865856,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 37222,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-14B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2307906560,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 46163,\n    \"hf_likes\": 5,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen2.5-Coder-14B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    
\"parameter_count\": \"2.3B\",\n    \"parameters_raw\": 2308527104,\n    \"min_ram_gb\": 1.3,\n    \"recommended_ram_gb\": 2.1,\n    \"min_vram_gb\": 1.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 92774,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-11-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-1.1-2b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"2.5B\",\n    \"parameters_raw\": 2506172416,\n    \"min_ram_gb\": 1.4,\n    \"recommended_ram_gb\": 2.3,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma\",\n    \"hf_downloads\": 66616,\n    \"hf_likes\": 171,\n    \"release_date\": \"2024-03-26\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/gemma-1.1-2b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-2.6B\",\n    \"provider\": \"liquidai\",\n    \"parameter_count\": \"2.6B\",\n    \"parameters_raw\": 2569272320,\n    \"min_ram_gb\": 1.4,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 25773,\n    \"hf_likes\": 180,\n    \"release_date\": \"2025-09-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-2.6B-Exp\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"2.6B\",\n    
\"parameters_raw\": 2569272320,\n    \"min_ram_gb\": 1.4,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, math, knowledge\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-2.6B-Transcript\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"2.6B\",\n    \"parameters_raw\": 2569272320,\n    \"min_ram_gb\": 1.4,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Meeting transcription, summarization\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"google/gemma-2-2b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"2.6B\",\n    \"parameters_raw\": 2614341376,\n    \"min_ram_gb\": 1.5,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/gemma-2-2b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Efficient-Large-Model/gemma-2-2b-it\",\n    \"provider\": \"efficient-large-model\",\n    \"parameter_count\": \"2.6B\",\n    \"parameters_raw\": 2614341888,\n    \"min_ram_gb\": 1.5,\n    \"recommended_ram_gb\": 2.4,\n    \"min_vram_gb\": 1.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text 
generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 50419,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-12-12\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/gemma-2-2b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"EleutherAI/gpt-neo-2.7B\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"2.7B\",\n    \"parameters_raw\": 2718416384,\n    \"min_ram_gb\": 1.5,\n    \"recommended_ram_gb\": 2.5,\n    \"min_vram_gb\": 1.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neo\",\n    \"hf_downloads\": 23217,\n    \"hf_likes\": 501,\n    \"release_date\": \"2022-03-02\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/phi-2\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"2.8B\",\n    \"parameters_raw\": 2779683840,\n    \"min_ram_gb\": 1.6,\n    \"recommended_ram_gb\": 2.6,\n    \"min_vram_gb\": 1.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi\",\n    \"hf_downloads\": 1651432,\n    \"hf_likes\": 3429,\n    \"release_date\": \"2023-12-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"stabilityai/stablelm-3b-4e1t\",\n    \"provider\": \"Stability AI\",\n    \"parameter_count\": \"2.8B\",\n    \"parameters_raw\": 2795443200,\n    \"min_ram_gb\": 1.6,\n    \"recommended_ram_gb\": 2.6,\n    \"min_vram_gb\": 1.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"stablelm\",\n    \"hf_downloads\": 24407,\n    \"hf_likes\": 312,\n    \"release_date\": \"2023-09-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"HuggingFaceTB/SmolLM3-3B\",\n    \"provider\": \"HuggingFace\",\n    \"parameter_count\": \"3B\",\n    \"parameters_raw\": 3000000000,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, multilingual reasoning\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"smollm\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-08\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/SmolLM3-3B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-VL-3B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"3.0B\",\n    \"parameters_raw\": 2998975216,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"lfm2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\"\n  },\n  {\n    \"name\": \"bigscience/bloom-3b\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"3.0B\",\n    \"parameters_raw\": 3002557440,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 30567,\n    \"hf_likes\": 94,\n    \"release_date\": \"2022-05-19\",\n    \"_discovered\": true\n  },\n  {\n    
\"name\": \"bigcode/starcoder2-3b\",\n    \"provider\": \"BigCode\",\n    \"parameter_count\": \"3.0B\",\n    \"parameters_raw\": 3030371328,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder2\",\n    \"hf_downloads\": 97310,\n    \"hf_likes\": 216,\n    \"release_date\": \"2023-11-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"TechxGenus/gemma-1.1-2b-it-GPTQ\",\n    \"provider\": \"techxgenus\",\n    \"parameter_count\": \"3.0B\",\n    \"parameters_raw\": 3031170048,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.8,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma\",\n    \"hf_downloads\": 20793,\n    \"hf_likes\": 1,\n    \"release_date\": \"2024-04-07\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-3B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 6598470,\n    \"hf_likes\": 409,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-3B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    
\"name\": \"Qwen/Qwen2.5-3B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 297679,\n    \"hf_likes\": 172,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-3B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-3B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 126989,\n    \"hf_likes\": 96,\n    \"release_date\": \"2024-11-06\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-3B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-3B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Salesforce/xLAM-2-3b-fc-r\",\n    \"provider\": \"salesforce\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text 
generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 44516,\n    \"hf_likes\": 16,\n    \"release_date\": \"2025-03-27\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-3B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.1B\",\n    \"parameters_raw\": 3085938688,\n    \"min_ram_gb\": 1.7,\n    \"recommended_ram_gb\": 2.9,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 42540,\n    \"hf_likes\": 40,\n    \"release_date\": \"2024-11-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-3B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.2-3B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"3.2B\",\n    \"parameters_raw\": 3212749824,\n    \"min_ram_gb\": 1.8,\n    \"recommended_ram_gb\": 3.0,\n    \"min_vram_gb\": 1.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1409393,\n    \"hf_likes\": 702,\n    \"release_date\": \"2024-09-18\"\n  },\n  {\n    \"name\": \"ibm-research/PowerMoE-3b\",\n    \"provider\": \"ibm-research\",\n    \"parameter_count\": \"3.4B\",\n    \"parameters_raw\": 3374286336,\n    \"min_ram_gb\": 1.9,\n    \"recommended_ram_gb\": 3.1,\n    \"min_vram_gb\": 1.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"granitemoe\",\n    \"hf_downloads\": 399266,\n    \"hf_likes\": 17,\n    \"release_date\": \"2024-08-14\",\n    \"is_moe\": true,\n    \"num_experts\": 40,\n    \"active_experts\": 8,\n    \"active_parameters\": 809828716,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-3B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.4B\",\n    \"parameters_raw\": 3397103616,\n    \"min_ram_gb\": 1.9,\n    \"recommended_ram_gb\": 3.2,\n    \"min_vram_gb\": 1.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 38262,\n    \"hf_likes\": 16,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-3B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.4B\",\n    \"parameters_raw\": 3397103616,\n    \"min_ram_gb\": 1.9,\n    \"recommended_ram_gb\": 3.2,\n    \"min_vram_gb\": 1.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 21964,\n    \"hf_likes\": 5,\n    \"release_date\": \"2024-11-09\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"ibm-granite/granite-3b-code-base-2k\",\n    \"provider\": \"ibm-granite\",\n    \"parameter_count\": \"3.5B\",\n    \"parameters_raw\": 3482503680,\n    \"min_ram_gb\": 1.9,\n    \"recommended_ram_gb\": 3.2,\n    \"min_vram_gb\": 1.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Code generation and completion\",\n    
\"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 73193,\n    \"hf_likes\": 37,\n    \"release_date\": \"2024-04-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"ibm-research/PowerLM-3b\",\n    \"provider\": \"ibm-research\",\n    \"parameter_count\": \"3.5B\",\n    \"parameters_raw\": 3512017152,\n    \"min_ram_gb\": 2.0,\n    \"recommended_ram_gb\": 3.3,\n    \"min_vram_gb\": 1.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"granite\",\n    \"hf_downloads\": 30013,\n    \"hf_likes\": 20,\n    \"release_date\": \"2024-08-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-VL-3B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3754622976,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.5,\n    \"min_vram_gb\": 1.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen2_5_vl\",\n    \"hf_downloads\": 2621650,\n    \"hf_likes\": 623,\n    \"release_date\": \"2025-01-26\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-VL-3B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"microsoft/Phi-tiny-MoE-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3755220288,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.5,\n    \"min_vram_gb\": 1.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phimoe\",\n    \"hf_downloads\": 310211,\n    \"hf_likes\": 31,\n    \"release_date\": \"2025-06-23\",\n    \"is_moe\": true,\n    \"num_experts\": 16,\n    \"active_experts\": 2,\n    \"active_parameters\": 633693422,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"llm-jp/llm-jp-3-3.7b-instruct\",\n    \"provider\": \"llm-jp\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3782913024,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.5,\n    \"min_vram_gb\": 1.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 810462,\n    \"hf_likes\": 13,\n    \"release_date\": \"2024-09-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/Phi-4-mini-reasoning\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3800000000,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.5,\n    \"min_vram_gb\": 1.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Lightweight reasoning\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi4\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-01\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Phi-4-mini-reasoning-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"microsoft/phi-3-mini-4k-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3821000000,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/phi-3-mini-4k-instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"microsoft/Phi-3.5-mini-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3821000000,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, long context\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Phi-3.5-mini-instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zstanjj/HTML-Pruner-Phi-3.8B\",\n    \"provider\": \"zstanjj\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3821079552,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 88805,\n    \"hf_likes\": 18,\n    \"release_date\": \"2024-10-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Sreenington/Phi-3-mini-4k-instruct-AWQ\",\n    \"provider\": \"sreenington\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3821079552,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 40949,\n    \"hf_likes\": 5,\n    \"release_date\": \"2024-05-05\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"numind/NuExtract-1.5\",\n    \"provider\": \"numind\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3821079552,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 31247,\n    \"hf_likes\": 243,\n    \"release_date\": \"2024-09-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"kaitchup/Phi-3-mini-4k-instruct-gptq-4bit\",\n    \"provider\": \"kaitchup\",\n    \"parameter_count\": \"3.8B\",\n    \"parameters_raw\": 3822095360,\n    \"min_ram_gb\": 2.1,\n    \"recommended_ram_gb\": 3.6,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 881144,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-04-25\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Nanbeige/Nanbeige4.1-3B\",\n    \"provider\": \"nanbeige\",\n    \"parameter_count\": \"3.9B\",\n    \"parameters_raw\": 3933637120,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 417673,\n    \"hf_likes\": 941,\n    \"release_date\": 
\"2026-02-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-3n-E2B-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"4B\",\n    \"parameters_raw\": 4000000000,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Multimodal, on-device (effective 2B)\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"gemma3n\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-06-25\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gemma-3n-E2B-it-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.0B\",\n    \"parameters_raw\": 4022468096,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 548989,\n    \"hf_likes\": 81,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.0B\",\n    \"parameters_raw\": 4022468096,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 344398,\n    \"hf_likes\": 25,\n    \"release_date\": \"2025-05-05\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    
\"name\": \"typhoon-ai/typhoon2.5-qwen3-4b\",\n    \"provider\": \"typhoon-ai\",\n    \"parameter_count\": \"4.0B\",\n    \"parameters_raw\": 4022468096,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 51135,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-09-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"JunHowie/Qwen3-4B-Instruct-2507-GPTQ-Int4\",\n    \"provider\": \"junhowie\",\n    \"parameter_count\": \"4.0B\",\n    \"parameters_raw\": 4022468096,\n    \"min_ram_gb\": 2.2,\n    \"recommended_ram_gb\": 3.7,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 36817,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-09-01\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"TIGER-Lab/VLM2Vec-Full\",\n    \"provider\": \"tiger-lab\",\n    \"parameter_count\": \"4.1B\",\n    \"parameters_raw\": 4146621440,\n    \"min_ram_gb\": 2.3,\n    \"recommended_ram_gb\": 3.9,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3_v\",\n    \"hf_downloads\": 64160,\n    \"hf_likes\": 28,\n    \"release_date\": \"2024-10-08\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-14B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    
\"parameter_count\": \"4.2B\",\n    \"parameters_raw\": 4153891840,\n    \"min_ram_gb\": 2.3,\n    \"recommended_ram_gb\": 3.9,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 42084,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen2.5-Coder-14B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"4.2B\",\n    \"parameters_raw\": 4154676224,\n    \"min_ram_gb\": 2.3,\n    \"recommended_ram_gb\": 3.9,\n    \"min_vram_gb\": 2.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 82050,\n    \"hf_likes\": 1,\n    \"release_date\": \"2024-11-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-SafeRL\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.4B\",\n    \"parameters_raw\": 4411424256,\n    \"min_ram_gb\": 2.5,\n    \"recommended_ram_gb\": 4.1,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 53732,\n    \"hf_likes\": 41,\n    \"release_date\": \"2025-09-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-Instruct-2507-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.4B\",\n    \"parameters_raw\": 4411646016,\n    \"min_ram_gb\": 2.5,\n 
   \"recommended_ram_gb\": 4.1,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 507765,\n    \"hf_likes\": 69,\n    \"release_date\": \"2025-08-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-4B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.4B\",\n    \"parameters_raw\": 4411646016,\n    \"min_ram_gb\": 2.5,\n    \"recommended_ram_gb\": 4.1,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 250469,\n    \"hf_likes\": 38,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/Nemotron-H-4B-Base-8K\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"4.5B\",\n    \"parameters_raw\": 4489223040,\n    \"min_ram_gb\": 2.5,\n    \"recommended_ram_gb\": 4.2,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 40602,\n    \"hf_likes\": 5,\n    \"release_date\": \"2025-03-20\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/Nemotron-H-4B-Instruct-128K\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"4.5B\",\n    \"parameters_raw\": 4489223040,\n    \"min_ram_gb\": 2.5,\n    \"recommended_ram_gb\": 4.2,\n    \"min_vram_gb\": 2.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": 
\"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 38647,\n    \"hf_likes\": 8,\n    \"release_date\": \"2025-04-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ\",\n    \"provider\": \"stelterlab\",\n    \"parameter_count\": \"4.6B\",\n    \"parameters_raw\": 4605856128,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.3,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 63349,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-07-31\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 503765510,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-4B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.7B\",\n    \"parameters_raw\": 4659865088,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.3,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 99087,\n    \"hf_likes\": 202,\n    \"release_date\": \"2026-02-27\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-4B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-4B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"4.7B\",\n    \"parameters_raw\": 4659865088,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.3,\n    
\"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 3593,\n    \"hf_likes\": 38,\n    \"release_date\": \"2026-02-27\"\n  },\n  {\n    \"name\": \"nvidia/Qwen3-8B-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"4.7B\",\n    \"parameters_raw\": 4717851648,\n    \"min_ram_gb\": 2.6,\n    \"recommended_ram_gb\": 4.4,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 32743,\n    \"hf_likes\": 14,\n    \"release_date\": \"2025-09-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"speakleash/Bielik-4.5B-v3.0-Instruct\",\n    \"provider\": \"speakleash\",\n    \"parameter_count\": \"4.8B\",\n    \"parameters_raw\": 4757260288,\n    \"min_ram_gb\": 2.7,\n    \"recommended_ram_gb\": 4.4,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 43008,\n    \"hf_likes\": 27,\n    \"release_date\": \"2025-04-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"XLabs-AI/xflux_text_encoders\",\n    \"provider\": \"xlabs-ai\",\n    \"parameter_count\": \"4.8B\",\n    \"parameters_raw\": 4762310656,\n    \"min_ram_gb\": 2.7,\n    \"recommended_ram_gb\": 4.4,\n    \"min_vram_gb\": 2.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n 
   \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"t5\",\n    \"hf_downloads\": 162123,\n    \"hf_likes\": 21,\n    \"release_date\": \"2024-08-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ\",\n    \"provider\": \"stelterlab\",\n    \"parameter_count\": \"5.1B\",\n    \"parameters_raw\": 5053827112,\n    \"min_ram_gb\": 2.8,\n    \"recommended_ram_gb\": 4.7,\n    \"min_vram_gb\": 2.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 38947,\n    \"hf_likes\": 4,\n    \"release_date\": \"2026-01-31\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-32B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"5.1B\",\n    \"parameters_raw\": 5119652864,\n    \"min_ram_gb\": 2.9,\n    \"recommended_ram_gb\": 4.8,\n    \"min_vram_gb\": 2.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 26287,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"5.1B\",\n    \"parameters_raw\": 5120300032,\n    \"min_ram_gb\": 2.9,\n    \"recommended_ram_gb\": 4.8,\n    \"min_vram_gb\": 2.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    
\"architecture\": \"qwen2\",\n    \"hf_downloads\": 44413,\n    \"hf_likes\": 6,\n    \"release_date\": \"2024-11-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/QwQ-32B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"5.1B\",\n    \"parameters_raw\": 5120300032,\n    \"min_ram_gb\": 2.9,\n    \"recommended_ram_gb\": 4.8,\n    \"min_vram_gb\": 2.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 32595,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-03-05\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"5.3B\",\n    \"parameters_raw\": 5306567040,\n    \"min_ram_gb\": 3.0,\n    \"recommended_ram_gb\": 4.9,\n    \"min_vram_gb\": 2.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 135548,\n    \"hf_likes\": 40,\n    \"release_date\": \"2025-08-01\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 580405768,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"cyankiwi/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"5.3B\",\n    \"parameters_raw\": 5306567040,\n    \"min_ram_gb\": 3.0,\n    \"recommended_ram_gb\": 4.9,\n    \"min_vram_gb\": 2.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    
],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 85989,\n    \"hf_likes\": 30,\n    \"release_date\": \"2025-07-29\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 580405768,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"cyankiwi/MiroThinker-v1.5-30B-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"5.3B\",\n    \"parameters_raw\": 5306567040,\n    \"min_ram_gb\": 3.0,\n    \"recommended_ram_gb\": 4.9,\n    \"min_vram_gb\": 2.7,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 20465,\n    \"hf_likes\": 3,\n    \"release_date\": \"2026-01-06\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 580405768,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"01-ai/Yi-6B-Chat\",\n    \"provider\": \"01.ai\",\n    \"parameter_count\": \"6.1B\",\n    \"parameters_raw\": 6061035520,\n    \"min_ram_gb\": 3.4,\n    \"recommended_ram_gb\": 5.6,\n    \"min_vram_gb\": 3.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 15481,\n    \"hf_likes\": 70,\n    \"release_date\": \"2023-11-22\"\n  },\n  {\n    \"name\": \"arcee-ai/Trinity-Nano-Preview\",\n    \"provider\": \"arcee-ai\",\n    \"parameter_count\": \"6.1B\",\n    \"parameters_raw\": 6120003328,\n    \"min_ram_gb\": 3.4,\n    \"recommended_ram_gb\": 5.7,\n    \"min_vram_gb\": 3.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": 
\"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"afmoe\",\n    \"hf_downloads\": 22294,\n    \"hf_likes\": 67,\n    \"release_date\": \"2025-12-01\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 669375358,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/GLM-4.7-Flash-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"6.4B\",\n    \"parameters_raw\": 6407095318,\n    \"min_ram_gb\": 3.6,\n    \"recommended_ram_gb\": 6.0,\n    \"min_vram_gb\": 3.3,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe_lite\",\n    \"hf_downloads\": 217691,\n    \"hf_likes\": 46,\n    \"release_date\": \"2026-01-19\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmsys/vicuna-7b-v1.5\",\n    \"provider\": \"LMSYS\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 6738415616,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"tartuNLP/Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC\",\n    \"provider\": \"tartunlp\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738415616,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 36045,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-02-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-2-7b-hf\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738417664,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 617643,\n    \"hf_likes\": 2272,\n    \"release_date\": \"2023-07-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"huggyllama/llama-7b\",\n    \"provider\": \"huggyllama\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738417664,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 103505,\n    \"hf_likes\": 354,\n    \"release_date\": \"2023-04-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"NousResearch/Llama-2-7b-hf\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738417664,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 81336,\n    \"hf_likes\": 171,\n    \"release_date\": \"2023-07-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"NousResearch/Llama-2-7b-chat-hf\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738417664,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 20573,\n    \"hf_likes\": 194,\n    \"release_date\": \"2023-07-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/CodeLlama-7b-Instruct-hf\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738546688,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 5404,\n    \"hf_likes\": 59,\n    \"release_date\": \"2024-03-13\"\n  },\n  {\n    \"name\": \"codellama/CodeLlama-7b-Instruct-hf\",\n    \"provider\": \"codellama\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738546688,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 65896,\n    \"hf_likes\": 254,\n    \"release_date\": \"2023-08-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"codellama/CodeLlama-7b-hf\",\n    \"provider\": \"codellama\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6738546688,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    
\"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 54518,\n    \"hf_likes\": 375,\n    \"release_date\": \"2023-08-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/deepseek-coder-6.7b-instruct\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6740512768,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 97176,\n    \"hf_likes\": 478,\n    \"release_date\": \"2023-10-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/deepseek-coder-6.7b-base\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"6.7B\",\n    \"parameters_raw\": 6740512768,\n    \"min_ram_gb\": 3.8,\n    \"recommended_ram_gb\": 6.3,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 28134,\n    \"hf_likes\": 122,\n    \"release_date\": \"2023-10-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/OLMoE-1B-7B-0125\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"6.9B\",\n    \"parameters_raw\": 6919161856,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.4,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmoe\",\n    \"hf_downloads\": 42434,\n    \"hf_likes\": 35,\n    \"release_date\": \"2025-01-21\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 8,\n    \"active_parameters\": 1167608556,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/OLMoE-1B-7B-0125-Instruct\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"6.9B\",\n    \"parameters_raw\": 6919161856,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.4,\n    \"min_vram_gb\": 3.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmoe\",\n    \"hf_downloads\": 35624,\n    \"hf_likes\": 58,\n    \"release_date\": \"2025-01-27\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 8,\n    \"active_parameters\": 1167608556,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-6.9b\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 6991520256,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.5,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 20516,\n    \"hf_likes\": 59,\n    \"release_date\": \"2023-02-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"openchat/openchat-3.5-0106\",\n    \"provider\": \"OpenChat\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 7000000000,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.5,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    
\"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"XiaomiMiMo/MiMo-7B-RL\",\n    \"provider\": \"Xiaomi\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 7000000000,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.5,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Advanced reasoning, math and code\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mimo\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-05-01\"\n  },\n  {\n    \"name\": \"microsoft/Orca-2-7b\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"7.0B\",\n    \"parameters_raw\": 7016400896,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.5,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Reasoning, step-by-step solutions\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"omni-research/Tarsier-7b\",\n    \"provider\": \"omni-research\",\n    \"parameter_count\": \"7.1B\",\n    \"parameters_raw\": 7063427072,\n    \"min_ram_gb\": 3.9,\n    \"recommended_ram_gb\": 6.6,\n    \"min_vram_gb\": 3.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llava\",\n    \"hf_downloads\": 49581,\n    \"hf_likes\": 25,\n    \"release_date\": \"2024-07-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigcode/starcoder2-7b\",\n    \"provider\": \"BigCode\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7173923840,\n    \"min_ram_gb\": 4.0,\n    
\"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder2\",\n    \"hf_downloads\": 19199,\n    \"hf_likes\": 208,\n    \"release_date\": \"2024-02-20\"\n  },\n  {\n    \"name\": \"tiiuae/falcon-7b-instruct\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7217189760,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon\",\n    \"hf_downloads\": 47656,\n    \"hf_likes\": 1031,\n    \"release_date\": \"2023-04-25\"\n  },\n  {\n    \"name\": \"HuggingFaceH4/zephyr-7b-beta\",\n    \"provider\": \"HuggingFace\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 107437,\n    \"hf_likes\": 1834,\n    \"release_date\": \"2023-10-26\"\n  },\n  {\n    \"name\": \"mistralai/Mistral-7B-Instruct-v0.2\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 2920309,\n    \"hf_likes\": 3088,\n    \"release_date\": \"2023-12-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"speakleash/Bielik-7B-Instruct-v0.1\",\n    \"provider\": \"speakleash\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 101914,\n    \"hf_likes\": 63,\n    \"release_date\": \"2024-03-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"prometheus-eval/prometheus-7b-v2.0\",\n    \"provider\": \"prometheus-eval\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 54661,\n    \"hf_likes\": 100,\n    \"release_date\": \"2024-02-13\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Salesforce/xLAM-7b-r\",\n    \"provider\": \"salesforce\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 38045,\n    \"hf_likes\": 32,\n    \"release_date\": \"2024-08-28\",\n    \"_discovered\": 
true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/xLAM-7b-r-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Intel/neural-chat-7b-v3-3\",\n    \"provider\": \"intel\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 27068,\n    \"hf_likes\": 80,\n    \"release_date\": \"2023-12-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Featherless-Chat-Models/Mistral-7B-Instruct-v0.2\",\n    \"provider\": \"featherless-chat-models\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 26186,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-05-08\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"augmxnt/shisa-gamma-7b-v1\",\n    \"provider\": \"augmxnt\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241732096,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 20213,\n    \"hf_likes\": 18,\n    \"release_date\": \"2023-12-23\",\n    \"_discovered\": true\n  },\n  
{\n    \"name\": \"dphn/dolphin-2.6-mistral-7b\",\n    \"provider\": \"dphn\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7241740288,\n    \"min_ram_gb\": 4.0,\n    \"recommended_ram_gb\": 6.7,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 60305,\n    \"hf_likes\": 105,\n    \"release_date\": \"2023-12-27\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"mistralai/Mistral-7B-Instruct-v0.3\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7248023552,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"unknown\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 1540743,\n    \"hf_likes\": 2447,\n    \"release_date\": \"2024-05-22\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Mistral-7B-Instruct-v0.3-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"allenai/wildguard\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7248031744,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 23686,\n    \"hf_likes\": 38,\n    \"release_date\": \"2024-06-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"dphn/dolphin-2.9.3-mistral-7B-32k\",\n    
\"provider\": \"dphn\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7248039936,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 79357,\n    \"hf_likes\": 57,\n    \"release_date\": \"2024-06-25\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/dolphin-2.9.3-mistral-7B-32k-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"thesven/Mistral-7B-Instruct-v0.3-GPTQ\",\n    \"provider\": \"thesven\",\n    \"parameter_count\": \"7.2B\",\n    \"parameters_raw\": 7249399808,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 35763,\n    \"hf_likes\": 1,\n    \"release_date\": \"2024-05-22\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"allenai/Olmo-3-7B-Instruct-SFT\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"7.3B\",\n    \"parameters_raw\": 7298011136,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 134834,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-11-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"allenai/Olmo-3-1025-7B\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"7.3B\",\n    \"parameters_raw\": 7298011136,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.8,\n    \"min_vram_gb\": 3.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo3\",\n    \"hf_downloads\": 71128,\n    \"hf_likes\": 54,\n    \"release_date\": \"2025-09-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"TechxGenus/starcoder2-7b-GPTQ\",\n    \"provider\": \"techxgenus\",\n    \"parameter_count\": \"7.4B\",\n    \"parameters_raw\": 7400416256,\n    \"min_ram_gb\": 4.1,\n    \"recommended_ram_gb\": 6.9,\n    \"min_vram_gb\": 3.8,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder2\",\n    \"hf_downloads\": 36955,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-03-22\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"tiiuae/Falcon3-7B-Instruct\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"7.5B\",\n    \"parameters_raw\": 7455550464,\n    \"min_ram_gb\": 4.2,\n    \"recommended_ram_gb\": 6.9,\n    \"min_vram_gb\": 3.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 18394,\n    \"hf_likes\": 76,\n    \"release_date\": \"2024-11-29\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Falcon3-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-7B-Instruct\",\n    \"provider\": 
\"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 20736120,\n    \"hf_likes\": 1108,\n    \"release_date\": \"2024-09-16\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-7B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1575000,\n    \"hf_likes\": 659,\n    \"release_date\": \"2024-09-17\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-7B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 743941,\n    \"hf_likes\": 797,\n    \"release_date\": \"2025-01-20\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 2029944,\n    \"hf_likes\": 266,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-7B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1107387,\n    \"hf_likes\": 19,\n    \"release_date\": \"2024-09-20\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": 
\"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1066717,\n    \"hf_likes\": 13,\n    \"release_date\": \"2024-09-20\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Math-7B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 318106,\n    \"hf_likes\": 89,\n    \"release_date\": \"2024-09-19\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Math-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2-7B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 310355,\n    \"hf_likes\": 683,\n    \"release_date\": \"2024-06-04\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2-7B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-7B\",\n    \"provider\": \"Alibaba\",\n    
\"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 240132,\n    \"hf_likes\": 137,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 158122,\n    \"hf_likes\": 29,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Dream-org/Dream-v0-Instruct-7B\",\n    \"provider\": \"dream-org\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"Dream\",\n    \"hf_downloads\": 73949,\n    \"hf_likes\": 154,\n    \"release_date\": \"2025-04-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n 
   \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 70734,\n    \"hf_likes\": 170,\n    \"release_date\": \"2024-06-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Math-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 68238,\n    \"hf_likes\": 106,\n    \"release_date\": \"2024-09-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"DeepHat/DeepHat-V1-7B\",\n    \"provider\": \"deephat\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 63374,\n    \"hf_likes\": 111,\n    \"release_date\": \"2025-04-25\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-7B-Instruct-1M\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1010000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      
\"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 46699,\n    \"hf_likes\": 366,\n    \"release_date\": \"2025-01-23\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-7B-Instruct-1M-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7615616512,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"GPTQ-Int8\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 30708,\n    \"hf_likes\": 18,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"microsoft/Phi-mini-MoE-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"7.6B\",\n    \"parameters_raw\": 7647632704,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.1,\n    \"min_vram_gb\": 3.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phimoe\",\n    \"hf_downloads\": 69775,\n    \"hf_likes\": 30,\n    \"release_date\": \"2025-06-23\",\n    \"is_moe\": true,\n    \"num_experts\": 16,\n    \"active_experts\": 2,\n    \"active_parameters\": 1290538017,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen-7B-Chat\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.7B\",\n    \"parameters_raw\": 7721324544,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    
\"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen\",\n    \"hf_downloads\": 195550,\n    \"hf_likes\": 787,\n    \"release_date\": \"2023-08-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.7B\",\n    \"parameters_raw\": 7721324544,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen\",\n    \"hf_downloads\": 189346,\n    \"hf_likes\": 396,\n    \"release_date\": \"2023-08-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"7.7B\",\n    \"parameters_raw\": 7721324544,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 75458,\n    \"hf_likes\": 56,\n    \"release_date\": \"2024-01-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"BSC-LT/salamandra-7b-instruct\",\n    \"provider\": \"bsc-lt\",\n    \"parameter_count\": \"7.8B\",\n    \"parameters_raw\": 7768117248,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    
\"hf_downloads\": 31017,\n    \"hf_likes\": 75,\n    \"release_date\": \"2024-09-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"kmhf/hf-moshiko\",\n    \"provider\": \"kmhf\",\n    \"parameter_count\": \"7.8B\",\n    \"parameters_raw\": 7783880545,\n    \"min_ram_gb\": 4.3,\n    \"recommended_ram_gb\": 7.2,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 3000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"moshi\",\n    \"hf_downloads\": 123900,\n    \"hf_likes\": 0,\n    \"release_date\": \"2024-09-27\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"XiaomiMiMo/MiMo-7B-Base\",\n    \"provider\": \"xiaomimimo\",\n    \"parameter_count\": \"7.8B\",\n    \"parameters_raw\": 7833409536,\n    \"min_ram_gb\": 4.4,\n    \"recommended_ram_gb\": 7.3,\n    \"min_vram_gb\": 4.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mimo\",\n    \"hf_downloads\": 93937,\n    \"hf_likes\": 124,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-3n-E4B-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"8B\",\n    \"parameters_raw\": 8000000000,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Multimodal, on-device (effective 4B)\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"gemma3n\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-06-25\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gemma-3n-E4B-it-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": 
\"mistralai/Ministral-8B-Instruct-2410\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Ministral-8B-Instruct-2410-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"meta-llama/Meta-Llama-3-8B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 2463959,\n    \"hf_likes\": 6473,\n    \"release_date\": \"2024-04-17\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Meta-Llama-3-8B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 1353966,\n    \"hf_likes\": 4391,\n    \"release_date\": \"2024-04-17\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Meta-Llama-3-8B-Instruct-GGUF\",\n        
\"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"NousResearch/Hermes-3-Llama-3.1-8B\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 635984,\n    \"hf_likes\": 391,\n    \"release_date\": \"2024-07-28\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Hermes-3-Llama-3.1-8B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"IlyaGusev/saiga_llama3_8b\",\n    \"provider\": \"ilyagusev\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 399621,\n    \"hf_likes\": 137,\n    \"release_date\": \"2024-04-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"NousResearch/Meta-Llama-3.1-8B-Instruct\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 207258,\n    \"hf_likes\": 
39,\n    \"release_date\": \"2024-07-24\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Meta-Llama-3.1-8B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"meta-llama/Llama-Guard-3-8B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 163719,\n    \"hf_likes\": 272,\n    \"release_date\": \"2024-07-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/Llama-3.1-8B-Instruct-FP8\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 93876,\n    \"hf_likes\": 32,\n    \"release_date\": \"2024-08-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct-v1.1\",\n    \"provider\": \"patronusai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261248,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 
20626,\n    \"hf_likes\": 10,\n    \"release_date\": \"2024-07-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261696,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 684729,\n    \"hf_likes\": 44,\n    \"release_date\": \"2024-07-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-8B-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030261696,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 200501,\n    \"hf_likes\": 10,\n    \"release_date\": \"2024-07-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"fdtn-ai/Foundation-Sec-1.1-8B-Instruct\",\n    \"provider\": \"fdtn-ai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030326784,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 53389,\n    \"hf_likes\": 13,\n    \"release_date\": \"2025-11-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"lmms-lab/llava-onevision-qwen2-7b-ov\",\n    \"provider\": \"lmms-lab\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8030348832,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"vision\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llava\",\n    \"hf_downloads\": 133340,\n    \"hf_likes\": 62,\n    \"release_date\": \"2024-06-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8031637504,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 36809,\n    \"hf_likes\": 30,\n    \"release_date\": \"2024-07-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4\",\n    \"provider\": \"hugging-quants\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8031637504,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 27054,\n    \"hf_likes\": 41,\n    \"release_date\": \"2024-07-24\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": 
\"RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"8.0B\",\n    \"parameters_raw\": 8031637504,\n    \"min_ram_gb\": 4.5,\n    \"recommended_ram_gb\": 7.5,\n    \"min_vram_gb\": 4.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 21204,\n    \"hf_likes\": 9,\n    \"release_date\": \"2024-07-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"ibm-granite/granite-3.3-8b-instruct\",\n    \"provider\": \"ibm-granite\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8170864640,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"granite\",\n    \"hf_downloads\": 65699,\n    \"hf_likes\": 153,\n    \"release_date\": \"2025-04-09\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/granite-3.3-8b-instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-8B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8190735360,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 790734,\n    \"hf_likes\": 87,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  
{\n    \"name\": \"Qwen/Qwen3-8B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8190735360,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 327827,\n    \"hf_likes\": 37,\n    \"release_date\": \"2025-05-03\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-0528-Qwen3-8B\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8190735360,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 148562,\n    \"hf_likes\": 1040,\n    \"release_date\": \"2025-05-29\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"huihui-ai/Huihui-Qwen3-8B-abliterated-v2\",\n    \"provider\": \"huihui-ai\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8190735360,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 32025,\n    \"hf_likes\": 
34,\n    \"release_date\": \"2025-06-18\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-8B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8191159296,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 196191,\n    \"hf_likes\": 57,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nytopop/Qwen3-8B.w8a8\",\n    \"provider\": \"nytopop\",\n    \"parameter_count\": \"8.2B\",\n    \"parameters_raw\": 8192136192,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.6,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 33985,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-04-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-VL-7B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"8.3B\",\n    \"parameters_raw\": 8292166656,\n    \"min_ram_gb\": 4.6,\n    \"recommended_ram_gb\": 7.7,\n    \"min_vram_gb\": 4.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen2_5_vl\",\n    \"hf_downloads\": 4008802,\n    \"hf_likes\": 1462,\n    \"release_date\": \"2025-01-26\",\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"unsloth/Qwen2.5-VL-7B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-8B-A1B\",\n    \"provider\": \"liquidai\",\n    \"parameter_count\": \"8.3B\",\n    \"parameters_raw\": 8339929856,\n    \"min_ram_gb\": 4.7,\n    \"recommended_ram_gb\": 7.8,\n    \"min_vram_gb\": 4.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 47242,\n    \"hf_likes\": 328,\n    \"release_date\": \"2025-10-07\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 4,\n    \"active_parameters\": 1407363160,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/LFM2-8B-A1B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/Mistral-NeMo-Minitron-8B-Instruct\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.4B\",\n    \"parameters_raw\": 8414105600,\n    \"min_ram_gb\": 4.7,\n    \"recommended_ram_gb\": 7.8,\n    \"min_vram_gb\": 4.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 55809,\n    \"hf_likes\": 82,\n    \"release_date\": \"2024-10-02\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Mistral-NeMo-Minitron-8B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"01-ai/Yi-1.5-9B-Chat\",\n    \"provider\": \"01.ai\",\n    \"parameter_count\": \"8.8B\",\n    \"parameters_raw\": 8829407232,\n    \"min_ram_gb\": 4.9,\n    \"recommended_ram_gb\": 8.2,\n    \"min_vram_gb\": 4.5,\n    \"quantization\": 
\"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 19975,\n    \"hf_likes\": 148,\n    \"release_date\": \"2024-05-10\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Yi-1.5-9B-Chat-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-Nano-9B-v2-Base\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.9B\",\n    \"parameters_raw\": 8888227328,\n    \"min_ram_gb\": 5.0,\n    \"recommended_ram_gb\": 8.3,\n    \"min_vram_gb\": 4.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 165722,\n    \"hf_likes\": 43,\n    \"release_date\": \"2025-08-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.9B\",\n    \"parameters_raw\": 8888227328,\n    \"min_ram_gb\": 5.0,\n    \"recommended_ram_gb\": 8.3,\n    \"min_vram_gb\": 4.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 24028,\n    \"hf_likes\": 121,\n    \"release_date\": \"2026-02-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-Nano-9B-v2-FP8\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"8.9B\",\n    \"parameters_raw\": 8888227432,\n    \"min_ram_gb\": 5.0,\n    \"recommended_ram_gb\": 8.3,\n    \"min_vram_gb\": 4.6,\n    \"quantization\": \"Q4_K_M\",\n    
\"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 70791,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-09-22\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-Nano-9B-v2\",\n    \"provider\": \"NVIDIA\",\n    \"parameter_count\": \"9B\",\n    \"parameters_raw\": 9000000000,\n    \"min_ram_gb\": 5.0,\n    \"recommended_ram_gb\": 8.4,\n    \"min_vram_gb\": 4.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Hybrid Mamba2, reasoning\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-06-01\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-32B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"9.2B\",\n    \"parameters_raw\": 9214833664,\n    \"min_ram_gb\": 5.1,\n    \"recommended_ram_gb\": 8.6,\n    \"min_vram_gb\": 4.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 24718,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen2.5-Coder-32B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"9.2B\",\n    \"parameters_raw\": 9215644672,\n    \"min_ram_gb\": 5.1,\n    \"recommended_ram_gb\": 8.6,\n    \"min_vram_gb\": 4.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 41754,\n    \"hf_likes\": 3,\n    \"release_date\": \"2024-11-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/QwQ-32B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"9.2B\",\n    \"parameters_raw\": 9215644672,\n    \"min_ram_gb\": 5.1,\n    \"recommended_ram_gb\": 8.6,\n    \"min_vram_gb\": 4.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 32269,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-03-05\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-2-9b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"9.2B\",\n    \"parameters_raw\": 9241705984,\n    \"min_ram_gb\": 5.2,\n    \"recommended_ram_gb\": 8.6,\n    \"min_vram_gb\": 4.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 180627,\n    \"hf_likes\": 775,\n    \"release_date\": \"2024-06-24\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/gemma-2-9b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zai-org/glm-4-9b-chat-hf\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"9.4B\",\n    \"parameters_raw\": 9399951360,\n    \"min_ram_gb\": 5.3,\n    \"recommended_ram_gb\": 8.8,\n    \"min_vram_gb\": 4.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm\",\n    \"hf_downloads\": 22553,\n   
 \"hf_likes\": 24,\n    \"release_date\": \"2024-10-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"THUDM/glm-4-9b-chat\",\n    \"provider\": \"thudm\",\n    \"parameter_count\": \"9.4B\",\n    \"parameters_raw\": 9399951392,\n    \"min_ram_gb\": 5.3,\n    \"recommended_ram_gb\": 8.8,\n    \"min_vram_gb\": 4.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"unknown\",\n    \"architecture\": \"chatglm\",\n    \"hf_downloads\": 190092,\n    \"hf_likes\": 702,\n    \"release_date\": \"2024-06-04\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/glm-4-9b-chat-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zai-org/glm-4-9b\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"9.4B\",\n    \"parameters_raw\": 9399951392,\n    \"min_ram_gb\": 5.3,\n    \"recommended_ram_gb\": 8.8,\n    \"min_vram_gb\": 4.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"chatglm\",\n    \"hf_downloads\": 23550,\n    \"hf_likes\": 143,\n    \"release_date\": \"2024-06-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-9B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"9.7B\",\n    \"parameters_raw\": 9653104368,\n    \"min_ram_gb\": 5.4,\n    \"recommended_ram_gb\": 9.0,\n    \"min_vram_gb\": 4.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 172298,\n    \"hf_likes\": 345,\n    \"release_date\": \"2026-02-27\",\n    \"gguf_sources\": [\n      {\n 
       \"repo\": \"unsloth/Qwen3.5-9B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-9B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"9.7B\",\n    \"parameters_raw\": 9653104368,\n    \"min_ram_gb\": 5.4,\n    \"recommended_ram_gb\": 9.0,\n    \"min_vram_gb\": 4.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 5324,\n    \"hf_likes\": 38,\n    \"release_date\": \"2026-02-26\"\n  },\n  {\n    \"name\": \"solidrust/gemma-2-9b-it-AWQ\",\n    \"provider\": \"solidrust\",\n    \"parameter_count\": \"10.2B\",\n    \"parameters_raw\": 10159209984,\n    \"min_ram_gb\": 5.7,\n    \"recommended_ram_gb\": 9.5,\n    \"min_vram_gb\": 5.2,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 32664,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-09-03\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"11.0B\",\n    \"parameters_raw\": 10665463808,\n    \"min_ram_gb\": 6.0,\n    \"recommended_ram_gb\": 9.9,\n    \"min_vram_gb\": 5.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"upstage/SOLAR-10.7B-Instruct-v1.0\",\n    \"provider\": \"Upstage\",\n    \"parameter_count\": \"10.7B\",\n  
  \"parameters_raw\": 10700000000,\n    \"min_ram_gb\": 6.0,\n    \"recommended_ram_gb\": 10.0,\n    \"min_vram_gb\": 5.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"High-performance instruction following\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B\",\n    \"provider\": \"naver-hyperclovax\",\n    \"parameter_count\": \"10.7B\",\n    \"parameters_raw\": 10741664520,\n    \"min_ram_gb\": 6.0,\n    \"recommended_ram_gb\": 10.0,\n    \"min_vram_gb\": 5.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"vlm\",\n    \"hf_downloads\": 102546,\n    \"hf_likes\": 181,\n    \"release_date\": \"2025-12-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"speakleash/Bielik-11B-v3.0-Instruct\",\n    \"provider\": \"speakleash\",\n    \"parameter_count\": \"11.2B\",\n    \"parameters_raw\": 11168796672,\n    \"min_ram_gb\": 6.2,\n    \"recommended_ram_gb\": 10.4,\n    \"min_vram_gb\": 5.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 232376,\n    \"hf_likes\": 55,\n    \"release_date\": \"2025-11-07\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cjvt/GaMS3-12B-Instruct\",\n    \"provider\": \"cjvt\",\n    \"parameter_count\": \"11.8B\",\n    \"parameters_raw\": 11766034176,\n    \"min_ram_gb\": 6.6,\n    \"recommended_ram_gb\": 11.0,\n    \"min_vram_gb\": 6.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, 
chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma3_text\",\n    \"hf_downloads\": 26653,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-12-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"EleutherAI/pythia-12b\",\n    \"provider\": \"eleutherai\",\n    \"parameter_count\": \"12.0B\",\n    \"parameters_raw\": 11997067840,\n    \"min_ram_gb\": 6.7,\n    \"recommended_ram_gb\": 11.2,\n    \"min_vram_gb\": 6.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_neox\",\n    \"hf_downloads\": 43453,\n    \"hf_likes\": 144,\n    \"release_date\": \"2023-02-28\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"google/gemma-3-12b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"12B\",\n    \"parameters_raw\": 12000000000,\n    \"min_ram_gb\": 6.7,\n    \"recommended_ram_gb\": 11.2,\n    \"min_vram_gb\": 6.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Multimodal, vision and text\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gemma-3-12b-it-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"mistralai/Mistral-Nemo-Instruct-2407\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"12.2B\",\n    \"parameters_raw\": 12247076864,\n    \"min_ram_gb\": 6.8,\n    \"recommended_ram_gb\": 11.4,\n    \"min_vram_gb\": 6.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 0,\n    
\"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Mistral-Nemo-Instruct-2407-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Mistral-Nemo-Instruct-2407-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"casperhansen/mistral-nemo-instruct-2407-awq\",\n    \"provider\": \"casperhansen\",\n    \"parameter_count\": \"12.2B\",\n    \"parameters_raw\": 12247782400,\n    \"min_ram_gb\": 6.8,\n    \"recommended_ram_gb\": 11.4,\n    \"min_vram_gb\": 6.3,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 1024000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 189490,\n    \"hf_likes\": 12,\n    \"release_date\": \"2024-07-23\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"m8than/Mistral-Nemo-Instruct-2407-lenient-chatfix\",\n    \"provider\": \"m8than\",\n    \"parameter_count\": \"12.2B\",\n    \"parameters_raw\": 12247782400,\n    \"min_ram_gb\": 6.8,\n    \"recommended_ram_gb\": 11.4,\n    \"min_vram_gb\": 6.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 25879,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-05-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"mixtao/MixTAO-7Bx2-MoE-v8.1\",\n    \"provider\": \"mixtao\",\n    \"parameter_count\": \"12.9B\",\n    \"parameters_raw\": 12879138816,\n    \"min_ram_gb\": 7.2,\n    \"recommended_ram_gb\": 12.0,\n    \"min_vram_gb\": 6.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose 
text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 20213,\n    \"hf_likes\": 55,\n    \"release_date\": \"2024-02-26\",\n    \"is_moe\": true,\n    \"num_experts\": 2,\n    \"active_experts\": 2,\n    \"active_parameters\": 12879138816,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"microsoft/Orca-2-13b\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13015864320,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Reasoning, step-by-step solutions\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"lmsys/vicuna-13b-v1.5\",\n    \"provider\": \"LMSYS\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13015864320,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"WizardLMTeam/WizardLM-13B-V1.2\",\n    \"provider\": \"WizardLM\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13015864320,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": 
\"cais/HarmBench-Llama-2-13b-cls\",\n    \"provider\": \"cais\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13015864320,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 30370,\n    \"hf_likes\": 27,\n    \"release_date\": \"2024-02-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/CodeLlama-13b-Instruct-hf\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"13.0B\",\n    \"parameters_raw\": 13016028160,\n    \"min_ram_gb\": 7.3,\n    \"recommended_ram_gb\": 12.1,\n    \"min_vram_gb\": 6.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 6450,\n    \"hf_likes\": 27,\n    \"release_date\": \"2024-03-13\"\n  },\n  {\n    \"name\": \"microsoft/phi-4\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"14B\",\n    \"parameters_raw\": 14000000000,\n    \"min_ram_gb\": 7.8,\n    \"recommended_ram_gb\": 13.0,\n    \"min_vram_gb\": 7.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Reasoning, STEM, code generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/phi-4-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/phi-4-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"microsoft/Phi-3-medium-14b-instruct\",\n    \"provider\": 
\"Microsoft\",\n    \"parameter_count\": \"14B\",\n    \"parameters_raw\": 14000000000,\n    \"min_ram_gb\": 7.8,\n    \"recommended_ram_gb\": 13.0,\n    \"min_vram_gb\": 7.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Balanced performance and size\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"microsoft/Phi-4-reasoning\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"14B\",\n    \"parameters_raw\": 14000000000,\n    \"min_ram_gb\": 7.8,\n    \"recommended_ram_gb\": 13.0,\n    \"min_vram_gb\": 7.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Advanced reasoning, math and code\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi4\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-01\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Phi-4-reasoning-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"microsoft/Phi-4-multimodal-instruct\",\n    \"provider\": \"Microsoft\",\n    \"parameter_count\": \"14B\",\n    \"parameters_raw\": 14000000000,\n    \"min_ram_gb\": 7.8,\n    \"recommended_ram_gb\": 13.0,\n    \"min_vram_gb\": 7.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Multimodal, vision and audio\",\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"phi4\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-04-01\"\n  },\n  {\n    \"name\": \"Qwen/Qwen-14B-Chat-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.2B\",\n    \"parameters_raw\": 14168796160,\n    \"min_ram_gb\": 7.9,\n    \"recommended_ram_gb\": 13.2,\n    \"min_vram_gb\": 7.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    
\"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen\",\n    \"hf_downloads\": 45732,\n    \"hf_likes\": 100,\n    \"release_date\": \"2023-09-24\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-MoE-A2.7B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.3B\",\n    \"parameters_raw\": 14315784192,\n    \"min_ram_gb\": 8.0,\n    \"recommended_ram_gb\": 13.3,\n    \"min_vram_gb\": 7.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2_moe\",\n    \"hf_downloads\": 59931,\n    \"hf_likes\": 220,\n    \"release_date\": \"2024-02-29\",\n    \"is_moe\": true,\n    \"num_experts\": 60,\n    \"active_experts\": 4,\n    \"active_parameters\": 1622455541,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bullpoint/Qwen3-Coder-Next-AWQ-4bit\",\n    \"provider\": \"bullpoint\",\n    \"parameter_count\": \"14.4B\",\n    \"parameters_raw\": 14444722944,\n    \"min_ram_gb\": 8.1,\n    \"recommended_ram_gb\": 13.5,\n    \"min_vram_gb\": 7.4,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 1226868,\n    \"hf_likes\": 14,\n    \"release_date\": \"2026-02-03\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 990253467,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"stelterlab/phi-4-AWQ\",\n    \"provider\": \"stelterlab\",\n    \"parameter_count\": \"14.7B\",\n    \"parameters_raw\": 14659507200,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 
13.7,\n    \"min_vram_gb\": 7.5,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 16384,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"phi3\",\n    \"hf_downloads\": 55064,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-01-11\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"cyankiwi/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"14.7B\",\n    \"parameters_raw\": 14736242944,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 13.7,\n    \"min_vram_gb\": 7.5,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 192744,\n    \"hf_likes\": 61,\n    \"release_date\": \"2025-09-12\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 1010238527,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"cyankiwi/Qwen3-Next-80B-A3B-Thinking-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"14.7B\",\n    \"parameters_raw\": 14736242944,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 13.7,\n    \"min_vram_gb\": 7.5,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 168561,\n    \"hf_likes\": 22,\n    \"release_date\": \"2025-09-12\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 1010238527,\n    \"_discovered\": true,\n    \"format\": 
\"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen3-14B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14768307200,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 258163,\n    \"hf_likes\": 57,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"OpenPipe/Qwen3-14B-Instruct\",\n    \"provider\": \"openpipe\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14768307200,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 207053,\n    \"hf_likes\": 12,\n    \"release_date\": \"2025-10-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Goekdeniz-Guelmez/Josiefied-Qwen3-14B-abliterated-v3\",\n    \"provider\": \"goekdeniz-guelmez\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14768307200,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 55059,\n    \"hf_likes\": 24,\n    \"release_date\": \"2025-05-12\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-14B-Base\",\n    
\"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14768307200,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 50835,\n    \"hf_likes\": 49,\n    \"release_date\": \"2025-04-28\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Qwen3-14B-Base-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770000000,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 13.7,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-14B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-14B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770000000,\n    \"min_ram_gb\": 8.2,\n    \"recommended_ram_gb\": 13.7,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-14B-GGUF\",\n        
\"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-14B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 491583,\n    \"hf_likes\": 142,\n    \"release_date\": \"2024-11-06\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-14B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-14B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1077036,\n    \"hf_likes\": 27,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n  
  \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 761474,\n    \"hf_likes\": 608,\n    \"release_date\": \"2025-01-20\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-14B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 168345,\n    \"hf_likes\": 16,\n    \"release_date\": \"2024-11-09\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 100307,\n    \"hf_likes\": 144,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Qwen2.5-14B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n 
   \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 93325,\n    \"hf_likes\": 26,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct-1M\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 1010000,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 54355,\n    \"hf_likes\": 334,\n    \"release_date\": \"2025-01-23\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-14B-Instruct-1M-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"OpenDFM/ChemDFM-R-14B\",\n    \"provider\": \"opendfm\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 41195,\n    \"hf_likes\": 6,\n    \"release_date\": \"2025-10-26\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"mradermacher/ChemDFM-R-14B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"GPTQ-Int8\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 37961,\n    \"hf_likes\": 21,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-14B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"14.8B\",\n    \"parameters_raw\": 14770033664,\n    \"min_ram_gb\": 8.3,\n    \"recommended_ram_gb\": 13.8,\n    \"min_vram_gb\": 7.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 27181,\n    \"hf_likes\": 66,\n    \"release_date\": \"2024-11-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-14B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"WizardLMTeam/WizardCoder-15B-V1.0\",\n    \"provider\": \"WizardLM\",\n    \"parameter_count\": \"15.5B\",\n    \"parameters_raw\": 15515334656,\n    \"min_ram_gb\": 8.7,\n    \"recommended_ram_gb\": 14.5,\n    \"min_vram_gb\": 7.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Code generation and completion\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder\",\n    
\"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/WizardCoder-15B-V1.0-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/Qwen3-30B-A3B-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"15.6B\",\n    \"parameters_raw\": 15583623168,\n    \"min_ram_gb\": 8.7,\n    \"recommended_ram_gb\": 14.5,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 63897,\n    \"hf_likes\": 24,\n    \"release_date\": \"2025-07-08\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 1704458782,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"NVFP4/Qwen3-Coder-30B-A3B-Instruct-FP4\",\n    \"provider\": \"nvfp4\",\n    \"parameter_count\": \"15.6B\",\n    \"parameters_raw\": 15583623168,\n    \"min_ram_gb\": 8.7,\n    \"recommended_ram_gb\": 14.5,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 25920,\n    \"hf_likes\": 11,\n    \"release_date\": \"2025-08-05\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 1704458782,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"bigcode/starcoder2-15b\",\n    \"provider\": \"BigCode\",\n    \"parameter_count\": \"15.7B\",\n    \"parameters_raw\": 15700000000,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    
\"quantization\": \"Q4_K_M\",\n    \"context_length\": 16384,\n    \"use_case\": \"Code generation and completion\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"starcoder2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/starcoder2-15b-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"16B\",\n    \"parameters_raw\": 15700000000,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Code generation and completion\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2400000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V2-Lite-Chat\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"15.7B\",\n    \"parameters_raw\": 15706484224,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 330400,\n    \"hf_likes\": 134,\n    \"release_date\": \"2024-05-15\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2184182961,\n    \"_discovered\": true,\n    
\"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/DeepSeek-V2-Lite-Chat-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V2-Lite\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"15.7B\",\n    \"parameters_raw\": 15706484224,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 194737,\n    \"hf_likes\": 167,\n    \"release_date\": \"2024-05-15\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2184182961,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/DeepSeek-V2-Lite-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/DeepSeek-Coder-V2-Lite-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"15.7B\",\n    \"parameters_raw\": 15706484224,\n    \"min_ram_gb\": 8.8,\n    \"recommended_ram_gb\": 14.6,\n    \"min_vram_gb\": 8.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 53780,\n    \"hf_likes\": 9,\n    \"release_date\": \"2024-07-17\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2184182961,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"moonshotai/Moonlight-16B-A3B\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"16.0B\",\n    \"parameters_raw\": 15960111936,\n    \"min_ram_gb\": 8.9,\n    \"recommended_ram_gb\": 
14.9,\n    \"min_vram_gb\": 8.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 45835,\n    \"hf_likes\": 108,\n    \"release_date\": \"2025-02-22\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 6,\n    \"active_parameters\": 1153367458,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"moonshotai/Moonlight-16B-A3B-Instruct\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"16.0B\",\n    \"parameters_raw\": 15960111936,\n    \"min_ram_gb\": 8.9,\n    \"recommended_ram_gb\": 14.9,\n    \"min_vram_gb\": 8.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 38514,\n    \"hf_likes\": 192,\n    \"release_date\": \"2025-02-22\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 6,\n    \"active_parameters\": 1153367458,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"inclusionAI/LLaDA2.1-mini\",\n    \"provider\": \"inclusionai\",\n    \"parameter_count\": \"16.3B\",\n    \"parameters_raw\": 16255643392,\n    \"min_ram_gb\": 9.1,\n    \"recommended_ram_gb\": 15.1,\n    \"min_vram_gb\": 8.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llada2_moe\",\n    \"hf_downloads\": 21824,\n    \"hf_likes\": 94,\n    \"release_date\": \"2026-02-09\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 1295371577,\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"deepseek-ai/deepseek-moe-16b-base\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"16.4B\",\n    \"parameters_raw\": 16375728128,\n    \"min_ram_gb\": 9.2,\n    \"recommended_ram_gb\": 15.3,\n    \"min_vram_gb\": 8.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek\",\n    \"hf_downloads\": 22326,\n    \"hf_likes\": 139,\n    \"release_date\": \"2024-01-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/deepseek-moe-16b-base-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"inclusionAI/Ling-lite\",\n    \"provider\": \"inclusionai\",\n    \"parameter_count\": \"16.8B\",\n    \"parameters_raw\": 16801974272,\n    \"min_ram_gb\": 9.4,\n    \"recommended_ram_gb\": 15.6,\n    \"min_vram_gb\": 8.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bailing_moe\",\n    \"hf_downloads\": 388,\n    \"hf_likes\": 78,\n    \"release_date\": \"2025-02-28\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 2336524543,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Ling-lite-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/Qwen3-32B-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"17.2B\",\n    \"parameters_raw\": 17159312384,\n    \"min_ram_gb\": 9.6,\n    \"recommended_ram_gb\": 16.0,\n    \"min_vram_gb\": 8.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    
],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 26285,\n    \"hf_likes\": 11,\n    \"release_date\": \"2025-09-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"18.2B\",\n    \"parameters_raw\": 18237772608,\n    \"min_ram_gb\": 10.2,\n    \"recommended_ram_gb\": 17.0,\n    \"min_vram_gb\": 9.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 490404,\n    \"hf_likes\": 105,\n    \"release_date\": \"2025-12-20\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/GLM-4.5-Air-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"18.6B\",\n    \"parameters_raw\": 18626406504,\n    \"min_ram_gb\": 10.4,\n    \"recommended_ram_gb\": 17.3,\n    \"min_vram_gb\": 9.5,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe\",\n    \"hf_downloads\": 260177,\n    \"hf_likes\": 27,\n    \"release_date\": \"2025-07-29\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"QuantTrio/GLM-4.5-Air-GPTQ-Int4-Int8Mix\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"19.8B\",\n    \"parameters_raw\": 19809102592,\n    \"min_ram_gb\": 11.1,\n    \"recommended_ram_gb\": 18.4,\n    \"min_vram_gb\": 10.1,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe\",\n    \"hf_downloads\": 24759,\n 
   \"hf_likes\": 10,\n    \"release_date\": \"2025-07-30\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"internlm/internlm2-chat-20b\",\n    \"provider\": \"internlm\",\n    \"parameter_count\": \"19.9B\",\n    \"parameters_raw\": 19861149696,\n    \"min_ram_gb\": 11.1,\n    \"recommended_ram_gb\": 18.5,\n    \"min_vram_gb\": 10.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"internlm2\",\n    \"hf_downloads\": 20010,\n    \"hf_likes\": 88,\n    \"release_date\": \"2024-01-10\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/internlm2-chat-20b-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"openai/gpt-oss-20b\",\n    \"provider\": \"openai\",\n    \"parameter_count\": \"21.5B\",\n    \"parameters_raw\": 21511953984,\n    \"min_ram_gb\": 12.0,\n    \"recommended_ram_gb\": 20.0,\n    \"min_vram_gb\": 11.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 7049150,\n    \"hf_likes\": 4421,\n    \"release_date\": \"2025-08-04\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 4,\n    \"active_parameters\": 3630142231,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gpt-oss-20b-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/gpt-oss-20b\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"21.5B\",\n    \"parameters_raw\": 21511953984,\n    \"min_ram_gb\": 12.0,\n    \"recommended_ram_gb\": 20.0,\n    \"min_vram_gb\": 11.0,\n    \"quantization\": 
\"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 20506,\n    \"hf_likes\": 5,\n    \"release_date\": \"2025-09-04\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 4,\n    \"active_parameters\": 3630142231,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gpt-oss-20b-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/ERNIE-4.5-21B-A3B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"21.8B\",\n    \"parameters_raw\": 21825436160,\n    \"min_ram_gb\": 12.2,\n    \"recommended_ram_gb\": 20.3,\n    \"min_vram_gb\": 11.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ernie4_5_moe\",\n    \"hf_downloads\": 24749,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-07-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/ERNIE-4.5-21B-A3B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"21.8B\",\n    \"parameters_raw\": 21825436160,\n    \"min_ram_gb\": 12.2,\n    \"recommended_ram_gb\": 20.3,\n    \"min_vram_gb\": 11.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ernie4_5_moe\",\n    \"hf_downloads\": 24612,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-07-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/ERNIE-4.5-21B-A3B-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    
\"parameter_count\": \"21.8B\",\n    \"parameters_raw\": 21825436160,\n    \"min_ram_gb\": 12.2,\n    \"recommended_ram_gb\": 20.3,\n    \"min_vram_gb\": 11.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ernie4_5_moe\",\n    \"hf_downloads\": 24573,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-07-10\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"solidrust/Codestral-22B-v0.1-hf-AWQ\",\n    \"provider\": \"solidrust\",\n    \"parameter_count\": \"22.2B\",\n    \"parameters_raw\": 22247282688,\n    \"min_ram_gb\": 12.4,\n    \"recommended_ram_gb\": 20.7,\n    \"min_vram_gb\": 11.4,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 84893,\n    \"hf_likes\": 2,\n    \"release_date\": \"2024-05-30\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"stelterlab/Mistral-Small-24B-Instruct-2501-AWQ\",\n    \"provider\": \"stelterlab\",\n    \"parameter_count\": \"23.6B\",\n    \"parameters_raw\": 23572403200,\n    \"min_ram_gb\": 13.2,\n    \"recommended_ram_gb\": 22.0,\n    \"min_vram_gb\": 12.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 266172,\n    \"hf_likes\": 26,\n    \"release_date\": \"2025-01-30\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmstudio-community/Devstral-Small-2507-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.6B\",\n    
\"parameters_raw\": 23572403200,\n    \"min_ram_gb\": 13.2,\n    \"recommended_ram_gb\": 22.0,\n    \"min_vram_gb\": 12.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 19891,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-07-09\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-24B-A2B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843659008,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 207367,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-23\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 2607900202,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-24B-A2B-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843659008,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 205544,\n    \"hf_likes\": 2,\n    \"release_date\": \"2026-02-23\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 2607900202,\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"lmstudio-community/LFM2-24B-A2B-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843659008,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 204884,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-23\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 2607900202,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/LFM2-24B-A2B-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843659008,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2_moe\",\n    \"hf_downloads\": 204308,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-23\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 2607900202,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LiquidAI/LFM2-24B-A2B\",\n    \"provider\": \"Liquid AI\",\n    \"parameter_count\": \"23.8B\",\n    \"parameters_raw\": 23843661440,\n    \"min_ram_gb\": 13.3,\n    \"recommended_ram_gb\": 22.2,\n    \"min_vram_gb\": 12.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 128000,\n    \"use_case\": \"Agentic tasks, RAG, summarization\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"lfm2\",\n    \"is_moe\": true,\n    \"num_experts\": 32,\n    \"active_experts\": 
4,\n    \"active_parameters\": 2300000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-11-28\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/LFM2-24B-A2B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"mistralai/Mistral-Small-24B-Instruct-2501\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"24B\",\n    \"parameters_raw\": 24000000000,\n    \"min_ram_gb\": 13.4,\n    \"recommended_ram_gb\": 22.4,\n    \"min_vram_gb\": 12.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mistral\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Mistral-Small-24B-Instruct-2501-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Mistral-Small-24B-Instruct-2501-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"google/gemma-2-27b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"27.2B\",\n    \"parameters_raw\": 27227128320,\n    \"min_ram_gb\": 15.2,\n    \"recommended_ram_gb\": 25.4,\n    \"min_vram_gb\": 13.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gemma2\",\n    \"hf_downloads\": 409260,\n    \"hf_likes\": 560,\n    \"release_date\": \"2024-06-24\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/gemma-2-27b-it-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"google/gemma-3-27b-it\",\n    \"provider\": \"Google\",\n    \"parameter_count\": \"27.4B\",\n    \"parameters_raw\": 27432406640,\n    \"min_ram_gb\": 
15.3,\n    \"recommended_ram_gb\": 25.5,\n    \"min_vram_gb\": 14.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"gemma3\",\n    \"hf_downloads\": 1520563,\n    \"hf_likes\": 1905,\n    \"release_date\": \"2025-03-01\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gemma-3-27b-it-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-27B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"27.8B\",\n    \"parameters_raw\": 27781427952,\n    \"min_ram_gb\": 15.5,\n    \"recommended_ram_gb\": 25.9,\n    \"min_vram_gb\": 14.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5\",\n    \"hf_downloads\": 406808,\n    \"hf_likes\": 565,\n    \"release_date\": \"2026-02-24\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-27B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/GLM-4.7-Flash-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"29.9B\",\n    \"parameters_raw\": 29943393920,\n    \"min_ram_gb\": 16.7,\n    \"recommended_ram_gb\": 27.9,\n    \"min_vram_gb\": 15.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe_lite\",\n    \"hf_downloads\": 1001623,\n    \"hf_likes\": 9,\n    \"release_date\": \"2026-01-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/GLM-4.7-Flash-MLX-6bit\",\n    
\"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"29.9B\",\n    \"parameters_raw\": 29943393920,\n    \"min_ram_gb\": 16.7,\n    \"recommended_ram_gb\": 27.9,\n    \"min_vram_gb\": 15.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe_lite\",\n    \"hf_downloads\": 991211,\n    \"hf_likes\": 8,\n    \"release_date\": \"2026-01-19\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-30B-A3B-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 226311,\n    \"hf_likes\": 47,\n    \"release_date\": \"2025-05-05\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 191895,\n    \"hf_likes\": 14,\n    \"release_date\": \"2025-07-31\",\n    
\"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 185814,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-08-01\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 181127,\n    \"hf_likes\": 12,\n    \"release_date\": \"2025-07-31\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    
\"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 179804,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-07-31\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-30B-A3B-Base\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 83458,\n    \"hf_likes\": 69,\n    \"release_date\": \"2025-04-28\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Qwen3-30B-A3B-Base-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"typhoon-ai/typhoon2.5-qwen3-30b-a3b\",\n    \"provider\": \"typhoon-ai\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 53587,\n    \"hf_likes\": 1,\n    
\"release_date\": \"2025-09-23\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 46035,\n    \"hf_likes\": 6,\n    \"release_date\": \"2025-08-01\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 45854,\n    \"hf_likes\": 6,\n    \"release_date\": \"2025-07-29\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    
\"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 44199,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-07-29\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 43483,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-29\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Alibaba-NLP/Tongyi-DeepResearch-30B-A3B\",\n    \"provider\": \"alibaba-nlp\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30532122624,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 26559,\n    \"hf_likes\": 802,\n    \"release_date\": \"2025-09-16\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    
\"active_experts\": 8,\n    \"active_parameters\": 3339450907,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Tongyi-DeepResearch-30B-A3B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-30B-A3B-Instruct-2507-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30533947392,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 957458,\n    \"hf_likes\": 115,\n    \"release_date\": \"2025-07-28\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339650489,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"30.5B\",\n    \"parameters_raw\": 30533947392,\n    \"min_ram_gb\": 17.1,\n    \"recommended_ram_gb\": 28.4,\n    \"min_vram_gb\": 15.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 265519,\n    \"hf_likes\": 164,\n    \"release_date\": \"2025-07-31\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 3339650489,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"31.1B\",\n    \"parameters_raw\": 31070754032,\n    \"min_ram_gb\": 17.4,\n    
\"recommended_ram_gb\": 28.9,\n    \"min_vram_gb\": 15.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_vl_moe\",\n    \"hf_downloads\": 301353,\n    \"hf_likes\": 40,\n    \"release_date\": \"2025-10-04\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 2475950709,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"QuantTrio/GLM-4.7-Flash-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"31.2B\",\n    \"parameters_raw\": 31221488576,\n    \"min_ram_gb\": 17.4,\n    \"recommended_ram_gb\": 29.1,\n    \"min_vram_gb\": 16.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe_lite\",\n    \"hf_downloads\": 103703,\n    \"hf_likes\": 7,\n    \"release_date\": \"2026-01-21\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmstudio-community/NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577935872,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 195432,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-12-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-8bit\",\n    \"provider\": 
\"lmstudio-community\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577935872,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 190541,\n    \"hf_likes\": 3,\n    \"release_date\": \"2025-12-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577935872,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 188175,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-12-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577935872,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 188130,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-12-16\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"31.6B\",\n    
\"parameters_raw\": 31577937344,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 1025721,\n    \"hf_likes\": 648,\n    \"release_date\": \"2025-12-04\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577937344,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 65364,\n    \"hf_likes\": 109,\n    \"release_date\": \"2025-12-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"OpenResearcher/OpenResearcher-30B-A3B\",\n    \"provider\": \"openresearcher\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577937344,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 23630,\n    \"hf_likes\": 59,\n    \"release_date\": \"2026-02-03\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/OpenResearcher-30B-A3B-GGUF\",\n        \"provider\": 
\"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"31.6B\",\n    \"parameters_raw\": 31577946256,\n    \"min_ram_gb\": 17.6,\n    \"recommended_ram_gb\": 29.4,\n    \"min_vram_gb\": 16.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron_h\",\n    \"hf_downloads\": 1412797,\n    \"hf_likes\": 289,\n    \"release_date\": \"2025-12-06\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LGAI-EXAONE/EXAONE-4.0-32B\",\n    \"provider\": \"LG AI\",\n    \"parameter_count\": \"32B\",\n    \"parameters_raw\": 32000000000,\n    \"min_ram_gb\": 17.9,\n    \"recommended_ram_gb\": 29.8,\n    \"min_vram_gb\": 16.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Hybrid reasoning, multilingual\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-07-15\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/EXAONE-4.0-32B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LGAI-EXAONE/EXAONE-4.0.1-32B\",\n    \"provider\": \"lgai-exaone\",\n    \"parameter_count\": \"32.0B\",\n    \"parameters_raw\": 32003216384,\n    \"min_ram_gb\": 17.9,\n    \"recommended_ram_gb\": 29.8,\n    \"min_vram_gb\": 16.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone4\",\n    \"hf_downloads\": 186516,\n    \"hf_likes\": 24,\n    \"release_date\": \"2025-07-29\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n       
 \"repo\": \"mradermacher/EXAONE-4.0.1-32B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"LGAI-EXAONE/EXAONE-4.0-32B-FP8\",\n    \"provider\": \"lgai-exaone\",\n    \"parameter_count\": \"32.0B\",\n    \"parameters_raw\": 32005105664,\n    \"min_ram_gb\": 17.9,\n    \"recommended_ram_gb\": 29.8,\n    \"min_vram_gb\": 16.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone4\",\n    \"hf_downloads\": 20430,\n    \"hf_likes\": 17,\n    \"release_date\": \"2025-07-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"allenai/OLMo-2-0325-32B-Instruct\",\n    \"provider\": \"allenai\",\n    \"parameter_count\": \"32.2B\",\n    \"parameters_raw\": 32234279936,\n    \"min_ram_gb\": 18.0,\n    \"recommended_ram_gb\": 30.0,\n    \"min_vram_gb\": 16.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"olmo2\",\n    \"hf_downloads\": 2979,\n    \"hf_likes\": 148,\n    \"release_date\": \"2025-03-12\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/OLMo-2-0325-32B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.5B\",\n    \"parameters_raw\": 32510000000,\n    \"min_ram_gb\": 18.2,\n    \"recommended_ram_gb\": 30.3,\n    \"min_vram_gb\": 16.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n  
    {\n        \"repo\": \"bartowski/Qwen2.5-32B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-32B-Chat\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.5B\",\n    \"parameters_raw\": 32512218112,\n    \"min_ram_gb\": 18.2,\n    \"recommended_ram_gb\": 30.3,\n    \"min_vram_gb\": 16.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 25041,\n    \"hf_likes\": 109,\n    \"release_date\": \"2024-04-03\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen1.5-32B-Chat-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nn-tech/MetalGPT-1\",\n    \"provider\": \"nn-tech\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32759593984,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 20663,\n    \"hf_likes\": 38,\n    \"release_date\": \"2025-12-04\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-32B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32762123264,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3\",\n    \"hf_downloads\": 552811,\n    
\"hf_likes\": 129,\n    \"release_date\": \"2025-05-01\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-32B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 858975,\n    \"hf_likes\": 2000,\n    \"release_date\": \"2024-11-06\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen2.5-Coder-32B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-32B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 873156,\n    \"hf_likes\": 1525,\n    \"release_date\": \"2025-01-20\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    
\"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1643600,\n    \"hf_likes\": 94,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 1453252,\n    \"hf_likes\": 173,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Qwen2.5-32B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-32B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 973260,\n    \"hf_likes\": 33,\n    \"release_date\": \"2024-11-09\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": 
\"Qwen/QwQ-32B-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 280279,\n    \"hf_likes\": 133,\n    \"release_date\": \"2025-03-05\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"GPTQ-Int4\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 191251,\n    \"hf_likes\": 40,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"baichuan-inc/Baichuan-M2-32B\",\n    \"provider\": \"baichuan-inc\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 152016,\n    \"hf_likes\": 118,\n    \"release_date\": \"2025-08-10\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Baichuan-M2-32B-GGUF\",\n        \"provider\": 
\"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"GPTQ-Int8\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 105034,\n    \"hf_likes\": 14,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-Coder-32B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"32.8B\",\n    \"parameters_raw\": 32763876352,\n    \"min_ram_gb\": 18.3,\n    \"recommended_ram_gb\": 30.5,\n    \"min_vram_gb\": 16.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 43109,\n    \"hf_likes\": 142,\n    \"release_date\": \"2024-11-08\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-Coder-32B-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"meta-llama/CodeLlama-34b-Instruct-hf\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"33.7B\",\n    \"parameters_raw\": 33743970304,\n    \"min_ram_gb\": 18.9,\n    \"recommended_ram_gb\": 31.4,\n    \"min_vram_gb\": 17.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    
\"hf_downloads\": 950,\n    \"hf_likes\": 19,\n    \"release_date\": \"2024-03-14\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/CodeLlama-34b-Instruct-hf-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"01-ai/Yi-34B-Chat\",\n    \"provider\": \"01.ai\",\n    \"parameter_count\": \"34.4B\",\n    \"parameters_raw\": 34386780160,\n    \"min_ram_gb\": 19.2,\n    \"recommended_ram_gb\": 32.0,\n    \"min_vram_gb\": 17.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Multilingual, Chinese/English chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"yi\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"TheBloke/Yi-34B-Chat-GGUF\",\n        \"provider\": \"TheBloke\"\n      },\n      {\n        \"repo\": \"mradermacher/Yi-34B-Chat-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"dphn/dolphin-2.9.1-yi-1.5-34b\",\n    \"provider\": \"dphn\",\n    \"parameter_count\": \"34.4B\",\n    \"parameters_raw\": 34388917248,\n    \"min_ram_gb\": 19.2,\n    \"recommended_ram_gb\": 32.0,\n    \"min_vram_gb\": 17.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 8192,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 4650971,\n    \"hf_likes\": 56,\n    \"release_date\": \"2024-05-18\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/dolphin-2.9.1-yi-1.5-34b-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"CohereForAI/c4ai-command-r-v01\",\n    \"provider\": \"Cohere\",\n    \"parameter_count\": \"35B\",\n    \"parameters_raw\": 35000000000,\n    \"min_ram_gb\": 19.5,\n    \"recommended_ram_gb\": 32.6,\n   
 \"min_vram_gb\": 17.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"RAG, tool use, agents\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"cohere\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/c4ai-command-r-v01-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-35B-A3B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"36.0B\",\n    \"parameters_raw\": 35951822704,\n    \"min_ram_gb\": 20.1,\n    \"recommended_ram_gb\": 33.5,\n    \"min_vram_gb\": 18.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5_moe\",\n    \"hf_downloads\": 769032,\n    \"hf_likes\": 905,\n    \"release_date\": \"2026-02-24\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 3000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3.5-35B-A3B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"lmstudio-community/Seed-OSS-36B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"36.2B\",\n    \"parameters_raw\": 36151104512,\n    \"min_ram_gb\": 20.2,\n    \"recommended_ram_gb\": 33.7,\n    \"min_vram_gb\": 18.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 524288,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"seed_oss\",\n    \"hf_downloads\": 46944,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-08-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": 
\"lmstudio-community/Seed-OSS-36B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"36.2B\",\n    \"parameters_raw\": 36151104512,\n    \"min_ram_gb\": 20.2,\n    \"recommended_ram_gb\": 33.7,\n    \"min_vram_gb\": 18.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 524288,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"seed_oss\",\n    \"hf_downloads\": 45348,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Seed-OSS-36B-Instruct-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"36.2B\",\n    \"parameters_raw\": 36151104512,\n    \"min_ram_gb\": 20.2,\n    \"recommended_ram_gb\": 33.7,\n    \"min_vram_gb\": 18.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 524288,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"seed_oss\",\n    \"hf_downloads\": 45061,\n    \"hf_likes\": 1,\n    \"release_date\": \"2025-08-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Seed-OSS-36B-Instruct-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"36.2B\",\n    \"parameters_raw\": 36151104512,\n    \"min_ram_gb\": 20.2,\n    \"recommended_ram_gb\": 33.7,\n    \"min_vram_gb\": 18.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 524288,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"seed_oss\",\n    \"hf_downloads\": 44971,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"cyankiwi/MiniMax-M2.1-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": 
\"36.8B\",\n    \"parameters_raw\": 36811839984,\n    \"min_ram_gb\": 20.6,\n    \"recommended_ram_gb\": 34.3,\n    \"min_vram_gb\": 18.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 36114,\n    \"hf_likes\": 16,\n    \"release_date\": \"2025-12-27\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 2933443495,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"cyankiwi/MiniMax-M2.5-AWQ-4bit\",\n    \"provider\": \"cyankiwi\",\n    \"parameter_count\": \"36.8B\",\n    \"parameters_raw\": 36811839984,\n    \"min_ram_gb\": 20.6,\n    \"recommended_ram_gb\": 34.3,\n    \"min_vram_gb\": 18.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 24338,\n    \"hf_likes\": 6,\n    \"release_date\": \"2026-02-15\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 2933443495,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"mratsim/MiniMax-M2.5-BF16-INT4-AWQ\",\n    \"provider\": \"mratsim\",\n    \"parameter_count\": \"39.1B\",\n    \"parameters_raw\": 39115692032,\n    \"min_ram_gb\": 21.9,\n    \"recommended_ram_gb\": 36.4,\n    \"min_vram_gb\": 20.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 46268,\n    \"hf_likes\": 29,\n    \"release_date\": \"2026-02-14\",\n    \"is_moe\": 
true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 3117031705,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"tiiuae/falcon-40b-instruct\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"40.0B\",\n    \"parameters_raw\": 40000000000,\n    \"min_ram_gb\": 22.4,\n    \"recommended_ram_gb\": 37.3,\n    \"min_vram_gb\": 20.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 2048,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/falcon-40b-instruct-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"mistralai/Mixtral-8x7B-Instruct-v0.1\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"46.7B\",\n    \"parameters_raw\": 46702792704,\n    \"min_ram_gb\": 26.1,\n    \"recommended_ram_gb\": 43.5,\n    \"min_vram_gb\": 23.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"unknown\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 787218,\n    \"hf_likes\": 4641,\n    \"release_date\": \"2023-12-10\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 12900000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF\",\n        \"provider\": \"TheBloke\"\n      },\n      {\n        \"repo\": \"mradermacher/Mixtral-8x7B-Instruct-v0.1-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Salesforce/xLAM-8x7b-r\",\n    \"provider\": \"salesforce\",\n    \"parameter_count\": \"46.7B\",\n    \"parameters_raw\": 46702792704,\n 
   \"min_ram_gb\": 26.1,\n    \"recommended_ram_gb\": 43.5,\n    \"min_vram_gb\": 23.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 25430,\n    \"hf_likes\": 15,\n    \"release_date\": \"2024-08-28\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 13427052901,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/xLAM-8x7b-r-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO\",\n    \"provider\": \"NousResearch\",\n    \"parameter_count\": \"46.7B\",\n    \"parameters_raw\": 46702809088,\n    \"min_ram_gb\": 26.1,\n    \"recommended_ram_gb\": 43.5,\n    \"min_vram_gb\": 23.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 9050,\n    \"hf_likes\": 453,\n    \"release_date\": \"2024-01-11\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 12900000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"TheBloke/Nous-Hermes-2-Mixtral-8x7B-DPO-GGUF\",\n        \"provider\": \"TheBloke\"\n      },\n      {\n        \"repo\": \"mradermacher/Nous-Hermes-2-Mixtral-8x7B-DPO-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"moonshotai/Kimi-Linear-48B-A3B-Instruct\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"49.1B\",\n    \"parameters_raw\": 49122681728,\n    \"min_ram_gb\": 27.4,\n    \"recommended_ram_gb\": 45.7,\n    \"min_vram_gb\": 25.2,\n    
\"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"kimi_linear\",\n    \"hf_downloads\": 35486,\n    \"hf_likes\": 546,\n    \"release_date\": \"2025-10-30\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Kimi-Linear-48B-A3B-Instruct-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/Llama-3_3-Nemotron-Super-49B-v1_5\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"49.9B\",\n    \"parameters_raw\": 49867145216,\n    \"min_ram_gb\": 27.9,\n    \"recommended_ram_gb\": 46.4,\n    \"min_vram_gb\": 25.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron-nas\",\n    \"hf_downloads\": 105079,\n    \"hf_likes\": 226,\n    \"release_date\": \"2025-07-25\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Llama-3_3-Nemotron-Super-49B-v1_5-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/Llama-3_3-Nemotron-Super-49B-v1\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"49.9B\",\n    \"parameters_raw\": 49867145216,\n    \"min_ram_gb\": 27.9,\n    \"recommended_ram_gb\": 46.4,\n    \"min_vram_gb\": 25.5,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"nemotron-nas\",\n    \"hf_downloads\": 23805,\n    \"hf_likes\": 320,\n    \"release_date\": \"2025-03-16\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"unsloth/Llama-3_3-Nemotron-Super-49B-v1-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"txn545/Qwen3.5-122B-A10B-NVFP4\",\n    \"provider\": \"txn545\",\n    \"parameter_count\": \"64.4B\",\n    \"parameters_raw\": 64354266864,\n    \"min_ram_gb\": 36.0,\n    \"recommended_ram_gb\": 59.9,\n    \"min_vram_gb\": 33.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_5_moe\",\n    \"hf_downloads\": 37707,\n    \"hf_likes\": 6,\n    \"release_date\": \"2026-02-24\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 5128230639,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.1-70B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 801189,\n    \"hf_likes\": 894,\n    \"release_date\": \"2024-07-16\"\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.3-70B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null,\n    
\"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Llama-3.3-70B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/Llama-3.3-70B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"casperhansen/llama-3.3-70b-instruct-awq\",\n    \"provider\": \"casperhansen\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 674865,\n    \"hf_likes\": 39,\n    \"release_date\": \"2024-12-06\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"kosbu/Llama-3.3-70B-Instruct-AWQ\",\n    \"provider\": \"kosbu\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 505688,\n    \"hf_likes\": 10,\n    \"release_date\": \"2024-12-06\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4\",\n    \"provider\": \"ibnzterrell\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    
\"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 138353,\n    \"hf_likes\": 30,\n    \"release_date\": \"2024-12-07\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 116205,\n    \"hf_likes\": 32,\n    \"release_date\": \"2024-07-31\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.1-70B\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 75498,\n    \"hf_likes\": 408,\n    \"release_date\": \"2024-07-14\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Meta-Llama-3-70B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 61023,\n    \"hf_likes\": 1506,\n    \"release_date\": \"2024-04-17\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Meta-Llama-3-70B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3\",\n    \"provider\": \"tokyotech-llm\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553706496,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 35321,\n    \"hf_likes\": 14,\n    \"release_date\": \"2024-12-25\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Meta-Llama-3.1-70B-Instruct-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70553707616,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 39962,\n    \"hf_likes\": 50,\n    \"release_date\": \"2024-07-23\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70560423936,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction 
following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 42062,\n    \"hf_likes\": 14,\n    \"release_date\": \"2024-12-11\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"70.6B\",\n    \"parameters_raw\": 70560423936,\n    \"min_ram_gb\": 39.4,\n    \"recommended_ram_gb\": 65.7,\n    \"min_vram_gb\": 36.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 26238,\n    \"hf_likes\": 10,\n    \"release_date\": \"2025-02-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LLM360/K2-Think-V2\",\n    \"provider\": \"llm360\",\n    \"parameter_count\": \"72.6B\",\n    \"parameters_raw\": 72550195200,\n    \"min_ram_gb\": 40.5,\n    \"recommended_ram_gb\": 67.6,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 53839,\n    \"hf_likes\": 23,\n    \"release_date\": \"2026-01-08\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-72B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": 
\"qwen2\",\n    \"hf_downloads\": 558153,\n    \"hf_likes\": 916,\n    \"release_date\": \"2024-09-16\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2.5-72B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-72B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 45193,\n    \"hf_likes\": 89,\n    \"release_date\": \"2024-09-15\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Qwen2.5-72B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2-72B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 40930,\n    \"hf_likes\": 719,\n    \"release_date\": \"2024-05-28\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/Qwen2-72B-Instruct-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen2-72B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 
37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 34455,\n    \"hf_likes\": 200,\n    \"release_date\": \"2024-05-22\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"mradermacher/Qwen2-72B-GGUF\",\n        \"provider\": \"mradermacher\"\n      }\n    ]\n  },\n  {\n    \"name\": \"huihui-ai/Qwen2.5-72B-Instruct-abliterated\",\n    \"provider\": \"huihui-ai\",\n    \"parameter_count\": \"72.7B\",\n    \"parameters_raw\": 72706203648,\n    \"min_ram_gb\": 40.6,\n    \"recommended_ram_gb\": 67.7,\n    \"min_vram_gb\": 37.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 20754,\n    \"hf_likes\": 35,\n    \"release_date\": \"2024-10-26\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-72B-Instruct-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"73.0B\",\n    \"parameters_raw\": 72957861888,\n    \"min_ram_gb\": 40.8,\n    \"recommended_ram_gb\": 67.9,\n    \"min_vram_gb\": 37.4,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 922364,\n    \"hf_likes\": 75,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"73.0B\",\n    \"parameters_raw\": 72957861888,\n    \"min_ram_gb\": 40.8,\n  
  \"recommended_ram_gb\": 67.9,\n    \"min_vram_gb\": 37.4,\n    \"quantization\": \"GPTQ-Int8\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 42593,\n    \"hf_likes\": 28,\n    \"release_date\": \"2024-09-17\",\n    \"_discovered\": true,\n    \"format\": \"gptq\"\n  },\n  {\n    \"name\": \"NexVeridian/Qwen3-Coder-Next-8bit\",\n    \"provider\": \"nexveridian\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 300258,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-03\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Next-80B-A3B-Instruct-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 48644,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-09-15\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    
\"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Next-80B-A3B-Instruct-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 48355,\n    \"hf_likes\": 2,\n    \"release_date\": \"2025-09-15\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Next-80B-A3B-Instruct-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 47109,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-09-15\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/Qwen3-Next-80B-A3B-Instruct-MLX-5bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674388992,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction 
following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 47029,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-09-15\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462052829,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-Next\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79674391296,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 484455,\n    \"hf_likes\": 976,\n    \"release_date\": \"2026-01-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-Next-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"79.7B\",\n    \"parameters_raw\": 79679212800,\n    \"min_ram_gb\": 44.5,\n    \"recommended_ram_gb\": 74.2,\n    \"min_vram_gb\": 40.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 398505,\n    \"hf_likes\": 100,\n    \"release_date\": \"2026-02-01\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5462383530,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-Next\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"80B\",\n    \"parameters_raw\": 80000000000,\n    \"min_ram_gb\": 44.8,\n    \"recommended_ram_gb\": 74.6,\n    \"min_vram_gb\": 41.0,\n    \"quantization\": 
\"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation, agentic coding\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 4,\n    \"active_parameters\": 3000000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-01-30\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-Coder-Next-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Next-80B-A3B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"81.3B\",\n    \"parameters_raw\": 81324862720,\n    \"min_ram_gb\": 45.4,\n    \"recommended_ram_gb\": 75.7,\n    \"min_vram_gb\": 41.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 1224711,\n    \"hf_likes\": 945,\n    \"release_date\": \"2025-09-09\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5575200546,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Next-80B-A3B-Instruct-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"81.3B\",\n    \"parameters_raw\": 81329784384,\n    \"min_ram_gb\": 45.4,\n    \"recommended_ram_gb\": 75.7,\n    \"min_vram_gb\": 41.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_next\",\n    \"hf_downloads\": 148887,\n    
\"hf_likes\": 82,\n    \"release_date\": \"2025-09-22\",\n    \"is_moe\": true,\n    \"num_experts\": 512,\n    \"active_experts\": 10,\n    \"active_parameters\": 5575537949,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen1.5-110B-Chat-AWQ\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"111.2B\",\n    \"parameters_raw\": 111209914368,\n    \"min_ram_gb\": 62.1,\n    \"recommended_ram_gb\": 103.6,\n    \"min_vram_gb\": 57.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 32768,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen2\",\n    \"hf_downloads\": 320397,\n    \"hf_likes\": 9,\n    \"release_date\": \"2024-04-27\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"lmstudio-community/gpt-oss-120b-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"116.8B\",\n    \"parameters_raw\": 116829154368,\n    \"min_ram_gb\": 65.3,\n    \"recommended_ram_gb\": 108.8,\n    \"min_vram_gb\": 59.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 61730,\n    \"hf_likes\": 12,\n    \"release_date\": \"2025-08-05\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 4,\n    \"active_parameters\": 9309823238,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"axolotl-ai-co/gpt-oss-120b-dequantized\",\n    \"provider\": \"axolotl-ai-co\",\n    \"parameter_count\": \"116.8B\",\n    \"parameters_raw\": 116829156672,\n    \"min_ram_gb\": 65.3,\n    \"recommended_ram_gb\": 108.8,\n    \"min_vram_gb\": 59.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    
\"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 34254,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-08-07\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 4,\n    \"active_parameters\": 9309823421,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"openai/gpt-oss-120b\",\n    \"provider\": \"openai\",\n    \"parameter_count\": \"120.4B\",\n    \"parameters_raw\": 120412337472,\n    \"min_ram_gb\": 67.3,\n    \"recommended_ram_gb\": 112.1,\n    \"min_vram_gb\": 61.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"gpt_oss\",\n    \"hf_downloads\": 4194966,\n    \"hf_likes\": 4542,\n    \"release_date\": \"2025-08-04\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 4,\n    \"active_parameters\": 9595358141,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/gpt-oss-120b-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-122B-A10B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"125.1B\",\n    \"parameters_raw\": 125086497008,\n    \"min_ram_gb\": 69.9,\n    \"recommended_ram_gb\": 116.5,\n    \"min_vram_gb\": 64.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5_moe\",\n    \"hf_downloads\": 171055,\n    \"hf_likes\": 389,\n    \"release_date\": \"2026-02-24\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 10000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": 
\"unsloth/Qwen3.5-122B-A10B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"mistralai/Mixtral-8x22B-Instruct-v0.1\",\n    \"provider\": \"Mistral AI\",\n    \"parameter_count\": \"140.6B\",\n    \"parameters_raw\": 140630071296,\n    \"min_ram_gb\": 78.6,\n    \"recommended_ram_gb\": 131.0,\n    \"min_vram_gb\": 72.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"unknown\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 15022,\n    \"hf_likes\": 746,\n    \"release_date\": \"2024-04-16\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 39100000000\n  },\n  {\n    \"name\": \"MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-AWQ\",\n    \"provider\": \"maziyarpanahi\",\n    \"parameter_count\": \"140.6B\",\n    \"parameters_raw\": 140630071296,\n    \"min_ram_gb\": 78.6,\n    \"recommended_ram_gb\": 131.0,\n    \"min_vram_gb\": 72.0,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 65536,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 40221,\n    \"hf_likes\": 13,\n    \"release_date\": \"2024-04-18\",\n    \"is_moe\": true,\n    \"num_experts\": 8,\n    \"active_experts\": 2,\n    \"active_parameters\": 40431145496,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"rednote-hilab/dots.llm1.inst\",\n    \"provider\": \"rednote-hilab\",\n    \"parameter_count\": \"142.8B\",\n    \"parameters_raw\": 142774381696,\n    \"min_ram_gb\": 79.8,\n    \"recommended_ram_gb\": 133.0,\n    \"min_vram_gb\": 73.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 32768,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": 
[],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"dots1\",\n    \"hf_downloads\": 5040,\n    \"hf_likes\": 175,\n    \"release_date\": \"2025-05-14\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/dots.llm1.inst-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"bigscience/bloom\",\n    \"provider\": \"bigscience\",\n    \"parameter_count\": \"176.2B\",\n    \"parameters_raw\": 176247271424,\n    \"min_ram_gb\": 98.5,\n    \"recommended_ram_gb\": 164.1,\n    \"min_vram_gb\": 90.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"bloom\",\n    \"hf_downloads\": 4896,\n    \"hf_likes\": 4986,\n    \"release_date\": \"2022-05-19\"\n  },\n  {\n    \"name\": \"tiiuae/falcon-180B-chat\",\n    \"provider\": \"TII\",\n    \"parameter_count\": \"179.5B\",\n    \"parameters_raw\": 179522565120,\n    \"min_ram_gb\": 100.3,\n    \"recommended_ram_gb\": 167.2,\n    \"min_vram_gb\": 92.0,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"falcon\",\n    \"hf_downloads\": 65,\n    \"hf_likes\": 545,\n    \"release_date\": \"2023-09-04\"\n  },\n  {\n    \"name\": \"stepfun-ai/Step-3.5-Flash\",\n    \"provider\": \"stepfun-ai\",\n    \"parameter_count\": \"199.4B\",\n    \"parameters_raw\": 199384301376,\n    \"min_ram_gb\": 111.4,\n    \"recommended_ram_gb\": 185.7,\n    \"min_vram_gb\": 102.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"step3p5\",\n    \"hf_downloads\": 327178,\n    \"hf_likes\": 674,\n    
\"release_date\": \"2026-02-01\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/MiniMax-M2.5-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689748992,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 112426,\n    \"hf_likes\": 1,\n    \"release_date\": \"2026-02-13\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223714369,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/MiniMax-M2.5-MLX-4bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689748992,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 105419,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-13\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223714369,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/MiniMax-M2.5-MLX-6bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689748992,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": 
[],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 103821,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-02-13\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223714369,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"lmstudio-community/MiniMax-M2-MLX-8bit\",\n    \"provider\": \"lmstudio-community\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689748992,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax\",\n    \"hf_downloads\": 19959,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-10-29\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223714369,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"QuantTrio/MiniMax-M2-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689764864,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mixtral\",\n    \"hf_downloads\": 586558,\n    \"hf_likes\": 8,\n    \"release_date\": \"2025-10-28\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223715635,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"QuantTrio/MiniMax-M2.5-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228689764864,\n   
 \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 45340,\n    \"hf_likes\": 10,\n    \"release_date\": \"2026-02-15\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18223715635,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"MiniMaxAI/MiniMax-M2.7\",\n    \"provider\": \"minimaxai\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228703644928,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Latest flagship with enhanced reasoning and coding\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2026-03-18\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 10000000000\n  },\n  {\n    \"name\": \"MiniMaxAI/MiniMax-M2.5\",\n    \"provider\": \"minimaxai\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228703644928,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 343848,\n    \"hf_likes\": 1080,\n    \"release_date\": \"2026-02-12\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 
10000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/MiniMax-M2.5-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"MiniMaxAI/MiniMax-M2\",\n    \"provider\": \"minimaxai\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228703644928,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 275243,\n    \"hf_likes\": 1485,\n    \"release_date\": \"2025-10-22\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18224821702,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/MiniMax-M2-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"MiniMaxAI/MiniMax-M2.1\",\n    \"provider\": \"minimaxai\",\n    \"parameter_count\": \"228.7B\",\n    \"parameters_raw\": 228703644928,\n    \"min_ram_gb\": 127.8,\n    \"recommended_ram_gb\": 213.0,\n    \"min_vram_gb\": 117.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 196608,\n    \"use_case\": \"Lightweight, edge deployment\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"minimax_m2\",\n    \"hf_downloads\": 72189,\n    \"hf_likes\": 1257,\n    \"release_date\": \"2025-12-20\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 18224821702,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/MiniMax-M2.1-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-235B-A22B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"235.1B\",\n  
  \"parameters_raw\": 235093634560,\n    \"min_ram_gb\": 131.4,\n    \"recommended_ram_gb\": 218.9,\n    \"min_vram_gb\": 120.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 684371,\n    \"hf_likes\": 1077,\n    \"release_date\": \"2025-04-27\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 22000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Qwen3-235B-A22B-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"Qwen/Qwen3-235B-A22B-Instruct-2507-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"235.1B\",\n    \"parameters_raw\": 235107904512,\n    \"min_ram_gb\": 131.4,\n    \"recommended_ram_gb\": 219.0,\n    \"min_vram_gb\": 120.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 802366,\n    \"hf_likes\": 146,\n    \"release_date\": \"2025-07-21\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 25714927049,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-235B-A22B-Thinking-2507-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"235.1B\",\n    \"parameters_raw\": 235107904512,\n    \"min_ram_gb\": 131.4,\n    \"recommended_ram_gb\": 219.0,\n    \"min_vram_gb\": 120.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": 
\"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 77936,\n    \"hf_likes\": 83,\n    \"release_date\": \"2025-07-25\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 25714927049,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-235B-A22B-FP8\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"235.1B\",\n    \"parameters_raw\": 235107904512,\n    \"min_ram_gb\": 131.4,\n    \"recommended_ram_gb\": 219.0,\n    \"min_vram_gb\": 120.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 40960,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 32322,\n    \"hf_likes\": 90,\n    \"release_date\": \"2025-04-28\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 25714927049,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"casperhansen/deepseek-coder-v2-instruct-awq\",\n    \"provider\": \"casperhansen\",\n    \"parameter_count\": \"235.7B\",\n    \"parameters_raw\": 235741434880,\n    \"min_ram_gb\": 131.7,\n    \"recommended_ram_gb\": 219.6,\n    \"min_vram_gb\": 120.8,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 163840,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 155456,\n    \"hf_likes\": 11,\n    \"release_date\": \"2024-07-03\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 32782793288,\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V2.5\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"235.7B\",\n    \"parameters_raw\": 
235741434880,\n    \"min_ram_gb\": 131.7,\n    \"recommended_ram_gb\": 219.6,\n    \"min_vram_gb\": 120.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 84805,\n    \"hf_likes\": 733,\n    \"release_date\": \"2024-09-05\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 32782793288,\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"bartowski/DeepSeek-V2.5-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"RedHatAI/DeepSeek-V2.5-1210-FP8\",\n    \"provider\": \"redhatai\",\n    \"parameter_count\": \"235.7B\",\n    \"parameters_raw\": 235741492480,\n    \"min_ram_gb\": 131.7,\n    \"recommended_ram_gb\": 219.6,\n    \"min_vram_gb\": 120.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v2\",\n    \"hf_downloads\": 54313,\n    \"hf_likes\": 4,\n    \"release_date\": \"2025-01-04\",\n    \"is_moe\": true,\n    \"num_experts\": 64,\n    \"active_experts\": 6,\n    \"active_parameters\": 32782801298,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"LGAI-EXAONE/K-EXAONE-236B-A23B\",\n    \"provider\": \"lgai-exaone\",\n    \"parameter_count\": \"237.1B\",\n    \"parameters_raw\": 237099669632,\n    \"min_ram_gb\": 132.5,\n    \"recommended_ram_gb\": 220.8,\n    \"min_vram_gb\": 121.4,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"exaone_moe\",\n    \"hf_downloads\": 23695,\n    
\"hf_likes\": 549,\n    \"release_date\": \"2025-12-26\",\n    \"is_moe\": true,\n    \"num_experts\": 128,\n    \"active_experts\": 8,\n    \"active_parameters\": 25932776361,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"baidu/ERNIE-4.5-300B-A47B-Paddle\",\n    \"provider\": \"baidu\",\n    \"parameter_count\": \"300.5B\",\n    \"parameters_raw\": 300474051776,\n    \"min_ram_gb\": 167.9,\n    \"recommended_ram_gb\": 279.8,\n    \"min_vram_gb\": 153.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"ernie4_5_moe\",\n    \"hf_downloads\": 332,\n    \"hf_likes\": 12,\n    \"release_date\": \"2025-06-28\"\n  },\n  {\n    \"name\": \"XiaomiMiMo/MiMo-V2-Flash\",\n    \"provider\": \"xiaomimimo\",\n    \"parameter_count\": \"309.8B\",\n    \"parameters_raw\": 309785318400,\n    \"min_ram_gb\": 173.1,\n    \"recommended_ram_gb\": 288.5,\n    \"min_vram_gb\": 158.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"mimo_v2_flash\",\n    \"hf_downloads\": 536830,\n    \"hf_likes\": 636,\n    \"release_date\": \"2025-12-16\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/MiMo-V2-Flash-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zai-org/GLM-4.6\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"356.8B\",\n    \"parameters_raw\": 356785898816,\n    \"min_ram_gb\": 199.4,\n    \"recommended_ram_gb\": 332.3,\n    \"min_vram_gb\": 182.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": 
\"glm4_moe\",\n    \"hf_downloads\": 81982,\n    \"hf_likes\": 1204,\n    \"release_date\": \"2025-09-29\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/GLM-4.6-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"zai-org/GLM-4.5\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"358.3B\",\n    \"parameters_raw\": 358337791296,\n    \"min_ram_gb\": 200.2,\n    \"recommended_ram_gb\": 333.7,\n    \"min_vram_gb\": 183.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm4_moe\",\n    \"hf_downloads\": 42566,\n    \"hf_likes\": 1396,\n    \"release_date\": \"2025-07-20\",\n    \"_discovered\": true,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/GLM-4.5-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-R1-0528-NVFP4-v2\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"393.6B\",\n    \"parameters_raw\": 393632819968,\n    \"min_ram_gb\": 220.0,\n    \"recommended_ram_gb\": 366.6,\n    \"min_vram_gb\": 201.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 142525,\n    \"hf_likes\": 16,\n    \"release_date\": \"2025-07-21\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 31367615334,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-V3.1-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"393.6B\",\n    \"parameters_raw\": 393632819968,\n    \"min_ram_gb\": 220.0,\n    \"recommended_ram_gb\": 366.6,\n    \"min_vram_gb\": 
201.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 37723,\n    \"hf_likes\": 13,\n    \"release_date\": \"2025-11-21\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 31367615334,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-V3.2-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"394.5B\",\n    \"parameters_raw\": 394498304256,\n    \"min_ram_gb\": 220.4,\n    \"recommended_ram_gb\": 367.4,\n    \"min_vram_gb\": 202.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v32\",\n    \"hf_downloads\": 21598,\n    \"hf_likes\": 7,\n    \"release_date\": \"2025-12-30\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-V3-0324-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"396.8B\",\n    \"parameters_raw\": 396767013632,\n    \"min_ram_gb\": 221.7,\n    \"recommended_ram_gb\": 369.5,\n    \"min_vram_gb\": 203.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 84851,\n    \"hf_likes\": 14,\n    \"release_date\": \"2025-05-03\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 31617371393,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"nvidia/DeepSeek-R1-NVFP4\",\n    \"provider\": \"nvidia\",\n    \"parameter_count\": \"396.8B\",\n    \"parameters_raw\": 396767013632,\n    
\"min_ram_gb\": 221.7,\n    \"recommended_ram_gb\": 369.5,\n    \"min_vram_gb\": 203.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 43986,\n    \"hf_likes\": 271,\n    \"release_date\": \"2025-02-21\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 31617371393,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"meta-llama/Llama-4-Maverick-17B-128E-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"401.6B\",\n    \"parameters_raw\": 401583781376,\n    \"min_ram_gb\": 224.4,\n    \"recommended_ram_gb\": 374.0,\n    \"min_vram_gb\": 205.7,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"vision\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"llama4\",\n    \"hf_downloads\": 6341,\n    \"hf_likes\": 466,\n    \"release_date\": \"2025-04-01\",\n    \"is_moe\": true,\n    \"num_experts\": 16,\n    \"active_experts\": 1,\n    \"active_parameters\": 17000000000\n  },\n  {\n    \"name\": \"Qwen/Qwen3.5-397B-A17B\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"403.4B\",\n    \"parameters_raw\": 403397928944,\n    \"min_ram_gb\": 225.4,\n    \"recommended_ram_gb\": 375.7,\n    \"min_vram_gb\": 206.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\",\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"qwen3_5_moe\",\n    \"hf_downloads\": 1291825,\n    \"hf_likes\": 1214,\n    \"release_date\": \"2026-02-16\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 
8,\n    \"active_parameters\": 17000000000\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.1-405B-Instruct\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"405.9B\",\n    \"parameters_raw\": 405853388800,\n    \"min_ram_gb\": 226.8,\n    \"recommended_ram_gb\": 378.0,\n    \"min_vram_gb\": 207.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 173410,\n    \"hf_likes\": 592,\n    \"release_date\": \"2024-07-16\"\n  },\n  {\n    \"name\": \"meta-llama/Llama-3.1-405B-Instruct-FP8\",\n    \"provider\": \"Meta\",\n    \"parameter_count\": \"405.9B\",\n    \"parameters_raw\": 405868625920,\n    \"min_ram_gb\": 226.8,\n    \"recommended_ram_gb\": 378.0,\n    \"min_vram_gb\": 207.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 4096,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"llama\",\n    \"hf_downloads\": 22040,\n    \"hf_likes\": 193,\n    \"release_date\": \"2024-07-20\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"Qwen/Qwen3-Coder-480B-A35B-Instruct\",\n    \"provider\": \"Alibaba\",\n    \"parameter_count\": \"480.2B\",\n    \"parameters_raw\": 480154875392,\n    \"min_ram_gb\": 268.3,\n    \"recommended_ram_gb\": 447.2,\n    \"min_vram_gb\": 245.9,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Code generation and completion\",\n    \"capabilities\": [\n      \"tool_use\"\n    ],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"qwen3_moe\",\n    \"hf_downloads\": 75486,\n    \"hf_likes\": 1304,\n    \"release_date\": \"2025-07-22\",\n    \"is_moe\": true,\n    \"num_experts\": 160,\n    \"active_experts\": 8,\n    
\"active_parameters\": 35000000000\n  },\n  {\n    \"name\": \"meituan-longcat/LongCat-Flash-Chat\",\n    \"provider\": \"meituan-longcat\",\n    \"parameter_count\": \"561.9B\",\n    \"parameters_raw\": 561862880256,\n    \"min_ram_gb\": 314.0,\n    \"recommended_ram_gb\": 523.3,\n    \"min_vram_gb\": 287.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"unknown\",\n    \"hf_downloads\": 30116,\n    \"hf_likes\": 526,\n    \"release_date\": \"2025-08-29\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"684.5B\",\n    \"parameters_raw\": 684531386000,\n    \"min_ram_gb\": 382.5,\n    \"recommended_ram_gb\": 637.5,\n    \"min_vram_gb\": 350.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 1026085,\n    \"hf_likes\": 13108,\n    \"release_date\": \"2025-01-20\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 37000000000,\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/DeepSeek-R1-GGUF\",\n        \"provider\": \"unsloth\"\n      },\n      {\n        \"repo\": \"bartowski/DeepSeek-R1-GGUF\",\n        \"provider\": \"bartowski\"\n      }\n    ]\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-R1-0528\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"684.5B\",\n    \"parameters_raw\": 684531386000,\n    \"min_ram_gb\": 382.5,\n    \"recommended_ram_gb\": 637.5,\n    \"min_vram_gb\": 350.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"Advanced reasoning, 
chain-of-thought\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 1050237,\n    \"hf_likes\": 2403,\n    \"release_date\": \"2025-05-28\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 54548594820,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V3-0324\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"684.5B\",\n    \"parameters_raw\": 684531386000,\n    \"min_ram_gb\": 382.5,\n    \"recommended_ram_gb\": 637.5,\n    \"min_vram_gb\": 350.6,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"hf_downloads\": 270362,\n    \"hf_likes\": 3088,\n    \"release_date\": \"2025-03-24\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 54548594820,\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V3\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"685B\",\n    \"parameters_raw\": 685000000000,\n    \"min_ram_gb\": 382.8,\n    \"recommended_ram_gb\": 638.0,\n    \"min_vram_gb\": 351.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"State-of-the-art, MoE architecture\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 37000000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": null\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V3.2-Speciale\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"685B\",\n    \"parameters_raw\": 685000000000,\n    \"min_ram_gb\": 383.2,\n    
\"recommended_ram_gb\": 638.7,\n    \"min_vram_gb\": 351.3,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Advanced reasoning, chain-of-thought\",\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v3\",\n    \"is_moe\": true,\n    \"num_experts\": 256,\n    \"active_experts\": 8,\n    \"active_parameters\": 37000000000,\n    \"hf_downloads\": 0,\n    \"hf_likes\": 0,\n    \"release_date\": \"2025-12-01\"\n  },\n  {\n    \"name\": \"QuantTrio/DeepSeek-V3.2-AWQ\",\n    \"provider\": \"quanttrio\",\n    \"parameter_count\": \"685.0B\",\n    \"parameters_raw\": 685011996928,\n    \"min_ram_gb\": 382.8,\n    \"recommended_ram_gb\": 638.0,\n    \"min_vram_gb\": 350.9,\n    \"quantization\": \"AWQ-4bit\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v32\",\n    \"hf_downloads\": 103286,\n    \"hf_likes\": 11,\n    \"release_date\": \"2025-12-03\",\n    \"_discovered\": true,\n    \"format\": \"awq\"\n  },\n  {\n    \"name\": \"deepseek-ai/DeepSeek-V3.2\",\n    \"provider\": \"DeepSeek\",\n    \"parameter_count\": \"685.4B\",\n    \"parameters_raw\": 685396921376,\n    \"min_ram_gb\": 383.0,\n    \"recommended_ram_gb\": 638.3,\n    \"min_vram_gb\": 351.1,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 163840,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"deepseek_v32\",\n    \"hf_downloads\": 362520,\n    \"hf_likes\": 1280,\n    \"release_date\": \"2025-12-01\"\n  },\n  {\n    \"name\": \"zai-org/GLM-5\",\n    \"provider\": \"zai-org\",\n    \"parameter_count\": \"753.9B\",\n    \"parameters_raw\": 753864139008,\n    \"min_ram_gb\": 421.3,\n    \"recommended_ram_gb\": 702.1,\n    \"min_vram_gb\": 386.1,\n    \"quantization\": \"Q4_K_M\",\n  
  \"context_length\": 202752,\n    \"use_case\": \"General purpose text generation\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"glm_moe_dsa\",\n    \"hf_downloads\": 205187,\n    \"hf_likes\": 1698,\n    \"release_date\": \"2026-02-11\"\n  },\n  {\n    \"name\": \"moonshotai/Kimi-K2-Instruct\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"1026.5B\",\n    \"parameters_raw\": 1026470731056,\n    \"min_ram_gb\": 573.6,\n    \"recommended_ram_gb\": 956.0,\n    \"min_vram_gb\": 525.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 131072,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"kimi_k2\",\n    \"hf_downloads\": 151155,\n    \"hf_likes\": 2324,\n    \"release_date\": \"2025-07-11\"\n  },\n  {\n    \"name\": \"moonshotai/Kimi-K2-Instruct-0905\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"1026.5B\",\n    \"parameters_raw\": 1026470735448,\n    \"min_ram_gb\": 573.6,\n    \"recommended_ram_gb\": 956.0,\n    \"min_vram_gb\": 525.8,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"Instruction following, chat\",\n    \"capabilities\": [],\n    \"pipeline_tag\": \"text-generation\",\n    \"architecture\": \"kimi_k2\",\n    \"hf_downloads\": 28801,\n    \"hf_likes\": 683,\n    \"release_date\": \"2025-09-03\",\n    \"_discovered\": true\n  },\n  {\n    \"name\": \"moonshotai/Kimi-K2.5\",\n    \"provider\": \"moonshotai\",\n    \"parameter_count\": \"1058.6B\",\n    \"parameters_raw\": 1058589420528,\n    \"min_ram_gb\": 591.5,\n    \"recommended_ram_gb\": 985.9,\n    \"min_vram_gb\": 542.2,\n    \"quantization\": \"Q4_K_M\",\n    \"context_length\": 262144,\n    \"use_case\": \"General purpose\",\n    \"capabilities\": [\n      \"vision\"\n    ],\n    \"pipeline_tag\": \"image-text-to-text\",\n    \"architecture\": \"kimi_k25\",\n 
   \"hf_downloads\": 1899549,\n    \"hf_likes\": 2220,\n    \"release_date\": \"2026-01-01\",\n    \"gguf_sources\": [\n      {\n        \"repo\": \"unsloth/Kimi-K2.5-GGUF\",\n        \"provider\": \"unsloth\"\n      }\n    ]\n  }\n]"
  },
  {
    "path": "llmfit-core/src/fit.rs",
    "content": "use crate::hardware::{GpuBackend, SystemSpecs};\nuse crate::models::{self, LlmModel, UseCase};\n\n/// Inference runtime — the software framework used for inference.\n/// Orthogonal to `GpuBackend` which represents hardware.\n#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize)]\npub enum InferenceRuntime {\n    LlamaCpp, // llama.cpp / Ollama\n    Mlx,      // Apple MLX framework\n    Vllm,     // vLLM (for AWQ/GPTQ pre-quantized models)\n}\n\nimpl InferenceRuntime {\n    pub fn label(&self) -> &'static str {\n        match self {\n            InferenceRuntime::LlamaCpp => \"llama.cpp\",\n            InferenceRuntime::Mlx => \"MLX\",\n            InferenceRuntime::Vllm => \"vLLM\",\n        }\n    }\n}\n\n/// Column to sort model fits by in the TUI/UI.\n#[derive(Debug, Clone, Copy, PartialEq, Eq)]\npub enum SortColumn {\n    Score,\n    Tps,\n    Params,\n    MemPct,\n    Ctx,\n    ReleaseDate,\n    UseCase,\n}\n\nimpl SortColumn {\n    pub fn label(&self) -> &str {\n        match self {\n            SortColumn::Score => \"Score\",\n            SortColumn::Tps => \"tok/s\",\n            SortColumn::Params => \"Params\",\n            SortColumn::MemPct => \"Mem%\",\n            SortColumn::Ctx => \"Ctx\",\n            SortColumn::ReleaseDate => \"Date\",\n            SortColumn::UseCase => \"Use\",\n        }\n    }\n\n    pub fn next(&self) -> Self {\n        match self {\n            SortColumn::Score => SortColumn::Tps,\n            SortColumn::Tps => SortColumn::Params,\n            SortColumn::Params => SortColumn::MemPct,\n            SortColumn::MemPct => SortColumn::Ctx,\n            SortColumn::Ctx => SortColumn::ReleaseDate,\n            SortColumn::ReleaseDate => SortColumn::UseCase,\n            SortColumn::UseCase => SortColumn::Score,\n        }\n    }\n}\n\n/// Memory fit -- does the model fit in the available memory pool?\n/// Perfect requires GPU acceleration. 
CPU offload caps at Good; CPU-only caps at Marginal.\n#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize)]\npub enum FitLevel {\n    Perfect,  // Recommended memory met on GPU\n    Good,     // Fits with headroom (GPU tight, or CPU comfortable)\n    Marginal, // Minimum memory met but tight\n    TooTight, // Does not fit in available memory\n}\n\n/// Execution path -- how will inference run?\n/// This is the \"optimization\" dimension, independent of memory fit.\n#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize)]\npub enum RunMode {\n    Gpu,        // Fully loaded into VRAM -- fast\n    MoeOffload, // MoE: active experts in VRAM, inactive offloaded to RAM\n    CpuOffload, // Partial GPU offload, spills to system RAM -- mixed\n    CpuOnly,    // Entirely in system RAM, no GPU -- slow\n}\n\n/// Multi-dimensional score components (0-100 each).\n#[derive(Debug, Clone, Copy, serde::Serialize)]\npub struct ScoreComponents {\n    /// Quality: model family reputation + param count + quant penalty + task alignment.\n    pub quality: f64,\n    /// Speed: estimated tokens/sec normalized to 0-100.\n    pub speed: f64,\n    /// Fit: memory utilization efficiency (closer to filling without exceeding = higher).\n    pub fit: f64,\n    /// Context: context window capability vs reasonable target.\n    pub context: f64,\n}\n\n#[derive(Clone, serde::Serialize)]\npub struct ModelFit {\n    pub model: LlmModel,\n    pub fit_level: FitLevel,\n    pub run_mode: RunMode,\n    pub memory_required_gb: f64, // the memory that matters for this run mode\n    pub memory_available_gb: f64, // the memory pool being used\n    pub utilization_pct: f64,    // memory_required / memory_available * 100\n    pub notes: Vec<String>,\n    pub moe_offloaded_gb: Option<f64>, // GB of inactive experts offloaded to RAM\n    pub score: f64,                    // weighted composite score 0-100\n    pub score_components: ScoreComponents,\n    pub estimated_tps: f64,        // baseline estimated tokens per second\n    
pub best_quant: String,        // best quantization for this hardware\n    pub use_case: UseCase,         // inferred use case category\n    pub runtime: InferenceRuntime, // inference runtime (MLX or llama.cpp)\n    pub installed: bool,           // model found in a local runtime provider\n}\n\nimpl ModelFit {\n    pub fn analyze(model: &LlmModel, system: &SystemSpecs) -> Self {\n        Self::analyze_with_context_limit(model, system, None)\n    }\n\n    pub fn analyze_with_context_limit(\n        model: &LlmModel,\n        system: &SystemSpecs,\n        context_limit: Option<u32>,\n    ) -> Self {\n        Self::analyze_inner(model, system, context_limit, None)\n    }\n\n    /// Analyze with an optional runtime override. When `force_runtime` is\n    /// `Some`, the automatic runtime selection (which prefers MLX on Apple\n    /// Silicon) is bypassed so the caller can request e.g. llama.cpp results\n    /// even on a Metal system.  Pre-quantized models always use vLLM\n    /// regardless of the override.\n    pub fn analyze_with_forced_runtime(\n        model: &LlmModel,\n        system: &SystemSpecs,\n        context_limit: Option<u32>,\n        force_runtime: Option<InferenceRuntime>,\n    ) -> Self {\n        Self::analyze_inner(model, system, context_limit, force_runtime)\n    }\n\n    fn analyze_inner(\n        model: &LlmModel,\n        system: &SystemSpecs,\n        context_limit: Option<u32>,\n        force_runtime: Option<InferenceRuntime>,\n    ) -> Self {\n        let mut notes = Vec::new();\n        let estimation_ctx = context_limit\n            .map(|limit| limit.min(model.context_length))\n            .unwrap_or(model.context_length);\n\n        let min_vram = model.min_vram_gb.unwrap_or(model.min_ram_gb);\n        let use_case = UseCase::from_model(model);\n        let default_mem_required =\n            model.estimate_memory_gb(model.quantization.as_str(), estimation_ctx);\n        if estimation_ctx < model.context_length {\n            
notes.push(format!(\n                \"Context capped for estimation: {} -> {} tokens\",\n                model.context_length, estimation_ctx\n            ));\n        }\n\n        // Determine inference runtime up front so path selection can use\n        // the correct quantization hierarchy.\n        // Pre-quantized models always use vLLM; otherwise honour the\n        // force_runtime override if provided, falling back to auto-detect.\n        let runtime = if model.is_prequantized() {\n            InferenceRuntime::Vllm\n        } else if let Some(forced) = force_runtime {\n            forced\n        } else if system.backend == GpuBackend::Metal && system.unified_memory {\n            InferenceRuntime::Mlx\n        } else {\n            InferenceRuntime::LlamaCpp\n        };\n        let choose_quant =\n            |budget: f64| best_quant_for_runtime_budget(model, runtime, budget, estimation_ctx);\n\n        // Step 1: pick the best available execution path\n        // Step 2: score memory fit purely on headroom in that path's memory pool\n        let (run_mode, mem_required, mem_available) = if system.has_gpu {\n            if system.unified_memory {\n                // Unified memory (Apple Silicon or NVIDIA Tegra/Grace Blackwell):\n                // GPU and CPU share the same memory pool.\n                // No CpuOffload -- there's no separate pool to spill to.\n                if let Some(pool) = system.gpu_vram_gb {\n                    notes.push(\"Unified memory: GPU and CPU share the same pool\".to_string());\n                    if model.is_moe {\n                        notes.push(format!(\n                            \"MoE: {}/{} experts active (all share unified memory pool)\",\n                            model.active_experts.unwrap_or(0),\n                            model.num_experts.unwrap_or(0)\n                        ));\n                    }\n                    if model.is_moe {\n                        (RunMode::Gpu, min_vram, 
pool)\n                    } else if let Some((_, best_mem)) = choose_quant(pool) {\n                        (RunMode::Gpu, best_mem, pool)\n                    } else {\n                        (RunMode::Gpu, default_mem_required, pool)\n                    }\n                } else {\n                    cpu_path(model, system, runtime, estimation_ctx, &mut notes)\n                }\n            } else if let Some(system_vram) = system.total_gpu_vram_gb {\n                // Use total VRAM across all same-model GPUs for fit scoring.\n                // Multi-GPU inference (tensor splitting) is supported by llama.cpp, vLLM, etc.\n                if model.is_moe && min_vram <= system_vram {\n                    // Fits in VRAM -- GPU path\n                    notes.push(\"GPU: model loaded into VRAM\".to_string());\n                    if model.is_moe {\n                        notes.push(format!(\n                            \"MoE: all {} experts loaded in VRAM (optimal)\",\n                            model.num_experts.unwrap_or(0)\n                        ));\n                    }\n                    (RunMode::Gpu, min_vram, system_vram)\n                } else if model.is_moe {\n                    // MoE model: try expert offloading before CPU fallback\n                    moe_offload_path(model, system, system_vram, min_vram, runtime, &mut notes)\n                } else if let Some((_, best_mem)) = choose_quant(system_vram) {\n                    notes.push(\"GPU: model loaded into VRAM\".to_string());\n                    (RunMode::Gpu, best_mem, system_vram)\n                } else if let Some((_, best_mem)) = choose_quant(system.available_ram_gb) {\n                    // Doesn't fit in VRAM, spill to system RAM\n                    notes.push(\"GPU: insufficient VRAM, spilling to system RAM\".to_string());\n                    notes.push(\"Performance will be significantly reduced\".to_string());\n                    (RunMode::CpuOffload, best_mem, 
system.available_ram_gb)\n                } else {\n                    // Doesn't fit anywhere -- report against VRAM since GPU is preferred\n                    notes.push(\"Insufficient VRAM and system RAM\".to_string());\n                    notes.push(format!(\n                        \"Need {:.1} GB VRAM or {:.1} GB system RAM\",\n                        min_vram, model.min_ram_gb\n                    ));\n                    (RunMode::Gpu, default_mem_required, system_vram)\n                }\n            } else {\n                // GPU detected but VRAM unknown -- fall through to CPU\n                notes.push(\"GPU detected but VRAM unknown\".to_string());\n                cpu_path(model, system, runtime, estimation_ctx, &mut notes)\n            }\n        } else {\n            cpu_path(model, system, runtime, estimation_ctx, &mut notes)\n        };\n\n        // Score fit purely on memory headroom (Perfect requires GPU)\n        let fit_level = score_fit(\n            mem_required,\n            mem_available,\n            model.recommended_ram_gb,\n            run_mode,\n        );\n\n        let utilization_pct = if mem_available > 0.0 {\n            (mem_required / mem_available) * 100.0\n        } else {\n            f64::INFINITY\n        };\n\n        // Supplementary notes\n        if run_mode == RunMode::CpuOnly {\n            notes.push(\"No GPU -- inference will be slow\".to_string());\n        }\n        if matches!(run_mode, RunMode::CpuOffload | RunMode::CpuOnly) && system.total_cpu_cores < 4\n        {\n            notes.push(\"Low CPU core count may bottleneck inference\".to_string());\n        }\n\n        // Compute MoE offloaded amount if applicable\n        let moe_offloaded_gb = if run_mode == RunMode::MoeOffload {\n            model.moe_offloaded_ram_gb()\n        } else {\n            None\n        };\n\n        // Dynamic quantization: find best quant that fits\n        // Pre-quantized models (AWQ/GPTQ) have a fixed quantization — 
skip dynamic selection.\n        let (best_quant, _best_quant_mem) = if model.is_prequantized() {\n            (model.quantization.as_str(), mem_required)\n        } else {\n            let budget = mem_available;\n            let hierarchy: &[&str] = if runtime == InferenceRuntime::Mlx {\n                models::MLX_QUANT_HIERARCHY\n            } else {\n                models::QUANT_HIERARCHY\n            };\n            model\n                .best_quant_for_budget_with(budget, estimation_ctx, hierarchy)\n                .or_else(|| {\n                    // Fall back to GGUF hierarchy if MLX quants don't fit\n                    if runtime == InferenceRuntime::Mlx {\n                        model.best_quant_for_budget(budget, estimation_ctx)\n                    } else {\n                        None\n                    }\n                })\n                .unwrap_or((model.quantization.as_str(), mem_required))\n        };\n        let best_quant_str = if best_quant != model.quantization {\n            notes.push(format!(\n                \"Best quantization for hardware: {} (model default: {})\",\n                best_quant, model.quantization\n            ));\n            best_quant.to_string()\n        } else {\n            model.quantization.clone()\n        };\n\n        // Speed estimation\n        let estimated_tps = estimate_tps(model, &best_quant_str, system, run_mode, runtime);\n\n        // Add runtime comparison note on Apple Silicon\n        if runtime == InferenceRuntime::Mlx {\n            let llamacpp_tps = estimate_tps(\n                model,\n                &best_quant_str,\n                system,\n                run_mode,\n                InferenceRuntime::LlamaCpp,\n            );\n            if llamacpp_tps > 0.1 {\n                let speedup = ((estimated_tps / llamacpp_tps - 1.0) * 100.0).round();\n                if speedup > 0.0 {\n                    notes.push(format!(\n                        \"MLX runtime: ~{:.0}% faster 
than llama.cpp ({:.1} vs {:.1} tok/s)\",\n                        speedup, estimated_tps, llamacpp_tps\n                    ));\n                }\n            }\n        }\n\n        // Multi-dimensional scoring\n        let score_components = compute_scores(\n            model,\n            &best_quant_str,\n            use_case,\n            estimated_tps,\n            mem_required,\n            mem_available,\n        );\n        let score = weighted_score(score_components, use_case);\n\n        if estimated_tps > 0.0 {\n            notes.push(format!(\n                \"Baseline estimated speed: {:.1} tok/s\",\n                estimated_tps\n            ));\n        }\n\n        ModelFit {\n            model: model.clone(),\n            fit_level,\n            run_mode,\n            memory_required_gb: mem_required,\n            memory_available_gb: mem_available,\n            utilization_pct,\n            notes,\n            moe_offloaded_gb,\n            score,\n            score_components,\n            estimated_tps,\n            best_quant: best_quant_str,\n            use_case,\n            runtime,\n            installed: false, // set later by App after provider detection\n        }\n    }\n\n    pub fn fit_emoji(&self) -> &str {\n        match self.fit_level {\n            FitLevel::Perfect => \"🟢\",\n            FitLevel::Good => \"🟡\",\n            FitLevel::Marginal => \"🟠\",\n            FitLevel::TooTight => \"🔴\",\n        }\n    }\n\n    pub fn fit_text(&self) -> &str {\n        match self.fit_level {\n            FitLevel::Perfect => \"Perfect\",\n            FitLevel::Good => \"Good\",\n            FitLevel::Marginal => \"Marginal\",\n            FitLevel::TooTight => \"Too Tight\",\n        }\n    }\n\n    pub fn runtime_text(&self) -> &str {\n        self.runtime.label()\n    }\n\n    pub fn run_mode_text(&self) -> &str {\n        match self.run_mode {\n            RunMode::Gpu => \"GPU\",\n            RunMode::MoeOffload => \"MoE\",\n      
      RunMode::CpuOffload => \"CPU+GPU\",\n            RunMode::CpuOnly => \"CPU\",\n        }\n    }\n}\n\n/// Pure memory headroom scoring.\n/// - GPU (including Apple Silicon unified memory): can reach Perfect.\n/// - CpuOffload: caps at Good.\n/// - CpuOnly: caps at Marginal -- CPU-only inference is always a compromise.\nfn score_fit(\n    mem_required: f64,\n    mem_available: f64,\n    recommended: f64,\n    run_mode: RunMode,\n) -> FitLevel {\n    if mem_required > mem_available {\n        return FitLevel::TooTight;\n    }\n\n    match run_mode {\n        RunMode::Gpu => {\n            if recommended <= mem_available {\n                FitLevel::Perfect\n            } else if mem_available >= mem_required * 1.2 {\n                FitLevel::Good\n            } else {\n                FitLevel::Marginal\n            }\n        }\n        RunMode::MoeOffload => {\n            // MoE expert offloading -- GPU handles inference, inactive experts in RAM\n            // Good performance with some latency on expert switching\n            if mem_available >= mem_required * 1.2 {\n                FitLevel::Good\n            } else {\n                FitLevel::Marginal\n            }\n        }\n        RunMode::CpuOffload => {\n            // Mixed GPU/CPU -- decent but not ideal\n            if mem_available >= mem_required * 1.2 {\n                FitLevel::Good\n            } else {\n                FitLevel::Marginal\n            }\n        }\n        RunMode::CpuOnly => {\n            // CPU-only is always a compromise -- cap at Marginal\n            FitLevel::Marginal\n        }\n    }\n}\n\n/// Determine memory pool for CPU-only inference.\nfn cpu_path(\n    model: &LlmModel,\n    system: &SystemSpecs,\n    runtime: InferenceRuntime,\n    estimation_ctx: u32,\n    notes: &mut Vec<String>,\n) -> (RunMode, f64, f64) {\n    notes.push(\"CPU-only: model loaded into system RAM\".to_string());\n    if model.is_moe {\n        notes.push(\"MoE architecture, but expert 
offloading requires a GPU\".to_string());\n        return (RunMode::CpuOnly, model.min_ram_gb, system.available_ram_gb);\n    }\n\n    if let Some((_, best_mem)) =\n        best_quant_for_runtime_budget(model, runtime, system.available_ram_gb, estimation_ctx)\n    {\n        (RunMode::CpuOnly, best_mem, system.available_ram_gb)\n    } else {\n        (\n            RunMode::CpuOnly,\n            model.estimate_memory_gb(model.quantization.as_str(), estimation_ctx),\n            system.available_ram_gb,\n        )\n    }\n}\n\n/// Try MoE expert offloading: active experts in VRAM, inactive in RAM.\n/// Falls back to CPU paths if offloading isn't viable.\nfn moe_offload_path(\n    model: &LlmModel,\n    system: &SystemSpecs,\n    system_vram: f64,\n    total_vram: f64,\n    runtime: InferenceRuntime,\n    notes: &mut Vec<String>,\n) -> (RunMode, f64, f64) {\n    let hierarchy: &[&str] = if runtime == InferenceRuntime::Mlx {\n        models::MLX_QUANT_HIERARCHY\n    } else {\n        models::QUANT_HIERARCHY\n    };\n\n    for &quant in hierarchy {\n        if let Some((moe_vram, offloaded_gb)) = moe_memory_for_quant(model, quant)\n            && moe_vram <= system_vram\n            && offloaded_gb <= system.available_ram_gb\n        {\n            notes.push(format!(\n                \"MoE: {}/{} experts active in VRAM ({:.1} GB) at {}\",\n                model.active_experts.unwrap_or(0),\n                model.num_experts.unwrap_or(0),\n                moe_vram,\n                quant,\n            ));\n            notes.push(format!(\n                \"Inactive experts offloaded to system RAM ({:.1} GB)\",\n                offloaded_gb,\n            ));\n            return (RunMode::MoeOffload, moe_vram, system_vram);\n        }\n    }\n\n    // On MLX, also try GGUF-style quant levels as a fallback.\n    if runtime == InferenceRuntime::Mlx {\n        for &quant in models::QUANT_HIERARCHY {\n            if let Some((moe_vram, offloaded_gb)) = 
moe_memory_for_quant(model, quant)\n                && moe_vram <= system_vram\n                && offloaded_gb <= system.available_ram_gb\n            {\n                notes.push(format!(\n                    \"MoE: {}/{} experts active in VRAM ({:.1} GB) at {}\",\n                    model.active_experts.unwrap_or(0),\n                    model.num_experts.unwrap_or(0),\n                    moe_vram,\n                    quant,\n                ));\n                notes.push(format!(\n                    \"Inactive experts offloaded to system RAM ({:.1} GB)\",\n                    offloaded_gb,\n                ));\n                return (RunMode::MoeOffload, moe_vram, system_vram);\n            }\n        }\n    }\n\n    // MoE offloading not viable, fall back to generic paths\n    if model.min_ram_gb <= system.available_ram_gb {\n        notes.push(\"MoE: insufficient VRAM for expert offloading\".to_string());\n        notes.push(\"Spilling entire model to system RAM\".to_string());\n        notes.push(\"Performance will be significantly reduced\".to_string());\n        (\n            RunMode::CpuOffload,\n            model.min_ram_gb,\n            system.available_ram_gb,\n        )\n    } else {\n        notes.push(\"Insufficient VRAM and system RAM\".to_string());\n        notes.push(format!(\n            \"Need {:.1} GB VRAM (full) or {:.1} GB (MoE offload) + RAM\",\n            total_vram,\n            model.moe_active_vram_gb().unwrap_or(total_vram),\n        ));\n        (RunMode::Gpu, total_vram, system_vram)\n    }\n}\n\n/// Compute MoE active VRAM + offloaded RAM for a specific quantization level.\nfn moe_memory_for_quant(model: &LlmModel, quant: &str) -> Option<(f64, f64)> {\n    if !model.is_moe {\n        return None;\n    }\n\n    let active_params = model.active_parameters? as f64;\n    let total_params = model.parameters_raw? 
as f64;\n    let bpp = models::quant_bpp(quant);\n\n    let active_vram = ((active_params * bpp) / (1024.0 * 1024.0 * 1024.0) * 1.1).max(0.5);\n    let inactive_params = (total_params - active_params).max(0.0);\n    let offloaded_ram = (inactive_params * bpp) / (1024.0 * 1024.0 * 1024.0);\n\n    Some((active_vram, offloaded_ram))\n}\n\nfn best_quant_for_runtime_budget(\n    model: &LlmModel,\n    runtime: InferenceRuntime,\n    budget: f64,\n    estimation_ctx: u32,\n) -> Option<(&'static str, f64)> {\n    // Pre-quantized models (vLLM) don't support dynamic re-quantization\n    if runtime == InferenceRuntime::Vllm {\n        return None;\n    }\n    let hierarchy: &[&str] = if runtime == InferenceRuntime::Mlx {\n        models::MLX_QUANT_HIERARCHY\n    } else {\n        models::QUANT_HIERARCHY\n    };\n    model\n        .best_quant_for_budget_with(budget, estimation_ctx, hierarchy)\n        .or_else(|| {\n            if runtime == InferenceRuntime::Mlx {\n                model.best_quant_for_budget(budget, estimation_ctx)\n            } else {\n                None\n            }\n        })\n}\n\npub fn backend_compatible(model: &LlmModel, system: &SystemSpecs) -> bool {\n    if model.is_mlx_model() {\n        system.backend == GpuBackend::Metal && system.unified_memory\n    } else if model.is_prequantized() {\n        if !matches!(system.backend, GpuBackend::Cuda | GpuBackend::Rocm) {\n            return false;\n        }\n        // For CUDA GPUs, check that the GPU's compute capability meets the\n        // minimum required by the quantization format (e.g. 
AWQ needs Turing+).\n        // ROCm and unrecognized NVIDIA GPUs are assumed compatible.\n        if system.backend == GpuBackend::Cuda\n            && let Some(min_cc) = crate::hardware::quant_min_compute_capability(&model.quantization)\n            && let Some(gpu_name) = &system.gpu_name\n            && let Some(gpu_cc) = crate::hardware::gpu_compute_capability(gpu_name)\n        {\n            return gpu_cc >= min_cc;\n        }\n        true\n    } else {\n        true\n    }\n}\n\npub fn rank_models_by_fit(models: Vec<ModelFit>) -> Vec<ModelFit> {\n    rank_models_by_fit_opts(models, false)\n}\n\npub fn rank_models_by_fit_opts(models: Vec<ModelFit>, installed_first: bool) -> Vec<ModelFit> {\n    rank_models_by_fit_opts_col(models, installed_first, SortColumn::Score)\n}\n\npub fn rank_models_by_fit_opts_col(\n    models: Vec<ModelFit>,\n    installed_first: bool,\n    sort_column: SortColumn,\n) -> Vec<ModelFit> {\n    let mut ranked = models;\n    ranked.sort_by(|a, b| {\n        // Installed-first: if toggled, installed models sort above non-installed\n        if installed_first {\n            let inst_cmp = b.installed.cmp(&a.installed);\n            if inst_cmp != std::cmp::Ordering::Equal {\n                return inst_cmp;\n            }\n        }\n\n        // TooTight always sorts last regardless of column\n        let a_runnable = a.fit_level != FitLevel::TooTight;\n        let b_runnable = b.fit_level != FitLevel::TooTight;\n\n        match (a_runnable, b_runnable) {\n            (true, false) => return std::cmp::Ordering::Less,\n            (false, true) => return std::cmp::Ordering::Greater,\n            _ => {}\n        }\n\n        // Sort by selected column\n        match sort_column {\n            SortColumn::Score => b\n                .score\n                .partial_cmp(&a.score)\n                .unwrap_or(std::cmp::Ordering::Equal),\n            SortColumn::Tps => {\n                let cmp = b\n                    .estimated_tps\n       
             .partial_cmp(&a.estimated_tps)\n                    .unwrap_or(std::cmp::Ordering::Equal);\n                if cmp == std::cmp::Ordering::Equal {\n                    b.score\n                        .partial_cmp(&a.score)\n                        .unwrap_or(std::cmp::Ordering::Equal)\n                } else {\n                    cmp\n                }\n            }\n            SortColumn::Params => {\n                let a_params = a.model.params_b();\n                let b_params = b.model.params_b();\n                b_params\n                    .partial_cmp(&a_params)\n                    .unwrap_or(std::cmp::Ordering::Equal)\n            }\n            SortColumn::MemPct => b\n                .utilization_pct\n                .partial_cmp(&a.utilization_pct)\n                .unwrap_or(std::cmp::Ordering::Equal),\n            SortColumn::Ctx => b.model.context_length.cmp(&a.model.context_length),\n            SortColumn::ReleaseDate => {\n                let a_date = a.model.release_date.as_deref().unwrap_or(\"\");\n                let b_date = b.model.release_date.as_deref().unwrap_or(\"\");\n                match (a_date.is_empty(), b_date.is_empty()) {\n                    (true, false) => std::cmp::Ordering::Greater, // no date = last\n                    (false, true) => std::cmp::Ordering::Less,\n                    (true, true) => b\n                        .score\n                        .partial_cmp(&a.score)\n                        .unwrap_or(std::cmp::Ordering::Equal),\n                    (false, false) => {\n                        let cmp = b_date.cmp(a_date); // descending = newest first\n                        if cmp == std::cmp::Ordering::Equal {\n                            b.score\n                                .partial_cmp(&a.score)\n                                .unwrap_or(std::cmp::Ordering::Equal)\n                        } else {\n                            cmp\n                        }\n                    }\n  
              }\n            }\n            SortColumn::UseCase => {\n                let cmp = a.use_case.label().cmp(b.use_case.label());\n                if cmp == std::cmp::Ordering::Equal {\n                    // Secondary sort by score within same use case\n                    b.score\n                        .partial_cmp(&a.score)\n                        .unwrap_or(std::cmp::Ordering::Equal)\n                } else {\n                    cmp\n                }\n            }\n        }\n    });\n    ranked\n}\n\n// ────────────────────────────────────────────────────────────────────\n// Speed estimation\n// ────────────────────────────────────────────────────────────────────\n\n/// Estimate tokens per second for a model on the given hardware.\n///\n/// LLM token generation is **memory-bandwidth-bound**: each generated token\n/// requires reading the full model weights once from VRAM. The theoretical\n/// upper bound is therefore:\n///\n///   max_tps = memory_bandwidth_GB_s / model_size_GB\n///\n/// In practice, real throughput is ~50–70% of this ceiling due to kernel\n/// launch overhead, KV-cache reads, and other fixed costs.\n///\n/// When the GPU model is recognized, we use its **actual memory bandwidth**\n/// (from the lookup table in `hardware::gpu_memory_bandwidth_gbps`) to\n/// produce a physics-grounded estimate. 
Otherwise we fall back to the\n/// original per-backend constant `K`.\n///\n/// References:\n///  - kipply, \"Transformer Inference Arithmetic\" (2022)\n///  - ggerganov, llama.cpp Apple Silicon benchmarks (Discussion #4167)\n///  - Google, \"Efficiently Scaling Transformer Inference\" (arXiv:2211.05102)\n///  - ggerganov, llama.cpp NVIDIA T4 benchmarks (Discussion #4225)\nfn estimate_tps(\n    model: &LlmModel,\n    quant: &str,\n    system: &SystemSpecs,\n    run_mode: RunMode,\n    runtime: InferenceRuntime,\n) -> f64 {\n    use crate::hardware::gpu_memory_bandwidth_gbps;\n\n    // MoE models execute only active experts per token, so speed estimates should\n    // use active parameters when known; fit/memory paths still use full model size.\n    let params = model\n        .active_parameters\n        .filter(|_| model.is_moe)\n        .map(|p| (p as f64) / 1_000_000_000.0)\n        .unwrap_or_else(|| model.params_b())\n        .max(0.1);\n\n    // ── Bandwidth-based estimation (preferred) ─────────────────────\n    //\n    // If we know the GPU's memory bandwidth, estimate tok/s from first\n    // principles instead of using a fixed constant.\n    //\n    // model_bytes = params_B * bytes_per_param(quant)\n    // raw_tps     = bandwidth_GB_s / model_bytes_GB\n    // estimated   = raw_tps * efficiency * run_mode_factor\n    //\n    // The efficiency factor (0.55) accounts for:\n    //  - Kernel launch / scheduling overhead\n    //  - KV-cache memory reads (not captured in model size)\n    //  - Memory controller inefficiency at high utilization\n    //\n    // Validated against:\n    //  - RTX 4090 (1008 GB/s): Qwen3.5-27B Q4 → ~40 tok/s measured\n    //  - T4 (320 GB/s): 7B F16 → ~16 tok/s (ggerganov benchmark)\n    //  - Apple M1 Max (400 GB/s): 7B Q4_0 → ~61 tok/s (ggerganov benchmark)\n    let gpu_name = system.gpu_name.as_deref().unwrap_or(\"\");\n    let bandwidth = gpu_memory_bandwidth_gbps(gpu_name);\n\n    if run_mode != RunMode::CpuOnly\n        && let 
Some(bw) = bandwidth\n    {\n        let bytes_per_param = models::quant_bytes_per_param(quant);\n        let model_gb = params * bytes_per_param;\n\n        // Efficiency factor — captures overhead not in the simple\n        // bandwidth / model-size formula.\n        let efficiency = 0.55;\n        let raw_tps = (bw / model_gb) * efficiency;\n\n        let mode_factor = match run_mode {\n            RunMode::Gpu => 1.0,\n            RunMode::MoeOffload => 0.8,\n            RunMode::CpuOffload => 0.5,\n            RunMode::CpuOnly => unreachable!(),\n        };\n\n        return (raw_tps * mode_factor).max(0.1);\n    }\n\n    // ── Fallback: fixed-constant approach ──────────────────────────\n    // Used when the GPU is not recognized (custom/unnamed GPUs,\n    // synthetic entries from --memory override, etc.).\n    let k: f64 = match (system.backend, runtime) {\n        (GpuBackend::Metal, InferenceRuntime::Mlx) => 250.0,\n        (GpuBackend::Metal, InferenceRuntime::LlamaCpp) => 160.0,\n        (GpuBackend::Metal, InferenceRuntime::Vllm) => 160.0,\n        (GpuBackend::Cuda, _) => 220.0,\n        (GpuBackend::Rocm, _) => 180.0,\n        (GpuBackend::Vulkan, _) => 150.0,\n        (GpuBackend::Sycl, _) => 100.0,\n        (GpuBackend::CpuArm, _) => 90.0,\n        (GpuBackend::CpuX86, _) => 70.0,\n        (GpuBackend::Ascend, _) => 390.0,\n    };\n\n    let mut base = k / params;\n\n    // Quantization speed multiplier\n    base *= models::quant_speed_multiplier(quant);\n\n    // Threading bonus for many cores\n    if system.total_cpu_cores >= 8 {\n        base *= 1.1;\n    }\n\n    // Run mode penalties\n    match run_mode {\n        RunMode::Gpu => {}                  // full speed\n        RunMode::MoeOffload => base *= 0.8, // expert switching latency\n        RunMode::CpuOffload => base *= 0.5, // significant penalty\n        RunMode::CpuOnly => base *= 0.3,    // worst case—override K to CPU\n    }\n\n    // CPU-only should use CPU K regardless of detected 
GPU\n    if run_mode == RunMode::CpuOnly {\n        let cpu_k = if cfg!(target_arch = \"aarch64\") {\n            90.0\n        } else {\n            70.0\n        };\n        base = (cpu_k / params) * models::quant_speed_multiplier(quant);\n        if system.total_cpu_cores >= 8 {\n            base *= 1.1;\n        }\n    }\n\n    base.max(0.1)\n}\n\n// ────────────────────────────────────────────────────────────────────\n// Multi-dimensional scoring (Quality, Speed, Fit, Context)\n// ────────────────────────────────────────────────────────────────────\n\nfn compute_scores(\n    model: &LlmModel,\n    quant: &str,\n    use_case: UseCase,\n    estimated_tps: f64,\n    mem_required: f64,\n    mem_available: f64,\n) -> ScoreComponents {\n    ScoreComponents {\n        quality: quality_score(model, quant, use_case),\n        speed: speed_score(estimated_tps, use_case),\n        fit: fit_score(mem_required, mem_available),\n        context: context_score(model, use_case),\n    }\n}\n\n/// Quality score: base quality from param count + family bump + quant penalty + task alignment.\nfn quality_score(model: &LlmModel, quant: &str, use_case: UseCase) -> f64 {\n    let params = model.params_b();\n\n    // Base quality by parameter count\n    let base = if params < 1.0 {\n        30.0\n    } else if params < 3.0 {\n        45.0\n    } else if params < 7.0 {\n        60.0\n    } else if params < 10.0 {\n        75.0\n    } else if params < 20.0 {\n        82.0\n    } else if params < 40.0 {\n        89.0\n    } else {\n        95.0\n    };\n\n    // Family/provider reputation bumps\n    let name_lower = model.name.to_lowercase();\n    #[allow(clippy::if_same_then_else)]\n    let family_bump = if name_lower.contains(\"qwen\") {\n        2.0\n    } else if name_lower.contains(\"deepseek\") {\n        3.0\n    } else if name_lower.contains(\"llama\") {\n        2.0\n    } else if name_lower.contains(\"mistral\") || name_lower.contains(\"mixtral\") {\n        1.0\n    } else if 
name_lower.contains(\"gemma\") {\n        1.0\n    } else if name_lower.contains(\"phi\") {\n        0.0\n    } else if name_lower.contains(\"starcoder\") {\n        1.0\n    } else {\n        0.0\n    };\n\n    // Quantization penalty\n    let q_penalty = models::quant_quality_penalty(quant);\n\n    // Task alignment bump\n    let task_bump = match use_case {\n        UseCase::Coding => {\n            if name_lower.contains(\"code\")\n                || name_lower.contains(\"starcoder\")\n                || name_lower.contains(\"wizard\")\n            {\n                6.0\n            } else {\n                0.0\n            }\n        }\n        UseCase::Reasoning => {\n            if params >= 13.0 {\n                5.0\n            } else {\n                0.0\n            }\n        }\n        UseCase::Multimodal => {\n            if name_lower.contains(\"vision\") || model.use_case.to_lowercase().contains(\"vision\") {\n                6.0\n            } else {\n                0.0\n            }\n        }\n        _ => 0.0,\n    };\n\n    (base + family_bump + q_penalty + task_bump).clamp(0.0, 100.0)\n}\n\n/// Speed score: normalize estimated TPS against target for the use case.\nfn speed_score(tps: f64, use_case: UseCase) -> f64 {\n    let target = match use_case {\n        UseCase::General | UseCase::Coding | UseCase::Multimodal | UseCase::Chat => 40.0,\n        UseCase::Reasoning => 25.0,\n        UseCase::Embedding => 200.0,\n    };\n    ((tps / target) * 100.0).clamp(0.0, 100.0)\n}\n\n/// Fit score: how well the model fills available memory without exceeding.\nfn fit_score(required: f64, available: f64) -> f64 {\n    if available <= 0.0 || required > available {\n        return 0.0;\n    }\n    let ratio = required / available;\n    // Sweet spot: 50-80% utilization scores highest\n    if ratio <= 0.5 {\n        // Under-utilizing: still good but not optimal\n        60.0 + (ratio / 0.5) * 40.0\n    } else if ratio <= 0.8 {\n        100.0\n    } 
else if ratio <= 0.9 {\n        // Getting tight\n        70.0\n    } else {\n        // Very tight\n        50.0\n    }\n}\n\n/// Context score: context window capability vs target for the use case.\nfn context_score(model: &LlmModel, use_case: UseCase) -> f64 {\n    let target: u32 = match use_case {\n        UseCase::General | UseCase::Chat => 4096,\n        UseCase::Coding | UseCase::Reasoning => 8192,\n        UseCase::Multimodal => 4096,\n        UseCase::Embedding => 512,\n    };\n    if model.context_length >= target {\n        100.0\n    } else if model.context_length >= target / 2 {\n        70.0\n    } else {\n        30.0\n    }\n}\n\n/// Weighted composite score based on use-case category.\n/// Weights: [Quality, Speed, Fit, Context]\nfn weighted_score(sc: ScoreComponents, use_case: UseCase) -> f64 {\n    let (wq, ws, wf, wc) = match use_case {\n        UseCase::General => (0.45, 0.30, 0.15, 0.10),\n        UseCase::Coding => (0.50, 0.20, 0.15, 0.15),\n        UseCase::Reasoning => (0.55, 0.15, 0.15, 0.15),\n        UseCase::Chat => (0.40, 0.35, 0.15, 0.10),\n        UseCase::Multimodal => (0.50, 0.20, 0.15, 0.15),\n        UseCase::Embedding => (0.30, 0.40, 0.20, 0.10),\n    };\n    let raw = sc.quality * wq + sc.speed * ws + sc.fit * wf + sc.context * wc;\n    (raw * 10.0).round() / 10.0\n}\n\n#[cfg(test)]\nmod tests {\n    use super::*;\n    use crate::hardware::{GpuBackend, SystemSpecs};\n\n    // ────────────────────────────────────────────────────────────────────\n    // Helper to create test model\n    // ────────────────────────────────────────────────────────────────────\n\n    fn test_model(param_count: &str, min_ram: f64, min_vram: Option<f64>) -> LlmModel {\n        LlmModel {\n            name: \"Test Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: param_count.to_string(),\n            parameters_raw: None,\n            min_ram_gb: min_ram,\n            recommended_ram_gb: min_ram * 2.0,\n    
        min_vram_gb: min_vram,\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: models::ModelFormat::default(),\n        }\n    }\n\n    fn test_system(ram: f64, has_gpu: bool, vram: Option<f64>) -> SystemSpecs {\n        SystemSpecs {\n            total_ram_gb: ram,\n            available_ram_gb: ram * 0.8, // simulate some usage\n            total_cpu_cores: 8,\n            cpu_name: \"Test CPU\".to_string(),\n            has_gpu,\n            gpu_vram_gb: vram,\n            total_gpu_vram_gb: vram, // same as gpu_vram_gb for single-GPU tests\n            gpu_name: if has_gpu {\n                Some(\"Test GPU\".to_string())\n            } else {\n                None\n            },\n            gpu_count: if has_gpu { 1 } else { 0 },\n            unified_memory: false,\n            backend: if has_gpu {\n                GpuBackend::Cuda\n            } else {\n                GpuBackend::CpuX86\n            },\n            gpus: vec![],\n        }\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // score_fit tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_score_fit_too_tight() {\n        // Model doesn't fit\n        let fit = score_fit(10.0, 8.0, 16.0, RunMode::Gpu);\n        assert_eq!(fit, FitLevel::TooTight);\n    }\n\n    #[test]\n    fn test_score_fit_gpu_perfect() {\n        // GPU with recommended memory met\n        let fit = score_fit(8.0, 16.0, 12.0, RunMode::Gpu);\n        assert_eq!(fit, FitLevel::Perfect);\n    }\n\n    #[test]\n    fn test_score_fit_gpu_good() {\n        // GPU with good headroom but not 
recommended\n        let fit = score_fit(8.0, 10.0, 16.0, RunMode::Gpu);\n        assert_eq!(fit, FitLevel::Good);\n    }\n\n    #[test]\n    fn test_score_fit_gpu_marginal() {\n        // GPU with minimal headroom\n        let fit = score_fit(8.0, 8.5, 16.0, RunMode::Gpu);\n        assert_eq!(fit, FitLevel::Marginal);\n    }\n\n    #[test]\n    fn test_score_fit_cpu_caps_at_marginal() {\n        // CPU-only never reaches Perfect\n        let fit = score_fit(4.0, 32.0, 8.0, RunMode::CpuOnly);\n        assert_eq!(fit, FitLevel::Marginal);\n    }\n\n    #[test]\n    fn test_score_fit_cpu_offload_caps_at_good() {\n        // CpuOffload with plenty of headroom caps at Good\n        let fit = score_fit(8.0, 16.0, 12.0, RunMode::CpuOffload);\n        assert_eq!(fit, FitLevel::Good);\n    }\n\n    #[test]\n    fn test_score_fit_moe_offload() {\n        // MoE offload with good headroom\n        let fit = score_fit(6.0, 8.0, 12.0, RunMode::MoeOffload);\n        assert_eq!(fit, FitLevel::Good);\n\n        // MoE offload with tight fit\n        let fit_tight = score_fit(7.0, 7.5, 14.0, RunMode::MoeOffload);\n        assert_eq!(fit_tight, FitLevel::Marginal);\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // ModelFit::analyze tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_model_fit_gpu_path() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let system = test_system(16.0, true, Some(8.0));\n\n        let fit = ModelFit::analyze(&model, &system);\n\n        // Should use GPU path\n        assert_eq!(fit.run_mode, RunMode::Gpu);\n        assert!(matches!(fit.fit_level, FitLevel::Good | FitLevel::Perfect));\n        assert_eq!(fit.memory_available_gb, 8.0);\n    }\n\n    #[test]\n    fn test_model_fit_cpu_only() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let system = test_system(16.0, false, None);\n\n        let fit = 
ModelFit::analyze(&model, &system);\n\n        // Should use CPU path\n        assert_eq!(fit.run_mode, RunMode::CpuOnly);\n        // CPU-only caps at Marginal\n        assert_eq!(fit.fit_level, FitLevel::Marginal);\n    }\n\n    #[test]\n    fn test_model_fit_cpu_offload() {\n        let model = test_model(\"13B\", 8.0, Some(8.0));\n        let system = test_system(32.0, true, Some(4.0));\n\n        let fit = ModelFit::analyze(&model, &system);\n\n        // Model doesn't fit in VRAM but fits in RAM\n        assert_eq!(fit.run_mode, RunMode::CpuOffload);\n        assert!(\n            fit.notes\n                .iter()\n                .any(|n| n.contains(\"spilling to system RAM\"))\n        );\n    }\n\n    #[test]\n    fn test_model_fit_unified_memory() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let mut system = test_system(16.0, true, Some(16.0));\n        system.unified_memory = true;\n\n        let fit = ModelFit::analyze(&model, &system);\n\n        // Should use GPU path on unified memory\n        assert_eq!(fit.run_mode, RunMode::Gpu);\n        assert!(fit.notes.iter().any(|n| n.contains(\"Unified memory\")));\n    }\n\n    #[test]\n    fn test_model_fit_too_tight() {\n        let model = test_model(\"70B\", 40.0, Some(40.0));\n        let system = test_system(16.0, true, Some(8.0));\n\n        let fit = ModelFit::analyze(&model, &system);\n\n        // Model doesn't fit anywhere\n        assert_eq!(fit.fit_level, FitLevel::TooTight);\n    }\n\n    #[test]\n    fn test_moe_offload_tries_lower_quantization() {\n        let model = LlmModel {\n            name: \"MoE Quant Test\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"8x7B\".to_string(),\n            parameters_raw: Some(46_700_000_000),\n            min_ram_gb: 25.0,\n            recommended_ram_gb: 50.0,\n            min_vram_gb: Some(25.0),\n            quantization: \"Q8_0\".to_string(),\n            context_length: 4096,\n     
       use_case: \"General\".to_string(),\n            is_moe: true,\n            num_experts: Some(8),\n            active_experts: Some(2),\n            active_parameters: Some(12_900_000_000),\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: models::ModelFormat::default(),\n        };\n        let mut system = test_system(64.0, true, Some(8.0));\n        system.backend = GpuBackend::Cuda;\n\n        let fit = ModelFit::analyze(&model, &system);\n\n        assert_eq!(fit.run_mode, RunMode::MoeOffload);\n        assert!(fit.memory_required_gb <= fit.memory_available_gb);\n        assert!(fit.notes.iter().any(|n| n.contains(\"at Q\")));\n    }\n\n    #[test]\n    fn test_dense_model_uses_quant_in_path_selection() {\n        // Static requirements are high, but lower quantization should make it runnable on GPU.\n        let model = LlmModel {\n            name: \"Quant Path Test\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 20.0,\n            recommended_ram_gb: 40.0,\n            min_vram_gb: Some(16.0),\n            quantization: \"F16\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: models::ModelFormat::default(),\n        };\n        let system = test_system(12.0, true, Some(8.0));\n\n        let fit = ModelFit::analyze(&model, &system);\n\n        assert_eq!(fit.run_mode, RunMode::Gpu);\n        assert_ne!(fit.fit_level, FitLevel::TooTight);\n        assert_ne!(fit.best_quant, \"F16\");\n        assert!(fit.memory_required_gb <= 
fit.memory_available_gb);\n    }\n\n    #[test]\n    fn test_model_fit_utilization() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let system = test_system(16.0, true, Some(8.0));\n\n        let fit = ModelFit::analyze(&model, &system);\n\n        // Utilization should be reasonable\n        assert!(fit.utilization_pct > 0.0);\n        assert!(fit.utilization_pct <= 100.0);\n        assert_eq!(\n            fit.utilization_pct,\n            (fit.memory_required_gb / fit.memory_available_gb) * 100.0\n        );\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // rank_models_by_fit tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_rank_models_by_fit() {\n        let model1 = test_model(\"7B\", 4.0, Some(4.0));\n        let model2 = test_model(\"13B\", 8.0, Some(8.0));\n        let model3 = test_model(\"70B\", 40.0, Some(40.0));\n\n        let system = test_system(16.0, true, Some(10.0));\n\n        let fit1 = ModelFit::analyze(&model1, &system);\n        let fit2 = ModelFit::analyze(&model2, &system);\n        let fit3 = ModelFit::analyze(&model3, &system);\n\n        let ranked = rank_models_by_fit(vec![fit3.clone(), fit1.clone(), fit2.clone()]);\n\n        // TooTight models should be at the end\n        assert_eq!(ranked.last().unwrap().fit_level, FitLevel::TooTight);\n\n        // Runnable models should be sorted by score\n        let runnable: Vec<_> = ranked\n            .iter()\n            .filter(|f| f.fit_level != FitLevel::TooTight)\n            .collect();\n\n        // Should be sorted by score descending\n        for i in 0..runnable.len() - 1 {\n            assert!(runnable[i].score >= runnable[i + 1].score);\n        }\n    }\n\n    #[test]\n    fn test_rank_models_separates_runnable_from_too_tight() {\n        let model1 = test_model(\"7B\", 4.0, Some(4.0));\n        let model2 = test_model(\"70B\", 40.0, Some(40.0));\n        
let model3 = test_model(\"13B\", 8.0, Some(8.0));\n\n        let system = test_system(16.0, true, Some(10.0));\n\n        let fit1 = ModelFit::analyze(&model1, &system);\n        let fit2 = ModelFit::analyze(&model2, &system); // TooTight\n        let fit3 = ModelFit::analyze(&model3, &system);\n\n        let ranked = rank_models_by_fit(vec![fit2, fit1, fit3]);\n\n        // All TooTight should be at the end\n        let first_too_tight = ranked\n            .iter()\n            .position(|f| f.fit_level == FitLevel::TooTight);\n        if let Some(pos) = first_too_tight {\n            for f in &ranked[pos..] {\n                assert_eq!(f.fit_level, FitLevel::TooTight);\n            }\n        }\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // Scoring function tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_fit_score_sweet_spot() {\n        // Sweet spot: 50-80% utilization\n        let score = fit_score(6.0, 10.0);\n        assert!(score >= 95.0); // Should be near perfect\n\n        let score2 = fit_score(8.0, 10.0);\n        assert_eq!(score2, 100.0);\n    }\n\n    #[test]\n    fn test_fit_score_under_utilized() {\n        // Under-utilizing: still good but not optimal\n        let score = fit_score(2.0, 10.0);\n        assert!(score >= 60.0);\n        assert!(score < 100.0);\n    }\n\n    #[test]\n    fn test_fit_score_tight() {\n        // Very tight fit\n        let score = fit_score(9.5, 10.0);\n        assert!(score >= 50.0);\n        assert!(score < 80.0);\n    }\n\n    #[test]\n    fn test_fit_score_exceeds_available() {\n        // Exceeds available memory\n        let score = fit_score(11.0, 10.0);\n        assert_eq!(score, 0.0);\n    }\n\n    #[test]\n    fn test_speed_score_normalized() {\n        // At target TPS\n        let score = speed_score(40.0, UseCase::General);\n        assert_eq!(score, 100.0);\n\n        // Below target\n        
let score2 = speed_score(20.0, UseCase::General);\n        assert_eq!(score2, 50.0);\n\n        // Above target (capped at 100)\n        let score3 = speed_score(80.0, UseCase::General);\n        assert_eq!(score3, 100.0);\n    }\n\n    #[test]\n    fn test_context_score() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n\n        // Context meets target\n        let score = context_score(&model, UseCase::General); // target: 4096\n        assert_eq!(score, 100.0);\n\n        // Context below target\n        let score2 = context_score(&model, UseCase::Coding); // target: 8192\n        assert!(score2 < 100.0);\n    }\n\n    #[test]\n    fn test_quality_score_by_params() {\n        let small = test_model(\"1B\", 1.0, Some(1.0));\n        let medium = test_model(\"7B\", 4.0, Some(4.0));\n        let large = test_model(\"70B\", 40.0, Some(40.0));\n\n        let score_small = quality_score(&small, \"Q4_K_M\", UseCase::General);\n        let score_medium = quality_score(&medium, \"Q4_K_M\", UseCase::General);\n        let score_large = quality_score(&large, \"Q4_K_M\", UseCase::General);\n\n        // Larger models should score higher\n        assert!(score_medium > score_small);\n        assert!(score_large > score_medium);\n    }\n\n    #[test]\n    fn test_quality_score_quant_penalty() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n\n        let score_q8 = quality_score(&model, \"Q8_0\", UseCase::General);\n        let score_q4 = quality_score(&model, \"Q4_K_M\", UseCase::General);\n        let score_q2 = quality_score(&model, \"Q2_K\", UseCase::General);\n\n        // Higher quant should have better quality\n        assert!(score_q8 > score_q4);\n        assert!(score_q4 > score_q2);\n    }\n\n    #[test]\n    fn test_weighted_score_composition() {\n        let components = ScoreComponents {\n            quality: 80.0,\n            speed: 70.0,\n            fit: 90.0,\n            context: 100.0,\n        };\n\n        // Different use cases 
should produce different scores\n        let general_score = weighted_score(components, UseCase::General);\n        let coding_score = weighted_score(components, UseCase::Coding);\n        let embedding_score = weighted_score(components, UseCase::Embedding);\n\n        // All should be valid scores\n        assert!(general_score > 0.0 && general_score <= 100.0);\n        assert!(coding_score > 0.0 && coding_score <= 100.0);\n        assert!(embedding_score > 0.0 && embedding_score <= 100.0);\n\n        // Scores should differ based on different weights\n        assert_ne!(general_score, embedding_score);\n    }\n\n    #[test]\n    fn test_estimate_tps_mlx_faster_than_llamacpp() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let mut system = test_system(16.0, true, Some(16.0));\n        system.backend = GpuBackend::Metal;\n        system.unified_memory = true;\n\n        let tps_mlx = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::Mlx,\n        );\n        let tps_llamacpp = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n\n        // MLX should be faster on Metal\n        assert!(tps_mlx > tps_llamacpp);\n        // MLX K=250 vs LlamaCpp K=160, so ratio should be ~1.56\n        assert!(tps_mlx / tps_llamacpp > 1.4);\n    }\n\n    #[test]\n    fn test_analyze_selects_mlx_on_apple_silicon() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let mut system = test_system(16.0, true, Some(16.0));\n        system.backend = GpuBackend::Metal;\n        system.unified_memory = true;\n\n        let fit = ModelFit::analyze(&model, &system);\n        assert_eq!(fit.runtime, InferenceRuntime::Mlx);\n        // Should have an MLX comparison note\n        assert!(fit.notes.iter().any(|n| n.contains(\"MLX runtime\")));\n    }\n\n    
#[test]\n    fn test_analyze_defaults_llamacpp_on_cuda() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let system = test_system(16.0, true, Some(10.0));\n\n        let fit = ModelFit::analyze(&model, &system);\n        assert_eq!(fit.runtime, InferenceRuntime::LlamaCpp);\n    }\n\n    #[test]\n    fn test_analyze_with_context_limit_reduces_memory_estimate() {\n        let mut model = test_model(\"7B\", 4.0, Some(4.0));\n        model.context_length = 32768;\n        let system = test_system(32.0, true, Some(16.0));\n\n        let baseline = ModelFit::analyze(&model, &system);\n        let capped = ModelFit::analyze_with_context_limit(&model, &system, Some(4096));\n\n        assert!(capped.memory_required_gb < baseline.memory_required_gb);\n        assert!(\n            capped\n                .notes\n                .iter()\n                .any(|n| n.contains(\"Context capped for estimation\"))\n        );\n    }\n\n    #[test]\n    fn test_estimate_tps_run_mode_penalties() {\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let system = test_system(16.0, true, Some(10.0));\n\n        let tps_gpu = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n        let tps_moe = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::MoeOffload,\n            InferenceRuntime::LlamaCpp,\n        );\n        let tps_offload = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::CpuOffload,\n            InferenceRuntime::LlamaCpp,\n        );\n        let tps_cpu = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::CpuOnly,\n            InferenceRuntime::LlamaCpp,\n        );\n\n        // GPU should be fastest\n        assert!(tps_gpu > tps_moe);\n        assert!(tps_moe > 
tps_offload);\n        assert!(tps_offload > tps_cpu);\n\n        // All should be positive\n        assert!(tps_gpu > 0.0);\n        assert!(tps_cpu > 0.0);\n    }\n\n    #[test]\n    fn test_estimate_tps_moe_uses_active_parameters() {\n        let dense_model = test_model(\"30B\", 18.0, Some(18.0));\n        let mut moe_model = dense_model.clone();\n        moe_model.is_moe = true;\n        moe_model.active_parameters = Some(3_000_000_000);\n\n        let system = test_system(64.0, true, Some(24.0));\n\n        let tps_dense = estimate_tps(\n            &dense_model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n        let tps_moe = estimate_tps(\n            &moe_model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n\n        assert!(tps_moe > tps_dense * 5.0);\n    }\n\n    #[test]\n    fn test_estimate_tps_moe_without_active_parameters_falls_back_to_total() {\n        let dense_model = test_model(\"30B\", 18.0, Some(18.0));\n        let mut moe_without_active = dense_model.clone();\n        moe_without_active.is_moe = true;\n        moe_without_active.active_parameters = None;\n\n        let system = test_system(64.0, true, Some(24.0));\n\n        let tps_dense = estimate_tps(\n            &dense_model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n        let tps_moe = estimate_tps(\n            &moe_without_active,\n            \"Q4_K_M\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n\n        assert_eq!(tps_dense, tps_moe);\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // Release date sorting tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_sort_by_tps() 
{\n        let system = test_system(32.0, true, Some(16.0));\n\n        let mut model_fast = test_model(\"7B\", 4.0, Some(4.0));\n        model_fast.name = \"Fast Model\".to_string();\n\n        let mut model_slow = test_model(\"14B\", 8.0, Some(8.0));\n        model_slow.name = \"Slow Model\".to_string();\n\n        let fits = vec![\n            ModelFit::analyze(&model_slow, &system),\n            ModelFit::analyze(&model_fast, &system),\n        ];\n\n        let ranked = rank_models_by_fit_opts_col(fits, false, SortColumn::Tps);\n\n        assert!(ranked[0].estimated_tps >= ranked[1].estimated_tps);\n        assert_eq!(ranked[0].model.name, \"Fast Model\");\n    }\n\n    #[test]\n    fn test_sort_by_release_date() {\n        let system = test_system(32.0, true, Some(16.0));\n\n        let mut model_new = test_model(\"7B\", 4.0, Some(4.0));\n        model_new.name = \"New Model\".to_string();\n        model_new.release_date = Some(\"2025-06-15\".to_string());\n\n        let mut model_old = test_model(\"7B\", 4.0, Some(4.0));\n        model_old.name = \"Old Model\".to_string();\n        model_old.release_date = Some(\"2024-01-10\".to_string());\n\n        let mut model_none = test_model(\"7B\", 4.0, Some(4.0));\n        model_none.name = \"No Date Model\".to_string();\n        model_none.release_date = None;\n\n        let fits = vec![\n            ModelFit::analyze(&model_old, &system),\n            ModelFit::analyze(&model_none, &system),\n            ModelFit::analyze(&model_new, &system),\n        ];\n\n        let ranked = rank_models_by_fit_opts_col(fits, false, SortColumn::ReleaseDate);\n\n        // Newest first, no-date last\n        assert_eq!(ranked[0].model.name, \"New Model\");\n        assert_eq!(ranked[1].model.name, \"Old Model\");\n        assert_eq!(ranked[2].model.name, \"No Date Model\");\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // Bandwidth-based speed estimation tests\n    // 
────────────────────────────────────────────────────────────────────\n\n    /// Helper: create a test system with a specific GPU name for bandwidth lookup.\n    fn test_system_with_gpu(ram: f64, vram: f64, gpu_name: &str) -> SystemSpecs {\n        SystemSpecs {\n            total_ram_gb: ram,\n            available_ram_gb: ram * 0.8,\n            total_cpu_cores: 8,\n            cpu_name: \"Test CPU\".to_string(),\n            has_gpu: true,\n            gpu_vram_gb: Some(vram),\n            total_gpu_vram_gb: Some(vram),\n            gpu_name: Some(gpu_name.to_string()),\n            gpu_count: 1,\n            unified_memory: false,\n            backend: GpuBackend::Cuda,\n            gpus: vec![],\n        }\n    }\n\n    #[test]\n    fn test_bandwidth_estimation_rtx4090_faster_than_rtx3060() {\n        let model = test_model(\"27B\", 16.0, Some(16.0));\n        let sys_4090 = test_system_with_gpu(64.0, 24.0, \"NVIDIA GeForce RTX 4090\");\n        let sys_3060 = test_system_with_gpu(64.0, 12.0, \"NVIDIA GeForce RTX 3060\");\n\n        let tps_4090 = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &sys_4090,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n        let tps_3060 = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &sys_3060,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n\n        // RTX 4090 (1008 GB/s) should be ~2.8x faster than RTX 3060 (360 GB/s)\n        assert!(\n            tps_4090 > tps_3060 * 2.0,\n            \"4090={tps_4090}, 3060={tps_3060}\"\n        );\n    }\n\n    #[test]\n    fn test_bandwidth_estimation_rtx4090_27b_q4_realistic() {\n        // Validated against real-world measurement:\n        // Qwen3.5-27B UD-Q4_K_XL on RTX 4090 → ~40 tok/s\n        let model = test_model(\"27B\", 16.0, Some(16.0));\n        let system = test_system_with_gpu(64.0, 24.0, \"NVIDIA GeForce RTX 4090\");\n\n        let tps = 
estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n\n        // Expected around 30-50 tok/s (measured: ~40); assert a looser 25-55 window\n        assert!(tps > 25.0 && tps < 55.0, \"RTX 4090 27B Q4 tok/s = {tps}\");\n    }\n\n    #[test]\n    fn test_bandwidth_estimation_t4_7b_f16_realistic() {\n        // Validated against ggerganov's T4 benchmark (Discussion #4225):\n        // OpenHermes 7B F16 on T4 → ~16 tok/s\n        let model = test_model(\"7B\", 14.0, Some(14.0));\n        let system = test_system_with_gpu(16.0, 16.0, \"Tesla T4\");\n\n        let tps = estimate_tps(\n            &model,\n            \"F16\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n\n        // Expected around 10-25 tok/s (measured: ~16); assert a looser 8-30 window\n        assert!(tps > 8.0 && tps < 30.0, \"T4 7B F16 tok/s = {tps}\");\n    }\n\n    #[test]\n    fn test_bandwidth_estimation_unknown_gpu_uses_fallback() {\n        // Unknown GPU names should still produce reasonable estimates\n        // via the fallback constant-K path.\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let system = test_system_with_gpu(16.0, 10.0, \"Some Unknown GPU\");\n\n        let tps = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &system,\n            RunMode::Gpu,\n            InferenceRuntime::LlamaCpp,\n        );\n\n        // Should fall back to K=220 path and produce a positive value\n        assert!(tps > 0.0, \"unknown GPU should still produce an estimate\");\n    }\n\n    #[test]\n    fn test_bandwidth_estimation_cpu_only_ignores_bandwidth() {\n        // CPU-only mode should NOT use GPU bandwidth, even if GPU is known.\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let sys_4090 = test_system_with_gpu(64.0, 24.0, \"NVIDIA GeForce RTX 4090\");\n        let sys_unknown = test_system_with_gpu(64.0, 24.0, 
\"Unknown GPU\");\n\n        let tps_4090 = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &sys_4090,\n            RunMode::CpuOnly,\n            InferenceRuntime::LlamaCpp,\n        );\n        let tps_unknown = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            &sys_unknown,\n            RunMode::CpuOnly,\n            InferenceRuntime::LlamaCpp,\n        );\n\n        // CPU-only should produce the same result regardless of GPU\n        assert!(\n            (tps_4090 - tps_unknown).abs() < 0.01,\n            \"CPU-only should ignore GPU: 4090={tps_4090}, unknown={tps_unknown}\"\n        );\n    }\n\n    #[test]\n    fn test_prequantized_requires_cuda_or_rocm() {\n        let mut model = test_model(\"7B\", 4.0, Some(4.0));\n        model.format = models::ModelFormat::Awq;\n\n        // AWQ on CUDA → compatible (default test GPU name is unrecognized, assumed ok)\n        let cuda_sys = test_system(64.0, true, Some(24.0));\n        assert!(backend_compatible(&model, &cuda_sys));\n\n        // AWQ on Metal → incompatible (no vllm-metal support yet)\n        let mut metal_sys = test_system(64.0, true, Some(64.0));\n        metal_sys.backend = GpuBackend::Metal;\n        metal_sys.unified_memory = true;\n        assert!(!backend_compatible(&model, &metal_sys));\n\n        // AWQ on Vulkan → incompatible\n        let mut vulkan_sys = test_system(64.0, true, Some(24.0));\n        vulkan_sys.backend = GpuBackend::Vulkan;\n        assert!(!backend_compatible(&model, &vulkan_sys));\n\n        // GPTQ on CUDA → compatible\n        model.format = models::ModelFormat::Gptq;\n        assert!(backend_compatible(&model, &cuda_sys));\n\n        // Regular GGUF on Metal → compatible (unchanged behavior)\n        let mut gguf_model = test_model(\"7B\", 4.0, Some(4.0));\n        gguf_model.format = models::ModelFormat::Gguf;\n        assert!(backend_compatible(&gguf_model, &metal_sys));\n    }\n\n    #[test]\n    fn 
test_awq_incompatible_on_volta_v100() {\n        // V100 is Volta (cc 7.0) — AWQ requires cc >= 7.5\n        let mut model = test_model(\"7B\", 4.0, Some(4.0));\n        model.format = models::ModelFormat::Awq;\n        model.quantization = \"AWQ-4bit\".to_string();\n\n        let v100_sys = test_system_with_gpu(64.0, 16.0, \"Tesla V100-PCIE-16GB\");\n        assert!(!backend_compatible(&model, &v100_sys));\n    }\n\n    #[test]\n    fn test_gptq_incompatible_on_volta_v100() {\n        let mut model = test_model(\"7B\", 4.0, Some(4.0));\n        model.format = models::ModelFormat::Gptq;\n        model.quantization = \"GPTQ-Int4\".to_string();\n\n        let v100_sys = test_system_with_gpu(64.0, 16.0, \"Tesla V100-PCIE-16GB\");\n        assert!(!backend_compatible(&model, &v100_sys));\n    }\n\n    #[test]\n    fn test_awq_compatible_on_turing_and_newer() {\n        let mut model = test_model(\"7B\", 4.0, Some(4.0));\n        model.format = models::ModelFormat::Awq;\n        model.quantization = \"AWQ-4bit\".to_string();\n\n        // T4 is Turing (cc 7.5) — should work\n        let t4_sys = test_system_with_gpu(64.0, 16.0, \"Tesla T4\");\n        assert!(backend_compatible(&model, &t4_sys));\n\n        // RTX 3090 is Ampere (cc 8.6) — should work\n        let ampere_sys = test_system_with_gpu(64.0, 24.0, \"NVIDIA GeForce RTX 3090\");\n        assert!(backend_compatible(&model, &ampere_sys));\n\n        // RTX 4090 is Ada Lovelace (cc 8.9) — should work\n        let ada_sys = test_system_with_gpu(64.0, 24.0, \"NVIDIA GeForce RTX 4090\");\n        assert!(backend_compatible(&model, &ada_sys));\n\n        // H100 is Hopper (cc 9.0) — should work\n        let hopper_sys = test_system_with_gpu(64.0, 80.0, \"NVIDIA H100 SXM\");\n        assert!(backend_compatible(&model, &hopper_sys));\n    }\n\n    #[test]\n    fn test_awq_on_rocm_always_compatible() {\n        // ROCm GPUs don't have NVIDIA compute capability — assume compatible\n        let mut model = 
test_model(\"7B\", 4.0, Some(4.0));\n        model.format = models::ModelFormat::Awq;\n        model.quantization = \"AWQ-4bit\".to_string();\n\n        let mut rocm_sys = test_system_with_gpu(64.0, 24.0, \"AMD Instinct MI300X\");\n        rocm_sys.backend = GpuBackend::Rocm;\n        assert!(backend_compatible(&model, &rocm_sys));\n    }\n\n    #[test]\n    fn test_awq_on_pascal_incompatible() {\n        // P100 is Pascal (cc 6.0) — AWQ requires cc >= 7.5\n        let mut model = test_model(\"7B\", 4.0, Some(4.0));\n        model.format = models::ModelFormat::Awq;\n        model.quantization = \"AWQ-4bit\".to_string();\n\n        let p100_sys = test_system_with_gpu(64.0, 16.0, \"Tesla P100\");\n        assert!(!backend_compatible(&model, &p100_sys));\n    }\n\n    #[test]\n    fn test_gguf_on_volta_still_compatible() {\n        // GGUF models should remain compatible on any GPU — no CC restriction\n        let model = test_model(\"7B\", 4.0, Some(4.0));\n        let v100_sys = test_system_with_gpu(64.0, 16.0, \"Tesla V100-PCIE-16GB\");\n        assert!(backend_compatible(&model, &v100_sys));\n    }\n}\n"
  },
  {
    "path": "llmfit-core/src/hardware.rs",
    "content": "use std::collections::BTreeMap;\nuse sysinfo::System;\n\n/// The acceleration backend for inference speed estimation.\n#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize)]\npub enum GpuBackend {\n    Cuda,\n    Metal,\n    Rocm,\n    Vulkan, // AMD/other GPUs without ROCm (e.g. Windows AMD, older AMD)\n    Sycl,   // Intel oneAPI\n    CpuArm,\n    CpuX86,\n    Ascend,\n}\n\nimpl GpuBackend {\n    pub fn label(&self) -> &'static str {\n        match self {\n            GpuBackend::Cuda => \"CUDA\",\n            GpuBackend::Metal => \"Metal\",\n            GpuBackend::Rocm => \"ROCm\",\n            GpuBackend::Vulkan => \"Vulkan\",\n            GpuBackend::Sycl => \"SYCL\",\n            GpuBackend::CpuArm => \"CPU (ARM)\",\n            GpuBackend::CpuX86 => \"CPU (x86)\",\n            GpuBackend::Ascend => \"NPU (Ascend)\",\n        }\n    }\n}\n\n/// Information about a single detected GPU.\n#[derive(Debug, Clone, serde::Serialize)]\npub struct GpuInfo {\n    pub name: String,\n    pub vram_gb: Option<f64>,\n    pub backend: GpuBackend,\n    pub count: u32, // >1 for same-model multi-GPU (e.g. 
2x RTX 4090)\n    pub unified_memory: bool,\n}\n\n#[derive(Debug, Clone, serde::Serialize)]\npub struct SystemSpecs {\n    pub total_ram_gb: f64,\n    pub available_ram_gb: f64,\n    pub total_cpu_cores: usize,\n    pub cpu_name: String,\n    pub has_gpu: bool,\n    pub gpu_vram_gb: Option<f64>,\n    /// Total VRAM across all same-model GPUs (e.g., 48GB for 2x RTX 3090).\n    /// For multi-GPU inference backends (llama.cpp, vLLM), models can be split\n    /// across cards, so we use total VRAM for fit scoring.\n    pub total_gpu_vram_gb: Option<f64>,\n    pub gpu_name: Option<String>,\n    pub gpu_count: u32,\n    pub unified_memory: bool,\n    pub backend: GpuBackend,\n    /// All detected GPUs (may span different vendors/backends).\n    pub gpus: Vec<GpuInfo>,\n}\n\nimpl SystemSpecs {\n    pub fn detect() -> Self {\n        let mut sys = System::new_all();\n        sys.refresh_all();\n\n        let total_ram_bytes = sys.total_memory();\n        let available_ram_bytes = sys.available_memory();\n        let total_ram_gb = total_ram_bytes as f64 / (1024.0 * 1024.0 * 1024.0);\n        let available_ram_gb = if available_ram_bytes == 0 && total_ram_bytes > 0 {\n            // sysinfo may fail to report available memory on some platforms\n            // (e.g. macOS Tahoe / newer macOS versions). 
Try fallbacks.\n            Self::available_ram_fallback(&sys, total_ram_bytes, total_ram_gb)\n        } else {\n            available_ram_bytes as f64 / (1024.0 * 1024.0 * 1024.0)\n        };\n\n        let total_cpu_cores = sys.cpus().len();\n        let cpu_name = Self::detect_cpu_name(&sys);\n\n        let gpus = Self::detect_all_gpus(total_ram_gb, &cpu_name);\n\n        // Primary GPU = the one with the most VRAM (best for inference).\n        // For fit scoring, we use the primary GPU's VRAM pool.\n        let primary = gpus.first();\n        let has_gpu = !gpus.is_empty();\n        let gpu_vram_gb = primary.and_then(|g| g.vram_gb);\n        // Total VRAM = per-card VRAM * count (for multi-GPU tensor splitting)\n        let total_gpu_vram_gb = primary.and_then(|g| g.vram_gb.map(|vram| vram * g.count as f64));\n        let gpu_name = primary.map(|g| g.name.clone());\n        let gpu_count = primary.map(|g| g.count).unwrap_or(0);\n        let unified_memory = primary.map(|g| g.unified_memory).unwrap_or(false);\n\n        let cpu_backend =\n            if cfg!(target_arch = \"aarch64\") || cpu_name.to_lowercase().contains(\"apple\") {\n                GpuBackend::CpuArm\n            } else {\n                GpuBackend::CpuX86\n            };\n        let backend = primary.map(|g| g.backend).unwrap_or(cpu_backend);\n\n        SystemSpecs {\n            total_ram_gb,\n            available_ram_gb,\n            total_cpu_cores,\n            cpu_name,\n            has_gpu,\n            gpu_vram_gb,\n            total_gpu_vram_gb,\n            gpu_name,\n            gpu_count,\n            unified_memory,\n            backend,\n            gpus,\n        }\n    }\n\n    /// Detect all GPUs across all vendors. Returns a Vec sorted by VRAM descending\n    /// (best GPU first). 
Unlike the old cascade, this does NOT short-circuit:\n    /// a system with both NVIDIA and AMD GPUs will report both.\n    fn detect_all_gpus(total_ram_gb: f64, cpu_name: &str) -> Vec<GpuInfo> {\n        let mut gpus = Vec::new();\n\n        // NVIDIA GPUs via nvidia-smi, with sysfs fallback for Linux/toolbox setups\n        let nvidia = Self::detect_nvidia_gpus();\n        if nvidia.is_empty() {\n            if let Some(nvidia_sysfs) = Self::detect_nvidia_gpu_sysfs_info() {\n                gpus.push(nvidia_sysfs);\n            }\n        } else {\n            gpus.extend(nvidia);\n        }\n\n        // AMD GPUs via rocm-smi or sysfs\n        if let Some(amd) = Self::detect_amd_gpu_rocm_info() {\n            gpus.push(amd);\n        } else if let Some(amd) = Self::detect_amd_gpu_sysfs_info() {\n            gpus.push(amd);\n        }\n\n        // Windows WMI (catches GPUs not found by vendor-specific tools)\n        for wmi_gpu in Self::detect_gpu_windows_info() {\n            // Skip if we already found a GPU with the same name from a vendor tool\n            let dominated = gpus.iter().any(|existing| {\n                let existing_lower = existing.name.to_lowercase();\n                let wmi_lower = wmi_gpu.name.to_lowercase();\n                existing_lower.contains(&wmi_lower) || wmi_lower.contains(&existing_lower)\n            });\n            if !dominated {\n                gpus.push(wmi_gpu);\n            }\n        }\n\n        // AMD unified memory APUs (e.g. 
Ryzen AI MAX series).\n        // These share the full system RAM between CPU and GPU, like Apple Silicon.\n        // WMI AdapterRAM is a 32-bit field capped at ~4 GB, so we override with\n        // total system RAM for these APUs.\n        if is_amd_unified_memory_apu(cpu_name) {\n            let amd_idx = gpus.iter().position(|g| {\n                let lower = g.name.to_lowercase();\n                lower.contains(\"amd\") || lower.contains(\"radeon\")\n            });\n            if let Some(idx) = amd_idx {\n                gpus[idx].unified_memory = true;\n                gpus[idx].vram_gb = Some(total_ram_gb);\n            } else {\n                // No AMD GPU found via other methods; create one.\n                gpus.push(GpuInfo {\n                    name: format!(\"{} (integrated)\", cpu_name),\n                    vram_gb: Some(total_ram_gb),\n                    backend: GpuBackend::Vulkan,\n                    count: 1,\n                    unified_memory: true,\n                });\n            }\n        }\n\n        // NVIDIA Grace / DGX Spark unified memory SoCs (e.g. 
GB10, GB20).\n        // These share the full system RAM between CPU and GPU, like Apple Silicon.\n        // nvidia-smi may report 0 VRAM or a small dedicated portion, so we\n        // override with total system RAM and flag as unified memory.\n        let is_nvidia_unified = gpus.iter().any(|g| {\n            let lower = g.name.to_lowercase();\n            lower.contains(\"gb10\") || lower.contains(\"gb20\")\n        });\n        if is_nvidia_unified {\n            for gpu in &mut gpus {\n                let lower = gpu.name.to_lowercase();\n                if lower.contains(\"gb10\") || lower.contains(\"gb20\") {\n                    gpu.unified_memory = true;\n                    gpu.vram_gb = Some(total_ram_gb);\n                }\n            }\n        }\n\n        // Intel Arc via sysfs\n        if let Some(vram) = Self::detect_intel_gpu() {\n            let already_found = gpus.iter().any(|g| g.name.to_lowercase().contains(\"intel\"));\n            if !already_found {\n                gpus.push(GpuInfo {\n                    name: \"Intel Arc\".to_string(),\n                    vram_gb: Some(vram),\n                    backend: GpuBackend::Sycl,\n                    count: 1,\n                    unified_memory: false,\n                });\n            }\n        }\n\n        // Apple Silicon (unified memory)\n        if let Some(vram) = Self::detect_apple_gpu(total_ram_gb) {\n            let name = if cpu_name.to_lowercase().contains(\"apple\") {\n                cpu_name.to_string()\n            } else {\n                \"Apple Silicon\".to_string()\n            };\n            gpus.push(GpuInfo {\n                name,\n                vram_gb: Some(vram),\n                backend: GpuBackend::Metal,\n                count: 1,\n                unified_memory: true,\n            });\n        }\n\n        // Ascend NPUs via npu-smi\n        let ascend = Self::detect_ascend_npus();\n        if !ascend.is_empty() {\n            gpus.extend(ascend);\n      
  }\n\n        // Vulkan fallback (e.g. Android/Termux with Turnip)\n        for vulkan_gpu in Self::detect_vulkan_gpu_info() {\n            let dominated = gpus\n                .iter()\n                .any(|existing| Self::is_same_gpu_name(&existing.name, &vulkan_gpu.name));\n            if !dominated {\n                gpus.push(vulkan_gpu);\n            }\n        }\n\n        // Sort by VRAM descending so the best GPU is primary\n        gpus.sort_by(|a, b| {\n            let va = a.vram_gb.unwrap_or(0.0);\n            let vb = b.vram_gb.unwrap_or(0.0);\n            vb.partial_cmp(&va).unwrap_or(std::cmp::Ordering::Equal)\n        });\n\n        gpus\n    }\n\n    /// Detect NVIDIA GPUs via nvidia-smi. Returns one GpuInfo per unique model,\n    /// with count and per-card VRAM for same-model multi-GPU setups.\n    ///\n    /// First tries querying `addressing_mode` to detect unified memory (Tegra/Grace\n    /// Blackwell platforms). Falls back to the standard 2-column query if the field\n    /// is unavailable on older nvidia-smi versions.\n    fn detect_nvidia_gpus() -> Vec<GpuInfo> {\n        // Try the extended query first (addressing_mode,memory.total,name).\n        // On NVIDIA Tegra / Grace Blackwell, addressing_mode returns \"ATS\"\n        // (Address Translation Services) which signals unified CPU+GPU memory.\n        if let Some(gpus) = Self::try_nvidia_smi_with_addressing_mode() {\n            return gpus;\n        }\n\n        // Fallback: standard 2-column query for older nvidia-smi versions\n        let output = match std::process::Command::new(\"nvidia-smi\")\n            .arg(\"--query-gpu=memory.total,name\")\n            .arg(\"--format=csv,noheader,nounits\")\n            .output()\n        {\n            Ok(o) if o.status.success() => o,\n            _ => return Vec::new(),\n        };\n\n        let text = match String::from_utf8(output.stdout) {\n            Ok(t) => t,\n            Err(_) => return Vec::new(),\n        };\n\n        
Self::parse_nvidia_smi_list(&text)\n    }\n\n    /// Try nvidia-smi with `addressing_mode` column. Returns `None` if the\n    /// query fails (e.g. older driver that doesn't support the field), so the\n    /// caller can fall back to the standard query.\n    fn try_nvidia_smi_with_addressing_mode() -> Option<Vec<GpuInfo>> {\n        let output = std::process::Command::new(\"nvidia-smi\")\n            .arg(\"--query-gpu=addressing_mode,memory.total,name\")\n            .arg(\"--format=csv,noheader,nounits\")\n            .output()\n            .ok()?;\n\n        if !output.status.success() {\n            return None;\n        }\n\n        let text = String::from_utf8(output.stdout).ok()?;\n        Some(Self::parse_nvidia_smi_extended(&text))\n    }\n\n    /// Parse `nvidia-smi --query-gpu=addressing_mode,memory.total,name`.\n    /// Detects unified memory when addressing_mode is \"ATS\" and VRAM is\n    /// unavailable — common on NVIDIA Tegra / Grace Blackwell (DGX Spark).\n    /// Falls back to system RAM via /proc/meminfo as the unified memory pool.\n    fn parse_nvidia_smi_extended(text: &str) -> Vec<GpuInfo> {\n        // Track per-model: (count, per_card_vram_mb, is_unified)\n        let mut grouped: BTreeMap<String, (u32, f64, bool)> = BTreeMap::new();\n        let total_ram_gb = read_proc_meminfo_total_gb();\n\n        for line in text.lines() {\n            let line = line.trim();\n            if line.is_empty() {\n                continue;\n            }\n            let parts: Vec<&str> = line.splitn(3, ',').collect();\n            if parts.len() < 3 {\n                continue;\n            }\n\n            let addr_mode = parts[0].trim();\n            let is_unified = addr_mode.eq_ignore_ascii_case(\"ATS\");\n\n            let name = parts[2].trim().to_string();\n            let name = if name.is_empty() {\n                \"NVIDIA GPU\".to_string()\n            } else {\n                name\n            };\n\n            let parsed_vram_mb = 
parts[1].trim().parse::<f64>().unwrap_or(0.0);\n\n            let vram_mb = if parsed_vram_mb > 0.0 {\n                parsed_vram_mb\n            } else if is_unified {\n                // Unified memory: use total system RAM as the shared pool\n                total_ram_gb.unwrap_or(0.0) * 1024.0\n            } else {\n                estimate_vram_from_name(&name) * 1024.0\n            };\n\n            let entry = grouped.entry(name).or_insert((0, 0.0, false));\n            entry.0 += 1;\n            if vram_mb > entry.1 {\n                entry.1 = vram_mb;\n            }\n            if is_unified {\n                entry.2 = true;\n            }\n        }\n\n        if grouped.is_empty() {\n            return Vec::new();\n        }\n\n        grouped\n            .into_iter()\n            .map(|(name, (count, per_card_vram_mb, is_unified))| GpuInfo {\n                name,\n                vram_gb: if per_card_vram_mb > 0.0 {\n                    Some(per_card_vram_mb / 1024.0)\n                } else {\n                    None\n                },\n                backend: GpuBackend::Cuda,\n                count,\n                unified_memory: is_unified,\n            })\n            .collect()\n    }\n\n    /// Parse `nvidia-smi --query-gpu=memory.total,name --format=csv,noheader,nounits`.\n    /// Groups same-model cards and keeps per-card VRAM (never sums across cards).\n    fn parse_nvidia_smi_list(text: &str) -> Vec<GpuInfo> {\n        let mut grouped: BTreeMap<String, (u32, f64)> = BTreeMap::new();\n\n        for line in text.lines() {\n            let line = line.trim();\n            if line.is_empty() {\n                continue;\n            }\n            let parts: Vec<&str> = line.splitn(2, ',').collect();\n\n            let name = parts\n                .get(1)\n                .map(|s| s.trim())\n                .filter(|s| !s.is_empty())\n                .unwrap_or(\"NVIDIA GPU\")\n                .to_string();\n\n            let 
parsed_vram_mb = parts\n                .first()\n                .and_then(|s| s.trim().parse::<f64>().ok())\n                .unwrap_or(0.0);\n            let vram_mb = if parsed_vram_mb > 0.0 {\n                parsed_vram_mb\n            } else {\n                estimate_vram_from_name(&name) * 1024.0\n            };\n\n            let entry = grouped.entry(name).or_insert((0, 0.0));\n            entry.0 += 1;\n            if vram_mb > entry.1 {\n                entry.1 = vram_mb;\n            }\n        }\n\n        if grouped.is_empty() {\n            return Vec::new();\n        }\n\n        grouped\n            .into_iter()\n            .map(|(name, (count, per_card_vram_mb))| GpuInfo {\n                name,\n                vram_gb: if per_card_vram_mb > 0.0 {\n                    Some(per_card_vram_mb / 1024.0)\n                } else {\n                    None\n                },\n                backend: GpuBackend::Cuda,\n                count,\n                unified_memory: false,\n            })\n            .collect()\n    }\n\n    /// Detect NVIDIA GPUs via Linux sysfs when nvidia-smi is unavailable.\n    /// This is common in containerized environments (e.g. 
Toolbx) and\n    /// Nouveau-based systems.\n    fn detect_nvidia_gpu_sysfs_info() -> Option<GpuInfo> {\n        if !cfg!(target_os = \"linux\") {\n            return None;\n        }\n\n        let entries = std::fs::read_dir(\"/sys/class/drm\").ok()?;\n        let mut gpu_count: u32 = 0;\n        let mut total_vram_bytes: u64 = 0;\n        let mut slot_hints: Vec<String> = Vec::new();\n        let mut backend = GpuBackend::Vulkan;\n\n        for entry in entries.flatten() {\n            let card_path = entry.path();\n            let fname = card_path.file_name()?.to_str()?.to_string();\n            // Only look at cardN entries, not connectors (cardN-DP-1, etc.)\n            if !fname.starts_with(\"card\") || fname.contains('-') {\n                continue;\n            }\n\n            let device_path = card_path.join(\"device\");\n            let vendor_path = device_path.join(\"vendor\");\n            let Ok(vendor) = std::fs::read_to_string(&vendor_path) else {\n                continue;\n            };\n            if vendor.trim() != \"0x10de\" {\n                continue;\n            }\n\n            gpu_count += 1;\n\n            if let Ok(vram_str) = std::fs::read_to_string(device_path.join(\"mem_info_vram_total\"))\n                && let Ok(vram_bytes) = vram_str.trim().parse::<u64>()\n                && vram_bytes > 0\n            {\n                // Track the maximum per-card VRAM instead of summing across all cards.\n                total_vram_bytes = total_vram_bytes.max(vram_bytes);\n            }\n\n            if let Ok(uevent) = std::fs::read_to_string(device_path.join(\"uevent\")) {\n                for line in uevent.lines() {\n                    if let Some(slot) = line.strip_prefix(\"PCI_SLOT_NAME=\") {\n                        slot_hints.push(slot.to_string());\n                    } else if let Some(driver) = line.strip_prefix(\"DRIVER=\")\n                        && driver.eq_ignore_ascii_case(\"nvidia\")\n                    {\n     
                   backend = GpuBackend::Cuda;\n                    }\n                }\n            }\n        }\n\n        if gpu_count == 0 {\n            return None;\n        }\n\n        let name = Self::get_nvidia_gpu_name_lspci(&slot_hints)\n            .unwrap_or_else(|| \"NVIDIA GPU\".to_string());\n\n        let mut vram_gb = if total_vram_bytes > 0 {\n            Some(total_vram_bytes as f64 / (1024.0 * 1024.0 * 1024.0))\n        } else {\n            None\n        };\n\n        if vram_gb.is_none() {\n            let est = estimate_vram_from_name(&name);\n            if est > 0.0 {\n                vram_gb = Some(est);\n            }\n        }\n\n        Some(GpuInfo {\n            name,\n            vram_gb,\n            backend,\n            count: gpu_count,\n            unified_memory: false,\n        })\n    }\n\n    /// Detect AMD GPU via rocm-smi (available on Linux with ROCm installed).\n    /// Parses per-card VRAM and GPU name from rocm-smi output.\n    fn detect_amd_gpu_rocm_info() -> Option<GpuInfo> {\n        // Try rocm-smi --showmeminfo vram for VRAM\n        let vram_output = std::process::Command::new(\"rocm-smi\")\n            .arg(\"--showmeminfo\")\n            .arg(\"vram\")\n            .output()\n            .ok()?;\n\n        if !vram_output.status.success() {\n            return None;\n        }\n\n        let vram_text = String::from_utf8(vram_output.stdout).ok()?;\n\n        // Parse VRAM total from rocm-smi output.\n        // Typical format includes a line like:\n        //   \"GPU[0] : vram Total Memory (B): 8589934592\"\n        // or in table format with \"Total\" and bytes.\n        let mut per_gpu_vram_bytes: Vec<u64> = Vec::new();\n        let mut gpu_count: u32 = 0;\n        for line in vram_text.lines() {\n            let lower = line.to_lowercase();\n            if lower.contains(\"total\") && !lower.contains(\"used\") {\n                // Extract the numeric value (bytes)\n                if let Some(val) = 
line\n                    .split_whitespace()\n                    .filter_map(|w| w.parse::<u64>().ok())\n                    .next_back()\n                    && val > 0\n                {\n                    per_gpu_vram_bytes.push(val);\n                    gpu_count += 1;\n                }\n            }\n        }\n\n        if gpu_count == 0 {\n            // rocm-smi succeeded but we couldn't parse VRAM; GPU exists though\n            gpu_count = 1;\n        }\n\n        // Try to get GPU name from rocm-smi --showproductname\n        let gpu_name = std::process::Command::new(\"rocm-smi\")\n            .arg(\"--showproductname\")\n            .output()\n            .ok()\n            .and_then(|o| {\n                if o.status.success() {\n                    String::from_utf8(o.stdout).ok()\n                } else {\n                    None\n                }\n            })\n            .and_then(|text| {\n                // Look for \"Card Series\" or \"Card Model\" lines\n                for line in text.lines() {\n                    let lower = line.to_lowercase();\n                    if (lower.contains(\"card series\") || lower.contains(\"card model\"))\n                        && let Some(val) = line.split(':').nth(1)\n                    {\n                        let name = val.trim().to_string();\n                        if !name.is_empty() {\n                            return Some(name);\n                        }\n                    }\n                }\n                None\n            });\n\n        let name = gpu_name.unwrap_or_else(|| \"AMD GPU\".to_string());\n        let max_per_gpu_bytes = per_gpu_vram_bytes.into_iter().max().unwrap_or(0);\n        let vram_gb = if max_per_gpu_bytes > 0 {\n            Some(max_per_gpu_bytes as f64 / (1024.0 * 1024.0 * 1024.0))\n        } else {\n            let est = estimate_vram_from_name(&name);\n            if est > 0.0 { Some(est) } else { None }\n        };\n\n        Some(GpuInfo {\n        
    name,\n            vram_gb,\n            backend: GpuBackend::Rocm,\n            count: gpu_count,\n            unified_memory: false,\n        })\n    }\n\n    /// Detect AMD GPU via sysfs on Linux (works without ROCm installed).\n    /// AMD vendor ID is 0x1002.\n    fn detect_amd_gpu_sysfs_info() -> Option<GpuInfo> {\n        if !cfg!(target_os = \"linux\") {\n            return None;\n        }\n\n        let mut slot_hints: Vec<String> = Vec::new();\n        let entries = std::fs::read_dir(\"/sys/class/drm\").ok()?;\n\n        for entry in entries.flatten() {\n            let card_path = entry.path();\n            let fname = card_path.file_name()?.to_str()?.to_string();\n            // Only look at cardN entries, not cardN-DP-1 etc.\n            if !fname.starts_with(\"card\") || fname.contains('-') {\n                continue;\n            }\n\n            let device_path = card_path.join(\"device\");\n            let vendor_path = device_path.join(\"vendor\");\n            if let Ok(vendor) = std::fs::read_to_string(&vendor_path) {\n                if vendor.trim() != \"0x1002\" {\n                    continue;\n                }\n            } else {\n                continue;\n            }\n\n            // Found an AMD GPU. 
Try to read VRAM.\n            let mut vram_gb: Option<f64> = None;\n            let vram_path = device_path.join(\"mem_info_vram_total\");\n            if let Ok(vram_str) = std::fs::read_to_string(&vram_path)\n                && let Ok(vram_bytes) = vram_str.trim().parse::<u64>()\n                && vram_bytes > 0\n            {\n                vram_gb = Some(vram_bytes as f64 / (1024.0 * 1024.0 * 1024.0));\n            }\n\n            if let Ok(uevent) = std::fs::read_to_string(device_path.join(\"uevent\")) {\n                for line in uevent.lines() {\n                    if let Some(slot) = line.strip_prefix(\"PCI_SLOT_NAME=\") {\n                        slot_hints.push(slot.to_string());\n                    }\n                }\n            }\n\n            // Try to get GPU name from lspci\n            let gpu_name = Self::get_amd_gpu_name_lspci(&slot_hints);\n            let name = gpu_name.unwrap_or_else(|| \"AMD GPU\".to_string());\n\n            // If we still don't have VRAM, try to estimate from name\n            if vram_gb.is_none() {\n                let estimated = estimate_vram_from_name(&name);\n                if estimated > 0.0 {\n                    vram_gb = Some(estimated);\n                }\n            }\n\n            // AMD GPU without ROCm — Vulkan is the most likely inference backend\n            return Some(GpuInfo {\n                name,\n                vram_gb,\n                backend: GpuBackend::Vulkan,\n                count: 1,\n                unified_memory: false,\n            });\n        }\n        None\n    }\n\n    /// Extract AMD GPU name from lspci output.\n    fn get_amd_gpu_name_lspci(slot_hints: &[String]) -> Option<String> {\n        let text = Self::lspci_output()?;\n\n        // First pass: match exact slot (e.g. 
\"0000:01:00.0\"), if available.\n        for slot in slot_hints {\n            for line in text.lines() {\n                let lower = line.to_lowercase();\n                if line.starts_with(slot)\n                    && (lower.contains(\"vga\") || lower.contains(\"3d\") || lower.contains(\"display\"))\n                    && (lower.contains(\"amd\") || lower.contains(\"ati\"))\n                    && let Some(model) = Self::extract_model_from_lspci_line(line)\n                {\n                    return Some(model);\n                }\n            }\n        }\n\n        // Fallback: any AMD/ATI display controller line.\n        for line in text.lines() {\n            let lower = line.to_lowercase();\n            if (lower.contains(\"vga\") || lower.contains(\"3d\"))\n                && (lower.contains(\"amd\") || lower.contains(\"ati\"))\n                && let Some(model) = Self::extract_model_from_lspci_line(line)\n            {\n                return Some(model);\n            }\n        }\n        None\n    }\n\n    /// Resolve NVIDIA GPU name from lspci, optionally prioritizing specific\n    /// PCI slots discovered from sysfs.\n    fn get_nvidia_gpu_name_lspci(slot_hints: &[String]) -> Option<String> {\n        let text = Self::lspci_output()?;\n\n        // First pass: match exact slot (e.g. 
\"0000:01:00.0\"), if available.\n        for slot in slot_hints {\n            for line in text.lines() {\n                let lower = line.to_lowercase();\n                if line.starts_with(slot)\n                    && (lower.contains(\"vga\") || lower.contains(\"3d\") || lower.contains(\"display\"))\n                    && lower.contains(\"nvidia\")\n                    && let Some(model) = Self::extract_model_from_lspci_line(line)\n                {\n                    return Some(model);\n                }\n            }\n        }\n\n        // Fallback: any NVIDIA display controller line.\n        for line in text.lines() {\n            let lower = line.to_lowercase();\n            if (lower.contains(\"vga\") || lower.contains(\"3d\") || lower.contains(\"display\"))\n                && lower.contains(\"nvidia\")\n                && let Some(model) = Self::extract_model_from_lspci_line(line)\n            {\n                return Some(model);\n            }\n        }\n\n        None\n    }\n\n    /// Read lspci output, with host fallback for containerized environments.\n    fn lspci_output() -> Option<String> {\n        let local = std::process::Command::new(\"lspci\")\n            .arg(\"-nnD\")\n            .output()\n            .ok()\n            .filter(|o| o.status.success())\n            .and_then(|o| String::from_utf8(o.stdout).ok());\n\n        if local.is_some() {\n            return local;\n        }\n\n        std::process::Command::new(\"flatpak-spawn\")\n            .args([\"--host\", \"lspci\", \"-nnD\"])\n            .output()\n            .ok()\n            .filter(|o| o.status.success())\n            .and_then(|o| String::from_utf8(o.stdout).ok())\n    }\n\n    /// Extract a likely model name from an lspci line.\n    /// Prefers human-readable bracketed tokens (e.g. 
\"[GeForce RTX 2060]\").\n    fn extract_model_from_lspci_line(line: &str) -> Option<String> {\n        let mut best: Option<String> = None;\n        let mut rest = line;\n\n        while let Some(start) = rest.find('[') {\n            let after = &rest[start + 1..];\n            let Some(end) = after.find(']') else { break };\n            let token = after[..end].trim();\n            let usable = !token.is_empty()\n                && !token.contains(':')\n                && !token.chars().all(|c| c.is_ascii_digit());\n\n            if usable\n                && best\n                    .as_ref()\n                    .map(|current| token.len() > current.len())\n                    .unwrap_or(true)\n            {\n                best = Some(token.to_string());\n            }\n\n            rest = &after[end + 1..];\n        }\n\n        if best.is_some() {\n            return best;\n        }\n\n        // Fallback: text after the first \": \" separator.\n        line.split_once(\": \")\n            .map(|(_, right)| right.trim().to_string())\n            .filter(|s| !s.is_empty())\n    }\n\n    /// Detect GPUs on Windows via WMI (Win32_VideoController).\n    /// Returns all discrete GPUs found (AMD, NVIDIA, Intel, etc.).\n    fn detect_gpu_windows_info() -> Vec<GpuInfo> {\n        if !cfg!(target_os = \"windows\") {\n            return Vec::new();\n        }\n\n        // Use PowerShell to query WMI — more reliable than wmic (deprecated)\n        if let Ok(output) = std::process::Command::new(\"powershell\")\n            .arg(\"-NoProfile\")\n            .arg(\"-Command\")\n            .arg(\"Get-CimInstance Win32_VideoController | Select-Object Name,AdapterRAM | ForEach-Object { $_.Name + '|' + $_.AdapterRAM }\")\n            .output()\n            && output.status.success()\n                && let Ok(text) = String::from_utf8(output.stdout) {\n                    let gpus = Self::parse_windows_gpu_list(&text);\n                    if !gpus.is_empty() {\n        
                return gpus;\n                    }\n                }\n\n        // Fallback to wmic for older Windows\n        Self::detect_gpu_windows_wmic_list()\n    }\n\n    /// Fallback Windows GPU detection via wmic (works on older systems).\n    fn detect_gpu_windows_wmic_list() -> Vec<GpuInfo> {\n        let output = match std::process::Command::new(\"wmic\")\n            .arg(\"path\")\n            .arg(\"win32_VideoController\")\n            .arg(\"get\")\n            .arg(\"Name,AdapterRAM\")\n            .arg(\"/format:csv\")\n            .output()\n        {\n            Ok(o) if o.status.success() => o,\n            _ => return Vec::new(),\n        };\n\n        let text = match String::from_utf8(output.stdout) {\n            Ok(t) => t,\n            Err(_) => return Vec::new(),\n        };\n\n        let mut gpus = Vec::new();\n        // CSV format: Node,AdapterRAM,Name\n        for line in text.lines().skip(1) {\n            let line = line.trim();\n            if line.is_empty() {\n                continue;\n            }\n            let parts: Vec<&str> = line.split(',').collect();\n            if parts.len() >= 3 {\n                let raw_vram: u64 = parts[1].trim().parse().unwrap_or(0);\n                let name = parts[2..].join(\",\").trim().to_string();\n                let lower = name.to_lowercase();\n                if lower.contains(\"microsoft\")\n                    || lower.contains(\"basic\")\n                    || lower.contains(\"virtual\")\n                {\n                    continue;\n                }\n                let backend = Self::infer_gpu_backend(&name);\n                let vram_gb = Self::resolve_wmi_vram(raw_vram, &name);\n                gpus.push(GpuInfo {\n                    name,\n                    vram_gb,\n                    backend,\n                    count: 1,\n                    unified_memory: false,\n                });\n            }\n        }\n        gpus\n    }\n\n    /// Parse all GPU 
entries from PowerShell output (Name|AdapterRAM per line).\n    fn parse_windows_gpu_list(text: &str) -> Vec<GpuInfo> {\n        let mut gpus = Vec::new();\n        for line in text.lines() {\n            let line = line.trim();\n            if line.is_empty() {\n                continue;\n            }\n            let parts: Vec<&str> = line.splitn(2, '|').collect();\n            let name = parts[0].trim().to_string();\n            let raw_vram: u64 = parts\n                .get(1)\n                .and_then(|v| v.trim().parse().ok())\n                .unwrap_or(0);\n\n            let lower = name.to_lowercase();\n            if lower.contains(\"microsoft\")\n                || lower.contains(\"basic\")\n                || lower.contains(\"virtual\")\n                || lower.is_empty()\n            {\n                continue;\n            }\n\n            let backend = Self::infer_gpu_backend(&name);\n            let vram_gb = Self::resolve_wmi_vram(raw_vram, &name);\n            gpus.push(GpuInfo {\n                name,\n                vram_gb,\n                backend,\n                count: 1,\n                unified_memory: false,\n            });\n        }\n        gpus\n    }\n\n    /// WMI AdapterRAM is a 32-bit field, capped at ~4 GB.\n    /// If reported value is suspiciously low, estimate from GPU name.\n    fn resolve_wmi_vram(raw_bytes: u64, name: &str) -> Option<f64> {\n        let mut vram_gb = raw_bytes as f64 / (1024.0 * 1024.0 * 1024.0);\n        if vram_gb < 0.1 || (vram_gb <= 4.1 && estimate_vram_from_name(name) > 4.1) {\n            let estimated = estimate_vram_from_name(name);\n            if estimated > 0.0 {\n                vram_gb = estimated;\n            }\n        }\n        if vram_gb > 0.0 { Some(vram_gb) } else { None }\n    }\n\n    /// Infer the most likely inference backend from a GPU name string.\n    fn infer_gpu_backend(name: &str) -> GpuBackend {\n        let lower = name.to_lowercase();\n        if 
lower.contains(\"nvidia\")\n            || lower.contains(\"geforce\")\n            || lower.contains(\"quadro\")\n            || lower.contains(\"tesla\")\n            || lower.contains(\"rtx\")\n        {\n            GpuBackend::Cuda\n        } else if lower.contains(\"amd\") || lower.contains(\"radeon\") || lower.contains(\"ati\") {\n            // On Windows, Vulkan is the primary inference path for AMD GPUs\n            // (ROCm support on Windows is limited)\n            GpuBackend::Vulkan\n        } else if lower.contains(\"intel\") || lower.contains(\"arc\") {\n            GpuBackend::Sycl\n        } else {\n            GpuBackend::Vulkan\n        }\n    }\n\n    /// Detect Intel Arc / Intel integrated GPU via sysfs or lspci.\n    /// Intel Arc GPUs (A370M, A770, etc.) have dedicated VRAM exposed via\n    /// the DRM subsystem at /sys/class/drm/card*/device/. Even integrated\n    /// Intel GPUs that share system RAM are useful for inference via SYCL/oneAPI.\n    /// Returns `Some(vram_gb)` for discrete GPUs and `Some(0.0)` for Arc-class\n    /// GPUs that share system RAM.\n    fn detect_intel_gpu() -> Option<f64> {\n        // Try sysfs first: works for Intel discrete (Arc) GPUs on Linux.\n        // Walk /sys/class/drm/card*/device/ looking for Intel vendor ID (0x8086).\n        if let Ok(entries) = std::fs::read_dir(\"/sys/class/drm\") {\n            for entry in entries.flatten() {\n                let card_path = entry.path();\n                let device_path = card_path.join(\"device\");\n\n                // Check vendor ID matches Intel (0x8086). Skip entries whose\n                // vendor file is missing or unreadable (e.g. cardN-DP-1 connectors),\n                // matching the AMD sysfs detection above.\n                let vendor_path = device_path.join(\"vendor\");\n                match std::fs::read_to_string(&vendor_path) {\n                    Ok(vendor) if vendor.trim() == \"0x8086\" => {}\n                    _ => continue,\n                }\n\n                // Look for total VRAM via DRM memory info\n                // Intel discrete GPUs expose this under drm/card*/device/mem_info_vram_total\n                let vram_path = card_path.join(\"device/mem_info_vram_total\");\n                if let 
Ok(vram_str) = std::fs::read_to_string(&vram_path)\n                    && let Ok(vram_bytes) = vram_str.trim().parse::<u64>()\n                    && vram_bytes > 0\n                {\n                    let vram_gb = vram_bytes as f64 / (1024.0 * 1024.0 * 1024.0);\n                    return Some(vram_gb);\n                }\n\n                // For integrated Intel GPUs, check if it's an Arc-class device\n                // by looking for \"Arc\" in the device name via lspci\n                if let Some(text) = Self::lspci_output() {\n                    for line in text.lines() {\n                        let lower = line.to_lowercase();\n                        if lower.contains(\"intel\") && lower.contains(\"arc\") {\n                            // Intel Arc integrated (e.g. Arc Graphics in Meteor Lake)\n                            // These share system RAM; report 0.0 GB of VRAM to\n                            // let the caller know a GPU exists.\n                            return Some(0.0);\n                        }\n                    }\n                }\n            }\n        }\n\n        // Fallback: check lspci directly for Intel Arc devices\n        // (covers cases where sysfs isn't available or card dirs don't exist)\n        if let Some(text) = Self::lspci_output() {\n            for line in text.lines() {\n                let lower = line.to_lowercase();\n                if lower.contains(\"intel\") && lower.contains(\"arc\") {\n                    return Some(0.0);\n                }\n            }\n        }\n\n        None\n    }\n\n    /// Detect Apple Silicon GPU via system_profiler.\n    /// Returns total system RAM as VRAM since memory is unified.\n    /// The unified memory pool capacity is the total RAM -- it doesn't\n    /// fluctuate with current usage the way available RAM does.\n    fn detect_apple_gpu(total_ram_gb: f64) -> Option<f64> {\n        // system_profiler only exists on macOS\n        let output = 
std::process::Command::new(\"system_profiler\")\n            .arg(\"SPDisplaysDataType\")\n            .output()\n            .ok()?;\n\n        if !output.status.success() {\n            return None;\n        }\n\n        let text = String::from_utf8(output.stdout).ok()?;\n\n        // Apple Silicon GPUs show \"Apple M1/M2/M3/M4\" in the chipset line.\n        // Discrete AMD/Intel GPUs on older Macs won't match.\n        let is_apple_gpu = text.lines().any(|line| {\n            let lower = line.to_lowercase();\n            lower.contains(\"apple m\") || lower.contains(\"apple gpu\")\n        });\n\n        if is_apple_gpu {\n            // Unified memory: GPU and CPU share the same RAM pool.\n            // Report total RAM as the VRAM capacity.\n            Some(total_ram_gb)\n        } else {\n            None\n        }\n    }\n\n    fn has_command(command: &str) -> bool {\n        let Some(path_var) = std::env::var_os(\"PATH\") else {\n            return false;\n        };\n\n        for path in std::env::split_paths(&path_var) {\n            let candidate = path.join(command);\n            if candidate.is_file() {\n                return true;\n            }\n\n            #[cfg(target_os = \"windows\")]\n            for ext in [\".exe\", \".cmd\", \".bat\", \".com\"] {\n                let candidate = path.join(format!(\"{command}{ext}\"));\n                if candidate.is_file() {\n                    return true;\n                }\n            }\n        }\n\n        false\n    }\n\n    /// Detect GPUs via Vulkan. 
This is especially useful on Android/Termux,\n    /// where vendor-specific Linux utilities may be unavailable.\n    fn detect_vulkan_gpu_info() -> Vec<GpuInfo> {\n        if !Self::has_command(\"vulkaninfo\") {\n            return Vec::new();\n        }\n\n        let output = match std::process::Command::new(\"vulkaninfo\")\n            .arg(\"--summary\")\n            .output()\n        {\n            Ok(o) if o.status.success() => o,\n            _ => match std::process::Command::new(\"vulkaninfo\").output() {\n                Ok(o) if o.status.success() => o,\n                _ => return Vec::new(),\n            },\n        };\n\n        let text = String::from_utf8_lossy(&output.stdout);\n        let mut grouped: BTreeMap<String, u32> = BTreeMap::new();\n\n        for name in Self::parse_vulkan_device_names(&text) {\n            if Self::is_software_vulkan_device(&name) {\n                continue;\n            }\n            *grouped.entry(name).or_insert(0) += 1;\n        }\n\n        grouped\n            .into_iter()\n            .map(|(name, count)| GpuInfo {\n                backend: GpuBackend::Vulkan,\n                count,\n                name,\n                unified_memory: false,\n                vram_gb: None,\n            })\n            .collect()\n    }\n\n    fn is_same_gpu_name(existing_name: &str, candidate_name: &str) -> bool {\n        Self::normalize_gpu_name_for_dedupe(existing_name)\n            == Self::normalize_gpu_name_for_dedupe(candidate_name)\n    }\n\n    fn normalize_gpu_name_for_dedupe(name: &str) -> String {\n        let mut normalized = String::with_capacity(name.len());\n        let mut last_was_separator = true;\n\n        for ch in name.chars().flat_map(char::to_lowercase) {\n            if ch.is_alphanumeric() {\n                normalized.push(ch);\n                last_was_separator = false;\n            } else if !last_was_separator {\n                normalized.push(' ');\n                last_was_separator = 
true;\n            }\n        }\n\n        normalized.trim().to_string()\n    }\n\n    fn parse_vulkan_device_names(text: &str) -> Vec<String> {\n        let mut names = Vec::new();\n\n        for line in text.lines() {\n            let trimmed = line.trim();\n            if trimmed.is_empty() {\n                continue;\n            }\n\n            if let Some((key, value)) = trimmed.split_once('=')\n                && key.trim().eq_ignore_ascii_case(\"deviceName\")\n            {\n                let name = value.trim();\n                if !name.is_empty() {\n                    names.push(name.to_string());\n                }\n                continue;\n            }\n\n            if let Some(rest) = trimmed.strip_prefix(\"GPU id\")\n                && let Some(start) = rest.find('(')\n                && let Some(end) = rest.rfind(')')\n                && end > start + 1\n            {\n                let name = rest[start + 1..end].trim();\n                if !name.is_empty() {\n                    names.push(name.to_string());\n                }\n            }\n        }\n\n        names\n    }\n\n    fn is_software_vulkan_device(name: &str) -> bool {\n        let lower = name.to_lowercase();\n        lower.contains(\"llvmpipe\")\n            || lower.contains(\"lavapipe\")\n            || lower.contains(\"swiftshader\")\n            || lower.contains(\"software rasterizer\")\n    }\n\n    /// Detect Ascend NPUs via npu-smi. Returns a vector of NPU info.\n    fn detect_ascend_npus() -> Vec<GpuInfo> {\n        // 1. 
Get the list of IDs\n        let list_output = match std::process::Command::new(\"npu-smi\")\n            .args([\"info\", \"-l\"])\n            .output()\n        {\n            Ok(o) if o.status.success() => o,\n            _ => return Vec::new(),\n        };\n\n        let list_stdout = String::from_utf8_lossy(&list_output.stdout);\n\n        // Extracting IDs: [\"0\", \"1\", \"2\"...]\n        let ids: Vec<String> = list_stdout\n            .lines()\n            .filter(|line| line.contains(\"NPU ID\"))\n            .filter_map(|line| line.split(':').next_back())\n            .map(|s| s.trim().to_string())\n            .collect();\n\n        if ids.is_empty() {\n            return Vec::new();\n        }\n\n        let mut npu_infos: Vec<GpuInfo> = Vec::new();\n        let npu_name = \"Ascend NPU\";\n\n        // 2. Loop through NPUs\n        for id in &ids {\n            let mem_output = std::process::Command::new(\"npu-smi\")\n                .args([\"info\", \"-t\", \"memory\", \"-i\", id])\n                .output();\n\n            if let Ok(o) = mem_output {\n                let s = String::from_utf8_lossy(&o.stdout);\n\n                // Parse HBM capacity in MB (e.g., from \"HBM Capacity(MB) : 65536\")\n                let mem_mb = s\n                    .lines()\n                    .find(|l| l.contains(\"HBM Capacity\"))\n                    .and_then(|l| l.split(':').next_back())\n                    .and_then(|v| v.split_whitespace().next())\n                    .and_then(|num| num.parse::<u64>().ok())\n                    .filter(|&mb| mb > 0);\n\n                let npu_info = GpuInfo {\n                    name: npu_name.to_string(),\n                    // MB -> GB; stays None (VRAM unknown) if the capacity could not be parsed\n                    vram_gb: mem_mb.map(|mb| mb as f64 / 1024.0),\n                    backend: GpuBackend::Ascend,\n                    count: 1,\n                    unified_memory: false,\n                };\n                npu_infos.push(npu_info);\n            }\n        }\n\n        npu_infos\n    }\n\n    /// Fallback for available 
RAM when sysinfo returns 0.\n    /// Tries total - used first, then macOS vm_stat parsing.\n    fn available_ram_fallback(sys: &System, total_bytes: u64, total_gb: f64) -> f64 {\n        // Try total - used from sysinfo (may also use vm_statistics64 internally)\n        let used = sys.used_memory();\n        if used > 0 && used < total_bytes {\n            return (total_bytes - used) as f64 / (1024.0 * 1024.0 * 1024.0);\n        }\n\n        // macOS fallback: parse vm_stat output\n        if let Some(avail) = Self::available_ram_from_vm_stat() {\n            return avail;\n        }\n\n        // Last resort: assume 80% of total is available (conservative)\n        total_gb * 0.8\n    }\n\n    /// Parse macOS `vm_stat` to compute available memory.\n    /// Available ≈ (free + inactive + purgeable) * page_size\n    fn available_ram_from_vm_stat() -> Option<f64> {\n        let output = std::process::Command::new(\"vm_stat\").output().ok()?;\n        if !output.status.success() {\n            return None;\n        }\n        let text = String::from_utf8(output.stdout).ok()?;\n\n        // First line: \"Mach Virtual Memory Statistics: (page size of NNNNN bytes)\"\n        let page_size: u64 = text\n            .lines()\n            .next()\n            .and_then(|line| {\n                line.split(\"page size of \")\n                    .nth(1)?\n                    .split(' ')\n                    .next()?\n                    .parse()\n                    .ok()\n            })\n            .unwrap_or(16384); // Apple Silicon default is 16 KB pages\n\n        let mut free: u64 = 0;\n        let mut inactive: u64 = 0;\n        let mut purgeable: u64 = 0;\n\n        for line in text.lines() {\n            if let Some(val) = Self::parse_vm_stat_line(line, \"Pages free\") {\n                free = val;\n            } else if let Some(val) = Self::parse_vm_stat_line(line, \"Pages inactive\") {\n                inactive = val;\n            } else if let Some(val) = 
Self::parse_vm_stat_line(line, \"Pages purgeable\") {\n                purgeable = val;\n            }\n        }\n\n        let available_bytes = (free + inactive + purgeable) * page_size;\n        if available_bytes > 0 {\n            Some(available_bytes as f64 / (1024.0 * 1024.0 * 1024.0))\n        } else {\n            None\n        }\n    }\n\n    /// Parse a single vm_stat line like \"Pages free:    123456.\"\n    fn parse_vm_stat_line(line: &str, key: &str) -> Option<u64> {\n        if !line.starts_with(key) {\n            return None;\n        }\n        line.split(':')\n            .nth(1)?\n            .trim()\n            .trim_end_matches('.')\n            .parse()\n            .ok()\n    }\n\n    fn detect_cpu_name(sys: &System) -> String {\n        if let Some(cpu_name) = sys\n            .cpus()\n            .iter()\n            .map(|cpu| cpu.brand().trim())\n            .find(|brand| !brand.is_empty() && !brand.eq_ignore_ascii_case(\"unknown\"))\n        {\n            return cpu_name.to_string();\n        }\n\n        if let Some(cpu_name) = Self::read_cpu_name_from_proc_cpuinfo() {\n            return cpu_name;\n        }\n\n        if let Some(cpu_name) = Self::read_android_soc_name() {\n            return cpu_name;\n        }\n\n        \"Unknown CPU\".to_string()\n    }\n\n    fn read_cpu_name_from_proc_cpuinfo() -> Option<String> {\n        #[cfg(target_os = \"linux\")]\n        {\n            let text = std::fs::read_to_string(\"/proc/cpuinfo\").ok()?;\n            return Self::parse_cpu_name_from_cpuinfo(&text);\n        }\n\n        #[cfg(not(target_os = \"linux\"))]\n        {\n            None\n        }\n    }\n\n    fn parse_cpu_name_from_cpuinfo(text: &str) -> Option<String> {\n        for key in [\"model name\", \"hardware\", \"processor\", \"cpu model\", \"model\"] {\n            for line in text.lines() {\n                let Some((lhs, rhs)) = line.split_once(':') else {\n                    continue;\n                };\n        
        if lhs.trim().eq_ignore_ascii_case(key) {\n                    let candidate = rhs.trim();\n                    if !candidate.is_empty() && !candidate.eq_ignore_ascii_case(\"unknown\") {\n                        return Some(candidate.to_string());\n                    }\n                }\n            }\n        }\n\n        None\n    }\n\n    fn read_android_soc_name() -> Option<String> {\n        #[cfg(target_os = \"linux\")]\n        {\n            let output = std::process::Command::new(\"getprop\")\n                .arg(\"ro.soc.model\")\n                .output()\n                .ok()?;\n            if !output.status.success() {\n                return None;\n            }\n\n            let model = String::from_utf8(output.stdout).ok()?;\n            let model = model.trim();\n            if model.is_empty() {\n                return None;\n            }\n\n            return Some(model.to_string());\n        }\n\n        #[cfg(not(target_os = \"linux\"))]\n        {\n            None\n        }\n    }\n\n    /// Override the primary GPU's VRAM with a user-specified value (in GB).\n    /// This is used by the `--memory` CLI flag when GPU autodetection fails.\n    /// If no GPU was detected, this creates a synthetic GPU entry.\n    pub fn with_gpu_memory_override(mut self, vram_gb: f64) -> Self {\n        if self.gpus.is_empty() {\n            // No GPU was detected; create a synthetic one.\n            let backend = if cfg!(target_arch = \"aarch64\")\n                || self.cpu_name.to_lowercase().contains(\"apple\")\n            {\n                GpuBackend::Metal\n            } else {\n                GpuBackend::Cuda\n            };\n            self.gpus.push(GpuInfo {\n                name: \"User-specified GPU\".to_string(),\n                vram_gb: Some(vram_gb),\n                backend,\n                count: 1,\n                unified_memory: false,\n            });\n            self.has_gpu = true;\n            self.gpu_vram_gb = 
Some(vram_gb);\n            self.total_gpu_vram_gb = Some(vram_gb);\n            self.gpu_name = Some(\"User-specified GPU\".to_string());\n            self.gpu_count = 1;\n            self.backend = backend;\n        } else {\n            // Override the primary (first) GPU's VRAM.\n            self.gpus[0].vram_gb = Some(vram_gb);\n            self.gpu_vram_gb = Some(vram_gb);\n            // Update total VRAM: per-card VRAM * count.\n            let count = self.gpus[0].count;\n            self.total_gpu_vram_gb = Some(vram_gb * count as f64);\n            self.has_gpu = true;\n        }\n        self\n    }\n\n    pub fn display(&self) {\n        println!(\"\\n=== System Specifications ===\");\n        println!(\"CPU: {} ({} cores)\", self.cpu_name, self.total_cpu_cores);\n        println!(\"Total RAM: {:.2} GB\", self.total_ram_gb);\n        println!(\"Available RAM: {:.2} GB\", self.available_ram_gb);\n        println!(\"Backend: {}\", self.backend.label());\n\n        if self.gpus.is_empty() {\n            println!(\"GPU: Not detected\");\n        } else {\n            for (i, gpu) in self.gpus.iter().enumerate() {\n                let prefix = if self.gpus.len() > 1 {\n                    format!(\"GPU {}: \", i + 1)\n                } else {\n                    \"GPU: \".to_string()\n                };\n                if gpu.unified_memory {\n                    println!(\n                        \"{}{} (unified memory, {:.2} GB shared, {})\",\n                        prefix,\n                        gpu.name,\n                        gpu.vram_gb.unwrap_or(0.0),\n                        gpu.backend.label(),\n                    );\n                } else {\n                    match gpu.vram_gb {\n                        Some(vram) if vram > 0.0 => {\n                            if gpu.count > 1 {\n                                let total_vram = vram * gpu.count as f64;\n                                println!(\n                                    
\"{}{} x{} ({:.2} GB VRAM each = {:.0} GB total, {})\",\n                                    prefix,\n                                    gpu.name,\n                                    gpu.count,\n                                    vram,\n                                    total_vram,\n                                    gpu.backend.label()\n                                );\n                            } else {\n                                println!(\n                                    \"{}{} ({:.2} GB VRAM, {})\",\n                                    prefix,\n                                    gpu.name,\n                                    vram,\n                                    gpu.backend.label()\n                                );\n                            }\n                        }\n                        Some(_) => println!(\n                            \"{}{} (shared system memory, {})\",\n                            prefix,\n                            gpu.name,\n                            gpu.backend.label()\n                        ),\n                        None => println!(\n                            \"{}{} (VRAM unknown, {})\",\n                            prefix,\n                            gpu.name,\n                            gpu.backend.label()\n                        ),\n                    }\n                }\n            }\n        }\n        println!();\n    }\n}\n\n/// Parse a human-readable memory size string into gigabytes.\n/// Accepts formats: \"32G\", \"32g\", \"32GB\", \"32gb\", \"32000M\", \"32000m\", \"32000MB\", etc.\n/// Returns `None` if the input is malformed.\npub fn parse_memory_size(s: &str) -> Option<f64> {\n    let s = s.trim();\n    if s.is_empty() {\n        return None;\n    }\n\n    // Split into numeric part and suffix\n    let num_end = s\n        .find(|c: char| !c.is_ascii_digit() && c != '.')\n        .unwrap_or(s.len());\n    let (num_str, suffix) = s.split_at(num_end);\n    let value: f64 = 
num_str.parse().ok()?;\n    if value < 0.0 {\n        return None;\n    }\n\n    let suffix = suffix.trim().to_lowercase();\n    match suffix.as_str() {\n        \"g\" | \"gb\" | \"gib\" | \"\" => Some(value),     // already in GB\n        \"m\" | \"mb\" | \"mib\" => Some(value / 1024.0), // MB → GB\n        \"t\" | \"tb\" | \"tib\" => Some(value * 1024.0), // TB → GB\n        _ => None,\n    }\n}\n\npub fn is_running_in_wsl() -> bool {\n    static IS_WSL: std::sync::OnceLock<bool> = std::sync::OnceLock::new();\n    *IS_WSL.get_or_init(detect_running_in_wsl)\n}\n\nfn detect_running_in_wsl() -> bool {\n    if !cfg!(target_os = \"linux\") {\n        return false;\n    }\n\n    if std::env::var_os(\"WSL_INTEROP\").is_some() || std::env::var_os(\"WSL_DISTRO_NAME\").is_some() {\n        return true;\n    }\n\n    [\"/proc/sys/kernel/osrelease\", \"/proc/version\"]\n        .iter()\n        .any(|path| {\n            std::fs::read_to_string(path)\n                .map(|text| text.to_ascii_lowercase().contains(\"microsoft\"))\n                .unwrap_or(false)\n        })\n}\n\n/// Check if the CPU name indicates an AMD APU with unified memory architecture.\n/// These APUs share the full system RAM between CPU and GPU (like Apple Silicon).\n/// Currently covers:\n///  - Ryzen AI MAX / MAX+ (Strix Halo): up to 128 GB unified.\n///  - Ryzen AI 9 / 7 / 5 (Strix Point, Krackan Point): configurable shared\n///    memory, users can allocate most of system RAM to GPU via BIOS.\n/// All Ryzen AI APUs have integrated Radeon GPUs that share system memory.\nfn is_amd_unified_memory_apu(cpu_name: &str) -> bool {\n    let lower = cpu_name.to_lowercase();\n    // All \"Ryzen AI\" branded APUs use unified/shared memory.\n    // Examples:\n    //   \"AMD Ryzen AI MAX+ 395 w/ Radeon 8060S\"\n    //   \"AMD Ryzen AI 9 HX 370 w/ Radeon 890M\"\n    //   \"AMD Ryzen AI 7 350\"\n    if lower.contains(\"ryzen ai\") {\n        return true;\n    }\n    false\n}\n\n/// Read total system RAM from 
/proc/meminfo (Linux only).\n/// Used as the unified memory pool on NVIDIA Tegra / Grace Blackwell platforms\n/// where nvidia-smi cannot report dedicated VRAM.\nfn read_proc_meminfo_total_gb() -> Option<f64> {\n    let text = std::fs::read_to_string(\"/proc/meminfo\").ok()?;\n    for line in text.lines() {\n        if let Some(rest) = line.strip_prefix(\"MemTotal:\") {\n            let kb: u64 = rest.split_whitespace().next()?.parse().ok()?;\n            return Some(kb as f64 / (1024.0 * 1024.0));\n        }\n    }\n    None\n}\n\n/// Estimate GPU memory bandwidth in GB/s from the GPU model name.\n///\n/// Token generation in LLM inference is memory-bandwidth-bound (each token\n/// requires reading the full model weights once). Using per-GPU bandwidth\n/// produces significantly more accurate tok/s estimates than a single\n/// constant for all CUDA/ROCm/Metal devices.\n///\n/// References:\n///  - kipply, \"Transformer Inference Arithmetic\" (2022)\n///  - ggerganov, llama.cpp Apple Silicon benchmarks (Discussion #4167)\n///  - Google, \"Efficiently Scaling Transformer Inference\" (arXiv:2211.05102)\n///  - ggerganov, llama.cpp NVIDIA T4 benchmarks (Discussion #4225)\n///\n/// Returns `None` when the GPU is not recognized; callers should fall back\n/// to the existing fixed-constant approach.\npub fn gpu_memory_bandwidth_gbps(name: &str) -> Option<f64> {\n    let lower = name.to_lowercase();\n\n    // ── NVIDIA Consumer (GeForce) ──────────────────────────────────\n    // RTX 50 series (Blackwell)\n    if lower.contains(\"5090\") {\n        return Some(1792.0);\n    }\n    if lower.contains(\"5080\") {\n        return Some(960.0);\n    }\n    if lower.contains(\"5070 ti\") {\n        return Some(896.0);\n    }\n    if lower.contains(\"5070\") {\n        return Some(672.0);\n    }\n    if lower.contains(\"5060 ti\") {\n        return Some(448.0);\n    }\n    if lower.contains(\"5060\") {\n        return Some(256.0);\n    }\n\n    // RTX 40 series (Ada Lovelace)\n   
 if lower.contains(\"4090\") {\n        return Some(1008.0);\n    }\n    if lower.contains(\"4080 super\") {\n        return Some(736.0);\n    }\n    if lower.contains(\"4080\") {\n        return Some(717.0);\n    }\n    if lower.contains(\"4070 ti super\") {\n        return Some(672.0);\n    }\n    if lower.contains(\"4070 ti\") {\n        return Some(504.0);\n    }\n    if lower.contains(\"4070 super\") {\n        return Some(504.0);\n    }\n    if lower.contains(\"4070\") {\n        return Some(504.0);\n    }\n    if lower.contains(\"4060 ti\") {\n        return Some(288.0);\n    }\n    if lower.contains(\"4060\") {\n        return Some(272.0);\n    }\n\n    // RTX 30 series (Ampere)\n    if lower.contains(\"3090 ti\") {\n        return Some(1008.0);\n    }\n    if lower.contains(\"3090\") {\n        return Some(936.0);\n    }\n    if lower.contains(\"3080 ti\") {\n        return Some(912.0);\n    }\n    if lower.contains(\"3080\") {\n        return Some(760.0);\n    }\n    if lower.contains(\"3070 ti\") {\n        return Some(608.0);\n    }\n    if lower.contains(\"3070\") {\n        return Some(448.0);\n    }\n    if lower.contains(\"3060 ti\") {\n        return Some(448.0);\n    }\n    if lower.contains(\"3060\") {\n        return Some(360.0);\n    }\n\n    // RTX 20 series (Turing)\n    if lower.contains(\"2080 ti\") {\n        return Some(616.0);\n    }\n    if lower.contains(\"2080 super\") {\n        return Some(496.0);\n    }\n    if lower.contains(\"2080\") {\n        return Some(448.0);\n    }\n    if lower.contains(\"2070 super\") {\n        return Some(448.0);\n    }\n    if lower.contains(\"2070\") {\n        return Some(448.0);\n    }\n    if lower.contains(\"2060 super\") {\n        return Some(448.0);\n    }\n    if lower.contains(\"2060\") {\n        return Some(336.0);\n    }\n\n    // GTX 16 series (Turing, no RT cores)\n    if lower.contains(\"1660 ti\") {\n        return Some(288.0);\n    }\n    if lower.contains(\"1660 super\") {\n        
return Some(336.0);\n    }\n    if lower.contains(\"1660\") {\n        return Some(192.0);\n    }\n    if lower.contains(\"1650 super\") {\n        return Some(192.0);\n    }\n    if lower.contains(\"1650\") {\n        return Some(128.0);\n    }\n\n    // ── NVIDIA Data Center / Professional ──────────────────────────\n    if lower.contains(\"h100 sxm\") {\n        return Some(3350.0);\n    }\n    if lower.contains(\"h100\") {\n        return Some(2039.0);\n    } // PCIe\n    if lower.contains(\"h200\") {\n        return Some(4800.0);\n    }\n    if lower.contains(\"a100 sxm\") {\n        return Some(2039.0);\n    }\n    if lower.contains(\"a100\") {\n        return Some(1555.0);\n    } // PCIe 40GB\n    if lower.contains(\"l40s\") {\n        return Some(864.0);\n    }\n    if lower.contains(\"l40\") {\n        return Some(864.0);\n    }\n    if lower.contains(\"l4\") {\n        return Some(300.0);\n    }\n    if lower.contains(\"a10g\") {\n        return Some(600.0);\n    }\n    if lower.contains(\"a10\") {\n        return Some(600.0);\n    }\n    if lower.contains(\"t4\") {\n        return Some(320.0);\n    }\n    if lower.contains(\"v100 sxm\") {\n        return Some(900.0);\n    }\n    if lower.contains(\"v100\") {\n        return Some(897.0);\n    }\n    if lower.contains(\"a6000\") {\n        return Some(768.0);\n    }\n    if lower.contains(\"a5000\") {\n        return Some(768.0);\n    }\n    if lower.contains(\"a4000\") {\n        return Some(448.0);\n    }\n\n    // ── AMD Discrete (RDNA) ────────────────────────────────────────\n    // RX 9000 series (RDNA 4)\n    if lower.contains(\"9070 xt\") {\n        return Some(624.0);\n    }\n    if lower.contains(\"9070\") {\n        return Some(488.0);\n    }\n\n    // RX 7000 series (RDNA 3)\n    if lower.contains(\"7900 xtx\") {\n        return Some(960.0);\n    }\n    if lower.contains(\"7900 xt\") {\n        return Some(800.0);\n    }\n    if lower.contains(\"7900 gre\") {\n        return Some(576.0);\n    
}\n    if lower.contains(\"7800 xt\") {\n        return Some(624.0);\n    }\n    if lower.contains(\"7700 xt\") {\n        return Some(432.0);\n    }\n    if lower.contains(\"7600\") {\n        return Some(288.0);\n    }\n\n    // RX 6000 series (RDNA 2)\n    if lower.contains(\"6950 xt\") {\n        return Some(576.0);\n    }\n    if lower.contains(\"6900 xt\") {\n        return Some(512.0);\n    }\n    if lower.contains(\"6800 xt\") {\n        return Some(512.0);\n    }\n    if lower.contains(\"6800\") {\n        return Some(512.0);\n    }\n    if lower.contains(\"6700 xt\") {\n        return Some(384.0);\n    }\n    if lower.contains(\"6600 xt\") {\n        return Some(256.0);\n    }\n    if lower.contains(\"6600\") {\n        return Some(224.0);\n    }\n\n    // AMD data center (CDNA)\n    if lower.contains(\"mi300x\") {\n        return Some(5300.0);\n    }\n    if lower.contains(\"mi300\") {\n        return Some(5300.0);\n    }\n    if lower.contains(\"mi250x\") {\n        return Some(3277.0);\n    }\n    if lower.contains(\"mi250\") {\n        return Some(3277.0);\n    }\n    if lower.contains(\"mi210\") {\n        return Some(1638.0);\n    }\n    if lower.contains(\"mi100\") {\n        return Some(1229.0);\n    }\n\n    // ── Apple Silicon (unified memory bandwidth) ───────────────────\n    if lower.contains(\"m4 ultra\") {\n        return Some(819.0);\n    }\n    if lower.contains(\"m4 max\") {\n        return Some(546.0);\n    }\n    if lower.contains(\"m4 pro\") {\n        return Some(273.0);\n    }\n    if lower.contains(\"m4\") {\n        return Some(120.0);\n    }\n    if lower.contains(\"m3 ultra\") {\n        return Some(800.0);\n    }\n    if lower.contains(\"m3 max\") {\n        return Some(400.0);\n    }\n    if lower.contains(\"m3 pro\") {\n        return Some(150.0);\n    }\n    if lower.contains(\"m3\") {\n        return Some(100.0);\n    }\n    if lower.contains(\"m2 ultra\") {\n        return Some(800.0);\n    }\n    if lower.contains(\"m2 
max\") {\n        return Some(400.0);\n    }\n    if lower.contains(\"m2 pro\") {\n        return Some(200.0);\n    }\n    if lower.contains(\"m2\") {\n        return Some(100.0);\n    }\n    if lower.contains(\"m1 ultra\") {\n        return Some(800.0);\n    }\n    if lower.contains(\"m1 max\") {\n        return Some(400.0);\n    }\n    if lower.contains(\"m1 pro\") {\n        return Some(200.0);\n    }\n    if lower.contains(\"m1\") {\n        return Some(68.0);\n    }\n\n    None\n}\n\n/// Returns the NVIDIA compute capability (major, minor) for a known GPU name.\n/// Used to determine compatibility with quantization formats that require\n/// specific hardware features (e.g. AWQ requires Turing+ / cc >= 7.5).\n///\n/// Returns `None` for non-NVIDIA GPUs or unrecognized models.\npub fn gpu_compute_capability(name: &str) -> Option<(u8, u8)> {\n    let lower = name.to_lowercase();\n\n    // ── Blackwell (RTX 50xx, B100/B200) ──────────────────────────\n    if lower.contains(\"5090\")\n        || lower.contains(\"5080\")\n        || lower.contains(\"5070\")\n        || lower.contains(\"5060\")\n        || lower.contains(\"b200\")\n        || lower.contains(\"b100\")\n        || lower.contains(\"gb200\")\n        || lower.contains(\"gb100\")\n    {\n        return Some((10, 0));\n    }\n\n    // ── Hopper (H100, H200) ─────────────────────────────────────\n    if lower.contains(\"h100\") || lower.contains(\"h200\") {\n        return Some((9, 0));\n    }\n\n    // ── Ada Lovelace (RTX 40xx, L4, L40/L40S) ──────────────────\n    if lower.contains(\"4090\")\n        || lower.contains(\"4080\")\n        || lower.contains(\"4070\")\n        || lower.contains(\"4060\")\n        || lower.contains(\"l40\")\n        || lower.contains(\"l4\")\n    {\n        return Some((8, 9));\n    }\n\n    // ── Ampere (RTX 30xx consumer = 8.6, A100/A10/A6000 = 8.0) ─\n    if lower.contains(\"a100\") {\n        return Some((8, 0));\n    }\n    if lower.contains(\"3090\")\n        || 
lower.contains(\"3080\")\n        || lower.contains(\"3070\")\n        || lower.contains(\"3060\")\n        || lower.contains(\"a10\")\n        || lower.contains(\"a6000\")\n        || lower.contains(\"a5000\")\n        || lower.contains(\"a4000\")\n        || lower.contains(\"a2000\")\n        || lower.contains(\"a16\")\n    {\n        return Some((8, 6));\n    }\n\n    // ── Turing (RTX 20xx, GTX 16xx, T4) ─────────────────────────\n    if lower.contains(\"2080\")\n        || lower.contains(\"2070\")\n        || lower.contains(\"2060\")\n        || lower.contains(\"1660\")\n        || lower.contains(\"1650\")\n        || lower.contains(\"t4\")\n    {\n        return Some((7, 5));\n    }\n\n    // ── Volta (V100, Titan V) ───────────────────────────────────\n    if lower.contains(\"v100\") || lower.contains(\"titan v\") {\n        return Some((7, 0));\n    }\n\n    // ── Pascal (P100, GTX 10xx, Titan X Pascal) ─────────────────\n    if lower.contains(\"p100\")\n        || lower.contains(\"1080\")\n        || lower.contains(\"1070\")\n        || lower.contains(\"1060\")\n        || lower.contains(\"1050\")\n        || lower.contains(\"p40\")\n        || lower.contains(\"p4\")\n    {\n        return Some((6, 1));\n    }\n\n    None\n}\n\n/// Minimum NVIDIA compute capability required by a quantization format\n/// when running under vLLM. Based on vLLM's documented hardware support:\n/// <https://docs.vllm.ai/en/latest/features/quantization/#supported-hardware>\n///\n/// Returns `None` for quantization formats that have no known CC restriction\n/// (e.g. 
GGUF quants which run through llama.cpp, not vLLM).\npub fn quant_min_compute_capability(quantization: &str) -> Option<(u8, u8)> {\n    match quantization {\n        // AWQ requires Turing+ (int4 tensor-core kernels)\n        \"AWQ-4bit\" | \"AWQ-8bit\" => Some((7, 5)),\n        // GPTQ Marlin kernels require Turing+\n        \"GPTQ-Int4\" | \"GPTQ-Int8\" => Some((7, 5)),\n        _ => None,\n    }\n}\n\n/// Fallback VRAM estimation from GPU model name.\n/// Used when nvidia-smi or other tools report 0 VRAM.\nfn estimate_vram_from_name(name: &str) -> f64 {\n    let lower = name.to_lowercase();\n    // NVIDIA RTX 50 series\n    if lower.contains(\"5090\") {\n        return 32.0;\n    }\n    if lower.contains(\"5080\") {\n        return 16.0;\n    }\n    if lower.contains(\"5070 ti\") {\n        return 16.0;\n    }\n    if lower.contains(\"5070\") {\n        return 12.0;\n    }\n    if lower.contains(\"5060 ti\") {\n        return 16.0;\n    }\n    if lower.contains(\"5060\") {\n        return 8.0;\n    }\n    // NVIDIA RTX 40 series\n    if lower.contains(\"4090\") {\n        return 24.0;\n    }\n    if lower.contains(\"4080\") {\n        return 16.0;\n    }\n    if lower.contains(\"4070 ti\") {\n        return 12.0;\n    }\n    if lower.contains(\"4070\") {\n        return 12.0;\n    }\n    if lower.contains(\"4060 ti\") {\n        return 16.0;\n    }\n    if lower.contains(\"4060\") {\n        return 8.0;\n    }\n    // NVIDIA RTX 30 series\n    if lower.contains(\"3090\") {\n        return 24.0;\n    }\n    if lower.contains(\"3080 ti\") {\n        return 12.0;\n    }\n    if lower.contains(\"3080\") {\n        return 10.0;\n    }\n    if lower.contains(\"3070\") {\n        return 8.0;\n    }\n    if lower.contains(\"3060 ti\") {\n        return 8.0;\n    }\n    if lower.contains(\"3060\") {\n        return 12.0;\n    }\n    // Data center / professional\n    if lower.contains(\"h100\") {\n        return 80.0;\n    }\n    if lower.contains(\"a100\") {\n        
return 80.0;\n    }\n    if lower.contains(\"l40\") {\n        return 48.0;\n    }\n    // NVIDIA RTX professional (Ampere) — must be checked before the broad \"a10\" match\n    if lower.contains(\"a6000\") {\n        return 48.0;\n    }\n    if lower.contains(\"a5500\") {\n        return 24.0;\n    }\n    if lower.contains(\"a5000\") {\n        return 24.0;\n    }\n    if lower.contains(\"a4500\") {\n        return 20.0;\n    }\n    if lower.contains(\"a4000\") {\n        return 16.0;\n    }\n    if lower.contains(\"a2000\") {\n        return 12.0;\n    }\n    if lower.contains(\"a10\") {\n        return 24.0;\n    }\n    if lower.contains(\"t4\") {\n        return 16.0;\n    }\n    // NVIDIA Grace / DGX Spark unified memory SoCs\n    if lower.contains(\"gb10\") {\n        return 128.0;\n    }\n    if lower.contains(\"gb20\") {\n        return 128.0;\n    }\n    // AMD RX 9000 series (RDNA 4)\n    if lower.contains(\"9070 xt\") {\n        return 16.0;\n    }\n    if lower.contains(\"9070\") {\n        return 12.0;\n    }\n    if lower.contains(\"9060 xt\") {\n        return 16.0;\n    }\n    if lower.contains(\"9060\") {\n        return 8.0;\n    }\n    // AMD RX 7000 series\n    if lower.contains(\"7900 xtx\") {\n        return 24.0;\n    }\n    if lower.contains(\"7900\") {\n        return 20.0;\n    }\n    if lower.contains(\"7800\") {\n        return 16.0;\n    }\n    if lower.contains(\"7700\") {\n        return 12.0;\n    }\n    if lower.contains(\"7600\") {\n        return 8.0;\n    }\n    // AMD RX 6000 series\n    if lower.contains(\"6950\") {\n        return 16.0;\n    }\n    if lower.contains(\"6900\") {\n        return 16.0;\n    }\n    if lower.contains(\"6800\") {\n        return 16.0;\n    }\n    if lower.contains(\"6750\") {\n        return 12.0;\n    }\n    if lower.contains(\"6700\") {\n        return 12.0;\n    }\n    if lower.contains(\"6650\") {\n        return 8.0;\n    }\n    if lower.contains(\"6600\") {\n        return 8.0;\n    }\n    if 
lower.contains(\"6500\") {\n        return 4.0;\n    }\n    // AMD RX 5000 series\n    if lower.contains(\"5700 xt\") {\n        return 8.0;\n    }\n    if lower.contains(\"5700\") {\n        return 8.0;\n    }\n    if lower.contains(\"5600\") {\n        return 6.0;\n    }\n    if lower.contains(\"5500\") {\n        return 4.0;\n    }\n    // AMD Radeon 8000 series (Ryzen AI MAX / Strix Halo integrated)\n    // These are unified memory APUs; VRAM = system RAM in practice,\n    // but this fallback gives a reasonable discrete estimate for name-only detection.\n    if lower.contains(\"8060s\") {\n        return 32.0;\n    }\n    if lower.contains(\"8050s\") {\n        return 24.0;\n    }\n    if lower.contains(\"8060\") && !lower.contains(\"8060s\") {\n        return 16.0;\n    }\n    if lower.contains(\"8050\") && !lower.contains(\"8050s\") {\n        return 12.0;\n    }\n    // AMD Radeon 800M series (Ryzen AI 9 / Strix Point integrated)\n    if lower.contains(\"890m\") {\n        return 16.0;\n    }\n    if lower.contains(\"880m\") {\n        return 12.0;\n    }\n    if lower.contains(\"870m\") {\n        return 8.0;\n    }\n    if lower.contains(\"860m\") {\n        return 8.0;\n    }\n\n    // Integrated GPUs (APU iGPUs) — must check before generic fallbacks\n    // APU names like \"AMD Radeon(TM) Graphics\" or \"Radeon Graphics\" without\n    // a discrete model number (RX/HD/R5/R7/R9) have very limited dedicated VRAM.\n    if (lower.contains(\"radeon\") || lower.contains(\"amd\"))\n        && !lower.contains(\"rx \")\n        && !lower.contains(\"hd \")\n        && !lower.contains(\" r5 \")\n        && !lower.contains(\" r7 \")\n        && !lower.contains(\" r9 \")\n        && !lower.contains(\"8060\")\n        && !lower.contains(\"8050\")\n        && (lower.contains(\"graphics\") || lower.contains(\"igpu\"))\n    {\n        return 0.5;\n    }\n\n    // Generic fallbacks\n    if lower.contains(\"rtx\") {\n        return 8.0;\n    }\n    if 
lower.contains(\"gtx\") {\n        return 4.0;\n    }\n    if lower.contains(\"rx \") || lower.contains(\"radeon\") {\n        return 8.0;\n    }\n    0.0\n}\n\n#[cfg(test)]\nmod tests {\n    use super::SystemSpecs;\n\n    #[test]\n    fn test_parse_nvidia_smi_does_not_sum_multi_gpu_vram() {\n        let text = \"24564, NVIDIA GeForce RTX 4090\\n24564, NVIDIA GeForce RTX 4090\\n\";\n        let gpus = SystemSpecs::parse_nvidia_smi_list(text);\n\n        assert_eq!(gpus.len(), 1);\n        assert_eq!(gpus[0].count, 2);\n        let vram = gpus[0]\n            .vram_gb\n            .expect(\"VRAM should be parsed for RTX 4090 entries\");\n        // 24564 MiB ~= 23.99 GiB; must stay single-card VRAM, not 2x summed.\n        assert!(vram > 23.0 && vram < 25.0, \"unexpected VRAM value: {vram}\");\n    }\n\n    #[test]\n    fn test_parse_nvidia_smi_keeps_distinct_models() {\n        let text = \"24564, NVIDIA GeForce RTX 4090\\n16376, NVIDIA GeForce RTX 4080\\n\";\n        let gpus = SystemSpecs::parse_nvidia_smi_list(text);\n\n        assert_eq!(gpus.len(), 2);\n        assert!(gpus.iter().any(|g| g.name.contains(\"4090\") && g.count == 1));\n        assert!(gpus.iter().any(|g| g.name.contains(\"4080\") && g.count == 1));\n    }\n\n    #[test]\n    fn test_parse_nvidia_smi_gb10_gets_vram_estimate() {\n        // DGX Spark reports GB10 with 0 VRAM from nvidia-smi\n        let text = \"0, NVIDIA GB10\\n\";\n        let gpus = SystemSpecs::parse_nvidia_smi_list(text);\n\n        assert_eq!(gpus.len(), 1);\n        assert!(gpus[0].name.contains(\"GB10\"));\n        // estimate_vram_from_name should kick in and return 128GB\n        let vram = gpus[0].vram_gb.expect(\"GB10 should have estimated VRAM\");\n        assert!(vram > 100.0, \"GB10 VRAM should be ~128GB, got {vram}\");\n    }\n\n    #[test]\n    fn test_estimate_vram_gb10() {\n        assert_eq!(super::estimate_vram_from_name(\"NVIDIA GB10\"), 128.0);\n        assert_eq!(super::estimate_vram_from_name(\"NVIDIA 
GB20\"), 128.0);\n    }\n\n    #[test]\n    fn test_estimate_vram_rtx_professional() {\n        assert_eq!(super::estimate_vram_from_name(\"NVIDIA RTX A6000\"), 48.0);\n        assert_eq!(super::estimate_vram_from_name(\"NVIDIA RTX A5500\"), 24.0);\n        assert_eq!(super::estimate_vram_from_name(\"NVIDIA RTX A5000\"), 24.0);\n        assert_eq!(super::estimate_vram_from_name(\"NVIDIA RTX A4500\"), 20.0);\n        assert_eq!(super::estimate_vram_from_name(\"NVIDIA RTX A4000\"), 16.0);\n        assert_eq!(super::estimate_vram_from_name(\"NVIDIA RTX A2000\"), 12.0);\n    }\n\n    #[test]\n    fn test_parse_extended_discrete_gpu_not_unified() {\n        // Discrete GPU: addressing_mode is \"None\", VRAM is reported normally\n        let text = \"None, 24564, NVIDIA GeForce RTX 4090\\n\";\n        let gpus = SystemSpecs::parse_nvidia_smi_extended(text);\n\n        assert_eq!(gpus.len(), 1);\n        assert_eq!(gpus[0].name, \"NVIDIA GeForce RTX 4090\");\n        assert!(\n            !gpus[0].unified_memory,\n            \"discrete GPU should not be unified\"\n        );\n        let vram = gpus[0].vram_gb.expect(\"VRAM should be present\");\n        assert!(vram > 23.0 && vram < 25.0, \"unexpected VRAM: {vram}\");\n    }\n\n    #[test]\n    fn test_parse_extended_tegra_unified_memory() {\n        // NVIDIA Tegra / Grace Blackwell: ATS addressing, VRAM is [N/A]\n        // On a real system, /proc/meminfo would provide the fallback.\n        // In tests, /proc/meminfo may or may not exist.\n        let text = \"ATS, [N/A], NVIDIA Thor\\n\";\n        let gpus = SystemSpecs::parse_nvidia_smi_extended(text);\n\n        assert_eq!(gpus.len(), 1);\n        assert_eq!(gpus[0].name, \"NVIDIA Thor\");\n        assert!(gpus[0].unified_memory, \"ATS should set unified_memory=true\");\n        // VRAM comes from /proc/meminfo; if unavailable, it's None\n        // (on Linux test machines it will be Some, on macOS CI it will be None)\n    }\n\n    #[test]\n    fn 
test_parse_extended_multi_gpu_discrete() {\n        // Two discrete GPUs, no unified memory\n        let text = \"None, 24564, NVIDIA GeForce RTX 4090\\nNone, 24564, NVIDIA GeForce RTX 4090\\n\";\n        let gpus = SystemSpecs::parse_nvidia_smi_extended(text);\n\n        assert_eq!(gpus.len(), 1);\n        assert_eq!(gpus[0].count, 2);\n        assert!(!gpus[0].unified_memory);\n    }\n\n    #[test]\n    fn test_gpu_bandwidth_known_gpus() {\n        // Spot-check a few well-known GPUs\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 4090\"),\n            Some(1008.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 3060\"),\n            Some(360.0)\n        );\n        assert_eq!(super::gpu_memory_bandwidth_gbps(\"Tesla T4\"), Some(320.0));\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA H100 SXM\"),\n            Some(3350.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA A100\"),\n            Some(1555.0)\n        );\n    }\n\n    #[test]\n    fn test_gpu_bandwidth_apple_silicon() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"Apple M1 Max\"),\n            Some(400.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"Apple M4 Pro\"),\n            Some(273.0)\n        );\n    }\n\n    #[test]\n    fn test_gpu_bandwidth_unknown_returns_none() {\n        assert_eq!(super::gpu_memory_bandwidth_gbps(\"Some Random GPU\"), None);\n        assert_eq!(super::gpu_memory_bandwidth_gbps(\"\"), None);\n    }\n\n    #[test]\n    fn test_gpu_bandwidth_amd() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Radeon RX 7900 XTX\"),\n            Some(960.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Instinct MI300X\"),\n            Some(5300.0)\n        );\n    }\n\n    #[test]\n    fn 
test_parse_cpu_name_from_cpuinfo_prefers_model_name() {\n        let cpuinfo = \"\\\nprocessor   : 0\nmodel name  : Qualcomm Kryo 680\nHardware    : Qualcomm Technologies, Inc SM8350\n\";\n        assert_eq!(\n            SystemSpecs::parse_cpu_name_from_cpuinfo(cpuinfo),\n            Some(\"Qualcomm Kryo 680\".to_string())\n        );\n    }\n\n    #[test]\n    fn test_parse_cpu_name_from_cpuinfo_uses_hardware_fallback() {\n        let cpuinfo = \"\\\nprocessor   : 0\nHardware    : Qualcomm Technologies, Inc SM8650\n\";\n        assert_eq!(\n            SystemSpecs::parse_cpu_name_from_cpuinfo(cpuinfo),\n            Some(\"Qualcomm Technologies, Inc SM8650\".to_string())\n        );\n    }\n\n    #[test]\n    fn test_parse_vulkan_device_names_from_summary_output() {\n        let text = \"\\\nGPU0:\ndeviceName         = Adreno (TM) 740\nGPU1:\ndeviceName         = llvmpipe (LLVM 17.0.0, 256 bits)\n\";\n        let names = SystemSpecs::parse_vulkan_device_names(text);\n        assert_eq!(\n            names,\n            vec![\n                \"Adreno (TM) 740\".to_string(),\n                \"llvmpipe (LLVM 17.0.0, 256 bits)\".to_string()\n            ]\n        );\n    }\n\n    #[test]\n    fn test_parse_vulkan_device_names_from_gpu_id_lines() {\n        let text = \"\\\nGPU id = 0 (Adreno (TM) 740)\nGPU id = 1 (NVIDIA GeForce RTX 4090)\n\";\n        let names = SystemSpecs::parse_vulkan_device_names(text);\n        assert_eq!(\n            names,\n            vec![\n                \"Adreno (TM) 740\".to_string(),\n                \"NVIDIA GeForce RTX 4090\".to_string()\n            ]\n        );\n    }\n\n    #[test]\n    fn test_is_software_vulkan_device() {\n        assert!(SystemSpecs::is_software_vulkan_device(\n            \"llvmpipe (LLVM 17.0.0, 256 bits)\"\n        ));\n        assert!(SystemSpecs::is_software_vulkan_device(\"SwiftShader Device\"));\n        assert!(!SystemSpecs::is_software_vulkan_device(\"Adreno (TM) 740\"));\n    }\n\n    #[test]\n   
 fn test_is_same_gpu_name_uses_normalized_exact_match() {\n        assert!(SystemSpecs::is_same_gpu_name(\n            \"NVIDIA-GeForce RTX 4090\",\n            \"nvidia geforce rtx 4090\"\n        ));\n        assert!(!SystemSpecs::is_same_gpu_name(\"RTX\", \"RTX 4090\"));\n    }\n\n    #[test]\n    fn test_normalize_gpu_name_for_dedupe() {\n        assert_eq!(\n            SystemSpecs::normalize_gpu_name_for_dedupe(\" Adreno (TM) 740 \"),\n            \"adreno tm 740\"\n        );\n    }\n\n    // ── GpuBackend::label ────────────────────────────────────────────\n\n    #[test]\n    fn test_gpu_backend_labels() {\n        assert_eq!(super::GpuBackend::Cuda.label(), \"CUDA\");\n        assert_eq!(super::GpuBackend::Metal.label(), \"Metal\");\n        assert_eq!(super::GpuBackend::Rocm.label(), \"ROCm\");\n        assert_eq!(super::GpuBackend::Vulkan.label(), \"Vulkan\");\n        assert_eq!(super::GpuBackend::Sycl.label(), \"SYCL\");\n        assert_eq!(super::GpuBackend::CpuArm.label(), \"CPU (ARM)\");\n        assert_eq!(super::GpuBackend::CpuX86.label(), \"CPU (x86)\");\n        assert_eq!(super::GpuBackend::Ascend.label(), \"NPU (Ascend)\");\n    }\n\n    // ── parse_memory_size ────────────────────────────────────────────\n\n    #[test]\n    fn test_parse_memory_size_gb() {\n        assert_eq!(super::parse_memory_size(\"32G\"), Some(32.0));\n        assert_eq!(super::parse_memory_size(\"32GB\"), Some(32.0));\n        assert_eq!(super::parse_memory_size(\"32GiB\"), Some(32.0));\n        assert_eq!(super::parse_memory_size(\"24g\"), Some(24.0));\n        assert_eq!(super::parse_memory_size(\"24gb\"), Some(24.0));\n    }\n\n    #[test]\n    fn test_parse_memory_size_mb() {\n        let result = super::parse_memory_size(\"16384M\").unwrap();\n        assert!((result - 16.0).abs() < 0.01);\n        let result = super::parse_memory_size(\"8192MB\").unwrap();\n        assert!((result - 8.0).abs() < 0.01);\n    }\n\n    #[test]\n    fn test_parse_memory_size_tb() {\n  
      let result = super::parse_memory_size(\"1T\").unwrap();\n        assert!((result - 1024.0).abs() < 0.01);\n        let result = super::parse_memory_size(\"2TB\").unwrap();\n        assert!((result - 2048.0).abs() < 0.01);\n    }\n\n    #[test]\n    fn test_parse_memory_size_bare_number() {\n        assert_eq!(super::parse_memory_size(\"16\"), Some(16.0));\n    }\n\n    #[test]\n    fn test_parse_memory_size_whitespace() {\n        assert_eq!(super::parse_memory_size(\"  32G  \"), Some(32.0));\n    }\n\n    #[test]\n    fn test_parse_memory_size_empty() {\n        assert_eq!(super::parse_memory_size(\"\"), None);\n        assert_eq!(super::parse_memory_size(\"  \"), None);\n    }\n\n    #[test]\n    fn test_parse_memory_size_invalid_suffix() {\n        assert_eq!(super::parse_memory_size(\"32X\"), None);\n        assert_eq!(super::parse_memory_size(\"32KB\"), None);\n    }\n\n    #[test]\n    fn test_parse_memory_size_fractional() {\n        assert_eq!(super::parse_memory_size(\"16.5G\"), Some(16.5));\n    }\n\n    // ── with_gpu_memory_override ─────────────────────────────────────\n\n    fn make_specs_no_gpu() -> SystemSpecs {\n        SystemSpecs {\n            total_ram_gb: 32.0,\n            available_ram_gb: 24.0,\n            total_cpu_cores: 8,\n            cpu_name: \"Test CPU\".to_string(),\n            has_gpu: false,\n            gpu_vram_gb: None,\n            total_gpu_vram_gb: None,\n            gpu_name: None,\n            gpu_count: 0,\n            unified_memory: false,\n            backend: super::GpuBackend::CpuX86,\n            gpus: vec![],\n        }\n    }\n\n    fn make_specs_with_gpu() -> SystemSpecs {\n        SystemSpecs {\n            total_ram_gb: 32.0,\n            available_ram_gb: 24.0,\n            total_cpu_cores: 8,\n            cpu_name: \"Test CPU\".to_string(),\n            has_gpu: true,\n            gpu_vram_gb: Some(8.0),\n            total_gpu_vram_gb: Some(8.0),\n            gpu_name: Some(\"NVIDIA RTX 
3070\".to_string()),\n            gpu_count: 1,\n            unified_memory: false,\n            backend: super::GpuBackend::Cuda,\n            gpus: vec![super::GpuInfo {\n                name: \"NVIDIA RTX 3070\".to_string(),\n                vram_gb: Some(8.0),\n                backend: super::GpuBackend::Cuda,\n                count: 1,\n                unified_memory: false,\n            }],\n        }\n    }\n\n    #[test]\n    fn test_gpu_override_creates_synthetic_gpu_when_none() {\n        let specs = make_specs_no_gpu().with_gpu_memory_override(24.0);\n        assert!(specs.has_gpu);\n        assert_eq!(specs.gpu_vram_gb, Some(24.0));\n        assert_eq!(specs.total_gpu_vram_gb, Some(24.0));\n        assert_eq!(specs.gpu_count, 1);\n        assert_eq!(specs.gpus.len(), 1);\n        assert_eq!(specs.gpus[0].name, \"User-specified GPU\");\n    }\n\n    #[test]\n    fn test_gpu_override_updates_existing_gpu() {\n        let specs = make_specs_with_gpu().with_gpu_memory_override(24.0);\n        assert_eq!(specs.gpu_vram_gb, Some(24.0));\n        assert_eq!(specs.total_gpu_vram_gb, Some(24.0));\n        assert_eq!(specs.gpus[0].vram_gb, Some(24.0));\n        assert_eq!(specs.gpus[0].name, \"NVIDIA RTX 3070\");\n    }\n\n    #[test]\n    fn test_gpu_override_multi_gpu_scales_total() {\n        let mut specs = make_specs_with_gpu();\n        specs.gpus[0].count = 2;\n        let specs = specs.with_gpu_memory_override(24.0);\n        assert_eq!(specs.gpu_vram_gb, Some(24.0));\n        assert_eq!(specs.total_gpu_vram_gb, Some(48.0));\n    }\n\n    // ── is_amd_unified_memory_apu ────────────────────────────────────\n\n    #[test]\n    fn test_amd_unified_memory_apu_detection() {\n        assert!(super::is_amd_unified_memory_apu(\n            \"AMD Ryzen AI MAX+ 395 w/ Radeon 8060S\"\n        ));\n        assert!(super::is_amd_unified_memory_apu(\n            \"AMD Ryzen AI 9 HX 370 w/ Radeon 890M\"\n        ));\n        
assert!(super::is_amd_unified_memory_apu(\"AMD Ryzen AI 7 350\"));\n        assert!(!super::is_amd_unified_memory_apu(\"AMD Ryzen 9 7950X\"));\n        assert!(!super::is_amd_unified_memory_apu(\"Intel Core i9-14900K\"));\n    }\n\n    // ── bandwidth: RTX 20 series ─────────────────────────────────────\n\n    #[test]\n    fn test_bandwidth_rtx_20_series() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 2080 Ti\"),\n            Some(616.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 2060\"),\n            Some(336.0)\n        );\n    }\n\n    // ── bandwidth: GTX 16 series ─────────────────────────────────────\n\n    #[test]\n    fn test_bandwidth_gtx_16_series() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce GTX 1660 Ti\"),\n            Some(288.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce GTX 1650\"),\n            Some(128.0)\n        );\n    }\n\n    // ── bandwidth: RTX 50 series ─────────────────────────────────────\n\n    #[test]\n    fn test_bandwidth_rtx_50_series() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 5090\"),\n            Some(1792.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 5080\"),\n            Some(960.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 5070 Ti\"),\n            Some(896.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 5070\"),\n            Some(672.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 5060 Ti\"),\n            Some(448.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA GeForce RTX 5060\"),\n            Some(256.0)\n        );\n    
}\n\n    // ── bandwidth: AMD RX 6000 series ────────────────────────────────\n\n    #[test]\n    fn test_bandwidth_amd_rx_6000() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Radeon RX 6950 XT\"),\n            Some(576.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Radeon RX 6700 XT\"),\n            Some(384.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Radeon RX 6600\"),\n            Some(224.0)\n        );\n    }\n\n    // ── bandwidth: NVIDIA professional ───────────────────────────────\n\n    #[test]\n    fn test_bandwidth_nvidia_professional() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA RTX A6000\"),\n            Some(768.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"NVIDIA RTX A4000\"),\n            Some(448.0)\n        );\n        assert_eq!(super::gpu_memory_bandwidth_gbps(\"NVIDIA L40S\"), Some(864.0));\n        assert_eq!(super::gpu_memory_bandwidth_gbps(\"NVIDIA L4\"), Some(300.0));\n    }\n\n    // ── bandwidth: Apple Silicon all variants ────────────────────────\n\n    #[test]\n    fn test_bandwidth_apple_silicon_all() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"Apple M4 Ultra\"),\n            Some(819.0)\n        );\n        assert_eq!(super::gpu_memory_bandwidth_gbps(\"Apple M4\"), Some(120.0));\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"Apple M3 Ultra\"),\n            Some(800.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"Apple M3 Max\"),\n            Some(400.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"Apple M3 Pro\"),\n            Some(150.0)\n        );\n        assert_eq!(super::gpu_memory_bandwidth_gbps(\"Apple M3\"), Some(100.0));\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"Apple M1 Pro\"),\n            
Some(200.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"Apple M1 Ultra\"),\n            Some(800.0)\n        );\n    }\n\n    // ── bandwidth: AMD CDNA ──────────────────────────────────────────\n\n    #[test]\n    fn test_bandwidth_amd_cdna() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Instinct MI250X\"),\n            Some(3277.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Instinct MI210\"),\n            Some(1638.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Instinct MI100\"),\n            Some(1229.0)\n        );\n    }\n\n    // ── bandwidth: AMD RDNA 4 ────────────────────────────────────────\n\n    #[test]\n    fn test_bandwidth_amd_rdna4() {\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Radeon RX 9070 XT\"),\n            Some(624.0)\n        );\n        assert_eq!(\n            super::gpu_memory_bandwidth_gbps(\"AMD Radeon RX 9070\"),\n            Some(488.0)\n        );\n    }\n\n    // ── compute capability tests ──────────────────────────────────────\n\n    #[test]\n    fn test_compute_capability_nvidia_generations() {\n        // Pascal\n        assert_eq!(super::gpu_compute_capability(\"Tesla P100\"), Some((6, 1)));\n        // Volta\n        assert_eq!(\n            super::gpu_compute_capability(\"Tesla V100-PCIE-16GB\"),\n            Some((7, 0))\n        );\n        // Turing\n        assert_eq!(super::gpu_compute_capability(\"Tesla T4\"), Some((7, 5)));\n        assert_eq!(\n            super::gpu_compute_capability(\"NVIDIA GeForce RTX 2080 Ti\"),\n            Some((7, 5))\n        );\n        assert_eq!(\n            super::gpu_compute_capability(\"NVIDIA GeForce GTX 1660 Ti\"),\n            Some((7, 5))\n        );\n        // Ampere\n        assert_eq!(super::gpu_compute_capability(\"NVIDIA A100\"), Some((8, 0)));\n        assert_eq!(\n            
super::gpu_compute_capability(\"NVIDIA GeForce RTX 3090\"),\n            Some((8, 6))\n        );\n        // Ada Lovelace\n        assert_eq!(\n            super::gpu_compute_capability(\"NVIDIA GeForce RTX 4090\"),\n            Some((8, 9))\n        );\n        assert_eq!(super::gpu_compute_capability(\"NVIDIA L40S\"), Some((8, 9)));\n        // Hopper\n        assert_eq!(\n            super::gpu_compute_capability(\"NVIDIA H100 SXM\"),\n            Some((9, 0))\n        );\n        // Blackwell\n        assert_eq!(\n            super::gpu_compute_capability(\"NVIDIA GeForce RTX 5090\"),\n            Some((10, 0))\n        );\n    }\n\n    #[test]\n    fn test_compute_capability_unknown_returns_none() {\n        assert_eq!(super::gpu_compute_capability(\"Some Random GPU\"), None);\n        assert_eq!(super::gpu_compute_capability(\"Apple M4 Max\"), None);\n        assert_eq!(\n            super::gpu_compute_capability(\"AMD Radeon RX 7900 XTX\"),\n            None\n        );\n    }\n\n    #[test]\n    fn test_quant_min_compute_capability() {\n        assert_eq!(\n            super::quant_min_compute_capability(\"AWQ-4bit\"),\n            Some((7, 5))\n        );\n        assert_eq!(\n            super::quant_min_compute_capability(\"AWQ-8bit\"),\n            Some((7, 5))\n        );\n        assert_eq!(\n            super::quant_min_compute_capability(\"GPTQ-Int4\"),\n            Some((7, 5))\n        );\n        assert_eq!(\n            super::quant_min_compute_capability(\"GPTQ-Int8\"),\n            Some((7, 5))\n        );\n        // GGUF quants have no CC restriction\n        assert_eq!(super::quant_min_compute_capability(\"Q4_K_M\"), None);\n        assert_eq!(super::quant_min_compute_capability(\"Q8_0\"), None);\n    }\n}\n"
  },
  {
    "path": "llmfit-core/src/lib.rs",
    "content": "pub mod fit;\npub mod hardware;\npub mod models;\npub mod plan;\npub mod providers;\n\npub use fit::{FitLevel, InferenceRuntime, ModelFit, RunMode, ScoreComponents, SortColumn};\npub use hardware::{GpuBackend, SystemSpecs};\npub use models::{Capability, LlmModel, ModelDatabase, ModelFormat, UseCase};\npub use plan::{\n    HardwareEstimate, PathEstimate, PlanCurrentStatus, PlanEstimate, PlanRequest, PlanRunPath,\n    UpgradeDelta, estimate_model_plan, normalize_quant, resolve_model_selector,\n};\npub use providers::{\n    LlamaCppProvider, LmStudioProvider, MlxProvider, ModelProvider, OllamaProvider,\n};\n"
  },
  {
    "path": "llmfit-core/src/models.rs",
    "content": "use std::collections::HashMap;\n\nuse serde::{Deserialize, Serialize};\n\n/// Quantization levels ordered from best quality to most compressed.\n/// Used for dynamic quantization selection: try the best that fits.\npub const QUANT_HIERARCHY: &[&str] = &[\"Q8_0\", \"Q6_K\", \"Q5_K_M\", \"Q4_K_M\", \"Q3_K_M\", \"Q2_K\"];\n\n/// MLX-native quantization hierarchy (best quality to most compressed).\npub const MLX_QUANT_HIERARCHY: &[&str] = &[\"mlx-8bit\", \"mlx-4bit\"];\n\n/// Bytes per parameter for each quantization level.\npub fn quant_bpp(quant: &str) -> f64 {\n    match quant {\n        \"F32\" => 4.0,\n        \"F16\" | \"BF16\" => 2.0,\n        \"Q8_0\" => 1.05,\n        \"Q6_K\" => 0.80,\n        \"Q5_K_M\" => 0.68,\n        \"Q4_K_M\" | \"Q4_0\" => 0.58,\n        \"Q3_K_M\" => 0.48,\n        \"Q2_K\" => 0.37,\n        \"mlx-4bit\" => 0.55,\n        \"mlx-8bit\" => 1.0,\n        \"AWQ-4bit\" => 0.5,\n        \"AWQ-8bit\" => 1.0,\n        \"GPTQ-Int4\" => 0.5,\n        \"GPTQ-Int8\" => 1.0,\n        _ => 0.58,\n    }\n}\n\n/// Speed multiplier for quantization (lower quant = faster inference).\npub fn quant_speed_multiplier(quant: &str) -> f64 {\n    match quant {\n        \"F16\" | \"BF16\" => 0.6,\n        \"Q8_0\" => 0.8,\n        \"Q6_K\" => 0.95,\n        \"Q5_K_M\" => 1.0,\n        \"Q4_K_M\" | \"Q4_0\" => 1.15,\n        \"Q3_K_M\" => 1.25,\n        \"Q2_K\" => 1.35,\n        \"mlx-4bit\" => 1.15,\n        \"mlx-8bit\" => 0.85,\n        \"AWQ-4bit\" | \"GPTQ-Int4\" => 1.2,\n        \"AWQ-8bit\" | \"GPTQ-Int8\" => 0.85,\n        _ => 1.0,\n    }\n}\n\n/// Bytes per parameter for a given quantization format.\n/// Used by the bandwidth-based tok/s estimator to compute model size in GB.\npub fn quant_bytes_per_param(quant: &str) -> f64 {\n    match quant {\n        \"F16\" | \"BF16\" => 2.0,\n        \"Q8_0\" => 1.0,\n        \"Q6_K\" => 0.75,\n        \"Q5_K_M\" => 0.625,\n        \"Q4_K_M\" | \"Q4_0\" => 0.5,\n        \"Q3_K_M\" => 0.375,\n    
    \"Q2_K\" => 0.25,\n        \"mlx-4bit\" => 0.5,\n        \"mlx-8bit\" => 1.0,\n        \"AWQ-4bit\" | \"GPTQ-Int4\" => 0.5,\n        \"AWQ-8bit\" | \"GPTQ-Int8\" => 1.0,\n        _ => 0.5, // default to ~4-bit\n    }\n}\n\n/// Quality penalty for quantization (lower quant = lower quality).\npub fn quant_quality_penalty(quant: &str) -> f64 {\n    match quant {\n        \"F16\" | \"BF16\" => 0.0,\n        \"Q8_0\" => 0.0,\n        \"Q6_K\" => -1.0,\n        \"Q5_K_M\" => -2.0,\n        \"Q4_K_M\" | \"Q4_0\" => -5.0,\n        \"Q3_K_M\" => -8.0,\n        \"Q2_K\" => -12.0,\n        \"mlx-4bit\" => -4.0,\n        \"mlx-8bit\" => 0.0,\n        \"AWQ-4bit\" => -3.0,\n        \"AWQ-8bit\" => 0.0,\n        \"GPTQ-Int4\" => -3.0,\n        \"GPTQ-Int8\" => 0.0,\n        _ => -5.0,\n    }\n}\n\n/// Model capability flags (orthogonal to UseCase).\n#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]\n#[serde(rename_all = \"snake_case\")]\npub enum Capability {\n    Vision,\n    ToolUse,\n}\n\nimpl Capability {\n    pub fn label(&self) -> &'static str {\n        match self {\n            Capability::Vision => \"Vision\",\n            Capability::ToolUse => \"Tool Use\",\n        }\n    }\n\n    pub fn all() -> &'static [Capability] {\n        &[Capability::Vision, Capability::ToolUse]\n    }\n\n    /// Infer capabilities from model metadata when not explicitly set in JSON.\n    pub fn infer(model: &LlmModel) -> Vec<Capability> {\n        let mut caps = model.capabilities.clone();\n        let name = model.name.to_lowercase();\n        let use_case = model.use_case.to_lowercase();\n\n        // Vision detection\n        if !caps.contains(&Capability::Vision)\n            && (name.contains(\"vision\")\n                || name.contains(\"-vl-\")\n                || name.ends_with(\"-vl\")\n                || name.contains(\"llava\")\n                || name.contains(\"onevision\")\n                || name.contains(\"pixtral\")\n                || 
use_case.contains(\"vision\")\n                || use_case.contains(\"multimodal\"))\n        {\n            caps.push(Capability::Vision);\n        }\n\n        // Tool use detection (known model families)\n        if !caps.contains(&Capability::ToolUse)\n            && (use_case.contains(\"tool\")\n                || use_case.contains(\"function call\")\n                || name.contains(\"qwen3\")\n                || name.contains(\"qwen2.5\")\n                || name.contains(\"command-r\")\n                || (name.contains(\"llama-3\") && name.contains(\"instruct\"))\n                || (name.contains(\"mistral\") && name.contains(\"instruct\"))\n                || name.contains(\"hermes\"))\n        {\n            caps.push(Capability::ToolUse);\n        }\n\n        caps\n    }\n}\n\n/// Model weight format — determines which inference runtime to use.\n#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]\n#[serde(rename_all = \"lowercase\")]\n#[derive(Default)]\npub enum ModelFormat {\n    #[default]\n    Gguf,\n    Awq,\n    Gptq,\n    Mlx,\n    Safetensors,\n}\n\nimpl ModelFormat {\n    /// Returns true for formats that are pre-quantized at a fixed bit width\n    /// and cannot be dynamically re-quantized (AWQ, GPTQ).\n    pub fn is_prequantized(&self) -> bool {\n        matches!(self, ModelFormat::Awq | ModelFormat::Gptq)\n    }\n}\n\n/// Use-case category for scoring weights.\n#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize)]\npub enum UseCase {\n    General,\n    Coding,\n    Reasoning,\n    Chat,\n    Multimodal,\n    Embedding,\n}\n\nimpl UseCase {\n    pub fn label(&self) -> &'static str {\n        match self {\n            UseCase::General => \"General\",\n            UseCase::Coding => \"Coding\",\n            UseCase::Reasoning => \"Reasoning\",\n            UseCase::Chat => \"Chat\",\n            UseCase::Multimodal => \"Multimodal\",\n            UseCase::Embedding => \"Embedding\",\n        }\n    }\n\n    /// 
Infer use-case from the model's use_case field and name.\n    pub fn from_model(model: &LlmModel) -> Self {\n        let name = model.name.to_lowercase();\n        let use_case = model.use_case.to_lowercase();\n\n        if use_case.contains(\"embedding\") || name.contains(\"embed\") || name.contains(\"bge\") {\n            UseCase::Embedding\n        } else if name.contains(\"code\") || use_case.contains(\"code\") {\n            UseCase::Coding\n        } else if use_case.contains(\"vision\") || use_case.contains(\"multimodal\") {\n            UseCase::Multimodal\n        } else if use_case.contains(\"reason\")\n            || use_case.contains(\"chain-of-thought\")\n            || name.contains(\"deepseek-r1\")\n        {\n            UseCase::Reasoning\n        } else if use_case.contains(\"chat\") || use_case.contains(\"instruction\") {\n            UseCase::Chat\n        } else {\n            UseCase::General\n        }\n    }\n}\n\n#[derive(Debug, Clone, Serialize, Deserialize)]\npub struct LlmModel {\n    pub name: String,\n    pub provider: String,\n    pub parameter_count: String,\n    #[serde(default)]\n    pub parameters_raw: Option<u64>,\n    pub min_ram_gb: f64,\n    pub recommended_ram_gb: f64,\n    pub min_vram_gb: Option<f64>,\n    pub quantization: String,\n    pub context_length: u32,\n    pub use_case: String,\n    #[serde(default)]\n    pub is_moe: bool,\n    #[serde(default)]\n    pub num_experts: Option<u32>,\n    #[serde(default)]\n    pub active_experts: Option<u32>,\n    #[serde(default)]\n    pub active_parameters: Option<u64>,\n    #[serde(default)]\n    pub release_date: Option<String>,\n    /// Known GGUF download sources (e.g. 
unsloth, bartowski repos on HuggingFace)\n    #[serde(default)]\n    pub gguf_sources: Vec<GgufSource>,\n    /// Model capabilities (vision, tool use, etc.)\n    #[serde(default)]\n    pub capabilities: Vec<Capability>,\n    /// Model weight format (gguf, awq, gptq, mlx, safetensors)\n    #[serde(default)]\n    pub format: ModelFormat,\n}\n\n/// A known GGUF download source for a model on HuggingFace.\n#[derive(Debug, Clone, Serialize, Deserialize)]\npub struct GgufSource {\n    /// HuggingFace repo ID (e.g. \"unsloth/Llama-3.1-8B-Instruct-GGUF\")\n    pub repo: String,\n    /// Provider who published the GGUF (e.g. \"unsloth\", \"bartowski\")\n    pub provider: String,\n}\n\nimpl LlmModel {\n    /// MLX models are Apple-only — they won't run on NVIDIA/AMD/Intel hardware.\n    /// We detect them by the `-MLX-` suffix that's standard on HuggingFace\n    /// (e.g. `Qwen3-8B-MLX-4bit`, `LFM2-1.2B-MLX-8bit`).\n    pub fn is_mlx_model(&self) -> bool {\n        let name_lower = self.name.to_lowercase();\n        name_lower.contains(\"-mlx-\") || name_lower.ends_with(\"-mlx\")\n    }\n\n    /// Returns true if this model uses a pre-quantized format (AWQ/GPTQ)\n    /// that cannot be dynamically re-quantized.\n    pub fn is_prequantized(&self) -> bool {\n        self.format.is_prequantized()\n    }\n\n    /// Bytes-per-parameter for the model's quantization level.\n    fn quant_bpp(&self) -> f64 {\n        quant_bpp(&self.quantization)\n    }\n\n    /// Parameter count in billions, extracted from parameters_raw or parameter_count.\n    pub fn params_b(&self) -> f64 {\n        if let Some(raw) = self.parameters_raw {\n            raw as f64 / 1_000_000_000.0\n        } else {\n            // Parse from string like \"7B\", \"1.1B\", \"137M\"\n            let s = self.parameter_count.trim().to_uppercase();\n            if let Some(num_str) = s.strip_suffix('B') {\n                num_str.parse::<f64>().unwrap_or(7.0)\n            } else if let Some(num_str) = 
s.strip_suffix('M') {\n                num_str.parse::<f64>().unwrap_or(0.0) / 1000.0\n            } else {\n                7.0\n            }\n        }\n    }\n\n    /// Estimate memory required (GB) at a given quantization and context length.\n    /// Formula: model_weights + KV_cache + runtime_overhead\n    pub fn estimate_memory_gb(&self, quant: &str, ctx: u32) -> f64 {\n        let bpp = quant_bpp(quant);\n        let params = self.params_b();\n        let model_mem = params * bpp;\n        // KV cache: ~0.000008 GB per billion params per context token\n        let kv_cache = 0.000008 * params * ctx as f64;\n        // Runtime overhead (CUDA/Metal context, buffers)\n        let overhead = 0.5;\n        model_mem + kv_cache + overhead\n    }\n\n    /// Select the best quantization level that fits within a memory budget.\n    /// Returns the quant name and estimated memory in GB, or None if nothing fits.\n    pub fn best_quant_for_budget(&self, budget_gb: f64, ctx: u32) -> Option<(&'static str, f64)> {\n        self.best_quant_for_budget_with(budget_gb, ctx, QUANT_HIERARCHY)\n    }\n\n    /// Select the best quantization from a custom hierarchy that fits within a memory budget.\n    pub fn best_quant_for_budget_with(\n        &self,\n        budget_gb: f64,\n        ctx: u32,\n        hierarchy: &[&'static str],\n    ) -> Option<(&'static str, f64)> {\n        // Try best quality first\n        for &q in hierarchy {\n            let mem = self.estimate_memory_gb(q, ctx);\n            if mem <= budget_gb {\n                return Some((q, mem));\n            }\n        }\n        // Try halving context once\n        let half_ctx = ctx / 2;\n        if half_ctx >= 1024 {\n            for &q in hierarchy {\n                let mem = self.estimate_memory_gb(q, half_ctx);\n                if mem <= budget_gb {\n                    return Some((q, mem));\n                }\n            }\n        }\n        None\n    }\n\n    /// For MoE models, compute estimated 
VRAM for active experts only.\n    /// Returns None for dense models.\n    pub fn moe_active_vram_gb(&self) -> Option<f64> {\n        if !self.is_moe {\n            return None;\n        }\n        let active_params = self.active_parameters? as f64;\n        let bpp = self.quant_bpp();\n        let size_gb = (active_params * bpp) / (1024.0 * 1024.0 * 1024.0);\n        Some((size_gb * 1.1).max(0.5))\n    }\n\n    /// Returns true if this model is MLX-specific (Apple Silicon only).\n    /// MLX models are identified by having \"-MLX\" in their name.\n    pub fn is_mlx_only(&self) -> bool {\n        self.name.to_uppercase().contains(\"-MLX\")\n    }\n\n    /// For MoE models, compute RAM needed for offloaded (inactive) experts.\n    /// Returns None for dense models.\n    pub fn moe_offloaded_ram_gb(&self) -> Option<f64> {\n        if !self.is_moe {\n            return None;\n        }\n        let active = self.active_parameters? as f64;\n        let total = self.parameters_raw? as f64;\n        let inactive = total - active;\n        if inactive <= 0.0 {\n            return Some(0.0);\n        }\n        let bpp = self.quant_bpp();\n        Some((inactive * bpp) / (1024.0 * 1024.0 * 1024.0))\n    }\n}\n\n/// Intermediate struct matching the JSON schema from the scraper.\n/// Extra fields are ignored when mapping to LlmModel.\n#[derive(Debug, Clone, Deserialize)]\nstruct HfModelEntry {\n    name: String,\n    provider: String,\n    parameter_count: String,\n    #[serde(default)]\n    parameters_raw: Option<u64>,\n    min_ram_gb: f64,\n    recommended_ram_gb: f64,\n    min_vram_gb: Option<f64>,\n    quantization: String,\n    context_length: u32,\n    use_case: String,\n    #[serde(default)]\n    is_moe: bool,\n    #[serde(default)]\n    num_experts: Option<u32>,\n    #[serde(default)]\n    active_experts: Option<u32>,\n    #[serde(default)]\n    active_parameters: Option<u64>,\n    #[serde(default)]\n    release_date: Option<String>,\n    #[serde(default)]\n    
gguf_sources: Vec<GgufSource>,\n    #[serde(default)]\n    capabilities: Vec<Capability>,\n    #[serde(default)]\n    format: ModelFormat,\n    #[serde(default)]\n    hf_downloads: u64,\n    #[serde(default)]\n    hf_likes: u64,\n}\n\nfn parse_parameter_count_hint(parameter_count: &str) -> Option<u64> {\n    let normalized = parameter_count.trim().replace(',', \"\").to_uppercase();\n\n    if let Some(raw) = normalized.strip_suffix('B') {\n        raw.parse::<f64>()\n            .ok()\n            .map(|value| (value * 1_000_000_000.0).round() as u64)\n    } else if let Some(raw) = normalized.strip_suffix('M') {\n        raw.parse::<f64>()\n            .ok()\n            .map(|value| (value * 1_000_000.0).round() as u64)\n    } else {\n        None\n    }\n}\n\nfn effective_parameters_raw(entry: &HfModelEntry) -> Option<u64> {\n    entry\n        .parameters_raw\n        .or_else(|| parse_parameter_count_hint(&entry.parameter_count))\n}\n\nfn option_max<T: PartialOrd + Copy>(left: Option<T>, right: Option<T>) -> Option<T> {\n    match (left, right) {\n        (Some(left), Some(right)) => Some(if right > left { right } else { left }),\n        (Some(left), None) => Some(left),\n        (None, Some(right)) => Some(right),\n        (None, None) => None,\n    }\n}\n\nfn hf_entry_rank(entry: &HfModelEntry) -> (u64, u64, usize, usize, u8, u64, u64, u32) {\n    (\n        entry.hf_downloads,\n        entry.hf_likes,\n        entry.capabilities.len(),\n        entry.gguf_sources.len(),\n        u8::from(entry.release_date.is_some()),\n        entry.active_parameters.unwrap_or(0),\n        effective_parameters_raw(entry).unwrap_or(0),\n        entry.context_length,\n    )\n}\n\nfn merge_exact_name_entries(\n    mut primary: HfModelEntry,\n    mut secondary: HfModelEntry,\n) -> HfModelEntry {\n    if hf_entry_rank(&secondary) > hf_entry_rank(&primary) {\n        std::mem::swap(&mut primary, &mut secondary);\n    }\n\n    let primary_effective_params = 
effective_parameters_raw(&primary).unwrap_or(0);\n    let secondary_effective_params = effective_parameters_raw(&secondary).unwrap_or(0);\n\n    if secondary_effective_params > primary_effective_params {\n        primary.parameter_count = secondary.parameter_count.clone();\n    }\n    primary.parameters_raw = option_max(primary.parameters_raw, secondary.parameters_raw);\n    primary.min_ram_gb = primary.min_ram_gb.max(secondary.min_ram_gb);\n    primary.recommended_ram_gb = primary.recommended_ram_gb.max(secondary.recommended_ram_gb);\n    primary.min_vram_gb = option_max(primary.min_vram_gb, secondary.min_vram_gb);\n    primary.context_length = primary.context_length.max(secondary.context_length);\n    primary.is_moe |= secondary.is_moe;\n    primary.num_experts = option_max(primary.num_experts, secondary.num_experts);\n    primary.active_experts = option_max(primary.active_experts, secondary.active_experts);\n    primary.active_parameters = option_max(primary.active_parameters, secondary.active_parameters);\n\n    if primary.provider.is_empty() {\n        primary.provider = secondary.provider.clone();\n    }\n    if primary.quantization.is_empty() {\n        primary.quantization = secondary.quantization.clone();\n    }\n    if primary.use_case.is_empty() {\n        primary.use_case = secondary.use_case.clone();\n    }\n    if primary.format == ModelFormat::default() && secondary.format != ModelFormat::default() {\n        primary.format = secondary.format;\n    }\n    if secondary.release_date.as_deref() > primary.release_date.as_deref() {\n        primary.release_date = secondary.release_date.clone();\n    }\n\n    for capability in secondary.capabilities {\n        if !primary.capabilities.contains(&capability) {\n            primary.capabilities.push(capability);\n        }\n    }\n\n    for source in secondary.gguf_sources {\n        let exists = primary\n            .gguf_sources\n            .iter()\n            .any(|existing| existing.repo == source.repo 
&& existing.provider == source.provider);\n        if !exists {\n            primary.gguf_sources.push(source);\n        }\n    }\n\n    primary\n}\n\nfn dedupe_hf_entries(entries: Vec<HfModelEntry>) -> Vec<HfModelEntry> {\n    let mut deduped_entries: Vec<HfModelEntry> = Vec::with_capacity(entries.len());\n    let mut deduped_indices: HashMap<String, usize> = HashMap::new();\n\n    for entry in entries {\n        let key = entry.name.to_lowercase();\n        if let Some(&idx) = deduped_indices.get(&key) {\n            deduped_entries[idx] = merge_exact_name_entries(deduped_entries[idx].clone(), entry);\n        } else {\n            deduped_indices.insert(key, deduped_entries.len());\n            deduped_entries.push(entry);\n        }\n    }\n\n    deduped_entries\n}\n\nconst HF_MODELS_JSON: &str = include_str!(\"../data/hf_models.json\");\n\npub struct ModelDatabase {\n    models: Vec<LlmModel>,\n}\n\nimpl Default for ModelDatabase {\n    fn default() -> Self {\n        Self::new()\n    }\n}\n\nimpl ModelDatabase {\n    pub fn new() -> Self {\n        let entries: Vec<HfModelEntry> =\n            serde_json::from_str(HF_MODELS_JSON).expect(\"Failed to parse embedded hf_models.json\");\n\n        let deduped_entries = dedupe_hf_entries(entries);\n\n        let models = deduped_entries\n            .into_iter()\n            .map(|e| {\n                let mut model = LlmModel {\n                    name: e.name,\n                    provider: e.provider,\n                    parameter_count: e.parameter_count,\n                    parameters_raw: e.parameters_raw,\n                    min_ram_gb: e.min_ram_gb,\n                    recommended_ram_gb: e.recommended_ram_gb,\n                    min_vram_gb: e.min_vram_gb,\n                    quantization: e.quantization,\n                    context_length: e.context_length,\n                    use_case: e.use_case,\n                    is_moe: e.is_moe,\n                    num_experts: e.num_experts,\n           
         active_experts: e.active_experts,\n                    active_parameters: e.active_parameters,\n                    release_date: e.release_date,\n                    gguf_sources: e.gguf_sources,\n                    capabilities: e.capabilities,\n                    format: e.format,\n                };\n                model.capabilities = Capability::infer(&model);\n                model\n            })\n            .collect();\n\n        ModelDatabase { models }\n    }\n\n    pub fn get_all_models(&self) -> &Vec<LlmModel> {\n        &self.models\n    }\n\n    pub fn find_model(&self, query: &str) -> Vec<&LlmModel> {\n        let query_lower = query.to_lowercase();\n        self.models\n            .iter()\n            .filter(|m| {\n                m.name.to_lowercase().contains(&query_lower)\n                    || m.provider.to_lowercase().contains(&query_lower)\n                    || m.parameter_count.to_lowercase().contains(&query_lower)\n            })\n            .collect()\n    }\n\n    pub fn models_fitting_system(\n        &self,\n        available_ram_gb: f64,\n        has_gpu: bool,\n        vram_gb: Option<f64>,\n    ) -> Vec<&LlmModel> {\n        self.models\n            .iter()\n            .filter(|m| {\n                // Check RAM requirement\n                let ram_ok = m.min_ram_gb <= available_ram_gb;\n\n                // If model requires GPU and system has GPU, check VRAM\n                if let Some(min_vram) = m.min_vram_gb {\n                    if has_gpu {\n                        if let Some(system_vram) = vram_gb {\n                            ram_ok && min_vram <= system_vram\n                        } else {\n                            // GPU detected but VRAM unknown, allow but warn\n                            ram_ok\n                        }\n                    } else {\n                        // Model prefers GPU but can run on CPU with enough RAM\n                        ram_ok && available_ram_gb >= 
m.recommended_ram_gb\n                    }\n                } else {\n                    ram_ok\n                }\n            })\n            .collect()\n    }\n}\n\n#[cfg(test)]\nmod tests {\n    use super::*;\n\n    // ────────────────────────────────────────────────────────────────────\n    // Quantization function tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_mlx_quant_bpp_values() {\n        assert_eq!(quant_bpp(\"mlx-4bit\"), 0.55);\n        assert_eq!(quant_bpp(\"mlx-8bit\"), 1.0);\n        assert_eq!(quant_speed_multiplier(\"mlx-4bit\"), 1.15);\n        assert_eq!(quant_speed_multiplier(\"mlx-8bit\"), 0.85);\n        assert_eq!(quant_quality_penalty(\"mlx-4bit\"), -4.0);\n        assert_eq!(quant_quality_penalty(\"mlx-8bit\"), 0.0);\n    }\n\n    #[test]\n    fn test_best_quant_with_mlx_hierarchy() {\n        let model = LlmModel {\n            name: \"Test Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 4.0,\n            recommended_ram_gb: 8.0,\n            min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n\n        // Large budget should return mlx-8bit (best in MLX hierarchy)\n        let result = model.best_quant_for_budget_with(10.0, 4096, MLX_QUANT_HIERARCHY);\n        assert!(result.is_some());\n        let (quant, _) = result.unwrap();\n        assert_eq!(quant, \"mlx-8bit\");\n\n        // Tighter budget should fall to mlx-4bit\n        let 
result = model.best_quant_for_budget_with(5.0, 4096, MLX_QUANT_HIERARCHY);\n        assert!(result.is_some());\n        let (quant, _) = result.unwrap();\n        assert_eq!(quant, \"mlx-4bit\");\n    }\n\n    #[test]\n    fn test_quant_bpp() {\n        assert_eq!(quant_bpp(\"F32\"), 4.0);\n        assert_eq!(quant_bpp(\"F16\"), 2.0);\n        assert_eq!(quant_bpp(\"Q8_0\"), 1.05);\n        assert_eq!(quant_bpp(\"Q4_K_M\"), 0.58);\n        assert_eq!(quant_bpp(\"Q2_K\"), 0.37);\n        // Unknown quant defaults to Q4_K_M\n        assert_eq!(quant_bpp(\"UNKNOWN\"), 0.58);\n    }\n\n    #[test]\n    fn test_quant_speed_multiplier() {\n        assert_eq!(quant_speed_multiplier(\"F16\"), 0.6);\n        assert_eq!(quant_speed_multiplier(\"Q5_K_M\"), 1.0);\n        assert_eq!(quant_speed_multiplier(\"Q4_K_M\"), 1.15);\n        assert_eq!(quant_speed_multiplier(\"Q2_K\"), 1.35);\n        // Lower quant = faster inference\n        assert!(quant_speed_multiplier(\"Q2_K\") > quant_speed_multiplier(\"Q8_0\"));\n    }\n\n    #[test]\n    fn test_quant_quality_penalty() {\n        assert_eq!(quant_quality_penalty(\"F16\"), 0.0);\n        assert_eq!(quant_quality_penalty(\"Q8_0\"), 0.0);\n        assert_eq!(quant_quality_penalty(\"Q4_K_M\"), -5.0);\n        assert_eq!(quant_quality_penalty(\"Q2_K\"), -12.0);\n        // Lower quant = higher quality penalty\n        assert!(quant_quality_penalty(\"Q2_K\") < quant_quality_penalty(\"Q8_0\"));\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // LlmModel tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_params_b_from_raw() {\n        let model = LlmModel {\n            name: \"Test Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 4.0,\n            recommended_ram_gb: 8.0,\n            
min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        assert_eq!(model.params_b(), 7.0);\n    }\n\n    #[test]\n    fn test_params_b_from_string() {\n        let model = LlmModel {\n            name: \"Test Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"13B\".to_string(),\n            parameters_raw: None,\n            min_ram_gb: 8.0,\n            recommended_ram_gb: 16.0,\n            min_vram_gb: Some(8.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        assert_eq!(model.params_b(), 13.0);\n    }\n\n    #[test]\n    fn test_params_b_from_millions() {\n        let model = LlmModel {\n            name: \"Test Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"500M\".to_string(),\n            parameters_raw: None,\n            min_ram_gb: 1.0,\n            recommended_ram_gb: 2.0,\n            min_vram_gb: Some(1.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 2048,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n           
 release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        assert_eq!(model.params_b(), 0.5);\n    }\n\n    #[test]\n    fn test_estimate_memory_gb() {\n        let model = LlmModel {\n            name: \"Test Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 4.0,\n            recommended_ram_gb: 8.0,\n            min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n\n        let mem = model.estimate_memory_gb(\"Q4_K_M\", 4096);\n        // 7B params * 0.58 bytes = 4.06 GB + KV cache + overhead\n        assert!(mem > 4.0);\n        assert!(mem < 6.0);\n\n        // Q8_0 should require more memory\n        let mem_q8 = model.estimate_memory_gb(\"Q8_0\", 4096);\n        assert!(mem_q8 > mem);\n    }\n\n    #[test]\n    fn test_best_quant_for_budget() {\n        let model = LlmModel {\n            name: \"Test Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 4.0,\n            recommended_ram_gb: 8.0,\n            min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n    
        release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n\n        // Large budget should return best quant\n        let result = model.best_quant_for_budget(10.0, 4096);\n        assert!(result.is_some());\n        let (quant, _) = result.unwrap();\n        assert_eq!(quant, \"Q8_0\");\n\n        // Medium budget should find acceptable quant\n        let result = model.best_quant_for_budget(5.0, 4096);\n        assert!(result.is_some());\n\n        // Tiny budget should return None\n        let result = model.best_quant_for_budget(1.0, 4096);\n        assert!(result.is_none());\n    }\n\n    #[test]\n    fn test_moe_active_vram_gb() {\n        // Dense model should return None\n        let dense_model = LlmModel {\n            name: \"Dense Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 4.0,\n            recommended_ram_gb: 8.0,\n            min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        assert!(dense_model.moe_active_vram_gb().is_none());\n\n        // MoE model should calculate active VRAM\n        let moe_model = LlmModel {\n            name: \"MoE Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"8x7B\".to_string(),\n            parameters_raw: Some(46_700_000_000),\n            min_ram_gb: 25.0,\n            recommended_ram_gb: 50.0,\n            min_vram_gb: 
Some(25.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 32768,\n            use_case: \"General\".to_string(),\n            is_moe: true,\n            num_experts: Some(8),\n            active_experts: Some(2),\n            active_parameters: Some(12_900_000_000),\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        let vram = moe_model.moe_active_vram_gb();\n        assert!(vram.is_some());\n        let vram_val = vram.unwrap();\n        // Should be significantly less than full model\n        assert!(vram_val > 0.0);\n        assert!(vram_val < 15.0);\n    }\n\n    #[test]\n    fn test_moe_offloaded_ram_gb() {\n        // Dense model should return None\n        let dense_model = LlmModel {\n            name: \"Dense Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 4.0,\n            recommended_ram_gb: 8.0,\n            min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        assert!(dense_model.moe_offloaded_ram_gb().is_none());\n\n        // MoE model should calculate offloaded RAM\n        let moe_model = LlmModel {\n            name: \"MoE Model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"8x7B\".to_string(),\n            parameters_raw: Some(46_700_000_000),\n            min_ram_gb: 25.0,\n            recommended_ram_gb: 50.0,\n  
          min_vram_gb: Some(25.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 32768,\n            use_case: \"General\".to_string(),\n            is_moe: true,\n            num_experts: Some(8),\n            active_experts: Some(2),\n            active_parameters: Some(12_900_000_000),\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        let offloaded = moe_model.moe_offloaded_ram_gb();\n        assert!(offloaded.is_some());\n        let offloaded_val = offloaded.unwrap();\n        // Should be substantial\n        assert!(offloaded_val > 10.0);\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // UseCase tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_use_case_from_model_coding() {\n        let model = LlmModel {\n            name: \"codellama-7b\".to_string(),\n            provider: \"Meta\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 4.0,\n            recommended_ram_gb: 8.0,\n            min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"Coding\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        assert_eq!(UseCase::from_model(&model), UseCase::Coding);\n    }\n\n    #[test]\n    fn test_use_case_from_model_embedding() {\n        let model = LlmModel {\n            name: \"bge-large\".to_string(),\n            provider: \"BAAI\".to_string(),\n            parameter_count: 
\"335M\".to_string(),\n            parameters_raw: Some(335_000_000),\n            min_ram_gb: 1.0,\n            recommended_ram_gb: 2.0,\n            min_vram_gb: Some(1.0),\n            quantization: \"F16\".to_string(),\n            context_length: 512,\n            use_case: \"Embedding\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        assert_eq!(UseCase::from_model(&model), UseCase::Embedding);\n    }\n\n    #[test]\n    fn test_use_case_from_model_reasoning() {\n        let model = LlmModel {\n            name: \"deepseek-r1-7b\".to_string(),\n            provider: \"DeepSeek\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 4.0,\n            recommended_ram_gb: 8.0,\n            min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 8192,\n            use_case: \"Reasoning\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        assert_eq!(UseCase::from_model(&model), UseCase::Reasoning);\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // ModelDatabase tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_model_database_new() {\n        let db = ModelDatabase::new();\n        let models = db.get_all_models();\n        // Should have loaded models from embedded JSON\n        assert!(!models.is_empty());\n    
}\n\n    #[test]\n    fn test_dedupe_hf_entries_merges_duplicate_metadata() {\n        let deduped = dedupe_hf_entries(vec![\n            HfModelEntry {\n                name: \"Example/Model\".to_string(),\n                provider: \"Example\".to_string(),\n                parameter_count: \"18B\".to_string(),\n                parameters_raw: Some(18_000_000_000),\n                min_ram_gb: 10.0,\n                recommended_ram_gb: 18.0,\n                min_vram_gb: Some(8.0),\n                quantization: \"Q4_K_M\".to_string(),\n                context_length: 32768,\n                use_case: \"General\".to_string(),\n                is_moe: false,\n                num_experts: None,\n                active_experts: None,\n                active_parameters: None,\n                release_date: Some(\"2026-01-01\".to_string()),\n                gguf_sources: vec![GgufSource {\n                    repo: \"example/example-model-gguf\".to_string(),\n                    provider: \"example\".to_string(),\n                }],\n                capabilities: vec![Capability::Vision],\n                format: ModelFormat::Safetensors,\n                hf_downloads: 10_000,\n                hf_likes: 500,\n            },\n            HfModelEntry {\n                name: \"Example/Model\".to_string(),\n                provider: \"Example\".to_string(),\n                parameter_count: \"20B\".to_string(),\n                parameters_raw: Some(20_000_000_000),\n                min_ram_gb: 12.0,\n                recommended_ram_gb: 24.0,\n                min_vram_gb: Some(10.0),\n                quantization: \"Q4_K_M\".to_string(),\n                context_length: 65536,\n                use_case: \"General\".to_string(),\n                is_moe: true,\n                num_experts: Some(64),\n                active_experts: Some(8),\n                active_parameters: Some(3_000_000_000),\n                release_date: Some(\"2026-02-01\".to_string()),\n            
    gguf_sources: vec![GgufSource {\n                    repo: \"unsloth/example-model-gguf\".to_string(),\n                    provider: \"unsloth\".to_string(),\n                }],\n                capabilities: vec![Capability::ToolUse],\n                format: ModelFormat::Gguf,\n                hf_downloads: 100,\n                hf_likes: 10,\n            },\n        ]);\n\n        assert_eq!(deduped.len(), 1);\n        let merged = &deduped[0];\n        assert_eq!(merged.parameter_count, \"20B\");\n        assert_eq!(merged.parameters_raw, Some(20_000_000_000));\n        assert_eq!(merged.min_ram_gb, 12.0);\n        assert_eq!(merged.recommended_ram_gb, 24.0);\n        assert_eq!(merged.min_vram_gb, Some(10.0));\n        assert_eq!(merged.context_length, 65536);\n        assert!(merged.is_moe);\n        assert_eq!(merged.num_experts, Some(64));\n        assert_eq!(merged.active_experts, Some(8));\n        assert_eq!(merged.active_parameters, Some(3_000_000_000));\n        assert!(merged.capabilities.contains(&Capability::Vision));\n        assert!(merged.capabilities.contains(&Capability::ToolUse));\n        assert_eq!(merged.gguf_sources.len(), 2);\n        assert!(\n            merged\n                .gguf_sources\n                .iter()\n                .any(|source| source.repo == \"example/example-model-gguf\")\n        );\n        assert!(\n            merged\n                .gguf_sources\n                .iter()\n                .any(|source| source.repo == \"unsloth/example-model-gguf\")\n        );\n    }\n\n    #[test]\n    fn test_model_database_deduplicates_exact_name_collisions() {\n        let db = ModelDatabase::new();\n        let matches: Vec<_> = db\n            .get_all_models()\n            .iter()\n            .filter(|m| m.name == \"Qwen/Qwen3-Coder-Next\")\n            .collect();\n\n        assert_eq!(\n            matches.len(),\n            1,\n            \"duplicate exact model names should be collapsed\"\n        );\n\n      
  let model = matches[0];\n        assert_eq!(model.use_case, \"Code generation and completion\");\n        assert_eq!(model.parameter_count, \"80B\");\n        assert_eq!(model.parameters_raw, Some(80_000_000_000));\n        assert_eq!(model.min_ram_gb, 44.8);\n        assert_eq!(model.recommended_ram_gb, 74.6);\n        assert_eq!(model.min_vram_gb, Some(41.0));\n        assert!(model.is_moe);\n        assert_eq!(model.num_experts, Some(64));\n        assert_eq!(model.active_experts, Some(4));\n        assert_eq!(model.active_parameters, Some(3_000_000_000));\n        assert!(\n            model\n                .gguf_sources\n                .iter()\n                .any(|source| source.repo == \"unsloth/Qwen3-Coder-Next-GGUF\")\n        );\n    }\n\n    #[test]\n    fn test_find_model() {\n        let db = ModelDatabase::new();\n\n        // Search by name substring (case insensitive)\n        let results = db.find_model(\"llama\");\n        assert!(!results.is_empty());\n        assert!(\n            results\n                .iter()\n                .any(|m| m.name.to_lowercase().contains(\"llama\"))\n        );\n\n        // Search should be case insensitive\n        let results_upper = db.find_model(\"LLAMA\");\n        assert_eq!(results.len(), results_upper.len());\n    }\n\n    #[test]\n    fn test_models_fitting_system() {\n        let db = ModelDatabase::new();\n\n        // Large system should fit many models\n        let fitting = db.models_fitting_system(32.0, true, Some(24.0));\n        assert!(!fitting.is_empty());\n\n        // Very small system should fit fewer or no models\n        let fitting_small = db.models_fitting_system(2.0, false, None);\n        assert!(fitting_small.len() < fitting.len());\n\n        // All fitting models should meet RAM requirements\n        for model in fitting_small {\n            assert!(model.min_ram_gb <= 2.0);\n        }\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // 
Capability tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_capability_infer_vision() {\n        let model = LlmModel {\n            name: \"meta-llama/Llama-3.2-11B-Vision-Instruct\".to_string(),\n            provider: \"Meta\".to_string(),\n            parameter_count: \"11B\".to_string(),\n            parameters_raw: Some(11_000_000_000),\n            min_ram_gb: 6.0,\n            recommended_ram_gb: 10.0,\n            min_vram_gb: Some(6.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 131072,\n            use_case: \"Multimodal, vision and text\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        let caps = Capability::infer(&model);\n        assert!(caps.contains(&Capability::Vision));\n        // Also gets ToolUse because \"llama-3\" + \"instruct\"\n        assert!(caps.contains(&Capability::ToolUse));\n    }\n\n    #[test]\n    fn test_capability_infer_tool_use() {\n        let model = LlmModel {\n            name: \"Qwen/Qwen3-8B\".to_string(),\n            provider: \"Qwen\".to_string(),\n            parameter_count: \"8B\".to_string(),\n            parameters_raw: Some(8_000_000_000),\n            min_ram_gb: 4.5,\n            recommended_ram_gb: 8.0,\n            min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 32768,\n            use_case: \"General purpose text generation\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: 
ModelFormat::default(),\n        };\n        let caps = Capability::infer(&model);\n        assert!(caps.contains(&Capability::ToolUse));\n        assert!(!caps.contains(&Capability::Vision));\n    }\n\n    #[test]\n    fn test_capability_infer_none() {\n        let model = LlmModel {\n            name: \"BAAI/bge-large-en-v1.5\".to_string(),\n            provider: \"BAAI\".to_string(),\n            parameter_count: \"335M\".to_string(),\n            parameters_raw: Some(335_000_000),\n            min_ram_gb: 1.0,\n            recommended_ram_gb: 2.0,\n            min_vram_gb: Some(1.0),\n            quantization: \"F16\".to_string(),\n            context_length: 512,\n            use_case: \"Text embeddings for RAG\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: ModelFormat::default(),\n        };\n        let caps = Capability::infer(&model);\n        assert!(caps.is_empty());\n    }\n\n    #[test]\n    fn test_capability_preserves_explicit() {\n        let model = LlmModel {\n            name: \"some-model\".to_string(),\n            provider: \"Test\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 4.0,\n            recommended_ram_gb: 8.0,\n            min_vram_gb: Some(4.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 4096,\n            use_case: \"General\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![Capability::Vision],\n            format: ModelFormat::default(),\n        };\n        let caps = 
Capability::infer(&model);\n        // Should keep the explicit Vision and not duplicate it\n        assert_eq!(caps.iter().filter(|c| **c == Capability::Vision).count(), 1);\n    }\n\n    #[test]\n    fn test_awq_gptq_quant_values() {\n        // AWQ\n        assert_eq!(quant_bpp(\"AWQ-4bit\"), 0.5);\n        assert_eq!(quant_bpp(\"AWQ-8bit\"), 1.0);\n        assert_eq!(quant_speed_multiplier(\"AWQ-4bit\"), 1.2);\n        assert_eq!(quant_speed_multiplier(\"AWQ-8bit\"), 0.85);\n        assert_eq!(quant_quality_penalty(\"AWQ-4bit\"), -3.0);\n        assert_eq!(quant_quality_penalty(\"AWQ-8bit\"), 0.0);\n        // GPTQ\n        assert_eq!(quant_bpp(\"GPTQ-Int4\"), 0.5);\n        assert_eq!(quant_bpp(\"GPTQ-Int8\"), 1.0);\n        assert_eq!(quant_speed_multiplier(\"GPTQ-Int4\"), 1.2);\n        assert_eq!(quant_speed_multiplier(\"GPTQ-Int8\"), 0.85);\n        assert_eq!(quant_quality_penalty(\"GPTQ-Int4\"), -3.0);\n        assert_eq!(quant_quality_penalty(\"GPTQ-Int8\"), 0.0);\n    }\n\n    #[test]\n    fn test_model_format_prequantized() {\n        assert!(ModelFormat::Awq.is_prequantized());\n        assert!(ModelFormat::Gptq.is_prequantized());\n        assert!(!ModelFormat::Gguf.is_prequantized());\n        assert!(!ModelFormat::Mlx.is_prequantized());\n        assert!(!ModelFormat::Safetensors.is_prequantized());\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // GGUF source catalog tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_gguf_source_deserialization() {\n        let json = r#\"{\"repo\": \"unsloth/Llama-3.1-8B-Instruct-GGUF\", \"provider\": \"unsloth\"}\"#;\n        let source: GgufSource = serde_json::from_str(json).unwrap();\n        assert_eq!(source.repo, \"unsloth/Llama-3.1-8B-Instruct-GGUF\");\n        assert_eq!(source.provider, \"unsloth\");\n    }\n\n    #[test]\n    fn test_gguf_sources_default_to_empty() {\n        let json = r#\"{\n      
      \"name\": \"test/model\",\n            \"provider\": \"Test\",\n            \"parameter_count\": \"7B\",\n            \"parameters_raw\": 7000000000,\n            \"min_ram_gb\": 4.0,\n            \"recommended_ram_gb\": 8.0,\n            \"quantization\": \"Q4_K_M\",\n            \"context_length\": 4096,\n            \"use_case\": \"General\"\n        }\"#;\n        let entry: HfModelEntry = serde_json::from_str(json).unwrap();\n        assert!(entry.gguf_sources.is_empty());\n    }\n\n    #[test]\n    fn test_catalog_popular_models_have_gguf_sources() {\n        let db = ModelDatabase::new();\n        // These popular models should have gguf_sources populated in the catalog\n        let expected_with_gguf = [\n            \"meta-llama/Llama-3.3-70B-Instruct\",\n            \"Qwen/Qwen2.5-7B-Instruct\",\n            \"Qwen/Qwen2.5-Coder-7B-Instruct\",\n            \"meta-llama/Meta-Llama-3-8B-Instruct\",\n            \"mistralai/Mistral-7B-Instruct-v0.3\",\n        ];\n        for name in &expected_with_gguf {\n            let model = db.get_all_models().iter().find(|m| m.name == *name);\n            assert!(model.is_some(), \"Model {} should exist in catalog\", name);\n            let model = model.unwrap();\n            assert!(\n                !model.gguf_sources.is_empty(),\n                \"Model {} should have gguf_sources but has none\",\n                name\n            );\n        }\n    }\n\n    #[test]\n    fn test_catalog_gguf_sources_have_valid_repos() {\n        let db = ModelDatabase::new();\n        for model in db.get_all_models() {\n            for source in &model.gguf_sources {\n                assert!(\n                    source.repo.contains('/'),\n                    \"GGUF source repo '{}' for model '{}' should be owner/repo format\",\n                    source.repo,\n                    model.name\n                );\n                assert!(\n                    !source.provider.is_empty(),\n                    \"GGUF source 
provider for model '{}' should not be empty\",\n                    model.name\n                );\n                assert!(\n                    source.repo.to_uppercase().contains(\"GGUF\"),\n                    \"GGUF source repo '{}' for model '{}' should contain 'GGUF'\",\n                    source.repo,\n                    model.name\n                );\n            }\n        }\n    }\n\n    #[test]\n    fn test_catalog_has_significant_gguf_coverage() {\n        let db = ModelDatabase::new();\n        let total = db.get_all_models().len();\n        let with_gguf = db\n            .get_all_models()\n            .iter()\n            .filter(|m| !m.gguf_sources.is_empty())\n            .count();\n        // We should have at least 25% coverage after enrichment\n        let coverage_pct = (with_gguf as f64 / total as f64) * 100.0;\n        assert!(\n            coverage_pct >= 25.0,\n            \"GGUF source coverage is only {:.1}% ({}/{}), expected at least 25%\",\n            coverage_pct,\n            with_gguf,\n            total\n        );\n    }\n}\n"
  },
  {
    "path": "llmfit-core/src/plan.rs",
    "content": "use crate::fit::{FitLevel, RunMode};\nuse crate::hardware::{GpuBackend, SystemSpecs};\nuse crate::models::{LlmModel, quant_speed_multiplier};\n\nconst SUPPORTED_QUANTS: &[&str] = &[\n    \"F32\",\n    \"F16\",\n    \"BF16\",\n    \"Q8_0\",\n    \"Q6_K\",\n    \"Q5_K_M\",\n    \"Q4_K_M\",\n    \"Q4_0\",\n    \"Q3_K_M\",\n    \"Q2_K\",\n    \"mlx-8bit\",\n    \"mlx-4bit\",\n    \"AWQ-4bit\",\n    \"AWQ-8bit\",\n    \"GPTQ-Int4\",\n    \"GPTQ-Int8\",\n];\n\n#[derive(Debug, Clone, serde::Serialize)]\npub struct PlanRequest {\n    pub context: u32,\n    pub quant: Option<String>,\n    pub target_tps: Option<f64>,\n}\n\n#[derive(Debug, Clone, serde::Serialize)]\npub struct HardwareEstimate {\n    pub vram_gb: Option<f64>,\n    pub ram_gb: f64,\n    pub cpu_cores: usize,\n}\n\n#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize)]\n#[serde(rename_all = \"snake_case\")]\npub enum PlanRunPath {\n    Gpu,\n    CpuOffload,\n    CpuOnly,\n}\n\nimpl PlanRunPath {\n    pub fn label(&self) -> &'static str {\n        match self {\n            PlanRunPath::Gpu => \"GPU\",\n            PlanRunPath::CpuOffload => \"CPU offload\",\n            PlanRunPath::CpuOnly => \"CPU-only\",\n        }\n    }\n\n    fn run_mode(self) -> RunMode {\n        match self {\n            PlanRunPath::Gpu => RunMode::Gpu,\n            PlanRunPath::CpuOffload => RunMode::CpuOffload,\n            PlanRunPath::CpuOnly => RunMode::CpuOnly,\n        }\n    }\n}\n\n#[derive(Debug, Clone, serde::Serialize)]\npub struct PathEstimate {\n    pub path: PlanRunPath,\n    pub feasible: bool,\n    pub minimum: Option<HardwareEstimate>,\n    pub recommended: Option<HardwareEstimate>,\n    pub estimated_tps: Option<f64>,\n    pub fit_level: Option<FitLevel>,\n    pub notes: Vec<String>,\n}\n\n#[derive(Debug, Clone, serde::Serialize)]\npub struct UpgradeDelta {\n    pub resource: String,\n    pub add_gb: Option<f64>,\n    pub add_cores: Option<usize>,\n    pub target_fit: Option<FitLevel>,\n    
pub path: PlanRunPath,\n    pub description: String,\n}\n\n#[derive(Debug, Clone, serde::Serialize)]\npub struct PlanCurrentStatus {\n    pub fit_level: FitLevel,\n    pub run_mode: RunMode,\n    pub estimated_tps: f64,\n}\n\n#[derive(Debug, Clone, serde::Serialize)]\npub struct PlanEstimate {\n    pub estimate_notice: String,\n    pub model_name: String,\n    pub provider: String,\n    pub context: u32,\n    pub quantization: String,\n    pub target_tps: Option<f64>,\n    pub minimum: HardwareEstimate,\n    pub recommended: HardwareEstimate,\n    pub run_paths: Vec<PathEstimate>,\n    pub current: PlanCurrentStatus,\n    pub upgrade_deltas: Vec<UpgradeDelta>,\n}\n\npub fn normalize_quant(quant: &str) -> Option<String> {\n    let trimmed = quant.trim();\n    if trimmed.is_empty() {\n        return None;\n    }\n\n    if trimmed.eq_ignore_ascii_case(\"mlx-4bit\") {\n        return Some(\"mlx-4bit\".to_string());\n    }\n    if trimmed.eq_ignore_ascii_case(\"mlx-8bit\") {\n        return Some(\"mlx-8bit\".to_string());\n    }\n\n    // AWQ quantization formats\n    if trimmed.eq_ignore_ascii_case(\"awq-4bit\") {\n        return Some(\"AWQ-4bit\".to_string());\n    }\n    if trimmed.eq_ignore_ascii_case(\"awq-8bit\") {\n        return Some(\"AWQ-8bit\".to_string());\n    }\n    // GPTQ quantization formats\n    if trimmed.eq_ignore_ascii_case(\"gptq-int4\") {\n        return Some(\"GPTQ-Int4\".to_string());\n    }\n    if trimmed.eq_ignore_ascii_case(\"gptq-int8\") {\n        return Some(\"GPTQ-Int8\".to_string());\n    }\n\n    let upper = trimmed.to_uppercase();\n    if SUPPORTED_QUANTS.contains(&upper.as_str()) {\n        Some(upper)\n    } else {\n        None\n    }\n}\n\nfn estimate_tps(\n    model: &LlmModel,\n    quant: &str,\n    backend: GpuBackend,\n    path: PlanRunPath,\n    cpu_cores: usize,\n) -> f64 {\n    estimate_tps_with_gpu(model, quant, backend, path, cpu_cores, None)\n}\n\n/// Bandwidth-aware tok/s estimation (mirrors fit.rs logic).\n/// When 
`gpu_name` is provided and recognized, uses memory-bandwidth-based\n/// estimation instead of fixed constants.\nfn estimate_tps_with_gpu(\n    model: &LlmModel,\n    quant: &str,\n    backend: GpuBackend,\n    path: PlanRunPath,\n    cpu_cores: usize,\n    gpu_name: Option<&str>,\n) -> f64 {\n    use crate::hardware::gpu_memory_bandwidth_gbps;\n    use crate::models::quant_bytes_per_param;\n\n    let params = model.params_b().max(0.1);\n\n    // Bandwidth-based estimation when GPU is recognized\n    if path != PlanRunPath::CpuOnly\n        && let Some(name) = gpu_name\n        && let Some(bw) = gpu_memory_bandwidth_gbps(name)\n    {\n        let model_gb = params * quant_bytes_per_param(quant);\n        let efficiency = 0.55;\n        let raw_tps = (bw / model_gb) * efficiency;\n\n        let mode_factor = match path {\n            PlanRunPath::Gpu => 1.0,\n            PlanRunPath::CpuOffload => 0.5,\n            PlanRunPath::CpuOnly => unreachable!(),\n        };\n\n        return (raw_tps * mode_factor).max(0.1);\n    }\n\n    // Fallback: fixed-constant approach\n    let k: f64 = match backend {\n        GpuBackend::Metal => 160.0,\n        GpuBackend::Cuda => 220.0,\n        GpuBackend::Rocm => 180.0,\n        GpuBackend::Vulkan => 150.0,\n        GpuBackend::Sycl => 100.0,\n        GpuBackend::CpuArm => 90.0,\n        GpuBackend::CpuX86 => 70.0,\n        GpuBackend::Ascend => 390.0,\n    };\n\n    let mut base = (k / params) * quant_speed_multiplier(quant);\n\n    if cpu_cores >= 8 {\n        base *= 1.1;\n    }\n\n    match path {\n        PlanRunPath::Gpu => {}\n        PlanRunPath::CpuOffload => base *= 0.5,\n        PlanRunPath::CpuOnly => {\n            let cpu_k = if cfg!(target_arch = \"aarch64\") {\n                90.0\n            } else {\n                70.0\n            };\n            base = (cpu_k / params) * quant_speed_multiplier(quant);\n            if cpu_cores >= 8 {\n                base *= 1.1;\n            }\n        }\n    }\n\n    
base.max(0.1)\n}\n\nfn fit_level_for(\n    path: PlanRunPath,\n    required_gb: f64,\n    available_gb: f64,\n    recommended_gb: f64,\n) -> FitLevel {\n    if required_gb > available_gb {\n        return FitLevel::TooTight;\n    }\n\n    match path {\n        PlanRunPath::Gpu => {\n            if recommended_gb <= available_gb {\n                FitLevel::Perfect\n            } else if available_gb >= required_gb * 1.2 {\n                FitLevel::Good\n            } else {\n                FitLevel::Marginal\n            }\n        }\n        PlanRunPath::CpuOffload => {\n            if available_gb >= required_gb * 1.2 {\n                FitLevel::Good\n            } else {\n                FitLevel::Marginal\n            }\n        }\n        PlanRunPath::CpuOnly => FitLevel::Marginal,\n    }\n}\n\nfn minimum_cores_for_target(\n    model: &LlmModel,\n    quant: &str,\n    backend: GpuBackend,\n    path: PlanRunPath,\n    target_tps: Option<f64>,\n) -> Option<usize> {\n    let Some(target) = target_tps else {\n        return Some(4);\n    };\n\n    for cores in 1..=64 {\n        let tps = estimate_tps(model, quant, backend, path, cores);\n        if tps >= target {\n            return Some(cores);\n        }\n    }\n\n    None\n}\n\nfn default_gpu_backend(system: &SystemSpecs) -> GpuBackend {\n    if system.has_gpu {\n        system.backend\n    } else {\n        GpuBackend::Cuda\n    }\n}\n\nfn evaluate_current(\n    model: &LlmModel,\n    quant: &str,\n    context: u32,\n    target_tps: Option<f64>,\n    system: &SystemSpecs,\n) -> PlanCurrentStatus {\n    let model_mem = model.estimate_memory_gb(quant, context);\n    let gpu_vram = system\n        .total_gpu_vram_gb\n        .or(system.gpu_vram_gb)\n        .unwrap_or(0.0);\n\n    let mut candidates: Vec<(FitLevel, PlanRunPath, f64)> = Vec::new();\n\n    if system.has_gpu && gpu_vram > 0.0 {\n        let gpu_fit = fit_level_for(\n            PlanRunPath::Gpu,\n            model_mem,\n            gpu_vram,\n   
         model.recommended_ram_gb,\n        );\n        let gpu_name = system.gpu_name.as_deref();\n        let gpu_tps = estimate_tps_with_gpu(\n            model,\n            quant,\n            system.backend,\n            PlanRunPath::Gpu,\n            system.total_cpu_cores,\n            gpu_name,\n        );\n        if target_tps.is_none_or(|t| gpu_tps >= t) {\n            candidates.push((gpu_fit, PlanRunPath::Gpu, gpu_tps));\n        }\n\n        if !system.unified_memory {\n            let offload_fit = fit_level_for(\n                PlanRunPath::CpuOffload,\n                model_mem,\n                system.available_ram_gb,\n                model.recommended_ram_gb,\n            );\n            let offload_tps = estimate_tps_with_gpu(\n                model,\n                quant,\n                system.backend,\n                PlanRunPath::CpuOffload,\n                system.total_cpu_cores,\n                gpu_name,\n            );\n            if target_tps.is_none_or(|t| offload_tps >= t) {\n                candidates.push((offload_fit, PlanRunPath::CpuOffload, offload_tps));\n            }\n        }\n    }\n\n    let cpu_fit = fit_level_for(\n        PlanRunPath::CpuOnly,\n        model_mem,\n        system.available_ram_gb,\n        model.recommended_ram_gb,\n    );\n    let cpu_tps = estimate_tps(\n        model,\n        quant,\n        system.backend,\n        PlanRunPath::CpuOnly,\n        system.total_cpu_cores,\n    );\n    if target_tps.is_none_or(|t| cpu_tps >= t) {\n        candidates.push((cpu_fit, PlanRunPath::CpuOnly, cpu_tps));\n    }\n\n    candidates.sort_by(|a, b| {\n        let rank = |fit: FitLevel| match fit {\n            FitLevel::Perfect => 4,\n            FitLevel::Good => 3,\n            FitLevel::Marginal => 2,\n            FitLevel::TooTight => 1,\n        };\n        rank(b.0).cmp(&rank(a.0)).then_with(|| {\n            let p = |path: PlanRunPath| match path {\n                PlanRunPath::Gpu => 3,\n             
   PlanRunPath::CpuOffload => 2,\n                PlanRunPath::CpuOnly => 1,\n            };\n            p(b.1).cmp(&p(a.1))\n        })\n    });\n\n    if let Some((fit_level, path, tps)) = candidates.first() {\n        PlanCurrentStatus {\n            fit_level: *fit_level,\n            run_mode: path.run_mode(),\n            estimated_tps: *tps,\n        }\n    } else {\n        PlanCurrentStatus {\n            fit_level: FitLevel::TooTight,\n            run_mode: RunMode::CpuOnly,\n            estimated_tps: 0.0,\n        }\n    }\n}\n\nfn build_path_estimate(\n    model: &LlmModel,\n    quant: &str,\n    context: u32,\n    target_tps: Option<f64>,\n    path: PlanRunPath,\n    system: &SystemSpecs,\n) -> PathEstimate {\n    let model_mem = model.estimate_memory_gb(quant, context);\n    let backend = default_gpu_backend(system);\n    let mut notes = vec![];\n\n    let min_cores = match minimum_cores_for_target(model, quant, backend, path, target_tps) {\n        Some(c) => c,\n        None => {\n            return PathEstimate {\n                path,\n                feasible: false,\n                minimum: None,\n                recommended: None,\n                estimated_tps: None,\n                fit_level: None,\n                notes: vec![\n                    \"Target TPS is not reachable under current speed heuristics\".to_string(),\n                ],\n            };\n        }\n    };\n\n    let recommended_cores = min_cores.max(8);\n\n    let gpu_name = system.gpu_name.as_deref();\n\n    match path {\n        PlanRunPath::Gpu => {\n            let min_vram = model_mem;\n            let rec_vram = model.recommended_ram_gb.max(model_mem * 1.2);\n            let min_ram = (model_mem * 0.2).max(8.0);\n            let rec_ram = (min_ram * 1.25).max(12.0);\n            let tps = estimate_tps_with_gpu(model, quant, backend, path, min_cores, gpu_name);\n\n            let fit = fit_level_for(path, min_vram, min_vram, model.recommended_ram_gb);\n          
  notes.push(\n                \"Estimated from quant/context memory and fit headroom thresholds\".to_string(),\n            );\n\n            PathEstimate {\n                path,\n                feasible: true,\n                minimum: Some(HardwareEstimate {\n                    vram_gb: Some(min_vram),\n                    ram_gb: min_ram,\n                    cpu_cores: min_cores,\n                }),\n                recommended: Some(HardwareEstimate {\n                    vram_gb: Some(rec_vram),\n                    ram_gb: rec_ram,\n                    cpu_cores: recommended_cores,\n                }),\n                estimated_tps: Some(tps),\n                fit_level: Some(fit),\n                notes,\n            }\n        }\n        PlanRunPath::CpuOffload => {\n            if system.unified_memory {\n                return PathEstimate {\n                    path,\n                    feasible: false,\n                    minimum: None,\n                    recommended: None,\n                    estimated_tps: None,\n                    fit_level: None,\n                    notes: vec![\"CPU offload is skipped on unified-memory systems\".to_string()],\n                };\n            }\n\n            let min_vram = 2.0;\n            let rec_vram = 4.0;\n            let min_ram = model_mem;\n            let rec_ram = model_mem * 1.2;\n            let fit = fit_level_for(path, min_ram, min_ram, model.recommended_ram_gb);\n            let tps = estimate_tps_with_gpu(model, quant, backend, path, min_cores, gpu_name);\n            notes.push(\"RAM is the primary memory pool for CPU offload\".to_string());\n\n            PathEstimate {\n                path,\n                feasible: true,\n                minimum: Some(HardwareEstimate {\n                    vram_gb: Some(min_vram),\n                    ram_gb: min_ram,\n                    cpu_cores: min_cores,\n                }),\n                recommended: Some(HardwareEstimate {\n           
         vram_gb: Some(rec_vram),\n                    ram_gb: rec_ram,\n                    cpu_cores: recommended_cores,\n                }),\n                estimated_tps: Some(tps),\n                fit_level: Some(fit),\n                notes,\n            }\n        }\n        PlanRunPath::CpuOnly => {\n            let min_ram = model_mem;\n            let rec_ram = model_mem * 1.2;\n            let fit = fit_level_for(path, min_ram, min_ram, model.recommended_ram_gb);\n            let tps = estimate_tps(model, quant, GpuBackend::CpuX86, path, min_cores);\n            notes.push(\n                \"CPU-only fit is always capped at Marginal in current heuristics\".to_string(),\n            );\n\n            PathEstimate {\n                path,\n                feasible: true,\n                minimum: Some(HardwareEstimate {\n                    vram_gb: None,\n                    ram_gb: min_ram,\n                    cpu_cores: min_cores,\n                }),\n                recommended: Some(HardwareEstimate {\n                    vram_gb: None,\n                    ram_gb: rec_ram,\n                    cpu_cores: recommended_cores,\n                }),\n                estimated_tps: Some(tps),\n                fit_level: Some(fit),\n                notes,\n            }\n        }\n    }\n}\n\npub fn estimate_model_plan(\n    model: &LlmModel,\n    request: &PlanRequest,\n    system: &SystemSpecs,\n) -> Result<PlanEstimate, String> {\n    if request.context == 0 {\n        return Err(\"--context must be greater than 0\".to_string());\n    }\n    if let Some(target) = request.target_tps\n        && target <= 0.0\n    {\n        return Err(\"--target-tps must be greater than 0\".to_string());\n    }\n\n    let quant = if let Some(ref q) = request.quant {\n        normalize_quant(q).ok_or_else(|| format!(\"Unsupported quantization '{}'.\", q))?\n    } else {\n        model.quantization.clone()\n    };\n\n    let context = request.context;\n    let 
run_paths = vec![\n        build_path_estimate(\n            model,\n            &quant,\n            context,\n            request.target_tps,\n            PlanRunPath::Gpu,\n            system,\n        ),\n        build_path_estimate(\n            model,\n            &quant,\n            context,\n            request.target_tps,\n            PlanRunPath::CpuOffload,\n            system,\n        ),\n        build_path_estimate(\n            model,\n            &quant,\n            context,\n            request.target_tps,\n            PlanRunPath::CpuOnly,\n            system,\n        ),\n    ];\n\n    let current = evaluate_current(model, &quant, context, request.target_tps, system);\n\n    let preferred = run_paths\n        .iter()\n        .find(|p| p.path == PlanRunPath::Gpu && p.feasible)\n        .or_else(|| {\n            run_paths\n                .iter()\n                .find(|p| p.path == PlanRunPath::CpuOffload && p.feasible)\n        })\n        .or_else(|| {\n            run_paths\n                .iter()\n                .find(|p| p.path == PlanRunPath::CpuOnly && p.feasible)\n        })\n        .ok_or_else(|| \"No feasible run path found for this configuration\".to_string())?;\n\n    let minimum = preferred\n        .minimum\n        .clone()\n        .ok_or_else(|| \"Missing minimum estimate\".to_string())?;\n    let recommended = preferred\n        .recommended\n        .clone()\n        .ok_or_else(|| \"Missing recommended estimate\".to_string())?;\n\n    let mut upgrade_deltas = Vec::new();\n\n    let current_vram = system\n        .total_gpu_vram_gb\n        .or(system.gpu_vram_gb)\n        .unwrap_or(0.0);\n    if let Some(gpu_path) = run_paths.iter().find(|p| p.path == PlanRunPath::Gpu)\n        && let Some(min_hw) = &gpu_path.minimum\n    {\n        let add_good = (min_hw.vram_gb.unwrap_or(0.0) - current_vram).max(0.0);\n        upgrade_deltas.push(UpgradeDelta {\n            resource: \"vram_gb\".to_string(),\n            add_gb: 
Some(add_good),\n            add_cores: None,\n            target_fit: Some(FitLevel::Good),\n            path: PlanRunPath::Gpu,\n            description: format!(\"+{add_good:.1} GB VRAM -> Good\"),\n        });\n    }\n    if let Some(gpu_path) = run_paths.iter().find(|p| p.path == PlanRunPath::Gpu)\n        && let Some(rec_hw) = &gpu_path.recommended\n    {\n        let add_perfect = (rec_hw.vram_gb.unwrap_or(0.0) - current_vram).max(0.0);\n        upgrade_deltas.push(UpgradeDelta {\n            resource: \"vram_gb\".to_string(),\n            add_gb: Some(add_perfect),\n            add_cores: None,\n            target_fit: Some(FitLevel::Perfect),\n            path: PlanRunPath::Gpu,\n            description: format!(\"+{add_perfect:.1} GB VRAM -> Perfect\"),\n        });\n    }\n\n    let current_ram = system.available_ram_gb;\n    if minimum.ram_gb > current_ram {\n        let add_ram = minimum.ram_gb - current_ram;\n        upgrade_deltas.push(UpgradeDelta {\n            resource: \"ram_gb\".to_string(),\n            add_gb: Some(add_ram),\n            add_cores: None,\n            target_fit: Some(FitLevel::Marginal),\n            path: preferred.path,\n            description: format!(\"+{add_ram:.1} GB RAM -> Runnable\"),\n        });\n    }\n\n    if minimum.cpu_cores > system.total_cpu_cores {\n        let add_cores = minimum.cpu_cores - system.total_cpu_cores;\n        upgrade_deltas.push(UpgradeDelta {\n            resource: \"cpu_cores\".to_string(),\n            add_gb: None,\n            add_cores: Some(add_cores),\n            target_fit: None,\n            path: preferred.path,\n            description: format!(\"+{add_cores} CPU cores -> Target TPS\"),\n        });\n    }\n\n    Ok(PlanEstimate {\n        estimate_notice: \"Estimate-based output using current llmfit fit/speed heuristics; not an exact benchmark.\"\n            .to_string(),\n        model_name: model.name.clone(),\n        provider: model.provider.clone(),\n        context,\n     
   quantization: quant,\n        target_tps: request.target_tps,\n        minimum,\n        recommended,\n        run_paths,\n        current,\n        upgrade_deltas,\n    })\n}\n\npub fn resolve_model_selector<'a>(\n    models: &'a [LlmModel],\n    selector: &str,\n) -> Result<&'a LlmModel, String> {\n    let needle = selector.trim().to_lowercase();\n    if needle.is_empty() {\n        return Err(\"Model selector cannot be empty\".to_string());\n    }\n\n    let exact: Vec<&LlmModel> = models\n        .iter()\n        .filter(|m| m.name.to_lowercase() == needle)\n        .collect();\n    if exact.len() == 1 {\n        return Ok(exact[0]);\n    }\n\n    let partial: Vec<&LlmModel> = models\n        .iter()\n        .filter(|m| m.name.to_lowercase().contains(&needle))\n        .collect();\n\n    match partial.len() {\n        0 => Err(format!(\"No model found matching '{}'.\", selector)),\n        1 => Ok(partial[0]),\n        _ => {\n            let suggestions = partial\n                .iter()\n                .take(10)\n                .map(|m| m.name.as_str())\n                .collect::<Vec<_>>()\n                .join(\", \");\n            Err(format!(\n                \"Model selector '{}' is ambiguous. 
Matches: {}\",\n                selector, suggestions\n            ))\n        }\n    }\n}\n\n#[cfg(test)]\nmod tests {\n    use super::*;\n\n    fn test_model() -> LlmModel {\n        LlmModel {\n            name: \"Qwen-Test-7B\".to_string(),\n            provider: \"Qwen\".to_string(),\n            parameter_count: \"7B\".to_string(),\n            parameters_raw: Some(7_000_000_000),\n            min_ram_gb: 6.0,\n            recommended_ram_gb: 12.0,\n            min_vram_gb: Some(6.0),\n            quantization: \"Q4_K_M\".to_string(),\n            context_length: 32768,\n            use_case: \"Coding\".to_string(),\n            is_moe: false,\n            num_experts: None,\n            active_experts: None,\n            active_parameters: None,\n            release_date: None,\n            gguf_sources: vec![],\n            capabilities: vec![],\n            format: crate::models::ModelFormat::default(),\n        }\n    }\n\n    fn test_specs() -> SystemSpecs {\n        SystemSpecs {\n            total_ram_gb: 32.0,\n            available_ram_gb: 24.0,\n            total_cpu_cores: 8,\n            cpu_name: \"Test CPU\".to_string(),\n            has_gpu: true,\n            gpu_vram_gb: Some(12.0),\n            total_gpu_vram_gb: Some(12.0),\n            gpu_name: Some(\"Test GPU\".to_string()),\n            gpu_count: 1,\n            unified_memory: false,\n            backend: GpuBackend::Cuda,\n            gpus: vec![],\n        }\n    }\n\n    #[test]\n    fn test_normalize_quant() {\n        assert_eq!(normalize_quant(\"q4_k_m\"), Some(\"Q4_K_M\".to_string()));\n        assert_eq!(normalize_quant(\"mlx-4bit\"), Some(\"mlx-4bit\".to_string()));\n        assert_eq!(normalize_quant(\"bad\"), None);\n    }\n\n    #[test]\n    fn test_normalize_quant_all_supported() {\n        for q in SUPPORTED_QUANTS {\n            if q.starts_with(\"mlx-\") || q.starts_with(\"AWQ-\") || q.starts_with(\"GPTQ-\") {\n                continue; // handled by case-insensitive 
paths\n            }\n            assert_eq!(\n                normalize_quant(&q.to_lowercase()),\n                Some(q.to_string()),\n                \"lowercase '{}' should normalize\",\n                q\n            );\n        }\n    }\n\n    #[test]\n    fn test_normalize_quant_whitespace_handling() {\n        assert_eq!(normalize_quant(\"  q4_k_m  \"), Some(\"Q4_K_M\".to_string()));\n        assert_eq!(normalize_quant(\"\"), None);\n        assert_eq!(normalize_quant(\"   \"), None);\n    }\n\n    #[test]\n    fn test_estimate_model_plan() {\n        let req = PlanRequest {\n            context: 8192,\n            quant: Some(\"Q4_K_M\".to_string()),\n            target_tps: Some(8.0),\n        };\n        let plan =\n            estimate_model_plan(&test_model(), &req, &test_specs()).expect(\"plan should build\");\n        assert_eq!(plan.quantization, \"Q4_K_M\");\n        assert!(!plan.run_paths.is_empty());\n        assert!(plan.minimum.ram_gb > 0.0);\n    }\n\n    #[test]\n    fn test_estimate_model_plan_zero_context_errors() {\n        let req = PlanRequest {\n            context: 0,\n            quant: None,\n            target_tps: None,\n        };\n        let result = estimate_model_plan(&test_model(), &req, &test_specs());\n        assert!(result.is_err());\n        assert!(\n            result\n                .unwrap_err()\n                .contains(\"--context must be greater than 0\")\n        );\n    }\n\n    #[test]\n    fn test_estimate_model_plan_negative_tps_errors() {\n        let req = PlanRequest {\n            context: 4096,\n            quant: None,\n            target_tps: Some(-5.0),\n        };\n        let result = estimate_model_plan(&test_model(), &req, &test_specs());\n        assert!(result.is_err());\n        assert!(\n            result\n                .unwrap_err()\n                .contains(\"--target-tps must be greater than 0\")\n        );\n    }\n\n    #[test]\n    fn 
test_estimate_model_plan_invalid_quant_errors() {\n        let req = PlanRequest {\n            context: 4096,\n            quant: Some(\"INVALID_QUANT\".to_string()),\n            target_tps: None,\n        };\n        let result = estimate_model_plan(&test_model(), &req, &test_specs());\n        assert!(result.is_err());\n        assert!(result.unwrap_err().contains(\"Unsupported quantization\"));\n    }\n\n    #[test]\n    fn test_estimate_model_plan_uses_model_quant_when_none() {\n        let req = PlanRequest {\n            context: 4096,\n            quant: None,\n            target_tps: None,\n        };\n        let plan = estimate_model_plan(&test_model(), &req, &test_specs()).unwrap();\n        assert_eq!(plan.quantization, \"Q4_K_M\"); // model default\n    }\n\n    #[test]\n    fn test_estimate_model_plan_has_three_run_paths() {\n        let req = PlanRequest {\n            context: 4096,\n            quant: None,\n            target_tps: None,\n        };\n        let plan = estimate_model_plan(&test_model(), &req, &test_specs()).unwrap();\n        assert_eq!(plan.run_paths.len(), 3);\n        assert_eq!(plan.run_paths[0].path, PlanRunPath::Gpu);\n        assert_eq!(plan.run_paths[1].path, PlanRunPath::CpuOffload);\n        assert_eq!(plan.run_paths[2].path, PlanRunPath::CpuOnly);\n    }\n\n    #[test]\n    fn test_estimate_model_plan_gpu_path_feasible() {\n        let req = PlanRequest {\n            context: 4096,\n            quant: Some(\"Q4_K_M\".to_string()),\n            target_tps: None,\n        };\n        let plan = estimate_model_plan(&test_model(), &req, &test_specs()).unwrap();\n        let gpu_path = &plan.run_paths[0];\n        assert!(gpu_path.feasible);\n        assert!(gpu_path.minimum.is_some());\n        assert!(gpu_path.recommended.is_some());\n        assert!(gpu_path.estimated_tps.unwrap() > 0.0);\n    }\n\n    // ── fit_level_for ────────────────────────────────────────────────\n\n    #[test]\n    fn 
test_fit_level_for_gpu_perfect() {\n        let fit = fit_level_for(PlanRunPath::Gpu, 8.0, 24.0, 12.0);\n        assert_eq!(fit, FitLevel::Perfect);\n    }\n\n    #[test]\n    fn test_fit_level_for_gpu_good() {\n        // required*1.2 = 9.6, available = 10.0 > 9.6, but recommended = 12.0 > 10.0\n        let fit = fit_level_for(PlanRunPath::Gpu, 8.0, 10.0, 12.0);\n        assert_eq!(fit, FitLevel::Good);\n    }\n\n    #[test]\n    fn test_fit_level_for_gpu_marginal() {\n        // available barely exceeds required, but less than required*1.2\n        let fit = fit_level_for(PlanRunPath::Gpu, 8.0, 8.5, 12.0);\n        assert_eq!(fit, FitLevel::Marginal);\n    }\n\n    #[test]\n    fn test_fit_level_for_too_tight() {\n        let fit = fit_level_for(PlanRunPath::Gpu, 24.0, 8.0, 32.0);\n        assert_eq!(fit, FitLevel::TooTight);\n    }\n\n    #[test]\n    fn test_fit_level_for_cpu_offload_caps_at_good() {\n        let fit = fit_level_for(PlanRunPath::CpuOffload, 8.0, 24.0, 12.0);\n        assert_eq!(fit, FitLevel::Good);\n    }\n\n    #[test]\n    fn test_fit_level_for_cpu_only_always_marginal() {\n        let fit = fit_level_for(PlanRunPath::CpuOnly, 4.0, 64.0, 8.0);\n        assert_eq!(fit, FitLevel::Marginal);\n    }\n\n    // ── PlanRunPath ──────────────────────────────────────────────────\n\n    #[test]\n    fn test_plan_run_path_labels() {\n        assert_eq!(PlanRunPath::Gpu.label(), \"GPU\");\n        assert_eq!(PlanRunPath::CpuOffload.label(), \"CPU offload\");\n        assert_eq!(PlanRunPath::CpuOnly.label(), \"CPU-only\");\n    }\n\n    #[test]\n    fn test_plan_run_path_to_run_mode() {\n        assert_eq!(PlanRunPath::Gpu.run_mode(), RunMode::Gpu);\n        assert_eq!(PlanRunPath::CpuOffload.run_mode(), RunMode::CpuOffload);\n        assert_eq!(PlanRunPath::CpuOnly.run_mode(), RunMode::CpuOnly);\n    }\n\n    // ── estimate_tps ─────────────────────────────────────────────────\n\n    #[test]\n    fn test_estimate_tps_gpu_faster_than_cpu() {\n        let 
model = test_model();\n        let gpu_tps = estimate_tps(&model, \"Q4_K_M\", GpuBackend::Cuda, PlanRunPath::Gpu, 8);\n        let cpu_tps = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            GpuBackend::CpuX86,\n            PlanRunPath::CpuOnly,\n            8,\n        );\n        assert!(gpu_tps > cpu_tps);\n    }\n\n    #[test]\n    fn test_estimate_tps_cpu_offload_slower_than_gpu() {\n        let model = test_model();\n        let gpu_tps = estimate_tps(&model, \"Q4_K_M\", GpuBackend::Cuda, PlanRunPath::Gpu, 8);\n        let offload_tps = estimate_tps(\n            &model,\n            \"Q4_K_M\",\n            GpuBackend::Cuda,\n            PlanRunPath::CpuOffload,\n            8,\n        );\n        assert!(gpu_tps > offload_tps);\n    }\n\n    #[test]\n    fn test_estimate_tps_more_cores_helps() {\n        let model = test_model();\n        let tps_4 = estimate_tps(&model, \"Q4_K_M\", GpuBackend::Cuda, PlanRunPath::Gpu, 4);\n        let tps_16 = estimate_tps(&model, \"Q4_K_M\", GpuBackend::Cuda, PlanRunPath::Gpu, 16);\n        assert!(tps_16 >= tps_4);\n    }\n\n    #[test]\n    fn test_estimate_tps_with_known_gpu_uses_bandwidth() {\n        let model = test_model();\n        let bw_tps = estimate_tps_with_gpu(\n            &model,\n            \"Q4_K_M\",\n            GpuBackend::Cuda,\n            PlanRunPath::Gpu,\n            8,\n            Some(\"NVIDIA RTX 4090\"),\n        );\n        let fallback_tps = estimate_tps_with_gpu(\n            &model,\n            \"Q4_K_M\",\n            GpuBackend::Cuda,\n            PlanRunPath::Gpu,\n            8,\n            None,\n        );\n        // Known GPU should give a different (bandwidth-based) estimate\n        assert!((bw_tps - fallback_tps).abs() > 0.01);\n    }\n\n    // ── minimum_cores_for_target ─────────────────────────────────────\n\n    #[test]\n    fn test_minimum_cores_no_target_returns_default() {\n        let model = test_model();\n        let cores =\n            
minimum_cores_for_target(&model, \"Q4_K_M\", GpuBackend::Cuda, PlanRunPath::Gpu, None);\n        assert_eq!(cores, Some(4));\n    }\n\n    #[test]\n    fn test_minimum_cores_with_reachable_target() {\n        let model = test_model();\n        let cores = minimum_cores_for_target(\n            &model,\n            \"Q4_K_M\",\n            GpuBackend::Cuda,\n            PlanRunPath::Gpu,\n            Some(5.0),\n        );\n        assert!(cores.is_some());\n        assert!(cores.unwrap() >= 1);\n    }\n\n    #[test]\n    fn test_minimum_cores_unreachable_target_returns_none() {\n        let model = test_model();\n        let cores = minimum_cores_for_target(\n            &model,\n            \"Q4_K_M\",\n            GpuBackend::CpuX86,\n            PlanRunPath::CpuOnly,\n            Some(999999.0),\n        );\n        assert!(cores.is_none());\n    }\n\n    // ── default_gpu_backend ──────────────────────────────────────────\n\n    #[test]\n    fn test_default_gpu_backend_uses_system_when_gpu() {\n        let specs = test_specs();\n        assert_eq!(default_gpu_backend(&specs), GpuBackend::Cuda);\n    }\n\n    #[test]\n    fn test_default_gpu_backend_falls_back_to_cuda() {\n        let mut specs = test_specs();\n        specs.has_gpu = false;\n        assert_eq!(default_gpu_backend(&specs), GpuBackend::Cuda);\n    }\n\n    // ── evaluate_current ─────────────────────────────────────────────\n\n    #[test]\n    fn test_evaluate_current_with_gpu() {\n        let model = test_model();\n        let specs = test_specs();\n        let status = evaluate_current(&model, \"Q4_K_M\", 4096, None, &specs);\n        assert!(status.estimated_tps > 0.0);\n        // With 12GB VRAM and 7B model, GPU should be preferred\n        assert_eq!(status.run_mode, RunMode::Gpu);\n    }\n\n    #[test]\n    fn test_evaluate_current_no_gpu_uses_cpu() {\n        let model = test_model();\n        let mut specs = test_specs();\n        specs.has_gpu = false;\n        specs.gpu_vram_gb = 
None;\n        specs.total_gpu_vram_gb = None;\n        let status = evaluate_current(&model, \"Q4_K_M\", 4096, None, &specs);\n        assert_eq!(status.run_mode, RunMode::CpuOnly);\n        assert!(status.estimated_tps > 0.0);\n    }\n\n    #[test]\n    fn test_evaluate_current_too_tight_when_no_memory() {\n        let model = test_model();\n        let mut specs = test_specs();\n        specs.has_gpu = false;\n        specs.gpu_vram_gb = None;\n        specs.total_gpu_vram_gb = None;\n        specs.available_ram_gb = 0.5; // too small for the model\n        let status = evaluate_current(&model, \"Q4_K_M\", 4096, Some(999999.0), &specs);\n        assert_eq!(status.fit_level, FitLevel::TooTight);\n    }\n\n    // ── build_path_estimate ──────────────────────────────────────────\n\n    #[test]\n    fn test_build_path_estimate_gpu() {\n        let model = test_model();\n        let specs = test_specs();\n        let estimate = build_path_estimate(&model, \"Q4_K_M\", 4096, None, PlanRunPath::Gpu, &specs);\n        assert!(estimate.feasible);\n        let min = estimate.minimum.unwrap();\n        assert!(min.vram_gb.unwrap() > 0.0);\n        assert!(min.ram_gb > 0.0);\n    }\n\n    #[test]\n    fn test_build_path_estimate_cpu_offload_on_unified_is_infeasible() {\n        let model = test_model();\n        let mut specs = test_specs();\n        specs.unified_memory = true;\n        let estimate = build_path_estimate(\n            &model,\n            \"Q4_K_M\",\n            4096,\n            None,\n            PlanRunPath::CpuOffload,\n            &specs,\n        );\n        assert!(!estimate.feasible);\n        assert!(estimate.notes.iter().any(|n| n.contains(\"unified-memory\")));\n    }\n\n    #[test]\n    fn test_build_path_estimate_cpu_only_no_vram() {\n        let model = test_model();\n        let specs = test_specs();\n        let estimate =\n            build_path_estimate(&model, \"Q4_K_M\", 4096, None, PlanRunPath::CpuOnly, &specs);\n        
assert!(estimate.feasible);\n        assert!(estimate.minimum.as_ref().unwrap().vram_gb.is_none());\n    }\n\n    // ── resolve_model_selector ───────────────────────────────────────\n\n    #[test]\n    fn test_resolve_model_selector() {\n        let models = vec![test_model()];\n        let found = resolve_model_selector(&models, \"qwen-test-7b\").expect(\"exact match\");\n        assert_eq!(found.name, \"Qwen-Test-7B\");\n    }\n\n    #[test]\n    fn test_resolve_model_selector_empty_errors() {\n        let models = vec![test_model()];\n        let result = resolve_model_selector(&models, \"\");\n        assert!(result.is_err());\n        assert!(result.unwrap_err().contains(\"cannot be empty\"));\n    }\n\n    #[test]\n    fn test_resolve_model_selector_not_found() {\n        let models = vec![test_model()];\n        let result = resolve_model_selector(&models, \"nonexistent-model\");\n        assert!(result.is_err());\n        assert!(result.unwrap_err().contains(\"No model found\"));\n    }\n\n    #[test]\n    fn test_resolve_model_selector_ambiguous() {\n        let mut m1 = test_model();\n        m1.name = \"Qwen-Test-7B\".to_string();\n        let mut m2 = test_model();\n        m2.name = \"Qwen-Test-14B\".to_string();\n        let models = vec![m1, m2];\n        let result = resolve_model_selector(&models, \"qwen-test\");\n        assert!(result.is_err());\n        assert!(result.unwrap_err().contains(\"ambiguous\"));\n    }\n\n    #[test]\n    fn test_resolve_model_selector_partial_match() {\n        let models = vec![test_model()];\n        let found = resolve_model_selector(&models, \"test-7b\").expect(\"partial match\");\n        assert_eq!(found.name, \"Qwen-Test-7B\");\n    }\n\n    // ── upgrade_deltas ───────────────────────────────────────────────\n\n    #[test]\n    fn test_plan_has_upgrade_deltas() {\n        let model = test_model();\n        let mut specs = test_specs();\n        specs.gpu_vram_gb = Some(4.0); // small VRAM triggers upgrade 
suggestion\n        specs.total_gpu_vram_gb = Some(4.0);\n        let req = PlanRequest {\n            context: 4096,\n            quant: Some(\"Q4_K_M\".to_string()),\n            target_tps: None,\n        };\n        let plan = estimate_model_plan(&model, &req, &specs).unwrap();\n        assert!(!plan.upgrade_deltas.is_empty());\n    }\n\n    #[test]\n    fn test_normalize_awq_gptq_quants() {\n        assert_eq!(normalize_quant(\"awq-4bit\"), Some(\"AWQ-4bit\".to_string()));\n        assert_eq!(normalize_quant(\"AWQ-4BIT\"), Some(\"AWQ-4bit\".to_string()));\n        assert_eq!(normalize_quant(\"awq-8bit\"), Some(\"AWQ-8bit\".to_string()));\n        assert_eq!(normalize_quant(\"gptq-int4\"), Some(\"GPTQ-Int4\".to_string()));\n        assert_eq!(normalize_quant(\"GPTQ-INT8\"), Some(\"GPTQ-Int8\".to_string()));\n    }\n}\n"
  },
  {
    "path": "llmfit-core/src/providers.rs",
    "content": "//! Runtime model providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).\n//!\n//! Each provider can list locally installed models and pull new ones.\n//! The trait is designed to be extended for vLLM, etc.\n\nuse std::collections::HashSet;\nuse std::path::PathBuf;\n\n// ---------------------------------------------------------------------------\n// Provider trait\n// ---------------------------------------------------------------------------\n\n/// A runtime provider that can serve LLM models locally.\npub trait ModelProvider {\n    /// Human-readable name shown in the UI.\n    fn name(&self) -> &str;\n\n    /// Whether the provider service is reachable right now.\n    fn is_available(&self) -> bool;\n\n    /// Return the set of model name stems that are currently installed.\n    /// Names are normalised lowercase, e.g. \"llama3.1:8b\".\n    fn installed_models(&self) -> HashSet<String>;\n\n    /// Start pulling a model. Returns immediately; progress is polled\n    /// via `pull_progress()`.\n    fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String>;\n}\n\n/// Handle returned by `start_pull`. 
The TUI polls this in a background\n/// thread and reads status/progress.\npub struct PullHandle {\n    pub model_tag: String,\n    pub receiver: std::sync::mpsc::Receiver<PullEvent>,\n}\n\n#[derive(Debug, Clone)]\npub enum PullEvent {\n    Progress {\n        status: String,\n        percent: Option<f64>,\n    },\n    Done,\n    Error(String),\n}\n\n// ---------------------------------------------------------------------------\n// Ollama provider\n// ---------------------------------------------------------------------------\n\npub struct OllamaProvider {\n    base_url: String,\n}\n\nfn normalize_ollama_host(raw: &str) -> Option<String> {\n    let host = raw.trim();\n    if host.is_empty() {\n        return None;\n    }\n\n    if host.starts_with(\"http://\") || host.starts_with(\"https://\") {\n        return Some(host.to_string());\n    }\n\n    if host.contains(\"://\") {\n        // Unsupported scheme (e.g. ftp://)\n        return None;\n    }\n\n    Some(format!(\"http://{host}\"))\n}\n\nimpl Default for OllamaProvider {\n    fn default() -> Self {\n        let base_url = std::env::var(\"OLLAMA_HOST\")\n            .ok()\n            .and_then(|raw| {\n                let normalized = normalize_ollama_host(&raw);\n                if normalized.is_none() {\n                    eprintln!(\n                        \"Warning: could not parse OLLAMA_HOST='{}'. 
Expected host:port or http(s)://host:port\",\n                        raw\n                    );\n                }\n                normalized\n            })\n            .unwrap_or_else(|| \"http://localhost:11434\".to_string());\n        Self { base_url }\n    }\n}\n\nimpl OllamaProvider {\n    pub fn new() -> Self {\n        Self::default()\n    }\n\n    /// Build the full API URL for a given endpoint path.\n    fn api_url(&self, path: &str) -> String {\n        format!(\"{}/api/{}\", self.base_url.trim_end_matches('/'), path)\n    }\n\n    /// Single-pass startup probe to avoid duplicate `/api/tags` calls.\n    /// Returns `(available, installed_models, model_count)`.\n    pub fn detect_with_installed(&self) -> (bool, HashSet<String>, usize) {\n        let mut set = HashSet::new();\n        let Ok(resp) = ureq::get(&self.api_url(\"tags\"))\n            .config()\n            .timeout_global(Some(std::time::Duration::from_millis(800)))\n            .build()\n            .call()\n        else {\n            return (false, set, 0);\n        };\n\n        let Ok(tags): Result<TagsResponse, _> = resp.into_body().read_json() else {\n            return (true, set, 0);\n        };\n        let count = tags.models.len();\n        for m in tags.models {\n            let lower = m.name.to_lowercase();\n            set.insert(lower.clone());\n            if let Some(family) = lower.split(':').next() {\n                set.insert(family.to_string());\n            }\n        }\n        (true, set, count)\n    }\n\n    /// Like `installed_models`, but also returns the true model count.\n    /// The HashSet may have fewer entries than 2*count due to family-name deduplication,\n    /// so `len() / 2` is unreliable for counting models.\n    pub fn installed_models_counted(&self) -> (HashSet<String>, usize) {\n        let mut set = HashSet::new();\n        let Ok(resp) = ureq::get(&self.api_url(\"tags\"))\n            .config()\n            
.timeout_global(Some(std::time::Duration::from_secs(5)))\n            .build()\n            .call()\n        else {\n            return (set, 0);\n        };\n        let Ok(tags): Result<TagsResponse, _> = resp.into_body().read_json() else {\n            return (set, 0);\n        };\n        let count = tags.models.len();\n        for m in tags.models {\n            let lower = m.name.to_lowercase();\n            set.insert(lower.clone());\n            if let Some(family) = lower.split(':').next() {\n                set.insert(family.to_string());\n            }\n        }\n        (set, count)\n    }\n\n    /// Best-effort check that a tag exists in Ollama's remote registry.\n    /// Uses the local Ollama daemon's `/api/show` resolution path.\n    pub fn has_remote_tag(&self, model_tag: &str) -> bool {\n        let body = serde_json::json!({ \"model\": model_tag });\n        ureq::post(&self.api_url(\"show\"))\n            .config()\n            .timeout_global(Some(std::time::Duration::from_millis(1200)))\n            .build()\n            .send_json(&body)\n            .is_ok()\n    }\n}\n\n// -- JSON response types for Ollama API --\n\n#[derive(serde::Deserialize)]\nstruct TagsResponse {\n    models: Vec<OllamaModel>,\n}\n\n#[derive(serde::Deserialize)]\nstruct OllamaModel {\n    /// e.g. 
\"llama3.1:8b-instruct-q4_K_M\"\n    name: String,\n}\n\n#[derive(serde::Deserialize)]\nstruct PullStreamLine {\n    #[serde(default)]\n    status: String,\n    #[serde(default)]\n    total: Option<u64>,\n    #[serde(default)]\n    completed: Option<u64>,\n    #[serde(default)]\n    error: Option<String>,\n}\n\nimpl ModelProvider for OllamaProvider {\n    fn name(&self) -> &str {\n        \"Ollama\"\n    }\n\n    fn is_available(&self) -> bool {\n        ureq::get(&self.api_url(\"tags\"))\n            .config()\n            .timeout_global(Some(std::time::Duration::from_secs(2)))\n            .build()\n            .call()\n            .is_ok()\n    }\n\n    fn installed_models(&self) -> HashSet<String> {\n        let (set, _) = self.installed_models_counted();\n        set\n    }\n\n    fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {\n        let url = self.api_url(\"pull\");\n        let tag = model_tag.to_string();\n        let (tx, rx) = std::sync::mpsc::channel();\n\n        let body = serde_json::json!({\n            \"model\": tag,\n            \"stream\": true,\n        });\n\n        std::thread::spawn(move || {\n            let resp = ureq::post(&url)\n                .config()\n                .timeout_global(Some(std::time::Duration::from_secs(3600)))\n                .build()\n                .send_json(&body);\n\n            match resp {\n                Ok(resp) => {\n                    let reader = std::io::BufReader::new(resp.into_body().into_reader());\n                    use std::io::BufRead;\n                    for line in reader.lines() {\n                        let Ok(line) = line else { break };\n                        if line.is_empty() {\n                            continue;\n                        }\n                        if let Ok(parsed) = serde_json::from_str::<PullStreamLine>(&line) {\n                            // Check for error responses from Ollama\n                            if let Some(ref err) = 
parsed.error {\n                                let _ = tx.send(PullEvent::Error(err.clone()));\n                                return;\n                            }\n                            let percent = match (parsed.completed, parsed.total) {\n                                (Some(c), Some(t)) if t > 0 => Some(c as f64 / t as f64 * 100.0),\n                                _ => None,\n                            };\n                            let _ = tx.send(PullEvent::Progress {\n                                status: parsed.status.clone(),\n                                percent,\n                            });\n                            if parsed.status == \"success\" {\n                                let _ = tx.send(PullEvent::Done);\n                                return;\n                            }\n                        }\n                    }\n                    // Stream ended without \"success\" — treat as error\n                    let _ = tx.send(PullEvent::Error(\n                        \"Pull ended without success (model may not exist in Ollama registry)\"\n                            .to_string(),\n                    ));\n                }\n                Err(e) => {\n                    let _ = tx.send(PullEvent::Error(format!(\"{e}\")));\n                }\n            }\n        });\n\n        Ok(PullHandle {\n            model_tag: model_tag.to_string(),\n            receiver: rx,\n        })\n    }\n}\n\n// ---------------------------------------------------------------------------\n// MLX provider (Apple MLX framework via HuggingFace cache)\n// ---------------------------------------------------------------------------\n\npub struct MlxProvider {\n    server_url: String,\n}\n\nimpl Default for MlxProvider {\n    fn default() -> Self {\n        let server_url = std::env::var(\"MLX_LM_HOST\")\n            .ok()\n            .and_then(|url| {\n                if url.starts_with(\"http://\") || 
url.starts_with(\"https://\") {\n                    Some(url)\n                } else {\n                    eprintln!(\n                        \"Warning: MLX_LM_HOST must start with http:// or https://, ignoring: {}\",\n                        url\n                    );\n                    None\n                }\n            })\n            .unwrap_or_else(|| \"http://localhost:8080\".to_string());\n        Self { server_url }\n    }\n}\n\nimpl MlxProvider {\n    pub fn new() -> Self {\n        Self::default()\n    }\n\n    /// Single-pass startup probe for MLX.\n    /// On non-macOS, skips network checks and reports `available=false`.\n    pub fn detect_with_installed(&self) -> (bool, HashSet<String>) {\n        let mut set = scan_hf_cache_for_mlx();\n        if !cfg!(target_os = \"macos\") {\n            return (false, set);\n        }\n\n        let url = format!(\"{}/v1/models\", self.server_url.trim_end_matches('/'));\n        if let Ok(resp) = ureq::get(&url)\n            .config()\n            .timeout_global(Some(std::time::Duration::from_millis(800)))\n            .build()\n            .call()\n        {\n            if let Ok(json) = resp.into_body().read_json::<serde_json::Value>()\n                && let Some(data) = json.get(\"data\").and_then(|d| d.as_array())\n            {\n                for model in data {\n                    if let Some(id) = model.get(\"id\").and_then(|i| i.as_str()) {\n                        set.insert(id.to_lowercase());\n                    }\n                }\n            }\n            return (true, set);\n        }\n\n        (check_mlx_python(), set)\n    }\n}\n\n/// Cache whether mlx_lm Python package is importable.\nstatic MLX_PYTHON_AVAILABLE: std::sync::OnceLock<bool> = std::sync::OnceLock::new();\n\nfn check_mlx_python() -> bool {\n    *MLX_PYTHON_AVAILABLE.get_or_init(|| {\n        std::process::Command::new(\"python3\")\n            .args([\"-c\", \"import mlx_lm\"])\n            
.stdout(std::process::Stdio::null())\n            .stderr(std::process::Stdio::null())\n            .status()\n            .map(|s| s.success())\n            .unwrap_or(false)\n    })\n}\n\nfn is_likely_mlx_repo(owner: &str, repo: &str) -> bool {\n    let owner_lower = owner.to_lowercase();\n    let repo_lower = repo.to_lowercase();\n    owner_lower == \"mlx-community\"\n        || repo_lower.contains(\"-mlx-\")\n        || repo_lower.ends_with(\"-mlx\")\n        || repo_lower.contains(\"mlx-\")\n        || repo_lower.ends_with(\"mlx\")\n}\n\n/// Scan ~/.cache/huggingface/hub/ for MLX model directories.\nfn scan_hf_cache_for_mlx() -> HashSet<String> {\n    let mut set = HashSet::new();\n    let cache_dir = dirs_hf_cache();\n    let Ok(entries) = std::fs::read_dir(&cache_dir) else {\n        return set;\n    };\n    for entry in entries.flatten() {\n        let name = entry.file_name();\n        let name_str = name.to_string_lossy();\n        let Some(rest) = name_str.strip_prefix(\"models--\") else {\n            continue;\n        };\n        let mut parts = rest.splitn(2, \"--\");\n        let Some(owner) = parts.next() else {\n            continue;\n        };\n        let Some(repo) = parts.next() else {\n            continue;\n        };\n\n        if !is_likely_mlx_repo(owner, repo) {\n            continue;\n        }\n\n        let owner_lower = owner.to_lowercase();\n        let repo_lower = repo.to_lowercase();\n        set.insert(format!(\"{}/{}\", owner_lower, repo_lower));\n        set.insert(repo_lower);\n    }\n    set\n}\n\nfn dirs_hf_cache() -> std::path::PathBuf {\n    if let Ok(cache) = std::env::var(\"HF_HOME\") {\n        std::path::PathBuf::from(cache).join(\"hub\")\n    } else if let Ok(home) = std::env::var(\"HOME\") {\n        std::path::PathBuf::from(home)\n            .join(\".cache\")\n            .join(\"huggingface\")\n            .join(\"hub\")\n    } else {\n        std::path::PathBuf::from(\"/tmp/.cache/huggingface/hub\")\n    
}\n}\n\nimpl ModelProvider for MlxProvider {\n    fn name(&self) -> &str {\n        \"MLX\"\n    }\n\n    fn is_available(&self) -> bool {\n        if !cfg!(target_os = \"macos\") {\n            return false;\n        }\n        // Try the MLX server first\n        let url = format!(\"{}/v1/models\", self.server_url.trim_end_matches('/'));\n        if ureq::get(&url)\n            .config()\n            .timeout_global(Some(std::time::Duration::from_secs(2)))\n            .build()\n            .call()\n            .is_ok()\n        {\n            return true;\n        }\n        // Fall back to checking if mlx_lm is installed\n        check_mlx_python()\n    }\n\n    fn installed_models(&self) -> HashSet<String> {\n        let mut set = scan_hf_cache_for_mlx();\n        if !cfg!(target_os = \"macos\") {\n            return set;\n        }\n        // Also try querying the MLX server if running\n        let url = format!(\"{}/v1/models\", self.server_url.trim_end_matches('/'));\n        if let Ok(resp) = ureq::get(&url)\n            .config()\n            .timeout_global(Some(std::time::Duration::from_secs(2)))\n            .build()\n            .call()\n            && let Ok(json) = resp.into_body().read_json::<serde_json::Value>()\n            && let Some(data) = json.get(\"data\").and_then(|d| d.as_array())\n        {\n            for model in data {\n                if let Some(id) = model.get(\"id\").and_then(|i| i.as_str()) {\n                    set.insert(id.to_lowercase());\n                }\n            }\n        }\n        set\n    }\n\n    fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {\n        let repo_id = if model_tag.contains('/') {\n            model_tag.to_string()\n        } else {\n            format!(\"mlx-community/{}\", model_tag)\n        };\n        let repo_for_thread = repo_id.clone();\n        let (tx, rx) = std::sync::mpsc::channel();\n\n        // Resolve the hf binary path before spawning the thread so we can\n  
      // give a clear \"not found\" error instead of a confusing OS error.\n        let hf_bin = find_binary(\"hf\").ok_or_else(|| {\n            \"hf not found in PATH. Install it with: uv tool install 'huggingface_hub[cli]'\"\n                .to_string()\n        })?;\n\n        std::thread::spawn(move || {\n            let _ = tx.send(PullEvent::Progress {\n                status: format!(\"Downloading {}...\", repo_for_thread),\n                percent: None,\n            });\n\n            // Download from Hugging Face using their CLI tool\n            let result = std::process::Command::new(&hf_bin)\n                .args([\"download\", &repo_for_thread])\n                .stdout(std::process::Stdio::piped())\n                .stderr(std::process::Stdio::piped())\n                .output();\n\n            match result {\n                Ok(output) if output.status.success() => {\n                    let _ = tx.send(PullEvent::Done);\n                }\n                Ok(output) => {\n                    let stderr = String::from_utf8_lossy(&output.stderr);\n                    let _ = tx.send(PullEvent::Error(format!(\n                        \"hf download failed (exit {}): {}\",\n                        output.status.code().unwrap_or(-1),\n                        stderr.trim()\n                    )));\n                }\n                Err(e) => {\n                    let _ = tx.send(PullEvent::Error(format!(\"failed to run hf: {e}\")));\n                }\n            }\n        });\n\n        Ok(PullHandle {\n            model_tag: repo_id,\n            receiver: rx,\n        })\n    }\n}\n\n// ---------------------------------------------------------------------------\n// llama.cpp provider (direct GGUF download from HuggingFace)\n// ---------------------------------------------------------------------------\n\n/// A provider that downloads GGUF model files directly from HuggingFace\n/// and uses llama.cpp binaries (`llama-cli`, `llama-server`) to run 
them.\n///\n/// Unlike Ollama, this doesn't require a running daemon — it downloads\n/// GGUF files to a local cache directory and invokes llama.cpp directly.\npub struct LlamaCppProvider {\n    /// Directory where GGUF models are stored.\n    models_dir: PathBuf,\n    /// Path to llama-cli binary, if found.\n    llama_cli: Option<String>,\n    /// Path to llama-server binary, if found.\n    llama_server: Option<String>,\n}\n\nimpl Default for LlamaCppProvider {\n    fn default() -> Self {\n        let models_dir = llamacpp_models_dir();\n        let llama_cli = find_binary(\"llama-cli\");\n        let llama_server = find_binary(\"llama-server\");\n        Self {\n            models_dir,\n            llama_cli,\n            llama_server,\n        }\n    }\n}\n\nimpl LlamaCppProvider {\n    pub fn new() -> Self {\n        Self::default()\n    }\n\n    /// Like `installed_models`, but also returns the true GGUF file count.\n    /// The HashSet may have fewer entries than 2*count due to deduplication\n    /// when stripping quantization suffixes, so `len() / 2` is unreliable.\n    pub fn installed_models_counted(&self) -> (HashSet<String>, usize) {\n        let mut set = HashSet::new();\n        let mut count = 0usize;\n        for path in self.list_gguf_files() {\n            if let Some(stem) = path.file_stem().and_then(|s| s.to_str()) {\n                count += 1;\n                let lower = stem.to_lowercase();\n                set.insert(lower.clone());\n                if let Some(base) = strip_gguf_quant_suffix(&lower) {\n                    set.insert(base);\n                }\n            }\n        }\n        (set, count)\n    }\n\n    /// Return the directory where GGUF models are cached.\n    pub fn models_dir(&self) -> &std::path::Path {\n        &self.models_dir\n    }\n\n    /// Path to `llama-cli` if detected.\n    pub fn llama_cli_path(&self) -> Option<&str> {\n        self.llama_cli.as_deref()\n    }\n\n    /// Path to `llama-server` if detected.\n 
   pub fn llama_server_path(&self) -> Option<&str> {\n        self.llama_server.as_deref()\n    }\n\n    /// List all `.gguf` files in the cache directory.\n    pub fn list_gguf_files(&self) -> Vec<PathBuf> {\n        let mut files = Vec::new();\n        if let Ok(entries) = std::fs::read_dir(&self.models_dir) {\n            for entry in entries.flatten() {\n                let path = entry.path();\n                if path.extension().and_then(|e| e.to_str()) == Some(\"gguf\") {\n                    files.push(path);\n                }\n            }\n        }\n        files\n    }\n\n    /// Search HuggingFace for GGUF repositories matching a query.\n    /// Returns a list of (repo_id, description) tuples.\n    pub fn search_hf_gguf(query: &str) -> Vec<(String, String)> {\n        let url = format!(\n            \"https://huggingface.co/api/models?library=gguf&search={}&sort=trending&limit=20\",\n            urlencoding::encode(query)\n        );\n        let Ok(resp) = ureq::get(&url)\n            .config()\n            .timeout_global(Some(std::time::Duration::from_secs(15)))\n            .build()\n            .call()\n        else {\n            return Vec::new();\n        };\n        let Ok(models) = resp.into_body().read_json::<Vec<serde_json::Value>>() else {\n            return Vec::new();\n        };\n        models\n            .into_iter()\n            .filter_map(|m| {\n                let id = m.get(\"id\")?.as_str()?.to_string();\n                let desc = m\n                    .get(\"pipeline_tag\")\n                    .and_then(|v| v.as_str())\n                    .unwrap_or(\"model\")\n                    .to_string();\n                Some((id, desc))\n            })\n            .collect()\n    }\n\n    /// List GGUF files available in a HuggingFace repository.\n    /// Returns a list of (filename, size_bytes) tuples.\n    pub fn list_repo_gguf_files(repo_id: &str) -> Vec<(String, u64)> {\n        let url = 
format!(\"https://huggingface.co/api/models/{}/tree/main\", repo_id);\n        let Ok(resp) = ureq::get(&url)\n            .config()\n            .timeout_global(Some(std::time::Duration::from_secs(15)))\n            .build()\n            .call()\n        else {\n            return Vec::new();\n        };\n        let Ok(entries) = resp.into_body().read_json::<Vec<serde_json::Value>>() else {\n            return Vec::new();\n        };\n        parse_repo_gguf_entries(entries)\n    }\n\n    /// Select the best GGUF file from a repo that fits within a memory budget.\n    /// Prefers higher quality quantizations (Q8 > Q6 > Q5 > Q4 > Q3 > Q2).\n    /// `budget_gb` is the available memory in gigabytes (1 GB = 1024^3 bytes here).\n    pub fn select_best_gguf(files: &[(String, u64)], budget_gb: f64) -> Option<(String, u64)> {\n        // Quant preference order (best quality first)\n        let quant_order = [\n            \"Q8_0\", \"q8_0\", \"Q6_K\", \"q6_k\", \"Q6_K_L\", \"q6_k_l\", \"Q5_K_M\", \"q5_k_m\", \"Q5_K_S\",\n            \"q5_k_s\", \"Q4_K_M\", \"q4_k_m\", \"Q4_K_S\", \"q4_k_s\", \"Q4_0\", \"q4_0\", \"Q3_K_M\", \"q3_k_m\",\n            \"Q3_K_S\", \"q3_k_s\", \"Q2_K\", \"q2_k\", \"IQ4_XS\", \"iq4_xs\", \"IQ3_M\", \"iq3_m\", \"IQ2_M\",\n            \"iq2_m\", \"IQ1_M\", \"iq1_m\",\n        ];\n        let budget_bytes = (budget_gb * 1024.0 * 1024.0 * 1024.0) as u64;\n\n        // Try each quant level in preference order\n        for quant in &quant_order {\n            for (filename, size) in files {\n                if *size > 0\n                    && *size <= budget_bytes\n                    && filename.contains(quant)\n                    && !is_split_file(filename)\n                {\n                    return Some((filename.clone(), *size));\n                }\n            }\n        }\n\n        // Fallback: largest file that fits (sorted ascending by size, then `last()`)\n        let mut fitting: Vec<_> = files\n            .iter()\n            .filter(|(f, s)| *s > 0 && *s <= budget_bytes && !is_split_file(f))\n     
       .collect();\n        fitting.sort_by_key(|(_, s)| *s);\n        fitting.last().map(|(f, s)| (f.clone(), *s))\n    }\n\n    /// Download a GGUF file from a HuggingFace repository.\n    /// `repo_id` is e.g. \"bartowski/Llama-3.1-8B-Instruct-GGUF\"\n    /// `filename` is e.g. \"Llama-3.1-8B-Instruct-Q4_K_M.gguf\"\n    pub fn download_gguf(&self, repo_id: &str, filename: &str) -> Result<PullHandle, String> {\n        // Sanitize filename to prevent path traversal (security: issue #127)\n        validate_gguf_filename(filename)?;\n\n        let models_dir = self.models_dir.clone();\n        let url = format!(\n            \"https://huggingface.co/{}/resolve/main/{}\",\n            repo_id, filename\n        );\n        let dest_path = models_dir.join(filename);\n\n        // Final safety check: ensure resolved path stays within models_dir\n        if let (Ok(canonical_dir), Ok(canonical_dest)) = (\n            std::fs::create_dir_all(&models_dir).and_then(|_| models_dir.canonicalize()),\n            // dest may not exist yet, so canonicalize the parent\n            dest_path\n                .parent()\n                .ok_or_else(|| std::io::Error::other(\"no parent\"))\n                .and_then(|p| {\n                    std::fs::create_dir_all(p)?;\n                    p.canonicalize()\n                }),\n        ) && !canonical_dest.starts_with(&canonical_dir)\n        {\n            return Err(format!(\n                \"Security: download path escapes cache directory: {}\",\n                dest_path.display()\n            ));\n        }\n\n        let tag = format!(\"{}/{}\", repo_id, filename);\n        let filename_owned = filename.to_string();\n        let (tx, rx) = std::sync::mpsc::channel();\n\n        std::thread::spawn(move || {\n            let _ = tx.send(PullEvent::Progress {\n                status: format!(\"Connecting to {}...\", url),\n                percent: Some(0.0),\n            });\n\n            let resp = ureq::get(&url)\n         
       .config()\n                .timeout_global(Some(std::time::Duration::from_secs(7200)))\n                .build()\n                .call();\n\n            match resp {\n                Ok(resp) => {\n                    let total_size = resp\n                        .headers()\n                        .get(\"content-length\")\n                        .and_then(|v| v.to_str().ok())\n                        .and_then(|s| s.parse::<u64>().ok())\n                        .unwrap_or(0);\n\n                    let _ = tx.send(PullEvent::Progress {\n                        status: format!(\n                            \"Downloading {} ({:.1} GB)...\",\n                            filename_owned,\n                            total_size as f64 / 1_073_741_824.0\n                        ),\n                        percent: Some(0.0),\n                    });\n\n                    // Write to a temp file, then rename to avoid partial files\n                    let tmp_path = dest_path.with_extension(\"gguf.part\");\n                    let file = match std::fs::File::create(&tmp_path) {\n                        Ok(f) => f,\n                        Err(e) => {\n                            let _ =\n                                tx.send(PullEvent::Error(format!(\"Failed to create file: {}\", e)));\n                            return;\n                        }\n                    };\n\n                    let mut writer = std::io::BufWriter::new(file);\n                    let mut reader = resp.into_body().into_reader();\n                    let mut downloaded: u64 = 0;\n                    let mut buf = [0u8; 128 * 1024]; // 128 KB buffer\n                    let mut last_report = std::time::Instant::now();\n\n                    loop {\n                        match std::io::Read::read(&mut reader, &mut buf) {\n                            Ok(0) => break, // EOF\n                            Ok(n) => {\n                                if let Err(e) = 
std::io::Write::write_all(&mut writer, &buf[..n]) {\n                                    let _ =\n                                        tx.send(PullEvent::Error(format!(\"Write error: {}\", e)));\n                                    let _ = std::fs::remove_file(&tmp_path);\n                                    return;\n                                }\n                                downloaded += n as u64;\n\n                                // Report progress at most every 200ms\n                                if last_report.elapsed() >= std::time::Duration::from_millis(200) {\n                                    let pct = if total_size > 0 {\n                                        downloaded as f64 / total_size as f64 * 100.0\n                                    } else {\n                                        0.0\n                                    };\n                                    let dl_gb = downloaded as f64 / 1_073_741_824.0;\n                                    let total_gb = total_size as f64 / 1_073_741_824.0;\n                                    let _ = tx.send(PullEvent::Progress {\n                                        status: format!(\n                                            \"Downloading {:.1}/{:.1} GB\",\n                                            dl_gb, total_gb\n                                        ),\n                                        percent: Some(pct),\n                                    });\n                                    last_report = std::time::Instant::now();\n                                }\n                            }\n                            Err(e) => {\n                                let _ = tx.send(PullEvent::Error(format!(\"Download error: {}\", e)));\n                                let _ = std::fs::remove_file(&tmp_path);\n                                return;\n                            }\n                        }\n                    }\n\n                    // Flush and rename\n      
              if let Err(e) = std::io::Write::flush(&mut writer) {\n                        let _ = tx.send(PullEvent::Error(format!(\"Flush error: {}\", e)));\n                        let _ = std::fs::remove_file(&tmp_path);\n                        return;\n                    }\n                    drop(writer);\n\n                    if let Err(e) = std::fs::rename(&tmp_path, &dest_path) {\n                        let _ = tx.send(PullEvent::Error(format!(\n                            \"Failed to finalize download: {}\",\n                            e\n                        )));\n                        let _ = std::fs::remove_file(&tmp_path);\n                        return;\n                    }\n\n                    let _ = tx.send(PullEvent::Progress {\n                        status: \"Download complete!\".to_string(),\n                        percent: Some(100.0),\n                    });\n                    let _ = tx.send(PullEvent::Done);\n                }\n                Err(e) => {\n                    let _ = tx.send(PullEvent::Error(format!(\"Download failed: {}\", e)));\n                }\n            }\n        });\n\n        Ok(PullHandle {\n            model_tag: tag,\n            receiver: rx,\n        })\n    }\n}\n\n/// Validate a GGUF filename used for local cache writes.\nfn validate_gguf_filename(filename: &str) -> Result<(), String> {\n    if filename.is_empty() {\n        return Err(\"GGUF filename must not be empty\".to_string());\n    }\n\n    if filename.contains('/') || filename.contains('\\\\') {\n        return Err(format!(\n            \"Security: path separators not allowed in GGUF filename: {}\",\n            filename\n        ));\n    }\n\n    let path = std::path::Path::new(filename);\n\n    if path.is_absolute() {\n        return Err(format!(\n            \"Security: absolute paths not allowed in GGUF filename: {}\",\n            filename\n        ));\n    }\n\n    if !filename.ends_with(\".gguf\") {\n        return 
Err(format!(\n            \"GGUF filename must end in .gguf, got: {}\",\n            filename\n        ));\n    }\n\n    if path.file_name().and_then(|n| n.to_str()) != Some(filename) {\n        return Err(format!(\n            \"Security: GGUF filename must be a basename without path components: {}\",\n            filename\n        ));\n    }\n\n    Ok(())\n}\n\nfn is_split_file(filename: &str) -> bool {\n    // Pattern: anything with \"-NNNNN-of-NNNNN\" before .gguf\n    filename.contains(\"-of-\")\n}\n\nfn parse_repo_gguf_entries(entries: Vec<serde_json::Value>) -> Vec<(String, u64)> {\n    entries\n        .into_iter()\n        .filter_map(|e| {\n            let path = e.get(\"path\")?.as_str()?.to_string();\n            if validate_gguf_filename(&path).is_err() {\n                return None;\n            }\n            let size = e.get(\"size\").and_then(|v| v.as_u64()).unwrap_or(0);\n            // Skip split files (e.g., model-00001-of-00003.gguf) but not the\n            // primary file. 
We look for files that look like quantized models.\n            Some((path, size))\n        })\n        .collect()\n}\n\n/// Default directory for llama.cpp GGUF model cache.\nfn llamacpp_models_dir() -> PathBuf {\n    if let Ok(dir) = std::env::var(\"LLMFIT_MODELS_DIR\") {\n        PathBuf::from(dir)\n    } else if let Ok(home) = std::env::var(\"HOME\") {\n        PathBuf::from(home)\n            .join(\".cache\")\n            .join(\"llmfit\")\n            .join(\"models\")\n    } else {\n        PathBuf::from(\"/tmp/.cache/llmfit/models\")\n    }\n}\n\n/// Find a binary in PATH using `which`.\nfn find_binary(name: &str) -> Option<String> {\n    std::process::Command::new(\"which\")\n        .arg(name)\n        .stdout(std::process::Stdio::piped())\n        .stderr(std::process::Stdio::null())\n        .output()\n        .ok()\n        .and_then(|out| {\n            if out.status.success() {\n                String::from_utf8(out.stdout)\n                    .ok()\n                    .map(|s| s.trim().to_string())\n            } else {\n                None\n            }\n        })\n}\n\n/// Simple percent-encoding for URL query parameters.\nmod urlencoding {\n    pub fn encode(s: &str) -> String {\n        let mut result = String::with_capacity(s.len() * 3);\n        for byte in s.bytes() {\n            match byte {\n                b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' 
| b'~' => {\n                    result.push(byte as char);\n                }\n                _ => {\n                    result.push('%');\n                    result.push_str(&format!(\"{:02X}\", byte));\n                }\n            }\n        }\n        result\n    }\n}\n\nimpl ModelProvider for LlamaCppProvider {\n    fn name(&self) -> &str {\n        \"llama.cpp\"\n    }\n\n    fn is_available(&self) -> bool {\n        self.llama_cli.is_some() || self.llama_server.is_some()\n    }\n\n    fn installed_models(&self) -> HashSet<String> {\n        let (set, _) = self.installed_models_counted();\n        set\n    }\n\n    fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {\n        // model_tag can be:\n        // 1. A HuggingFace repo ID like \"bartowski/Llama-3.1-8B-Instruct-GGUF\"\n        // 2. A repo_id/filename like \"bartowski/Llama-3.1-8B-Instruct-GGUF/Q4_K_M.gguf\"\n        // 3. A short search term like \"llama-3.1-8b\"\n\n        // If it contains at least two slashes and ends with .gguf, treat as org/repo/file\n        if model_tag.matches('/').count() >= 2 && model_tag.ends_with(\".gguf\") {\n            let parts: Vec<&str> = model_tag.splitn(3, '/').collect();\n            if parts.len() == 3 {\n                let repo = format!(\"{}/{}\", parts[0], parts[1]);\n                let filename = parts[2];\n                return self.download_gguf(&repo, filename);\n            }\n        }\n\n        // If it looks like a repo (org/name), list files and pick the best\n        if model_tag.contains('/') {\n            let files = Self::list_repo_gguf_files(model_tag);\n            if files.is_empty() {\n                return Err(format!(\"No GGUF files found in repository '{}'\", model_tag));\n            }\n            // Pick a reasonable default (Q4_K_M or similar)\n            if let Some((filename, _)) = Self::select_best_gguf(&files, 999.0) {\n                return self.download_gguf(model_tag, &filename);\n            }\n            // 
Fallback: just pick the first\n            let (filename, _) = &files[0];\n            return self.download_gguf(model_tag, filename);\n        }\n\n        // Otherwise, search HuggingFace for GGUF repos\n        let results = Self::search_hf_gguf(model_tag);\n        if results.is_empty() {\n            return Err(format!(\n                \"No GGUF models found on HuggingFace for '{}'\",\n                model_tag\n            ));\n        }\n        // Use the first result\n        let (repo_id, _) = &results[0];\n        let files = Self::list_repo_gguf_files(repo_id);\n        if files.is_empty() {\n            return Err(format!(\"No GGUF files found in repository '{}'\", repo_id));\n        }\n        if let Some((filename, _)) = Self::select_best_gguf(&files, 999.0) {\n            return self.download_gguf(repo_id, &filename);\n        }\n        let (filename, _) = &files[0];\n        self.download_gguf(repo_id, filename)\n    }\n}\n\n// ---------------------------------------------------------------------------\n// Docker Model Runner provider\n// ---------------------------------------------------------------------------\n\n/// Docker Model Runner — Docker Desktop's built-in model serving feature.\n///\n/// Exposes an OpenAI-compatible API at `http://localhost:12434` by default.\n/// This provider lists models via `GET /v1/models` and pulls them via `docker model pull`.\npub struct DockerModelRunnerProvider {\n    base_url: String,\n}\n\nfn normalize_docker_mr_host(raw: &str) -> Option<String> {\n    let host = raw.trim();\n    if host.is_empty() {\n        return None;\n    }\n\n    if host.starts_with(\"http://\") || host.starts_with(\"https://\") {\n        return Some(host.to_string());\n    }\n\n    if host.contains(\"://\") {\n        return None;\n    }\n\n    Some(format!(\"http://{host}\"))\n}\n\nimpl Default for DockerModelRunnerProvider {\n    fn default() -> Self {\n        let base_url = std::env::var(\"DOCKER_MODEL_RUNNER_HOST\")\n            .ok()\n     
       .and_then(|raw| {\n                let normalized = normalize_docker_mr_host(&raw);\n                if normalized.is_none() {\n                    eprintln!(\n                        \"Warning: could not parse DOCKER_MODEL_RUNNER_HOST='{}'. \\\n                         Expected host:port or http(s)://host:port\",\n                        raw\n                    );\n                }\n                normalized\n            })\n            .unwrap_or_else(|| \"http://localhost:12434\".to_string());\n        Self { base_url }\n    }\n}\n\nimpl DockerModelRunnerProvider {\n    pub fn new() -> Self {\n        Self::default()\n    }\n\n    fn models_url(&self) -> String {\n        format!(\"{}/v1/models\", self.base_url.trim_end_matches('/'))\n    }\n\n    /// Single-pass startup probe.\n    /// Returns `(available, installed_models, count)`.\n    pub fn detect_with_installed(&self) -> (bool, HashSet<String>, usize) {\n        let mut set = HashSet::new();\n        let Ok(resp) = ureq::get(&self.models_url())\n            .config()\n            .timeout_global(Some(std::time::Duration::from_millis(800)))\n            .build()\n            .call()\n        else {\n            return (false, set, 0);\n        };\n\n        let Ok(list) = resp.into_body().read_json::<DockerModelList>() else {\n            return (true, set, 0);\n        };\n        let engines = list.data;\n        let count = engines.len();\n        for e in engines {\n            let lower = e.id.to_lowercase();\n            set.insert(lower.clone());\n            // Also insert the model part after the namespace (e.g. \"ai/llama3.1\" → \"llama3.1\")\n            if let Some(name) = lower.split('/').next_back()\n                && name != lower\n            {\n                set.insert(name.to_string());\n            }\n            // Strip the tag after ':' if present (e.g. 
\"ai/llama3.1:8b-q4_k_m\" → \"ai/llama3.1\")\n            if let Some(base) = lower.split(':').next() {\n                set.insert(base.to_string());\n            }\n        }\n        (true, set, count)\n    }\n\n    pub fn installed_models_counted(&self) -> (HashSet<String>, usize) {\n        let (_, set, count) = self.detect_with_installed();\n        (set, count)\n    }\n}\n\n#[derive(serde::Deserialize)]\nstruct DockerModelList {\n    data: Vec<DockerEngine>,\n}\n\n#[derive(serde::Deserialize)]\nstruct DockerEngine {\n    /// Model ID, e.g. \"ai/llama3.1:8B-Q4_K_M\"\n    id: String,\n}\n\nimpl ModelProvider for DockerModelRunnerProvider {\n    fn name(&self) -> &str {\n        \"Docker Model Runner\"\n    }\n\n    fn is_available(&self) -> bool {\n        ureq::get(&self.models_url())\n            .config()\n            .timeout_global(Some(std::time::Duration::from_secs(2)))\n            .build()\n            .call()\n            .is_ok()\n    }\n\n    fn installed_models(&self) -> HashSet<String> {\n        let (set, _) = self.installed_models_counted();\n        set\n    }\n\n    fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {\n        let tag = model_tag.to_string();\n        let (tx, rx) = std::sync::mpsc::channel();\n\n        std::thread::spawn(move || {\n            let _ = tx.send(PullEvent::Progress {\n                status: format!(\"Pulling {} via docker model pull...\", tag),\n                percent: None,\n            });\n\n            let result = std::process::Command::new(\"docker\")\n                .args([\"model\", \"pull\", &tag])\n                .stdout(std::process::Stdio::piped())\n                .stderr(std::process::Stdio::piped())\n                .output();\n\n            match result {\n                Ok(output) if output.status.success() => {\n                    let _ = tx.send(PullEvent::Done);\n                }\n                Ok(output) => {\n                    let stderr = 
String::from_utf8_lossy(&output.stderr);\n                    let _ = tx.send(PullEvent::Error(format!(\n                        \"docker model pull failed: {}\",\n                        stderr.trim()\n                    )));\n                }\n                Err(e) => {\n                    let _ = tx.send(PullEvent::Error(format!(\"Failed to run docker: {e}\")));\n                }\n            }\n        });\n\n        Ok(PullHandle {\n            model_tag: model_tag.to_string(),\n            receiver: rx,\n        })\n    }\n}\n\n// ---------------------------------------------------------------------------\n// LM Studio provider\n// ---------------------------------------------------------------------------\n\n/// LM Studio — local model server with REST API for model management.\n///\n/// Exposes an OpenAI-compatible API plus management endpoints at\n/// `http://127.0.0.1:1234` by default. Models are downloaded via\n/// `POST /api/v1/models/download` and listed via `GET /v1/models`.\npub struct LmStudioProvider {\n    base_url: String,\n}\n\nfn normalize_lmstudio_host(raw: &str) -> Option<String> {\n    let host = raw.trim();\n    if host.is_empty() {\n        return None;\n    }\n\n    if host.starts_with(\"http://\") || host.starts_with(\"https://\") {\n        return Some(host.to_string());\n    }\n\n    if host.contains(\"://\") {\n        return None;\n    }\n\n    Some(format!(\"http://{host}\"))\n}\n\nimpl Default for LmStudioProvider {\n    fn default() -> Self {\n        let base_url = std::env::var(\"LMSTUDIO_HOST\")\n            .ok()\n            .and_then(|raw| {\n                let normalized = normalize_lmstudio_host(&raw);\n                if normalized.is_none() {\n                    eprintln!(\n                        \"Warning: could not parse LMSTUDIO_HOST='{}'. 
\\\n                         Expected host:port or http(s)://host:port\",\n                        raw\n                    );\n                }\n                normalized\n            })\n            .unwrap_or_else(|| \"http://127.0.0.1:1234\".to_string());\n        Self { base_url }\n    }\n}\n\nimpl LmStudioProvider {\n    pub fn new() -> Self {\n        Self::default()\n    }\n\n    fn models_url(&self) -> String {\n        format!(\"{}/v1/models\", self.base_url.trim_end_matches('/'))\n    }\n\n    fn download_url(&self) -> String {\n        format!(\n            \"{}/api/v1/models/download\",\n            self.base_url.trim_end_matches('/')\n        )\n    }\n\n    fn download_status_url(&self) -> String {\n        format!(\n            \"{}/api/v1/models/download-status\",\n            self.base_url.trim_end_matches('/')\n        )\n    }\n\n    /// Single-pass startup probe.\n    /// Returns `(available, installed_models, count)`.\n    pub fn detect_with_installed(&self) -> (bool, HashSet<String>, usize) {\n        let mut set = HashSet::new();\n        let Ok(resp) = ureq::get(&self.models_url())\n            .config()\n            .timeout_global(Some(std::time::Duration::from_millis(800)))\n            .build()\n            .call()\n        else {\n            return (false, set, 0);\n        };\n\n        let Ok(list) = resp.into_body().read_json::<LmStudioModelList>() else {\n            return (true, set, 0);\n        };\n        let models = list.models;\n        let count = models.len();\n        for m in models {\n            let lower = m.key.to_lowercase();\n            set.insert(lower.clone());\n            // Also insert the model part after the publisher (e.g. 
\"lmstudio-community/Qwen3-1.7B-MLX-4bit\" → \"qwen3-1.7b-mlx-4bit\")\n            if let Some(name) = lower.split('/').next_back()\n                && name != lower\n            {\n                set.insert(name.to_string());\n            }\n        }\n        (true, set, count)\n    }\n\n    pub fn installed_models_counted(&self) -> (HashSet<String>, usize) {\n        let (_, set, count) = self.detect_with_installed();\n        (set, count)\n    }\n}\n\n#[derive(serde::Deserialize)]\nstruct LmStudioModelList {\n    models: Vec<LmStudioModel>,\n}\n\n#[derive(serde::Deserialize)]\nstruct LmStudioModel {\n    /// Model key, e.g. \"lmstudio-community/Qwen3-1.7B-MLX-4bit\"\n    key: String,\n}\n\n#[derive(serde::Deserialize)]\nstruct LmStudioDownloadResponse {\n    #[serde(default)]\n    #[allow(dead_code)]\n    job_id: Option<String>,\n    #[serde(default)]\n    status: String,\n    #[serde(default)]\n    #[allow(dead_code)]\n    total_size_bytes: Option<u64>,\n}\n\n#[derive(serde::Deserialize)]\nstruct LmStudioDownloadStatus {\n    #[serde(default)]\n    status: String,\n    #[serde(default)]\n    progress: Option<f64>,\n    #[serde(default)]\n    downloaded_bytes: Option<u64>,\n    #[serde(default)]\n    total_size_bytes: Option<u64>,\n}\n\nimpl ModelProvider for LmStudioProvider {\n    fn name(&self) -> &str {\n        \"LM Studio\"\n    }\n\n    fn is_available(&self) -> bool {\n        ureq::get(&self.models_url())\n            .config()\n            .timeout_global(Some(std::time::Duration::from_secs(2)))\n            .build()\n            .call()\n            .is_ok()\n    }\n\n    fn installed_models(&self) -> HashSet<String> {\n        let (set, _) = self.installed_models_counted();\n        set\n    }\n\n    fn start_pull(&self, model_tag: &str) -> Result<PullHandle, String> {\n        let download_url = self.download_url();\n        let status_url = self.download_status_url();\n        let tag = model_tag.to_string();\n        let (tx, rx) = 
std::sync::mpsc::channel();\n\n        let body = serde_json::json!({\n            \"model\": tag,\n        });\n\n        std::thread::spawn(move || {\n            // Initiate download\n            let resp = ureq::post(&download_url)\n                .config()\n                .timeout_global(Some(std::time::Duration::from_secs(30)))\n                .build()\n                .send_json(&body);\n\n            match resp {\n                Ok(resp) => {\n                    let Ok(dl_resp) = resp.into_body().read_json::<LmStudioDownloadResponse>()\n                    else {\n                        let _ = tx.send(PullEvent::Error(\n                            \"Failed to parse LM Studio download response\".to_string(),\n                        ));\n                        return;\n                    };\n\n                    if dl_resp.status == \"already_downloaded\" {\n                        let _ = tx.send(PullEvent::Progress {\n                            status: \"Already downloaded\".to_string(),\n                            percent: Some(100.0),\n                        });\n                        let _ = tx.send(PullEvent::Done);\n                        return;\n                    }\n\n                    if dl_resp.status == \"failed\" {\n                        let _ = tx.send(PullEvent::Error(\"LM Studio download failed\".to_string()));\n                        return;\n                    }\n\n                    let _ = tx.send(PullEvent::Progress {\n                        status: format!(\"Downloading via LM Studio ({})\", dl_resp.status),\n                        percent: Some(0.0),\n                    });\n\n                    // Poll for progress\n                    loop {\n                        std::thread::sleep(std::time::Duration::from_millis(500));\n\n                        let poll = ureq::get(&status_url)\n                            .config()\n                            
.timeout_global(Some(std::time::Duration::from_secs(10)))\n                            .build()\n                            .call();\n\n                        match poll {\n                            Ok(resp) => {\n                                // Try to parse as array (multiple jobs) or single object\n                                let body_str = match resp.into_body().read_to_string() {\n                                    Ok(s) => s,\n                                    Err(_) => continue,\n                                };\n\n                                // Try parsing as array first\n                                let status_opt: Option<LmStudioDownloadStatus> =\n                                    if let Ok(statuses) =\n                                        serde_json::from_str::<Vec<LmStudioDownloadStatus>>(\n                                            &body_str,\n                                        )\n                                    {\n                                        // Find our job by looking for a downloading status\n                                        statuses.into_iter().find(|s| {\n                                            s.status == \"downloading\"\n                                                || s.status == \"completed\"\n                                                || s.status == \"failed\"\n                                        })\n                                    } else {\n                                        serde_json::from_str(&body_str).ok()\n                                    };\n\n                                let Some(st) = status_opt else {\n                                    continue;\n                                };\n\n                                let percent = st.progress.map(|p| p * 100.0).or_else(|| {\n                                    match (st.downloaded_bytes, st.total_size_bytes) {\n                                        (Some(dl), Some(total)) if total > 0 => {\n      
                                      Some(dl as f64 / total as f64 * 100.0)\n                                        }\n                                        _ => None,\n                                    }\n                                });\n\n                                if st.status == \"completed\" {\n                                    let _ = tx.send(PullEvent::Progress {\n                                        status: \"Download complete\".to_string(),\n                                        percent: Some(100.0),\n                                    });\n                                    let _ = tx.send(PullEvent::Done);\n                                    return;\n                                }\n\n                                if st.status == \"failed\" {\n                                    let _ = tx.send(PullEvent::Error(\n                                        \"LM Studio download failed\".to_string(),\n                                    ));\n                                    return;\n                                }\n\n                                let _ = tx.send(PullEvent::Progress {\n                                    status: \"Downloading via LM Studio...\".to_string(),\n                                    percent,\n                                });\n                            }\n                            Err(_) => {\n                                // Status endpoint unreachable, keep trying\n                                continue;\n                            }\n                        }\n                    }\n                }\n                Err(e) => {\n                    let _ = tx.send(PullEvent::Error(format!(\"LM Studio download error: {e}\")));\n                }\n            }\n        });\n\n        Ok(PullHandle {\n            model_tag: model_tag.to_string(),\n            receiver: rx,\n        })\n    }\n}\n\n// ---------------------------------------------------------------------------\n// LM 
Studio name-matching helpers\n// ---------------------------------------------------------------------------\n\n/// LM Studio uses HuggingFace model names directly. We match against the\n/// model's GGUF sources and common naming patterns.\npub fn hf_name_to_lmstudio_candidates(hf_name: &str) -> Vec<String> {\n    let repo = hf_name\n        .split('/')\n        .next_back()\n        .unwrap_or(hf_name)\n        .to_lowercase();\n    let mut candidates = vec![hf_name.to_lowercase()];\n    if repo != hf_name.to_lowercase() {\n        candidates.push(repo.clone());\n    }\n    // Strip common suffixes for matching\n    let stripped = repo\n        .replace(\"-instruct\", \"\")\n        .replace(\"-chat\", \"\")\n        .replace(\"-hf\", \"\")\n        .replace(\"-it\", \"\");\n    if stripped != repo {\n        candidates.push(stripped);\n    }\n    candidates\n}\n\n/// Check if any LM Studio candidates for an HF model appear in the installed set.\npub fn is_model_installed_lmstudio(hf_name: &str, installed: &HashSet<String>) -> bool {\n    let candidates = hf_name_to_lmstudio_candidates(hf_name);\n    candidates.iter().any(|candidate| {\n        installed\n            .iter()\n            .any(|installed_name| installed_name.contains(candidate))\n    })\n}\n\n/// LM Studio can download any HuggingFace model, so we always return true\n/// if the model has GGUF sources (which have HF repo IDs).\npub fn has_lmstudio_mapping(hf_name: &str) -> bool {\n    // LM Studio can download from HF directly, so any model with a known\n    // GGUF source or a HF name is potentially downloadable.\n    !hf_name.is_empty()\n}\n\n/// Given an HF model name, return the model identifier to use for LM Studio download.\n/// LM Studio accepts HF model names directly.\npub fn lmstudio_pull_tag(hf_name: &str) -> Option<String> {\n    if hf_name.is_empty() {\n        return None;\n    }\n    // Use the full HF name as the download identifier\n    Some(hf_name.to_string())\n}\n\n// 
---------------------------------------------------------------------------\n// Docker Model Runner name-matching helpers\n// ---------------------------------------------------------------------------\n\n/// Embedded catalog of HF models confirmed to exist in Docker Hub's ai/ namespace.\n/// Generated by `scripts/scrape_docker_models.py` and refreshed alongside the model DB.\nconst DOCKER_MODELS_JSON: &str = include_str!(\"../data/docker_models.json\");\n\n#[derive(serde::Deserialize)]\nstruct DockerModelCatalog {\n    models: Vec<DockerModelEntry>,\n}\n\n#[derive(serde::Deserialize)]\nstruct DockerModelEntry {\n    hf_name: String,\n    docker_tag: String,\n}\n\n/// Lazily parsed Docker Model Runner catalog.\nfn docker_mr_catalog() -> &'static [(String, String)] {\n    use std::sync::OnceLock;\n    static CATALOG: OnceLock<Vec<(String, String)>> = OnceLock::new();\n    CATALOG.get_or_init(|| {\n        let Ok(catalog) = serde_json::from_str::<DockerModelCatalog>(DOCKER_MODELS_JSON) else {\n            return Vec::new();\n        };\n        catalog\n            .models\n            .into_iter()\n            .map(|e| (e.hf_name.to_lowercase(), e.docker_tag))\n            .collect()\n    })\n}\n\n/// Returns `true` if this HF model has a confirmed Docker Model Runner image.\npub fn has_docker_mr_mapping(hf_name: &str) -> bool {\n    docker_mr_pull_tag(hf_name).is_some()\n}\n\n/// Given an HF model name, return the Docker Model Runner tag to use for pulling.\n/// Returns `None` if the model has no confirmed Docker image.\npub fn docker_mr_pull_tag(hf_name: &str) -> Option<String> {\n    let lower = hf_name.to_lowercase();\n    docker_mr_catalog()\n        .iter()\n        .find(|(name, _)| *name == lower)\n        .map(|(_, tag)| tag.clone())\n}\n\n/// Docker Model Runner uses the Ollama naming convention (e.g. 
\"ai/llama3.1:8b\").\n/// We generate candidates from the confirmed catalog, plus base-name variants for\n/// matching against locally installed models.\npub fn hf_name_to_docker_mr_candidates(hf_name: &str) -> Vec<String> {\n    let Some(tag) = docker_mr_pull_tag(hf_name) else {\n        return Vec::new();\n    };\n    let mut candidates = vec![tag.clone()];\n    // Also add without \"ai/\" prefix for matching installed models\n    if let Some(stripped) = tag.strip_prefix(\"ai/\") {\n        candidates.push(stripped.to_string());\n    }\n    // Add base repo name (without size tag) e.g. \"ai/llama3.1\"\n    if let Some(base) = tag.split(':').next() {\n        candidates.push(base.to_string());\n    }\n    candidates\n}\n\n/// Check if any of the Docker Model Runner candidates for an HF model\n/// appear in the installed set.\npub fn is_model_installed_docker_mr(hf_name: &str, installed: &HashSet<String>) -> bool {\n    let candidates = hf_name_to_docker_mr_candidates(hf_name);\n    candidates.iter().any(|candidate| {\n        installed\n            .iter()\n            .any(|installed_name| docker_mr_installed_matches(installed_name, candidate))\n    })\n}\n\nfn docker_mr_installed_matches(installed_name: &str, candidate: &str) -> bool {\n    if installed_name == candidate {\n        return true;\n    }\n    // Allow variant tags, e.g. 
candidate \"ai/llama3.1:8b\" matching\n    // installed \"ai/llama3.1:8b-q4_k_m\"\n    if candidate.contains(':') {\n        return installed_name.starts_with(&format!(\"{candidate}-\"));\n    }\n    false\n}\n\n/// Strip quantization suffix from a GGUF file stem.\n/// \"llama-3.1-8b-instruct-q4_k_m\" → \"llama-3.1-8b-instruct\"\nfn strip_gguf_quant_suffix(stem: &str) -> Option<String> {\n    let quant_patterns = [\n        \"-q8_0\", \"-q6_k\", \"-q6_k_l\", \"-q5_k_m\", \"-q5_k_s\", \"-q4_k_m\", \"-q4_k_s\", \"-q4_0\",\n        \"-q3_k_m\", \"-q3_k_s\", \"-q2_k\", \"-iq4_xs\", \"-iq3_m\", \"-iq2_m\", \"-iq1_m\", \"-f16\", \"-f32\",\n        \"-bf16\", \".q8_0\", \".q6_k\", \".q5_k_m\", \".q4_k_m\", \".q4_0\", \".q3_k_m\", \".q2_k\",\n    ];\n    for pat in &quant_patterns {\n        if let Some(pos) = stem.rfind(pat) {\n            return Some(stem[..pos].to_string());\n        }\n    }\n    None\n}\n\n// ---------------------------------------------------------------------------\n// llama.cpp name-matching helpers\n// ---------------------------------------------------------------------------\n\n/// Authoritative mapping from HF repo names to known GGUF repository IDs on HuggingFace.\n/// Models not in this table fall back to a heuristic search.\nconst LLAMACPP_GGUF_MAPPINGS: &[(&str, &str)] = &[\n    // Meta Llama\n    (\n        \"llama-3.3-70b-instruct\",\n        \"bartowski/Llama-3.3-70B-Instruct-GGUF\",\n    ),\n    (\n        \"llama-3.2-3b-instruct\",\n        \"bartowski/Llama-3.2-3B-Instruct-GGUF\",\n    ),\n    (\n        \"llama-3.2-1b-instruct\",\n        \"bartowski/Llama-3.2-1B-Instruct-GGUF\",\n    ),\n    (\n        \"llama-3.1-8b-instruct\",\n        \"bartowski/Llama-3.1-8B-Instruct-GGUF\",\n    ),\n    (\n        \"llama-3.1-70b-instruct\",\n        \"bartowski/Llama-3.1-70B-Instruct-GGUF\",\n    ),\n    (\n        \"llama-3.1-405b-instruct\",\n        \"bartowski/Meta-Llama-3.1-405B-Instruct-GGUF\",\n    ),\n    (\n        
\"meta-llama-3-8b-instruct\",\n        \"bartowski/Meta-Llama-3-8B-Instruct-GGUF\",\n    ),\n    // Qwen\n    (\n        \"qwen2.5-72b-instruct\",\n        \"bartowski/Qwen2.5-72B-Instruct-GGUF\",\n    ),\n    (\n        \"qwen2.5-32b-instruct\",\n        \"bartowski/Qwen2.5-32B-Instruct-GGUF\",\n    ),\n    (\n        \"qwen2.5-14b-instruct\",\n        \"bartowski/Qwen2.5-14B-Instruct-GGUF\",\n    ),\n    (\"qwen2.5-7b-instruct\", \"bartowski/Qwen2.5-7B-Instruct-GGUF\"),\n    (\"qwen2.5-3b-instruct\", \"bartowski/Qwen2.5-3B-Instruct-GGUF\"),\n    (\n        \"qwen2.5-1.5b-instruct\",\n        \"bartowski/Qwen2.5-1.5B-Instruct-GGUF\",\n    ),\n    (\n        \"qwen2.5-0.5b-instruct\",\n        \"bartowski/Qwen2.5-0.5B-Instruct-GGUF\",\n    ),\n    (\n        \"qwen2.5-coder-32b-instruct\",\n        \"bartowski/Qwen2.5-Coder-32B-Instruct-GGUF\",\n    ),\n    (\n        \"qwen2.5-coder-14b-instruct\",\n        \"bartowski/Qwen2.5-Coder-14B-Instruct-GGUF\",\n    ),\n    (\n        \"qwen2.5-coder-7b-instruct\",\n        \"bartowski/Qwen2.5-Coder-7B-Instruct-GGUF\",\n    ),\n    (\"qwen3-32b\", \"bartowski/Qwen3-32B-GGUF\"),\n    (\"qwen3-14b\", \"bartowski/Qwen3-14B-GGUF\"),\n    (\"qwen3-8b\", \"bartowski/Qwen3-8B-GGUF\"),\n    (\"qwen3-4b\", \"bartowski/Qwen3-4B-GGUF\"),\n    (\"qwen3-0.6b\", \"bartowski/Qwen3-0.6B-GGUF\"),\n    // Mistral\n    (\n        \"mistral-7b-instruct-v0.3\",\n        \"bartowski/Mistral-7B-Instruct-v0.3-GGUF\",\n    ),\n    (\n        \"mistral-small-24b-instruct-2501\",\n        \"bartowski/Mistral-Small-24B-Instruct-2501-GGUF\",\n    ),\n    (\n        \"mixtral-8x7b-instruct-v0.1\",\n        \"bartowski/Mixtral-8x7B-Instruct-v0.1-GGUF\",\n    ),\n    // Google Gemma\n    (\"gemma-3-12b-it\", \"bartowski/gemma-3-12b-it-GGUF\"),\n    (\"gemma-2-27b-it\", \"bartowski/gemma-2-27b-it-GGUF\"),\n    (\"gemma-2-9b-it\", \"bartowski/gemma-2-9b-it-GGUF\"),\n    (\"gemma-2-2b-it\", \"bartowski/gemma-2-2b-it-GGUF\"),\n    // Microsoft Phi\n    
(\"phi-4\", \"bartowski/phi-4-GGUF\"),\n    (\"phi-4-mini-instruct\", \"bartowski/phi-4-mini-instruct-GGUF\"),\n    (\n        \"phi-3.5-mini-instruct\",\n        \"bartowski/Phi-3.5-mini-instruct-GGUF\",\n    ),\n    (\n        \"phi-3-mini-4k-instruct\",\n        \"bartowski/Phi-3-mini-4k-instruct-GGUF\",\n    ),\n    // DeepSeek\n    (\"deepseek-r1\", \"bartowski/DeepSeek-R1-GGUF\"),\n    (\n        \"deepseek-r1-distill-qwen-32b\",\n        \"bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF\",\n    ),\n    (\n        \"deepseek-r1-distill-qwen-14b\",\n        \"bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF\",\n    ),\n    (\n        \"deepseek-r1-distill-qwen-7b\",\n        \"bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF\",\n    ),\n    (\"deepseek-v3\", \"bartowski/DeepSeek-V3-GGUF\"),\n    // Community\n    (\n        \"tinyllama-1.1b-chat-v1.0\",\n        \"TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF\",\n    ),\n    (\"falcon-7b-instruct\", \"TheBloke/falcon-7b-instruct-GGUF\"),\n    (\n        \"smollm2-135m-instruct\",\n        \"bartowski/SmolLM2-135M-Instruct-GGUF\",\n    ),\n];\n\n/// Look up a known GGUF repo for an HF model name.\nfn lookup_gguf_repo(hf_name: &str) -> Option<&'static str> {\n    let repo = hf_name\n        .split('/')\n        .next_back()\n        .unwrap_or(hf_name)\n        .to_lowercase();\n    LLAMACPP_GGUF_MAPPINGS\n        .iter()\n        .find(|&&(hf_suffix, _)| repo == hf_suffix)\n        .map(|&(_, gguf_repo)| gguf_repo)\n}\n\n/// Map a HuggingFace model name to candidate GGUF repo IDs.\npub fn hf_name_to_gguf_candidates(hf_name: &str) -> Vec<String> {\n    if let Some(repo) = lookup_gguf_repo(hf_name) {\n        return vec![repo.to_string()];\n    }\n\n    // Heuristic: try common GGUF repo naming patterns\n    let base = hf_name.split('/').next_back().unwrap_or(hf_name);\n\n    vec![\n        format!(\"bartowski/{}-GGUF\", base),\n        format!(\"ggml-org/{}-GGUF\", base),\n        format!(\"TheBloke/{}-GGUF\", base),\n    ]\n}\n\n/// 
Returns `true` if this HF model has a known GGUF mapping.\npub fn has_gguf_mapping(hf_name: &str) -> bool {\n    lookup_gguf_repo(hf_name).is_some()\n}\n\n/// Check if a model is installed in the llama.cpp cache.\npub fn is_model_installed_llamacpp(hf_name: &str, installed: &HashSet<String>) -> bool {\n    let repo = hf_name\n        .split('/')\n        .next_back()\n        .unwrap_or(hf_name)\n        .to_lowercase();\n\n    // Direct match on model name stem\n    if installed.contains(&repo) {\n        return true;\n    }\n\n    // Check with common suffixes stripped\n    let stripped = repo\n        .replace(\"-instruct\", \"\")\n        .replace(\"-chat\", \"\")\n        .replace(\"-hf\", \"\")\n        .replace(\"-it\", \"\");\n\n    installed.iter().any(|name| {\n        name.contains(&repo) || name.contains(&stripped) || repo.contains(name.as_str())\n    })\n}\n\n/// Given an HF model name, return the best GGUF repo to pull from.\npub fn gguf_pull_tag(hf_name: &str) -> Option<String> {\n    lookup_gguf_repo(hf_name).map(|s| s.to_string())\n}\n\n/// Best-effort check that a Hugging Face model repository exists.\npub fn hf_repo_exists(repo_id: &str) -> bool {\n    let url = format!(\"https://huggingface.co/api/models/{}\", repo_id);\n    ureq::get(&url)\n        .config()\n        .timeout_global(Some(std::time::Duration::from_millis(1200)))\n        .build()\n        .call()\n        .is_ok()\n}\n\n/// Resolve the first GGUF repo that appears to exist remotely.\npub fn first_existing_gguf_repo(hf_name: &str) -> Option<String> {\n    if let Some(repo) = gguf_pull_tag(hf_name)\n        && hf_repo_exists(&repo)\n    {\n        return Some(repo);\n    }\n    let candidates = hf_name_to_gguf_candidates(hf_name);\n    candidates.into_iter().find(|repo| hf_repo_exists(repo))\n}\n\n// ---------------------------------------------------------------------------\n// MLX name-matching helpers\n// 
---------------------------------------------------------------------------\n\nfn push_unique_candidate(candidates: &mut Vec<String>, candidate: String) {\n    if !candidate.is_empty() && !candidates.iter().any(|c| c == &candidate) {\n        candidates.push(candidate);\n    }\n}\n\nfn strip_trailing_quant_suffix(name: &str) -> String {\n    for suffix in [\"-4bit\", \"-6bit\", \"-8bit\"] {\n        if let Some(stripped) = name.strip_suffix(suffix) {\n            return stripped.to_string();\n        }\n    }\n    name.to_string()\n}\n\nfn normalize_mlx_repo_base(repo_lower: &str) -> String {\n    let without_quant = strip_trailing_quant_suffix(repo_lower);\n\n    without_quant\n        .strip_suffix(\"-mlx\")\n        .unwrap_or(&without_quant)\n        .trim_matches('-')\n        .to_string()\n}\n\nfn strip_trailing_common_model_suffixes(name: &str) -> String {\n    let mut out = name.to_string();\n    loop {\n        let mut changed = false;\n        for suffix in [\"-instruct\", \"-chat\", \"-hf\", \"-it\", \"-base\"] {\n            if let Some(stripped) = out.strip_suffix(suffix) {\n                out = stripped.trim_end_matches('-').to_string();\n                changed = true;\n                break;\n            }\n        }\n        if !changed {\n            break;\n        }\n    }\n    out\n}\n\nfn explicit_mlx_repo_id(hf_name: &str) -> Option<String> {\n    if hf_name.matches('/').count() != 1 {\n        return None;\n    }\n    let mut parts = hf_name.splitn(2, '/');\n    let owner = parts.next()?.trim();\n    let repo = parts.next()?.trim();\n    if owner.is_empty() || repo.is_empty() || !is_likely_mlx_repo(owner, repo) {\n        return None;\n    }\n    Some(format!(\"{}/{}\", owner.to_lowercase(), repo.to_lowercase()))\n}\n\n/// Map a HuggingFace model name to mlx-community repo name candidates.\n/// Pattern: mlx-community/{RepoName}-{quant}bit\npub fn hf_name_to_mlx_candidates(hf_name: &str) -> Vec<String> {\n    let mut candidates = 
Vec::new();\n\n    if let Some(repo_id) = explicit_mlx_repo_id(hf_name) {\n        push_unique_candidate(&mut candidates, repo_id.clone());\n        if let Some(repo_name) = repo_id.split('/').next_back() {\n            push_unique_candidate(&mut candidates, repo_name.to_string());\n        }\n    }\n\n    let repo = hf_name.split('/').next_back().unwrap_or(hf_name);\n    let repo_lower = repo.to_lowercase();\n    push_unique_candidate(&mut candidates, repo_lower.clone());\n\n    let normalized_repo = normalize_mlx_repo_base(&repo_lower);\n\n    // Explicit mappings: HF repo suffix → mlx-community repo name (without quant suffix)\n    let mappings: &[(&str, &str)] = &[\n        // Meta Llama\n        (\"Llama-3.3-70B-Instruct\", \"Llama-3.3-70B-Instruct\"),\n        (\"Llama-3.2-3B-Instruct\", \"Llama-3.2-3B-Instruct\"),\n        (\"Llama-3.2-1B-Instruct\", \"Llama-3.2-1B-Instruct\"),\n        (\"Llama-3.1-8B-Instruct\", \"Llama-3.1-8B-Instruct\"),\n        (\"Llama-3.1-70B-Instruct\", \"Llama-3.1-70B-Instruct\"),\n        // Qwen\n        (\"Qwen2.5-72B-Instruct\", \"Qwen2.5-72B-Instruct\"),\n        (\"Qwen2.5-32B-Instruct\", \"Qwen2.5-32B-Instruct\"),\n        (\"Qwen2.5-14B-Instruct\", \"Qwen2.5-14B-Instruct\"),\n        (\"Qwen2.5-7B-Instruct\", \"Qwen2.5-7B-Instruct\"),\n        (\"Qwen2.5-Coder-32B-Instruct\", \"Qwen2.5-Coder-32B-Instruct\"),\n        (\"Qwen2.5-Coder-14B-Instruct\", \"Qwen2.5-Coder-14B-Instruct\"),\n        (\"Qwen2.5-Coder-7B-Instruct\", \"Qwen2.5-Coder-7B-Instruct\"),\n        (\"Qwen3-32B\", \"Qwen3-32B\"),\n        (\"Qwen3-14B\", \"Qwen3-14B\"),\n        (\"Qwen3-8B\", \"Qwen3-8B\"),\n        (\"Qwen3-4B\", \"Qwen3-4B\"),\n        (\"Qwen3-1.7B\", \"Qwen3-1.7B\"),\n        (\"Qwen3-0.6B\", \"Qwen3-0.6B\"),\n        (\"Qwen3-30B-A3B\", \"Qwen3-30B-A3B\"),\n        (\"Qwen3-235B-A22B\", \"Qwen3-235B-A22B\"),\n        // Qwen3.5\n        (\"Qwen3.5-0.6B\", \"Qwen3.5-0.6B\"),\n        (\"Qwen3.5-1.7B\", \"Qwen3.5-1.7B\"),\n        
(\"Qwen3.5-4B\", \"Qwen3.5-4B\"),\n        (\"Qwen3.5-8B\", \"Qwen3.5-8B\"),\n        (\"Qwen3.5-9B\", \"Qwen3.5-9B\"),\n        (\"Qwen3.5-14B\", \"Qwen3.5-14B\"),\n        (\"Qwen3.5-27B\", \"Qwen3.5-27B\"),\n        (\"Qwen3.5-32B\", \"Qwen3.5-32B\"),\n        (\"Qwen3.5-35B-A3B\", \"Qwen3.5-35B-A3B\"),\n        (\"Qwen3.5-72B\", \"Qwen3.5-72B\"),\n        (\"Qwen3.5-122B-A10B\", \"Qwen3.5-122B-A10B\"),\n        (\"Qwen3.5-397B-A17B\", \"Qwen3.5-397B-A17B\"),\n        // Mistral\n        (\"Mistral-7B-Instruct-v0.3\", \"Mistral-7B-Instruct-v0.3\"),\n        (\n            \"Mistral-Small-24B-Instruct-2501\",\n            \"Mistral-Small-24B-Instruct-2501\",\n        ),\n        (\"Mixtral-8x7B-Instruct-v0.1\", \"Mixtral-8x7B-Instruct-v0.1\"),\n        (\n            \"Mistral-Small-3.1-24B-Instruct-2503\",\n            \"Mistral-Small-3.1-24B-Instruct-2503\",\n        ),\n        (\"Ministral-8B-Instruct-2410\", \"Ministral-8B-Instruct-2410\"),\n        (\"Mistral-Nemo-Instruct-2407\", \"Mistral-Nemo-Instruct-2407\"),\n        // DeepSeek\n        (\n            \"DeepSeek-R1-Distill-Qwen-32B\",\n            \"DeepSeek-R1-Distill-Qwen-32B\",\n        ),\n        (\"DeepSeek-R1-Distill-Qwen-7B\", \"DeepSeek-R1-Distill-Qwen-7B\"),\n        (\n            \"DeepSeek-R1-Distill-Qwen-14B\",\n            \"DeepSeek-R1-Distill-Qwen-14B\",\n        ),\n        (\n            \"DeepSeek-R1-Distill-Llama-8B\",\n            \"DeepSeek-R1-Distill-Llama-8B\",\n        ),\n        (\n            \"DeepSeek-R1-Distill-Llama-70B\",\n            \"DeepSeek-R1-Distill-Llama-70B\",\n        ),\n        // Gemma\n        (\"gemma-3-12b-it\", \"gemma-3-12b-it\"),\n        (\"gemma-2-27b-it\", \"gemma-2-27b-it\"),\n        (\"gemma-2-9b-it\", \"gemma-2-9b-it\"),\n        (\"gemma-2-2b-it\", \"gemma-2-2b-it\"),\n        (\"gemma-3-1b-it\", \"gemma-3-1b-it\"),\n        (\"gemma-3-4b-it\", \"gemma-3-4b-it\"),\n        (\"gemma-3-27b-it\", \"gemma-3-27b-it\"),\n        
(\"gemma-3n-E4B-it\", \"gemma-3n-E4B-it\"),\n        (\"gemma-3n-E2B-it\", \"gemma-3n-E2B-it\"),\n        // Phi\n        (\"Phi-4\", \"Phi-4\"),\n        (\"Phi-3.5-mini-instruct\", \"Phi-3.5-mini-instruct\"),\n        (\"Phi-3-mini-4k-instruct\", \"Phi-3-mini-4k-instruct\"),\n        (\"Phi-4-mini-instruct\", \"Phi-4-mini-instruct\"),\n        (\"Phi-4-reasoning\", \"Phi-4-reasoning\"),\n        (\"Phi-4-mini-reasoning\", \"Phi-4-mini-reasoning\"),\n        // Llama 4\n        (\n            \"Llama-4-Scout-17B-16E-Instruct\",\n            \"Llama-4-Scout-17B-16E-Instruct\",\n        ),\n        (\n            \"Llama-4-Maverick-17B-128E-Instruct\",\n            \"Llama-4-Maverick-17B-128E-Instruct\",\n        ),\n    ];\n\n    for &(hf_suffix, mlx_base) in mappings {\n        let mapped_suffix = hf_suffix.to_lowercase();\n        if repo_lower == mapped_suffix || normalized_repo == mapped_suffix {\n            let base_lower = mlx_base.to_lowercase();\n            push_unique_candidate(&mut candidates, format!(\"{}-4bit\", base_lower));\n            push_unique_candidate(&mut candidates, format!(\"{}-8bit\", base_lower));\n            push_unique_candidate(&mut candidates, base_lower);\n            return candidates;\n        }\n    }\n\n    // Fallback heuristic: normalize explicit MLX names and try common variants.\n    if !normalized_repo.is_empty() {\n        push_unique_candidate(&mut candidates, format!(\"{}-4bit\", normalized_repo));\n        push_unique_candidate(&mut candidates, format!(\"{}-8bit\", normalized_repo));\n        // Some mlx-community repos use a -MLX- infix (e.g. 
Model-MLX-4bit)\n        push_unique_candidate(&mut candidates, format!(\"{}-mlx-4bit\", normalized_repo));\n        push_unique_candidate(&mut candidates, format!(\"{}-mlx-8bit\", normalized_repo));\n        push_unique_candidate(&mut candidates, normalized_repo.clone());\n    }\n\n    let stripped = strip_trailing_common_model_suffixes(&normalized_repo);\n    if !stripped.is_empty() && stripped != normalized_repo {\n        push_unique_candidate(&mut candidates, format!(\"{}-4bit\", stripped));\n        push_unique_candidate(&mut candidates, format!(\"{}-8bit\", stripped));\n        push_unique_candidate(&mut candidates, format!(\"{}-mlx-4bit\", stripped));\n        push_unique_candidate(&mut candidates, format!(\"{}-mlx-8bit\", stripped));\n        push_unique_candidate(&mut candidates, stripped);\n    }\n\n    candidates\n}\n\n/// Check if any MLX candidates for an HF model appear in the installed set.\npub fn is_model_installed_mlx(hf_name: &str, installed: &HashSet<String>) -> bool {\n    let candidates = hf_name_to_mlx_candidates(hf_name);\n    candidates.iter().any(|c| installed.contains(c))\n}\n\n/// Given an HF model name, return the best MLX tag to use for pulling.\npub fn mlx_pull_tag(hf_name: &str) -> String {\n    if let Some(repo_id) = explicit_mlx_repo_id(hf_name) {\n        return repo_id;\n    }\n    let candidates = hf_name_to_mlx_candidates(hf_name);\n    // Prefer 4bit (smaller download) for pulling\n    candidates\n        .iter()\n        .find(|c| c.ends_with(\"-4bit\"))\n        .cloned()\n        .unwrap_or_else(|| {\n            candidates.into_iter().next().unwrap_or_else(|| {\n                hf_name\n                    .split('/')\n                    .next_back()\n                    .unwrap_or(hf_name)\n                    .to_lowercase()\n            })\n        })\n}\n\n// ---------------------------------------------------------------------------\n// Ollama name-matching helpers\n// 
---------------------------------------------------------------------------\n\n/// Authoritative mapping from HF repo name (lowercased, after slash) to Ollama tag.\n/// Only models with a known Ollama registry entry are listed here.\n/// If a model is not in this table, it cannot be pulled from Ollama.\nconst OLLAMA_MAPPINGS: &[(&str, &str)] = &[\n    // Meta Llama family\n    (\"llama-3.3-70b-instruct\", \"llama3.3:70b\"),\n    (\"llama-3.2-11b-vision-instruct\", \"llama3.2-vision:11b\"),\n    (\"llama-3.2-3b-instruct\", \"llama3.2:3b\"),\n    (\"llama-3.2-3b\", \"llama3.2:3b\"),\n    (\"llama-3.2-1b-instruct\", \"llama3.2:1b\"),\n    (\"llama-3.2-1b\", \"llama3.2:1b\"),\n    (\"llama-3.1-405b-instruct\", \"llama3.1:405b\"),\n    (\"llama-3.1-405b\", \"llama3.1:405b\"),\n    (\"llama-3.1-70b-instruct\", \"llama3.1:70b\"),\n    (\"llama-3.1-8b-instruct\", \"llama3.1:8b\"),\n    (\"llama-3.1-8b\", \"llama3.1:8b\"),\n    (\"meta-llama-3-8b-instruct\", \"llama3:8b\"),\n    (\"meta-llama-3-8b\", \"llama3:8b\"),\n    (\"llama-2-7b-hf\", \"llama2:7b\"),\n    (\"codellama-34b-instruct-hf\", \"codellama:34b\"),\n    (\"codellama-13b-instruct-hf\", \"codellama:13b\"),\n    (\"codellama-7b-instruct-hf\", \"codellama:7b\"),\n    // Google Gemma\n    (\"gemma-3-12b-it\", \"gemma3:12b\"),\n    (\"gemma-2-27b-it\", \"gemma2:27b\"),\n    (\"gemma-2-9b-it\", \"gemma2:9b\"),\n    (\"gemma-2-2b-it\", \"gemma2:2b\"),\n    // Microsoft Phi\n    (\"phi-4\", \"phi4\"),\n    (\"phi-4-mini-instruct\", \"phi4-mini\"),\n    (\"phi-3.5-mini-instruct\", \"phi3.5\"),\n    (\"phi-3-mini-4k-instruct\", \"phi3\"),\n    (\"phi-3-medium-14b-instruct\", \"phi3:14b\"),\n    (\"phi-2\", \"phi\"),\n    (\"orca-2-7b\", \"orca2:7b\"),\n    (\"orca-2-13b\", \"orca2:13b\"),\n    // Mistral\n    (\"mistral-7b-instruct-v0.3\", \"mistral:7b\"),\n    (\"mistral-7b-instruct-v0.2\", \"mistral:7b\"),\n    (\"mistral-nemo-instruct-2407\", \"mistral-nemo\"),\n    (\"mistral-small-24b-instruct-2501\", 
\"mistral-small:24b\"),\n    (\"mistral-large-instruct-2407\", \"mistral-large\"),\n    (\"mixtral-8x7b-instruct-v0.1\", \"mixtral:8x7b\"),\n    (\"mixtral-8x22b-instruct-v0.1\", \"mixtral:8x22b\"),\n    // Qwen 2 / 2.5\n    (\"qwen2-1.5b-instruct\", \"qwen2:1.5b\"),\n    (\"qwen2.5-72b-instruct\", \"qwen2.5:72b\"),\n    (\"qwen2.5-32b-instruct\", \"qwen2.5:32b\"),\n    (\"qwen2.5-14b-instruct\", \"qwen2.5:14b\"),\n    (\"qwen2.5-7b-instruct\", \"qwen2.5:7b\"),\n    (\"qwen2.5-7b\", \"qwen2.5:7b\"),\n    (\"qwen2.5-3b-instruct\", \"qwen2.5:3b\"),\n    (\"qwen2.5-1.5b-instruct\", \"qwen2.5:1.5b\"),\n    (\"qwen2.5-1.5b\", \"qwen2.5:1.5b\"),\n    (\"qwen2.5-0.5b-instruct\", \"qwen2.5:0.5b\"),\n    (\"qwen2.5-0.5b\", \"qwen2.5:0.5b\"),\n    (\"qwen2.5-coder-32b-instruct\", \"qwen2.5-coder:32b\"),\n    (\"qwen2.5-coder-14b-instruct\", \"qwen2.5-coder:14b\"),\n    (\"qwen2.5-coder-7b-instruct\", \"qwen2.5-coder:7b\"),\n    (\"qwen2.5-coder-1.5b-instruct\", \"qwen2.5-coder:1.5b\"),\n    (\"qwen2.5-coder-0.5b-instruct\", \"qwen2.5-coder:0.5b\"),\n    (\"qwen2.5-vl-7b-instruct\", \"qwen2.5vl:7b\"),\n    (\"qwen2.5-vl-3b-instruct\", \"qwen2.5vl:3b\"),\n    // Qwen 3\n    (\"qwen3-235b-a22b\", \"qwen3:235b\"),\n    (\"qwen3-32b\", \"qwen3:32b\"),\n    (\"qwen3-30b-a3b\", \"qwen3:30b-a3b\"),\n    (\"qwen3-30b-a3b-instruct-2507\", \"qwen3:30b-a3b\"),\n    (\"qwen3-14b\", \"qwen3:14b\"),\n    (\"qwen3-8b\", \"qwen3:8b\"),\n    (\"qwen3-4b\", \"qwen3:4b\"),\n    (\"qwen3-4b-instruct-2507\", \"qwen3:4b\"),\n    (\"qwen3-1.7b-base\", \"qwen3:1.7b\"),\n    (\"qwen3-0.6b\", \"qwen3:0.6b\"),\n    (\"qwen3-coder-30b-a3b-instruct\", \"qwen3-coder\"),\n    // Qwen 3.5\n    (\"qwen3.5-27b\", \"qwen3.5\"),\n    (\"qwen3.5-35b-a3b\", \"qwen3.5:35b\"),\n    (\"qwen3.5-122b-a10b\", \"qwen3.5:122b\"),\n    // Qwen3-Coder-Next\n    (\"qwen3-coder-next\", \"qwen3-coder-next\"),\n    // DeepSeek\n    (\"deepseek-v3\", \"deepseek-v3\"),\n    (\"deepseek-v3.2\", \"deepseek-v3\"),\n    
(\"deepseek-r1\", \"deepseek-r1\"),\n    (\"deepseek-r1-0528\", \"deepseek-r1\"),\n    (\"deepseek-r1-distill-qwen-32b\", \"deepseek-r1:32b\"),\n    (\"deepseek-r1-distill-qwen-14b\", \"deepseek-r1:14b\"),\n    (\"deepseek-r1-distill-qwen-7b\", \"deepseek-r1:7b\"),\n    (\"deepseek-coder-v2-lite-instruct\", \"deepseek-coder-v2:16b\"),\n    // Community / other\n    (\"tinyllama-1.1b-chat-v1.0\", \"tinyllama\"),\n    (\"stablelm-2-1_6b-chat\", \"stablelm2:1.6b\"),\n    (\"yi-6b-chat\", \"yi:6b\"),\n    (\"yi-34b-chat\", \"yi:34b\"),\n    (\"starcoder2-7b\", \"starcoder2:7b\"),\n    (\"starcoder2-15b\", \"starcoder2:15b\"),\n    (\"falcon-7b-instruct\", \"falcon:7b\"),\n    (\"falcon-40b-instruct\", \"falcon:40b\"),\n    (\"falcon-180b-chat\", \"falcon:180b\"),\n    (\"falcon3-7b-instruct\", \"falcon3:7b\"),\n    (\"openchat-3.5-0106\", \"openchat:7b\"),\n    (\"vicuna-7b-v1.5\", \"vicuna:7b\"),\n    (\"vicuna-13b-v1.5\", \"vicuna:13b\"),\n    (\"glm-4-9b-chat\", \"glm4:9b\"),\n    (\"solar-10.7b-instruct-v1.0\", \"solar:10.7b\"),\n    (\"zephyr-7b-beta\", \"zephyr:7b\"),\n    (\"c4ai-command-r-v01\", \"command-r\"),\n    (\n        \"nous-hermes-2-mixtral-8x7b-dpo\",\n        \"nous-hermes2-mixtral:8x7b\",\n    ),\n    (\"hermes-3-llama-3.1-8b\", \"hermes3:8b\"),\n    (\"nomic-embed-text-v1.5\", \"nomic-embed-text\"),\n    (\"bge-large-en-v1.5\", \"bge-large\"),\n    (\"smollm2-135m-instruct\", \"smollm2:135m\"),\n    (\"smollm2-135m\", \"smollm2:135m\"),\n    // Google Gemma 3n\n    (\"gemma-3n-e4b-it\", \"gemma3n:e4b\"),\n    (\"gemma-3n-e2b-it\", \"gemma3n:e2b\"),\n    // Microsoft Phi-4 reasoning\n    (\"phi-4-reasoning\", \"phi4-reasoning\"),\n    (\"phi-4-mini-reasoning\", \"phi4-mini-reasoning\"),\n    // DeepSeek V3.2 Speciale (no local Ollama tag yet, maps to v3)\n    (\"deepseek-v3.2-speciale\", \"deepseek-v3\"),\n    // Liquid AI LFM2\n    (\"lfm2-350m\", \"lfm2:350m\"),\n    (\"lfm2-700m\", \"lfm2:700m\"),\n    (\"lfm2-1.2b\", \"lfm2:1.2b\"),\n    
(\"lfm2-2.6b\", \"lfm2:2.6b\"),\n    (\"lfm2-2.6b-exp\", \"lfm2:2.6b\"),\n    (\"lfm2-8b-a1b\", \"lfm2:8b-a1b\"),\n    (\"lfm2-24b-a2b\", \"lfm2:24b\"),\n    // Liquid AI LFM2.5\n    (\"lfm2.5-1.2b-instruct\", \"lfm2.5:1.2b\"),\n    (\"lfm2.5-1.2b-thinking\", \"lfm2.5-thinking:1.2b\"),\n];\n\n/// Look up the Ollama tag for an HF repo name. Returns the first match\n/// from `OLLAMA_MAPPINGS`, or `None` if the model has no known Ollama equivalent.\nfn lookup_ollama_tag(hf_name: &str) -> Option<&'static str> {\n    let repo = hf_name\n        .split('/')\n        .next_back()\n        .unwrap_or(hf_name)\n        .to_lowercase();\n    OLLAMA_MAPPINGS\n        .iter()\n        .find(|&&(hf_suffix, _)| repo == hf_suffix)\n        .map(|&(_, tag)| tag)\n}\n\n/// Map a HuggingFace model name to Ollama candidate tags for install checking.\n/// Returns candidates from the authoritative mapping table only.\npub fn hf_name_to_ollama_candidates(hf_name: &str) -> Vec<String> {\n    match lookup_ollama_tag(hf_name) {\n        Some(tag) => vec![tag.to_string()],\n        None => vec![],\n    }\n}\n\n/// Returns `true` if this HF model has a known Ollama registry entry\n/// and can be pulled.\npub fn has_ollama_mapping(hf_name: &str) -> bool {\n    lookup_ollama_tag(hf_name).is_some()\n}\n\nfn ollama_installed_matches_candidate(installed_name: &str, candidate: &str) -> bool {\n    if installed_name == candidate {\n        return true;\n    }\n\n    // Allow variant tags reported by `ollama list`, e.g.\n    // candidate: \"qwen2.5-coder:7b\"\n    // installed: \"qwen2.5-coder:7b-instruct-q4_K_M\"\n    if candidate.contains(':') {\n        return installed_name.starts_with(&format!(\"{candidate}-\"));\n    }\n\n    false\n}\n\n/// Check if any of the Ollama candidates for an HF model appear in the\n/// installed set.\npub fn is_model_installed(hf_name: &str, installed: &HashSet<String>) -> bool {\n    let candidates = hf_name_to_ollama_candidates(hf_name);\n    
candidates.iter().any(|candidate| {\n        installed\n            .iter()\n            .any(|installed_name| ollama_installed_matches_candidate(installed_name, candidate))\n    })\n}\n\n/// Given an HF model name, return the Ollama tag to use for pulling.\n/// Returns `None` if the model has no known Ollama mapping.\npub fn ollama_pull_tag(hf_name: &str) -> Option<String> {\n    lookup_ollama_tag(hf_name).map(|s| s.to_string())\n}\n\n#[cfg(test)]\nmod tests {\n    use super::*;\n\n    #[test]\n    fn test_hf_name_to_mlx_candidates() {\n        let candidates = hf_name_to_mlx_candidates(\"meta-llama/Llama-3.1-8B-Instruct\");\n        assert!(\n            candidates\n                .iter()\n                .any(|c| c.contains(\"llama-3.1-8b-instruct\"))\n        );\n        assert!(candidates.iter().any(|c| c.ends_with(\"-4bit\")));\n        assert!(candidates.iter().any(|c| c.ends_with(\"-8bit\")));\n\n        let qwen = hf_name_to_mlx_candidates(\"Qwen/Qwen2.5-Coder-14B-Instruct\");\n        assert!(\n            qwen.iter()\n                .any(|c| c.contains(\"qwen2.5-coder-14b-instruct\"))\n        );\n    }\n\n    #[test]\n    fn test_hf_name_to_mlx_candidates_qwen35() {\n        let candidates = hf_name_to_mlx_candidates(\"Qwen/Qwen3.5-9B\");\n        assert!(candidates.iter().any(|c| c == \"qwen3.5-9b-4bit\"));\n        assert!(candidates.iter().any(|c| c == \"qwen3.5-9b-8bit\"));\n    }\n\n    #[test]\n    fn test_hf_name_to_mlx_candidates_llama4() {\n        let candidates = hf_name_to_mlx_candidates(\"meta-llama/Llama-4-Scout-17B-16E-Instruct\");\n        assert!(candidates.iter().any(|c| c.contains(\"llama-4-scout\")));\n        assert!(candidates.iter().any(|c| c.ends_with(\"-4bit\")));\n    }\n\n    #[test]\n    fn test_hf_name_to_mlx_candidates_gemma3() {\n        let candidates = hf_name_to_mlx_candidates(\"google/gemma-3-27b-it\");\n        assert!(candidates.iter().any(|c| c == \"gemma-3-27b-it-4bit\"));\n        
assert!(candidates.iter().any(|c| c == \"gemma-3-27b-it-8bit\"));\n    }\n\n    #[test]\n    fn test_hf_name_to_mlx_fallback_generates_mlx_infix_candidates() {\n        // For models not in the explicit mapping, the fallback should also\n        // generate candidates with the -mlx- infix pattern\n        let candidates = hf_name_to_mlx_candidates(\"SomeOrg/SomeNewModel-7B\");\n        assert!(candidates.iter().any(|c| c == \"somenewmodel-7b-mlx-4bit\"));\n        assert!(candidates.iter().any(|c| c == \"somenewmodel-7b-mlx-8bit\"));\n    }\n\n    #[test]\n    fn test_hf_name_to_mlx_candidates_normalizes_explicit_mlx_repo() {\n        let candidates =\n            hf_name_to_mlx_candidates(\"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit\");\n\n        assert!(\n            candidates\n                .contains(&\"lmstudio-community/qwen3-coder-30b-a3b-instruct-mlx-8bit\".to_string())\n        );\n        assert!(candidates.contains(&\"qwen3-coder-30b-a3b-instruct-4bit\".to_string()));\n        assert!(candidates.contains(&\"qwen3-coder-30b-a3b-instruct-8bit\".to_string()));\n        assert!(!candidates.iter().any(|c| c.contains(\"-8bit-4bit\")));\n        assert!(!candidates.iter().any(|c| c.contains(\"-8bit-8bit\")));\n    }\n\n    #[test]\n    fn test_mlx_pull_tag_prefers_explicit_repo_id() {\n        let tag = mlx_pull_tag(\"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit\");\n        assert_eq!(\n            tag,\n            \"lmstudio-community/qwen3-coder-30b-a3b-instruct-mlx-8bit\"\n        );\n    }\n\n    #[test]\n    fn test_mlx_cache_scan_parsing() {\n        // Test that the candidate matching works with cache-style names\n        let mut installed = HashSet::new();\n        installed.insert(\"llama-3.1-8b-instruct-4bit\".to_string());\n\n        assert!(is_model_installed_mlx(\n            \"meta-llama/Llama-3.1-8B-Instruct\",\n            &installed\n        ));\n        // Should not match unrelated model\n        
assert!(!is_model_installed_mlx(\n            \"Qwen/Qwen2.5-7B-Instruct\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_is_model_installed_mlx() {\n        let mut installed = HashSet::new();\n        installed.insert(\"qwen2.5-coder-14b-instruct-8bit\".to_string());\n\n        assert!(is_model_installed_mlx(\n            \"Qwen/Qwen2.5-Coder-14B-Instruct\",\n            &installed\n        ));\n        assert!(!is_model_installed_mlx(\n            \"Qwen/Qwen2.5-14B-Instruct\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_is_model_installed_mlx_with_owner_prefixed_repo_id() {\n        let mut installed = HashSet::new();\n        installed.insert(\"lmstudio-community/qwen3-coder-30b-a3b-instruct-mlx-8bit\".to_string());\n\n        assert!(is_model_installed_mlx(\n            \"lmstudio-community/Qwen3-Coder-30B-A3B-Instruct-MLX-8bit\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_qwen_coder_14b_matches_coder_entry() {\n        // \"qwen2.5-coder:14b\" from `ollama list` should match\n        // the HF entry \"Qwen/Qwen2.5-Coder-14B-Instruct\", NOT\n        // the base \"Qwen/Qwen2.5-14B-Instruct\".\n        let mut installed = HashSet::new();\n        installed.insert(\"qwen2.5-coder:14b\".to_string());\n        installed.insert(\"qwen2.5-coder\".to_string());\n\n        assert!(is_model_installed(\n            \"Qwen/Qwen2.5-Coder-14B-Instruct\",\n            &installed\n        ));\n        // Must NOT match the non-coder model\n        assert!(!is_model_installed(\"Qwen/Qwen2.5-14B-Instruct\", &installed));\n    }\n\n    #[test]\n    fn test_qwen_base_does_not_match_coder() {\n        // \"qwen2.5:14b\" from `ollama list` should match the base model,\n        // not the coder variant.\n        let mut installed = HashSet::new();\n        installed.insert(\"qwen2.5:14b\".to_string());\n        installed.insert(\"qwen2.5\".to_string());\n\n        
assert!(is_model_installed(\"Qwen/Qwen2.5-14B-Instruct\", &installed));\n        assert!(!is_model_installed(\n            \"Qwen/Qwen2.5-Coder-14B-Instruct\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_installed_variant_suffix_matches_ollama_candidate() {\n        // Real-world `ollama list` may include variant suffixes that still map\n        // to the canonical pull tag in OLLAMA_MAPPINGS.\n        let mut installed = HashSet::new();\n        installed.insert(\"qwen2.5-coder:7b-instruct\".to_string());\n\n        assert!(is_model_installed(\n            \"Qwen/Qwen2.5-Coder-7B-Instruct\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_candidates_for_coder_model() {\n        let candidates = hf_name_to_ollama_candidates(\"Qwen/Qwen2.5-Coder-14B-Instruct\");\n        assert!(candidates.contains(&\"qwen2.5-coder:14b\".to_string()));\n    }\n\n    #[test]\n    fn test_candidates_for_base_model() {\n        let candidates = hf_name_to_ollama_candidates(\"Qwen/Qwen2.5-14B-Instruct\");\n        assert!(candidates.contains(&\"qwen2.5:14b\".to_string()));\n    }\n\n    #[test]\n    fn test_llama_mapping() {\n        let candidates = hf_name_to_ollama_candidates(\"meta-llama/Llama-3.1-8B-Instruct\");\n        assert!(candidates.contains(&\"llama3.1:8b\".to_string()));\n    }\n\n    #[test]\n    fn test_deepseek_coder_mapping() {\n        let candidates =\n            hf_name_to_ollama_candidates(\"deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct\");\n        assert!(candidates.contains(&\"deepseek-coder-v2:16b\".to_string()));\n    }\n\n    #[test]\n    fn test_normalize_ollama_host_with_scheme() {\n        assert_eq!(\n            normalize_ollama_host(\"https://ollama.example.com:11434\"),\n            Some(\"https://ollama.example.com:11434\".to_string())\n        );\n    }\n\n    #[test]\n    fn test_normalize_ollama_host_without_scheme() {\n        assert_eq!(\n            
normalize_ollama_host(\"ollama.example.com:11434\"),\n            Some(\"http://ollama.example.com:11434\".to_string())\n        );\n    }\n\n    #[test]\n    fn test_normalize_ollama_host_rejects_unsupported_scheme() {\n        assert_eq!(\n            normalize_ollama_host(\"ftp://ollama.example.com:11434\"),\n            None\n        );\n    }\n\n    #[test]\n    fn test_validate_gguf_filename_valid() {\n        assert!(validate_gguf_filename(\"Llama-3.1-8B-Q4_K_M.gguf\").is_ok());\n        assert!(validate_gguf_filename(\"model.gguf\").is_ok());\n    }\n\n    #[test]\n    fn test_validate_gguf_filename_traversal() {\n        assert!(validate_gguf_filename(\"../../outside.gguf\").is_err());\n        assert!(validate_gguf_filename(\"../evil.gguf\").is_err());\n        assert!(validate_gguf_filename(\"foo/../bar.gguf\").is_err());\n    }\n\n    #[test]\n    fn test_validate_gguf_filename_absolute() {\n        assert!(validate_gguf_filename(\"/etc/passwd\").is_err());\n        assert!(validate_gguf_filename(\"/tmp/model.gguf\").is_err());\n    }\n\n    #[test]\n    fn test_validate_gguf_filename_bad_extension() {\n        assert!(validate_gguf_filename(\"malware.exe\").is_err());\n        assert!(validate_gguf_filename(\"script.sh\").is_err());\n        assert!(validate_gguf_filename(\"./model.guuf\").is_err());\n    }\n\n    #[test]\n    fn test_validate_gguf_filename_empty() {\n        assert!(validate_gguf_filename(\"\").is_err());\n    }\n\n    #[test]\n    fn test_validate_gguf_filename_subdirectory() {\n        assert!(validate_gguf_filename(\"subdir/model.gguf\").is_err());\n    }\n\n    #[test]\n    fn test_validate_gguf_filename_rejects_non_basename_forms() {\n        assert!(validate_gguf_filename(\"./model.gguf\").is_err());\n        assert!(validate_gguf_filename(\"model.gguf/\").is_err());\n        assert!(validate_gguf_filename(\".\\\\model.gguf\").is_err());\n        assert!(validate_gguf_filename(\"C:/models/model.gguf\").is_err());\n        
assert!(validate_gguf_filename(\"C:\\\\models\\\\model.gguf\").is_err());\n    }\n\n    #[test]\n    fn test_parse_repo_gguf_entries_filters_unsafe_paths() {\n        let entries = vec![\n            serde_json::json!({\"path\": \"good.gguf\", \"size\": 123u64}),\n            serde_json::json!({\"path\": \"../escape.gguf\", \"size\": 456u64}),\n            serde_json::json!({\"path\": \"nested/model.gguf\", \"size\": 789u64}),\n            serde_json::json!({\"path\": \"./model.gguf\", \"size\": 99u64}),\n            serde_json::json!({\"path\": \"readme.md\", \"size\": 12u64}),\n        ];\n\n        let files = parse_repo_gguf_entries(entries);\n        assert_eq!(files, vec![(\"good.gguf\".to_string(), 123u64)]);\n    }\n\n    // ────────────────────────────────────────────────────────────────────\n    // GGUF candidate generation tests\n    // ────────────────────────────────────────────────────────────────────\n\n    #[test]\n    fn test_hf_name_to_gguf_candidates_generates_common_patterns() {\n        // Use a model without a hardcoded mapping to test heuristic generation\n        let candidates = hf_name_to_gguf_candidates(\"SomeOrg/Cool-Model-7B\");\n        assert!(\n            candidates\n                .iter()\n                .any(|c| c == \"bartowski/Cool-Model-7B-GGUF\"),\n            \"Should generate bartowski candidate, got: {:?}\",\n            candidates\n        );\n        assert!(\n            candidates\n                .iter()\n                .any(|c| c == \"ggml-org/Cool-Model-7B-GGUF\"),\n            \"Should generate ggml-org candidate, got: {:?}\",\n            candidates\n        );\n        assert!(\n            candidates\n                .iter()\n                .any(|c| c == \"TheBloke/Cool-Model-7B-GGUF\"),\n            \"Should generate TheBloke candidate, got: {:?}\",\n            candidates\n        );\n    }\n\n    #[test]\n    fn test_hf_name_to_gguf_candidates_strips_owner() {\n        // Should use the model name part, 
not the full \"owner/name\"\n        let candidates = hf_name_to_gguf_candidates(\"Qwen/Qwen2.5-7B-Instruct\");\n        for c in &candidates {\n            assert!(\n                !c.contains(\"Qwen/Qwen\"),\n                \"Candidate should not contain original owner prefix: {}\",\n                c\n            );\n        }\n    }\n\n    #[test]\n    fn test_lookup_gguf_repo_known_mappings() {\n        // Models with hardcoded mappings should be found\n        assert!(lookup_gguf_repo(\"meta-llama/Llama-3.1-8B-Instruct\").is_some());\n        assert!(lookup_gguf_repo(\"deepseek-r1\").is_some());\n    }\n\n    #[test]\n    fn test_lookup_gguf_repo_unknown_returns_none() {\n        assert!(lookup_gguf_repo(\"totally-unknown/model-xyz\").is_none());\n    }\n\n    #[test]\n    fn test_has_gguf_mapping_matches_known_models() {\n        assert!(has_gguf_mapping(\"meta-llama/Llama-3.1-8B-Instruct\"));\n        assert!(!has_gguf_mapping(\"some-random/UnknownModel\"));\n    }\n\n    #[test]\n    fn test_gguf_candidates_fallback_covers_major_providers() {\n        // For a model without a hardcoded mapping, candidates should cover\n        // the major GGUF providers\n        let candidates = hf_name_to_gguf_candidates(\"SomeOrg/NewModel-7B\");\n        assert!(candidates.iter().any(|c| c.starts_with(\"bartowski/\")));\n        assert!(candidates.iter().any(|c| c.starts_with(\"ggml-org/\")));\n        assert!(candidates.iter().any(|c| c.starts_with(\"TheBloke/\")));\n        assert!(candidates.iter().all(|c| c.ends_with(\"-GGUF\")));\n    }\n\n    #[test]\n    fn test_gguf_candidates_known_mapping_returns_single() {\n        // Models with a hardcoded mapping should return just that repo\n        let candidates = hf_name_to_gguf_candidates(\"meta-llama/Llama-3.1-8B-Instruct\");\n        assert_eq!(candidates.len(), 1);\n        assert!(candidates[0].contains(\"GGUF\"));\n    }\n\n    // ── select_best_gguf ─────────────────────────────────────────────\n\n    
#[test]\n    fn test_select_best_gguf_prefers_higher_quality() {\n        let files = vec![\n            (\"model-Q2_K.gguf\".to_string(), 2_000_000_000u64),\n            (\"model-Q4_K_M.gguf\".to_string(), 4_000_000_000u64),\n            (\"model-Q8_0.gguf\".to_string(), 8_000_000_000u64),\n        ];\n        let result = LlamaCppProvider::select_best_gguf(&files, 10.0);\n        assert!(result.is_some());\n        let (name, _) = result.unwrap();\n        assert!(name.contains(\"Q8_0\"), \"should prefer Q8, got: {}\", name);\n    }\n\n    #[test]\n    fn test_select_best_gguf_respects_budget() {\n        let files = vec![\n            (\"model-Q2_K.gguf\".to_string(), 2_000_000_000u64),\n            (\"model-Q4_K_M.gguf\".to_string(), 4_000_000_000u64),\n            (\"model-Q8_0.gguf\".to_string(), 8_000_000_000u64),\n        ];\n        // Budget ~3.7GB → Q2_K fits\n        let result = LlamaCppProvider::select_best_gguf(&files, 3.7);\n        assert!(result.is_some());\n        let (name, _) = result.unwrap();\n        assert!(\n            name.contains(\"Q2_K\"),\n            \"should select Q2_K for 3.7GB budget, got: {}\",\n            name\n        );\n    }\n\n    #[test]\n    fn test_select_best_gguf_nothing_fits() {\n        let files = vec![(\"model-Q2_K.gguf\".to_string(), 8_000_000_000u64)];\n        let result = LlamaCppProvider::select_best_gguf(&files, 1.0);\n        assert!(result.is_none());\n    }\n\n    #[test]\n    fn test_select_best_gguf_skips_split_files() {\n        let files = vec![\n            (\n                \"model-Q4_K_M-00001-of-00003.gguf\".to_string(),\n                4_000_000_000u64,\n            ),\n            (\"model-Q2_K.gguf\".to_string(), 2_000_000_000u64),\n        ];\n        let result = LlamaCppProvider::select_best_gguf(&files, 10.0);\n        assert!(result.is_some());\n        let (name, _) = result.unwrap();\n        assert!(\n            name.contains(\"Q2_K\"),\n            \"should skip split file, got: 
{}\",\n            name\n        );\n    }\n\n    #[test]\n    fn test_select_best_gguf_empty_list() {\n        let result = LlamaCppProvider::select_best_gguf(&[], 10.0);\n        assert!(result.is_none());\n    }\n\n    // ── is_split_file ────────────────────────────────────────────────\n\n    #[test]\n    fn test_is_split_file() {\n        assert!(is_split_file(\"model-00001-of-00003.gguf\"));\n        assert!(!is_split_file(\"model-Q4_K_M.gguf\"));\n        assert!(!is_split_file(\"model.gguf\"));\n    }\n\n    // ── urlencoding ──────────────────────────────────────────────────\n\n    #[test]\n    fn test_urlencoding_ascii() {\n        assert_eq!(urlencoding::encode(\"hello\"), \"hello\");\n        assert_eq!(urlencoding::encode(\"test-model_v1.0\"), \"test-model_v1.0\");\n    }\n\n    #[test]\n    fn test_urlencoding_special_chars() {\n        assert_eq!(urlencoding::encode(\"hello world\"), \"hello%20world\");\n        assert_eq!(urlencoding::encode(\"a+b\"), \"a%2Bb\");\n        assert_eq!(urlencoding::encode(\"foo/bar\"), \"foo%2Fbar\");\n    }\n\n    #[test]\n    fn test_urlencoding_empty() {\n        assert_eq!(urlencoding::encode(\"\"), \"\");\n    }\n\n    // ── is_model_installed_llamacpp ──────────────────────────────────\n\n    #[test]\n    fn test_is_model_installed_llamacpp_exact() {\n        let mut installed = HashSet::new();\n        installed.insert(\"llama-3.1-8b-instruct\".to_string());\n        assert!(is_model_installed_llamacpp(\n            \"meta-llama/Llama-3.1-8B-Instruct\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_is_model_installed_llamacpp_stripped_suffixes() {\n        let mut installed = HashSet::new();\n        installed.insert(\"llama-3.1-8b\".to_string());\n        assert!(is_model_installed_llamacpp(\n            \"meta-llama/Llama-3.1-8B-Instruct\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_is_model_installed_llamacpp_not_installed() {\n        let installed = 
HashSet::new();\n        assert!(!is_model_installed_llamacpp(\n            \"meta-llama/Llama-3.1-8B-Instruct\",\n            &installed\n        ));\n    }\n\n    // ── gguf_pull_tag ────────────────────────────────────────────────\n\n    #[test]\n    fn test_gguf_pull_tag_known() {\n        let tag = gguf_pull_tag(\"meta-llama/Llama-3.1-8B-Instruct\");\n        assert!(tag.is_some());\n        assert!(tag.unwrap().contains(\"GGUF\"));\n    }\n\n    #[test]\n    fn test_gguf_pull_tag_unknown() {\n        assert!(gguf_pull_tag(\"totally-unknown/model-xyz\").is_none());\n    }\n\n    // ── has_ollama_mapping ───────────────────────────────────────────\n\n    #[test]\n    fn test_has_ollama_mapping_known() {\n        assert!(has_ollama_mapping(\"meta-llama/Llama-3.1-8B-Instruct\"));\n        assert!(has_ollama_mapping(\"Qwen/Qwen2.5-7B-Instruct\"));\n    }\n\n    #[test]\n    fn test_has_ollama_mapping_unknown() {\n        assert!(!has_ollama_mapping(\"totally-unknown/model-xyz\"));\n    }\n\n    // ── ollama_pull_tag ──────────────────────────────────────────────\n\n    #[test]\n    fn test_ollama_pull_tag_known() {\n        let tag = ollama_pull_tag(\"meta-llama/Llama-3.1-8B-Instruct\");\n        assert_eq!(tag, Some(\"llama3.1:8b\".to_string()));\n    }\n\n    #[test]\n    fn test_ollama_pull_tag_unknown() {\n        assert!(ollama_pull_tag(\"totally-unknown/model-xyz\").is_none());\n    }\n\n    // ── mlx_pull_tag ─────────────────────────────────────────────────\n\n    #[test]\n    fn test_mlx_pull_tag_prefers_4bit() {\n        let tag = mlx_pull_tag(\"meta-llama/Llama-3.1-8B-Instruct\");\n        assert!(tag.ends_with(\"-4bit\"), \"should prefer 4bit, got: {}\", tag);\n    }\n\n    #[test]\n    fn test_mlx_pull_tag_fallback() {\n        let tag = mlx_pull_tag(\"SomeUnknown/Model-7B\");\n        assert!(!tag.is_empty());\n    }\n\n    // ── ollama_installed_matches_candidate ────────────────────────────\n\n    #[test]\n    fn 
test_ollama_installed_matches_exact() {\n        assert!(ollama_installed_matches_candidate(\n            \"llama3.1:8b\",\n            \"llama3.1:8b\"\n        ));\n    }\n\n    #[test]\n    fn test_ollama_installed_matches_variant_suffix() {\n        assert!(ollama_installed_matches_candidate(\n            \"llama3.1:8b-instruct-q4_K_M\",\n            \"llama3.1:8b\"\n        ));\n    }\n\n    #[test]\n    fn test_ollama_installed_no_match() {\n        assert!(!ollama_installed_matches_candidate(\n            \"qwen2.5:7b\",\n            \"llama3.1:8b\"\n        ));\n    }\n\n    // ── parse_repo_gguf_entries ──────────────────────────────────────\n\n    #[test]\n    fn test_parse_repo_gguf_entries_valid() {\n        let entries = vec![\n            serde_json::json!({\"path\": \"model-Q4_K_M.gguf\", \"size\": 4_000_000_000u64}),\n            serde_json::json!({\"path\": \"model-Q8_0.gguf\", \"size\": 8_000_000_000u64}),\n        ];\n        let files = parse_repo_gguf_entries(entries);\n        assert_eq!(files.len(), 2);\n        assert_eq!(files[0].0, \"model-Q4_K_M.gguf\");\n        assert_eq!(files[1].0, \"model-Q8_0.gguf\");\n    }\n\n    #[test]\n    fn test_parse_repo_gguf_entries_missing_size_defaults_to_zero() {\n        let entries = vec![serde_json::json!({\"path\": \"model.gguf\"})];\n        let files = parse_repo_gguf_entries(entries);\n        assert_eq!(files.len(), 1);\n        assert_eq!(files[0].1, 0);\n    }\n\n    #[test]\n    fn test_parse_repo_gguf_entries_skips_non_gguf() {\n        let entries = vec![\n            serde_json::json!({\"path\": \"README.md\", \"size\": 1000u64}),\n            serde_json::json!({\"path\": \"config.json\", \"size\": 500u64}),\n            serde_json::json!({\"path\": \"model.gguf\", \"size\": 4_000_000_000u64}),\n        ];\n        let files = parse_repo_gguf_entries(entries);\n        assert_eq!(files.len(), 1);\n        assert_eq!(files[0].0, \"model.gguf\");\n    }\n\n    // ── hf_name_to_mlx_candidates 
edge cases ─────────────────────────\n\n    #[test]\n    fn test_hf_name_to_mlx_candidates_bare_model_name() {\n        let candidates = hf_name_to_mlx_candidates(\"Phi-4\");\n        assert!(candidates.iter().any(|c| c.contains(\"phi-4\")));\n        assert!(candidates.iter().any(|c| c.ends_with(\"-4bit\")));\n    }\n\n    #[test]\n    fn test_hf_name_to_mlx_candidates_no_duplicates() {\n        let candidates = hf_name_to_mlx_candidates(\"meta-llama/Llama-3.1-8B-Instruct\");\n        let unique: HashSet<_> = candidates.iter().collect();\n        assert_eq!(\n            unique.len(),\n            candidates.len(),\n            \"candidates should have no duplicates: {:?}\",\n            candidates\n        );\n    }\n\n    // ── hf_name_to_ollama_candidates edge cases ──────────────────────\n\n    #[test]\n    fn test_hf_name_to_ollama_candidates_unknown_returns_empty() {\n        let candidates = hf_name_to_ollama_candidates(\"totally-unknown/model-xyz\");\n        assert!(candidates.is_empty());\n    }\n\n    #[test]\n    fn test_hf_name_to_ollama_candidates_multiple_models() {\n        // Test a variety of known models\n        assert!(!hf_name_to_ollama_candidates(\"meta-llama/Llama-3.1-8B-Instruct\").is_empty());\n        assert!(!hf_name_to_ollama_candidates(\"Qwen/Qwen2.5-Coder-7B-Instruct\").is_empty());\n        assert!(!hf_name_to_ollama_candidates(\"google/gemma-2-9b-it\").is_empty());\n    }\n\n    // ── Docker Model Runner ─────────────────────────────────────────\n\n    #[test]\n    fn test_docker_mr_catalog_parses() {\n        // The embedded catalog should parse without errors\n        let catalog = docker_mr_catalog();\n        assert!(!catalog.is_empty(), \"Docker MR catalog should not be empty\");\n    }\n\n    #[test]\n    fn test_has_docker_mr_mapping_known() {\n        // Llama 3.1 70B is in both our HF database and Docker Hub ai/ namespace\n        assert!(has_docker_mr_mapping(\"meta-llama/Llama-3.1-70B-Instruct\"));\n    }\n\n    
#[test]\n    fn test_has_docker_mr_mapping_unknown() {\n        assert!(!has_docker_mr_mapping(\"totally-unknown/model-xyz\"));\n    }\n\n    #[test]\n    fn test_docker_mr_pull_tag_returns_ai_prefixed() {\n        let tag = docker_mr_pull_tag(\"meta-llama/Llama-3.1-70B-Instruct\");\n        assert!(tag.is_some());\n        assert!(tag.unwrap().starts_with(\"ai/\"));\n    }\n\n    #[test]\n    fn test_docker_mr_candidates_includes_ai_prefix() {\n        let candidates = hf_name_to_docker_mr_candidates(\"meta-llama/Llama-3.1-70B-Instruct\");\n        assert!(candidates.iter().any(|c| c.starts_with(\"ai/\")));\n    }\n\n    #[test]\n    fn test_docker_mr_candidates_unknown_returns_empty() {\n        let candidates = hf_name_to_docker_mr_candidates(\"totally-unknown/model-xyz\");\n        assert!(candidates.is_empty());\n    }\n\n    #[test]\n    fn test_is_model_installed_docker_mr_exact() {\n        let mut installed = HashSet::new();\n        installed.insert(\"ai/llama3.1:70b\".to_string());\n        installed.insert(\"llama3.1:70b\".to_string());\n        installed.insert(\"llama3.1\".to_string());\n        assert!(is_model_installed_docker_mr(\n            \"meta-llama/Llama-3.1-70B-Instruct\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_is_model_installed_docker_mr_variant_suffix() {\n        let mut installed = HashSet::new();\n        installed.insert(\"ai/llama3.1:70b-q4_k_m\".to_string());\n        assert!(is_model_installed_docker_mr(\n            \"meta-llama/Llama-3.1-70B-Instruct\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_is_model_installed_docker_mr_not_installed() {\n        let installed = HashSet::new();\n        assert!(!is_model_installed_docker_mr(\n            \"meta-llama/Llama-3.1-70B-Instruct\",\n            &installed\n        ));\n    }\n\n    #[test]\n    fn test_normalize_docker_mr_host_with_scheme() {\n        assert_eq!(\n            
normalize_docker_mr_host(\"https://docker.example.com:12434\"),\n            Some(\"https://docker.example.com:12434\".to_string())\n        );\n    }\n\n    #[test]\n    fn test_normalize_docker_mr_host_without_scheme() {\n        assert_eq!(\n            normalize_docker_mr_host(\"docker.example.com:12434\"),\n            Some(\"http://docker.example.com:12434\".to_string())\n        );\n    }\n\n    #[test]\n    fn test_normalize_docker_mr_host_rejects_unsupported_scheme() {\n        assert_eq!(\n            normalize_docker_mr_host(\"ftp://docker.example.com:12434\"),\n            None\n        );\n    }\n}\n"
  },
  {
    "path": "llmfit-desktop/Cargo.toml",
    "content": "[package]\nname = \"llmfit-desktop\"\nversion.workspace = true\nedition = \"2024\"\nauthors = [\"axjns\"]\ndescription = \"macOS desktop application for llmfit — visual LLM hardware fitting\"\nlicense = \"MIT\"\nrepository = \"https://github.com/AlexsJones/llmfit\"\ndefault-run = \"llmfit-desktop\"\n\n[[bin]]\nname = \"llmfit-desktop\"\npath = \"src/main.rs\"\n\n[dependencies]\nllmfit-core = { path = \"../llmfit-core\" }\ntauri = { version = \"2\", features = [] }\nserde = { version = \"1.0\", features = [\"derive\"] }\nserde_json = \"1.0\"\n\n[build-dependencies]\ntauri-build = { version = \"2\", features = [] }\n\n[features]\ndefault = [\"custom-protocol\"]\ncustom-protocol = [\"tauri/custom-protocol\"]\n"
  },
  {
    "path": "llmfit-desktop/build.rs",
    "content": "fn main() {\n    tauri_build::build()\n}\n"
  },
  {
    "path": "llmfit-desktop/capabilities/default.json",
    "content": "{\n  \"identifier\": \"default\",\n  \"description\": \"Default permissions for llmfit desktop\",\n  \"windows\": [\"main\"],\n  \"permissions\": [\"core:default\"]\n}\n"
  },
  {
    "path": "llmfit-desktop/src/main.rs",
    "content": "#![cfg_attr(not(debug_assertions), windows_subsystem = \"windows\")]\n\nuse llmfit_core::fit::{FitLevel, InferenceRuntime, ModelFit, RunMode};\nuse llmfit_core::hardware::SystemSpecs;\nuse llmfit_core::models::ModelDatabase;\nuse llmfit_core::providers::{ModelProvider, OllamaProvider, PullEvent};\nuse serde::Serialize;\nuse std::sync::Mutex;\nuse tauri::State;\n\n#[derive(Serialize)]\nstruct GpuInfoJs {\n    name: String,\n    vram_gb: Option<f64>,\n    backend: String,\n    count: u32,\n    unified_memory: bool,\n}\n\n#[derive(Serialize)]\nstruct SystemInfo {\n    total_ram_gb: f64,\n    available_ram_gb: f64,\n    cpu_name: String,\n    cpu_cores: usize,\n    gpus: Vec<GpuInfoJs>,\n    unified_memory: bool,\n}\n\n#[derive(Serialize, Clone)]\nstruct ModelFitInfo {\n    name: String,\n    params_b: f64,\n    quant: String,\n    fit_level: String,\n    run_mode: String,\n    score: f64,\n    memory_required_gb: f64,\n    memory_available_gb: f64,\n    utilization_pct: f64,\n    estimated_tps: f64,\n    use_case: String,\n    runtime: String,\n    installed: bool,\n    notes: Vec<String>,\n    release_date: Option<String>,\n}\n\n#[derive(Serialize)]\nstruct PullStatus {\n    status: String,\n    percent: Option<f64>,\n    done: bool,\n    error: Option<String>,\n}\n\nstruct AppState {\n    ollama: OllamaProvider,\n    pull_handle: Mutex<Option<llmfit_core::providers::PullHandle>>,\n}\n\n#[tauri::command]\nfn get_system_specs() -> Result<SystemInfo, String> {\n    let specs = SystemSpecs::detect();\n    let gpus = specs\n        .gpus\n        .iter()\n        .map(|g| GpuInfoJs {\n            name: g.name.clone(),\n            vram_gb: g.vram_gb,\n            backend: format!(\"{:?}\", g.backend),\n            count: g.count,\n            unified_memory: g.unified_memory,\n        })\n        .collect();\n    Ok(SystemInfo {\n        total_ram_gb: specs.total_ram_gb,\n        available_ram_gb: specs.available_ram_gb,\n        cpu_name: 
specs.cpu_name.clone(),\n        cpu_cores: specs.total_cpu_cores,\n        gpus,\n        unified_memory: specs.unified_memory,\n    })\n}\n\n#[tauri::command]\nfn get_model_fits() -> Result<Vec<ModelFitInfo>, String> {\n    let specs = SystemSpecs::detect();\n    let db = ModelDatabase::new();\n\n    let mut fits: Vec<ModelFit> = db\n        .get_all_models()\n        .iter()\n        .map(|m| ModelFit::analyze(m, &specs))\n        .collect();\n\n    fits = llmfit_core::fit::rank_models_by_fit(fits);\n\n    Ok(fits\n        .into_iter()\n        .map(|f| ModelFitInfo {\n            name: f.model.name.clone(),\n            params_b: f.model.parameters_raw.unwrap_or(0) as f64 / 1e9,\n            quant: f.best_quant.clone(),\n            fit_level: match f.fit_level {\n                FitLevel::Perfect => \"Perfect\".to_string(),\n                FitLevel::Good => \"Good\".to_string(),\n                FitLevel::Marginal => \"Marginal\".to_string(),\n                FitLevel::TooTight => \"Too Tight\".to_string(),\n            },\n            run_mode: match f.run_mode {\n                RunMode::Gpu => \"GPU\".to_string(),\n                RunMode::CpuOffload => \"CPU Offload\".to_string(),\n                RunMode::CpuOnly => \"CPU Only\".to_string(),\n                RunMode::MoeOffload => \"MoE Offload\".to_string(),\n            },\n            score: f.score,\n            memory_required_gb: f.memory_required_gb,\n            memory_available_gb: f.memory_available_gb,\n            utilization_pct: f.utilization_pct,\n            estimated_tps: f.estimated_tps,\n            use_case: format!(\"{:?}\", f.use_case),\n            runtime: match f.runtime {\n                InferenceRuntime::LlamaCpp => \"llama.cpp\".to_string(),\n                InferenceRuntime::Mlx => \"MLX\".to_string(),\n            },\n            installed: f.installed,\n            notes: f.notes.clone(),\n            release_date: f.model.release_date.clone(),\n        })\n        
.collect())\n}\n\n#[tauri::command]\nfn start_pull(model_tag: String, state: State<'_, AppState>) -> Result<String, String> {\n    let handle = state.ollama.start_pull(&model_tag)?;\n    let mut pull = state.pull_handle.lock().map_err(|e| e.to_string())?;\n    *pull = Some(handle);\n    Ok(\"started\".to_string())\n}\n\n#[tauri::command]\nfn poll_pull(state: State<'_, AppState>) -> Result<PullStatus, String> {\n    let pull = state.pull_handle.lock().map_err(|e| e.to_string())?;\n    if let Some(ref handle) = *pull {\n        match handle.receiver.try_recv() {\n            Ok(PullEvent::Progress { status, percent }) => Ok(PullStatus {\n                status,\n                percent,\n                done: false,\n                error: None,\n            }),\n            Ok(PullEvent::Done) => Ok(PullStatus {\n                status: \"Complete\".to_string(),\n                percent: Some(100.0),\n                done: true,\n                error: None,\n            }),\n            Ok(PullEvent::Error(e)) => Ok(PullStatus {\n                status: \"Error\".to_string(),\n                percent: None,\n                done: true,\n                error: Some(e),\n            }),\n            Err(std::sync::mpsc::TryRecvError::Empty) => Ok(PullStatus {\n                status: \"Waiting...\".to_string(),\n                percent: None,\n                done: false,\n                error: None,\n            }),\n            Err(std::sync::mpsc::TryRecvError::Disconnected) => Ok(PullStatus {\n                status: \"Complete\".to_string(),\n                percent: Some(100.0),\n                done: true,\n                error: None,\n            }),\n        }\n    } else {\n        Err(\"No pull in progress\".to_string())\n    }\n}\n\n#[tauri::command]\nfn is_ollama_available(state: State<'_, AppState>) -> bool {\n    state.ollama.is_available()\n}\n\nfn main() {\n    tauri::Builder::default()\n        .manage(AppState {\n            ollama: 
OllamaProvider::new(),\n            pull_handle: Mutex::new(None),\n        })\n        .invoke_handler(tauri::generate_handler![\n            get_system_specs,\n            get_model_fits,\n            start_pull,\n            poll_pull,\n            is_ollama_available,\n        ])\n        .run(tauri::generate_context!())\n        .expect(\"error while running tauri application\");\n}\n"
  },
  {
    "path": "llmfit-desktop/tauri.conf.json",
    "content": "{\n  \"productName\": \"llmfit\",\n  \"version\": \"0.4.8\",\n  \"identifier\": \"com.llmfit.desktop\",\n  \"build\": {\n    \"frontendDist\": \"./ui\"\n  },\n  \"app\": {\n    \"withGlobalTauri\": true,\n    \"windows\": [\n      {\n        \"title\": \"llmfit — LLM Hardware Fitting\",\n        \"width\": 1200,\n        \"height\": 800,\n        \"resizable\": true\n      }\n    ],\n    \"security\": {\n      \"csp\": \"default-src 'self'; style-src 'self'; script-src 'self'\"\n    }\n  }\n}\n"
  },
  {
    "path": "llmfit-desktop/ui/app.js",
    "content": "const invoke = window.__TAURI_INTERNALS__\n  ? window.__TAURI_INTERNALS__.invoke\n  : async (cmd) => { console.warn('Tauri not available, cmd:', cmd); return null; };\n\nlet allFits = [];\nlet ollamaAvailable = false;\nlet pullInterval = null;\n\nfunction esc(s) {\n  const d = document.createElement('div');\n  d.textContent = s;\n  return d.innerHTML;\n}\n\nasync function loadSpecs() {\n  try {\n    const specs = await invoke('get_system_specs');\n    if (!specs) return;\n\n    document.getElementById('cpu-name').textContent = specs.cpu_name;\n    document.getElementById('cpu-cores').textContent = specs.cpu_cores + ' cores';\n    document.getElementById('ram-total').textContent = specs.total_ram_gb.toFixed(1) + ' GB';\n    document.getElementById('ram-available').textContent = specs.available_ram_gb.toFixed(1) + ' GB';\n\n    const container = document.getElementById('gpus-container');\n    container.innerHTML = '';\n\n    if (specs.gpus.length === 0) {\n      const card = document.createElement('div');\n      card.className = 'spec-card';\n      card.innerHTML = '<span class=\"spec-label\">GPU</span>' +\n        '<span class=\"spec-value\">No GPU detected</span>';\n      container.appendChild(card);\n    } else {\n      specs.gpus.forEach((gpu, i) => {\n        const card = document.createElement('div');\n        card.className = 'spec-card';\n        const label = specs.gpus.length > 1 ? 'GPU ' + (i + 1) : 'GPU';\n        const countStr = gpu.count > 1 ? ' ×' + gpu.count : '';\n        const vramStr = gpu.vram_gb != null ? gpu.vram_gb.toFixed(1) + ' GB VRAM' : 'Shared memory';\n        const backendStr = gpu.backend !== 'None' ? 
gpu.backend : '';\n        const details = [vramStr, backendStr].filter(Boolean).join(' · ');\n        card.innerHTML = '<span class=\"spec-label\">' + esc(label) + '</span>' +\n          '<span class=\"spec-value\">' + esc(gpu.name + countStr) + '</span>' +\n          '<span class=\"spec-detail\">' + esc(details) + '</span>';\n        container.appendChild(card);\n      });\n    }\n\n    if (specs.unified_memory) {\n      const archCard = document.getElementById('memory-arch-card');\n      archCard.style.display = '';\n      document.getElementById('memory-arch').textContent = 'Unified (CPU + GPU shared)';\n    }\n  } catch (e) {\n    console.error('Failed to load specs:', e);\n    document.getElementById('cpu-name').textContent = 'Error loading specs';\n  }\n}\n\nfunction fitClass(level) {\n  switch (level) {\n    case 'Perfect': return 'fit-perfect';\n    case 'Good': return 'fit-good';\n    case 'Marginal': return 'fit-marginal';\n    default: return 'fit-tight';\n  }\n}\n\nfunction modeClass(mode) {\n  switch (mode) {\n    case 'GPU': return 'mode-gpu';\n    case 'MoE Offload': return 'mode-moe';\n    case 'CPU Offload': return 'mode-cpuoffload';\n    default: return 'mode-cpuonly';\n  }\n}\n\nfunction showModal(fit) {\n  const modal = document.getElementById('model-modal');\n  const body = document.getElementById('modal-body');\n\n  const memBar = Math.min(fit.utilization_pct, 100);\n  const memBarClass = fit.utilization_pct > 95 ? 'bar-red' : fit.utilization_pct > 80 ? 'bar-yellow' : 'bar-green';\n\n  let notesHtml = '';\n  if (fit.notes && fit.notes.length > 0) {\n    notesHtml = '<div class=\"modal-section\"><h4>Notes</h4><ul>' +\n      fit.notes.map(n => '<li>' + esc(n) + '</li>').join('') +\n      '</ul></div>';\n  }\n\n  const installedBadge = fit.installed\n    ? 
'<span class=\"badge badge-installed\">Installed</span>'\n    : '<span class=\"badge badge-not-installed\">Not Installed</span>';\n\n  const downloadBtn = (!fit.installed && ollamaAvailable)\n    ? '<button class=\"btn-download\">⬇ Download via Ollama</button>'\n    : '';\n\n  body.innerHTML = `\n    <div class=\"modal-header-row\">\n      <h3>${esc(fit.name)}</h3>\n      ${installedBadge}\n    </div>\n\n    <div class=\"modal-grid\">\n      <div class=\"modal-stat\">\n        <span class=\"stat-label\">Parameters</span>\n        <span class=\"stat-value\">${esc(fit.params_b.toFixed(1))}B</span>\n      </div>\n      <div class=\"modal-stat\">\n        <span class=\"stat-label\">Quantization</span>\n        <span class=\"stat-value\">${esc(fit.quant)}</span>\n      </div>\n      <div class=\"modal-stat\">\n        <span class=\"stat-label\">Runtime</span>\n        <span class=\"stat-value\">${esc(fit.runtime)}</span>\n      </div>\n      <div class=\"modal-stat\">\n        <span class=\"stat-label\">Score</span>\n        <span class=\"stat-value\">${esc(fit.score.toFixed(0))}/100</span>\n      </div>\n      <div class=\"modal-stat\">\n        <span class=\"stat-label\">Est. 
Speed</span>\n        <span class=\"stat-value\">${esc(fit.estimated_tps.toFixed(1))} tok/s</span>\n      </div>\n      <div class=\"modal-stat\">\n        <span class=\"stat-label\">Use Case</span>\n        <span class=\"stat-value\">${esc(fit.use_case)}</span>\n      </div>\n    </div>\n\n    <div class=\"modal-section\">\n      <h4>Fit Analysis</h4>\n      <div class=\"fit-row\">\n        <span class=\"${fitClass(fit.fit_level)}\">${esc(fit.fit_level)}</span>\n        <span class=\"fit-detail\">${esc(fit.run_mode)}</span>\n      </div>\n      <div class=\"mem-bar-container\">\n        <div class=\"mem-bar-label\">\n          <span>Memory: ${esc(fit.memory_required_gb.toFixed(1))} / ${esc(fit.memory_available_gb.toFixed(1))} GB</span>\n          <span>${esc(fit.utilization_pct.toFixed(0))}%</span>\n        </div>\n        <div class=\"mem-bar-track\">\n          <div class=\"mem-bar-fill ${memBarClass}\" style=\"width: ${memBar}%\"></div>\n        </div>\n      </div>\n    </div>\n\n    ${notesHtml}\n\n    <div id=\"pull-status\" class=\"pull-status\" style=\"display:none\">\n      <div class=\"pull-status-text\"></div>\n      <div class=\"mem-bar-track\">\n        <div class=\"pull-bar-fill\" style=\"width: 0%\"></div>\n      </div>\n    </div>\n\n    <div class=\"modal-actions\">\n      ${downloadBtn}\n      <button class=\"btn-close\">Close</button>\n    </div>\n  `;\n\n  // Bind handlers in code rather than via inline onclick attributes: esc() does not\n  // escape quotes, so a model name containing one would break the inline handler,\n  // and inline handlers are typically blocked by a script-src 'self' CSP.\n  const dlBtn = body.querySelector('.btn-download');\n  if (dlBtn) dlBtn.addEventListener('click', () => pullModel(fit.name));\n  body.querySelector('.btn-close').addEventListener('click', closeModal);\n\n  modal.classList.add('visible');\n}\n\nfunction closeModal() {\n  document.getElementById('model-modal').classList.remove('visible');\n  if (pullInterval) {\n    clearInterval(pullInterval);\n    pullInterval = null;\n  }\n}\n\nasync function pullModel(name) {\n  const statusEl = document.getElementById('pull-status');\n  const textEl = statusEl.querySelector('.pull-status-text');\n  const barEl = statusEl.querySelector('.pull-bar-fill');\n  const btn = document.querySelector('.btn-download');\n\n  statusEl.style.display = '';\n  if (btn) btn.disabled = true;\n  
textEl.textContent = 'Starting download...';\n\n  try {\n    await invoke('start_pull', { modelTag: name });\n\n    pullInterval = setInterval(async () => {\n      try {\n        const s = await invoke('poll_pull');\n        if (!s) return;\n        textEl.textContent = s.status;\n        if (s.percent != null) barEl.style.width = s.percent + '%';\n        if (s.done) {\n          clearInterval(pullInterval);\n          pullInterval = null;\n          if (s.error) {\n            textEl.textContent = 'Error: ' + s.error;\n            if (btn) btn.disabled = false;\n          } else {\n            textEl.textContent = 'Download complete!';\n            barEl.style.width = '100%';\n            // Refresh model list\n            await loadModels();\n          }\n        }\n      } catch (e) {\n        console.error('Poll error:', e);\n      }\n    }, 500);\n  } catch (e) {\n    textEl.textContent = 'Error: ' + e;\n    if (btn) btn.disabled = false;\n  }\n}\n\nfunction renderModels(fits) {\n  const tbody = document.getElementById('models-body');\n  if (!fits || fits.length === 0) {\n    tbody.innerHTML = '<tr><td colspan=\"9\" class=\"loading\">No models found</td></tr>';\n    return;\n  }\n  tbody.innerHTML = fits.map((f, i) => `\n    <tr class=\"model-row\" data-index=\"${i}\">\n      <td><strong>${esc(f.name)}</strong>${f.installed ? 
' <span class=\"installed-dot\" title=\"Installed\">●</span>' : ''}</td>\n      <td>${esc(f.params_b.toFixed(1))}B</td>\n      <td>${esc(f.quant)}</td>\n      <td class=\"${fitClass(f.fit_level)}\">${esc(f.fit_level)}</td>\n      <td class=\"${modeClass(f.run_mode)}\">${esc(f.run_mode)}</td>\n      <td>${esc(f.score.toFixed(0))}</td>\n      <td>${esc(f.memory_required_gb.toFixed(1))} GB</td>\n      <td>${esc(f.estimated_tps.toFixed(1))}</td>\n      <td>${esc(f.use_case)}</td>\n    </tr>\n  `).join('');\n\n  // Attach click handlers\n  const currentFits = fits;\n  tbody.querySelectorAll('.model-row').forEach(row => {\n    row.addEventListener('click', () => {\n      const idx = parseInt(row.dataset.index, 10);\n      showModal(currentFits[idx]);\n    });\n  });\n}\n\nfunction applyFilters() {\n  const search = document.getElementById('search').value.toLowerCase();\n  const fitFilter = document.getElementById('fit-filter').value;\n\n  let filtered = allFits;\n  if (search) {\n    filtered = filtered.filter(f => f.name.toLowerCase().includes(search));\n  }\n  if (fitFilter !== 'all') {\n    filtered = filtered.filter(f => f.fit_level === fitFilter);\n  }\n  renderModels(filtered);\n}\n\nasync function loadModels() {\n  try {\n    allFits = await invoke('get_model_fits') || [];\n    applyFilters();\n  } catch (e) {\n    console.error('Failed to load models:', e);\n    document.getElementById('models-body').innerHTML =\n      '<tr><td colspan=\"9\" class=\"loading\">Error loading models</td></tr>';\n  }\n}\n\n// Close modal on backdrop click\ndocument.getElementById('model-modal').addEventListener('click', (e) => {\n  if (e.target === e.currentTarget) closeModal();\n});\n\n// Close modal on Escape\ndocument.addEventListener('keydown', (e) => {\n  if (e.key === 'Escape') closeModal();\n});\n\ndocument.getElementById('search').addEventListener('input', applyFilters);\ndocument.getElementById('fit-filter').addEventListener('change', applyFilters);\n\nasync function init() 
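{\n  try {\n    ollamaAvailable = await invoke('is_ollama_available') || false;\n  } catch (e) {\n    // Treat a failed availability check as Ollama being unavailable so the rest of the UI still loads\n    console.error('Failed to check Ollama availability:', e);\n    ollamaAvailable = false;\n  }\n  loadSpecs();\n  loadModels();\n}\n\ninit();\n"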
  },
  {
    "path": "llmfit-desktop/ui/index.html",
    "content": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n  <meta charset=\"UTF-8\">\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n  <title>llmfit</title>\n  <link rel=\"stylesheet\" href=\"styles.css\">\n</head>\n<body>\n  <section id=\"system-panel\">\n    <h2>System</h2>\n    <div id=\"specs-grid\" class=\"specs-grid\">\n      <div class=\"spec-card\">\n        <span class=\"spec-label\">CPU</span>\n        <span id=\"cpu-name\" class=\"spec-value\">Detecting...</span>\n        <span id=\"cpu-cores\" class=\"spec-detail\"></span>\n      </div>\n      <div class=\"spec-card\">\n        <span class=\"spec-label\">Total RAM</span>\n        <span id=\"ram-total\" class=\"spec-value\">—</span>\n      </div>\n      <div class=\"spec-card\">\n        <span class=\"spec-label\">Available RAM</span>\n        <span id=\"ram-available\" class=\"spec-value\">—</span>\n      </div>\n      <div id=\"gpus-container\">\n        <!-- GPU cards injected by JS -->\n      </div>\n      <div class=\"spec-card\" id=\"memory-arch-card\" style=\"display:none\">\n        <span class=\"spec-label\">Memory</span>\n        <span id=\"memory-arch\" class=\"spec-value\">—</span>\n      </div>\n    </div>\n  </section>\n\n  <section id=\"models-panel\">\n    <h2>Model Compatibility</h2>\n    <div class=\"controls\">\n      <input type=\"text\" id=\"search\" placeholder=\"Filter models...\" />\n      <select id=\"fit-filter\">\n        <option value=\"all\">All Fit Levels</option>\n        <option value=\"Perfect\">Perfect</option>\n        <option value=\"Good\">Good</option>\n        <option value=\"Marginal\">Marginal</option>\n        <option value=\"Too Tight\">Too Tight</option>\n      </select>\n    </div>\n    <div id=\"models-table-container\">\n      <table id=\"models-table\">\n        <thead>\n          <tr>\n            <th>Model</th>\n            <th>Params</th>\n            <th>Quant</th>\n            <th>Fit</th>\n            <th>Mode</th>\n  
          <th>Score</th>\n            <th>RAM Req</th>\n            <th>Est. TPS</th>\n            <th>Use Case</th>\n          </tr>\n        </thead>\n        <tbody id=\"models-body\">\n          <tr><td colspan=\"9\" class=\"loading\">Loading models...</td></tr>\n        </tbody>\n      </table>\n    </div>\n  </section>\n\n  <div id=\"model-modal\" class=\"modal-overlay\">\n    <div class=\"modal-content\" id=\"modal-body\"></div>\n  </div>\n\n  <script src=\"app.js\"></script>\n</body>\n</html>\n"
  },
  {
    "path": "llmfit-desktop/ui/styles.css",
    "content": ":root {\n  --bg: #0d1117;\n  --surface: #161b22;\n  --border: #30363d;\n  --text: #e6edf3;\n  --text-dim: #8b949e;\n  --accent: #58a6ff;\n  --green: #3fb950;\n  --yellow: #d29922;\n  --red: #f85149;\n  --cyan: #39c5cf;\n}\n\n* { box-sizing: border-box; margin: 0; padding: 0; }\n\nbody {\n  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif;\n  background: var(--bg);\n  color: var(--text);\n  padding: 24px;\n  line-height: 1.5;\n}\n\nh2 { font-size: 18px; font-weight: 600; margin-bottom: 12px; color: var(--text-dim); }\n\n.specs-grid {\n  display: grid;\n  grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));\n  gap: 12px;\n  margin-bottom: 24px;\n}\n\n#gpus-container {\n  display: contents;\n}\n\n.spec-card {\n  background: var(--surface);\n  border: 1px solid var(--border);\n  border-radius: 8px;\n  padding: 16px;\n  display: flex;\n  flex-direction: column;\n}\n\n.spec-label { font-size: 12px; text-transform: uppercase; color: var(--text-dim); margin-bottom: 4px; }\n.spec-value { font-size: 16px; font-weight: 600; }\n.spec-detail { font-size: 13px; color: var(--text-dim); margin-top: 2px; }\n\n.controls {\n  display: flex;\n  gap: 12px;\n  margin-bottom: 12px;\n}\n\n#search {\n  flex: 1;\n  padding: 8px 12px;\n  background: var(--surface);\n  border: 1px solid var(--border);\n  border-radius: 6px;\n  color: var(--text);\n  font-size: 14px;\n}\n\n#fit-filter {\n  padding: 8px 12px;\n  background: var(--surface);\n  border: 1px solid var(--border);\n  border-radius: 6px;\n  color: var(--text);\n  font-size: 14px;\n}\n\n#models-table-container {\n  overflow-x: auto;\n  border: 1px solid var(--border);\n  border-radius: 8px;\n}\n\ntable {\n  width: 100%;\n  border-collapse: collapse;\n  font-size: 14px;\n}\n\nth {\n  text-align: left;\n  padding: 10px 12px;\n  background: var(--surface);\n  border-bottom: 1px solid var(--border);\n  font-weight: 600;\n  color: var(--text-dim);\n  position: sticky;\n  top: 
0;\n}\n\ntd {\n  padding: 8px 12px;\n  border-bottom: 1px solid var(--border);\n}\n\ntr:hover { background: var(--surface); }\n\n.loading { text-align: center; color: var(--text-dim); padding: 32px; }\n\n.fit-perfect { color: var(--green); font-weight: 600; }\n.fit-good { color: var(--accent); font-weight: 600; }\n.fit-marginal { color: var(--yellow); }\n.fit-tight { color: var(--red); }\n\n.mode-gpu { color: var(--green); }\n.mode-moe { color: var(--cyan); }\n.mode-cpuoffload { color: var(--yellow); }\n.mode-cpuonly { color: var(--text-dim); }\n\n/* Clickable rows */\n.model-row { cursor: pointer; }\n.model-row:hover { background: var(--surface) !important; }\n\n.installed-dot { color: var(--green); font-size: 10px; }\n\n/* Modal */\n.modal-overlay {\n  display: none;\n  position: fixed;\n  inset: 0;\n  background: rgba(0, 0, 0, 0.6);\n  backdrop-filter: blur(4px);\n  z-index: 100;\n  align-items: center;\n  justify-content: center;\n}\n\n.modal-overlay.visible { display: flex; }\n\n.modal-content {\n  background: var(--surface);\n  border: 1px solid var(--border);\n  border-radius: 12px;\n  padding: 24px;\n  max-width: 560px;\n  width: 90%;\n  max-height: 80vh;\n  overflow-y: auto;\n}\n\n.modal-header-row {\n  display: flex;\n  align-items: center;\n  gap: 12px;\n  margin-bottom: 20px;\n}\n\n.modal-header-row h3 { font-size: 20px; font-weight: 700; }\n\n.badge {\n  font-size: 11px;\n  padding: 2px 8px;\n  border-radius: 12px;\n  font-weight: 600;\n  text-transform: uppercase;\n}\n\n.badge-installed { background: rgba(63, 185, 80, 0.15); color: var(--green); }\n.badge-not-installed { background: rgba(139, 148, 158, 0.15); color: var(--text-dim); }\n\n.modal-grid {\n  display: grid;\n  grid-template-columns: repeat(3, 1fr);\n  gap: 12px;\n  margin-bottom: 20px;\n}\n\n.modal-stat {\n  display: flex;\n  flex-direction: column;\n}\n\n.stat-label { font-size: 11px; text-transform: uppercase; color: var(--text-dim); margin-bottom: 2px; }\n.stat-value { font-size: 15px; 
font-weight: 600; }\n\n.modal-section { margin-bottom: 16px; }\n.modal-section h4 { font-size: 13px; color: var(--text-dim); text-transform: uppercase; margin-bottom: 8px; }\n.modal-section ul { padding-left: 18px; font-size: 13px; color: var(--text-dim); }\n.modal-section li { margin-bottom: 4px; }\n\n.fit-row {\n  display: flex;\n  align-items: center;\n  gap: 12px;\n  margin-bottom: 8px;\n  font-size: 15px;\n}\n\n.fit-detail { color: var(--text-dim); font-size: 13px; }\n\n.mem-bar-container { margin-top: 4px; }\n\n.mem-bar-label {\n  display: flex;\n  justify-content: space-between;\n  font-size: 12px;\n  color: var(--text-dim);\n  margin-bottom: 4px;\n}\n\n.mem-bar-track {\n  height: 8px;\n  background: var(--border);\n  border-radius: 4px;\n  overflow: hidden;\n}\n\n.mem-bar-fill {\n  height: 100%;\n  border-radius: 4px;\n  transition: width 0.3s;\n}\n\n.bar-green { background: var(--green); }\n.bar-yellow { background: var(--yellow); }\n.bar-red { background: var(--red); }\n\n.pull-status {\n  margin: 16px 0;\n}\n\n.pull-status-text {\n  font-size: 13px;\n  color: var(--text-dim);\n  margin-bottom: 6px;\n}\n\n.pull-bar-fill {\n  height: 100%;\n  border-radius: 4px;\n  background: var(--accent);\n  transition: width 0.3s;\n}\n\n.modal-actions {\n  display: flex;\n  gap: 12px;\n  justify-content: flex-end;\n  margin-top: 20px;\n}\n\n.btn-download {\n  padding: 8px 16px;\n  background: var(--accent);\n  color: var(--bg);\n  border: none;\n  border-radius: 6px;\n  font-size: 14px;\n  font-weight: 600;\n  cursor: pointer;\n}\n\n.btn-download:hover { opacity: 0.9; }\n.btn-download:disabled { opacity: 0.5; cursor: not-allowed; }\n\n.btn-close {\n  padding: 8px 16px;\n  background: transparent;\n  color: var(--text-dim);\n  border: 1px solid var(--border);\n  border-radius: 6px;\n  font-size: 14px;\n  cursor: pointer;\n}\n\n.btn-close:hover { color: var(--text); border-color: var(--text-dim); }\n"
  },
  {
    "path": "llmfit-tui/Cargo.toml",
    "content": "[package]\nname = \"llmfit\"\nversion.workspace = true\nedition = \"2024\"\nbuild = \"build.rs\"\nauthors = [\"Alex Jones <alex@example.com>\"]\ndescription = \"Right-size LLM models to your system hardware. Interactive TUI and CLI to match models against available RAM, CPU, and GPU.\"\nlicense = \"MIT\"\nrepository = \"https://github.com/AlexsJones/llmfit\"\nhomepage = \"https://github.com/AlexsJones/llmfit\"\nreadme = \"../README.md\"\nkeywords = [\"llm\", \"tui\", \"hardware\", \"inference\", \"models\"]\ncategories = [\"command-line-utilities\"]\n\n[[bin]]\nname = \"llmfit\"\npath = \"src/main.rs\"\n\n[dependencies]\nllmfit-core = { version = \"0.8.0\", path = \"../llmfit-core\" }\nclap = { version = \"4.5\", features = [\"derive\"] }\nserde = { version = \"1.0\", features = [\"derive\"] }\nserde_json = \"1.0\"\ntabled = \"0.20\"\ncolored = \"3.1\"\nratatui = \"0.30\"\ncrossterm = \"0.29\"\naxum = \"0.8\"\ntokio = { version = \"1.47\", features = [\"rt-multi-thread\", \"signal\", \"net\"] }\n\n[dev-dependencies]\nhttp-body-util = \"0.1\"\ntower = \"0.5\"\n"
  },
  {
    "path": "llmfit-tui/build.rs",
    "content": "use std::env;\nuse std::fs;\nuse std::path::{Path, PathBuf};\n\nfn main() {\n    println!(\"cargo:rerun-if-changed=build.rs\");\n\n    let manifest_dir = PathBuf::from(env::var(\"CARGO_MANIFEST_DIR\").expect(\"CARGO_MANIFEST_DIR\"));\n    let dist_dir = manifest_dir.join(\"..\").join(\"llmfit-web\").join(\"dist\");\n    println!(\"cargo:rerun-if-changed={}\", dist_dir.display());\n\n    let out_dir = PathBuf::from(env::var(\"OUT_DIR\").expect(\"OUT_DIR\"));\n    let out_file = out_dir.join(\"web_assets.rs\");\n\n    let generated = if dist_dir.exists() {\n        let mut files = Vec::new();\n        collect_files(&dist_dir, &mut files);\n        files.sort();\n        for file in &files {\n            println!(\"cargo:rerun-if-changed={}\", file.display());\n        }\n        generate_assets_from_dist(&dist_dir, &files)\n    } else {\n        println!(\n            \"cargo:warning=llmfit-web/dist not found. Falling back to placeholder embedded dashboard. Run `npm ci && npm run build` in llmfit-web.\"\n        );\n        generate_fallback_assets()\n    };\n\n    fs::write(&out_file, generated).expect(\"failed to write generated web_assets.rs\");\n}\n\nfn collect_files(dir: &Path, files: &mut Vec<PathBuf>) {\n    let entries = fs::read_dir(dir).unwrap_or_else(|_| panic!(\"failed to read {}\", dir.display()));\n\n    for entry in entries {\n        let entry = entry.expect(\"invalid dir entry\");\n        let path = entry.path();\n        if path.is_dir() {\n            collect_files(&path, files);\n        } else {\n            files.push(path);\n        }\n    }\n}\n\nfn generate_assets_from_dist(dist_dir: &Path, files: &[PathBuf]) -> String {\n    let mut output = String::new();\n    output.push_str(\"#[derive(Clone, Copy)]\\n\");\n    output.push_str(\"pub(crate) struct EmbeddedAsset {\\n\");\n    output.push_str(\"    pub(crate) path: &'static str,\\n\");\n    output.push_str(\"    pub(crate) content_type: &'static str,\\n\");\n    
output.push_str(\"    pub(crate) bytes: &'static [u8],\\n\");\n    output.push_str(\"}\\n\\n\");\n    output.push_str(\"pub(crate) static EMBEDDED_WEB_ASSETS: &[EmbeddedAsset] = &[\\n\");\n\n    for file in files {\n        let relative = file\n            .strip_prefix(dist_dir)\n            .unwrap_or_else(|_| panic!(\"{} not under dist dir\", file.display()));\n        let route_path = format!(\"/{}\", relative.to_string_lossy().replace('\\\\', \"/\"));\n        let include_path = file.to_string_lossy();\n        let content_type = content_type_for_path(file);\n\n        output.push_str(&format!(\n            \"    EmbeddedAsset {{ path: {route_path:?}, content_type: {content_type:?}, bytes: include_bytes!({include_path:?}) }},\\n\"\n        ));\n    }\n\n    output.push_str(\"];\\n\");\n    output\n}\n\nfn generate_fallback_assets() -> String {\n    let mut output = String::new();\n    output.push_str(\"#[derive(Clone, Copy)]\\n\");\n    output.push_str(\"pub(crate) struct EmbeddedAsset {\\n\");\n    output.push_str(\"    pub(crate) path: &'static str,\\n\");\n    output.push_str(\"    pub(crate) content_type: &'static str,\\n\");\n    output.push_str(\"    pub(crate) bytes: &'static [u8],\\n\");\n    output.push_str(\"}\\n\\n\");\n    // The page is emitted inside a raw byte string, so its quotes must not be backslash-escaped.\n    output.push_str(\"const FALLBACK_INDEX_HTML: &[u8] = br#\\\"<!doctype html>\\n\");\n    output.push_str(\"<html lang=\\\"en\\\">\\n\");\n    output.push_str(\"  <head><meta charset=\\\"UTF-8\\\"/><meta name=\\\"viewport\\\" content=\\\"width=device-width, initial-scale=1.0\\\"/><title>llmfit</title></head>\\n\");\n    output.push_str(\"  <body style=\\\"font-family: sans-serif; padding: 24px\\\">\\n\");\n    output.push_str(\"    <h1>llmfit Web Dashboard</h1>\\n\");\n    output.push_str(\"    <p>Frontend assets are missing.</p>\\n\");\n    output.push_str(\n        \"    <p>From repo root run: <code>cd llmfit-web && npm ci && npm run build</code></p>\\n\",\n    );\n    output.push_str(\"    
<script src=\\\"/assets/fallback.js\\\"></script>\\n\");\n    output.push_str(\"  </body>\\n\");\n    output.push_str(\"</html>\\\"#;\\n\");\n    output.push_str(\"const FALLBACK_JS: &[u8] = br\\\"console.warn('llmfit-web dist assets not found; serving fallback page.');\\\";\\n\\n\");\n    output.push_str(\"pub(crate) static EMBEDDED_WEB_ASSETS: &[EmbeddedAsset] = &[\\n\");\n    output.push_str(\"    EmbeddedAsset { path: \\\"/index.html\\\", content_type: \\\"text/html; charset=utf-8\\\", bytes: FALLBACK_INDEX_HTML },\\n\");\n    output.push_str(\"    EmbeddedAsset { path: \\\"/assets/fallback.js\\\", content_type: \\\"text/javascript; charset=utf-8\\\", bytes: FALLBACK_JS },\\n\");\n    output.push_str(\"];\\n\");\n    output\n}\n\nfn content_type_for_path(path: &Path) -> &'static str {\n    match path.extension().and_then(|ext| ext.to_str()) {\n        Some(\"html\") => \"text/html; charset=utf-8\",\n        Some(\"js\") => \"text/javascript; charset=utf-8\",\n        Some(\"css\") => \"text/css; charset=utf-8\",\n        Some(\"json\") | Some(\"map\") => \"application/json; charset=utf-8\",\n        Some(\"svg\") => \"image/svg+xml\",\n        Some(\"png\") => \"image/png\",\n        Some(\"jpg\") | Some(\"jpeg\") => \"image/jpeg\",\n        Some(\"gif\") => \"image/gif\",\n        Some(\"webp\") => \"image/webp\",\n        Some(\"ico\") => \"image/x-icon\",\n        Some(\"woff\") => \"font/woff\",\n        Some(\"woff2\") => \"font/woff2\",\n        Some(\"txt\") => \"text/plain; charset=utf-8\",\n        _ => \"application/octet-stream\",\n    }\n}\n"
  },
  {
    "path": "llmfit-tui/src/display.rs",
    "content": "use colored::*;\nuse llmfit_core::fit::{FitLevel, ModelFit};\nuse llmfit_core::hardware::SystemSpecs;\nuse llmfit_core::models::LlmModel;\nuse llmfit_core::plan::PlanEstimate;\nuse tabled::{Table, Tabled, settings::Style};\n\n#[derive(Tabled)]\nstruct ModelRow {\n    #[tabled(rename = \"Status\")]\n    status: String,\n    #[tabled(rename = \"Model\")]\n    name: String,\n    #[tabled(rename = \"Provider\")]\n    provider: String,\n    #[tabled(rename = \"Size\")]\n    size: String,\n    #[tabled(rename = \"Score\")]\n    score: String,\n    #[tabled(rename = \"tok/s est.\")]\n    tps: String,\n    #[tabled(rename = \"Quant\")]\n    quant: String,\n    #[tabled(rename = \"Runtime\")]\n    runtime: String,\n    #[tabled(rename = \"Mode\")]\n    mode: String,\n    #[tabled(rename = \"Mem %\")]\n    mem_use: String,\n    #[tabled(rename = \"Context\")]\n    context: String,\n}\n\npub fn display_all_models(models: &[LlmModel]) {\n    println!(\"\\n{}\", \"=== Available LLM Models ===\".bold().cyan());\n    println!(\"Total models: {}\\n\", models.len());\n\n    let rows: Vec<ModelRow> = models\n        .iter()\n        .map(|m| ModelRow {\n            status: \"--\".to_string(),\n            name: m.name.clone(),\n            provider: m.provider.clone(),\n            size: m.parameter_count.clone(),\n            score: \"-\".to_string(),\n            tps: \"-\".to_string(),\n            quant: m.quantization.clone(),\n            runtime: \"-\".to_string(),\n            mode: \"-\".to_string(),\n            mem_use: \"-\".to_string(),\n            context: format!(\"{}k\", m.context_length / 1000),\n        })\n        .collect();\n\n    let table = Table::new(rows).with(Style::rounded()).to_string();\n    println!(\"{}\", table);\n}\n\npub fn display_model_fits(fits: &[ModelFit]) {\n    if fits.is_empty() {\n        println!(\n            \"\\n{}\",\n            \"No compatible models found for your system.\".yellow()\n        );\n        return;\n    
}\n\n    println!(\"\\n{}\", \"=== Model Compatibility Analysis ===\".bold().cyan());\n    println!(\"Found {} compatible model(s)\\n\", fits.len());\n\n    let rows: Vec<ModelRow> = fits\n        .iter()\n        .map(|fit| {\n            let status_text = format!(\"{} {}\", fit.fit_emoji(), fit.fit_text());\n\n            ModelRow {\n                status: status_text,\n                name: fit.model.name.clone(),\n                provider: fit.model.provider.clone(),\n                size: fit.model.parameter_count.clone(),\n                score: format!(\"{:.0}\", fit.score),\n                tps: format!(\"{:.1}\", fit.estimated_tps),\n                quant: fit.best_quant.clone(),\n                runtime: fit.runtime_text().to_string(),\n                mode: fit.run_mode_text().to_string(),\n                mem_use: format!(\"{:.1}%\", fit.utilization_pct),\n                context: format!(\"{}k\", fit.model.context_length / 1000),\n            }\n        })\n        .collect();\n\n    let table = Table::new(rows).with(Style::rounded()).to_string();\n    println!(\"{}\", table);\n    println!(\n        \"  Note: tok/s values are baseline estimates; real runtime depends on engine/runtime.\"\n    );\n}\n\npub fn display_model_detail(fit: &ModelFit) {\n    println!(\"\\n{}\", format!(\"=== {} ===\", fit.model.name).bold().cyan());\n    println!();\n    println!(\"{}: {}\", \"Provider\".bold(), fit.model.provider);\n    println!(\"{}: {}\", \"Parameters\".bold(), fit.model.parameter_count);\n    println!(\"{}: {}\", \"Quantization\".bold(), fit.model.quantization);\n    println!(\"{}: {}\", \"Best Quant\".bold(), fit.best_quant);\n    println!(\n        \"{}: {} tokens\",\n        \"Context Length\".bold(),\n        fit.model.context_length\n    );\n    println!(\"{}: {}\", \"Use Case\".bold(), fit.model.use_case);\n    println!(\"{}: {}\", \"Category\".bold(), fit.use_case.label());\n    if let Some(ref date) = fit.model.release_date {\n        
println!(\"{}: {}\", \"Released\".bold(), date);\n    }\n    println!(\n        \"{}: {} (baseline est. ~{:.1} tok/s)\",\n        \"Runtime\".bold(),\n        fit.runtime_text(),\n        fit.estimated_tps\n    );\n    println!();\n\n    println!(\"{}\", \"Score Breakdown:\".bold().underline());\n    println!(\"  Overall Score: {:.1} / 100\", fit.score);\n    println!(\n        \"  Quality: {:.0}  Speed: {:.0}  Fit: {:.0}  Context: {:.0}\",\n        fit.score_components.quality,\n        fit.score_components.speed,\n        fit.score_components.fit,\n        fit.score_components.context\n    );\n    println!(\"  Baseline Est. Speed: {:.1} tok/s\", fit.estimated_tps);\n    println!();\n\n    println!(\"{}\", \"Resource Requirements:\".bold().underline());\n    if let Some(vram) = fit.model.min_vram_gb {\n        println!(\"  Min VRAM: {:.1} GB\", vram);\n    }\n    println!(\"  Min RAM: {:.1} GB (CPU inference)\", fit.model.min_ram_gb);\n    println!(\"  Recommended RAM: {:.1} GB\", fit.model.recommended_ram_gb);\n\n    // MoE Architecture info\n    if fit.model.is_moe {\n        println!();\n        println!(\"{}\", \"MoE Architecture:\".bold().underline());\n        if let (Some(num_experts), Some(active_experts)) =\n            (fit.model.num_experts, fit.model.active_experts)\n        {\n            println!(\n                \"  Experts: {} active / {} total per token\",\n                active_experts, num_experts\n            );\n        }\n        if let Some(active_vram) = fit.model.moe_active_vram_gb() {\n            println!(\n                \"  Active VRAM: {:.1} GB (vs {:.1} GB full model)\",\n                active_vram,\n                fit.model.min_vram_gb.unwrap_or(0.0)\n            );\n        }\n        if let Some(offloaded) = fit.moe_offloaded_gb {\n            println!(\"  Offloaded: {:.1} GB inactive experts in RAM\", offloaded);\n        }\n    }\n    println!();\n\n    println!(\"{}\", \"Fit Analysis:\".bold().underline());\n\n    let 
fit_color = match fit.fit_level {\n        FitLevel::Perfect => \"green\",\n        FitLevel::Good => \"yellow\",\n        FitLevel::Marginal => \"orange\",\n        FitLevel::TooTight => \"red\",\n    };\n\n    println!(\n        \"  Status: {} {}\",\n        fit.fit_emoji(),\n        fit.fit_text().color(fit_color)\n    );\n    println!(\"  Run Mode: {}\", fit.run_mode_text());\n    println!(\n        \"  Memory Utilization: {:.1}% ({:.1} / {:.1} GB)\",\n        fit.utilization_pct, fit.memory_required_gb, fit.memory_available_gb\n    );\n    println!();\n\n    if !fit.model.gguf_sources.is_empty() {\n        println!(\"{}\", \"GGUF Downloads:\".bold().underline());\n        for src in &fit.model.gguf_sources {\n            println!(\"  {} → https://huggingface.co/{}\", src.provider, src.repo);\n        }\n        println!(\n            \"  {}\",\n            format!(\n                \"Tip: llmfit download {} --quant {}\",\n                fit.model.gguf_sources[0].repo, fit.best_quant\n            )\n            .dimmed()\n        );\n        println!();\n    }\n\n    if !fit.notes.is_empty() {\n        println!(\"{}\", \"Notes:\".bold().underline());\n        for note in &fit.notes {\n            println!(\"  {}\", note);\n        }\n        println!();\n    }\n}\n\npub fn display_model_diff(fits: &[ModelFit], sort_label: &str) {\n    if fits.len() < 2 {\n        println!(\"\\n{}\", \"Need at least 2 models to compare.\".yellow());\n        return;\n    }\n\n    println!(\"\\n{}\", \"=== Model Diff ===\".bold().cyan());\n    println!(\n        \"Comparing {} model(s) (sorted by {})\\n\",\n        fits.len(),\n        sort_label\n    );\n\n    let metric_width = 20usize;\n    let col_width = 32usize;\n\n    let model_headers: Vec<String> = fits\n        .iter()\n        .enumerate()\n        .map(|(i, fit)| {\n            let label = format!(\"M{}: {}\", i + 1, fit.model.name);\n            truncate_to_width(&label, col_width)\n        })\n        
.collect();\n\n    print!(\"{:<metric_width$}\", \"Metric\".bold());\n    for header in &model_headers {\n        print!(\"  {:<col_width$}\", header.bold());\n    }\n    println!();\n\n    print!(\"{:-<metric_width$}\", \"\");\n    for _ in &model_headers {\n        print!(\"  {:-<col_width$}\", \"\");\n    }\n    println!();\n\n    let base = &fits[0];\n\n    print_metric_row(\n        \"Score\",\n        fits.iter()\n            .map(|f| format_with_delta(format!(\"{:.1}\", f.score), f.score - base.score))\n            .collect(),\n        metric_width,\n        col_width,\n    );\n    print_metric_row(\n        \"Baseline tok/s\",\n        fits.iter()\n            .map(|f| {\n                format_with_delta(\n                    format!(\"{:.1}\", f.estimated_tps),\n                    f.estimated_tps - base.estimated_tps,\n                )\n            })\n            .collect(),\n        metric_width,\n        col_width,\n    );\n    print_metric_row(\n        \"Fit\",\n        fits.iter()\n            .map(|f| format!(\"{} {}\", f.fit_emoji(), f.fit_text()))\n            .collect(),\n        metric_width,\n        col_width,\n    );\n    print_metric_row(\n        \"Run Mode\",\n        fits.iter().map(|f| f.run_mode_text().to_string()).collect(),\n        metric_width,\n        col_width,\n    );\n    print_metric_row(\n        \"Runtime\",\n        fits.iter().map(|f| f.runtime_text().to_string()).collect(),\n        metric_width,\n        col_width,\n    );\n    print_metric_row(\n        \"Memory %\",\n        fits.iter()\n            .map(|f| {\n                format_with_delta(\n                    format!(\"{:.1}%\", f.utilization_pct),\n                    f.utilization_pct - base.utilization_pct,\n                )\n            })\n            .collect(),\n        metric_width,\n        col_width,\n    );\n    print_metric_row(\n        \"Params\",\n        fits.iter()\n            .map(|f| f.model.parameter_count.clone())\n            
.collect(),\n        metric_width,\n        col_width,\n    );\n    print_metric_row(\n        \"Context\",\n        fits.iter()\n            .map(|f| format!(\"{} tokens\", f.model.context_length))\n            .collect(),\n        metric_width,\n        col_width,\n    );\n    print_metric_row(\n        \"Best Quant\",\n        fits.iter().map(|f| f.best_quant.clone()).collect(),\n        metric_width,\n        col_width,\n    );\n    print_metric_row(\n        \"Provider\",\n        fits.iter().map(|f| f.model.provider.clone()).collect(),\n        metric_width,\n        col_width,\n    );\n}\n\nfn print_metric_row(metric: &str, values: Vec<String>, metric_width: usize, col_width: usize) {\n    print!(\"{:<metric_width$}\", metric);\n    for value in values {\n        print!(\"  {:<col_width$}\", truncate_to_width(&value, col_width));\n    }\n    println!();\n}\n\nfn format_with_delta(value: String, delta: f64) -> String {\n    if delta.abs() < 0.05 {\n        return value;\n    }\n    format!(\"{} ({:+.1})\", value, delta)\n}\n\nfn truncate_to_width(input: &str, width: usize) -> String {\n    if input.chars().count() <= width {\n        return input.to_string();\n    }\n    let mut out = input\n        .chars()\n        .take(width.saturating_sub(3))\n        .collect::<String>();\n    out.push_str(\"...\");\n    out\n}\n\npub fn display_search_results(models: &[&LlmModel], query: &str) {\n    if models.is_empty() {\n        println!(\n            \"\\n{}\",\n            format!(\"No models found matching '{}'\", query).yellow()\n        );\n        return;\n    }\n\n    println!(\n        \"\\n{}\",\n        format!(\"=== Search Results for '{}' ===\", query)\n            .bold()\n            .cyan()\n    );\n    println!(\"Found {} model(s)\\n\", models.len());\n\n    let rows: Vec<ModelRow> = models\n        .iter()\n        .map(|m| ModelRow {\n            status: \"--\".to_string(),\n            name: m.name.clone(),\n            provider: 
m.provider.clone(),\n            size: m.parameter_count.clone(),\n            score: \"-\".to_string(),\n            tps: \"-\".to_string(),\n            quant: m.quantization.clone(),\n            runtime: \"-\".to_string(),\n            mode: \"-\".to_string(),\n            mem_use: \"-\".to_string(),\n            context: format!(\"{}k\", m.context_length / 1000),\n        })\n        .collect();\n\n    let table = Table::new(rows).with(Style::rounded()).to_string();\n    println!(\"{}\", table);\n}\n\n// ────────────────────────────────────────────────────────────────────\n// JSON output for machine consumption (OpenClaw skills, scripts, etc.)\n// ────────────────────────────────────────────────────────────────────\n\n/// Serialize system specs to JSON and print to stdout.\npub fn display_json_system(specs: &SystemSpecs) {\n    let output = serde_json::json!({\n        \"system\": system_json(specs),\n    });\n    println!(\n        \"{}\",\n        serde_json::to_string_pretty(&output).expect(\"JSON serialization failed\")\n    );\n}\n\n/// Serialize system specs + model fits to JSON and print to stdout.\npub fn display_json_fits(specs: &SystemSpecs, fits: &[ModelFit]) {\n    let models: Vec<serde_json::Value> = fits.iter().map(fit_to_json).collect();\n    let output = serde_json::json!({\n        \"system\": system_json(specs),\n        \"models\": models,\n    });\n    println!(\n        \"{}\",\n        serde_json::to_string_pretty(&output).expect(\"JSON serialization failed\")\n    );\n}\n\n/// Serialize diff output via serde derives (new diff-only path).\npub fn display_json_diff_fits(specs: &SystemSpecs, fits: &[ModelFit]) {\n    #[derive(serde::Serialize)]\n    struct FitsOutput<'a> {\n        system: &'a SystemSpecs,\n        models: &'a [ModelFit],\n    }\n    let output = FitsOutput {\n        system: specs,\n        models: fits,\n    };\n    println!(\n        \"{}\",\n        serde_json::to_string_pretty(&output).expect(\"JSON serialization 
failed\")\n    );\n}\n\nfn system_json(specs: &SystemSpecs) -> serde_json::Value {\n    let gpus_json: Vec<serde_json::Value> = specs\n        .gpus\n        .iter()\n        .map(|g| {\n            serde_json::json!({\n                \"name\": g.name,\n                \"vram_gb\": g.vram_gb.map(round2),\n                \"backend\": g.backend.label(),\n                \"count\": g.count,\n                \"unified_memory\": g.unified_memory,\n            })\n        })\n        .collect();\n\n    serde_json::json!({\n        \"total_ram_gb\": round2(specs.total_ram_gb),\n        \"available_ram_gb\": round2(specs.available_ram_gb),\n        \"cpu_cores\": specs.total_cpu_cores,\n        \"cpu_name\": specs.cpu_name,\n        \"has_gpu\": specs.has_gpu,\n        \"gpu_vram_gb\": specs.gpu_vram_gb.map(round2),\n        \"gpu_name\": specs.gpu_name,\n        \"gpu_count\": specs.gpu_count,\n        \"unified_memory\": specs.unified_memory,\n        \"backend\": specs.backend.label(),\n        \"gpus\": gpus_json,\n    })\n}\n\nfn fit_to_json(fit: &ModelFit) -> serde_json::Value {\n    serde_json::json!({\n        \"name\": fit.model.name,\n        \"provider\": fit.model.provider,\n        \"parameter_count\": fit.model.parameter_count,\n        \"params_b\": round2(fit.model.params_b()),\n        \"context_length\": fit.model.context_length,\n        \"use_case\": fit.model.use_case,\n        \"category\": fit.use_case.label(),\n        \"release_date\": fit.model.release_date,\n        \"is_moe\": fit.model.is_moe,\n        \"fit_level\": fit.fit_text(),\n        \"run_mode\": fit.run_mode_text(),\n        \"score\": round1(fit.score),\n        \"score_components\": {\n            \"quality\": round1(fit.score_components.quality),\n            \"speed\": round1(fit.score_components.speed),\n            \"fit\": round1(fit.score_components.fit),\n            \"context\": round1(fit.score_components.context),\n        },\n        \"estimated_tps\": 
round1(fit.estimated_tps),\n        \"runtime\": fit.runtime_text(),\n        \"runtime_label\": fit.runtime.label(),\n        \"best_quant\": fit.best_quant,\n        \"memory_required_gb\": round2(fit.memory_required_gb),\n        \"memory_available_gb\": round2(fit.memory_available_gb),\n        \"moe_offloaded_gb\": fit.moe_offloaded_gb.map(round2),\n        \"total_memory_gb\": round2(fit.memory_required_gb + fit.moe_offloaded_gb.unwrap_or(0.0)),\n        \"utilization_pct\": round1(fit.utilization_pct),\n        \"notes\": fit.notes,\n        \"gguf_sources\": fit.model.gguf_sources,\n    })\n}\n\nfn round1(v: f64) -> f64 {\n    (v * 10.0).round() / 10.0\n}\n\nfn round2(v: f64) -> f64 {\n    (v * 100.0).round() / 100.0\n}\n\npub fn display_model_plan(plan: &PlanEstimate) {\n    println!(\"\\n{}\", \"=== Hardware Planning Estimate ===\".bold().cyan());\n    println!(\"{} {}\", \"Model:\".bold(), plan.model_name);\n    println!(\"{} {}\", \"Provider:\".bold(), plan.provider);\n    println!(\"{} {}\", \"Context:\".bold(), plan.context);\n    println!(\"{} {}\", \"Quantization:\".bold(), plan.quantization);\n    if let Some(tps) = plan.target_tps {\n        println!(\"{} {:.1} tok/s\", \"Target TPS:\".bold(), tps);\n    }\n    println!(\"{} {}\", \"Note:\".bold(), plan.estimate_notice);\n    println!();\n\n    println!(\"{}\", \"Minimum Hardware:\".bold().underline());\n    println!(\n        \"  VRAM: {}\",\n        plan.minimum\n            .vram_gb\n            .map(|v| format!(\"{v:.1} GB\"))\n            .unwrap_or_else(|| \"Not required\".to_string())\n    );\n    println!(\"  RAM: {:.1} GB\", plan.minimum.ram_gb);\n    println!(\"  CPU Cores: {}\", plan.minimum.cpu_cores);\n    println!();\n\n    println!(\"{}\", \"Recommended Hardware:\".bold().underline());\n    println!(\n        \"  VRAM: {}\",\n        plan.recommended\n            .vram_gb\n            .map(|v| format!(\"{v:.1} GB\"))\n            .unwrap_or_else(|| \"Not required\".to_string())\n    
);\n    println!(\"  RAM: {:.1} GB\", plan.recommended.ram_gb);\n    println!(\"  CPU Cores: {}\", plan.recommended.cpu_cores);\n    println!();\n\n    println!(\"{}\", \"Feasible Run Paths:\".bold().underline());\n    for path in &plan.run_paths {\n        println!(\n            \"  {}: {}\",\n            path.path.label(),\n            if path.feasible { \"Yes\" } else { \"No\" }\n        );\n        if let Some(min) = &path.minimum {\n            println!(\n                \"    min: VRAM={} RAM={:.1} GB cores={}\",\n                min.vram_gb\n                    .map(|v| format!(\"{v:.1} GB\"))\n                    .unwrap_or_else(|| \"n/a\".to_string()),\n                min.ram_gb,\n                min.cpu_cores\n            );\n        }\n        if let Some(tps) = path.estimated_tps {\n            println!(\"    est speed: {:.1} tok/s\", tps);\n        }\n    }\n    println!();\n\n    println!(\"{}\", \"Upgrade Deltas:\".bold().underline());\n    if plan.upgrade_deltas.is_empty() {\n        println!(\"  None required for the selected target.\");\n    } else {\n        for delta in &plan.upgrade_deltas {\n            println!(\"  {}\", delta.description);\n        }\n    }\n    println!();\n}\n\npub fn display_json_plan(plan: &PlanEstimate) {\n    println!(\n        \"{}\",\n        serde_json::to_string_pretty(plan).expect(\"JSON serialization failed\")\n    );\n}\n"
  },
  {
    "path": "llmfit-tui/src/main.rs",
    "content": "mod display;\nmod serve_api;\nmod theme;\nmod tui_app;\nmod tui_events;\nmod tui_ui;\n\nuse clap::{Parser, Subcommand};\nuse std::net::{TcpStream, ToSocketAddrs};\nuse std::process::Stdio;\nuse std::thread;\nuse std::time::Duration;\n\nuse llmfit_core::fit::{ModelFit, SortColumn, backend_compatible};\nuse llmfit_core::hardware::SystemSpecs;\nuse llmfit_core::models::ModelDatabase;\nuse llmfit_core::plan::{PlanRequest, estimate_model_plan, resolve_model_selector};\n\nconst DEFAULT_DASHBOARD_HOST: &str = \"0.0.0.0\";\nconst DEFAULT_DASHBOARD_PORT: u16 = 8787;\n\n#[derive(clap::ValueEnum, Clone, Copy, Debug)]\nenum SortArg {\n    /// Composite ranking score (default)\n    Score,\n    /// Estimated tokens/second\n    #[value(alias = \"tokens\", alias = \"toks\", alias = \"throughput\")]\n    Tps,\n    /// Model parameter count\n    Params,\n    /// Memory utilization percentage\n    #[value(alias = \"memory\", alias = \"mem_pct\", alias = \"utilization\")]\n    Mem,\n    /// Context window length\n    #[value(alias = \"context\")]\n    Ctx,\n    /// Release date (newest first)\n    #[value(alias = \"release\", alias = \"released\")]\n    Date,\n    /// Use-case grouping\n    #[value(alias = \"use_case\", alias = \"usecase\")]\n    Use,\n}\n\nimpl From<SortArg> for SortColumn {\n    fn from(value: SortArg) -> Self {\n        match value {\n            SortArg::Score => SortColumn::Score,\n            SortArg::Tps => SortColumn::Tps,\n            SortArg::Params => SortColumn::Params,\n            SortArg::Mem => SortColumn::MemPct,\n            SortArg::Ctx => SortColumn::Ctx,\n            SortArg::Date => SortColumn::ReleaseDate,\n            SortArg::Use => SortColumn::UseCase,\n        }\n    }\n}\n\n#[derive(clap::ValueEnum, Clone, Copy, Debug)]\nenum FitArg {\n    All,\n    Perfect,\n    Good,\n    Marginal,\n    Tight,\n    Runnable,\n}\n\n#[derive(Parser)]\n#[command(name = \"llmfit\")]\n#[command(about = \"Right-size LLM models to your system's 
hardware\")]\n#[command(long_about = \"\\\nRight-size LLM models to your system's hardware.\n\nllmfit detects your system's RAM, CPU, and GPU (NVIDIA, AMD, Apple Silicon),\nthen scores every model in its database for fit, speed, and quality. It can\nrecommend models, compare them side-by-side, plan hardware upgrades, download\nGGUF weights, and launch inference — all from a single binary.\n\nGLOBAL FLAGS:\n  --json           Output structured JSON on every subcommand (for tool/agent\n                   integration). Always exits 0 on success, 1 on error.\n  --memory <SIZE>  Override GPU VRAM (e.g. \\\"32G\\\", \\\"32000M\\\", \\\"1.5T\\\").\n  --max-context N  Cap context length for memory estimation (tokens).\n                   Falls back to OLLAMA_CONTEXT_LENGTH env var if unset.\n\nEXIT CODES:\n  0  Success\n  1  Any error (hardware detection failure, model not found, network error, etc.)\n\nENVIRONMENT VARIABLES:\n  OLLAMA_CONTEXT_LENGTH  Default context-length cap when --max-context is not set.\")]\n#[command(after_long_help = \"For a compact summary, use -h instead of --help.\")]\n#[command(version)]\nstruct Cli {\n    #[command(subcommand)]\n    command: Option<Commands>,\n\n    /// Show only models that perfectly match recommended specs\n    #[arg(short, long)]\n    perfect: bool,\n\n    /// Limit number of results\n    #[arg(short = 'n', long)]\n    limit: Option<usize>,\n\n    /// Sort column for CLI fit output\n    #[arg(long, value_enum, default_value_t = SortArg::Score)]\n    sort: SortArg,\n\n    /// Use classic CLI table output instead of TUI\n    #[arg(long)]\n    cli: bool,\n\n    /// Output results as JSON (for tool integration)\n    #[arg(long, global = true)]\n    json: bool,\n\n    /// Override GPU VRAM size (e.g. 
\"32G\", \"32000M\", \"1.5T\").\n    /// Useful when GPU memory autodetection fails.\n    #[arg(long, value_name = \"SIZE\")]\n    memory: Option<String>,\n\n    /// Cap context length used for memory estimation (tokens).\n    /// Falls back to OLLAMA_CONTEXT_LENGTH if not set.\n    #[arg(long, value_name = \"TOKENS\", value_parser = clap::value_parser!(u32).range(1..))]\n    max_context: Option<u32>,\n\n    /// Do not auto-start the background dashboard server\n    #[arg(long, global = true)]\n    no_dashboard: bool,\n}\n\n#[derive(Subcommand)]\nenum Commands {\n    /// Show system hardware specifications\n    #[command(long_about = \"\\\nShow system hardware specifications.\n\nDetects RAM, CPU, and GPU (NVIDIA via nvidia-smi, AMD via rocm-smi/sysfs,\nApple Silicon via system_profiler). On unified-memory systems (Apple Silicon),\nVRAM is reported as system RAM.\n\nPRECONDITIONS:\n  None. GPU detection is best-effort and fails silently if tools are missing.\n\nSIDE EFFECTS:\n  None — read-only.\n\nEXIT CODES:\n  0  Success\n\nAGENT USAGE:\n  llmfit system --json\n\n  JSON output fields: { system: { cpu, ram_gb, gpu_name, gpu_vram_gb,\n  gpu_backend, unified_memory, os } }\")]\n    System,\n\n    /// List all available LLM models\n    #[command(long_about = \"\\\nList all available LLM models.\n\nPrints every model in the embedded database with name, provider, parameter\ncount, quantization, and context length. 
No hardware analysis is performed.\n\nPRECONDITIONS:\n  None.\n\nSIDE EFFECTS:\n  None — read-only.\n\nEXIT CODES:\n  0  Success\n\nAGENT USAGE:\n  llmfit list --json\n\n  JSON output: array of model objects with fields: name, provider,\n  parameter_count, min_ram_gb, recommended_ram_gb, min_vram_gb,\n  quantization, context_length, use_case, capabilities.\")]\n    List,\n\n    /// Find models that fit your system (classic table output)\n    #[command(long_about = \"\\\nFind models that fit your system (classic table output).\n\nDetects hardware, scores every model for fit/speed/quality, and prints a\nranked table. Models incompatible with the detected backend are hidden.\n\nPRECONDITIONS:\n  Requires hardware detection (GPU via nvidia-smi/rocm-smi/system_profiler).\n  Use --memory to override GPU VRAM if autodetection fails.\n\nSIDE EFFECTS:\n  None — read-only.\n\nEXIT CODES:\n  0  Success\n  1  Hardware detection or internal error\n\nAGENT USAGE:\n  llmfit fit --json\n  llmfit fit --json --perfect -n 5\n  llmfit fit --json --sort tps\n\n  JSON output fields: { system: {...}, models: [{ name, provider,\n  parameter_count, fit_level, run_mode, score, score_components,\n  estimated_tps, memory_required_gb, memory_available_gb,\n  utilization_pct, best_quant, use_case, runtime }] }\")]\n    Fit {\n        /// Show only models that perfectly match recommended specs\n        #[arg(short, long)]\n        perfect: bool,\n\n        /// Limit number of results\n        #[arg(short = 'n', long)]\n        limit: Option<usize>,\n\n        /// Sort column for fit output\n        #[arg(long, value_enum, default_value_t = SortArg::Score)]\n        sort: SortArg,\n    },\n\n    /// Search for specific models\n    #[command(long_about = \"\\\nSearch for specific models.\n\nSearches the embedded model database by name, provider, or parameter size.\nNo hardware analysis is performed.\n\nPRECONDITIONS:\n  None.\n\nSIDE EFFECTS:\n  None — read-only.\n\nEXIT CODES:\n  0  Success (even 
if no matches found)\n\nAGENT USAGE:\n  No --json support for this command. Use 'llmfit list --json' and filter\n  client-side, or use 'llmfit info <model> --json' for a specific model.\")]\n    Search {\n        /// Search query (model name, provider, or size)\n        query: String,\n    },\n\n    /// Show detailed information about a specific model\n    #[command(long_about = \"\\\nShow detailed information about a specific model.\n\nLooks up a model by name (or partial name) and displays full specs plus a\nhardware fit analysis against the current system.\n\nPRECONDITIONS:\n  None. Hardware detection runs automatically for fit analysis.\n\nSIDE EFFECTS:\n  None — read-only.\n\nEXIT CODES:\n  0  Success\n  1  No model found, or ambiguous partial match\n\nAGENT USAGE:\n  llmfit info \\\"llama-3.1-8b\\\" --json\n\n  JSON output fields: { system: {...}, models: [{ <single model with full\n  fit analysis: name, fit_level, run_mode, score, score_components,\n  estimated_tps, memory_required_gb, utilization_pct, ... > }] }\")]\n    Info {\n        /// Model name or partial name to look up\n        model: String,\n    },\n\n    /// Compare two models side-by-side, or auto-compare top N filtered models\n    #[command(long_about = \"\\\nCompare two models side-by-side, or auto-compare top N filtered models.\n\nWhen two model selectors are given, compares those two models. When none are\ngiven, picks the top N models (default 2) after applying fit-level and sort\nfilters, and compares them.\n\nPRECONDITIONS:\n  Requires hardware detection for fit analysis. 
At least 2 models must pass\n  the filter for auto-compare mode.\n\nSIDE EFFECTS:\n  None — read-only.\n\nEXIT CODES:\n  0  Success\n  1  Model not found, ambiguous selector, fewer than 2 candidates, or\n     both selectors resolve to the same model\n\nAGENT USAGE:\n  llmfit diff --json\n  llmfit diff \\\"llama-8b\\\" \\\"qwen-7b\\\" --json\n  llmfit diff --json --fit good --sort tps -n 3\n\n  JSON output fields: { system: {...}, models: [{ name, fit_level,\n  run_mode, score, estimated_tps, memory_required_gb, ... }] }\")]\n    Diff {\n        /// First model selector (name or unique partial name)\n        model_a: Option<String>,\n\n        /// Second model selector (name or unique partial name)\n        model_b: Option<String>,\n\n        /// Sort column before selecting candidates\n        #[arg(long, value_enum, default_value_t = SortArg::Score)]\n        sort: SortArg,\n\n        /// Fit-level filter before candidate selection\n        #[arg(long, value_enum, default_value_t = FitArg::Runnable)]\n        fit: FitArg,\n\n        /// Number of top models to include when model names are omitted\n        #[arg(short = 'n', long, default_value_t = 2)]\n        limit: usize,\n    },\n\n    /// Plan hardware requirements for a specific model configuration\n    #[command(long_about = \"\\\nPlan hardware requirements for a specific model configuration.\n\nEstimates VRAM/RAM requirements, expected throughput, and recommended hardware\nfor running a model at a given context length and quantization. 
Useful for\ncapacity planning and hardware purchasing decisions.\n\nPRECONDITIONS:\n  Model must exist in the embedded database (use 'llmfit search' to verify).\n\nSIDE EFFECTS:\n  None — read-only.\n\nEXIT CODES:\n  0  Success\n  1  Model not found or invalid configuration\n\nAGENT USAGE:\n  llmfit plan \\\"llama-3.1-70b\\\" --context 8192 --json\n  llmfit plan \\\"qwen-72b\\\" --context 4096 --quant Q4_K_M --target-tps 15 --json\n\n  JSON output: PlanEstimate object with fields: model_name, context_length,\n  quantization, weight_gb, kv_cache_gb, total_vram_gb, fits_in_vram,\n  estimated_tps, recommended_gpu, notes.\")]\n    Plan {\n        /// Model selector (name or unique partial name)\n        model: String,\n\n        /// Context length for estimation (tokens)\n        #[arg(long, value_name = \"TOKENS\", value_parser = clap::value_parser!(u32).range(1..))]\n        context: u32,\n\n        /// Quantization override (e.g. Q4_K_M, Q8_0, mlx-4bit)\n        #[arg(long)]\n        quant: Option<String>,\n\n        /// Target decode speed in tokens/sec\n        #[arg(long, value_name = \"TOK_S\")]\n        target_tps: Option<f64>,\n    },\n\n    /// Recommend top models for your hardware (JSON-friendly)\n    #[command(long_about = \"\\\nRecommend top models for your hardware (JSON-friendly).\n\nAnalyzes all models against detected hardware and returns the top N ranked\nrecommendations. Supports filtering by use case, fit level, inference runtime,\nand model capabilities. JSON output is enabled by default.\n\nPRECONDITIONS:\n  Requires hardware detection. 
Use --memory to override GPU VRAM if needed.\n\nSIDE EFFECTS:\n  None — read-only.\n\nEXIT CODES:\n  0  Success\n  1  Hardware detection or internal error\n\nAGENT USAGE:\n  llmfit recommend\n  llmfit recommend -n 3 --use-case coding --min-fit good\n  llmfit recommend --runtime mlx --capability vision\n  llmfit recommend --force-runtime llamacpp  # get llama.cpp results on Apple Silicon\n\n  JSON output is the default. Fields: { system: {...}, models: [{ name,\n  provider, parameter_count, fit_level, run_mode, score, score_components\n  { quality, speed, fit, context }, estimated_tps, memory_required_gb,\n  memory_available_gb, utilization_pct, best_quant, use_case, runtime,\n  capabilities }] }\")]\n    Recommend {\n        /// Limit number of recommendations\n        #[arg(short = 'n', long, default_value = \"5\")]\n        limit: usize,\n\n        /// Filter by use case: general, coding, reasoning, chat, multimodal, embedding\n        #[arg(long, value_name = \"CATEGORY\")]\n        use_case: Option<String>,\n\n        /// Filter by minimum fit level: perfect, good, marginal\n        #[arg(long, default_value = \"marginal\")]\n        min_fit: String,\n\n        /// Filter by inference runtime: mlx, llamacpp, any\n        #[arg(long, default_value = \"any\")]\n        runtime: String,\n\n        /// Force a specific runtime override, bypassing automatic selection\n        /// (e.g. 
get llama.cpp recommendations on Apple Silicon instead of MLX)\n        #[arg(long, value_name = \"RUNTIME\")]\n        force_runtime: Option<String>,\n\n        /// Filter by capability: vision, tool_use (comma-separated for multiple)\n        #[arg(long, value_name = \"CAPS\")]\n        capability: Option<String>,\n\n        /// Output as JSON (default for recommend)\n        #[arg(long, default_value = \"true\")]\n        json: bool,\n    },\n\n    /// Download a GGUF model from HuggingFace for use with llama.cpp\n    #[command(long_about = \"\\\nDownload a GGUF model from HuggingFace for use with llama.cpp.\n\nAccepts a HuggingFace repo ID, a search query, or a known model name.\nAutomatically selects the best quantization that fits your hardware unless\n--quant is specified. Use --list to browse available files without downloading.\n\nPRECONDITIONS:\n  Network access to huggingface.co. Hardware detection runs for auto quant\n  selection (override with --budget or --quant).\n\nSIDE EFFECTS:\n  Downloads a GGUF file to the local model cache directory\n  (~/.cache/llmfit/models/ or platform equivalent).\n\nEXIT CODES:\n  0  Success\n  1  Model/repo not found, no GGUF files available, network error, or\n     download failure\n\nAGENT USAGE:\n  No --json support. Parse stdout for progress and completion messages.\n  Use --list to enumerate available quantizations before downloading.\")]\n    Download {\n        /// Model to download. Can be:\n        ///   - HuggingFace repo (e.g. \"bartowski/Llama-3.1-8B-Instruct-GGUF\")\n        ///   - Search query (e.g. \"llama 8b\")\n        ///   - Known model name (e.g. \"llama-3.1-8b-instruct\")\n        model: String,\n\n        /// Specific GGUF quantization to download (e.g. 
\"Q4_K_M\", \"Q8_0\").\n        /// If omitted, selects the best quantization that fits your hardware.\n        #[arg(short, long)]\n        quant: Option<String>,\n\n        /// Maximum memory budget in GB for quantization selection\n        #[arg(long, value_name = \"GB\")]\n        budget: Option<f64>,\n\n        /// List available GGUF files in the repo without downloading\n        #[arg(long)]\n        list: bool,\n    },\n\n    /// Search HuggingFace for GGUF models compatible with llama.cpp\n    #[command(long_about = \"\\\nSearch HuggingFace for GGUF models compatible with llama.cpp.\n\nQueries the HuggingFace Hub API for repositories containing GGUF model files.\nResults include the repository ID and model type.\n\nPRECONDITIONS:\n  Network access to huggingface.co.\n\nSIDE EFFECTS:\n  None — read-only (network query only).\n\nEXIT CODES:\n  0  Success (even if no results found)\n\nAGENT USAGE:\n  No --json support. Parse the tabular stdout output, or use the llmfit\n  REST API ('llmfit serve') for programmatic access.\")]\n    HfSearch {\n        /// Search query (model name, architecture, etc.)\n        query: String,\n\n        /// Maximum number of results\n        #[arg(short = 'n', long, default_value = \"10\")]\n        limit: usize,\n    },\n\n    /// Run a downloaded GGUF model with llama-cli or llama-server\n    #[command(long_about = \"\\\nRun a downloaded GGUF model with llama-cli or llama-server.\n\nLaunches an interactive chat session (default) or an OpenAI-compatible API\nserver (--server). The model can be specified as a file path or a name to\nsearch in the local cache.\n\nPRECONDITIONS:\n  llama-cli (or llama-server with --server) must be installed and in PATH.\n  A GGUF model file must exist locally (use 'llmfit download' first).\n\nSIDE EFFECTS:\n  Launches an external llama.cpp process. 
In server mode, binds to the\n  specified port.\n\nEXIT CODES:\n  0  Clean exit from llama.cpp\n  1  llama-cli/llama-server not found, model not found, or process error\n  *  Other codes are proxied from the llama.cpp process\n\nAGENT USAGE:\n  No --json support. For API server mode, use:\n    llmfit run <model> --server --port 8080\n  Then interact via the OpenAI-compatible API at http://localhost:8080.\")]\n    Run {\n        /// Model file or name to run. If a name is given, searches the local cache.\n        model: String,\n\n        /// Run as an OpenAI-compatible API server instead of interactive chat\n        #[arg(long)]\n        server: bool,\n\n        /// Port for the API server (default: 8080)\n        #[arg(long, default_value = \"8080\")]\n        port: u16,\n\n        /// Number of GPU layers to offload (-1 = all)\n        #[arg(long, short = 'g', default_value = \"-1\")]\n        ngl: i32,\n\n        /// Context size in tokens\n        #[arg(long, short = 'c', default_value = \"4096\")]\n        ctx_size: u32,\n    },\n\n    /// Start llmfit REST API server for cluster/node scheduling workflows\n    #[command(long_about = \"\\\nStart llmfit REST API server for cluster/node scheduling workflows.\n\nExposes llmfit's hardware detection and model fitting as a REST API. Useful\nfor multi-node clusters, CI pipelines, and orchestration systems that need\nto query hardware capabilities and model recommendations programmatically.\n\nPRECONDITIONS:\n  The specified host:port must be available for binding.\n\nSIDE EFFECTS:\n  Binds an HTTP server on the specified host and port (default 127.0.0.1:8787).\n  Also serves the local web dashboard at `/` on the same host/port.\n  Runs until terminated.\n\nEXIT CODES:\n  0  Clean shutdown\n  1  Port binding failure or runtime error\n\nAGENT USAGE:\n  llmfit serve --port 8787\n  llmfit serve --host 0.0.0.0 --port 8787  # expose to other machines\n  All endpoints return JSON. 
See API.md for the full endpoint reference.\")]\n    Serve {\n        /// Host interface to bind\n        #[arg(long, default_value = \"127.0.0.1\")]\n        host: String,\n\n        /// Port to listen on\n        #[arg(long, default_value = \"8787\")]\n        port: u16,\n    },\n}\n\n/// Detect system specs with optional GPU memory override.\nfn detect_specs(memory_override: &Option<String>) -> SystemSpecs {\n    let specs = SystemSpecs::detect();\n    if let Some(mem_str) = memory_override {\n        match llmfit_core::hardware::parse_memory_size(mem_str) {\n            Some(gb) => specs.with_gpu_memory_override(gb),\n            None => {\n                eprintln!(\n                    \"Warning: could not parse --memory value '{}'. Expected format: 32G, 32000M, 1.5T\",\n                    mem_str\n                );\n                specs\n            }\n        }\n    } else {\n        specs\n    }\n}\n\nfn resolve_context_limit(max_context: Option<u32>) -> Option<u32> {\n    if max_context.is_some() {\n        return max_context;\n    }\n\n    let Ok(raw) = std::env::var(\"OLLAMA_CONTEXT_LENGTH\") else {\n        return None;\n    };\n    match raw.trim().parse::<u32>() {\n        Ok(v) if v > 0 => Some(v),\n        _ => {\n            eprintln!(\n                \"Warning: could not parse OLLAMA_CONTEXT_LENGTH='{}'. 
Expected a positive integer.\",\n                raw\n            );\n            None\n        }\n    }\n}\n\nfn dashboard_pid_path() -> std::path::PathBuf {\n    std::env::temp_dir().join(\"llmfit-dashboard.pid\")\n}\n\nfn write_dashboard_pid(pid: u32) {\n    let _ = std::fs::write(dashboard_pid_path(), pid.to_string());\n}\n\nstruct DashboardGuard {\n    child: std::process::Child,\n}\n\nimpl Drop for DashboardGuard {\n    fn drop(&mut self) {\n        let _ = self.child.kill();\n        let _ = std::fs::remove_file(dashboard_pid_path());\n    }\n}\n\nfn dashboard_target_from_env() -> (String, u16) {\n    let host = std::env::var(\"LLMFIT_DASHBOARD_HOST\")\n        .ok()\n        .map(|s| s.trim().to_string())\n        .filter(|s| !s.is_empty())\n        .unwrap_or_else(|| DEFAULT_DASHBOARD_HOST.to_string());\n\n    let port = std::env::var(\"LLMFIT_DASHBOARD_PORT\")\n        .ok()\n        .and_then(|raw| match raw.trim().parse::<u16>() {\n            Ok(value) => Some(value),\n            Err(_) => {\n                eprintln!(\n                    \"Warning: invalid LLMFIT_DASHBOARD_PORT='{}'. 
Using {}.\",\n                    raw, DEFAULT_DASHBOARD_PORT\n                );\n                None\n            }\n        })\n        .unwrap_or(DEFAULT_DASHBOARD_PORT);\n\n    (host, port)\n}\n\nfn dashboard_reachable(host: &str, port: u16) -> bool {\n    let Ok(mut addrs) = format!(\"{host}:{port}\").to_socket_addrs() else {\n        return false;\n    };\n    let Some(addr) = addrs.next() else {\n        return false;\n    };\n    TcpStream::connect_timeout(&addr, Duration::from_millis(250)).is_ok()\n}\n\nfn ensure_dashboard_available(\n    memory_override: &Option<String>,\n    context_limit: Option<u32>,\n) -> Option<DashboardGuard> {\n    let (host, port) = dashboard_target_from_env();\n    let _url = format!(\"http://{}:{}/\", host, port);\n\n    if dashboard_reachable(&host, port) {\n        return None;\n    }\n\n    let exe = match std::env::current_exe() {\n        Ok(path) => path,\n        Err(err) => {\n            eprintln!(\"Warning: could not resolve llmfit executable for dashboard launch: {err}\");\n            return None;\n        }\n    };\n\n    let mut command = std::process::Command::new(exe);\n    command.arg(\"--no-dashboard\");\n    if let Some(memory) = memory_override {\n        command.arg(\"--memory\").arg(memory);\n    }\n    if let Some(ctx) = context_limit {\n        command.arg(\"--max-context\").arg(ctx.to_string());\n    }\n\n    command\n        .arg(\"serve\")\n        .arg(\"--host\")\n        .arg(&host)\n        .arg(\"--port\")\n        .arg(port.to_string())\n        .stdin(Stdio::null())\n        .stdout(Stdio::null())\n        .stderr(Stdio::null());\n\n    let mut child = match command.spawn() {\n        Ok(child) => child,\n        Err(err) => {\n            eprintln!(\"Warning: could not start dashboard server: {err}\");\n            return None;\n        }\n    };\n\n    write_dashboard_pid(child.id());\n\n    for _ in 0..20 {\n        if dashboard_reachable(&host, port) {\n            return 
Some(DashboardGuard { child });\n        }\n\n        match child.try_wait() {\n            Ok(Some(status)) => {\n                eprintln!(\n                    \"Warning: dashboard server exited early (status: {}). Run `llmfit serve` to inspect logs.\",\n                    status\n                );\n                return None;\n            }\n            Ok(None) => {}\n            Err(err) => {\n                eprintln!(\"Warning: could not check dashboard server status: {err}\");\n                return None;\n            }\n        }\n\n        thread::sleep(Duration::from_millis(100));\n    }\n\n    Some(DashboardGuard { child })\n}\n\nfn run_fit(\n    perfect: bool,\n    limit: Option<usize>,\n    sort: SortColumn,\n    json: bool,\n    memory_override: &Option<String>,\n    context_limit: Option<u32>,\n) {\n    let specs = detect_specs(memory_override);\n    let db = ModelDatabase::new();\n\n    if !json {\n        specs.display();\n    }\n\n    let hidden: usize = db\n        .get_all_models()\n        .iter()\n        .filter(|m| !backend_compatible(m, &specs))\n        .count();\n\n    let mut fits: Vec<ModelFit> = db\n        .get_all_models()\n        .iter()\n        .filter(|m| backend_compatible(m, &specs))\n        .map(|m| ModelFit::analyze_with_context_limit(m, &specs, context_limit))\n        .collect();\n\n    if perfect {\n        fits.retain(|f| f.fit_level == llmfit_core::fit::FitLevel::Perfect);\n    }\n\n    fits = llmfit_core::fit::rank_models_by_fit_opts_col(fits, false, sort);\n\n    if let Some(n) = limit {\n        fits.truncate(n);\n    }\n\n    if json {\n        display::display_json_fits(&specs, &fits);\n    } else {\n        if hidden > 0 {\n            eprintln!(\n                \"({} model{} hidden — incompatible backend)\",\n                hidden,\n                if hidden == 1 { \"\" } else { \"s\" }\n            );\n        }\n        display::display_model_fits(&fits);\n    }\n}\n\nfn fit_matches_filter(fit: 
&ModelFit, filter: FitArg) -> bool {\n    match filter {\n        FitArg::All => true,\n        FitArg::Perfect => fit.fit_level == llmfit_core::fit::FitLevel::Perfect,\n        FitArg::Good => fit.fit_level == llmfit_core::fit::FitLevel::Good,\n        FitArg::Marginal => fit.fit_level == llmfit_core::fit::FitLevel::Marginal,\n        FitArg::Tight => fit.fit_level == llmfit_core::fit::FitLevel::TooTight,\n        FitArg::Runnable => fit.fit_level != llmfit_core::fit::FitLevel::TooTight,\n    }\n}\n\nfn find_name_index_by_selector<T>(\n    items: &[T],\n    selector: &str,\n    get_name: impl Fn(&T) -> &str,\n) -> Result<usize, String> {\n    let needle = selector.trim().to_lowercase();\n    if needle.is_empty() {\n        return Err(\"Model selector cannot be empty\".to_string());\n    }\n\n    if let Some((idx, _)) = items\n        .iter()\n        .enumerate()\n        .find(|(_, item)| get_name(item).to_lowercase() == needle)\n    {\n        return Ok(idx);\n    }\n\n    let matches: Vec<(usize, String)> = items\n        .iter()\n        .enumerate()\n        .filter_map(|(i, item)| {\n            let name = get_name(item);\n            if name.to_lowercase().contains(&needle) {\n                Some((i, name.to_string()))\n            } else {\n                None\n            }\n        })\n        .collect();\n\n    match matches.as_slice() {\n        [] => Err(format!(\"No model found matching '{}'\", selector)),\n        [(idx, _)] => Ok(*idx),\n        _ => {\n            let names = matches\n                .iter()\n                .take(8)\n                .map(|(_, name)| format!(\"  - {}\", name))\n                .collect::<Vec<_>>()\n                .join(\"\\n\");\n            Err(format!(\n                \"Multiple models match '{}'. 
Please be more specific:\\n{}\",\n                selector, names\n            ))\n        }\n    }\n}\n\nfn find_fit_index_by_selector(fits: &[ModelFit], selector: &str) -> Result<usize, String> {\n    find_name_index_by_selector(fits, selector, |fit| fit.model.name.as_str())\n}\n\nfn run_diff(\n    model_a: Option<String>,\n    model_b: Option<String>,\n    fit_filter: FitArg,\n    sort: SortColumn,\n    limit: usize,\n    json: bool,\n    memory_override: &Option<String>,\n    context_limit: Option<u32>,\n) {\n    if limit < 2 {\n        eprintln!(\"Error: --limit must be at least 2 for diff\");\n        std::process::exit(1);\n    }\n\n    if (model_a.is_some() && model_b.is_none()) || (model_a.is_none() && model_b.is_some()) {\n        eprintln!(\"Error: provide both model selectors, or neither to auto-compare top N\");\n        std::process::exit(1);\n    }\n\n    let specs = detect_specs(memory_override);\n    let db = ModelDatabase::new();\n\n    let mut fits: Vec<ModelFit> = db\n        .get_all_models()\n        .iter()\n        .filter(|m| backend_compatible(m, &specs))\n        .map(|m| ModelFit::analyze_with_context_limit(m, &specs, context_limit))\n        .collect();\n\n    fits.retain(|f| fit_matches_filter(f, fit_filter));\n    fits = llmfit_core::fit::rank_models_by_fit_opts_col(fits, false, sort);\n\n    let selected: Vec<ModelFit> =\n        if let (Some(a), Some(b)) = (model_a.as_deref(), model_b.as_deref()) {\n            let a_idx = match find_fit_index_by_selector(&fits, a) {\n                Ok(i) => i,\n                Err(e) => {\n                    eprintln!(\"Error: {}\", e);\n                    std::process::exit(1);\n                }\n            };\n            let b_idx = match find_fit_index_by_selector(&fits, b) {\n                Ok(i) => i,\n                Err(e) => {\n                    eprintln!(\"Error: {}\", e);\n                    std::process::exit(1);\n                }\n            };\n\n            if a_idx == 
b_idx {\n                eprintln!(\"Error: both selectors resolved to the same model\");\n                std::process::exit(1);\n            }\n\n            vec![fits[a_idx].clone(), fits[b_idx].clone()]\n        } else {\n            if fits.len() < 2 {\n                eprintln!(\"Error: need at least 2 models after filtering to compare\");\n                std::process::exit(1);\n            }\n            fits.into_iter().take(limit).collect()\n        };\n\n    if json {\n        display::display_json_diff_fits(&specs, &selected);\n    } else {\n        specs.display();\n        display::display_model_diff(&selected, sort.label());\n    }\n}\n\nfn run_tui(memory_override: &Option<String>, context_limit: Option<u32>) -> std::io::Result<()> {\n    // Setup terminal\n    crossterm::terminal::enable_raw_mode()?;\n    let mut stdout = std::io::stdout();\n    crossterm::execute!(\n        stdout,\n        crossterm::terminal::EnterAlternateScreen,\n        crossterm::event::EnableMouseCapture\n    )?;\n\n    let backend = ratatui::backend::CrosstermBackend::new(stdout);\n    let mut terminal = ratatui::Terminal::new(backend)?;\n    draw_boot_screen(&mut terminal, \"Detecting system hardware...\")?;\n\n    // Create app state\n    let specs = detect_specs(memory_override);\n    draw_boot_screen(&mut terminal, \"Loading providers and models...\")?;\n    let mut app = tui_app::App::with_specs_and_context(specs, context_limit);\n\n    // Main loop\n    loop {\n        terminal.draw(|frame| {\n            tui_ui::draw(frame, &mut app);\n        })?;\n\n        tui_events::handle_events(&mut app)?;\n\n        if app.should_quit {\n            break;\n        }\n    }\n\n    // Restore terminal\n    crossterm::terminal::disable_raw_mode()?;\n    crossterm::execute!(\n        terminal.backend_mut(),\n        crossterm::terminal::LeaveAlternateScreen,\n        crossterm::event::DisableMouseCapture\n    )?;\n    terminal.show_cursor()?;\n\n    Ok(())\n}\n\nfn 
draw_boot_screen(\n    terminal: &mut ratatui::Terminal<ratatui::backend::CrosstermBackend<std::io::Stdout>>,\n    message: &str,\n) -> std::io::Result<()> {\n    use ratatui::layout::{Constraint, Direction, Layout};\n    use ratatui::style::{Modifier, Style};\n    use ratatui::text::{Line, Span};\n    use ratatui::widgets::{Block, Borders, Paragraph};\n\n    terminal.draw(|frame| {\n        let area = frame.area();\n        let layout = Layout::default()\n            .direction(Direction::Vertical)\n            .constraints([\n                Constraint::Percentage(45),\n                Constraint::Length(3),\n                Constraint::Percentage(52),\n            ])\n            .split(area);\n\n        let block = Block::default()\n            .borders(Borders::ALL)\n            .title(\" llmfit \")\n            .title_style(Style::default().add_modifier(Modifier::BOLD));\n        let line = Line::from(vec![\n            Span::raw(\" \"),\n            Span::styled(\"Loading: \", Style::default().add_modifier(Modifier::BOLD)),\n            Span::raw(message),\n        ]);\n        frame.render_widget(Paragraph::new(line).block(block), layout[1]);\n    })?;\n    Ok(())\n}\n\nfn run_recommend(\n    limit: usize,\n    use_case: Option<String>,\n    min_fit: String,\n    runtime_filter: String,\n    force_runtime: Option<String>,\n    capability: Option<String>,\n    json: bool,\n    memory_override: &Option<String>,\n    context_limit: Option<u32>,\n) {\n    let specs = detect_specs(memory_override);\n    let db = ModelDatabase::new();\n\n    // Parse --force-runtime into an InferenceRuntime if provided\n    let forced_rt = force_runtime\n        .as_deref()\n        .map(|rt| match rt.to_lowercase().as_str() {\n            \"mlx\" => llmfit_core::fit::InferenceRuntime::Mlx,\n            \"llamacpp\" | \"llama.cpp\" | \"llama_cpp\" => llmfit_core::fit::InferenceRuntime::LlamaCpp,\n            \"vllm\" => llmfit_core::fit::InferenceRuntime::Vllm,\n            other 
=> {\n                eprintln!(\n                    \"Unknown runtime '{}'. Valid options: mlx, llamacpp, vllm\",\n                    other\n                );\n                std::process::exit(1);\n            }\n        });\n\n    let mut fits: Vec<ModelFit> = db\n        .get_all_models()\n        .iter()\n        .filter(|m| backend_compatible(m, &specs))\n        .map(|m| ModelFit::analyze_with_forced_runtime(m, &specs, context_limit, forced_rt))\n        .collect();\n\n    // Filter by minimum fit level\n    let min_level = match min_fit.to_lowercase().as_str() {\n        \"perfect\" => llmfit_core::fit::FitLevel::Perfect,\n        \"good\" => llmfit_core::fit::FitLevel::Good,\n        \"marginal\" => llmfit_core::fit::FitLevel::Marginal,\n        _ => llmfit_core::fit::FitLevel::Marginal,\n    };\n    fits.retain(|f| match (min_level, f.fit_level) {\n        (llmfit_core::fit::FitLevel::Marginal, llmfit_core::fit::FitLevel::TooTight) => false,\n        (\n            llmfit_core::fit::FitLevel::Good,\n            llmfit_core::fit::FitLevel::TooTight | llmfit_core::fit::FitLevel::Marginal,\n        ) => false,\n        (llmfit_core::fit::FitLevel::Perfect, llmfit_core::fit::FitLevel::Perfect) => true,\n        (llmfit_core::fit::FitLevel::Perfect, _) => false,\n        _ => true,\n    });\n\n    // Hide MLX-only models on non-Apple Silicon systems\n    let is_apple_silicon =\n        specs.backend == llmfit_core::hardware::GpuBackend::Metal && specs.unified_memory;\n    if !is_apple_silicon {\n        fits.retain(|f| !f.model.is_mlx_only());\n    }\n\n    // Filter by runtime\n    match runtime_filter.to_lowercase().as_str() {\n        \"mlx\" => fits.retain(|f| f.runtime == llmfit_core::fit::InferenceRuntime::Mlx),\n        \"llamacpp\" | \"llama.cpp\" | \"llama_cpp\" => {\n            fits.retain(|f| f.runtime == llmfit_core::fit::InferenceRuntime::LlamaCpp)\n        }\n        \"vllm\" => fits.retain(|f| f.runtime == 
llmfit_core::fit::InferenceRuntime::Vllm),\n        _ => {} // \"any\" or unrecognized — keep all\n    }\n\n    // Filter by use case if specified\n    if let Some(ref uc) = use_case {\n        let target = match uc.to_lowercase().as_str() {\n            \"coding\" | \"code\" => Some(llmfit_core::models::UseCase::Coding),\n            \"reasoning\" | \"reason\" => Some(llmfit_core::models::UseCase::Reasoning),\n            \"chat\" => Some(llmfit_core::models::UseCase::Chat),\n            \"multimodal\" | \"vision\" => Some(llmfit_core::models::UseCase::Multimodal),\n            \"embedding\" | \"embed\" => Some(llmfit_core::models::UseCase::Embedding),\n            \"general\" => Some(llmfit_core::models::UseCase::General),\n            _ => None,\n        };\n        if let Some(target_uc) = target {\n            fits.retain(|f| f.use_case == target_uc);\n        }\n    }\n\n    // Filter by capability if specified\n    if let Some(ref caps_str) = capability {\n        let requested: Vec<&str> = caps_str.split(',').map(|s| s.trim()).collect();\n        fits.retain(|f| {\n            requested\n                .iter()\n                .all(|req| match req.to_lowercase().as_str() {\n                    \"vision\" => f\n                        .model\n                        .capabilities\n                        .contains(&llmfit_core::models::Capability::Vision),\n                    \"tool_use\" | \"tools\" | \"tool-use\" | \"function_calling\" => f\n                        .model\n                        .capabilities\n                        .contains(&llmfit_core::models::Capability::ToolUse),\n                    _ => true,\n                })\n        });\n    }\n\n    fits = llmfit_core::fit::rank_models_by_fit(fits);\n    fits.truncate(limit);\n\n    if json {\n        display::display_json_fits(&specs, &fits);\n    } else {\n        if !fits.is_empty() {\n            specs.display();\n        }\n        display::display_model_fits(&fits);\n    }\n}\n\nfn 
run_download(\n    model: &str,\n    quant: Option<&str>,\n    budget: Option<f64>,\n    list_only: bool,\n    memory_override: &Option<String>,\n) {\n    use llmfit_core::providers::LlamaCppProvider;\n\n    let provider = LlamaCppProvider::new();\n\n    // Resolve repo ID: try known mapping, then treat as repo, then search\n    let repo_id = if model.contains('/') {\n        model.to_string()\n    } else if let Some(repo) = llmfit_core::providers::gguf_pull_tag(model) {\n        repo\n    } else {\n        // Search HuggingFace\n        println!(\n            \"Searching HuggingFace for GGUF models matching '{}'...\",\n            model\n        );\n        let results = LlamaCppProvider::search_hf_gguf(model);\n        if results.is_empty() {\n            eprintln!(\n                \"No GGUF models found for '{}'. Try a different search term.\",\n                model\n            );\n            eprintln!(\"Tip: use 'llmfit hf-search <query>' to browse available models.\");\n            std::process::exit(1);\n        }\n        if results.len() > 1 && !list_only {\n            println!(\"\\nFound {} repositories:\", results.len());\n            for (i, (id, desc)) in results.iter().enumerate().take(10) {\n                println!(\"  {}. 
{} ({})\", i + 1, id, desc);\n            }\n            println!(\"\\nUsing first result: {}\", results[0].0);\n        }\n        results[0].0.clone()\n    };\n\n    // List available GGUF files\n    println!(\"Fetching available files from {}...\", repo_id);\n    let files = LlamaCppProvider::list_repo_gguf_files(&repo_id);\n    if files.is_empty() {\n        eprintln!(\"No GGUF files found in repository '{}'.\", repo_id);\n        eprintln!(\"Make sure this is a valid GGUF repository on HuggingFace.\");\n        std::process::exit(1);\n    }\n\n    if list_only {\n        println!(\"\\nAvailable GGUF files in {}:\", repo_id);\n        println!(\"{:<60} {:>10}\", \"Filename\", \"Size\");\n        println!(\"{}\", \"-\".repeat(72));\n        for (filename, size) in &files {\n            let size_str = if *size > 1_073_741_824 {\n                format!(\"{:.1} GB\", *size as f64 / 1_073_741_824.0)\n            } else {\n                format!(\"{:.0} MB\", *size as f64 / 1_048_576.0)\n            };\n            println!(\"{:<60} {:>10}\", filename, size_str);\n        }\n        return;\n    }\n\n    // Select the file to download\n    let (filename, file_size) = if let Some(q) = quant {\n        // User specified a quantization\n        let q_lower = q.to_lowercase();\n        if let Some((f, s)) = files\n            .iter()\n            .find(|(f, _)| f.to_lowercase().contains(&q_lower))\n        {\n            (f.clone(), *s)\n        } else {\n            eprintln!(\n                \"No GGUF file found matching quantization '{}' in {}.\",\n                q, repo_id\n            );\n            eprintln!(\"\\nAvailable files:\");\n            for (f, s) in &files {\n                let size_str = format!(\"{:.1} GB\", *s as f64 / 1_073_741_824.0);\n                eprintln!(\"  {} ({})\", f, size_str);\n            }\n            std::process::exit(1);\n        }\n    } else {\n        // Auto-select based on hardware budget\n        let mem_budget = if 
let Some(b) = budget {\n            b\n        } else {\n            let specs = detect_specs(memory_override);\n            specs\n                .total_gpu_vram_gb\n                .or(Some(specs.available_ram_gb))\n                .unwrap_or(16.0)\n        };\n        if let Some(result) = LlamaCppProvider::select_best_gguf(&files, mem_budget) {\n            println!(\n                \"Selected {} ({:.1} GB) for {:.0} GB memory budget\",\n                result.0,\n                result.1 as f64 / 1_073_741_824.0,\n                mem_budget\n            );\n            result\n        } else {\n            // Nothing fits — pick smallest\n            let mut sorted = files.clone();\n            sorted.sort_by_key(|(_, s)| *s);\n            let (f, s) = sorted.first().expect(\"files list is not empty\");\n            println!(\n                \"Warning: No quantization fits within {:.0} GB. Downloading smallest: {} ({:.1} GB)\",\n                mem_budget,\n                f,\n                *s as f64 / 1_073_741_824.0\n            );\n            (f.clone(), *s)\n        }\n    };\n\n    println!(\n        \"\\nDownloading {} ({:.1} GB) to {}\",\n        filename,\n        file_size as f64 / 1_073_741_824.0,\n        provider.models_dir().display()\n    );\n\n    match provider.download_gguf(&repo_id, &filename) {\n        Ok(handle) => {\n            // Poll for progress\n            loop {\n                match handle.receiver.recv() {\n                    Ok(llmfit_core::providers::PullEvent::Progress { status, percent }) => {\n                        if let Some(p) = percent {\n                            print!(\"\\r\\x1b[K  {:.1}% - {}\", p, status);\n                            use std::io::Write;\n                            let _ = std::io::stdout().flush();\n                        } else {\n                            println!(\"  {}\", status);\n                        }\n                    }\n                    
Ok(llmfit_core::providers::PullEvent::Done) => {\n                        println!(\"\\n\\n✓ Download complete!\");\n                        let dest = provider.models_dir().join(&filename);\n                        println!(\"  Saved to: {}\", dest.display());\n                        if provider.llama_cli_path().is_some() {\n                            println!(\n                                \"\\n  Run with: llmfit run {}\",\n                                filename.trim_end_matches(\".gguf\")\n                            );\n                            println!(\"  Or directly: llama-cli -m {} -cnv\", dest.display());\n                        } else {\n                            println!(\"\\n  Install llama.cpp to run this model:\");\n                            println!(\"    brew install llama.cpp\");\n                            println!(\"    # or build from source:\");\n                            println!(\n                                \"    git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp\"\n                            );\n                            println!(\"    cmake -B build && cmake --build build --config Release\");\n                            println!(\"\\n  Then run: llama-cli -m {} -cnv\", dest.display());\n                        }\n                        break;\n                    }\n                    Ok(llmfit_core::providers::PullEvent::Error(e)) => {\n                        eprintln!(\"\\n\\n✗ Download failed: {}\", e);\n                        std::process::exit(1);\n                    }\n                    Err(_) => {\n                        eprintln!(\"\\n\\n✗ Download channel closed unexpectedly\");\n                        std::process::exit(1);\n                    }\n                }\n            }\n        }\n        Err(e) => {\n            eprintln!(\"Failed to start download: {}\", e);\n            std::process::exit(1);\n        }\n    }\n}\n\nfn run_hf_search(query: &str, limit: usize) {\n    
use llmfit_core::providers::LlamaCppProvider;\n\n    println!(\n        \"Searching HuggingFace for GGUF models matching '{}'...\\n\",\n        query\n    );\n    let results = LlamaCppProvider::search_hf_gguf(query);\n\n    if results.is_empty() {\n        println!(\"No GGUF models found. Try a different search term.\");\n        return;\n    }\n\n    println!(\"{:<50} Type\", \"Repository\");\n    println!(\"{}\", \"-\".repeat(65));\n    for (id, desc) in results.iter().take(limit) {\n        println!(\"{:<50} {}\", id, desc);\n    }\n\n    println!(\"\\nTo download: llmfit download <repository>\");\n    println!(\"To list files: llmfit download <repository> --list\");\n}\n\nfn run_model(model: &str, server: bool, port: u16, ngl: i32, ctx_size: u32) {\n    use llmfit_core::providers::LlamaCppProvider;\n\n    let provider = LlamaCppProvider::new();\n\n    // Find the model file\n    let model_path = if std::path::Path::new(model).exists() {\n        std::path::PathBuf::from(model)\n    } else {\n        // Search in cache directory\n        let gguf_files = provider.list_gguf_files();\n        let search = model.to_lowercase();\n        let found = gguf_files.into_iter().find(|p| {\n            p.file_stem()\n                .and_then(|s| s.to_str())\n                .map(|s| s.to_lowercase().contains(&search))\n                .unwrap_or(false)\n        });\n        match found {\n            Some(p) => p,\n            None => {\n                eprintln!(\"Model '{}' not found.\", model);\n                eprintln!(\"\\nAvailable models in {}:\", provider.models_dir().display());\n                for f in provider.list_gguf_files() {\n                    eprintln!(\"  {}\", f.file_name().unwrap_or_default().to_string_lossy());\n                }\n                eprintln!(\"\\nUse 'llmfit download <model>' to download a model first.\");\n                std::process::exit(1);\n            }\n        }\n    };\n\n    if server {\n        let Some(bin) = 
provider.llama_server_path() else {\n            eprintln!(\"llama-server not found in PATH.\");\n            eprintln!(\"Install llama.cpp: brew install llama.cpp\");\n            eprintln!(\"Or build from source: https://github.com/ggml-org/llama.cpp\");\n            std::process::exit(1);\n        };\n\n        println!(\n            \"Starting llama-server on port {} with {}...\",\n            port,\n            model_path.display()\n        );\n        let status = std::process::Command::new(bin)\n            .args([\n                \"-m\",\n                model_path.to_str().unwrap_or(\"\"),\n                \"--port\",\n                &port.to_string(),\n                \"-ngl\",\n                &ngl.to_string(),\n                \"-c\",\n                &ctx_size.to_string(),\n            ])\n            .status();\n\n        match status {\n            Ok(s) if !s.success() => {\n                std::process::exit(s.code().unwrap_or(1));\n            }\n            Err(e) => {\n                eprintln!(\"Failed to run llama-server: {}\", e);\n                std::process::exit(1);\n            }\n            _ => {}\n        }\n    } else {\n        let Some(bin) = provider.llama_cli_path() else {\n            eprintln!(\"llama-cli not found in PATH.\");\n            eprintln!(\"Install llama.cpp: brew install llama.cpp\");\n            eprintln!(\"Or build from source: https://github.com/ggml-org/llama.cpp\");\n            std::process::exit(1);\n        };\n\n        println!(\"Running {} with llama-cli...\\n\", model_path.display());\n        let status = std::process::Command::new(bin)\n            .args([\n                \"-m\",\n                model_path.to_str().unwrap_or(\"\"),\n                \"-ngl\",\n                &ngl.to_string(),\n                \"-c\",\n                &ctx_size.to_string(),\n                \"-cnv\",\n            ])\n            .status();\n\n        match status {\n            Ok(s) if !s.success() => {\n        
        std::process::exit(s.code().unwrap_or(1));\n            }\n            Err(e) => {\n                eprintln!(\"Failed to run llama-cli: {}\", e);\n                std::process::exit(1);\n            }\n            _ => {}\n        }\n    }\n}\n\nfn run_plan(\n    model_selector: &str,\n    context: u32,\n    quant: Option<String>,\n    target_tps: Option<f64>,\n    json: bool,\n    memory_override: &Option<String>,\n) -> Result<(), String> {\n    let db = ModelDatabase::new();\n    let specs = detect_specs(memory_override);\n    let model = resolve_model_selector(db.get_all_models(), model_selector)?;\n\n    let request = PlanRequest {\n        context,\n        quant,\n        target_tps,\n    };\n    let plan = estimate_model_plan(model, &request, &specs)?;\n\n    if json {\n        display::display_json_plan(&plan);\n    } else {\n        specs.display();\n        display::display_model_plan(&plan);\n    }\n\n    Ok(())\n}\n\nfn main() {\n    let cli = Cli::parse();\n    let context_limit = resolve_context_limit(cli.max_context);\n    let auto_dashboard = !cli.no_dashboard\n        && !cli.json\n        && !matches!(cli.command.as_ref(), Some(Commands::Serve { .. 
}));\n\n    let _dashboard_guard = if auto_dashboard {\n        ensure_dashboard_available(&cli.memory, context_limit)\n    } else {\n        None\n    };\n\n    // If a subcommand is given, use classic CLI mode\n    if let Some(command) = cli.command {\n        match command {\n            Commands::System => {\n                let specs = detect_specs(&cli.memory);\n                if cli.json {\n                    display::display_json_system(&specs);\n                } else {\n                    specs.display();\n                }\n            }\n\n            Commands::List => {\n                let db = ModelDatabase::new();\n                if cli.json {\n                    println!(\n                        \"{}\",\n                        serde_json::to_string_pretty(db.get_all_models())\n                            .expect(\"JSON serialization failed\")\n                    );\n                } else {\n                    display::display_all_models(db.get_all_models());\n                }\n            }\n\n            Commands::Fit {\n                perfect,\n                limit,\n                sort,\n            } => {\n                run_fit(\n                    perfect,\n                    limit,\n                    sort.into(),\n                    cli.json,\n                    &cli.memory,\n                    context_limit,\n                );\n            }\n\n            Commands::Search { query } => {\n                let db = ModelDatabase::new();\n                let results = db.find_model(&query);\n                display::display_search_results(&results, &query);\n            }\n\n            Commands::Info { model } => {\n                let db = ModelDatabase::new();\n                let specs = detect_specs(&cli.memory);\n                let models = db.get_all_models();\n\n                let idx = match find_name_index_by_selector(models, &model, |m| m.name.as_str()) {\n                    Ok(i) => i,\n                    
Err(err) => {\n                        println!(\"\\n{}\", err);\n                        return;\n                    }\n                };\n\n                let fit = ModelFit::analyze_with_context_limit(&models[idx], &specs, context_limit);\n                if cli.json {\n                    display::display_json_fits(&specs, &[fit]);\n                } else {\n                    display::display_model_detail(&fit);\n                }\n            }\n\n            Commands::Diff {\n                model_a,\n                model_b,\n                sort,\n                fit,\n                limit,\n            } => {\n                run_diff(\n                    model_a,\n                    model_b,\n                    fit,\n                    sort.into(),\n                    limit,\n                    cli.json,\n                    &cli.memory,\n                    context_limit,\n                );\n            }\n\n            Commands::Plan {\n                model,\n                context,\n                quant,\n                target_tps,\n            } => {\n                if let Err(err) =\n                    run_plan(&model, context, quant, target_tps, cli.json, &cli.memory)\n                {\n                    eprintln!(\"Error: {}\", err);\n                    std::process::exit(1);\n                }\n            }\n\n            Commands::Recommend {\n                limit,\n                use_case,\n                min_fit,\n                runtime,\n                force_runtime,\n                capability,\n                json,\n            } => {\n                run_recommend(\n                    limit,\n                    use_case,\n                    min_fit,\n                    runtime,\n                    force_runtime,\n                    capability,\n                    json,\n                    &cli.memory,\n                    context_limit,\n                );\n            }\n\n            
Commands::Download {\n                model,\n                quant,\n                budget,\n                list,\n            } => {\n                run_download(&model, quant.as_deref(), budget, list, &cli.memory);\n            }\n\n            Commands::HfSearch { query, limit } => {\n                run_hf_search(&query, limit);\n            }\n\n            Commands::Run {\n                model,\n                server,\n                port,\n                ngl,\n                ctx_size,\n            } => {\n                run_model(&model, server, port, ngl, ctx_size);\n            }\n\n            Commands::Serve { host, port } => {\n                if let Err(err) = serve_api::run_serve(&host, port, &cli.memory, context_limit) {\n                    eprintln!(\"Error: {}\", err);\n                    std::process::exit(1);\n                }\n            }\n        }\n        return;\n    }\n\n    // If --cli or --json flag, use classic fit output\n    if cli.cli || cli.json {\n        run_fit(\n            cli.perfect,\n            cli.limit,\n            cli.sort.into(),\n            cli.json,\n            &cli.memory,\n            context_limit,\n        );\n        return;\n    }\n\n    // Default: launch TUI\n    if let Err(e) = run_tui(&cli.memory, context_limit) {\n        eprintln!(\"Error running TUI: {}\", e);\n        std::process::exit(1);\n    }\n}\n\n#[cfg(test)]\nmod tests {\n    use super::*;\n    use llmfit_core::fit::{FitLevel, InferenceRuntime, RunMode, ScoreComponents};\n    use llmfit_core::models::LlmModel;\n\n    fn mock_fit(name: &str, fit_level: FitLevel) -> ModelFit {\n        ModelFit {\n            model: LlmModel {\n                name: name.to_string(),\n                provider: \"test\".to_string(),\n                parameter_count: \"7B\".to_string(),\n                parameters_raw: None,\n                min_ram_gb: 4.0,\n                recommended_ram_gb: 8.0,\n                min_vram_gb: Some(4.0),\n          
      quantization: \"Q4_K_M\".to_string(),\n                context_length: 8192,\n                use_case: \"general\".to_string(),\n                is_moe: false,\n                num_experts: None,\n                active_experts: None,\n                active_parameters: None,\n                release_date: Some(\"2025-01-01\".to_string()),\n                gguf_sources: vec![],\n                capabilities: vec![],\n                format: llmfit_core::models::ModelFormat::default(),\n            },\n            fit_level,\n            run_mode: RunMode::Gpu,\n            memory_required_gb: 4.0,\n            memory_available_gb: 8.0,\n            utilization_pct: 50.0,\n            notes: vec![],\n            moe_offloaded_gb: None,\n            score: 80.0,\n            score_components: ScoreComponents {\n                quality: 80.0,\n                speed: 80.0,\n                fit: 80.0,\n                context: 80.0,\n            },\n            estimated_tps: 30.0,\n            best_quant: \"Q4_K_M\".to_string(),\n            use_case: llmfit_core::models::UseCase::General,\n            runtime: InferenceRuntime::LlamaCpp,\n            installed: false,\n        }\n    }\n\n    #[test]\n    fn fit_filter_runnable_excludes_too_tight() {\n        let runnable = mock_fit(\"alpha/model\", FitLevel::Good);\n        let tight = mock_fit(\"beta/model\", FitLevel::TooTight);\n        assert!(fit_matches_filter(&runnable, FitArg::Runnable));\n        assert!(!fit_matches_filter(&tight, FitArg::Runnable));\n    }\n\n    #[test]\n    fn selector_prefers_exact_match() {\n        let fits = vec![\n            mock_fit(\"org/model-a\", FitLevel::Perfect),\n            mock_fit(\"org/model-a-instruct\", FitLevel::Perfect),\n        ];\n        let idx = find_fit_index_by_selector(&fits, \"org/model-a\").expect(\"should resolve\");\n        assert_eq!(idx, 0);\n    }\n\n    #[test]\n    fn selector_errors_on_ambiguous_partial() {\n        let fits = vec![\n      
      mock_fit(\"org/model-a\", FitLevel::Perfect),\n            mock_fit(\"org/model-a-instruct\", FitLevel::Perfect),\n        ];\n        let err = find_fit_index_by_selector(&fits, \"model-a\").expect_err(\"should be ambiguous\");\n        assert!(err.contains(\"Multiple models match\"));\n    }\n\n    #[test]\n    fn generic_selector_prefers_exact_match_for_models() {\n        let models = vec![\n            LlmModel {\n                name: \"Qwen/Qwen3-Coder-Next-FP8\".to_string(),\n                provider: \"Qwen\".to_string(),\n                parameter_count: \"7B\".to_string(),\n                parameters_raw: None,\n                min_ram_gb: 4.0,\n                recommended_ram_gb: 8.0,\n                min_vram_gb: Some(4.0),\n                quantization: \"Q4_K_M\".to_string(),\n                context_length: 8192,\n                use_case: \"general\".to_string(),\n                is_moe: false,\n                num_experts: None,\n                active_experts: None,\n                active_parameters: None,\n                release_date: None,\n                gguf_sources: vec![],\n                capabilities: vec![],\n                format: llmfit_core::models::ModelFormat::default(),\n            },\n            LlmModel {\n                name: \"Qwen/Qwen3-Coder-Next\".to_string(),\n                provider: \"Qwen\".to_string(),\n                parameter_count: \"7B\".to_string(),\n                parameters_raw: None,\n                min_ram_gb: 4.0,\n                recommended_ram_gb: 8.0,\n                min_vram_gb: Some(4.0),\n                quantization: \"Q4_K_M\".to_string(),\n                context_length: 8192,\n                use_case: \"general\".to_string(),\n                is_moe: false,\n                num_experts: None,\n                active_experts: None,\n                active_parameters: None,\n                release_date: None,\n                gguf_sources: vec![],\n                capabilities: 
vec![],\n                format: llmfit_core::models::ModelFormat::default(),\n            },\n        ];\n\n        let idx =\n            find_name_index_by_selector(&models, \"Qwen/Qwen3-Coder-Next\", |m| m.name.as_str())\n                .expect(\"should resolve exact model\");\n        assert_eq!(idx, 1);\n    }\n}\n"
  },
  {
    "path": "llmfit-tui/src/serve_api.rs",
    "content": "use std::collections::HashMap;\nuse std::net::{IpAddr, SocketAddr};\nuse std::sync::{Arc, LazyLock};\n\nuse axum::extract::{Path, Query, State};\nuse axum::http::header::{CACHE_CONTROL, CONTENT_TYPE};\nuse axum::http::{HeaderValue, StatusCode};\nuse axum::response::{IntoResponse, Response};\nuse axum::routing::get;\nuse axum::{Json, Router};\nuse llmfit_core::fit::{\n    FitLevel, InferenceRuntime, ModelFit, SortColumn, backend_compatible,\n    rank_models_by_fit_opts_col,\n};\nuse llmfit_core::hardware::{GpuBackend, SystemSpecs};\nuse llmfit_core::models::{LlmModel, ModelDatabase, UseCase};\nuse serde::{Deserialize, Serialize};\n\ninclude!(concat!(env!(\"OUT_DIR\"), \"/web_assets.rs\"));\n\nstatic ASSET_MAP: LazyLock<HashMap<&'static str, &'static EmbeddedAsset>> =\n    LazyLock::new(|| EMBEDDED_WEB_ASSETS.iter().map(|a| (a.path, a)).collect());\n\n#[derive(Clone)]\nstruct AppState {\n    node_name: String,\n    os: String,\n    specs: SystemSpecs,\n    models: Vec<LlmModel>,\n    context_limit: Option<u32>,\n}\n\n#[derive(Debug, Deserialize)]\nstruct ModelsQuery {\n    limit: Option<usize>,\n    #[serde(alias = \"n\")]\n    top: Option<usize>,\n    perfect: Option<bool>,\n    min_fit: Option<String>,\n    runtime: Option<String>,\n    use_case: Option<String>,\n    provider: Option<String>,\n    search: Option<String>,\n    sort: Option<String>,\n    include_too_tight: Option<bool>,\n    max_context: Option<u32>,\n    force_runtime: Option<String>,\n}\n\n#[derive(Debug, Serialize)]\nstruct NodeInfo {\n    name: String,\n    os: String,\n}\n\n#[derive(Debug, Serialize)]\nstruct ApiEnvelope {\n    node: NodeInfo,\n    system: serde_json::Value,\n    total_models: usize,\n    returned_models: usize,\n    filters: serde_json::Value,\n    models: Vec<serde_json::Value>,\n}\n\n#[derive(Debug)]\nstruct ApiError {\n    status: StatusCode,\n    message: String,\n}\n\nimpl ApiError {\n    fn bad_request(message: impl Into<String>) -> Self {\n        Self 
{\n            status: StatusCode::BAD_REQUEST,\n            message: message.into(),\n        }\n    }\n\n    fn internal(message: impl Into<String>) -> Self {\n        Self {\n            status: StatusCode::INTERNAL_SERVER_ERROR,\n            message: message.into(),\n        }\n    }\n}\n\nimpl IntoResponse for ApiError {\n    fn into_response(self) -> Response {\n        (\n            self.status,\n            Json(serde_json::json!({\n                \"error\": self.message,\n            })),\n        )\n            .into_response()\n    }\n}\n\ntype ApiResult<T> = Result<T, ApiError>;\n\npub fn run_serve(\n    host: &str,\n    port: u16,\n    memory_override: &Option<String>,\n    context_limit: Option<u32>,\n) -> Result<(), String> {\n    let ip: IpAddr = host\n        .parse()\n        .map_err(|_| format!(\"invalid --host value: '{host}'\"))?;\n    let addr = SocketAddr::new(ip, port);\n\n    let specs = detect_specs(memory_override);\n    let db = ModelDatabase::new();\n    let all_models = db.get_all_models().clone();\n\n    let node_name = std::env::var(\"HOSTNAME\")\n        .ok()\n        .filter(|v| !v.trim().is_empty())\n        .unwrap_or_else(|| \"unknown-node\".to_string());\n\n    let state = Arc::new(AppState {\n        node_name,\n        os: std::env::consts::OS.to_string(),\n        specs,\n        models: all_models,\n        context_limit,\n    });\n\n    let app = build_router(state);\n\n    println!(\"llmfit dashboard listening on http://{}/\", addr);\n    println!(\"  API models: http://{}/api/v1/models\", addr);\n    println!(\"  GET /health\");\n    println!(\"  GET /api/v1/system\");\n    println!(\"  GET /api/v1/models?limit=20&min_fit=marginal&sort=score\");\n    println!(\"  GET /api/v1/models/top?limit=5&use_case=coding&min_fit=good\");\n    println!(\"  GET /api/v1/models/<name>\");\n\n    let runtime = tokio::runtime::Builder::new_multi_thread()\n        .enable_all()\n        .build()\n        .map_err(|e| format!(\"failed 
to start tokio runtime: {e}\"))?;\n\n    runtime\n        .block_on(async move {\n            let listener = tokio::net::TcpListener::bind(addr)\n                .await\n                .map_err(|e| ApiError::internal(format!(\"bind failed on {addr}: {e}\")))?;\n\n            axum::serve(listener, app)\n                .with_graceful_shutdown(async {\n                    let _ = tokio::signal::ctrl_c().await;\n                })\n                .await\n                .map_err(|e| ApiError::internal(format!(\"server error: {e}\")))\n        })\n        .map_err(|e| e.message)\n}\n\nfn build_router(state: Arc<AppState>) -> Router {\n    Router::new()\n        .route(\"/\", get(web_index))\n        .route(\"/assets/{*path}\", get(web_asset))\n        .route(\"/health\", get(health))\n        .route(\"/api/v1/system\", get(system))\n        .route(\"/api/v1/models\", get(models))\n        .route(\"/api/v1/models/top\", get(top_models))\n        .route(\"/api/v1/models/{name}\", get(model_by_name))\n        .route(\"/{*path}\", get(spa_fallback))\n        .with_state(state)\n}\n\nasync fn health(State(state): State<Arc<AppState>>) -> Json<serde_json::Value> {\n    Json(serde_json::json!({\n        \"status\": \"ok\",\n        \"node\": {\n            \"name\": state.node_name,\n            \"os\": state.os,\n        }\n    }))\n}\n\nasync fn system(State(state): State<Arc<AppState>>) -> Json<serde_json::Value> {\n    Json(serde_json::json!({\n        \"node\": {\n            \"name\": state.node_name,\n            \"os\": state.os,\n        },\n        \"system\": system_json(&state.specs),\n    }))\n}\n\nasync fn web_index() -> Response {\n    serve_web_path(\"/index.html\")\n}\n\nasync fn web_asset(Path(path): Path<String>) -> Response {\n    let asset_path = format!(\"/assets/{}\", path.trim_start_matches('/'));\n    serve_web_path(&asset_path)\n}\n\nasync fn spa_fallback(Path(path): Path<String>) -> Response {\n    if path.starts_with(\"api/\") || path == 
\"health\" || path.starts_with(\"assets/\") {\n        return StatusCode::NOT_FOUND.into_response();\n    }\n    serve_web_path(\"/index.html\")\n}\n\nfn serve_web_path(path: &str) -> Response {\n    let Some(asset) = find_web_asset(path) else {\n        return StatusCode::NOT_FOUND.into_response();\n    };\n\n    let mut response = asset.bytes.to_vec().into_response();\n    response\n        .headers_mut()\n        .insert(CONTENT_TYPE, HeaderValue::from_static(asset.content_type));\n    let cache_value = if path.starts_with(\"/assets/\") {\n        \"public, max-age=31536000, immutable\"\n    } else {\n        \"no-cache\"\n    };\n    response\n        .headers_mut()\n        .insert(CACHE_CONTROL, HeaderValue::from_static(cache_value));\n    response\n}\n\nfn find_web_asset(path: &str) -> Option<&'static EmbeddedAsset> {\n    ASSET_MAP.get(path).copied()\n}\n\nasync fn models(\n    State(state): State<Arc<AppState>>,\n    Query(query): Query<ModelsQuery>,\n) -> ApiResult<Json<ApiEnvelope>> {\n    let mut fits = filtered_fits(&state, &query, false)?;\n    let total_models = fits.len();\n\n    let limit = query.limit.or(query.top).unwrap_or(usize::MAX);\n    if limit < fits.len() {\n        fits.truncate(limit);\n    }\n\n    let envelope = ApiEnvelope {\n        node: NodeInfo {\n            name: state.node_name.clone(),\n            os: state.os.clone(),\n        },\n        system: system_json(&state.specs),\n        total_models,\n        returned_models: fits.len(),\n        filters: active_filters_json(&query, false),\n        models: fits.iter().map(fit_to_json).collect(),\n    };\n\n    Ok(Json(envelope))\n}\n\nasync fn top_models(\n    State(state): State<Arc<AppState>>,\n    Query(query): Query<ModelsQuery>,\n) -> ApiResult<Json<ApiEnvelope>> {\n    let mut fits = filtered_fits(&state, &query, true)?;\n    let total_models = fits.len();\n\n    let limit = query.limit.or(query.top).unwrap_or(5);\n    if limit < fits.len() {\n        
fits.truncate(limit);\n    }\n\n    let envelope = ApiEnvelope {\n        node: NodeInfo {\n            name: state.node_name.clone(),\n            os: state.os.clone(),\n        },\n        system: system_json(&state.specs),\n        total_models,\n        returned_models: fits.len(),\n        filters: active_filters_json(&query, true),\n        models: fits.iter().map(fit_to_json).collect(),\n    };\n\n    Ok(Json(envelope))\n}\n\nasync fn model_by_name(\n    State(state): State<Arc<AppState>>,\n    Path(name): Path<String>,\n    Query(query): Query<ModelsQuery>,\n) -> ApiResult<Json<ApiEnvelope>> {\n    let mut scoped = query;\n    scoped.search = Some(name);\n\n    let mut fits = filtered_fits(&state, &scoped, false)?;\n    let total_models = fits.len();\n\n    let limit = scoped.limit.or(scoped.top).unwrap_or(20);\n    if limit < fits.len() {\n        fits.truncate(limit);\n    }\n\n    let envelope = ApiEnvelope {\n        node: NodeInfo {\n            name: state.node_name.clone(),\n            os: state.os.clone(),\n        },\n        system: system_json(&state.specs),\n        total_models,\n        returned_models: fits.len(),\n        filters: active_filters_json(&scoped, false),\n        models: fits.iter().map(fit_to_json).collect(),\n    };\n\n    Ok(Json(envelope))\n}\n\nfn filtered_fits(\n    state: &AppState,\n    query: &ModelsQuery,\n    top_only: bool,\n) -> Result<Vec<ModelFit>, ApiError> {\n    let sort_column = parse_sort(query.sort.as_deref())?;\n    let min_fit = parse_min_fit(query.min_fit.as_deref())?;\n    let runtime_filter = parse_runtime(query.runtime.as_deref())?;\n    let use_case_filter = parse_use_case(query.use_case.as_deref())?;\n\n    let context_limit = query.max_context.or(state.context_limit);\n    let forced_rt = parse_force_runtime(query.force_runtime.as_deref())?;\n    let mut fits: Vec<ModelFit> = state\n        .models\n        .iter()\n        .filter(|m| backend_compatible(m, &state.specs))\n        .map(|m| 
ModelFit::analyze_with_forced_runtime(m, &state.specs, context_limit, forced_rt))\n        .collect();\n\n    let is_apple_silicon = state.specs.backend == GpuBackend::Metal && state.specs.unified_memory;\n    if !is_apple_silicon {\n        fits.retain(|f| !f.model.is_mlx_only());\n    }\n\n    if let Some(provider) = query.provider.as_ref() {\n        let provider_lower = provider.to_lowercase();\n        fits.retain(|f| f.model.provider.to_lowercase().contains(&provider_lower));\n    }\n\n    if let Some(search) = query.search.as_ref() {\n        let search_lower = search.to_lowercase();\n        fits.retain(|f| {\n            f.model.name.to_lowercase().contains(&search_lower)\n                || f.model.provider.to_lowercase().contains(&search_lower)\n                || f.model\n                    .parameter_count\n                    .to_lowercase()\n                    .contains(&search_lower)\n                || f.model.use_case.to_lowercase().contains(&search_lower)\n                || f.use_case.label().to_lowercase().contains(&search_lower)\n        });\n    }\n\n    if query.perfect.unwrap_or(false) {\n        fits.retain(|f| f.fit_level == FitLevel::Perfect);\n    } else {\n        fits.retain(|f| fit_at_least(f.fit_level, min_fit));\n    }\n\n    match runtime_filter {\n        RuntimeFilter::Any => {}\n        RuntimeFilter::Mlx => fits.retain(|f| f.runtime == InferenceRuntime::Mlx),\n        RuntimeFilter::Vllm => fits.retain(|f| f.runtime == InferenceRuntime::Vllm),\n        RuntimeFilter::LlamaCpp => {\n            fits.retain(|f| f.runtime == InferenceRuntime::LlamaCpp);\n        }\n    }\n\n    if let Some(use_case) = use_case_filter {\n        fits.retain(|f| f.use_case == use_case);\n    }\n\n    let include_too_tight = query.include_too_tight.unwrap_or(!top_only);\n    if top_only || !include_too_tight {\n        fits.retain(|f| f.fit_level != FitLevel::TooTight);\n    }\n\n    Ok(rank_models_by_fit_opts_col(fits, false, 
sort_column))\n}\n\n#[derive(Debug, Clone, Copy)]\nenum RuntimeFilter {\n    Any,\n    Mlx,\n    LlamaCpp,\n    Vllm,\n}\n\nfn parse_sort(raw: Option<&str>) -> Result<SortColumn, ApiError> {\n    let value = raw.unwrap_or(\"score\").trim().to_lowercase();\n    let sort = match value.as_str() {\n        \"score\" => SortColumn::Score,\n        \"tps\" | \"tokens\" | \"throughput\" => SortColumn::Tps,\n        \"params\" | \"parameters\" => SortColumn::Params,\n        \"mem\" | \"memory\" | \"mem_pct\" | \"utilization\" => SortColumn::MemPct,\n        \"ctx\" | \"context\" => SortColumn::Ctx,\n        \"date\" | \"release\" | \"released\" => SortColumn::ReleaseDate,\n        \"use\" | \"use_case\" | \"usecase\" => SortColumn::UseCase,\n        _ => {\n            return Err(ApiError::bad_request(\n                \"invalid sort value: use score|tps|params|mem|ctx|date|use_case\",\n            ));\n        }\n    };\n    Ok(sort)\n}\n\nfn parse_min_fit(raw: Option<&str>) -> Result<FitLevel, ApiError> {\n    let value = raw.unwrap_or(\"marginal\").trim().to_lowercase();\n    let min_fit = match value.as_str() {\n        \"perfect\" => FitLevel::Perfect,\n        \"good\" => FitLevel::Good,\n        \"marginal\" => FitLevel::Marginal,\n        \"too_tight\" | \"tootight\" | \"tight\" => FitLevel::TooTight,\n        _ => {\n            return Err(ApiError::bad_request(\n                \"invalid min_fit value: use perfect|good|marginal|too_tight\",\n            ));\n        }\n    };\n    Ok(min_fit)\n}\n\nfn parse_runtime(raw: Option<&str>) -> Result<RuntimeFilter, ApiError> {\n    let Some(value) = raw else {\n        return Ok(RuntimeFilter::Any);\n    };\n\n    let runtime = match value.trim().to_lowercase().as_str() {\n        \"any\" => RuntimeFilter::Any,\n        \"mlx\" => RuntimeFilter::Mlx,\n        \"llamacpp\" | \"llama.cpp\" | \"llama_cpp\" => RuntimeFilter::LlamaCpp,\n        \"vllm\" => RuntimeFilter::Vllm,\n        _ => {\n            return 
Err(ApiError::bad_request(\n                \"invalid runtime value: use any|mlx|llamacpp|vllm\",\n            ));\n        }\n    };\n    Ok(runtime)\n}\n\nfn parse_force_runtime(\n    raw: Option<&str>,\n) -> Result<Option<llmfit_core::fit::InferenceRuntime>, ApiError> {\n    let Some(value) = raw else {\n        return Ok(None);\n    };\n    match value.trim().to_lowercase().as_str() {\n        \"mlx\" => Ok(Some(llmfit_core::fit::InferenceRuntime::Mlx)),\n        \"llamacpp\" | \"llama.cpp\" | \"llama_cpp\" => {\n            Ok(Some(llmfit_core::fit::InferenceRuntime::LlamaCpp))\n        }\n        \"vllm\" => Ok(Some(llmfit_core::fit::InferenceRuntime::Vllm)),\n        _ => Err(ApiError::bad_request(\n            \"invalid force_runtime value: use mlx|llamacpp|vllm\",\n        )),\n    }\n}\n\nfn parse_use_case(raw: Option<&str>) -> Result<Option<UseCase>, ApiError> {\n    let Some(value) = raw else {\n        return Ok(None);\n    };\n\n    let use_case = match value.trim().to_lowercase().as_str() {\n        \"coding\" | \"code\" => UseCase::Coding,\n        \"reasoning\" | \"reason\" => UseCase::Reasoning,\n        \"chat\" => UseCase::Chat,\n        \"multimodal\" | \"vision\" => UseCase::Multimodal,\n        \"embedding\" | \"embed\" => UseCase::Embedding,\n        \"general\" => UseCase::General,\n        _ => {\n            return Err(ApiError::bad_request(\n                \"invalid use_case value: use general|coding|reasoning|chat|multimodal|embedding\",\n            ));\n        }\n    };\n    Ok(Some(use_case))\n}\n\nfn fit_at_least(actual: FitLevel, minimum: FitLevel) -> bool {\n    let rank = |fit: FitLevel| match fit {\n        FitLevel::Perfect => 3,\n        FitLevel::Good => 2,\n        FitLevel::Marginal => 1,\n        FitLevel::TooTight => 0,\n    };\n    rank(actual) >= rank(minimum)\n}\n\nfn active_filters_json(query: &ModelsQuery, top_only: bool) -> serde_json::Value {\n    serde_json::json!({\n        \"limit\": 
query.limit.or(query.top),\n        \"perfect\": query.perfect,\n        \"min_fit\": query.min_fit,\n        \"runtime\": query.runtime,\n        \"use_case\": query.use_case,\n        \"provider\": query.provider,\n        \"search\": query.search,\n        \"sort\": query.sort,\n        \"max_context\": query.max_context,\n        \"include_too_tight\": query.include_too_tight,\n        \"top_only\": top_only,\n    })\n}\n\nfn fit_level_code(fit_level: FitLevel) -> &'static str {\n    match fit_level {\n        FitLevel::Perfect => \"perfect\",\n        FitLevel::Good => \"good\",\n        FitLevel::Marginal => \"marginal\",\n        FitLevel::TooTight => \"too_tight\",\n    }\n}\n\nfn run_mode_code(run_mode: llmfit_core::fit::RunMode) -> &'static str {\n    match run_mode {\n        llmfit_core::fit::RunMode::Gpu => \"gpu\",\n        llmfit_core::fit::RunMode::MoeOffload => \"moe_offload\",\n        llmfit_core::fit::RunMode::CpuOffload => \"cpu_offload\",\n        llmfit_core::fit::RunMode::CpuOnly => \"cpu_only\",\n    }\n}\n\nfn runtime_code(runtime: InferenceRuntime) -> &'static str {\n    match runtime {\n        InferenceRuntime::Mlx => \"mlx\",\n        InferenceRuntime::LlamaCpp => \"llamacpp\",\n        InferenceRuntime::Vllm => \"vllm\",\n    }\n}\n\nfn system_json(specs: &SystemSpecs) -> serde_json::Value {\n    let gpus_json: Vec<serde_json::Value> = specs\n        .gpus\n        .iter()\n        .map(|g| {\n            serde_json::json!({\n                \"name\": g.name,\n                \"vram_gb\": g.vram_gb.map(round2),\n                \"backend\": g.backend.label(),\n                \"count\": g.count,\n                \"unified_memory\": g.unified_memory,\n            })\n        })\n        .collect();\n\n    serde_json::json!({\n        \"total_ram_gb\": round2(specs.total_ram_gb),\n        \"available_ram_gb\": round2(specs.available_ram_gb),\n        \"cpu_cores\": specs.total_cpu_cores,\n        \"cpu_name\": specs.cpu_name,\n        
\"has_gpu\": specs.has_gpu,\n        \"gpu_vram_gb\": specs.gpu_vram_gb.map(round2),\n        \"gpu_name\": specs.gpu_name,\n        \"gpu_count\": specs.gpu_count,\n        \"unified_memory\": specs.unified_memory,\n        \"backend\": specs.backend.label(),\n        \"gpus\": gpus_json,\n    })\n}\n\nfn fit_to_json(fit: &ModelFit) -> serde_json::Value {\n    serde_json::json!({\n        \"name\": fit.model.name,\n        \"provider\": fit.model.provider,\n        \"parameter_count\": fit.model.parameter_count,\n        \"params_b\": round2(fit.model.params_b()),\n        \"context_length\": fit.model.context_length,\n        \"use_case\": fit.model.use_case,\n        \"category\": fit.use_case.label(),\n        \"release_date\": fit.model.release_date,\n        \"is_moe\": fit.model.is_moe,\n        \"fit_level\": fit_level_code(fit.fit_level),\n        \"fit_label\": fit.fit_text(),\n        \"run_mode\": run_mode_code(fit.run_mode),\n        \"run_mode_label\": fit.run_mode_text(),\n        \"score\": round1(fit.score),\n        \"score_components\": {\n            \"quality\": round1(fit.score_components.quality),\n            \"speed\": round1(fit.score_components.speed),\n            \"fit\": round1(fit.score_components.fit),\n            \"context\": round1(fit.score_components.context),\n        },\n        \"estimated_tps\": round1(fit.estimated_tps),\n        \"runtime\": runtime_code(fit.runtime),\n        \"runtime_label\": fit.runtime_text(),\n        \"best_quant\": fit.best_quant,\n        \"memory_required_gb\": round2(fit.memory_required_gb),\n        \"memory_available_gb\": round2(fit.memory_available_gb),\n        \"moe_offloaded_gb\": fit.moe_offloaded_gb.map(round2),\n        \"total_memory_gb\": round2(fit.memory_required_gb + fit.moe_offloaded_gb.unwrap_or(0.0)),\n        \"utilization_pct\": round1(fit.utilization_pct),\n        \"notes\": fit.notes,\n        \"gguf_sources\": fit.model.gguf_sources,\n    })\n}\n\nfn round1(v: f64) -> f64 
{\n    (v * 10.0).round() / 10.0\n}\n\nfn round2(v: f64) -> f64 {\n    (v * 100.0).round() / 100.0\n}\n\n/// Detect system specs with optional GPU memory override.\nfn detect_specs(memory_override: &Option<String>) -> SystemSpecs {\n    let specs = SystemSpecs::detect();\n    if let Some(mem_str) = memory_override {\n        match llmfit_core::hardware::parse_memory_size(mem_str) {\n            Some(gb) => specs.with_gpu_memory_override(gb),\n            None => specs,\n        }\n    } else {\n        specs\n    }\n}\n\n#[cfg(test)]\nmod tests {\n    use super::*;\n    use axum::body::Body;\n    use axum::http::Request;\n    use http_body_util::BodyExt as _;\n    use std::future::Future;\n    use tower::ServiceExt;\n\n    fn test_state() -> Arc<AppState> {\n        let db = ModelDatabase::new();\n        Arc::new(AppState {\n            node_name: \"test-node\".to_string(),\n            os: \"test-os\".to_string(),\n            specs: SystemSpecs::detect(),\n            models: db.get_all_models().clone(),\n            context_limit: None,\n        })\n    }\n\n    fn test_router() -> Router {\n        build_router(test_state())\n    }\n\n    fn find_asset_path_with_ext(ext: &str) -> Option<&'static EmbeddedAsset> {\n        EMBEDDED_WEB_ASSETS\n            .iter()\n            .find(|asset| asset.path.starts_with(\"/assets/\") && asset.path.ends_with(ext))\n    }\n\n    fn run_async<T>(future: impl Future<Output = T>) -> T {\n        tokio::runtime::Builder::new_current_thread()\n            .enable_all()\n            .build()\n            .expect(\"tokio runtime\")\n            .block_on(future)\n    }\n\n    #[test]\n    fn root_serves_index_html() {\n        run_async(async {\n            let response = test_router()\n                .oneshot(Request::builder().uri(\"/\").body(Body::empty()).unwrap())\n                .await\n                .unwrap();\n\n            assert_eq!(response.status(), StatusCode::OK);\n            assert_eq!(\n                
response.headers().get(CONTENT_TYPE).unwrap(),\n                \"text/html; charset=utf-8\"\n            );\n        });\n    }\n\n    #[test]\n    fn assets_route_serves_embedded_file_with_content_type() {\n        let Some(asset) = find_asset_path_with_ext(\".js\")\n            .or_else(|| find_asset_path_with_ext(\".css\"))\n            .or_else(|| find_asset_path_with_ext(\".svg\"))\n        else {\n            panic!(\"no embedded assets available under /assets/\");\n        };\n\n        run_async(async {\n            let response = test_router()\n                .oneshot(\n                    Request::builder()\n                        .uri(asset.path)\n                        .body(Body::empty())\n                        .unwrap(),\n                )\n                .await\n                .unwrap();\n\n            assert_eq!(response.status(), StatusCode::OK);\n            assert_eq!(\n                response.headers().get(CONTENT_TYPE).unwrap(),\n                asset.content_type\n            );\n        });\n    }\n\n    #[test]\n    fn unknown_non_api_routes_fallback_to_index() {\n        run_async(async {\n            let response = test_router()\n                .oneshot(\n                    Request::builder()\n                        .uri(\"/dashboard/models\")\n                        .body(Body::empty())\n                        .unwrap(),\n                )\n                .await\n                .unwrap();\n\n            assert_eq!(response.status(), StatusCode::OK);\n            assert_eq!(\n                response.headers().get(CONTENT_TYPE).unwrap(),\n                \"text/html; charset=utf-8\"\n            );\n        });\n    }\n\n    #[test]\n    fn existing_api_route_response_shape_is_preserved() {\n        run_async(async {\n            let response = test_router()\n                .oneshot(\n                    Request::builder()\n                        .uri(\"/api/v1/system\")\n                        .body(Body::empty())\n     
                   .unwrap(),\n                )\n                .await\n                .unwrap();\n\n            assert_eq!(response.status(), StatusCode::OK);\n            let bytes = response.into_body().collect().await.unwrap().to_bytes();\n            let value: serde_json::Value = serde_json::from_slice(&bytes).unwrap();\n            assert!(value.get(\"node\").is_some());\n            assert!(value.get(\"system\").is_some());\n        });\n    }\n\n    #[test]\n    fn unknown_api_paths_do_not_fallback_to_html() {\n        run_async(async {\n            let response = test_router()\n                .oneshot(\n                    Request::builder()\n                        .uri(\"/api/v1/not-found\")\n                        .body(Body::empty())\n                        .unwrap(),\n                )\n                .await\n                .unwrap();\n\n            assert_eq!(response.status(), StatusCode::NOT_FOUND);\n        });\n    }\n}\n"
  },
  {
    "path": "llmfit-tui/src/theme.rs",
    "content": "use ratatui::style::Color;\nuse std::fs;\nuse std::path::PathBuf;\n\n/// Available color themes for the TUI.\n#[derive(Debug, Clone, Copy, PartialEq, Eq)]\npub enum Theme {\n    Default,\n    Dracula,\n    Solarized,\n    Nord,\n    Monokai,\n    Gruvbox,\n    CatppuccinLatte,\n    CatppuccinFrappe,\n    CatppuccinMacchiato,\n    CatppuccinMocha,\n}\n\nimpl Theme {\n    pub fn label(&self) -> &'static str {\n        match self {\n            Theme::Default => \"Default\",\n            Theme::Dracula => \"Dracula\",\n            Theme::Solarized => \"Solarized\",\n            Theme::Nord => \"Nord\",\n            Theme::Monokai => \"Monokai\",\n            Theme::Gruvbox => \"Gruvbox\",\n            Theme::CatppuccinLatte => \"Catppuccin Latte\",\n            Theme::CatppuccinFrappe => \"Catppuccin Frappé\",\n            Theme::CatppuccinMacchiato => \"Catppuccin Macchiato\",\n            Theme::CatppuccinMocha => \"Catppuccin Mocha\",\n        }\n    }\n\n    pub fn next(&self) -> Self {\n        match self {\n            Theme::Default => Theme::Dracula,\n            Theme::Dracula => Theme::Solarized,\n            Theme::Solarized => Theme::Nord,\n            Theme::Nord => Theme::Monokai,\n            Theme::Monokai => Theme::Gruvbox,\n            Theme::Gruvbox => Theme::CatppuccinLatte,\n            Theme::CatppuccinLatte => Theme::CatppuccinFrappe,\n            Theme::CatppuccinFrappe => Theme::CatppuccinMacchiato,\n            Theme::CatppuccinMacchiato => Theme::CatppuccinMocha,\n            Theme::CatppuccinMocha => Theme::Default,\n        }\n    }\n\n    pub fn colors(&self) -> ThemeColors {\n        match self {\n            Theme::Default => default_colors(),\n            Theme::Dracula => dracula_colors(),\n            Theme::Solarized => solarized_colors(),\n            Theme::Nord => nord_colors(),\n            Theme::Monokai => monokai_colors(),\n            Theme::Gruvbox => gruvbox_colors(),\n            Theme::CatppuccinLatte => 
catppuccin_latte_colors(),\n            Theme::CatppuccinFrappe => catppuccin_frappe_colors(),\n            Theme::CatppuccinMacchiato => catppuccin_macchiato_colors(),\n            Theme::CatppuccinMocha => catppuccin_mocha_colors(),\n        }\n    }\n\n    /// Path to the config file: ~/.config/llmfit/theme\n    fn config_path() -> Option<PathBuf> {\n        let home = std::env::var(\"HOME\")\n            .or_else(|_| std::env::var(\"USERPROFILE\"))\n            .ok()?;\n        Some(\n            PathBuf::from(home)\n                .join(\".config\")\n                .join(\"llmfit\")\n                .join(\"theme\"),\n        )\n    }\n\n    /// Save the current theme to disk.\n    pub fn save(&self) {\n        if let Some(path) = Self::config_path() {\n            if let Some(parent) = path.parent() {\n                let _ = fs::create_dir_all(parent);\n            }\n            let _ = fs::write(&path, self.label());\n        }\n    }\n\n    /// Load the saved theme from disk, falling back to Default.\n    pub fn load() -> Self {\n        Self::config_path()\n            .and_then(|path| fs::read_to_string(path).ok())\n            .map(|s| Self::from_label(s.trim()))\n            .unwrap_or(Theme::Default)\n    }\n\n    fn from_label(s: &str) -> Self {\n        match s {\n            \"Dracula\" => Theme::Dracula,\n            \"Solarized\" => Theme::Solarized,\n            \"Nord\" => Theme::Nord,\n            \"Monokai\" => Theme::Monokai,\n            \"Gruvbox\" => Theme::Gruvbox,\n            \"Catppuccin Latte\" => Theme::CatppuccinLatte,\n            \"Catppuccin Frappé\" => Theme::CatppuccinFrappe,\n            \"Catppuccin Macchiato\" => Theme::CatppuccinMacchiato,\n            \"Catppuccin Mocha\" => Theme::CatppuccinMocha,\n            _ => Theme::Default,\n        }\n    }\n}\n\n/// All semantic colors used throughout the TUI, mapped from each theme.\npub struct ThemeColors {\n    // General\n    pub bg: Color,\n    pub fg: Color,\n    pub 
muted: Color,\n    pub border: Color,\n    pub title: Color,\n    pub highlight_bg: Color,\n\n    // Accent colors\n    pub accent: Color,\n    pub accent_secondary: Color,\n\n    // Status colors\n    pub good: Color,\n    pub warning: Color,\n    pub error: Color,\n    pub info: Color,\n\n    // Score colors\n    pub score_high: Color,\n    pub score_mid: Color,\n    pub score_low: Color,\n\n    // Fit levels\n    pub fit_perfect: Color,\n    pub fit_good: Color,\n    pub fit_marginal: Color,\n    pub fit_tight: Color,\n\n    // Run modes\n    pub mode_gpu: Color,\n    pub mode_moe: Color,\n    pub mode_offload: Color,\n    pub mode_cpu: Color,\n\n    // Status bar\n    pub status_bg: Color,\n    pub status_fg: Color,\n}\n\nfn default_colors() -> ThemeColors {\n    // Default theme uses Color::Reset for fg so it inherits the terminal's\n    // foreground color, making it work on both light and dark terminals.\n    // Inspired by AndiDog's light-theme-support approach.\n    ThemeColors {\n        bg: Color::Reset,\n        fg: Color::Reset,\n        muted: Color::DarkGray,\n        border: Color::DarkGray,\n        title: Color::Green,\n        highlight_bg: Color::LightBlue,\n\n        accent: Color::Cyan,\n        accent_secondary: Color::Yellow,\n\n        good: Color::Green,\n        warning: Color::Yellow,\n        error: Color::Red,\n        info: Color::Cyan,\n\n        score_high: Color::Green,\n        score_mid: Color::Yellow,\n        score_low: Color::Red,\n\n        fit_perfect: Color::Green,\n        fit_good: Color::Yellow,\n        fit_marginal: Color::Magenta,\n        fit_tight: Color::Red,\n\n        mode_gpu: Color::Green,\n        mode_moe: Color::Cyan,\n        mode_offload: Color::Yellow,\n        mode_cpu: Color::DarkGray,\n\n        status_bg: Color::Green,\n        status_fg: Color::Black,\n    }\n}\n\nfn dracula_colors() -> ThemeColors {\n    // Dracula: dark purple bg, pastel accents\n    ThemeColors {\n        bg: Color::Rgb(40, 42, 
54),\n        fg: Color::Rgb(248, 248, 242),\n        muted: Color::Rgb(98, 114, 164),\n        border: Color::Rgb(68, 71, 90),\n        title: Color::Rgb(80, 250, 123),\n        highlight_bg: Color::Rgb(68, 71, 90),\n\n        accent: Color::Rgb(139, 233, 253),\n        accent_secondary: Color::Rgb(241, 250, 140),\n\n        good: Color::Rgb(80, 250, 123),\n        warning: Color::Rgb(241, 250, 140),\n        error: Color::Rgb(255, 85, 85),\n        info: Color::Rgb(139, 233, 253),\n\n        score_high: Color::Rgb(80, 250, 123),\n        score_mid: Color::Rgb(241, 250, 140),\n        score_low: Color::Rgb(255, 85, 85),\n\n        fit_perfect: Color::Rgb(80, 250, 123),\n        fit_good: Color::Rgb(241, 250, 140),\n        fit_marginal: Color::Rgb(189, 147, 249),\n        fit_tight: Color::Rgb(255, 85, 85),\n\n        mode_gpu: Color::Rgb(80, 250, 123),\n        mode_moe: Color::Rgb(139, 233, 253),\n        mode_offload: Color::Rgb(241, 250, 140),\n        mode_cpu: Color::Rgb(98, 114, 164),\n\n        status_bg: Color::Rgb(189, 147, 249),\n        status_fg: Color::Rgb(40, 42, 54),\n    }\n}\n\nfn solarized_colors() -> ThemeColors {\n    // Solarized Dark\n    ThemeColors {\n        bg: Color::Rgb(0, 43, 54),\n        fg: Color::Rgb(131, 148, 150),\n        muted: Color::Rgb(88, 110, 117),\n        border: Color::Rgb(88, 110, 117),\n        title: Color::Rgb(133, 153, 0),\n        highlight_bg: Color::Rgb(7, 54, 66),\n\n        accent: Color::Rgb(38, 139, 210),\n        accent_secondary: Color::Rgb(181, 137, 0),\n\n        good: Color::Rgb(133, 153, 0),\n        warning: Color::Rgb(181, 137, 0),\n        error: Color::Rgb(220, 50, 47),\n        info: Color::Rgb(38, 139, 210),\n\n        score_high: Color::Rgb(133, 153, 0),\n        score_mid: Color::Rgb(181, 137, 0),\n        score_low: Color::Rgb(220, 50, 47),\n\n        fit_perfect: Color::Rgb(133, 153, 0),\n        fit_good: Color::Rgb(181, 137, 0),\n        fit_marginal: Color::Rgb(211, 54, 130),\n        
fit_tight: Color::Rgb(220, 50, 47),\n\n        mode_gpu: Color::Rgb(133, 153, 0),\n        mode_moe: Color::Rgb(42, 161, 152),\n        mode_offload: Color::Rgb(181, 137, 0),\n        mode_cpu: Color::Rgb(88, 110, 117),\n\n        status_bg: Color::Rgb(38, 139, 210),\n        status_fg: Color::Rgb(253, 246, 227),\n    }\n}\n\nfn nord_colors() -> ThemeColors {\n    // Nord: cool blue-gray palette\n    ThemeColors {\n        bg: Color::Rgb(46, 52, 64),\n        fg: Color::Rgb(216, 222, 233),\n        muted: Color::Rgb(76, 86, 106),\n        border: Color::Rgb(67, 76, 94),\n        title: Color::Rgb(163, 190, 140),\n        highlight_bg: Color::Rgb(59, 66, 82),\n\n        accent: Color::Rgb(136, 192, 208),\n        accent_secondary: Color::Rgb(235, 203, 139),\n\n        good: Color::Rgb(163, 190, 140),\n        warning: Color::Rgb(235, 203, 139),\n        error: Color::Rgb(191, 97, 106),\n        info: Color::Rgb(136, 192, 208),\n\n        score_high: Color::Rgb(163, 190, 140),\n        score_mid: Color::Rgb(235, 203, 139),\n        score_low: Color::Rgb(191, 97, 106),\n\n        fit_perfect: Color::Rgb(163, 190, 140),\n        fit_good: Color::Rgb(235, 203, 139),\n        fit_marginal: Color::Rgb(180, 142, 173),\n        fit_tight: Color::Rgb(191, 97, 106),\n\n        mode_gpu: Color::Rgb(163, 190, 140),\n        mode_moe: Color::Rgb(136, 192, 208),\n        mode_offload: Color::Rgb(235, 203, 139),\n        mode_cpu: Color::Rgb(76, 86, 106),\n\n        status_bg: Color::Rgb(129, 161, 193),\n        status_fg: Color::Rgb(46, 52, 64),\n    }\n}\n\nfn monokai_colors() -> ThemeColors {\n    // Monokai (classic palette)\n    ThemeColors {\n        bg: Color::Rgb(39, 40, 34),\n        fg: Color::Rgb(248, 248, 242),\n        muted: Color::Rgb(117, 113, 94),\n        border: Color::Rgb(73, 72, 62),\n        title: Color::Rgb(166, 226, 46),\n        highlight_bg: Color::Rgb(73, 72, 62),\n\n        accent: Color::Rgb(102, 217, 239),\n        accent_secondary: Color::Rgb(230, 219, 116),\n\n  
      good: Color::Rgb(166, 226, 46),\n        warning: Color::Rgb(230, 219, 116),\n        error: Color::Rgb(249, 38, 114),\n        info: Color::Rgb(102, 217, 239),\n\n        score_high: Color::Rgb(166, 226, 46),\n        score_mid: Color::Rgb(230, 219, 116),\n        score_low: Color::Rgb(249, 38, 114),\n\n        fit_perfect: Color::Rgb(166, 226, 46),\n        fit_good: Color::Rgb(230, 219, 116),\n        fit_marginal: Color::Rgb(174, 129, 255),\n        fit_tight: Color::Rgb(249, 38, 114),\n\n        mode_gpu: Color::Rgb(166, 226, 46),\n        mode_moe: Color::Rgb(102, 217, 239),\n        mode_offload: Color::Rgb(230, 219, 116),\n        mode_cpu: Color::Rgb(117, 113, 94),\n\n        status_bg: Color::Rgb(253, 151, 31),\n        status_fg: Color::Rgb(39, 40, 34),\n    }\n}\n\nfn gruvbox_colors() -> ThemeColors {\n    // Gruvbox Dark\n    ThemeColors {\n        bg: Color::Rgb(40, 40, 40),\n        fg: Color::Rgb(235, 219, 178),\n        muted: Color::Rgb(146, 131, 116),\n        border: Color::Rgb(80, 73, 69),\n        title: Color::Rgb(184, 187, 38),\n        highlight_bg: Color::Rgb(60, 56, 54),\n\n        accent: Color::Rgb(131, 165, 152),\n        accent_secondary: Color::Rgb(250, 189, 47),\n\n        good: Color::Rgb(184, 187, 38),\n        warning: Color::Rgb(250, 189, 47),\n        error: Color::Rgb(251, 73, 52),\n        info: Color::Rgb(131, 165, 152),\n\n        score_high: Color::Rgb(184, 187, 38),\n        score_mid: Color::Rgb(250, 189, 47),\n        score_low: Color::Rgb(251, 73, 52),\n\n        fit_perfect: Color::Rgb(184, 187, 38),\n        fit_good: Color::Rgb(250, 189, 47),\n        fit_marginal: Color::Rgb(211, 134, 155),\n        fit_tight: Color::Rgb(251, 73, 52),\n\n        mode_gpu: Color::Rgb(184, 187, 38),\n        mode_moe: Color::Rgb(131, 165, 152),\n        mode_offload: Color::Rgb(250, 189, 47),\n        mode_cpu: Color::Rgb(146, 131, 116),\n\n        status_bg: Color::Rgb(214, 93, 14),\n        status_fg: Color::Rgb(40, 40, 
40),\n    }\n}\n\nfn catppuccin_latte_colors() -> ThemeColors {\n    // Catppuccin Latte — light variant\n    // https://catppuccin.com/palette/\n    ThemeColors {\n        bg: Color::Rgb(239, 241, 245),           // Base\n        fg: Color::Rgb(76, 79, 105),             // Text\n        muted: Color::Rgb(140, 143, 161),        // Overlay 1\n        border: Color::Rgb(172, 176, 190),       // Surface 2\n        title: Color::Rgb(64, 160, 43),          // Green\n        highlight_bg: Color::Rgb(204, 208, 218), // Surface 0\n\n        accent: Color::Rgb(30, 102, 245),           // Blue\n        accent_secondary: Color::Rgb(254, 100, 11), // Peach\n\n        good: Color::Rgb(64, 160, 43),     // Green\n        warning: Color::Rgb(223, 142, 29), // Yellow\n        error: Color::Rgb(210, 15, 57),    // Red\n        info: Color::Rgb(23, 146, 153),    // Teal\n\n        score_high: Color::Rgb(64, 160, 43),\n        score_mid: Color::Rgb(223, 142, 29),\n        score_low: Color::Rgb(210, 15, 57),\n\n        fit_perfect: Color::Rgb(64, 160, 43),\n        fit_good: Color::Rgb(223, 142, 29),\n        fit_marginal: Color::Rgb(136, 57, 239), // Mauve\n        fit_tight: Color::Rgb(210, 15, 57),\n\n        mode_gpu: Color::Rgb(64, 160, 43),\n        mode_moe: Color::Rgb(4, 165, 229),      // Sky\n        mode_offload: Color::Rgb(254, 100, 11), // Peach\n        mode_cpu: Color::Rgb(140, 143, 161),    // Overlay 1\n\n        status_bg: Color::Rgb(136, 57, 239),  // Mauve\n        status_fg: Color::Rgb(239, 241, 245), // Base\n    }\n}\n\nfn catppuccin_frappe_colors() -> ThemeColors {\n    // Catppuccin Frappé — low-contrast dark variant\n    // https://catppuccin.com/palette/\n    ThemeColors {\n        bg: Color::Rgb(48, 52, 70),           // Base\n        fg: Color::Rgb(198, 208, 245),        // Text\n        muted: Color::Rgb(131, 139, 167),     // Overlay 1\n        border: Color::Rgb(98, 104, 128),     // Surface 2\n        title: Color::Rgb(166, 209, 137),     // Green\n    
    highlight_bg: Color::Rgb(65, 69, 89), // Surface 0\n\n        accent: Color::Rgb(140, 170, 238),           // Blue\n        accent_secondary: Color::Rgb(239, 159, 118), // Peach\n\n        good: Color::Rgb(166, 209, 137),    // Green\n        warning: Color::Rgb(229, 200, 144), // Yellow\n        error: Color::Rgb(231, 130, 132),   // Red\n        info: Color::Rgb(153, 209, 219),    // Sky\n\n        score_high: Color::Rgb(166, 209, 137),\n        score_mid: Color::Rgb(229, 200, 144),\n        score_low: Color::Rgb(231, 130, 132),\n\n        fit_perfect: Color::Rgb(166, 209, 137),\n        fit_good: Color::Rgb(229, 200, 144),\n        fit_marginal: Color::Rgb(202, 158, 230), // Mauve\n        fit_tight: Color::Rgb(231, 130, 132),\n\n        mode_gpu: Color::Rgb(166, 209, 137),\n        mode_moe: Color::Rgb(153, 209, 219),     // Sky\n        mode_offload: Color::Rgb(239, 159, 118), // Peach\n        mode_cpu: Color::Rgb(131, 139, 167),     // Overlay 1\n\n        status_bg: Color::Rgb(186, 187, 241), // Lavender\n        status_fg: Color::Rgb(35, 38, 52),    // Crust\n    }\n}\n\nfn catppuccin_macchiato_colors() -> ThemeColors {\n    // Catppuccin Macchiato — medium-contrast dark variant\n    // https://catppuccin.com/palette/\n    ThemeColors {\n        bg: Color::Rgb(36, 39, 58),           // Base\n        fg: Color::Rgb(202, 211, 245),        // Text\n        muted: Color::Rgb(128, 135, 162),     // Overlay 1\n        border: Color::Rgb(91, 96, 120),      // Surface 2\n        title: Color::Rgb(166, 218, 149),     // Green\n        highlight_bg: Color::Rgb(54, 58, 79), // Surface 0\n\n        accent: Color::Rgb(138, 173, 244),           // Blue\n        accent_secondary: Color::Rgb(245, 169, 127), // Peach\n\n        good: Color::Rgb(166, 218, 149),    // Green\n        warning: Color::Rgb(238, 212, 159), // Yellow\n        error: Color::Rgb(237, 135, 150),   // Red\n        info: Color::Rgb(145, 215, 227),    // Sky\n\n        score_high: Color::Rgb(166, 
218, 149),\n        score_mid: Color::Rgb(238, 212, 159),\n        score_low: Color::Rgb(237, 135, 150),\n\n        fit_perfect: Color::Rgb(166, 218, 149),\n        fit_good: Color::Rgb(238, 212, 159),\n        fit_marginal: Color::Rgb(198, 160, 246), // Mauve\n        fit_tight: Color::Rgb(237, 135, 150),\n\n        mode_gpu: Color::Rgb(166, 218, 149),\n        mode_moe: Color::Rgb(145, 215, 227),     // Sky\n        mode_offload: Color::Rgb(245, 169, 127), // Peach\n        mode_cpu: Color::Rgb(128, 135, 162),     // Overlay 1\n\n        status_bg: Color::Rgb(183, 189, 248), // Lavender\n        status_fg: Color::Rgb(24, 25, 38),    // Crust\n    }\n}\n\nfn catppuccin_mocha_colors() -> ThemeColors {\n    // Catppuccin Mocha — darkest variant (the original)\n    // https://catppuccin.com/palette/\n    ThemeColors {\n        bg: Color::Rgb(30, 30, 46),           // Base\n        fg: Color::Rgb(205, 214, 244),        // Text\n        muted: Color::Rgb(127, 132, 156),     // Overlay 1\n        border: Color::Rgb(88, 91, 112),      // Surface 2\n        title: Color::Rgb(166, 227, 161),     // Green\n        highlight_bg: Color::Rgb(49, 50, 68), // Surface 0\n\n        accent: Color::Rgb(137, 180, 250),           // Blue\n        accent_secondary: Color::Rgb(250, 179, 135), // Peach\n\n        good: Color::Rgb(166, 227, 161),    // Green\n        warning: Color::Rgb(249, 226, 175), // Yellow\n        error: Color::Rgb(243, 139, 168),   // Red\n        info: Color::Rgb(137, 220, 235),    // Sky\n\n        score_high: Color::Rgb(166, 227, 161),\n        score_mid: Color::Rgb(249, 226, 175),\n        score_low: Color::Rgb(243, 139, 168),\n\n        fit_perfect: Color::Rgb(166, 227, 161),\n        fit_good: Color::Rgb(249, 226, 175),\n        fit_marginal: Color::Rgb(203, 166, 247), // Mauve\n        fit_tight: Color::Rgb(243, 139, 168),\n\n        mode_gpu: Color::Rgb(166, 227, 161),\n        mode_moe: Color::Rgb(137, 220, 235),     // Sky\n        mode_offload: 
Color::Rgb(250, 179, 135), // Peach\n        mode_cpu: Color::Rgb(127, 132, 156),     // Overlay 1\n\n        status_bg: Color::Rgb(180, 190, 254), // Lavender\n        status_fg: Color::Rgb(17, 17, 27),    // Crust\n    }\n}\n"
  },
  {
    "path": "llmfit-tui/src/tui_app.rs",
    "content": "use llmfit_core::fit::{FitLevel, ModelFit, SortColumn, backend_compatible};\nuse llmfit_core::hardware::SystemSpecs;\nuse llmfit_core::models::{Capability, ModelDatabase, UseCase};\nuse llmfit_core::plan::{PlanEstimate, PlanRequest, estimate_model_plan};\nuse llmfit_core::providers::{\n    self, DockerModelRunnerProvider, LlamaCppProvider, LmStudioProvider, MlxProvider,\n    ModelProvider, OllamaProvider, PullEvent, PullHandle,\n};\n\nuse std::collections::{HashMap, HashSet};\nuse std::sync::mpsc;\n\nuse crate::theme::Theme;\n\n#[derive(Debug, Clone, Copy, PartialEq, Eq)]\npub enum InputMode {\n    Normal,\n    Visual,\n    Select,\n    Search,\n    Plan,\n    ProviderPopup,\n    UseCasePopup,\n    CapabilityPopup,\n    DownloadProviderPopup,\n    QuantPopup,\n    RunModePopup,\n    ParamsBucketPopup,\n}\n\n#[derive(Debug, Clone, Copy, PartialEq, Eq)]\npub enum PlanField {\n    Context,\n    Quant,\n    TargetTps,\n}\n\nimpl PlanField {\n    fn next(self) -> Self {\n        match self {\n            PlanField::Context => PlanField::Quant,\n            PlanField::Quant => PlanField::TargetTps,\n            PlanField::TargetTps => PlanField::Context,\n        }\n    }\n\n    fn prev(self) -> Self {\n        match self {\n            PlanField::Context => PlanField::TargetTps,\n            PlanField::Quant => PlanField::Context,\n            PlanField::TargetTps => PlanField::Quant,\n        }\n    }\n}\n\n#[derive(Debug, Clone, Copy, PartialEq, Eq)]\npub enum FitFilter {\n    All,\n    Perfect,\n    Good,\n    Marginal,\n    TooTight,\n    Runnable, // Perfect + Good + Marginal (excludes TooTight)\n}\n\nimpl FitFilter {\n    pub fn label(&self) -> &str {\n        match self {\n            FitFilter::All => \"All\",\n            FitFilter::Perfect => \"Perfect\",\n            FitFilter::Good => \"Good\",\n            FitFilter::Marginal => \"Marginal\",\n            FitFilter::TooTight => \"Too Tight\",\n            FitFilter::Runnable => 
\"Runnable\",\n        }\n    }\n\n    pub fn next(&self) -> Self {\n        match self {\n            FitFilter::All => FitFilter::Runnable,\n            FitFilter::Runnable => FitFilter::Perfect,\n            FitFilter::Perfect => FitFilter::Good,\n            FitFilter::Good => FitFilter::Marginal,\n            FitFilter::Marginal => FitFilter::TooTight,\n            FitFilter::TooTight => FitFilter::All,\n        }\n    }\n}\n\n/// Filter by model availability / download readiness.\n#[derive(Debug, Clone, Copy, PartialEq, Eq)]\npub enum AvailabilityFilter {\n    All,\n    HasGguf,   // Has GGUF download sources (unsloth, bartowski, etc.)\n    Installed, // Already installed in a local runtime\n}\n\nimpl AvailabilityFilter {\n    pub fn label(&self) -> &str {\n        match self {\n            AvailabilityFilter::All => \"All\",\n            AvailabilityFilter::HasGguf => \"GGUF Avail\",\n            AvailabilityFilter::Installed => \"Installed\",\n        }\n    }\n\n    pub fn next(&self) -> Self {\n        match self {\n            AvailabilityFilter::All => AvailabilityFilter::HasGguf,\n            AvailabilityFilter::HasGguf => AvailabilityFilter::Installed,\n            AvailabilityFilter::Installed => AvailabilityFilter::All,\n        }\n    }\n}\n\n#[derive(Debug, Clone, Copy, PartialEq, Eq)]\npub enum DownloadProvider {\n    Ollama,\n    Mlx,\n    LlamaCpp,\n    DockerModelRunner,\n    LmStudio,\n}\n\n#[derive(Debug, Clone, Copy, PartialEq, Eq)]\npub enum DownloadCapability {\n    Unknown,\n    /// Bitfield: OLLAMA=1, LLAMACPP=2, DOCKER=4, LMSTUDIO=8\n    Known(u8),\n}\n\npub const DL_OLLAMA: u8 = 0b0001;\npub const DL_LLAMACPP: u8 = 0b0010;\npub const DL_DOCKER: u8 = 0b0100;\npub const DL_LMSTUDIO: u8 = 0b1000;\n\n#[derive(Debug, Clone, Copy, PartialEq, Eq)]\nenum ActivePullProvider {\n    Ollama,\n    Mlx,\n    LlamaCpp,\n    DockerModelRunner,\n    LmStudio,\n}\n\nimpl ActivePullProvider {\n    fn label(self) -> &'static str {\n        match self {\n            
ActivePullProvider::Ollama => \"Ollama\",\n            ActivePullProvider::Mlx => \"MLX\",\n            ActivePullProvider::LlamaCpp => \"llama.cpp\",\n            ActivePullProvider::DockerModelRunner => \"Docker\",\n            ActivePullProvider::LmStudio => \"LM Studio\",\n        }\n    }\n}\n\npub struct App {\n    pub should_quit: bool,\n    pub input_mode: InputMode,\n    pub search_query: String,\n    pub cursor_position: usize,\n\n    // Data\n    pub specs: SystemSpecs,\n    pub all_fits: Vec<ModelFit>,\n    pub filtered_fits: Vec<usize>, // indices into all_fits\n    pub providers: Vec<String>,\n    pub selected_providers: Vec<bool>,\n    pub use_cases: Vec<UseCase>,\n    pub selected_use_cases: Vec<bool>,\n    pub capabilities: Vec<Capability>,\n    pub selected_capabilities: Vec<bool>,\n\n    // Filters\n    pub fit_filter: FitFilter,\n    pub availability_filter: AvailabilityFilter,\n    pub installed_first: bool,\n    pub sort_column: SortColumn,\n    pub sort_ascending: bool,\n\n    // Table state\n    pub selected_row: usize,\n\n    // Detail view\n    pub show_detail: bool,\n    pub show_compare: bool,\n    pub compare_mark_model: Option<String>,\n    pub show_multi_compare: bool,\n    pub compare_models: Vec<usize>, // indices into all_fits\n    pub compare_scroll: usize,      // horizontal scroll for multi-compare\n    pub show_plan: bool,\n    plan_model_idx: Option<usize>,\n    pub plan_field: PlanField,\n    pub plan_context_input: String,\n    pub plan_quant_input: String,\n    pub plan_target_tps_input: String,\n    pub plan_cursor_position: usize,\n    pub plan_estimate: Option<PlanEstimate>,\n    pub plan_error: Option<String>,\n\n    // Provider popup\n    pub provider_cursor: usize,\n    pub use_case_cursor: usize,\n    pub capability_cursor: usize,\n    pub download_provider_cursor: usize,\n    pub download_provider_options: Vec<DownloadProvider>,\n    pub download_provider_model: Option<String>,\n\n    // Provider state\n    pub 
ollama_available: bool,\n    pub ollama_binary_available: bool,\n    pub ollama_installed: HashSet<String>,\n    pub ollama_installed_count: usize,\n    ollama: OllamaProvider,\n    pub mlx_available: bool,\n    pub mlx_installed: HashSet<String>,\n    mlx: MlxProvider,\n    pub llamacpp_available: bool,\n    pub llamacpp_installed: HashSet<String>,\n    pub llamacpp_installed_count: usize,\n    llamacpp: LlamaCppProvider,\n    pub docker_mr_available: bool,\n    pub docker_mr_installed: HashSet<String>,\n    pub docker_mr_installed_count: usize,\n    docker_mr: DockerModelRunnerProvider,\n    pub lmstudio_available: bool,\n    pub lmstudio_installed: HashSet<String>,\n    pub lmstudio_installed_count: usize,\n    lmstudio: LmStudioProvider,\n\n    // Download state\n    pub pull_active: Option<PullHandle>,\n    pub pull_status: Option<String>,\n    pub pull_percent: Option<f64>,\n    pub pull_model_name: Option<String>,\n    pull_provider: Option<ActivePullProvider>,\n    pub download_capabilities: HashMap<String, DownloadCapability>,\n    download_capability_inflight: HashSet<String>,\n    download_capability_tx: mpsc::Sender<(String, DownloadCapability)>,\n    download_capability_rx: mpsc::Receiver<(String, DownloadCapability)>,\n    /// Animation frame counter, incremented every tick while pulling.\n    pub tick_count: u64,\n    /// When true, the next 'd' press will confirm and start the download.\n    pub confirm_download: bool,\n\n    // Visual mode\n    pub visual_anchor: Option<usize>,\n\n    // Select mode\n    pub select_column: usize,\n\n    // Quant filter (popup)\n    pub quants: Vec<String>,\n    pub selected_quants: Vec<bool>,\n    pub quant_cursor: usize,\n\n    // RunMode filter (popup)\n    pub run_modes: Vec<String>,\n    pub selected_run_modes: Vec<bool>,\n    pub run_mode_cursor: usize,\n\n    // Params bucket filter (popup)\n    pub params_buckets: Vec<String>,\n    pub selected_params_buckets: Vec<bool>,\n    pub params_bucket_cursor: 
usize,\n\n    // Theme\n    pub theme: Theme,\n\n    /// How many models we silently dropped because they can't run on this\n    /// hardware — shown in the system bar so users aren't left wondering\n    /// why the list looks shorter than expected.\n    pub backend_hidden_count: usize,\n}\n\nimpl App {\n    pub fn with_specs_and_context(specs: SystemSpecs, context_limit: Option<u32>) -> Self {\n        let db = ModelDatabase::new();\n\n        // Detect Ollama\n        let ollama = OllamaProvider::new();\n        let (ollama_available, ollama_installed, ollama_installed_count) =\n            ollama.detect_with_installed();\n        let ollama_binary_available = command_exists(\"ollama\");\n\n        // Detect MLX\n        let mlx = MlxProvider::new();\n        let (mlx_available, mlx_installed) = mlx.detect_with_installed();\n\n        // Detect llama.cpp\n        let llamacpp = LlamaCppProvider::new();\n        let llamacpp_available = llamacpp.is_available();\n        let (llamacpp_installed, llamacpp_installed_count) = llamacpp.installed_models_counted();\n\n        // Detect Docker Model Runner\n        let docker_mr = DockerModelRunnerProvider::new();\n        let (docker_mr_available, docker_mr_installed, docker_mr_installed_count) =\n            docker_mr.detect_with_installed();\n\n        // Detect LM Studio\n        let lmstudio = LmStudioProvider::new();\n        let (lmstudio_available, lmstudio_installed, lmstudio_installed_count) =\n            lmstudio.detect_with_installed();\n\n        // Track how many we're skipping so the UI can surface it.\n        let backend_hidden_count = db\n            .get_all_models()\n            .iter()\n            .filter(|m| !backend_compatible(m, &specs))\n            .count();\n\n        // Only analyze models that can actually run on this hardware.\n        let mut all_fits: Vec<ModelFit> = db\n            .get_all_models()\n            .iter()\n            .filter(|m| backend_compatible(m, &specs))\n            
.map(|m| {\n                let mut fit = ModelFit::analyze_with_context_limit(m, &specs, context_limit);\n                fit.installed = providers::is_model_installed(&m.name, &ollama_installed)\n                    || providers::is_model_installed_mlx(&m.name, &mlx_installed)\n                    || providers::is_model_installed_llamacpp(&m.name, &llamacpp_installed)\n                    || providers::is_model_installed_docker_mr(&m.name, &docker_mr_installed)\n                    || providers::is_model_installed_lmstudio(&m.name, &lmstudio_installed);\n                fit\n            })\n            .collect();\n\n        // Sort by fit level then RAM usage\n        all_fits = llmfit_core::fit::rank_models_by_fit(all_fits);\n\n        // Extract unique providers\n        let mut model_providers: Vec<String> = all_fits\n            .iter()\n            .map(|f| f.model.provider.clone())\n            .collect::<std::collections::BTreeSet<_>>()\n            .into_iter()\n            .collect();\n        model_providers.sort();\n\n        let selected_providers = vec![true; model_providers.len()];\n        let model_use_cases = [\n            UseCase::General,\n            UseCase::Coding,\n            UseCase::Reasoning,\n            UseCase::Chat,\n            UseCase::Multimodal,\n            UseCase::Embedding,\n        ]\n        .into_iter()\n        .filter(|uc| all_fits.iter().any(|f| f.use_case == *uc))\n        .collect::<Vec<_>>();\n        let selected_use_cases = vec![true; model_use_cases.len()];\n\n        let model_capabilities = Capability::all().to_vec();\n        let selected_capabilities = vec![true; model_capabilities.len()];\n\n        // Extract unique quantizations\n        let mut model_quants: Vec<String> = all_fits\n            .iter()\n            .map(|f| f.best_quant.clone())\n            .collect::<std::collections::BTreeSet<_>>()\n            .into_iter()\n            .collect();\n        model_quants.sort();\n        let 
selected_quants = vec![true; model_quants.len()];\n\n        // Run modes\n        let model_run_modes = vec![\n            \"GPU\".to_string(),\n            \"MoE\".to_string(),\n            \"CPU+GPU\".to_string(),\n            \"CPU\".to_string(),\n        ];\n        let selected_run_modes = vec![true; model_run_modes.len()];\n\n        // Params buckets\n        let params_buckets = vec![\n            \"<3B\".to_string(),\n            \"3-7B\".to_string(),\n            \"7-14B\".to_string(),\n            \"14-30B\".to_string(),\n            \"30-70B\".to_string(),\n            \"70B+\".to_string(),\n        ];\n        let selected_params_buckets = vec![true; params_buckets.len()];\n\n        let filtered_count = all_fits.len();\n\n        let (download_capability_tx, download_capability_rx) = mpsc::channel();\n\n        let mut app = App {\n            should_quit: false,\n            input_mode: InputMode::Normal,\n            search_query: String::new(),\n            cursor_position: 0,\n            specs,\n            all_fits,\n            filtered_fits: (0..filtered_count).collect(),\n            providers: model_providers,\n            selected_providers,\n            use_cases: model_use_cases,\n            selected_use_cases,\n            capabilities: model_capabilities,\n            selected_capabilities,\n            fit_filter: FitFilter::All,\n            availability_filter: AvailabilityFilter::All,\n            installed_first: false,\n            sort_column: SortColumn::Score,\n            sort_ascending: false,\n            selected_row: 0,\n            show_detail: false,\n            show_compare: false,\n            compare_mark_model: None,\n            show_multi_compare: false,\n            compare_models: Vec::new(),\n            compare_scroll: 0,\n            show_plan: false,\n            plan_model_idx: None,\n            plan_field: PlanField::Context,\n            plan_context_input: String::new(),\n            plan_quant_input: 
String::new(),\n            plan_target_tps_input: String::new(),\n            plan_cursor_position: 0,\n            plan_estimate: None,\n            plan_error: None,\n            provider_cursor: 0,\n            use_case_cursor: 0,\n            capability_cursor: 0,\n            download_provider_cursor: 0,\n            download_provider_options: Vec::new(),\n            download_provider_model: None,\n            ollama_available,\n            ollama_binary_available,\n            ollama_installed,\n            ollama_installed_count,\n            ollama,\n            mlx_available,\n            mlx_installed,\n            mlx,\n            llamacpp_available,\n            llamacpp_installed,\n            llamacpp_installed_count,\n            llamacpp,\n            docker_mr_available,\n            docker_mr_installed,\n            docker_mr_installed_count,\n            docker_mr,\n            lmstudio_available,\n            lmstudio_installed,\n            lmstudio_installed_count,\n            lmstudio,\n            pull_active: None,\n            pull_status: None,\n            pull_percent: None,\n            pull_model_name: None,\n            pull_provider: None,\n            download_capabilities: HashMap::new(),\n            download_capability_inflight: HashSet::new(),\n            download_capability_tx,\n            download_capability_rx,\n            tick_count: 0,\n            confirm_download: false,\n            visual_anchor: None,\n            select_column: 2, // start on Model column\n            quants: model_quants,\n            selected_quants,\n            quant_cursor: 0,\n            run_modes: model_run_modes,\n            selected_run_modes,\n            run_mode_cursor: 0,\n            params_buckets,\n            selected_params_buckets,\n            params_bucket_cursor: 0,\n            theme: Theme::load(),\n            backend_hidden_count,\n        };\n\n        app.apply_filters();\n        
app.enqueue_capability_probes_for_visible(24);\n        app\n    }\n\n    pub fn apply_filters(&mut self) {\n        let query = self.search_query.to_lowercase();\n        // Split query into space-separated terms for fuzzy matching\n        let terms: Vec<&str> = query.split_whitespace().collect();\n\n        self.filtered_fits = self\n            .all_fits\n            .iter()\n            .enumerate()\n            .filter(|(_, fit)| {\n                // Search filter: all terms must match (fuzzy/AND logic)\n                let matches_search = if terms.is_empty() {\n                    true\n                } else {\n                    let caps_text = fit\n                        .model\n                        .capabilities\n                        .iter()\n                        .map(|c| c.label().to_lowercase())\n                        .collect::<Vec<_>>()\n                        .join(\" \");\n                    // Combine all searchable fields into one string\n                    let searchable = format!(\n                        \"{} {} {} {} {} {}\",\n                        fit.model.name.to_lowercase(),\n                        fit.model.provider.to_lowercase(),\n                        fit.model.parameter_count.to_lowercase(),\n                        fit.model.use_case.to_lowercase(),\n                        fit.use_case.label().to_lowercase(),\n                        caps_text\n                    );\n                    // All terms must be present (AND logic)\n                    terms.iter().all(|term| searchable.contains(term))\n                };\n\n                // Provider filter\n                let provider_idx = self.providers.iter().position(|p| p == &fit.model.provider);\n                let matches_provider = provider_idx\n                    .map(|idx| self.selected_providers[idx])\n                    .unwrap_or(true);\n                let use_case_idx = self.use_cases.iter().position(|uc| *uc == fit.use_case);\n              
  let matches_use_case = use_case_idx\n                    .map(|idx| self.selected_use_cases[idx])\n                    .unwrap_or(true);\n\n                // Hide MLX-only models on non-Apple Silicon systems\n                let is_apple_silicon = self.specs.backend\n                    == llmfit_core::hardware::GpuBackend::Metal\n                    && self.specs.unified_memory;\n                if fit.model.is_mlx_only() && !is_apple_silicon {\n                    return false;\n                }\n\n                // Fit filter\n                let matches_fit = match self.fit_filter {\n                    FitFilter::All => true,\n                    FitFilter::Perfect => fit.fit_level == FitLevel::Perfect,\n                    FitFilter::Good => fit.fit_level == FitLevel::Good,\n                    FitFilter::Marginal => fit.fit_level == FitLevel::Marginal,\n                    FitFilter::TooTight => fit.fit_level == FitLevel::TooTight,\n                    FitFilter::Runnable => fit.fit_level != FitLevel::TooTight,\n                };\n\n                // Availability filter\n                let matches_availability = match self.availability_filter {\n                    AvailabilityFilter::All => true,\n                    AvailabilityFilter::HasGguf => !fit.model.gguf_sources.is_empty(),\n                    AvailabilityFilter::Installed => fit.installed,\n                };\n\n                // Capability filter\n                let matches_capability = {\n                    let all_selected = self.selected_capabilities.iter().all(|&s| s);\n                    if all_selected {\n                        true\n                    } else {\n                        self.capabilities\n                            .iter()\n                            .zip(self.selected_capabilities.iter())\n                            .filter(|(_, sel)| **sel)\n                            .any(|(cap, _)| fit.model.capabilities.contains(cap))\n                    }\n          
      };\n\n                // Quant filter\n                let matches_quant = {\n                    let all_selected = self.selected_quants.iter().all(|&s| s);\n                    if all_selected {\n                        true\n                    } else {\n                        self.quants\n                            .iter()\n                            .zip(self.selected_quants.iter())\n                            .any(|(q, &sel)| sel && *q == fit.best_quant)\n                    }\n                };\n\n                // RunMode filter\n                let matches_run_mode = {\n                    let all_selected = self.selected_run_modes.iter().all(|&s| s);\n                    if all_selected {\n                        true\n                    } else {\n                        let mode_text = fit.run_mode_text();\n                        self.run_modes\n                            .iter()\n                            .zip(self.selected_run_modes.iter())\n                            .any(|(m, &sel)| sel && *m == mode_text)\n                    }\n                };\n\n                // Params bucket filter\n                let matches_params_bucket = {\n                    let all_selected = self.selected_params_buckets.iter().all(|&s| s);\n                    if all_selected {\n                        true\n                    } else {\n                        let params = fit.model.params_b();\n                        let bucket_idx = if params < 3.0 {\n                            0\n                        } else if params < 7.0 {\n                            1\n                        } else if params < 14.0 {\n                            2\n                        } else if params < 30.0 {\n                            3\n                        } else if params < 70.0 {\n                            4\n                        } else {\n                            5\n                        };\n                        
self.selected_params_buckets\n                            .get(bucket_idx)\n                            .copied()\n                            .unwrap_or(true)\n                    }\n                };\n\n                matches_search\n                    && matches_provider\n                    && matches_use_case\n                    && matches_fit\n                    && matches_availability\n                    && matches_capability\n                    && matches_quant\n                    && matches_run_mode\n                    && matches_params_bucket\n            })\n            .map(|(i, _)| i)\n            .collect();\n\n        // Clamp selection\n        if self.filtered_fits.is_empty() {\n            self.selected_row = 0;\n        } else if self.selected_row >= self.filtered_fits.len() {\n            self.selected_row = self.filtered_fits.len() - 1;\n        }\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn selected_fit(&self) -> Option<&ModelFit> {\n        self.filtered_fits\n            .get(self.selected_row)\n            .map(|&idx| &self.all_fits[idx])\n    }\n\n    pub fn move_up(&mut self) {\n        self.confirm_download = false;\n        if self.selected_row > 0 {\n            self.selected_row -= 1;\n        }\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn move_down(&mut self) {\n        self.confirm_download = false;\n        if !self.filtered_fits.is_empty() && self.selected_row < self.filtered_fits.len() - 1 {\n            self.selected_row += 1;\n        }\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn page_up(&mut self) {\n        self.confirm_download = false;\n        self.selected_row = self.selected_row.saturating_sub(10);\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn page_down(&mut self) {\n        self.confirm_download = false;\n        if !self.filtered_fits.is_empty() {\n            self.selected_row = 
(self.selected_row + 10).min(self.filtered_fits.len() - 1);\n        }\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn half_page_up(&mut self) {\n        self.selected_row = self.selected_row.saturating_sub(5);\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn half_page_down(&mut self) {\n        if !self.filtered_fits.is_empty() {\n            self.selected_row = (self.selected_row + 5).min(self.filtered_fits.len() - 1);\n        }\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn home(&mut self) {\n        self.selected_row = 0;\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn end(&mut self) {\n        if !self.filtered_fits.is_empty() {\n            self.selected_row = self.filtered_fits.len() - 1;\n        }\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn cycle_fit_filter(&mut self) {\n        self.fit_filter = self.fit_filter.next();\n        self.apply_filters();\n    }\n\n    pub fn cycle_availability_filter(&mut self) {\n        self.availability_filter = self.availability_filter.next();\n        self.apply_filters();\n    }\n\n    pub fn cycle_sort_column(&mut self) {\n        self.sort_column = self.sort_column.next();\n        self.sort_ascending = false;\n        self.re_sort();\n    }\n\n    pub fn cycle_theme(&mut self) {\n        self.theme = self.theme.next();\n        self.theme.save();\n    }\n\n    pub fn enter_search(&mut self) {\n        self.input_mode = InputMode::Search;\n    }\n\n    pub fn exit_search(&mut self) {\n        self.input_mode = InputMode::Normal;\n    }\n\n    pub fn search_input(&mut self, c: char) {\n        self.search_query.insert(self.cursor_position, c);\n        self.cursor_position += 1;\n        self.apply_filters();\n    }\n\n    pub fn search_backspace(&mut self) {\n        if self.cursor_position > 0 {\n            self.cursor_position -= 1;\n            
self.search_query.remove(self.cursor_position);\n            self.apply_filters();\n        }\n    }\n\n    pub fn search_delete(&mut self) {\n        if self.cursor_position < self.search_query.len() {\n            self.search_query.remove(self.cursor_position);\n            self.apply_filters();\n        }\n    }\n\n    pub fn clear_search(&mut self) {\n        self.search_query.clear();\n        self.cursor_position = 0;\n        self.apply_filters();\n    }\n\n    pub fn toggle_detail(&mut self) {\n        self.show_plan = false;\n        self.show_compare = false;\n        self.show_detail = !self.show_detail;\n    }\n\n    pub fn mark_selected_for_compare(&mut self) {\n        let Some(model_name) = self.selected_fit().map(|fit| fit.model.name.clone()) else {\n            self.pull_status = Some(\"No selected model to mark\".to_string());\n            return;\n        };\n        self.compare_mark_model = Some(model_name.clone());\n        self.pull_status = Some(format!(\"Marked '{}' for compare\", model_name));\n    }\n\n    pub fn clear_compare_mark(&mut self) {\n        self.compare_mark_model = None;\n        self.show_compare = false;\n        self.pull_status = Some(\"Cleared compare mark\".to_string());\n    }\n\n    pub fn selected_compare_pair(&self) -> Option<(&ModelFit, &ModelFit)> {\n        let selected = self.selected_fit()?;\n        let mark_name = self.compare_mark_model.as_deref()?;\n        let marked = self.all_fits.iter().find(|f| f.model.name == mark_name)?;\n        if marked.model.name == selected.model.name {\n            return None;\n        }\n        Some((marked, selected))\n    }\n\n    pub fn toggle_compare_view(&mut self) {\n        if self.show_compare {\n            self.show_compare = false;\n            return;\n        }\n        if self.compare_mark_model.is_none() {\n            self.pull_status = Some(\"No marked model. 
Press m to mark one first\".to_string());\n            return;\n        }\n        if self.selected_compare_pair().is_none() {\n            self.pull_status =\n                Some(\"Select a different model than the marked one to compare\".to_string());\n            return;\n        }\n        self.show_detail = false;\n        self.show_plan = false;\n        self.show_compare = true;\n    }\n\n    pub fn open_plan_mode(&mut self) {\n        let Some(&fit_idx) = self.filtered_fits.get(self.selected_row) else {\n            return;\n        };\n        let fit = &self.all_fits[fit_idx];\n\n        self.show_detail = false;\n        self.show_compare = false;\n        self.show_plan = true;\n        self.input_mode = InputMode::Plan;\n        self.plan_model_idx = Some(fit_idx);\n        self.plan_field = PlanField::Context;\n        self.plan_context_input = fit.model.context_length.min(8192).to_string();\n        self.plan_quant_input = fit.model.quantization.clone();\n        self.plan_target_tps_input.clear();\n        self.plan_cursor_position = self.plan_context_input.len();\n        self.refresh_plan_estimate();\n    }\n\n    pub fn close_plan_mode(&mut self) {\n        self.show_plan = false;\n        self.plan_model_idx = None;\n        self.plan_estimate = None;\n        self.plan_error = None;\n        self.input_mode = InputMode::Normal;\n    }\n\n    pub fn plan_next_field(&mut self) {\n        self.plan_field = self.plan_field.next();\n        self.plan_cursor_position = self.active_plan_input().len();\n    }\n\n    pub fn plan_prev_field(&mut self) {\n        self.plan_field = self.plan_field.prev();\n        self.plan_cursor_position = self.active_plan_input().len();\n    }\n\n    pub fn plan_cursor_left(&mut self) {\n        if self.plan_cursor_position > 0 {\n            self.plan_cursor_position -= 1;\n        }\n    }\n\n    pub fn plan_cursor_right(&mut self) {\n        let len = self.active_plan_input().len();\n        if 
self.plan_cursor_position < len {\n            self.plan_cursor_position += 1;\n        }\n    }\n\n    pub fn plan_input(&mut self, c: char) {\n        match self.plan_field {\n            PlanField::Context => {\n                if !c.is_ascii_digit() {\n                    return;\n                }\n            }\n            PlanField::Quant => {\n                if !(c.is_ascii_alphanumeric() || c == '_' || c == '-') {\n                    return;\n                }\n            }\n            PlanField::TargetTps => {\n                if !(c.is_ascii_digit() || c == '.') {\n                    return;\n                }\n                if c == '.' && self.plan_target_tps_input.contains('.') {\n                    return;\n                }\n            }\n        }\n\n        let cursor = self.plan_cursor_position;\n        let input = self.active_plan_input_mut();\n        if cursor <= input.len() {\n            input.insert(cursor, c);\n            self.plan_cursor_position = cursor + 1;\n            self.refresh_plan_estimate();\n        }\n    }\n\n    pub fn plan_backspace(&mut self) {\n        if self.plan_cursor_position == 0 {\n            return;\n        }\n        let cursor = self.plan_cursor_position;\n        let input = self.active_plan_input_mut();\n        if cursor <= input.len() {\n            input.remove(cursor - 1);\n            self.plan_cursor_position = cursor - 1;\n            self.refresh_plan_estimate();\n        }\n    }\n\n    pub fn plan_delete(&mut self) {\n        let cursor = self.plan_cursor_position;\n        let input = self.active_plan_input_mut();\n        if cursor < input.len() {\n            input.remove(cursor);\n            self.refresh_plan_estimate();\n        }\n    }\n\n    pub fn plan_clear_field(&mut self) {\n        self.active_plan_input_mut().clear();\n        self.plan_cursor_position = 0;\n        self.refresh_plan_estimate();\n    }\n\n    pub fn refresh_plan_estimate(&mut self) {\n        let 
Some(model_idx) = self.plan_model_idx else {\n            self.plan_estimate = None;\n            self.plan_error = Some(\"No model selected for plan\".to_string());\n            return;\n        };\n        let Some(fit) = self.all_fits.get(model_idx) else {\n            self.plan_estimate = None;\n            self.plan_error = Some(\"Selected model is no longer available\".to_string());\n            return;\n        };\n\n        let context = match self.plan_context_input.trim().parse::<u32>() {\n            Ok(v) if v > 0 => v,\n            _ => {\n                self.plan_estimate = None;\n                self.plan_error = Some(\"Context must be a positive integer\".to_string());\n                return;\n            }\n        };\n\n        let quant = if self.plan_quant_input.trim().is_empty() {\n            None\n        } else {\n            Some(self.plan_quant_input.trim().to_string())\n        };\n\n        let target_tps = if self.plan_target_tps_input.trim().is_empty() {\n            None\n        } else {\n            match self.plan_target_tps_input.trim().parse::<f64>() {\n                Ok(v) if v > 0.0 => Some(v),\n                _ => {\n                    self.plan_estimate = None;\n                    self.plan_error = Some(\"Target TPS must be a positive number\".to_string());\n                    return;\n                }\n            }\n        };\n\n        let request = PlanRequest {\n            context,\n            quant,\n            target_tps,\n        };\n\n        match estimate_model_plan(&fit.model, &request, &self.specs) {\n            Ok(plan) => {\n                self.plan_estimate = Some(plan);\n                self.plan_error = None;\n            }\n            Err(e) => {\n                self.plan_estimate = None;\n                self.plan_error = Some(e);\n            }\n        }\n    }\n\n    pub fn plan_model_name(&self) -> Option<&str> {\n        self.plan_model_idx\n            .and_then(|idx| 
self.all_fits.get(idx))\n            .map(|fit| fit.model.name.as_str())\n    }\n\n    pub fn open_provider_popup(&mut self) {\n        self.input_mode = InputMode::ProviderPopup;\n        // Don't reset cursor -- keep it where it was last time\n    }\n\n    pub fn close_provider_popup(&mut self) {\n        self.input_mode = InputMode::Normal;\n    }\n\n    pub fn open_use_case_popup(&mut self) {\n        self.input_mode = InputMode::UseCasePopup;\n        // Don't reset cursor -- keep it where it was last time\n    }\n\n    pub fn close_use_case_popup(&mut self) {\n        self.input_mode = InputMode::Normal;\n    }\n\n    pub fn provider_popup_up(&mut self) {\n        if self.provider_cursor > 0 {\n            self.provider_cursor -= 1;\n        }\n    }\n\n    pub fn provider_popup_down(&mut self) {\n        if self.provider_cursor + 1 < self.providers.len() {\n            self.provider_cursor += 1;\n        }\n    }\n\n    pub fn provider_popup_toggle(&mut self) {\n        if self.provider_cursor < self.selected_providers.len() {\n            self.selected_providers[self.provider_cursor] =\n                !self.selected_providers[self.provider_cursor];\n            self.apply_filters();\n        }\n    }\n\n    pub fn provider_popup_select_all(&mut self) {\n        let all_selected = self.selected_providers.iter().all(|&s| s);\n        let new_val = !all_selected;\n        for s in &mut self.selected_providers {\n            *s = new_val;\n        }\n        self.apply_filters();\n    }\n\n    pub fn use_case_popup_up(&mut self) {\n        if self.use_case_cursor > 0 {\n            self.use_case_cursor -= 1;\n        }\n    }\n\n    pub fn use_case_popup_down(&mut self) {\n        if self.use_case_cursor + 1 < self.use_cases.len() {\n            self.use_case_cursor += 1;\n        }\n    }\n\n    pub fn use_case_popup_toggle(&mut self) {\n        if self.use_case_cursor < self.selected_use_cases.len() {\n            
self.selected_use_cases[self.use_case_cursor] =\n                !self.selected_use_cases[self.use_case_cursor];\n            self.apply_filters();\n        }\n    }\n\n    pub fn use_case_popup_select_all(&mut self) {\n        let all_selected = self.selected_use_cases.iter().all(|&s| s);\n        let new_val = !all_selected;\n        for s in &mut self.selected_use_cases {\n            *s = new_val;\n        }\n        self.apply_filters();\n    }\n\n    pub fn open_capability_popup(&mut self) {\n        self.input_mode = InputMode::CapabilityPopup;\n    }\n\n    pub fn close_capability_popup(&mut self) {\n        self.input_mode = InputMode::Normal;\n    }\n\n    pub fn capability_popup_up(&mut self) {\n        if self.capability_cursor > 0 {\n            self.capability_cursor -= 1;\n        }\n    }\n\n    pub fn capability_popup_down(&mut self) {\n        if self.capability_cursor + 1 < self.capabilities.len() {\n            self.capability_cursor += 1;\n        }\n    }\n\n    pub fn capability_popup_toggle(&mut self) {\n        if self.capability_cursor < self.selected_capabilities.len() {\n            self.selected_capabilities[self.capability_cursor] =\n                !self.selected_capabilities[self.capability_cursor];\n            self.apply_filters();\n        }\n    }\n\n    pub fn capability_popup_select_all(&mut self) {\n        let all_selected = self.selected_capabilities.iter().all(|&s| s);\n        let new_val = !all_selected;\n        for s in &mut self.selected_capabilities {\n            *s = new_val;\n        }\n        self.apply_filters();\n    }\n\n    // ── Visual mode ──────────────────────────────────────────────\n\n    pub fn enter_visual_mode(&mut self) {\n        self.visual_anchor = Some(self.selected_row);\n        self.input_mode = InputMode::Visual;\n    }\n\n    pub fn exit_visual_mode(&mut self) {\n        self.visual_anchor = None;\n        self.input_mode = InputMode::Normal;\n    }\n\n    pub fn visual_range(&self) -> 
Option<std::ops::RangeInclusive<usize>> {\n        let anchor = self.visual_anchor?;\n        let lo = anchor.min(self.selected_row);\n        let hi = anchor.max(self.selected_row);\n        Some(lo..=hi)\n    }\n\n    pub fn visual_selection_count(&self) -> usize {\n        self.visual_range()\n            .map(|r| r.end() - r.start() + 1)\n            .unwrap_or(0)\n    }\n\n    /// In visual mode, compare all selected models.\n    pub fn visual_compare(&mut self) {\n        let Some(range) = self.visual_range() else {\n            return;\n        };\n        let lo = *range.start();\n        let hi = *range.end();\n        if lo == hi {\n            self.pull_status = Some(\"Select at least 2 models to compare\".to_string());\n            return;\n        }\n        // Collect all filtered_fits indices in the visual range\n        self.compare_models = (lo..=hi)\n            .filter_map(|row| self.filtered_fits.get(row).copied())\n            .collect();\n        self.compare_scroll = 0;\n        self.exit_visual_mode();\n        self.show_detail = false;\n        self.show_plan = false;\n        self.show_compare = false;\n        self.show_multi_compare = true;\n    }\n\n    pub fn close_multi_compare(&mut self) {\n        self.show_multi_compare = false;\n        self.compare_models.clear();\n    }\n\n    pub fn multi_compare_scroll_left(&mut self) {\n        if self.compare_scroll > 0 {\n            self.compare_scroll -= 1;\n        }\n    }\n\n    pub fn multi_compare_scroll_right(&mut self) {\n        if !self.compare_models.is_empty()\n            && self.compare_scroll < self.compare_models.len().saturating_sub(1)\n        {\n            self.compare_scroll += 1;\n        }\n    }\n\n    // ── Select mode ─────────────────────────────────────────────\n\n    pub fn enter_select_mode(&mut self) {\n        self.input_mode = InputMode::Select;\n    }\n\n    pub fn exit_select_mode(&mut self) {\n        self.input_mode = InputMode::Normal;\n    }\n\n    
pub fn select_column_left(&mut self) {\n        if self.select_column > 1 {\n            self.select_column -= 1;\n        }\n    }\n\n    pub fn select_column_right(&mut self) {\n        if self.select_column < 13 {\n            self.select_column += 1;\n        }\n    }\n\n    /// Activate the filter for the currently focused column in Select mode.\n    pub fn activate_select_column_filter(&mut self) {\n        match self.select_column {\n            1 => self.cycle_availability_filter(), // Inst\n            2 => {\n                self.input_mode = InputMode::Search;\n            } // Model → search\n            3 => {\n                self.input_mode = InputMode::ProviderPopup;\n            } // Provider\n            4 => {\n                self.input_mode = InputMode::ParamsBucketPopup;\n            } // Params\n            5 => self.set_or_toggle_sort(SortColumn::Score), // Score\n            6 => self.set_or_toggle_sort(SortColumn::Tps), // tok/s\n            7 => {\n                self.input_mode = InputMode::QuantPopup;\n            } // Quant\n            8 => {\n                self.input_mode = InputMode::RunModePopup;\n            } // Mode\n            9 => self.set_or_toggle_sort(SortColumn::MemPct), // Mem%\n            10 => self.set_or_toggle_sort(SortColumn::Ctx), // Ctx\n            11 => self.set_or_toggle_sort(SortColumn::ReleaseDate), // Date\n            12 => self.cycle_fit_filter(),         // Fit\n            13 => {\n                self.input_mode = InputMode::UseCasePopup;\n            } // Use Case\n            _ => {}\n        }\n    }\n\n    /// Set sort column, or toggle ascending/descending if already on that column.\n    fn set_or_toggle_sort(&mut self, col: SortColumn) {\n        if self.sort_column == col {\n            self.sort_ascending = !self.sort_ascending;\n        } else {\n            self.sort_column = col;\n            self.sort_ascending = false;\n        }\n        self.re_sort();\n    }\n\n    // ── Quant popup 
─────────────────────────────────────────────\n\n    pub fn close_quant_popup(&mut self) {\n        self.input_mode = InputMode::Normal;\n    }\n\n    pub fn quant_popup_up(&mut self) {\n        if self.quant_cursor > 0 {\n            self.quant_cursor -= 1;\n        }\n    }\n\n    pub fn quant_popup_down(&mut self) {\n        if self.quant_cursor + 1 < self.quants.len() {\n            self.quant_cursor += 1;\n        }\n    }\n\n    pub fn quant_popup_toggle(&mut self) {\n        if self.quant_cursor < self.selected_quants.len() {\n            self.selected_quants[self.quant_cursor] = !self.selected_quants[self.quant_cursor];\n            self.apply_filters();\n        }\n    }\n\n    pub fn quant_popup_select_all(&mut self) {\n        let all_selected = self.selected_quants.iter().all(|&s| s);\n        let new_val = !all_selected;\n        for s in &mut self.selected_quants {\n            *s = new_val;\n        }\n        self.apply_filters();\n    }\n\n    // ── RunMode popup ───────────────────────────────────────────\n\n    pub fn close_run_mode_popup(&mut self) {\n        self.input_mode = InputMode::Normal;\n    }\n\n    pub fn run_mode_popup_up(&mut self) {\n        if self.run_mode_cursor > 0 {\n            self.run_mode_cursor -= 1;\n        }\n    }\n\n    pub fn run_mode_popup_down(&mut self) {\n        if self.run_mode_cursor + 1 < self.run_modes.len() {\n            self.run_mode_cursor += 1;\n        }\n    }\n\n    pub fn run_mode_popup_toggle(&mut self) {\n        if self.run_mode_cursor < self.selected_run_modes.len() {\n            self.selected_run_modes[self.run_mode_cursor] =\n                !self.selected_run_modes[self.run_mode_cursor];\n            self.apply_filters();\n        }\n    }\n\n    pub fn run_mode_popup_select_all(&mut self) {\n        let all_selected = self.selected_run_modes.iter().all(|&s| s);\n        let new_val = !all_selected;\n        for s in &mut self.selected_run_modes {\n            *s = new_val;\n        }\n     
   self.apply_filters();\n    }\n\n    // ── Params bucket popup ─────────────────────────────────────\n\n    pub fn close_params_bucket_popup(&mut self) {\n        self.input_mode = InputMode::Normal;\n    }\n\n    pub fn params_bucket_popup_up(&mut self) {\n        if self.params_bucket_cursor > 0 {\n            self.params_bucket_cursor -= 1;\n        }\n    }\n\n    pub fn params_bucket_popup_down(&mut self) {\n        if self.params_bucket_cursor + 1 < self.params_buckets.len() {\n            self.params_bucket_cursor += 1;\n        }\n    }\n\n    pub fn params_bucket_popup_toggle(&mut self) {\n        if self.params_bucket_cursor < self.selected_params_buckets.len() {\n            self.selected_params_buckets[self.params_bucket_cursor] =\n                !self.selected_params_buckets[self.params_bucket_cursor];\n            self.apply_filters();\n        }\n    }\n\n    pub fn params_bucket_popup_select_all(&mut self) {\n        let all_selected = self.selected_params_buckets.iter().all(|&s| s);\n        let new_val = !all_selected;\n        for s in &mut self.selected_params_buckets {\n            *s = new_val;\n        }\n        self.apply_filters();\n    }\n\n    pub fn toggle_installed_first(&mut self) {\n        self.installed_first = !self.installed_first;\n        self.re_sort();\n    }\n\n    /// Re-sort all_fits using current sort column and installed_first preference, then refilter.\n    fn re_sort(&mut self) {\n        let fits = std::mem::take(&mut self.all_fits);\n        let mut sorted = llmfit_core::fit::rank_models_by_fit_opts_col(\n            fits,\n            self.installed_first,\n            self.sort_column,\n        );\n        if self.sort_ascending {\n            sorted.reverse();\n        }\n        self.all_fits = sorted;\n        self.apply_filters();\n    }\n\n    /// Start pulling the currently selected model via the best available provider.\n    pub fn start_download(&mut self) {\n        let any_available = 
self.ollama_available\n            || self.ollama_binary_available\n            || self.mlx_available\n            || self.llamacpp_available\n            || self.docker_mr_available\n            || self.lmstudio_available;\n        if !any_available {\n            self.pull_status = Some(\n                \"No runtime available — install Ollama, llama.cpp, Docker, or LM Studio\"\n                    .to_string(),\n            );\n            return;\n        }\n        if self.pull_active.is_some() {\n            return; // already pulling\n        }\n        let Some(fit) = self.selected_fit() else {\n            return;\n        };\n        if fit.installed {\n            self.pull_status = Some(\"Already installed\".to_string());\n            return;\n        }\n        let model_name = fit.model.name.clone();\n        let model_format = fit.model.format;\n        let is_mlx_model = fit.model.is_mlx_model();\n        let has_catalog_gguf = !fit.model.gguf_sources.is_empty();\n\n        let download_options = self.available_download_providers(&model_name, has_catalog_gguf);\n        if !download_options.is_empty() {\n            self.open_download_provider_popup(model_name, download_options);\n        } else {\n            let any_runtime = self.ollama_available\n                || self.ollama_binary_available\n                || self.llamacpp_available\n                || self.mlx_available\n                || self.docker_mr_available\n                || self.lmstudio_available;\n            self.pull_status = Some(if any_runtime {\n                Self::format_no_download_message(model_format, is_mlx_model)\n            } else {\n                \"No runtime available — install Ollama, llama.cpp, Docker, or LM Studio\".to_string()\n            });\n        }\n    }\n\n    /// Build a user-friendly message explaining why no download is available,\n    /// based on the model's weight format.\n    fn format_no_download_message(\n        format: llmfit_core::models::ModelFormat,\n        is_mlx_model: 
bool,\n    ) -> String {\n        use llmfit_core::models::ModelFormat;\n        if is_mlx_model {\n            \"MLX model — requires Apple Silicon with MLX installed\".to_string()\n        } else {\n            match format {\n                ModelFormat::Awq => {\n                    \"AWQ model — requires vLLM or a CUDA/ROCm GPU; no GGUF conversion available\"\n                        .to_string()\n                }\n                ModelFormat::Gptq => {\n                    \"GPTQ model — requires vLLM or a CUDA/ROCm GPU; no GGUF conversion available\"\n                        .to_string()\n                }\n                _ => \"No downloadable format found for this model\".to_string(),\n            }\n        }\n    }\n\n    fn start_mlx_download(&mut self, model_name: String) {\n        let tag = providers::mlx_pull_tag(&model_name);\n        match self.mlx.start_pull(&tag) {\n            Ok(handle) => {\n                self.pull_model_name = Some(model_name);\n                let repo_display = if tag.contains('/') {\n                    tag\n                } else {\n                    format!(\"mlx-community/{}\", tag)\n                };\n                self.pull_status = Some(format!(\"Pulling {}...\", repo_display));\n                self.pull_percent = None;\n                self.pull_provider = Some(ActivePullProvider::Mlx);\n                self.pull_active = Some(handle);\n            }\n            Err(e) => {\n                self.pull_status = Some(format!(\"MLX pull failed: {}\", e));\n            }\n        }\n    }\n\n    fn start_download_with_provider(&mut self, model_name: String, provider: DownloadProvider) {\n        match provider {\n            DownloadProvider::Ollama => self.start_ollama_download(model_name),\n            DownloadProvider::Mlx => self.start_mlx_download(model_name),\n            DownloadProvider::LlamaCpp => self.start_llamacpp_download_for_model(model_name),\n            DownloadProvider::DockerModelRunner => 
self.start_docker_mr_download(model_name),\n            DownloadProvider::LmStudio => self.start_lmstudio_download(model_name),\n        }\n    }\n\n    fn start_ollama_download(&mut self, model_name: String) {\n        let Some(tag) = providers::ollama_pull_tag(&model_name) else {\n            self.pull_status = Some(\"Not available in Ollama registry\".to_string());\n            return;\n        };\n        match self.ollama.start_pull(&tag) {\n            Ok(handle) => {\n                self.pull_model_name = Some(model_name);\n                self.pull_status = Some(format!(\"Pulling {}...\", tag));\n                self.pull_percent = Some(0.0);\n                self.pull_provider = Some(ActivePullProvider::Ollama);\n                self.pull_active = Some(handle);\n            }\n            Err(e) => {\n                self.pull_status = Some(format!(\"Pull failed: {}\", e));\n            }\n        }\n    }\n\n    /// Start downloading a GGUF model via the llama.cpp provider.\n    fn start_llamacpp_download_for_model(&mut self, model_name: String) {\n        // Check catalog gguf_sources first (instant), then fall back to HTTP probe\n        let catalog_repo = self\n            .all_fits\n            .iter()\n            .find(|f| f.model.name == model_name)\n            .and_then(|f| f.model.gguf_sources.first())\n            .map(|s| s.repo.clone());\n        let Some(repo) = catalog_repo.or_else(|| providers::first_existing_gguf_repo(&model_name))\n        else {\n            self.pull_status = Some(\"No GGUF repo found in remote registry\".to_string());\n            return;\n        };\n\n        match self.llamacpp.start_pull(&repo) {\n            Ok(handle) => {\n                self.pull_model_name = Some(model_name);\n                self.pull_status = Some(format!(\"Downloading GGUF from {}...\", repo));\n                self.pull_percent = Some(0.0);\n                self.pull_provider = Some(ActivePullProvider::LlamaCpp);\n                
self.pull_active = Some(handle);\n            }\n            Err(e) => {\n                self.pull_status = Some(format!(\"GGUF download failed: {}\", e));\n            }\n        }\n    }\n\n    fn start_docker_mr_download(&mut self, model_name: String) {\n        let Some(docker_tag) = providers::docker_mr_pull_tag(&model_name) else {\n            self.pull_status = Some(\"Not available for Docker Model Runner\".to_string());\n            return;\n        };\n        match self.docker_mr.start_pull(&docker_tag) {\n            Ok(handle) => {\n                self.pull_model_name = Some(model_name);\n                self.pull_status = Some(format!(\"Pulling {} via Docker...\", docker_tag));\n                self.pull_percent = None;\n                self.pull_provider = Some(ActivePullProvider::DockerModelRunner);\n                self.pull_active = Some(handle);\n            }\n            Err(e) => {\n                self.pull_status = Some(format!(\"Docker pull failed: {}\", e));\n            }\n        }\n    }\n\n    fn start_lmstudio_download(&mut self, model_name: String) {\n        let Some(tag) = providers::lmstudio_pull_tag(&model_name) else {\n            self.pull_status = Some(\"Not available for LM Studio\".to_string());\n            return;\n        };\n        match self.lmstudio.start_pull(&tag) {\n            Ok(handle) => {\n                self.pull_model_name = Some(model_name);\n                self.pull_status = Some(format!(\"Downloading {} via LM Studio...\", tag));\n                self.pull_percent = Some(0.0);\n                self.pull_provider = Some(ActivePullProvider::LmStudio);\n                self.pull_active = Some(handle);\n            }\n            Err(e) => {\n                self.pull_status = Some(format!(\"LM Studio download failed: {}\", e));\n            }\n        }\n    }\n\n    /// Poll the active pull for progress. 
Called each TUI tick.\n    pub fn tick_pull(&mut self) {\n        self.enqueue_capability_probes_for_visible(24);\n        self.tick_download_capability();\n        if self.pull_active.is_some() {\n            self.tick_count = self.tick_count.wrapping_add(1);\n        }\n        let Some(handle) = &self.pull_active else {\n            return;\n        };\n        // Drain all available events\n        loop {\n            match handle.receiver.try_recv() {\n                Ok(PullEvent::Progress { status, percent }) => {\n                    if let Some(p) = percent {\n                        self.pull_percent = Some(p);\n                    }\n                    self.pull_status = Some(status);\n                }\n                Ok(PullEvent::Done) => {\n                    let done_msg = if let Some(provider) = self.pull_provider {\n                        format!(\"Download complete via {}!\", provider.label())\n                    } else {\n                        \"Download complete!\".to_string()\n                    };\n                    self.pull_status = Some(done_msg);\n                    self.pull_percent = None;\n                    self.pull_active = None;\n                    self.pull_provider = None;\n                    // Refresh installed models\n                    self.refresh_installed();\n                    return;\n                }\n                Ok(PullEvent::Error(e)) => {\n                    self.pull_status = Some(format!(\"Error: {}\", e));\n                    self.pull_percent = None;\n                    self.pull_active = None;\n                    self.pull_provider = None;\n                    return;\n                }\n                Err(mpsc::TryRecvError::Empty) => break,\n                Err(mpsc::TryRecvError::Disconnected) => {\n                    self.pull_status = Some(\"Pull ended\".to_string());\n                    self.pull_percent = None;\n                    self.pull_active = None;\n                    
self.pull_provider = None;\n                    self.refresh_installed();\n                    return;\n                }\n            }\n        }\n    }\n\n    fn available_download_providers(\n        &self,\n        model_name: &str,\n        has_catalog_gguf: bool,\n    ) -> Vec<DownloadProvider> {\n        let mut providers_for_model = Vec::new();\n        if providers::has_ollama_mapping(model_name)\n            && (self.ollama_available || self.ollama_binary_available)\n        {\n            providers_for_model.push(DownloadProvider::Ollama);\n        }\n        if self.mlx_available {\n            providers_for_model.push(DownloadProvider::Mlx);\n        }\n        // Check catalog gguf_sources first (no HTTP probe needed), then\n        // fall back to the heuristic repo lookup\n        if self.llamacpp_available\n            && (has_catalog_gguf || providers::first_existing_gguf_repo(model_name).is_some())\n        {\n            providers_for_model.push(DownloadProvider::LlamaCpp);\n        }\n        if self.docker_mr_available && providers::has_docker_mr_mapping(model_name) {\n            providers_for_model.push(DownloadProvider::DockerModelRunner);\n        }\n        if self.lmstudio_available && providers::has_lmstudio_mapping(model_name) {\n            providers_for_model.push(DownloadProvider::LmStudio);\n        }\n        providers_for_model\n    }\n\n    fn open_download_provider_popup(&mut self, model_name: String, options: Vec<DownloadProvider>) {\n        self.download_provider_model = Some(model_name);\n        self.download_provider_options = options;\n        self.download_provider_cursor = 0;\n        self.input_mode = InputMode::DownloadProviderPopup;\n        self.pull_status = Some(\"Choose download runtime and press Enter\".to_string());\n    }\n\n    pub fn close_download_provider_popup(&mut self) {\n        self.download_provider_model = None;\n        self.download_provider_options.clear();\n        
self.download_provider_cursor = 0;\n        self.input_mode = InputMode::Normal;\n        self.pull_status = Some(\"Download cancelled\".to_string());\n    }\n\n    pub fn download_provider_popup_up(&mut self) {\n        if self.download_provider_cursor > 0 {\n            self.download_provider_cursor -= 1;\n        }\n    }\n\n    pub fn download_provider_popup_down(&mut self) {\n        if self.download_provider_cursor + 1 < self.download_provider_options.len() {\n            self.download_provider_cursor += 1;\n        }\n    }\n\n    pub fn confirm_download_provider_selection(&mut self) {\n        let Some(model_name) = self.download_provider_model.clone() else {\n            self.input_mode = InputMode::Normal;\n            return;\n        };\n        let Some(provider) = self\n            .download_provider_options\n            .get(self.download_provider_cursor)\n            .copied()\n        else {\n            self.close_download_provider_popup();\n            return;\n        };\n\n        self.download_provider_model = None;\n        self.download_provider_options.clear();\n        self.download_provider_cursor = 0;\n        self.input_mode = InputMode::Normal;\n        self.start_download_with_provider(model_name, provider);\n    }\n\n    /// Re-query all providers for installed models and update all_fits.\n    pub fn refresh_installed(&mut self) {\n        let (ollama_set, ollama_count) = self.ollama.installed_models_counted();\n        self.ollama_installed = ollama_set;\n        self.ollama_installed_count = ollama_count;\n        self.mlx_installed = self.mlx.installed_models();\n        let (llamacpp_set, llamacpp_count) = self.llamacpp.installed_models_counted();\n        self.llamacpp_installed = llamacpp_set;\n        self.llamacpp_installed_count = llamacpp_count;\n        let (docker_mr_set, docker_mr_count) = self.docker_mr.installed_models_counted();\n        self.docker_mr_installed = docker_mr_set;\n        self.docker_mr_installed_count 
= docker_mr_count;\n        let (lmstudio_set, lmstudio_count) = self.lmstudio.installed_models_counted();\n        self.lmstudio_installed = lmstudio_set;\n        self.lmstudio_installed_count = lmstudio_count;\n        for fit in &mut self.all_fits {\n            fit.installed = providers::is_model_installed(&fit.model.name, &self.ollama_installed)\n                || providers::is_model_installed_mlx(&fit.model.name, &self.mlx_installed)\n                || providers::is_model_installed_llamacpp(\n                    &fit.model.name,\n                    &self.llamacpp_installed,\n                )\n                || providers::is_model_installed_docker_mr(\n                    &fit.model.name,\n                    &self.docker_mr_installed,\n                )\n                || providers::is_model_installed_lmstudio(\n                    &fit.model.name,\n                    &self.lmstudio_installed,\n                );\n        }\n        self.re_sort();\n        self.enqueue_capability_probes_for_visible(24);\n    }\n\n    pub fn download_capability_for(&self, model_name: &str) -> DownloadCapability {\n        self.download_capabilities\n            .get(model_name)\n            .copied()\n            .unwrap_or(DownloadCapability::Unknown)\n    }\n\n    pub fn enqueue_capability_probes_for_visible(&mut self, window: usize) {\n        if self.filtered_fits.is_empty() {\n            return;\n        }\n        let start = self.selected_row.saturating_sub(window / 2);\n        let end = (start + window).min(self.filtered_fits.len());\n        for idx in start..end {\n            if let Some(&fit_idx) = self.filtered_fits.get(idx) {\n                let model_name = self.all_fits[fit_idx].model.name.clone();\n                let has_catalog_gguf = !self.all_fits[fit_idx].model.gguf_sources.is_empty();\n                self.enqueue_capability_probe(model_name, has_catalog_gguf);\n            }\n        }\n    }\n\n    fn enqueue_capability_probe(&mut self, 
model_name: String, has_catalog_gguf: bool) {\n        if self.download_capabilities.contains_key(&model_name)\n            || self.download_capability_inflight.contains(&model_name)\n            || self.download_capability_inflight.len() >= 12\n        {\n            return;\n        }\n        self.download_capability_inflight.insert(model_name.clone());\n\n        let tx = self.download_capability_tx.clone();\n        let ollama_runtime_available = self.ollama_available || self.ollama_binary_available;\n        let llamacpp_available = self.llamacpp_available;\n        let docker_mr_available = self.docker_mr_available;\n        let lmstudio_available = self.lmstudio_available;\n        std::thread::spawn(move || {\n            let has_ollama = ollama_runtime_available && providers::has_ollama_mapping(&model_name);\n            let has_llamacpp = if llamacpp_available {\n                // Use catalog data when available to skip slow HTTP probes\n                has_catalog_gguf || providers::first_existing_gguf_repo(&model_name).is_some()\n            } else {\n                false\n            };\n            let has_docker = docker_mr_available && providers::has_docker_mr_mapping(&model_name);\n            let has_lmstudio = lmstudio_available && providers::has_lmstudio_mapping(&model_name);\n\n            let mut flags = 0u8;\n            if has_ollama {\n                flags |= DL_OLLAMA;\n            }\n            if has_llamacpp {\n                flags |= DL_LLAMACPP;\n            }\n            if has_docker {\n                flags |= DL_DOCKER;\n            }\n            if has_lmstudio {\n                flags |= DL_LMSTUDIO;\n            }\n            let _ = tx.send((model_name, DownloadCapability::Known(flags)));\n        });\n    }\n\n    fn tick_download_capability(&mut self) {\n        loop {\n            match self.download_capability_rx.try_recv() {\n                Ok((name, capability)) => {\n                    
self.download_capability_inflight.remove(&name);\n                    self.download_capabilities.insert(name, capability);\n                }\n                // No results pending this tick.\n                Err(mpsc::TryRecvError::Empty) => break,\n                // All senders were dropped; no further results can arrive.\n                Err(mpsc::TryRecvError::Disconnected) => break,\n            }\n        }\n    }\n\n    /// Borrow the text buffer of the plan field that currently has focus.\n    fn active_plan_input(&self) -> &String {\n        match self.plan_field {\n            PlanField::Context => &self.plan_context_input,\n            PlanField::Quant => &self.plan_quant_input,\n            PlanField::TargetTps => &self.plan_target_tps_input,\n        }\n    }\n\n    /// Mutable counterpart of `active_plan_input`.\n    fn active_plan_input_mut(&mut self) -> &mut String {\n        match self.plan_field {\n            PlanField::Context => &mut self.plan_context_input,\n            PlanField::Quant => &mut self.plan_quant_input,\n            PlanField::TargetTps => &mut self.plan_target_tps_input,\n        }\n    }\n}\n\n/// Returns true if `name` resolves to an executable according to `which`.\n///\n/// This shells out to the `which` utility, so on systems where `which` itself\n/// cannot be spawned (for example a stock Windows install) it returns false.\nfn command_exists(name: &str) -> bool {\n    std::process::Command::new(\"which\")\n        .arg(name)\n        .stdout(std::process::Stdio::null())\n        .stderr(std::process::Stdio::null())\n        .status()\n        .map(|s| s.success())\n        .unwrap_or(false)\n}\n"
  },
  {
    "path": "llmfit-tui/src/tui_events.rs",
    "content": "use crossterm::event::{self, Event, KeyCode, KeyEvent, KeyEventKind, KeyModifiers};\nuse std::time::Duration;\n\nuse crate::tui_app::{App, InputMode};\n\n/// Poll for and handle events. Returns true if an event was processed.\npub fn handle_events(app: &mut App) -> std::io::Result<bool> {\n    // Always tick the pull progress (non-blocking)\n    app.tick_pull();\n\n    if event::poll(Duration::from_millis(50))?\n        && let Event::Key(key) = event::read()?\n    {\n        // Only handle Press events (ignore Release on some platforms)\n        if key.kind != KeyEventKind::Press {\n            return Ok(false);\n        }\n        match app.input_mode {\n            InputMode::Normal => handle_normal_mode(app, key),\n            InputMode::Visual => handle_visual_mode(app, key),\n            InputMode::Select => handle_select_mode(app, key),\n            InputMode::Search => handle_search_mode(app, key),\n            InputMode::Plan => handle_plan_mode(app, key),\n            InputMode::ProviderPopup => handle_provider_popup_mode(app, key),\n            InputMode::UseCasePopup => handle_use_case_popup_mode(app, key),\n            InputMode::CapabilityPopup => handle_capability_popup_mode(app, key),\n            InputMode::DownloadProviderPopup => handle_download_provider_popup_mode(app, key),\n            InputMode::QuantPopup => handle_quant_popup_mode(app, key),\n            InputMode::RunModePopup => handle_run_mode_popup_mode(app, key),\n            InputMode::ParamsBucketPopup => handle_params_bucket_popup_mode(app, key),\n        }\n        return Ok(true);\n    }\n    Ok(false)\n}\n\nfn handle_normal_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        // Quit\n        KeyCode::Char('q') | KeyCode::Esc => {\n            if app.show_multi_compare {\n                app.close_multi_compare();\n            } else if app.show_detail {\n                app.show_detail = false;\n            } else if app.show_compare {\n             
   app.show_compare = false;\n            } else {\n                app.should_quit = true;\n            }\n        }\n\n        // Navigation — in multi-compare, h/l scroll columns\n        KeyCode::Char('h') if app.show_multi_compare => app.multi_compare_scroll_left(),\n        KeyCode::Char('l') if app.show_multi_compare => app.multi_compare_scroll_right(),\n        KeyCode::Left if app.show_multi_compare => app.multi_compare_scroll_left(),\n        KeyCode::Right if app.show_multi_compare => app.multi_compare_scroll_right(),\n\n        KeyCode::Char('u') if key.modifiers.contains(KeyModifiers::CONTROL) => app.half_page_up(),\n        KeyCode::Char('d') if key.modifiers.contains(KeyModifiers::CONTROL) => app.half_page_down(),\n        KeyCode::Up | KeyCode::Char('k') => app.move_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.move_down(),\n        KeyCode::PageUp => app.page_up(),\n        KeyCode::PageDown => app.page_down(),\n        KeyCode::Home | KeyCode::Char('g') => app.home(),\n        KeyCode::End | KeyCode::Char('G') => app.end(),\n\n        // Visual mode\n        KeyCode::Char('v') => app.enter_visual_mode(),\n\n        // Select mode\n        KeyCode::Char('V') => app.enter_select_mode(),\n\n        // Search\n        KeyCode::Char('/') => app.enter_search(),\n\n        // Fit filter\n        KeyCode::Char('f') => app.cycle_fit_filter(),\n\n        // Availability filter\n        KeyCode::Char('a') => app.cycle_availability_filter(),\n\n        // Sort column\n        KeyCode::Char('s') => app.cycle_sort_column(),\n\n        // Theme\n        KeyCode::Char('t') => app.cycle_theme(),\n\n        // Plan view\n        KeyCode::Char('p') => app.open_plan_mode(),\n\n        // Provider popup\n        KeyCode::Char('P') => app.open_provider_popup(),\n        KeyCode::Char('U') => app.open_use_case_popup(),\n        KeyCode::Char('C') => app.open_capability_popup(),\n\n        // Installed-first sort toggle (any provider)\n        
KeyCode::Char('i')\n            if app.ollama_available\n                || app.mlx_available\n                || app.llamacpp_available\n                || app.lmstudio_available =>\n        {\n            app.toggle_installed_first()\n        }\n\n        // Download model via best provider (requires confirmation)\n        KeyCode::Char('d')\n            if app.ollama_available\n                || app.mlx_available\n                || app.llamacpp_available\n                || app.lmstudio_available =>\n        {\n            if app.pull_active.is_none() {\n                app.start_download();\n            }\n        }\n\n        // Refresh installed models\n        KeyCode::Char('r')\n            if app.ollama_available\n                || app.mlx_available\n                || app.llamacpp_available\n                || app.lmstudio_available =>\n        {\n            app.refresh_installed()\n        }\n\n        // Detail view\n        KeyCode::Enter => app.toggle_detail(),\n\n        // Compare view\n        KeyCode::Char('m') => app.mark_selected_for_compare(),\n        KeyCode::Char('c') => app.toggle_compare_view(),\n        KeyCode::Char('x') => app.clear_compare_mark(),\n\n        _ => {}\n    }\n}\n\nfn handle_visual_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        // Exit visual mode\n        KeyCode::Esc | KeyCode::Char('q') | KeyCode::Char('v') => app.exit_visual_mode(),\n\n        // Navigation (extends selection)\n        KeyCode::Char('u') if key.modifiers.contains(KeyModifiers::CONTROL) => app.half_page_up(),\n        KeyCode::Char('d') if key.modifiers.contains(KeyModifiers::CONTROL) => app.half_page_down(),\n        KeyCode::Up | KeyCode::Char('k') => app.move_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.move_down(),\n        KeyCode::PageUp => app.page_up(),\n        KeyCode::PageDown => app.page_down(),\n        KeyCode::Home | KeyCode::Char('g') => app.home(),\n        KeyCode::End | KeyCode::Char('G') => 
app.end(),\n\n        // Mark all selected for compare\n        KeyCode::Char('m') => app.mark_selected_for_compare(),\n\n        // Compare first and last in visual selection\n        KeyCode::Char('c') => app.visual_compare(),\n\n        _ => {}\n    }\n}\n\nfn handle_select_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        // Exit select mode\n        KeyCode::Esc | KeyCode::Char('q') => app.exit_select_mode(),\n\n        // Column navigation\n        KeyCode::Left | KeyCode::Char('h') => app.select_column_left(),\n        KeyCode::Right | KeyCode::Char('l') => app.select_column_right(),\n\n        // Activate filter for current column\n        KeyCode::Enter | KeyCode::Char(' ') => app.activate_select_column_filter(),\n\n        // Row navigation (still works in select mode)\n        KeyCode::Up | KeyCode::Char('k') => app.move_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.move_down(),\n\n        _ => {}\n    }\n}\n\nfn handle_search_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        KeyCode::Esc | KeyCode::Enter => app.exit_search(),\n\n        KeyCode::Backspace => app.search_backspace(),\n        KeyCode::Delete => app.search_delete(),\n\n        KeyCode::Char('u') if key.modifiers.contains(KeyModifiers::CONTROL) => {\n            app.clear_search();\n        }\n\n        KeyCode::Char(c) => app.search_input(c),\n\n        // Allow navigation while searching\n        KeyCode::Up => app.move_up(),\n        KeyCode::Down => app.move_down(),\n\n        _ => {}\n    }\n}\n\nfn handle_provider_popup_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        KeyCode::Esc | KeyCode::Char('P') | KeyCode::Char('q') => app.close_provider_popup(),\n\n        KeyCode::Up | KeyCode::Char('k') => app.provider_popup_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.provider_popup_down(),\n\n        KeyCode::Char(' ') | KeyCode::Enter => app.provider_popup_toggle(),\n\n        KeyCode::Char('a') => 
app.provider_popup_select_all(),\n\n        _ => {}\n    }\n}\n\nfn handle_plan_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        KeyCode::Esc | KeyCode::Char('q') => app.close_plan_mode(),\n        KeyCode::Tab | KeyCode::Down | KeyCode::Char('j') => app.plan_next_field(),\n        KeyCode::BackTab | KeyCode::Up | KeyCode::Char('k') => app.plan_prev_field(),\n        KeyCode::Left => app.plan_cursor_left(),\n        KeyCode::Right => app.plan_cursor_right(),\n        KeyCode::Backspace => app.plan_backspace(),\n        KeyCode::Delete => app.plan_delete(),\n        KeyCode::Char('u') if key.modifiers.contains(KeyModifiers::CONTROL) => {\n            app.plan_clear_field()\n        }\n        KeyCode::Char(c) => app.plan_input(c),\n        _ => {}\n    }\n}\n\nfn handle_use_case_popup_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        KeyCode::Esc | KeyCode::Char('U') | KeyCode::Char('q') => app.close_use_case_popup(),\n\n        KeyCode::Up | KeyCode::Char('k') => app.use_case_popup_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.use_case_popup_down(),\n\n        KeyCode::Char(' ') | KeyCode::Enter => app.use_case_popup_toggle(),\n\n        KeyCode::Char('a') => app.use_case_popup_select_all(),\n\n        _ => {}\n    }\n}\n\nfn handle_capability_popup_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        KeyCode::Esc | KeyCode::Char('C') | KeyCode::Char('q') => app.close_capability_popup(),\n\n        KeyCode::Up | KeyCode::Char('k') => app.capability_popup_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.capability_popup_down(),\n\n        KeyCode::Char(' ') | KeyCode::Enter => app.capability_popup_toggle(),\n\n        KeyCode::Char('a') => app.capability_popup_select_all(),\n\n        _ => {}\n    }\n}\n\nfn handle_download_provider_popup_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        KeyCode::Esc | KeyCode::Char('q') => app.close_download_provider_popup(),\n        
KeyCode::Up | KeyCode::Char('k') => app.download_provider_popup_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.download_provider_popup_down(),\n        KeyCode::Enter | KeyCode::Char(' ') => app.confirm_download_provider_selection(),\n        _ => {}\n    }\n}\n\nfn handle_quant_popup_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        KeyCode::Esc | KeyCode::Char('q') => app.close_quant_popup(),\n\n        KeyCode::Up | KeyCode::Char('k') => app.quant_popup_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.quant_popup_down(),\n\n        KeyCode::Char(' ') | KeyCode::Enter => app.quant_popup_toggle(),\n\n        KeyCode::Char('a') => app.quant_popup_select_all(),\n\n        _ => {}\n    }\n}\n\nfn handle_run_mode_popup_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        KeyCode::Esc | KeyCode::Char('q') => app.close_run_mode_popup(),\n\n        KeyCode::Up | KeyCode::Char('k') => app.run_mode_popup_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.run_mode_popup_down(),\n\n        KeyCode::Char(' ') | KeyCode::Enter => app.run_mode_popup_toggle(),\n\n        KeyCode::Char('a') => app.run_mode_popup_select_all(),\n\n        _ => {}\n    }\n}\n\nfn handle_params_bucket_popup_mode(app: &mut App, key: KeyEvent) {\n    match key.code {\n        KeyCode::Esc | KeyCode::Char('q') => app.close_params_bucket_popup(),\n\n        KeyCode::Up | KeyCode::Char('k') => app.params_bucket_popup_up(),\n        KeyCode::Down | KeyCode::Char('j') => app.params_bucket_popup_down(),\n\n        KeyCode::Char(' ') | KeyCode::Enter => app.params_bucket_popup_toggle(),\n\n        KeyCode::Char('a') => app.params_bucket_popup_select_all(),\n\n        _ => {}\n    }\n}\n"
  },
  {
    "path": "llmfit-tui/src/tui_ui.rs",
    "content": "use ratatui::{\n    Frame,\n    layout::{Constraint, Direction, Layout, Rect},\n    style::{Color, Modifier, Style},\n    text::{Line, Span},\n    widgets::{\n        Block, Borders, Cell, Clear, Paragraph, Row, Scrollbar, ScrollbarOrientation,\n        ScrollbarState, Table, TableState, Wrap,\n    },\n};\n\nuse crate::theme::ThemeColors;\nuse crate::tui_app::{\n    App, AvailabilityFilter, DL_DOCKER, DL_LLAMACPP, DL_LMSTUDIO, DL_OLLAMA, DownloadCapability,\n    DownloadProvider, FitFilter, InputMode, PlanField,\n};\nuse llmfit_core::fit::{FitLevel, ModelFit, SortColumn};\nuse llmfit_core::hardware::is_running_in_wsl;\nuse llmfit_core::providers;\n\npub fn draw(frame: &mut Frame, app: &mut App) {\n    let tc = app.theme.colors();\n\n    // Fill background if theme specifies one\n    if tc.bg != Color::Reset {\n        let bg_block = Block::default().style(Style::default().bg(tc.bg));\n        frame.render_widget(bg_block, frame.area());\n    }\n\n    let outer = Layout::default()\n        .direction(Direction::Vertical)\n        .constraints([\n            Constraint::Length(3), // system info bar\n            Constraint::Length(3), // search + filters\n            Constraint::Min(10),   // main table\n            Constraint::Length(1), // status bar\n        ])\n        .split(frame.area());\n\n    draw_system_bar(frame, app, outer[0], &tc);\n    draw_search_and_filters(frame, app, outer[1], &tc);\n\n    if app.show_plan {\n        draw_plan(frame, app, outer[2], &tc);\n    } else if app.show_multi_compare {\n        draw_multi_compare(frame, app, outer[2], &tc);\n    } else if app.show_compare {\n        draw_compare(frame, app, outer[2], &tc);\n    } else if app.show_detail {\n        draw_detail(frame, app, outer[2], &tc);\n    } else {\n        draw_table(frame, app, outer[2], &tc);\n    }\n\n    draw_status_bar(frame, app, outer[3], &tc);\n\n    // Draw popup overlays on top if active\n    if app.input_mode == InputMode::ProviderPopup {\n      
  draw_provider_popup(frame, app, &tc);\n    } else if app.input_mode == InputMode::UseCasePopup {\n        draw_use_case_popup(frame, app, &tc);\n    } else if app.input_mode == InputMode::CapabilityPopup {\n        draw_capability_popup(frame, app, &tc);\n    } else if app.input_mode == InputMode::DownloadProviderPopup {\n        draw_download_provider_popup(frame, app, &tc);\n    } else if app.input_mode == InputMode::QuantPopup {\n        draw_quant_popup(frame, app, &tc);\n    } else if app.input_mode == InputMode::RunModePopup {\n        draw_run_mode_popup(frame, app, &tc);\n    } else if app.input_mode == InputMode::ParamsBucketPopup {\n        draw_params_bucket_popup(frame, app, &tc);\n    }\n}\n\nfn draw_system_bar(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColors) {\n    let gpu_info = if app.specs.gpus.is_empty() {\n        format!(\"GPU: none ({})\", app.specs.backend.label())\n    } else {\n        let primary = &app.specs.gpus[0];\n        let backend = primary.backend.label();\n        let primary_str = if primary.unified_memory {\n            format!(\n                \"{} ({:.1} GB shared, {})\",\n                primary.name,\n                primary.vram_gb.unwrap_or(0.0),\n                backend\n            )\n        } else {\n            match primary.vram_gb {\n                Some(vram) if vram > 0.0 => {\n                    if primary.count > 1 {\n                        let total_vram = vram * primary.count as f64;\n                        format!(\n                            \"{} x{} ({:.1} GB each = {:.0} GB total, {})\",\n                            primary.name, primary.count, vram, total_vram, backend\n                        )\n                    } else {\n                        format!(\"{} ({:.1} GB, {})\", primary.name, vram, backend)\n                    }\n                }\n                Some(_) => format!(\"{} (shared, {})\", primary.name, backend),\n                None => format!(\"{} ({})\", primary.name, 
backend),\n            }\n        };\n        let extra = app.specs.gpus.len() - 1;\n        if extra > 0 {\n            format!(\"GPU: {} +{} more\", primary_str, extra)\n        } else {\n            format!(\"GPU: {}\", primary_str)\n        }\n    };\n\n    let ollama_info = if app.ollama_available {\n        format!(\"Ollama: ✓ ({} installed)\", app.ollama_installed_count)\n    } else {\n        \"Ollama: ✗\".to_string()\n    };\n    let ollama_color = if app.ollama_available {\n        tc.good\n    } else {\n        tc.muted\n    };\n\n    let mlx_info = if app.mlx_available {\n        format!(\"MLX: ✓ ({} installed)\", app.mlx_installed.len())\n    } else if !app.mlx_installed.is_empty() {\n        format!(\"MLX: ({} cached)\", app.mlx_installed.len())\n    } else {\n        \"MLX: ✗\".to_string()\n    };\n    let mlx_color = if app.mlx_available {\n        tc.good\n    } else if !app.mlx_installed.is_empty() {\n        tc.warning\n    } else {\n        tc.muted\n    };\n\n    let llamacpp_info = if app.llamacpp_available {\n        format!(\"llama.cpp: ✓ ({} models)\", app.llamacpp_installed_count)\n    } else if !app.llamacpp_installed.is_empty() {\n        format!(\"llama.cpp: ({} cached)\", app.llamacpp_installed_count)\n    } else {\n        \"llama.cpp: ✗\".to_string()\n    };\n    let llamacpp_color = if app.llamacpp_available {\n        tc.good\n    } else if !app.llamacpp_installed.is_empty() {\n        tc.warning\n    } else {\n        tc.muted\n    };\n\n    let docker_mr_info = if app.docker_mr_available {\n        format!(\"Docker: ✓ ({} models)\", app.docker_mr_installed_count)\n    } else {\n        \"Docker: ✗\".to_string()\n    };\n    let docker_mr_color = if app.docker_mr_available {\n        tc.good\n    } else {\n        tc.muted\n    };\n\n    let lmstudio_info = if app.lmstudio_available {\n        format!(\"LM Studio: ✓ ({} models)\", app.lmstudio_installed_count)\n    } else {\n        \"LM Studio: ✗\".to_string()\n    };\n    let 
lmstudio_color = if app.lmstudio_available {\n        tc.good\n    } else {\n        tc.muted\n    };\n\n    let mut spans = vec![\n        Span::styled(\" CPU: \", Style::default().fg(tc.muted)),\n        Span::styled(\n            format!(\n                \"{} ({} cores)\",\n                app.specs.cpu_name, app.specs.total_cpu_cores\n            ),\n            Style::default().fg(tc.fg),\n        ),\n        Span::styled(\"  │  \", Style::default().fg(tc.muted)),\n        Span::styled(\"RAM: \", Style::default().fg(tc.muted)),\n        Span::styled(\n            format!(\n                \"{:.1} GB avail / {:.1} GB total{}\",\n                app.specs.available_ram_gb,\n                app.specs.total_ram_gb,\n                if is_running_in_wsl() { \" (WSL)\" } else { \"\" }\n            ),\n            Style::default().fg(tc.accent),\n        ),\n        Span::styled(\"  │  \", Style::default().fg(tc.muted)),\n        Span::styled(gpu_info, Style::default().fg(tc.accent_secondary)),\n        Span::styled(\"  │  \", Style::default().fg(tc.muted)),\n        Span::styled(ollama_info, Style::default().fg(ollama_color)),\n        Span::styled(\"  │  \", Style::default().fg(tc.muted)),\n        Span::styled(mlx_info, Style::default().fg(mlx_color)),\n        Span::styled(\"  │  \", Style::default().fg(tc.muted)),\n        Span::styled(llamacpp_info, Style::default().fg(llamacpp_color)),\n        Span::styled(\"  │  \", Style::default().fg(tc.muted)),\n        Span::styled(docker_mr_info, Style::default().fg(docker_mr_color)),\n        Span::styled(\"  │  \", Style::default().fg(tc.muted)),\n        Span::styled(lmstudio_info, Style::default().fg(lmstudio_color)),\n    ];\n\n    if app.backend_hidden_count > 0 {\n        spans.push(Span::styled(\"  │  \", Style::default().fg(tc.muted)));\n        spans.push(Span::styled(\n            format!(\n                \"{} model{} hidden (incompatible backend)\",\n                app.backend_hidden_count,\n              
  if app.backend_hidden_count == 1 {\n                    \"\"\n                } else {\n                    \"s\"\n                }\n            ),\n            Style::default().fg(tc.muted),\n        ));\n    }\n\n    let text = Line::from(spans);\n\n    let block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.border))\n        .title(\" llmfit \")\n        .title_style(Style::default().fg(tc.title).add_modifier(Modifier::BOLD));\n\n    let paragraph = Paragraph::new(text).block(block);\n    frame.render_widget(paragraph, area);\n}\n\nfn draw_search_and_filters(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColors) {\n    let chunks = Layout::default()\n        .direction(Direction::Horizontal)\n        .constraints([\n            Constraint::Min(30),    // search\n            Constraint::Length(18), // provider summary\n            Constraint::Length(18), // use-case summary\n            Constraint::Length(16), // capability summary\n            Constraint::Length(18), // sort column\n            Constraint::Length(20), // fit filter\n            Constraint::Length(20), // availability filter\n            Constraint::Length(16), // theme\n        ])\n        .split(area);\n\n    // Search box\n    let search_style = match app.input_mode {\n        InputMode::Search => Style::default().fg(tc.accent_secondary),\n        InputMode::Normal\n        | InputMode::Plan\n        | InputMode::ProviderPopup\n        | InputMode::UseCasePopup\n        | InputMode::CapabilityPopup\n        | InputMode::DownloadProviderPopup\n        | InputMode::Visual\n        | InputMode::Select\n        | InputMode::QuantPopup\n        | InputMode::RunModePopup\n        | InputMode::ParamsBucketPopup => Style::default().fg(tc.muted),\n    };\n\n    let search_text = if app.search_query.is_empty() && app.input_mode == InputMode::Normal {\n        Line::from(Span::styled(\n            \"Press / to search...\",\n            
Style::default().fg(tc.muted),\n        ))\n    } else {\n        Line::from(Span::styled(&app.search_query, Style::default().fg(tc.fg)))\n    };\n\n    let search_block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(search_style)\n        .title(\" Search \")\n        .title_style(search_style);\n\n    let search = Paragraph::new(search_text).block(search_block);\n    frame.render_widget(search, chunks[0]);\n\n    if app.input_mode == InputMode::Search {\n        frame.set_cursor_position((\n            chunks[0].x + app.cursor_position as u16 + 1,\n            chunks[0].y + 1,\n        ));\n    }\n\n    // Provider filter summary\n    let active_count = app.selected_providers.iter().filter(|&&s| s).count();\n    let total_count = app.providers.len();\n    let provider_text = if active_count == total_count {\n        \"All\".to_string()\n    } else {\n        format!(\"{}/{}\", active_count, total_count)\n    };\n    let provider_color = if active_count == total_count {\n        tc.good\n    } else if active_count == 0 {\n        tc.error\n    } else {\n        tc.warning\n    };\n\n    let provider_block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.border))\n        .title(\" Providers (P) \")\n        .title_style(Style::default().fg(tc.muted));\n\n    let providers = Paragraph::new(Line::from(Span::styled(\n        format!(\" {}\", provider_text),\n        Style::default().fg(provider_color),\n    )))\n    .block(provider_block);\n    frame.render_widget(providers, chunks[1]);\n\n    // Use-case filter summary\n    let active_count = app.selected_use_cases.iter().filter(|&&s| s).count();\n    let total_count = app.use_cases.len();\n    let use_case_text = if active_count == total_count {\n        \"All\".to_string()\n    } else {\n        format!(\"{}/{}\", active_count, total_count)\n    };\n    let use_case_color = if active_count == total_count {\n        tc.good\n    } else if 
active_count == 0 {\n        tc.error\n    } else {\n        tc.warning\n    };\n\n    let use_case_block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.border))\n        .title(\" Use Case (U) \")\n        .title_style(Style::default().fg(tc.muted));\n\n    let use_cases = Paragraph::new(Line::from(Span::styled(\n        format!(\" {}\", use_case_text),\n        Style::default().fg(use_case_color),\n    )))\n    .block(use_case_block);\n    frame.render_widget(use_cases, chunks[2]);\n\n    // Capability filter summary\n    let active_cap_count = app.selected_capabilities.iter().filter(|&&s| s).count();\n    let total_cap_count = app.capabilities.len();\n    let cap_text = if active_cap_count == total_cap_count {\n        \"All\".to_string()\n    } else {\n        format!(\"{}/{}\", active_cap_count, total_cap_count)\n    };\n    let cap_color = if active_cap_count == total_cap_count {\n        tc.good\n    } else if active_cap_count == 0 {\n        tc.error\n    } else {\n        tc.warning\n    };\n\n    let cap_block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.border))\n        .title(\" Caps (C) \")\n        .title_style(Style::default().fg(tc.muted));\n\n    let caps = Paragraph::new(Line::from(Span::styled(\n        format!(\" {}\", cap_text),\n        Style::default().fg(cap_color),\n    )))\n    .block(cap_block);\n    frame.render_widget(caps, chunks[3]);\n\n    // Sort column\n    let sort_block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.border))\n        .title(\" Sort [s] \")\n        .title_style(Style::default().fg(tc.muted));\n\n    let sort_text = Paragraph::new(Line::from(Span::styled(\n        format!(\" {}\", app.sort_column.label()),\n        Style::default().fg(tc.accent),\n    )))\n    .block(sort_block);\n    frame.render_widget(sort_text, chunks[4]);\n\n    // Fit filter\n    let 
fit_style = match app.fit_filter {\n        FitFilter::All => Style::default().fg(tc.fg),\n        FitFilter::Runnable => Style::default().fg(tc.good),\n        FitFilter::Perfect => Style::default().fg(tc.good),\n        FitFilter::Good => Style::default().fg(tc.warning),\n        FitFilter::Marginal => Style::default().fg(tc.fit_marginal),\n        FitFilter::TooTight => Style::default().fg(tc.error),\n    };\n\n    let fit_block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.border))\n        .title(\" Fit [f] \")\n        .title_style(Style::default().fg(tc.muted));\n\n    let fit_text = Paragraph::new(Line::from(Span::styled(app.fit_filter.label(), fit_style)))\n        .block(fit_block);\n    frame.render_widget(fit_text, chunks[5]);\n\n    // Availability filter\n    let avail_style = match app.availability_filter {\n        AvailabilityFilter::All => Style::default().fg(tc.fg),\n        AvailabilityFilter::HasGguf => Style::default().fg(tc.info),\n        AvailabilityFilter::Installed => Style::default().fg(tc.good),\n    };\n\n    let avail_block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.border))\n        .title(\" Avail [a] \")\n        .title_style(Style::default().fg(tc.muted));\n\n    let avail_text = Paragraph::new(Line::from(Span::styled(\n        app.availability_filter.label(),\n        avail_style,\n    )))\n    .block(avail_block);\n    frame.render_widget(avail_text, chunks[6]);\n\n    // Theme indicator\n    let theme_block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.border))\n        .title(\" Theme [t] \")\n        .title_style(Style::default().fg(tc.muted));\n\n    let theme_text = Paragraph::new(Line::from(Span::styled(\n        format!(\" {}\", app.theme.label()),\n        Style::default().fg(tc.info),\n    )))\n    .block(theme_block);\n    frame.render_widget(theme_text, 
chunks[7]);\n}\n\nfn fit_color(level: FitLevel, tc: &ThemeColors) -> Color {\n    match level {\n        FitLevel::Perfect => tc.fit_perfect,\n        FitLevel::Good => tc.fit_good,\n        FitLevel::Marginal => tc.fit_marginal,\n        FitLevel::TooTight => tc.fit_tight,\n    }\n}\n\nfn fit_indicator(level: FitLevel) -> &'static str {\n    match level {\n        FitLevel::Perfect => \"●\",\n        FitLevel::Good => \"●\",\n        FitLevel::Marginal => \"●\",\n        FitLevel::TooTight => \"●\",\n    }\n}\n\n/// Build a compact animated download indicator for the \"Inst\" column.\nfn pull_indicator(percent: Option<f64>, tick: u64) -> String {\n    const SPINNER: &[char] = &['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'];\n    let spin = SPINNER[(tick as usize / 3) % SPINNER.len()];\n\n    match percent {\n        Some(pct) => {\n            const BLOCKS: &[char] = &[' ', '░', '▒', '▓', '█'];\n            let filled = pct / 100.0 * 3.0;\n            let mut bar = String::with_capacity(5);\n            bar.push(spin);\n            for i in 0..3 {\n                let level = (filled - i as f64).clamp(0.0, 1.0);\n                let idx = (level * 4.0).round() as usize;\n                bar.push(BLOCKS[idx]);\n            }\n            bar\n        }\n        None => format!(\" {} \", spin),\n    }\n}\n\nfn draw_table(frame: &mut Frame, app: &mut App, area: Rect, tc: &ThemeColors) {\n    let sort_col = app.sort_column;\n    let header_names = [\n        \"\", \"Inst\", \"Model\", \"Provider\", \"Params\", \"Score\", \"tok/s*\", \"Quant\", \"Mode\", \"Mem %\",\n        \"Ctx\", \"Date\", \"Fit\", \"Use Case\",\n    ];\n    let sort_col_idx: Option<usize> = match sort_col {\n        SortColumn::Score => Some(5),\n        SortColumn::Tps => Some(6),\n        SortColumn::Params => Some(4),\n        SortColumn::MemPct => Some(9),\n        SortColumn::Ctx => Some(10),\n        SortColumn::ReleaseDate => Some(11),\n        SortColumn::UseCase => Some(13),\n    };\n  
  let in_select_mode = app.input_mode == InputMode::Select;\n    let header_cells = header_names.iter().enumerate().map(|(i, h)| {\n        if in_select_mode && app.select_column == i {\n            Cell::from(format!(\"▸{}◂\", h)).style(\n                Style::default()\n                    .fg(tc.fg)\n                    .bg(tc.accent_secondary)\n                    .add_modifier(Modifier::BOLD),\n            )\n        } else if sort_col_idx == Some(i) {\n            let arrow = if app.sort_ascending { \"▲\" } else { \"▼\" };\n            Cell::from(format!(\"{} {}\", h, arrow)).style(\n                Style::default()\n                    .fg(tc.accent_secondary)\n                    .add_modifier(Modifier::BOLD),\n            )\n        } else {\n            Cell::from(*h).style(Style::default().fg(tc.accent).add_modifier(Modifier::BOLD))\n        }\n    });\n    let header = Row::new(header_cells).height(1);\n\n    let visible_rows = (area.height as usize).saturating_sub(3).max(1);\n    let total_rows = app.filtered_fits.len();\n    let viewport_start = if total_rows <= visible_rows || app.selected_row < visible_rows {\n        0\n    } else {\n        app.selected_row + 1 - visible_rows\n    };\n    let viewport_end = (viewport_start + visible_rows).min(total_rows);\n\n    let visual_range = app.visual_range();\n    let rows: Vec<Row> = app\n        .filtered_fits\n        .iter()\n        .enumerate()\n        .skip(viewport_start)\n        .take(viewport_end.saturating_sub(viewport_start))\n        .map(|(row_idx, &idx)| {\n            let fit = &app.all_fits[idx];\n            let color = fit_color(fit.fit_level, tc);\n\n            let mode_color = match fit.run_mode {\n                llmfit_core::fit::RunMode::Gpu => tc.mode_gpu,\n                llmfit_core::fit::RunMode::MoeOffload => tc.mode_moe,\n                llmfit_core::fit::RunMode::CpuOffload => tc.mode_offload,\n                llmfit_core::fit::RunMode::CpuOnly => tc.mode_cpu,\n           
 };\n\n            let score_color = if fit.score >= 70.0 {\n                tc.score_high\n            } else if fit.score >= 50.0 {\n                tc.score_mid\n            } else {\n                tc.score_low\n            };\n\n            #[allow(clippy::if_same_then_else)]\n            let tps_text = if fit.estimated_tps >= 100.0 {\n                format!(\"{:.0}\", fit.estimated_tps)\n            } else if fit.estimated_tps >= 10.0 {\n                format!(\"{:.1}\", fit.estimated_tps)\n            } else {\n                format!(\"{:.1}\", fit.estimated_tps)\n            };\n\n            let is_pulling = app.pull_active.is_some()\n                && app.pull_model_name.as_deref() == Some(&fit.model.name);\n            let capability = app.download_capability_for(&fit.model.name);\n\n            let installed_icon = if fit.installed {\n                \" ✓\".to_string()\n            } else if is_pulling {\n                pull_indicator(app.pull_percent, app.tick_count)\n            } else {\n                match capability {\n                    DownloadCapability::Unknown => \" …\".to_string(),\n                    DownloadCapability::Known(flags) => {\n                        if flags == 0 {\n                            \" —\".to_string()\n                        } else {\n                            let mut s = String::new();\n                            if flags & DL_OLLAMA != 0 {\n                                s.push('O');\n                            }\n                            if flags & DL_LLAMACPP != 0 {\n                                s.push('L');\n                            }\n                            if flags & DL_DOCKER != 0 {\n                                s.push('D');\n                            }\n                            if flags & DL_LMSTUDIO != 0 {\n                                s.push('S');\n                            }\n                            format!(\"{:>2}\", s)\n                        }\n            
        }\n                }\n            };\n            let installed_color = if fit.installed {\n                tc.good\n            } else if is_pulling {\n                tc.warning\n            } else {\n                match capability {\n                    DownloadCapability::Unknown => tc.muted,\n                    DownloadCapability::Known(0) => tc.muted,\n                    DownloadCapability::Known(_) => tc.info,\n                }\n            };\n\n            let in_visual_range = visual_range\n                .as_ref()\n                .map(|r| r.contains(&row_idx))\n                .unwrap_or(false);\n            let row_style = if is_pulling {\n                Style::default().bg(Color::Rgb(50, 50, 0))\n            } else if in_visual_range {\n                Style::default().bg(Color::Rgb(40, 40, 80))\n            } else {\n                Style::default()\n            };\n\n            let marker = if app.compare_mark_model.as_deref() == Some(fit.model.name.as_str()) {\n                format!(\"{}*\", fit_indicator(fit.fit_level))\n            } else {\n                fit_indicator(fit.fit_level).to_string()\n            };\n\n            Row::new(vec![\n                Cell::from(marker).style(Style::default().fg(color)),\n                Cell::from(installed_icon).style(Style::default().fg(installed_color)),\n                Cell::from(fit.model.name.clone()).style(Style::default().fg(tc.fg)),\n                Cell::from(fit.model.provider.clone()).style(Style::default().fg(tc.muted)),\n                Cell::from(fit.model.parameter_count.clone()).style(Style::default().fg(tc.fg)),\n                Cell::from(format!(\"{:.0}\", fit.score)).style(Style::default().fg(score_color)),\n                Cell::from(tps_text).style(Style::default().fg(tc.fg)),\n                Cell::from(fit.best_quant.clone()).style(Style::default().fg(tc.muted)),\n                
Cell::from(fit.run_mode_text().to_string()).style(Style::default().fg(mode_color)),\n                Cell::from(format!(\"{:.0}%\", fit.utilization_pct))\n                    .style(Style::default().fg(color)),\n                Cell::from(format!(\"{}k\", fit.model.context_length / 1000))\n                    .style(Style::default().fg(tc.muted)),\n                Cell::from(\n                    fit.model\n                        .release_date\n                        .as_deref()\n                        .and_then(|d| d.get(..7))\n                        .unwrap_or(\"\\u{2014}\")\n                        .to_string(),\n                )\n                .style(Style::default().fg(tc.muted)),\n                Cell::from(fit.fit_text().to_string()).style(Style::default().fg(color)),\n                Cell::from(fit.use_case.label().to_string()).style(Style::default().fg(tc.muted)),\n            ])\n            .style(row_style)\n        })\n        .collect();\n\n    let widths = [\n        Constraint::Length(2),  // indicator\n        Constraint::Length(5),  // installed / pull %\n        Constraint::Min(20),    // model name\n        Constraint::Length(12), // provider\n        Constraint::Length(8),  // params\n        Constraint::Length(6),  // score\n        Constraint::Length(6),  // tok/s\n        Constraint::Length(10), // quant (AWQ-4bit, GPTQ-Int4, GPTQ-Int8)\n        Constraint::Length(7),  // mode\n        Constraint::Length(6),  // mem %\n        Constraint::Length(5),  // ctx\n        Constraint::Length(8),  // date (YYYY-MM)\n        Constraint::Length(10), // fit\n        Constraint::Min(10),    // use case\n    ];\n\n    let count_text = format!(\n        \" Models ({}/{}) \",\n        app.filtered_fits.len(),\n        app.all_fits.len()\n    );\n\n    let table = Table::new(rows, widths)\n        .header(header)\n        .block(\n            Block::default()\n                .borders(Borders::ALL)\n                
.border_style(Style::default().fg(tc.border))\n                .title(count_text)\n                .title_style(Style::default().fg(tc.fg)),\n        )\n        .row_highlight_style(\n            Style::default()\n                .bg(tc.highlight_bg)\n                .add_modifier(Modifier::BOLD),\n        )\n        .highlight_symbol(\"▶ \");\n\n    let mut state = TableState::default();\n    if !app.filtered_fits.is_empty() {\n        state.select(Some(app.selected_row.saturating_sub(viewport_start)));\n    }\n\n    frame.render_stateful_widget(table, area, &mut state);\n\n    // Scrollbar\n    if app.filtered_fits.len() > (area.height as usize).saturating_sub(3) {\n        let mut scrollbar_state =\n            ScrollbarState::new(app.filtered_fits.len()).position(app.selected_row);\n        frame.render_stateful_widget(\n            Scrollbar::new(ScrollbarOrientation::VerticalRight)\n                .begin_symbol(Some(\"↑\"))\n                .end_symbol(Some(\"↓\")),\n            area,\n            &mut scrollbar_state,\n        );\n    }\n}\n\nfn draw_compare(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColors) {\n    let Some((left, right)) = app.selected_compare_pair() else {\n        let block = Block::default()\n            .borders(Borders::ALL)\n            .border_style(Style::default().fg(tc.border))\n            .title(\" Compare \")\n            .title_style(Style::default().fg(tc.title).add_modifier(Modifier::BOLD));\n        let body = Paragraph::new(vec![\n            Line::from(\"\"),\n            Line::from(Span::styled(\n                \"  Compare requires two different models.\",\n                Style::default().fg(tc.warning),\n            )),\n            Line::from(\"\"),\n            Line::from(Span::styled(\n                \"  1) Move to a model and press m (mark).\",\n                Style::default().fg(tc.muted),\n            )),\n            Line::from(Span::styled(\n                \"  2) Move to another model and press c 
(compare).\",\n                Style::default().fg(tc.muted),\n            )),\n            Line::from(Span::styled(\n                \"  3) Press c again to return.\",\n                Style::default().fg(tc.muted),\n            )),\n        ])\n        .block(block);\n        frame.render_widget(body, area);\n        return;\n    };\n\n    let sections = Layout::default()\n        .direction(Direction::Vertical)\n        .constraints([Constraint::Length(3), Constraint::Min(10)])\n        .split(area);\n    let cols = Layout::default()\n        .direction(Direction::Horizontal)\n        .constraints([Constraint::Percentage(50), Constraint::Percentage(50)])\n        .split(sections[1]);\n\n    let title = Paragraph::new(Line::from(vec![\n        Span::styled(\" Compare \", Style::default().fg(tc.accent).bold()),\n        Span::styled(\n            format!(\"{}  vs  {}\", left.model.name, right.model.name),\n            Style::default().fg(tc.fg),\n        ),\n    ]))\n    .block(\n        Block::default()\n            .borders(Borders::ALL)\n            .border_style(Style::default().fg(tc.border)),\n    );\n    frame.render_widget(title, sections[0]);\n\n    let score_delta = right.score - left.score;\n    let tps_delta = right.estimated_tps - left.estimated_tps;\n    let mem_delta = right.utilization_pct - left.utilization_pct;\n    let params_delta = right.model.params_b() - left.model.params_b();\n    let ctx_delta = right.model.context_length as i64 - left.model.context_length as i64;\n\n    let score_hint = if score_delta > 0.05 {\n        \" ↑\"\n    } else if score_delta < -0.05 {\n        \" ↓\"\n    } else {\n        \" =\"\n    };\n    let tps_hint = if tps_delta > 0.05 {\n        \" ↑\"\n    } else if tps_delta < -0.05 {\n        \" ↓\"\n    } else {\n        \" =\"\n    };\n    let mem_hint = if mem_delta > 0.05 {\n        \" ↑\"\n    } else if mem_delta < -0.05 {\n        \" ↓\"\n    } else {\n        \" =\"\n    };\n    let params_hint = if 
params_delta > 0.01 {\n        \" ↑\"\n    } else if params_delta < -0.01 {\n        \" ↓\"\n    } else {\n        \" =\"\n    };\n    let ctx_hint = if ctx_delta > 0 {\n        \" ↑\"\n    } else if ctx_delta < 0 {\n        \" ↓\"\n    } else {\n        \" =\"\n    };\n\n    let score_style = Style::default().fg(if score_delta >= 0.0 {\n        tc.good\n    } else {\n        tc.warning\n    });\n    let tps_style = Style::default().fg(if tps_delta >= 0.0 {\n        tc.good\n    } else {\n        tc.warning\n    });\n    let mem_style = Style::default().fg(if mem_delta <= 0.0 {\n        tc.good\n    } else {\n        tc.warning\n    });\n    let params_style = Style::default().fg(if params_delta >= 0.0 {\n        tc.good\n    } else {\n        tc.warning\n    });\n    let ctx_style = Style::default().fg(if ctx_delta >= 0 { tc.good } else { tc.warning });\n\n    let legend = Paragraph::new(Line::from(Span::styled(\n        \"  Delta hints: ↑ value increased, ↓ value decreased (for Mem%, lower is better)\",\n        Style::default().fg(tc.muted),\n    )));\n    frame.render_widget(legend, sections[0]);\n\n    let left_metrics = CompareMetrics {\n        score: format!(\"{:.1}\", left.score),\n        score_style: Style::default().fg(tc.score_high),\n        tps: format!(\"{:.1}\", left.estimated_tps),\n        tps_style: Style::default().fg(tc.fg),\n        mem: format!(\"{:.1}%\", left.utilization_pct),\n        mem_style: Style::default().fg(fit_color(left.fit_level, tc)),\n        params: left.model.parameter_count.clone(),\n        params_style: Style::default().fg(tc.fg),\n        context: format!(\" {} tokens\", left.model.context_length),\n        context_style: Style::default().fg(tc.fg),\n    };\n\n    let right_metrics = CompareMetrics {\n        score: format!(\"{:.1} ({:+.1}){}\", right.score, score_delta, score_hint),\n        score_style,\n        tps: format!(\"{:.1} ({:+.1}){}\", right.estimated_tps, tps_delta, tps_hint),\n        tps_style,\n        
mem: format!(\n            \"{:.1}% ({:+.1}%){}\",\n            right.utilization_pct, mem_delta, mem_hint\n        ),\n        mem_style,\n        params: format!(\n            \"{} ({:+.2}B){}\",\n            right.model.parameter_count, params_delta, params_hint\n        ),\n        params_style,\n        context: format!(\n            \" {} tokens ({:+}){}\",\n            right.model.context_length, ctx_delta, ctx_hint\n        ),\n        context_style: ctx_style,\n    };\n\n    render_compare_panel(\n        frame,\n        cols[0],\n        tc,\n        \" Marked (baseline) \",\n        left,\n        &left_metrics,\n    );\n    render_compare_panel(\n        frame,\n        cols[1],\n        tc,\n        \" Selected (delta vs baseline) \",\n        right,\n        &right_metrics,\n    );\n}\n\nstruct CompareMetrics {\n    score: String,\n    score_style: Style,\n    tps: String,\n    tps_style: Style,\n    mem: String,\n    mem_style: Style,\n    params: String,\n    params_style: Style,\n    context: String,\n    context_style: Style,\n}\n\nfn compare_badges(fit: &ModelFit) -> String {\n    let mut tags = Vec::new();\n    if fit.model.is_moe {\n        tags.push(\"MoE\");\n    }\n    if fit.run_mode == llmfit_core::fit::RunMode::MoeOffload {\n        tags.push(\"Offload\");\n    }\n    if !fit.notes.is_empty() {\n        tags.push(\"Notes\");\n    }\n    if tags.is_empty() {\n        \"-\".to_string()\n    } else {\n        tags.join(\", \")\n    }\n}\n\nfn render_compare_panel(\n    frame: &mut Frame,\n    area: Rect,\n    tc: &ThemeColors,\n    title: &str,\n    fit: &ModelFit,\n    metrics: &CompareMetrics,\n) {\n    let lines = vec![\n        Line::from(\"\"),\n        Line::from(vec![\n            Span::styled(\"  Model: \", Style::default().fg(tc.muted)),\n            Span::styled(fit.model.name.clone(), Style::default().fg(tc.fg).bold()),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Provider:\", 
Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\" {}\", fit.model.provider),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Use:    \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\" {}\", fit.use_case.label()),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Released:\", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\n                    \" {}\",\n                    fit.model.release_date.as_deref().unwrap_or(\"Unknown\")\n                ),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Score: \", Style::default().fg(tc.muted)),\n            Span::styled(metrics.score.clone(), metrics.score_style),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Fit:   \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{} {}\", fit_indicator(fit.fit_level), fit.fit_text()),\n                Style::default().fg(fit_color(fit.fit_level, tc)),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  tok/s: \", Style::default().fg(tc.muted)),\n            Span::styled(metrics.tps.clone(), metrics.tps_style),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Mem%:  \", Style::default().fg(tc.muted)),\n            Span::styled(metrics.mem.clone(), metrics.mem_style),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Runtime:\", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\" {}\", fit.runtime_text()),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Mode:   \", 
Style::default().fg(tc.muted)),\n            Span::styled(fit.run_mode_text(), Style::default().fg(tc.fg)),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Params: \", Style::default().fg(tc.muted)),\n            Span::styled(metrics.params.clone(), metrics.params_style),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Context:\", Style::default().fg(tc.muted)),\n            Span::styled(metrics.context.clone(), metrics.context_style),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Quant:  \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{} (default {})\", fit.best_quant, fit.model.quantization),\n                Style::default().fg(tc.good),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Badges: \", Style::default().fg(tc.muted)),\n            Span::styled(compare_badges(fit), Style::default().fg(tc.info)),\n        ]),\n    ];\n\n    frame.render_widget(\n        Paragraph::new(lines).block(\n            Block::default()\n                .borders(Borders::ALL)\n                .border_style(Style::default().fg(tc.border))\n                .title(title)\n                .title_style(Style::default().fg(tc.accent_secondary)),\n        ),\n        area,\n    );\n}\n\nfn draw_multi_compare(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColors) {\n    if app.compare_models.is_empty() {\n        let block = Block::default()\n            .borders(Borders::ALL)\n            .border_style(Style::default().fg(tc.border))\n            .title(\" Compare \")\n            .title_style(Style::default().fg(tc.title).add_modifier(Modifier::BOLD));\n        let body = Paragraph::new(\"  No models selected for comparison.\").block(block);\n        frame.render_widget(body, area);\n        return;\n    }\n\n    let models: Vec<&ModelFit> = app\n        .compare_models\n        .iter()\n        .filter_map(|&idx| app.all_fits.get(idx))\n  
      .collect();\n\n    if models.len() < 2 {\n        let block = Block::default()\n            .borders(Borders::ALL)\n            .border_style(Style::default().fg(tc.border))\n            .title(\" Compare \")\n            .title_style(Style::default().fg(tc.title).add_modifier(Modifier::BOLD));\n        let body = Paragraph::new(\"  Need at least 2 models to compare.\").block(block);\n        frame.render_widget(body, area);\n        return;\n    }\n\n    // Attribute rows: label, value extractor, color logic\n    struct AttrRow {\n        label: &'static str,\n        values: Vec<String>,\n        styles: Vec<Style>,\n    }\n\n    let label_width: u16 = 12;\n    // How many model columns can we fit?\n    let available_width = area.width.saturating_sub(label_width + 3); // borders + label col\n    let col_width: u16 = 20;\n    let max_visible = (available_width / col_width).max(1) as usize;\n    let scroll = app\n        .compare_scroll\n        .min(models.len().saturating_sub(max_visible));\n    let visible_models: Vec<&ModelFit> = models\n        .iter()\n        .skip(scroll)\n        .take(max_visible)\n        .copied()\n        .collect();\n    let n = visible_models.len();\n\n    // Find best/worst for highlighting\n    let best_score = models.iter().map(|m| m.score).fold(f64::MIN, f64::max);\n    let best_tps = models\n        .iter()\n        .map(|m| m.estimated_tps)\n        .fold(f64::MIN, f64::max);\n    let best_mem = models\n        .iter()\n        .map(|m| m.utilization_pct)\n        .fold(f64::MAX, f64::min); // lower is better\n    let best_ctx = models\n        .iter()\n        .map(|m| m.model.context_length)\n        .max()\n        .unwrap_or(0);\n\n    let mut rows: Vec<AttrRow> = Vec::new();\n\n    // Model name\n    rows.push(AttrRow {\n        label: \"Model\",\n        values: visible_models\n            .iter()\n            .map(|m| truncate_str(&m.model.name, col_width as usize - 1))\n            .collect(),\n        styles: 
vec![Style::default().fg(tc.fg).add_modifier(Modifier::BOLD); n],\n    });\n\n    // Provider\n    rows.push(AttrRow {\n        label: \"Provider\",\n        values: visible_models\n            .iter()\n            .map(|m| m.model.provider.clone())\n            .collect(),\n        styles: vec![Style::default().fg(tc.muted); n],\n    });\n\n    // Score\n    rows.push(AttrRow {\n        label: \"Score\",\n        values: visible_models\n            .iter()\n            .map(|m| format!(\"{:.1}\", m.score))\n            .collect(),\n        styles: visible_models\n            .iter()\n            .map(|m| {\n                if (m.score - best_score).abs() < 0.1 {\n                    Style::default().fg(tc.good).add_modifier(Modifier::BOLD)\n                } else if m.score >= 70.0 {\n                    Style::default().fg(tc.score_high)\n                } else if m.score >= 50.0 {\n                    Style::default().fg(tc.score_mid)\n                } else {\n                    Style::default().fg(tc.score_low)\n                }\n            })\n            .collect(),\n    });\n\n    // tok/s\n    rows.push(AttrRow {\n        label: \"tok/s\",\n        values: visible_models\n            .iter()\n            .map(|m| format!(\"{:.1}\", m.estimated_tps))\n            .collect(),\n        styles: visible_models\n            .iter()\n            .map(|m| {\n                if (m.estimated_tps - best_tps).abs() < 0.1 {\n                    Style::default().fg(tc.good).add_modifier(Modifier::BOLD)\n                } else {\n                    Style::default().fg(tc.fg)\n                }\n            })\n            .collect(),\n    });\n\n    // Fit\n    rows.push(AttrRow {\n        label: \"Fit\",\n        values: visible_models\n            .iter()\n            .map(|m| format!(\"{} {}\", fit_indicator(m.fit_level), m.fit_text()))\n            .collect(),\n        styles: visible_models\n            .iter()\n            .map(|m| 
Style::default().fg(fit_color(m.fit_level, tc)))\n            .collect(),\n    });\n\n    // Mem %\n    rows.push(AttrRow {\n        label: \"Mem %\",\n        values: visible_models\n            .iter()\n            .map(|m| format!(\"{:.1}%\", m.utilization_pct))\n            .collect(),\n        styles: visible_models\n            .iter()\n            .map(|m| {\n                if (m.utilization_pct - best_mem).abs() < 0.1 {\n                    Style::default().fg(tc.good).add_modifier(Modifier::BOLD)\n                } else {\n                    Style::default().fg(fit_color(m.fit_level, tc))\n                }\n            })\n            .collect(),\n    });\n\n    // Params\n    rows.push(AttrRow {\n        label: \"Params\",\n        values: visible_models\n            .iter()\n            .map(|m| m.model.parameter_count.clone())\n            .collect(),\n        styles: vec![Style::default().fg(tc.fg); n],\n    });\n\n    // Mode\n    rows.push(AttrRow {\n        label: \"Mode\",\n        values: visible_models\n            .iter()\n            .map(|m| m.run_mode_text().to_string())\n            .collect(),\n        styles: visible_models\n            .iter()\n            .map(|m| {\n                let c = match m.run_mode {\n                    llmfit_core::fit::RunMode::Gpu => tc.mode_gpu,\n                    llmfit_core::fit::RunMode::MoeOffload => tc.mode_moe,\n                    llmfit_core::fit::RunMode::CpuOffload => tc.mode_offload,\n                    llmfit_core::fit::RunMode::CpuOnly => tc.mode_cpu,\n                };\n                Style::default().fg(c)\n            })\n            .collect(),\n    });\n\n    // Context\n    rows.push(AttrRow {\n        label: \"Context\",\n        values: visible_models\n            .iter()\n            .map(|m| format!(\"{}k\", m.model.context_length / 1000))\n            .collect(),\n        styles: visible_models\n            .iter()\n            .map(|m| {\n                if 
m.model.context_length == best_ctx {\n                    Style::default().fg(tc.good).add_modifier(Modifier::BOLD)\n                } else {\n                    Style::default().fg(tc.muted)\n                }\n            })\n            .collect(),\n    });\n\n    // Quant\n    rows.push(AttrRow {\n        label: \"Quant\",\n        values: visible_models\n            .iter()\n            .map(|m| m.best_quant.clone())\n            .collect(),\n        styles: vec![Style::default().fg(tc.muted); n],\n    });\n\n    // Use Case\n    rows.push(AttrRow {\n        label: \"Use Case\",\n        values: visible_models\n            .iter()\n            .map(|m| m.use_case.label().to_string())\n            .collect(),\n        styles: vec![Style::default().fg(tc.muted); n],\n    });\n\n    // Runtime\n    rows.push(AttrRow {\n        label: \"Runtime\",\n        values: visible_models\n            .iter()\n            .map(|m| m.runtime_text().to_string())\n            .collect(),\n        styles: vec![Style::default().fg(tc.fg); n],\n    });\n\n    // Build the table\n    let mut header_cells = vec![Cell::from(\"\").style(Style::default().fg(tc.accent).bold())];\n    for (i, m) in visible_models.iter().enumerate() {\n        let name = truncate_str(&m.model.name, col_width as usize - 1);\n        let style = if i == 0 && scroll == 0 {\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD)\n        } else {\n            Style::default().fg(tc.accent).add_modifier(Modifier::BOLD)\n        };\n        header_cells.push(Cell::from(name).style(style));\n    }\n    let header = Row::new(header_cells).height(1);\n\n    let table_rows: Vec<Row> = rows\n        .iter()\n        .enumerate()\n        .map(|(row_idx, attr)| {\n            let mut cells =\n                vec![Cell::from(attr.label).style(Style::default().fg(tc.muted).bold())];\n            for (col_idx, (val, style)) in 
attr.values.iter().zip(attr.styles.iter()).enumerate() {\n                let _ = col_idx;\n                cells.push(Cell::from(val.as_str()).style(*style));\n            }\n            let bg = if row_idx % 2 == 0 {\n                Style::default()\n            } else {\n                Style::default().bg(Color::Rgb(25, 25, 35))\n            };\n            Row::new(cells).style(bg)\n        })\n        .collect();\n\n    let mut widths = vec![Constraint::Length(label_width)];\n    for _ in 0..n {\n        widths.push(Constraint::Length(col_width));\n    }\n\n    let scroll_info = if models.len() > max_visible {\n        format!(\" Compare ({}/{})  ←/→ scroll \", models.len(), models.len())\n    } else {\n        format!(\" Compare ({} models) \", models.len())\n    };\n\n    let table = Table::new(table_rows, widths).header(header).block(\n        Block::default()\n            .borders(Borders::ALL)\n            .border_style(Style::default().fg(tc.border))\n            .title(scroll_info)\n            .title_style(\n                Style::default()\n                    .fg(tc.accent_secondary)\n                    .add_modifier(Modifier::BOLD),\n            ),\n    );\n\n    frame.render_widget(table, area);\n}\n\n/// Truncate `s` to at most `max_len` characters, marking a cut with a trailing `~`.\nfn truncate_str(s: &str, max_len: usize) -> String {\n    // Count and cut by characters rather than bytes: slicing `&s[..n]` panics\n    // when `n` falls inside a multi-byte UTF-8 sequence.\n    if s.chars().count() <= max_len {\n        s.to_string()\n    } else {\n        let head: String = s.chars().take(max_len.saturating_sub(1)).collect();\n        format!(\"{}~\", head)\n    }\n}\n\nfn draw_detail(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColors) {\n    let fit = match app.selected_fit() {\n        Some(f) => f,\n        None => {\n            let block = Block::default()\n                .borders(Borders::ALL)\n                .title(\" No model selected \");\n            frame.render_widget(block, area);\n            return;\n        }\n    };\n\n    let color = fit_color(fit.fit_level, tc);\n\n    let mut lines = vec![\n        Line::from(\"\"),\n        Line::from(vec![\n            Span::styled(\"  Model:       \", 
Style::default().fg(tc.muted)),\n            Span::styled(&fit.model.name, Style::default().fg(tc.fg).bold()),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Provider:    \", Style::default().fg(tc.muted)),\n            Span::styled(&fit.model.provider, Style::default().fg(tc.fg)),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Parameters:  \", Style::default().fg(tc.muted)),\n            Span::styled(&fit.model.parameter_count, Style::default().fg(tc.fg)),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Quantization:\", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\" {}\", fit.model.quantization),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Best Quant:  \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\" {} (for this hardware)\", fit.best_quant),\n                Style::default().fg(tc.good),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Context:     \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{} tokens\", fit.model.context_length),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Use Case:    \", Style::default().fg(tc.muted)),\n            Span::styled(&fit.model.use_case, Style::default().fg(tc.fg)),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Category:    \", Style::default().fg(tc.muted)),\n            Span::styled(fit.use_case.label(), Style::default().fg(tc.accent)),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Capabilities:\", Style::default().fg(tc.muted)),\n            Span::styled(\n                if fit.model.capabilities.is_empty() {\n                    \" None\".to_string()\n                } else {\n             
       format!(\n                        \" {}\",\n                        fit.model\n                            .capabilities\n                            .iter()\n                            .map(|c| c.label())\n                            .collect::<Vec<_>>()\n                            .join(\", \")\n                    )\n                },\n                Style::default().fg(tc.info),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Released:    \", Style::default().fg(tc.muted)),\n            Span::styled(\n                fit.model.release_date.as_deref().unwrap_or(\"Unknown\"),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Runtime:     \", Style::default().fg(tc.muted)),\n            Span::styled(\n                fit.runtime_text(),\n                Style::default().fg(match fit.runtime {\n                    llmfit_core::fit::InferenceRuntime::Mlx => tc.accent,\n                    llmfit_core::fit::InferenceRuntime::Vllm => tc.accent_secondary,\n                    _ => tc.fg,\n                }),\n            ),\n            Span::styled(\n                format!(\" (baseline est. 
~{:.1} tok/s)\", fit.estimated_tps),\n                Style::default().fg(tc.muted),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Installed:   \", Style::default().fg(tc.muted)),\n            {\n                let mut installed_providers = Vec::new();\n                if providers::is_model_installed(&fit.model.name, &app.ollama_installed) {\n                    installed_providers.push(\"Ollama\");\n                }\n                if providers::is_model_installed_mlx(&fit.model.name, &app.mlx_installed) {\n                    installed_providers.push(\"MLX\");\n                }\n                if providers::is_model_installed_llamacpp(&fit.model.name, &app.llamacpp_installed)\n                {\n                    installed_providers.push(\"llama.cpp\");\n                }\n                if providers::is_model_installed_docker_mr(\n                    &fit.model.name,\n                    &app.docker_mr_installed,\n                ) {\n                    installed_providers.push(\"Docker\");\n                }\n                if providers::is_model_installed_lmstudio(&fit.model.name, &app.lmstudio_installed)\n                {\n                    installed_providers.push(\"LM Studio\");\n                }\n                let any_available = app.ollama_available\n                    || app.mlx_available\n                    || app.llamacpp_available\n                    || app.docker_mr_available\n                    || app.lmstudio_available;\n\n                if !installed_providers.is_empty() {\n                    let label = installed_providers\n                        .iter()\n                        .map(|p| format!(\"✓ {p}\"))\n                        .collect::<Vec<_>>()\n                        .join(\"  \");\n                    Span::styled(label, Style::default().fg(tc.good).bold())\n                } else if any_available {\n                    Span::styled(\"✗ No  (press d to pull)\", 
Style::default().fg(tc.muted))\n                } else {\n                    Span::styled(\"- No runtime detected\", Style::default().fg(tc.muted))\n                }\n            },\n        ]),\n    ];\n\n    // Scoring section\n    let score_color = if fit.score >= 70.0 {\n        tc.score_high\n    } else if fit.score >= 50.0 {\n        tc.score_mid\n    } else {\n        tc.score_low\n    };\n    lines.extend_from_slice(&[\n        Line::from(\"\"),\n        Line::from(Span::styled(\n            \"  ── Score Breakdown ──\",\n            Style::default().fg(tc.accent),\n        )),\n        Line::from(\"\"),\n        Line::from(vec![\n            Span::styled(\"  Overall:     \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.1} / 100\", fit.score),\n                Style::default().fg(score_color).bold(),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Quality:     \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.0}\", fit.score_components.quality),\n                Style::default().fg(tc.fg),\n            ),\n            Span::styled(\"  Speed: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.0}\", fit.score_components.speed),\n                Style::default().fg(tc.fg),\n            ),\n            Span::styled(\"  Fit: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.0}\", fit.score_components.fit),\n                Style::default().fg(tc.fg),\n            ),\n            Span::styled(\"  Context: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.0}\", fit.score_components.context),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Baseline Est:\", Style::default().fg(tc.muted)),\n            Span::styled(\n                
format!(\"{:.1} tok/s\", fit.estimated_tps),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n    ]);\n\n    // MoE Architecture section\n    if fit.model.is_moe {\n        lines.push(Line::from(\"\"));\n        lines.push(Line::from(Span::styled(\n            \"  ── MoE Architecture ──\",\n            Style::default().fg(tc.accent),\n        )));\n        lines.push(Line::from(\"\"));\n\n        if let (Some(num_experts), Some(active_experts)) =\n            (fit.model.num_experts, fit.model.active_experts)\n        {\n            lines.push(Line::from(vec![\n                Span::styled(\"  Experts:     \", Style::default().fg(tc.muted)),\n                Span::styled(\n                    format!(\n                        \"{} active / {} total per token\",\n                        active_experts, num_experts\n                    ),\n                    Style::default().fg(tc.accent),\n                ),\n            ]));\n        }\n\n        if let Some(active_vram) = fit.model.moe_active_vram_gb() {\n            lines.push(Line::from(vec![\n                Span::styled(\"  Active VRAM: \", Style::default().fg(tc.muted)),\n                Span::styled(\n                    format!(\"{:.1} GB\", active_vram),\n                    Style::default().fg(tc.accent),\n                ),\n                Span::styled(\n                    format!(\n                        \"  (vs {:.1} GB full model)\",\n                        fit.model.min_vram_gb.unwrap_or(0.0)\n                    ),\n                    Style::default().fg(tc.muted),\n                ),\n            ]));\n        }\n\n        if let Some(offloaded) = fit.moe_offloaded_gb {\n            lines.push(Line::from(vec![\n                Span::styled(\"  Offloaded:   \", Style::default().fg(tc.muted)),\n                Span::styled(\n                    format!(\"{:.1} GB inactive experts in RAM\", offloaded),\n                    Style::default().fg(tc.warning),\n               
 ),\n            ]));\n        }\n\n        if fit.run_mode == llmfit_core::fit::RunMode::MoeOffload {\n            lines.push(Line::from(vec![\n                Span::styled(\"  Strategy:    \", Style::default().fg(tc.muted)),\n                Span::styled(\n                    \"Expert offloading (active in VRAM, inactive in RAM)\",\n                    Style::default().fg(tc.good),\n                ),\n            ]));\n        } else if fit.run_mode == llmfit_core::fit::RunMode::Gpu {\n            lines.push(Line::from(vec![\n                Span::styled(\"  Strategy:    \", Style::default().fg(tc.muted)),\n                Span::styled(\n                    \"All experts loaded in VRAM (optimal)\",\n                    Style::default().fg(tc.good),\n                ),\n            ]));\n        }\n    }\n\n    lines.extend_from_slice(&[\n        Line::from(\"\"),\n        Line::from(Span::styled(\n            \"  ── System Fit ──\",\n            Style::default().fg(tc.accent),\n        )),\n        Line::from(\"\"),\n        Line::from(vec![\n            Span::styled(\"  Fit Level:   \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{} {}\", fit_indicator(fit.fit_level), fit.fit_text()),\n                Style::default().fg(color).bold(),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Run Mode:    \", Style::default().fg(tc.muted)),\n            Span::styled(fit.run_mode_text(), Style::default().fg(tc.fg).bold()),\n        ]),\n        Line::from(\"\"),\n        Line::from(Span::styled(\n            \"  -- Memory --\",\n            Style::default().fg(tc.accent),\n        )),\n        Line::from(\"\"),\n    ]);\n\n    if let Some(vram) = fit.model.min_vram_gb {\n        let vram_label = if app.specs.has_gpu {\n            if app.specs.unified_memory {\n                if let Some(sys_vram) = app.specs.gpu_vram_gb {\n                    format!(\"  (shared: {:.1} GB)\", sys_vram)\n      
          } else {\n                    \"  (shared memory)\".to_string()\n                }\n            } else if let Some(sys_vram) = app.specs.gpu_vram_gb {\n                format!(\"  (system: {:.1} GB)\", sys_vram)\n            } else {\n                \"  (system: unknown)\".to_string()\n            }\n        } else {\n            \"  (no GPU)\".to_string()\n        };\n        lines.push(Line::from(vec![\n            Span::styled(\"  Min VRAM:    \", Style::default().fg(tc.muted)),\n            Span::styled(format!(\"{:.1} GB\", vram), Style::default().fg(tc.fg)),\n            Span::styled(vram_label, Style::default().fg(tc.muted)),\n        ]));\n    }\n\n    lines.extend_from_slice(&[\n        Line::from(vec![\n            Span::styled(\"  Min RAM:     \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.1} GB\", fit.model.min_ram_gb),\n                Style::default().fg(tc.fg),\n            ),\n            Span::styled(\n                format!(\"  (system: {:.1} GB avail)\", app.specs.available_ram_gb),\n                Style::default().fg(tc.muted),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Rec RAM:     \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.1} GB\", fit.model.recommended_ram_gb),\n                Style::default().fg(tc.fg),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Mem Usage:   \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.1}%\", fit.utilization_pct),\n                Style::default().fg(color),\n            ),\n            Span::styled(\n                format!(\n                    \"  ({:.1} / {:.1} GB)\",\n                    fit.memory_required_gb, fit.memory_available_gb\n                ),\n                Style::default().fg(tc.muted),\n            ),\n        ]),\n    ]);\n\n    // Build right-pane content (GGUF 
sources + notes)\n    let has_right_pane = !fit.model.gguf_sources.is_empty() || !fit.notes.is_empty();\n\n    let mut right_lines: Vec<Line> = vec![Line::from(\"\")];\n\n    if !fit.model.gguf_sources.is_empty() {\n        right_lines.push(Line::from(Span::styled(\n            \"  ── GGUF Downloads ──\",\n            Style::default().fg(tc.accent),\n        )));\n        right_lines.push(Line::from(\"\"));\n        for src in &fit.model.gguf_sources {\n            right_lines.push(Line::from(vec![\n                Span::styled(\n                    format!(\"  📦 {:<12}\", src.provider),\n                    Style::default().fg(tc.info),\n                ),\n                Span::styled(format!(\"hf.co/{}\", src.repo), Style::default().fg(tc.fg)),\n            ]));\n        }\n        right_lines.push(Line::from(\"\"));\n        right_lines.push(Line::from(Span::styled(\n            \"  llmfit download \\\\\".to_string(),\n            Style::default().fg(tc.muted),\n        )));\n        right_lines.push(Line::from(Span::styled(\n            format!(\"    {} \\\\\", fit.model.gguf_sources[0].repo),\n            Style::default().fg(tc.muted),\n        )));\n        right_lines.push(Line::from(Span::styled(\n            format!(\"    --quant {}\", fit.best_quant),\n            Style::default().fg(tc.muted),\n        )));\n        right_lines.push(Line::from(\"\"));\n    }\n\n    if !fit.notes.is_empty() {\n        right_lines.push(Line::from(Span::styled(\n            \"  ── Notes ──\",\n            Style::default().fg(tc.accent),\n        )));\n        right_lines.push(Line::from(\"\"));\n        for note in &fit.notes {\n            right_lines.push(Line::from(Span::styled(\n                format!(\"  {}\", note),\n                Style::default().fg(tc.fg),\n            )));\n        }\n    }\n\n    // Track the left pane area for cursor positioning\n    let left_area;\n\n    if has_right_pane {\n        // Split into left (model info) and right (downloads + 
notes) panes\n        let h_layout = Layout::default()\n            .direction(Direction::Horizontal)\n            .constraints([Constraint::Percentage(55), Constraint::Percentage(45)])\n            .split(area);\n\n        left_area = h_layout[0];\n\n        let left_block = Block::default()\n            .borders(Borders::ALL)\n            .border_style(Style::default().fg(tc.border))\n            .title(format!(\" {} \", fit.model.name))\n            .title_style(Style::default().fg(tc.fg).bold());\n\n        let left_paragraph = Paragraph::new(lines)\n            .block(left_block)\n            .wrap(Wrap { trim: false });\n        frame.render_widget(left_paragraph, h_layout[0]);\n\n        let right_title = if !fit.model.gguf_sources.is_empty() {\n            \" 📦 Downloads & Notes \"\n        } else {\n            \" Notes \"\n        };\n        let right_block = Block::default()\n            .borders(Borders::ALL)\n            .border_style(Style::default().fg(tc.border))\n            .title(right_title)\n            .title_style(Style::default().fg(tc.info).bold());\n\n        let right_paragraph = Paragraph::new(right_lines)\n            .block(right_block)\n            .wrap(Wrap { trim: false });\n        frame.render_widget(right_paragraph, h_layout[1]);\n    } else {\n        left_area = area;\n\n        let block = Block::default()\n            .borders(Borders::ALL)\n            .border_style(Style::default().fg(tc.border))\n            .title(format!(\" {} \", fit.model.name))\n            .title_style(Style::default().fg(tc.fg).bold());\n\n        let paragraph = Paragraph::new(lines)\n            .block(block)\n            .wrap(Wrap { trim: false });\n        frame.render_widget(paragraph, area);\n    }\n\n    if app.input_mode == InputMode::Plan {\n        let (row_offset, label_len) = match app.plan_field {\n            PlanField::Context => (5u16, \"  Context:    \".len() as u16),\n            PlanField::Quant => (6u16, \"  Quant:      
\".len() as u16),\n            PlanField::TargetTps => (7u16, \"  Target TPS: \".len() as u16),\n        };\n        let x = left_area.x + 1 + label_len + app.plan_cursor_position as u16;\n        let y = left_area.y + 1 + row_offset;\n        if x < left_area.x + left_area.width.saturating_sub(1)\n            && y < left_area.y + left_area.height.saturating_sub(1)\n        {\n            frame.set_cursor_position((x, y));\n        }\n    }\n}\n\nfn draw_plan(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColors) {\n    let Some(model_name) = app.plan_model_name() else {\n        let block = Block::default()\n            .borders(Borders::ALL)\n            .border_style(Style::default().fg(tc.border))\n            .title(\" Planner \");\n        frame.render_widget(block, area);\n        return;\n    };\n\n    let field_style = |field: PlanField| {\n        if app.input_mode == InputMode::Plan && app.plan_field == field {\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD)\n        } else {\n            Style::default().fg(tc.fg)\n        }\n    };\n\n    let mut lines = vec![\n        Line::from(\"\"),\n        Line::from(vec![\n            Span::styled(\"  Model: \", Style::default().fg(tc.muted)),\n            Span::styled(model_name, Style::default().fg(tc.fg).bold()),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Note: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                \"Estimate-based using current llmfit fit/speed heuristics.\",\n                Style::default().fg(tc.warning),\n            ),\n        ]),\n        Line::from(\"\"),\n        Line::from(Span::styled(\n            \"  Inputs (editable)\",\n            Style::default().fg(tc.accent),\n        )),\n        Line::from(vec![\n            Span::styled(\"  Context:    \", Style::default().fg(tc.muted)),\n            Span::styled(\n                if 
app.plan_context_input.is_empty() {\n                    \"<required>\"\n                } else {\n                    app.plan_context_input.as_str()\n                },\n                field_style(PlanField::Context),\n            ),\n            Span::styled(\" tokens\", Style::default().fg(tc.muted)),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Quant:      \", Style::default().fg(tc.muted)),\n            Span::styled(\n                if app.plan_quant_input.is_empty() {\n                    \"<auto>\"\n                } else {\n                    app.plan_quant_input.as_str()\n                },\n                field_style(PlanField::Quant),\n            ),\n        ]),\n        Line::from(vec![\n            Span::styled(\"  Target TPS: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                if app.plan_target_tps_input.is_empty() {\n                    \"<none>\"\n                } else {\n                    app.plan_target_tps_input.as_str()\n                },\n                field_style(PlanField::TargetTps),\n            ),\n            Span::styled(\" tok/s\", Style::default().fg(tc.muted)),\n        ]),\n        Line::from(\"\"),\n    ];\n\n    if let Some(err) = &app.plan_error {\n        lines.push(Line::from(vec![\n            Span::styled(\"  Error: \", Style::default().fg(tc.error)),\n            Span::styled(err, Style::default().fg(tc.error).bold()),\n        ]));\n    } else if let Some(plan) = &app.plan_estimate {\n        lines.push(Line::from(Span::styled(\n            \"  Minimum Hardware\",\n            Style::default().fg(tc.accent),\n        )));\n        lines.push(Line::from(vec![\n            Span::styled(\"  VRAM: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                plan.minimum\n                    .vram_gb\n                    .map(|v| format!(\"{v:.1} GB\"))\n                    .unwrap_or_else(|| \"n/a\".to_string()),\n                
Style::default().fg(tc.fg),\n            ),\n            Span::styled(\"   RAM: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.1} GB\", plan.minimum.ram_gb),\n                Style::default().fg(tc.fg),\n            ),\n            Span::styled(\"   CPU: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{} cores\", plan.minimum.cpu_cores),\n                Style::default().fg(tc.fg),\n            ),\n        ]));\n        lines.push(Line::from(\" \"));\n        lines.push(Line::from(Span::styled(\n            \"  Recommended Hardware\",\n            Style::default().fg(tc.accent),\n        )));\n        lines.push(Line::from(vec![\n            Span::styled(\"  VRAM: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                plan.recommended\n                    .vram_gb\n                    .map(|v| format!(\"{v:.1} GB\"))\n                    .unwrap_or_else(|| \"n/a\".to_string()),\n                Style::default().fg(tc.fg),\n            ),\n            Span::styled(\"   RAM: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{:.1} GB\", plan.recommended.ram_gb),\n                Style::default().fg(tc.fg),\n            ),\n            Span::styled(\"   CPU: \", Style::default().fg(tc.muted)),\n            Span::styled(\n                format!(\"{} cores\", plan.recommended.cpu_cores),\n                Style::default().fg(tc.fg),\n            ),\n        ]));\n        lines.push(Line::from(\" \"));\n        lines.push(Line::from(Span::styled(\n            \"  Run Paths\",\n            Style::default().fg(tc.accent),\n        )));\n\n        for path in &plan.run_paths {\n            let path_color = if path.feasible { tc.good } else { tc.error };\n            let status = if path.feasible { \"yes\" } else { \"no\" };\n            lines.push(Line::from(vec![\n                Span::styled(\"  - \", 
Style::default().fg(tc.muted)),\n                Span::styled(path.path.label(), Style::default().fg(tc.fg).bold()),\n                Span::styled(\": \", Style::default().fg(tc.muted)),\n                Span::styled(status, Style::default().fg(path_color)),\n                Span::styled(\"  tps=\", Style::default().fg(tc.muted)),\n                Span::styled(\n                    path.estimated_tps\n                        .map(|t| format!(\"{t:.1}\"))\n                        .unwrap_or_else(|| \"-\".to_string()),\n                    Style::default().fg(tc.fg),\n                ),\n                Span::styled(\"  fit=\", Style::default().fg(tc.muted)),\n                Span::styled(\n                    path.fit_level\n                        .map(|f| match f {\n                            FitLevel::Perfect => \"Perfect\",\n                            FitLevel::Good => \"Good\",\n                            FitLevel::Marginal => \"Marginal\",\n                            FitLevel::TooTight => \"Too Tight\",\n                        })\n                        .unwrap_or(\"-\"),\n                    Style::default().fg(path_color),\n                ),\n            ]));\n        }\n\n        lines.push(Line::from(\" \"));\n        lines.push(Line::from(Span::styled(\n            \"  Upgrade Deltas\",\n            Style::default().fg(tc.accent),\n        )));\n        if plan.upgrade_deltas.is_empty() {\n            lines.push(Line::from(Span::styled(\n                \"  - none required\",\n                Style::default().fg(tc.good),\n            )));\n        } else {\n            for delta in &plan.upgrade_deltas {\n                lines.push(Line::from(Span::styled(\n                    format!(\"  - {}\", delta.description),\n                    Style::default().fg(tc.fg),\n                )));\n            }\n        }\n    }\n\n    let block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.border))\n        
.title(format!(\" Plan: {} \", model_name))\n        .title_style(Style::default().fg(tc.fg).bold());\n\n    let paragraph = Paragraph::new(lines)\n        .block(block)\n        .wrap(Wrap { trim: false });\n    frame.render_widget(paragraph, area);\n}\n\nfn draw_provider_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {\n    let area = frame.area();\n\n    let max_name_len = app.providers.iter().map(|p| p.len()).max().unwrap_or(10);\n    let popup_width = (max_name_len as u16 + 10).min(area.width.saturating_sub(4));\n    let popup_height = (app.providers.len() as u16 + 2).min(area.height.saturating_sub(4));\n\n    let x = area.x + (area.width.saturating_sub(popup_width)) / 2;\n    let y = area.y + (area.height.saturating_sub(popup_height)) / 2;\n    let popup_area = Rect::new(x, y, popup_width, popup_height);\n\n    frame.render_widget(Clear, popup_area);\n\n    let inner_height = popup_height.saturating_sub(2) as usize;\n    let total = app.providers.len();\n\n    let scroll_offset = if app.provider_cursor >= inner_height {\n        app.provider_cursor - inner_height + 1\n    } else {\n        0\n    };\n\n    let lines: Vec<Line> = app\n        .providers\n        .iter()\n        .enumerate()\n        .skip(scroll_offset)\n        .take(inner_height)\n        .map(|(i, name)| {\n            let checkbox = if app.selected_providers[i] {\n                \"[x]\"\n            } else {\n                \"[ ]\"\n            };\n            let is_cursor = i == app.provider_cursor;\n\n            let style = if is_cursor {\n                if app.selected_providers[i] {\n                    Style::default()\n                        .fg(tc.good)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                } else {\n                    Style::default()\n                        .fg(tc.fg)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                
}\n            } else if app.selected_providers[i] {\n                Style::default().fg(tc.good)\n            } else {\n                Style::default().fg(tc.muted)\n            };\n\n            Line::from(Span::styled(format!(\" {} {}\", checkbox, name), style))\n        })\n        .collect();\n\n    let active_count = app.selected_providers.iter().filter(|&&s| s).count();\n    let title = format!(\" Providers ({}/{}) \", active_count, total);\n\n    let block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.accent_secondary))\n        .title(title)\n        .title_style(\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD),\n        );\n\n    let paragraph = Paragraph::new(lines).block(block);\n    frame.render_widget(paragraph, popup_area);\n}\n\nfn draw_use_case_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {\n    let area = frame.area();\n\n    let max_name_len = app\n        .use_cases\n        .iter()\n        .map(|uc| uc.label().len())\n        .max()\n        .unwrap_or(10);\n    let popup_width = (max_name_len as u16 + 10).min(area.width.saturating_sub(4));\n    let popup_height = (app.use_cases.len() as u16 + 2).min(area.height.saturating_sub(4));\n\n    let x = area.x + (area.width.saturating_sub(popup_width)) / 2;\n    let y = area.y + (area.height.saturating_sub(popup_height)) / 2;\n    let popup_area = Rect::new(x, y, popup_width, popup_height);\n\n    frame.render_widget(Clear, popup_area);\n\n    let inner_height = popup_height.saturating_sub(2) as usize;\n    let total = app.use_cases.len();\n\n    let scroll_offset = if app.use_case_cursor >= inner_height {\n        app.use_case_cursor - inner_height + 1\n    } else {\n        0\n    };\n\n    let lines: Vec<Line> = app\n        .use_cases\n        .iter()\n        .enumerate()\n        .skip(scroll_offset)\n        .take(inner_height)\n        .map(|(i, use_case)| {\n   
         let checkbox = if app.selected_use_cases[i] {\n                \"[x]\"\n            } else {\n                \"[ ]\"\n            };\n            let is_cursor = i == app.use_case_cursor;\n\n            let style = if is_cursor {\n                if app.selected_use_cases[i] {\n                    Style::default()\n                        .fg(tc.good)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                } else {\n                    Style::default()\n                        .fg(tc.fg)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                }\n            } else if app.selected_use_cases[i] {\n                Style::default().fg(tc.good)\n            } else {\n                Style::default().fg(tc.muted)\n            };\n\n            Line::from(Span::styled(\n                format!(\" {} {}\", checkbox, use_case.label()),\n                style,\n            ))\n        })\n        .collect();\n\n    let active_count = app.selected_use_cases.iter().filter(|&&s| s).count();\n    let title = format!(\" Use Cases ({}/{}) \", active_count, total);\n\n    let block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.accent_secondary))\n        .title(title)\n        .title_style(\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD),\n        );\n\n    let paragraph = Paragraph::new(lines).block(block);\n    frame.render_widget(paragraph, popup_area);\n}\n\nfn draw_capability_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {\n    let area = frame.area();\n\n    let max_name_len = app\n        .capabilities\n        .iter()\n        .map(|c| c.label().len())\n        .max()\n        .unwrap_or(10);\n    let popup_width = (max_name_len as u16 + 10).min(area.width.saturating_sub(4));\n    let popup_height = 
(app.capabilities.len() as u16 + 2).min(area.height.saturating_sub(4));\n\n    let x = area.x + (area.width.saturating_sub(popup_width)) / 2;\n    let y = area.y + (area.height.saturating_sub(popup_height)) / 2;\n    let popup_area = Rect::new(x, y, popup_width, popup_height);\n\n    frame.render_widget(Clear, popup_area);\n\n    let inner_height = popup_height.saturating_sub(2) as usize;\n    let total = app.capabilities.len();\n\n    let scroll_offset = if app.capability_cursor >= inner_height {\n        app.capability_cursor - inner_height + 1\n    } else {\n        0\n    };\n\n    let lines: Vec<Line> = app\n        .capabilities\n        .iter()\n        .enumerate()\n        .skip(scroll_offset)\n        .take(inner_height)\n        .map(|(i, cap)| {\n            let checkbox = if app.selected_capabilities[i] {\n                \"[x]\"\n            } else {\n                \"[ ]\"\n            };\n            let is_cursor = i == app.capability_cursor;\n\n            let style = if is_cursor {\n                if app.selected_capabilities[i] {\n                    Style::default()\n                        .fg(tc.good)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                } else {\n                    Style::default()\n                        .fg(tc.fg)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                }\n            } else if app.selected_capabilities[i] {\n                Style::default().fg(tc.good)\n            } else {\n                Style::default().fg(tc.muted)\n            };\n\n            Line::from(Span::styled(\n                format!(\" {} {}\", checkbox, cap.label()),\n                style,\n            ))\n        })\n        .collect();\n\n    let active_count = app.selected_capabilities.iter().filter(|&&s| s).count();\n    let title = format!(\" Capabilities ({}/{}) \", active_count, total);\n\n    let block 
= Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.accent_secondary))\n        .title(title)\n        .title_style(\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD),\n        );\n\n    let paragraph = Paragraph::new(lines).block(block);\n    frame.render_widget(paragraph, popup_area);\n}\n\nfn draw_download_provider_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {\n    let area = frame.area();\n    let popup_width = 44.min(area.width.saturating_sub(4));\n    let popup_height = 8.min(area.height.saturating_sub(4));\n\n    let x = area.x + (area.width.saturating_sub(popup_width)) / 2;\n    let y = area.y + (area.height.saturating_sub(popup_height)) / 2;\n    let popup_area = Rect::new(x, y, popup_width, popup_height);\n\n    frame.render_widget(Clear, popup_area);\n\n    let mut lines = Vec::new();\n    if let Some(name) = &app.download_provider_model {\n        lines.push(Line::from(Span::styled(\n            format!(\" Model: {}\", name),\n            Style::default().fg(tc.muted),\n        )));\n        lines.push(Line::from(\"\"));\n    }\n\n    for (i, provider) in app.download_provider_options.iter().enumerate() {\n        let label = match provider {\n            DownloadProvider::Ollama => \"Ollama\",\n            DownloadProvider::Mlx => \"MLX\",\n            DownloadProvider::LlamaCpp => \"llama.cpp\",\n            DownloadProvider::DockerModelRunner => \"Docker Model Runner\",\n            DownloadProvider::LmStudio => \"LM Studio\",\n        };\n        let is_cursor = i == app.download_provider_cursor;\n        let prefix = if is_cursor { \">\" } else { \" \" };\n        let style = if is_cursor {\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD)\n                .bg(tc.highlight_bg)\n        } else {\n            Style::default().fg(tc.fg)\n        };\n        
lines.push(Line::from(Span::styled(\n            format!(\" {} {}\", prefix, label),\n            style,\n        )));\n    }\n\n    let block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.accent_secondary))\n        .title(\" Download With \")\n        .title_style(\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD),\n        );\n\n    let paragraph = Paragraph::new(lines).block(block);\n    frame.render_widget(paragraph, popup_area);\n}\n\nfn status_keys_and_mode(app: &App) -> (String, String) {\n    match app.input_mode {\n        InputMode::Normal => {\n            if app.show_multi_compare {\n                return (\n                    \" ←/→/hl:scroll  q/Esc:close\".to_string(),\n                    \"COMPARE\".to_string(),\n                );\n            }\n            let detail_key = if app.show_detail {\n                \"Enter:table\"\n            } else {\n                \"Enter:detail\"\n            };\n            let any_provider = app.ollama_available\n                || app.mlx_available\n                || app.llamacpp_available\n                || app.docker_mr_available\n                || app.lmstudio_available;\n            let ollama_keys = if any_provider {\n                let installed_key = if app.installed_first {\n                    \"i:all\"\n                } else {\n                    \"i:installed↑\"\n                };\n                format!(\"  {}  d:pull  r:refresh\", installed_key)\n            } else {\n                String::new()\n            };\n            (\n                format!(\n                    \" ↑↓/jk:nav  {}  /:search  f:fit  s:sort  v:visual  V:select  t:theme  p:plan  m:mark  c:compare  x:clear mark{}  P:providers  U:use cases  C:caps  q:quit  tok/s*:est\",\n                    detail_key, ollama_keys,\n                ),\n                \"NORMAL\".to_string(),\n            )\n      
  }\n        InputMode::Visual => {\n            let count = app.visual_selection_count();\n            (\n                format!(\n                    \" ↑↓/jk:extend  c:compare  m:mark  Esc:exit  ({} selected)\",\n                    count\n                ),\n                \"VISUAL\".to_string(),\n            )\n        }\n        InputMode::Select => {\n            let header_names = [\n                \"\", \"Inst\", \"Model\", \"Provider\", \"Params\", \"Score\", \"tok/s*\", \"Quant\", \"Mode\",\n                \"Mem %\", \"Ctx\", \"Date\", \"Fit\", \"Use Case\",\n            ];\n            let col_name = header_names.get(app.select_column).unwrap_or(&\"\");\n            (\n                format!(\" ←/→:column  ↑↓:nav  Enter:filter [{}]  Esc:exit\", col_name),\n                \"SELECT\".to_string(),\n            )\n        }\n        InputMode::Search => (\n            \"  Type to search  Esc:done  Ctrl-U:clear\".to_string(),\n            \"SEARCH\".to_string(),\n        ),\n        InputMode::Plan => (\n            \"  Tab/jk:field  ←/→:cursor  type:edit  Backspace/Delete  Ctrl-U:clear  Esc:close\"\n                .to_string(),\n            \"PLAN\".to_string(),\n        ),\n        InputMode::ProviderPopup => (\n            \"  ↑↓/jk:navigate  Space:toggle  a:all/none  Esc:close\".to_string(),\n            \"PROVIDERS\".to_string(),\n        ),\n        InputMode::UseCasePopup => (\n            \"  ↑↓/jk:navigate  Space:toggle  a:all/none  Esc:close\".to_string(),\n            \"USE CASES\".to_string(),\n        ),\n        InputMode::CapabilityPopup => (\n            \"  ↑↓/jk:navigate  Space:toggle  a:all/none  Esc:close\".to_string(),\n            \"CAPABILITIES\".to_string(),\n        ),\n        InputMode::DownloadProviderPopup => (\n            \"  ↑↓/jk:choose  Enter:download  Esc:cancel\".to_string(),\n            \"DOWNLOAD\".to_string(),\n        ),\n        InputMode::QuantPopup => (\n            \"  ↑↓/jk:navigate  Space:toggle  
a:all/none  Esc:close\".to_string(),\n            \"QUANT\".to_string(),\n        ),\n        InputMode::RunModePopup => (\n            \"  ↑↓/jk:navigate  Space:toggle  a:all/none  Esc:close\".to_string(),\n            \"RUN MODE\".to_string(),\n        ),\n        InputMode::ParamsBucketPopup => (\n            \"  ↑↓/jk:navigate  Space:toggle  a:all/none  Esc:close\".to_string(),\n            \"PARAMS\".to_string(),\n        ),\n    }\n}\n\nfn draw_status_bar(frame: &mut Frame, app: &App, area: Rect, tc: &ThemeColors) {\n    let (keys, mode_text) = status_keys_and_mode(app);\n\n    // If a download is in progress, show the progress bar\n    if let Some(status) = &app.pull_status {\n        let progress_text = if let Some(pct) = app.pull_percent {\n            format!(\" {} [{:.0}%] \", status, pct)\n        } else {\n            format!(\" {} \", status)\n        };\n\n        let chunks = Layout::default()\n            .direction(Direction::Horizontal)\n            .constraints([\n                Constraint::Min(20),\n                Constraint::Length(progress_text.len() as u16 + 2),\n            ])\n            .split(area);\n\n        let status_line = Line::from(vec![\n            Span::styled(\n                format!(\" {} \", mode_text),\n                Style::default().fg(tc.status_fg).bg(tc.status_bg).bold(),\n            ),\n            Span::styled(keys, Style::default().fg(tc.muted)),\n        ]);\n        frame.render_widget(Paragraph::new(status_line), chunks[0]);\n\n        let pull_color = if app.pull_active.is_some() {\n            tc.warning\n        } else {\n            tc.good\n        };\n        frame.render_widget(\n            Paragraph::new(Line::from(Span::styled(\n                progress_text,\n                Style::default().fg(pull_color),\n            ))),\n            chunks[1],\n        );\n        return;\n    }\n\n    let status_line = Line::from(vec![\n        Span::styled(\n            format!(\" {} \", mode_text),\n       
     Style::default().fg(tc.status_fg).bg(tc.status_bg).bold(),\n        ),\n        Span::styled(keys, Style::default().fg(tc.muted)),\n    ]);\n\n    frame.render_widget(Paragraph::new(status_line), area);\n}\n\nfn draw_quant_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {\n    let area = frame.area();\n\n    let max_name_len = app.quants.iter().map(|q| q.len()).max().unwrap_or(10);\n    let popup_width = (max_name_len as u16 + 10).min(area.width.saturating_sub(4));\n    let popup_height = (app.quants.len() as u16 + 2).min(area.height.saturating_sub(4));\n\n    let x = area.x + (area.width.saturating_sub(popup_width)) / 2;\n    let y = area.y + (area.height.saturating_sub(popup_height)) / 2;\n    let popup_area = Rect::new(x, y, popup_width, popup_height);\n\n    frame.render_widget(Clear, popup_area);\n\n    let inner_height = popup_height.saturating_sub(2) as usize;\n    let total = app.quants.len();\n\n    let scroll_offset = if app.quant_cursor >= inner_height {\n        app.quant_cursor - inner_height + 1\n    } else {\n        0\n    };\n\n    let lines: Vec<Line> = app\n        .quants\n        .iter()\n        .enumerate()\n        .skip(scroll_offset)\n        .take(inner_height)\n        .map(|(i, name)| {\n            let checkbox = if app.selected_quants[i] { \"[x]\" } else { \"[ ]\" };\n            let is_cursor = i == app.quant_cursor;\n\n            let style = if is_cursor {\n                if app.selected_quants[i] {\n                    Style::default()\n                        .fg(tc.good)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                } else {\n                    Style::default()\n                        .fg(tc.fg)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                }\n            } else if app.selected_quants[i] {\n                Style::default().fg(tc.good)\n            } else {\n                
Style::default().fg(tc.muted)\n            };\n\n            Line::from(Span::styled(format!(\" {} {}\", checkbox, name), style))\n        })\n        .collect();\n\n    let active_count = app.selected_quants.iter().filter(|&&s| s).count();\n    let title = format!(\" Quant ({}/{}) \", active_count, total);\n\n    let block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.accent_secondary))\n        .title(title)\n        .title_style(\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD),\n        );\n\n    let paragraph = Paragraph::new(lines).block(block);\n    frame.render_widget(paragraph, popup_area);\n}\n\nfn draw_run_mode_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {\n    let area = frame.area();\n\n    let max_name_len = app.run_modes.iter().map(|m| m.len()).max().unwrap_or(10);\n    let popup_width = (max_name_len as u16 + 10).min(area.width.saturating_sub(4));\n    let popup_height = (app.run_modes.len() as u16 + 2).min(area.height.saturating_sub(4));\n\n    let x = area.x + (area.width.saturating_sub(popup_width)) / 2;\n    let y = area.y + (area.height.saturating_sub(popup_height)) / 2;\n    let popup_area = Rect::new(x, y, popup_width, popup_height);\n\n    frame.render_widget(Clear, popup_area);\n\n    let inner_height = popup_height.saturating_sub(2) as usize;\n    let total = app.run_modes.len();\n\n    let scroll_offset = if app.run_mode_cursor >= inner_height {\n        app.run_mode_cursor - inner_height + 1\n    } else {\n        0\n    };\n\n    let lines: Vec<Line> = app\n        .run_modes\n        .iter()\n        .enumerate()\n        .skip(scroll_offset)\n        .take(inner_height)\n        .map(|(i, name)| {\n            let checkbox = if app.selected_run_modes[i] {\n                \"[x]\"\n            } else {\n                \"[ ]\"\n            };\n            let is_cursor = i == app.run_mode_cursor;\n\n        
    let style = if is_cursor {\n                if app.selected_run_modes[i] {\n                    Style::default()\n                        .fg(tc.good)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                } else {\n                    Style::default()\n                        .fg(tc.fg)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                }\n            } else if app.selected_run_modes[i] {\n                Style::default().fg(tc.good)\n            } else {\n                Style::default().fg(tc.muted)\n            };\n\n            Line::from(Span::styled(format!(\" {} {}\", checkbox, name), style))\n        })\n        .collect();\n\n    let active_count = app.selected_run_modes.iter().filter(|&&s| s).count();\n    let title = format!(\" Run Mode ({}/{}) \", active_count, total);\n\n    let block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.accent_secondary))\n        .title(title)\n        .title_style(\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD),\n        );\n\n    let paragraph = Paragraph::new(lines).block(block);\n    frame.render_widget(paragraph, popup_area);\n}\n\nfn draw_params_bucket_popup(frame: &mut Frame, app: &App, tc: &ThemeColors) {\n    let area = frame.area();\n\n    let max_name_len = app\n        .params_buckets\n        .iter()\n        .map(|b| b.len())\n        .max()\n        .unwrap_or(10);\n    let popup_width = (max_name_len as u16 + 10).min(area.width.saturating_sub(4));\n    let popup_height = (app.params_buckets.len() as u16 + 2).min(area.height.saturating_sub(4));\n\n    let x = area.x + (area.width.saturating_sub(popup_width)) / 2;\n    let y = area.y + (area.height.saturating_sub(popup_height)) / 2;\n    let popup_area = Rect::new(x, y, popup_width, popup_height);\n\n    
frame.render_widget(Clear, popup_area);\n\n    let inner_height = popup_height.saturating_sub(2) as usize;\n    let total = app.params_buckets.len();\n\n    let scroll_offset = if app.params_bucket_cursor >= inner_height {\n        app.params_bucket_cursor - inner_height + 1\n    } else {\n        0\n    };\n\n    let lines: Vec<Line> = app\n        .params_buckets\n        .iter()\n        .enumerate()\n        .skip(scroll_offset)\n        .take(inner_height)\n        .map(|(i, name)| {\n            let checkbox = if app.selected_params_buckets[i] {\n                \"[x]\"\n            } else {\n                \"[ ]\"\n            };\n            let is_cursor = i == app.params_bucket_cursor;\n\n            let style = if is_cursor {\n                if app.selected_params_buckets[i] {\n                    Style::default()\n                        .fg(tc.good)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                } else {\n                    Style::default()\n                        .fg(tc.fg)\n                        .add_modifier(Modifier::BOLD)\n                        .bg(tc.highlight_bg)\n                }\n            } else if app.selected_params_buckets[i] {\n                Style::default().fg(tc.good)\n            } else {\n                Style::default().fg(tc.muted)\n            };\n\n            Line::from(Span::styled(format!(\" {} {}\", checkbox, name), style))\n        })\n        .collect();\n\n    let active_count = app.selected_params_buckets.iter().filter(|&&s| s).count();\n    let title = format!(\" Params ({}/{}) \", active_count, total);\n\n    let block = Block::default()\n        .borders(Borders::ALL)\n        .border_style(Style::default().fg(tc.accent_secondary))\n        .title(title)\n        .title_style(\n            Style::default()\n                .fg(tc.accent_secondary)\n                .add_modifier(Modifier::BOLD),\n        );\n\n    let paragraph = 
Paragraph::new(lines).block(block);\n    frame.render_widget(paragraph, popup_area);\n}\n"
  },
  {
    "path": "llmfit-web/README.md",
    "content": "# llmfit-web\n\nReact + Vite frontend for the llmfit local web dashboard.\n\n## Development\n\n```sh\nnpm ci\nnpm run dev\n```\n\nThis starts Vite on `http://127.0.0.1:5173` and proxies `/api/*` to `http://127.0.0.1:8787`.\n\n## Build\n\n```sh\nnpm run build\n```\n\nBuild output is written to `llmfit-web/dist` and embedded into `llmfit serve` at compile time.\n"
  },
  {
    "path": "llmfit-web/index.html",
    "content": "<!doctype html>\n<html lang=\"en\">\n  <head>\n    <meta charset=\"UTF-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\" />\n    <title>llmfit Web Dashboard</title>\n  </head>\n  <body>\n    <div id=\"root\"></div>\n    <script type=\"module\" src=\"/src/main.jsx\"></script>\n  </body>\n</html>\n"
  },
  {
    "path": "llmfit-web/package.json",
    "content": "{\n  \"name\": \"llmfit-web\",\n  \"private\": true,\n  \"version\": \"0.1.0\",\n  \"type\": \"module\",\n  \"scripts\": {\n    \"dev\": \"vite\",\n    \"build\": \"vite build\",\n    \"preview\": \"vite preview\",\n    \"test\": \"vitest run\",\n    \"test:watch\": \"vitest\"\n  },\n  \"dependencies\": {\n    \"react\": \"^18.2.0\",\n    \"react-dom\": \"^18.2.0\"\n  },\n  \"devDependencies\": {\n    \"@testing-library/jest-dom\": \"^6.6.3\",\n    \"@testing-library/react\": \"^16.2.0\",\n    \"@vitejs/plugin-react\": \"^4.3.4\",\n    \"jsdom\": \"^26.0.0\",\n    \"vite\": \"^5.4.12\",\n    \"vitest\": \"^2.1.8\"\n  }\n}\n"
  },
  {
    "path": "llmfit-web/src/App.jsx",
    "content": "import { useEffect, useMemo, useState } from 'react';\nimport { DEFAULT_FILTERS, fetchModels, fetchSystemInfo } from './api';\n\nconst THEME_KEY = 'llmfit-theme';\n\nconst FIT_OPTIONS = [\n  { value: 'marginal', label: 'Runnable (Marginal+)' },\n  { value: 'good', label: 'Good or better' },\n  { value: 'perfect', label: 'Perfect only' },\n  { value: 'too_tight', label: 'Too-tight only' },\n  { value: 'all', label: 'All levels' }\n];\n\nconst RUNTIME_OPTIONS = [\n  { value: 'any', label: 'Any runtime' },\n  { value: 'mlx', label: 'MLX' },\n  { value: 'llamacpp', label: 'llama.cpp' },\n  { value: 'vllm', label: 'vLLM' }\n];\n\nconst USE_CASE_OPTIONS = [\n  { value: 'all', label: 'All use cases' },\n  { value: 'general', label: 'General' },\n  { value: 'coding', label: 'Coding' },\n  { value: 'reasoning', label: 'Reasoning' },\n  { value: 'chat', label: 'Chat' },\n  { value: 'multimodal', label: 'Multimodal' },\n  { value: 'embedding', label: 'Embedding' }\n];\n\nconst LIMIT_OPTIONS = [\n  { value: '10', label: '10' },\n  { value: '20', label: '20' },\n  { value: '50', label: '50' },\n  { value: '100', label: '100' },\n  { value: '200', label: '200' },\n  { value: '', label: 'All' }\n];\n\nconst SORT_OPTIONS = [\n  { value: 'score', label: 'Sort: Score' },\n  { value: 'tps', label: 'Sort: TPS' },\n  { value: 'params', label: 'Sort: Params' },\n  { value: 'mem', label: 'Sort: Memory' },\n  { value: 'ctx', label: 'Sort: Context' },\n  { value: 'date', label: 'Sort: Release date' },\n  { value: 'use_case', label: 'Sort: Use case' }\n];\n\nfunction initialTheme() {\n  if (typeof window === 'undefined') {\n    return 'light';\n  }\n\n  const stored = window.localStorage.getItem(THEME_KEY);\n  if (stored === 'light' || stored === 'dark') {\n    return stored;\n  }\n\n  return window.matchMedia?.('(prefers-color-scheme: dark)').matches ? 
'dark' : 'light';\n}\n\nfunction round(value, digits = 1) {\n  if (typeof value !== 'number' || Number.isNaN(value)) {\n    return '—';\n  }\n  return value.toFixed(digits);\n}\n\nfunction fitClass(code) {\n  return `fit fit-${code || 'unknown'}`;\n}\n\nfunction modeClass(code) {\n  return `mode mode-${code || 'unknown'}`;\n}\n\nfunction SystemCard({ label, value, detail }) {\n  return (\n    <article className=\"system-card\">\n      <p className=\"system-label\">{label}</p>\n      <p className=\"system-value\">{value}</p>\n      {detail ? <p className=\"system-detail\">{detail}</p> : null}\n    </article>\n  );\n}\n\nfunction MetricBar({ label, value }) {\n  const safe = Number.isFinite(value) ? Math.max(0, Math.min(value, 100)) : 0;\n  return (\n    <div className=\"metric-row\">\n      <div className=\"metric-text\">\n        <span>{label}</span>\n        <span>{round(value, 1)}</span>\n      </div>\n      <div className=\"metric-track\">\n        <div className=\"metric-fill\" style={{ width: `${safe}%` }} />\n      </div>\n    </div>\n  );\n}\n\nfunction fitRank(level) {\n  switch (level) {\n    case 'perfect':\n      return 3;\n    case 'good':\n      return 2;\n    case 'marginal':\n      return 1;\n    case 'too_tight':\n      return 0;\n    default:\n      return -1;\n  }\n}\n\nfunction applyClientFitFilter(models, minFit) {\n  const list = Array.isArray(models) ? 
models : [];\n  if (minFit === 'all') {\n    return list;\n  }\n  if (minFit === 'too_tight') {\n    return list.filter((model) => model.fit_level === 'too_tight');\n  }\n\n  const threshold = fitRank(minFit);\n  return list.filter((model) => {\n    const rank = fitRank(model.fit_level);\n    return rank >= threshold;\n  });\n}\n\nexport default function App() {\n  const [theme, setTheme] = useState(initialTheme);\n  const [filters, setFilters] = useState(DEFAULT_FILTERS);\n  const [systemState, setSystemState] = useState({\n    loading: true,\n    error: '',\n    payload: null\n  });\n  const [modelsState, setModelsState] = useState({\n    loading: true,\n    error: '',\n    models: [],\n    total: 0,\n    returned: 0\n  });\n  const [selectedModelName, setSelectedModelName] = useState(null);\n  const [refreshTick, setRefreshTick] = useState(0);\n\n  useEffect(() => {\n    document.documentElement.dataset.theme = theme;\n    window.localStorage.setItem(THEME_KEY, theme);\n  }, [theme]);\n\n  useEffect(() => {\n    const controller = new AbortController();\n\n    async function loadSystem() {\n      setSystemState((prev) => ({ ...prev, loading: true, error: '' }));\n      try {\n        const payload = await fetchSystemInfo(controller.signal);\n        setSystemState({ loading: false, error: '', payload });\n      } catch (error) {\n        if (controller.signal.aborted) {\n          return;\n        }\n        setSystemState({\n          loading: false,\n          error: error instanceof Error ? 
error.message : 'Unable to load system details.',\n          payload: null\n        });\n      }\n    }\n\n    loadSystem();\n    return () => controller.abort();\n  }, [refreshTick]);\n\n  useEffect(() => {\n    const controller = new AbortController();\n\n    async function loadModels() {\n      setModelsState((prev) => ({ ...prev, loading: true, error: '' }));\n      try {\n        const payload = await fetchModels(filters, controller.signal);\n        const fetchedModels = Array.isArray(payload.models) ? payload.models : [];\n        const fitFiltered = applyClientFitFilter(fetchedModels, filters.minFit);\n        const limit = Number.parseInt(filters.limit, 10);\n        const models = Number.isFinite(limit) && limit > 0 ? fitFiltered.slice(0, limit) : fitFiltered;\n        const serverTotal =\n          typeof payload.total_models === 'number' && Number.isFinite(payload.total_models)\n            ? payload.total_models\n            : fitFiltered.length;\n        const total = filters.minFit === 'too_tight' ? fitFiltered.length : serverTotal;\n        setModelsState({\n          loading: false,\n          error: '',\n          models,\n          total,\n          returned: models.length\n        });\n\n        setSelectedModelName((current) => {\n          if (!current) {\n            return models[0]?.name ?? null;\n          }\n          const stillVisible = models.some((model) => model.name === current);\n          return stillVisible ? current : models[0]?.name ?? null;\n        });\n      } catch (error) {\n        if (controller.signal.aborted) {\n          return;\n        }\n        setModelsState({\n          loading: false,\n          error: error instanceof Error ? 
error.message : 'Unable to load model fits.',\n          models: [],\n          total: 0,\n          returned: 0\n        });\n        setSelectedModelName(null);\n      }\n    }\n\n    loadModels();\n    return () => controller.abort();\n  }, [filters, refreshTick]);\n\n  const selectedModel = useMemo(\n    () => modelsState.models.find((model) => model.name === selectedModelName) ?? null,\n    [modelsState.models, selectedModelName]\n  );\n\n  const handleFieldChange = (field) => (event) => {\n    const value = event.target.type === 'checkbox' ? event.target.checked : event.target.value;\n    setFilters((current) => ({\n      ...current,\n      [field]: value\n    }));\n  };\n\n  const gpus = systemState.payload?.system?.gpus ?? [];\n  const gpuSummary =\n    gpus.length === 0\n      ? 'No GPU detected'\n      : gpus\n          .map((gpu) => `${gpu.name}${gpu.vram_gb ? ` (${round(gpu.vram_gb, 1)} GB)` : ''}`)\n          .join(', ');\n\n  return (\n    <div className=\"page-shell\">\n      <div className=\"orb orb-one\" aria-hidden=\"true\" />\n      <div className=\"orb orb-two\" aria-hidden=\"true\" />\n\n      <header className=\"hero-shell\">\n        <div>\n          <p className=\"hero-eyebrow\">Local LLM Planning</p>\n          <h1>llmfit Dashboard</h1>\n          <p className=\"hero-copy\">Hundreds of models &amp; providers. One command to find what runs on your hardware.</p>\n        </div>\n\n        <div className=\"hero-actions\">\n          <button type=\"button\" onClick={() => setFilters(DEFAULT_FILTERS)} className=\"btn btn-ghost\">\n            Reset filters\n          </button>\n          <button type=\"button\" onClick={() => setRefreshTick((tick) => tick + 1)} className=\"btn btn-accent\">\n            Refresh\n          </button>\n          <button\n            type=\"button\"\n            onClick={() => setTheme((current) => (current === 'dark' ? 
'light' : 'dark'))}\n            className=\"btn btn-theme\"\n          >\n            {theme === 'dark' ? 'Light mode' : 'Dark mode'}\n          </button>\n        </div>\n      </header>\n\n      <section className=\"panel system-panel\">\n        <div className=\"panel-heading\">\n          <h2>System Summary</h2>\n          {systemState.payload?.node ? (\n            <span className=\"chip\">\n              {systemState.payload.node.name} · {systemState.payload.node.os}\n            </span>\n          ) : null}\n        </div>\n\n        {systemState.error ? (\n          <div role=\"alert\" className=\"alert error\">\n            Could not load system information: {systemState.error}. Make sure `llmfit serve` is running.\n          </div>\n        ) : null}\n\n        <div className=\"system-grid\" aria-busy={systemState.loading}>\n          <SystemCard\n            label=\"CPU\"\n            value={systemState.payload?.system?.cpu_name ?? (systemState.loading ? 'Loading…' : '—')}\n            detail={\n              systemState.payload?.system?.cpu_cores\n                ? `${systemState.payload.system.cpu_cores} cores`\n                : undefined\n            }\n          />\n          <SystemCard\n            label=\"Total RAM\"\n            value={\n              systemState.payload?.system?.total_ram_gb\n                ? `${round(systemState.payload.system.total_ram_gb, 1)} GB`\n                : '—'\n            }\n          />\n          <SystemCard\n            label=\"Available RAM\"\n            value={\n              systemState.payload?.system?.available_ram_gb\n                ? `${round(systemState.payload.system.available_ram_gb, 1)} GB`\n                : '—'\n            }\n          />\n          <SystemCard\n            label=\"GPU\"\n            value={gpuSummary}\n            detail={\n              systemState.payload?.system?.unified_memory\n                ? 
'Unified memory (CPU + GPU shared)'\n                : undefined\n            }\n          />\n        </div>\n      </section>\n\n      <section className=\"panel models-panel\">\n        <div className=\"panel-heading\">\n          <h2>Model Fit Explorer</h2>\n          <span className=\"chip\">\n            {modelsState.returned} shown / {modelsState.total} matched\n          </span>\n        </div>\n\n        <div className=\"filters-shell\">\n          <label>\n            <span>Search</span>\n            <input\n              type=\"text\"\n              value={filters.search}\n              onChange={handleFieldChange('search')}\n              placeholder=\"model, provider, use case\"\n            />\n          </label>\n\n          <label>\n            <span>Fit filter</span>\n            <select value={filters.minFit} onChange={handleFieldChange('minFit')}>\n              {FIT_OPTIONS.map((option) => (\n                <option key={option.value} value={option.value}>\n                  {option.label}\n                </option>\n              ))}\n            </select>\n          </label>\n\n          <label>\n            <span>Runtime</span>\n            <select value={filters.runtime} onChange={handleFieldChange('runtime')}>\n              {RUNTIME_OPTIONS.map((option) => (\n                <option key={option.value} value={option.value}>\n                  {option.label}\n                </option>\n              ))}\n            </select>\n          </label>\n\n          <label>\n            <span>Use case</span>\n            <select value={filters.useCase} onChange={handleFieldChange('useCase')}>\n              {USE_CASE_OPTIONS.map((option) => (\n                <option key={option.value} value={option.value}>\n                  {option.label}\n                </option>\n              ))}\n            </select>\n          </label>\n\n          <label>\n            <span>Provider</span>\n            <input\n              type=\"text\"\n              
value={filters.provider}\n              onChange={handleFieldChange('provider')}\n              placeholder=\"Meta, Qwen, Mistral\"\n            />\n          </label>\n\n          <label>\n            <span>Sort</span>\n            <select value={filters.sort} onChange={handleFieldChange('sort')}>\n              {SORT_OPTIONS.map((option) => (\n                <option key={option.value} value={option.value}>\n                  {option.label}\n                </option>\n              ))}\n            </select>\n          </label>\n\n          <label>\n            <span>Limit</span>\n            <select value={String(filters.limit)} onChange={handleFieldChange('limit')}>\n              {LIMIT_OPTIONS.map((option) => (\n                <option key={option.value} value={option.value}>\n                  {option.label}\n                </option>\n              ))}\n            </select>\n          </label>\n        </div>\n\n        {modelsState.error ? (\n          <div role=\"alert\" className=\"alert error\">\n            Could not load models: {modelsState.error}. Confirm this page is opened from `llmfit serve`.\n          </div>\n        ) : null}\n\n        <div className=\"models-layout\" aria-busy={modelsState.loading}>\n          <div className=\"table-wrap\">\n            <table>\n              <thead>\n                <tr>\n                  <th>Model</th>\n                  <th>Provider</th>\n                  <th>Params</th>\n                  <th>Fit</th>\n                  <th>Mode</th>\n                  <th>Runtime</th>\n                  <th>Score</th>\n                  <th>TPS</th>\n                  <th>Mem%</th>\n                  <th>Context</th>\n                  <th>Release</th>\n                </tr>\n              </thead>\n              <tbody>\n                {modelsState.loading ? 
(\n                  <tr>\n                    <td colSpan=\"11\" className=\"table-status\">\n                      Loading model fit data…\n                    </td>\n                  </tr>\n                ) : null}\n\n                {!modelsState.loading && modelsState.models.length === 0 ? (\n                  <tr>\n                    <td colSpan=\"11\" className=\"table-status\">\n                      No models match the current filters.\n                    </td>\n                  </tr>\n                ) : null}\n\n                {!modelsState.loading\n                  ? modelsState.models.map((model) => (\n                      <tr\n                        key={model.name}\n                        className={model.name === selectedModelName ? 'selected' : ''}\n                        onClick={() => setSelectedModelName(model.name)}\n                      >\n                        <td className=\"model-name\">{model.name}</td>\n                        <td>{model.provider}</td>\n                        <td>{round(model.params_b, 1)}B</td>\n                        <td>\n                          <span className={fitClass(model.fit_level)}>{model.fit_label}</span>\n                        </td>\n                        <td>\n                          <span className={modeClass(model.run_mode)}>{model.run_mode_label}</span>\n                        </td>\n                        <td>{model.runtime_label}</td>\n                        <td>{round(model.score, 1)}</td>\n                        <td>{round(model.estimated_tps, 1)}</td>\n                        <td>{round(model.utilization_pct, 1)}</td>\n                        <td>{model.context_length?.toLocaleString?.() ?? model.context_length ?? '—'}</td>\n                        <td>{model.release_date ?? 
'—'}</td>\n                      </tr>\n                    ))\n                  : null}\n              </tbody>\n            </table>\n          </div>\n\n          <aside className=\"details-panel\">\n            {selectedModel ? (\n              <>\n                <div className=\"details-header\">\n                  <h3>{selectedModel.name}</h3>\n                  <span className={fitClass(selectedModel.fit_level)}>{selectedModel.fit_label}</span>\n                </div>\n\n                <dl className=\"details-grid\">\n                  <div>\n                    <dt>Provider</dt>\n                    <dd>{selectedModel.provider}</dd>\n                  </div>\n                  <div>\n                    <dt>Run mode</dt>\n                    <dd>{selectedModel.run_mode_label}</dd>\n                  </div>\n                  <div>\n                    <dt>Runtime</dt>\n                    <dd>{selectedModel.runtime_label}</dd>\n                  </div>\n                  <div>\n                    <dt>Best quant</dt>\n                    <dd>{selectedModel.best_quant}</dd>\n                  </div>\n                  <div>\n                    <dt>Memory required</dt>\n                    <dd>{round(selectedModel.memory_required_gb, 2)} GB</dd>\n                  </div>\n                  <div>\n                    <dt>Memory available</dt>\n                    <dd>{round(selectedModel.memory_available_gb, 2)} GB</dd>\n                  </div>\n                </dl>\n\n                <div className=\"metrics-card\">\n                  <h4>Score Breakdown</h4>\n                  <MetricBar label=\"Quality\" value={selectedModel.score_components?.quality} />\n                  <MetricBar label=\"Speed\" value={selectedModel.score_components?.speed} />\n                  <MetricBar label=\"Fit\" value={selectedModel.score_components?.fit} />\n                  <MetricBar label=\"Context\" value={selectedModel.score_components?.context} />\n                
</div>\n\n                <div className=\"metrics-card\">\n                  <h4>Performance</h4>\n                  <MetricBar label=\"Memory Utilization %\" value={selectedModel.utilization_pct} />\n                  <div className=\"kpi-grid\">\n                    <div>\n                      <span>Composite score</span>\n                      <strong>{round(selectedModel.score, 1)}</strong>\n                    </div>\n                    <div>\n                      <span>Estimated TPS</span>\n                      <strong>{round(selectedModel.estimated_tps, 1)}</strong>\n                    </div>\n                  </div>\n                </div>\n\n                {Array.isArray(selectedModel.notes) && selectedModel.notes.length > 0 ? (\n                  <div className=\"metrics-card\">\n                    <h4>Notes</h4>\n                    <ul>\n                      {selectedModel.notes.map((note) => (\n                        <li key={note}>{note}</li>\n                      ))}\n                    </ul>\n                  </div>\n                ) : (\n                  <p className=\"muted-copy\">No additional notes for this model fit.</p>\n                )}\n              </>\n            ) : (\n              <p className=\"muted-copy\">Select a model row to inspect detailed fit diagnostics.</p>\n            )}\n          </aside>\n        </div>\n      </section>\n    </div>\n  );\n}\n"
  },
  {
    "path": "llmfit-web/src/App.test.jsx",
    "content": "import { fireEvent, render, screen, waitFor } from '@testing-library/react';\nimport App from './App';\n\nfunction jsonResponse(payload, { ok = true, status = 200 } = {}) {\n  return {\n    ok,\n    status,\n    json: async () => payload\n  };\n}\n\nconst systemPayload = {\n  node: { name: 'local-node', os: 'darwin' },\n  system: {\n    cpu_name: 'Apple M3 Max',\n    cpu_cores: 14,\n    total_ram_gb: 64,\n    available_ram_gb: 51.4,\n    gpus: [{ name: 'Apple GPU', vram_gb: 64 }],\n    unified_memory: true\n  }\n};\n\nconst modelsPayload = {\n  total_models: 3,\n  returned_models: 3,\n  models: [\n    {\n      name: 'Qwen/Qwen2.5-7B-Instruct',\n      provider: 'Qwen',\n      params_b: 7,\n      fit_level: 'good',\n      fit_label: 'Good',\n      run_mode: 'gpu',\n      run_mode_label: 'GPU',\n      runtime: 'llamacpp',\n      runtime_label: 'llama.cpp',\n      score: 86,\n      estimated_tps: 34.5,\n      utilization_pct: 58.9,\n      memory_required_gb: 7.4,\n      memory_available_gb: 12.5,\n      context_length: 32768,\n      best_quant: 'Q5_K_M',\n      release_date: '2025-02-01',\n      score_components: {\n        quality: 87,\n        speed: 80,\n        fit: 90,\n        context: 85\n      },\n      notes: ['Runs smoothly on most laptops']\n    },\n    {\n      name: 'meta-llama/Llama-3.1-8B-Instruct',\n      provider: 'Meta',\n      params_b: 8,\n      fit_level: 'marginal',\n      fit_label: 'Marginal',\n      run_mode: 'cpu_offload',\n      run_mode_label: 'CPU Offload',\n      runtime: 'llamacpp',\n      runtime_label: 'llama.cpp',\n      score: 74,\n      estimated_tps: 19.2,\n      utilization_pct: 87.5,\n      memory_required_gb: 10.1,\n      memory_available_gb: 11.5,\n      context_length: 8192,\n      best_quant: 'Q4_K_M',\n      release_date: '2024-11-10',\n      score_components: {\n        quality: 78,\n        speed: 66,\n        fit: 72,\n        context: 74\n      },\n      notes: []\n    },\n    {\n      name: 
'LargeModel/220B-Preview',\n      provider: 'Example',\n      params_b: 220,\n      fit_level: 'too_tight',\n      fit_label: 'Too Tight',\n      run_mode: 'cpu_only',\n      run_mode_label: 'CPU Only',\n      runtime: 'llamacpp',\n      runtime_label: 'llama.cpp',\n      score: 44,\n      estimated_tps: 1.9,\n      utilization_pct: 165.2,\n      memory_required_gb: 92.4,\n      memory_available_gb: 56.0,\n      context_length: 32768,\n      best_quant: 'Q2_K',\n      release_date: '2025-01-02',\n      score_components: {\n        quality: 95,\n        speed: 8,\n        fit: 10,\n        context: 62\n      },\n      notes: ['Requires substantially more memory than this system']\n    }\n  ]\n};\n\ndescribe('App', () => {\n  afterEach(() => {\n    vi.unstubAllGlobals();\n    window.localStorage.clear();\n  });\n\n  it('renders models and refetches when sort changes', async () => {\n    const fetchMock = vi.fn((url) => {\n      const target = String(url);\n      if (target.includes('/api/v1/system')) {\n        return Promise.resolve(jsonResponse(systemPayload));\n      }\n      if (target.includes('/api/v1/models')) {\n        return Promise.resolve(jsonResponse(modelsPayload));\n      }\n      return Promise.reject(new Error(`Unexpected URL: ${target}`));\n    });\n\n    vi.stubGlobal('fetch', fetchMock);\n\n    render(<App />);\n\n    await screen.findAllByText('Qwen/Qwen2.5-7B-Instruct');\n    expect(screen.getByText('meta-llama/Llama-3.1-8B-Instruct')).toBeInTheDocument();\n\n    fireEvent.change(screen.getByLabelText('Sort'), { target: { value: 'tps' } });\n\n    await waitFor(() => {\n      const queriedWithTps = fetchMock.mock.calls.some(([url]) => String(url).includes('sort=tps'));\n      expect(queriedWithTps).toBe(true);\n    });\n  });\n\n  it('opens detail diagnostics when a model row is selected', async () => {\n    vi.stubGlobal('fetch', vi.fn((url) => {\n      const target = String(url);\n      if (target.includes('/api/v1/system')) {\n        return 
Promise.resolve(jsonResponse(systemPayload));\n      }\n      return Promise.resolve(jsonResponse(modelsPayload));\n    }));\n\n    render(<App />);\n\n    const modelCell = (await screen.findAllByText('Qwen/Qwen2.5-7B-Instruct'))[0];\n    fireEvent.click(modelCell);\n\n    expect(screen.getByText('Score Breakdown')).toBeInTheDocument();\n    expect(screen.getByText('Runs smoothly on most laptops')).toBeInTheDocument();\n  });\n\n  it('shows actionable error message when model fetch fails', async () => {\n    vi.stubGlobal(\n      'fetch',\n      vi.fn((url) => {\n        const target = String(url);\n        if (target.includes('/api/v1/system')) {\n          return Promise.resolve(jsonResponse(systemPayload));\n        }\n        return Promise.resolve(jsonResponse({ error: 'backend unavailable' }, { ok: false, status: 500 }));\n      })\n    );\n\n    render(<App />);\n\n    const alert = await screen.findByRole('alert');\n    expect(alert).toHaveTextContent('Could not load models: backend unavailable');\n    expect(alert).toHaveTextContent('llmfit serve');\n  });\n\n  it('toggles between light and dark theme', async () => {\n    vi.stubGlobal('fetch', vi.fn((url) => {\n      const target = String(url);\n      if (target.includes('/api/v1/system')) {\n        return Promise.resolve(jsonResponse(systemPayload));\n      }\n      return Promise.resolve(jsonResponse(modelsPayload));\n    }));\n\n    render(<App />);\n\n    const toggle = await screen.findByRole('button', { name: 'Dark mode' });\n    fireEvent.click(toggle);\n\n    expect(await screen.findByRole('button', { name: 'Light mode' })).toBeInTheDocument();\n    expect(document.documentElement.dataset.theme).toBe('dark');\n  });\n\n  it('can filter to too-tight only', async () => {\n    vi.stubGlobal('fetch', vi.fn((url) => {\n      const target = String(url);\n      if (target.includes('/api/v1/system')) {\n        return Promise.resolve(jsonResponse(systemPayload));\n      }\n      return 
Promise.resolve(jsonResponse(modelsPayload));\n    }));\n\n    render(<App />);\n\n    await screen.findAllByText('Qwen/Qwen2.5-7B-Instruct');\n    fireEvent.change(screen.getByLabelText('Fit filter'), { target: { value: 'too_tight' } });\n\n    expect(await screen.findAllByText('LargeModel/220B-Preview')).not.toHaveLength(0);\n    expect(screen.queryAllByText('Qwen/Qwen2.5-7B-Instruct')).toHaveLength(0);\n  });\n});\n"
  },
  {
    "path": "llmfit-web/src/api.js",
    "content": "export const DEFAULT_FILTERS = {\n  search: '',\n  minFit: 'marginal',\n  runtime: 'any',\n  useCase: 'all',\n  provider: '',\n  sort: 'score',\n  limit: '50'\n};\n\nfunction trimOrEmpty(value) {\n  return typeof value === 'string' ? value.trim() : '';\n}\n\nexport function buildModelsQuery(filters) {\n  const params = new URLSearchParams();\n\n  const search = trimOrEmpty(filters.search);\n  if (search) {\n    params.set('search', search);\n  }\n\n  const provider = trimOrEmpty(filters.provider);\n  if (provider) {\n    params.set('provider', provider);\n  }\n\n  const minFit = filters.minFit || 'marginal';\n  const needsClientFitProcessing = minFit === 'too_tight';\n\n  if (minFit === 'all' || minFit === 'too_tight') {\n    // too_tight is the lowest level, so this returns all fits.\n    // We post-filter client-side for the too-tight-only mode.\n    params.set('min_fit', 'too_tight');\n    params.set('include_too_tight', 'true');\n  } else {\n    params.set('min_fit', minFit);\n    params.set('include_too_tight', 'false');\n  }\n\n  if (filters.runtime && filters.runtime !== 'any') {\n    params.set('runtime', filters.runtime);\n  }\n\n  if (filters.useCase && filters.useCase !== 'all') {\n    params.set('use_case', filters.useCase);\n  }\n\n  if (filters.sort) {\n    params.set('sort', filters.sort);\n  }\n\n  const limit = Number.parseInt(filters.limit, 10);\n  if (!needsClientFitProcessing && Number.isFinite(limit) && limit > 0) {\n    params.set('limit', String(limit));\n  }\n\n  return params.toString();\n}\n\nasync function parseJsonOrThrow(response) {\n  let payload;\n  try {\n    payload = await response.json();\n  } catch (err) {\n    throw new Error('Server returned an invalid JSON response.');\n  }\n\n  if (!response.ok) {\n    const message = payload?.error || `Request failed with status ${response.status}.`;\n    throw new Error(message);\n  }\n\n  return payload;\n}\n\nexport async function fetchSystemInfo(signal) {\n  const 
response = await fetch('/api/v1/system', { signal });\n  return parseJsonOrThrow(response);\n}\n\nexport async function fetchModels(filters, signal) {\n  const query = buildModelsQuery(filters);\n  const path = query ? `/api/v1/models?${query}` : '/api/v1/models';\n  const response = await fetch(path, { signal });\n  return parseJsonOrThrow(response);\n}\n"
  },
  {
    "path": "llmfit-web/src/api.test.js",
    "content": "import { buildModelsQuery } from './api';\n\ndescribe('buildModelsQuery', () => {\n  it('maps filter state to API query parameters', () => {\n    const query = buildModelsQuery({\n      search: 'qwen',\n      minFit: 'good',\n      runtime: 'llamacpp',\n      useCase: 'coding',\n      provider: 'Qwen',\n      sort: 'tps',\n      limit: 25\n    });\n\n    const params = new URLSearchParams(query);\n    expect(params.get('search')).toBe('qwen');\n    expect(params.get('min_fit')).toBe('good');\n    expect(params.get('runtime')).toBe('llamacpp');\n    expect(params.get('use_case')).toBe('coding');\n    expect(params.get('provider')).toBe('Qwen');\n    expect(params.get('sort')).toBe('tps');\n    expect(params.get('limit')).toBe('25');\n    expect(params.get('include_too_tight')).toBe('false');\n  });\n\n  it('requests broad fit set for too-tight-only mode', () => {\n    const query = buildModelsQuery({\n      search: '',\n      minFit: 'too_tight',\n      runtime: 'any',\n      useCase: 'all',\n      provider: '',\n      sort: 'score',\n      limit: 25\n    });\n\n    const params = new URLSearchParams(query);\n    expect(params.get('min_fit')).toBe('too_tight');\n    expect(params.get('include_too_tight')).toBe('true');\n    expect(params.get('limit')).toBeNull();\n  });\n\n  it('uses broad query mode for all-level filter', () => {\n    const query = buildModelsQuery({\n      search: '',\n      minFit: 'all',\n      runtime: 'any',\n      useCase: 'all',\n      provider: ' ',\n      sort: 'score',\n      limit: ''\n    });\n\n    const params = new URLSearchParams(query);\n    expect(params.get('search')).toBeNull();\n    expect(params.get('min_fit')).toBe('too_tight');\n    expect(params.get('runtime')).toBeNull();\n    expect(params.get('use_case')).toBeNull();\n    expect(params.get('provider')).toBeNull();\n    expect(params.get('sort')).toBe('score');\n    expect(params.get('limit')).toBeNull();\n    
expect(params.get('include_too_tight')).toBe('true');\n  });\n});\n"
  },
  {
    "path": "llmfit-web/src/main.jsx",
    "content": "import React from 'react';\nimport ReactDOM from 'react-dom/client';\nimport App from './App';\nimport './styles.css';\n\nReactDOM.createRoot(document.getElementById('root')).render(\n  <React.StrictMode>\n    <App />\n  </React.StrictMode>\n);\n"
  },
  {
    "path": "llmfit-web/src/styles.css",
    "content": ":root {\n  --font-ui: \"Sora\", \"Manrope\", \"Avenir Next\", \"Segoe UI\", sans-serif;\n  --radius-xl: 22px;\n  --radius-lg: 16px;\n  --radius-md: 12px;\n  --radius-sm: 10px;\n  --ease-out: 240ms cubic-bezier(0.2, 0.8, 0.2, 1);\n}\n\n:root[data-theme='light'] {\n  --bg: #f5f6f0;\n  --bg-accent: #fce9d7;\n  --panel: #fffef7;\n  --panel-2: #ffffff;\n  --line: #d8d4c9;\n  --ink: #18212f;\n  --muted: #5f697a;\n  --accent: #e6673e;\n  --accent-soft: #fff0e9;\n  --good: #1d8f56;\n  --good-soft: #e8f8ef;\n  --warn: #a56a23;\n  --warn-soft: #fff4e7;\n  --danger: #b43d3d;\n  --danger-soft: #ffecec;\n  --table-head: #ece9df;\n  --row-hover: #fff4e9;\n  --row-selected: #ffe4d4;\n  --shadow: 0 20px 45px rgba(36, 31, 22, 0.09);\n}\n\n:root[data-theme='dark'] {\n  --bg: #15171d;\n  --bg-accent: #2a1f1a;\n  --panel: #1d212b;\n  --panel-2: #232836;\n  --line: #333a4f;\n  --ink: #edf1ff;\n  --muted: #a7b0c8;\n  --accent: #ff9c66;\n  --accent-soft: #37271f;\n  --good: #59d28f;\n  --good-soft: #193529;\n  --warn: #ffc26c;\n  --warn-soft: #3a2c18;\n  --danger: #ff8d8d;\n  --danger-soft: #3f2528;\n  --table-head: #262d3e;\n  --row-hover: #2a3448;\n  --row-selected: #3a2b25;\n  --shadow: 0 24px 50px rgba(0, 0, 0, 0.45);\n}\n\n* {\n  box-sizing: border-box;\n}\n\nbody {\n  margin: 0;\n  min-height: 100vh;\n  font-family: var(--font-ui);\n  color: var(--ink);\n  background:\n    radial-gradient(1200px 600px at 85% -10%, var(--bg-accent), transparent 45%),\n    radial-gradient(800px 500px at -10% 10%, color-mix(in oklab, var(--accent) 16%, transparent), transparent 52%),\n    var(--bg);\n}\n\n.page-shell {\n  position: relative;\n  max-width: 1640px;\n  margin: 0 auto;\n  padding: 1.1rem;\n  display: grid;\n  gap: 0.95rem;\n}\n\n.orb {\n  position: absolute;\n  border-radius: 999px;\n  pointer-events: none;\n  filter: blur(26px);\n  opacity: 0.24;\n  animation: float 8s ease-in-out infinite;\n}\n\n.orb-one {\n  width: 220px;\n  height: 220px;\n  background: color-mix(in 
oklab, var(--accent) 88%, transparent);\n  top: 72px;\n  right: 120px;\n}\n\n.orb-two {\n  width: 180px;\n  height: 180px;\n  background: color-mix(in oklab, var(--good) 70%, transparent);\n  bottom: 72px;\n  left: 110px;\n  animation-delay: -3.5s;\n}\n\nh1,\nh2,\nh3,\nh4,\np {\n  margin: 0;\n}\n\n.hero-shell,\n.panel {\n  position: relative;\n  z-index: 1;\n  border: 1px solid var(--line);\n  border-radius: var(--radius-xl);\n  background: linear-gradient(145deg, color-mix(in oklab, var(--panel) 96%, transparent), var(--panel-2));\n  box-shadow: var(--shadow);\n  backdrop-filter: blur(8px);\n}\n\n.hero-shell {\n  padding: 1rem 1.1rem;\n  display: flex;\n  align-items: center;\n  justify-content: space-between;\n  gap: 1rem;\n  animation: rise 520ms var(--ease-out) both;\n}\n\n.hero-eyebrow {\n  font-size: 0.72rem;\n  letter-spacing: 0.14em;\n  text-transform: uppercase;\n  color: var(--muted);\n  margin-bottom: 0.3rem;\n}\n\n.hero-shell h1 {\n  font-size: clamp(1.25rem, 2.5vw, 1.8rem);\n  font-weight: 700;\n}\n\n.hero-copy {\n  margin-top: 0.32rem;\n  color: var(--muted);\n  max-width: 72ch;\n}\n\n.hero-actions {\n  display: flex;\n  gap: 0.55rem;\n  align-items: center;\n}\n\n.btn {\n  border: 1px solid transparent;\n  border-radius: 999px;\n  padding: 0.55rem 0.95rem;\n  font-family: inherit;\n  font-size: 0.84rem;\n  font-weight: 700;\n  cursor: pointer;\n  transition: transform var(--ease-out), background-color var(--ease-out), border-color var(--ease-out), color var(--ease-out);\n}\n\n.btn:hover {\n  transform: translateY(-1px);\n}\n\n.btn-accent {\n  background: var(--accent);\n  color: #fff;\n}\n\n.btn-accent:hover {\n  background: color-mix(in oklab, var(--accent) 87%, black 13%);\n}\n\n.btn-ghost {\n  background: var(--panel-2);\n  border-color: var(--line);\n  color: var(--ink);\n}\n\n.btn-theme {\n  background: transparent;\n  border-color: var(--line);\n  color: var(--muted);\n}\n\n.panel {\n  padding: 0.9rem;\n  animation: rise 560ms var(--ease-out) 
both;\n}\n\n.panel-heading {\n  display: flex;\n  justify-content: space-between;\n  align-items: center;\n  gap: 0.8rem;\n  margin-bottom: 0.75rem;\n}\n\n.panel-heading h2 {\n  font-size: 1.03rem;\n}\n\n.chip {\n  display: inline-flex;\n  align-items: center;\n  border: 1px solid var(--line);\n  border-radius: 999px;\n  padding: 0.28rem 0.62rem;\n  background: color-mix(in oklab, var(--panel-2) 82%, transparent);\n  color: var(--muted);\n  font-size: 0.8rem;\n}\n\n.alert {\n  border-radius: var(--radius-sm);\n  border: 1px solid;\n  padding: 0.58rem 0.72rem;\n  margin-bottom: 0.75rem;\n  font-size: 0.88rem;\n}\n\n.alert.error {\n  background: var(--danger-soft);\n  border-color: color-mix(in oklab, var(--danger) 55%, var(--line));\n  color: var(--danger);\n}\n\n.system-grid {\n  display: grid;\n  grid-template-columns: repeat(4, minmax(220px, 1fr));\n  gap: 0.65rem;\n}\n\n.system-card {\n  border-radius: var(--radius-md);\n  border: 1px solid var(--line);\n  background: color-mix(in oklab, var(--panel-2) 84%, transparent);\n  padding: 0.7rem;\n  transition: border-color var(--ease-out), transform var(--ease-out);\n}\n\n.system-card:hover {\n  transform: translateY(-2px);\n  border-color: color-mix(in oklab, var(--accent) 35%, var(--line));\n}\n\n.system-label {\n  font-size: 0.72rem;\n  letter-spacing: 0.08em;\n  text-transform: uppercase;\n  color: var(--muted);\n}\n\n.system-value {\n  margin-top: 0.33rem;\n  font-weight: 680;\n  font-size: 0.96rem;\n}\n\n.system-detail {\n  margin-top: 0.24rem;\n  color: var(--muted);\n  font-size: 0.84rem;\n}\n\n.filters-shell {\n  border-radius: var(--radius-md);\n  border: 1px solid var(--line);\n  background: color-mix(in oklab, var(--panel-2) 80%, transparent);\n  padding: 0.7rem;\n  display: grid;\n  grid-template-columns: repeat(4, minmax(180px, 1fr));\n  gap: 0.58rem;\n  margin-bottom: 0.75rem;\n}\n\n.filters-shell label {\n  display: grid;\n  gap: 0.24rem;\n  font-size: 0.8rem;\n  color: 
var(--muted);\n}\n\n.filters-shell input,\n.filters-shell select {\n  width: 100%;\n  border-radius: 9px;\n  border: 1px solid var(--line);\n  background: var(--panel);\n  color: var(--ink);\n  padding: 0.5rem 0.57rem;\n  font-family: inherit;\n  font-size: 0.88rem;\n}\n\n.filters-shell input:focus,\n.filters-shell select:focus {\n  outline: 2px solid color-mix(in oklab, var(--accent) 48%, transparent);\n  outline-offset: 1px;\n}\n\n.models-layout {\n  display: grid;\n  grid-template-columns: minmax(0, 1.75fr) minmax(320px, 1fr);\n  gap: 0.76rem;\n}\n\n.table-wrap {\n  border: 1px solid var(--line);\n  border-radius: var(--radius-md);\n  overflow: auto;\n  max-height: 68vh;\n  background: var(--panel);\n}\n\ntable {\n  border-collapse: collapse;\n  width: 100%;\n  font-size: 0.84rem;\n}\n\nth,\ntd {\n  border-bottom: 1px solid var(--line);\n  padding: 0.5rem;\n  text-align: left;\n  white-space: nowrap;\n}\n\nth {\n  position: sticky;\n  top: 0;\n  z-index: 1;\n  background: var(--table-head);\n  color: var(--muted);\n  font-size: 0.73rem;\n  text-transform: uppercase;\n  letter-spacing: 0.04em;\n}\n\ntbody tr {\n  cursor: pointer;\n  transition: background-color var(--ease-out);\n}\n\ntbody tr:hover {\n  background: var(--row-hover);\n}\n\ntbody tr.selected {\n  background: var(--row-selected);\n}\n\n.table-status {\n  text-align: center;\n  padding: 1.26rem;\n  color: var(--muted);\n}\n\n.model-name {\n  font-weight: 620;\n  max-width: 320px;\n  overflow: hidden;\n  text-overflow: ellipsis;\n}\n\n.fit,\n.mode {\n  font-weight: 680;\n}\n\n.fit-perfect,\n.mode-gpu {\n  color: var(--good);\n}\n\n.fit-good {\n  color: #4f88ff;\n}\n\n.fit-marginal,\n.mode-cpu_offload,\n.mode-moe_offload {\n  color: var(--warn);\n}\n\n.fit-too_tight,\n.mode-cpu_only {\n  color: var(--danger);\n}\n\n.details-panel {\n  border: 1px solid var(--line);\n  border-radius: var(--radius-md);\n  background: var(--panel);\n  padding: 0.76rem;\n  display: grid;\n  align-content: start;\n  gap: 
0.72rem;\n  max-height: 68vh;\n  overflow: auto;\n}\n\n.details-header {\n  display: flex;\n  justify-content: space-between;\n  gap: 0.72rem;\n  align-items: center;\n}\n\n.details-header h3 {\n  font-size: 1rem;\n}\n\n.details-grid {\n  display: grid;\n  grid-template-columns: repeat(2, minmax(0, 1fr));\n  gap: 0.53rem;\n}\n\n.details-grid dt {\n  font-size: 0.73rem;\n  color: var(--muted);\n  text-transform: uppercase;\n  letter-spacing: 0.04em;\n}\n\n.details-grid dd {\n  margin: 0.2rem 0 0;\n  font-size: 0.89rem;\n  font-weight: 620;\n}\n\n.metrics-card {\n  border: 1px solid var(--line);\n  border-radius: var(--radius-sm);\n  padding: 0.58rem;\n  background: color-mix(in oklab, var(--panel-2) 86%, transparent);\n}\n\n.metrics-card h4 {\n  margin-bottom: 0.52rem;\n  font-size: 0.81rem;\n  text-transform: uppercase;\n  letter-spacing: 0.06em;\n  color: var(--muted);\n}\n\n.metric-row {\n  display: grid;\n  gap: 0.24rem;\n  margin-bottom: 0.38rem;\n}\n\n.metric-text {\n  display: flex;\n  justify-content: space-between;\n  color: var(--muted);\n  font-size: 0.79rem;\n}\n\n.metric-track {\n  height: 0.45rem;\n  border-radius: 999px;\n  background: color-mix(in oklab, var(--line) 76%, transparent);\n  overflow: hidden;\n}\n\n.metric-fill {\n  height: 100%;\n  background: linear-gradient(90deg, color-mix(in oklab, var(--accent) 65%, #f3b45d), var(--accent));\n}\n\n.kpi-grid {\n  margin-top: 0.52rem;\n  display: grid;\n  grid-template-columns: repeat(2, minmax(0, 1fr));\n  gap: 0.5rem;\n}\n\n.kpi-grid div {\n  padding: 0.5rem;\n  border-radius: 9px;\n  background: color-mix(in oklab, var(--panel) 84%, transparent);\n  border: 1px solid var(--line);\n}\n\n.kpi-grid span {\n  display: block;\n  font-size: 0.75rem;\n  color: var(--muted);\n}\n\n.kpi-grid strong {\n  display: block;\n  margin-top: 0.28rem;\n  font-size: 1.08rem;\n}\n\n.metrics-card ul {\n  margin: 0;\n  padding-left: 1.1rem;\n}\n\n.metrics-card li {\n  margin-bottom: 0.33rem;\n  color: 
var(--muted);\n}\n\n.muted-copy {\n  color: var(--muted);\n  font-size: 0.88rem;\n}\n\n@media (max-width: 1320px) {\n  .system-grid {\n    grid-template-columns: repeat(2, minmax(200px, 1fr));\n  }\n\n  .filters-shell {\n    grid-template-columns: repeat(2, minmax(180px, 1fr));\n  }\n\n  .models-layout {\n    grid-template-columns: minmax(0, 1fr);\n  }\n\n  .details-panel {\n    max-height: none;\n  }\n}\n\n@media (max-width: 780px) {\n  .page-shell {\n    padding: 0.8rem;\n    gap: 0.72rem;\n  }\n\n  .hero-shell {\n    flex-direction: column;\n    align-items: flex-start;\n  }\n\n  .hero-actions {\n    width: 100%;\n    flex-wrap: wrap;\n  }\n\n  .hero-actions .btn {\n    flex: 1;\n    min-width: 135px;\n  }\n\n  .system-grid,\n  .filters-shell,\n  .details-grid,\n  .kpi-grid {\n    grid-template-columns: 1fr;\n  }\n}\n\n@keyframes rise {\n  from {\n    opacity: 0;\n    transform: translateY(8px);\n  }\n  to {\n    opacity: 1;\n    transform: translateY(0);\n  }\n}\n\n@keyframes float {\n  0%,\n  100% {\n    transform: translateY(0px);\n  }\n  50% {\n    transform: translateY(-10px);\n  }\n}\n"
  },
  {
    "path": "llmfit-web/src/test-setup.js",
    "content": "import '@testing-library/jest-dom/vitest';\n"
  },
  {
    "path": "llmfit-web/vite.config.js",
    "content": "import { defineConfig } from 'vite';\nimport react from '@vitejs/plugin-react';\n\nexport default defineConfig({\n  plugins: [react()],\n  server: {\n    port: 5173,\n    proxy: {\n      '/api': 'http://127.0.0.1:8787',\n      '/health': 'http://127.0.0.1:8787'\n    }\n  },\n  test: {\n    environment: 'jsdom',\n    setupFiles: './src/test-setup.js',\n    globals: true,\n    css: true\n  }\n});\n"
  },
  {
    "path": "scripts/install-openclaw-skill.sh",
    "content": "#!/usr/bin/env bash\nset -euo pipefail\n\n# Install the llmfit-advisor skill for OpenClaw\n# Usage: ./scripts/install-openclaw-skill.sh\n\nSKILL_NAME=\"llmfit-advisor\"\nSKILL_SRC=\"$(cd \"$(dirname \"$0\")/..\" && pwd)/skills/$SKILL_NAME\"\nSKILL_DST=\"$HOME/.openclaw/skills/$SKILL_NAME\"\n\n# Verify source exists\nif [ ! -f \"$SKILL_SRC/SKILL.md\" ]; then\n    echo \"Error: SKILL.md not found at $SKILL_SRC\"\n    exit 1\nfi\n\n# Check llmfit is installed\nif ! command -v llmfit &>/dev/null; then\n    echo \"Warning: llmfit is not on PATH.\"\n    echo \"Install it first:\"\n    echo \"  cargo install llmfit\"\n    echo \"  # or: brew install llmfit\"\n    echo \"\"\n    echo \"Continuing with skill install anyway...\"\nfi\n\n# Create destination and copy\nmkdir -p \"$SKILL_DST\"\ncp \"$SKILL_SRC/SKILL.md\" \"$SKILL_DST/SKILL.md\"\n\necho \"Installed $SKILL_NAME skill to $SKILL_DST\"\necho \"\"\necho \"The skill will be available on your next OpenClaw session.\"\necho \"Verify with: openclaw skills list | grep $SKILL_NAME\"\n"
  },
  {
    "path": "scripts/scrape_docker_models.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nScraper for Docker Model Runner available models.\n\nQueries the Docker Hub API for models in the 'ai/' namespace,\ncross-references against llmfit's HF model database and Ollama mapping table,\nand outputs a JSON mapping of HF model names to Docker Model Runner tags.\n\nUsage:\n  python3 scripts/scrape_docker_models.py\n\"\"\"\n\nimport json\nimport os\nimport sys\nimport urllib.request\nimport urllib.error\n\nDOCKER_HUB_API = \"https://hub.docker.com/v2/repositories/ai/\"\nPAGE_SIZE = 100\n\n# Same mapping as OLLAMA_MAPPINGS in providers.rs.\n# Maps lowercased HF repo suffix → Ollama-style tag (without ai/ prefix).\nOLLAMA_MAPPINGS = {\n    # Meta Llama family\n    \"llama-3.3-70b-instruct\": \"llama3.3:70b\",\n    \"llama-3.2-11b-vision-instruct\": \"llama3.2-vision:11b\",\n    \"llama-3.2-3b-instruct\": \"llama3.2:3b\",\n    \"llama-3.2-3b\": \"llama3.2:3b\",\n    \"llama-3.2-1b-instruct\": \"llama3.2:1b\",\n    \"llama-3.2-1b\": \"llama3.2:1b\",\n    \"llama-3.1-405b-instruct\": \"llama3.1:405b\",\n    \"llama-3.1-405b\": \"llama3.1:405b\",\n    \"llama-3.1-70b-instruct\": \"llama3.1:70b\",\n    \"llama-3.1-8b-instruct\": \"llama3.1:8b\",\n    \"llama-3.1-8b\": \"llama3.1:8b\",\n    \"meta-llama-3-8b-instruct\": \"llama3:8b\",\n    \"meta-llama-3-8b\": \"llama3:8b\",\n    \"llama-2-7b-hf\": \"llama2:7b\",\n    \"codellama-34b-instruct-hf\": \"codellama:34b\",\n    \"codellama-13b-instruct-hf\": \"codellama:13b\",\n    \"codellama-7b-instruct-hf\": \"codellama:7b\",\n    # Google Gemma\n    \"gemma-3-12b-it\": \"gemma3:12b\",\n    \"gemma-2-27b-it\": \"gemma2:27b\",\n    \"gemma-2-9b-it\": \"gemma2:9b\",\n    \"gemma-2-2b-it\": \"gemma2:2b\",\n    # Microsoft Phi\n    \"phi-4\": \"phi4\",\n    \"phi-4-mini-instruct\": \"phi4-mini\",\n    \"phi-3.5-mini-instruct\": \"phi3.5\",\n    \"phi-3-mini-4k-instruct\": \"phi3\",\n    \"phi-3-medium-14b-instruct\": \"phi3:14b\",\n    \"phi-2\": \"phi\",\n    \"orca-2-7b\": 
\"orca2:7b\",\n    \"orca-2-13b\": \"orca2:13b\",\n    # Mistral\n    \"mistral-7b-instruct-v0.3\": \"mistral:7b\",\n    \"mistral-7b-instruct-v0.2\": \"mistral:7b\",\n    \"mistral-nemo-instruct-2407\": \"mistral-nemo\",\n    \"mistral-small-24b-instruct-2501\": \"mistral-small:24b\",\n    \"mistral-large-instruct-2407\": \"mistral-large\",\n    \"mixtral-8x7b-instruct-v0.1\": \"mixtral:8x7b\",\n    \"mixtral-8x22b-instruct-v0.1\": \"mixtral:8x22b\",\n    # Qwen 2 / 2.5\n    \"qwen2-1.5b-instruct\": \"qwen2:1.5b\",\n    \"qwen2.5-72b-instruct\": \"qwen2.5:72b\",\n    \"qwen2.5-32b-instruct\": \"qwen2.5:32b\",\n    \"qwen2.5-14b-instruct\": \"qwen2.5:14b\",\n    \"qwen2.5-7b-instruct\": \"qwen2.5:7b\",\n    \"qwen2.5-7b\": \"qwen2.5:7b\",\n    \"qwen2.5-3b-instruct\": \"qwen2.5:3b\",\n    \"qwen2.5-1.5b-instruct\": \"qwen2.5:1.5b\",\n    \"qwen2.5-1.5b\": \"qwen2.5:1.5b\",\n    \"qwen2.5-0.5b-instruct\": \"qwen2.5:0.5b\",\n    \"qwen2.5-0.5b\": \"qwen2.5:0.5b\",\n    \"qwen2.5-coder-32b-instruct\": \"qwen2.5-coder:32b\",\n    \"qwen2.5-coder-14b-instruct\": \"qwen2.5-coder:14b\",\n    \"qwen2.5-coder-7b-instruct\": \"qwen2.5-coder:7b\",\n    \"qwen2.5-coder-1.5b-instruct\": \"qwen2.5-coder:1.5b\",\n    \"qwen2.5-coder-0.5b-instruct\": \"qwen2.5-coder:0.5b\",\n    \"qwen2.5-vl-7b-instruct\": \"qwen2.5vl:7b\",\n    \"qwen2.5-vl-3b-instruct\": \"qwen2.5vl:3b\",\n    # Qwen 3\n    \"qwen3-235b-a22b\": \"qwen3:235b\",\n    \"qwen3-32b\": \"qwen3:32b\",\n    \"qwen3-30b-a3b\": \"qwen3:30b-a3b\",\n    \"qwen3-30b-a3b-instruct-2507\": \"qwen3:30b-a3b\",\n    \"qwen3-14b\": \"qwen3:14b\",\n    \"qwen3-8b\": \"qwen3:8b\",\n    \"qwen3-4b\": \"qwen3:4b\",\n    \"qwen3-4b-instruct-2507\": \"qwen3:4b\",\n    \"qwen3-1.7b-base\": \"qwen3:1.7b\",\n    \"qwen3-0.6b\": \"qwen3:0.6b\",\n    \"qwen3-coder-30b-a3b-instruct\": \"qwen3-coder\",\n    # Qwen 3.5\n    \"qwen3.5-27b\": \"qwen3.5\",\n    \"qwen3.5-35b-a3b\": \"qwen3.5:35b\",\n    \"qwen3.5-122b-a10b\": \"qwen3.5:122b\",\n    
# Qwen3-Coder-Next\n    \"qwen3-coder-next\": \"qwen3-coder-next\",\n    # DeepSeek\n    \"deepseek-v3\": \"deepseek-v3\",\n    \"deepseek-v3.2\": \"deepseek-v3\",\n    \"deepseek-r1\": \"deepseek-r1\",\n    \"deepseek-r1-0528\": \"deepseek-r1\",\n    \"deepseek-r1-distill-qwen-32b\": \"deepseek-r1:32b\",\n    \"deepseek-r1-distill-qwen-14b\": \"deepseek-r1:14b\",\n    \"deepseek-r1-distill-qwen-7b\": \"deepseek-r1:7b\",\n    \"deepseek-coder-v2-lite-instruct\": \"deepseek-coder-v2:16b\",\n    # Community / other\n    \"tinyllama-1.1b-chat-v1.0\": \"tinyllama\",\n    \"stablelm-2-1_6b-chat\": \"stablelm2:1.6b\",\n    \"yi-6b-chat\": \"yi:6b\",\n    \"yi-34b-chat\": \"yi:34b\",\n    \"starcoder2-7b\": \"starcoder2:7b\",\n    \"starcoder2-15b\": \"starcoder2:15b\",\n    \"falcon-7b-instruct\": \"falcon:7b\",\n    \"falcon-40b-instruct\": \"falcon:40b\",\n    \"falcon-180b-chat\": \"falcon:180b\",\n    \"falcon3-7b-instruct\": \"falcon3:7b\",\n    \"openchat-3.5-0106\": \"openchat:7b\",\n    \"vicuna-7b-v1.5\": \"vicuna:7b\",\n    \"vicuna-13b-v1.5\": \"vicuna:13b\",\n    \"glm-4-9b-chat\": \"glm4:9b\",\n    \"solar-10.7b-instruct-v1.0\": \"solar:10.7b\",\n    \"zephyr-7b-beta\": \"zephyr:7b\",\n    \"c4ai-command-r-v01\": \"command-r\",\n    \"nous-hermes-2-mixtral-8x7b-dpo\": \"nous-hermes2-mixtral:8x7b\",\n    \"hermes-3-llama-3.1-8b\": \"hermes3:8b\",\n    \"nomic-embed-text-v1.5\": \"nomic-embed-text\",\n    \"bge-large-en-v1.5\": \"bge-large\",\n    \"smollm2-135m-instruct\": \"smollm2:135m\",\n    \"smollm2-135m\": \"smollm2:135m\",\n    # Google Gemma 3n\n    \"gemma-3n-e4b-it\": \"gemma3n:e4b\",\n    \"gemma-3n-e2b-it\": \"gemma3n:e2b\",\n    # Microsoft Phi-4 reasoning\n    \"phi-4-reasoning\": \"phi4-reasoning\",\n    \"phi-4-mini-reasoning\": \"phi4-mini-reasoning\",\n    # DeepSeek V3.2 Speciale\n    \"deepseek-v3.2-speciale\": \"deepseek-v3\",\n    # Liquid AI LFM2\n    \"lfm2-350m\": \"lfm2:350m\",\n    \"lfm2-700m\": \"lfm2:700m\",\n    \"lfm2-1.2b\": 
\"lfm2:1.2b\",\n    \"lfm2-2.6b\": \"lfm2:2.6b\",\n    \"lfm2-2.6b-exp\": \"lfm2:2.6b\",\n    \"lfm2-8b-a1b\": \"lfm2:8b-a1b\",\n    \"lfm2-24b-a2b\": \"lfm2:24b\",\n    # Liquid AI LFM2.5\n    \"lfm2.5-1.2b-instruct\": \"lfm2.5:1.2b\",\n    \"lfm2.5-1.2b-thinking\": \"lfm2.5-thinking:1.2b\",\n}\n\n\ndef fetch_docker_hub_models() -> list[str]:\n    \"\"\"Fetch all model names from the Docker Hub ai/ namespace.\"\"\"\n    models = []\n    url = f\"{DOCKER_HUB_API}?page_size={PAGE_SIZE}\"\n\n    while url:\n        req = urllib.request.Request(url, headers={\"User-Agent\": \"llmfit-scraper/1.0\"})\n        try:\n            with urllib.request.urlopen(req, timeout=10) as resp:\n                data = json.loads(resp.read().decode())\n        except (urllib.error.URLError, urllib.error.HTTPError) as e:\n            print(f\"Error fetching {url}: {e}\", file=sys.stderr)\n            break\n\n        for repo in data.get(\"results\", []):\n            name = repo.get(\"name\", \"\")\n            if name:\n                models.append(name)\n\n        url = data.get(\"next\")\n\n    return models\n\n\ndef fetch_tags_for_model(model_name: str) -> list[str]:\n    \"\"\"Fetch available tags for a Docker Hub ai/ model.\"\"\"\n    url = f\"{DOCKER_HUB_API}{model_name}/tags/?page_size=100\"\n    req = urllib.request.Request(url, headers={\"User-Agent\": \"llmfit-scraper/1.0\"})\n    try:\n        with urllib.request.urlopen(req, timeout=10) as resp:\n            data = json.loads(resp.read().decode())\n    except (urllib.error.URLError, urllib.error.HTTPError):\n        return []\n\n    return [t[\"name\"] for t in data.get(\"results\", []) if t.get(\"name\")]\n\n\ndef ollama_tag_to_docker_repo(ollama_tag: str) -> str:\n    \"\"\"Extract the Docker Hub repo name from an Ollama tag.\n\n    E.g. 
\"llama3.1:8b\" → \"llama3.1\", \"phi4\" → \"phi4\"\n    \"\"\"\n    return ollama_tag.split(\":\")[0]\n\n\ndef lookup_ollama_tag(hf_name: str) -> str | None:\n    \"\"\"Mirror the Rust lookup_ollama_tag logic.\n\n    Extract the repo suffix (after last '/'), lowercase it,\n    and look it up in the OLLAMA_MAPPINGS dict.\n    \"\"\"\n    suffix = hf_name.rsplit(\"/\", 1)[-1].lower()\n    return OLLAMA_MAPPINGS.get(suffix)\n\n\ndef main():\n    script_dir = os.path.dirname(os.path.abspath(__file__))\n    project_root = os.path.dirname(script_dir)\n    output_file = os.path.join(project_root, \"llmfit-core\", \"data\", \"docker_models.json\")\n\n    # Load the HF model database to get all model names\n    hf_models_file = os.path.join(project_root, \"llmfit-core\", \"data\", \"hf_models.json\")\n    with open(hf_models_file) as f:\n        hf_models = json.load(f)\n\n    print(f\"Loaded {len(hf_models)} models from HF database\")\n\n    # Fetch all available Docker Hub ai/ models\n    print(\"Fetching Docker Hub ai/ namespace...\")\n    docker_repos = fetch_docker_hub_models()\n    # Filter out vllm/safetensors variants — these are alternative serving formats,\n    # not standard Model Runner models\n    docker_repos = [r for r in docker_repos if not r.endswith((\"-vllm\", \"-safetensors\"))]\n    docker_repo_set = set(docker_repos)\n    print(f\"Found {len(docker_repos)} Docker Model Runner repos (excl. vllm/safetensors variants)\")\n\n    # Fetch tags for each available repo\n    print(\"Fetching tags for each repo...\")\n    repo_tags: dict[str, list[str]] = {}\n    for repo in sorted(docker_repos):\n        tags = fetch_tags_for_model(repo)\n        repo_tags[repo] = tags\n        tag_str = \", \".join(tags[:5])\n        if len(tags) > 5:\n            tag_str += f\", ... ({len(tags)} total)\"\n        print(f\"  ai/{repo}: [{tag_str}]\")\n\n    # Cross-reference: for each HF model, check if its Ollama tag maps to a\n    # Docker Hub repo. 
Uses the same lookup logic as Rust's lookup_ollama_tag().\n    mappings = []\n    matched = 0\n    unmatched_models = []\n\n    for model in hf_models:\n        hf_name = model[\"name\"]\n\n        ollama_tag = lookup_ollama_tag(hf_name)\n        if not ollama_tag:\n            unmatched_models.append(hf_name)\n            continue\n\n        docker_repo = ollama_tag_to_docker_repo(ollama_tag)\n        if docker_repo not in docker_repo_set:\n            unmatched_models.append(hf_name)\n            continue\n\n        # Build the full Docker tag: ai/<repo>:<size> or ai/<repo>\n        docker_tag = f\"ai/{ollama_tag}\"\n        available_tags = repo_tags.get(docker_repo, [])\n\n        mappings.append({\n            \"hf_name\": hf_name,\n            \"docker_tag\": docker_tag,\n            \"docker_repo\": f\"ai/{docker_repo}\",\n            \"available_tags\": available_tags,\n        })\n        matched += 1\n\n    print()\n    print(f\"Matched: {matched}/{len(hf_models)} models have Docker Model Runner images\")\n\n    if unmatched_models:\n        print(f\"Unmatched: {len(unmatched_models)} models (no Ollama mapping or no Docker repo)\")\n\n    # Write output\n    output = {\n        \"generated_by\": \"scrape_docker_models.py\",\n        \"docker_hub_repo_count\": len(docker_repos),\n        \"matched_model_count\": matched,\n        \"models\": mappings,\n    }\n\n    with open(output_file, \"w\") as f:\n        json.dump(output, f, indent=2)\n        f.write(\"\\n\")\n\n    print(f\"\\nWrote {output_file}\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "scripts/scrape_hf_models.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nScraper for popular LLM models from Hugging Face.\nFetches model metadata and computes RAM/VRAM requirements from parameter counts.\nOutputs a JSON file consumable by llmfit's models.rs.\n\nUsage:\n  python3 scrape_hf_models.py                  # Curated list only\n  python3 scrape_hf_models.py --discover        # Curated + top trending models\n  python3 scrape_hf_models.py --discover -n 50  # Curated + top 50 trending\n\"\"\"\n\nimport argparse\nimport json\nimport os\nimport sys\nimport time\nimport urllib.request\nimport urllib.error\n\nHF_API = \"https://huggingface.co/api/models\"\n\n# Global auth token, set from --token flag or HF_TOKEN / HUGGING_FACE_HUB_TOKEN env var\n_hf_token: str | None = None\n\n\ndef _auth_headers() -> dict[str, str]:\n    \"\"\"Return HTTP headers with auth if a HuggingFace token is available.\"\"\"\n    headers = {\"User-Agent\": \"llmfit-scraper/1.0\"}\n    if _hf_token:\n        headers[\"Authorization\"] = f\"Bearer {_hf_token}\"\n    return headers\n\n# Top text-generation models to scrape (owner/repo)\nTARGET_MODELS = [\n    # Meta Llama family\n    \"meta-llama/Llama-3.1-8B\",\n    \"meta-llama/Llama-3.1-8B-Instruct\",\n    \"meta-llama/Llama-3.1-70B-Instruct\",\n    \"meta-llama/Llama-3.1-405B-Instruct\",\n    \"meta-llama/Llama-3.2-1B\",\n    \"meta-llama/Llama-3.2-3B\",\n    \"meta-llama/Llama-3.2-11B-Vision-Instruct\",  # NEW: Multimodal vision model\n    \"meta-llama/Llama-3.3-70B-Instruct\",\n    # Meta Llama 4 (MoE)\n    \"meta-llama/Llama-4-Scout-17B-16E-Instruct\",\n    \"meta-llama/Llama-4-Maverick-17B-128E-Instruct\",\n    # Code Llama\n    \"meta-llama/CodeLlama-7b-Instruct-hf\",  # NEW: Popular code model\n    \"meta-llama/CodeLlama-13b-Instruct-hf\",  # NEW: Larger code model\n    \"meta-llama/CodeLlama-34b-Instruct-hf\",  # NEW: Large code model\n    # Mistral\n    \"mistralai/Mistral-7B-Instruct-v0.3\",\n    \"mistralai/Mixtral-8x7B-Instruct-v0.1\",\n    
\"mistralai/Mixtral-8x22B-Instruct-v0.1\",\n    \"mistralai/Mistral-Large-Instruct-2407\",\n    \"mistralai/Mistral-Small-24B-Instruct-2501\",\n    \"mistralai/Mistral-Small-3.1-24B-Instruct-2503\",\n    \"mistralai/Ministral-8B-Instruct-2410\",\n    \"mistralai/Mistral-Nemo-Instruct-2407\",\n    # Qwen\n    \"Qwen/Qwen2.5-7B-Instruct\",\n    \"Qwen/Qwen2.5-14B-Instruct\",\n    \"Qwen/Qwen2.5-32B-Instruct\",\n    \"Qwen/Qwen2.5-72B-Instruct\",\n    \"Qwen/Qwen2.5-Coder-1.5B-Instruct\",  # NEW: Ultra-lightweight coder\n    \"Qwen/Qwen2.5-Coder-7B-Instruct\",     # NEW: Popular coder\n    \"Qwen/Qwen2.5-Coder-14B-Instruct\",    # NEW: Mid-size coder\n    \"Qwen/Qwen2.5-Coder-32B-Instruct\",    # NEW: Large coder\n    \"Qwen/Qwen2.5-VL-3B-Instruct\",        # NEW: Vision-language 3B\n    \"Qwen/Qwen2.5-VL-7B-Instruct\",        # NEW: Vision-language 7B\n    \"Qwen/Qwen3-0.6B\",\n    \"Qwen/Qwen3-1.7B\",\n    \"Qwen/Qwen3-4B\",\n    \"Qwen/Qwen3-8B\",\n    \"Qwen/Qwen3-14B\",\n    \"Qwen/Qwen3-32B\",\n    \"Qwen/Qwen3-30B-A3B\",\n    \"Qwen/Qwen3-235B-A22B\",\n    \"Qwen/Qwen3-Coder-480B-A35B-Instruct\",\n    \"Qwen/Qwen3-Coder-Next\",\n    # Qwen 3.5 (native multimodal, Feb 2026)\n    \"Qwen/Qwen3.5-27B\",\n    \"Qwen/Qwen3.5-35B-A3B\",\n    \"Qwen/Qwen3.5-122B-A10B\",\n    \"Qwen/Qwen3.5-397B-A17B\",\n    # Qwen3.5 Small Series (Instruct)\n    \"Qwen/Qwen3.5-0.8B\",\n    \"Qwen/Qwen3.5-2B\",\n    \"Qwen/Qwen3.5-4B\",\n    \"Qwen/Qwen3.5-9B\",\n    # Qwen3.5 Small Series (Base)\n    \"Qwen/Qwen3.5-0.8B-Base\",\n    \"Qwen/Qwen3.5-2B-Base\",\n    \"Qwen/Qwen3.5-4B-Base\",\n    \"Qwen/Qwen3.5-9B-Base\",\n    # Microsoft Phi\n    \"microsoft/phi-3-mini-4k-instruct\",\n    \"microsoft/Phi-3-medium-14b-instruct\",\n    \"microsoft/Phi-3.5-mini-instruct\",  # NEW: Newer Phi variant\n    \"microsoft/phi-4\",\n    \"microsoft/Phi-4-mini-instruct\",\n    # Microsoft Orca\n    \"microsoft/Orca-2-7b\",  # NEW: Reasoning model\n    \"microsoft/Orca-2-13b\",  # NEW: Larger 
reasoning model\n    # Google Gemma\n    \"google/gemma-2-2b-it\",  # NEW: Smaller variant for edge\n    \"google/gemma-2-9b-it\",\n    \"google/gemma-2-27b-it\",\n    \"google/gemma-3-1b-it\",\n    \"google/gemma-3-4b-it\",\n    \"google/gemma-3-12b-it\",\n    \"google/gemma-3-27b-it\",\n    # DeepSeek\n    \"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\",\n    \"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B\",\n    \"deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct\",\n    \"deepseek-ai/DeepSeek-V3\",\n    \"deepseek-ai/DeepSeek-R1\",\n    # Cohere\n    \"CohereForAI/c4ai-command-r-v01\",\n    # 01.ai Yi family\n    \"01-ai/Yi-6B-Chat\",  # NEW: Popular multilingual 6B\n    \"01-ai/Yi-34B-Chat\",  # NEW: Popular multilingual 34B\n    # Upstage Solar\n    \"upstage/SOLAR-10.7B-Instruct-v1.0\",  # NEW: High-performance 10.7B\n    # TII Falcon\n    \"tiiuae/falcon-7b-instruct\",  # NEW: Popular UAE model\n    \"tiiuae/falcon-40b-instruct\",\n    \"tiiuae/falcon-180B-chat\",\n    \"tiiuae/Falcon3-7B-Instruct\",\n    \"tiiuae/Falcon3-10B-Instruct\",\n    # HuggingFace Zephyr\n    \"HuggingFaceH4/zephyr-7b-beta\",  # NEW: Very popular fine-tune\n    # OpenChat\n    \"openchat/openchat-3.5-0106\",  # NEW: Popular alternative\n    # LMSYS Vicuna\n    \"lmsys/vicuna-7b-v1.5\",  # NEW: Popular community model\n    \"lmsys/vicuna-13b-v1.5\",  # NEW: Larger Vicuna\n    # NousResearch\n    \"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO\",  # NEW: Popular fine-tune\n    # WizardLM\n    \"WizardLMTeam/WizardLM-13B-V1.2\",  # NEW: Popular instruction model\n    # Code models\n    \"bigcode/starcoder2-7b\",\n    \"bigcode/starcoder2-15b\",\n    \"WizardLMTeam/WizardCoder-15B-V1.0\",  # NEW: Code specialist\n    # Small / edge models\n    \"TinyLlama/TinyLlama-1.1B-Chat-v1.0\",\n    \"stabilityai/stablelm-2-1_6b-chat\",\n    # IBM Granite\n    \"ibm-granite/granite-3.1-8b-instruct\",\n    \"ibm-granite/granite-4.0-h-tiny\",\n    \"ibm-granite/granite-4.0-h-micro\",\n    
\"ibm-granite/granite-4.0-h-small\",\n    # Allen Institute OLMo\n    \"allenai/OLMo-2-0325-32B-Instruct\",\n    # Zhipu GLM\n    \"THUDM/glm-4-9b-chat\",\n    # xAI Grok\n    \"xai-org/grok-1\",\n    # Moonshot Kimi\n    \"moonshotai/Kimi-K2-Instruct\",\n    # BigScience BLOOM\n    \"bigscience/bloom\",\n    # Baidu ERNIE\n    \"baidu/ERNIE-4.5-300B-A47B-Paddle\",\n    # Rednote dots.llm\n    \"rednote-hilab/dots.llm1.inst\",\n    # Meituan LongCat\n    \"meituan/LongCat-Flash\",\n    # Ant Group Ling\n    \"inclusionAI/Ling-lite\",\n    # Liquid AI LFM2 (dense)\n    \"LiquidAI/LFM2-350M\",\n    \"LiquidAI/LFM2-700M\",\n    \"LiquidAI/LFM2-1.2B\",\n    \"LiquidAI/LFM2-2.6B\",\n    \"LiquidAI/LFM2-2.6B-Exp\",\n    # Liquid AI LFM2 (MoE)\n    \"LiquidAI/LFM2-8B-A1B\",\n    \"LiquidAI/LFM2-24B-A2B\",\n    # Liquid AI LFM2.5\n    \"LiquidAI/LFM2.5-1.2B-Base\",\n    \"LiquidAI/LFM2.5-1.2B-Instruct\",\n    \"LiquidAI/LFM2.5-1.2B-Thinking\",\n    \"LiquidAI/LFM2.5-1.2B-JP\",\n    # Liquid AI LFM2 Vision-Language\n    \"LiquidAI/LFM2-VL-450M\",\n    \"LiquidAI/LFM2-VL-1.6B\",\n    \"LiquidAI/LFM2-VL-3B\",\n    \"LiquidAI/LFM2.5-VL-1.6B\",\n    # Liquid AI LFM2 Audio\n    \"LiquidAI/LFM2-Audio-1.5B\",\n    \"LiquidAI/LFM2.5-Audio-1.5B\",\n    # Liquid AI Liquid Nanos (task-specific fine-tunes)\n    \"LiquidAI/LFM2-1.2B-Tool\",\n    \"LiquidAI/LFM2-1.2B-RAG\",\n    \"LiquidAI/LFM2-1.2B-Extract\",\n    \"LiquidAI/LFM2-350M-Extract\",\n    \"LiquidAI/LFM2-350M-Math\",\n    \"LiquidAI/LFM2-350M-ENJP-MT\",\n    \"LiquidAI/LFM2-350M-PII-Extract-JP\",\n    \"LiquidAI/LFM2-ColBERT-350M\",\n    \"LiquidAI/LFM2-2.6B-Transcript\",\n    # Embeddings (useful for RAG sizing)\n    \"nomic-ai/nomic-embed-text-v1.5\",\n    \"BAAI/bge-large-en-v1.5\",\n    # --- New models added Feb 2026 ---\n    # DeepSeek V3.2 family\n    \"deepseek-ai/DeepSeek-V3.2\",\n    \"deepseek-ai/DeepSeek-V3.2-Speciale\",\n    # Zhipu/Z.ai GLM-5\n    \"zai-org/GLM-5\",\n    # Moonshot Kimi K2.5\n    
\"moonshotai/Kimi-K2.5\",\n    # MiniMax M2.7 / M2.5\n    \"MiniMaxAI/MiniMax-M2.7\",\n    \"MiniMaxAI/MiniMax-M2.5\",\n    # Xiaomi MiMo\n    \"XiaomiMiMo/MiMo-V2-Flash\",\n    \"XiaomiMiMo/MiMo-7B-RL\",\n    # NVIDIA Nemotron\n    \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16\",\n    \"nvidia/NVIDIA-Nemotron-Nano-9B-v2\",\n    # Microsoft Phi-4 reasoning family\n    \"microsoft/Phi-4-reasoning\",\n    \"microsoft/Phi-4-mini-reasoning\",\n    \"microsoft/Phi-4-multimodal-instruct\",\n    # LG AI EXAONE 4.0\n    \"LGAI-EXAONE/EXAONE-4.0-32B\",\n    \"LGAI-EXAONE/EXAONE-4.0-1.2B\",\n    # HuggingFace SmolLM3\n    \"HuggingFaceTB/SmolLM3-3B\",\n    # Google Gemma 3n (effective parameter models)\n    \"google/gemma-3n-E4B-it\",\n    \"google/gemma-3n-E2B-it\",\n]\n\n# Bytes-per-parameter for different quantization levels\nQUANT_BPP = {\n    \"F32\":    4.0,\n    \"F16\":    2.0,\n    \"BF16\":   2.0,\n    \"Q8_0\":   1.0,\n    \"Q6_K\":   0.75,\n    \"Q5_K_M\": 0.625,\n    \"Q4_K_M\": 0.5,\n    \"Q4_0\":   0.5,\n    \"Q3_K_M\": 0.4375,\n    \"Q2_K\":   0.3125,\n    \"AWQ-4bit\": 0.5,\n    \"AWQ-8bit\": 1.0,\n    \"GPTQ-Int4\": 0.5,\n    \"GPTQ-Int8\": 1.0,\n}\n\n# Overhead multiplier for runtime memory beyond just model weights\nRUNTIME_OVERHEAD = 1.2  # ~20% overhead for KV cache, activations, OS\n\n# Known MoE (Mixture of Experts) architecture configurations\nMOE_CONFIGS = {\n    \"mixtral\": {\"num_experts\": 8, \"active_experts\": 2},\n    \"deepseek_v2\": {\"num_experts\": 64, \"active_experts\": 6},\n    \"deepseek_v3\": {\"num_experts\": 256, \"active_experts\": 8},\n    \"qwen3_moe\": {\"num_experts\": 128, \"active_experts\": 8},\n    \"llama4\": {\"num_experts\": 16, \"active_experts\": 1},\n    \"grok\": {\"num_experts\": 8, \"active_experts\": 2},\n    \"glm5\": {\"num_experts\": 256, \"active_experts\": 8},\n    \"minimax_m2\": {\"num_experts\": 32, \"active_experts\": 2},\n    \"mimo_v2\": {\"num_experts\": 128, \"active_experts\": 8},\n    
\"nemotron3_nano\": {\"num_experts\": 128, \"active_experts\": 6},\n    \"qwen3_5_moe\": {\"num_experts\": 256, \"active_experts\": 8},\n    \"qwen3_vl_moe\": {\"num_experts\": 256, \"active_experts\": 8},\n}\n\n# Published active parameter counts for well-known MoE models\nMOE_ACTIVE_PARAMS = {\n    \"mistralai/Mixtral-8x7B-Instruct-v0.1\": 12_900_000_000,\n    \"mistralai/Mixtral-8x22B-Instruct-v0.1\": 39_100_000_000,\n    \"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO\": 12_900_000_000,\n    \"deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct\": 2_400_000_000,\n    \"deepseek-ai/DeepSeek-V3\": 37_000_000_000,\n    \"deepseek-ai/DeepSeek-R1\": 37_000_000_000,\n    \"deepseek-ai/DeepSeek-V3.2\": 37_000_000_000,\n    \"deepseek-ai/DeepSeek-V3.2-Speciale\": 37_000_000_000,\n    \"Qwen/Qwen3-30B-A3B\": 3_300_000_000,\n    \"Qwen/Qwen3-235B-A22B\": 22_000_000_000,\n    \"Qwen/Qwen3-Coder-480B-A35B-Instruct\": 35_000_000_000,\n    \"Qwen/Qwen3-Coder-Next\": 3_000_000_000,\n    \"Qwen/Qwen3.5-35B-A3B\": 3_000_000_000,\n    \"Qwen/Qwen3.5-122B-A10B\": 10_000_000_000,\n    \"Qwen/Qwen3.5-397B-A17B\": 17_000_000_000,\n    \"meta-llama/Llama-4-Scout-17B-16E-Instruct\": 17_000_000_000,\n    \"meta-llama/Llama-4-Maverick-17B-128E-Instruct\": 17_000_000_000,\n    \"xai-org/grok-1\": 86_000_000_000,\n    \"moonshotai/Kimi-K2-Instruct\": 32_000_000_000,\n    \"moonshotai/Kimi-K2.5\": 32_000_000_000,\n    \"zai-org/GLM-5\": 40_000_000_000,\n    \"MiniMaxAI/MiniMax-M2.7\": 10_000_000_000,\n    \"MiniMaxAI/MiniMax-M2.5\": 10_000_000_000,\n    \"XiaomiMiMo/MiMo-V2-Flash\": 15_000_000_000,\n    \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16\": 3_000_000_000,\n    \"LiquidAI/LFM2-8B-A1B\": 1_500_000_000,\n    \"LiquidAI/LFM2-24B-A2B\": 2_300_000_000,  # 23.8B total, 2.3B active\n}\n\n\ndef fetch_model_info(repo_id: str) -> dict | None:\n    \"\"\"Fetch model info from HuggingFace API.\"\"\"\n    url = f\"{HF_API}/{repo_id}\"\n    req = urllib.request.Request(url, headers=_auth_headers())\n    
try:\n        with urllib.request.urlopen(req, timeout=30) as resp:\n            return json.loads(resp.read().decode())\n    except urllib.error.HTTPError as e:\n        if e.code == 401 and not _hf_token:\n            print(f\"  ⚠ HTTP 401 for {repo_id} — model is gated, set HF_TOKEN to access\",\n                  file=sys.stderr)\n        else:\n            print(f\"  ⚠ HTTP {e.code} for {repo_id} — skipping\", file=sys.stderr)\n        return None\n    except Exception as e:\n        print(f\"  ⚠ Error fetching {repo_id}: {e}\", file=sys.stderr)\n        return None\n\n\ndef format_param_count(total_params: int) -> str:\n    \"\"\"Convert raw parameter count into human-readable string.\"\"\"\n    if total_params >= 1_000_000_000:\n        val = total_params / 1_000_000_000\n        return f\"{val:.1f}B\" if val != int(val) else f\"{int(val)}B\"\n    elif total_params >= 1_000_000:\n        val = total_params / 1_000_000\n        return f\"{val:.0f}M\"\n    else:\n        return f\"{total_params / 1_000:.0f}K\"\n\n\ndef estimate_ram(total_params: int, quant: str) -> tuple[float, float]:\n    \"\"\"\n    Estimate min RAM and recommended RAM for the given quantization label.\n\n    Unknown labels fall back to 0.5 bytes per parameter (the Q4 figure).\n    Min RAM is the weight size times RUNTIME_OVERHEAD; recommended RAM is\n    twice the weight size, for comfortable headroom.\n    Returns (min_ram_gb, recommended_ram_gb).\n    \"\"\"\n    bpp = QUANT_BPP.get(quant, 0.5)\n    model_size_gb = (total_params * bpp) / (1024**3)\n    min_ram_gb = model_size_gb * RUNTIME_OVERHEAD\n    # Recommended: 2x the weights, leaving room for a generous KV cache + OS headroom\n    recommended_ram_gb = model_size_gb * 2.0\n\n    # Apply sensible floor\n    min_ram_gb = max(min_ram_gb, 1.0)\n    recommended_ram_gb = max(recommended_ram_gb, 2.0)\n\n    return round(min_ram_gb, 1), round(recommended_ram_gb, 1)\n\n\ndef estimate_vram(total_params: int, quant: str) -> float:\n    \"\"\"Estimate minimum VRAM to fit model weights on GPU.\"\"\"\n    bpp = QUANT_BPP.get(quant, 0.5)\n    model_size_gb = (total_params * bpp) / (1024**3)\n    # VRAM needs to hold weights + some activation memory\n    vram_gb 
= model_size_gb * 1.1\n    return round(max(vram_gb, 0.5), 1)\n\n\ndef detect_moe(repo_id: str, config: dict | None, architecture: str,\n               total_params: int) -> dict:\n    \"\"\"Detect MoE architecture and compute active parameters.\"\"\"\n    result = {\n        \"is_moe\": False,\n        \"num_experts\": None,\n        \"active_experts\": None,\n        \"active_parameters\": None,\n    }\n\n    # Check config.json for MoE indicators\n    num_experts = None\n    active_experts = None\n    if config:\n        num_experts = config.get(\"num_local_experts\") or config.get(\"num_experts\")\n        active_experts = config.get(\"num_experts_per_tok\")\n\n    # Check if architecture is in known MoE configs\n    if architecture in MOE_CONFIGS:\n        moe = MOE_CONFIGS[architecture]\n        num_experts = num_experts or moe[\"num_experts\"]\n        active_experts = active_experts or moe[\"active_experts\"]\n\n    if num_experts and active_experts:\n        result[\"is_moe\"] = True\n        result[\"num_experts\"] = num_experts\n        result[\"active_experts\"] = active_experts\n\n        # Use published active params if known, otherwise estimate\n        if repo_id in MOE_ACTIVE_PARAMS:\n            result[\"active_parameters\"] = MOE_ACTIVE_PARAMS[repo_id]\n        else:\n            result[\"active_parameters\"] = estimate_active_params(\n                total_params, num_experts, active_experts)\n\n    return result\n\n\ndef estimate_active_params(total_params: int, num_experts: int,\n                           active_experts: int) -> int:\n    \"\"\"Estimate active parameters for MoE models.\n\n    Assumes expert MLP layers are ~95% of total params and\n    shared attention/embedding layers are ~5%.\n    \"\"\"\n    shared_fraction = 0.05\n    shared = int(total_params * shared_fraction)\n    expert_pool = total_params - shared\n    per_expert = expert_pool // num_experts\n    return shared + active_experts * per_expert\n\n\ndef 
infer_use_case(repo_id: str, pipeline_tag: str | None, config: dict | None) -> str:\n    \"\"\"Infer a brief use-case description from model metadata.\"\"\"\n    rid = repo_id.lower()\n    if \"embed\" in rid or \"bge\" in rid:\n        return \"Text embeddings for RAG\"\n    if \"coder\" in rid or \"starcoder\" in rid or \"code\" in rid:\n        return \"Code generation and completion\"\n    if \"r1\" in rid or \"reason\" in rid:\n        return \"Advanced reasoning, chain-of-thought\"\n    if \"instruct\" in rid or \"chat\" in rid:\n        return \"Instruction following, chat\"\n    if \"tiny\" in rid or \"small\" in rid or \"mini\" in rid:\n        return \"Lightweight, edge deployment\"\n    if pipeline_tag == \"text-generation\":\n        return \"General purpose text generation\"\n    return \"General purpose\"\n\n\ndef infer_context_length(config: dict | None) -> int:\n    \"\"\"Try to extract context length from model config.\"\"\"\n    if not config:\n        return 4096\n\n    # Common config keys for max sequence length\n    keys_to_check = [\n        \"max_position_embeddings\",\n        \"max_sequence_length\",\n        \"seq_length\",\n        \"n_positions\",\n        \"sliding_window\",\n    ]\n\n    # Check top-level config\n    for key in keys_to_check:\n        if key in config:\n            val = config[key]\n            if isinstance(val, int) and val > 0:\n                return val\n\n    # For multimodal models (e.g., Qwen3.5), check text_config\n    if \"text_config\" in config and isinstance(config[\"text_config\"], dict):\n        for key in keys_to_check:\n            if key in config[\"text_config\"]:\n                val = config[\"text_config\"][key]\n                if isinstance(val, int) and val > 0:\n                    return val\n\n    return 4096\n\n\ndef fetch_config_json(repo_id: str) -> dict | None:\n    \"\"\"Fetch the full config.json from a HF repo (has max_position_embeddings).\"\"\"\n    url = 
f\"https://huggingface.co/{repo_id}/resolve/main/config.json\"\n    req = urllib.request.Request(url, headers=_auth_headers())\n    try:\n        with urllib.request.urlopen(req, timeout=15) as resp:\n            return json.loads(resp.read().decode())\n    except Exception:\n        return None\n\n\ndef extract_provider(repo_id: str) -> str:\n    \"\"\"Map HF org name to a friendly provider name.\"\"\"\n    org = repo_id.split(\"/\")[0].lower()\n    mapping = {\n        \"meta-llama\": \"Meta\",\n        \"mistralai\": \"Mistral AI\",\n        \"qwen\": \"Alibaba\",\n        \"microsoft\": \"Microsoft\",\n        \"google\": \"Google\",\n        \"deepseek-ai\": \"DeepSeek\",\n        \"bigcode\": \"BigCode\",\n        \"cohereforai\": \"Cohere\",\n        \"tinyllama\": \"Community\",\n        \"stabilityai\": \"Stability AI\",\n        \"nomic-ai\": \"Nomic\",\n        \"baai\": \"BAAI\",\n        \"01-ai\": \"01.ai\",  # NEW\n        \"upstage\": \"Upstage\",  # NEW\n        \"tiiuae\": \"TII\",  # NEW\n        \"huggingfaceh4\": \"HuggingFace\",  # NEW\n        \"openchat\": \"OpenChat\",  # NEW\n        \"lmsys\": \"LMSYS\",  # NEW\n        \"nousresearch\": \"NousResearch\",  # NEW\n        \"wizardlmteam\": \"WizardLM\",  # NEW\n        \"liquidai\": \"Liquid AI\",\n    }\n    return mapping.get(org, org)\n\n\ndef infer_capabilities(repo_id: str, pipeline_tag: str | None, use_case: str) -> list[str]:\n    \"\"\"Infer model capabilities like vision and tool use.\"\"\"\n    caps: list[str] = []\n    rid = repo_id.lower()\n    uc = use_case.lower()\n\n    # Vision\n    if (\n        pipeline_tag == \"image-text-to-text\"\n        or \"vision\" in rid\n        or \"-vl-\" in rid\n        or rid.endswith(\"-vl\")\n        or \"llava\" in rid\n        or \"onevision\" in rid\n        or \"pixtral\" in rid\n        or \"vision\" in uc\n        or \"multimodal\" in uc\n    ):\n        caps.append(\"vision\")\n\n    # Tool use (known families)\n    if (\n        
\"tool\" in uc\n        or \"function call\" in uc\n        or \"qwen3\" in rid\n        or \"qwen2.5\" in rid\n        or \"command-r\" in rid\n        or (\"llama-3\" in rid and \"instruct\" in rid)\n        or (\"mistral\" in rid and \"instruct\" in rid)\n        or \"hermes\" in rid\n    ):\n        caps.append(\"tool_use\")\n\n    return caps\n\n\ndef detect_quant_format(repo_id: str, config: dict | None) -> tuple[str, str]:\n    \"\"\"Detect quantization format and label from config.json.\n\n    Returns (format, quant_label) where:\n    - format: \"gguf\", \"awq\", \"gptq\", \"mlx\", or \"safetensors\"\n    - quant_label: e.g. \"AWQ-4bit\", \"GPTQ-Int4\", \"Q4_K_M\"\n    \"\"\"\n    if not config:\n        return _detect_format_from_name(repo_id)\n\n    quant_config = config.get(\"quantization_config\", {})\n    if not quant_config:\n        return _detect_format_from_name(repo_id)\n\n    quant_method = quant_config.get(\"quant_method\", \"\")\n    bits = quant_config.get(\"bits\", quant_config.get(\"num_bits\", 4))\n\n    # AWQ\n    if quant_method == \"awq\":\n        label = f\"AWQ-{bits}bit\"\n        return (\"awq\", label)\n\n    # GPTQ (including gptq_marlin)\n    if quant_method.startswith(\"gptq\"):\n        label = f\"GPTQ-Int{bits}\"\n        return (\"gptq\", label)\n\n    # compressed-tensors: dig into config_groups for bits, check name for format\n    if quant_method == \"compressed-tensors\":\n        # Try to extract bits from config_groups\n        config_groups = quant_config.get(\"config_groups\", {})\n        for group in config_groups.values():\n            if isinstance(group, dict):\n                weights = group.get(\"weights\", {})\n                if \"num_bits\" in weights:\n                    bits = weights[\"num_bits\"]\n                    break\n\n        name_upper = repo_id.upper()\n        if \"-AWQ\" in name_upper:\n            label = f\"AWQ-{bits}bit\"\n            return (\"awq\", label)\n        elif \"-GPTQ\" in 
name_upper:\n            label = f\"GPTQ-Int{bits}\"\n            return (\"gptq\", label)\n\n    return _detect_format_from_name(repo_id)\n\n\ndef _detect_format_from_name(repo_id: str) -> tuple[str, str]:\n    \"\"\"Fallback: detect format from model name patterns.\"\"\"\n    name_upper = repo_id.upper()\n\n    if \"-AWQ-8BIT\" in name_upper:\n        return (\"awq\", \"AWQ-8bit\")\n    if \"-AWQ\" in name_upper:\n        return (\"awq\", \"AWQ-4bit\")\n    if \"-GPTQ-INT8\" in name_upper or \"-GPTQ-8BIT\" in name_upper:\n        return (\"gptq\", \"GPTQ-Int8\")\n    if \"-GPTQ\" in name_upper:\n        return (\"gptq\", \"GPTQ-Int4\")\n    if \"-MLX-\" in name_upper or name_upper.endswith(\"-MLX\"):\n        return (\"mlx\", \"Q4_K_M\")  # MLX uses its own quant scheme handled elsewhere\n\n    return (\"gguf\", \"Q4_K_M\")\n\n\ndef scrape_model(repo_id: str) -> dict | None:\n    \"\"\"Scrape a single model and return an LlmModel-compatible dict.\"\"\"\n    info = fetch_model_info(repo_id)\n    if not info:\n        return None\n\n    # Extract parameter count from safetensors metadata\n    safetensors = info.get(\"safetensors\", {})\n    total_params = safetensors.get(\"total\")\n    if not total_params:\n        params_by_dtype = safetensors.get(\"parameters\", {})\n        if params_by_dtype:\n            total_params = max(params_by_dtype.values())\n\n    if not total_params:\n        print(f\"  ⚠ No parameter count found for {repo_id}\", file=sys.stderr)\n        return None\n\n    config = info.get(\"config\", {})\n    pipeline_tag = info.get(\"pipeline_tag\")\n\n    # Fetch full config.json for accurate context length\n    full_config = fetch_config_json(repo_id)\n\n    # Detect quantization format from config.json\n    model_format, default_quant = detect_quant_format(repo_id, full_config)\n    context_length = infer_context_length(full_config) if full_config else infer_context_length(config)\n\n    min_ram, rec_ram = estimate_ram(total_params, 
default_quant)\n    min_vram = estimate_vram(total_params, default_quant)\n\n    architecture = config.get(\"model_type\", \"unknown\")\n\n    # Detect MoE architecture\n    moe_info = detect_moe(repo_id, full_config, architecture, total_params)\n\n    use_case_str = infer_use_case(repo_id, pipeline_tag, config)\n\n    result = {\n        \"name\": repo_id,\n        \"provider\": extract_provider(repo_id),\n        \"parameter_count\": format_param_count(total_params),\n        \"parameters_raw\": total_params,\n        \"min_ram_gb\": min_ram,\n        \"recommended_ram_gb\": rec_ram,\n        \"min_vram_gb\": min_vram,\n        \"quantization\": default_quant,\n        \"format\": model_format,\n        \"context_length\": context_length,\n        \"use_case\": use_case_str,\n        \"capabilities\": infer_capabilities(repo_id, pipeline_tag, use_case_str),\n        \"pipeline_tag\": pipeline_tag or \"unknown\",\n        \"architecture\": architecture,\n        \"hf_downloads\": info.get(\"downloads\", 0),\n        \"hf_likes\": info.get(\"likes\", 0),\n        \"release_date\": (info.get(\"createdAt\") or \"\")[:10] or None,\n    }\n\n    # Add MoE fields if detected\n    if moe_info[\"is_moe\"]:\n        result[\"is_moe\"] = True\n        result[\"num_experts\"] = moe_info[\"num_experts\"]\n        result[\"active_experts\"] = moe_info[\"active_experts\"]\n        result[\"active_parameters\"] = moe_info[\"active_parameters\"]\n\n    return result\n\n\n# ---------------------------------------------------------------------------\n# GGUF source enrichment — find pre-quantized GGUF repos for known models\n# ---------------------------------------------------------------------------\n\n# Providers known to publish high-quality GGUF quantizations\nGGUF_PROVIDERS = [\"unsloth\", \"bartowski\", \"ggml-org\", \"TheBloke\", \"mradermacher\"]\n\nGGUF_CACHE_FILE = os.path.join(os.path.dirname(__file__), \"..\", \"data\", 
\"gguf_sources_cache.json\")\nGGUF_CACHE_MAX_AGE_DAYS = 7  # Re-check repos older than this\n\n\ndef _load_gguf_cache() -> dict:\n    \"\"\"Load the GGUF source cache from disk.\n\n    Returns dict mapping model repo_id -> {\"sources\": [...], \"checked\": ISO timestamp}\n    \"\"\"\n    try:\n        with open(GGUF_CACHE_FILE) as f:\n            return json.load(f)\n    except (FileNotFoundError, json.JSONDecodeError):\n        return {}\n\n\ndef _save_gguf_cache(cache: dict):\n    \"\"\"Save the GGUF source cache to disk.\"\"\"\n    os.makedirs(os.path.dirname(GGUF_CACHE_FILE), exist_ok=True)\n    with open(GGUF_CACHE_FILE, \"w\") as f:\n        json.dump(cache, f, indent=2)\n\n\ndef _cache_entry_fresh(entry: dict) -> bool:\n    \"\"\"Check if a cache entry is still valid.\"\"\"\n    try:\n        from datetime import datetime, timedelta, timezone\n        checked = datetime.fromisoformat(entry[\"checked\"])\n        return (datetime.now(timezone.utc) - checked) < timedelta(days=GGUF_CACHE_MAX_AGE_DAYS)\n    except (KeyError, ValueError):\n        return False\n\n\ndef _model_gguf_repo_candidates(repo_id: str) -> list[tuple[str, str]]:\n    \"\"\"Generate candidate GGUF repo names for a model.\n\n    Returns list of (provider, candidate_repo_id) tuples.\n    e.g. 
for \"meta-llama/Llama-3.1-8B-Instruct\" →\n         [(\"unsloth\", \"unsloth/Llama-3.1-8B-Instruct-GGUF\"),\n          (\"bartowski\", \"bartowski/Llama-3.1-8B-Instruct-GGUF\")]\n    \"\"\"\n    model_name = repo_id.split(\"/\")[-1]\n    candidates = []\n    for provider in GGUF_PROVIDERS:\n        candidates.append((provider, f\"{provider}/{model_name}-GGUF\"))\n    return candidates\n\n\ndef check_gguf_repo_exists(repo_id: str) -> bool:\n    \"\"\"Check if a HuggingFace repo exists and has GGUF files.\"\"\"\n    url = f\"{HF_API}/{repo_id}\"\n    req = urllib.request.Request(url, headers=_auth_headers())\n    try:\n        with urllib.request.urlopen(req, timeout=10) as resp:\n            info = json.loads(resp.read().decode())\n            tags = info.get(\"tags\", [])\n            return \"gguf\" in tags\n    except Exception:\n        return False\n\n\ndef enrich_gguf_sources(models: list[dict]) -> int:\n    \"\"\"Add gguf_sources to models by checking GGUF provider repos.\n\n    Uses a persistent cache to avoid re-checking repos on every scrape.\n    Returns the number of models enriched.\n    \"\"\"\n    cache = _load_gguf_cache()\n    enriched = 0\n    cache_hits = 0\n    total = len(models)\n    from datetime import datetime, timezone\n\n    for i, model in enumerate(models, 1):\n        repo_id = model[\"name\"]\n\n        # Skip non-GGUF models (AWQ/GPTQ don't use GGUF sources)\n        if model.get(\"format\", \"gguf\") != \"gguf\":\n            continue\n\n        # Check cache first\n        if repo_id in cache and _cache_entry_fresh(cache[repo_id]):\n            sources = cache[repo_id][\"sources\"]\n            cache_hits += 1\n        else:\n            # Query HuggingFace\n            candidates = _model_gguf_repo_candidates(repo_id)\n            sources = []\n            for provider, candidate_repo in candidates:\n                print(f\"  [{i}/{total}] Checking {candidate_repo}...\", end=\"\")\n                if 
check_gguf_repo_exists(candidate_repo):\n                    sources.append({\"repo\": candidate_repo, \"provider\": provider})\n                    print(\" ✓\")\n                else:\n                    print(\" ✗\")\n                time.sleep(0.15)  # Be polite to the API\n\n            # Update cache\n            cache[repo_id] = {\n                \"sources\": sources,\n                \"checked\": datetime.now(timezone.utc).isoformat(),\n            }\n\n        if sources:\n            model[\"gguf_sources\"] = sources\n            enriched += 1\n\n    _save_gguf_cache(cache)\n    print(f\"  Cache: {cache_hits} hits, {total - cache_hits} API checks\")\n    return enriched\n\n\n# ---------------------------------------------------------------------------\n# Auto-discovery from HuggingFace trending / most-downloaded\n# ---------------------------------------------------------------------------\n\n# Pipeline tags to search for discoverable models\nDISCOVER_PIPELINES = [\"text-generation\", \"text2text-generation\"]\n\n# Orgs to skip — these publish many fine-tunes that clutter the list\nSKIP_ORGS = {\n    \"TheBloke\",               # GGUF repacks, not original models\n    \"unsloth\",                # Training framework repacks\n    \"mlx-community\",          # MLX conversions\n    \"bartowski\",              # GGUF repacks\n    \"mradermacher\",           # GGUF repacks\n    \"trl-internal-testing\",   # Test fixtures\n    \"openai-community\",       # Legacy model mirrors (gpt2 etc.)\n    \"distilbert\",             # Distilled legacy models\n}\n\n\ndef discover_trending_models(limit: int = 30, min_downloads: int = 10000) -> list[str]:\n    \"\"\"Query HuggingFace API for top text-generation models by download count.\n\n    Returns a list of repo IDs (e.g. 
[\"mistralai/Mistral-7B-v0.1\", ...])\n    that are NOT already in TARGET_MODELS.\n    \"\"\"\n    curated = set(TARGET_MODELS)\n    discovered = []\n\n    for pipeline in DISCOVER_PIPELINES:\n        # Fetch more than we need since we'll filter heavily\n        fetch_limit = limit * 5\n        url = (\n            f\"{HF_API}?\"\n            f\"pipeline_tag={pipeline}&\"\n            f\"sort=downloads&\"\n            f\"direction=-1&\"\n            f\"limit={fetch_limit}\"\n        )\n        req = urllib.request.Request(url, headers=_auth_headers())\n        try:\n            with urllib.request.urlopen(req, timeout=30) as resp:\n                models = json.loads(resp.read().decode())\n        except Exception as e:\n            print(f\"  ⚠ Failed to fetch trending {pipeline} models: {e}\",\n                  file=sys.stderr)\n            continue\n\n        for m in models:\n            repo_id = m.get(\"id\", \"\")\n            if not repo_id or \"/\" not in repo_id:\n                continue\n\n            # Skip if already curated\n            if repo_id in curated:\n                continue\n\n            # Skip already discovered\n            if repo_id in discovered:\n                continue\n\n            # Skip known repack / converter orgs\n            org = repo_id.split(\"/\")[0]\n            if org in SKIP_ORGS:\n                continue\n\n            # Skip models with too few downloads\n            downloads = m.get(\"downloads\", 0)\n            if downloads < min_downloads:\n                continue\n\n            # Skip GGUF-only repos, adapters, and merges\n            tags = set(m.get(\"tags\", []))\n            if tags & {\"gguf\", \"adapter\", \"merge\", \"lora\", \"qlora\"}:\n                continue\n\n            # Must have safetensors tag (listing API doesn't include param counts,\n            # but the safetensors tag means scrape_model() will find params)\n            if \"safetensors\" not in tags:\n                continue\n\n  
          discovered.append(repo_id)\n            if len(discovered) >= limit:\n                break\n\n        if len(discovered) >= limit:\n            break\n\n    return discovered[:limit]\n\n\ndef main():\n    parser = argparse.ArgumentParser(\n        description=\"Scrape LLM model metadata from HuggingFace for llmfit.\"\n    )\n    parser.add_argument(\n        \"--discover\", action=\"store_true\",\n        help=\"Auto-discover trending text-generation models from HuggingFace \"\n             \"in addition to the curated TARGET_MODELS list.\"\n    )\n    parser.add_argument(\n        \"-n\", \"--discover-limit\", type=int, default=30,\n        help=\"Max number of trending models to discover (default: 30). \"\n             \"Duplicates of curated models are skipped automatically.\"\n    )\n    parser.add_argument(\n        \"--min-downloads\", type=int, default=10000,\n        help=\"Minimum download count for discovered models (default: 10000).\"\n    )\n    parser.add_argument(\n        \"--gguf-sources\", action=\"store_true\", default=True,\n        help=\"Enrich models with known GGUF download sources from \"\n             \"providers like unsloth and bartowski on HuggingFace (default: enabled).\"\n    )\n    parser.add_argument(\n        \"--no-gguf-sources\", action=\"store_false\", dest=\"gguf_sources\",\n        help=\"Skip GGUF download source enrichment (faster scrape).\"\n    )\n    parser.add_argument(\n        \"--token\", type=str, default=None,\n        help=\"HuggingFace API token for accessing gated models. 
\"\n             \"Can also be set via HF_TOKEN or HUGGING_FACE_HUB_TOKEN env var.\"\n    )\n    args = parser.parse_args()\n\n    # Resolve auth token: CLI flag > HF_TOKEN > HUGGING_FACE_HUB_TOKEN\n    global _hf_token\n    _hf_token = (\n        args.token\n        or os.environ.get(\"HF_TOKEN\")\n        or os.environ.get(\"HUGGING_FACE_HUB_TOKEN\")\n    )\n    if _hf_token:\n        print(f\"🔑 Authenticated with HuggingFace token ({_hf_token[:4]}...{_hf_token[-4:]})\")\n    else:\n        print(\"ℹ  No HF token set. Gated models will use fallback data.\")\n        print(\"   Set HF_TOKEN env var or pass --token to access gated models.\\n\")\n\n    # Fallback entries for gated/auth-required models where the API\n    # doesn't return safetensors metadata without a token.\n    FALLBACKS = [\n        {\n            \"name\": \"meta-llama/Llama-3.3-70B-Instruct\",\n            \"provider\": \"Meta\", \"parameter_count\": \"70.6B\",\n            \"parameters_raw\": 70_553_706_496,\n            \"min_ram_gb\": 39.4, \"recommended_ram_gb\": 65.7, \"min_vram_gb\": 36.1,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"mistralai/Mistral-Small-24B-Instruct-2501\",\n            \"provider\": \"Mistral AI\", \"parameter_count\": \"24B\",\n            \"parameters_raw\": 24_000_000_000,\n            \"min_ram_gb\": 13.4, \"recommended_ram_gb\": 22.4, \"min_vram_gb\": 12.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"mistral\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            
\"name\": \"Qwen/Qwen2.5-14B-Instruct\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"14.8B\",\n            \"parameters_raw\": 14_770_000_000,\n            \"min_ram_gb\": 8.2, \"recommended_ram_gb\": 13.7, \"min_vram_gb\": 7.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"qwen2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"Qwen/Qwen2.5-32B-Instruct\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"32.5B\",\n            \"parameters_raw\": 32_510_000_000,\n            \"min_ram_gb\": 18.2, \"recommended_ram_gb\": 30.3, \"min_vram_gb\": 16.7,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"qwen2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"microsoft/phi-3-mini-4k-instruct\",\n            \"provider\": \"Microsoft\", \"parameter_count\": \"3.8B\",\n            \"parameters_raw\": 3_821_000_000,\n            \"min_ram_gb\": 2.1, \"recommended_ram_gb\": 3.6, \"min_vram_gb\": 2.0,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"Lightweight, edge deployment\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"phi3\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"microsoft/phi-4\",\n            \"provider\": \"Microsoft\", \"parameter_count\": \"14B\",\n            \"parameters_raw\": 14_000_000_000,\n            \"min_ram_gb\": 7.8, \"recommended_ram_gb\": 13.0, \"min_vram_gb\": 7.2,\n            \"quantization\": \"Q4_K_M\", 
\"context_length\": 16384,\n            \"use_case\": \"Reasoning, STEM, code generation\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"phi\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"google/gemma-3-12b-it\",\n            \"provider\": \"Google\", \"parameter_count\": \"12B\",\n            \"parameters_raw\": 12_000_000_000,\n            \"min_ram_gb\": 6.7, \"recommended_ram_gb\": 11.2, \"min_vram_gb\": 6.1,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"gemma3\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"deepseek-ai/DeepSeek-V3\",\n            \"provider\": \"DeepSeek\", \"parameter_count\": \"685B\",\n            \"parameters_raw\": 685_000_000_000,\n            \"min_ram_gb\": 382.8, \"recommended_ram_gb\": 638.0, \"min_vram_gb\": 351.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"State-of-the-art, MoE architecture\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"deepseek_v3\",\n            \"is_moe\": True, \"num_experts\": 256, \"active_experts\": 8,\n            \"active_parameters\": 37_000_000_000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"CohereForAI/c4ai-command-r-v01\",\n            \"provider\": \"Cohere\", \"parameter_count\": \"35B\",\n            \"parameters_raw\": 35_000_000_000,\n            \"min_ram_gb\": 19.5, \"recommended_ram_gb\": 32.6, \"min_vram_gb\": 17.9,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"RAG, tool use, agents\",\n            \"pipeline_tag\": \"text-generation\", 
\"architecture\": \"cohere\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"bigcode/starcoder2-15b\",\n            \"provider\": \"BigCode\", \"parameter_count\": \"15.7B\",\n            \"parameters_raw\": 15_700_000_000,\n            \"min_ram_gb\": 8.8, \"recommended_ram_gb\": 14.6, \"min_vram_gb\": 8.0,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 16384,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"starcoder2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"nomic-ai/nomic-embed-text-v1.5\",\n            \"provider\": \"Nomic\", \"parameter_count\": \"137M\",\n            \"parameters_raw\": 137_000_000,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.5,\n            \"quantization\": \"F16\", \"context_length\": 8192,\n            \"use_case\": \"Text embeddings for RAG\",\n            \"pipeline_tag\": \"feature-extraction\", \"architecture\": \"nomic_bert\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct\",\n            \"provider\": \"DeepSeek\", \"parameter_count\": \"16B\",\n            \"parameters_raw\": 15_700_000_000,\n            \"min_ram_gb\": 8.8, \"recommended_ram_gb\": 14.6, \"min_vram_gb\": 8.0,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"deepseek_v2\",\n            \"is_moe\": True, \"num_experts\": 64, \"active_experts\": 6,\n            \"active_parameters\": 2_400_000_000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n     
       \"name\": \"microsoft/Phi-3-medium-14b-instruct\",\n            \"provider\": \"Microsoft\", \"parameter_count\": \"14B\",\n            \"parameters_raw\": 14_000_000_000,\n            \"min_ram_gb\": 7.8, \"recommended_ram_gb\": 13.0, \"min_vram_gb\": 7.2,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"Balanced performance and size\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"phi3\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        # NEW FALLBACKS for popular models\n        {\n            \"name\": \"google/gemma-2-2b-it\",\n            \"provider\": \"Google\", \"parameter_count\": \"2.6B\",\n            \"parameters_raw\": 2614341376,\n            \"min_ram_gb\": 1.5, \"recommended_ram_gb\": 2.4, \"min_vram_gb\": 1.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 8192,\n            \"use_case\": \"Lightweight, edge deployment\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"gemma2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"meta-llama/CodeLlama-7b-Instruct-hf\",\n            \"provider\": \"Meta\", \"parameter_count\": \"7.0B\",\n            \"parameters_raw\": 7016400896,\n            \"min_ram_gb\": 3.9, \"recommended_ram_gb\": 6.5, \"min_vram_gb\": 3.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 16384,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"meta-llama/CodeLlama-13b-Instruct-hf\",\n            \"provider\": \"Meta\", \"parameter_count\": \"13.0B\",\n            \"parameters_raw\": 13015864320,\n            \"min_ram_gb\": 7.3, \"recommended_ram_gb\": 12.1, 
\"min_vram_gb\": 6.7,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 16384,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"meta-llama/CodeLlama-34b-Instruct-hf\",\n            \"provider\": \"Meta\", \"parameter_count\": \"34.0B\",\n            \"parameters_raw\": 34018971648,\n            \"min_ram_gb\": 19.0, \"recommended_ram_gb\": 31.7, \"min_vram_gb\": 17.4,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 16384,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n            \"provider\": \"Meta\", \"parameter_count\": \"11.0B\",\n            \"parameters_raw\": 10665463808,\n            \"min_ram_gb\": 6.0, \"recommended_ram_gb\": 9.9, \"min_vram_gb\": 5.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"mistralai/Ministral-8B-Instruct-2410\",\n            \"provider\": \"Mistral AI\", \"parameter_count\": \"8.0B\",\n            \"parameters_raw\": 8030261248,\n            \"min_ram_gb\": 4.5, \"recommended_ram_gb\": 7.5, \"min_vram_gb\": 4.1,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"mistral\",\n            
\"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"mistralai/Mistral-Nemo-Instruct-2407\",\n            \"provider\": \"Mistral AI\", \"parameter_count\": \"12.2B\",\n            \"parameters_raw\": 12247076864,\n            \"min_ram_gb\": 6.8, \"recommended_ram_gb\": 11.4, \"min_vram_gb\": 6.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"mistral\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"microsoft/Phi-3.5-mini-instruct\",\n            \"provider\": \"Microsoft\", \"parameter_count\": \"3.8B\",\n            \"parameters_raw\": 3821000000,\n            \"min_ram_gb\": 2.1, \"recommended_ram_gb\": 3.6, \"min_vram_gb\": 2.0,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Lightweight, long context\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"phi3\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"microsoft/Orca-2-7b\",\n            \"provider\": \"Microsoft\", \"parameter_count\": \"7.0B\",\n            \"parameters_raw\": 7016400896,\n            \"min_ram_gb\": 3.9, \"recommended_ram_gb\": 6.5, \"min_vram_gb\": 3.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"Reasoning, step-by-step solutions\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"microsoft/Orca-2-13b\",\n            \"provider\": \"Microsoft\", \"parameter_count\": \"13.0B\",\n            \"parameters_raw\": 13015864320,\n            
\"min_ram_gb\": 7.3, \"recommended_ram_gb\": 12.1, \"min_vram_gb\": 6.7,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"Reasoning, step-by-step solutions\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"01-ai/Yi-6B-Chat\",\n            \"provider\": \"01.ai\", \"parameter_count\": \"6.1B\",\n            \"parameters_raw\": 6061356032,\n            \"min_ram_gb\": 3.4, \"recommended_ram_gb\": 5.6, \"min_vram_gb\": 3.1,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"Multilingual, Chinese/English chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"yi\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"01-ai/Yi-34B-Chat\",\n            \"provider\": \"01.ai\", \"parameter_count\": \"34.4B\",\n            \"parameters_raw\": 34386780160,\n            \"min_ram_gb\": 19.2, \"recommended_ram_gb\": 32.0, \"min_vram_gb\": 17.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"Multilingual, Chinese/English chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"yi\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"upstage/SOLAR-10.7B-Instruct-v1.0\",\n            \"provider\": \"Upstage\", \"parameter_count\": \"10.7B\",\n            \"parameters_raw\": 10700000000,\n            \"min_ram_gb\": 6.0, \"recommended_ram_gb\": 10.0, \"min_vram_gb\": 5.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"High-performance instruction following\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n       
     \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"tiiuae/falcon-7b-instruct\",\n            \"provider\": \"TII\", \"parameter_count\": \"7.0B\",\n            \"parameters_raw\": 7000000000,\n            \"min_ram_gb\": 3.9, \"recommended_ram_gb\": 6.5, \"min_vram_gb\": 3.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 2048,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"falcon\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"tiiuae/falcon-40b-instruct\",\n            \"provider\": \"TII\", \"parameter_count\": \"40.0B\",\n            \"parameters_raw\": 40000000000,\n            \"min_ram_gb\": 22.4, \"recommended_ram_gb\": 37.3, \"min_vram_gb\": 20.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 2048,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"falcon\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"HuggingFaceH4/zephyr-7b-beta\",\n            \"provider\": \"HuggingFace\", \"parameter_count\": \"7.2B\",\n            \"parameters_raw\": 7241732096,\n            \"min_ram_gb\": 4.0, \"recommended_ram_gb\": 6.7, \"min_vram_gb\": 3.7,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"mistral\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"openchat/openchat-3.5-0106\",\n            \"provider\": \"OpenChat\", \"parameter_count\": \"7.0B\",\n            \"parameters_raw\": 7000000000,\n            \"min_ram_gb\": 3.9, 
\"recommended_ram_gb\": 6.5, \"min_vram_gb\": 3.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 8192,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"mistral\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"lmsys/vicuna-7b-v1.5\",\n            \"provider\": \"LMSYS\", \"parameter_count\": \"7.0B\",\n            \"parameters_raw\": 6738415616,\n            \"min_ram_gb\": 3.8, \"recommended_ram_gb\": 6.3, \"min_vram_gb\": 3.4,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"lmsys/vicuna-13b-v1.5\",\n            \"provider\": \"LMSYS\", \"parameter_count\": \"13.0B\",\n            \"parameters_raw\": 13015864320,\n            \"min_ram_gb\": 7.3, \"recommended_ram_gb\": 12.1, \"min_vram_gb\": 6.7,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO\",\n            \"provider\": \"NousResearch\", \"parameter_count\": \"46.7B\",\n            \"parameters_raw\": 46702792704,\n            \"min_ram_gb\": 26.1, \"recommended_ram_gb\": 43.5, \"min_vram_gb\": 23.9,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"mixtral\",\n            \"is_moe\": 
True, \"num_experts\": 8, \"active_experts\": 2,\n            \"active_parameters\": 12900000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"WizardLMTeam/WizardLM-13B-V1.2\",\n            \"provider\": \"WizardLM\", \"parameter_count\": \"13.0B\",\n            \"parameters_raw\": 13015864320,\n            \"min_ram_gb\": 7.3, \"recommended_ram_gb\": 12.1, \"min_vram_gb\": 6.7,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 4096,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"llama\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"WizardLMTeam/WizardCoder-15B-V1.0\",\n            \"provider\": \"WizardLM\", \"parameter_count\": \"15.5B\",\n            \"parameters_raw\": 15515334656,\n            \"min_ram_gb\": 8.7, \"recommended_ram_gb\": 14.5, \"min_vram_gb\": 7.9,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 8192,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"starcoder\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"Qwen/Qwen2.5-Coder-1.5B-Instruct\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"1.5B\",\n            \"parameters_raw\": 1539938304,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.8,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"qwen2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"Qwen/Qwen2.5-Coder-7B-Instruct\",\n            
\"provider\": \"Alibaba\", \"parameter_count\": \"7.6B\",\n            \"parameters_raw\": 7615616000,\n            \"min_ram_gb\": 4.3, \"recommended_ram_gb\": 7.1, \"min_vram_gb\": 3.9,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"qwen2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"Qwen/Qwen2.5-Coder-14B-Instruct\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"14.7B\",\n            \"parameters_raw\": 14770000000,\n            \"min_ram_gb\": 8.2, \"recommended_ram_gb\": 13.7, \"min_vram_gb\": 7.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"qwen2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"Qwen/Qwen2.5-Coder-32B-Instruct\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"32.5B\",\n            \"parameters_raw\": 32510000000,\n            \"min_ram_gb\": 18.2, \"recommended_ram_gb\": 30.3, \"min_vram_gb\": 16.7,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Code generation and completion\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"qwen2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"Qwen/Qwen2.5-VL-3B-Instruct\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"3.8B\",\n            \"parameters_raw\": 3821000000,\n            \"min_ram_gb\": 2.1, \"recommended_ram_gb\": 3.6, \"min_vram_gb\": 2.0,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            
\"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"qwen2_vl\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"Qwen/Qwen2.5-VL-7B-Instruct\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"8.3B\",\n            \"parameters_raw\": 8290000000,\n            \"min_ram_gb\": 4.6, \"recommended_ram_gb\": 7.7, \"min_vram_gb\": 4.2,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"qwen2_vl\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"Qwen/Qwen3-14B\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"14.8B\",\n            \"parameters_raw\": 14770000000,\n            \"min_ram_gb\": 8.2, \"recommended_ram_gb\": 13.7, \"min_vram_gb\": 7.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"General purpose text generation\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"qwen3\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        # --- New fallbacks added Feb 2026 ---\n        {\n            \"name\": \"deepseek-ai/DeepSeek-V3.2\",\n            \"provider\": \"DeepSeek\", \"parameter_count\": \"685B\",\n            \"parameters_raw\": 685000000000,\n            \"min_ram_gb\": 383.2, \"recommended_ram_gb\": 638.7, \"min_vram_gb\": 351.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"State-of-the-art, MoE architecture\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"deepseek_v3\",\n            \"is_moe\": True, \"num_experts\": 256, \"active_experts\": 8,\n            
\"active_parameters\": 37000000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-12-01\",\n        },\n        {\n            \"name\": \"deepseek-ai/DeepSeek-V3.2-Speciale\",\n            \"provider\": \"DeepSeek\", \"parameter_count\": \"685B\",\n            \"parameters_raw\": 685000000000,\n            \"min_ram_gb\": 383.2, \"recommended_ram_gb\": 638.7, \"min_vram_gb\": 351.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Advanced reasoning, chain-of-thought\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"deepseek_v3\",\n            \"is_moe\": True, \"num_experts\": 256, \"active_experts\": 8,\n            \"active_parameters\": 37000000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-12-01\",\n        },\n        {\n            \"name\": \"zai-org/GLM-5\",\n            \"provider\": \"Zhipu AI\", \"parameter_count\": \"744B\",\n            \"parameters_raw\": 744000000000,\n            \"min_ram_gb\": 416.2, \"recommended_ram_gb\": 693.6, \"min_vram_gb\": 381.4,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 200000,\n            \"use_case\": \"State-of-the-art, MoE architecture\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"glm\",\n            \"is_moe\": True, \"num_experts\": 256, \"active_experts\": 8,\n            \"active_parameters\": 40000000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2026-02-11\",\n        },\n        {\n            \"name\": \"moonshotai/Kimi-K2.5\",\n            \"provider\": \"Moonshot\", \"parameter_count\": \"171B\",\n            \"parameters_raw\": 171000000000,\n            \"min_ram_gb\": 95.6, \"recommended_ram_gb\": 159.4, \"min_vram_gb\": 87.7,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 262144,\n            \"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": 
\"image-text-to-text\", \"architecture\": \"kimi\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2026-01-26\",\n        },\n        {\n            \"name\": \"MiniMaxAI/MiniMax-M2.7\",\n            \"provider\": \"MiniMax\", \"parameter_count\": \"230B\",\n            \"parameters_raw\": 230000000000,\n            \"min_ram_gb\": 128.6, \"recommended_ram_gb\": 214.4, \"min_vram_gb\": 117.9,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Latest flagship with enhanced reasoning and coding\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"minimax\",\n            \"is_moe\": True, \"num_experts\": 32, \"active_experts\": 2,\n            \"active_parameters\": 10000000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2026-03-18\",\n        },\n        {\n            \"name\": \"MiniMaxAI/MiniMax-M2.5\",\n            \"provider\": \"MiniMax\", \"parameter_count\": \"230B\",\n            \"parameters_raw\": 230000000000,\n            \"min_ram_gb\": 128.6, \"recommended_ram_gb\": 214.4, \"min_vram_gb\": 117.9,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Coding, agentic tool use\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"minimax\",\n            \"is_moe\": True, \"num_experts\": 32, \"active_experts\": 2,\n            \"active_parameters\": 10000000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2026-02-11\",\n        },\n        {\n            \"name\": \"XiaomiMiMo/MiMo-V2-Flash\",\n            \"provider\": \"Xiaomi\", \"parameter_count\": \"309B\",\n            \"parameters_raw\": 309000000000,\n            \"min_ram_gb\": 172.8, \"recommended_ram_gb\": 288.0, \"min_vram_gb\": 158.4,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Efficient reasoning, coding\",\n            
\"pipeline_tag\": \"text-generation\", \"architecture\": \"mimo\",\n            \"is_moe\": True, \"num_experts\": 128, \"active_experts\": 8,\n            \"active_parameters\": 15000000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-12-01\",\n        },\n        {\n            \"name\": \"XiaomiMiMo/MiMo-7B-RL\",\n            \"provider\": \"Xiaomi\", \"parameter_count\": \"7.0B\",\n            \"parameters_raw\": 7000000000,\n            \"min_ram_gb\": 3.9, \"recommended_ram_gb\": 6.5, \"min_vram_gb\": 3.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Advanced reasoning, math and code\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"mimo\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-05-01\",\n        },\n        {\n            \"name\": \"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16\",\n            \"provider\": \"NVIDIA\", \"parameter_count\": \"30B\",\n            \"parameters_raw\": 30000000000,\n            \"min_ram_gb\": 16.8, \"recommended_ram_gb\": 28.0, \"min_vram_gb\": 15.4,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 1048576,\n            \"use_case\": \"Efficient MoE, agentic tasks\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"nemotron\",\n            \"is_moe\": True, \"num_experts\": 128, \"active_experts\": 6,\n            \"active_parameters\": 3000000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-06-01\",\n        },\n        {\n            \"name\": \"nvidia/NVIDIA-Nemotron-Nano-9B-v2\",\n            \"provider\": \"NVIDIA\", \"parameter_count\": \"9B\",\n            \"parameters_raw\": 9000000000,\n            \"min_ram_gb\": 5.0, \"recommended_ram_gb\": 8.4, \"min_vram_gb\": 4.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Hybrid Mamba2, reasoning\",\n            
\"pipeline_tag\": \"text-generation\", \"architecture\": \"nemotron\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-06-01\",\n        },\n        {\n            \"name\": \"microsoft/Phi-4-reasoning\",\n            \"provider\": \"Microsoft\", \"parameter_count\": \"14B\",\n            \"parameters_raw\": 14000000000,\n            \"min_ram_gb\": 7.8, \"recommended_ram_gb\": 13.0, \"min_vram_gb\": 7.2,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Advanced reasoning, math and code\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"phi4\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-04-01\",\n        },\n        {\n            \"name\": \"microsoft/Phi-4-mini-reasoning\",\n            \"provider\": \"Microsoft\", \"parameter_count\": \"3.8B\",\n            \"parameters_raw\": 3800000000,\n            \"min_ram_gb\": 2.1, \"recommended_ram_gb\": 3.5, \"min_vram_gb\": 1.9,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 16384,\n            \"use_case\": \"Lightweight reasoning\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"phi4\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-04-01\",\n        },\n        {\n            \"name\": \"microsoft/Phi-4-multimodal-instruct\",\n            \"provider\": \"Microsoft\", \"parameter_count\": \"14B\",\n            \"parameters_raw\": 14000000000,\n            \"min_ram_gb\": 7.8, \"recommended_ram_gb\": 13.0, \"min_vram_gb\": 7.2,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Multimodal, vision and audio\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"phi4\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-04-01\",\n        },\n        {\n            \"name\": \"LGAI-EXAONE/EXAONE-4.0-32B\",\n            
\"provider\": \"LG AI\", \"parameter_count\": \"32B\",\n            \"parameters_raw\": 32000000000,\n            \"min_ram_gb\": 17.9, \"recommended_ram_gb\": 29.8, \"min_vram_gb\": 16.4,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Hybrid reasoning, multilingual\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"exaone\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-07-15\",\n        },\n        {\n            \"name\": \"LGAI-EXAONE/EXAONE-4.0-1.2B\",\n            \"provider\": \"LG AI\", \"parameter_count\": \"1.2B\",\n            \"parameters_raw\": 1200000000,\n            \"min_ram_gb\": 0.7, \"recommended_ram_gb\": 1.1, \"min_vram_gb\": 0.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Lightweight, on-device\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"exaone\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-07-15\",\n        },\n        {\n            \"name\": \"HuggingFaceTB/SmolLM3-3B\",\n            \"provider\": \"HuggingFace\", \"parameter_count\": \"3B\",\n            \"parameters_raw\": 3000000000,\n            \"min_ram_gb\": 1.7, \"recommended_ram_gb\": 2.8, \"min_vram_gb\": 1.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Lightweight, multilingual reasoning\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"smollm\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-07-08\",\n        },\n        {\n            \"name\": \"google/gemma-3n-E4B-it\",\n            \"provider\": \"Google\", \"parameter_count\": \"8B\",\n            \"parameters_raw\": 8000000000,\n            \"min_ram_gb\": 4.5, \"recommended_ram_gb\": 7.5, \"min_vram_gb\": 4.1,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n           
 \"use_case\": \"Multimodal, on-device (effective 4B)\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"gemma3n\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-06-25\",\n        },\n        {\n            \"name\": \"google/gemma-3n-E2B-it\",\n            \"provider\": \"Google\", \"parameter_count\": \"4B\",\n            \"parameters_raw\": 4000000000,\n            \"min_ram_gb\": 2.2, \"recommended_ram_gb\": 3.7, \"min_vram_gb\": 2.1,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 131072,\n            \"use_case\": \"Multimodal, on-device (effective 2B)\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"gemma3n\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-06-25\",\n        },\n        # Qwen3-Coder-Next (80B MoE, 3B active, Jan 2026)\n        {\n            \"name\": \"Qwen/Qwen3-Coder-Next\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"80B\",\n            \"parameters_raw\": 80000000000,\n            \"min_ram_gb\": 44.8, \"recommended_ram_gb\": 74.6, \"min_vram_gb\": 41.0,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 262144,\n            \"use_case\": \"Code generation, agentic coding\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"qwen3_next\",\n            \"is_moe\": True, \"num_experts\": 64, \"active_experts\": 4,\n            \"active_parameters\": 3000000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2026-01-30\",\n        },\n        {\n            \"name\": \"Qwen/Qwen3.5-27B\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"27.8B\",\n            \"parameters_raw\": 27781427952,\n            \"min_ram_gb\": 15.5, \"recommended_ram_gb\": 25.9, \"min_vram_gb\": 14.2,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 262144,\n            \"use_case\": \"Multimodal, vision and text\",\n            
\"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"qwen3_5\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n        },\n        {\n            \"name\": \"Qwen/Qwen3.5-35B-A3B\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"36.0B\",\n            \"parameters_raw\": 35951822704,\n            \"min_ram_gb\": 20.1, \"recommended_ram_gb\": 33.5, \"min_vram_gb\": 18.4,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 262144,\n            \"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"qwen3_5_moe\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n            \"is_moe\": True, \"num_experts\": 256, \"active_experts\": 8,\n            \"active_parameters\": 3_000_000_000,\n        },\n        {\n            \"name\": \"Qwen/Qwen3.5-122B-A10B\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"125.1B\",\n            \"parameters_raw\": 125086497008,\n            \"min_ram_gb\": 69.9, \"recommended_ram_gb\": 116.5, \"min_vram_gb\": 64.1,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 262144,\n            \"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"qwen3_5_moe\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n            \"is_moe\": True, \"num_experts\": 256, \"active_experts\": 8,\n            \"active_parameters\": 10_000_000_000,\n        },\n        {\n            \"name\": \"Qwen/Qwen3.5-397B-A17B\",\n            \"provider\": \"Alibaba\", \"parameter_count\": \"403.4B\",\n            \"parameters_raw\": 403397928944,\n            \"min_ram_gb\": 225.4, \"recommended_ram_gb\": 375.7, \"min_vram_gb\": 206.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 262144,\n            \"use_case\": \"Multimodal, vision and text\",\n            
\"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"qwen3_5_moe\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": None,\n            \"is_moe\": True, \"num_experts\": 256, \"active_experts\": 8,\n            \"active_parameters\": 17_000_000_000,\n        },\n        # Liquid AI LFM2 dense models\n        {\n            \"name\": \"LiquidAI/LFM2-350M\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"354M\",\n            \"parameters_raw\": 354483968,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Lightweight, edge deployment\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-700M\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"742M\",\n            \"parameters_raw\": 742489344,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Lightweight, edge deployment\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-1.2B\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.2B\",\n            \"parameters_raw\": 1170340608,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"General purpose text generation\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, 
\"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-2.6B\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"2.6B\",\n            \"parameters_raw\": 2569272320,\n            \"min_ram_gb\": 1.4, \"recommended_ram_gb\": 2.4, \"min_vram_gb\": 1.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"General purpose text generation\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-2.6B-Exp\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"2.6B\",\n            \"parameters_raw\": 2569272320,\n            \"min_ram_gb\": 1.4, \"recommended_ram_gb\": 2.4, \"min_vram_gb\": 1.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Instruction following, math, knowledge\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        # Liquid AI LFM2 MoE models\n        {\n            \"name\": \"LiquidAI/LFM2-8B-A1B\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"8.3B\",\n            \"parameters_raw\": 8300000000,\n            \"min_ram_gb\": 4.6, \"recommended_ram_gb\": 7.7, \"min_vram_gb\": 4.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"General purpose, edge MoE\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"is_moe\": True, \"num_experts\": 32, \"active_experts\": 4,\n            \"active_parameters\": 1500000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-24B-A2B\",\n      
      \"provider\": \"Liquid AI\", \"parameter_count\": \"23.8B\",\n            \"parameters_raw\": 23_843_661_440,\n            \"min_ram_gb\": 13.3, \"recommended_ram_gb\": 22.2, \"min_vram_gb\": 12.2,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Agentic tasks, RAG, summarization\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"is_moe\": True, \"num_experts\": 32, \"active_experts\": 4,\n            \"active_parameters\": 2300000000,\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        # Liquid AI LFM2.5 models\n        {\n            \"name\": \"LiquidAI/LFM2.5-1.2B-Base\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.2B\",\n            \"parameters_raw\": 1170340608,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"General purpose text generation\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2.5-1.2B-Instruct\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.2B\",\n            \"parameters_raw\": 1170340608,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Instruction following, chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2.5-1.2B-Thinking\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.2B\",\n            
\"parameters_raw\": 1170340608,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Advanced reasoning, chain-of-thought\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2.5-1.2B-JP\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.2B\",\n            \"parameters_raw\": 1170340608,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Japanese language, multilingual chat\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        # Liquid AI LFM2 Vision-Language models\n        {\n            \"name\": \"LiquidAI/LFM2-VL-450M\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"451M\",\n            \"parameters_raw\": 450822656,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-VL-1.6B\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.6B\",\n            \"parameters_raw\": 1584804000,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.8,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": 
\"Multimodal, vision and text\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-VL-3B\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"3.0B\",\n            \"parameters_raw\": 2998975216,\n            \"min_ram_gb\": 1.7, \"recommended_ram_gb\": 2.8, \"min_vram_gb\": 1.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2.5-VL-1.6B\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.6B\",\n            \"parameters_raw\": 1596625904,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.8,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Multimodal, vision and text\",\n            \"pipeline_tag\": \"image-text-to-text\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        # Liquid AI LFM2 Audio models\n        {\n            \"name\": \"LiquidAI/LFM2-Audio-1.5B\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.5B\",\n            \"parameters_raw\": 1500000000,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.8,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Speech-to-speech, ASR, TTS\",\n            \"pipeline_tag\": \"audio-to-audio\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            
\"name\": \"LiquidAI/LFM2.5-Audio-1.5B\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.5B\",\n            \"parameters_raw\": 1500000000,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.8,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 32768,\n            \"use_case\": \"Speech-to-speech, ASR, TTS\",\n            \"pipeline_tag\": \"audio-to-audio\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        # Liquid AI Liquid Nanos (task-specific fine-tunes)\n        {\n            \"name\": \"LiquidAI/LFM2-1.2B-Tool\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.2B\",\n            \"parameters_raw\": 1170340608,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Tool calling, function calling\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-1.2B-RAG\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.2B\",\n            \"parameters_raw\": 1170340608,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Retrieval-augmented generation\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-1.2B-Extract\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"1.2B\",\n            \"parameters_raw\": 1170340608,\n            \"min_ram_gb\": 1.0, 
\"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.6,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Data extraction, structured output\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-350M-Extract\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"354M\",\n            \"parameters_raw\": 354483968,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Data extraction, structured output\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-350M-Math\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"354M\",\n            \"parameters_raw\": 354483968,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Math reasoning, chain-of-thought\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-350M-ENJP-MT\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"354M\",\n            \"parameters_raw\": 354483968,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"English-Japanese translation\",\n            \"pipeline_tag\": \"translation\", \"architecture\": 
\"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-350M-PII-Extract-JP\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"354M\",\n            \"parameters_raw\": 354483968,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"PII extraction, Japanese\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-ColBERT-350M\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"353M\",\n            \"parameters_raw\": 353322752,\n            \"min_ram_gb\": 1.0, \"recommended_ram_gb\": 2.0, \"min_vram_gb\": 0.5,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Semantic search, sentence similarity\",\n            \"pipeline_tag\": \"sentence-similarity\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n        {\n            \"name\": \"LiquidAI/LFM2-2.6B-Transcript\",\n            \"provider\": \"Liquid AI\", \"parameter_count\": \"2.6B\",\n            \"parameters_raw\": 2569272320,\n            \"min_ram_gb\": 1.4, \"recommended_ram_gb\": 2.4, \"min_vram_gb\": 1.3,\n            \"quantization\": \"Q4_K_M\", \"context_length\": 128000,\n            \"use_case\": \"Meeting transcription, summarization\",\n            \"pipeline_tag\": \"text-generation\", \"architecture\": \"lfm2\",\n            \"hf_downloads\": 0, \"hf_likes\": 0, \"release_date\": \"2025-11-28\",\n        },\n    ]\n\n    print(f\"Scraping {len(TARGET_MODELS)} curated models from HuggingFace...\\n\")\n\n    results = []\n    scraped_names = 
set()\n    for i, repo_id in enumerate(TARGET_MODELS, 1):\n        print(f\"[{i}/{len(TARGET_MODELS)}] {repo_id}...\")\n        model = scrape_model(repo_id)\n        if model:\n            print(f\"  ✓ {model['parameter_count']} params, \"\n                  f\"min {model['min_ram_gb']} GB RAM, \"\n                  f\"ctx {model['context_length']}\")\n            results.append(model)\n            scraped_names.add(repo_id)\n        # Be polite to the API\n        time.sleep(0.3)\n\n    # Fill in fallbacks for models that couldn't be scraped\n    fallback_count = 0\n    for fb in FALLBACKS:\n        if fb[\"name\"] not in scraped_names:\n            print(f\"  + Fallback: {fb['name']} ({fb['parameter_count']})\")\n            results.append(fb)\n            scraped_names.add(fb[\"name\"])\n            fallback_count += 1\n\n    # Auto-discover trending models if --discover flag is set\n    discovered_count = 0\n    if args.discover:\n        print(f\"\\nDiscovering trending models (limit={args.discover_limit}, \"\n              f\"min_downloads={args.min_downloads})...\")\n        trending = discover_trending_models(\n            limit=args.discover_limit,\n            min_downloads=args.min_downloads,\n        )\n        print(f\"  Found {len(trending)} new models not in curated list\\n\")\n\n        for i, repo_id in enumerate(trending, 1):\n            if repo_id in scraped_names:\n                continue\n            print(f\"[discover {i}/{len(trending)}] {repo_id}...\")\n            model = scrape_model(repo_id)\n            if model:\n                model[\"_discovered\"] = True  # mark as auto-discovered\n                print(f\"  ✓ {model['parameter_count']} params, \"\n                      f\"{model['hf_downloads']:,} downloads, \"\n                      f\"ctx {model['context_length']}\")\n                results.append(model)\n                scraped_names.add(repo_id)\n                discovered_count += 1\n            time.sleep(0.3)\n\n    # 
Sort by parameter count\n    results.sort(key=lambda m: m[\"parameters_raw\"])\n\n    # Enrich with GGUF download sources if requested\n    gguf_enriched = 0\n    if args.gguf_sources:\n        print(f\"\\nEnriching {len(results)} models with GGUF download sources...\")\n        gguf_enriched = enrich_gguf_sources(results)\n        print(f\"  Found GGUF sources for {gguf_enriched} models\")\n\n    # Write to both locations: repo root (for reference) and llmfit-core (compiled into binary)\n    output_paths = [\"data/hf_models.json\", \"llmfit-core/data/hf_models.json\"]\n    for output_path in output_paths:\n        os.makedirs(os.path.dirname(output_path), exist_ok=True)\n        with open(output_path, \"w\") as f:\n            json.dump(results, f, indent=2)\n\n    print(f\"\\n✅ Wrote {len(results)} models to {', '.join(output_paths)}\")\n    print(f\"   Curated: {len(TARGET_MODELS)}, Fallbacks: {fallback_count}, \"\n          f\"Discovered: {discovered_count}, GGUF-sourced: {gguf_enriched}\")\n\n    # Print summary table\n    print(f\"\\n{'Model':<50} {'Params':>8} {'Min RAM':>8} {'Rec RAM':>8} {'VRAM':>6}\")\n    print(\"─\" * 84)\n    for m in results:\n        print(f\"{m['name']:<50} {m['parameter_count']:>8} \"\n              f\"{m['min_ram_gb']:>7.1f}G {m['recommended_ram_gb']:>7.1f}G \"\n              f\"{m['min_vram_gb']:>5.1f}G\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "scripts/test_api.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nLocal API validation tests for llmfit serve.\n\nUsage:\n  # Test an already-running server\n  python3 scripts/test_api.py --base-url http://127.0.0.1:8787\n\n  # Spawn server automatically (from repo root)\n  python3 scripts/test_api.py --spawn\n\"\"\"\n\nfrom __future__ import annotations\n\nimport argparse\nimport json\nimport os\nimport subprocess\nimport sys\nimport time\nimport urllib.error\nimport urllib.parse\nimport urllib.request\nfrom typing import Any, Dict, List, Optional, Tuple\n\n\ndef _http_json(url: str, timeout: float = 10.0) -> Tuple[int, Dict[str, Any]]:\n    req = urllib.request.Request(url, method=\"GET\")\n    try:\n        with urllib.request.urlopen(req, timeout=timeout) as resp:\n            code = resp.getcode()\n            body = resp.read().decode(\"utf-8\")\n            data = json.loads(body) if body else {}\n            return code, data\n    except urllib.error.HTTPError as exc:\n        body = exc.read().decode(\"utf-8\") if exc.fp else \"\"\n        try:\n            data = json.loads(body) if body else {}\n        except json.JSONDecodeError:\n            data = {\"raw\": body}\n        return exc.code, data\n\n\ndef _assert(condition: bool, message: str) -> None:\n    if not condition:\n        raise AssertionError(message)\n\n\ndef _expect_keys(obj: Dict[str, Any], keys: List[str], path: str) -> None:\n    for key in keys:\n        _assert(key in obj, f\"missing key '{key}' in {path}\")\n\n\ndef test_health(base_url: str) -> None:\n    code, data = _http_json(f\"{base_url}/health\")\n    _assert(code == 200, f\"/health expected 200, got {code}\")\n    _expect_keys(data, [\"status\", \"node\"], \"/health\")\n    _assert(data[\"status\"] == \"ok\", \"health status must be 'ok'\")\n    _assert(isinstance(data[\"node\"], dict), \"health node must be object\")\n    _expect_keys(data[\"node\"], [\"name\", \"os\"], \"/health.node\")\n\n\ndef test_system(base_url: str) -> None:\n    code, 
data = _http_json(f\"{base_url}/api/v1/system\")\n    _assert(code == 200, f\"/api/v1/system expected 200, got {code}\")\n    _expect_keys(data, [\"node\", \"system\"], \"/api/v1/system\")\n    _expect_keys(\n        data[\"system\"],\n        [\"total_ram_gb\", \"available_ram_gb\", \"cpu_cores\", \"cpu_name\", \"has_gpu\", \"backend\", \"gpus\"],\n        \"/api/v1/system.system\",\n    )\n\n\ndef test_models_envelope_and_limit(base_url: str) -> None:\n    code, data = _http_json(f\"{base_url}/api/v1/models?limit=3&sort=score\")\n    _assert(code == 200, f\"/api/v1/models expected 200, got {code}\")\n    _expect_keys(data, [\"node\", \"system\", \"total_models\", \"returned_models\", \"filters\", \"models\"], \"/api/v1/models\")\n    _assert(isinstance(data[\"models\"], list), \"models must be a list\")\n    _assert(data[\"returned_models\"] <= 3, \"returned_models must respect limit\")\n    _assert(len(data[\"models\"]) == data[\"returned_models\"], \"returned_models must equal models length\")\n\n\ndef test_top_endpoint_excludes_too_tight(base_url: str) -> None:\n    code, data = _http_json(f\"{base_url}/api/v1/models/top?limit=10&min_fit=marginal\")\n    _assert(code == 200, f\"/api/v1/models/top expected 200, got {code}\")\n    models = data.get(\"models\", [])\n    for row in models:\n        _assert(row.get(\"fit_level\") != \"too_tight\", \"/models/top should not include too_tight fits\")\n\n\ndef test_filters_runtime_and_use_case(base_url: str) -> None:\n    code, data = _http_json(f\"{base_url}/api/v1/models?limit=10&runtime=any&use_case=general\")\n    _assert(code == 200, f\"runtime/use_case filter query expected 200, got {code}\")\n    models = data.get(\"models\", [])\n    for row in models:\n        category = str(row.get(\"category\", \"\")).lower()\n        _assert(category == \"general\", \"use_case=general should only return General category\")\n\n\ndef test_models_shape(base_url: str) -> None:\n    code, data = 
_http_json(f\"{base_url}/api/v1/models?limit=5\")\n    _assert(code == 200, f\"/api/v1/models shape query expected 200, got {code}\")\n    models = data.get(\"models\", [])\n    if not models:\n        return\n\n    sample = models[0]\n    _expect_keys(\n        sample,\n        [\n            \"name\",\n            \"provider\",\n            \"fit_level\",\n            \"run_mode\",\n            \"score\",\n            \"estimated_tps\",\n            \"runtime\",\n            \"best_quant\",\n            \"memory_required_gb\",\n            \"memory_available_gb\",\n            \"utilization_pct\",\n            \"score_components\",\n        ],\n        \"/api/v1/models.models[0]\",\n    )\n    _expect_keys(sample[\"score_components\"], [\"quality\", \"speed\", \"fit\", \"context\"], \"/score_components\")\n\n\ndef test_name_lookup(base_url: str) -> None:\n    code, data = _http_json(f\"{base_url}/api/v1/models?limit=1\")\n    _assert(code == 200, f\"seed query expected 200, got {code}\")\n    models = data.get(\"models\", [])\n    if not models:\n        return\n\n    raw_name = str(models[0].get(\"name\", \"\")).strip()\n    _assert(raw_name, \"expected at least one model name\")\n\n    token = raw_name.split(\"/\")[-1].split(\"-\")[0] or raw_name[:8]\n    path_name = urllib.parse.quote(token, safe=\"\")\n\n    code2, data2 = _http_json(f\"{base_url}/api/v1/models/{path_name}?limit=10\")\n    _assert(code2 == 200, f\"/api/v1/models/{{name}} expected 200, got {code2}\")\n    _expect_keys(data2, [\"models\"], \"/api/v1/models/{name}\")\n    result_models = data2.get(\"models\", [])\n\n    if result_models:\n        lower_token = token.lower()\n        matched = any(lower_token in str(row.get(\"name\", \"\")).lower() for row in result_models)\n        _assert(matched, \"name lookup should return at least one model matching token\")\n\n\ndef test_invalid_filter_returns_400(base_url: str) -> None:\n    code, data = 
_http_json(f\"{base_url}/api/v1/models?min_fit=nope\")\n    _assert(code == 400, f\"invalid min_fit expected 400, got {code}\")\n    _expect_keys(data, [\"error\"], \"error response\")\n\n\ndef test_sort_score_desc(base_url: str) -> None:\n    code, data = _http_json(f\"{base_url}/api/v1/models?limit=25&sort=score\")\n    _assert(code == 200, f\"sort=score query expected 200, got {code}\")\n\n    scores: List[float] = []\n    for row in data.get(\"models\", []):\n        fit_level = row.get(\"fit_level\")\n        if fit_level == \"too_tight\":\n            continue\n        score = row.get(\"score\")\n        if isinstance(score, (int, float)):\n            scores.append(float(score))\n\n    for i in range(1, len(scores)):\n        _assert(scores[i - 1] >= scores[i] - 1e-9, \"scores should be non-increasing for sort=score\")\n\n\ndef wait_for_health(base_url: str, timeout_s: float = 30.0) -> None:\n    deadline = time.time() + timeout_s\n    while time.time() < deadline:\n        try:\n            code, data = _http_json(f\"{base_url}/health\", timeout=2.0)\n            if code == 200 and data.get(\"status\") == \"ok\":\n                return\n        except Exception:\n            pass\n        time.sleep(0.5)\n    raise RuntimeError(f\"server did not become healthy at {base_url} within {timeout_s}s\")\n\n\ndef spawn_server(base_url: str, project_root: str) -> subprocess.Popen:\n    parsed = urllib.parse.urlparse(base_url)\n    host = parsed.hostname or \"127.0.0.1\"\n    port = parsed.port or 8787\n\n    cmd = [\n        \"cargo\",\n        \"run\",\n        \"-p\",\n        \"llmfit\",\n        \"--\",\n        \"serve\",\n        \"--host\",\n        host,\n        \"--port\",\n        str(port),\n    ]\n\n    proc = subprocess.Popen(\n        cmd,\n        cwd=project_root,\n        stdout=subprocess.PIPE,\n        stderr=subprocess.STDOUT,\n        text=True,\n    )\n    return proc\n\n\ndef run_all_tests(base_url: str) -> None:\n    tests = [\n        
(\"health\", test_health),\n        (\"system\", test_system),\n        (\"models envelope+limit\", test_models_envelope_and_limit),\n        (\"top excludes too_tight\", test_top_endpoint_excludes_too_tight),\n        (\"filters runtime/use_case\", test_filters_runtime_and_use_case),\n        (\"model row shape\", test_models_shape),\n        (\"name lookup\", test_name_lookup),\n        (\"invalid filter 400\", test_invalid_filter_returns_400),\n        (\"sort score desc\", test_sort_score_desc),\n    ]\n\n    for name, fn in tests:\n        fn(base_url)\n        print(f\"✓ {name}\")\n\n\ndef main() -> int:\n    parser = argparse.ArgumentParser(description=\"Run llmfit REST API validation tests\")\n    parser.add_argument(\"--base-url\", default=\"http://127.0.0.1:8787\", help=\"API base URL\")\n    parser.add_argument(\n        \"--spawn\",\n        action=\"store_true\",\n        help=\"Spawn llmfit serve automatically (requires cargo in PATH)\",\n    )\n    parser.add_argument(\n        \"--project-root\",\n        default=os.path.abspath(os.path.join(os.path.dirname(__file__), \"..\")),\n        help=\"Project root used when --spawn is set\",\n    )\n    args = parser.parse_args()\n\n    proc: Optional[subprocess.Popen] = None\n\n    try:\n        if args.spawn:\n            print(f\"Spawning server at {args.base_url} ...\")\n            proc = spawn_server(args.base_url, args.project_root)\n            wait_for_health(args.base_url, timeout_s=45.0)\n\n        print(f\"Running API tests against {args.base_url}\")\n        run_all_tests(args.base_url)\n        print(\"\\nAll API tests passed.\")\n        return 0\n\n    except Exception as exc:\n        print(f\"\\nAPI tests failed: {exc}\", file=sys.stderr)\n        if proc and proc.stdout:\n            try:\n                # Terminate first: read() on a live server's pipe would block waiting for more output.\n                proc.terminate()\n                output = proc.stdout.read(4000)\n                if output:\n                    print(\"\\nServer output:\", file=sys.stderr)\n                    print(output, file=sys.stderr)\n       
     except Exception:\n                pass\n        return 1\n\n    finally:\n        if proc is not None:\n            proc.terminate()\n            try:\n                proc.wait(timeout=5)\n            except subprocess.TimeoutExpired:\n                proc.kill()\n\n\nif __name__ == \"__main__\":\n    raise SystemExit(main())\n"
  },
  {
    "path": "scripts/update_models.sh",
    "content": "#!/usr/bin/env bash\n# Automated model database update script for llmfit\n# This script:\n# 1. Runs the HuggingFace model scraper to fetch latest model data\n# 2. Verifies the JSON output is valid\n# 3. Rebuilds the Rust binary with updated embedded data\n# 4. Optionally runs tests to ensure everything works\n\nset -e  # Exit on error\n\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)\"\nPROJECT_ROOT=\"$(cd \"$SCRIPT_DIR/..\" && pwd)\"\nDATA_FILE=\"$PROJECT_ROOT/data/hf_models.json\"\n\n# Colors for output\nRED='\\033[0;31m'\nGREEN='\\033[0;32m'\nYELLOW='\\033[1;33m'\nBLUE='\\033[0;34m'\nNC='\\033[0m' # No Color\n\necho -e \"${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}\"\necho -e \"${BLUE}  llmfit Model Database Update${NC}\"\necho -e \"${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}\"\necho\n\n# Check if Python 3 is available\nif ! command -v python3 &> /dev/null; then\n    echo -e \"${RED}✗ Error: python3 not found${NC}\"\n    exit 1\nfi\n\n# Backup existing data file\nif [ -f \"$DATA_FILE\" ]; then\n    BACKUP_FILE=\"$DATA_FILE.backup.$(date +%Y%m%d_%H%M%S)\"\n    echo -e \"${YELLOW}📦 Backing up existing data to:${NC}\"\n    echo \"   $BACKUP_FILE\"\n    cp \"$DATA_FILE\" \"$BACKUP_FILE\"\n    echo\nfi\n\n# Run the scraper\necho -e \"${BLUE}🔄 Running HuggingFace model scraper...${NC}\"\necho\ncd \"$PROJECT_ROOT\"\npython3 scripts/scrape_hf_models.py\n\nif [ $? -ne 0 ]; then\n    echo\n    echo -e \"${RED}✗ Scraper failed${NC}\"\n    exit 1\nfi\n\necho\n\n# Verify JSON is valid\necho -e \"${BLUE}🔍 Verifying JSON output...${NC}\"\nif ! 
python3 -m json.tool \"$DATA_FILE\" > /dev/null 2>&1; then\n    echo -e \"${RED}✗ Invalid JSON generated${NC}\"\n    # Restore backup if available\n    if [ -f \"$BACKUP_FILE\" ]; then\n        echo -e \"${YELLOW}📦 Restoring backup...${NC}\"\n        mv \"$BACKUP_FILE\" \"$DATA_FILE\"\n    fi\n    exit 1\nfi\n\nMODEL_COUNT=$(python3 -c \"import json; print(len(json.load(open('$DATA_FILE'))))\")\necho -e \"${GREEN}✓ Valid JSON with $MODEL_COUNT models${NC}\"\necho\n\n# Check if cargo is available\nif command -v cargo &> /dev/null; then\n    # Rebuild with updated data\n    echo -e \"${BLUE}🔨 Rebuilding llmfit with updated model data...${NC}\"\n    # Test the build in the 'if' itself: under 'set -e' a separate $? check is never reached on failure\n    if cargo build --release; then\n        echo -e \"${GREEN}✓ Build successful${NC}\"\n        echo\n        \n        # Show build artifact location\n        if [ -f \"$PROJECT_ROOT/target/release/llmfit\" ]; then\n            BINARY_SIZE=$(ls -lh \"$PROJECT_ROOT/target/release/llmfit\" | awk '{print $5}')\n            echo -e \"${GREEN}📦 Binary location:${NC} target/release/llmfit (${BINARY_SIZE})\"\n        fi\n    else\n        echo -e \"${RED}✗ Build failed${NC}\"\n        exit 1\n    fi\nelse\n    echo -e \"${YELLOW}⚠ cargo not found, skipping rebuild${NC}\"\n    echo -e \"${YELLOW}  Run 'cargo build --release' manually to rebuild${NC}\"\nfi\n\necho\necho -e \"${GREEN}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}\"\necho -e \"${GREEN}✓ Model database update complete!${NC}\"\necho -e \"${GREEN}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}\"\necho\necho -e \"${BLUE}Next steps:${NC}\"\necho \"  • Run './target/release/llmfit' to test the updated binary\"\necho \"  • Check 'data/hf_models.json' for the updated model list\"\nif [ -n \"$BACKUP_FILE\" ]; then\n    echo \"  • Delete backup file if satisfied: rm $BACKUP_FILE\"\nfi\necho\n"
  },
  {
    "path": "scripts/verify_models.py",
    "content": "#!/usr/bin/env python3\n\"\"\"Verify that all models in hf_models.json exist on HuggingFace and all\nOllama mappings in src/providers.rs point to valid Ollama registry entries.\n\nUsage:\n    python3 scripts/verify_models.py            # check both\n    python3 scripts/verify_models.py --hf       # HuggingFace only\n    python3 scripts/verify_models.py --ollama   # Ollama only\n\nExits with code 1 if any model is missing. Suitable for CI.\n\"\"\"\n\nimport argparse\nimport json\nimport re\nimport sys\nimport time\nimport urllib.request\nimport urllib.error\nfrom pathlib import Path\n\nREPO_ROOT = Path(__file__).resolve().parent.parent\nHF_MODELS_PATH = REPO_ROOT / \"data\" / \"hf_models.json\"\nPROVIDERS_RS_PATH = REPO_ROOT / \"src\" / \"providers.rs\"\n\nHEADERS = {\"User-Agent\": \"llmfit-verify/1.0\"}\nREQUEST_DELAY = 0.3  # seconds between requests to avoid rate limiting\n\n\ndef check_url(url: str) -> int:\n    \"\"\"GET a URL and return the HTTP status code, or -1 on error.\"\"\"\n    try:\n        req = urllib.request.Request(url, headers=HEADERS)\n        resp = urllib.request.urlopen(req, timeout=10)\n        return resp.status\n    except urllib.error.HTTPError as e:\n        return e.code\n    except Exception:\n        return -1\n\n\n# ---------------------------------------------------------------------------\n# HuggingFace verification\n# ---------------------------------------------------------------------------\n\ndef load_hf_models() -> list[str]:\n    \"\"\"Return list of HF repo names from hf_models.json.\"\"\"\n    with open(HF_MODELS_PATH) as f:\n        data = json.load(f)\n    return [m[\"name\"] for m in data]\n\n\ndef verify_hf(models: list[str]) -> list[str]:\n    \"\"\"Check each HF model exists. 
Returns list of missing model names.\"\"\"\n    missing = []\n    total = len(models)\n    for i, name in enumerate(models, 1):\n        url = f\"https://huggingface.co/api/models/{name}\"\n        status = check_url(url)\n        if status == 200:\n            print(f\"  [{i}/{total}] ✓ {name}\")\n        else:\n            print(f\"  [{i}/{total}] ✗ {name} (HTTP {status})\")\n            missing.append(name)\n        time.sleep(REQUEST_DELAY)\n    return missing\n\n\n# ---------------------------------------------------------------------------\n# Ollama verification\n# ---------------------------------------------------------------------------\n\ndef parse_ollama_tags() -> list[str]:\n    \"\"\"Extract unique Ollama tags from OLLAMA_MAPPINGS in providers.rs.\"\"\"\n    src = PROVIDERS_RS_PATH.read_text()\n\n    # Find the OLLAMA_MAPPINGS block\n    match = re.search(\n        r\"const OLLAMA_MAPPINGS:.*?=.*?\\[(.+?)\\];\",\n        src,\n        re.DOTALL,\n    )\n    if not match:\n        print(\"ERROR: Could not find OLLAMA_MAPPINGS in providers.rs\")\n        sys.exit(2)\n\n    block = match.group(1)\n    # Extract the second element of each tuple: (\"hf_name\", \"ollama_tag\")\n    tags = re.findall(r'\\(\\s*\"[^\"]+\"\\s*,\\s*\"([^\"]+)\"\\s*\\)', block)\n    # Deduplicate while preserving order\n    seen = set()\n    unique = []\n    for tag in tags:\n        if tag not in seen:\n            seen.add(tag)\n            unique.append(tag)\n    return unique\n\n\ndef verify_ollama(tags: list[str]) -> list[str]:\n    \"\"\"Check each Ollama tag exists. 
Returns list of missing tags.\"\"\"\n    missing = []\n    total = len(tags)\n    for i, tag in enumerate(tags, 1):\n        url = f\"https://ollama.com/library/{tag}\"\n        status = check_url(url)\n        if status == 200:\n            print(f\"  [{i}/{total}] ✓ {tag}\")\n        else:\n            print(f\"  [{i}/{total}] ✗ {tag} (HTTP {status})\")\n            missing.append(tag)\n        time.sleep(REQUEST_DELAY)\n    return missing\n\n\n# ---------------------------------------------------------------------------\n# Main\n# ---------------------------------------------------------------------------\n\ndef main():\n    parser = argparse.ArgumentParser(description=\"Verify model availability\")\n    parser.add_argument(\"--hf\", action=\"store_true\", help=\"Check HuggingFace only\")\n    parser.add_argument(\"--ollama\", action=\"store_true\", help=\"Check Ollama only\")\n    args = parser.parse_args()\n\n    # Default: check both\n    check_hf = args.hf or not args.ollama\n    check_ollama = args.ollama or not args.hf\n\n    failures = False\n\n    if check_hf:\n        models = load_hf_models()\n        print(f\"\\n=== HuggingFace: checking {len(models)} models ===\\n\")\n        missing = verify_hf(models)\n        if missing:\n            failures = True\n            print(f\"\\n  ⚠ {len(missing)} HuggingFace model(s) not found:\")\n            for m in missing:\n                print(f\"    - {m}\")\n        else:\n            print(f\"\\n  All {len(models)} HuggingFace models verified ✓\")\n\n    if check_ollama:\n        tags = parse_ollama_tags()\n        print(f\"\\n=== Ollama: checking {len(tags)} tags ===\\n\")\n        missing = verify_ollama(tags)\n        if missing:\n            failures = True\n            print(f\"\\n  ⚠ {len(missing)} Ollama tag(s) not found:\")\n            for t in missing:\n                print(f\"    - {t}\")\n        else:\n            print(f\"\\n  All {len(tags)} Ollama tags verified ✓\")\n\n    print()\n    if 
failures:\n        print(\"FAIL: Some models are unavailable. Fix mappings or remove entries.\")\n        sys.exit(1)\n    else:\n        print(\"PASS: All models verified.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/llmfit-advisor/SKILL.md",
    "content": "---\nname: llmfit-advisor\ndescription: Detect local hardware (RAM, CPU, GPU/VRAM) and recommend the best-fit local LLM models with optimal quantization, speed estimates, and fit scoring.\nmetadata:\n  {\n    \"openclaw\":\n      {\n        \"emoji\": \"🧠\",\n        \"requires\": { \"bins\": [\"llmfit\"] },\n        \"install\":\n          [\n            {\n              \"id\": \"brew\",\n              \"kind\": \"brew\",\n              \"formula\": \"llmfit\",\n              \"bins\": [\"llmfit\"],\n              \"label\": \"Install llmfit (brew)\",\n            },\n            {\n              \"id\": \"cargo\",\n              \"kind\": \"node\",\n              \"bins\": [\"llmfit\"],\n              \"label\": \"Install llmfit (cargo install llmfit)\",\n            },\n          ],\n      },\n  }\n---\n\n# llmfit-advisor\n\nHardware-aware local LLM advisor. Detects your system specs (RAM, CPU, GPU/VRAM) and recommends models that actually fit, with optimal quantization and speed estimates.\n\n## When to use (trigger phrases)\n\nUse this skill immediately when the user asks any of:\n\n- \"what local models can I run?\"\n- \"which LLMs fit my hardware?\"\n- \"recommend a local model\"\n- \"what's the best model for my GPU?\"\n- \"can I run Llama 70B locally?\"\n- \"configure local models\"\n- \"set up Ollama models\"\n- \"what models fit my VRAM?\"\n- \"help me pick a local model for coding\"\n\nAlso use this skill when:\n\n- The user wants to configure `models.providers.ollama` or `models.providers.lmstudio`\n- The user mentions running models locally and you need to know what fits\n- A model recommendation is needed and the user has local inference capability (Ollama, vLLM, LM Studio)\n\n## Quick start\n\n### Detect hardware\n\n```bash\nllmfit --json system\n```\n\nReturns JSON with CPU, RAM, GPU name, VRAM, multi-GPU info, and whether memory is unified (Apple Silicon).\n\n### Get top recommendations\n\n```bash\nllmfit recommend --json --limit 
5\n```\n\nReturns the top 5 models ranked by a composite score (quality, speed, fit, context) with optimal quantization for the detected hardware.\n\n### Filter by use case\n\n```bash\nllmfit recommend --json --use-case coding --limit 3\nllmfit recommend --json --use-case reasoning --limit 3\nllmfit recommend --json --use-case chat --limit 3\n```\n\nValid use cases: `general`, `coding`, `reasoning`, `chat`, `multimodal`, `embedding`.\n\n### Filter by minimum fit level\n\n```bash\nllmfit recommend --json --min-fit good --limit 10\n```\n\nValid fit levels (best to worst): `perfect`, `good`, `marginal`.\n\n## Understanding the output\n\n### System JSON\n\n```json\n{\n  \"system\": {\n    \"cpu_name\": \"Apple M2 Max\",\n    \"cpu_cores\": 12,\n    \"total_ram_gb\": 32.0,\n    \"available_ram_gb\": 24.5,\n    \"has_gpu\": true,\n    \"gpu_name\": \"Apple M2 Max\",\n    \"gpu_vram_gb\": 32.0,\n    \"gpu_count\": 1,\n    \"backend\": \"Metal\",\n    \"unified_memory\": true\n  }\n}\n```\n\n### Recommendation JSON\n\nEach model in the `models` array includes:\n\n| Field | Meaning |\n|---|---|\n| `name` | HuggingFace model ID (e.g. `meta-llama/Llama-3.1-8B-Instruct`) |\n| `provider` | Model provider (Meta, Alibaba, Google, etc.) |\n| `params_b` | Parameter count in billions |\n| `score` | Composite score 0–100 (higher is better) |\n| `score_components` | Breakdown: `quality`, `speed`, `fit`, `context` (each 0–100) |\n| `fit_level` | `Perfect`, `Good`, `Marginal`, or `TooTight` |\n| `run_mode` | `GPU`, `CPU+GPU Offload`, or `CPU` |\n| `category` | Model category (e.g. `Reasoning`, `Coding`, `Chat`, `Embedding`) |\n| `is_moe` | Whether the model uses Mixture of Experts architecture |\n| `parameter_count` | Human-readable param count string (e.g. `\"7.6B\"`) |\n| `notes` | Array of human-readable notes about the recommendation |\n| `best_quant` | Optimal quantization for the hardware (e.g. 
`Q5_K_M`, `Q4_K_M`) |\n| `estimated_tps` | Estimated tokens per second |\n| `memory_required_gb` | VRAM/RAM needed at this quantization |\n| `memory_available_gb` | Available VRAM/RAM detected |\n| `utilization_pct` | How much of available memory the model uses |\n| `use_case` | What the model is designed for |\n| `context_length` | Maximum context window |\n\n### Fit levels explained\n\n- **Perfect**: Model fits comfortably with room to spare. Ideal choice.\n- **Good**: Model fits but uses most available memory. Will work well.\n- **Marginal**: Model barely fits. May work but expect slower performance or reduced context.\n- **TooTight**: Model does not fit. Do not recommend.\n\n### Run modes explained\n\n- **GPU**: Full GPU inference. Fastest. Model weights loaded entirely into VRAM.\n- **CPU+GPU Offload**: Some layers on GPU, rest in system RAM. Slower than pure GPU.\n- **CPU**: All inference on CPU using system RAM. Slowest but works without GPU.\n\n## Configuring OpenClaw with results\n\nAfter getting recommendations, configure the user's local model provider.\n\n### For Ollama\n\nMap the HuggingFace model name to its Ollama tag. 
Common mappings:\n\n| llmfit name | Ollama tag |\n|---|---|\n| `meta-llama/Llama-3.1-8B-Instruct` | `llama3.1:8b` |\n| `meta-llama/Llama-3.3-70B-Instruct` | `llama3.3:70b` |\n| `Qwen/Qwen2.5-Coder-7B-Instruct` | `qwen2.5-coder:7b` |\n| `Qwen/Qwen2.5-72B-Instruct` | `qwen2.5:72b` |\n| `deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct` | `deepseek-coder-v2:16b` |\n| `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B` | `deepseek-r1:32b` |\n| `google/gemma-2-9b-it` | `gemma2:9b` |\n| `mistralai/Mistral-7B-Instruct-v0.3` | `mistral:7b` |\n| `microsoft/Phi-3-mini-4k-instruct` | `phi3:mini` |\n| `microsoft/Phi-4-mini-instruct` | `phi4-mini` |\n\nThen update `openclaw.json`:\n\n```json\n{\n  \"models\": {\n    \"providers\": {\n      \"ollama\": {\n        \"models\": [\"ollama/<ollama-tag>\"]\n      }\n    }\n  }\n}\n```\n\nAnd optionally set as default:\n\n```json\n{\n  \"agents\": {\n    \"defaults\": {\n      \"model\": {\n        \"primary\": \"ollama/<ollama-tag>\"\n      }\n    }\n  }\n}\n```\n\n### For vLLM / LM Studio\n\nUse the HuggingFace model name directly as the model identifier with the appropriate provider prefix (`vllm/` or `lmstudio/`).\n\n## Workflow example\n\nWhen a user asks \"what local models can I run?\":\n\n1. Run `llmfit --json system` to show hardware summary\n2. Run `llmfit recommend --json --limit 5` to get top picks\n3. Present the recommendations with scores and fit levels\n4. If the user wants to configure one, map it to the appropriate Ollama/vLLM/LM Studio tag\n5. Offer to update `openclaw.json` with the chosen model\n\nWhen a user asks for a specific use case like \"recommend a coding model\":\n\n1. Run `llmfit recommend --json --use-case coding --limit 3`\n2. Present the coding-specific recommendations\n3. 
Offer to pull via Ollama and configure\n\n## Notes\n\n- llmfit detects NVIDIA GPUs (via nvidia-smi), AMD GPUs (via rocm-smi), and Apple Silicon (unified memory).\n- Multi-GPU setups aggregate VRAM across cards automatically.\n- The `best_quant` field gives the optimal quantization for the detected hardware; higher-bit quants (Q6_K, Q8_0) preserve more quality when VRAM allows.\n- Speed estimates (`estimated_tps`) are approximate and vary by hardware and quantization.\n- Models with `fit_level: \"TooTight\"` should never be recommended to users.\n"
  }
]