Full Code of huggingface/skills for AI

Repository: huggingface/skills
Branch: main
Commit: 73246ad6dc1a
Files: 129
Total size: 1.1 MB

Directory structure:
gitextract_brvqcrv9/

├── .claude-plugin/
│   ├── marketplace.json
│   └── plugin.json
├── .cursor-plugin/
│   ├── marketplace.json
│   └── plugin.json
├── .github/
│   └── workflows/
│       ├── SECURITY.md
│       ├── generate-agents.yml
│       ├── push-evals-leaderboard.yml
│       ├── push-hackers-leaderboard.yml
│       └── push-quests.yml
├── .gitignore
├── .mcp.json
├── LICENSE
├── README.md
├── agents/
│   └── AGENTS.md
├── apps/
│   ├── evals-leaderboard/
│   │   ├── app.py
│   │   ├── collect_evals.py
│   │   └── requirements.txt
│   ├── hackers-leaderboard/
│   │   ├── README.md
│   │   ├── app.py
│   │   ├── collect_points.py
│   │   └── requirements.txt
│   └── quests/
│       ├── 02_evaluate-hub-model.md
│       ├── 03_publish-hub-dataset.md
│       ├── 04_sft-finetune-hub.md
│       └── README.md
├── gemini-extension.json
├── hf-mcp/
│   └── skills/
│       └── hf-mcp/
│           └── SKILL.md
├── scripts/
│   ├── AGENTS_TEMPLATE.md
│   ├── UV_RULES.md
│   ├── generate_agents.py
│   ├── generate_cursor_plugin.py
│   ├── publish.sh
│   └── run_skills_help.py
└── skills/
    ├── hf-cli/
    │   └── SKILL.md
    ├── hugging-face-dataset-viewer/
    │   └── SKILL.md
    ├── hugging-face-datasets/
    │   ├── SKILL.md
    │   ├── examples/
    │   │   ├── diverse_training_examples.json
    │   │   ├── system_prompt_template.txt
    │   │   └── training_examples.json
    │   ├── scripts/
    │   │   ├── dataset_manager.py
    │   │   └── sql_manager.py
    │   └── templates/
    │       ├── chat.json
    │       ├── classification.json
    │       ├── completion.json
    │       ├── custom.json
    │       ├── qa.json
    │       └── tabular.json
    ├── hugging-face-evaluation/
    │   ├── SKILL.md
    │   ├── examples/
    │   │   ├── USAGE_EXAMPLES.md
    │   │   ├── artificial_analysis_to_hub.py
    │   │   ├── example_readme_tables.md
    │   │   └── metric_mapping.json
    │   └── scripts/
    │       ├── evaluation_manager.py
    │       ├── inspect_eval_uv.py
    │       ├── inspect_vllm_uv.py
    │       ├── lighteval_vllm_uv.py
    │       ├── run_eval_job.py
    │       ├── run_vllm_eval_job.py
    │       └── test_extraction.py
    ├── hugging-face-jobs/
    │   ├── SKILL.md
    │   ├── index.html
    │   ├── references/
    │   │   ├── hardware_guide.md
    │   │   ├── hub_saving.md
    │   │   ├── token_usage.md
    │   │   └── troubleshooting.md
    │   └── scripts/
    │       ├── cot-self-instruct.py
    │       ├── finepdfs-stats.py
    │       └── generate-responses.py
    ├── hugging-face-model-trainer/
    │   ├── SKILL.md
    │   ├── references/
    │   │   ├── gguf_conversion.md
    │   │   ├── hardware_guide.md
    │   │   ├── hub_saving.md
    │   │   ├── local_training_macos.md
    │   │   ├── reliability_principles.md
    │   │   ├── trackio_guide.md
    │   │   ├── training_methods.md
    │   │   ├── training_patterns.md
    │   │   ├── troubleshooting.md
    │   │   └── unsloth.md
    │   └── scripts/
    │       ├── convert_to_gguf.py
    │       ├── dataset_inspector.py
    │       ├── estimate_cost.py
    │       ├── train_dpo_example.py
    │       ├── train_grpo_example.py
    │       ├── train_sft_example.py
    │       └── unsloth_sft_example.py
    ├── hugging-face-paper-pages/
    │   └── SKILL.md
    ├── hugging-face-paper-publisher/
    │   ├── SKILL.md
    │   ├── examples/
    │   │   └── example_usage.md
    │   ├── references/
    │   │   └── quick_reference.md
    │   ├── scripts/
    │   │   └── paper_manager.py
    │   └── templates/
    │       ├── arxiv.md
    │       ├── ml-report.md
    │       ├── modern.md
    │       └── standard.md
    ├── hugging-face-tool-builder/
    │   ├── SKILL.md
    │   └── references/
    │       ├── baseline_hf_api.py
    │       ├── baseline_hf_api.sh
    │       ├── baseline_hf_api.tsx
    │       ├── find_models_by_paper.sh
    │       ├── hf_enrich_models.sh
    │       ├── hf_model_card_frontmatter.sh
    │       └── hf_model_papers_auth.sh
    ├── hugging-face-trackio/
    │   ├── .claude-plugin/
    │   │   └── plugin.json
    │   ├── SKILL.md
    │   └── references/
    │       ├── alerts.md
    │       ├── logging_metrics.md
    │       └── retrieving_metrics.md
    ├── hugging-face-vision-trainer/
    │   ├── SKILL.md
    │   ├── references/
    │   │   ├── finetune_sam2_trainer.md
    │   │   ├── hub_saving.md
    │   │   ├── image_classification_training_notebook.md
    │   │   ├── object_detection_training_notebook.md
    │   │   ├── reliability_principles.md
    │   │   └── timm_trainer.md
    │   └── scripts/
    │       ├── dataset_inspector.py
    │       ├── estimate_cost.py
    │       ├── image_classification_training.py
    │       ├── object_detection_training.py
    │       └── sam_segmentation_training.py
    ├── huggingface-gradio/
    │   ├── SKILL.md
    │   └── examples.md
    └── transformers.js/
        ├── SKILL.md
        └── references/
            ├── CACHE.md
            ├── CONFIGURATION.md
            ├── EXAMPLES.md
            ├── MODEL_ARCHITECTURES.md
            ├── PIPELINE_OPTIONS.md
            └── TEXT_GENERATION.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .claude-plugin/marketplace.json
================================================
{
  "name": "huggingface-skills",
  "owner": {
    "name": "Hugging Face"
  },
  "metadata": {
    "description": "Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub",
    "version": "1.0.1"
  },
  "plugins": [
    {
      "name": "hugging-face-model-trainer",
      "source": "./skills/hugging-face-model-trainer",
      "skills": "./",
      "description": "Train or fine-tune language models using TRL on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes hardware selection, cost estimation, Trackio monitoring, and Hub persistence."
    },
    {
      "name": "hugging-face-paper-publisher",
      "source": "./skills/hugging-face-paper-publisher",
      "skills": "./",
      "description": "Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles."
    },
    {
      "name": "hugging-face-paper-pages",
      "source": "./skills/hugging-face-paper-pages",
      "skills": "./",
      "description": "Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata like authors, linked models, datasets, Spaces, and media URLs when needed."
    },
    {
      "name": "hugging-face-datasets",
      "source": "./skills/hugging-face-datasets",
      "skills": "./",
      "description": "Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation."
    },
    {
      "name": "hugging-face-evaluation",
      "source": "./skills/hugging-face-evaluation",
      "skills": "./",
      "description": "Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom evaluations with vLLM/lighteval."
    },
    {
      "name": "hugging-face-tool-builder",
      "source": "./skills/hugging-face-tool-builder",
      "skills": "./",
      "description": "Build reusable scripts for Hugging Face API operations. Useful for chaining API calls or automating repeated tasks."
    },
    {
      "name": "hf-cli",
      "source": "./skills/hf-cli",
      "skills": "./",
      "description": "Execute Hugging Face Hub operations using the hf CLI. Download models/datasets, upload files, manage repos, and run cloud compute jobs."
    },
    {
      "name": "hugging-face-jobs",
      "source": "./skills/hugging-face-jobs",
      "skills": "./",
      "description": "Run compute jobs on Hugging Face infrastructure. Execute Python scripts, manage scheduled jobs, and monitor job status."
    },
    {
      "name": "hugging-face-trackio",
      "source": "./skills/hugging-face-trackio",
      "skills": "./",
      "description": "Track and visualize ML training experiments with Trackio. Log metrics via Python API and retrieve them via CLI. Supports real-time dashboards synced to HF Spaces."
    },
    {
      "name": "hugging-face-dataset-viewer",
      "source": "./skills/hugging-face-dataset-viewer",
      "skills": "./",
      "description": "Explore, query, and extract data from any Hugging Face dataset using the Dataset Viewer REST API and npx tooling. Zero Python dependencies — covers split/config discovery, row pagination, text search, filtering, SQL via parquetlens, and dataset upload via CLI."
    },
    {
      "name": "gradio",
      "source": "./skills/huggingface-gradio",
      "skills": "./",
      "description": "Build Gradio web UIs and demos in Python. Use when creating or editing Gradio apps, components, event listeners, layouts, or chatbots."
    },
    {
      "name": "transformers-js",
      "source": "./skills/transformers.js",
      "skills": "./",
      "description": "Run state-of-the-art machine learning models directly in JavaScript/TypeScript for NLP, computer vision, audio processing, and multimodal tasks. Works in Node.js and browsers with WebGPU/WASM using Hugging Face models."
    },

    {
      "name": "hugging-face-vision-trainer",
      "source": "./skills/hugging-face-vision-trainer",
      "skills": "./",
      "description": "Train and fine-tune object detection models (RTDETRv2, YOLOS, DETR and others) and image classification models (timm and transformers models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3) using Transformers Trainer API on Hugging Face Jobs infrastructure or locally. Includes COCO dataset format support, Albumentations augmentation, mAP/mAR metrics, trackio tracking, hardware selection, and Hub persistence."
    }
  ]
}


================================================
FILE: .claude-plugin/plugin.json
================================================
{
  "name": "huggingface-skills",
  "description": "Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub",
  "version": "1.0.1",
  "author": {
    "name": "Hugging Face"
  },
  "homepage": "https://github.com/huggingface/skills",
  "repository": "https://github.com/huggingface/skills",
  "license": "Apache-2.0",
  "keywords": [
    "huggingface",
    "machine-learning",
    "datasets",
    "training",
    "evaluation",
    "papers",
    "fine-tuning",
    "llm"
  ]
}



================================================
FILE: .cursor-plugin/marketplace.json
================================================
{
  "name": "huggingface-skills",
  "owner": {
    "name": "Hugging Face"
  },
  "metadata": {
    "description": "Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub",
    "version": "1.0.0"
  },
  "plugins": [
    {
      "name": "huggingface-skills",
      "source": ".",
      "skills": "skills",
      "description": "Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub"
    }
  ]
}


================================================
FILE: .cursor-plugin/plugin.json
================================================
{
  "name": "huggingface-skills",
  "skills": "skills",
  "mcpServers": ".mcp.json",
  "description": "Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub",
  "version": "1.0.1",
  "author": {
    "name": "Hugging Face"
  },
  "homepage": "https://github.com/huggingface/skills",
  "repository": "https://github.com/huggingface/skills",
  "license": "Apache-2.0",
  "keywords": [
    "huggingface",
    "machine-learning",
    "datasets",
    "training",
    "evaluation",
    "papers",
    "fine-tuning",
    "llm"
  ]
}


================================================
FILE: .github/workflows/SECURITY.md
================================================
# Security Policy

## Reporting a Vulnerability
We take the security of this project seriously. If you discover a security vulnerability within this repository, please **do not open a public issue**. Instead, please report it through one of the following channels:

* **Email:** security@huggingface.co
* **Hugging Face Hub:** Use the "Report" feature on the repository’s Hub page if applicable.

Please include a detailed description of the vulnerability and, if possible, a Proof of Concept (PoC) to help us reproduce the issue.

## Scope
The following types of vulnerabilities are of particular interest:
* **Remote Code Execution (RCE):** Vulnerabilities that allow an agent to execute unauthorized code outside of its intended environment.
* **Credential Leakage:** Skills that inadvertently expose API keys, tokens, or sensitive environment variables.
* **Injection Attacks:** Vulnerabilities in skills that handle user input for database queries or shell commands.

## Our Response Process
1. **Acknowledgment:** We will acknowledge your report within 48 business hours.
2. **Investigation:** Our team will investigate the issue and determine its impact.
3. **Fix and Disclosure:** Once a fix is ready, we will coordinate a release and, with your permission, credit you for the discovery.

Thank you for helping keep the Hugging Face ecosystem safe!

================================================
FILE: .github/workflows/generate-agents.yml
================================================
name: Validate generated agent/plugin artifacts

on:
  pull_request:
    paths:
      - "scripts/AGENTS_TEMPLATE.md"
      - "scripts/generate_agents.py"
      - "scripts/generate_cursor_plugin.py"
      - "scripts/publish.sh"
      - "**/SKILL.md"
      - "agents/AGENTS.md"
      - "README.md"
      - ".claude-plugin/marketplace.json"
      - ".claude-plugin/plugin.json"
      - "gemini-extension.json"
      - ".cursor-plugin/plugin.json"
      - ".mcp.json"

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v6

      - name: Set up uv
        uses: astral-sh/setup-uv@v7

      - name: Ensure generated files are up to date
        run: ./scripts/publish.sh --check


================================================
FILE: .github/workflows/push-evals-leaderboard.yml
================================================
name: Sync Evals Leaderboard to Hugging Face Space

on:
  push:
    branches:
      - main
    paths:
      - 'apps/evals-leaderboard/**'
  workflow_dispatch:

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Install huggingface_hub
        run: curl -LsSf https://hf.co/cli/install.sh | bash

      - name: Upload evals leaderboard to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: hf upload hf-skills/distributed-leaderboard apps/evals-leaderboard . --repo-type=space



================================================
FILE: .github/workflows/push-hackers-leaderboard.yml
================================================
name: Sync Hackers Leaderboard to Hugging Face Space

on:
  push:
    branches:
      - main
    paths:
      - 'apps/hackers-leaderboard/**'
  workflow_dispatch:

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Install huggingface_hub
        run: curl -LsSf https://hf.co/cli/install.sh | bash

      - name: Upload hackers leaderboard to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: hf upload hf-skills/hacker_leaderboard apps/hackers-leaderboard . --repo-type=space



================================================
FILE: .github/workflows/push-quests.yml
================================================
name: Sync Quests to Hugging Face Space

on:
  push:
    branches:
      - main
    paths:
      - 'apps/quests/**'
  workflow_dispatch:

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Install huggingface_hub
        run: curl -LsSf https://hf.co/cli/install.sh | bash

      - name: Upload quests to HF Space
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: hf upload hf-skills/README apps/quests . --repo-type=space



================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# UV
#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#uv.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
#poetry.toml

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#   pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
#   https://pdm-project.org/en/latest/usage/project/#working-with-version-control
#pdm.lock
#pdm.toml
.pdm-python
.pdm-build/

# pixi
#   Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
#pixi.lock
#   Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
#   in the .venv directory. It is recommended not to include this directory in version control.
.pixi

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/

# Visual Studio Code
#  Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore 
#  that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
#  and can be added to the global gitignore or merged into this file. However, if you prefer, 
#  you could uncomment the following to ignore the entire vscode folder
# .vscode/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc

# Cursor
#  Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
#  exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
#  refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore

# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/
.claude
.fast-agent/

# MacOS
.DS_Store

================================================
FILE: .mcp.json
================================================
{
  "mcpServers": {
    "huggingface-skills": {
      "url": "https://huggingface.co/mcp?login"
    }
  }
}


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
# Hugging Face Skills

Hugging Face Skills are definitions for AI/ML tasks like dataset creation, model training, and evaluation. They are interoperable with all major coding agent tools like OpenAI Codex, Anthropic's Claude Code, Google DeepMind's Gemini CLI, and Cursor.

The skills in this repository follow the standardized [Agent Skills](https://agentskills.io/home) format.

## How do Skills work?

In practice, skills are self-contained folders that package instructions, scripts, and resources together for an AI agent to use on a specific use case. Each folder includes a `SKILL.md` file with YAML frontmatter (name and description) followed by the guidance your coding agent follows while the skill is active. 
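
As a minimal sketch of that layout, the snippet below reads the top-level `name` and `description` fields from the text of a `SKILL.md` file using only the standard library. The helper name `read_skill_frontmatter` is hypothetical, and the parser deliberately handles only flat `key: value` lines; an agent that needs nested fields would more likely use a full YAML parser.

```python
from __future__ import annotations


def read_skill_frontmatter(text: str) -> dict[str, str]:
    """Return the top-level `key: value` pairs from a SKILL.md frontmatter block."""
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # the file does not start with a frontmatter delimiter
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        key, sep, value = line.partition(":")
        # Skip indented (nested) lines and lines without a colon.
        if sep and key.strip() and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing frontmatter


# Illustrative input only; it mirrors the frontmatter shape described above.
example = """---
name: my-skill-name
description: Describe what the skill does and when to use it
---

# Skill Title
"""
meta = read_skill_frontmatter(example)
print(meta["name"])
```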

> [!NOTE]
> 'Skills' originated as an Anthropic term used within Claude AI and Claude Code, and we love it! OpenAI Codex uses the open [Agent Skills](https://agentskills.io/specification) format, where each skill is a directory with a `SKILL.md` file that Codex discovers from standard `.agents/skills` locations documented in the [Codex Skills guide](https://developers.openai.com/codex/skills/). Codex can also work with an `AGENTS.md` file. Google Gemini uses 'extensions' to define the instructions for your coding agent in a `gemini-extension.json` file. **This repo is compatible with all of them, and more!**

> [!TIP]
> If your agent doesn't support skills, you can use [`agents/AGENTS.md`](agents/AGENTS.md) directly as a fallback.

## Installation

Hugging Face skills are compatible with Claude Code, Codex, Gemini CLI, and Cursor.

### Claude Code

1. Register the repository as a plugin marketplace:  
   
```
/plugin marketplace add huggingface/skills
```

2. To install a skill, run:  
   
```
/plugin install <skill-name>@huggingface/skills
```

For example:  

```
/plugin install hf-cli@huggingface/skills
```

### Codex

1. Copy or symlink any skills you want to use from this repository's `skills/` directory into one of Codex's standard `.agents/skills` locations (for example, `$REPO_ROOT/.agents/skills` or `$HOME/.agents/skills`) as described in the [Codex Skills guide](https://developers.openai.com/codex/skills/).

2. Once a skill is available in one of those locations, Codex will discover it using the Agent Skills standard and load the `SKILL.md` instructions when it decides to use that skill or when you explicitly invoke it.

3. If your Codex setup still relies on `AGENTS.md`, you can use the generated [`agents/AGENTS.md`](agents/AGENTS.md) file in this repo as a fallback bundle of instructions.
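
The copy-or-symlink step can be scripted. The sketch below is one possible approach, not part of this repository: `link_skills` is a hypothetical helper that symlinks selected skill folders into a target directory and refuses any folder that lacks a `SKILL.md`.

```python
from __future__ import annotations

from pathlib import Path


def link_skills(skills_dir: Path, target_dir: Path, names: list[str]) -> list[Path]:
    """Symlink the named skill folders from skills_dir into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    linked: list[Path] = []
    for name in names:
        source = skills_dir / name
        if not (source / "SKILL.md").is_file():
            raise FileNotFoundError(f"{source} does not contain a SKILL.md")
        link = target_dir / name
        # Leave existing entries alone so repeated runs are harmless.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(source.resolve(), target_is_directory=True)
        linked.append(link)
    return linked
```

Run from a checkout of this repository, a call such as `link_skills(Path("skills"), Path.home() / ".agents" / "skills", ["hf-cli"])` would expose the `hf-cli` skill in the user-level location mentioned above.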

### Gemini CLI

1. This repo includes `gemini-extension.json` to integrate with the Gemini CLI.

2. Install locally:  

```
gemini extensions install . --consent
```

or use the GitHub URL:

```
gemini extensions install https://github.com/huggingface/skills.git --consent
```

3. See [Gemini CLI extensions docs](https://geminicli.com/docs/extensions/#installing-an-extension) for more help.

### Cursor

This repository includes Cursor plugin manifests:

- `.cursor-plugin/plugin.json`
- `.mcp.json` (configured with the Hugging Face MCP server URL)

Install from the repository URL (or a local checkout) via the Cursor plugin flow.

For contributors, regenerate manifests with:

```bash
./scripts/publish.sh
```

## Skills

This repository contains a few skills to get you started. You can also contribute your own skills to the repository.

### Available skills

<!-- This table is auto-generated by scripts/generate_agents.py. Do not edit manually. -->
<!-- BEGIN_SKILLS_TABLE -->
| Name | Description | Documentation |
|------|-------------|---------------|
| `gradio` | Build Gradio web UIs and demos in Python. Use when creating or editing Gradio apps, components, event listeners, layouts, or chatbots. | [SKILL.md](skills/huggingface-gradio/SKILL.md) |
| `hf-cli` | Execute Hugging Face Hub operations using the hf CLI. Download models/datasets, upload files, manage repos, and run cloud compute jobs. | [SKILL.md](skills/hf-cli/SKILL.md) |
| `hugging-face-dataset-viewer` | Explore, query, and extract data from any Hugging Face dataset using the Dataset Viewer REST API and npx tooling. Zero Python dependencies — covers split/config discovery, row pagination, text search, filtering, SQL via parquetlens, and dataset upload via CLI. | [SKILL.md](skills/hugging-face-dataset-viewer/SKILL.md) |
| `hugging-face-datasets` | Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. | [SKILL.md](skills/hugging-face-datasets/SKILL.md) |
| `hugging-face-evaluation` | Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom evaluations with vLLM/lighteval. | [SKILL.md](skills/hugging-face-evaluation/SKILL.md) |
| `hugging-face-jobs` | Run compute jobs on Hugging Face infrastructure. Execute Python scripts, manage scheduled jobs, and monitor job status. | [SKILL.md](skills/hugging-face-jobs/SKILL.md) |
| `hugging-face-model-trainer` | Train or fine-tune language models using TRL on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes hardware selection, cost estimation, Trackio monitoring, and Hub persistence. | [SKILL.md](skills/hugging-face-model-trainer/SKILL.md) |
| `hugging-face-paper-pages` | Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata like authors, linked models, datasets, Spaces, and media URLs when needed. | [SKILL.md](skills/hugging-face-paper-pages/SKILL.md) |
| `hugging-face-paper-publisher` | Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles. | [SKILL.md](skills/hugging-face-paper-publisher/SKILL.md) |
| `hugging-face-tool-builder` | Build reusable scripts for Hugging Face API operations. Useful for chaining API calls or automating repeated tasks. | [SKILL.md](skills/hugging-face-tool-builder/SKILL.md) |
| `hugging-face-trackio` | Track and visualize ML training experiments with Trackio. Log metrics via Python API and retrieve them via CLI. Supports real-time dashboards synced to HF Spaces. | [SKILL.md](skills/hugging-face-trackio/SKILL.md) |
| `hugging-face-vision-trainer` | Train and fine-tune object detection models (RTDETRv2, YOLOS, DETR and others) and image classification models (timm and transformers models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3) using Transformers Trainer API on Hugging Face Jobs infrastructure or locally. Includes COCO dataset format support, Albumentations augmentation, mAP/mAR metrics, trackio tracking, hardware selection, and Hub persistence. | [SKILL.md](skills/hugging-face-vision-trainer/SKILL.md) |
| `transformers-js` | Run state-of-the-art machine learning models directly in JavaScript/TypeScript for NLP, computer vision, audio processing, and multimodal tasks. Works in Node.js and browsers with WebGPU/WASM using Hugging Face models. | [SKILL.md](skills/transformers.js/SKILL.md) |
<!-- END_SKILLS_TABLE -->

### Using skills in your coding agent

Once a skill is installed, mention it directly while giving your coding agent instructions:

- "Use the HF LLM trainer skill to estimate the GPU memory needed for a 70B model run."
- "Use the HF model evaluation skill to launch `run_eval_job.py` on the latest checkpoint."
- "Use the HF dataset creator skill to draft new few-shot classification templates."
- "Use the HF paper publisher skill to index my arXiv paper and link it to my model."

Your coding agent automatically loads the corresponding `SKILL.md` instructions and helper scripts while it completes the task.

### Contribute or customize a skill

1. Copy one of the existing skill folders (for example, `skills/hugging-face-datasets/`) and rename it.
2. Update the new folder's `SKILL.md` frontmatter:
   ```markdown
   ---
   name: my-skill-name
   description: Describe what the skill does and when to use it
   ---

   # Skill Title
   Guidance + examples + guardrails
   ```
3. Add or edit supporting scripts, templates, and documents referenced by your instructions.
4. Add an entry to `.claude-plugin/marketplace.json` with a concise, human-readable description.
5. Run:
   ```bash
   ./scripts/publish.sh
   ```
   to regenerate and validate all generated metadata.
6. Reinstall or reload the skill bundle in your coding agent so the updated folder is available.

### Marketplace

The `.claude-plugin/marketplace.json` file lists skills with human-readable descriptions for the plugin marketplace. The CI validates that skill names and paths match between `SKILL.md` files and `marketplace.json`, but descriptions are maintained separately: `SKILL.md` descriptions guide when Claude activates the skill, while marketplace descriptions are written for humans browsing available skills.
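
To illustrate the kind of name-and-path check described here, the following is a rough sketch rather than the repository's actual CI logic. The function `find_mismatches` is hypothetical, and the `name` and `source` keys on each marketplace entry are assumptions made for the example; consult `.claude-plugin/marketplace.json` for the real schema.

```python
from __future__ import annotations


def find_mismatches(skills: dict[str, str], marketplace: list[dict[str, str]]) -> list[str]:
    """Compare {skill name: folder path} against marketplace entries.

    The "name" and "source" keys are assumed for this sketch.
    """
    listed = {entry["name"]: entry.get("source") for entry in marketplace if "name" in entry}
    problems: list[str] = []
    for name, path in sorted(skills.items()):
        if name not in listed:
            problems.append(f"{name}: missing from marketplace")
        elif listed[name] != path:
            problems.append(f"{name}: path {listed[name]!r} != {path!r}")
    # Entries that point at skills with no SKILL.md on disk.
    for name in sorted(set(listed) - set(skills)):
        problems.append(f"{name}: listed but has no SKILL.md")
    return problems
```

Descriptions are intentionally left out of the comparison, since the two files maintain them separately.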

### Additional references
- Browse the latest instructions, scripts, and templates directly at [huggingface/skills](https://github.com/huggingface/skills).
- Review Hugging Face documentation for the specific libraries or workflows you reference inside each skill.


================================================
FILE: agents/AGENTS.md
================================================
<skills>

You have additional SKILLs documented in directories containing a "SKILL.md" file.

These skills are:
 - gradio -> "skills/huggingface-gradio/SKILL.md"
 - hf-cli -> "skills/hf-cli/SKILL.md"
 - hugging-face-dataset-viewer -> "skills/hugging-face-dataset-viewer/SKILL.md"
 - hugging-face-datasets -> "skills/hugging-face-datasets/SKILL.md"
 - hugging-face-evaluation -> "skills/hugging-face-evaluation/SKILL.md"
 - hugging-face-jobs -> "skills/hugging-face-jobs/SKILL.md"
 - hugging-face-model-trainer -> "skills/hugging-face-model-trainer/SKILL.md"
 - hugging-face-paper-pages -> "skills/hugging-face-paper-pages/SKILL.md"
 - hugging-face-paper-publisher -> "skills/hugging-face-paper-publisher/SKILL.md"
 - hugging-face-tool-builder -> "skills/hugging-face-tool-builder/SKILL.md"
 - hugging-face-trackio -> "skills/hugging-face-trackio/SKILL.md"
 - hugging-face-vision-trainer -> "skills/hugging-face-vision-trainer/SKILL.md"
 - transformers-js -> "skills/transformers.js/SKILL.md"

IMPORTANT: You MUST read the SKILL.md file whenever the description of the skills matches the user intent, or may help accomplish their task. 

<available_skills>

gradio: `Build Gradio web UIs and demos in Python. Use when creating or editing Gradio apps, components, event listeners, layouts, or chatbots.`
hf-cli: `"Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing repositories, models, datasets, and Spaces on the Hugging Face Hub. Replaces now deprecated `huggingface-cli` command."`
hugging-face-dataset-viewer: `Use this skill for Hugging Face Dataset Viewer API workflows that fetch subset/split metadata, paginate rows, search text, apply filters, download parquet URLs, and read size or statistics.`
hugging-face-datasets: `Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside HF MCP server for comprehensive dataset workflows.`
hugging-face-evaluation: `Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.`
hugging-face-jobs: `This skill should be used when users want to run any workload on Hugging Face Jobs infrastructure. Covers UV scripts, Docker-based jobs, hardware selection, cost estimation, authentication with tokens, secrets management, timeout configuration, and result persistence. Designed for general-purpose compute workloads including data processing, inference, experiments, batch jobs, and any Python-based tasks. Should be invoked for tasks involving cloud compute, GPU workloads, or when users mention running jobs on Hugging Face infrastructure without local setup.`
hugging-face-model-trainer: `This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.`
hugging-face-paper-pages: `Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata such as authors, linked models/datasets/spaces, Github repo and project page. Use when the user shares a Hugging Face paper page URL, an arXiv URL or ID, or asks to summarize, explain, or analyze an AI research paper.`
hugging-face-paper-publisher: `Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.`
hugging-face-tool-builder: `Use this skill when the user wants to build tool/scripts or achieve a task where using data from the Hugging Face API would help. This is especially useful when chaining or combining API calls or the task will be repeated/automated. This Skill creates a reusable script to fetch, enrich or process data.`
hugging-face-trackio: `Track and visualize ML training experiments with Trackio. Use when logging metrics during training (Python API), firing alerts for training diagnostics, or retrieving/analyzing logged metrics (CLI). Supports real-time dashboard visualization, alerts with webhooks, HF Space syncing, and JSON output for automation.`
hugging-face-vision-trainer: `Trains and fine-tunes vision models for object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3 — plus any Transformers classifier), and SAM/SAM2 segmentation using Hugging Face Transformers on Hugging Face Jobs cloud GPUs. Covers COCO-format dataset preparation, Albumentations augmentation, mAP/mAR evaluation, accuracy metrics, SAM segmentation with bbox/point prompts, DiceCE loss, hardware selection, cost estimation, Trackio monitoring, and Hub persistence. Use when users mention training object detection, image classification, SAM, SAM2, segmentation, image matting, DETR, D-FINE, RT-DETR, ViT, timm, MobileNet, ResNet, bounding box models, or fine-tuning vision models on Hugging Face Jobs.`
transformers-js: `Use Transformers.js to run state-of-the-art machine learning models directly in JavaScript/TypeScript. Supports NLP (text classification, translation, summarization), computer vision (image classification, object detection), audio (speech recognition, audio classification), and multimodal tasks. Works in Node.js and browsers (with WebGPU/WASM) using pre-trained models from Hugging Face Hub.`
</available_skills>

Paths referenced within SKILL folders are relative to that SKILL. For example, the hugging-face-datasets `scripts/example.py` would be referenced as `hugging-face-datasets/scripts/example.py`.

</skills>


================================================
FILE: apps/evals-leaderboard/app.py
================================================
#!/usr/bin/env python3
"""
Evals Leaderboard - Gradio app for displaying model evaluation scores.

Reads leaderboard data from the hf-skills/evals-leaderboard dataset.
Run collect_evals.py separately to update the dataset.

Usage:
    python app.py
"""

from __future__ import annotations

import json

import gradio as gr
import requests

TABLE_HEADERS = [
    "Model",
    "Benchmark",
    "Score",
    "Source",
]

TABLE_DATATYPES = [
    "markdown",
    "text",
    "number",
    "markdown",
]


DATASET_REPO = "hf-skills/evals-leaderboard"
LEADERBOARD_URL = f"https://huggingface.co/datasets/{DATASET_REPO}/raw/main/data/leaderboard.jsonl"
METADATA_URL = f"https://huggingface.co/datasets/{DATASET_REPO}/raw/main/data/metadata.json"


def format_model_link(model_id: str) -> str:
    """Format model ID as a clickable link."""
    return f"[{model_id}](https://huggingface.co/{model_id})"


def format_source_link(source_type: str, contributor: str, source_url: str) -> str:
    """Format source as a clickable link."""
    return f"{source_type} by [{contributor}]({source_url})"


def fetch_leaderboard() -> tuple[list[dict], dict]:
    """Fetch leaderboard data from the HF dataset."""
    # Fetch leaderboard JSONL
    resp = requests.get(LEADERBOARD_URL, timeout=30)
    resp.raise_for_status()
    # splitlines() handles both "\n" and "\r\n"; skip blank or whitespace-only lines.
    leaderboard = [json.loads(line) for line in resp.text.splitlines() if line.strip()]

    # Fetch metadata
    resp = requests.get(METADATA_URL, timeout=30)
    resp.raise_for_status()
    metadata = resp.json()

    return leaderboard, metadata


def refresh_handler() -> tuple[str, list[list]]:
    """Refresh the leaderboard data from the dataset."""
    try:
        leaderboard, metadata = fetch_leaderboard()

        # Build table rows
        rows = []
        for entry in leaderboard:
            rows.append(
                [
                    format_model_link(entry["model_id"]),
                    entry["benchmark"],
                    entry["score"],
                    format_source_link(
                        entry["source_type"],
                        entry["contributor"],
                        entry["source_url"],
                    ),
                ]
            )

        status = "\n".join(
            [
                f"**Data from:** [{DATASET_REPO}](https://huggingface.co/datasets/{DATASET_REPO})",
                f"**Last updated:** {metadata.get('generated_at', 'unknown')}",
                f"**Models with scores:** {metadata.get('models_with_scores', 'unknown')}",
                f"**Total entries:** {metadata.get('total_entries', len(leaderboard))}",
            ]
        )

        return status, rows

    except Exception as e:
        return f"❌ Failed to load leaderboard: {e}", []


with gr.Blocks() as demo:
    gr.Markdown(
        """
        # 📊 HF Evaluation Leaderboard
        
        Shows MMLU, BigCodeBench, and ARC MC scores pulled from model-index
        metadata or their pull requests for trending text-generation models.
        """
    )

    status_box = gr.Markdown("Loading leaderboard...")

    leaderboard_table = gr.Dataframe(
        headers=TABLE_HEADERS,
        datatype=TABLE_DATATYPES,
        interactive=False,
        wrap=True,
    )

    demo.load(
        refresh_handler,
        outputs=[status_box, leaderboard_table],
    )

    gr.Markdown(
        f"""
        ---
        
        **Links:**
        - [Dataset: {DATASET_REPO}](https://huggingface.co/datasets/{DATASET_REPO})
        - [GitHub Repository](https://github.com/huggingface/skills)
        """
    )


if __name__ == "__main__":
    demo.launch()


================================================
FILE: apps/evals-leaderboard/collect_evals.py
================================================
#!/usr/bin/env python3
"""
Collect evaluation scores from trending models' model-index metadata.

Scans trending text-generation models on the Hub and extracts benchmark
scores from their model-index metadata or open pull requests.

Results are saved to a dataset for the evals leaderboard.

Usage:
    python collect_evals.py [--push-to-hub]
"""

from __future__ import annotations

import argparse
import json
import os
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional

import requests
import yaml
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import HfHubHTTPError

API_BASE = "https://huggingface.co/api"
PIPELINE_FILTER = "text-generation"
TRENDING_LIMIT = 50
TRENDING_FETCH_LIMIT = 100
PR_SCAN_LIMIT = 40
USER_AGENT = "skills-evals-leaderboard/0.3"


def _normalize(text: Optional[str]) -> str:
    """Lowercase text and collapse runs of non-alphanumerics into single spaces."""
    if not text:
        return ""
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


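def _coerce_score(value: Any) -> Optional[float]:
    """Convert a metric value to float, accepting numbers and strings such as "71.3%"."""
    if value is None or isinstance(value, bool):
        # bool is a subclass of int; a YAML true/false is not a score.
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        candidate = value.strip()
        if candidate.endswith("%"):
            candidate = candidate[:-1]
        try:
            return float(candidate)
        except ValueError:
            return None
    return None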


@dataclass(frozen=True)
class BenchmarkSpec:
    key: str
    label: str
    aliases: tuple[str, ...]

    def matches(self, fields: List[str]) -> bool:
        """Return True if any normalized alias is a substring of any already-normalized field."""
        for alias in self.aliases:
            alias_norm = _normalize(alias)
            if not alias_norm:
                continue
            for field in fields:
                if alias_norm in field:
                    return True
        return False


BENCHMARKS: Dict[str, BenchmarkSpec] = {
    "mmlu": BenchmarkSpec(
        key="mmlu",
        label="MMLU",
        aliases=("mmlu", "massive multitask language understanding"),
    ),
    "bigcodebench": BenchmarkSpec(
        key="bigcodebench",
        label="BigCodeBench",
        aliases=("bigcodebench", "big code bench"),
    ),
    "arc_mc": BenchmarkSpec(
        key="arc_mc",
        label="ARC MC",
        aliases=(
            "arc mc",
            "arc-challenge",
            "arc challenge",
            "arc multiple choice",
            "arc c",
        ),
    ),
}


class EvalsCollector:
    """Collects evaluation scores from model-index metadata."""

    def __init__(self, token: str | None = None) -> None:
        self.token = token
        self.session = requests.Session()
        self.session.headers.update({"User-Agent": USER_AGENT})
        if token:
            self.session.headers.update({"Authorization": f"Bearer {token}"})
        self.logs: List[str] = []
        self.results: List[Dict[str, Any]] = []

    def log(self, message: str) -> None:
        """Add a log message."""
        print(message)
        self.logs.append(message)

    def collect_all(self) -> List[Dict[str, Any]]:
        """Collect evaluation scores from trending models."""
        self.log("🔍 Fetching trending text-generation models...")
        trending = self._fetch_trending_models()

        for entry in trending:
            repo_id = entry.get("modelId") or entry.get("id")
            if not repo_id:
                continue
            scores = self._collect_scores(repo_id)
            if scores["scores"]:
                self.results.extend(self._format_scores(repo_id, scores["scores"]))

        self.log(f"✅ Collected {len(self.results)} evaluation entries")
        return self.results

    def _fetch_trending_models(self) -> List[Dict[str, Any]]:
        params = {"sort": "trendingScore", "limit": TRENDING_FETCH_LIMIT}
        response = self.session.get(
            f"{API_BASE}/models",
            params=params,
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()
        if not isinstance(data, list):
            raise ValueError("Unexpected trending response.")
        filtered = [
            model
            for model in data
            if (model.get("pipeline_tag") == PIPELINE_FILTER or PIPELINE_FILTER in (model.get("tags") or []))
        ]
        if not filtered:
            self.log("⚠️ No text-generation models in trending feed.")
            return []
        limited = filtered[:TRENDING_LIMIT]
        self.log(f"📊 Found {len(limited)} trending text-generation models")
        return limited

    def _collect_scores(self, repo_id: str) -> Dict[str, Any]:
        owner = repo_id.split("/")[0]
        card_meta = self._read_model_card(repo_id)
        model_index = card_meta.get("model-index")
        if model_index:
            self.log(f"✅ {repo_id}: model card metadata found.")
            scores = self._extract_scores(
                repo_id=repo_id,
                model_index=model_index,
                contributor=owner,
                source_type="model-card",
                source_url=f"https://huggingface.co/{repo_id}",
                revision="main",
            )
            if scores:
                return {"model_id": repo_id, "scores": scores}

        prs = self._fetch_pull_requests(repo_id)
        for pr in prs:
            revision = f"refs/pr/{pr['num']}"
            pr_meta = self._read_model_card(repo_id, revision=revision)
            pr_index = pr_meta.get("model-index")
            if not pr_index:
                continue
            author_info = pr.get("author", {}) or {}
            contributor = author_info.get("name") or author_info.get("fullname") or "unknown-author"
            discussion_path = f"{repo_id}/discussions/{pr['num']}"
            source_url = f"https://huggingface.co/{discussion_path}"
            scores = self._extract_scores(
                repo_id=repo_id,
                model_index=pr_index,
                contributor=contributor,
                source_type="pull-request",
                source_url=source_url,
                revision=revision,
            )
            if scores:
                note = f"📝 {repo_id}: PR #{pr['num']} by {contributor}."
                self.log(note)
                return {"model_id": repo_id, "scores": scores}

        self.log(f"⚠️ {repo_id}: no target benchmarks located.")
        return {"model_id": repo_id, "scores": {}}

    def _read_model_card(
        self,
        repo_id: str,
        revision: Optional[str] = None,
    ) -> Dict[str, Any]:
        try:
            path = hf_hub_download(
                repo_id=repo_id,
                filename="README.md",
                repo_type="model",
                revision=revision,
                token=self.token,
            )
        except HfHubHTTPError as err:
            ctx = f"{repo_id} ({revision or 'main'})"
            self.log(f"🚫 {ctx}: README download failed ({err}).")
            return {}
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        return self._parse_front_matter(text)

    @staticmethod
    def _parse_front_matter(content: str) -> Dict[str, Any]:
        content = content.lstrip("\ufeff")
        if not content.startswith("---"):
            return {}
        lines = content.splitlines()
        end_idx = None
        for idx, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                end_idx = idx
                break
        if end_idx is None:
            return {}
        front_matter = "\n".join(lines[1:end_idx])
        try:
            data = yaml.safe_load(front_matter) or {}
            return data if isinstance(data, dict) else {}
        except yaml.YAMLError:
            return {}

    def _fetch_pull_requests(self, repo_id: str) -> List[Dict[str, Any]]:
        url = f"{API_BASE}/models/{repo_id}/discussions"
        try:
            response = self.session.get(
                url,
                params={"limit": PR_SCAN_LIMIT},
                timeout=30,
            )
            response.raise_for_status()
        except requests.RequestException as err:
            self.log(f"🚫 {repo_id}: PR list request failed ({err}).")
            return []

        payload = response.json()
        discussions = payload.get("discussions", [])
        prs = [disc for disc in discussions if disc.get("isPullRequest")]
        prs.sort(key=lambda item: item.get("createdAt", ""), reverse=True)
        if prs:
            self.log(f"📬 {repo_id}: scanning {len(prs)} pull requests.")
        return prs

    def _extract_scores(
        self,
        repo_id: str,
        model_index: Any,
        contributor: str,
        source_type: str,
        source_url: str,
        revision: str,
    ) -> Dict[str, Dict[str, Any]]:
        if not isinstance(model_index, list):
            return {}
        scores: Dict[str, Dict[str, Any]] = {}
        for entry in model_index:
            if not isinstance(entry, dict):
                continue
            model_name = entry.get("name") or repo_id.split("/")[-1]
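            # Model cards are user-authored, so tolerate null or malformed entries.
            for result in entry.get("results") or []:
                if not isinstance(result, dict):
                    continue
                dataset_info = result.get("dataset") or {}
                dataset_name = dataset_info.get("name")
                dataset_type = dataset_info.get("type")
                task_info = result.get("task") or {}
                task_type = task_info.get("type")
                for metric in result.get("metrics") or []:
                    if not isinstance(metric, dict):
                        continue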
                    benchmark_key = self._match_benchmark(
                        dataset_name,
                        dataset_type,
                        metric,
                    )
                    if not benchmark_key:
                        continue
                    raw_value = metric.get("value")
                    value = _coerce_score(raw_value)
                    if value is None:
                        continue
                    unit = metric.get("unit") or ""
                    is_pct = isinstance(raw_value, str) and raw_value.strip().endswith("%")
                    if not unit and is_pct:
                        unit = "%"
                    metric_name = metric.get("name") or metric.get("type") or ""
                    payload = {
                        "model": repo_id,
                        "model_name": model_name,
                        "benchmark_key": benchmark_key,
                        "benchmark_label": BENCHMARKS[benchmark_key].label,
                        "value": value,
                        "unit": unit,
                        "dataset": dataset_name or dataset_type or "",
                        "task_type": task_type or "",
                        "metric_name": metric_name,
                        "contributor": contributor,
                        "source_type": source_type,
                        "source_url": source_url,
                        "revision": revision,
                    }
                    existing = scores.get(benchmark_key)
                    if not existing or value > existing["value"]:
                        scores[benchmark_key] = payload
        return scores

    def _match_benchmark(
        self,
        dataset_name: Optional[str],
        dataset_type: Optional[str],
        metric: Dict[str, Any],
    ) -> Optional[str]:
        fields = [
            _normalize(dataset_name),
            _normalize(dataset_type),
            _normalize(metric.get("name")),
            _normalize(metric.get("type")),
        ]
        fields = [field for field in fields if field]
        for key, spec in BENCHMARKS.items():
            if spec.matches(fields):
                return key
        return None

    def _format_scores(self, model_id: str, scores: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Format scores as flat records for the dataset."""
        rows = []
        for benchmark_key, payload in scores.items():
            rows.append(
                {
                    "model_id": model_id,
                    "benchmark": payload["benchmark_label"],
                    "benchmark_key": benchmark_key,
                    "score": round(payload["value"], 2),
                    "source_type": payload["source_type"],
                    "source_url": payload["source_url"],
                    "contributor": payload["contributor"],
                    "collected_at": datetime.now(timezone.utc).isoformat(),
                }
            )
        return rows

    def get_leaderboard(self) -> List[Dict[str, Any]]:
        """Get results sorted by score descending."""
        return sorted(self.results, key=lambda x: x["score"], reverse=True)

    def save_json(self, filepath: str) -> None:
        """Save the leaderboard to a JSON file."""
        leaderboard = self.get_leaderboard()
        output = {
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "total_entries": len(leaderboard),
            "benchmarks": list(BENCHMARKS.keys()),
            "leaderboard": leaderboard,
        }
        with open(filepath, "w") as f:
            json.dump(output, f, indent=2)
        self.log(f"💾 Saved leaderboard to {filepath}")

    def push_to_hub(self, repo_id: str = "hf-skills/evals-leaderboard") -> None:
        """Push the leaderboard data to a HF dataset."""
        try:
            from huggingface_hub import HfApi
        except ImportError:
            self.log("❌ huggingface_hub not installed. Run: pip install huggingface_hub")
            return

        api = HfApi(token=self.token)
        leaderboard = self.get_leaderboard()

        # Create dataset as JSONL
        jsonl_content = "\n".join(json.dumps(row) for row in leaderboard)

        # Create metadata file
        metadata = {
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "total_entries": len(leaderboard),
            "models_with_scores": len(set(r["model_id"] for r in leaderboard)),
            "benchmarks": list(BENCHMARKS.keys()),
        }

        try:
            # Create repo if it doesn't exist
            api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
            self.log(f"📁 Ensured dataset repo exists: {repo_id}")

            # Upload leaderboard data
            api.upload_file(
                path_or_fileobj=jsonl_content.encode(),
                path_in_repo="data/leaderboard.jsonl",
                repo_id=repo_id,
                repo_type="dataset",
                commit_message=f"Update leaderboard - {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')} UTC",
            )

            # Upload metadata
            api.upload_file(
                path_or_fileobj=json.dumps(metadata, indent=2).encode(),
                path_in_repo="data/metadata.json",
                repo_id=repo_id,
                repo_type="dataset",
                commit_message=f"Update metadata - {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')} UTC",
            )

            self.log(f"🚀 Pushed leaderboard to {repo_id}")
        except Exception as e:
            self.log(f"❌ Failed to push to hub: {e}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Collect evaluation scores from model-index metadata")
    parser.add_argument(
        "--push-to-hub",
        action="store_true",
        help="Push results to HF dataset",
    )
    parser.add_argument(
        "--output",
        type=str,
        default="leaderboard.json",
        help="Output JSON file path",
    )
    parser.add_argument(
        "--repo-id",
        type=str,
        default="hf-skills/evals-leaderboard",
        help="HF dataset repo ID for pushing",
    )
    args = parser.parse_args()

    token = os.environ.get("HF_TOKEN")
    if not token:
        print("⚠️ No HF_TOKEN found. Some requests may be rate-limited.")

    collector = EvalsCollector(token=token)
    collector.collect_all()

    # Print leaderboard summary
    print("\n" + "=" * 60)
    print("📊 EVALUATION LEADERBOARD")
    print("=" * 60)

    leaderboard = collector.get_leaderboard()
    for entry in leaderboard[:20]:
        print(f"{entry['model_id']:40} | {entry['benchmark']:12} | {entry['score']:6.2f}")

    if len(leaderboard) > 20:
        print(f"   ... and {len(leaderboard) - 20} more entries")

    print("=" * 60)
    print(f"Total entries: {len(leaderboard)}")
    print(f"Models with scores: {len(set(r['model_id'] for r in leaderboard))}")

    # Save locally
    collector.save_json(args.output)

    # Push to hub if requested
    if args.push_to_hub:
        collector.push_to_hub(args.repo_id)


if __name__ == "__main__":
    main()


================================================
FILE: apps/evals-leaderboard/requirements.txt
================================================
gradio
requests
pyyaml
huggingface_hub



================================================
FILE: apps/hackers-leaderboard/README.md
================================================
---
title: Hackers Leaderboard
emoji: 🏆
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
---

# Hackers Leaderboard

Tracks engagement from the [hf-skills](https://huggingface.co/hf-skills) organization for the hackathon leaderboard.

## How Points Work

Simple and fair - **1 point per activity**:

| Activity | Points |
|----------|--------|
| 💬 Open a discussion | 1 |
| 📝 Post a comment | 1 |
| 🔀 Open a PR | 1 |
| 📦 Own/create a repo | 1 |
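
As a rough illustration of the rule in the table, the sketch below sums one point per activity. The `Activity` class and `total_points` function are hypothetical names used only for this example. The field names mirror the keys shown under Output Format.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical container for one user's activity counts."""

    discussions_opened: int = 0
    comments_made: int = 0
    prs_opened: int = 0
    repos_owned: int = 0


def total_points(activity: Activity) -> int:
    # Every activity type is worth exactly one point, so the total is a plain sum.
    return (
        activity.discussions_opened
        + activity.comments_made
        + activity.prs_opened
        + activity.repos_owned
    )


example = Activity(discussions_opened=3, comments_made=8, prs_opened=2, repos_owned=2)
print(total_points(example))  # 15
```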

## Scripts

### Collect Points

```bash
# Collect org activity only
HF_TOKEN=$HF_TOKEN python collect_points.py

# Also scan trending repos for member PRs/discussions
HF_TOKEN=$HF_TOKEN python collect_points.py --scan-external

# Scan only specific repo types
HF_TOKEN=$HF_TOKEN python collect_points.py --scan-external --repo-type models
HF_TOKEN=$HF_TOKEN python collect_points.py --scan-external --repo-type models datasets

# Push to HF dataset
HF_TOKEN=$HF_TOKEN python collect_points.py --scan-external --push-to-hub

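# Custom output path and target dataset (--repo-id only applies with --push-to-hub)
HF_TOKEN=$HF_TOKEN python collect_points.py --output my_leaderboard.json --push-to-hub --repo-id my-org/my-dataset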
```

### Options

| Flag | Description |
|------|-------------|
| `--scan-external` | Also scan trending repos across the Hub for PRs and discussions by org members |
| `--repo-type` | Restrict the external scan to one or more of `models`, `datasets`, `spaces` (default: all) |
| `--push-to-hub` | Push results to an HF dataset |
| `--repo-id` | Target dataset repo (default: `hf-skills/hackers-leaderboard`) |
| `--output` | Local JSON output path (default: `leaderboard.json`) |

### Run the App

```bash
HF_TOKEN=$HF_TOKEN python app.py
```

## API

The collector scans:
- All models, datasets, and spaces in the org
- All discussions and PRs on those repos
- All comments on discussions

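Results are saved locally as JSON. With `--push-to-hub`, they are also uploaded to the dataset as JSONL (`data/leaderboard.jsonl`) alongside a `data/metadata.json` summary.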

## Output Format

```json
{
  "username": "user123",
  "is_org_member": true,
  "total_points": 15,
  "discussions_opened": 3,
  "comments_made": 8,
  "prs_opened": 2,
  "repos_owned": 2
}
```
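
A minimal sketch of consuming the JSONL form of this output, assuming one JSON object per line with at least the `username` and `total_points` keys shown above. The `parse_leaderboard` helper and the sample usernames are hypothetical and not part of the collector.

```python
import json


def parse_leaderboard(jsonl_text: str) -> list[dict]:
    """Parse JSONL text into records sorted by total_points, highest first."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return sorted(rows, key=lambda row: row["total_points"], reverse=True)


# Two made-up records, serialized the same way as a JSONL file.
sample = "\n".join(
    [
        json.dumps({"username": "user123", "total_points": 15}),
        json.dumps({"username": "user456", "total_points": 20}),
    ]
)

for rank, row in enumerate(parse_leaderboard(sample), start=1):
    print(rank, row["username"], row["total_points"])
```

Sorting on read keeps the consumer independent of the order in which records were written.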



================================================
FILE: apps/hackers-leaderboard/app.py
================================================
#!/usr/bin/env python3
"""
Hackers Leaderboard - Gradio app for displaying engagement from hf-skills org.

Reads leaderboard data from the hf-skills/hackers-leaderboard dataset.
Run collect_points.py separately to update the dataset.

Usage:
    python app.py
"""

from __future__ import annotations

import json

import gradio as gr
import requests

TABLE_HEADERS = [
    "Rank",
    "Username",
    "Points",
    "💬 Discussions",
]

TABLE_DATATYPES = [
    "number",
    "markdown",
    "number",
    "number",
]


DATASET_REPO = "hf-skills/hackers-leaderboard"
LEADERBOARD_URL = f"https://huggingface.co/datasets/{DATASET_REPO}/raw/main/data/leaderboard.jsonl"
METADATA_URL = f"https://huggingface.co/datasets/{DATASET_REPO}/raw/main/data/metadata.json"


def format_username(username: str) -> str:
    """Format username as a clickable link."""
    return f"[{username}](https://huggingface.co/{username})"


def fetch_leaderboard() -> tuple[list[dict], dict]:
    """Fetch leaderboard data from the HF dataset."""
    # Fetch leaderboard JSONL
    resp = requests.get(LEADERBOARD_URL, timeout=30)
    resp.raise_for_status()
    leaderboard = [json.loads(line) for line in resp.text.strip().split("\n") if line]

    # Fetch metadata
    resp = requests.get(METADATA_URL, timeout=30)
    resp.raise_for_status()
    metadata = resp.json()

    return leaderboard, metadata


def refresh_handler() -> tuple[str, list[list]]:
    """Refresh the leaderboard data from the dataset."""
    try:
        leaderboard, metadata = fetch_leaderboard()

        # Build table rows
        rows = []
        for i, entry in enumerate(leaderboard, 1):
            rows.append(
                [
                    i,
                    format_username(entry["username"]),
                    entry["total_points"],
                    entry["discussions_opened"],
                ]
            )

        status = "\n".join(
            [
                f"**Data from:** [{DATASET_REPO}](https://huggingface.co/datasets/{DATASET_REPO})",
                f"**Last updated:** {metadata.get('generated_at', 'unknown')}",
                f"**Participants:** {metadata.get('total_participants', len(leaderboard))}",
                f"**Total points:** {metadata.get('total_points', sum(e['total_points'] for e in leaderboard))}",
            ]
        )

        return status, rows

    except Exception as e:
        return f"❌ Failed to load leaderboard: {e}", []


with gr.Blocks() as demo:
    gr.HTML(
        """
        <div class="subtitle">
            <img src="https://github.com/huggingface/skills/raw/main/assets/banner.png" alt="Humanity's Last Hackathon (of 2025)" width="100%">
        </div>
        <div class="leaderboard-title"><h1>🏆 Humanity's Last Hackathon Leaderboard</h1></div>
        """
    )

    leaderboard_table = gr.Dataframe(
        headers=TABLE_HEADERS,
        datatype=TABLE_DATATYPES,
        interactive=False,
        wrap=True,
    )

    status_box = gr.Markdown("Loading the leaderboard...")

    demo.load(
        refresh_handler,
        outputs=[status_box, leaderboard_table],
    )

    gr.Markdown(
        """
        ---
        
        **Links:**
        - [Join hf-skills](https://huggingface.co/organizations/hf-skills/share/KrqrmBxkETjvevFbfkXeezcyMbgMjjMaOp)
        - [Quest Instructions](https://github.com/huggingface/skills/tree/main/apps/quests)
        - [GitHub Repository](https://github.com/huggingface/skills)
        """
    )

if __name__ == "__main__":
    demo.launch()


================================================
FILE: apps/hackers-leaderboard/collect_points.py
================================================
#!/usr/bin/env python3
"""
Collect engagement points from the hf-skills organization.

Tracks user activity across all repos (models, datasets, spaces) and counts:
- 1 point per discussion opened
- 1 point per comment made
- 1 point per PR opened
- 1 point per repo owned/created

Results are saved to a dataset for the hackers leaderboard.

Usage:
    HF_TOKEN=$HF_TOKEN python collect_points.py [--push-to-hub]
"""

from __future__ import annotations

import argparse
import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

import requests

API_BASE = "https://huggingface.co/api"
ORG_NAME = "hf-skills"
USER_AGENT = "hf-skills-leaderboard/1.0"
DISCUSSION_LIMIT = 100  # Max discussions to fetch per repo
TRENDING_LIMIT = 50  # Number of trending repos to scan for external PRs


@dataclass
class UserStats:
    """Tracks engagement stats for a single user."""

    username: str
    is_org_member: bool = True
    discussions_opened: int = 0
    comments_made: int = 0
    prs_opened: int = 0
    repos_owned: int = 0
    activities: list[dict[str, Any]] = field(default_factory=list)

    @property
    def total_points(self) -> int:
        return self.discussions_opened + self.comments_made + self.prs_opened + self.repos_owned

    def to_dict(self) -> dict[str, Any]:
        return {
            "username": self.username,
            "is_org_member": self.is_org_member,
            "total_points": self.total_points,
            "discussions_opened": self.discussions_opened,
            "comments_made": self.comments_made,
            "prs_opened": self.prs_opened,
            "repos_owned": self.repos_owned,
        }


class PointsCollector:
    """Collects engagement points from the hf-skills organization."""

    def __init__(self, token: str | None = None) -> None:
        self.token = token
        self.session = requests.Session()
        self.session.headers.update({"User-Agent": USER_AGENT})
        if token:
            self.session.headers.update({"Authorization": f"Bearer {token}"})
        self.user_stats: dict[str, UserStats] = {}
        self.logs: list[str] = []

    def log(self, message: str) -> None:
        """Add a log message."""
        print(message)
        self.logs.append(message)

    def _fetch_org_members(self) -> list[str]:
        """Fetch all members of the organization."""
        try:
            from huggingface_hub import HfApi

            api = HfApi(token=self.token)
            members = list(api.list_organization_members(ORG_NAME))
            usernames = [m.username for m in members if m.username]
            self.log(f"👥 Found {len(usernames)} organization members")
            return usernames
        except Exception as e:
            self.log(f"⚠️ Failed to fetch org members: {e}")
            # Fallback: try the API directly
            try:
                url = f"{API_BASE}/organizations/{ORG_NAME}/members"
                response = self.session.get(url, timeout=30)
                response.raise_for_status()
                members = response.json()
                usernames = [m.get("user") or m.get("username") or m.get("name") for m in members]
                usernames = [u for u in usernames if u]
                self.log(f"👥 Found {len(usernames)} organization members (via API)")
                return usernames
            except Exception as e2:
                self.log(f"⚠️ Fallback also failed: {e2}")
                return []

    def collect_all(self) -> dict[str, UserStats]:
        """Collect points from all repos in the organization."""
        self.log(f"🔍 Scanning organization: {ORG_NAME}")

        # First, get all org members and initialize their stats
        members = self._fetch_org_members()
        for username in members:
            self.user_stats[username] = UserStats(username=username)

        # Collect from all repo types
        models = self._list_repos("models")
        datasets = self._list_repos("datasets")
        spaces = self._list_repos("spaces")

        all_repos = [
            *[(r, "model") for r in models],
            *[(r, "dataset") for r in datasets],
            *[(r, "space") for r in spaces],
        ]

        self.log(f"📦 Found {len(models)} models, {len(datasets)} datasets, {len(spaces)} spaces")

        for repo_info, repo_type in all_repos:
            repo_id = repo_info.get("id") or repo_info.get("modelId")
            if not repo_id:
                continue

            # Credit repo owner
            owner = repo_info.get("author") or repo_id.split("/")[0]
            if owner and owner != ORG_NAME:
                self._add_point(owner, "repos_owned", repo_id, "repo_created")

            # Scan discussions
            self._scan_discussions(repo_id, repo_type)

        return dict(self.user_stats)

    def scan_external_repos(self, repo_types: list[str] | None = None) -> None:
        """Scan trending repos across the Hub for PRs by org members.

        Args:
            repo_types: List of repo types to scan. Options: "models", "datasets", "spaces".
                       If None, scans all types.
        """
        org_members = set(self.user_stats.keys())
        if not org_members:
            self.log("⚠️ No org members loaded. Run collect_all() first.")
            return

        if repo_types is None:
            repo_types = ["models", "datasets", "spaces"]

        self.log(f"🌐 Scanning trending repos for PRs by {len(org_members)} org members...")
        self.log(f"📂 Repo types: {', '.join(repo_types)}")

        for repo_type in repo_types:
            trending = self._fetch_trending(repo_type)
            self.log(f"📈 Scanning {len(trending)} trending {repo_type}...")

            for repo_info in trending:
                repo_id = repo_info.get("id") or repo_info.get("modelId")
                if not repo_id:
                    continue

                # Skip org repos (already scanned)
                if repo_id.startswith(f"{ORG_NAME}/"):
                    continue

                # Scan for PRs/discussions by each org member using author filter
                self._scan_repo_for_members(repo_id, repo_type, org_members)

    def _fetch_trending(self, repo_type: str) -> list[dict[str, Any]]:
        """Fetch trending repos of a given type."""
        endpoint = f"{API_BASE}/{repo_type}"
        params = {"sort": "trendingScore", "limit": TRENDING_LIMIT}

        try:
            response = self.session.get(endpoint, params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            self.log(f"⚠️ Failed to fetch trending {repo_type}: {e}")
            return []

    def _scan_repo_for_members(self, repo_id: str, repo_type: str, org_members: set[str]) -> None:
        """Scan a repo's discussions for activity by org members using author filter."""
        # Parse namespace and repo from repo_id
        parts = repo_id.split("/")
        if len(parts) != 2:
            return
        namespace, repo = parts

        for member in org_members:
            # Use author filter for efficient querying
            self._fetch_member_discussions(
                repo_type=repo_type,
                namespace=namespace,
                repo=repo,
                author=member,
                discussion_type="pull_request",
            )
            self._fetch_member_discussions(
                repo_type=repo_type,
                namespace=namespace,
                repo=repo,
                author=member,
                discussion_type="discussion",
            )

    def _fetch_member_discussions(
        self,
        repo_type: str,
        namespace: str,
        repo: str,
        author: str,
        discussion_type: str = "all",
    ) -> None:
        """Fetch discussions for a specific author on a repo.

        Uses: GET /api/{repoType}/{namespace}/{repo}/discussions?author={author}&type={type}
        """
        url = f"{API_BASE}/{repo_type}/{namespace}/{repo}/discussions"
        params = {
            "author": author,
            "type": discussion_type,
            "status": "all",
        }

        try:
            response = self.session.get(url, params=params, timeout=30)
            response.raise_for_status()
            data = response.json()
        except requests.RequestException:
            return

        discussions = data.get("discussions", [])
        repo_id = f"{namespace}/{repo}"

        for discussion in discussions:
            is_pr = discussion.get("isPullRequest", False)
            disc_num = discussion.get("num")

            if is_pr:
                self._add_point(author, "prs_opened", repo_id, "external_pr", disc_num)
                self.log(f"🔀 Found PR by {author} on {repo_id}")
            else:
                self._add_point(author, "discussions_opened", repo_id, "external_discussion", disc_num)
                self.log(f"💬 Found discussion by {author} on {repo_id}")

            # Count comments on the discussion
            num_comments = discussion.get("numComments", 0)
            if num_comments > 0:
                self._fetch_discussion_comments(repo_type, namespace, repo, disc_num, author)

    def _fetch_discussion_comments(
        self,
        repo_type: str,
        namespace: str,
        repo: str,
        disc_num: int,
        target_author: str,
    ) -> None:
        """Fetch comments on a discussion and count those by target author."""
        url = f"{API_BASE}/{repo_type}/{namespace}/{repo}/discussions/{disc_num}"

        try:
            response = self.session.get(url, timeout=30)
            response.raise_for_status()
            data = response.json()
        except requests.RequestException:
            return

        repo_id = f"{namespace}/{repo}"
        events = data.get("events", [])
        for event in events:
            if event.get("type") == "comment":
                author_info = event.get("author", {}) or {}
                author = author_info.get("name") or author_info.get("fullname")
                if author == target_author:
                    self._add_point(author, "comments_made", repo_id, "external_comment", disc_num)

    def _list_repos(self, repo_type: str) -> list[dict[str, Any]]:
        """List all repos of a given type in the organization."""
        endpoint = f"{API_BASE}/{repo_type}"
        params = {"author": ORG_NAME, "limit": 1000}

        try:
            response = self.session.get(endpoint, params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            self.log(f"⚠️ Failed to list {repo_type}: {e}")
            return []

    def _scan_discussions(self, repo_id: str, repo_type: str) -> None:
        """Scan all discussions for a repo and count engagement."""
        # Map repo type to API path
        type_map = {"model": "models", "dataset": "datasets", "space": "spaces"}
        api_type = type_map.get(repo_type, "models")

        url = f"{API_BASE}/{api_type}/{repo_id}/discussions"

        try:
            response = self.session.get(url, params={"limit": DISCUSSION_LIMIT}, timeout=30)
            response.raise_for_status()
            data = response.json()
        except requests.RequestException as e:
            self.log(f"⚠️ Failed to get discussions for {repo_id}: {e}")
            return

        discussions = data.get("discussions", [])
        if not discussions:
            return

        self.log(f"💬 {repo_id}: found {len(discussions)} discussions")

        for discussion in discussions:
            self._process_discussion(repo_id, api_type, discussion)

    def _process_discussion(self, repo_id: str, api_type: str, discussion: dict[str, Any]) -> None:
        """Process a single discussion and its comments."""
        author_info = discussion.get("author", {}) or {}
        author = author_info.get("name") or author_info.get("fullname")
        is_pr = discussion.get("isPullRequest", False)
        disc_num = discussion.get("num")

        if author and author != ORG_NAME:
            activity_type = "pr_opened" if is_pr else "discussion_opened"
            point_type = "prs_opened" if is_pr else "discussions_opened"
            self._add_point(author, point_type, repo_id, activity_type, disc_num)

        # Fetch discussion details to get comments
        if disc_num:
            self._fetch_comments(repo_id, api_type, disc_num)

    def _fetch_comments(self, repo_id: str, api_type: str, disc_num: int) -> None:
        """Fetch and count comments on a discussion."""
        url = f"{API_BASE}/{api_type}/{repo_id}/discussions/{disc_num}"

        try:
            response = self.session.get(url, timeout=30)
            response.raise_for_status()
            data = response.json()
        except requests.RequestException:
            # Silently skip failed comment fetches
            return

        events = data.get("events", [])
        for event in events:
            event_type = event.get("type")
            # Count comments (not the initial post, status changes, etc.)
            if event_type == "comment":
                author_info = event.get("author", {}) or {}
                author = author_info.get("name") or author_info.get("fullname")
                if author and author != ORG_NAME:
                    self._add_point(author, "comments_made", repo_id, "comment", disc_num)

    def _add_point(
        self,
        username: str,
        point_type: str,
        repo_id: str,
        activity_type: str,
        disc_num: int | None = None,
    ) -> None:
        """Add a point to a user's stats."""
        if not username:
            return

        # Initialize stats for users not in the org (external contributors)
        if username not in self.user_stats:
            self.user_stats[username] = UserStats(username=username, is_org_member=False)

        stats = self.user_stats[username]
        current = getattr(stats, point_type, 0)
        setattr(stats, point_type, current + 1)

        stats.activities.append(
            {
                "type": activity_type,
                "repo_id": repo_id,
                "discussion_num": disc_num,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
        )

    def get_leaderboard(self) -> list[dict[str, Any]]:
        """Get the leaderboard sorted by total points."""
        leaderboard = [stats.to_dict() for stats in self.user_stats.values()]
        leaderboard.sort(key=lambda x: x["total_points"], reverse=True)
        return leaderboard

    def save_json(self, filepath: str) -> None:
        """Save the leaderboard to a JSON file."""
        leaderboard = self.get_leaderboard()
        output = {
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "organization": ORG_NAME,
            "total_participants": len(leaderboard),
            "leaderboard": leaderboard,
        }
        with open(filepath, "w") as f:
            json.dump(output, f, indent=2)
        self.log(f"💾 Saved leaderboard to {filepath}")

    def push_to_hub(self, repo_id: str = "hf-skills/hackers-leaderboard") -> None:
        """Push the leaderboard data to a HF dataset."""
        try:
            from huggingface_hub import HfApi
        except ImportError:
            self.log("❌ huggingface_hub not installed. Run: pip install huggingface_hub")
            return

        api = HfApi(token=self.token)
        leaderboard = self.get_leaderboard()

        # Create dataset as JSONL
        jsonl_content = "\n".join(json.dumps(row) for row in leaderboard)

        # Also create a metadata file
        metadata = {
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "organization": ORG_NAME,
            "total_participants": len(leaderboard),
            "total_points": sum(row["total_points"] for row in leaderboard),
        }

        try:
            # Create repo if it doesn't exist
            api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
            self.log(f"📁 Ensured dataset repo exists: {repo_id}")

            # Upload leaderboard data
            api.upload_file(
                path_or_fileobj=jsonl_content.encode(),
                path_in_repo="data/leaderboard.jsonl",
                repo_id=repo_id,
                repo_type="dataset",
                commit_message=f"Update leaderboard - {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')} UTC",
            )

            # Upload metadata
            api.upload_file(
                path_or_fileobj=json.dumps(metadata, indent=2).encode(),
                path_in_repo="data/metadata.json",
                repo_id=repo_id,
                repo_type="dataset",
                commit_message=f"Update metadata - {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')} UTC",
            )

            self.log(f"🚀 Pushed leaderboard to {repo_id}")
        except Exception as e:
            self.log(f"❌ Failed to push to hub: {e}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Collect engagement points from hf-skills organization")
    parser.add_argument(
        "--push-to-hub",
        action="store_true",
        help="Push results to HF dataset",
    )
    parser.add_argument(
        "--output",
        type=str,
        default="leaderboard.json",
        help="Output JSON file path",
    )
    parser.add_argument(
        "--repo-id",
        type=str,
        default="hf-skills/hackers-leaderboard",
        help="HF dataset repo ID for pushing",
    )
    parser.add_argument(
        "--scan-external",
        action="store_true",
        help="Also scan trending repos for PRs/discussions by org members",
    )
    parser.add_argument(
        "--repo-type",
        type=str,
        nargs="+",
        choices=["models", "datasets", "spaces"],
        default=None,
        help="Repo types to scan (for --scan-external). Default: all types",
    )
    args = parser.parse_args()

    token = os.environ.get("HF_TOKEN")
    if not token:
        print("⚠️ No HF_TOKEN found. Some requests may be rate-limited.")

    collector = PointsCollector(token=token)
    collector.collect_all()

    # Optionally scan external repos for member activity
    if args.scan_external:
        collector.scan_external_repos(repo_types=args.repo_type)

    # Print leaderboard
    print("\n" + "=" * 50)
    print("🏆 HACKERS LEADERBOARD")
    print("=" * 50)

    leaderboard = collector.get_leaderboard()
    for i, entry in enumerate(leaderboard[:20], 1):
        print(
            f"{i:2}. {entry['username']:20} - {entry['total_points']:4} points "
            f"(💬{entry['discussions_opened']} 📝{entry['comments_made']} "
            f"🔀{entry['prs_opened']} 📦{entry['repos_owned']})"
        )

    if len(leaderboard) > 20:
        print(f"   ... and {len(leaderboard) - 20} more participants")

    print("=" * 50)
    print(f"Total participants: {len(leaderboard)}")
    print(f"Total points awarded: {sum(e['total_points'] for e in leaderboard)}")

    # Save locally
    collector.save_json(args.output)

    # Push to hub if requested
    if args.push_to_hub:
        collector.push_to_hub(args.repo_id)


if __name__ == "__main__":
    main()


================================================
FILE: apps/hackers-leaderboard/requirements.txt
================================================
gradio>=5.50.0
huggingface_hub>=1.1.4
requests>=2.32.5

================================================
FILE: apps/quests/02_evaluate-hub-model.md
================================================
# Week 1: Evaluate a Hub Model

📣 TASK: Add evaluation results to model cards across the Hub. Together, we're building a distributed leaderboard of open source model performance.

>[!NOTE]
> Bonus XP for contributing to the leaderboard application. Open a PR [on the hub](https://huggingface.co/spaces/hf-skills/distributed-leaderboard/discussions) or [on GitHub](https://github.com/huggingface/skills/blob/main/apps/evals-leaderboard/app.py) to get your (bonus) XP.

## Why This Matters

Model cards without evaluation data are hard to compare. By adding structured eval results to metadata, we make models easier to compare and review. Your contributions power leaderboards and help the community find the best models for their needs. Also, by doing this in a distributed way, we can share our evaluation results with the community.

## Goals

- Add eval scores to the 100 trending models on the Hub
- Include AIME 2025, BigBenchHard, LiveCodeBench, MMLU, ARC on trending models.
- It is ok to include a subset of the benchmarks available for the model.
- Build a leaderboard application that shows the evaluation results for the trending models.

## XP Tiers

Taking part is simple. We need to get model authors to show evaluation results in their model cards. This is a cleanup job!

| Tier            | XP    | Description                                                   | What Counts                                  |
|-----------------|-------|---------------------------------------------------------------|-----------------------------------------------|
| 🐢 Contributor  | 1 XP  | Extract evaluation results from one benchmark and update its model card. | Any PR on the repo with evaluation data.      |
| 🐕 Evaluator    | 5 XP  | Import scores from third-party benchmarks like Artificial Analysis. | Undefined benchmark scores and merged PRs.    |
| 🦁 Advanced     | 20 XP| Run your own evaluation with inspect-ai and publish results.   | Original eval run and merged PR.              |
| 🐉 Bonus        | 20 XP| Contribute to the leaderboard application.                      | Any Merged PR on the hub or GitHub.                  |
| 🤢 Slop         | -20 XP  | Opening non-useful PRs.                  | Duplicate PRs, Incorrect Eval Scores, Incorrect Benchmark Scores          |
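
As a rough illustration of how the tiers add up, the sketch below totals XP for one contributor. The `XP_PER_TIER` values are copied from the table; the tier keys and the `total_xp` helper are hypothetical names, not part of the leaderboard application.

```python
# XP values copied from the tier table; key names are illustrative only.
XP_PER_TIER = {
    "contributor": 1,
    "evaluator": 5,
    "advanced": 20,
    "bonus": 20,
    "slop": -20,
}


def total_xp(counts: dict[str, int]) -> int:
    """Sum XP for a mapping of tier name -> number of qualifying PRs."""
    return sum(XP_PER_TIER[tier] * n for tier, n in counts.items())


# Three README extractions, one imported score, and one duplicate PR.
print(total_xp({"contributor": 3, "evaluator": 1, "slop": 1}))  # -12
```

A single Slop penalty outweighs several small contributions, which matches the warning that useful PRs matter more than volume.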

> [!WARNING]
> This hackathon is about advancing the state of open source AI. We want useful PRs that help everyone out, not just metrics. 

## The Skill

Use `hf-evaluation/` for this quest. Key capabilities:

- Extract evaluation tables from existing README content posted by model authors.
- Import benchmark scores from [Artificial Analysis](https://artificialanalysis.ai/).
- Run your own evals with [inspect-ai](https://github.com/UKGovernmentBEIS/inspect_ai) on [HF Jobs](https://huggingface.co/docs/huggingface_hub/en/guides/jobs).
- Update model-index metadata in the model card.
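
To make the last capability concrete, here is a minimal sketch of the `model-index` structure that Hub model cards use for evaluation results. The helper name `build_model_index` and the example values are hypothetical; the skill's own scripts may build this differently.

```python
import json


def build_model_index(model_name, dataset_name, dataset_type, metric_type, value,
                      task_type="text-generation"):
    """Return a model-index entry shaped like Hub model card metadata."""
    return [
        {
            "name": model_name,
            "results": [
                {
                    "task": {"type": task_type},
                    "dataset": {"name": dataset_name, "type": dataset_type},
                    "metrics": [{"type": metric_type, "value": value}],
                }
            ],
        }
    ]


# Placeholder model name and score, for illustration only.
entry = build_model_index("model-name", "MMLU", "mmlu", "accuracy", 0.5)
print(json.dumps({"model-index": entry}, indent=2))
```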

>[!NOTE]
> Take a look at the [SKILL.md](https://github.com/huggingface/skills/blob/main/hf-evaluation/SKILL.md) for more details.

### Extract Evaluation Tables from README

1. Pick a Hub model without evaluation data from *trending models* on the hub
2. Use the skill to extract or add a benchmark score
3. Create a PR (or push directly if you own the model)

The agent will use this script to extract evaluation tables from the model's README.

```bash
python hf-evaluation/scripts/evaluation_manager.py extract-readme \
  --repo-id "model-author/model-name" --dry-run
```

### Import Scores from Artificial Analysis

1. Find a model with benchmark data on external sites
2. Use `import-aa` to fetch scores from the Artificial Analysis API
3. Create a PR with properly attributed evaluation data

The agent will use this script to fetch scores from the Artificial Analysis API and add them to the model card.

```bash
python hf-evaluation/scripts/evaluation_manager.py import-aa \
  --creator-slug "anthropic" --model-name "claude-sonnet-4" \
  --repo-id "target/model" --create-pr
```

### Run Your Own Evaluation with inspect-ai and Publish Results

1. Choose an eval task (MMLU, GSM8K, HumanEval, etc.)
2. Run the evaluation on HF Jobs infrastructure
3. Update the model card with your results and methodology

The agent will use this script to run the evaluation on HF Jobs infrastructure and update the model card with the results.

```bash
HF_TOKEN=$HF_TOKEN hf jobs uv run hf-evaluation/scripts/inspect_eval_uv.py \
  --flavor a10g-small --secret HF_TOKEN=$HF_TOKEN \
  -- --model "meta-llama/Llama-2-7b-hf" --task "mmlu"
```

## Tips

- Always use `--dry-run` first to preview changes before pushing
- Check for transposed tables where models are rows and benchmarks are columns
- Be careful with PRs for models you don't own — most maintainers appreciate eval contributions but be respectful.
- Manually validate the extracted scores and close PRs if needed.
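
The transposed-table tip can be checked mechanically. The sketch below parses a Markdown table and flips it so that each benchmark gets its own row. `parse_markdown_table` and `transpose` are hypothetical helpers and the scores are made up; this is not the skill's extraction logic.

```python
def parse_markdown_table(text):
    """Split a simple Markdown table into a list of rows of cell strings."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the separator row made only of dashes and colons.
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    return rows


def transpose(rows):
    """Swap rows and columns so each benchmark gets its own row."""
    return [list(col) for col in zip(*rows)]


# Made-up scores: models as rows, benchmarks as columns.
table = """
| Model   | MMLU | ARC  |
|---------|------|------|
| model-a | 0.50 | 0.60 |
| model-b | 0.55 | 0.65 |
"""

for row in transpose(parse_markdown_table(table)):
    print(row)
```

Comparing the first row and first column against known benchmark names is one way to decide which orientation a README table uses before trusting extracted scores.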

## Resources

- [SKILL.md](../../hf-evaluation/SKILL.md) — Full skill documentation
- [Example Usage](../../hf-evaluation/examples/USAGE_EXAMPLES.md) — Worked examples
- [Metric Mapping](../../hf-evaluation/examples/metric_mapping.json) — Standard metric types



================================================
FILE: apps/quests/03_publish-hub-dataset.md
================================================
# Week 2: Publish a Hub Dataset

Create and share high-quality datasets on the Hub. Good data is the foundation of good models—help the community by contributing datasets others can train on.

## Why This Matters

The best open source models are built on openly available datasets. By publishing well-documented, properly structured datasets, you're directly enabling the next generation of model development. Quality matters more than quantity.

## The Skill

Use `hf-datasets/` for this quest. Key capabilities:

- Initialize dataset repos with proper structure
- Multi-format support: chat, classification, QA, completion, tabular
- Template-based validation for data quality
- Streaming uploads without downloading entire datasets

```bash
# Quick setup with a template
python hf-datasets/scripts/dataset_manager.py quick_setup \
  --repo_id "your-username/dataset-name" --template chat
```

## XP Tiers

### 🐢 Starter — 50 XP

**Upload a small, clean dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering: license, splits, and data provenance
3. Upload to the Hub under the hackathon organization (or your own account)

**What counts:** Clean data, clear documentation, proper licensing.

```bash
python hf-datasets/scripts/dataset_manager.py init \
  --repo_id "hf-skills/your-dataset-name"

python hf-datasets/scripts/dataset_manager.py add_rows \
  --repo_id "hf-skills/your-dataset-name" \
  --template classification \
  --rows_json "$(cat your_data.json)"
```

### 🐕 Standard — 100 XP

**Publish a conversational dataset with a complete dataset card.**

1. Create a dataset with ≤1,000 rows
2. Write a dataset card covering: license and splits.
3. Upload to the Hub under the hackathon organization.

**What counts:** Clean data, clear documentation, proper licensing.

### 🦁 Advanced — 200 XP

**Translate a dataset into multiple languages and publish it on the Hub.**

1. Find a dataset on the Hub
2. Translate the dataset into multiple languages
3. Publish the translated datasets on the Hub under the hackathon organization

**What counts:** Translated datasets and merged PRs.

## Resources

- [SKILL.md](../../hf-datasets/SKILL.md) — Full skill documentation
- [Templates](../../hf-datasets/templates/) — JSON templates for each format
- [Examples](../../hf-datasets/examples/) — Sample data and system prompts

---

**Next Quest:** [Supervised Fine-Tuning](04_sft-finetune-hub.md)


================================================
FILE: apps/quests/04_sft-finetune-hub.md
================================================
# Week 3: Supervised Fine-Tuning on the Hub

Fine-tune and share models on the Hub. Take a base model, train it on your data, and publish the result for the community to use.

## Why This Matters

Fine-tuning is how we adapt foundation models to specific tasks. By sharing fine-tuned models—along with your training methodology—you're giving the community ready-to-use solutions and reproducible recipes they can learn from.

## The Skill

Use `hf-llm-trainer/` for this quest. Key capabilities:

- **SFT** (Supervised Fine-Tuning) — Standard instruction tuning
- **DPO** (Direct Preference Optimization) — Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) — Online RL training
- Cloud GPU training on HF Jobs—no local setup required
- Trackio integration for real-time monitoring
- GGUF conversion for local deployment

Your coding agent uses `hf_jobs()` to submit training scripts directly to HF infrastructure.

## XP Tiers

We'll announce the XP tiers for this quest soon.

## Resources

- [SKILL.md](../../hf-llm-trainer/SKILL.md) — Full skill documentation
- [SFT Example](../../hf-llm-trainer/scripts/train_sft_example.py) — Production SFT template
- [DPO Example](../../hf-llm-trainer/scripts/train_dpo_example.py) — Production DPO template
- [GRPO Example](../../hf-llm-trainer/scripts/train_grpo_example.py) — Production GRPO template
- [Training Methods](../../hf-llm-trainer/references/training_methods.md) — Method selection guide
- [Hardware Guide](../../hf-llm-trainer/references/hardware_guide.md) — GPU selection


================================================
FILE: apps/quests/README.md
================================================
---
title: README
emoji: 🐠
colorFrom: yellow
colorTo: gray
sdk: static
pinned: false
---

# Humanity's Last Hackathon (of 2025)

<img src="https://github.com/huggingface/skills/raw/main/assets/banner.png" alt="Humanity's Last Hackathon (of 2025)" width="100%">

Welcome to our hackathon!

Whether you’re a tooled-up ML engineer, a classicist NLP dev, or an AGI-pilled vibe coder, this hackathon is going to be hard work! We’re going to take the latest and greatest coding agents 
and use them to level up open source AI. After all, **why use December to relax and spend time with loved ones, when you can solve AI for all humanity?** Jokes aside, this hackathon is not 
about learning skills from zero or breaking things down into their simplest components. It’s about collaborating, shipping, and making a difference for the open source community.

## What We're Building

Over four weeks, we're using coding agents to level up the open source AI ecosystem:

- **Week 1** — Evaluate models and build a distributed leaderboard
- **Week 2** — Create high-quality datasets for the community  
- **Week 3** — Fine-tune and share models on the Hub
- **Week 4** — Sprint to the finish line together

Every contribution earns XP. Top contributors make the leaderboard. Winners get prizes!

Here's the schedule:

| Date | Event | Link |
|------|-------|------|
| Dec 2 (Mon) | Week 1 Quest Released | [Evaluate a Hub Model](02_evaluate-hub-model.md) |
| Dec 4 (Wed) | Livestream 1 | [Q&A 1](https://youtube.com/live/rworGSh-Rgk?feature=share) |
| Dec 9 (Mon) | Week 2 Quest Released | [Publish a Hub Dataset](03_publish-hub-dataset.md) |
| Dec 11 (Wed) | Livestream 2 | TBA |
| Dec 16 (Mon) | Week 3 Quest Released | [Supervised Fine-Tuning](04_sft-finetune-hub.md) |
| Dec 18 (Wed) | Livestream 3 | TBA |
| Dec 23 (Mon) | Week 4 Community Sprint | TBA |
| Dec 31 (Tue) | Hackathon Ends | TBA |

## Getting Started

### 1. Join the Organization

Join [hf-skills](https://huggingface.co/organizations/hf-skills/share/KrqrmBxkETjvevFbfkXeezcyMbgMjjMaOp) on Hugging Face. This is where your contributions will be tracked and updated on the leaderboard.

### 2. Set Up Your Coding Agent

Use whatever coding agent you prefer:

- **Claude Code** — `claude` in your terminal
- **Codex** — `codex` CLI
- **Gemini CLI** — `gemini` in your terminal
- **Cursor / Windsurf** — IDE-based agents
- **Open source** — aider, continue, etc.

The skills in this repo work with any agent that can read markdown instructions and run Python scripts. To install the skills, follow the instructions in the [README](../README.md).

### 3. Get Your HF Token

Most quests require a Hugging Face token with write access:

```bash
# mac/linux
curl -LsSf https://hf.co/cli/install.sh | bash

# windows
powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"

# Login (creates/stores your token)
hf auth login
```

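This stores your token locally so the `hf` CLI and `huggingface_hub` can use it. Scripts that read the `HF_TOKEN` environment variable directly need it exported in your shell as well.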
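
Some scripts in this repo read the token from the `HF_TOKEN` environment variable, as the hackers leaderboard collection script does. A minimal sketch of that pattern follows; `resolve_token` is a hypothetical helper name.

```python
import os


def resolve_token(env=None):
    """Return the HF token from the environment, or None if it is unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        print("No HF_TOKEN found. Some requests may be rate-limited.")
        return None
    return token


token = resolve_token()
```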

### 4. Clone the Skills Repo

```bash
git clone https://github.com/huggingface/skills.git
cd skills
```

Point your coding agent at the relevant configuration. Check the [README](../README.md) for instructions on how to use the skills with your coding agent.

## Your First Quest

**Week 1 is live!** Head to [02_evaluate-hub-model.md](02_evaluate-hub-model.md) to start evaluating models and climb the leaderboard.

<iframe
	src="https://hf-skills-hacker-leaderboard.hf.space"
	frameborder="0"
	width="850"
	height="450"
></iframe>

[Leaderboard](https://hf-skills-hacker-leaderboard.hf.space)

## Earning XP

Each quest has three tiers:

| Tier | What it takes | XP |
|------|---------------|-----|
| 🐢 | Complete the basics | 50-75 XP |
| 🐕 | Go deeper with more features | 100-125 XP |
| 🦁 | Ship something impressive | 200-225 XP |

You can complete multiple tiers, and you can complete the same quest multiple times with different models/datasets/spaces.

## Getting Help

- [Discord](https://discord.com/channels/879548962464493619/1442881667986624554) — Join the Hugging Face Discord for real-time help
- [Livestreams](https://www.youtube.com/@HuggingFace/streams) — Weekly streams with walkthroughs and Q&A
- [Issues](https://github.com/huggingface/skills/issues) — Open an issue in this repo if you're stuck

To join the hackathon, join the organization on the Hub and set up your coding agent.

Ready? Let's ship some AI. 🚀


================================================
FILE: gemini-extension.json
================================================
{
    "name": "huggingface-skills",
    "description": "Provides access to the Hugging Face Skills.",
    "version": "0.1.0",
    "contextFileName": "agents/AGENTS.md",
    "mcpServers": {
        "huggingface-skills": {
            "httpUrl": "https://huggingface.co/mcp?login"
        }
    }
}

================================================
FILE: hf-mcp/skills/hf-mcp/SKILL.md
================================================
---
name: hf-mcp
description: Use Hugging Face Hub via MCP server tools. Search models, datasets, Spaces, papers. Get repo details, fetch documentation, run compute jobs, and use Gradio Spaces as AI tools. Available when connected to the HF MCP server.
---

# Hugging Face MCP Server

Connect AI assistants to the Hugging Face Hub. Setup: https://huggingface.co/settings/mcp

## Use Cases & Examples

### Find the Best Model for a Task

```
User: "Find the best model for code generation"

1. model_search(task="text-generation", query="code", sort="trendingScore", limit=10)
2. hub_repo_details(repo_ids=["top-result-id"], include_readme=true)
```

### Compare Models from Different Providers

```
User: "Compare Llama vs Qwen for text generation"

1. model_search(author="meta-llama", task="text-generation", sort="downloads", limit=5)
2. model_search(author="Qwen", task="text-generation", sort="downloads", limit=5)
3. hub_repo_details(repo_ids=["meta-llama/Llama-3.2-1B", "Qwen/Qwen3-8B"], include_readme=true)
```

### Find Training Datasets

```
User: "Find datasets for sentiment analysis in English"

1. dataset_search(query="sentiment", tags=["language:en", "task_categories:text-classification"], sort="downloads")
2. hub_repo_details(repo_ids=["top-dataset-id"], repo_type="dataset", include_readme=true)
```

### Discover AI Tools (MCP Spaces)

```
User: "Find a tool that can remove image backgrounds"

1. space_search(query="background removal", mcp=true)
2. dynamic_space(operation="view_parameters", space_name="result-space-id")
3. dynamic_space(operation="invoke", space_name="result-space-id", parameters="{...}")
```

### Generate Images

```
User: "Create an image of a robot reading a book"

1. dynamic_space(operation="discover")  # See available tasks
2. gr1_flux1_schnell_infer(prompt="a robot sitting in a library reading a book, warm lighting, detailed")
```

### Research a Topic

```
User: "What are the latest papers on RLHF?"

1. paper_search(query="reinforcement learning from human feedback", results_limit=10)
2. hub_repo_details(repo_ids=["paper-linked-model"], include_readme=true)  # If paper links to models
```

### Learn How to Use a Library

```
User: "How do I fine-tune with LoRA using PEFT?"

1. hf_doc_search(query="LoRA fine-tuning", product="peft")
2. hf_doc_fetch(doc_url="https://huggingface.co/docs/peft/...")
```

### Run a Quick GPU Job

```
User: "Run this Python script on a GPU"

hf_jobs(operation="uv", args={
  "script": "# /// script\n# dependencies = [\"torch\"]\n# ///\nimport torch\nprint(torch.cuda.is_available())",
  "flavor": "t4-small"
})
```
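
The `script` value above is a PEP 723 inline-metadata script flattened into one string. As a sketch, a small helper can assemble that string from a dependency list and a script body; `build_inline_script` is a hypothetical name, not part of the MCP server.

```python
import json


def build_inline_script(dependencies, body):
    """Prefix a script body with a PEP 723 inline dependency block."""
    # json.dumps gives double-quoted strings, matching the TOML list syntax.
    deps = ", ".join(json.dumps(d) for d in dependencies)
    header = f"# /// script\n# dependencies = [{deps}]\n# ///\n"
    return header + body


script = build_inline_script(
    ["torch"],
    "import torch\nprint(torch.cuda.is_available())",
)
print(script)
```

The resulting string can be passed as the `script` argument without hand-escaping newlines and quotes.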

### Train a Model on Cloud GPU

```
User: "Run my training script on an A10G"

hf_jobs(operation="run", args={
  "image": "pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime",
  "command": ["/bin/sh", "-lc", "pip install transformers trl && python train.py"],
  "flavor": "a10g-small",
  "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

### Check Job Status

```
User: "What's happening with my training job?"

1. hf_jobs(operation="ps")
2. hf_jobs(operation="logs", args={"job_id": "job-xxxxx"})
```

### Explore What's Trending

```
User: "What models are trending right now?"

model_search(sort="trendingScore", limit=20)
```

### Get Model Card Details

```
User: "Tell me about Mistral-7B"

hub_repo_details(repo_ids=["mistralai/Mistral-7B-v0.1"], include_readme=true)
```

### Find Quantized Models

```
User: "Find GGUF versions of Llama 3"

model_search(query="Llama 3 GGUF", sort="downloads", limit=10)
```

### Use a Gradio Space as a Tool

```
User: "Transcribe this audio file"

1. space_search(query="speech to text transcription", mcp=true)
2. dynamic_space(operation="view_parameters", space_name="openai/whisper")
3. dynamic_space(operation="invoke", space_name="openai/whisper", parameters="{\"audio\": \"...\"}")
```

### Schedule Recurring Jobs

```
User: "Run this data sync every day at midnight"

hf_jobs(operation="scheduled uv", args={
  "script": "...",
  "cron": "0 0 * * *",
  "flavor": "cpu-basic"
})
```
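
The `cron` value uses the standard five-field format. A short sketch that labels each field, with `describe_cron` as a hypothetical helper:

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expr):
    """Map a five-field cron expression to named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# "0 0 * * *" is minute 0 of hour 0 on every day: daily at midnight.
print(describe_cron("0 0 * * *"))
```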

## Tool Selection Guide

| Goal | Tool |
|------|------|
| Find models | `model_search` |
| Find datasets | `dataset_search` |
| Find Spaces/apps | `space_search` |
| Find papers | `paper_search` |
| Get repo README/details | `hub_repo_details` |
| Learn library usage | `hf_doc_search` → `hf_doc_fetch` |
| Run code on GPU/CPU | `hf_jobs` |
| Use Gradio apps as tools | `dynamic_space` |
| Generate images | `gr1_flux1_schnell_infer` or `dynamic_space` |
| Check auth | `hf_whoami` |

## Tips

- Use `sort="trendingScore"` to find what's popular now
- Use `sort="downloads"` to find battle-tested options
- Set `mcp=true` in `space_search` to find Spaces usable as tools
- Use `include_readme=true` in `hub_repo_details` for full model/dataset documentation
- For jobs accessing private repos, always include `secrets: {"HF_TOKEN": "$HF_TOKEN"}`
- Use `dynamic_space(operation="discover")` to see all available Space-based tasks


================================================
FILE: scripts/AGENTS_TEMPLATE.md
================================================
<skills>

You have additional SKILLs documented in directories containing a "SKILL.md" file.

These skills are:
{{#skills}}
 - {{name}} -> "{{path}}/SKILL.md"
{{/skills}}

IMPORTANT: You MUST read the SKILL.md file whenever the description of the skills matches the user intent, or may help accomplish their task. 

<available_skills>

{{#skills}}
{{name}}: `{{description}}`

{{/skills}}
</available_skills>

Paths referenced within SKILL folders are relative to that SKILL. For example the hf-datasets `scripts/example.py` would be referenced as `hf-datasets/scripts/example.py`. 

</skills>


================================================
FILE: scripts/UV_RULES.md
================================================
# UV rules

Use these rules for Python scripts in this repository:

1. **Use PEP 723 inline dependencies** in each runnable script:
   ```python
   # /// script
   # requires-python = ">=3.10"
   # dependencies = ["requests"]
   # ///
   ```
2. **Run scripts with `uv run`**, not `python ...`:
   ```bash
   uv run scripts/my_script.py --help
   ```
3. **Do not document `pip install -r requirements.txt` for repo scripts** unless there is a specific fallback reason. Normal usage should not require manual installation.
4. **Do not tell users to `source .venv/bin/activate` for skill scripts.** `uv run` should be enough.
5. **If a manual install example is truly needed, use `uv pip install ...`**, not `uv add`, unless you are intentionally editing a project-managed environment.
6. **For Hugging Face Jobs UV workloads, use `hf jobs uv run ...`**.


================================================
FILE: scripts/generate_agents.py
================================================
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Generate AGENTS.md from AGENTS_TEMPLATE.md and SKILL.md frontmatter.

Also validates that marketplace.json is in sync with discovered skills,
and updates the skills table in README.md.
"""

from __future__ import annotations

import json
import re
import sys
from pathlib import Path


ROOT = Path(__file__).resolve().parent.parent
TEMPLATE_PATH = ROOT / "scripts" / "AGENTS_TEMPLATE.md"
OUTPUT_PATH = ROOT / "agents" / "AGENTS.md"
MARKETPLACE_PATH = ROOT / ".claude-plugin" / "marketplace.json"
README_PATH = ROOT / "README.md"

# Markers for the auto-generated skills table in README
README_TABLE_START = "<!-- BEGIN_SKILLS_TABLE -->"
README_TABLE_END = "<!-- END_SKILLS_TABLE -->"


def load_template() -> str:
    return TEMPLATE_PATH.read_text(encoding="utf-8")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a minimal YAML-ish frontmatter block without external deps."""
    match = re.search(r"^---\s*\n(.*?)\n---\s*", text, re.DOTALL)
    if not match:
        return {}
    data: dict[str, str] = {}
    for line in match.group(1).splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        data[key.strip()] = value.strip()
    return data


def collect_skills() -> list[dict[str, str]]:
    skills: list[dict[str, str]] = []
    for skill_md in ROOT.glob("skills/*/SKILL.md"):
        meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        name = meta.get("name")
        description = meta.get("description")
        if not name or not description:
            continue
        skills.append(
            {
                "name": name,
                "description": description,
                "path": str(skill_md.parent.relative_to(ROOT)),
            }
        )
    # Keep deterministic order for consistent output
    return sorted(skills, key=lambda s: s["name"].lower())


def render(template: str, skills: list[dict[str, str]]) -> str:
    """Very small Mustache-like renderer that only supports a single skills loop."""
    def repl(match: re.Match[str]) -> str:
        block = match.group(1).strip("\n")
        rendered_blocks = []
        for skill in skills:
            rendered = (
                block.replace("{{name}}", skill["name"])
                .replace("{{description}}", skill["description"])
                .replace("{{path}}", skill["path"])
            )
            rendered_blocks.append(rendered)
        return "\n".join(rendered_blocks)

    # Render loop blocks
    content = re.sub(r"{{#skills}}(.*?){{/skills}}", repl, template, flags=re.DOTALL)
    return content


def load_marketplace() -> dict:
    """Load marketplace.json and return parsed structure."""
    if not MARKETPLACE_PATH.exists():
        raise FileNotFoundError(f"marketplace.json not found at {MARKETPLACE_PATH}")
    return json.loads(MARKETPLACE_PATH.read_text(encoding="utf-8"))


def generate_readme_table(skills: list[dict[str, str]]) -> str:
    """Generate the skills table for README.md using marketplace.json names."""
    marketplace = load_marketplace()
    plugins = {p["source"]: p for p in marketplace.get("plugins", [])}

    lines = [
        "| Name | Description | Documentation |",
        "|------|-------------|---------------|",
    ]

    for skill in skills:
        source = f"./{skill['path']}"
        plugin = plugins.get(source, {})
        name = plugin.get("name", skill["name"])
        description = plugin.get("description", skill["description"])
        doc_link = f"[SKILL.md]({skill['path']}/SKILL.md)"
        lines.append(f"| `{name}` | {description} | {doc_link} |")

    return "\n".join(lines)


def update_readme(skills: list[dict[str, str]]) -> bool:
    """
    Update the README.md skills table between markers.
    Returns True if the file was updated, False if markers not found.
    """
    if not README_PATH.exists():
        print(f"Warning: README.md not found at {README_PATH}", file=sys.stderr)
        return False

    content = README_PATH.read_text(encoding="utf-8")

    start_idx = content.find(README_TABLE_START)
    end_idx = content.find(README_TABLE_END)

    if start_idx == -1 or end_idx == -1:
        print(
            f"Warning: README.md markers not found. Add {README_TABLE_START} and "
            f"{README_TABLE_END} to enable table generation.",
            file=sys.stderr,
        )
        return False

    if end_idx < start_idx:
        print("Warning: README.md markers are in wrong order.", file=sys.stderr)
        return False

    table = generate_readme_table(skills)
    new_content = (
        content[: start_idx + len(README_TABLE_START)]
        + "\n"
        + table
        + "\n"
        + content[end_idx:]
    )

    README_PATH.write_text(new_content, encoding="utf-8")
    return True


def validate_marketplace(skills: list[dict[str, str]]) -> list[str]:
    """
    Validate marketplace.json against discovered skills.
    Returns list of error messages (empty = passed).
    """
    errors: list[str] = []
    marketplace = load_marketplace()
    plugins = marketplace.get("plugins", [])

    # Build lookups (normalize paths: skill uses "skills/x", marketplace uses "./skills/x")
    skill_by_source = {f"./{s['path']}": s for s in skills}
    plugin_by_source = {p["source"]: p for p in plugins}

    # Check: every skill has a marketplace entry with matching name
    for skill in skills:
        expected_source = f"./{skill['path']}"
        if expected_source not in plugin_by_source:
            errors.append(
                f"Skill '{skill['name']}' at '{skill['path']}' is missing from marketplace.json"
            )
        elif plugin_by_source[expected_source]["name"] != skill["name"]:
            errors.append(
                f"Name mismatch at '{expected_source}': "
                f"SKILL.md='{skill['name']}', marketplace.json='{plugin_by_source[expected_source]['name']}'"
            )

    # Check: every marketplace plugin has a corresponding skill
    for plugin in plugins:
        if plugin["source"] not in skill_by_source:
            errors.append(
                f"Marketplace plugin '{plugin['name']}' at '{plugin['source']}' has no SKILL.md"
            )

    return errors


def main() -> None:
    template = load_template()
    skills = collect_skills()
    output = render(template, skills)
    OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    OUTPUT_PATH.write_text(output, encoding="utf-8")
    print(f"Wrote {OUTPUT_PATH} with {len(skills)} skills.")

    # Validate marketplace.json
    errors = validate_marketplace(skills)
    if errors:
        print("\nMarketplace.json validation errors:", file=sys.stderr)
        for error in errors:
            print(f"  - {error}", file=sys.stderr)
        sys.exit(1)
    print("Marketplace.json validation passed.")

    # Update README.md skills table
    if update_readme(skills):
        print(f"Updated {README_PATH} skills table.")


if __name__ == "__main__":
    main()


================================================
FILE: scripts/generate_cursor_plugin.py
================================================
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Generate Cursor plugin artifacts from existing repo metadata.

Outputs:
- .cursor-plugin/plugin.json
- .mcp.json

Design goals:
- Keep Claude + Cursor metadata in sync.
- Reuse .claude-plugin/plugin.json as primary metadata source.
- Discover skills from skills/*/SKILL.md.
- Reuse MCP URL from gemini-extension.json when available.
"""

from __future__ import annotations

import argparse
import json
import re
import sys
from pathlib import Path


ROOT = Path(__file__).resolve().parent.parent
CLAUDE_PLUGIN_MANIFEST = ROOT / ".claude-plugin" / "plugin.json"
GEMINI_EXTENSION = ROOT / "gemini-extension.json"
CURSOR_PLUGIN_DIR = ROOT / ".cursor-plugin"
CURSOR_PLUGIN_MANIFEST = CURSOR_PLUGIN_DIR / "plugin.json"
CURSOR_MCP_CONFIG = ROOT / ".mcp.json"

DEFAULT_MCP_SERVER_NAME = "huggingface-skills"
DEFAULT_MCP_URL = "https://huggingface.co/mcp?login"

PLUGIN_NAME_RE = re.compile(r"^[a-z0-9](?:[a-z0-9.-]*[a-z0-9])?$")


def load_json(path: Path) -> dict:
    if not path.exists():
        raise FileNotFoundError(f"Missing required file: {path}")
    return json.loads(path.read_text(encoding="utf-8"))


def parse_frontmatter(text: str) -> dict[str, str]:
    match = re.search(r"^---\s*\n(.*?)\n---\s*", text, re.DOTALL)
    if not match:
        return {}
    data: dict[str, str] = {}
    for line in match.group(1).splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        data[key.strip()] = value.strip()
    return data


def collect_skills() -> list[str]:
    skills: list[str] = []
    for skill_md in sorted(ROOT.glob("skills/*/SKILL.md")):
        meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        name = meta.get("name", "").strip()
        if not name:
            continue
        skills.append(name)
    return skills


def validate_plugin_name(name: str) -> None:
    if not PLUGIN_NAME_RE.match(name):
        raise ValueError(
            "Invalid plugin name in .claude-plugin/plugin.json: "
            f"'{name}'. Must be lowercase and match {PLUGIN_NAME_RE.pattern}"
        )


def build_cursor_plugin_manifest() -> dict:
    src = load_json(CLAUDE_PLUGIN_MANIFEST)

    name = src.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError(".claude-plugin/plugin.json must define a non-empty 'name'")
    validate_plugin_name(name)

    skills = collect_skills()
    if not skills:
        raise ValueError("No skills discovered under skills/*/SKILL.md")

    manifest: dict = {"name": name, "skills": "skills", "mcpServers": ".mcp.json"}

    # Copy optional metadata fields when present.
    for key in [
        "description",
        "version",
        "author",
        "homepage",
        "repository",
        "license",
        "keywords",
        "logo",
    ]:
        if key in src:
            manifest[key] = src[key]

    return manifest


def extract_mcp_from_gemini() -> tuple[str, str]:
    """Return (server_name, url) from gemini-extension when available."""
    if not GEMINI_EXTENSION.exists():
        return DEFAULT_MCP_SERVER_NAME, DEFAULT_MCP_URL

    data = load_json(GEMINI_EXTENSION)
    servers = data.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return DEFAULT_MCP_SERVER_NAME, DEFAULT_MCP_URL

    # Use first configured server as source of truth.
    server_name = next(iter(servers.keys()))
    server_cfg = servers[server_name]
    if not isinstance(server_cfg, dict):
        return DEFAULT_MCP_SERVER_NAME, DEFAULT_MCP_URL

    url = server_cfg.get("url") or server_cfg.get("httpUrl") or DEFAULT_MCP_URL
    if not isinstance(url, str) or not url.strip():
        url = DEFAULT_MCP_URL

    return server_name, url


def build_mcp_config() -> dict:
    server_name, url = extract_mcp_from_gemini()
    return {
        "mcpServers": {
            server_name: {
                "url": url,
            }
        }
    }


def render_json(data: dict) -> str:
    return json.dumps(data, indent=2, ensure_ascii=False) + "\n"


def write_or_check(path: Path, content: str, check: bool) -> bool:
    """Return True when file is already up-to-date (or after writing in non-check mode)."""
    current = path.read_text(encoding="utf-8") if path.exists() else None
    if current == content:
        return True

    if check:
        return False

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate Cursor plugin manifest + MCP config")
    parser.add_argument(
        "--check",
        action="store_true",
        help="Validate generated files are up-to-date without writing changes.",
    )
    args = parser.parse_args()

    plugin_manifest = render_json(build_cursor_plugin_manifest())
    mcp_config = render_json(build_mcp_config())

    ok_plugin = write_or_check(CURSOR_PLUGIN_MANIFEST, plugin_manifest, check=args.check)
    ok_mcp = write_or_check(CURSOR_MCP_CONFIG, mcp_config, check=args.check)

    if args.check:
        outdated = []
        if not ok_plugin:
            outdated.append(str(CURSOR_PLUGIN_MANIFEST.relative_to(ROOT)))
        if not ok_mcp:
            outdated.append(str(CURSOR_MCP_CONFIG.relative_to(ROOT)))

        if outdated:
            print("Generated Cursor artifacts are out of date:", file=sys.stderr)
            for item in outdated:
                print(f"  - {item}", file=sys.stderr)
            print("Run: uv run scripts/generate_cursor_plugin.py", file=sys.stderr)
            sys.exit(1)

        print("Cursor plugin artifacts are up to date.")
        return

    print(f"Wrote {CURSOR_PLUGIN_MANIFEST.relative_to(ROOT)}")
    print(f"Wrote {CURSOR_MCP_CONFIG.relative_to(ROOT)}")


if __name__ == "__main__":
    main()


================================================
FILE: scripts/publish.sh
================================================
#!/usr/bin/env bash
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$ROOT_DIR"

GENERATED_FILES=(
  "agents/AGENTS.md"
  "README.md"
  ".cursor-plugin/plugin.json"
  ".mcp.json"
)

file_sig() {
  local path="$1"
  if [[ -f "$path" ]]; then
    sha256sum "$path" | awk '{print $1}'
  else
    echo "__MISSING__"
  fi
}

run_generate() {
  uv run scripts/generate_agents.py
  uv run scripts/generate_cursor_plugin.py
}

run_check() {
  declare -A before
  local changed=()

  for path in "${GENERATED_FILES[@]}"; do
    before["$path"]="$(file_sig "$path")"
  done

  run_generate

  for path in "${GENERATED_FILES[@]}"; do
    if [[ "${before[$path]}" != "$(file_sig "$path")" ]]; then
      changed+=("$path")
    fi
  done

  if [[ ${#changed[@]} -gt 0 ]]; then
    echo "Generated artifacts are outdated."
    echo "Run: ./scripts/publish.sh"
    echo
    echo "Changed files:"
    for path in "${changed[@]}"; do
      echo "$path"
    done
    exit 1
  fi

  # Extra explicit check for cursor-only artifacts
  uv run scripts/generate_cursor_plugin.py --check

  echo "All generated artifacts are up to date."
}

case "${1:-}" in
  "")
    run_generate
    echo "Publish artifacts generated successfully."
    ;;
  "--check")
    run_check
    ;;
  "-h"|"--help")
    cat <<'EOF'
Usage:
  ./scripts/publish.sh         Generate all publish artifacts
  ./scripts/publish.sh --check Verify generated artifacts are up to date

This script regenerates:
  - agents/AGENTS.md
  - README.md (skills table section)
  - .cursor-plugin/plugin.json
  - .mcp.json
EOF
    ;;
  *)
    echo "Unknown option: $1" >&2
    echo "Use --help for usage." >&2
    exit 2
    ;;
esac


================================================
FILE: scripts/run_skills_help.py
================================================
#!/usr/bin/env python3
"""
Script to execute all Python programs under "skills" directories with `uv run` and --help flag.
"""

import subprocess
from pathlib import Path

def find_python_files():
    """Find all Python files under skills directories."""
    python_files = []
    
    # Search for skills directories and find Python files
    for skills_dir in Path('.').rglob('../skills'):
        if skills_dir.is_dir():
            python_files.extend(skills_dir.rglob('*.py'))
    
    return sorted(set(python_files))

def run_with_help(python_file):
    """Run a Python file with uv run --help."""
    try:
        print(f"\n{'='*60}")
        print(f"Running: {python_file}")
        print(f"{'='*60}")
        
        result = subprocess.run(
            ['uv', 'run', str(python_file), '--help'],
            capture_output=True,
            text=True,
            timeout=30  # 30 second timeout
        )
        
        if result.returncode == 0:
            print("SUCCESS - Output:")
            print(result.stdout)
            return True

        # A non-zero exit code counts as a failure in the summary.
        print(f"FAILED - Return code: {result.returncode}")
        if result.stderr:
            print("Stderr:")
            print(result.stderr)
        if result.stdout:
            print("Stdout:")
            print(result.stdout)

    except subprocess.TimeoutExpired:
        print("TIMEOUT - Command took too long")
    except FileNotFoundError:
        print("ERROR - uv command not found. Please install uv.")
    except Exception as e:
        print(f"ERROR - {e}")

    # Reached only when the command failed, timed out, or could not be started.
    return False

def main():
    """Main function to find and run all Python files."""
    print("Finding Python files under skills directories...")
    python_files = find_python_files()
    
    if not python_files:
        print("No Python files found under skills directories.")
        return
    
    print(f"Found {len(python_files)} Python files")
    
    success_count = 0
    failed_count = 0
    
    for python_file in python_files:
        if run_with_help(python_file):
            success_count += 1
        else:
            failed_count += 1
    
    print(f"\n{'='*60}")
    print("SUMMARY:")
    print(f"Total files: {len(python_files)}")
    print(f"Successful: {success_count}")
    print(f"Failed: {failed_count}")
    print(f"{'='*60}")

if __name__ == "__main__":
    main()


================================================
FILE: skills/hf-cli/SKILL.md
================================================
---
name: hf-cli
description: "Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing repositories, models, datasets, and Spaces on the Hugging Face Hub. Replaces the now-deprecated `huggingface-cli` command."
---

Install: `curl -LsSf https://hf.co/cli/install.sh | bash -s`.

The Hugging Face Hub CLI tool `hf` is available. IMPORTANT: The `hf` command replaces the deprecated `huggingface-cli` command.

Use `hf --help` to view the available commands. Note that all auth commands now live under `hf auth`, e.g. `hf auth whoami`.

Generated with `huggingface_hub v1.7.2`. Run `hf skills add --force` to regenerate.

## Commands

- `hf download REPO_ID` — Download files from the Hub. `[--type CHOICE --revision TEXT --include TEXT --exclude TEXT --cache-dir TEXT --local-dir TEXT --force-download --dry-run --quiet --max-workers INTEGER]`
- `hf env` — Print information about the environment.
- `hf sync` — Sync files between local directory and a bucket. `[--delete --ignore-times --ignore-sizes --plan TEXT --apply TEXT --dry-run --include TEXT --exclude TEXT --filter-from TEXT --existing --ignore-existing --verbose --quiet]`
- `hf upload REPO_ID` — Upload a file or a folder to the Hub. Recommended for single-commit uploads. `[--type CHOICE --revision TEXT --private --include TEXT --exclude TEXT --delete TEXT --commit-message TEXT --commit-description TEXT --create-pr --every FLOAT --quiet]`
- `hf upload-large-folder REPO_ID LOCAL_PATH` — Upload a large folder to the Hub. Recommended for resumable uploads. `[--type CHOICE --revision TEXT --private --include TEXT --exclude TEXT --num-workers INTEGER --no-report --no-bars]`
- `hf version` — Print information about the hf version.

### `hf auth` — Manage authentication (login, logout, etc.).

- `hf auth list` — List all stored access tokens.
- `hf auth login` — Login using a token from huggingface.co/settings/tokens. `[--add-to-git-credential --force]`
- `hf auth logout` — Logout from a specific token. `[--token-name TEXT]`
- `hf auth switch` — Switch between access tokens. `[--token-name TEXT --add-to-git-credential]`
- `hf auth whoami` — Find out which huggingface.co account you are logged in as. `[--format CHOICE]`

### `hf buckets` — Commands to interact with buckets.

- `hf buckets cp SRC` — Copy a single file to or from a bucket. `[--quiet]`
- `hf buckets create BUCKET_ID` — Create a new bucket. `[--private --exist-ok --quiet]`
- `hf buckets delete BUCKET_ID` — Delete a bucket. `[--yes --missing-ok --quiet]`
- `hf buckets info BUCKET_ID` — Get info about a bucket. `[--quiet]`
- `hf buckets list` — List buckets or files in a bucket. `[--human-readable --tree --recursive --format CHOICE --quiet]`
- `hf buckets move FROM_ID TO_ID` — Move (rename) a bucket to a new name or namespace.
- `hf buckets remove ARGUMENT` — Remove files from a bucket. `[--recursive --yes --dry-run --include TEXT --exclude TEXT --quiet]`
- `hf buckets sync` — Sync files between local directory and a bucket. `[--delete --ignore-times --ignore-sizes --plan TEXT --apply TEXT --dry-run --include TEXT --exclude TEXT --filter-from TEXT --existing --ignore-existing --verbose --quiet]`

### `hf cache` — Manage local cache directory.

- `hf cache list` — List cached repositories or revisions. `[--cache-dir TEXT --revisions --filter TEXT --format CHOICE --quiet --sort CHOICE --limit INTEGER]`
- `hf cache prune` — Remove detached revisions from the cache. `[--cache-dir TEXT --yes --dry-run]`
- `hf cache rm TARGETS` — Remove cached repositories or revisions. `[--cache-dir TEXT --yes --dry-run]`
- `hf cache verify REPO_ID` — Verify checksums for a single repo revision from cache or a local directory. `[--type CHOICE --revision TEXT --cache-dir TEXT --local-dir TEXT --fail-on-missing-files --fail-on-extra-files]`

### `hf collections` — Interact with collections on the Hub.

- `hf collections add-item COLLECTION_SLUG ITEM_ID ITEM_TYPE` — Add an item to a collection. `[--note TEXT --exists-ok]`
- `hf collections create TITLE` — Create a new collection on the Hub. `[--namespace TEXT --description TEXT --private --exists-ok]`
- `hf collections delete COLLECTION_SLUG` — Delete a collection from the Hub. `[--missing-ok]`
- `hf collections delete-item COLLECTION_SLUG ITEM_OBJECT_ID` — Delete an item from a collection. `[--missing-ok]`
- `hf collections info COLLECTION_SLUG` — Get info about a collection on the Hub. Output is in JSON format.
- `hf collections list` — List collections on the Hub. `[--owner TEXT --item TEXT --sort CHOICE --limit INTEGER --format CHOICE --quiet]`
- `hf collections update COLLECTION_SLUG` — Update a collection's metadata on the Hub. `[--title TEXT --description TEXT --position INTEGER --private --theme TEXT]`
- `hf collections update-item COLLECTION_SLUG ITEM_OBJECT_ID` — Update an item in a collection. `[--note TEXT --position INTEGER]`

### `hf datasets` — Interact with datasets on the Hub.

- `hf datasets info DATASET_ID` — Get info about a dataset on the Hub. Output is in JSON format. `[--revision TEXT --expand TEXT]`
- `hf datasets list` — List datasets on the Hub. `[--search TEXT --author TEXT --filter TEXT --sort CHOICE --limit INTEGER --expand TEXT --format CHOICE --quiet]`
- `hf datasets parquet DATASET_ID` — List parquet file URLs available for a dataset. `[--subset TEXT --split TEXT --format CHOICE --quiet]`
- `hf datasets sql SQL` — Execute a raw SQL query with DuckDB against dataset parquet URLs. `[--format CHOICE]`

### `hf discussions` — Manage discussions and pull requests on the Hub.

- `hf discussions close REPO_ID NUM` — Close a discussion or pull request. `[--comment TEXT --yes --type CHOICE]`
- `hf discussions comment REPO_ID NUM` — Comment on a discussion or pull request. `[--body TEXT --body-file PATH --type CHOICE]`
- `hf discussions create REPO_ID --title TEXT` — Create a new discussion or pull request on a repo. `[--body TEXT --body-file PATH --pull-request --type CHOICE]`
- `hf discussions diff REPO_ID NUM` — Show the diff of a pull request. `[--type CHOICE]`
- `hf discussions info REPO_ID NUM` — Get info about a discussion or pull request. `[--comments --diff --no-color --type CHOICE --format CHOICE]`
- `hf discussions list REPO_ID` — List discussions and pull requests on a repo. `[--status CHOICE --kind CHOICE --author TEXT --limit INTEGER --type CHOICE --format CHOICE --quiet]`
- `hf discussions merge REPO_ID NUM` — Merge a pull request. `[--comment TEXT --yes --type CHOICE]`
- `hf discussions rename REPO_ID NUM NEW_TITLE` — Rename a discussion or pull request. `[--type CHOICE]`
- `hf discussions reopen REPO_ID NUM` — Reopen a closed discussion or pull request. `[--comment TEXT --yes --type CHOICE]`

### `hf endpoints` — Manage Hugging Face Inference Endpoints.

- `hf endpoints catalog deploy --repo TEXT` — Deploy an Inference Endpoint from the Model Catalog. `[--name TEXT --accelerator TEXT --namespace TEXT]`
- `hf endpoints catalog list` — List available Catalog models.
- `hf endpoints delete NAME` — Delete an Inference Endpoint permanently. `[--namespace TEXT --yes]`
- `hf endpoints deploy NAME --repo TEXT --framework TEXT --accelerator TEXT --instance-size TEXT --instance-type TEXT --region TEXT --vendor TEXT` — Deploy an Inference Endpoint from a Hub repository. `[--namespace TEXT --task TEXT --min-replica INTEGER --max-replica INTEGER --scale-to-zero-timeout INTEGER --scaling-metric CHOICE --scaling-threshold FLOAT]`
- `hf endpoints describe NAME` — Get information about an existing endpoint. `[--namespace TEXT]`
- `hf endpoints list` — Lists all Inference Endpoints for the given namespace. `[--namespace TEXT --format CHOICE --quiet]`
- `hf endpoints pause NAME` — Pause an Inference Endpoint. `[--namespace TEXT]`
- `hf endpoints resume NAME` — Resume an Inference Endpoint. `[--namespace TEXT --fail-if-already-running]`
- `hf endpoints scale-to-zero NAME` — Scale an Inference Endpoint to zero. `[--namespace TEXT]`
- `hf endpoints update NAME` — Update an existing endpoint. `[--namespace TEXT --repo TEXT --accelerator TEXT --instance-size TEXT --instance-type TEXT --framework TEXT --revision TEXT --task TEXT --min-replica INTEGER --max-replica INTEGER --scale-to-zero-timeout INTEGER --scaling-metric CHOICE --scaling-threshold FLOAT]`

### `hf extensions` — Manage hf CLI extensions.

- `hf extensions exec NAME` — Execute an installed extension.
- `hf extensions install REPO_ID` — Install an extension from a public GitHub repository. `[--force]`
- `hf extensions list` — List installed extension commands. `[--format CHOICE --quiet]`
- `hf extensions remove NAME` — Remove an installed extension.
- `hf extensions search` — Search extensions available on GitHub (tagged with 'hf-extension' topic). `[--format CHOICE --quiet]`

### `hf jobs` — Run and manage Jobs on the Hub.

- `hf jobs cancel JOB_ID` — Cancel a Job `[--namespace TEXT]`
- `hf jobs hardware` — List available hardware options for Jobs
- `hf jobs inspect JOB_IDS` — Display detailed information on one or more Jobs `[--namespace TEXT]`
- `hf jobs logs JOB_ID` — Fetch the logs of a Job. `[--follow --tail INTEGER --namespace TEXT]`
- `hf jobs ps` — List Jobs. `[--all --namespace TEXT --filter TEXT --format TEXT --quiet]`
- `hf jobs run IMAGE COMMAND` — Run a Job. `[--env TEXT --secrets TEXT --label TEXT --env-file TEXT --secrets-file TEXT --flavor CHOICE --timeout TEXT --detach --namespace TEXT]`
- `hf jobs scheduled delete SCHEDULED_JOB_ID` — Delete a scheduled Job. `[--namespace TEXT]`
- `hf jobs scheduled inspect SCHEDULED_JOB_IDS` — Display detailed information on one or more scheduled Jobs `[--namespace TEXT]`
- `hf jobs scheduled ps` — List scheduled Jobs `[--all --namespace TEXT --filter TEXT --format TEXT --quiet]`
- `hf jobs scheduled resume SCHEDULED_JOB_ID` — Resume (unpause) a scheduled Job. `[--namespace TEXT]`
- `hf jobs scheduled run SCHEDULE IMAGE COMMAND` — Schedule a Job. `[--suspend --concurrency --env TEXT --secrets TEXT --label TEXT --env-file TEXT --secrets-file TEXT --flavor CHOICE --timeout TEXT --namespace TEXT]`
- `hf jobs scheduled suspend SCHEDULED_JOB_ID` — Suspend (pause) a scheduled Job. `[--namespace TEXT]`
- `hf jobs scheduled uv run SCHEDULE SCRIPT` — Run a UV script (local file or URL) on HF infrastructure `[--suspend --concurrency --image TEXT --flavor CHOICE --env TEXT --secrets TEXT --label TEXT --env-file TEXT --secrets-file TEXT --timeout TEXT --namespace TEXT --with TEXT --python TEXT]`
- `hf jobs stats` — Fetch the resource usage statistics and metrics of Jobs `[--namespace TEXT]`
- `hf jobs uv run SCRIPT` — Run a UV script (local file or URL) on HF infrastructure `[--image TEXT --flavor CHOICE --env TEXT --secrets TEXT --label TEXT --env-file TEXT --secrets-file TEXT --timeout TEXT --detach --namespace TEXT --with TEXT --python TEXT]`

### `hf models` — Interact with models on the Hub.

- `hf models info MODEL_ID` — Get info about a model on the Hub. Output is in JSON format. `[--revision TEXT --expand TEXT]`
- `hf models list` — List models on the Hub. `[--search TEXT --author TEXT --filter TEXT --num-parameters TEXT --sort CHOICE --limit INTEGER --expand TEXT --format CHOICE --quiet]`

### `hf papers` — Interact with papers on the Hub.

- `hf papers list` — List daily papers on the Hub. `[--date TEXT --sort CHOICE --limit INTEGER --format CHOICE --quiet]`

### `hf repos` — Manage repos on the Hub.

- `hf repos branch create REPO_ID BRANCH` — Create a new branch for a repo on the Hub. `[--revision TEXT --type CHOICE --exist-ok]`
- `hf repos branch delete REPO_ID BRANCH` — Delete a branch from a repo on the Hub. `[--type CHOICE]`
- `hf repos create REPO_ID` — Create a new repo on the Hub. `[--type CHOICE --space-sdk TEXT --private --exist-ok --resource-group-id TEXT]`
- `hf repos delete REPO_ID` — Delete a repo from the Hub. This is an irreversible operation. `[--type CHOICE --missing-ok]`
- `hf repos delete-files REPO_ID PATTERNS` — Delete files from a repo on the Hub. `[--type CHOICE --revision TEXT --commit-message TEXT --commit-description TEXT --create-pr]`
- `hf repos duplicate FROM_ID` — Duplicate a repo on the Hub (model, dataset, or Space). `[--type CHOICE --private --exist-ok]`
- `hf repos move FROM_ID TO_ID` — Move a repository from a namespace to another namespace. `[--type CHOICE]`
- `hf repos settings REPO_ID` — Update the settings of a repository. `[--gated CHOICE --private --type CHOICE]`
- `hf repos tag create REPO_ID TAG` — Create a tag for a repo. `[--message TEXT --revision TEXT --type CHOICE]`
- `hf repos tag delete REPO_ID TAG` — Delete a tag for a repo. `[--yes --type CHOICE]`
- `hf repos tag list REPO_ID` — List tags for a repo. `[--type CHOICE]`

### `hf skills` — Manage skills for AI assistants.

- `hf skills add` — Download a skill and install it for an AI assistant. `[--claude --codex --cursor --opencode --global --dest PATH --force]`
- `hf skills preview` — Print the generated SKILL.md to stdout.

### `hf spaces` — Interact with spaces on the Hub.

- `hf spaces dev-mode SPACE_ID` — Enable or disable dev mode on a Space. `[--stop]`
- `hf spaces hot-reload SPACE_ID` — Hot-reload any Python file of a Space without a full rebuild + restart. `[--local-file TEXT --skip-checks --skip-summary]`
- `hf spaces info SPACE_ID` — Get info about a space on the Hub. Output is in JSON format. `[--revision TEXT --expand TEXT]`
- `hf spaces list` — List spaces on the Hub. `[--search TEXT --author TEXT --filter TEXT --sort CHOICE --limit INTEGER --expand TEXT --format CHOICE --quiet]`

### `hf webhooks` — Manage webhooks on the Hub.

- `hf webhooks create --watch TEXT` — Create a new webhook. `[--url TEXT --job-id TEXT --domain CHOICE --secret TEXT]`
- `hf webhooks delete WEBHOOK_ID` — Delete a webhook permanently. `[--yes]`
- `hf webhooks disable WEBHOOK_ID` — Disable an active webhook.
- `hf webhooks enable WEBHOOK_ID` — Enable a disabled webhook.
- `hf webhooks info WEBHOOK_ID` — Show full details for a single webhook as JSON.
- `hf webhooks list` — List all webhooks for the current user. `[--format CHOICE --quiet]`
- `hf webhooks update WEBHOOK_ID` — Update an existing webhook. Only provided options are changed. `[--url TEXT --watch TEXT --domain CHOICE --secret TEXT]`

## Common options

- `--format` — Output format: `--format json` (or `--json`) or `--format table` (default).
- `-q / --quiet` — Minimal output.
- `--revision` — Git revision id which can be a branch name, a tag, or a commit hash.
- `--token` — Use a User Access Token. Prefer setting `HF_TOKEN` env var instead of passing `--token`.
- `--type` — The type of repository (model, dataset, or space).

## Tips

- Use `hf <command> --help` for full options, descriptions, usage, and real-world examples
- Authenticate with `HF_TOKEN` env var (recommended) or with `--token`
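
When driving the CLI from a script, the command can be assembled as an argument list. The sketch below is a minimal illustration using only flags listed for `hf download` above (`--type`, `--revision`, `--include`, `--local-dir`). The helper name `hf_download_command` is hypothetical and not part of `huggingface_hub`; the repo id is only an example.

```python
import shlex
from typing import List, Optional


def hf_download_command(
    repo_id: str,
    repo_type: Optional[str] = None,
    revision: Optional[str] = None,
    include: Optional[str] = None,
    local_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `hf download` (hypothetical helper)."""
    cmd = ["hf", "download", repo_id]
    # Only append flags that were actually requested.
    for flag, value in (
        ("--type", repo_type),
        ("--revision", revision),
        ("--include", include),
        ("--local-dir", local_dir),
    ):
        if value is not None:
            cmd += [flag, value]
    return cmd


cmd = hf_download_command("cais/mmlu", repo_type="dataset", include="*.parquet")
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, with `HF_TOKEN` read from the environment rather than passed via `--token`.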


================================================
FILE: skills/hugging-face-dataset-viewer/SKILL.md
================================================
---
name: hugging-face-dataset-viewer
description: Use this skill for Hugging Face Dataset Viewer API workflows that fetch subset/split metadata, paginate rows, search text, apply filters, download parquet URLs, and read size or statistics.
---

# Hugging Face Dataset Viewer

Use this skill to execute read-only Dataset Viewer API calls for dataset exploration and extraction.

## Core workflow

1. Optionally validate dataset availability with `/is-valid`.
2. Resolve `config` + `split` with `/splits`.
3. Preview with `/first-rows`.
4. Paginate content with `/rows` using `offset` and `length` (max 100).
5. Use `/search` for text matching and `/filter` for row predicates.
6. Retrieve parquet links via `/parquet` and totals/metadata via `/size` and `/statistics`.

## Defaults

- Base URL: `https://datasets-server.huggingface.co`
- Default API method: `GET`
- Query params should be URL-encoded.
- `offset` is 0-based.
- `length` max is usually `100` for row-like endpoints.
- Gated/private datasets require `Authorization: Bearer <HF_TOKEN>`.
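
The defaults above can be sketched in Python with the standard library alone. The helper name `build_request` is hypothetical; it only constructs the request and does not send it.

```python
import os
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://datasets-server.huggingface.co"


def build_request(endpoint: str, params: dict, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for a Dataset Viewer endpoint (hypothetical helper)."""
    # urlencode percent-encodes values such as "namespace/repo".
    query = urllib.parse.urlencode(params)
    url = f"{BASE_URL}/{endpoint.lstrip('/')}?{query}"
    request = urllib.request.Request(url, method="GET")
    if token:
        # Gated/private datasets need a bearer token.
        request.add_header("Authorization", f"Bearer {token}")
    return request


request = build_request(
    "/rows",
    {"dataset": "stanfordnlp/imdb", "config": "plain_text", "split": "train", "offset": 0, "length": 100},
    token=os.environ.get("HF_TOKEN"),
)
print(request.full_url)
```

The request can then be sent with `urllib.request.urlopen(request)` and the body parsed with `json.load`.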

## Dataset Viewer

- `Validate dataset`: `/is-valid?dataset=<namespace/repo>`
- `List subsets and splits`: `/splits?dataset=<namespace/repo>`
- `Preview first rows`: `/first-rows?dataset=<namespace/repo>&config=<config>&split=<split>`
- `Paginate rows`: `/rows?dataset=<namespace/repo>&config=<config>&split=<split>&offset=<int>&length=<int>`
- `Search text`: `/search?dataset=<namespace/repo>&config=<config>&split=<split>&query=<text>&offset=<int>&length=<int>`
- `Filter with predicates`: `/filter?dataset=<namespace/repo>&config=<config>&split=<split>&where=<predicate>&orderby=<sort>&offset=<int>&length=<int>`
- `List parquet shards`: `/parquet?dataset=<namespace/repo>`
- `Get size totals`: `/size?dataset=<namespace/repo>`
- `Get column statistics`: `/statistics?dataset=<namespace/repo>&config=<config>&split=<split>`
- `Get Croissant metadata (if available)`: `/croissant?dataset=<namespace/repo>`

Pagination pattern:

```bash
curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=0&length=100"
curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=100&length=100"
```

When pagination is partial, use response fields such as `num_rows_total`, `num_rows_per_page`, and `partial` to drive continuation logic.
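
One way to drive that continuation logic is sketched below. It assumes each page is a dict with a `rows` list and a `num_rows_total` count; the fetcher is injected so the loop can be exercised without network access. `iter_rows` and `fake_fetch` are hypothetical names.

```python
from typing import Callable, Iterator


def iter_rows(fetch_page: Callable[[int, int], dict], page_size: int = 100) -> Iterator[dict]:
    """Yield rows page by page until `num_rows_total` is reached (sketch)."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows = page.get("rows", [])
        yield from rows
        offset += len(rows)
        # Stop on an empty page, or once every row has been seen.
        # A missing `num_rows_total` is treated as 0, which ends after one page.
        if not rows or offset >= page.get("num_rows_total", 0):
            break


def fake_fetch(offset: int, length: int) -> dict:
    """Stand-in for a real `/rows` call; serves 250 synthetic rows."""
    data = [{"row_idx": i} for i in range(250)]
    return {"rows": data[offset:offset + length], "num_rows_total": len(data)}


print(sum(1 for _ in iter_rows(fake_fetch)))
```

In real use, `fetch_page` would call `/rows` with the given `offset` and `length` and return the decoded JSON body.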

Search/filter notes:

- `/search` matches string columns (full-text style behavior is internal to the API).
- `/filter` requires predicate syntax in `where` and optional sort in `orderby`.
- Keep filtering and searches read-only and side-effect free.
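
Because `where` predicates contain quotes and operators, they need URL encoding before they go into the query string. The sketch below shows one way to do that. The predicate `"label"=1` and the `label` column are assumptions used only for illustration, and `filter_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"


def filter_url(dataset: str, config: str, split: str, where: str,
               orderby: str = "", offset: int = 0, length: int = 100) -> str:
    """Return a URL-encoded /filter URL (hypothetical helper)."""
    params = {"dataset": dataset, "config": config, "split": split, "where": where}
    if orderby:
        # `orderby` is optional, so it is only sent when provided.
        params["orderby"] = orderby
    params.update({"offset": offset, "length": length})
    return f"{BASE_URL}/filter?{urlencode(params)}"


# Assumed predicate: rows whose `label` column equals 1.
print(filter_url("stanfordnlp/imdb", "plain_text", "train", where='"label"=1', length=10))
```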

## Querying Datasets

Use `npx parquetlens` with Hub parquet alias paths for SQL querying.

Parquet alias shape:

```text
hf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet
```

Derive `<config>`, `<split>`, and `<shard>` from Dataset Viewer `/parquet`:

```bash
curl -s "https://datasets-server.huggingface.co/parquet?dataset=cfahlgren1/hub-stats" \
  | jq -r '.parquet_files[] | "hf://datasets/\(.dataset)@~parquet/\(.config)/\(.split)/\(.filename)"'
```
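
The same mapping can be written in Python once the `/parquet` response has been decoded. This is a sketch that mirrors the `jq` expression above; `parquet_aliases` is a hypothetical name and the sample response values are made up for illustration.

```python
from typing import List


def parquet_aliases(parquet_response: dict) -> List[str]:
    """Turn a decoded `/parquet` response into `hf://` alias paths (sketch)."""
    return [
        f"hf://datasets/{f['dataset']}@~parquet/{f['config']}/{f['split']}/{f['filename']}"
        for f in parquet_response.get("parquet_files", [])
    ]


# Illustrative response shape; the config, split, and filename are placeholders.
sample = {
    "parquet_files": [
        {"dataset": "cfahlgren1/hub-stats", "config": "models", "split": "train", "filename": "0000.parquet"},
    ]
}
for alias in parquet_aliases(sample):
    print(alias)
```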

Run SQL query:

```bash
npx -y -p parquetlens -p @parquetlens/sql parquetlens \
  "hf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet" \
  --sql "SELECT * FROM data LIMIT 20"
```

### SQL export

- CSV: `--sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.csv' (FORMAT CSV, HEADER, DELIMITER ',')"`
- JSON: `--sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.json' (FORMAT JSON)"`
- Parquet: `--sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.parquet' (FORMAT PARQUET)"`

## Creating and Uploading Datasets

Use one of these flows depending on dependency constraints.

Zero local dependencies (Hub UI):

- Create dataset repo in browser: `https://huggingface.co/new-dataset`
- Upload parquet files in the repo "Files and versions" page.
- Verify shards appear in Dataset Viewer:

```bash
curl -s "https://datasets-server.huggingface.co/parquet?dataset=<namespace>/<repo>"
```

Low dependency CLI flow (`npx @huggingface/hub` / `hfjs`):

- Set auth token:

```bash
export HF_TOKEN=<your_hf_token>
```

- Upload parquet folder to a dataset repo (auto-creates repo if missing):

```bash
npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data
```

- Upload as private repo on creation:

```bash
npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data --private
```

After upload, call `/parquet` to discover `<config>/<split>/<shard>` values for querying with `@~parquet`.


================================================
FILE: skills/hugging-face-datasets/SKILL.md
================================================
---
name: hugging-face-datasets
description: Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside HF MCP server for comprehensive dataset workflows.
---

# Overview
This skill provides tools to manage datasets on the Hugging Face Hub with a focus on creation, configuration, content management, and SQL-based data manipulation. It is designed to complement the existing Hugging Face MCP server by providing dataset editing and querying capabilities.

## Integration with HF MCP Server
- **Use HF MCP Server for**: Dataset discovery, search, and metadata retrieval
- **Use This Skill for**: Dataset creation, content editing, SQL queries, data transformation, and structured data formatting

# Version
2.1.0

# Dependencies
This skill uses PEP 723 scripts with inline dependency management.
Scripts auto-install their requirements when run with `uv run scripts/script_name.py`.

- uv (Python package manager)
- Getting Started: See "Usage Instructions" below for PEP 723 usage

# Core Capabilities

## 1. Dataset Lifecycle Management
- **Initialize**: Create new dataset repositories with proper structure
- **Configure**: Store detailed configuration including system prompts and metadata
- **Stream Updates**: Add rows efficiently without downloading entire datasets

## 2. SQL-Based Dataset Querying (NEW)
Query any Hugging Face dataset using DuckDB SQL via `scripts/sql_manager.py`:
- **Direct Queries**: Run SQL on datasets using the `hf://` protocol
- **Schema Discovery**: Describe dataset structure and column types
- **Data Sampling**: Get random samples for exploration
- **Aggregations**: Count, histogram, unique values analysis
- **Transformations**: Filter, join, reshape data with SQL
- **Export & Push**: Save results locally or push to new Hub repos

## 3. Multi-Format Dataset Support
Supports diverse dataset types through template system:
- **Chat/Conversational**: Chat templating, multi-turn dialogues, tool usage examples
- **Text Classification**: Sentiment analysis, intent detection, topic classification
- **Question-Answering**: Reading comprehension, factual QA, knowledge bases
- **Text Completion**: Language modeling, code completion, creative writing
- **Tabular Data**: Structured data for regression/classification tasks
- **Custom Formats**: Flexible schema definition for specialized needs

## 4. Quality Assurance Features
- **JSON Validation**: Ensures data integrity during uploads
- **Batch Processing**: Efficient handling of large datasets
- **Error Recovery**: Graceful handling of upload failures and conflicts

# Usage Instructions

The skill includes two Python scripts that use PEP 723 inline dependency management:

> **All paths are relative to the directory containing this SKILL.md file.**
> Scripts are run with: `uv run scripts/script_name.py [arguments]`

- `scripts/dataset_manager.py` - Dataset creation and management
- `scripts/sql_manager.py` - SQL-based dataset querying and transformation

### Prerequisites
- `uv` package manager installed
- `HF_TOKEN` environment variable must be set with a Write-access token
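
A quick pre-flight check for these prerequisites could look like the sketch below. `missing_prerequisites` is a hypothetical helper and not part of the bundled scripts; it only checks that `uv` is on `PATH` and that `HF_TOKEN` is non-empty, not that the token has write access.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of prerequisites that appear to be missing (sketch)."""
    env = os.environ if env is None else env
    missing = []
    if which("uv") is None:
        missing.append("uv")
    if not env.get("HF_TOKEN"):
        missing.append("HF_TOKEN")
    return missing


problems = missing_prerequisites()
print("Missing prerequisites:", ", ".join(problems) if problems else "none")
```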

---

# SQL Dataset Querying (sql_manager.py)

Query, transform, and push Hugging Face datasets using DuckDB SQL. The `hf://` protocol provides direct access to any public dataset (or private with token).

## Quick Start

```bash
# Query a dataset
uv run scripts/sql_manager.py query \
  --dataset "cais/mmlu" \
  --sql "SELECT * FROM data WHERE subject='nutrition' LIMIT 10"

# Get dataset schema
uv run scripts/sql_manager.py describe --dataset "cais/mmlu"

# Sample random rows
uv run scripts/sql_manager.py sample --dataset "cais/mmlu" --n 5

# Count rows with filter
uv run scripts/sql_manager.py count --dataset "cais/mmlu" --where "subject='nutrition'"
```

## SQL Query Syntax

Use `data` as the table name in your SQL - it gets replaced with the actual `hf://` path:

```sql
-- Basic select
SELECT * FROM data LIMIT 10

-- Filtering
SELECT * FROM data WHERE subject='nutrition'

-- Aggregations
SELECT subject, COUNT(*) as cnt FROM data GROUP BY subject ORDER BY cnt DESC

-- Column selection and transformation
-- (DuckDB lists are 1-indexed, while `answer` is a 0-based index)
SELECT question, choices[answer + 1] AS correct_answer FROM data

-- Regex matching
SELECT * FROM data WHERE regexp_matches(question, 'nutrition|diet')

-- String functions
SELECT regexp_replace(question, '\n', '') AS cleaned FROM data
```
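
To make the `data` placeholder concrete, here is a rough sketch of how such a substitution could work. It is not taken from `scripts/sql_manager.py`, whose actual implementation may differ. The function name `expand_data_table`, the default `config`/`split` values, and the regex approach are all assumptions; the path shape follows the `@~parquet` form used elsewhere in this document.

```python
import re


def expand_data_table(sql: str, dataset: str, config: str = "default", split: str = "train") -> str:
    """Replace the `data` table name with a quoted hf:// parquet glob (sketch)."""
    path = f"hf://datasets/{dataset}@~parquet/{config}/{split}/*.parquet"
    # Only rewrite `data` when it is used as a table, i.e. right after FROM or JOIN.
    return re.sub(
        r"\b(FROM|JOIN)\s+data\b",
        lambda match: f"{match.group(1)} '{path}'",
        sql,
        flags=re.IGNORECASE,
    )


print(expand_data_table("SELECT * FROM data LIMIT 10", "cais/mmlu"))
```

Restricting the match to `FROM`/`JOIN` positions keeps a column that happens to be named `data` untouched.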

## Common Operations

### 1. Explore Dataset Structure
```bash
# Get schema
uv run scripts/sql_manager.py describe --dataset "cais/mmlu"

# Get unique values in column
uv run scripts/sql_manager.py unique --dataset "cais/mmlu" --column "subject"

# Get value distribution
uv run scripts/sql_manager.py histogram --dataset "cais/mmlu" --column "subject" --bins 20
```

### 2. Filter and Transform
```bash
# Complex filtering with SQL
uv run scripts/sql_manager.py query \
  --dataset "cais/mmlu" \
  --sql "SELECT subject, COUNT(*) as cnt FROM data GROUP BY subject HAVING cnt > 100"

# Using transform command
uv run scripts/sql_manager.py transform \
  --dataset "cais/mmlu" \
  --select "subject, COUNT(*) as cnt" \
  --group-by "subject" \
  --order-by "cnt DESC" \
  --limit 10
```

### 3. Create Subsets and Push to Hub
```bash
# Query and push to new dataset
uv run scripts/sql_manager.py query \
  --dataset "cais/mmlu" \
  --sql "SELECT * FROM data WHERE subject='nutrition'" \
  --push-to "username/mmlu-nutrition-subset" \
  --private

# Transform and push
uv run scripts/sql_manager.py transform \
  --dataset "ibm/duorc" \
  --config "ParaphraseRC" \
  --select "question, answers" \
  --where "LENGTH(question) > 50" \
  --push-to "username/duorc-long-questions"
```

### 4. Export to Local Files
```bash
# Export to Parquet
uv run scripts/sql_manager.py export \
  --dataset "cais/mmlu" \
  --sql "SELECT * FROM data WHERE subject='nutrition'" \
  --output "nutrition.parquet" \
  --format parquet

# Export to JSONL
uv run scripts/sql_manager.py export \
  --dataset "cais/mmlu" \
  --sql "SELECT * FROM data LIMIT 100" \
  --output "sample.jsonl" \
  --format jsonl
```

### 5. Working with Dataset Configs/Splits
```bash
# Specify config (subset)
uv run scripts/sql_manager.py query \
  --dataset "ibm/duorc" \
  --config "ParaphraseRC" \
  --sql "SELECT * FROM data LIMIT 5"

# Specify split
uv run scripts/sql_manager.py query \
  --dataset "cais/mmlu" \
  --split "test" \
  --sql "SELECT COUNT(*) FROM data"

# Query all splits
uv run scripts/sql_manager.py query \
  --dataset "cais/mmlu" \
  --split "*" \
  --sql "SELECT * FROM data LIMIT 10"
```

### 6. Raw SQL with Full Paths
For complex queries or joining datasets:
```bash
uv run scripts/sql_manager.py raw --sql "
  SELECT a.*, b.*
  FROM 'hf://datasets/dataset1@~parquet/default/train/*.parquet' a
  JOIN 'hf://datasets/dataset2@~parquet/default/train/*.parquet' b
  ON a.id = b.id
  LIMIT 100
"
```

## Python API Usage

```python
# Assumes scripts/ is on sys.path (for example, run from the scripts/ directory)
from sql_manager import HFDatasetSQL

sql = HFDatasetSQL()

# Query
results = sql.query("cais/mmlu", "SELECT * FROM data WHERE subject='nutrition' LIMIT 10")

# Get schema
schema = sql.describe("cais/mmlu")

# Sample
samples = sql.sample("cais/mmlu", n=5, seed=42)

# Count
count = sql.count("cais/mmlu", where="subject='nutrition'")

# Histogram
dist = sql.histogram("cais/mmlu", "subject")

# Filter and transform
results = sql.filter_and_transform(
    "cais/mmlu",
    select="subject, COUNT(*) as cnt",
    group_by="subject",
    order_by="cnt DESC",
    limit=10
)

# Push to Hub
url = sql.push_to_hub(
    "cais/mmlu",
    "username/nutrition-subset",
    sql="SELECT * FROM data WHERE subject='nutrition'",
    private=True
)

# Export locally
sql.export_to_parquet("cais/mmlu", "output.parquet", sql="SELECT * FROM data LIMIT 100")

sql.close()
```

## HF Path Format

DuckDB uses the `hf://` protocol to access datasets:
```
hf://datasets/{dataset_id}@{revision}/{config}/{split}/*.parquet
```

Examples:
- `hf://datasets/cais/mmlu@~parquet/default/train/*.parquet`
- `hf://datasets/ibm/duorc@~parquet/ParaphraseRC/test/*.parquet`

The `@~parquet` revision points to the Parquet files that the Hub generates automatically, so the same path format works regardless of the dataset's original file format.
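
As a rough illustration of the path format, the following sketch assembles such a path from its parts. The function name `build_hf_path` and its default values (`default` config, `train` split) are assumptions, not part of the skill's scripts.

```python
def build_hf_path(
    dataset_id: str,
    config: str = "default",
    split: str = "train",
    revision: str = "~parquet",
) -> str:
    """Assemble an hf:// glob for a dataset's Parquet files (hypothetical helper)."""
    return f"hf://datasets/{dataset_id}@{revision}/{config}/{split}/*.parquet"


# Reproduces the two example paths listed above.
print(build_hf_path("cais/mmlu"))
print(build_hf_path("ibm/duorc", config="ParaphraseRC", split="test"))
```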

## Useful DuckDB SQL Functions

```sql
-- String functions
LENGTH(column)                    -- String length
regexp_replace(col, '\n', '', 'g') -- Regex replace (all matches)
regexp_matches(col, 'pattern')    -- Regex match
LOWER(col), UPPER(col)           -- Case conversion

-- Array functions
choices[1]                        -- Array indexing (1-based)
array_length(choices)             -- Array length
unnest(choices)                   -- Expand array to rows

-- Aggregations
COUNT(*), SUM(col), AVG(col)
GROUP BY col HAVING condition

-- Sampling
USING SAMPLE 10                   -- Random sample
USING SAMPLE 10 (RESERVOIR, 42)   -- Reproducible sample

-- Window functions
ROW_NUMBER() OVER (PARTITION BY col ORDER BY col2)
```

---

# Dataset Creation (dataset_manager.py)

### Recommended Workflow

**1. Discovery (Use HF MCP Server):**
```python
# Use HF MCP tools to find existing datasets
search_datasets("conversational AI training")
get_dataset_details("username/dataset-name")
```

**2. Creation (Use This Skill):**
```bash
# Initialize new dataset (append --private to create a private repo)
uv run scripts/dataset_manager.py init --repo_id "your-username/dataset-name"

# Configure with detailed system prompt
uv run scripts/dataset_manager.py config --repo_id "your-username/dataset-name" --system_prompt "$(cat system_prompt.txt)"
```

**3. Content Management (Use This Skill):**
```bash
# Quick setup with any template
uv run scripts/dataset_manager.py quick_setup \
  --repo_id "your-username/dataset-name" \
  --template classification

# Add data with template validation
uv run scripts/dataset_manager.py add_rows \
  --repo_id "your-username/dataset-name" \
  --template qa \
  --rows_json "$(cat your_qa_data.json)"
```

### Template-Based Data Structures

**1. Chat Template (`--template chat`)**
```json
{
  "messages": [
    {"role": "user", "content": "Natural user request"},
    {"role": "assistant", "content": "Response with tool usage"},
    {"role": "tool", "content": "Tool response", "tool_call_id": "call_123"}
  ],
  "scenario": "Description of use case",
  "complexity": "simple|intermediate|advanced"
}
```

**2. Classification Template (`--template classification`)**
```json
{
  "text": "Input text to be classified",
  "label": "classification_label",
  "confidence": 0.95,
  "metadata": {"domain": "technology", "language": "en"}
}
```

**3. QA Template (`--template qa`)**
```json
{
  "question": "What is the question being asked?",
  "answer": "The complete answer",
  "context": "Additional context if needed",
  "answer_type": "factual|explanatory|opinion",
  "difficulty": "easy|medium|hard"
}
```

**4. Completion Template (`--template completion`)**
```json
{
  "prompt": "The beginning text or context",
  "completion": "The expected continuation",
  "domain": "code|creative|technical|conversational",
  "style": "description of writing style"
}
```

**5. Tabular Template (`--template tabular`)**
```json
{
  "columns": [
    {"name": "feature1", "type": "numeric", "description": "First feature"},
    {"name": "target", "type": "categorical", "description": "Target variable"}
  ],
  "data": [
    {"feature1": 123, "target": "class_a"},
    {"feature1": 456, "target": "class_b"}
  ]
}
```
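
Before passing rows to `add_rows`, it can help to check them locally against the template you intend to use. The sketch below is only an approximation: the required fields per template are assumptions inferred from the examples above, and the authoritative rules live in the `templates/*.json` files.

```python
import json

# Assumed required fields per template (illustrative, not read from templates/).
REQUIRED_FIELDS = {
    "qa": ("question", "answer"),
    "classification": ("text", "label"),
    "completion": ("prompt", "completion"),
}


def find_invalid_rows(rows, template):
    """Return (index, missing_fields) for every row lacking a required field."""
    required = REQUIRED_FIELDS[template]
    problems = []
    for index, row in enumerate(rows):
        missing = [field for field in required if field not in row]
        if missing:
            problems.append((index, missing))
    return problems


rows = json.loads(
    '[{"question": "What is AI?", "answer": "Artificial Intelligence..."},'
    ' {"question": "No answer here"}]'
)
print(find_invalid_rows(rows, "qa"))
```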

### Advanced System Prompt Template

For high-quality training data generation:
```text
You are an AI assistant expert at using MCP tools effectively.

## MCP SERVER DEFINITIONS
[Define available servers and tools]

## TRAINING EXAMPLE STRUCTURE
[Specify exact JSON schema for chat templating]

## QUALITY GUIDELINES
[Detail requirements for realistic scenarios, progressive complexity, proper tool usage]

## EXAMPLE CATEGORIES
[List development workflows, debugging scenarios, data management tasks]
```

### Example Categories & Templates

The skill includes diverse training examples beyond just MCP usage:

**Available Example Sets:**
- `training_examples.json` - MCP tool usage examples (debugging, project setup, database analysis)
- `diverse_training_examples.json` - Broader scenarios including:
  - **Educational Chat** - Explaining programming concepts, tutorials
  - **Git Workflows** - Feature branches, version control guidance
  - **Code Analysis** - Performance optimization, architecture review
  - **Content Generation** - Professional writing, creative brainstorming
  - **Codebase Navigation** - Legacy code exploration, systematic analysis
  - **Conversational Support** - Problem-solving, technical discussions

**Using Different Example Sets:**
```bash
# Add MCP-focused examples
uv run scripts/dataset_manager.py add_rows --repo_id "your-username/dataset-name" \
  --rows_json "$(cat examples/training_examples.json)"

# Add diverse conversational examples
uv run scripts/dataset_manager.py add_rows --repo_id "your-username/dataset-name" \
  --rows_json "$(cat examples/diverse_training_examples.json)"

# Mix both for comprehensive training data
uv run scripts/dataset_manager.py add_rows --repo_id "your-username/dataset-name" \
  --rows_json "$(jq -s '.[0] + .[1]' examples/training_examples.json examples/diverse_training_examples.json)"
```
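
If `jq` is not available, the same merge can be done in Python. This is a minimal sketch that assumes each file holds a top-level JSON array; `merge_example_files` is a hypothetical helper, and the demo uses throwaway files rather than the real example sets.

```python
import json
import tempfile
from pathlib import Path


def merge_example_files(*paths):
    """Concatenate the top-level JSON arrays stored in each file."""
    merged = []
    for path in paths:
        rows = json.loads(Path(path).read_text(encoding="utf-8"))
        if not isinstance(rows, list):
            raise ValueError(f"{path} does not contain a JSON array")
        merged.extend(rows)
    return merged


# Demo with two temporary files standing in for the example sets.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "first.json")
    second = Path(tmp, "second.json")
    first.write_text('[{"scenario": "one"}]', encoding="utf-8")
    second.write_text('[{"scenario": "two"}]', encoding="utf-8")
    merged = merge_example_files(first, second)

print(json.dumps(merged))
```

The resulting JSON string can then be passed to `--rows_json` in the same way as the `jq` output.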

### Commands Reference

**List Available Templates:**
```bash
uv run scripts/dataset_manager.py list_templates
```

**Quick Setup (Recommended):**
```bash
uv run scripts/dataset_manager.py quick_setup --repo_id "your-username/dataset-name" --template classification
```

**Manual Setup:**
```bash
# Initialize repository (append --private to create a private repo)
uv run scripts/dataset_manager.py init --repo_id "your-username/dataset-name"

# Configure with system prompt
uv run scripts/dataset_manager.py config --repo_id "your-username/dataset-name" --system_prompt "Your prompt here"

# Add data with validation
uv run scripts/dataset_manager.py add_rows \
  --repo_id "your-username/dataset-name" \
  --template qa \
  --rows_json '[{"question": "What is AI?", "answer": "Artificial Intelligence..."}]'
```

**View Dataset Statistics:**
```bash
uv run scripts/dataset_manager.py stats --repo_id "your-username/dataset-name"
```

### Error Handling
- **Repository exists**: Script will notify and continue with configuration
- **Invalid JSON**: Clear error message with parsing details
- **Network issues**: Automatic retry for transient failures
- **Token permissions**: Validation before operations begin
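
The script's own retry logic is not shown in this document. As a general illustration of retrying transient failures, here is a minimal sketch that assumes exponential backoff; `with_retries`, its parameters, and the `flaky_upload` stand-in are all hypothetical.

```python
import time


def with_retries(
    operation,
    attempts=3,
    base_delay=1.0,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call operation(), retrying on transient errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"n": 0}


def flaky_upload():
    """Stand-in operation that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network failure")
    return "uploaded"


# Pass a no-op sleep so the demo finishes instantly.
result = with_retries(flaky_upload, sleep=lambda seconds: None)
print(result, "after", calls["n"], "attempts")
```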

---

# Combined Workflow Examples

## Example 1: Create Training Subset from Existing Dataset
```bash
# 1. Explore the source dataset
uv run scripts/sql_manager.py describe --dataset "cais/mmlu"
uv run scripts/sql_manager.py histogram --dataset "cais/mmlu" --column "subject"

# 2. Query and create subset
uv run scripts/sql_manager.py query \
  --dataset "cais/mmlu" \
  --sql "SELECT * FROM data WHERE subject IN ('nutrition', 'anatomy', 'clinical_knowledge')" \
  --push-to "username/mmlu-medical-subset" \
  --private
```

## Example 2: Transform and Reshape Data
```bash
# Transform MMLU to QA format with correct answers extracted
# (answer is a 0-based index; DuckDB lists are 1-based, hence answer + 1)
uv run scripts/sql_manager.py query \
  --dataset "cais/mmlu" \
  --sql "SELECT question, choices[answer + 1] AS correct_answer, subject FROM data" \
  --push-to "username/mmlu-qa-format"
```

## Example 3: Merge Multiple Dataset Splits
```bash
# Export all splits (--split "*") into a single Parquet file
uv run scripts/sql_manager.py export \
  --dataset "cais/mmlu" \
  --split "*" \
  --output "mmlu_all.parquet"
```

## Example 4: Quality Filtering
```bash
# Filter for high-quality examples
uv run scripts/sql_manager.py query \
  --dataset "rajpurkar/squad" \
  --sql "SELECT * FROM data WHERE LENGTH(context) > 500 AND LENGTH(question) > 20" \
  --push-to "username/squad-filtered"
```

## Example 5: Create Custom Training Dataset
```bash
# 1. Query source data
uv run scripts/sql_manager.py export \
  --dataset "cais/mmlu" \
  --sql "SELECT question, subject FROM data WHERE subject='nutrition'" \
  --output "nutrition_source.jsonl" \
  --format jsonl

# 2. Process with your pipeline (add answers, format, etc.)

# 3. Push processed data
uv run scripts/dataset_manager.py init --repo_id "username/nutrition-training"
uv run scripts/dataset_manager.py add_rows \
  --repo_id "username/nutrition-training" \
  --template qa \
  --rows_json "$(cat processed_data.json)"
```
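
Step 2 is left to your own pipeline. As one possible shape for it, the sketch below turns exported JSONL records into rows matching the QA template. The function name, the `answer_for` callback, the `context` wording, and the sample record are all made up for illustration.

```python
import json


def jsonl_to_qa_rows(jsonl_text, answer_for):
    """Turn exported JSONL records into rows shaped like the QA template."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        rows.append(
            {
                "question": record["question"],
                "answer": answer_for(record),  # supplied by your pipeline
                "context": f"MMLU subject: {record['subject']}",
            }
        )
    return rows


# Placeholder record standing in for one line of nutrition_source.jsonl.
exported = '{"question": "Example question?", "subject": "nutrition"}\n'
rows = jsonl_to_qa_rows(exported, answer_for=lambda record: "TODO: fill in answer")
print(json.dumps(rows, indent=2))
```

Writing the result with `json.dump(rows, ...)` to `processed_data.json` would produce the file that step 3 reads.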


================================================
FILE: skills/hugging-face-datasets/examples/diverse_training_examples.json
================================================
[
  {
    "messages": [
      {
        "role": "user",
        "content": "Hi there! I'm working on a Python project and getting stuck with some async programming concepts. Can you help explain how to use async/await properly?"
      },
      {
        "role": "assistant",
        "content": "I'd be happy to help you understand async/await in Python! These are powerful tools for writing concurrent code. Let me show you some practical examples and explain the key concepts.\n\n**Basic async/await syntax:**\n```python\nimport asyncio\n\nasync def fetch_data():\n    # Simulate an async operation\n    await asyncio.sleep(1)\n    return \"Data fetched!\"\n\nasync def main():\n    result = await fetch_data()\n    print(result)\n\n# Run the async function\nasyncio.run(main())\n```\n\n**Key concepts:**\n- `async def` defines an asynchronous function (coroutine)\n- `await` pauses execution until the awaited operation completes\n- `asyncio.run()` runs the main async function\n\n**When to use async/await:**\n- I/O operations (file reads, network requests, database queries)\n- Operations that involve waiting\n- When you want to handle multiple tasks concurrently\n\nWould you like me to show you a more complex example with multiple concurrent operations, or do you have a specific use case you're working on?"
      }
    ],
    "scenario": "Educational chat about Python async programming concepts",
    "complexity": "simple",
    "tools_used": [],
    "learning_objectives": ["conversational teaching", "code explanation", "technical mentoring", "educational chat patterns"]
  },
  {
    "messages": [
      {
        "role": "user",
        "content": "I need to create a new feature branch for my project, make some changes, and then create a pull request. Can you walk me through the Git workflow?"
      },
      {
        "role": "assistant",
        "content": "I'll help you with the complete Git workflow for creating a feature branch and pull request. Let me first check your current Git status and then guide you through each step.",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "git_status",
              "arguments": {}
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "On branch main\nYour branch is up to date with 'origin/main'.\n\nnothing to commit, working tree clean",
        "tool_call_id": "call_301"
      },
      {
        "role": "assistant",
        "content": "Perfect! Your working tree is clean and you're on the main branch. Now let's create your feature branch and walk through the workflow.",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "git_branch",
              "arguments": {
                "branch_name": "feature/user-authentication",
                "create": true
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "Switched to a new branch 'feature/user-authentication'",
        "tool_call_id": "call_302"
      },
      {
        "role": "assistant",
        "content": "Great! I've created and switched to your new feature branch. Here's the complete workflow:\n\n**✅ Step 1: Created feature branch** (Done)\n```bash\ngit checkout -b feature/user-authentication\n```\n\n**Next steps for you:**\n\n**Step 2: Make your changes**\n- Edit your files as needed\n- Add your new feature code\n\n**Step 3: Stage and commit changes**\n```bash\ngit add .\ngit commit -m \"Add user authentication feature\"\n```\n\n**Step 4: Push branch to remote**\n```bash\ngit push -u origin feature/user-authentication\n```\n\n**Step 5: Create Pull Request**\n- Go to your repository on GitHub/GitLab\n- Click \"Create Pull Request\" or \"Merge Request\"\n- Fill in description and request review\n\nWould you like me to help you with any of these steps once you've made your changes?"
      }
    ],
    "scenario": "Git workflow guidance for feature development",
    "complexity": "intermediate",
    "tools_used": ["git_status", "git_branch"],
    "learning_objectives": ["Git workflow patterns", "branch management", "collaborative development", "step-by-step guidance"]
  },
  {
    "messages": [
      {
        "role": "user",
        "content": "I have a Python function that's running really slowly. Can you help me analyze the code and suggest optimizations?"
      },
      {
        "role": "assistant",
        "content": "I'd be happy to help optimize your Python code! Let me analyze the function to identify potential performance bottlenecks and suggest improvements.",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "find_functions",
              "arguments": {
                "pattern": "def.*process.*"
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "Found functions:\n- def process_user_data(users_list): line 45 in utils.py\n- def process_payments(payment_records): line 78 in billing.py",
        "tool_call_id": "call_401"
      },
      {
        "role": "assistant",
        "content": "I found some processing functions. Let me examine the code to identify performance issues.",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "read_file",
              "arguments": {
                "path": "utils.py",
                "start_line": 45,
                "end_line": 60
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "def process_user_data(users_list):\n    results = []\n    for user in users_list:\n        # Multiple database calls in loop - PERFORMANCE ISSUE!\n        profile = db.get_user_profile(user.id)\n        settings = db.get_user_settings(user.id)\n        preferences = db.get_user_preferences(user.id)\n        \n        # String concatenation in loop - PERFORMANCE ISSUE!\n        summary = \"\"\n        for field in user.fields:\n            summary += f\"{field.name}: {field.value}\\n\"\n        \n        results.append({\n            'user': user,\n            'profile': profile,\n            'settings': settings,\n            'preferences': preferences,\n            'summary': summary\n        })\n    return results",
        "tool_call_id": "call_402"
      },
      {
        "role": "assistant",
        "content": "I've identified several performance issues in your `process_user_data` function. Here's my analysis and optimization suggestions:\n\n**🔍 Performance Issues Found:**\n\n1. **N+1 Database Query Problem** (Lines 48-50)\n   - Making 3 separate DB calls per user in a loop\n   - This scales linearly with user count (very slow!)\n\n2. **String Concatenation in Loop** (Lines 52-55)\n   - Using `+=` for string building is inefficient\n   - Creates new string objects each iteration\n\n**⚡ Optimization Recommendations:**\n\n```python\ndef process_user_data_optimized(users_list):\n    # Fix 1: Batch database queries\n    user_ids = [user.id for user in users_list]\n    profiles = db.get_user_profiles_batch(user_ids)  # Single query\n    settings = db.get_user_settings_batch(user_ids)  # Single query  \n    preferences = db.get_user_preferences_batch(user_ids)  # Single query\n    \n    results = []\n    for user in users_list:\n        # Fix 2: Use join() for string concatenation\n        summary = \"\\n\".join(f\"{field.name}: {field.value}\" \n                           for field in user.fields)\n        \n        results.append({\n            'user': user,\n            'profile': profiles.get(user.id),\n            'settings': settings.get(user.id), \n            'preferences': preferences.get(user.id),\n            'summary': summary\n        })\n    return results\n```\n\n**Expected Performance Gain:** 10-100x faster depending on user count!\n\nWould you like me to help implement the batch database queries or analyze any other functions?"
      }
    ],
    "scenario": "Code performance analysis and optimization guidance",
    "complexity": "advanced",
    "tools_used": ["find_functions", "read_file"],
    "learning_objectives": ["performance optimization", "code analysis", "database optimization patterns", "Python best practices"]
  },
  {
    "messages": [
      {
        "role": "user",
        "content": "I need to write a professional email to a client explaining a project delay. Can you help me draft something that's apologetic but confident?"
      },
      {
        "role": "assistant",
        "content": "I'll help you draft a professional email that strikes the right balance between taking responsibility and maintaining client confidence. Here's a template that addresses the delay while focusing on solutions:\n\n**Subject:** Project Update - Revised Timeline for [Project Name]\n\n**Email Draft:**\n\nDear [Client Name],\n\nI hope this email finds you well. I'm writing to provide you with an important update regarding the [Project Name] timeline.\n\nAfter a thorough review of our current progress, I need to inform you that we will require an additional [X weeks/days] to complete the project to our mutual quality standards. This adjustment is primarily due to [brief, honest reason - e.g., \"unexpected technical complexities in the integration phase\" or \"additional testing requirements to ensure optimal performance\"].\n\nI want to be completely transparent about this delay and take full responsibility. However, I'm confident that this additional time will allow us to deliver a solution that exceeds your expectations and provides long-term value.\n\n**Here's our revised timeline:**\n- [Key milestone 1]: [New date]\n- [Key milestone 2]: [New date] \n- Final delivery: [New date]\n\nTo ensure we stay on track, I've implemented [specific action - e.g., \"daily progress reviews\" or \"additional resources to the development team\"].\n\nI understand this may impact your planning, and I'm available to discuss how we can minimize any inconvenience. Would you be available for a brief call this week to address any concerns?\n\nThank you for your patience and continued trust in our partnership.\n\nBest regards,\n[Your name]\n\n**Key elements this email includes:**\n- Direct, honest communication\n- Takes responsibility without excessive apologizing\n- Provides specific reasons and solutions\n- Shows proactive management\n- Offers further discussion\n\nWould you like me to adjust the tone or focus on any particular aspects?"
      }
    ],
    "scenario": "Professional communication assistance and content generation",
    "complexity": "simple",
    "tools_used": [],
    "learning_objectives": ["professional writing", "client communication", "content generation", "business correspondence patterns"]
  },
  {
    "messages": [
      {
        "role": "user",
        "content": "I'm trying to understand this legacy codebase I inherited. There are so many files and I don't know where to start. Can you help me map out the structure?"
      },
      {
        "role": "assistant",
        "content": "I completely understand - navigating a new codebase can be overwhelming! Let me help you systematically explore and understand the structure. I'll start by getting an overview and then drill down into the important parts.",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "list_directory",
              "arguments": {
                "path": "."
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "README.md\npackage.json\nsrc/\ntest/\nconfig/\ndocs/\n.env.example\n.gitignore\nDockerfile\nnode_modules/",
        "tool_call_id": "call_501"
      },
      {
        "role": "assistant",
        "content": "Great! This looks like a Node.js project with a well-organized structure. Let me examine the key areas to give you a roadmap.",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "read_file",
              "arguments": {
                "path": "package.json"
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "{\n  \"name\": \"legacy-api-service\",\n  \"version\": \"2.3.1\",\n  \"description\": \"Customer management API with billing integration\",\n  \"main\": \"src/app.js\",\n  \"scripts\": {\n    \"start\": \"node src/app.js\",\n    \"dev\": \"nodemon src/app.js\",\n    \"test\": \"jest test/\"\n  },\n  \"dependencies\": {\n    \"express\": \"^4.18.0\",\n    \"mongoose\": \"^6.2.0\",\n    \"stripe\": \"^8.0.0\"\n  }\n}",
        "tool_call_id": "call_502"
      },
      {
        "role": "assistant",
        "content": "Now let me explore the source directory to understand the application structure.",
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "list_directory",
              "arguments": {
                "path": "src"
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "app.js\nroutes/\nmodels/\nservices/\nmiddleware/\nutils/\ncontrollers/",
        "tool_call_id": "call_503"
      },
      {
        "role": "assistant",
        "content": "Perfect! Now I have a clear picture. Here's your **Codebase Navigation Guide**:\n\n## 📋 Project Overview\n**Type:** Customer Management API with billing (Node.js/Express)\n**Main Entry:** `src/app.js`\n**Database:** MongoDB (Mongoose)\n**Payment:** Stripe integration\n\n## 🗺️ Architecture Map\n\n**Start Here:**\n1. 📖 `README.md` - Project documentation\n2. ⚙️ `src/app.js` - Application entry point\n3. 🛣️ `src/routes/` - API endpoints definition\n\n**Core Application Logic:**\n- `src/controllers/` - Request handling logic\n- `src/models/` - Database schemas (MongoDB)\n- `src/services/` - Business logic layer\n- `src/middleware/` - Authentication, validation, etc.\n\n**Supporting Code:**\n- `src/utils/` - Helper functions\n- `config/` - Configuration files\n- `test/` - Unit/integration tests\n\n## 🎯 Recommended Exploration Order:\n\n**Phase 1 - Foundation (Start here!):**\n1. Read `README.md` for setup instructions\n2. Check `src/app.js` to see how everything connects\n3. Browse `src/routes/` to understand available APIs\n\n**Phase 2 - Business Logic:**\n4. Explore `src/controllers/` for main functionality\n5. Review `src/models/` for data structure\n6. Study `src/services/` for business rules\n\n**Phase 3 - Details:**\n7. Check `src/middleware/` for security/validation\n8. Look at `config/` for environment setup\n9. Review `test/` to understand expected behavior\n\nWould you like me to dive deeper into any specific area, like showing you the main routes or examining the data models?"
      }
    ],
    "scenario": "Codebase exploration and architecture understanding",
    "complexity": "intermediate",
    "tools_used": ["list_directory", "read_file"],
    "learning_objectives": ["codebase navigation", "architecture analysis", "systematic exploration", "legacy code understanding"]
  },
  {
    "messages": [
      {
        "role": "user",
        "content": "Can you help me brainstorm some creative names for my new productivity app? It's focused on time blocking and helps people focus on deep work."
      },
      {
        "role": "assistant",
        "content": "I'd love to help you brainstorm creative names for your time blocking and deep work app! Let me generate some ideas across different themes and styles.\n\n## 🎯 Focus & Deep Work Theme\n- **FlowBlocks** - Combines time blocking with flow state\n- **DeepDive** - Simple, suggests deep work immersion\n- **FocusForge** - Creating focused time periods\n- **ZenBlocks** - Peaceful, mindful time management\n- **MindVault** - Protecting mental space for deep work\n\n## ⏰ Time & Productivity Theme\n- **ChronoFlow** - Time management with smooth workflow\n- **TimeWeaver** - Weaving together productive time blocks\n- **ClockCraft** - Skillfully crafting your schedule\n- **TempoLock** - Locking in productive tempos\n- **PulseBlocks** - Rhythmic time blocking\n\n## 🏗️ Building & Structure Theme\n- **TaskArchitect** - Building your productive day\n- **FocusFoundry** - Where focused time is created\n- **ProductivityBricks** - Building blocks of productivity\n- **WorkStack** - Stacking focused work sessions\n- **MindGrid** - Organized mental workspace\n\n## 🎨 Creative & Unique\n- **Concentrix** - Focus + matrix of time\n- **Quietude** - Emphasizes calm, focused environment\n- **DeepSpace** - Space for deep work (tech feel)\n- **FlowChamber** - Protected space for flow state\n- **ThinkTank** - Tank of focused thinking time\n\n## 💡 Short & Catchy\n- **Blokr** - Modern, minimal spelling\n- **Focux** - Focus + flux\n- **Zenith** - Peak productivity\n- **Prism** - Breaking time into focused segments\n- **Vault** - Simple, suggests protection/security\n\n**My top recommendations:**\n1. **FlowBlocks** - Perfect blend of your core features\n2. **DeepVault** - Strong, memorable, suggests protection of deep work\n3. **FocusForge** - Active, suggests creating something valuable\n\nDo any of these resonate with your vision? Would you like me to explore variations of your favorites or brainstorm in a different direction?"
      }
    ],
    "scenario": "Creative brainstorming and idea generation assistance",
    "complexity": "simple",
    "tools_used": [],
    "learning_objectives": ["creative thinking", "brainstorming techniques", "branding assistance", "idea generation patterns"]
  }
]

================================================
FILE: skills/hugging-face-datasets/examples/system_prompt_template.txt
================================================
You are an AI assistant expert at using MCP (Model Context Protocol) tools effectively. You have access to various MCP servers and tools that allow you to interact with external systems, databases, APIs, and services.

## MCP SERVER DEFINITIONS

### Available MCP Servers and Tools:

**1. Filesystem Server (mcp-server-filesystem)**
- `read_file`: Read file contents
- `write_file`: Create or overwrite files
- `edit_file`: Make precise edits to existing files
- `list_directory`: List directory contents
- `create_directory`: Create new directories
- `move_file`: Move/rename files
- `delete_file`: Remove files
- `search_files`: Search for files by name pattern

**2. Git Server (mcp-server-git)**
- `git_status`: Check repository status
- `git_log`: View commit history
- `git_diff`: Show changes between commits
- `git_commit`: Create commits
- `git_branch`: Manage branches
- `git_push`: Push changes to remote
- `git_pull`: Pull changes from remote

**3. Database Server (mcp-server-sqlite)**
- `execute_query`: Run SQL queries
- `describe_table`: Get table schema
- `list_tables`: Show all tables
- `create_table`: Create new tables
- `insert_data`: Add records
- `backup_database`: Create database backup

**4. Web Server (mcp-server-web)**
- `fetch_url`: Get web page content
- `post_request`: Send POST requests
- `search_web`: Search the internet
- `download_file`: Download files from URLs

**5. Code Analysis Server (mcp-server-code)**
- `analyze_code`: Static code analysis
- `find_functions`: Locate function definitions
- `get_dependencies`: Extract imports/dependencies
- `format_code`: Auto-format source code
- `run_tests`: Execute test suites

## TRAINING EXAMPLE STRUCTURE

Generate training examples using this exact structure:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "[Natural user request describing a real problem]"
    },
    {
      "role": "assistant",
      "content": "[Conversational response with embedded MCP tool usage]",
      "tool_calls": [
        {
          "type": "function",
          "function": {
            "name": "[mcp_tool_name]",
            "arguments": {
              "[param1]": "[value1]",
              "[param2]": "[value2]"
            }
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "[Simulated tool response/output]",
      "tool_call_id": "call_123"
    },
    {
      "role": "assistant",
      "content": "[Follow-up response analyzing results and next steps]"
    }
  ],
  "scenario": "[Brief description of the use case]",
  "complexity": "[simple|intermediate|advanced]",
  "tools_used": ["[tool1]", "[tool2]"],
  "learning_objectives": ["[objective1]", "[objective2]"]
}
```

## QUALITY GUIDELINES

**1. Realistic Scenarios**: Create examples based on actual developer workflows:
- Debugging application errors
- Setting up new projects
- Code refactoring and optimization
- Database management tasks
- API integration challenges
- Version control operations
- File organization and cleanup

**2. Progressive Complexity**:
- **Simple**: Single tool usage for straightforward tasks
- **Intermediate**: Multiple tools working together
- **Advanced**: Complex workflows with error handling and iteration

**3. Tool Usage Patterns**:
- Always explain tool selection rationale
- Show proper parameter usage
- Demonstrate error handling
- Include realistic tool outputs
- Show iterative problem-solving

**4. Conversational Quality**:
- Natural language flow
- User context awareness
- Helpful explanations without being verbose
- Professional but approachable tone
- Clear next steps and follow-up options

**5. Technical Accuracy**:
- Correct MCP tool syntax
- Realistic file paths and data
- Valid code snippets
- Proper error messages
- Authentic command outputs

## EXAMPLE CATEGORIES TO COVER

**Development Workflows**:
- Project setup and initialization
- Code review and quality checks
- Dependency management
- Build and deployment processes

**Git & Version Control**:
- Feature branch workflows
- Merge conflict resolution
- Repository management
- Collaborative development patterns

**Code Analysis & Optimization**:
- Performance profiling and optimization
- Code quality assessment
- Refactoring guidance
- Architecture analysis

**Debugging & Troubleshooting**:
- Error investigation
- Log analysis
- Performance profiling
- Configuration issues

**Data Management**:
- Database queries and updates
- File processing and transformation
- Backup and migration tasks
- Data validation and cleanup

**Integration Tasks**:
- API testing and integration
- External service connections
- Webhook setup and testing
- Authentication implementation

**Educational & Mentoring**:
- Concept explanations
- Technical tutorials
- Best practices guidance
- Learning path recommendations

**Content Generation**:
- Professional writing assistance
- Documentation creation
- Creative brainstorming
- Technical communication

**Conversational Support**:
- Problem-solving discussions
- Project guidance
- Technical Q&A
- Collaborative planning

**Codebase Navigation**:
- Legacy code exploration
- Architecture understanding
- Documentation analysis
- Systematic code review

**Maintenance & Operations**:
- Code refactoring
- Documentation updates
- Security audits
- System monitoring

Generate examples that demonstrate sophisticated MCP usage while maintaining natural conversational flow and practical applicability.

================================================
FILE: skills/hugging-face-datasets/examples/training_examples.json
================================================
[
  {
    "messages": [
      {
        "role": "user",
        "content": "My Node.js app is crashing on startup. Can you help me debug this? The error message mentions something about a missing config file."
      },
      {
        "role": "assistant",
        "content": "I'll help you debug the startup crash. Let me first check your application structure and then look at the logs to understand the missing config file issue.",
        "tool_calls": [
          {
            "id": "call_001",
            "type": "function",
            "function": {
              "name": "list_directory",
              "arguments": {
                "path": "."
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "package.json\nserver.js\nsrc/\nnode_modules/\nREADME.md\nlogs/",
        "tool_call_id": "call_001"
      },
      {
        "role": "assistant",
        "content": "I can see your project structure. Now let me check the main server file to understand what config file it's expecting.",
        "tool_calls": [
          {
            "id": "call_002",
            "type": "function",
            "function": {
              "name": "read_file",
              "arguments": {
                "path": "server.js"
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "const express = require('express');\nconst config = require('./config/database.json');\nconst app = express();\n\napp.listen(config.port, () => {\n  console.log(`Server running on port ${config.port}`);\n});",
        "tool_call_id": "call_002"
      },
      {
        "role": "assistant",
        "content": "Found the issue! Your server.js is trying to load './config/database.json' but I don't see a config directory. Let me check if it exists and create it if needed.",
        "tool_calls": [
          {
            "id": "call_003",
            "type": "function",
            "function": {
              "name": "list_directory",
              "arguments": {
                "path": "config"
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "Error: Directory 'config' does not exist",
        "tool_call_id": "call_003"
      },
      {
        "role": "assistant",
        "content": "Perfect! The config directory is missing. I'll create it and add a basic database configuration file to fix your startup crash.",
        "tool_calls": [
          {
            "id": "call_004",
            "type": "function",
            "function": {
              "name": "create_directory",
              "arguments": {
                "path": "config"
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "Directory 'config' created successfully",
        "tool_call_id": "call_004"
      },
      {
        "role": "assistant",
        "content": "Now I'll create the missing database.json configuration file with sensible defaults.",
        "tool_calls": [
          {
            "id": "call_005",
            "type": "function",
            "function": {
              "name": "write_file",
              "arguments": {
                "path": "config/database.json",
                "content": "{\n  \"port\": 3000,\n  \"database\": {\n    \"host\": \"localhost\",\n    \"port\": 5432,\n    \"name\": \"myapp\",\n    \"user\": \"postgres\"\n  }\n}"
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "File 'config/database.json' created successfully",
        "tool_call_id": "call_005"
      },
      {
        "role": "assistant",
        "content": "Great! I've fixed the missing config file issue. Your Node.js app should now start without crashing. The server will run on port 3000 by default. You may want to adjust the database connection settings in config/database.json to match your actual database setup."
      }
    ],
    "scenario": "Debugging Node.js startup crash due to missing configuration file",
    "complexity": "intermediate",
    "tools_used": ["list_directory", "read_file", "create_directory", "write_file"],
    "learning_objectives": ["systematic debugging approach", "file system navigation", "configuration file management", "error root cause analysis"]
  },
  {
    "messages": [
      {
        "role": "user",
        "content": "I need to set up a new React project with TypeScript and add some basic components. Can you help me get started?"
      },
      {
        "role": "assistant",
        "content": "I'll help you set up a new React TypeScript project from scratch. Let me start by creating the project structure and configuration files.",
        "tool_calls": [
          {
            "id": "call_101",
            "type": "function",
            "function": {
              "name": "create_directory",
              "arguments": {
                "path": "my-react-app"
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "Directory 'my-react-app' created successfully",
        "tool_call_id": "call_101"
      },
      {
        "role": "assistant",
        "content": "Now I'll create the package.json with React and TypeScript dependencies.",
        "tool_calls": [
          {
            "id": "call_102",
            "type": "function",
            "function": {
              "name": "write_file",
              "arguments": {
                "path": "my-react-app/package.json",
                "content": "{\n  \"name\": \"my-react-app\",\n  \"version\": \"1.0.0\",\n  \"scripts\": {\n    \"dev\": \"vite\",\n    \"build\": \"tsc && vite build\",\n    \"preview\": \"vite preview\"\n  },\n  \"dependencies\": {\n    \"react\": \"^18.2.0\",\n    \"react-dom\": \"^18.2.0\"\n  },\n  \"devDependencies\": {\n    \"@types/react\": \"^18.2.0\",\n    \"@types/react-dom\": \"^18.2.0\",\n    \"@vitejs/plugin-react\": \"^4.0.0\",\n    \"typescript\": \"^5.0.0\",\n    \"vite\": \"^4.3.0\"\n  }\n}"
              }
            }
          }
        ]
      },
      {
        "role": "tool",
        "content": "File 'my-react-app/package.json' created successfully",
        "tool_call_id": "call_102"
      },
      {
        "role": "assistant",
        "content": "I've created the my-react-app directory and a package.json with the React, TypeScript, and Vite dependencies. You can now run `npm install` inside my-react-app to install them. The project still needs a tsconfig.json, an index.html entry point, and source files before it will build. Would you like me to create those next, along with any specific components?"
      }
    ],
    "scenario": "Setting up new React TypeScript project from scratch",
    "complexity": "simple",
    "tools_used": ["create_directory", "write_file"],
    "learning_objectives": ["project initialization", "React TypeScript setup", "configuration management", "directory structure planning"]
  }
]
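
The `tools_used` field in each example above is meant to list the tools that appear in the assistant `tool_calls`. A minimal sketch of how that consistency could be checked is shown below; the helper names `called_tools` and `tools_used_matches` are hypothetical and do not exist in the repository scripts.

```python
# Hypothetical consistency check between tool_calls and tools_used (illustrative only).
def called_tools(example: dict) -> list[str]:
    """Collect tool names from assistant tool_calls, in first-use order, without duplicates."""
    names = []
    for message in example.get("messages", []):
        for call in message.get("tool_calls", []):
            name = call["function"]["name"]
            if name not in names:
                names.append(name)
    return names


def tools_used_matches(example: dict) -> bool:
    """True when tools_used names exactly the tools that were called."""
    return sorted(called_tools(example)) == sorted(set(example.get("tools_used", [])))


# A trimmed-down stand-in for one entry of the JSON array above.
sample = {
    "messages": [
        {"role": "user", "content": "Set up a project folder."},
        {
            "role": "assistant",
            "content": "Creating the directory.",
            "tool_calls": [
                {"type": "function", "function": {"name": "create_directory", "arguments": {"path": "app"}}}
            ],
        },
        {"role": "tool", "content": "Directory 'app' created successfully", "tool_call_id": "call_101"},
        {
            "role": "assistant",
            "content": "Writing package.json.",
            "tool_calls": [
                {"type": "function", "function": {"name": "write_file", "arguments": {"path": "app/package.json", "content": "{}"}}}
            ],
        },
    ],
    "tools_used": ["create_directory", "write_file"],
}

print(called_tools(sample))        # prints ['create_directory', 'write_file']
print(tools_used_matches(sample))  # prints True
```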

================================================
FILE: skills/hugging-face-datasets/scripts/dataset_manager.py
================================================
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "huggingface_hub>=0.20.0",
# ]
# ///
"""
Hugging Face Dataset Manager

Enhanced dataset creation and management tool designed to work alongside
the HF MCP server. Provides dataset creation, configuration, and content
management capabilities optimized for conversational AI training data.

Version: 2.0.0

Usage:
    uv run dataset_manager.py init --repo_id username/dataset-name
    uv run dataset_manager.py quick_setup --repo_id username/dataset-name --template chat
    uv run dataset_manager.py add_rows --repo_id username/dataset-name --rows_json '[{"messages": [...]}]'
    uv run dataset_manager.py stats --repo_id username/dataset-name
    uv run dataset_manager.py list_templates
"""

import os
import json
import time
import argparse
from pathlib import Path
from typing import List, Dict, Any, Optional
from huggingface_hub import HfApi, create_repo
from huggingface_hub.utils import HfHubHTTPError

# Configuration
HF_TOKEN = os.environ.get("HF_TOKEN")
EXAMPLES_DIR = Path(__file__).parent.parent / "examples"


def init_dataset(repo_id, token=None, private=True):
    """
    Initialize a new dataset repository on Hugging Face Hub.
    """
    api = HfApi(token=token)
    try:
        create_repo(repo_id, repo_type="dataset", private=private, token=token)
        print(f"Created dataset repository: {repo_id}")
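    except HfHubHTTPError as e:
        # 409 Conflict means the repository already exists. Check the status
        # code instead of searching the error message for "409".
        if e.response is not None and e.response.status_code == 409:
            print(f"Repository {repo_id} already exists.")
        else:
            raise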

    # Upload a basic README.md with metadata (this overwrites an existing README.md)
    readme_content = f"""---
license: mit
---

# {repo_id.split("/")[-1]}

This dataset was created using the Claude Dataset Skill.
"""
    try:
        api.upload_file(
            path_or_fileobj=readme_content.encode("utf-8"),
            path_in_repo="README.md",
            repo_id=repo_id,
            repo_type="dataset",
            commit_message="Initialize dataset README",
        )
    except Exception as e:
        print(f"Note: failed to upload README.md: {e}")


def define_config(repo_id, system_prompt=None, token=None):
    """
    Define a configuration for the dataset, including a system prompt.
    This saves a config.json file to the repository.
    """
    api = HfApi(token=token)

    config_data = {"dataset_config": {"version": "1.0", "created_at": time.time()}}

    if system_prompt:
        config_data["system_prompt"] = system_prompt

    # Upload config.json
    api.upload_file(
        path_or_fileobj=json.dumps(config_data, indent=2).encode("utf-8"),
        path_in_repo="config.json",
        repo_id=repo_id,
        repo_type="dataset",
        commit_message="Update dataset configuration",
    )
    print(f"Configuration updated for {repo_id}")


def load_dataset_template(template_name: str) -> Dict[str, Any]:
    """Load dataset template configuration from templates directory."""
    template_path = EXAMPLES_DIR.parent / "templates" / f"{template_name}.json"
    if not template_path.exists():
        available_templates = [f.stem for f in (EXAMPLES_DIR.parent / "templates").glob("*.json")]
        print(f"❌ Template '{template_name}' not found.")
        print(f"Available templates: {', '.join(available_templates)}")
        return {}

    with open(template_path, encoding="utf-8") as f:
        return json.load(f)


def validate_by_template(rows: List[Dict[str, Any]], template: Dict[str, Any]) -> bool:
    """Validate data according to template schema."""
    if not template:
        return False

    schema = template.get("validation_schema", {})
    required_fields = set(schema.get("required_fields", []))
    recommended_fields = set(schema.get("recommended_fields", []))
    field_types =
SYMBOL INDEX (263 symbols across 31 files)

FILE: apps/evals-leaderboard/app.py
  function format_model_link (line 39) | def format_model_link(model_id: str) -> str:
  function format_source_link (line 44) | def format_source_link(source_type: str, contributor: str, source_url: s...
  function fetch_leaderboard (line 49) | def fetch_leaderboard() -> tuple[list[dict], dict]:
  function refresh_handler (line 64) | def refresh_handler() -> tuple[str, list[list]]:

FILE: apps/evals-leaderboard/collect_evals.py
  function _normalize (line 38) | def _normalize(text: Optional[str]) -> str:
  function _coerce_score (line 46) | def _coerce_score(value: Any) -> Optional[float]:
  class BenchmarkSpec (line 63) | class BenchmarkSpec:
    method matches (line 68) | def matches(self, fields: List[str]) -> bool:
  class EvalsCollector (line 104) | class EvalsCollector:
    method __init__ (line 107) | def __init__(self, token: str | None = None) -> None:
    method log (line 116) | def log(self, message: str) -> None:
    method collect_all (line 121) | def collect_all(self) -> List[Dict[str, Any]]:
    method _fetch_trending_models (line 137) | def _fetch_trending_models(self) -> List[Dict[str, Any]]:
    method _collect_scores (line 160) | def _collect_scores(self, repo_id: str) -> Dict[str, Any]:
    method _read_model_card (line 204) | def _read_model_card(
    method _parse_front_matter (line 225) | def _parse_front_matter(content: str) -> Dict[str, Any]:
    method _fetch_pull_requests (line 244) | def _fetch_pull_requests(self, repo_id: str) -> List[Dict[str, Any]]:
    method _extract_scores (line 265) | def _extract_scores(
    method _match_benchmark (line 324) | def _match_benchmark(
    method _format_scores (line 342) | def _format_scores(self, model_id: str, scores: Dict[str, Dict[str, An...
    method get_leaderboard (line 360) | def get_leaderboard(self) -> List[Dict[str, Any]]:
    method save_json (line 364) | def save_json(self, filepath: str) -> None:
    method push_to_hub (line 377) | def push_to_hub(self, repo_id: str = "hf-skills/evals-leaderboard") ->...
  function main (line 427) | def main() -> None:

FILE: apps/hackers-leaderboard/app.py
  function format_username (line 38) | def format_username(username: str) -> str:
  function fetch_leaderboard (line 43) | def fetch_leaderboard() -> tuple[list[dict], dict]:
  function refresh_handler (line 58) | def refresh_handler() -> tuple[str, list[list]]:

FILE: apps/hackers-leaderboard/collect_points.py
  class UserStats (line 36) | class UserStats:
    method total_points (line 48) | def total_points(self) -> int:
    method to_dict (line 51) | def to_dict(self) -> dict[str, Any]:
  class PointsCollector (line 63) | class PointsCollector:
    method __init__ (line 66) | def __init__(self, token: str | None = None) -> None:
    method log (line 75) | def log(self, message: str) -> None:
    method _fetch_org_members (line 80) | def _fetch_org_members(self) -> list[str]:
    method collect_all (line 106) | def collect_all(self) -> dict[str, UserStats]:
    method scan_external_repos (line 143) | def scan_external_repos(self, repo_types: list[str] | None = None) -> ...
    method _fetch_trending (line 177) | def _fetch_trending(self, repo_type: str) -> list[dict[str, Any]]:
    method _scan_repo_for_members (line 190) | def _scan_repo_for_members(self, repo_id: str, repo_type: str, org_mem...
    method _fetch_member_discussions (line 215) | def _fetch_member_discussions(
    method _fetch_discussion_comments (line 260) | def _fetch_discussion_comments(
    method _list_repos (line 287) | def _list_repos(self, repo_type: str) -> list[dict[str, Any]]:
    method _scan_discussions (line 300) | def _scan_discussions(self, repo_id: str, repo_type: str) -> None:
    method _process_discussion (line 325) | def _process_discussion(self, repo_id: str, api_type: str, discussion:...
    method _fetch_comments (line 341) | def _fetch_comments(self, repo_id: str, api_type: str, disc_num: int) ...
    method _add_point (line 363) | def _add_point(
    method get_leaderboard (line 392) | def get_leaderboard(self) -> list[dict[str, Any]]:
    method save_json (line 398) | def save_json(self, filepath: str) -> None:
    method push_to_hub (line 411) | def push_to_hub(self, repo_id: str = "hf-skills/hackers-leaderboard") ...
  function main (line 461) | def main() -> None:

FILE: scripts/generate_agents.py
  function load_template (line 31) | def load_template() -> str:
  function parse_frontmatter (line 35) | def parse_frontmatter(text: str) -> dict[str, str]:
  function collect_skills (line 49) | def collect_skills() -> list[dict[str, str]]:
  function render (line 68) | def render(template: str, skills: list[dict[str, str]]) -> str:
  function load_marketplace (line 87) | def load_marketplace() -> dict:
  function generate_readme_table (line 94) | def generate_readme_table(skills: list[dict[str, str]]) -> str:
  function update_readme (line 115) | def update_readme(skills: list[dict[str, str]]) -> bool:
  function validate_marketplace (line 154) | def validate_marketplace(skills: list[dict[str, str]]) -> list[str]:
  function main (line 190) | def main() -> None:

FILE: scripts/generate_cursor_plugin.py
  function load_json (line 41) | def load_json(path: Path) -> dict:
  function parse_frontmatter (line 47) | def parse_frontmatter(text: str) -> dict[str, str]:
  function collect_skills (line 60) | def collect_skills() -> list[str]:
  function validate_plugin_name (line 71) | def validate_plugin_name(name: str) -> None:
  function build_cursor_plugin_manifest (line 79) | def build_cursor_plugin_manifest() -> dict:
  function extract_mcp_from_gemini (line 110) | def extract_mcp_from_gemini() -> tuple[str, str]:
  function build_mcp_config (line 133) | def build_mcp_config() -> dict:
  function render_json (line 144) | def render_json(data: dict) -> str:
  function write_or_check (line 148) | def write_or_check(path: Path, content: str, check: bool) -> bool:
  function main (line 162) | def main() -> None:

FILE: scripts/run_skills_help.py
  function find_python_files (line 9) | def find_python_files():
  function run_with_help (line 20) | def run_with_help(python_file):
  function main (line 56) | def main():

FILE: skills/hugging-face-datasets/scripts/dataset_manager.py
  function init_dataset (line 39) | def init_dataset(repo_id, token=None, private=True):
  function define_config (line 74) | def define_config(repo_id, system_prompt=None, token=None):
  function load_dataset_template (line 97) | def load_dataset_template(template_name: str) -> Dict[str, Any]:
  function validate_by_template (line 110) | def validate_by_template(rows: List[Dict[str, Any]], template: Dict[str,...
  function _validate_field_type (line 153) | def _validate_field_type(value: Any, expected_type: str, context: str) -...
  function _validate_chat_format (line 176) | def _validate_chat_format(row: Dict[str, Any], row_index: int) -> bool:
  function _validate_classification_format (line 198) | def _validate_classification_format(row: Dict[str, Any], row_index: int)...
  function _validate_tabular_format (line 210) | def _validate_tabular_format(row: Dict[str, Any], row_index: int) -> bool:
  function validate_training_data (line 232) | def validate_training_data(rows: List[Dict[str, Any]], template_name: st...
  function _basic_validation (line 245) | def _basic_validation(rows: List[Dict[str, Any]]) -> bool:
  function add_rows (line 255) | def add_rows(
  function load_template (line 307) | def load_template(template_name: str = "system_prompt_template.txt") -> ...
  function quick_setup (line 317) | def quick_setup(repo_id: str, template_type: str = "chat", token: Option...
  function show_stats (line 360) | def show_stats(repo_id: str, token: Optional[str] = None) -> None:
  function list_available_templates (line 386) | def list_available_templates() -> None:

FILE: skills/hugging-face-datasets/scripts/sql_manager.py
  class HFDatasetSQL (line 58) | class HFDatasetSQL:
    method __init__ (line 69) | def __init__(self, token: Optional[str] = None):
    method _quote_identifier (line 76) | def _quote_identifier(name: str) -> str:
    method _validate_identifier (line 81) | def _validate_identifier(name: str) -> None:
    method _setup_connection (line 90) | def _setup_connection(self):
    method _build_hf_path (line 96) | def _build_hf_path(
    method _build_hf_path_flexible (line 116) | def _build_hf_path_flexible(
    method query (line 144) | def query(
    method query_raw (line 208) | def query_raw(self, sql: str, output_format: str = "dict") -> Union[Li...
    method describe (line 235) | def describe(self, dataset_id: str, split: str = "train", config: Opti...
    method sample (line 257) | def sample(
    method count (line 287) | def count(
    method unique_values (line 311) | def unique_values(
    method histogram (line 335) | def histogram(
    method filter_and_transform (line 366) | def filter_and_transform(
    method join_datasets (line 418) | def join_datasets(
    method export_to_parquet (line 464) | def export_to_parquet(
    method export_to_jsonl (line 502) | def export_to_jsonl(
    method push_to_hub (line 532) | def push_to_hub(
    method create_view (line 590) | def create_view(self, name: str, dataset_id: str, split: str = "train"...
    method info (line 606) | def info(self, dataset_id: str) -> Dict[str, Any]:
    method close (line 642) | def close(self):
  function main (line 647) | def main():
  function _print_results (line 839) | def _print_results(results: List[Dict], format: str):

FILE: skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py
  function get_model_evaluations_data (line 53) | def get_model_evaluations_data(creator_slug, model_name):
  function aa_evaluations_to_model_index (line 65) | def aa_evaluations_to_model_index(
  function main (line 107) | def main():

FILE: skills/hugging-face-evaluation/scripts/evaluation_manager.py
  function load_env (line 29) | def load_env() -> None:
  function require_markdown_it (line 38) | def require_markdown_it():
  function require_model_card (line 49) | def require_model_card():
  function require_requests (line 60) | def require_requests():
  function require_yaml (line 71) | def require_yaml():
  function extract_tables_from_markdown (line 87) | def extract_tables_from_markdown(markdown_content: str) -> List[str]:
  function parse_markdown_table (line 95) | def parse_markdown_table(table_str: str) -> Tuple[List[str], List[List[s...
  function is_evaluation_table (line 123) | def is_evaluation_table(header: List[str], rows: List[List[str]]) -> bool:
  function normalize_model_name (line 154) | def normalize_model_name(name: str) -> tuple[set[str], str]:
  function find_main_model_column (line 176) | def find_main_model_column(header: List[str], model_name: str) -> Option...
  function find_main_model_row (line 215) | def find_main_model_row(
  function is_transposed_table (line 262) | def is_transposed_table(header: List[str], rows: List[List[str]]) -> bool:
  function extract_metrics_from_table (line 319) | def extract_metrics_from_table(
  function extract_evaluations_from_readme (line 489) | def extract_evaluations_from_readme(
  function extract_tables_with_parser (line 609) | def extract_tables_with_parser(markdown_content: str) -> List[Dict[str, ...
  function detect_table_format (line 656) | def detect_table_format(table: Dict[str, Any], repo_id: str) -> Dict[str...
  function inspect_tables (line 720) | def inspect_tables(repo_id: str) -> None:
  function get_open_prs (line 788) | def get_open_prs(repo_id: str) -> List[Dict[str, Any]]:
  function list_open_prs (line 826) | def list_open_prs(repo_id: str) -> None:
  function get_aa_model_data (line 853) | def get_aa_model_data(creator_slug: str, model_name: str) -> Optional[Di...
  function aa_data_to_model_index (line 893) | def aa_data_to_model_index(
  function import_aa_evaluations (line 943) | def import_aa_evaluations(
  function update_model_card_with_evaluations (line 973) | def update_model_card_with_evaluations(
  function show_evaluations (line 1052) | def show_evaluations(repo_id: str) -> None:
  function validate_model_index (line 1100) | def validate_model_index(repo_id: str) -> bool:
  function main (line 1153) | def main():

FILE: skills/hugging-face-evaluation/scripts/inspect_eval_uv.py
  function _inspect_evals_tasks_root (line 24) | def _inspect_evals_tasks_root() -> Optional[Path]:
  function _normalize_task (line 34) | def _normalize_task(task: str) -> str:
  function main (line 43) | def main() -> None:

FILE: skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py
  function setup_environment (line 41) | def setup_environment() -> None:
  function run_inspect_vllm (line 49) | def run_inspect_vllm(
  function run_inspect_hf (line 118) | def run_inspect_hf(
  function main (line 183) | def main() -> None:

FILE: skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py
  function setup_environment (line 37) | def setup_environment() -> None:
  function run_lighteval_vllm (line 45) | def run_lighteval_vllm(
  function run_lighteval_accelerate (line 113) | def run_lighteval_accelerate(
  function main (line 177) | def main() -> None:

FILE: skills/hugging-face-evaluation/scripts/run_eval_job.py
  function create_eval_job (line 32) | def create_eval_job(
  function main (line 80) | def main() -> None:

FILE: skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py
  function estimate_hardware (line 54) | def estimate_hardware(model_id: str) -> str:
  function create_lighteval_job (line 76) | def create_lighteval_job(
  function create_inspect_job (line 134) | def create_inspect_job(
  function main (line 186) | def main() -> None:

FILE: skills/hugging-face-evaluation/scripts/test_extraction.py
  function test_table_extraction (line 65) | def test_table_extraction():
  function test_table_parsing (line 82) | def test_table_parsing(tables):
  function test_evaluation_detection (line 105) | def test_evaluation_detection(parsed_tables):
  function test_metric_extraction (line 125) | def test_metric_extraction(eval_tables):
  function test_model_index_format (line 145) | def test_model_index_format(metrics):
  function main (line 177) | def main():

FILE: skills/hugging-face-jobs/scripts/cot-self-instruct.py
  function check_gpu_availability (line 104) | def check_gpu_availability() -> int:
  function parse_thinking_output (line 122) | def parse_thinking_output(text: str) -> str:
  function extract_reasoning_output (line 129) | def extract_reasoning_output(text: str) -> Tuple[Optional[str], Optional...
  function extract_instruction_output (line 152) | def extract_instruction_output(text: str) -> Optional[str]:
  function categorize_prompts (line 163) | def categorize_prompts(prompts: List[str], num_categories: int = 8) -> D...
  function generate_synthetic_data (line 196) | def generate_synthetic_data(
  function answer_consistency_filter (line 267) | def answer_consistency_filter(
  function rip_filter (line 319) | def rip_filter(
  function create_dataset_card (line 364) | def create_dataset_card(
  function main (line 441) | def main():

FILE: skills/hugging-face-jobs/scripts/finepdfs-stats.py
  function list_available_languages (line 75) | def list_available_languages(dataset_id: str) -> list[str]:
  function compute_temporal_stats (line 91) | def compute_temporal_stats(df: pl.LazyFrame, output_path: Path) -> pl.Da...
  function compute_global_stats (line 103) | def compute_global_stats(temporal: pl.DataFrame) -> pl.DataFrame:
  function format_temporal_stats (line 119) | def format_temporal_stats(temporal: pl.DataFrame) -> pl.DataFrame:
  function create_ascii_charts (line 132) | def create_ascii_charts(temporal_stats: pl.DataFrame) -> str:
  function create_readme (line 171) | def create_readme(
  function main (line 274) | def main():

FILE: skills/hugging-face-jobs/scripts/generate-responses.py
  function check_gpu_availability (line 64) | def check_gpu_availability() -> int:
  function create_dataset_card (line 82) | def create_dataset_card(
  function main (line 168) | def main(

FILE: skills/hugging-face-model-trainer/scripts/convert_to_gguf.py
  function check_system_dependencies (line 50) | def check_system_dependencies():
  function run_command (line 77) | def run_command(cmd, description):

FILE: skills/hugging-face-model-trainer/scripts/dataset_inspector.py
  function parse_args (line 29) | def parse_args():
  function api_request (line 40) | def api_request(url: str) -> Dict:
  function get_splits (line 53) | def get_splits(dataset: str) -> Dict:
  function get_rows (line 59) | def get_rows(dataset: str, config: str, split: str, offset: int = 0, len...
  function find_columns (line 65) | def find_columns(columns: List[str], patterns: List[str]) -> List[str]:
  function check_sft_compatibility (line 70) | def check_sft_compatibility(columns: List[str]) -> Dict[str, Any]:
  function check_dpo_compatibility (line 90) | def check_dpo_compatibility(columns: List[str]) -> Dict[str, Any]:
  function check_grpo_compatibility (line 109) | def check_grpo_compatibility(columns: List[str]) -> Dict[str, Any]:
  function check_kto_compatibility (line 123) | def check_kto_compatibility(columns: List[str]) -> Dict[str, Any]:
  function generate_mapping_code (line 128) | def generate_mapping_code(method: str, info: Dict[str, Any]) -> str:
  function format_value_preview (line 186) | def format_value_preview(value: Any, max_chars: int) -> str:
  function main (line 202) | def main():

FILE: skills/hugging-face-model-trainer/scripts/estimate_cost.py
  function estimate_training_time (line 39) | def estimate_training_time(model_params, dataset_size, epochs, hardware):
  function parse_args (line 66) | def parse_args():
  function extract_model_size (line 75) | def extract_model_size(model_name):
  function main (line 90) | def main():

FILE: skills/hugging-face-model-trainer/scripts/unsloth_sft_example.py
  function check_cuda (line 58) | def check_cuda():
  function parse_args (line 72) | def parse_args():
  function main (line 215) | def main():

FILE: skills/hugging-face-paper-publisher/scripts/paper_manager.py
  class PaperManager (line 39) | class PaperManager:
    method __init__ (line 42) | def __init__(self, hf_token: Optional[str] = None):
    method index_paper (line 49) | def index_paper(self, arxiv_id: str) -> Dict[str, Any]:
    method check_paper (line 84) | def check_paper(self, arxiv_id: str) -> Dict[str, Any]:
    method link_paper_to_repo (line 119) | def link_paper_to_repo(
    method _add_paper_to_readme (line 197) | def _add_paper_to_readme(
    method create_research_article (line 255) | def create_research_article(
    method get_arxiv_info (line 338) | def get_arxiv_info(self, arxiv_id: str) -> Dict[str, Any]:
    method generate_citation (line 382) | def generate_citation(
    method _clean_arxiv_id (line 434) | def _clean_arxiv_id(arxiv_id: str) -> str:
    method _escape_yaml_value (line 457) | def _escape_yaml_value(value: str) -> str:
    method _sanitize_text (line 467) | def _sanitize_text(text: str) -> str:
  function main (line 485) | def main():

FILE: skills/hugging-face-tool-builder/references/baseline_hf_api.py
  function show_help (line 16) | def show_help() -> None:
  function main (line 36) | def main() -> int:

FILE: skills/hugging-face-vision-trainer/scripts/dataset_inspector.py
  function parse_args (line 31) | def parse_args():
  function api_request (line 42) | def api_request(url: str) -> Dict:
  function get_splits (line 55) | def get_splits(dataset: str) -> Dict:
  function get_rows (line 61) | def get_rows(dataset: str, config: str, split: str, offset: int = 0, len...
  function find_columns (line 67) | def find_columns(columns: List[str], patterns: List[str]) -> List[str]:
  function detect_bbox_format (line 72) | def detect_bbox_format(bbox: List[float], image_size: Tuple[int, int] = ...
  function _extract_image_size (line 110) | def _extract_image_size(row: Dict) -> Tuple[int, int] | None:
  function analyze_annotations (line 122) | def analyze_annotations(sample_rows: List[Dict], annotation_cols: List[s...
  function check_image_classification_compatibility (line 243) | def check_image_classification_compatibility(columns: List[str], sample_...
  function check_object_detection_compatibility (line 296) | def check_object_detection_compatibility(columns: List[str], sample_rows...
  function check_sam_segmentation_compatibility (line 329) | def check_sam_segmentation_compatibility(columns: List[str], sample_rows...
  function _try_json (line 409) | def _try_json(value) -> Any:
  function _validate_bbox (line 418) | def _validate_bbox(bbox, image_size=None) -> Dict[str, Any]:
  function generate_mapping_code (line 457) | def generate_mapping_code(info: Dict[str, Any]) -> str:
  function format_value_preview (line 522) | def format_value_preview(value: Any, max_chars: int) -> str:
  function main (line 546) | def main():

FILE: skills/hugging-face-vision-trainer/scripts/estimate_cost.py
  function extract_model_params (line 71) | def extract_model_params(model_name: str) -> float:
  function estimate_training_time (line 80) | def estimate_training_time(model_params_m: float, dataset_size: int, epo...
  function parse_args (line 122) | def parse_args():
  function main (line 136) | def main():

FILE: skills/hugging-face-vision-trainer/scripts/image_classification_training.py
  class DataTrainingArguments (line 62) | class DataTrainingArguments:
  class ModelArguments (line 94) | class ModelArguments:
  function build_transforms (line 129) | def build_transforms(image_processor, is_training: bool):
  function main (line 167) | def main():

FILE: skills/hugging-face-vision-trainer/scripts/object_detection_training.py
  class ModelOutput (line 59) | class ModelOutput:
  function format_image_annotations_as_coco (line 64) | def format_image_annotations_as_coco(
  function detect_bbox_format_from_samples (line 99) | def detect_bbox_format_from_samples(dataset, image_col="image", objects_...
  function sanitize_dataset (line 142) | def sanitize_dataset(dataset, bbox_format="xywh", image_col="image", obj...
  function convert_bbox_yolo_to_pascal (line 213) | def convert_bbox_yolo_to_pascal(boxes: torch.Tensor, image_size: tuple[i...
  function augment_and_transform_batch (line 239) | def augment_and_transform_batch(
  function collate_fn (line 287) | def collate_fn(batch: list[BatchFeature]) -> Mapping[str, torch.Tensor |...
  function compute_metrics (line 297) | def compute_metrics(
  class DataTrainingArguments (line 374) | class DataTrainingArguments:
  class ModelArguments (line 422) | class ModelArguments:
  function main (line 469) | def main():

FILE: skills/hugging-face-vision-trainer/scripts/sam_segmentation_training.py
  class SAMSegmentationDataset (line 49) | class SAMSegmentationDataset(Dataset):
    method __init__ (line 57) | def __init__(self, dataset, processor, prompt_type: str,
    method __len__ (line 69) | def __len__(self):
    method _extract_prompt (line 72) | def _extract_prompt(self, item):
    method __getitem__ (line 86) | def __getitem__(self, idx):
  function collate_fn (line 106) | def collate_fn(batch):
  function compute_loss (line 151) | def compute_loss(outputs, labels, num_items_in_batch=None):
  class DataTrainingArguments (line 161) | class DataTrainingArguments:
  class ModelArguments (line 209) | class ModelArguments:
  function main (line 232) | def main():

Condensed preview of 129 files, each showing path, character count, and a content snippet (the full structured content is 1,215K chars).
[
  {
    "path": ".claude-plugin/marketplace.json",
    "chars": 4816,
    "preview": "{\n  \"name\": \"huggingface-skills\",\n  \"owner\": {\n    \"name\": \"Hugging Face\"\n  },\n  \"metadata\": {\n    \"description\": \"Agent"
  },
  {
    "path": ".claude-plugin/plugin.json",
    "chars": 555,
    "preview": "{\n  \"name\": \"huggingface-skills\",\n  \"description\": \"Agent Skills for AI/ML tasks including dataset creation, model train"
  },
  {
    "path": ".cursor-plugin/marketplace.json",
    "chars": 554,
    "preview": "{\n  \"name\": \"huggingface-skills\",\n  \"owner\": {\n    \"name\": \"Hugging Face\"\n  },\n  \"metadata\": {\n    \"description\": \"Agent"
  },
  {
    "path": ".cursor-plugin/plugin.json",
    "chars": 605,
    "preview": "{\n  \"name\": \"huggingface-skills\",\n  \"skills\": \"skills\",\n  \"mcpServers\": \".mcp.json\",\n  \"description\": \"Agent Skills for "
  },
  {
    "path": ".github/workflows/SECURITY.md",
    "chars": 1356,
    "preview": "# Security Policy\n\n## Reporting a Vulnerability\nWe take the security of this project seriously. If you discover a securi"
  },
  {
    "path": ".github/workflows/generate-agents.yml",
    "chars": 734,
    "preview": "name: Validate generated agent/plugin artifacts\n\non:\n  pull_request:\n    paths:\n      - \"scripts/AGENTS_TEMPLATE.md\"\n   "
  },
  {
    "path": ".github/workflows/push-evals-leaderboard.yml",
    "chars": 589,
    "preview": "name: Sync Evals Leaderboard to Hugging Face Space\n\non:\n  push:\n    branches:\n      - main\n    paths:\n      - 'apps/eval"
  },
  {
    "path": ".github/workflows/push-hackers-leaderboard.yml",
    "chars": 592,
    "preview": "name: Sync Hackers Leaderboard to Hugging Face Space\n\non:\n  push:\n    branches:\n      - main\n    paths:\n      - 'apps/ha"
  },
  {
    "path": ".github/workflows/push-quests.yml",
    "chars": 518,
    "preview": "name: Sync Quests to Hugging Face Space\n\non:\n  push:\n    branches:\n      - main\n    paths:\n      - 'quests/**'\n  workflo"
  },
  {
    "path": ".gitignore",
    "chars": 4727,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[codz]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packag"
  },
  {
    "path": ".mcp.json",
    "chars": 108,
    "preview": "{\n  \"mcpServers\": {\n    \"huggingface-skills\": {\n      \"url\": \"https://huggingface.co/mcp?login\"\n    }\n  }\n}\n"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "README.md",
    "chars": 9389,
    "preview": "# Hugging Face Skills\n\nHugging Face Skills are definitions for AI/ML tasks like dataset creation, model training, and ev"
  },
  {
    "path": "agents/AGENTS.md",
    "chars": 6223,
    "preview": "<skills>\n\nYou have additional SKILLs documented in directories containing a \"SKILL.md\" file.\n\nThese skills are:\n - gradi"
  },
  {
    "path": "apps/evals-leaderboard/app.py",
    "chars": 3624,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nEvals Leaderboard - Gradio app for displaying model evaluation scores.\n\nReads leaderboard dat"
  },
  {
    "path": "apps/evals-leaderboard/collect_evals.py",
    "chars": 16725,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nCollect evaluation scores from trending models' model-index metadata.\n\nScans trending text-ge"
  },
  {
    "path": "apps/evals-leaderboard/requirements.txt",
    "chars": 40,
    "preview": "gradio\nrequests\npyyaml\nhuggingface_hub\n\n"
  },
  {
    "path": "apps/hackers-leaderboard/README.md",
    "chars": 1937,
    "preview": "---\ntitle: Hackers Leaderboard\nemoji: 🏆\ncolorFrom: purple\ncolorTo: pink\nsdk: gradio\nsdk_version: 5.50.0\napp_file: app.py"
  },
  {
    "path": "apps/hackers-leaderboard/app.py",
    "chars": 3488,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nHackers Leaderboard - Gradio app for displaying engagement from hf-skills org.\n\nReads leaderb"
  },
  {
    "path": "apps/hackers-leaderboard/collect_points.py",
    "chars": 19448,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nCollect engagement points from the hf-skills organization.\n\nTracks user activity across all r"
  },
  {
    "path": "apps/hackers-leaderboard/requirements.txt",
    "chars": 54,
    "preview": "gradio>=5.50.0\nhuggingface_hub>=1.1.4\nrequests>=2.32.5"
  },
  {
    "path": "apps/quests/02_evaluate-hub-model.md",
    "chars": 5109,
    "preview": "# Week 1: Evaluate a Hub Model\n\n📣 TASK: Add evaluation results to model cards across the Hub. Together, we're building a"
  },
  {
    "path": "apps/quests/03_publish-hub-dataset.md",
    "chars": 2456,
    "preview": "# Week 2: Publish a Hub Dataset\n\nCreate and share high-quality datasets on the Hub. Good data is the foundation of good "
  },
  {
    "path": "apps/quests/04_sft-finetune-hub.md",
    "chars": 1559,
    "preview": "# Week 3: Supervised Fine-Tuning on the Hub\n\nFine-tune and share models on the Hub. Take a base model, train it on your "
  },
  {
    "path": "apps/quests/README.md",
    "chars": 4387,
    "preview": "---\ntitle: README\nemoji: 🐠\ncolorFrom: yellow\ncolorTo: gray\nsdk: static\npinned: false\n---\n\n# Humanity's Last Hackathon (o"
  },
  {
    "path": "gemini-extension.json",
    "chars": 296,
    "preview": "{\n    \"name\": \"huggingface-skills\",\n    \"description\": \"Provides access to the Hugging Face Skills.\",\n    \"version\": \"0."
  },
  {
    "path": "hf-mcp/skills/hf-mcp/SKILL.md",
    "chars": 4966,
    "preview": "---\nname: hf-mcp\ndescription: Use Hugging Face Hub via MCP server tools. Search models, datasets, Spaces, papers. Get re"
  },
  {
    "path": "scripts/AGENTS_TEMPLATE.md",
    "chars": 594,
    "preview": "<skills>\n\nYou have additional SKILLs documented in directories containing a \"SKILL.md\" file.\n\nThese skills are:\n{{#skill"
  },
  {
    "path": "scripts/UV_RULES.md",
    "chars": 852,
    "preview": "# UV rules\n\nUse these rules for Python scripts in this repository:\n\n1. **Use PEP 723 inline dependencies** in each runna"
  },
  {
    "path": "scripts/generate_agents.py",
    "chars": 7090,
    "preview": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"Generate AGENTS.md from "
  },
  {
    "path": "scripts/generate_cursor_plugin.py",
    "chars": 5914,
    "preview": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"Generate Cursor plugin a"
  },
  {
    "path": "scripts/publish.sh",
    "chars": 1699,
    "preview": "#!/usr/bin/env bash\nset -euo pipefail\n\nROOT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")/..\" && pwd)\"\ncd \"$ROOT_DIR\"\n\nGENER"
  },
  {
    "path": "scripts/run_skills_help.py",
    "chars": 2401,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nScript to execute all Python programs under \"skills\" directories with `uv run` and --help fla"
  },
  {
    "path": "skills/hf-cli/SKILL.md",
    "chars": 14779,
    "preview": "---\nname: hf-cli\ndescription: \"Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing repositories, models"
  },
  {
    "path": "skills/hugging-face-dataset-viewer/SKILL.md",
    "chars": 4710,
    "preview": "---\r\nname: hugging-face-dataset-viewer\r\ndescription: Use this skill for Hugging Face Dataset Viewer API workflows that f"
  },
  {
    "path": "skills/hugging-face-datasets/SKILL.md",
    "chars": 16374,
    "preview": "---\nname: hugging-face-datasets\ndescription: Create and manage datasets on Hugging Face Hub. Supports initializing repos"
  },
  {
    "path": "skills/hugging-face-datasets/examples/diverse_training_examples.json",
    "chars": 17751,
    "preview": "[\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"Hi there! I'm working on a Python project an"
  },
  {
    "path": "skills/hugging-face-datasets/examples/system_prompt_template.txt",
    "chars": 5451,
    "preview": "You are an AI assistant expert at using MCP (Model Context Protocol) tools effectively. You have access to various MCP s"
  },
  {
    "path": "skills/hugging-face-datasets/examples/training_examples.json",
    "chars": 6812,
    "preview": "[\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"My Node.js app is crashing on startup. Can y"
  },
  {
    "path": "skills/hugging-face-datasets/scripts/dataset_manager.py",
    "chars": 18374,
    "preview": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#   \"huggingface_hub>=0.20.0\",\n# ]"
  },
  {
    "path": "skills/hugging-face-datasets/scripts/sql_manager.py",
    "chars": 29784,
    "preview": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#   \"duckdb>=1.0.0\",\n#   \"huggingf"
  },
  {
    "path": "skills/hugging-face-datasets/templates/chat.json",
    "chars": 2290,
    "preview": "{\n  \"type\": \"chat\",\n  \"description\": \"Conversational AI training data with chat templating support\",\n  \"system_prompt\": "
  },
  {
    "path": "skills/hugging-face-datasets/templates/classification.json",
    "chars": 1923,
    "preview": "{\n  \"type\": \"classification\",\n  \"description\": \"Text classification datasets for sentiment analysis, topic classificatio"
  },
  {
    "path": "skills/hugging-face-datasets/templates/completion.json",
    "chars": 2664,
    "preview": "{\n  \"type\": \"completion\",\n  \"description\": \"Text completion and generation datasets for language modeling, code completi"
  },
  {
    "path": "skills/hugging-face-datasets/templates/custom.json",
    "chars": 2412,
    "preview": "{\n  \"type\": \"custom\",\n  \"description\": \"Flexible template for custom dataset formats - define your own schema and valida"
  },
  {
    "path": "skills/hugging-face-datasets/templates/qa.json",
    "chars": 2673,
    "preview": "{\n  \"type\": \"qa\",\n  \"description\": \"Question-answering datasets for reading comprehension, knowledge QA, and factual que"
  },
  {
    "path": "skills/hugging-face-datasets/templates/tabular.json",
    "chars": 2546,
    "preview": "{\n  \"type\": \"tabular\",\n  \"description\": \"Structured tabular data for regression, classification, or analysis tasks\",\n  \""
  },
  {
    "path": "skills/hugging-face-evaluation/SKILL.md",
    "chars": 22585,
    "preview": "---\nname: hugging-face-evaluation\ndescription: Add and manage evaluation results in Hugging Face model cards. Supports e"
  },
  {
    "path": "skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md",
    "chars": 9311,
    "preview": "# Usage Examples\n\nThis document provides practical examples for both methods of adding evaluations to HuggingFace model "
  },
  {
    "path": "skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py",
    "chars": 3953,
    "preview": "# /// script\n# requires-python = \">=3.13\"\n# dependencies = [\n#     \"huggingface-hub>=1.1.4\",\n#     \"python-dotenv>=1.2.1"
  },
  {
    "path": "skills/hugging-face-evaluation/examples/example_readme_tables.md",
    "chars": 3118,
    "preview": "# Example Evaluation Table Formats\n\nThis file shows various formats of evaluation tables that can be extracted from mode"
  },
  {
    "path": "skills/hugging-face-evaluation/examples/metric_mapping.json",
    "chars": 924,
    "preview": "{\n  \"MMLU\": {\n    \"type\": \"mmlu\",\n    \"name\": \"Massive Multitask Language Understanding\"\n  },\n  \"HumanEval\": {\n    \"type"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/evaluation_manager.py",
    "chars": 48676,
    "preview": "# /// script\n# requires-python = \">=3.13\"\n# dependencies = [\n#     \"huggingface-hub>=1.1.4\",\n#     \"markdown-it-py>=3.0."
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/inspect_eval_uv.py",
    "chars": 2984,
    "preview": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"inspect-ai>=0.3.0\",\n#     \"inspect-evals\",\n#     \"op"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py",
    "chars": 9442,
    "preview": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"inspect-ai>=0.3.0\",\n#     \"inspect-evals\",\n#     \"vl"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py",
    "chars": 9400,
    "preview": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"lighteval[accelerate,vllm]>=0.6.0\",\n#     \"torch>=2."
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/run_eval_job.py",
    "chars": 2500,
    "preview": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"huggingface-hub>=0.26.0\",\n#     \"python-dotenv>=1.2."
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py",
    "chars": 9506,
    "preview": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"huggingface-hub>=0.26.0\",\n#     \"python-dotenv>=1.2."
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/test_extraction.py",
    "chars": 5607,
    "preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"pyyaml\",\n# ]\n# ///\n\"\"\"\nTest s"
  },
  {
    "path": "skills/hugging-face-jobs/SKILL.md",
    "chars": 31343,
    "preview": "---\nname: hugging-face-jobs\ndescription: This skill should be used when users want to run any workload on Hugging Face J"
  },
  {
    "path": "skills/hugging-face-jobs/index.html",
    "chars": 7682,
    "preview": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-width"
  },
  {
    "path": "skills/hugging-face-jobs/references/hardware_guide.md",
    "chars": 8257,
    "preview": "# Hardware Selection Guide\n\nChoosing the right hardware (flavor) is critical for cost-effective workloads.\n\n> **Referenc"
  },
  {
    "path": "skills/hugging-face-jobs/references/hub_saving.md",
    "chars": 7630,
    "preview": "# Saving Results to Hugging Face Hub\n\n**⚠️ CRITICAL:** Job environments are ephemeral. ALL results are lost when a job c"
  },
  {
    "path": "skills/hugging-face-jobs/references/token_usage.md",
    "chars": 13332,
    "preview": "# Token Usage Guide for Hugging Face Jobs\n\n**⚠️ CRITICAL:** Proper token usage is essential for any job that interacts w"
  },
  {
    "path": "skills/hugging-face-jobs/references/troubleshooting.md",
    "chars": 10226,
    "preview": "# Troubleshooting Guide\n\nCommon issues and solutions for Hugging Face Jobs.\n\n## Authentication Issues\n\n### Error: 401 Un"
  },
  {
    "path": "skills/hugging-face-jobs/scripts/cot-self-instruct.py",
    "chars": 24883,
    "preview": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"datasets\",\n#     \"transformers\",\n#     \"vllm>=0.6.5\""
  },
  {
    "path": "skills/hugging-face-jobs/scripts/finepdfs-stats.py",
    "chars": 17115,
    "preview": "# /// script\n# requires-python = \">=3.12\"\n# dependencies = [\n#     \"polars>=1.31.0\",\n#     \"huggingface-hub\",\n#     \"dat"
  },
  {
    "path": "skills/hugging-face-jobs/scripts/generate-responses.py",
    "chars": 20516,
    "preview": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"datasets\",\n#     \"flashinfer-python\",\n#     \"hugging"
  },
  {
    "path": "skills/hugging-face-model-trainer/SKILL.md",
    "chars": 27849,
    "preview": "---\nname: hugging-face-model-trainer\ndescription: This skill should be used when users want to train or fine-tune langua"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/gguf_conversion.md",
    "chars": 9830,
    "preview": "# GGUF Conversion Guide\n\nAfter training models with TRL on Hugging Face Jobs, convert them to **GGUF format** for use wi"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/hardware_guide.md",
    "chars": 6752,
    "preview": "# Hardware Selection Guide\n\nChoosing the right hardware (flavor) is critical for cost-effective training.\n\n## Available "
  },
  {
    "path": "skills/hugging-face-model-trainer/references/hub_saving.md",
    "chars": 8471,
    "preview": "# Saving Training Results to Hugging Face Hub\n\n**⚠️ CRITICAL:** Training environments are ephemeral. ALL results are los"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/local_training_macos.md",
    "chars": 8281,
    "preview": "# Local Training on macOS (Apple Silicon)\n\nRun small LoRA fine-tuning jobs locally on Mac for smoke tests and quick iter"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/reliability_principles.md",
    "chars": 10826,
    "preview": "# Reliability Principles for Training Jobs\n\nThese principles are derived from real production failures and successful fi"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/trackio_guide.md",
    "chars": 5760,
    "preview": "# Trackio Integration for TRL Training\n\n**Trackio** is an experiment tracking library that provides real-time metrics vi"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/training_methods.md",
    "chars": 5011,
    "preview": "# TRL Training Methods Overview\n\nTRL (Transformer Reinforcement Learning) provides multiple training methods for fine-tu"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/training_patterns.md",
    "chars": 6111,
    "preview": "# Common Training Patterns\n\nThis guide provides common training patterns and use cases for TRL on Hugging Face Jobs.\n\n##"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/troubleshooting.md",
    "chars": 8840,
    "preview": "# Troubleshooting TRL Training Jobs\n\nCommon issues and solutions when training with TRL on Hugging Face Jobs.\n\n## Traini"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/unsloth.md",
    "chars": 8005,
    "preview": "# Unsloth: Fast Fine-Tuning with Memory Optimization\n\n**Unsloth** is a fine-tuning library that provides ~2x faster trai"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/convert_to_gguf.py",
    "chars": 12495,
    "preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"transformers>=4.36.0\",\n#     "
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/dataset_inspector.py",
    "chars": 15663,
    "preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"\nDataset Format Inspector "
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/estimate_cost.py",
    "chars": 4822,
    "preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"\nEstimate training time an"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/train_dpo_example.py",
    "chars": 3096,
    "preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"trl>=0.12.0\",\n#     \"transfor"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/train_grpo_example.py",
    "chars": 2371,
    "preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"trl>=0.12.0\",\n#     \"transfor"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/train_sft_example.py",
    "chars": 3321,
    "preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"trl>=0.12.0\",\n#     \"peft>=0."
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/unsloth_sft_example.py",
    "chars": 16854,
    "preview": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"unsloth\",\n#     \"datasets\",\n#     \"trl==0.22.2\",\n#  "
  },
  {
    "path": "skills/hugging-face-paper-pages/SKILL.md",
    "chars": 9354,
    "preview": "---\nname: hugging-face-paper-pages\ndescription: Look up and read Hugging Face paper pages in markdown, and use the paper"
  },
  {
    "path": "skills/hugging-face-paper-publisher/SKILL.md",
    "chars": 16689,
    "preview": "---\nname: hugging-face-paper-publisher\ndescription: Publish and manage research papers on Hugging Face Hub. Supports cre"
  },
  {
    "path": "skills/hugging-face-paper-publisher/examples/example_usage.md",
    "chars": 8478,
    "preview": "# Example Usage: HF Paper Publisher Skill\n\nThis document demonstrates common workflows for publishing research papers on"
  },
  {
    "path": "skills/hugging-face-paper-publisher/references/quick_reference.md",
    "chars": 4497,
    "preview": "# Quick Reference Guide\n\n## Essential Commands\n\n### Paper Indexing\n```bash\n# Index from arXiv\nuv run scripts/paper_manag"
  },
  {
    "path": "skills/hugging-face-paper-publisher/scripts/paper_manager.py",
    "chars": 21918,
    "preview": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"huggingface_hub\",\n#     \"py"
  },
  {
    "path": "skills/hugging-face-paper-publisher/templates/arxiv.md",
    "chars": 7295,
    "preview": "---\ntitle: {{TITLE}}\nauthors: {{AUTHORS}}\ndate: {{DATE}}\narxiv:\nlayout: arxiv\n---\n\n# {{TITLE}}\n\n<div class=\"arxiv-header"
  },
  {
    "path": "skills/hugging-face-paper-publisher/templates/ml-report.md",
    "chars": 6848,
    "preview": "---\ntitle: {{TITLE}}\nauthors: {{AUTHORS}}\ndate: {{DATE}}\ntype: ml-experiment-report\ntags: [machine-learning, experiment-"
  },
  {
    "path": "skills/hugging-face-paper-publisher/templates/modern.md",
    "chars": 6411,
    "preview": "---\ntitle: {{TITLE}}\nauthors: {{AUTHORS}}\ndate: {{DATE}}\narxiv:\ntags: [machine-learning, ai]\nlayout: modern\n---\n\n<div cl"
  },
  {
    "path": "skills/hugging-face-paper-publisher/templates/standard.md",
    "chars": 3180,
    "preview": "---\ntitle: {{TITLE}}\nauthors: {{AUTHORS}}\ndate: {{DATE}}\narxiv:\ntags: [machine-learning, deep-learning]\n---\n\n# {{TITLE}}"
  },
  {
    "path": "skills/hugging-face-tool-builder/SKILL.md",
    "chars": 5301,
    "preview": "---\nname: hugging-face-tool-builder\ndescription: Use this skill when the user wants to build tool/scripts or achieve a t"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/baseline_hf_api.py",
    "chars": 1369,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nUltra-simple Hugging Face API example (Python).\n\nFetches a small list of models from the HF A"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/baseline_hf_api.sh",
    "chars": 707,
    "preview": "#!/usr/bin/env bash\n\nset -euo pipefail\n\nshow_help() {\n    cat << EOF\nUltra-simple Hugging Face API example (Shell)\n\nUsag"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/baseline_hf_api.tsx",
    "chars": 1265,
    "preview": "#!/usr/bin/env tsx\n\n/**\n * Ultra-simple Hugging Face API example (TSX).\n *\n * Fetches a small list of models from the HF"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/find_models_by_paper.sh",
    "chars": 7675,
    "preview": "#!/bin/bash\n\n# Find models associated with papers on Hugging Face\n# Usage: ./find_models_by_paper.sh [arXiv_id|search_te"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/hf_enrich_models.sh",
    "chars": 2267,
    "preview": "#!/usr/bin/env bash\n\nset -euo pipefail\n\nshow_help() {\n    cat << 'USAGE'\nStream model IDs on stdin, emit one JSON object"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh",
    "chars": 4548,
    "preview": "#!/usr/bin/env bash\n\nset -euo pipefail\n\nshow_help() {\n    cat << 'USAGE'\nFetch Hugging Face model cards via the hf CLI a"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh",
    "chars": 4834,
    "preview": "#!/usr/bin/env bash\n\n# Hugging Face Model Papers Tool with Authentication\n# Fetches papers referenced by Hugging Face mo"
  },
  {
    "path": "skills/hugging-face-trackio/.claude-plugin/plugin.json",
    "chars": 694,
    "preview": "{\n  \"name\": \"trackio-cli\",\n  \"version\": \"1.0.0\",\n  \"description\": \"Query Trackio projects, runs, and metrics using the `"
  },
  {
    "path": "skills/hugging-face-trackio/SKILL.md",
    "chars": 4848,
    "preview": "---\nname: hugging-face-trackio\ndescription: Track and visualize ML training experiments with Trackio. Use when logging m"
  },
  {
    "path": "skills/hugging-face-trackio/references/alerts.md",
    "chars": 5931,
    "preview": "# Trackio Alerts\n\nAlerts let you flag important training events directly from code. They are the primary mechanism for L"
  },
  {
    "path": "skills/hugging-face-trackio/references/logging_metrics.md",
    "chars": 4847,
    "preview": "# Logging Metrics with Trackio\n\n**Trackio** is a lightweight, free experiment tracking library from Hugging Face. It pro"
  },
  {
    "path": "skills/hugging-face-trackio/references/retrieving_metrics.md",
    "chars": 8703,
    "preview": "# Retrieving Metrics with Trackio CLI\n\nThe `trackio` CLI provides direct terminal access to query Trackio experiment tra"
  },
  {
    "path": "skills/hugging-face-vision-trainer/SKILL.md",
    "chars": 29994,
    "preview": "---\nname: hugging-face-vision-trainer\ndescription: Trains and fine-tunes vision models for object detection (D-FINE, RT-"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/finetune_sam2_trainer.md",
    "chars": 6432,
    "preview": "# Fine-tuning SAM2 with HF Trainer\n\nFine-tune SAM2.1 on a small part of the MicroMat dataset for image matting,\nusing th"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/hub_saving.md",
    "chars": 16894,
    "preview": "# Saving Vision Models to Hugging Face Hub\n\n## Contents\n- Why Hub Push is Required\n- Required Configuration (TrainingArg"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/image_classification_training_notebook.md",
    "chars": 11060,
    "preview": "# Image classification\n\n## Contents\n- Load Food-101 dataset\n- Preprocess (ViT image processor, torchvision transforms)\n-"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/object_detection_training_notebook.md",
    "chars": 29903,
    "preview": "# Object Detection Training Reference\n\n## Contents\n- Load the CPPE-5 dataset\n- Preprocess the data (augmentation with Al"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/reliability_principles.md",
    "chars": 9452,
    "preview": "# Reliability Principles for Training Jobs\n\n## Contents\n- Principle 1: Always Verify Before Use\n- Principle 2: Prioritiz"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/timm_trainer.md",
    "chars": 3547,
    "preview": "# Using timm models with Hugging Face Trainer\n\nTransformers has first-class support for timm models via the `TimmWrapper"
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/dataset_inspector.py",
    "chars": 31763,
    "preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"\nDataset Format Inspector "
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/estimate_cost.py",
    "chars": 7400,
    "preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"\nEstimate training time an"
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/image_classification_training.py",
    "chars": 13385,
    "preview": "# /// script\n# dependencies = [\n#     \"transformers>=5.2.0\",\n#     \"accelerate>=1.1.0\",\n#     \"timm\",\n#     \"datasets>=4"
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/object_detection_training.py",
    "chars": 27358,
    "preview": "# /// script\n# dependencies = [\n#     \"transformers>=5.2.0\",\n#     \"accelerate>=1.1.0\",\n#     \"albumentations >= 1.4.16\""
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/sam_segmentation_training.py",
    "chars": 13946,
    "preview": "# /// script\n# dependencies = [\n#     \"transformers>=5.2.0\",\n#     \"accelerate>=1.1.0\",\n#     \"datasets>=4.0\",\n#     \"to"
  },
  {
    "path": "skills/huggingface-gradio/SKILL.md",
    "chars": 22800,
    "preview": "---\nname: gradio\ndescription: Build Gradio web UIs and demos in Python. Use when creating or editing Gradio apps, compon"
  },
  {
    "path": "skills/huggingface-gradio/examples.md",
    "chars": 14481,
    "preview": "# Gradio End-to-End Examples\n\nComplete working Gradio apps for reference.\n\n## Blocks Essay Simple\n\n```python\nimport grad"
  },
  {
    "path": "skills/transformers.js/SKILL.md",
    "chars": 22667,
    "preview": "---\nname: transformers-js\ndescription: Use Transformers.js to run state-of-the-art machine learning models directly in J"
  },
  {
    "path": "skills/transformers.js/references/CACHE.md",
    "chars": 9187,
    "preview": "# Caching Reference\n\nComplete guide to caching strategies for Transformers.js models across different environments.\n\n## "
  },
  {
    "path": "skills/transformers.js/references/CONFIGURATION.md",
    "chars": 9763,
    "preview": "# Environment Configuration Reference\n\nComplete guide to configuring Transformers.js behavior using the `env` object.\n\n#"
  },
  {
    "path": "skills/transformers.js/references/EXAMPLES.md",
    "chars": 15194,
    "preview": "# Transformers.js Code Examples\n\nWorking examples showing how to use Transformers.js across different runtimes and frame"
  },
  {
    "path": "skills/transformers.js/references/MODEL_ARCHITECTURES.md",
    "chars": 5590,
    "preview": "# Supported Model Architectures\n\nThis document lists the model architectures currently supported by Transformers.js.\n\n##"
  },
  {
    "path": "skills/transformers.js/references/PIPELINE_OPTIONS.md",
    "chars": 14537,
    "preview": "# Pipeline Options Reference\n\nGuide to configuring model loading and inference using the `PretrainedModelOptions` parame"
  },
  {
    "path": "skills/transformers.js/references/TEXT_GENERATION.md",
    "chars": 7880,
    "preview": "# Text Generation Guide\n\nGuide to generating text with Transformers.js, including streaming and chat format.\n\n## Table o"
  }
]

About this extraction

This page contains the full source code of the huggingface/skills GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 129 files (1.1 MB), approximately 287.8k tokens, and a symbol index with 263 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
