[
  {
    "path": ".claude-plugin/marketplace.json",
    "content": "{\n  \"name\": \"huggingface-skills\",\n  \"owner\": {\n    \"name\": \"Hugging Face\"\n  },\n  \"metadata\": {\n    \"description\": \"Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub\",\n    \"version\": \"1.0.1\"\n  },\n  \"plugins\": [\n    {\n      \"name\": \"hugging-face-model-trainer\",\n      \"source\": \"./skills/hugging-face-model-trainer\",\n      \"skills\": \"./\",\n      \"description\": \"Train or fine-tune language models using TRL on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes hardware selection, cost estimation, Trackio monitoring, and Hub persistence.\"\n    },\n    {\n      \"name\": \"hugging-face-paper-publisher\",\n      \"source\": \"./skills/hugging-face-paper-publisher\",\n      \"skills\": \"./\",\n      \"description\": \"Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.\"\n    },\n    {\n      \"name\": \"hugging-face-paper-pages\",\n      \"source\": \"./skills/hugging-face-paper-pages\",\n      \"skills\": \"./\",\n      \"description\": \"Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata like authors, linked models, datasets, Spaces, and media URLs when needed.\"\n    },\n    {\n      \"name\": \"hugging-face-datasets\",\n      \"source\": \"./skills/hugging-face-datasets\",\n      \"skills\": \"./\",\n      \"description\": \"Create and manage datasets on Hugging Face Hub. 
Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation.\"\n    },\n    {\n      \"name\": \"hugging-face-evaluation\",\n      \"source\": \"./skills/hugging-face-evaluation\",\n      \"skills\": \"./\",\n      \"description\": \"Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom evaluations with vLLM/lighteval.\"\n    },\n    {\n      \"name\": \"hugging-face-tool-builder\",\n      \"source\": \"./skills/hugging-face-tool-builder\",\n      \"skills\": \"./\",\n      \"description\": \"Build reusable scripts for Hugging Face API operations. Useful for chaining API calls or automating repeated tasks.\"\n    },\n    {\n      \"name\": \"hf-cli\",\n      \"source\": \"./skills/hf-cli\",\n      \"skills\": \"./\",\n      \"description\": \"Execute Hugging Face Hub operations using the hf CLI. Download models/datasets, upload files, manage repos, and run cloud compute jobs.\"\n    },\n    {\n      \"name\": \"hugging-face-jobs\",\n      \"source\": \"./skills/hugging-face-jobs\",\n      \"skills\": \"./\",\n      \"description\": \"Run compute jobs on Hugging Face infrastructure. Execute Python scripts, manage scheduled jobs, and monitor job status.\"\n    },\n    {\n      \"name\": \"hugging-face-trackio\",\n      \"source\": \"./skills/hugging-face-trackio\",\n      \"skills\": \"./\",\n      \"description\": \"Track and visualize ML training experiments with Trackio. Log metrics via Python API and retrieve them via CLI. Supports real-time dashboards synced to HF Spaces.\"\n    },\n    {\n      \"name\": \"hugging-face-dataset-viewer\",\n      \"source\": \"./skills/hugging-face-dataset-viewer\",\n      \"skills\": \"./\",\n      \"description\": \"Explore, query, and extract data from any Hugging Face dataset using the Dataset Viewer REST API and npx tooling. 
Zero Python dependencies — covers split/config discovery, row pagination, text search, filtering, SQL via parquetlens, and dataset upload via CLI.\"\n    },\n    {\n      \"name\": \"gradio\",\n      \"source\": \"./skills/huggingface-gradio\",\n      \"skills\": \"./\",\n      \"description\": \"Build Gradio web UIs and demos in Python. Use when creating or editing Gradio apps, components, event listeners, layouts, or chatbots.\"\n    },\n    {\n      \"name\": \"transformers-js\",\n      \"source\": \"./skills/transformers.js\",\n      \"skills\": \"./\",\n      \"description\": \"Run state-of-the-art machine learning models directly in JavaScript/TypeScript for NLP, computer vision, audio processing, and multimodal tasks. Works in Node.js and browsers with WebGPU/WASM using Hugging Face models.\"\n    },\n\n    {\n      \"name\": \"hugging-face-vision-trainer\",\n      \"source\": \"./skills/hugging-face-vision-trainer\",\n      \"skills\": \"./\",\n      \"description\": \"Train and fine-tune object detection models (RTDETRv2, YOLOS, DETR and others) and image classification models (timm and transformers models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3) using Transformers Trainer API on Hugging Face Jobs infrastructure or locally. Includes COCO dataset format support, Albumentations augmentation, mAP/mAR metrics, trackio tracking, hardware selection, and Hub persistence.\"\n    }\n  ]\n}\n"
  },
  {
    "path": ".claude-plugin/plugin.json",
    "content": "{\n  \"name\": \"huggingface-skills\",\n  \"description\": \"Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub\",\n  \"version\": \"1.0.1\",\n  \"author\": {\n    \"name\": \"Hugging Face\"\n  },\n  \"homepage\": \"https://github.com/huggingface/skills\",\n  \"repository\": \"https://github.com/huggingface/skills\",\n  \"license\": \"Apache-2.0\",\n  \"keywords\": [\n    \"huggingface\",\n    \"machine-learning\",\n    \"datasets\",\n    \"training\",\n    \"evaluation\",\n    \"papers\",\n    \"fine-tuning\",\n    \"llm\"\n  ]\n}\n\n"
  },
  {
    "path": ".cursor-plugin/marketplace.json",
    "content": "{\n  \"name\": \"huggingface-skills\",\n  \"owner\": {\n    \"name\": \"Hugging Face\"\n  },\n  \"metadata\": {\n    \"description\": \"Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub\",\n    \"version\": \"1.0.0\"\n  },\n  \"plugins\": [\n    {\n      \"name\": \"huggingface-skills\",\n      \"source\": \".\",\n      \"skills\": \"skills\",\n      \"description\": \"Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub\"\n    }\n  ]\n}\n"
  },
  {
    "path": ".cursor-plugin/plugin.json",
    "content": "{\n  \"name\": \"huggingface-skills\",\n  \"skills\": \"skills\",\n  \"mcpServers\": \".mcp.json\",\n  \"description\": \"Agent Skills for AI/ML tasks including dataset creation, model training, evaluation, and research paper publishing on Hugging Face Hub\",\n  \"version\": \"1.0.1\",\n  \"author\": {\n    \"name\": \"Hugging Face\"\n  },\n  \"homepage\": \"https://github.com/huggingface/skills\",\n  \"repository\": \"https://github.com/huggingface/skills\",\n  \"license\": \"Apache-2.0\",\n  \"keywords\": [\n    \"huggingface\",\n    \"machine-learning\",\n    \"datasets\",\n    \"training\",\n    \"evaluation\",\n    \"papers\",\n    \"fine-tuning\",\n    \"llm\"\n  ]\n}\n"
  },
  {
    "path": ".github/workflows/SECURITY.md",
    "content": "# Security Policy\n\n## Reporting a Vulnerability\nWe take the security of this project seriously. If you discover a security vulnerability within this repository, please **do not open a public issue**. Instead, please report it through one of the following channels:\n\n* **Email:** security@huggingface.co\n* **Hugging Face Hub:** Use the \"Report\" feature on the repository’s Hub page if applicable.\n\nPlease include a detailed description of the vulnerability and, if possible, a Proof of Concept (PoC) to help us reproduce the issue.\n\n## Scope\nThe following types of vulnerabilities are of particular interest:\n* **Remote Code Execution (RCE):** Vulnerabilities that allow an agent to execute unauthorized code outside of its intended environment.\n* **Credential Leakage:** Skills that inadvertently expose API keys, tokens, or sensitive environment variables.\n* **Injection Attacks:** Vulnerabilities in skills that handle user input for database queries or shell commands.\n\n## Our Response Process\n1. **Acknowledgment:** We will acknowledge your report within 48 business hours.\n2. **Investigation:** Our team will investigate the issue and determine its impact.\n3. **Fix and Disclosure:** Once a fix is ready, we will coordinate a release and, with your permission, credit you for the discovery.\n\nThank you for helping keep the Hugging Face ecosystem safe!"
  },
  {
    "path": ".github/workflows/generate-agents.yml",
    "content": "name: Validate generated agent/plugin artifacts\n\non:\n  pull_request:\n    paths:\n      - \"scripts/AGENTS_TEMPLATE.md\"\n      - \"scripts/generate_agents.py\"\n      - \"scripts/generate_cursor_plugin.py\"\n      - \"scripts/publish.sh\"\n      - \"**/SKILL.md\"\n      - \"agents/AGENTS.md\"\n      - \"README.md\"\n      - \".claude-plugin/marketplace.json\"\n      - \".claude-plugin/plugin.json\"\n      - \"gemini-extension.json\"\n      - \".cursor-plugin/plugin.json\"\n      - \".mcp.json\"\n\njobs:\n  validate:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout\n        uses: actions/checkout@v6\n\n      - name: Set up uv\n        uses: astral-sh/setup-uv@v7\n\n      - name: Ensure generated files are up to date\n        run: ./scripts/publish.sh --check\n"
  },
  {
    "path": ".github/workflows/push-evals-leaderboard.yml",
    "content": "name: Sync Evals Leaderboard to Hugging Face Space\n\non:\n  push:\n    branches:\n      - main\n    paths:\n      - 'apps/evals-leaderboard/**'\n  workflow_dispatch:\n\njobs:\n  sync:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout repository\n        uses: actions/checkout@v6\n\n      - name: Install huggingface_hub\n        run: curl -LsSf https://hf.co/cli/install.sh | bash\n\n      - name: Upload evals leaderboard to HF Space\n        env:\n          HF_TOKEN: ${{ secrets.HF_TOKEN }}\n        run: hf upload hf-skills/distributed-leaderboard apps/evals-leaderboard . --repo-type=space\n\n"
  },
  {
    "path": ".github/workflows/push-hackers-leaderboard.yml",
    "content": "name: Sync Hackers Leaderboard to Hugging Face Space\n\non:\n  push:\n    branches:\n      - main\n    paths:\n      - 'apps/hackers-leaderboard/**'\n  workflow_dispatch:\n\njobs:\n  sync:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout repository\n        uses: actions/checkout@v6\n\n      - name: Install huggingface_hub\n        run: curl -LsSf https://hf.co/cli/install.sh | bash\n\n      - name: Upload hackers leaderboard to HF Space\n        env:\n          HF_TOKEN: ${{ secrets.HF_TOKEN }}\n        run: hf upload hf-skills/hacker_leaderboard apps/hackers-leaderboard . --repo-type=space\n\n"
  },
  {
    "path": ".github/workflows/push-quests.yml",
    "content": "name: Sync Quests to Hugging Face Space\n\non:\n  push:\n    branches:\n      - main\n    paths:\n      - 'quests/**'\n  workflow_dispatch:\n\njobs:\n  sync:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout repository\n        uses: actions/checkout@v6\n\n      - name: Install huggingface_hub\n        run: curl -LsSf https://hf.co/cli/install.sh | bash\n\n      - name: Upload quests to HF Space\n        env:\n          HF_TOKEN: ${{ secrets.HF_TOKEN }}\n        run: hf upload hf-skills/README quests . --repo-type=space\n\n"
  },
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[codz]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py.cover\n.hypothesis/\n.pytest_cache/\ncover/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\n.pybuilder/\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n#   For a library or package, you might want to ignore these files since the code is\n#   intended to run in multiple environments; otherwise, check them in:\n# .python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# UV\n#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.\n#   This is especially recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#uv.lock\n\n# poetry\n#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in 
version control.\n#   This is especially recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control\n#poetry.lock\n#poetry.toml\n\n# pdm\n#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.\n#   pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.\n#   https://pdm-project.org/en/latest/usage/project/#working-with-version-control\n#pdm.lock\n#pdm.toml\n.pdm-python\n.pdm-build/\n\n# pixi\n#   Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.\n#pixi.lock\n#   Pixi creates a virtual environment in the .pixi directory, just like venv module creates one\n#   in the .venv directory. It is recommended not to include this directory in version control.\n.pixi\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.envrc\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# pytype static type analyzer\n.pytype/\n\n# Cython debug symbols\ncython_debug/\n\n# PyCharm\n#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can\n#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore\n#  and can be added to the global gitignore or merged into this file.  
For a more nuclear\n#  option (not recommended) you can uncomment the following to ignore the entire idea folder.\n#.idea/\n\n# Abstra\n# Abstra is an AI-powered process automation framework.\n# Ignore directories containing user credentials, local state, and settings.\n# Learn more at https://abstra.io/docs\n.abstra/\n\n# Visual Studio Code\n#  Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore \n#  that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore\n#  and can be added to the global gitignore or merged into this file. However, if you prefer, \n#  you could uncomment the following to ignore the entire vscode folder\n# .vscode/\n\n# Ruff stuff:\n.ruff_cache/\n\n# PyPI configuration file\n.pypirc\n\n# Cursor\n#  Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to\n#  exclude from AI features like autocomplete and code analysis. Recommended for sensitive data\n#  refer to https://docs.cursor.com/context/ignore-files\n.cursorignore\n.cursorindexingignore\n\n# Marimo\nmarimo/_static/\nmarimo/_lsp/\n__marimo__/\n.claude\n.fast-agent/\n\n# MacOS\n.DS_Store"
  },
  {
    "path": ".mcp.json",
    "content": "{\n  \"mcpServers\": {\n    \"huggingface-skills\": {\n      \"url\": \"https://huggingface.co/mcp?login\"\n    }\n  }\n}\n"
  },
  {
    "path": "LICENSE",
    "content": "                                 Apache License\n                           Version 2.0, January 2004\n                        http://www.apache.org/licenses/\n\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n   1. Definitions.\n\n      \"License\" shall mean the terms and conditions for use, reproduction,\n      and distribution as defined by Sections 1 through 9 of this document.\n\n      \"Licensor\" shall mean the copyright owner or entity authorized by\n      the copyright owner that is granting the License.\n\n      \"Legal Entity\" shall mean the union of the acting entity and all\n      other entities that control, are controlled by, or are under common\n      control with that entity. For the purposes of this definition,\n      \"control\" means (i) the power, direct or indirect, to cause the\n      direction or management of such entity, whether by contract or\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\n      outstanding shares, or (iii) beneficial ownership of such entity.\n\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\n      exercising permissions granted by this License.\n\n      \"Source\" form shall mean the preferred form for making modifications,\n      including but not limited to software source code, documentation\n      source, and configuration files.\n\n      \"Object\" form shall mean any form resulting from mechanical\n      transformation or translation of a Source form, including but\n      not limited to compiled object code, generated documentation,\n      and conversions to other media types.\n\n      \"Work\" shall mean the work of authorship, whether in Source or\n      Object form, made available under the License, as indicated by a\n      copyright notice that is included in or attached to the work\n      (an example is provided in the Appendix below).\n\n      \"Derivative Works\" shall mean any work, whether in Source or Object\n      
form, that is based on (or derived from) the Work and for which the\n      editorial revisions, annotations, elaborations, or other modifications\n      represent, as a whole, an original work of authorship. For the purposes\n      of this License, Derivative Works shall not include works that remain\n      separable from, or merely link (or bind by name) to the interfaces of,\n      the Work and Derivative Works thereof.\n\n      \"Contribution\" shall mean any work of authorship, including\n      the original version of the Work and any modifications or additions\n      to that Work or Derivative Works thereof, that is intentionally\n      submitted to Licensor for inclusion in the Work by the copyright owner\n      or by an individual or Legal Entity authorized to submit on behalf of\n      the copyright owner. For the purposes of this definition, \"submitted\"\n      means any form of electronic, verbal, or written communication sent\n      to the Licensor or its representatives, including but not limited to\n      communication on electronic mailing lists, source code control systems,\n      and issue tracking systems that are managed by, or on behalf of, the\n      Licensor for the purpose of discussing and improving the Work, but\n      excluding communication that is conspicuously marked or otherwise\n      designated in writing by the copyright owner as \"Not a Contribution.\"\n\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\n      on behalf of whom a Contribution has been received by Licensor and\n      subsequently incorporated within the Work.\n\n   2. Grant of Copyright License. 
Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      copyright license to reproduce, prepare Derivative Works of,\n      publicly display, publicly perform, sublicense, and distribute the\n      Work and such Derivative Works in Source or Object form.\n\n   3. Grant of Patent License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      (except as stated in this section) patent license to make, have made,\n      use, offer to sell, sell, import, and otherwise transfer the Work,\n      where such license applies only to those patent claims licensable\n      by such Contributor that are necessarily infringed by their\n      Contribution(s) alone or by combination of their Contribution(s)\n      with the Work to which such Contribution(s) was submitted. If You\n      institute patent litigation against any entity (including a\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\n      or a Contribution incorporated within the Work constitutes direct\n      or contributory patent infringement, then any patent licenses\n      granted to You under this License for that Work shall terminate\n      as of the date such litigation is filed.\n\n   4. Redistribution. 
You may reproduce and distribute copies of the\n      Work or Derivative Works thereof in any medium, with or without\n      modifications, and in Source or Object form, provided that You\n      meet the following conditions:\n\n      (a) You must give any other recipients of the Work or\n          Derivative Works a copy of this License; and\n\n      (b) You must cause any modified files to carry prominent notices\n          stating that You changed the files; and\n\n      (c) You must retain, in the Source form of any Derivative Works\n          that You distribute, all copyright, patent, trademark, and\n          attribution notices from the Source form of the Work,\n          excluding those notices that do not pertain to any part of\n          the Derivative Works; and\n\n      (d) If the Work includes a \"NOTICE\" text file as part of its\n          distribution, then any Derivative Works that You distribute must\n          include a readable copy of the attribution notices contained\n          within such NOTICE file, excluding those notices that do not\n          pertain to any part of the Derivative Works, in at least one\n          of the following places: within a NOTICE text file distributed\n          as part of the Derivative Works; within the Source form or\n          documentation, if provided along with the Derivative Works; or,\n          within a display generated by the Derivative Works, if and\n          wherever such third-party notices normally appear. The contents\n          of the NOTICE file are for informational purposes only and\n          do not modify the License. 
You may add Your own attribution\n          notices within Derivative Works that You distribute, alongside\n          or as an addendum to the NOTICE text from the Work, provided\n          that such additional attribution notices cannot be construed\n          as modifying the License.\n\n      You may add Your own copyright statement to Your modifications and\n      may provide additional or different license terms and conditions\n      for use, reproduction, or distribution of Your modifications, or\n      for any such Derivative Works as a whole, provided Your use,\n      reproduction, and distribution of the Work otherwise complies with\n      the conditions stated in this License.\n\n   5. Submission of Contributions. Unless You explicitly state otherwise,\n      any Contribution intentionally submitted for inclusion in the Work\n      by You to the Licensor shall be under the terms and conditions of\n      this License, without any additional terms or conditions.\n      Notwithstanding the above, nothing herein shall supersede or modify\n      the terms of any separate license agreement you may have executed\n      with Licensor regarding such Contributions.\n\n   6. Trademarks. This License does not grant permission to use the trade\n      names, trademarks, service marks, or product names of the Licensor,\n      except as required for reasonable and customary use in describing the\n      origin of the Work and reproducing the content of the NOTICE file.\n\n   7. Disclaimer of Warranty. Unless required by applicable law or\n      agreed to in writing, Licensor provides the Work (and each\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n      implied, including, without limitation, any warranties or conditions\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n      PARTICULAR PURPOSE. 
You are solely responsible for determining the\n      appropriateness of using or redistributing the Work and assume any\n      risks associated with Your exercise of permissions under this License.\n\n   8. Limitation of Liability. In no event and under no legal theory,\n      whether in tort (including negligence), contract, or otherwise,\n      unless required by applicable law (such as deliberate and grossly\n      negligent acts) or agreed to in writing, shall any Contributor be\n      liable to You for damages, including any direct, indirect, special,\n      incidental, or consequential damages of any character arising as a\n      result of this License or out of the use or inability to use the\n      Work (including but not limited to damages for loss of goodwill,\n      work stoppage, computer failure or malfunction, or any and all\n      other commercial damages or losses), even if such Contributor\n      has been advised of the possibility of such damages.\n\n   9. Accepting Warranty or Additional Liability. While redistributing\n      the Work or Derivative Works thereof, You may choose to offer,\n      and charge a fee for, acceptance of support, warranty, indemnity,\n      or other liability obligations and/or rights consistent with this\n      License. However, in accepting such obligations, You may act only\n      on Your own behalf and on Your sole responsibility, not on behalf\n      of any other Contributor, and only if You agree to indemnify,\n      defend, and hold each Contributor harmless for any liability\n      incurred by, or claims asserted against, such Contributor by reason\n      of your accepting any such warranty or additional liability.\n\n   END OF TERMS AND CONDITIONS\n\n   APPENDIX: How to apply the Apache License to your work.\n\n      To apply the Apache License to your work, attach the following\n      boilerplate notice, with the fields enclosed by brackets \"[]\"\n      replaced with your own identifying information. 
(Don't include\n      the brackets!)  The text should be enclosed in the appropriate\n      comment syntax for the file format. We also recommend that a\n      file or class name and description of purpose be included on the\n      same \"printed page\" as the copyright notice for easier\n      identification within third-party archives.\n\n   Copyright [yyyy] [name of copyright owner]\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n"
  },
  {
    "path": "README.md",
    "content": "# Hugging Face Skills\n\nHugging Face Skills are definitions for AI/ML tasks like dataset creation, model training, and evaluation. They are interoperable with all major coding agent tools like OpenAI Codex, Anthropic's Claude Code, Google DeepMind's Gemini CLI, and Cursor.\n\nThe skills in this repository follow the standardized [Agent Skills](https://agentskills.io/home) format.\n\n## How do Skills work?\n\nIn practice, skills are self-contained folders that package instructions, scripts, and resources together for an AI agent to use on a specific use case. Each folder includes a `SKILL.md` file with YAML frontmatter (name and description) followed by the guidance your coding agent follows while the skill is active. \n\n> [!NOTE]\n> 'Skills' is actually an Anthropic term used within Claude AI and Claude Code and not adopted by other agent tools, but we love it! OpenAI Codex uses the open [Agent Skills](https://agentskills.io/specification) format, where each skill is a directory with a `SKILL.md` file that Codex discovers from standard `.agents/skills` locations documented in the [Codex Skills guide](https://developers.openai.com/codex/skills/). Codex can also work with an `AGENTS.md` file. Google Gemini uses 'extensions' to define the instructions for your coding agent in a `gemini-extension.json` file. **This repo is compatible with all of them, and more!**\n\n> [!TIP]\n> If your agent doesn't support skills, you can use [`agents/AGENTS.md`](agents/AGENTS.md) directly as a fallback.\n\n## Installation\n\nHugging Face skills are compatible with Claude Code, Codex, Gemini CLI, and Cursor.\n\n### Claude Code\n\n1. Register the repository as a plugin marketplace:  \n   \n```\n/plugin marketplace add huggingface/skills\n```\n\n2. To install a skill, run:  \n   \n```\n/plugin install <skill-name>@huggingface/skills\n```\n\nFor example:  \n\n```\n/plugin install hf-cli@huggingface/skills\n```\n\n### Codex\n\n1. 
Copy or symlink any skills you want to use from this repository's `skills/` directory into one of Codex's standard `.agents/skills` locations (for example, `$REPO_ROOT/.agents/skills` or `$HOME/.agents/skills`) as described in the [Codex Skills guide](https://developers.openai.com/codex/skills/).\n\n2. Once a skill is available in one of those locations, Codex will discover it using the Agent Skills standard and load the `SKILL.md` instructions when it decides to use that skill or when you explicitly invoke it.\n\n3. If your Codex setup still relies on `AGENTS.md`, you can use the generated [`agents/AGENTS.md`](agents/AGENTS.md) file in this repo as a fallback bundle of instructions.\n\n### Gemini CLI\n\n1. This repo includes `gemini-extension.json` to integrate with the Gemini CLI.\n\n2. Install locally:  \n\n```\ngemini extensions install . --consent\n```\n\nor use the GitHub URL:\n\n```\ngemini extensions install https://github.com/huggingface/skills.git --consent\n```\n\n3. See [Gemini CLI extensions docs](https://geminicli.com/docs/extensions/#installing-an-extension) for more help.\n\n### Cursor\n\nThis repository includes Cursor plugin manifests:\n\n- `.cursor-plugin/plugin.json`\n- `.mcp.json` (configured with the Hugging Face MCP server URL)\n\nInstall from repository URL (or local checkout) via the Cursor plugin flow.\n\nFor contributors, regenerate manifests with:\n\n```bash\n./scripts/publish.sh\n```\n\n## Skills\n\nThis repository contains a few skills to get you started. You can also contribute your own skills to the repository.\n\n### Available skills\n\n<!-- This table is auto-generated by scripts/generate_agents.py. Do not edit manually. -->\n<!-- BEGIN_SKILLS_TABLE -->\n| Name | Description | Documentation |\n|------|-------------|---------------|\n| `gradio` | Build Gradio web UIs and demos in Python. Use when creating or editing Gradio apps, components, event listeners, layouts, or chatbots. 
| [SKILL.md](skills/huggingface-gradio/SKILL.md) |\n| `hf-cli` | Execute Hugging Face Hub operations using the hf CLI. Download models/datasets, upload files, manage repos, and run cloud compute jobs. | [SKILL.md](skills/hf-cli/SKILL.md) |\n| `hugging-face-dataset-viewer` | Explore, query, and extract data from any Hugging Face dataset using the Dataset Viewer REST API and npx tooling. Zero Python dependencies — covers split/config discovery, row pagination, text search, filtering, SQL via parquetlens, and dataset upload via CLI. | [SKILL.md](skills/hugging-face-dataset-viewer/SKILL.md) |\n| `hugging-face-datasets` | Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. | [SKILL.md](skills/hugging-face-datasets/SKILL.md) |\n| `hugging-face-evaluation` | Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom evaluations with vLLM/lighteval. | [SKILL.md](skills/hugging-face-evaluation/SKILL.md) |\n| `hugging-face-jobs` | Run compute jobs on Hugging Face infrastructure. Execute Python scripts, manage scheduled jobs, and monitor job status. | [SKILL.md](skills/hugging-face-jobs/SKILL.md) |\n| `hugging-face-model-trainer` | Train or fine-tune language models using TRL on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes hardware selection, cost estimation, Trackio monitoring, and Hub persistence. | [SKILL.md](skills/hugging-face-model-trainer/SKILL.md) |\n| `hugging-face-paper-pages` | Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata like authors, linked models, datasets, Spaces, and media URLs when needed. 
| [SKILL.md](skills/hugging-face-paper-pages/SKILL.md) |\n| `hugging-face-paper-publisher` | Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles. | [SKILL.md](skills/hugging-face-paper-publisher/SKILL.md) |\n| `hugging-face-tool-builder` | Build reusable scripts for Hugging Face API operations. Useful for chaining API calls or automating repeated tasks. | [SKILL.md](skills/hugging-face-tool-builder/SKILL.md) |\n| `hugging-face-trackio` | Track and visualize ML training experiments with Trackio. Log metrics via Python API and retrieve them via CLI. Supports real-time dashboards synced to HF Spaces. | [SKILL.md](skills/hugging-face-trackio/SKILL.md) |\n| `hugging-face-vision-trainer` | Train and fine-tune object detection models (RTDETRv2, YOLOS, DETR and others) and image classification models (timm and transformers models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3) using Transformers Trainer API on Hugging Face Jobs infrastructure or locally. Includes COCO dataset format support, Albumentations augmentation, mAP/mAR metrics, trackio tracking, hardware selection, and Hub persistence. | [SKILL.md](skills/hugging-face-vision-trainer/SKILL.md) |\n| `transformers-js` | Run state-of-the-art machine learning models directly in JavaScript/TypeScript for NLP, computer vision, audio processing, and multimodal tasks. Works in Node.js and browsers with WebGPU/WASM using Hugging Face models. 
| [SKILL.md](skills/transformers.js/SKILL.md) |\n<!-- END_SKILLS_TABLE -->\n\n### Using skills in your coding agent\n\nOnce a skill is installed, mention it directly while giving your coding agent instructions:\n\n- \"Use the HF LLM trainer skill to estimate the GPU memory needed for a 70B model run.\"\n- \"Use the HF model evaluation skill to launch `run_eval_job.py` on the latest checkpoint.\"\n- \"Use the HF dataset creator skill to draft new few-shot classification templates.\"\n- \"Use the HF paper publisher skill to index my arXiv paper and link it to my model.\"\n\nYour coding agent automatically loads the corresponding `SKILL.md` instructions and helper scripts while it completes the task.\n\n### Contribute or customize a skill\n\n1. Copy one of the existing skill folders (for example, `skills/hugging-face-datasets/`) and rename it.\n2. Update the new folder's `SKILL.md` frontmatter:\n   ```markdown\n   ---\n   name: my-skill-name\n   description: Describe what the skill does and when to use it\n   ---\n\n   # Skill Title\n   Guidance + examples + guardrails\n   ```\n3. Add or edit supporting scripts, templates, and documents referenced by your instructions.\n4. Add an entry to `.claude-plugin/marketplace.json` with a concise, human-readable description.\n5. Run:\n   ```bash\n   ./scripts/publish.sh\n   ```\n   to regenerate and validate all generated metadata.\n6. Reinstall or reload the skill bundle in your coding agent so the updated folder is available.\n\n### Marketplace\n\nThe `.claude-plugin/marketplace.json` file lists skills with human-readable descriptions for the plugin marketplace. 
The CI validates that skill names and paths match between `SKILL.md` files and `marketplace.json`, but descriptions are maintained separately: `SKILL.md` descriptions guide when Claude activates the skill, while marketplace descriptions are written for humans browsing available skills.\n\n### Additional references\n- Browse the latest instructions, scripts, and templates directly at [huggingface/skills](https://github.com/huggingface/skills).\n- Review Hugging Face documentation for the specific libraries or workflows you reference inside each skill.\n"
  },
  {
    "path": "agents/AGENTS.md",
    "content": "<skills>\n\nYou have additional SKILLs documented in directories containing a \"SKILL.md\" file.\n\nThese skills are:\n - gradio -> \"skills/huggingface-gradio/SKILL.md\"\n - hf-cli -> \"skills/hf-cli/SKILL.md\"\n - hugging-face-dataset-viewer -> \"skills/hugging-face-dataset-viewer/SKILL.md\"\n - hugging-face-datasets -> \"skills/hugging-face-datasets/SKILL.md\"\n - hugging-face-evaluation -> \"skills/hugging-face-evaluation/SKILL.md\"\n - hugging-face-jobs -> \"skills/hugging-face-jobs/SKILL.md\"\n - hugging-face-model-trainer -> \"skills/hugging-face-model-trainer/SKILL.md\"\n - hugging-face-paper-pages -> \"skills/hugging-face-paper-pages/SKILL.md\"\n - hugging-face-paper-publisher -> \"skills/hugging-face-paper-publisher/SKILL.md\"\n - hugging-face-tool-builder -> \"skills/hugging-face-tool-builder/SKILL.md\"\n - hugging-face-trackio -> \"skills/hugging-face-trackio/SKILL.md\"\n - hugging-face-vision-trainer -> \"skills/hugging-face-vision-trainer/SKILL.md\"\n - transformers-js -> \"skills/transformers.js/SKILL.md\"\n\nIMPORTANT: You MUST read the SKILL.md file whenever the description of the skills matches the user intent, or may help accomplish their task. \n\n<available_skills>\n\ngradio: `Build Gradio web UIs and demos in Python. Use when creating or editing Gradio apps, components, event listeners, layouts, or chatbots.`\nhf-cli: `\"Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing repositories, models, datasets, and Spaces on the Hugging Face Hub. Replaces now deprecated `huggingface-cli` command.\"`\nhugging-face-dataset-viewer: `Use this skill for Hugging Face Dataset Viewer API workflows that fetch subset/split metadata, paginate rows, search text, apply filters, download parquet URLs, and read size or statistics.`\nhugging-face-datasets: `Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. 
Designed to work alongside HF MCP server for comprehensive dataset workflows.`\nhugging-face-evaluation: `Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.`\nhugging-face-jobs: `This skill should be used when users want to run any workload on Hugging Face Jobs infrastructure. Covers UV scripts, Docker-based jobs, hardware selection, cost estimation, authentication with tokens, secrets management, timeout configuration, and result persistence. Designed for general-purpose compute workloads including data processing, inference, experiments, batch jobs, and any Python-based tasks. Should be invoked for tasks involving cloud compute, GPU workloads, or when users mention running jobs on Hugging Face infrastructure without local setup.`\nhugging-face-model-trainer: `This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.`\nhugging-face-paper-pages: `Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata such as authors, linked models/datasets/spaces, Github repo and project page. 
Use when the user shares a Hugging Face paper page URL, an arXiv URL or ID, or asks to summarize, explain, or analyze an AI research paper.`\nhugging-face-paper-publisher: `Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.`\nhugging-face-tool-builder: `Use this skill when the user wants to build tool/scripts or achieve a task where using data from the Hugging Face API would help. This is especially useful when chaining or combining API calls or the task will be repeated/automated. This Skill creates a reusable script to fetch, enrich or process data.`\nhugging-face-trackio: `Track and visualize ML training experiments with Trackio. Use when logging metrics during training (Python API), firing alerts for training diagnostics, or retrieving/analyzing logged metrics (CLI). Supports real-time dashboard visualization, alerts with webhooks, HF Space syncing, and JSON output for automation.`\nhugging-face-vision-trainer: `Trains and fine-tunes vision models for object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3 — plus any Transformers classifier), and SAM/SAM2 segmentation using Hugging Face Transformers on Hugging Face Jobs cloud GPUs. Covers COCO-format dataset preparation, Albumentations augmentation, mAP/mAR evaluation, accuracy metrics, SAM segmentation with bbox/point prompts, DiceCE loss, hardware selection, cost estimation, Trackio monitoring, and Hub persistence. Use when users mention training object detection, image classification, SAM, SAM2, segmentation, image matting, DETR, D-FINE, RT-DETR, ViT, timm, MobileNet, ResNet, bounding box models, or fine-tuning vision models on Hugging Face Jobs.`\ntransformers-js: `Use Transformers.js to run state-of-the-art machine learning models directly in JavaScript/TypeScript. 
Supports NLP (text classification, translation, summarization), computer vision (image classification, object detection), audio (speech recognition, audio classification), and multimodal tasks. Works in Node.js and browsers (with WebGPU/WASM) using pre-trained models from Hugging Face Hub.`\n</available_skills>\n\nPaths referenced within SKILL folders are relative to that SKILL. For example the hf-datasets `scripts/example.py` would be referenced as `hf-datasets/scripts/example.py`. \n\n</skills>\n"
  },
  {
    "path": "apps/evals-leaderboard/app.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nEvals Leaderboard - Gradio app for displaying model evaluation scores.\n\nReads leaderboard data from the hf-skills/evals-leaderboard dataset.\nRun collect_evals.py separately to update the dataset.\n\nUsage:\n    python app.py\n\"\"\"\n\nfrom __future__ import annotations\n\nimport json\n\nimport gradio as gr\nimport requests\n\nTABLE_HEADERS = [\n    \"Model\",\n    \"Benchmark\",\n    \"Score\",\n    \"Source\",\n]\n\nTABLE_DATATYPES = [\n    \"markdown\",\n    \"text\",\n    \"number\",\n    \"markdown\",\n]\n\n\nDATASET_REPO = \"hf-skills/evals-leaderboard\"\nLEADERBOARD_URL = f\"https://huggingface.co/datasets/{DATASET_REPO}/raw/main/data/leaderboard.jsonl\"\nMETADATA_URL = f\"https://huggingface.co/datasets/{DATASET_REPO}/raw/main/data/metadata.json\"\n\n\ndef format_model_link(model_id: str) -> str:\n    \"\"\"Format model ID as a clickable link.\"\"\"\n    return f\"[{model_id}](https://huggingface.co/{model_id})\"\n\n\ndef format_source_link(source_type: str, contributor: str, source_url: str) -> str:\n    \"\"\"Format source as a clickable link.\"\"\"\n    return f\"{source_type} by [{contributor}]({source_url})\"\n\n\ndef fetch_leaderboard() -> tuple[list[dict], dict]:\n    \"\"\"Fetch leaderboard data from the HF dataset.\"\"\"\n    # Fetch leaderboard JSONL\n    resp = requests.get(LEADERBOARD_URL, timeout=30)\n    resp.raise_for_status()\n    leaderboard = [json.loads(line) for line in resp.text.strip().split(\"\\n\") if line]\n\n    # Fetch metadata\n    resp = requests.get(METADATA_URL, timeout=30)\n    resp.raise_for_status()\n    metadata = resp.json()\n\n    return leaderboard, metadata\n\n\ndef refresh_handler() -> tuple[str, list[list]]:\n    \"\"\"Refresh the leaderboard data from the dataset.\"\"\"\n    try:\n        leaderboard, metadata = fetch_leaderboard()\n\n        # Build table rows\n        rows = []\n        for entry in leaderboard:\n            rows.append(\n                [\n       
             format_model_link(entry[\"model_id\"]),\n                    entry[\"benchmark\"],\n                    entry[\"score\"],\n                    format_source_link(\n                        entry[\"source_type\"],\n                        entry[\"contributor\"],\n                        entry[\"source_url\"],\n                    ),\n                ]\n            )\n\n        status = \"\\n\".join(\n            [\n                f\"**Data from:** [{DATASET_REPO}](https://huggingface.co/datasets/{DATASET_REPO})\",\n                f\"**Last updated:** {metadata.get('generated_at', 'unknown')}\",\n                f\"**Models with scores:** {metadata.get('models_with_scores', 'unknown')}\",\n                f\"**Total entries:** {metadata.get('total_entries', len(leaderboard))}\",\n            ]\n        )\n\n        return status, rows\n\n    except Exception as e:\n        return f\"❌ Failed to load leaderboard: {e}\", []\n\n\nwith gr.Blocks() as demo:\n    gr.Markdown(\n        \"\"\"\n        # 📊 HF Evaluation Leaderboard\n        \n        Shows MMLU, BigCodeBench, and ARC MC scores pulled from model-index\n        metadata or their pull requests for trending text-generation models.\n        \"\"\"\n    )\n\n    status_box = gr.Markdown(\"Loading leaderboard...\")\n\n    leaderboard_table = gr.Dataframe(\n        headers=TABLE_HEADERS,\n        datatype=TABLE_DATATYPES,\n        interactive=False,\n        wrap=True,\n    )\n\n    demo.load(\n        refresh_handler,\n        outputs=[status_box, leaderboard_table],\n    )\n\n    gr.Markdown(\n        f\"\"\"\n        ---\n        \n        **Links:**\n        - [Dataset: {DATASET_REPO}](https://huggingface.co/datasets/{DATASET_REPO})\n        - [GitHub Repository](https://github.com/huggingface/skills)\n        \"\"\"\n    )\n\n\nif __name__ == \"__main__\":\n    demo.launch()\n"
  },
  {
    "path": "apps/evals-leaderboard/collect_evals.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nCollect evaluation scores from trending models' model-index metadata.\n\nScans trending text-generation models on the Hub and extracts benchmark\nscores from their model-index metadata or open pull requests.\n\nResults are saved to a dataset for the evals leaderboard.\n\nUsage:\n    python collect_evals.py [--push-to-hub]\n\"\"\"\n\nfrom __future__ import annotations\n\nimport argparse\nimport json\nimport os\nimport re\nfrom dataclasses import dataclass\nfrom datetime import datetime, timezone\nfrom pathlib import Path\nfrom typing import Any, Dict, List, Optional\n\nimport requests\nimport yaml\nfrom huggingface_hub import hf_hub_download\nfrom huggingface_hub.utils import HfHubHTTPError\n\nAPI_BASE = \"https://huggingface.co/api\"\nPIPELINE_FILTER = \"text-generation\"\nTRENDING_LIMIT = 50\nTRENDING_FETCH_LIMIT = 100\nPR_SCAN_LIMIT = 40\nUSER_AGENT = \"skills-evals-leaderboard/0.3\"\n\n\ndef _normalize(text: Optional[str]) -> str:\n    if not text:\n        return \"\"\n    text = text.lower()\n    text = re.sub(r\"[^a-z0-9]+\", \" \", text)\n    return text.strip()\n\n\ndef _coerce_score(value: Any) -> Optional[float]:\n    if value is None:\n        return None\n    if isinstance(value, (int, float)):\n        return float(value)\n    if isinstance(value, str):\n        candidate = value.strip()\n        if candidate.endswith(\"%\"):\n            candidate = candidate[:-1]\n        try:\n            return float(candidate)\n        except ValueError:\n            return None\n    return None\n\n\n@dataclass(frozen=True)\nclass BenchmarkSpec:\n    key: str\n    label: str\n    aliases: tuple[str, ...]\n\n    def matches(self, fields: List[str]) -> bool:\n        for alias in self.aliases:\n            alias_norm = _normalize(alias)\n            if not alias_norm:\n                continue\n            for field in fields:\n                if alias_norm in field:\n                    return True\n        return 
False\n\n\nBENCHMARKS: Dict[str, BenchmarkSpec] = {\n    \"mmlu\": BenchmarkSpec(\n        key=\"mmlu\",\n        label=\"MMLU\",\n        aliases=(\"mmlu\", \"massive multitask language understanding\"),\n    ),\n    \"bigcodebench\": BenchmarkSpec(\n        key=\"bigcodebench\",\n        label=\"BigCodeBench\",\n        aliases=(\"bigcodebench\", \"big code bench\"),\n    ),\n    \"arc_mc\": BenchmarkSpec(\n        key=\"arc_mc\",\n        label=\"ARC MC\",\n        aliases=(\n            \"arc mc\",\n            \"arc-challenge\",\n            \"arc challenge\",\n            \"arc multiple choice\",\n            \"arc c\",\n        ),\n    ),\n}\n\n\nclass EvalsCollector:\n    \"\"\"Collects evaluation scores from model-index metadata.\"\"\"\n\n    def __init__(self, token: str | None = None) -> None:\n        self.token = token\n        self.session = requests.Session()\n        self.session.headers.update({\"User-Agent\": USER_AGENT})\n        if token:\n            self.session.headers.update({\"Authorization\": f\"Bearer {token}\"})\n        self.logs: List[str] = []\n        self.results: List[Dict[str, Any]] = []\n\n    def log(self, message: str) -> None:\n        \"\"\"Add a log message.\"\"\"\n        print(message)\n        self.logs.append(message)\n\n    def collect_all(self) -> List[Dict[str, Any]]:\n        \"\"\"Collect evaluation scores from trending models.\"\"\"\n        self.log(\"🔍 Fetching trending text-generation models...\")\n        trending = self._fetch_trending_models()\n\n        for entry in trending:\n            repo_id = entry.get(\"modelId\") or entry.get(\"id\")\n            if not repo_id:\n                continue\n            scores = self._collect_scores(repo_id)\n            if scores[\"scores\"]:\n                self.results.extend(self._format_scores(repo_id, scores[\"scores\"]))\n\n        self.log(f\"✅ Collected {len(self.results)} evaluation entries\")\n        return self.results\n\n    def 
_fetch_trending_models(self) -> List[Dict[str, Any]]:\n        params = {\"sort\": \"trendingScore\", \"limit\": TRENDING_FETCH_LIMIT}\n        response = self.session.get(\n            f\"{API_BASE}/models\",\n            params=params,\n            timeout=30,\n        )\n        response.raise_for_status()\n        data = response.json()\n        if not isinstance(data, list):\n            raise ValueError(\"Unexpected trending response.\")\n        filtered = [\n            model\n            for model in data\n            if (model.get(\"pipeline_tag\") == PIPELINE_FILTER or PIPELINE_FILTER in (model.get(\"tags\") or []))\n        ]\n        if not filtered:\n            self.log(\"⚠️ No text-generation models in trending feed.\")\n            return []\n        limited = filtered[:TRENDING_LIMIT]\n        self.log(f\"📊 Found {len(limited)} trending text-generation models\")\n        return limited\n\n    def _collect_scores(self, repo_id: str) -> Dict[str, Any]:\n        owner = repo_id.split(\"/\")[0]\n        card_meta = self._read_model_card(repo_id)\n        model_index = card_meta.get(\"model-index\")\n        if model_index:\n            self.log(f\"✅ {repo_id}: model card metadata found.\")\n            scores = self._extract_scores(\n                repo_id=repo_id,\n                model_index=model_index,\n                contributor=owner,\n                source_type=\"model-card\",\n                source_url=f\"https://huggingface.co/{repo_id}\",\n                revision=\"main\",\n            )\n            if scores:\n                return {\"model_id\": repo_id, \"scores\": scores}\n\n        prs = self._fetch_pull_requests(repo_id)\n        for pr in prs:\n            revision = f\"refs/pr/{pr['num']}\"\n            pr_meta = self._read_model_card(repo_id, revision=revision)\n            pr_index = pr_meta.get(\"model-index\")\n            if not pr_index:\n                continue\n            author_info = pr.get(\"author\", {}) or {}\n  
          contributor = author_info.get(\"name\") or author_info.get(\"fullname\") or \"unknown-author\"\n            discussion_path = f\"{repo_id}/discussions/{pr['num']}\"\n            source_url = f\"https://huggingface.co/{discussion_path}\"\n            scores = self._extract_scores(\n                repo_id=repo_id,\n                model_index=pr_index,\n                contributor=contributor,\n                source_type=\"pull-request\",\n                source_url=source_url,\n                revision=revision,\n            )\n            if scores:\n                note = f\"📝 {repo_id}: PR #{pr['num']} by {contributor}.\"\n                self.log(note)\n                return {\"model_id\": repo_id, \"scores\": scores}\n\n        self.log(f\"⚠️ {repo_id}: no target benchmarks located.\")\n        return {\"model_id\": repo_id, \"scores\": {}}\n\n    def _read_model_card(\n        self,\n        repo_id: str,\n        revision: Optional[str] = None,\n    ) -> Dict[str, Any]:\n        try:\n            path = hf_hub_download(\n                repo_id=repo_id,\n                filename=\"README.md\",\n                repo_type=\"model\",\n                revision=revision,\n                token=self.token,\n            )\n        except HfHubHTTPError as err:\n            ctx = f\"{repo_id} ({revision or 'main'})\"\n            self.log(f\"🚫 {ctx}: README download failed ({err}).\")\n            return {}\n        text = Path(path).read_text(encoding=\"utf-8\", errors=\"ignore\")\n        return self._parse_front_matter(text)\n\n    @staticmethod\n    def _parse_front_matter(content: str) -> Dict[str, Any]:\n        content = content.lstrip(\"\\ufeff\")\n        if not content.startswith(\"---\"):\n            return {}\n        lines = content.splitlines()\n        end_idx = None\n        for idx, line in enumerate(lines[1:], start=1):\n            if line.strip() == \"---\":\n                end_idx = idx\n                break\n        if end_idx is 
None:\n            return {}\n        front_matter = \"\\n\".join(lines[1:end_idx])\n        try:\n            data = yaml.safe_load(front_matter) or {}\n            return data if isinstance(data, dict) else {}\n        except yaml.YAMLError:\n            return {}\n\n    def _fetch_pull_requests(self, repo_id: str) -> List[Dict[str, Any]]:\n        url = f\"{API_BASE}/models/{repo_id}/discussions\"\n        try:\n            response = self.session.get(\n                url,\n                params={\"limit\": PR_SCAN_LIMIT},\n                timeout=30,\n            )\n            response.raise_for_status()\n        except requests.RequestException as err:\n            self.log(f\"🚫 {repo_id}: PR list request failed ({err}).\")\n            return []\n\n        payload = response.json()\n        discussions = payload.get(\"discussions\", [])\n        prs = [disc for disc in discussions if disc.get(\"isPullRequest\")]\n        prs.sort(key=lambda item: item.get(\"createdAt\", \"\"), reverse=True)\n        if prs:\n            self.log(f\"📬 {repo_id}: scanning {len(prs)} pull requests.\")\n        return prs\n\n    def _extract_scores(\n        self,\n        repo_id: str,\n        model_index: Any,\n        contributor: str,\n        source_type: str,\n        source_url: str,\n        revision: str,\n    ) -> Dict[str, Dict[str, Any]]:\n        if not isinstance(model_index, list):\n            return {}\n        scores: Dict[str, Dict[str, Any]] = {}\n        for entry in model_index:\n            if not isinstance(entry, dict):\n                continue\n            model_name = entry.get(\"name\") or repo_id.split(\"/\")[-1]\n            for result in entry.get(\"results\", []):\n                dataset_info = result.get(\"dataset\") or {}\n                dataset_name = dataset_info.get(\"name\")\n                dataset_type = dataset_info.get(\"type\")\n                task_info = result.get(\"task\") or {}\n                task_type = 
task_info.get(\"type\")\n                for metric in result.get(\"metrics\", []):\n                    benchmark_key = self._match_benchmark(\n                        dataset_name,\n                        dataset_type,\n                        metric,\n                    )\n                    if not benchmark_key:\n                        continue\n                    raw_value = metric.get(\"value\")\n                    value = _coerce_score(raw_value)\n                    if value is None:\n                        continue\n                    unit = metric.get(\"unit\") or \"\"\n                    is_pct = isinstance(raw_value, str) and raw_value.strip().endswith(\"%\")\n                    if not unit and is_pct:\n                        unit = \"%\"\n                    metric_name = metric.get(\"name\") or metric.get(\"type\") or \"\"\n                    payload = {\n                        \"model\": repo_id,\n                        \"model_name\": model_name,\n                        \"benchmark_key\": benchmark_key,\n                        \"benchmark_label\": BENCHMARKS[benchmark_key].label,\n                        \"value\": value,\n                        \"unit\": unit,\n                        \"dataset\": dataset_name or dataset_type or \"\",\n                        \"task_type\": task_type or \"\",\n                        \"metric_name\": metric_name,\n                        \"contributor\": contributor,\n                        \"source_type\": source_type,\n                        \"source_url\": source_url,\n                        \"revision\": revision,\n                    }\n                    existing = scores.get(benchmark_key)\n                    if not existing or value > existing[\"value\"]:\n                        scores[benchmark_key] = payload\n        return scores\n\n    def _match_benchmark(\n        self,\n        dataset_name: Optional[str],\n        dataset_type: Optional[str],\n        metric: Dict[str, Any],\n 
   ) -> Optional[str]:\n        fields = [\n            _normalize(dataset_name),\n            _normalize(dataset_type),\n            _normalize(metric.get(\"name\")),\n            _normalize(metric.get(\"type\")),\n        ]\n        fields = [field for field in fields if field]\n        for key, spec in BENCHMARKS.items():\n            if spec.matches(fields):\n                return key\n        return None\n\n    def _format_scores(self, model_id: str, scores: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:\n        \"\"\"Format scores as flat records for the dataset.\"\"\"\n        rows = []\n        for benchmark_key, payload in scores.items():\n            rows.append(\n                {\n                    \"model_id\": model_id,\n                    \"benchmark\": payload[\"benchmark_label\"],\n                    \"benchmark_key\": benchmark_key,\n                    \"score\": round(payload[\"value\"], 2),\n                    \"source_type\": payload[\"source_type\"],\n                    \"source_url\": payload[\"source_url\"],\n                    \"contributor\": payload[\"contributor\"],\n                    \"collected_at\": datetime.now(timezone.utc).isoformat(),\n                }\n            )\n        return rows\n\n    def get_leaderboard(self) -> List[Dict[str, Any]]:\n        \"\"\"Get results sorted by score descending.\"\"\"\n        return sorted(self.results, key=lambda x: x[\"score\"], reverse=True)\n\n    def save_json(self, filepath: str) -> None:\n        \"\"\"Save the leaderboard to a JSON file.\"\"\"\n        leaderboard = self.get_leaderboard()\n        output = {\n            \"generated_at\": datetime.now(timezone.utc).isoformat(),\n            \"total_entries\": len(leaderboard),\n            \"benchmarks\": list(BENCHMARKS.keys()),\n            \"leaderboard\": leaderboard,\n        }\n        with open(filepath, \"w\") as f:\n            json.dump(output, f, indent=2)\n        self.log(f\"💾 Saved leaderboard to 
{filepath}\")\n\n    def push_to_hub(self, repo_id: str = \"hf-skills/evals-leaderboard\") -> None:\n        \"\"\"Push the leaderboard data to a HF dataset.\"\"\"\n        try:\n            from huggingface_hub import HfApi\n        except ImportError:\n            self.log(\"❌ huggingface_hub not installed. Run: pip install huggingface_hub\")\n            return\n\n        api = HfApi(token=self.token)\n        leaderboard = self.get_leaderboard()\n\n        # Create dataset as JSONL\n        jsonl_content = \"\\n\".join(json.dumps(row) for row in leaderboard)\n\n        # Create metadata file\n        metadata = {\n            \"generated_at\": datetime.now(timezone.utc).isoformat(),\n            \"total_entries\": len(leaderboard),\n            \"models_with_scores\": len(set(r[\"model_id\"] for r in leaderboard)),\n            \"benchmarks\": list(BENCHMARKS.keys()),\n        }\n\n        try:\n            # Create repo if it doesn't exist\n            api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=True)\n            self.log(f\"📁 Ensured dataset repo exists: {repo_id}\")\n\n            # Upload leaderboard data\n            api.upload_file(\n                path_or_fileobj=jsonl_content.encode(),\n                path_in_repo=\"data/leaderboard.jsonl\",\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n                commit_message=f\"Update leaderboard - {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')} UTC\",\n            )\n\n            # Upload metadata\n            api.upload_file(\n                path_or_fileobj=json.dumps(metadata, indent=2).encode(),\n                path_in_repo=\"data/metadata.json\",\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n                commit_message=f\"Update metadata - {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')} UTC\",\n            )\n\n            self.log(f\"🚀 Pushed leaderboard to {repo_id}\")\n        except Exception as e:\n 
           self.log(f\"❌ Failed to push to hub: {e}\")\n\n\ndef main() -> None:\n    parser = argparse.ArgumentParser(description=\"Collect evaluation scores from model-index metadata\")\n    parser.add_argument(\n        \"--push-to-hub\",\n        action=\"store_true\",\n        help=\"Push results to HF dataset\",\n    )\n    parser.add_argument(\n        \"--output\",\n        type=str,\n        default=\"leaderboard.json\",\n        help=\"Output JSON file path\",\n    )\n    parser.add_argument(\n        \"--repo-id\",\n        type=str,\n        default=\"hf-skills/evals-leaderboard\",\n        help=\"HF dataset repo ID for pushing\",\n    )\n    args = parser.parse_args()\n\n    token = os.environ.get(\"HF_TOKEN\")\n    if not token:\n        print(\"⚠️ No HF_TOKEN found. Some requests may be rate-limited.\")\n\n    collector = EvalsCollector(token=token)\n    collector.collect_all()\n\n    # Print leaderboard summary\n    print(\"\\n\" + \"=\" * 60)\n    print(\"📊 EVALUATION LEADERBOARD\")\n    print(\"=\" * 60)\n\n    leaderboard = collector.get_leaderboard()\n    for entry in leaderboard[:20]:\n        print(f\"{entry['model_id']:40} | {entry['benchmark']:12} | {entry['score']:6.2f}\")\n\n    if len(leaderboard) > 20:\n        print(f\"   ... and {len(leaderboard) - 20} more entries\")\n\n    print(\"=\" * 60)\n    print(f\"Total entries: {len(leaderboard)}\")\n    print(f\"Models with scores: {len(set(r['model_id'] for r in leaderboard))}\")\n\n    # Save locally\n    collector.save_json(args.output)\n\n    # Push to hub if requested\n    if args.push_to_hub:\n        collector.push_to_hub(args.repo_id)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "apps/evals-leaderboard/requirements.txt",
    "content": "gradio\nrequests\npyyaml\nhuggingface_hub\n\n"
  },
  {
    "path": "apps/hackers-leaderboard/README.md",
    "content": "---\ntitle: Hackers Leaderboard\nemoji: 🏆\ncolorFrom: purple\ncolorTo: pink\nsdk: gradio\nsdk_version: 5.50.0\napp_file: app.py\npinned: false\n---\n\n# Hackers Leaderboard\n\nTracks engagement from the [hf-skills](https://huggingface.co/hf-skills) organization for the hackathon leaderboard.\n\n## How Points Work\n\nSimple and fair - **1 point per activity**:\n\n| Activity | Points |\n|----------|--------|\n| 💬 Open a discussion | 1 |\n| 📝 Post a comment | 1 |\n| 🔀 Open a PR | 1 |\n| 📦 Own/create a repo | 1 |\n\n## Scripts\n\n### Collect Points\n\n```bash\n# Collect org activity only\nHF_TOKEN=$HF_TOKEN python collect_points.py\n\n# Also scan trending repos for member PRs/discussions\nHF_TOKEN=$HF_TOKEN python collect_points.py --scan-external\n\n# Scan only specific repo types\nHF_TOKEN=$HF_TOKEN python collect_points.py --scan-external --repo-type models\nHF_TOKEN=$HF_TOKEN python collect_points.py --scan-external --repo-type models datasets\n\n# Push to HF dataset\nHF_TOKEN=$HF_TOKEN python collect_points.py --scan-external --push-to-hub\n\n# Custom output\npython collect_points.py --output my_leaderboard.json --repo-id my-org/my-dataset\n```\n\n### Options\n\n| Flag | Description |\n|------|-------------|\n| `--scan-external` | Scan trending repos across Hub for member activity |\n| `--repo-type` | Filter external scan to: `models`, `datasets`, `spaces` |\n| `--push-to-hub` | Push results to HF dataset |\n| `--repo-id` | Target dataset repo (default: `hf-skills/hackers-leaderboard`) |\n| `--output` | Local JSON output path |\n\n### Run the App\n\n```bash\nHF_TOKEN=$HF_TOKEN python app.py\n```\n\n## API\n\nThe collector scans:\n- All models, datasets, and spaces in the org\n- All discussions and PRs on those repos\n- All comments on discussions\n\nResults are saved as JSONL for easy dataset consumption.\n\n## Output Format\n\n```json\n{\n  \"username\": \"user123\",\n  \"total_points\": 15,\n  \"discussions_opened\": 3,\n  \"comments_made\": 8,\n 
 \"prs_opened\": 2,\n  \"repos_owned\": 2\n}\n```\n\n"
  },
  {
    "path": "apps/hackers-leaderboard/app.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nHackers Leaderboard - Gradio app for displaying engagement from hf-skills org.\n\nReads leaderboard data from the hf-skills/hackers-leaderboard dataset.\nRun collect_points.py separately to update the dataset.\n\nUsage:\n    python app.py\n\"\"\"\n\nfrom __future__ import annotations\n\nimport json\n\nimport gradio as gr\nimport requests\n\nTABLE_HEADERS = [\n    \"Rank\",\n    \"Username\",\n    \"Points\",\n    \"💬 Discussions\",\n]\n\n# One datatype per column in TABLE_HEADERS\nTABLE_DATATYPES = [\n    \"number\",\n    \"markdown\",\n    \"number\",\n    \"number\",\n]\n\n\nDATASET_REPO = \"hf-skills/hackers-leaderboard\"\nLEADERBOARD_URL = f\"https://huggingface.co/datasets/{DATASET_REPO}/raw/main/data/leaderboard.jsonl\"\nMETADATA_URL = f\"https://huggingface.co/datasets/{DATASET_REPO}/raw/main/data/metadata.json\"\n\n\ndef format_username(username: str) -> str:\n    \"\"\"Format username as a clickable link.\"\"\"\n    return f\"[{username}](https://huggingface.co/{username})\"\n\n\ndef fetch_leaderboard() -> tuple[list[dict], dict]:\n    \"\"\"Fetch leaderboard data from the HF dataset.\"\"\"\n    # Fetch leaderboard JSONL\n    resp = requests.get(LEADERBOARD_URL, timeout=30)\n    resp.raise_for_status()\n    leaderboard = [json.loads(line) for line in resp.text.strip().split(\"\\n\") if line]\n\n    # Fetch metadata\n    resp = requests.get(METADATA_URL, timeout=30)\n    resp.raise_for_status()\n    metadata = resp.json()\n\n    return leaderboard, metadata\n\n\ndef refresh_handler() -> tuple[str, list[list]]:\n    \"\"\"Refresh the leaderboard data from the dataset.\"\"\"\n    try:\n        leaderboard, metadata = fetch_leaderboard()\n\n        # Build table rows; values must follow the column order in TABLE_HEADERS\n        rows = []\n        for i, entry in enumerate(leaderboard, 1):\n            rows.append(\n                [\n                    i,\n                    format_username(entry[\"username\"]),\n                    entry[\"total_points\"],\n                    entry[\"discussions_opened\"],\n                ]\n            )\n\n        status = 
\"\\n\".join(\n            [\n                f\"**Data from:** [{DATASET_REPO}](https://huggingface.co/datasets/{DATASET_REPO})\",\n                f\"**Last updated:** {metadata.get('generated_at', 'unknown')}\",\n                f\"**Participants:** {metadata.get('total_participants', len(leaderboard))}\",\n                f\"**Total points:** {metadata.get('total_points', sum(e['total_points'] for e in leaderboard))}\",\n            ]\n        )\n\n        return status, rows\n\n    except Exception as e:\n        return f\"❌ Failed to load leaderboard: {e}\", []\n\n\nwith gr.Blocks() as demo:\n    gr.HTML(\n        \"\"\"\n        <div class=\"subtitle\">\n            <img src=\"https://github.com/huggingface/skills/raw/main/assets/banner.png\" alt=\"Humanity's Last Hackathon (of 2025)\" width=\"100%\">\n        </div>\n        <div class=\"leaderboard-title\"><h1>🏆 Humanity's Last Hackathon Leaderboard</h1></div>\n        \"\"\"\n    )\n\n    leaderboard_table = gr.Dataframe(\n        headers=TABLE_HEADERS,\n        datatype=TABLE_DATATYPES,\n        interactive=False,\n        wrap=True,\n    )\n\n    status_box = gr.Markdown(\"Click refresh to load the leaderboard...\")\n    \n    demo.load(\n        refresh_handler,\n        outputs=[status_box, leaderboard_table],\n    )\n\n    gr.Markdown(\n        \"\"\"\n        ---\n        \n        **Links:**\n        - [Join hf-skills](https://huggingface.co/organizations/hf-skills/share/KrqrmBxkETjvevFbfkXeezcyMbgMjjMaOp)\n        - [Quest Instructions](https://github.com/huggingface/skills/tree/main/apps/quests)\n        - [GitHub Repository](https://github.com/huggingface/skills)\n        \"\"\"\n    )\n\nif __name__ == \"__main__\":\n    demo.launch()\n"
  },
  {
    "path": "apps/hackers-leaderboard/collect_points.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nCollect engagement points from the hf-skills organization.\n\nTracks user activity across all repos (models, datasets, spaces) and counts:\n- 1 point per discussion opened\n- 1 point per comment made\n- 1 point per PR opened\n- 1 point per repo owned/created\n\nResults are saved to a dataset for the hackers leaderboard.\n\nUsage:\n    HF_TOKEN=$HF_TOKEN python collect_points.py [--push-to-hub]\n\"\"\"\n\nfrom __future__ import annotations\n\nimport argparse\nimport json\nimport os\nfrom dataclasses import dataclass, field\nfrom datetime import datetime, timezone\nfrom typing import Any\n\nimport requests\n\nAPI_BASE = \"https://huggingface.co/api\"\nORG_NAME = \"hf-skills\"\nUSER_AGENT = \"hf-skills-leaderboard/1.0\"\nDISCUSSION_LIMIT = 100  # Max discussions to fetch per repo\nTRENDING_LIMIT = 50  # Number of trending repos to scan for external PRs\n\n\n@dataclass\nclass UserStats:\n    \"\"\"Tracks engagement stats for a single user.\"\"\"\n\n    username: str\n    is_org_member: bool = True\n    discussions_opened: int = 0\n    comments_made: int = 0\n    prs_opened: int = 0\n    repos_owned: int = 0\n    activities: list[dict[str, Any]] = field(default_factory=list)\n\n    @property\n    def total_points(self) -> int:\n        return self.discussions_opened + self.comments_made + self.prs_opened + self.repos_owned\n\n    def to_dict(self) -> dict[str, Any]:\n        return {\n            \"username\": self.username,\n            \"is_org_member\": self.is_org_member,\n            \"total_points\": self.total_points,\n            \"discussions_opened\": self.discussions_opened,\n            \"comments_made\": self.comments_made,\n            \"prs_opened\": self.prs_opened,\n            \"repos_owned\": self.repos_owned,\n        }\n\n\nclass PointsCollector:\n    \"\"\"Collects engagement points from the hf-skills organization.\"\"\"\n\n    def __init__(self, token: str | None = None) -> None:\n        self.token 
= token\n        self.session = requests.Session()\n        self.session.headers.update({\"User-Agent\": USER_AGENT})\n        if token:\n            self.session.headers.update({\"Authorization\": f\"Bearer {token}\"})\n        self.user_stats: dict[str, UserStats] = {}\n        self.logs: list[str] = []\n\n    def log(self, message: str) -> None:\n        \"\"\"Add a log message.\"\"\"\n        print(message)\n        self.logs.append(message)\n\n    def _fetch_org_members(self) -> list[str]:\n        \"\"\"Fetch all members of the organization.\"\"\"\n        try:\n            from huggingface_hub import HfApi\n\n            api = HfApi(token=self.token)\n            members = list(api.list_organization_members(ORG_NAME))\n            usernames = [m.username for m in members if m.username]\n            self.log(f\"👥 Found {len(usernames)} organization members\")\n            return usernames\n        except Exception as e:\n            self.log(f\"⚠️ Failed to fetch org members: {e}\")\n            # Fallback: try the API directly\n            try:\n                url = f\"{API_BASE}/organizations/{ORG_NAME}/members\"\n                response = self.session.get(url, timeout=30)\n                response.raise_for_status()\n                members = response.json()\n                usernames = [m.get(\"user\") or m.get(\"username\") or m.get(\"name\") for m in members]\n                usernames = [u for u in usernames if u]\n                self.log(f\"👥 Found {len(usernames)} organization members (via API)\")\n                return usernames\n            except Exception as e2:\n                self.log(f\"⚠️ Fallback also failed: {e2}\")\n                return []\n\n    def collect_all(self) -> dict[str, UserStats]:\n        \"\"\"Collect points from all repos in the organization.\"\"\"\n        self.log(f\"🔍 Scanning organization: {ORG_NAME}\")\n\n        # First, get all org members and initialize their stats\n        members = 
self._fetch_org_members()\n        for username in members:\n            self.user_stats[username] = UserStats(username=username)\n\n        # Collect from all repo types\n        models = self._list_repos(\"models\")\n        datasets = self._list_repos(\"datasets\")\n        spaces = self._list_repos(\"spaces\")\n\n        all_repos = [\n            *[(r, \"model\") for r in models],\n            *[(r, \"dataset\") for r in datasets],\n            *[(r, \"space\") for r in spaces],\n        ]\n\n        self.log(f\"📦 Found {len(models)} models, {len(datasets)} datasets, {len(spaces)} spaces\")\n\n        for repo_info, repo_type in all_repos:\n            repo_id = repo_info.get(\"id\") or repo_info.get(\"modelId\")\n            if not repo_id:\n                continue\n\n            # Credit repo owner\n            owner = repo_info.get(\"author\") or repo_id.split(\"/\")[0]\n            if owner and owner != ORG_NAME:\n                self._add_point(owner, \"repos_owned\", repo_id, \"repo_created\")\n\n            # Scan discussions\n            self._scan_discussions(repo_id, repo_type)\n\n        return dict(self.user_stats)\n\n    def scan_external_repos(self, repo_types: list[str] | None = None) -> None:\n        \"\"\"Scan trending repos across the Hub for PRs by org members.\n\n        Args:\n            repo_types: List of repo types to scan. Options: \"models\", \"datasets\", \"spaces\".\n                       If None, scans all types.\n        \"\"\"\n        org_members = set(self.user_stats.keys())\n        if not org_members:\n            self.log(\"⚠️ No org members loaded. 
Run collect_all() first.\")\n            return\n\n        if repo_types is None:\n            repo_types = [\"models\", \"datasets\", \"spaces\"]\n\n        self.log(f\"🌐 Scanning trending repos for PRs by {len(org_members)} org members...\")\n        self.log(f\"📂 Repo types: {', '.join(repo_types)}\")\n\n        for repo_type in repo_types:\n            trending = self._fetch_trending(repo_type)\n            self.log(f\"📈 Scanning {len(trending)} trending {repo_type}...\")\n\n            for repo_info in trending:\n                repo_id = repo_info.get(\"id\") or repo_info.get(\"modelId\")\n                if not repo_id:\n                    continue\n\n                # Skip org repos (already scanned)\n                if repo_id.startswith(f\"{ORG_NAME}/\"):\n                    continue\n\n                # Scan for PRs/discussions by each org member using author filter\n                self._scan_repo_for_members(repo_id, repo_type, org_members)\n\n    def _fetch_trending(self, repo_type: str) -> list[dict[str, Any]]:\n        \"\"\"Fetch trending repos of a given type.\"\"\"\n        endpoint = f\"{API_BASE}/{repo_type}\"\n        params = {\"sort\": \"trendingScore\", \"limit\": TRENDING_LIMIT}\n\n        try:\n            response = self.session.get(endpoint, params=params, timeout=30)\n            response.raise_for_status()\n            return response.json()\n        except requests.RequestException as e:\n            self.log(f\"⚠️ Failed to fetch trending {repo_type}: {e}\")\n            return []\n\n    def _scan_repo_for_members(self, repo_id: str, repo_type: str, org_members: set[str]) -> None:\n        \"\"\"Scan a repo's discussions for activity by org members using author filter.\"\"\"\n        # Parse namespace and repo from repo_id\n        parts = repo_id.split(\"/\")\n        if len(parts) != 2:\n            return\n        namespace, repo = parts\n\n        for member in org_members:\n            # Use author filter for efficient 
querying\n            self._fetch_member_discussions(\n                repo_type=repo_type,\n                namespace=namespace,\n                repo=repo,\n                author=member,\n                discussion_type=\"pull_request\",\n            )\n            self._fetch_member_discussions(\n                repo_type=repo_type,\n                namespace=namespace,\n                repo=repo,\n                author=member,\n                discussion_type=\"discussion\",\n            )\n\n    def _fetch_member_discussions(\n        self,\n        repo_type: str,\n        namespace: str,\n        repo: str,\n        author: str,\n        discussion_type: str = \"all\",\n    ) -> None:\n        \"\"\"Fetch discussions for a specific author on a repo.\n\n        Uses: GET /api/{repoType}/{namespace}/{repo}/discussions?author={author}&type={type}\n        \"\"\"\n        url = f\"{API_BASE}/{repo_type}/{namespace}/{repo}/discussions\"\n        params = {\n            \"author\": author,\n            \"type\": discussion_type,\n            \"status\": \"all\",\n        }\n\n        try:\n            response = self.session.get(url, params=params, timeout=30)\n            response.raise_for_status()\n            data = response.json()\n        except requests.RequestException:\n            return\n\n        discussions = data.get(\"discussions\", [])\n        repo_id = f\"{namespace}/{repo}\"\n\n        for discussion in discussions:\n            is_pr = discussion.get(\"isPullRequest\", False)\n            disc_num = discussion.get(\"num\")\n\n            if is_pr:\n                self._add_point(author, \"prs_opened\", repo_id, \"external_pr\", disc_num)\n                self.log(f\"🔀 Found PR by {author} on {repo_id}\")\n            else:\n                self._add_point(author, \"discussions_opened\", repo_id, \"external_discussion\", disc_num)\n                self.log(f\"💬 Found discussion by {author} on {repo_id}\")\n\n            # Count comments on 
the discussion\n            num_comments = discussion.get(\"numComments\", 0)\n            if num_comments > 0:\n                self._fetch_discussion_comments(repo_type, namespace, repo, disc_num, author)\n\n    def _fetch_discussion_comments(\n        self,\n        repo_type: str,\n        namespace: str,\n        repo: str,\n        disc_num: int,\n        target_author: str,\n    ) -> None:\n        \"\"\"Fetch comments on a discussion and count those by target author.\"\"\"\n        url = f\"{API_BASE}/{repo_type}/{namespace}/{repo}/discussions/{disc_num}\"\n\n        try:\n            response = self.session.get(url, timeout=30)\n            response.raise_for_status()\n            data = response.json()\n        except requests.RequestException:\n            return\n\n        repo_id = f\"{namespace}/{repo}\"\n        events = data.get(\"events\", [])\n        for event in events:\n            if event.get(\"type\") == \"comment\":\n                author_info = event.get(\"author\", {}) or {}\n                author = author_info.get(\"name\") or author_info.get(\"fullname\")\n                if author == target_author:\n                    self._add_point(author, \"comments_made\", repo_id, \"external_comment\", disc_num)\n\n    def _list_repos(self, repo_type: str) -> list[dict[str, Any]]:\n        \"\"\"List all repos of a given type in the organization.\"\"\"\n        endpoint = f\"{API_BASE}/{repo_type}\"\n        params = {\"author\": ORG_NAME, \"limit\": 1000}\n\n        try:\n            response = self.session.get(endpoint, params=params, timeout=30)\n            response.raise_for_status()\n            return response.json()\n        except requests.RequestException as e:\n            self.log(f\"⚠️ Failed to list {repo_type}: {e}\")\n            return []\n\n    def _scan_discussions(self, repo_id: str, repo_type: str) -> None:\n        \"\"\"Scan all discussions for a repo and count engagement.\"\"\"\n        # Map repo type to API path\n      
  type_map = {\"model\": \"models\", \"dataset\": \"datasets\", \"space\": \"spaces\"}\n        api_type = type_map.get(repo_type, \"models\")\n\n        url = f\"{API_BASE}/{api_type}/{repo_id}/discussions\"\n\n        try:\n            response = self.session.get(url, params={\"limit\": DISCUSSION_LIMIT}, timeout=30)\n            response.raise_for_status()\n            data = response.json()\n        except requests.RequestException as e:\n            self.log(f\"⚠️ Failed to get discussions for {repo_id}: {e}\")\n            return\n\n        discussions = data.get(\"discussions\", [])\n        if not discussions:\n            return\n\n        self.log(f\"💬 {repo_id}: found {len(discussions)} discussions\")\n\n        for discussion in discussions:\n            self._process_discussion(repo_id, api_type, discussion)\n\n    def _process_discussion(self, repo_id: str, api_type: str, discussion: dict[str, Any]) -> None:\n        \"\"\"Process a single discussion and its comments.\"\"\"\n        author_info = discussion.get(\"author\", {}) or {}\n        author = author_info.get(\"name\") or author_info.get(\"fullname\")\n        is_pr = discussion.get(\"isPullRequest\", False)\n        disc_num = discussion.get(\"num\")\n\n        if author and author != ORG_NAME:\n            activity_type = \"pr_opened\" if is_pr else \"discussion_opened\"\n            point_type = \"prs_opened\" if is_pr else \"discussions_opened\"\n            self._add_point(author, point_type, repo_id, activity_type, disc_num)\n\n        # Fetch discussion details to get comments\n        if disc_num:\n            self._fetch_comments(repo_id, api_type, disc_num)\n\n    def _fetch_comments(self, repo_id: str, api_type: str, disc_num: int) -> None:\n        \"\"\"Fetch and count comments on a discussion.\"\"\"\n        url = f\"{API_BASE}/{api_type}/{repo_id}/discussions/{disc_num}\"\n\n        try:\n            response = self.session.get(url, timeout=30)\n            
response.raise_for_status()\n            data = response.json()\n        except requests.RequestException:\n            # Silently skip failed comment fetches\n            return\n\n        events = data.get(\"events\", [])\n        for event in events:\n            event_type = event.get(\"type\")\n            # Count comments (not the initial post, status changes, etc.)\n            if event_type == \"comment\":\n                author_info = event.get(\"author\", {}) or {}\n                author = author_info.get(\"name\") or author_info.get(\"fullname\")\n                if author and author != ORG_NAME:\n                    self._add_point(author, \"comments_made\", repo_id, \"comment\", disc_num)\n\n    def _add_point(\n        self,\n        username: str,\n        point_type: str,\n        repo_id: str,\n        activity_type: str,\n        disc_num: int | None = None,\n    ) -> None:\n        \"\"\"Add a point to a user's stats.\"\"\"\n        if not username:\n            return\n\n        # Initialize stats for users not in the org (external contributors)\n        if username not in self.user_stats:\n            self.user_stats[username] = UserStats(username=username, is_org_member=False)\n\n        stats = self.user_stats[username]\n        current = getattr(stats, point_type, 0)\n        setattr(stats, point_type, current + 1)\n\n        stats.activities.append(\n            {\n                \"type\": activity_type,\n                \"repo_id\": repo_id,\n                \"discussion_num\": disc_num,\n                \"timestamp\": datetime.now(timezone.utc).isoformat(),\n            }\n        )\n\n    def get_leaderboard(self) -> list[dict[str, Any]]:\n        \"\"\"Get the leaderboard sorted by total points.\"\"\"\n        leaderboard = [stats.to_dict() for stats in self.user_stats.values()]\n        leaderboard.sort(key=lambda x: x[\"total_points\"], reverse=True)\n        return leaderboard\n\n    def save_json(self, filepath: str) -> None:\n   
     \"\"\"Save the leaderboard to a JSON file.\"\"\"\n        leaderboard = self.get_leaderboard()\n        output = {\n            \"generated_at\": datetime.now(timezone.utc).isoformat(),\n            \"organization\": ORG_NAME,\n            \"total_participants\": len(leaderboard),\n            \"leaderboard\": leaderboard,\n        }\n        with open(filepath, \"w\") as f:\n            json.dump(output, f, indent=2)\n        self.log(f\"💾 Saved leaderboard to {filepath}\")\n\n    def push_to_hub(self, repo_id: str = \"hf-skills/hackers-leaderboard\") -> None:\n        \"\"\"Push the leaderboard data to a HF dataset.\"\"\"\n        try:\n            from huggingface_hub import HfApi\n        except ImportError:\n            self.log(\"❌ huggingface_hub not installed. Run: pip install huggingface_hub\")\n            return\n\n        api = HfApi()\n        leaderboard = self.get_leaderboard()\n\n        # Create dataset as JSONL\n        jsonl_content = \"\\n\".join(json.dumps(row) for row in leaderboard)\n\n        # Also create a metadata file\n        metadata = {\n            \"generated_at\": datetime.now(timezone.utc).isoformat(),\n            \"organization\": ORG_NAME,\n            \"total_participants\": len(leaderboard),\n            \"total_points\": sum(row[\"total_points\"] for row in leaderboard),\n        }\n\n        try:\n            # Create repo if it doesn't exist\n            api.create_repo(repo_id=repo_id, repo_type=\"dataset\", exist_ok=True)\n            self.log(f\"📁 Ensured dataset repo exists: {repo_id}\")\n\n            # Upload leaderboard data\n            api.upload_file(\n                path_or_fileobj=jsonl_content.encode(),\n                path_in_repo=\"data/leaderboard.jsonl\",\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n                commit_message=f\"Update leaderboard - {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')} UTC\",\n            )\n\n            # Upload metadata\n       
     api.upload_file(\n                path_or_fileobj=json.dumps(metadata, indent=2).encode(),\n                path_in_repo=\"data/metadata.json\",\n                repo_id=repo_id,\n                repo_type=\"dataset\",\n                commit_message=f\"Update metadata - {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')} UTC\",\n            )\n\n            self.log(f\"🚀 Pushed leaderboard to {repo_id}\")\n        except Exception as e:\n            self.log(f\"❌ Failed to push to hub: {e}\")\n\n\ndef main() -> None:\n    parser = argparse.ArgumentParser(description=\"Collect engagement points from hf-skills organization\")\n    parser.add_argument(\n        \"--push-to-hub\",\n        action=\"store_true\",\n        help=\"Push results to HF dataset\",\n    )\n    parser.add_argument(\n        \"--output\",\n        type=str,\n        default=\"leaderboard.json\",\n        help=\"Output JSON file path\",\n    )\n    parser.add_argument(\n        \"--repo-id\",\n        type=str,\n        default=\"hf-skills/hackers-leaderboard\",\n        help=\"HF dataset repo ID for pushing\",\n    )\n    parser.add_argument(\n        \"--scan-external\",\n        action=\"store_true\",\n        help=\"Also scan trending repos for PRs/discussions by org members\",\n    )\n    parser.add_argument(\n        \"--repo-type\",\n        type=str,\n        nargs=\"+\",\n        choices=[\"models\", \"datasets\", \"spaces\"],\n        default=None,\n        help=\"Repo types to scan (for --scan-external). Default: all types\",\n    )\n    args = parser.parse_args()\n\n    token = os.environ.get(\"HF_TOKEN\")\n    if not token:\n        print(\"⚠️ No HF_TOKEN found. 
Some requests may be rate-limited.\")\n\n    collector = PointsCollector(token=token)\n    collector.collect_all()\n\n    # Optionally scan external repos for member activity\n    if args.scan_external:\n        collector.scan_external_repos(repo_types=args.repo_type)\n\n    # Print leaderboard\n    print(\"\\n\" + \"=\" * 50)\n    print(\"🏆 HACKERS LEADERBOARD\")\n    print(\"=\" * 50)\n\n    leaderboard = collector.get_leaderboard()\n    for i, entry in enumerate(leaderboard[:20], 1):\n        print(\n            f\"{i:2}. {entry['username']:20} - {entry['total_points']:4} points \"\n            f\"(💬{entry['discussions_opened']} 📝{entry['comments_made']} \"\n            f\"🔀{entry['prs_opened']} 📦{entry['repos_owned']})\"\n        )\n\n    if len(leaderboard) > 20:\n        print(f\"   ... and {len(leaderboard) - 20} more participants\")\n\n    print(\"=\" * 50)\n    print(f\"Total participants: {len(leaderboard)}\")\n    print(f\"Total points awarded: {sum(e['total_points'] for e in leaderboard)}\")\n\n    # Save locally\n    collector.save_json(args.output)\n\n    # Push to hub if requested\n    if args.push_to_hub:\n        collector.push_to_hub(args.repo_id)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "apps/hackers-leaderboard/requirements.txt",
    "content": "gradio>=5.50.0\nhuggingface_hub>=1.1.4\nrequests>=2.32.5"
  },
  {
    "path": "apps/quests/02_evaluate-hub-model.md",
    "content": "# Week 1: Evaluate a Hub Model\n\n📣 TASK: Add evaluation results to model cards across the Hub. Together, we're building a distributed leaderboard of open source model performance.\n\n>[!NOTE]\n> Bonus XP for contributing to the leaderboard application. Open a PR [on the hub](https://huggingface.co/spaces/hf-skills/distributed-leaderboard/discussions) or [on GitHub](https://github.com/huggingface/skills/blob/main/apps/evals-leaderboard/app.py) to get your (bonus) XP.\n\n## Why This Matters\n\nModel cards without evaluation data are hard to compare. By adding structured eval results to metadata, we make models easier to compare and review. Your contributions power leaderboards and help the community find the best models for their needs. Also, by doing this in a distributed way, we can share our evaluation results with the community.\n\n## Goals\n\n- Add eval scores to the 100 trending models on the Hub\n- Include AIME 2025, BigBenchHard, LiveCodeBench, MMLU, ARC on trending models.\n- It is ok to include a subset of the benchmarks available for the model.\n- Build a leaderboard application that shows the evaluation results for the trending models.\n\n## XP Tiers\n\nTaking part is simple. We need to get model authors to show evaluation results in their model cards. This is a clean up job!\n\n| Tier            | XP    | Description                                                   | What Counts                                  |\n|-----------------|-------|---------------------------------------------------------------|-----------------------------------------------|\n| 🐢 Contributor  | 1 XP  | Extract evaluation results from one benchmark and update its model card. | Any PR on the repo with evaluation data.      |\n| 🐕 Evaluator    | 5 XP  | Import scores from third-party benchmarks like Artificial Analysis. | Undefined benchmark scores and merged PRs.    |\n| 🦁 Advanced     | 20 XP| Run your own evaluation with inspect-ai and publish results.   
| Original eval run and merged PR.              |\n| 🐉 Bonus        | 20 XP| Contribute to the leaderboard application.                      | Any merged PR on the Hub or GitHub.                  |\n| 🤢 Slop         | -20 XP  | Opening non-useful PRs.                  | Duplicate PRs, incorrect eval scores, incorrect benchmark scores          |\n\n> [!WARNING]\n> This hackathon is about advancing the state of open source AI. We want useful PRs that help everyone out, not just metrics. \n\n## The Skill\n\nUse `hf-evaluation/` for this quest. Key capabilities:\n\n- Extract evaluation tables from existing README content posted by model authors.\n- Import benchmark scores from [Artificial Analysis](https://artificial.com/).\n- Run your own evals with [inspect-ai](https://github.com/UKGovernmentBEIS/inspect_ai) on [HF Jobs](https://huggingface.co/docs/huggingface_hub/en/guides/jobs).\n- Update model-index metadata in the model card.\n\n>[!NOTE]\n> Take a look at the [SKILL.md](https://github.com/huggingface/skills/blob/main/hf-evaluation/SKILL.md) for more details.\n\n### Extract Evaluation Tables from README\n\n1. Pick a Hub model without evaluation data from *trending models* on the Hub\n2. Use the skill to extract or add a benchmark score\n3. Create a PR (or push directly if you own the model)\n\nThe agent will use this script to extract evaluation tables from the model's README.\n\n```bash\npython hf-evaluation/scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"model-author/model-name\" --dry-run\n```\n\n### Import Scores from Artificial Analysis\n\n1. Find a model with benchmark data on external sites\n2. Use `import-aa` to fetch scores from the Artificial Analysis API\n3. 
Create a PR with properly attributed evaluation data\n\nThe agent will use this script to fetch scores from Artificial Analysis API and add them to the model card.\n\n```bash\npython hf-evaluation/scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"anthropic\" --model-name \"claude-sonnet-4\" \\\n  --repo-id \"target/model\" --create-pr\n```\n\n### Run your own evaluation with inspect-ai and publish results.\n\n1. Choose an eval task (MMLU, GSM8K, HumanEval, etc.)\n2. Run the evaluation on HF Jobs infrastructure\n3. Update the model card with your results and methodology\n\nThe agent will use this script to run the evaluation on HF Jobs infrastructure and update the model card with the results.\n\n```bash\nHF_TOKEN=$HF_TOKEN hf jobs uv run hf-evaluation/scripts/inspect_eval_uv.py \\\n  --flavor a10g-small --secret HF_TOKEN=$HF_TOKEN \\\n  -- --model \"meta-llama/Llama-2-7b-hf\" --task \"mmlu\"\n```\n\n## Tips\n\n- Always use `--dry-run` first to preview changes before pushing\n- Check for transposed tables where models are rows and benchmarks are columns\n- Be careful with PRs for models you don't own — most maintainers appreciate eval contributions but be respectful.\n- Manually validate the extracted scores and close PRs if needed.\n\n## Resources\n\n- [SKILL.md](../../hf-evaluation/SKILL.md) — Full skill documentation\n- [Example Usage](../../hf-evaluation/examples/USAGE_EXAMPLES.md) — Worked examples\n- [Metric Mapping](../../hf-evaluation/examples/metric_mapping.json) — Standard metric types\n\n"
  },
  {
    "path": "apps/quests/03_publish-hub-dataset.md",
    "content": "# Week 2: Publish a Hub Dataset\n\nCreate and share high-quality datasets on the Hub. Good data is the foundation of good models—help the community by contributing datasets others can train on.\n\n## Why This Matters\n\nThe best open source models are built on openly available datasets. By publishing well-documented, properly structured datasets, you're directly enabling the next generation of model development. Quality matters more than quantity.\n\n## The Skill\n\nUse `hf-datasets/` for this quest. Key capabilities:\n\n- Initialize dataset repos with proper structure\n- Multi-format support: chat, classification, QA, completion, tabular\n- Template-based validation for data quality\n- Streaming uploads without downloading entire datasets\n\n```bash\n# Quick setup with a template\npython hf-datasets/scripts/dataset_manager.py quick_setup \\\n  --repo_id \"your-username/dataset-name\" --template chat\n```\n\n## XP Tiers\n\n### 🐢 Starter — 50 XP\n\n**Upload a small, clean dataset with a complete dataset card.**\n\n1. Create a dataset with ≤1,000 rows\n2. Write a dataset card covering: license, splits, and data provenance\n3. Upload to the Hub under the hackathon organization (or your own account)\n\n**What counts:** Clean data, clear documentation, proper licensing.\n\n```bash\npython hf-datasets/scripts/dataset_manager.py init \\\n  --repo_id \"hf-skills/your-dataset-name\"\n\npython hf-datasets/scripts/dataset_manager.py add_rows \\\n  --repo_id \"hf-skills/your-dataset-name\" \\\n  --template classification \\\n  --rows_json \"$(cat your_data.json)\"\n```\n\n### 🐕 Standard — 100 XP\n\n**Publish a conversational dataset with a complete dataset card.**\n\n1. Create a dataset with ≤1,000 rows\n2. Write a dataset card covering: license and splits.\n3. 
Upload to the Hub under the hackathon organization.\n\n**What counts:** Clean data, clear documentation, proper licensing.\n\n### 🦁 Advanced — 200 XP\n\n**Translate a dataset into multiple languages and publish it on the Hub.**\n\n1. Find a dataset on the Hub\n2. Translate the dataset into multiple languages\n3. Publish the translated datasets on the Hub under the hackathon organization\n\n**What counts:** Translated datasets and merged PRs.\n\n## Resources\n\n- [SKILL.md](../../hf-datasets/SKILL.md) — Full skill documentation\n- [Templates](../../hf-datasets/templates/) — JSON templates for each format\n- [Examples](../../hf-datasets/examples/) — Sample data and system prompts\n\n---\n\n**Next Quest:** [Supervised Fine-Tuning](04_sft-finetune-hub.md)\n"
  },
  {
    "path": "apps/quests/04_sft-finetune-hub.md",
    "content": "# Week 3: Supervised Fine-Tuning on the Hub\n\nFine-tune and share models on the Hub. Take a base model, train it on your data, and publish the result for the community to use.\n\n## Why This Matters\n\nFine-tuning is how we adapt foundation models to specific tasks. By sharing fine-tuned models—along with your training methodology—you're giving the community ready-to-use solutions and reproducible recipes they can learn from.\n\n## The Skill\n\nUse `hf-llm-trainer/` for this quest. Key capabilities:\n\n- **SFT** (Supervised Fine-Tuning) — Standard instruction tuning\n- **DPO** (Direct Preference Optimization) — Alignment from preference data\n- **GRPO** (Group Relative Policy Optimization) — Online RL training\n- Cloud GPU training on HF Jobs—no local setup required\n- Trackio integration for real-time monitoring\n- GGUF conversion for local deployment\n\nYour coding agent uses `hf_jobs()` to submit training scripts directly to HF infrastructure.\n\n## XP Tiers\n\nWe'll announce the XP tiers for this quest soon.\n\n## Resources\n\n- [SKILL.md](../../hf-llm-trainer/SKILL.md) — Full skill documentation\n- [SFT Example](../../hf-llm-trainer/scripts/train_sft_example.py) — Production SFT template\n- [DPO Example](../../hf-llm-trainer/scripts/train_dpo_example.py) — Production DPO template\n- [GRPO Example](../../hf-llm-trainer/scripts/train_grpo_example.py) — Production GRPO template\n- [Training Methods](../../hf-llm-trainer/references/training_methods.md) — Method selection guide\n- [Hardware Guide](../../hf-llm-trainer/references/hardware_guide.md) — GPU selection\n"
  },
  {
    "path": "apps/quests/README.md",
    "content": "---\ntitle: README\nemoji: 🐠\ncolorFrom: yellow\ncolorTo: gray\nsdk: static\npinned: false\n---\n\n# Humanity's Last Hackathon (of 2025)\n\n<img src=\"https://github.com/huggingface/skills/raw/main/assets/banner.png\" alt=\"Humanity's Last Hackathon (of 2025)\" width=\"100%\">\n\nWelcome to our hackathon!\n\nWhether you’re a tooled-up ML engineer, a classicist NLP dev, or an AGI-pilled vibe coder, this hackathon is going to be hard work! We’re going to take the latest and greatest coding agents and use them to level up open source AI. After all, **why use December to relax and spend time with loved ones, when you can solve AI for all humanity?** Jokes aside, this hackathon is not about learning skills from zero or breaking things down into their simplest components. It’s about collaborating, shipping, and making a difference for the open source community.\n\n## What We're Building\n\nOver four weeks, we're using coding agents to level up the open source AI ecosystem:\n\n- **Week 1** — Evaluate models and build a distributed leaderboard\n- **Week 2** — Create high-quality datasets for the community\n- **Week 3** — Fine-tune and share models on the Hub\n- **Week 4** — Sprint to the finish line together\n\nEvery contribution earns XP. Top contributors make the leaderboard. Winners get prizes!\n\nHere's the schedule:\n\n| Date | Event | Link |\n|------|-------|------|\n| Dec 2 (Mon) | Week 1 Quest Released | [Evaluate a Hub Model](02_evaluate-hub-model.md) |\n| Dec 4 (Wed) | Livestream 1 | [Q&A 1](https://youtube.com/live/rworGSh-Rgk?feature=share) |\n| Dec 9 (Mon) | Week 2 Quest Released | [Publish a Hub Dataset](03_publish-hub-dataset.md) |\n| Dec 11 (Wed) | Livestream 2 | TBA |\n| Dec 16 (Mon) | Week 3 Quest Released | [Supervised Fine-Tuning](04_sft-finetune-hub.md) |\n| Dec 18 (Wed) | Livestream 3 | TBA |\n| Dec 23 (Mon) | Week 4 Community Sprint | TBA |\n| Dec 31 (Tue) | Hackathon Ends | TBA |\n\n## Getting Started\n\n### 1. 
Join the Organization\n\nJoin [hf-skills](https://huggingface.co/organizations/hf-skills/share/KrqrmBxkETjvevFbfkXeezcyMbgMjjMaOp) on Hugging Face. This is where your contributions will be tracked and updated on the leaderboard.\n\n### 2. Set Up Your Coding Agent\n\nUse whatever coding agent you prefer:\n\n- **Claude Code** — `claude` in your terminal\n- **Codex** — `codex` CLI\n- **Gemini CLI** — `gemini` in your terminal\n- **Cursor / Windsurf** — IDE-based agents\n- **Open source** — aider, continue, etc.\n\nThe skills in this repo work with any agent that can read markdown instructions and run Python scripts. To install the skills, follow the instructions in the [README](../README.md).\n\n### 3. Get Your HF Token\n\nMost quests require a Hugging Face token with write access. Install the `hf` CLI, then log in:\n\n```bash\n# mac/linux\ncurl -LsSf https://hf.co/cli/install.sh | bash\n\n# windows\npowershell -ExecutionPolicy ByPass -c \"irm https://hf.co/cli/install.ps1 | iex\"\n\n# Log in (stores your token locally)\nhf auth login\n```\n\nThis stores your token locally for the `hf` CLI and Hugging Face libraries. It does not export an `HF_TOKEN` environment variable, so set one yourself for commands that reference `$HF_TOKEN`.\n\n### 4. Clone the Skills Repo\n\n```bash\ngit clone https://github.com/huggingface/skills.git\ncd skills\n```\n\nPoint your coding agent at the relevant configuration. 
Check the [README](../README.md) for instructions on how to use the skills with your coding agent.\n\n## Your First Quest\n\n**Week 1 is live!** Head to [02_evaluate-hub-model.md](02_evaluate-hub-model.md) to start evaluating models and climb the leaderboard.\n\n<iframe\n\tsrc=\"https://hf-skills-hacker-leaderboard.hf.space\"\n\tframeborder=\"0\"\n\twidth=\"850\"\n\theight=\"450\"\n></iframe>\n\n[Leaderboard](https://hf-skills-hacker-leaderboard.hf.space)\n\n## Earning XP\n\nEach quest has three tiers:\n\n| Tier | What it takes | XP |\n|------|---------------|-----|\n| 🐢 | Complete the basics | 50-75 XP |\n| 🐕 | Go deeper with more features | 100-125 XP |\n| 🦁 | Ship something impressive | 200-225 XP |\n\nYou can complete multiple tiers, and you can complete the same quest multiple times with different models/datasets/spaces.\n\n## Getting Help\n\n- [Discord](https://discord.com/channels/879548962464493619/1442881667986624554) — Join the Hugging Face Discord for real-time help\n- [Livestreams](https://www.youtube.com/@HuggingFace/streams) — Weekly streams with walkthroughs and Q&A\n- [Issues](https://github.com/huggingface/skills/issues) — Open an issue in this repo if you're stuck\n\nTo join the hackathon, join the organization on the Hub and set up your coding agent.\n\nReady? Let's ship some AI. 🚀\n"
  },
  {
    "path": "gemini-extension.json",
    "content": "{\n    \"name\": \"huggingface-skills\",\n    \"description\": \"Provides access to the Hugging Face Skills.\",\n    \"version\": \"0.1.0\",\n    \"contextFileName\": \"agents/AGENTS.md\",\n    \"mcpServers\": {\n        \"huggingface-skills\": {\n            \"httpUrl\": \"https://huggingface.co/mcp?login\"\n        }\n    }\n}"
  },
  {
    "path": "hf-mcp/skills/hf-mcp/SKILL.md",
    "content": "---\nname: hf-mcp\ndescription: Use Hugging Face Hub via MCP server tools. Search models, datasets, Spaces, papers. Get repo details, fetch documentation, run compute jobs, and use Gradio Spaces as AI tools. Available when connected to the HF MCP server.\n---\n\n# Hugging Face MCP Server\n\nConnect AI assistants to the Hugging Face Hub. Setup: https://huggingface.co/settings/mcp\n\n## Use Cases & Examples\n\n### Find the Best Model for a Task\n\n```\nUser: \"Find the best model for code generation\"\n\n1. model_search(task=\"text-generation\", query=\"code\", sort=\"trendingScore\", limit=10)\n2. hub_repo_details(repo_ids=[\"top-result-id\"], include_readme=true)\n```\n\n### Compare Models from Different Providers\n\n```\nUser: \"Compare Llama vs Qwen for text generation\"\n\n1. model_search(author=\"meta-llama\", task=\"text-generation\", sort=\"downloads\", limit=5)\n2. model_search(author=\"Qwen\", task=\"text-generation\", sort=\"downloads\", limit=5)\n3. hub_repo_details(repo_ids=[\"meta-llama/Llama-3.2-1B\", \"Qwen/Qwen3-8B\"], include_readme=true)\n```\n\n### Find Training Datasets\n\n```\nUser: \"Find datasets for sentiment analysis in English\"\n\n1. dataset_search(query=\"sentiment\", tags=[\"language:en\", \"task_categories:text-classification\"], sort=\"downloads\")\n2. hub_repo_details(repo_ids=[\"top-dataset-id\"], repo_type=\"dataset\", include_readme=true)\n```\n\n### Discover AI Tools (MCP Spaces)\n\n```\nUser: \"Find a tool that can remove image backgrounds\"\n\n1. space_search(query=\"background removal\", mcp=true)\n2. dynamic_space(operation=\"view_parameters\", space_name=\"result-space-id\")\n3. dynamic_space(operation=\"invoke\", space_name=\"result-space-id\", parameters=\"{...}\")\n```\n\n### Generate Images\n\n```\nUser: \"Create an image of a robot reading a book\"\n\n1. dynamic_space(operation=\"discover\")  # See available tasks\n2. 
gr1_flux1_schnell_infer(prompt=\"a robot sitting in a library reading a book, warm lighting, detailed\")\n```\n\n### Research a Topic\n\n```\nUser: \"What are the latest papers on RLHF?\"\n\n1. paper_search(query=\"reinforcement learning from human feedback\", results_limit=10)\n2. hub_repo_details(repo_ids=[\"paper-linked-model\"], include_readme=true)  # If paper links to models\n```\n\n### Learn How to Use a Library\n\n```\nUser: \"How do I fine-tune with LoRA using PEFT?\"\n\n1. hf_doc_search(query=\"LoRA fine-tuning\", product=\"peft\")\n2. hf_doc_fetch(doc_url=\"https://huggingface.co/docs/peft/...\")\n```\n\n### Run a Quick GPU Job\n\n```\nUser: \"Run this Python script on a GPU\"\n\nhf_jobs(operation=\"uv\", args={\n  \"script\": \"# /// script\\n# dependencies = [\\\"torch\\\"]\\n# ///\\nimport torch\\nprint(torch.cuda.is_available())\",\n  \"flavor\": \"t4-small\"\n})\n```\n\n### Train a Model on Cloud GPU\n\n```\nUser: \"Run my training script on an A10G\"\n\nhf_jobs(operation=\"run\", args={\n  \"image\": \"pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime\",\n  \"command\": [\"/bin/sh\", \"-lc\", \"pip install transformers trl && python train.py\"],\n  \"flavor\": \"a10g-small\",\n  \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n### Check Job Status\n\n```\nUser: \"What's happening with my training job?\"\n\n1. hf_jobs(operation=\"ps\")\n2. hf_jobs(operation=\"logs\", args={\"job_id\": \"job-xxxxx\"})\n```\n\n### Explore What's Trending\n\n```\nUser: \"What models are trending right now?\"\n\nmodel_search(sort=\"trendingScore\", limit=20)\n```\n\n### Get Model Card Details\n\n```\nUser: \"Tell me about Mistral-7B\"\n\nhub_repo_details(repo_ids=[\"mistralai/Mistral-7B-v0.1\"], include_readme=true)\n```\n\n### Find Quantized Models\n\n```\nUser: \"Find GGUF versions of Llama 3\"\n\nmodel_search(query=\"Llama 3 GGUF\", sort=\"downloads\", limit=10)\n```\n\n### Use a Gradio Space as a Tool\n\n```\nUser: \"Transcribe this audio file\"\n\n1. 
space_search(query=\"speech to text transcription\", mcp=true)\n2. dynamic_space(operation=\"view_parameters\", space_name=\"openai/whisper\")\n3. dynamic_space(operation=\"invoke\", space_name=\"openai/whisper\", parameters=\"{\\\"audio\\\": \\\"...\\\"}\")\n```\n\n### Schedule Recurring Jobs\n\n```\nUser: \"Run this data sync every day at midnight\"\n\nhf_jobs(operation=\"scheduled uv\", args={\n  \"script\": \"...\",\n  \"cron\": \"0 0 * * *\",\n  \"flavor\": \"cpu-basic\"\n})\n```\n\n## Tool Selection Guide\n\n| Goal | Tool |\n|------|------|\n| Find models | `model_search` |\n| Find datasets | `dataset_search` |\n| Find Spaces/apps | `space_search` |\n| Find papers | `paper_search` |\n| Get repo README/details | `hub_repo_details` |\n| Learn library usage | `hf_doc_search` → `hf_doc_fetch` |\n| Run code on GPU/CPU | `hf_jobs` |\n| Use Gradio apps as tools | `dynamic_space` |\n| Generate images | `gr1_flux1_schnell_infer` or `dynamic_space` |\n| Check auth | `hf_whoami` |\n\n## Tips\n\n- Use `sort=\"trendingScore\"` to find what's popular now\n- Use `sort=\"downloads\"` to find battle-tested options\n- Set `mcp=true` in `space_search` to find Spaces usable as tools\n- Use `include_readme=true` in `hub_repo_details` for full model/dataset documentation\n- For jobs accessing private repos, always include `secrets: {\"HF_TOKEN\": \"$HF_TOKEN\"}`\n- Use `dynamic_space(operation=\"discover\")` to see all available Space-based tasks\n"
  },
  {
    "path": "scripts/AGENTS_TEMPLATE.md",
    "content": "<skills>\n\nYou have additional SKILLs documented in directories containing a \"SKILL.md\" file.\n\nThese skills are:\n{{#skills}}\n - {{name}} -> \"{{path}}/SKILL.md\"\n{{/skills}}\n\nIMPORTANT: You MUST read the SKILL.md file whenever the description of the skills matches the user intent, or may help accomplish their task. \n\n<available_skills>\n\n{{#skills}}\n{{name}}: `{{description}}`\n\n{{/skills}}\n</available_skills>\n\nPaths referenced within SKILL folders are relative to that SKILL. For example the hf-datasets `scripts/example.py` would be referenced as `hf-datasets/scripts/example.py`. \n\n</skills>\n"
  },
  {
    "path": "scripts/UV_RULES.md",
    "content": "# UV rules\n\nUse these rules for Python scripts in this repository:\n\n1. **Use PEP 723 inline dependencies** in each runnable script:\n   ```python\n   # /// script\n   # requires-python = \">=3.10\"\n   # dependencies = [\"requests\"]\n   # ///\n   ```\n2. **Run scripts with `uv run`**, not `python ...`:\n   ```bash\n   uv run scripts/my_script.py --help\n   ```\n3. **Do not document `pip install -r requirements.txt` for repo scripts** unless there is a specific fallback reason. Normal usage should not require manual installation.\n4. **Do not tell users to `source .venv/bin/activate` for skill scripts.** `uv run` should be enough.\n5. **If a manual install example is truly needed, use `uv pip install ...`**, not `uv add`, unless you are intentionally editing a project-managed environment.\n6. **For Hugging Face Jobs UV workloads, use `hf jobs uv run ...`**.\n"
  },
  {
    "path": "scripts/generate_agents.py",
    "content": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"Generate AGENTS.md from AGENTS_TEMPLATE.md and SKILL.md frontmatter.\n\nAlso validates that marketplace.json is in sync with discovered skills,\nand updates the skills table in README.md.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport json\nimport re\nimport sys\nfrom pathlib import Path\n\n\nROOT = Path(__file__).resolve().parent.parent\nTEMPLATE_PATH = ROOT / \"scripts\" / \"AGENTS_TEMPLATE.md\"\nOUTPUT_PATH = ROOT / \"agents\" / \"AGENTS.md\"\nMARKETPLACE_PATH = ROOT / \".claude-plugin\" / \"marketplace.json\"\nREADME_PATH = ROOT / \"README.md\"\n\n# Markers for the auto-generated skills table in README\nREADME_TABLE_START = \"<!-- BEGIN_SKILLS_TABLE -->\"\nREADME_TABLE_END = \"<!-- END_SKILLS_TABLE -->\"\n\n\ndef load_template() -> str:\n    return TEMPLATE_PATH.read_text(encoding=\"utf-8\")\n\n\ndef parse_frontmatter(text: str) -> dict[str, str]:\n    \"\"\"Parse a minimal YAML-ish frontmatter block without external deps.\"\"\"\n    match = re.search(r\"^---\\s*\\n(.*?)\\n---\\s*\", text, re.DOTALL)\n    if not match:\n        return {}\n    data: dict[str, str] = {}\n    for line in match.group(1).splitlines():\n        if \":\" not in line:\n            continue\n        key, value = line.split(\":\", 1)\n        data[key.strip()] = value.strip()\n    return data\n\n\ndef collect_skills() -> list[dict[str, str]]:\n    skills: list[dict[str, str]] = []\n    for skill_md in ROOT.glob(\"skills/*/SKILL.md\"):\n        meta = parse_frontmatter(skill_md.read_text(encoding=\"utf-8\"))\n        name = meta.get(\"name\")\n        description = meta.get(\"description\")\n        if not name or not description:\n            continue\n        skills.append(\n            {\n                \"name\": name,\n                \"description\": description,\n                \"path\": str(skill_md.parent.relative_to(ROOT)),\n            }\n        
)\n    # Keep deterministic order for consistent output\n    return sorted(skills, key=lambda s: s[\"name\"].lower())\n\n\ndef render(template: str, skills: list[dict[str, str]]) -> str:\n    \"\"\"Very small Mustache-like renderer that only supports a single skills loop.\"\"\"\n    def repl(match: re.Match[str]) -> str:\n        block = match.group(1).strip(\"\\n\")\n        rendered_blocks = []\n        for skill in skills:\n            rendered = (\n                block.replace(\"{{name}}\", skill[\"name\"])\n                .replace(\"{{description}}\", skill[\"description\"])\n                .replace(\"{{path}}\", skill[\"path\"])\n            )\n            rendered_blocks.append(rendered)\n        return \"\\n\".join(rendered_blocks)\n\n    # Render loop blocks\n    content = re.sub(r\"{{#skills}}(.*?){{/skills}}\", repl, template, flags=re.DOTALL)\n    return content\n\n\ndef load_marketplace() -> dict:\n    \"\"\"Load marketplace.json and return parsed structure.\"\"\"\n    if not MARKETPLACE_PATH.exists():\n        raise FileNotFoundError(f\"marketplace.json not found at {MARKETPLACE_PATH}\")\n    return json.loads(MARKETPLACE_PATH.read_text(encoding=\"utf-8\"))\n\n\ndef generate_readme_table(skills: list[dict[str, str]]) -> str:\n    \"\"\"Generate the skills table for README.md using marketplace.json names.\"\"\"\n    marketplace = load_marketplace()\n    plugins = {p[\"source\"]: p for p in marketplace.get(\"plugins\", [])}\n\n    lines = [\n        \"| Name | Description | Documentation |\",\n        \"|------|-------------|---------------|\",\n    ]\n\n    for skill in skills:\n        source = f\"./{skill['path']}\"\n        plugin = plugins.get(source, {})\n        name = plugin.get(\"name\", skill[\"name\"])\n        description = plugin.get(\"description\", skill[\"description\"])\n        doc_link = f\"[SKILL.md]({skill['path']}/SKILL.md)\"\n        lines.append(f\"| `{name}` | {description} | {doc_link} |\")\n\n    return 
\"\\n\".join(lines)\n\n\ndef update_readme(skills: list[dict[str, str]]) -> bool:\n    \"\"\"\n    Update the README.md skills table between markers.\n    Returns True if the file was updated, False if markers not found.\n    \"\"\"\n    if not README_PATH.exists():\n        print(f\"Warning: README.md not found at {README_PATH}\", file=sys.stderr)\n        return False\n\n    content = README_PATH.read_text(encoding=\"utf-8\")\n\n    start_idx = content.find(README_TABLE_START)\n    end_idx = content.find(README_TABLE_END)\n\n    if start_idx == -1 or end_idx == -1:\n        print(\n            f\"Warning: README.md markers not found. Add {README_TABLE_START} and \"\n            f\"{README_TABLE_END} to enable table generation.\",\n            file=sys.stderr,\n        )\n        return False\n\n    if end_idx < start_idx:\n        print(\"Warning: README.md markers are in wrong order.\", file=sys.stderr)\n        return False\n\n    table = generate_readme_table(skills)\n    new_content = (\n        content[: start_idx + len(README_TABLE_START)]\n        + \"\\n\"\n        + table\n        + \"\\n\"\n        + content[end_idx:]\n    )\n\n    README_PATH.write_text(new_content, encoding=\"utf-8\")\n    return True\n\n\ndef validate_marketplace(skills: list[dict[str, str]]) -> list[str]:\n    \"\"\"\n    Validate marketplace.json against discovered skills.\n    Returns list of error messages (empty = passed).\n    \"\"\"\n    errors: list[str] = []\n    marketplace = load_marketplace()\n    plugins = marketplace.get(\"plugins\", [])\n\n    # Build lookups (normalize paths: skill uses \"skills/x\", marketplace uses \"./skills/x\")\n    skill_by_source = {f\"./{s['path']}\": s for s in skills}\n    plugin_by_source = {p[\"source\"]: p for p in plugins}\n\n    # Check: every skill has a marketplace entry with matching name\n    for skill in skills:\n        expected_source = f\"./{skill['path']}\"\n        if expected_source not in plugin_by_source:\n            
errors.append(\n                f\"Skill '{skill['name']}' at '{skill['path']}' is missing from marketplace.json\"\n            )\n        elif plugin_by_source[expected_source][\"name\"] != skill[\"name\"]:\n            errors.append(\n                f\"Name mismatch at '{expected_source}': \"\n                f\"SKILL.md='{skill['name']}', marketplace.json='{plugin_by_source[expected_source]['name']}'\"\n            )\n\n    # Check: every marketplace plugin has a corresponding skill\n    for plugin in plugins:\n        if plugin[\"source\"] not in skill_by_source:\n            errors.append(\n                f\"Marketplace plugin '{plugin['name']}' at '{plugin['source']}' has no SKILL.md\"\n            )\n\n    return errors\n\n\ndef main() -> None:\n    template = load_template()\n    skills = collect_skills()\n    output = render(template, skills)\n    OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)\n    OUTPUT_PATH.write_text(output, encoding=\"utf-8\")\n    print(f\"Wrote {OUTPUT_PATH} with {len(skills)} skills.\")\n\n    # Validate marketplace.json\n    errors = validate_marketplace(skills)\n    if errors:\n        print(\"\\nMarketplace.json validation errors:\", file=sys.stderr)\n        for error in errors:\n            print(f\"  - {error}\", file=sys.stderr)\n        sys.exit(1)\n    print(\"Marketplace.json validation passed.\")\n\n    # Update README.md skills table\n    if update_readme(skills):\n        print(f\"Updated {README_PATH} skills table.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "scripts/generate_cursor_plugin.py",
    "content": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"Generate Cursor plugin artifacts from existing repo metadata.\n\nOutputs:\n- .cursor-plugin/plugin.json\n- .mcp.json\n\nDesign goals:\n- Keep Claude + Cursor metadata in sync.\n- Reuse .claude-plugin/plugin.json as primary metadata source.\n- Discover skills from skills/*/SKILL.md.\n- Reuse MCP URL from gemini-extension.json when available.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport argparse\nimport json\nimport re\nimport sys\nfrom pathlib import Path\n\n\nROOT = Path(__file__).resolve().parent.parent\nCLAUDE_PLUGIN_MANIFEST = ROOT / \".claude-plugin\" / \"plugin.json\"\nGEMINI_EXTENSION = ROOT / \"gemini-extension.json\"\nCURSOR_PLUGIN_DIR = ROOT / \".cursor-plugin\"\nCURSOR_PLUGIN_MANIFEST = CURSOR_PLUGIN_DIR / \"plugin.json\"\nCURSOR_MCP_CONFIG = ROOT / \".mcp.json\"\n\nDEFAULT_MCP_SERVER_NAME = \"huggingface-skills\"\nDEFAULT_MCP_URL = \"https://huggingface.co/mcp?login\"\n\nPLUGIN_NAME_RE = re.compile(r\"^[a-z0-9](?:[a-z0-9.-]*[a-z0-9])?$\")\n\n\ndef load_json(path: Path) -> dict:\n    if not path.exists():\n        raise FileNotFoundError(f\"Missing required file: {path}\")\n    return json.loads(path.read_text(encoding=\"utf-8\"))\n\n\ndef parse_frontmatter(text: str) -> dict[str, str]:\n    match = re.search(r\"^---\\s*\\n(.*?)\\n---\\s*\", text, re.DOTALL)\n    if not match:\n        return {}\n    data: dict[str, str] = {}\n    for line in match.group(1).splitlines():\n        if \":\" not in line:\n            continue\n        key, value = line.split(\":\", 1)\n        data[key.strip()] = value.strip()\n    return data\n\n\ndef collect_skills() -> list[str]:\n    skills: list[str] = []\n    for skill_md in sorted(ROOT.glob(\"skills/*/SKILL.md\")):\n        meta = parse_frontmatter(skill_md.read_text(encoding=\"utf-8\"))\n        name = meta.get(\"name\", \"\").strip()\n        if not name:\n            continue\n        
skills.append(name)\n    return skills\n\n\ndef validate_plugin_name(name: str) -> None:\n    if not PLUGIN_NAME_RE.match(name):\n        raise ValueError(\n            \"Invalid plugin name in .claude-plugin/plugin.json: \"\n            f\"'{name}'. Must be lowercase and match {PLUGIN_NAME_RE.pattern}\"\n        )\n\n\ndef build_cursor_plugin_manifest() -> dict:\n    src = load_json(CLAUDE_PLUGIN_MANIFEST)\n\n    name = src.get(\"name\")\n    if not isinstance(name, str) or not name:\n        raise ValueError(\".claude-plugin/plugin.json must define a non-empty 'name'\")\n    validate_plugin_name(name)\n\n    skills = collect_skills()\n    if not skills:\n        raise ValueError(\"No skills discovered under skills/*/SKILL.md\")\n\n    manifest: dict = {\"name\": name, \"skills\": \"skills\", \"mcpServers\": \".mcp.json\"}\n\n    # Copy optional metadata fields when present.\n    for key in [\n        \"description\",\n        \"version\",\n        \"author\",\n        \"homepage\",\n        \"repository\",\n        \"license\",\n        \"keywords\",\n        \"logo\",\n    ]:\n        if key in src:\n            manifest[key] = src[key]\n\n    return manifest\n\n\ndef extract_mcp_from_gemini() -> tuple[str, str]:\n    \"\"\"Return (server_name, url) from gemini-extension when available.\"\"\"\n    if not GEMINI_EXTENSION.exists():\n        return DEFAULT_MCP_SERVER_NAME, DEFAULT_MCP_URL\n\n    data = load_json(GEMINI_EXTENSION)\n    servers = data.get(\"mcpServers\")\n    if not isinstance(servers, dict) or not servers:\n        return DEFAULT_MCP_SERVER_NAME, DEFAULT_MCP_URL\n\n    # Use first configured server as source of truth.\n    server_name = next(iter(servers.keys()))\n    server_cfg = servers[server_name]\n    if not isinstance(server_cfg, dict):\n        return DEFAULT_MCP_SERVER_NAME, DEFAULT_MCP_URL\n\n    url = server_cfg.get(\"url\") or server_cfg.get(\"httpUrl\") or DEFAULT_MCP_URL\n    if not isinstance(url, str) or not url.strip():\n        url 
= DEFAULT_MCP_URL\n\n    return server_name, url\n\n\ndef build_mcp_config() -> dict:\n    server_name, url = extract_mcp_from_gemini()\n    return {\n        \"mcpServers\": {\n            server_name: {\n                \"url\": url,\n            }\n        }\n    }\n\n\ndef render_json(data: dict) -> str:\n    return json.dumps(data, indent=2, ensure_ascii=False) + \"\\n\"\n\n\ndef write_or_check(path: Path, content: str, check: bool) -> bool:\n    \"\"\"Return True when file is already up-to-date (or after writing in non-check mode).\"\"\"\n    current = path.read_text(encoding=\"utf-8\") if path.exists() else None\n    if current == content:\n        return True\n\n    if check:\n        return False\n\n    path.parent.mkdir(parents=True, exist_ok=True)\n    path.write_text(content, encoding=\"utf-8\")\n    return True\n\n\ndef main() -> None:\n    parser = argparse.ArgumentParser(description=\"Generate Cursor plugin manifest + MCP config\")\n    parser.add_argument(\n        \"--check\",\n        action=\"store_true\",\n        help=\"Validate generated files are up-to-date without writing changes.\",\n    )\n    args = parser.parse_args()\n\n    plugin_manifest = render_json(build_cursor_plugin_manifest())\n    mcp_config = render_json(build_mcp_config())\n\n    ok_plugin = write_or_check(CURSOR_PLUGIN_MANIFEST, plugin_manifest, check=args.check)\n    ok_mcp = write_or_check(CURSOR_MCP_CONFIG, mcp_config, check=args.check)\n\n    if args.check:\n        outdated = []\n        if not ok_plugin:\n            outdated.append(str(CURSOR_PLUGIN_MANIFEST.relative_to(ROOT)))\n        if not ok_mcp:\n            outdated.append(str(CURSOR_MCP_CONFIG.relative_to(ROOT)))\n\n        if outdated:\n            print(\"Generated Cursor artifacts are out of date:\", file=sys.stderr)\n            for item in outdated:\n                print(f\"  - {item}\", file=sys.stderr)\n            print(\"Run: uv run scripts/generate_cursor_plugin.py\", file=sys.stderr)\n            
sys.exit(1)\n\n        print(\"Cursor plugin artifacts are up to date.\")\n        return\n\n    print(f\"Wrote {CURSOR_PLUGIN_MANIFEST.relative_to(ROOT)}\")\n    print(f\"Wrote {CURSOR_MCP_CONFIG.relative_to(ROOT)}\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "scripts/publish.sh",
    "content": "#!/usr/bin/env bash\nset -euo pipefail\n\nROOT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")/..\" && pwd)\"\ncd \"$ROOT_DIR\"\n\nGENERATED_FILES=(\n  \"agents/AGENTS.md\"\n  \"README.md\"\n  \".cursor-plugin/plugin.json\"\n  \".mcp.json\"\n)\n\nfile_sig() {\n  local path=\"$1\"\n  if [[ -f \"$path\" ]]; then\n    sha256sum \"$path\" | awk '{print $1}'\n  else\n    echo \"__MISSING__\"\n  fi\n}\n\nrun_generate() {\n  uv run scripts/generate_agents.py\n  uv run scripts/generate_cursor_plugin.py\n}\n\nrun_check() {\n  declare -A before\n  local changed=()\n\n  for path in \"${GENERATED_FILES[@]}\"; do\n    before[\"$path\"]=\"$(file_sig \"$path\")\"\n  done\n\n  run_generate\n\n  for path in \"${GENERATED_FILES[@]}\"; do\n    if [[ \"${before[$path]}\" != \"$(file_sig \"$path\")\" ]]; then\n      changed+=(\"$path\")\n    fi\n  done\n\n  if [[ ${#changed[@]} -gt 0 ]]; then\n    echo \"Generated artifacts are outdated.\"\n    echo \"Run: ./scripts/publish.sh\"\n    echo\n    echo \"Changed files:\"\n    for path in \"${changed[@]}\"; do\n      echo \"$path\"\n    done\n    exit 1\n  fi\n\n  # Extra explicit check for cursor-only artifacts\n  uv run scripts/generate_cursor_plugin.py --check\n\n  echo \"All generated artifacts are up to date.\"\n}\n\ncase \"${1:-}\" in\n  \"\")\n    run_generate\n    echo \"Publish artifacts generated successfully.\"\n    ;;\n  \"--check\")\n    run_check\n    ;;\n  \"-h\"|\"--help\")\n    cat <<'EOF'\nUsage:\n  ./scripts/publish.sh         Generate all publish artifacts\n  ./scripts/publish.sh --check Verify generated artifacts are up to date\n\nThis script regenerates:\n  - agents/AGENTS.md\n  - README.md (skills table section)\n  - .cursor-plugin/plugin.json\n  - .mcp.json\nEOF\n    ;;\n  *)\n    echo \"Unknown option: $1\" >&2\n    echo \"Use --help for usage.\" >&2\n    exit 2\n    ;;\nesac\n"
  },
  {
    "path": "scripts/run_skills_help.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nScript to execute all Python programs under \"skills\" directories with `uv run` and --help flag.\n\"\"\"\n\nimport subprocess\nfrom pathlib import Path\n\ndef find_python_files():\n    \"\"\"Find all Python files under skills directories.\"\"\"\n    python_files = []\n    \n    # Search for skills directories and find Python files\n    for skills_dir in Path('.').rglob('../skills'):\n        if skills_dir.is_dir():\n            python_files.extend(skills_dir.rglob('*.py'))\n    \n    return sorted(set(python_files))\n\ndef run_with_help(python_file):\n    \"\"\"Run a Python file with uv run --help.\"\"\"\n    try:\n        print(f\"\\n{'='*60}\")\n        print(f\"Running: {python_file}\")\n        print(f\"{'='*60}\")\n        \n        result = subprocess.run(\n            ['uv', 'run', str(python_file), '--help'],\n            capture_output=True,\n            text=True,\n            timeout=30  # 30 second timeout\n        )\n        \n        if result.returncode == 0:\n            print(\"SUCCESS - Output:\")\n            print(result.stdout)\n        else:\n            print(f\"FAILED - Return code: {result.returncode}\")\n            if result.stderr:\n                print(\"Stderr:\")\n                print(result.stderr)\n            if result.stdout:\n                print(\"Stdout:\")\n                print(result.stdout)\n                \n    except subprocess.TimeoutExpired:\n        print(\"TIMEOUT - Command took too long\")\n    except FileNotFoundError:\n        print(\"ERROR - uv command not found. 
Please install uv.\")\n        return False\n    except Exception as e:\n        print(f\"ERROR - {str(e)}\")\n    \n    return True\n\ndef main():\n    \"\"\"Main function to find and run all Python files.\"\"\"\n    print(\"Finding Python files under skills directories...\")\n    python_files = find_python_files()\n    \n    if not python_files:\n        print(\"No Python files found under skills directories.\")\n        return\n    \n    print(f\"Found {len(python_files)} Python files\")\n    \n    success_count = 0\n    failed_count = 0\n    \n    for python_file in python_files:\n        if run_with_help(python_file):\n            success_count += 1\n        else:\n            failed_count += 1\n    \n    print(f\"\\n{'='*60}\")\n    print(f\"SUMMARY:\")\n    print(f\"Total files: {len(python_files)}\")\n    print(f\"Successful: {success_count}\")\n    print(f\"Failed: {failed_count}\")\n    print(f\"{'='*60}\")\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hf-cli/SKILL.md",
    "content": "---\nname: hf-cli\ndescription: \"Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing repositories, models, datasets, and Spaces on the Hugging Face Hub. Replaces now deprecated `huggingface-cli` command.\"\n---\n\nInstall: `curl -LsSf https://hf.co/cli/install.sh | bash -s`.\n\nThe Hugging Face Hub CLI tool `hf` is available. IMPORTANT: The `hf` command replaces the deprecated `huggingface-cli` command.\n\nUse `hf --help` to view available functions. Note that auth commands are now all under `hf auth` e.g. `hf auth whoami`.\n\nGenerated with `huggingface_hub v1.7.2`. Run `hf skills add --force` to regenerate.\n\n## Commands\n\n- `hf download REPO_ID` — Download files from the Hub. `[--type CHOICE --revision TEXT --include TEXT --exclude TEXT --cache-dir TEXT --local-dir TEXT --force-download --dry-run --quiet --max-workers INTEGER]`\n- `hf env` — Print information about the environment.\n- `hf sync` — Sync files between local directory and a bucket. `[--delete --ignore-times --ignore-sizes --plan TEXT --apply TEXT --dry-run --include TEXT --exclude TEXT --filter-from TEXT --existing --ignore-existing --verbose --quiet]`\n- `hf upload REPO_ID` — Upload a file or a folder to the Hub. Recommended for single-commit uploads. `[--type CHOICE --revision TEXT --private --include TEXT --exclude TEXT --delete TEXT --commit-message TEXT --commit-description TEXT --create-pr --every FLOAT --quiet]`\n- `hf upload-large-folder REPO_ID LOCAL_PATH` — Upload a large folder to the Hub. Recommended for resumable uploads. `[--type CHOICE --revision TEXT --private --include TEXT --exclude TEXT --num-workers INTEGER --no-report --no-bars]`\n- `hf version` — Print information about the hf version.\n\n### `hf auth` — Manage authentication (login, logout, etc.).\n\n- `hf auth list` — List all stored access tokens.\n- `hf auth login` — Login using a token from huggingface.co/settings/tokens. 
`[--add-to-git-credential --force]`\n- `hf auth logout` — Logout from a specific token. `[--token-name TEXT]`\n- `hf auth switch` — Switch between access tokens. `[--token-name TEXT --add-to-git-credential]`\n- `hf auth whoami` — Find out which huggingface.co account you are logged in as. `[--format CHOICE]`\n\n### `hf buckets` — Commands to interact with buckets.\n\n- `hf buckets cp SRC` — Copy a single file to or from a bucket. `[--quiet]`\n- `hf buckets create BUCKET_ID` — Create a new bucket. `[--private --exist-ok --quiet]`\n- `hf buckets delete BUCKET_ID` — Delete a bucket. `[--yes --missing-ok --quiet]`\n- `hf buckets info BUCKET_ID` — Get info about a bucket. `[--quiet]`\n- `hf buckets list` — List buckets or files in a bucket. `[--human-readable --tree --recursive --format CHOICE --quiet]`\n- `hf buckets move FROM_ID TO_ID` — Move (rename) a bucket to a new name or namespace.\n- `hf buckets remove ARGUMENT` — Remove files from a bucket. `[--recursive --yes --dry-run --include TEXT --exclude TEXT --quiet]`\n- `hf buckets sync` — Sync files between local directory and a bucket. `[--delete --ignore-times --ignore-sizes --plan TEXT --apply TEXT --dry-run --include TEXT --exclude TEXT --filter-from TEXT --existing --ignore-existing --verbose --quiet]`\n\n### `hf cache` — Manage local cache directory.\n\n- `hf cache list` — List cached repositories or revisions. `[--cache-dir TEXT --revisions --filter TEXT --format CHOICE --quiet --sort CHOICE --limit INTEGER]`\n- `hf cache prune` — Remove detached revisions from the cache. `[--cache-dir TEXT --yes --dry-run]`\n- `hf cache rm TARGETS` — Remove cached repositories or revisions. `[--cache-dir TEXT --yes --dry-run]`\n- `hf cache verify REPO_ID` — Verify checksums for a single repo revision from cache or a local directory. 
`[--type CHOICE --revision TEXT --cache-dir TEXT --local-dir TEXT --fail-on-missing-files --fail-on-extra-files]`\n\n### `hf collections` — Interact with collections on the Hub.\n\n- `hf collections add-item COLLECTION_SLUG ITEM_ID ITEM_TYPE` — Add an item to a collection. `[--note TEXT --exists-ok]`\n- `hf collections create TITLE` — Create a new collection on the Hub. `[--namespace TEXT --description TEXT --private --exists-ok]`\n- `hf collections delete COLLECTION_SLUG` — Delete a collection from the Hub. `[--missing-ok]`\n- `hf collections delete-item COLLECTION_SLUG ITEM_OBJECT_ID` — Delete an item from a collection. `[--missing-ok]`\n- `hf collections info COLLECTION_SLUG` — Get info about a collection on the Hub. Output is in JSON format.\n- `hf collections list` — List collections on the Hub. `[--owner TEXT --item TEXT --sort CHOICE --limit INTEGER --format CHOICE --quiet]`\n- `hf collections update COLLECTION_SLUG` — Update a collection's metadata on the Hub. `[--title TEXT --description TEXT --position INTEGER --private --theme TEXT]`\n- `hf collections update-item COLLECTION_SLUG ITEM_OBJECT_ID` — Update an item in a collection. `[--note TEXT --position INTEGER]`\n\n### `hf datasets` — Interact with datasets on the Hub.\n\n- `hf datasets info DATASET_ID` — Get info about a dataset on the Hub. Output is in JSON format. `[--revision TEXT --expand TEXT]`\n- `hf datasets list` — List datasets on the Hub. `[--search TEXT --author TEXT --filter TEXT --sort CHOICE --limit INTEGER --expand TEXT --format CHOICE --quiet]`\n- `hf datasets parquet DATASET_ID` — List parquet file URLs available for a dataset. `[--subset TEXT --split TEXT --format CHOICE --quiet]`\n- `hf datasets sql SQL` — Execute a raw SQL query with DuckDB against dataset parquet URLs. `[--format CHOICE]`\n\n### `hf discussions` — Manage discussions and pull requests on the Hub.\n\n- `hf discussions close REPO_ID NUM` — Close a discussion or pull request. 
`[--comment TEXT --yes --type CHOICE]`\n- `hf discussions comment REPO_ID NUM` — Comment on a discussion or pull request. `[--body TEXT --body-file PATH --type CHOICE]`\n- `hf discussions create REPO_ID --title TEXT` — Create a new discussion or pull request on a repo. `[--body TEXT --body-file PATH --pull-request --type CHOICE]`\n- `hf discussions diff REPO_ID NUM` — Show the diff of a pull request. `[--type CHOICE]`\n- `hf discussions info REPO_ID NUM` — Get info about a discussion or pull request. `[--comments --diff --no-color --type CHOICE --format CHOICE]`\n- `hf discussions list REPO_ID` — List discussions and pull requests on a repo. `[--status CHOICE --kind CHOICE --author TEXT --limit INTEGER --type CHOICE --format CHOICE --quiet]`\n- `hf discussions merge REPO_ID NUM` — Merge a pull request. `[--comment TEXT --yes --type CHOICE]`\n- `hf discussions rename REPO_ID NUM NEW_TITLE` — Rename a discussion or pull request. `[--type CHOICE]`\n- `hf discussions reopen REPO_ID NUM` — Reopen a closed discussion or pull request. `[--comment TEXT --yes --type CHOICE]`\n\n### `hf endpoints` — Manage Hugging Face Inference Endpoints.\n\n- `hf endpoints catalog deploy --repo TEXT` — Deploy an Inference Endpoint from the Model Catalog. `[--name TEXT --accelerator TEXT --namespace TEXT]`\n- `hf endpoints catalog list` — List available Catalog models.\n- `hf endpoints delete NAME` — Delete an Inference Endpoint permanently. `[--namespace TEXT --yes]`\n- `hf endpoints deploy NAME --repo TEXT --framework TEXT --accelerator TEXT --instance-size TEXT --instance-type TEXT --region TEXT --vendor TEXT` — Deploy an Inference Endpoint from a Hub repository. `[--namespace TEXT --task TEXT --min-replica INTEGER --max-replica INTEGER --scale-to-zero-timeout INTEGER --scaling-metric CHOICE --scaling-threshold FLOAT]`\n- `hf endpoints describe NAME` — Get information about an existing endpoint. 
`[--namespace TEXT]`\n- `hf endpoints list` — Lists all Inference Endpoints for the given namespace. `[--namespace TEXT --format CHOICE --quiet]`\n- `hf endpoints pause NAME` — Pause an Inference Endpoint. `[--namespace TEXT]`\n- `hf endpoints resume NAME` — Resume an Inference Endpoint. `[--namespace TEXT --fail-if-already-running]`\n- `hf endpoints scale-to-zero NAME` — Scale an Inference Endpoint to zero. `[--namespace TEXT]`\n- `hf endpoints update NAME` — Update an existing endpoint. `[--namespace TEXT --repo TEXT --accelerator TEXT --instance-size TEXT --instance-type TEXT --framework TEXT --revision TEXT --task TEXT --min-replica INTEGER --max-replica INTEGER --scale-to-zero-timeout INTEGER --scaling-metric CHOICE --scaling-threshold FLOAT]`\n\n### `hf extensions` — Manage hf CLI extensions.\n\n- `hf extensions exec NAME` — Execute an installed extension.\n- `hf extensions install REPO_ID` — Install an extension from a public GitHub repository. `[--force]`\n- `hf extensions list` — List installed extension commands. `[--format CHOICE --quiet]`\n- `hf extensions remove NAME` — Remove an installed extension.\n- `hf extensions search` — Search extensions available on GitHub (tagged with 'hf-extension' topic). `[--format CHOICE --quiet]`\n\n### `hf jobs` — Run and manage Jobs on the Hub.\n\n- `hf jobs cancel JOB_ID` — Cancel a Job `[--namespace TEXT]`\n- `hf jobs hardware` — List available hardware options for Jobs\n- `hf jobs inspect JOB_IDS` — Display detailed information on one or more Jobs `[--namespace TEXT]`\n- `hf jobs logs JOB_ID` — Fetch the logs of a Job. `[--follow --tail INTEGER --namespace TEXT]`\n- `hf jobs ps` — List Jobs. `[--all --namespace TEXT --filter TEXT --format TEXT --quiet]`\n- `hf jobs run IMAGE COMMAND` — Run a Job. `[--env TEXT --secrets TEXT --label TEXT --env-file TEXT --secrets-file TEXT --flavor CHOICE --timeout TEXT --detach --namespace TEXT]`\n- `hf jobs scheduled delete SCHEDULED_JOB_ID` — Delete a scheduled Job. 
`[--namespace TEXT]`\n- `hf jobs scheduled inspect SCHEDULED_JOB_IDS` — Display detailed information on one or more scheduled Jobs `[--namespace TEXT]`\n- `hf jobs scheduled ps` — List scheduled Jobs `[--all --namespace TEXT --filter TEXT --format TEXT --quiet]`\n- `hf jobs scheduled resume SCHEDULED_JOB_ID` — Resume (unpause) a scheduled Job. `[--namespace TEXT]`\n- `hf jobs scheduled run SCHEDULE IMAGE COMMAND` — Schedule a Job. `[--suspend --concurrency --env TEXT --secrets TEXT --label TEXT --env-file TEXT --secrets-file TEXT --flavor CHOICE --timeout TEXT --namespace TEXT]`\n- `hf jobs scheduled suspend SCHEDULED_JOB_ID` — Suspend (pause) a scheduled Job. `[--namespace TEXT]`\n- `hf jobs scheduled uv run SCHEDULE SCRIPT` — Run a UV script (local file or URL) on HF infrastructure `[--suspend --concurrency --image TEXT --flavor CHOICE --env TEXT --secrets TEXT --label TEXT --env-file TEXT --secrets-file TEXT --timeout TEXT --namespace TEXT --with TEXT --python TEXT]`\n- `hf jobs stats` — Fetch the resource usage statistics and metrics of Jobs `[--namespace TEXT]`\n- `hf jobs uv run SCRIPT` — Run a UV script (local file or URL) on HF infrastructure `[--image TEXT --flavor CHOICE --env TEXT --secrets TEXT --label TEXT --env-file TEXT --secrets-file TEXT --timeout TEXT --detach --namespace TEXT --with TEXT --python TEXT]`\n\n### `hf models` — Interact with models on the Hub.\n\n- `hf models info MODEL_ID` — Get info about a model on the Hub. Output is in JSON format. `[--revision TEXT --expand TEXT]`\n- `hf models list` — List models on the Hub. `[--search TEXT --author TEXT --filter TEXT --num-parameters TEXT --sort CHOICE --limit INTEGER --expand TEXT --format CHOICE --quiet]`\n\n### `hf papers` — Interact with papers on the Hub.\n\n- `hf papers list` — List daily papers on the Hub. 
`[--date TEXT --sort CHOICE --limit INTEGER --format CHOICE --quiet]`\n\n### `hf repos` — Manage repos on the Hub.\n\n- `hf repos branch create REPO_ID BRANCH` — Create a new branch for a repo on the Hub. `[--revision TEXT --type CHOICE --exist-ok]`\n- `hf repos branch delete REPO_ID BRANCH` — Delete a branch from a repo on the Hub. `[--type CHOICE]`\n- `hf repos create REPO_ID` — Create a new repo on the Hub. `[--type CHOICE --space-sdk TEXT --private --exist-ok --resource-group-id TEXT]`\n- `hf repos delete REPO_ID` — Delete a repo from the Hub. This is an irreversible operation. `[--type CHOICE --missing-ok]`\n- `hf repos delete-files REPO_ID PATTERNS` — Delete files from a repo on the Hub. `[--type CHOICE --revision TEXT --commit-message TEXT --commit-description TEXT --create-pr]`\n- `hf repos duplicate FROM_ID` — Duplicate a repo on the Hub (model, dataset, or Space). `[--type CHOICE --private --exist-ok]`\n- `hf repos move FROM_ID TO_ID` — Move a repository from a namespace to another namespace. `[--type CHOICE]`\n- `hf repos settings REPO_ID` — Update the settings of a repository. `[--gated CHOICE --private --type CHOICE]`\n- `hf repos tag create REPO_ID TAG` — Create a tag for a repo. `[--message TEXT --revision TEXT --type CHOICE]`\n- `hf repos tag delete REPO_ID TAG` — Delete a tag for a repo. `[--yes --type CHOICE]`\n- `hf repos tag list REPO_ID` — List tags for a repo. `[--type CHOICE]`\n\n### `hf skills` — Manage skills for AI assistants.\n\n- `hf skills add` — Download a skill and install it for an AI assistant. `[--claude --codex --cursor --opencode --global --dest PATH --force]`\n- `hf skills preview` — Print the generated SKILL.md to stdout.\n\n### `hf spaces` — Interact with spaces on the Hub.\n\n- `hf spaces dev-mode SPACE_ID` — Enable or disable dev mode on a Space. `[--stop]`\n- `hf spaces hot-reload SPACE_ID` — Hot-reload any Python file of a Space without a full rebuild + restart. 
`[--local-file TEXT --skip-checks --skip-summary]`\n- `hf spaces info SPACE_ID` — Get info about a space on the Hub. Output is in JSON format. `[--revision TEXT --expand TEXT]`\n- `hf spaces list` — List spaces on the Hub. `[--search TEXT --author TEXT --filter TEXT --sort CHOICE --limit INTEGER --expand TEXT --format CHOICE --quiet]`\n\n### `hf webhooks` — Manage webhooks on the Hub.\n\n- `hf webhooks create --watch TEXT` — Create a new webhook. `[--url TEXT --job-id TEXT --domain CHOICE --secret TEXT]`\n- `hf webhooks delete WEBHOOK_ID` — Delete a webhook permanently. `[--yes]`\n- `hf webhooks disable WEBHOOK_ID` — Disable an active webhook.\n- `hf webhooks enable WEBHOOK_ID` — Enable a disabled webhook.\n- `hf webhooks info WEBHOOK_ID` — Show full details for a single webhook as JSON.\n- `hf webhooks list` — List all webhooks for the current user. `[--format CHOICE --quiet]`\n- `hf webhooks update WEBHOOK_ID` — Update an existing webhook. Only provided options are changed. `[--url TEXT --watch TEXT --domain CHOICE --secret TEXT]`\n\n## Common options\n\n- `--format` — Output format: `--format json` (or `--json`) or `--format table` (default).\n- `-q / --quiet` — Minimal output.\n- `--revision` — Git revision id which can be a branch name, a tag, or a commit hash.\n- `--token` — Use a User Access Token. Prefer setting `HF_TOKEN` env var instead of passing `--token`.\n- `--type` — The type of repository (model, dataset, or space).\n\n## Tips\n\n- Use `hf <command> --help` for full options, descriptions, usage, and real-world examples\n- Authenticate with `HF_TOKEN` env var (recommended) or with `--token`\n"
  },
  {
    "path": "skills/hugging-face-dataset-viewer/SKILL.md",
    "content": "---\r\nname: hugging-face-dataset-viewer\r\ndescription: Use this skill for Hugging Face Dataset Viewer API workflows that fetch subset/split metadata, paginate rows, search text, apply filters, download parquet URLs, and read size or statistics.\r\n---\r\n\r\n# Hugging Face Dataset Viewer\r\n\r\nUse this skill to execute read-only Dataset Viewer API calls for dataset exploration and extraction.\r\n\r\n## Core workflow\r\n\r\n1. Optionally validate dataset availability with `/is-valid`.\r\n2. Resolve `config` + `split` with `/splits`.\r\n3. Preview with `/first-rows`.\r\n4. Paginate content with `/rows` using `offset` and `length` (max 100).\r\n5. Use `/search` for text matching and `/filter` for row predicates.\r\n6. Retrieve parquet links via `/parquet` and totals/metadata via `/size` and `/statistics`.\r\n\r\n## Defaults\r\n\r\n- Base URL: `https://datasets-server.huggingface.co`\r\n- Default API method: `GET`\r\n- Query params should be URL-encoded.\r\n- `offset` is 0-based.\r\n- `length` max is usually `100` for row-like endpoints.\r\n- Gated/private datasets require `Authorization: Bearer <HF_TOKEN>`.\r\n\r\n## Dataset Viewer\r\n\r\n- `Validate dataset`: `/is-valid?dataset=<namespace/repo>`\r\n- `List subsets and splits`: `/splits?dataset=<namespace/repo>`\r\n- `Preview first rows`: `/first-rows?dataset=<namespace/repo>&config=<config>&split=<split>`\r\n- `Paginate rows`: `/rows?dataset=<namespace/repo>&config=<config>&split=<split>&offset=<int>&length=<int>`\r\n- `Search text`: `/search?dataset=<namespace/repo>&config=<config>&split=<split>&query=<text>&offset=<int>&length=<int>`\r\n- `Filter with predicates`: `/filter?dataset=<namespace/repo>&config=<config>&split=<split>&where=<predicate>&orderby=<sort>&offset=<int>&length=<int>`\r\n- `List parquet shards`: `/parquet?dataset=<namespace/repo>`\r\n- `Get size totals`: `/size?dataset=<namespace/repo>`\r\n- `Get column statistics`: 
`/statistics?dataset=<namespace/repo>&config=<config>&split=<split>`\r\n- `Get Croissant metadata (if available)`: `/croissant?dataset=<namespace/repo>`\r\n\r\nPagination pattern:\r\n\r\n```bash\r\ncurl \"https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=0&length=100\"\r\ncurl \"https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=100&length=100\"\r\n```\r\n\r\nWhen pagination is partial, use response fields such as `num_rows_total`, `num_rows_per_page`, and `partial` to drive continuation logic.\r\n\r\nSearch/filter notes:\r\n\r\n- `/search` matches string columns (full-text style behavior is internal to the API).\r\n- `/filter` requires predicate syntax in `where` and optional sort in `orderby`.\r\n- Keep filtering and searches read-only and side-effect free.\r\n\r\n## Querying Datasets\r\n\r\nUse `npx parquetlens` with Hub parquet alias paths for SQL querying.\r\n\r\nParquet alias shape:\r\n\r\n```text\r\nhf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet\r\n```\r\n\r\nDerive `<config>`, `<split>`, and `<shard>` from Dataset Viewer `/parquet`:\r\n\r\n```bash\r\ncurl -s \"https://datasets-server.huggingface.co/parquet?dataset=cfahlgren1/hub-stats\" \\\r\n  | jq -r '.parquet_files[] | \"hf://datasets/\\(.dataset)@~parquet/\\(.config)/\\(.split)/\\(.filename)\"'\r\n```\r\n\r\nRun SQL query:\r\n\r\n```bash\r\nnpx -y -p parquetlens -p @parquetlens/sql parquetlens \\\r\n  \"hf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet\" \\\r\n  --sql \"SELECT * FROM data LIMIT 20\"\r\n```\r\n\r\n### SQL export\r\n\r\n- CSV: `--sql \"COPY (SELECT * FROM data LIMIT 1000) TO 'export.csv' (FORMAT CSV, HEADER, DELIMITER ',')\"`\r\n- JSON: `--sql \"COPY (SELECT * FROM data LIMIT 1000) TO 'export.json' (FORMAT JSON)\"`\r\n- Parquet: `--sql \"COPY (SELECT * FROM data LIMIT 1000) TO 'export.parquet' (FORMAT PARQUET)\"`\r\n\r\n## 
Creating and Uploading Datasets\r\n\r\nUse one of these flows depending on dependency constraints.\r\n\r\nZero local dependencies (Hub UI):\r\n\r\n- Create dataset repo in browser: `https://huggingface.co/new-dataset`\r\n- Upload parquet files in the repo \"Files and versions\" page.\r\n- Verify shards appear in Dataset Viewer:\r\n\r\n```bash\r\ncurl -s \"https://datasets-server.huggingface.co/parquet?dataset=<namespace>/<repo>\"\r\n```\r\n\r\nLow dependency CLI flow (`npx @huggingface/hub` / `hfjs`):\r\n\r\n- Set auth token:\r\n\r\n```bash\r\nexport HF_TOKEN=<your_hf_token>\r\n```\r\n\r\n- Upload parquet folder to a dataset repo (auto-creates repo if missing):\r\n\r\n```bash\r\nnpx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data\r\n```\r\n\r\n- Upload as private repo on creation:\r\n\r\n```bash\r\nnpx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data --private\r\n```\r\n\r\nAfter upload, call `/parquet` to discover `<config>/<split>/<shard>` values for querying with `@~parquet`.\r\n"
  },
  {
    "path": "skills/hugging-face-datasets/SKILL.md",
    "content": "---\nname: hugging-face-datasets\ndescription: Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside HF MCP server for comprehensive dataset workflows.\n---\n\n# Overview\nThis skill provides tools to manage datasets on the Hugging Face Hub with a focus on creation, configuration, content management, and SQL-based data manipulation. It is designed to complement the existing Hugging Face MCP server by providing dataset editing and querying capabilities.\n\n## Integration with HF MCP Server\n- **Use HF MCP Server for**: Dataset discovery, search, and metadata retrieval\n- **Use This Skill for**: Dataset creation, content editing, SQL queries, data transformation, and structured data formatting\n\n# Version\n2.1.0\n\n# Dependencies\n# This skill uses PEP 723 scripts with inline dependency management\n# Scripts auto-install requirements when run with: uv run scripts/script_name.py\n\n- uv (Python package manager)\n- Getting Started: See \"Usage Instructions\" below for PEP 723 usage\n\n# Core Capabilities\n\n## 1. Dataset Lifecycle Management\n- **Initialize**: Create new dataset repositories with proper structure\n- **Configure**: Store detailed configuration including system prompts and metadata\n- **Stream Updates**: Add rows efficiently without downloading entire datasets\n\n## 2. SQL-Based Dataset Querying (NEW)\nQuery any Hugging Face dataset using DuckDB SQL via `scripts/sql_manager.py`:\n- **Direct Queries**: Run SQL on datasets using the `hf://` protocol\n- **Schema Discovery**: Describe dataset structure and column types\n- **Data Sampling**: Get random samples for exploration\n- **Aggregations**: Count, histogram, unique values analysis\n- **Transformations**: Filter, join, reshape data with SQL\n- **Export & Push**: Save results locally or push to new Hub repos\n\n## 3. 
Multi-Format Dataset Support\nSupports diverse dataset types through template system:\n- **Chat/Conversational**: Chat templating, multi-turn dialogues, tool usage examples\n- **Text Classification**: Sentiment analysis, intent detection, topic classification\n- **Question-Answering**: Reading comprehension, factual QA, knowledge bases\n- **Text Completion**: Language modeling, code completion, creative writing\n- **Tabular Data**: Structured data for regression/classification tasks\n- **Custom Formats**: Flexible schema definition for specialized needs\n\n## 4. Quality Assurance Features\n- **JSON Validation**: Ensures data integrity during uploads\n- **Batch Processing**: Efficient handling of large datasets\n- **Error Recovery**: Graceful handling of upload failures and conflicts\n\n# Usage Instructions\n\nThe skill includes two Python scripts that use PEP 723 inline dependency management:\n\n> **All paths are relative to the directory containing this SKILL.md\nfile.**\n> Scripts are run with: `uv run scripts/script_name.py [arguments]`\n\n- `scripts/dataset_manager.py` - Dataset creation and management\n- `scripts/sql_manager.py` - SQL-based dataset querying and transformation\n\n### Prerequisites\n- `uv` package manager installed\n- `HF_TOKEN` environment variable must be set with a Write-access token\n\n---\n\n# SQL Dataset Querying (sql_manager.py)\n\nQuery, transform, and push Hugging Face datasets using DuckDB SQL. 
The `hf://` protocol provides direct access to any public dataset (or to private datasets when a token is available).\n\n## Quick Start\n\n```bash\n# Query a dataset\nuv run scripts/sql_manager.py query \\\n  --dataset \"cais/mmlu\" \\\n  --sql \"SELECT * FROM data WHERE subject='nutrition' LIMIT 10\"\n\n# Get dataset schema\nuv run scripts/sql_manager.py describe --dataset \"cais/mmlu\"\n\n# Sample random rows\nuv run scripts/sql_manager.py sample --dataset \"cais/mmlu\" --n 5\n\n# Count rows with filter\nuv run scripts/sql_manager.py count --dataset \"cais/mmlu\" --where \"subject='nutrition'\"\n```\n\n## SQL Query Syntax\n\nUse `data` as the table name in your SQL; it is replaced with the actual `hf://` path:\n\n```sql\n-- Basic select\nSELECT * FROM data LIMIT 10\n\n-- Filtering\nSELECT * FROM data WHERE subject='nutrition'\n\n-- Aggregations\nSELECT subject, COUNT(*) as cnt FROM data GROUP BY subject ORDER BY cnt DESC\n\n-- Column selection and transformation\n-- (DuckDB lists are 1-based and `answer` is a 0-based index, hence the + 1)\nSELECT question, choices[answer + 1] AS correct_answer FROM data\n\n-- Regex matching\nSELECT * FROM data WHERE regexp_matches(question, 'nutrition|diet')\n\n-- String functions\nSELECT regexp_replace(question, '\\n', '') AS cleaned FROM data\n```\n\n## Common Operations\n\n### 1. Explore Dataset Structure\n```bash\n# Get schema\nuv run scripts/sql_manager.py describe --dataset \"cais/mmlu\"\n\n# Get unique values in column\nuv run scripts/sql_manager.py unique --dataset \"cais/mmlu\" --column \"subject\"\n\n# Get value distribution\nuv run scripts/sql_manager.py histogram --dataset \"cais/mmlu\" --column \"subject\" --bins 20\n```\n\n### 2. 
Filter and Transform\n```bash\n# Complex filtering with SQL\nuv run scripts/sql_manager.py query \\\n  --dataset \"cais/mmlu\" \\\n  --sql \"SELECT subject, COUNT(*) as cnt FROM data GROUP BY subject HAVING cnt > 100\"\n\n# Using transform command\nuv run scripts/sql_manager.py transform \\\n  --dataset \"cais/mmlu\" \\\n  --select \"subject, COUNT(*) as cnt\" \\\n  --group-by \"subject\" \\\n  --order-by \"cnt DESC\" \\\n  --limit 10\n```\n\n### 3. Create Subsets and Push to Hub\n```bash\n# Query and push to new dataset\nuv run scripts/sql_manager.py query \\\n  --dataset \"cais/mmlu\" \\\n  --sql \"SELECT * FROM data WHERE subject='nutrition'\" \\\n  --push-to \"username/mmlu-nutrition-subset\" \\\n  --private\n\n# Transform and push\nuv run scripts/sql_manager.py transform \\\n  --dataset \"ibm/duorc\" \\\n  --config \"ParaphraseRC\" \\\n  --select \"question, answers\" \\\n  --where \"LENGTH(question) > 50\" \\\n  --push-to \"username/duorc-long-questions\"\n```\n\n### 4. Export to Local Files\n```bash\n# Export to Parquet\nuv run scripts/sql_manager.py export \\\n  --dataset \"cais/mmlu\" \\\n  --sql \"SELECT * FROM data WHERE subject='nutrition'\" \\\n  --output \"nutrition.parquet\" \\\n  --format parquet\n\n# Export to JSONL\nuv run scripts/sql_manager.py export \\\n  --dataset \"cais/mmlu\" \\\n  --sql \"SELECT * FROM data LIMIT 100\" \\\n  --output \"sample.jsonl\" \\\n  --format jsonl\n```\n\n### 5. Working with Dataset Configs/Splits\n```bash\n# Specify config (subset)\nuv run scripts/sql_manager.py query \\\n  --dataset \"ibm/duorc\" \\\n  --config \"ParaphraseRC\" \\\n  --sql \"SELECT * FROM data LIMIT 5\"\n\n# Specify split\nuv run scripts/sql_manager.py query \\\n  --dataset \"cais/mmlu\" \\\n  --split \"test\" \\\n  --sql \"SELECT COUNT(*) FROM data\"\n\n# Query all splits\nuv run scripts/sql_manager.py query \\\n  --dataset \"cais/mmlu\" \\\n  --split \"*\" \\\n  --sql \"SELECT * FROM data LIMIT 10\"\n```\n\n### 6. 
Raw SQL with Full Paths\nFor complex queries or joining datasets:\n```bash\nuv run scripts/sql_manager.py raw --sql \"\n  SELECT a.*, b.*\n  FROM 'hf://datasets/dataset1@~parquet/default/train/*.parquet' a\n  JOIN 'hf://datasets/dataset2@~parquet/default/train/*.parquet' b\n  ON a.id = b.id\n  LIMIT 100\n\"\n```\n\n## Python API Usage\n\n```python\nfrom sql_manager import HFDatasetSQL\n\nsql = HFDatasetSQL()\n\n# Query\nresults = sql.query(\"cais/mmlu\", \"SELECT * FROM data WHERE subject='nutrition' LIMIT 10\")\n\n# Get schema\nschema = sql.describe(\"cais/mmlu\")\n\n# Sample\nsamples = sql.sample(\"cais/mmlu\", n=5, seed=42)\n\n# Count\ncount = sql.count(\"cais/mmlu\", where=\"subject='nutrition'\")\n\n# Histogram\ndist = sql.histogram(\"cais/mmlu\", \"subject\")\n\n# Filter and transform\nresults = sql.filter_and_transform(\n    \"cais/mmlu\",\n    select=\"subject, COUNT(*) as cnt\",\n    group_by=\"subject\",\n    order_by=\"cnt DESC\",\n    limit=10\n)\n\n# Push to Hub\nurl = sql.push_to_hub(\n    \"cais/mmlu\",\n    \"username/nutrition-subset\",\n    sql=\"SELECT * FROM data WHERE subject='nutrition'\",\n    private=True\n)\n\n# Export locally\nsql.export_to_parquet(\"cais/mmlu\", \"output.parquet\", sql=\"SELECT * FROM data LIMIT 100\")\n\nsql.close()\n```\n\n## HF Path Format\n\nDuckDB uses the `hf://` protocol to access datasets:\n```\nhf://datasets/{dataset_id}@{revision}/{config}/{split}/*.parquet\n```\n\nExamples:\n- `hf://datasets/cais/mmlu@~parquet/default/train/*.parquet`\n- `hf://datasets/ibm/duorc@~parquet/ParaphraseRC/test/*.parquet`\n\nThe `@~parquet` revision provides auto-converted Parquet files for any dataset format.\n\n## Useful DuckDB SQL Functions\n\n```sql\n-- String functions\nLENGTH(column)                    -- String length\nregexp_replace(col, '\\n', '')     -- Regex replace\nregexp_matches(col, 'pattern')    -- Regex match\nLOWER(col), UPPER(col)            -- Case conversion\n\n-- Array functions\nchoices[1]                        -- Array indexing (1-based)\narray_length(choices)             -- Array length\nunnest(choices)                   -- Expand array to rows\n\n-- Aggregations\nCOUNT(*), SUM(col), AVG(col)\nGROUP BY col HAVING condition\n\n-- Sampling\nUSING SAMPLE 10                   -- Random sample\nUSING SAMPLE 10 (RESERVOIR, 42)   -- Reproducible sample\n\n-- Window functions\nROW_NUMBER() OVER (PARTITION BY col ORDER BY col2)\n```\n\n---\n\n# Dataset Creation (dataset_manager.py)\n\n### Recommended Workflow\n\n**1. Discovery (Use HF MCP Server):**\n```python\n# Use HF MCP tools to find existing datasets\nsearch_datasets(\"conversational AI training\")\nget_dataset_details(\"username/dataset-name\")\n```\n\n**2. Creation (Use This Skill):**\n```bash\n# Initialize new dataset\nuv run scripts/dataset_manager.py init --repo_id \"your-username/dataset-name\" [--private]\n\n# Configure with detailed system prompt\nuv run scripts/dataset_manager.py config --repo_id \"your-username/dataset-name\" --system_prompt \"$(cat system_prompt.txt)\"\n```\n\n**3. Content Management (Use This Skill):**\n```bash\n# Quick setup with any template\nuv run scripts/dataset_manager.py quick_setup \\\n  --repo_id \"your-username/dataset-name\" \\\n  --template classification\n\n# Add data with template validation\nuv run scripts/dataset_manager.py add_rows \\\n  --repo_id \"your-username/dataset-name\" \\\n  --template qa \\\n  --rows_json \"$(cat your_qa_data.json)\"\n```\n\n### Template-Based Data Structures\n\n**1. Chat Template (`--template chat`)**\n```json\n{\n  \"messages\": [\n    {\"role\": \"user\", \"content\": \"Natural user request\"},\n    {\"role\": \"assistant\", \"content\": \"Response with tool usage\"},\n    {\"role\": \"tool\", \"content\": \"Tool response\", \"tool_call_id\": \"call_123\"}\n  ],\n  \"scenario\": \"Description of use case\",\n  \"complexity\": \"simple|intermediate|advanced\"\n}\n```\n\n**2. 
Classification Template (`--template classification`)**\n```json\n{\n  \"text\": \"Input text to be classified\",\n  \"label\": \"classification_label\",\n  \"confidence\": 0.95,\n  \"metadata\": {\"domain\": \"technology\", \"language\": \"en\"}\n}\n```\n\n**3. QA Template (`--template qa`)**\n```json\n{\n  \"question\": \"What is the question being asked?\",\n  \"answer\": \"The complete answer\",\n  \"context\": \"Additional context if needed\",\n  \"answer_type\": \"factual|explanatory|opinion\",\n  \"difficulty\": \"easy|medium|hard\"\n}\n```\n\n**4. Completion Template (`--template completion`)**\n```json\n{\n  \"prompt\": \"The beginning text or context\",\n  \"completion\": \"The expected continuation\",\n  \"domain\": \"code|creative|technical|conversational\",\n  \"style\": \"description of writing style\"\n}\n```\n\n**5. Tabular Template (`--template tabular`)**\n```json\n{\n  \"columns\": [\n    {\"name\": \"feature1\", \"type\": \"numeric\", \"description\": \"First feature\"},\n    {\"name\": \"target\", \"type\": \"categorical\", \"description\": \"Target variable\"}\n  ],\n  \"data\": [\n    {\"feature1\": 123, \"target\": \"class_a\"},\n    {\"feature1\": 456, \"target\": \"class_b\"}\n  ]\n}\n```\n\n### Advanced System Prompt Template\n\nFor high-quality training data generation:\n```text\nYou are an AI assistant expert at using MCP tools effectively.\n\n## MCP SERVER DEFINITIONS\n[Define available servers and tools]\n\n## TRAINING EXAMPLE STRUCTURE\n[Specify exact JSON schema for chat templating]\n\n## QUALITY GUIDELINES\n[Detail requirements for realistic scenarios, progressive complexity, proper tool usage]\n\n## EXAMPLE CATEGORIES\n[List development workflows, debugging scenarios, data management tasks]\n```\n\n### Example Categories & Templates\n\nThe skill includes diverse training examples beyond just MCP usage:\n\n**Available Example Sets:**\n- `training_examples.json` - MCP tool usage examples (debugging, project setup, database 
analysis)\n- `diverse_training_examples.json` - Broader scenarios including:\n  - **Educational Chat** - Explaining programming concepts, tutorials\n  - **Git Workflows** - Feature branches, version control guidance\n  - **Code Analysis** - Performance optimization, architecture review\n  - **Content Generation** - Professional writing, creative brainstorming\n  - **Codebase Navigation** - Legacy code exploration, systematic analysis\n  - **Conversational Support** - Problem-solving, technical discussions\n\n**Using Different Example Sets:**\n```bash\n# Add MCP-focused examples\nuv run scripts/dataset_manager.py add_rows --repo_id \"your-username/dataset-name\" \\\n  --rows_json \"$(cat examples/training_examples.json)\"\n\n# Add diverse conversational examples\nuv run scripts/dataset_manager.py add_rows --repo_id \"your-username/dataset-name\" \\\n  --rows_json \"$(cat examples/diverse_training_examples.json)\"\n\n# Mix both for comprehensive training data\nuv run scripts/dataset_manager.py add_rows --repo_id \"your-username/dataset-name\" \\\n  --rows_json \"$(jq -s '.[0] + .[1]' examples/training_examples.json examples/diverse_training_examples.json)\"\n```\n\n### Commands Reference\n\n**List Available Templates:**\n```bash\nuv run scripts/dataset_manager.py list_templates\n```\n\n**Quick Setup (Recommended):**\n```bash\nuv run scripts/dataset_manager.py quick_setup --repo_id \"your-username/dataset-name\" --template classification\n```\n\n**Manual Setup:**\n```bash\n# Initialize repository\nuv run scripts/dataset_manager.py init --repo_id \"your-username/dataset-name\" [--private]\n\n# Configure with system prompt\nuv run scripts/dataset_manager.py config --repo_id \"your-username/dataset-name\" --system_prompt \"Your prompt here\"\n\n# Add data with validation\nuv run scripts/dataset_manager.py add_rows \\\n  --repo_id \"your-username/dataset-name\" \\\n  --template qa \\\n  --rows_json '[{\"question\": \"What is AI?\", \"answer\": \"Artificial 
Intelligence...\"}]'\n```\n\n**View Dataset Statistics:**\n```bash\nuv run scripts/dataset_manager.py stats --repo_id \"your-username/dataset-name\"\n```\n\n### Error Handling\n- **Repository exists**: Script will notify and continue with configuration\n- **Invalid JSON**: Clear error message with parsing details\n- **Network issues**: Automatic retry for transient failures\n- **Token permissions**: Validation before operations begin\n\n---\n\n# Combined Workflow Examples\n\n## Example 1: Create Training Subset from Existing Dataset\n```bash\n# 1. Explore the source dataset\nuv run scripts/sql_manager.py describe --dataset \"cais/mmlu\"\nuv run scripts/sql_manager.py histogram --dataset \"cais/mmlu\" --column \"subject\"\n\n# 2. Query and create subset\nuv run scripts/sql_manager.py query \\\n  --dataset \"cais/mmlu\" \\\n  --sql \"SELECT * FROM data WHERE subject IN ('nutrition', 'anatomy', 'clinical_knowledge')\" \\\n  --push-to \"username/mmlu-medical-subset\" \\\n  --private\n```\n\n## Example 2: Transform and Reshape Data\n```bash\n# Transform MMLU to QA format with correct answers extracted\n# (answer is a 0-based index; DuckDB lists are 1-based, hence answer + 1)\nuv run scripts/sql_manager.py query \\\n  --dataset \"cais/mmlu\" \\\n  --sql \"SELECT question, choices[answer + 1] as correct_answer, subject FROM data\" \\\n  --push-to \"username/mmlu-qa-format\"\n```\n\n## Example 3: Merge Multiple Dataset Splits\n```bash\n# Export multiple splits and combine\nuv run scripts/sql_manager.py export \\\n  --dataset \"cais/mmlu\" \\\n  --split \"*\" \\\n  --output \"mmlu_all.parquet\"\n```\n\n## Example 4: Quality Filtering\n```bash\n# Filter for high-quality examples\nuv run scripts/sql_manager.py query \\\n  --dataset \"squad\" \\\n  --sql \"SELECT * FROM data WHERE LENGTH(context) > 500 AND LENGTH(question) > 20\" \\\n  --push-to \"username/squad-filtered\"\n```\n\n## Example 5: Create Custom Training Dataset\n```bash\n# 1. 
Query source data\nuv run scripts/sql_manager.py export \\\n  --dataset \"cais/mmlu\" \\\n  --sql \"SELECT question, subject FROM data WHERE subject='nutrition'\" \\\n  --output \"nutrition_source.jsonl\" \\\n  --format jsonl\n\n# 2. Process with your pipeline (add answers, format, etc.)\n\n# 3. Push processed data\nuv run scripts/dataset_manager.py init --repo_id \"username/nutrition-training\"\nuv run scripts/dataset_manager.py add_rows \\\n  --repo_id \"username/nutrition-training\" \\\n  --template qa \\\n  --rows_json \"$(cat processed_data.json)\"\n```\n"
  },
  {
    "path": "skills/hugging-face-datasets/examples/diverse_training_examples.json",
    "content": "[\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"Hi there! I'm working on a Python project and getting stuck with some async programming concepts. Can you help explain how to use async/await properly?\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I'd be happy to help you understand async/await in Python! These are powerful tools for writing concurrent code. Let me show you some practical examples and explain the key concepts.\\n\\n**Basic async/await syntax:**\\n```python\\nimport asyncio\\n\\nasync def fetch_data():\\n    # Simulate an async operation\\n    await asyncio.sleep(1)\\n    return \\\"Data fetched!\\\"\\n\\nasync def main():\\n    result = await fetch_data()\\n    print(result)\\n\\n# Run the async function\\nasyncio.run(main())\\n```\\n\\n**Key concepts:**\\n- `async def` defines an asynchronous function (coroutine)\\n- `await` pauses execution until the awaited operation completes\\n- `asyncio.run()` runs the main async function\\n\\n**When to use async/await:**\\n- I/O operations (file reads, network requests, database queries)\\n- Operations that involve waiting\\n- When you want to handle multiple tasks concurrently\\n\\nWould you like me to show you a more complex example with multiple concurrent operations, or do you have a specific use case you're working on?\"\n      }\n    ],\n    \"scenario\": \"Educational chat about Python async programming concepts\",\n    \"complexity\": \"simple\",\n    \"tools_used\": [],\n    \"learning_objectives\": [\"conversational teaching\", \"code explanation\", \"technical mentoring\", \"educational chat patterns\"]\n  },\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"I need to create a new feature branch for my project, make some changes, and then create a pull request. 
Can you walk me through the Git workflow?\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I'll help you with the complete Git workflow for creating a feature branch and pull request. Let me first check your current Git status and then guide you through each step.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"git_status\",\n              \"arguments\": {}\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"On branch main\\nYour branch is up to date with 'origin/main'.\\n\\nnothing to commit, working tree clean\",\n        \"tool_call_id\": \"call_301\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Perfect! Your working tree is clean and you're on the main branch. Now let's create your feature branch and walk through the workflow.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"git_branch\",\n              \"arguments\": {\n                \"branch_name\": \"feature/user-authentication\",\n                \"create\": true\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"Switched to a new branch 'feature/user-authentication'\",\n        \"tool_call_id\": \"call_302\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Great! I've created and switched to your new feature branch. 
Here's the complete workflow:\\n\\n**✅ Step 1: Created feature branch** (Done)\\n```bash\\ngit checkout -b feature/user-authentication\\n```\\n\\n**Next steps for you:**\\n\\n**Step 2: Make your changes**\\n- Edit your files as needed\\n- Add your new feature code\\n\\n**Step 3: Stage and commit changes**\\n```bash\\ngit add .\\ngit commit -m \\\"Add user authentication feature\\\"\\n```\\n\\n**Step 4: Push branch to remote**\\n```bash\\ngit push -u origin feature/user-authentication\\n```\\n\\n**Step 5: Create Pull Request**\\n- Go to your repository on GitHub/GitLab\\n- Click \\\"Create Pull Request\\\" or \\\"Merge Request\\\"\\n- Fill in description and request review\\n\\nWould you like me to help you with any of these steps once you've made your changes?\"\n      }\n    ],\n    \"scenario\": \"Git workflow guidance for feature development\",\n    \"complexity\": \"intermediate\",\n    \"tools_used\": [\"git_status\", \"git_branch\"],\n    \"learning_objectives\": [\"Git workflow patterns\", \"branch management\", \"collaborative development\", \"step-by-step guidance\"]\n  },\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"I have a Python function that's running really slowly. Can you help me analyze the code and suggest optimizations?\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I'd be happy to help optimize your Python code! 
Let me analyze the function to identify potential performance bottlenecks and suggest improvements.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"find_functions\",\n              \"arguments\": {\n                \"pattern\": \"def.*process.*\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"Found functions:\\n- def process_user_data(users_list): line 45 in utils.py\\n- def process_payments(payment_records): line 78 in billing.py\",\n        \"tool_call_id\": \"call_401\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I found some processing functions. Let me examine the code to identify performance issues.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"read_file\",\n              \"arguments\": {\n                \"path\": \"utils.py\",\n                \"start_line\": 45,\n                \"end_line\": 60\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"def process_user_data(users_list):\\n    results = []\\n    for user in users_list:\\n        # Multiple database calls in loop - PERFORMANCE ISSUE!\\n        profile = db.get_user_profile(user.id)\\n        settings = db.get_user_settings(user.id)\\n        preferences = db.get_user_preferences(user.id)\\n        \\n        # String concatenation in loop - PERFORMANCE ISSUE!\\n        summary = \\\"\\\"\\n        for field in user.fields:\\n            summary += f\\\"{field.name}: {field.value}\\\\n\\\"\\n        \\n        results.append({\\n            'user': user,\\n            'profile': profile,\\n            'settings': settings,\\n            'preferences': preferences,\\n            'summary': summary\\n        })\\n    return results\",\n    
    \"tool_call_id\": \"call_402\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I've identified several performance issues in your `process_user_data` function. Here's my analysis and optimization suggestions:\\n\\n**🔍 Performance Issues Found:**\\n\\n1. **N+1 Database Query Problem** (Lines 48-50)\\n   - Making 3 separate DB calls per user in a loop\\n   - This scales linearly with user count (very slow!)\\n\\n2. **String Concatenation in Loop** (Lines 52-55)\\n   - Using `+=` for string building is inefficient\\n   - Creates new string objects each iteration\\n\\n**⚡ Optimization Recommendations:**\\n\\n```python\\ndef process_user_data_optimized(users_list):\\n    # Fix 1: Batch database queries\\n    user_ids = [user.id for user in users_list]\\n    profiles = db.get_user_profiles_batch(user_ids)  # Single query\\n    settings = db.get_user_settings_batch(user_ids)  # Single query  \\n    preferences = db.get_user_preferences_batch(user_ids)  # Single query\\n    \\n    results = []\\n    for user in users_list:\\n        # Fix 2: Use join() for string concatenation\\n        summary = \\\"\\\\n\\\".join(f\\\"{field.name}: {field.value}\\\" \\n                           for field in user.fields)\\n        \\n        results.append({\\n            'user': user,\\n            'profile': profiles.get(user.id),\\n            'settings': settings.get(user.id), \\n            'preferences': preferences.get(user.id),\\n            'summary': summary\\n        })\\n    return results\\n```\\n\\n**Expected Performance Gain:** 10-100x faster depending on user count!\\n\\nWould you like me to help implement the batch database queries or analyze any other functions?\"\n      }\n    ],\n    \"scenario\": \"Code performance analysis and optimization guidance\",\n    \"complexity\": \"advanced\",\n    \"tools_used\": [\"find_functions\", \"read_file\"],\n    \"learning_objectives\": [\"performance optimization\", \"code analysis\", \"database 
optimization patterns\", \"Python best practices\"]\n  },\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"I need to write a professional email to a client explaining a project delay. Can you help me draft something that's apologetic but confident?\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I'll help you draft a professional email that strikes the right balance between taking responsibility and maintaining client confidence. Here's a template that addresses the delay while focusing on solutions:\\n\\n**Subject:** Project Update - Revised Timeline for [Project Name]\\n\\n**Email Draft:**\\n\\nDear [Client Name],\\n\\nI hope this email finds you well. I'm writing to provide you with an important update regarding the [Project Name] timeline.\\n\\nAfter a thorough review of our current progress, I need to inform you that we will require an additional [X weeks/days] to complete the project to our mutual quality standards. This adjustment is primarily due to [brief, honest reason - e.g., \\\"unexpected technical complexities in the integration phase\\\" or \\\"additional testing requirements to ensure optimal performance\\\"].\\n\\nI want to be completely transparent about this delay and take full responsibility. However, I'm confident that this additional time will allow us to deliver a solution that exceeds your expectations and provides long-term value.\\n\\n**Here's our revised timeline:**\\n- [Key milestone 1]: [New date]\\n- [Key milestone 2]: [New date] \\n- Final delivery: [New date]\\n\\nTo ensure we stay on track, I've implemented [specific action - e.g., \\\"daily progress reviews\\\" or \\\"additional resources to the development team\\\"].\\n\\nI understand this may impact your planning, and I'm available to discuss how we can minimize any inconvenience. 
Would you be available for a brief call this week to address any concerns?\\n\\nThank you for your patience and continued trust in our partnership.\\n\\nBest regards,\\n[Your name]\\n\\n**Key elements this email includes:**\\n- Direct, honest communication\\n- Takes responsibility without excessive apologizing\\n- Provides specific reasons and solutions\\n- Shows proactive management\\n- Offers further discussion\\n\\nWould you like me to adjust the tone or focus on any particular aspects?\"\n      }\n    ],\n    \"scenario\": \"Professional communication assistance and content generation\",\n    \"complexity\": \"simple\",\n    \"tools_used\": [],\n    \"learning_objectives\": [\"professional writing\", \"client communication\", \"content generation\", \"business correspondence patterns\"]\n  },\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"I'm trying to understand this legacy codebase I inherited. There are so many files and I don't know where to start. Can you help me map out the structure?\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I completely understand - navigating a new codebase can be overwhelming! Let me help you systematically explore and understand the structure. I'll start by getting an overview and then drill down into the important parts.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"list_directory\",\n              \"arguments\": {\n                \"path\": \".\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"README.md\\npackage.json\\nsrc/\\ntest/\\nconfig/\\ndocs/\\n.env.example\\n.gitignore\\nDockerfile\\nnode_modules/\",\n        \"tool_call_id\": \"call_501\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Great! 
This looks like a Node.js project with a well-organized structure. Let me examine the key areas to give you a roadmap.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"read_file\",\n              \"arguments\": {\n                \"path\": \"package.json\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"{\\n  \\\"name\\\": \\\"legacy-api-service\\\",\\n  \\\"version\\\": \\\"2.3.1\\\",\\n  \\\"description\\\": \\\"Customer management API with billing integration\\\",\\n  \\\"main\\\": \\\"src/app.js\\\",\\n  \\\"scripts\\\": {\\n    \\\"start\\\": \\\"node src/app.js\\\",\\n    \\\"dev\\\": \\\"nodemon src/app.js\\\",\\n    \\\"test\\\": \\\"jest test/\\\"\\n  },\\n  \\\"dependencies\\\": {\\n    \\\"express\\\": \\\"^4.18.0\\\",\\n    \\\"mongoose\\\": \\\"^6.2.0\\\",\\n    \\\"stripe\\\": \\\"^8.0.0\\\"\\n  }\\n}\",\n        \"tool_call_id\": \"call_502\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Now let me explore the source directory to understand the application structure.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"list_directory\",\n              \"arguments\": {\n                \"path\": \"src\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"app.js\\nroutes/\\nmodels/\\nservices/\\nmiddleware/\\nutils/\\ncontrollers/\",\n        \"tool_call_id\": \"call_503\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Perfect! Now I have a clear picture. 
Here's your **Codebase Navigation Guide**:\\n\\n## 📋 Project Overview\\n**Type:** Customer Management API with billing (Node.js/Express)\\n**Main Entry:** `src/app.js`\\n**Database:** MongoDB (Mongoose)\\n**Payment:** Stripe integration\\n\\n## 🗺️ Architecture Map\\n\\n**Start Here:**\\n1. 📖 `README.md` - Project documentation\\n2. ⚙️ `src/app.js` - Application entry point\\n3. 🛣️ `src/routes/` - API endpoints definition\\n\\n**Core Application Logic:**\\n- `src/controllers/` - Request handling logic\\n- `src/models/` - Database schemas (MongoDB)\\n- `src/services/` - Business logic layer\\n- `src/middleware/` - Authentication, validation, etc.\\n\\n**Supporting Code:**\\n- `src/utils/` - Helper functions\\n- `config/` - Configuration files\\n- `test/` - Unit/integration tests\\n\\n## 🎯 Recommended Exploration Order:\\n\\n**Phase 1 - Foundation (Start here!):**\\n1. Read `README.md` for setup instructions\\n2. Check `src/app.js` to see how everything connects\\n3. Browse `src/routes/` to understand available APIs\\n\\n**Phase 2 - Business Logic:**\\n4. Explore `src/controllers/` for main functionality\\n5. Review `src/models/` for data structure\\n6. Study `src/services/` for business rules\\n\\n**Phase 3 - Details:**\\n7. Check `src/middleware/` for security/validation\\n8. Look at `config/` for environment setup\\n9. 
Review `test/` to understand expected behavior\\n\\nWould you like me to dive deeper into any specific area, like showing you the main routes or examining the data models?\"\n      }\n    ],\n    \"scenario\": \"Codebase exploration and architecture understanding\",\n    \"complexity\": \"intermediate\",\n    \"tools_used\": [\"list_directory\", \"read_file\"],\n    \"learning_objectives\": [\"codebase navigation\", \"architecture analysis\", \"systematic exploration\", \"legacy code understanding\"]\n  },\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"Can you help me brainstorm some creative names for my new productivity app? It's focused on time blocking and helps people focus on deep work.\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I'd love to help you brainstorm creative names for your time blocking and deep work app! Let me generate some ideas across different themes and styles.\\n\\n## 🎯 Focus & Deep Work Theme\\n- **FlowBlocks** - Combines time blocking with flow state\\n- **DeepDive** - Simple, suggests deep work immersion\\n- **FocusForge** - Creating focused time periods\\n- **ZenBlocks** - Peaceful, mindful time management\\n- **MindVault** - Protecting mental space for deep work\\n\\n## ⏰ Time & Productivity Theme\\n- **ChronoFlow** - Time management with smooth workflow\\n- **TimeWeaver** - Weaving together productive time blocks\\n- **ClockCraft** - Skillfully crafting your schedule\\n- **TempoLock** - Locking in productive tempos\\n- **PulseBlocks** - Rhythmic time blocking\\n\\n## 🏗️ Building & Structure Theme\\n- **TaskArchitect** - Building your productive day\\n- **FocusFoundry** - Where focused time is created\\n- **ProductivityBricks** - Building blocks of productivity\\n- **WorkStack** - Stacking focused work sessions\\n- **MindGrid** - Organized mental workspace\\n\\n## 🎨 Creative & Unique\\n- **Concentrix** - Focus + matrix of time\\n- **Quietude** - Emphasizes 
calm, focused environment\n- **DeepSpace** - Space for deep work (tech feel)\n- **FlowChamber** - Protected space for flow state\n- **ThinkTank** - Tank of focused thinking time\n\n## 💡 Short & Catchy\n- **Blokr** - Modern, minimal spelling\n- **Focux** - Focus + flux\n- **Zenith** - Peak productivity\n- **Prism** - Breaking time into focused segments\n- **Vault** - Simple, suggests protection/security\n\n**My top recommendations:**\n1. **FlowBlocks** - Perfect blend of your core features\n2. **MindVault** - Strong, memorable, suggests protection of deep work\n3. **FocusForge** - Active, suggests creating something valuable\n\nDo any of these resonate with your vision? Would you like me to explore variations of your favorites or brainstorm in a different direction?\"\n      }\n    ],\n    \"scenario\": \"Creative brainstorming and idea generation assistance\",\n    \"complexity\": \"simple\",\n    \"tools_used\": [],\n    \"learning_objectives\": [\"creative thinking\", \"brainstorming techniques\", \"branding assistance\", \"idea generation patterns\"]\n  }\n]"
  },
  {
    "path": "skills/hugging-face-datasets/examples/system_prompt_template.txt",
    "content": "You are an AI assistant expert at using MCP (Model Context Protocol) tools effectively. You have access to various MCP servers and tools that allow you to interact with external systems, databases, APIs, and services.\n\n## MCP SERVER DEFINITIONS\n\n### Available MCP Servers and Tools:\n\n**1. Filesystem Server (mcp-server-filesystem)**\n- `read_file`: Read file contents\n- `write_file`: Create or overwrite files\n- `edit_file`: Make precise edits to existing files\n- `list_directory`: List directory contents\n- `create_directory`: Create new directories\n- `move_file`: Move/rename files\n- `delete_file`: Remove files\n- `search_files`: Search for files by name pattern\n\n**2. Git Server (mcp-server-git)**\n- `git_status`: Check repository status\n- `git_log`: View commit history\n- `git_diff`: Show changes between commits\n- `git_commit`: Create commits\n- `git_branch`: Manage branches\n- `git_push`: Push changes to remote\n- `git_pull`: Pull changes from remote\n\n**3. Database Server (mcp-server-sqlite)**\n- `execute_query`: Run SQL queries\n- `describe_table`: Get table schema\n- `list_tables`: Show all tables\n- `create_table`: Create new tables\n- `insert_data`: Add records\n- `backup_database`: Create database backup\n\n**4. Web Server (mcp-server-web)**\n- `fetch_url`: Get web page content\n- `post_request`: Send POST requests\n- `search_web`: Search the internet\n- `download_file`: Download files from URLs\n\n**5. 
Code Analysis Server (mcp-server-code)**\n- `analyze_code`: Static code analysis\n- `find_functions`: Locate function definitions\n- `get_dependencies`: Extract imports/dependencies\n- `format_code`: Auto-format source code\n- `run_tests`: Execute test suites\n\n## TRAINING EXAMPLE STRUCTURE\n\nGenerate training examples using this exact structure:\n\n```json\n{\n  \"messages\": [\n    {\n      \"role\": \"user\",\n      \"content\": \"[Natural user request describing a real problem]\"\n    },\n    {\n      \"role\": \"assistant\",\n      \"content\": \"[Conversational response with embedded MCP tool usage]\",\n      \"tool_calls\": [\n        {\n          \"type\": \"function\",\n          \"function\": {\n            \"name\": \"[mcp_tool_name]\",\n            \"arguments\": {\n              \"[param1]\": \"[value1]\",\n              \"[param2]\": \"[value2]\"\n            }\n          }\n        }\n      ]\n    },\n    {\n      \"role\": \"tool\",\n      \"content\": \"[Simulated tool response/output]\",\n      \"tool_call_id\": \"call_123\"\n    },\n    {\n      \"role\": \"assistant\",\n      \"content\": \"[Follow-up response analyzing results and next steps]\"\n    }\n  ],\n  \"scenario\": \"[Brief description of the use case]\",\n  \"complexity\": \"[simple|intermediate|advanced]\",\n  \"tools_used\": [\"[tool1]\", \"[tool2]\"],\n  \"learning_objectives\": [\"[objective1]\", \"[objective2]\"]\n}\n```\n\n## QUALITY GUIDELINES\n\n**1. Realistic Scenarios**: Create examples based on actual developer workflows:\n- Debugging application errors\n- Setting up new projects\n- Code refactoring and optimization\n- Database management tasks\n- API integration challenges\n- Version control operations\n- File organization and cleanup\n\n**2. Progressive Complexity**:\n- **Simple**: Single tool usage for straightforward tasks\n- **Intermediate**: Multiple tools working together\n- **Advanced**: Complex workflows with error handling and iteration\n\n**3. 
Tool Usage Patterns**:\n- Always explain tool selection rationale\n- Show proper parameter usage\n- Demonstrate error handling\n- Include realistic tool outputs\n- Show iterative problem-solving\n\n**4. Conversational Quality**:\n- Natural language flow\n- User context awareness\n- Helpful explanations without being verbose\n- Professional but approachable tone\n- Clear next steps and follow-up options\n\n**5. Technical Accuracy**:\n- Correct MCP tool syntax\n- Realistic file paths and data\n- Valid code snippets\n- Proper error messages\n- Authentic command outputs\n\n## EXAMPLE CATEGORIES TO COVER\n\n**Development Workflows**:\n- Project setup and initialization\n- Code review and quality checks\n- Dependency management\n- Build and deployment processes\n\n**Git & Version Control**:\n- Feature branch workflows\n- Merge conflict resolution\n- Repository management\n- Collaborative development patterns\n\n**Code Analysis & Optimization**:\n- Performance profiling and optimization\n- Code quality assessment\n- Refactoring guidance\n- Architecture analysis\n\n**Debugging & Troubleshooting**:\n- Error investigation\n- Log analysis\n- Performance profiling\n- Configuration issues\n\n**Data Management**:\n- Database queries and updates\n- File processing and transformation\n- Backup and migration tasks\n- Data validation and cleanup\n\n**Integration Tasks**:\n- API testing and integration\n- External service connections\n- Webhook setup and testing\n- Authentication implementation\n\n**Educational & Mentoring**:\n- Concept explanations\n- Technical tutorials\n- Best practices guidance\n- Learning path recommendations\n\n**Content Generation**:\n- Professional writing assistance\n- Documentation creation\n- Creative brainstorming\n- Technical communication\n\n**Conversational Support**:\n- Problem-solving discussions\n- Project guidance\n- Technical Q&A\n- Collaborative planning\n\n**Codebase Navigation**:\n- Legacy code exploration\n- Architecture understanding\n- 
Documentation analysis\n- Systematic code review\n\n**Maintenance & Operations**:\n- Code refactoring\n- Documentation updates\n- Security audits\n- System monitoring\n\nGenerate examples that demonstrate sophisticated MCP usage while maintaining natural conversational flow and practical applicability."
  },
  {
    "path": "skills/hugging-face-datasets/examples/training_examples.json",
    "content": "[\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"My Node.js app is crashing on startup. Can you help me debug this? The error message mentions something about a missing config file.\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I'll help you debug the startup crash. Let me first check your application structure and then look at the logs to understand the missing config file issue.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"list_directory\",\n              \"arguments\": {\n                \"path\": \".\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"package.json\\nserver.js\\nsrc/\\nnode_modules/\\nREADME.md\\nlogs/\",\n        \"tool_call_id\": \"call_001\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I can see your project structure. Now let me check the main server file to understand what config file it's expecting.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"read_file\",\n              \"arguments\": {\n                \"path\": \"server.js\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"const express = require('express');\\nconst config = require('./config/database.json');\\nconst app = express();\\n\\napp.listen(config.port, () => {\\n  console.log(`Server running on port ${config.port}`);\\n});\",\n        \"tool_call_id\": \"call_002\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Found the issue! Your server.js is trying to load './config/database.json' but I don't see a config directory. 
Let me check if it exists and create it if needed.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"list_directory\",\n              \"arguments\": {\n                \"path\": \"config\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"Error: Directory 'config' does not exist\",\n        \"tool_call_id\": \"call_003\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Perfect! The config directory is missing. I'll create it and add a basic database configuration file to fix your startup crash.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"create_directory\",\n              \"arguments\": {\n                \"path\": \"config\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"Directory 'config' created successfully\",\n        \"tool_call_id\": \"call_004\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Now I'll create the missing database.json configuration file with sensible defaults.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"write_file\",\n              \"arguments\": {\n                \"path\": \"config/database.json\",\n                \"content\": \"{\\n  \\\"port\\\": 3000,\\n  \\\"database\\\": {\\n    \\\"host\\\": \\\"localhost\\\",\\n    \\\"port\\\": 5432,\\n    \\\"name\\\": \\\"myapp\\\",\\n    \\\"user\\\": \\\"postgres\\\"\\n  }\\n}\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"File 'config/database.json' created successfully\",\n        \"tool_call_id\": \"call_005\"\n      },\n      {\n   
     \"role\": \"assistant\",\n        \"content\": \"Great! I've fixed the missing config file issue. Your Node.js app should now start without crashing. The server will run on port 3000 by default. You may want to adjust the database connection settings in config/database.json to match your actual database setup.\"\n      }\n    ],\n    \"scenario\": \"Debugging Node.js startup crash due to missing configuration file\",\n    \"complexity\": \"intermediate\",\n    \"tools_used\": [\"list_directory\", \"read_file\", \"create_directory\", \"write_file\"],\n    \"learning_objectives\": [\"systematic debugging approach\", \"file system navigation\", \"configuration file management\", \"error root cause analysis\"]\n  },\n  {\n    \"messages\": [\n      {\n        \"role\": \"user\",\n        \"content\": \"I need to set up a new React project with TypeScript and add some basic components. Can you help me get started?\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"I'll help you set up a new React TypeScript project from scratch. 
Let me start by creating the project structure and configuration files.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"create_directory\",\n              \"arguments\": {\n                \"path\": \"my-react-app\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"Directory 'my-react-app' created successfully\",\n        \"tool_call_id\": \"call_101\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Now I'll create the package.json with React and TypeScript dependencies.\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"write_file\",\n              \"arguments\": {\n                \"path\": \"my-react-app/package.json\",\n                \"content\": \"{\\n  \\\"name\\\": \\\"my-react-app\\\",\\n  \\\"version\\\": \\\"1.0.0\\\",\\n  \\\"scripts\\\": {\\n    \\\"dev\\\": \\\"vite\\\",\\n    \\\"build\\\": \\\"tsc && vite build\\\",\\n    \\\"preview\\\": \\\"vite preview\\\"\\n  },\\n  \\\"dependencies\\\": {\\n    \\\"react\\\": \\\"^18.2.0\\\",\\n    \\\"react-dom\\\": \\\"^18.2.0\\\"\\n  },\\n  \\\"devDependencies\\\": {\\n    \\\"@types/react\\\": \\\"^18.2.0\\\",\\n    \\\"@types/react-dom\\\": \\\"^18.2.0\\\",\\n    \\\"@vitejs/plugin-react\\\": \\\"^4.0.0\\\",\\n    \\\"typescript\\\": \\\"^5.0.0\\\",\\n    \\\"vite\\\": \\\"^4.3.0\\\"\\n  }\\n}\"\n              }\n            }\n          }\n        ]\n      },\n      {\n        \"role\": \"tool\",\n        \"content\": \"File 'my-react-app/package.json' created successfully\",\n        \"tool_call_id\": \"call_102\"\n      },\n      {\n        \"role\": \"assistant\",\n        \"content\": \"Perfect! I've set up your React TypeScript project with proper configuration. 
The project includes package.json with all necessary dependencies, and a clean structure. You can now run `npm install` to install dependencies and start developing your components. Would you like me to create any specific components or add additional configuration?\"\n      }\n    ],\n    \"scenario\": \"Setting up new React TypeScript project from scratch\",\n    \"complexity\": \"simple\",\n    \"tools_used\": [\"create_directory\", \"write_file\"],\n    \"learning_objectives\": [\"project initialization\", \"React TypeScript setup\", \"configuration management\", \"directory structure planning\"]\n  }\n]"
  },
  {
    "path": "skills/hugging-face-datasets/scripts/dataset_manager.py",
    "content": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#   \"huggingface_hub>=0.20.0\",\n# ]\n# ///\n\"\"\"\nHugging Face Dataset Manager\n\nEnhanced dataset creation and management tool designed to work alongside\nthe HF MCP server. Provides dataset creation, configuration, and content\nmanagement capabilities optimized for conversational AI training data.\n\nVersion: 2.0.0\n\nUsage:\n    uv run dataset_manager.py init --repo_id username/dataset-name\n    uv run dataset_manager.py quick_setup --repo_id username/dataset-name --template chat\n    uv run dataset_manager.py add_rows --repo_id username/dataset-name --rows_json '[{\"messages\": [...]}]'\n    uv run dataset_manager.py stats --repo_id username/dataset-name\n    uv run dataset_manager.py list_templates\n\"\"\"\n\nimport os\nimport json\nimport time\nimport argparse\nfrom pathlib import Path\nfrom typing import List, Dict, Any, Optional\nfrom huggingface_hub import HfApi, create_repo\nfrom huggingface_hub.utils import HfHubHTTPError\n\n# Configuration\nHF_TOKEN = os.environ.get(\"HF_TOKEN\")\nEXAMPLES_DIR = Path(__file__).parent.parent / \"examples\"\n\n\ndef init_dataset(repo_id, token=None, private=True):\n    \"\"\"\n    Initialize a new dataset repository on Hugging Face Hub.\n    \"\"\"\n    api = HfApi(token=token)\n    try:\n        create_repo(repo_id, repo_type=\"dataset\", private=private, token=token)\n        print(f\"Created dataset repository: {repo_id}\")\n    except HfHubHTTPError as e:\n        if \"409\" in str(e):\n            print(f\"Repository {repo_id} already exists.\")\n        else:\n            raise e\n\n    # Create a basic README.md with metadata if it doesn't exist\n    readme_content = f\"\"\"---\nlicense: mit\n---\n\n# {repo_id.split(\"/\")[-1]}\n\nThis dataset was created using the Claude Dataset Skill.\n\"\"\"\n    try:\n        api.upload_file(\n            path_or_fileobj=readme_content.encode(\"utf-8\"),\n            
path_in_repo=\"README.md\",\n            repo_id=repo_id,\n            repo_type=\"dataset\",\n            commit_message=\"Initialize dataset README\",\n        )\n    except Exception as e:\n        print(f\"Note: README might already exist or failed to update: {e}\")\n\n\ndef define_config(repo_id, system_prompt=None, token=None):\n    \"\"\"\n    Define a configuration for the dataset, including a system prompt.\n    This saves a config.json file to the repository.\n    \"\"\"\n    api = HfApi(token=token)\n\n    config_data = {\"dataset_config\": {\"version\": \"1.0\", \"created_at\": time.time()}}\n\n    if system_prompt:\n        config_data[\"system_prompt\"] = system_prompt\n\n    # Upload config.json\n    api.upload_file(\n        path_or_fileobj=json.dumps(config_data, indent=2).encode(\"utf-8\"),\n        path_in_repo=\"config.json\",\n        repo_id=repo_id,\n        repo_type=\"dataset\",\n        commit_message=\"Update dataset configuration\",\n    )\n    print(f\"Configuration updated for {repo_id}\")\n\n\ndef load_dataset_template(template_name: str) -> Dict[str, Any]:\n    \"\"\"Load dataset template configuration from templates directory.\"\"\"\n    template_path = EXAMPLES_DIR.parent / \"templates\" / f\"{template_name}.json\"\n    if not template_path.exists():\n        available_templates = [f.stem for f in (EXAMPLES_DIR.parent / \"templates\").glob(\"*.json\")]\n        print(f\"❌ Template '{template_name}' not found.\")\n        print(f\"Available templates: {', '.join(available_templates)}\")\n        return {}\n\n    with open(template_path) as f:\n        return json.load(f)\n\n\ndef validate_by_template(rows: List[Dict[str, Any]], template: Dict[str, Any]) -> bool:\n    \"\"\"Validate data according to template schema.\"\"\"\n    if not template:\n        return False\n\n    schema = template.get(\"validation_schema\", {})\n    required_fields = set(schema.get(\"required_fields\", []))\n    recommended_fields = 
set(schema.get(\"recommended_fields\", []))\n    field_types = schema.get(\"field_types\", {})\n\n    for i, row in enumerate(rows):\n        # Check required fields\n        if not all(field in row for field in required_fields):\n            missing = required_fields - set(row.keys())\n            print(f\"Row {i}: Missing required fields: {missing}\")\n            return False\n\n        # Validate field types\n        for field, expected_type in field_types.items():\n            if field in row:\n                if not _validate_field_type(row[field], expected_type, f\"Row {i}, field '{field}'\"):\n                    return False\n\n        # Template-specific validation\n        if template[\"type\"] == \"chat\":\n            if not _validate_chat_format(row, i):\n                return False\n        elif template[\"type\"] == \"classification\":\n            if not _validate_classification_format(row, i):\n                return False\n        elif template[\"type\"] == \"tabular\":\n            if not _validate_tabular_format(row, i):\n                return False\n\n        # Warn about missing recommended fields\n        missing_recommended = recommended_fields - set(row.keys())\n        if missing_recommended:\n            print(f\"Row {i}: Recommended to include: {missing_recommended}\")\n\n    print(f\"✓ Validated {len(rows)} examples for {template['type']} dataset\")\n    return True\n\n\ndef _validate_field_type(value: Any, expected_type: str, context: str) -> bool:\n    \"\"\"Validate individual field type.\"\"\"\n    if expected_type.startswith(\"enum:\"):\n        valid_values = expected_type[5:].split(\",\")\n        if value not in valid_values:\n            print(f\"{context}: Invalid value '{value}'. 
Must be one of: {valid_values}\")\n            return False\n    elif expected_type == \"array\" and not isinstance(value, list):\n        print(f\"{context}: Expected array, got {type(value).__name__}\")\n        return False\n    elif expected_type == \"object\" and not isinstance(value, dict):\n        print(f\"{context}: Expected object, got {type(value).__name__}\")\n        return False\n    elif expected_type == \"string\" and not isinstance(value, str):\n        print(f\"{context}: Expected string, got {type(value).__name__}\")\n        return False\n    elif expected_type == \"number\" and not isinstance(value, (int, float)):\n        print(f\"{context}: Expected number, got {type(value).__name__}\")\n        return False\n\n    return True\n\n\ndef _validate_chat_format(row: Dict[str, Any], row_index: int) -> bool:\n    \"\"\"Validate chat-specific format.\"\"\"\n    messages = row.get(\"messages\", [])\n    if not isinstance(messages, list) or len(messages) == 0:\n        print(f\"Row {row_index}: 'messages' must be a non-empty list\")\n        return False\n\n    valid_roles = {\"user\", \"assistant\", \"tool\", \"system\"}\n    for j, msg in enumerate(messages):\n        if not isinstance(msg, dict):\n            print(f\"Row {row_index}, message {j}: Must be an object\")\n            return False\n        if \"role\" not in msg or msg[\"role\"] not in valid_roles:\n            print(f\"Row {row_index}, message {j}: Invalid role. 
Use: {valid_roles}\")\n            return False\n        if \"content\" not in msg:\n            print(f\"Row {row_index}, message {j}: Missing 'content' field\")\n            return False\n\n    return True\n\n\ndef _validate_classification_format(row: Dict[str, Any], row_index: int) -> bool:\n    \"\"\"Validate classification-specific format.\"\"\"\n    if \"text\" not in row:\n        print(f\"Row {row_index}: Missing 'text' field\")\n        return False\n    if \"label\" not in row:\n        print(f\"Row {row_index}: Missing 'label' field\")\n        return False\n\n    return True\n\n\ndef _validate_tabular_format(row: Dict[str, Any], row_index: int) -> bool:\n    \"\"\"Validate tabular-specific format.\"\"\"\n    if \"data\" not in row:\n        print(f\"Row {row_index}: Missing 'data' field\")\n        return False\n    if \"columns\" not in row:\n        print(f\"Row {row_index}: Missing 'columns' field\")\n        return False\n\n    data = row[\"data\"]\n    columns = row[\"columns\"]\n\n    if not isinstance(data, list):\n        print(f\"Row {row_index}: 'data' must be an array\")\n        return False\n    if not isinstance(columns, list):\n        print(f\"Row {row_index}: 'columns' must be an array\")\n        return False\n\n    return True\n\n\ndef validate_training_data(rows: List[Dict[str, Any]], template_name: str = \"chat\") -> bool:\n    \"\"\"\n    Validate training data structure according to template.\n    Supports multiple dataset types with appropriate validation.\n    \"\"\"\n    template = load_dataset_template(template_name)\n    if not template:\n        print(f\"❌ Could not load template '{template_name}', falling back to basic validation\")\n        return _basic_validation(rows)\n\n    return validate_by_template(rows, template)\n\n\ndef _basic_validation(rows: List[Dict[str, Any]]) -> bool:\n    \"\"\"Basic validation when no template is available.\"\"\"\n    for i, row in enumerate(rows):\n        if not isinstance(row, dict):\n 
           print(f\"Row {i}: Must be a dictionary/object\")\n            return False\n    print(f\"✓ Basic validation passed for {len(rows)} rows\")\n    return True\n\n\ndef add_rows(\n    repo_id: str,\n    rows: List[Dict[str, Any]],\n    split: str = \"train\",\n    validate: bool = True,\n    template: str = \"chat\",\n    token: Optional[str] = None,\n) -> None:\n    \"\"\"\n    Stream updates to the dataset by uploading a new chunk of rows.\n    Enhanced with validation for multiple dataset types.\n\n    Args:\n        repo_id: Repository identifier (username/dataset-name)\n        rows: List of training examples\n        split: Dataset split name (train, test, validation)\n        validate: Whether to validate data structure before upload\n        template: Dataset template type (chat, classification, qa, completion, tabular, custom)\n        token: HuggingFace API token\n    \"\"\"\n    api = HfApi(token=token)\n\n    if not rows:\n        print(\"No rows to add.\")\n        return\n\n    # Validate training data structure\n    if validate and not validate_training_data(rows, template):\n        print(\"❌ Validation failed. 
Use --no-validate to skip validation.\")\n        return\n\n    # Create a newline-delimited JSON string\n    jsonl_content = \"\\n\".join(json.dumps(row) for row in rows)\n\n    # Generate a unique filename for this chunk\n    timestamp = int(time.time() * 1000)\n    filename = f\"data/{split}-{timestamp}.jsonl\"\n\n    try:\n        api.upload_file(\n            path_or_fileobj=jsonl_content.encode(\"utf-8\"),\n            path_in_repo=filename,\n            repo_id=repo_id,\n            repo_type=\"dataset\",\n            commit_message=f\"Add {len(rows)} rows to {split} split\",\n        )\n        print(f\"✅ Added {len(rows)} rows to {repo_id} (split: {split})\")\n    except Exception as e:\n        print(f\"❌ Upload failed: {e}\")\n        return\n\n\ndef load_template(template_name: str = \"system_prompt_template.txt\") -> str:\n    \"\"\"Load a template file from the examples directory.\"\"\"\n    template_path = EXAMPLES_DIR / template_name\n    if template_path.exists():\n        return template_path.read_text()\n    else:\n        print(f\"⚠️ Template {template_name} not found at {template_path}\")\n        return \"\"\n\n\ndef quick_setup(repo_id: str, template_type: str = \"chat\", token: Optional[str] = None) -> None:\n    \"\"\"\n    Quick setup for different dataset types using templates.\n\n    Args:\n        repo_id: Repository identifier\n        template_type: Dataset template (chat, classification, qa, completion, tabular, custom)\n        token: HuggingFace API token\n    \"\"\"\n    print(f\"🚀 Quick setup for {repo_id} with '{template_type}' template...\")\n\n    # Load template configuration\n    template_config = load_dataset_template(template_type)\n    if not template_config:\n        print(f\"❌ Could not load template '{template_type}'. 
Setup cancelled.\")\n        return\n\n    # Initialize repository\n    init_dataset(repo_id, token=token, private=True)\n\n    # Configure with template system prompt\n    system_prompt = template_config.get(\"system_prompt\", \"\")\n    if system_prompt:\n        define_config(repo_id, system_prompt=system_prompt, token=token)\n\n    # Add template examples\n    examples = template_config.get(\"examples\", [])\n    if examples:\n        add_rows(repo_id, examples, template=template_type, token=token)\n        print(f\"✅ Added {len(examples)} example(s) from template\")\n\n    print(f\"✅ Quick setup complete for {repo_id}\")\n    print(f\"📊 Dataset type: {template_config.get('description', 'No description')}\")\n\n    # Show next steps (--rows_json takes an inline JSON list, not a file path)\n    print(f\"\\n📋 Next steps:\")\n    print(\n        f\"1. Add more data: uv run scripts/dataset_manager.py add_rows --repo_id {repo_id} --template {template_type} --rows_json '[{{\\\"messages\\\": [...]}}]'\"\n    )\n    print(f\"2. View stats: uv run scripts/dataset_manager.py stats --repo_id {repo_id}\")\n    print(f\"3. 
Explore at: https://huggingface.co/datasets/{repo_id}\")\n\n\ndef show_stats(repo_id: str, token: Optional[str] = None) -> None:\n    \"\"\"Display statistics about the dataset.\"\"\"\n    api = HfApi(token=token)\n\n    try:\n        # Get repository info\n        repo_info = api.repo_info(repo_id, repo_type=\"dataset\")\n        print(f\"\\n📊 Dataset Stats: {repo_id}\")\n        print(f\"Created: {repo_info.created_at}\")\n        print(f\"Updated: {repo_info.last_modified}\")\n        print(f\"Private: {repo_info.private}\")\n\n        # List files\n        files = api.list_repo_files(repo_id, repo_type=\"dataset\")\n        data_files = [f for f in files if f.startswith(\"data/\")]\n        print(f\"Data files: {len(data_files)}\")\n\n        if \"config.json\" in files:\n            print(\"✅ Configuration present\")\n        else:\n            print(\"⚠️ No configuration found\")\n\n    except Exception as e:\n        print(f\"❌ Failed to get stats: {e}\")\n\n\ndef list_available_templates() -> None:\n    \"\"\"List all available dataset templates with descriptions.\"\"\"\n    templates_dir = EXAMPLES_DIR.parent / \"templates\"\n\n    if not templates_dir.exists():\n        print(\"❌ Templates directory not found\")\n        return\n\n    print(\"\\n📋 Available Dataset Templates:\")\n    print(\"=\" * 50)\n\n    for template_file in templates_dir.glob(\"*.json\"):\n        try:\n            with open(template_file) as f:\n                template = json.load(f)\n\n            name = template_file.stem\n            desc = template.get(\"description\", \"No description available\")\n            template_type = template.get(\"type\", name)\n\n            print(f\"\\n🏷️  {name}\")\n            print(f\"   Type: {template_type}\")\n            print(f\"   Description: {desc}\")\n\n            # Show required fields\n            schema = template.get(\"validation_schema\", {})\n            required = schema.get(\"required_fields\", [])\n            if required:\n   
             print(f\"   Required fields: {', '.join(required)}\")\n\n        except Exception as e:\n            print(f\"❌ Error loading template {template_file.name}: {e}\")\n\n    print(\n        f\"\\n💡 Usage: uv run scripts/dataset_manager.py quick_setup --repo_id your-username/dataset-name --template TEMPLATE_NAME\"\n    )\n    print(f\"📚 Example templates directory: {templates_dir}\")\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(description=\"Hugging Face Dataset Manager\")\n    subparsers = parser.add_subparsers(dest=\"command\", required=True)\n\n    # Init command\n    init_parser = subparsers.add_parser(\"init\", help=\"Initialize a new dataset\")\n    init_parser.add_argument(\"--repo_id\", required=True, help=\"Repository ID (user/repo_name)\")\n    init_parser.add_argument(\"--private\", action=\"store_true\", help=\"Make repository private\")\n\n    # Config command\n    config_parser = subparsers.add_parser(\"config\", help=\"Setup dataset config\")\n    config_parser.add_argument(\"--repo_id\", required=True, help=\"Repository ID\")\n    config_parser.add_argument(\"--system_prompt\", help=\"System prompt to store in config\")\n\n    # Add rows command\n    add_parser = subparsers.add_parser(\"add_rows\", help=\"Add rows to the dataset\")\n    add_parser.add_argument(\"--repo_id\", required=True, help=\"Repository ID\")\n    add_parser.add_argument(\"--split\", default=\"train\", help=\"Dataset split (e.g., train, test)\")\n    add_parser.add_argument(\n        \"--template\",\n        default=\"chat\",\n        choices=[\n            \"chat\",\n            \"classification\",\n            \"qa\",\n            \"completion\",\n            \"tabular\",\n            \"custom\",\n        ],\n        help=\"Dataset template type for validation\",\n    )\n    add_parser.add_argument(\n        \"--rows_json\",\n        required=True,\n        help=\"JSON string containing a list of rows\",\n    )\n    
add_parser.add_argument(\n        \"--no-validate\",\n        dest=\"validate\",\n        action=\"store_false\",\n        help=\"Skip data validation\",\n    )\n\n    # Quick setup command\n    setup_parser = subparsers.add_parser(\"quick_setup\", help=\"Quick setup with template\")\n    setup_parser.add_argument(\"--repo_id\", required=True, help=\"Repository ID\")\n    setup_parser.add_argument(\n        \"--template\",\n        default=\"chat\",\n        choices=[\n            \"chat\",\n            \"classification\",\n            \"qa\",\n            \"completion\",\n            \"tabular\",\n            \"custom\",\n        ],\n        help=\"Dataset template type\",\n    )\n\n    # Stats command\n    stats_parser = subparsers.add_parser(\"stats\", help=\"Show dataset statistics\")\n    stats_parser.add_argument(\"--repo_id\", required=True, help=\"Repository ID\")\n\n    # List templates command\n    templates_parser = subparsers.add_parser(\"list_templates\", help=\"List available dataset templates\")\n\n    args = parser.parse_args()\n\n    token = HF_TOKEN\n    if not token:\n        print(\"Warning: HF_TOKEN environment variable not set.\")\n\n    if args.command == \"init\":\n        init_dataset(args.repo_id, token=token, private=args.private)\n    elif args.command == \"config\":\n        define_config(args.repo_id, system_prompt=args.system_prompt, token=token)\n    elif args.command == \"add_rows\":\n        try:\n            rows = json.loads(args.rows_json)\n            if not isinstance(rows, list):\n                raise ValueError(\"rows_json must be a JSON list of objects\")\n            add_rows(\n                args.repo_id,\n                rows,\n                split=args.split,\n                template=args.template,\n                validate=args.validate,\n                token=token,\n            )\n        except json.JSONDecodeError:\n            print(\"Error: Invalid JSON provided for --rows_json\")\n        except ValueError as e:\n            # Raised above when the JSON parses but is not a list\n            print(f\"Error: {e}\")\n    elif args.command == 
\"quick_setup\":\n        quick_setup(args.repo_id, template_type=args.template, token=token)\n    elif args.command == \"stats\":\n        show_stats(args.repo_id, token=token)\n    elif args.command == \"list_templates\":\n        list_available_templates()\n"
  },
  {
    "path": "skills/hugging-face-datasets/scripts/sql_manager.py",
    "content": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#   \"duckdb>=1.0.0\",\n#   \"huggingface_hub>=0.20.0\",\n#   \"datasets>=2.14.0\",\n#   \"pandas>=2.0.0\",\n# ]\n# ///\n\"\"\"\nHugging Face Dataset SQL Manager\n\nQuery, transform, and push Hugging Face datasets using DuckDB's SQL interface.\nSupports the hf:// protocol for direct dataset access, data wrangling, and \npushing results back to the Hub.\n\nVersion: 1.0.0\n\nUsage:\n    # Query a dataset\n    uv run sql_manager.py query --dataset \"cais/mmlu\" --sql \"SELECT * FROM data LIMIT 10\"\n    \n    # Query and push to new dataset\n    uv run sql_manager.py query --dataset \"cais/mmlu\" --sql \"SELECT * FROM data WHERE subject='nutrition'\" \\\n        --push-to \"username/nutrition-subset\"\n    \n    # Describe dataset schema\n    uv run sql_manager.py describe --dataset \"cais/mmlu\"\n    \n    # List available splits/configs\n    uv run sql_manager.py info --dataset \"cais/mmlu\"\n    \n    # Get random sample\n    uv run sql_manager.py sample --dataset \"cais/mmlu\" --n 5\n    \n    # Export to parquet\n    uv run sql_manager.py export --dataset \"cais/mmlu\" --output \"data.parquet\"\n\"\"\"\n\nimport os\nimport json\nimport re\nimport argparse\nfrom typing import Optional, List, Dict, Any, Union\n\nimport duckdb\nfrom huggingface_hub import HfApi\n\n# Regex for valid SQL identifiers (column names, view names)\n_IDENTIFIER_RE = re.compile(r\"^[A-Za-z_][A-Za-z0-9_]*$\")\n\n\n# Configuration\nHF_TOKEN = os.environ.get(\"HF_TOKEN\")\n\n\nclass HFDatasetSQL:\n    \"\"\"\n    Query Hugging Face datasets using DuckDB SQL.\n\n    Examples:\n        >>> sql = HFDatasetSQL()\n        >>> results = sql.query(\"cais/mmlu\", \"SELECT * FROM data LIMIT 5\")\n        >>> schema = sql.describe(\"cais/mmlu\")\n        >>> sql.query_and_push(\"cais/mmlu\", \"SELECT * FROM data WHERE subject='nutrition'\", \"user/nutrition-qa\")\n    \"\"\"\n\n    def 
__init__(self, token: Optional[str] = None):\n        \"\"\"Initialize the SQL manager with optional HF token.\"\"\"\n        self.token = token or HF_TOKEN\n        self.conn = duckdb.connect()\n        self._setup_connection()\n\n    @staticmethod\n    def _quote_identifier(name: str) -> str:\n        \"\"\"Quote a SQL identifier, escaping embedded double-quotes.\"\"\"\n        return '\"' + name.replace('\"', '\"\"') + '\"'\n\n    @staticmethod\n    def _validate_identifier(name: str) -> None:\n        \"\"\"Raise ValueError if *name* is not a safe SQL identifier.\"\"\"\n        if not _IDENTIFIER_RE.match(name):\n            raise ValueError(\n                f\"Invalid identifier: {name!r}. \"\n                \"Identifiers must start with a letter or underscore and contain only \"\n                \"alphanumeric characters and underscores.\"\n            )\n\n    def _setup_connection(self):\n        \"\"\"Configure DuckDB connection for HF access.\"\"\"\n        # Set HF token if available (for private datasets)\n        if self.token:\n            self.conn.execute(\"CREATE SECRET hf_token (TYPE HUGGINGFACE, TOKEN $1);\", [self.token])\n\n    def _build_hf_path(\n        self, dataset_id: str, split: str = \"*\", config: Optional[str] = None, revision: str = \"~parquet\"\n    ) -> str:\n        \"\"\"\n        Build the hf:// path for a dataset.\n\n        Args:\n            dataset_id: Dataset ID (e.g., \"cais/mmlu\")\n            split: Split name or \"*\" for all splits\n            config: Optional config/subset name\n            revision: Revision, defaults to ~parquet for auto-converted parquet\n\n        Returns:\n            hf:// path string\n        \"\"\"\n        if config:\n            return f\"hf://datasets/{dataset_id}@{revision}/{config}/{split}/*.parquet\"\n        else:\n            return f\"hf://datasets/{dataset_id}@{revision}/default/{split}/*.parquet\"\n\n    def _build_hf_path_flexible(\n        self,\n        dataset_id: str,\n     
   split: Optional[str] = None,\n        config: Optional[str] = None,\n    ) -> str:\n        \"\"\"\n        Build flexible hf:// path with wildcards for discovery.\n\n        Args:\n            dataset_id: Dataset ID\n            split: Optional specific split\n            config: Optional config name\n\n        Returns:\n            hf:// path with appropriate wildcards\n        \"\"\"\n        base = f\"hf://datasets/{dataset_id}@~parquet\"\n\n        if config and split:\n            return f\"{base}/{config}/{split}/*.parquet\"\n        elif config:\n            return f\"{base}/{config}/*/*.parquet\"\n        elif split:\n            return f\"{base}/*/{split}/*.parquet\"\n        else:\n            return f\"{base}/*/*/*.parquet\"\n\n    def query(\n        self,\n        dataset_id: str,\n        sql: str,\n        split: str = \"train\",\n        config: Optional[str] = None,\n        limit: Optional[int] = None,\n        output_format: str = \"dict\",\n    ) -> Union[List[Dict], Any]:\n        \"\"\"\n        Execute SQL query on a Hugging Face dataset.\n\n        Args:\n            dataset_id: Dataset ID (e.g., \"cais/mmlu\", \"ibm/duorc\")\n            sql: SQL query. 
Use 'data' as table name (will be replaced with actual path)\n            split: Dataset split (train, test, validation, or * for all)\n            config: Optional dataset config/subset\n            limit: Optional limit override\n            output_format: Output format - \"dict\", \"df\" (pandas), \"arrow\", \"raw\"\n\n        Returns:\n            Query results in specified format\n\n        Examples:\n            >>> sql.query(\"cais/mmlu\", \"SELECT * FROM data WHERE subject='nutrition' LIMIT 10\")\n            >>> sql.query(\"cais/mmlu\", \"SELECT subject, COUNT(*) as cnt FROM data GROUP BY subject\")\n        \"\"\"\n        # Build the HF path\n        hf_path = self._build_hf_path(dataset_id, split=split, config=config)\n\n        # Replace the 'data' placeholder with the actual path.\n        # Match it only as a whole word after FROM/JOIN (any letter case), so\n        # names such as 'database' or 'data_v2' are left untouched.\n        processed_sql = re.sub(\n            r\"\\b(FROM|JOIN)\\s+data\\b\",\n            lambda m: f\"{m.group(1).upper()} '{hf_path}'\",\n            sql,\n            flags=re.IGNORECASE,\n        )\n\n        # If user provides raw path, use as-is\n        if \"hf://\" in sql:\n            processed_sql = sql\n\n        # Apply limit if specified and not already in query\n        if limit and \"LIMIT\" not in processed_sql.upper():\n            processed_sql += f\" LIMIT {limit}\"\n\n        try:\n            result = self.conn.execute(processed_sql)\n\n            if output_format == \"df\":\n                return result.fetchdf()\n            elif output_format == \"arrow\":\n                return result.fetch_arrow_table()\n            elif output_format == \"raw\":\n                return result.fetchall()\n            else:  # dict\n                columns = [desc[0] for desc in result.description]\n                rows = result.fetchall()\n                return [dict(zip(columns, row)) for row in rows]\n\n  
      except Exception as e:\n            print(f\"❌ Query error: {e}\")\n            print(f\"   SQL: {processed_sql[:200]}...\")\n            raise\n\n    def query_raw(self, sql: str, output_format: str = \"dict\") -> Union[List[Dict], Any]:\n        \"\"\"\n        Execute raw SQL query without path substitution.\n\n        Useful for queries that already contain full hf:// paths or for\n        multi-dataset queries.\n\n        Args:\n            sql: Complete SQL query\n            output_format: Output format\n\n        Returns:\n            Query results\n        \"\"\"\n        result = self.conn.execute(sql)\n\n        if output_format == \"df\":\n            return result.fetchdf()\n        elif output_format == \"arrow\":\n            return result.fetch_arrow_table()\n        elif output_format == \"raw\":\n            return result.fetchall()\n        else:\n            columns = [desc[0] for desc in result.description]\n            rows = result.fetchall()\n            return [dict(zip(columns, row)) for row in rows]\n\n    def describe(self, dataset_id: str, split: str = \"train\", config: Optional[str] = None) -> List[Dict[str, str]]:\n        \"\"\"\n        Get schema/structure of a dataset.\n\n        Args:\n            dataset_id: Dataset ID\n            split: Dataset split\n            config: Optional config\n\n        Returns:\n            List of column definitions with name, type, nullable info\n        \"\"\"\n        hf_path = self._build_hf_path(dataset_id, split=split, config=config)\n\n        sql = f\"DESCRIBE SELECT * FROM '{hf_path}' LIMIT 1\"\n        result = self.conn.execute(sql)\n\n        columns = [desc[0] for desc in result.description]\n        rows = result.fetchall()\n\n        return [dict(zip(columns, row)) for row in rows]\n\n    def sample(\n        self,\n        dataset_id: str,\n        n: int = 10,\n        split: str = \"train\",\n        config: Optional[str] = None,\n        seed: Optional[int] = None,\n    ) 
-> List[Dict]:\n        \"\"\"\n        Get a random sample from a dataset.\n\n        Args:\n            dataset_id: Dataset ID\n            n: Number of samples\n            split: Dataset split\n            config: Optional config\n            seed: Random seed for reproducibility\n\n        Returns:\n            List of sampled rows\n        \"\"\"\n        hf_path = self._build_hf_path(dataset_id, split=split, config=config)\n\n        if seed is not None:\n            sql = f\"SELECT * FROM '{hf_path}' USING SAMPLE {n} (RESERVOIR, {seed})\"\n        else:\n            sql = f\"SELECT * FROM '{hf_path}' USING SAMPLE {n}\"\n\n        return self.query_raw(sql)\n\n    def count(\n        self, dataset_id: str, split: str = \"train\", config: Optional[str] = None, where: Optional[str] = None\n    ) -> int:\n        \"\"\"\n        Count rows in a dataset, optionally with filter.\n\n        Args:\n            dataset_id: Dataset ID\n            split: Dataset split\n            config: Optional config\n            where: Optional WHERE clause (without WHERE keyword)\n\n        Returns:\n            Row count\n        \"\"\"\n        hf_path = self._build_hf_path(dataset_id, split=split, config=config)\n\n        sql = f\"SELECT COUNT(*) FROM '{hf_path}'\"\n        if where:\n            sql += f\" WHERE {where}\"\n\n        result = self.conn.execute(sql).fetchone()\n        return result[0] if result else 0\n\n    def unique_values(\n        self, dataset_id: str, column: str, split: str = \"train\", config: Optional[str] = None, limit: int = 100\n    ) -> List[Any]:\n        \"\"\"\n        Get unique values in a column.\n\n        Args:\n            dataset_id: Dataset ID\n            column: Column name\n            split: Dataset split\n            config: Optional config\n            limit: Max unique values to return\n\n        Returns:\n            List of unique values\n        \"\"\"\n        hf_path = self._build_hf_path(dataset_id, split=split, 
config=config)\n\n        quoted_col = self._quote_identifier(column)\n        sql = f\"SELECT DISTINCT {quoted_col} FROM '{hf_path}' LIMIT {limit}\"\n        result = self.conn.execute(sql).fetchall()\n\n        return [row[0] for row in result]\n\n    def histogram(\n        self, dataset_id: str, column: str, split: str = \"train\", config: Optional[str] = None, bins: int = 10\n    ) -> List[Dict]:\n        \"\"\"\n        Get value distribution/histogram for a column.\n\n        Args:\n            dataset_id: Dataset ID\n            column: Column name\n            split: Dataset split\n            config: Optional config\n            bins: Number of bins for numeric columns\n\n        Returns:\n            Distribution data\n        \"\"\"\n        hf_path = self._build_hf_path(dataset_id, split=split, config=config)\n\n        quoted_col = self._quote_identifier(column)\n        sql = f\"\"\"\n        SELECT \n            {quoted_col},\n            COUNT(*) as count\n        FROM '{hf_path}'\n        GROUP BY {quoted_col}\n        ORDER BY count DESC\n        LIMIT {bins}\n        \"\"\"\n\n        return self.query_raw(sql)\n\n    def filter_and_transform(\n        self,\n        dataset_id: str,\n        select: str = \"*\",\n        where: Optional[str] = None,\n        group_by: Optional[str] = None,\n        order_by: Optional[str] = None,\n        split: str = \"train\",\n        config: Optional[str] = None,\n        limit: Optional[int] = None,\n    ) -> List[Dict]:\n        \"\"\"\n        Filter and transform dataset with SQL clauses.\n\n        Args:\n            dataset_id: Dataset ID\n            select: SELECT clause (columns, expressions, aggregations)\n            where: WHERE clause (filter conditions)\n            group_by: GROUP BY clause\n            order_by: ORDER BY clause\n            split: Dataset split\n            config: Optional config\n            limit: Row limit\n\n        Returns:\n            Transformed data\n\n        
Examples:\n            >>> sql.filter_and_transform(\n            ...     \"cais/mmlu\",\n            ...     select=\"subject, COUNT(*) as cnt\",\n            ...     group_by=\"subject\",\n            ...     order_by=\"cnt DESC\",\n            ...     limit=10\n            ... )\n        \"\"\"\n        hf_path = self._build_hf_path(dataset_id, split=split, config=config)\n\n        sql_parts = [f\"SELECT {select}\", f\"FROM '{hf_path}'\"]\n\n        if where:\n            sql_parts.append(f\"WHERE {where}\")\n        if group_by:\n            sql_parts.append(f\"GROUP BY {group_by}\")\n        if order_by:\n            sql_parts.append(f\"ORDER BY {order_by}\")\n        if limit:\n            sql_parts.append(f\"LIMIT {limit}\")\n\n        sql = \" \".join(sql_parts)\n        return self.query_raw(sql)\n\n    def join_datasets(\n        self,\n        left_dataset: str,\n        right_dataset: str,\n        on: str,\n        select: str = \"*\",\n        join_type: str = \"INNER\",\n        left_split: str = \"train\",\n        right_split: str = \"train\",\n        left_config: Optional[str] = None,\n        right_config: Optional[str] = None,\n        limit: Optional[int] = None,\n    ) -> List[Dict]:\n        \"\"\"\n        Join two datasets.\n\n        Args:\n            left_dataset: Left dataset ID\n            right_dataset: Right dataset ID\n            on: JOIN condition (e.g., \"left.id = right.id\")\n            select: SELECT clause\n            join_type: Type of join (INNER, LEFT, RIGHT, FULL)\n            left_split: Split for left dataset\n            right_split: Split for right dataset\n            left_config: Config for left dataset\n            right_config: Config for right dataset\n            limit: Row limit\n\n        Returns:\n            Joined data\n        \"\"\"\n        left_path = self._build_hf_path(left_dataset, split=left_split, config=left_config)\n        right_path = self._build_hf_path(right_dataset, split=right_split, 
config=right_config)\n\n        sql = f\"\"\"\n        SELECT {select}\n        FROM '{left_path}' AS left_table\n        {join_type} JOIN '{right_path}' AS right_table\n        ON {on}\n        \"\"\"\n\n        if limit:\n            sql += f\" LIMIT {limit}\"\n\n        return self.query_raw(sql)\n\n    def export_to_parquet(\n        self,\n        dataset_id: str,\n        output_path: str,\n        sql: Optional[str] = None,\n        split: str = \"train\",\n        config: Optional[str] = None,\n    ) -> str:\n        \"\"\"\n        Export query results to a local Parquet file.\n\n        Args:\n            dataset_id: Source dataset ID\n            output_path: Local path for output Parquet file\n            sql: Optional SQL query (uses SELECT * if not provided)\n            split: Dataset split\n            config: Optional config\n\n        Returns:\n            Path to created file\n        \"\"\"\n        hf_path = self._build_hf_path(dataset_id, split=split, config=config)\n\n        if sql:\n            # Process the query\n            processed_sql = sql.replace(\"FROM data\", f\"FROM '{hf_path}'\")\n            processed_sql = processed_sql.replace(\"from data\", f\"FROM '{hf_path}'\")\n        else:\n            processed_sql = f\"SELECT * FROM '{hf_path}'\"\n\n        if \"'\" in output_path:\n            raise ValueError(f\"Invalid output path: paths must not contain single quotes\")\n        export_sql = f\"COPY ({processed_sql}) TO '{output_path}' (FORMAT PARQUET)\"\n        self.conn.execute(export_sql)\n\n        print(f\"✅ Exported to {output_path}\")\n        return output_path\n\n    def export_to_jsonl(\n        self,\n        dataset_id: str,\n        output_path: str,\n        sql: Optional[str] = None,\n        split: str = \"train\",\n        config: Optional[str] = None,\n    ) -> str:\n        \"\"\"\n        Export query results to JSONL format.\n\n        Args:\n            dataset_id: Source dataset ID\n            output_path: 
Local path for output JSONL file\n            sql: Optional SQL query\n            split: Dataset split\n            config: Optional config\n\n        Returns:\n            Path to created file\n        \"\"\"\n        results = self.query(dataset_id, sql or \"SELECT * FROM data\", split=split, config=config)\n\n        with open(output_path, \"w\") as f:\n            for row in results:\n                f.write(json.dumps(row) + \"\\n\")\n\n        print(f\"✅ Exported {len(results)} rows to {output_path}\")\n        return output_path\n\n    def push_to_hub(\n        self,\n        dataset_id: str,\n        target_repo: str,\n        sql: Optional[str] = None,\n        split: str = \"train\",\n        config: Optional[str] = None,\n        target_split: str = \"train\",\n        private: bool = True,\n        commit_message: Optional[str] = None,\n    ) -> str:\n        \"\"\"\n        Query a dataset and push results to a new Hub repository.\n\n        Args:\n            dataset_id: Source dataset ID\n            target_repo: Target repository ID (e.g., \"username/new-dataset\")\n            sql: SQL query to transform data (optional, defaults to SELECT *)\n            split: Source split\n            config: Source config\n            target_split: Target split name\n            private: Whether to create private repo\n            commit_message: Commit message\n\n        Returns:\n            URL of created dataset\n        \"\"\"\n        try:\n            from datasets import Dataset\n        except ImportError:\n            raise ImportError(\n                \"datasets library required for push_to_hub. 
\"\n                \"Run with `uv run ...` or install with `uv pip install datasets`.\"\n            )\n\n        # Execute query\n        results = self.query(dataset_id, sql or \"SELECT * FROM data\", split=split, config=config)\n\n        if not results:\n            print(\"❌ No results to push\")\n            return \"\"\n\n        # Convert to HF Dataset\n        ds = Dataset.from_list(results)\n\n        # Push to Hub\n        ds.push_to_hub(\n            target_repo,\n            split=target_split,\n            private=private,\n            commit_message=commit_message or f\"Created from {dataset_id} via SQL query\",\n            token=self.token,\n        )\n\n        url = f\"https://huggingface.co/datasets/{target_repo}\"\n        print(f\"✅ Pushed {len(results)} rows to {url}\")\n        return url\n\n    def create_view(self, name: str, dataset_id: str, split: str = \"train\", config: Optional[str] = None):\n        \"\"\"\n        Create a DuckDB view for easier querying.\n\n        Args:\n            name: View name\n            dataset_id: Dataset ID\n            split: Dataset split\n            config: Optional config\n        \"\"\"\n        self._validate_identifier(name)\n        hf_path = self._build_hf_path(dataset_id, split=split, config=config)\n        quoted_name = self._quote_identifier(name)\n        self.conn.execute(f\"CREATE OR REPLACE VIEW {quoted_name} AS SELECT * FROM '{hf_path}'\")\n        print(f\"✅ Created view '{name}' for {dataset_id}\")\n\n    def info(self, dataset_id: str) -> Dict[str, Any]:\n        \"\"\"\n        Get information about a dataset including available configs and splits.\n\n        Args:\n            dataset_id: Dataset ID\n\n        Returns:\n            Dataset information\n        \"\"\"\n        api = HfApi(token=self.token)\n\n        try:\n            info = api.dataset_info(dataset_id)\n\n            result = {\n                \"id\": info.id,\n                \"author\": info.author,\n          
      \"private\": info.private,\n                \"downloads\": info.downloads,\n                \"likes\": info.likes,\n                \"tags\": info.tags,\n                \"created_at\": str(info.created_at) if info.created_at else None,\n                \"last_modified\": str(info.last_modified) if info.last_modified else None,\n            }\n\n            # Try to get config/split info from card data\n            if info.card_data:\n                result[\"configs\"] = getattr(info.card_data, \"configs\", None)\n\n            return result\n\n        except Exception as e:\n            print(f\"❌ Failed to get info: {e}\")\n            return {}\n\n    def close(self):\n        \"\"\"Close the database connection.\"\"\"\n        self.conn.close()\n\n\ndef main():\n    \"\"\"CLI entry point.\"\"\"\n    parser = argparse.ArgumentParser(\n        description=\"Query Hugging Face datasets with SQL\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        epilog=\"\"\"\nExamples:\n  # Query dataset with SQL\n  python sql_manager.py query --dataset \"cais/mmlu\" --sql \"SELECT * FROM data WHERE subject='nutrition' LIMIT 10\"\n  \n  # Get random sample\n  python sql_manager.py sample --dataset \"cais/mmlu\" --n 5\n  \n  # Describe schema\n  python sql_manager.py describe --dataset \"cais/mmlu\"\n  \n  # Get value counts\n  python sql_manager.py histogram --dataset \"cais/mmlu\" --column \"subject\"\n  \n  # Filter and transform\n  python sql_manager.py transform --dataset \"cais/mmlu\" \\\\\n    --select \"subject, COUNT(*) as cnt\" \\\\\n    --group-by \"subject\" \\\\\n    --order-by \"cnt DESC\"\n  \n  # Query and push to Hub\n  python sql_manager.py query --dataset \"cais/mmlu\" \\\\\n    --sql \"SELECT * FROM data WHERE subject='nutrition'\" \\\\\n    --push-to \"username/nutrition-subset\"\n  \n  # Export to Parquet\n  python sql_manager.py export --dataset \"cais/mmlu\" \\\\\n    --sql \"SELECT * FROM data WHERE subject='nutrition'\" 
\\\\\n    --output \"nutrition.parquet\"\n        \"\"\",\n    )\n\n    subparsers = parser.add_subparsers(dest=\"command\", required=True)\n\n    # Common arguments\n    def add_common_args(p):\n        p.add_argument(\"--dataset\", \"-d\", required=True, help=\"Dataset ID (e.g., cais/mmlu)\")\n        p.add_argument(\"--split\", \"-s\", default=\"train\", help=\"Dataset split (default: train)\")\n        p.add_argument(\"--config\", \"-c\", help=\"Dataset config/subset\")\n\n    # Query command\n    query_parser = subparsers.add_parser(\"query\", help=\"Execute SQL query on dataset\")\n    add_common_args(query_parser)\n    query_parser.add_argument(\"--sql\", required=True, help=\"SQL query (use 'data' as table name)\")\n    query_parser.add_argument(\"--limit\", \"-l\", type=int, help=\"Limit results\")\n    query_parser.add_argument(\"--format\", choices=[\"json\", \"table\", \"csv\"], default=\"json\", help=\"Output format\")\n    query_parser.add_argument(\"--push-to\", help=\"Push results to this Hub repo\")\n    query_parser.add_argument(\"--private\", action=\"store_true\", help=\"Make pushed repo private\")\n\n    # Sample command\n    sample_parser = subparsers.add_parser(\"sample\", help=\"Get random sample from dataset\")\n    add_common_args(sample_parser)\n    sample_parser.add_argument(\"--n\", type=int, default=10, help=\"Number of samples\")\n    sample_parser.add_argument(\"--seed\", type=int, help=\"Random seed\")\n\n    # Describe command\n    describe_parser = subparsers.add_parser(\"describe\", help=\"Get dataset schema\")\n    add_common_args(describe_parser)\n\n    # Count command\n    count_parser = subparsers.add_parser(\"count\", help=\"Count rows in dataset\")\n    add_common_args(count_parser)\n    count_parser.add_argument(\"--where\", \"-w\", help=\"WHERE clause for filtering\")\n\n    # Histogram command\n    histogram_parser = subparsers.add_parser(\"histogram\", help=\"Get value distribution\")\n    
add_common_args(histogram_parser)\n    histogram_parser.add_argument(\"--column\", required=True, help=\"Column name\")\n    histogram_parser.add_argument(\"--bins\", type=int, default=20, help=\"Number of bins\")\n\n    # Unique command\n    unique_parser = subparsers.add_parser(\"unique\", help=\"Get unique values in column\")\n    add_common_args(unique_parser)\n    unique_parser.add_argument(\"--column\", required=True, help=\"Column name\")\n    unique_parser.add_argument(\"--limit\", \"-l\", type=int, default=100, help=\"Max values\")\n\n    # Transform command\n    transform_parser = subparsers.add_parser(\"transform\", help=\"Filter and transform dataset\")\n    add_common_args(transform_parser)\n    transform_parser.add_argument(\"--select\", default=\"*\", help=\"SELECT clause\")\n    transform_parser.add_argument(\"--where\", \"-w\", help=\"WHERE clause\")\n    transform_parser.add_argument(\"--group-by\", help=\"GROUP BY clause\")\n    transform_parser.add_argument(\"--order-by\", help=\"ORDER BY clause\")\n    transform_parser.add_argument(\"--limit\", \"-l\", type=int, help=\"LIMIT\")\n    transform_parser.add_argument(\"--push-to\", help=\"Push results to Hub repo\")\n\n    # Export command\n    export_parser = subparsers.add_parser(\"export\", help=\"Export query results to file\")\n    add_common_args(export_parser)\n    export_parser.add_argument(\"--sql\", help=\"SQL query (defaults to SELECT *)\")\n    export_parser.add_argument(\"--output\", \"-o\", required=True, help=\"Output file path\")\n    export_parser.add_argument(\"--format\", choices=[\"parquet\", \"jsonl\"], default=\"parquet\", help=\"Output format\")\n\n    # Info command\n    info_parser = subparsers.add_parser(\"info\", help=\"Get dataset information\")\n    info_parser.add_argument(\"--dataset\", \"-d\", required=True, help=\"Dataset ID\")\n\n    # Raw SQL command\n    raw_parser = subparsers.add_parser(\"raw\", help=\"Execute raw SQL with full hf:// paths\")\n    
raw_parser.add_argument(\"--sql\", required=True, help=\"Complete SQL query\")\n    raw_parser.add_argument(\"--format\", choices=[\"json\", \"table\", \"csv\"], default=\"json\", help=\"Output format\")\n\n    args = parser.parse_args()\n\n    # Initialize SQL manager\n    sql = HFDatasetSQL()\n\n    try:\n        if args.command == \"query\":\n            results = sql.query(args.dataset, args.sql, split=args.split, config=args.config, limit=args.limit)\n\n            if getattr(args, \"push_to\", None):\n                sql.push_to_hub(\n                    args.dataset, args.push_to, sql=args.sql, split=args.split, config=args.config, private=args.private\n                )\n            else:\n                _print_results(results, args.format)\n\n        elif args.command == \"sample\":\n            results = sql.sample(args.dataset, n=args.n, split=args.split, config=args.config, seed=args.seed)\n            _print_results(results, \"json\")\n\n        elif args.command == \"describe\":\n            schema = sql.describe(args.dataset, split=args.split, config=args.config)\n            _print_results(schema, \"table\")\n\n        elif args.command == \"count\":\n            count = sql.count(args.dataset, split=args.split, config=args.config, where=args.where)\n            print(f\"Count: {count:,}\")\n\n        elif args.command == \"histogram\":\n            results = sql.histogram(args.dataset, args.column, split=args.split, config=args.config, bins=args.bins)\n            _print_results(results, \"table\")\n\n        elif args.command == \"unique\":\n            values = sql.unique_values(\n                args.dataset, args.column, split=args.split, config=args.config, limit=args.limit\n            )\n            for v in values:\n                print(v)\n\n        elif args.command == \"transform\":\n            results = sql.filter_and_transform(\n                args.dataset,\n                select=args.select,\n                where=args.where,\n   
             group_by=args.group_by,\n                order_by=args.order_by,\n                split=args.split,\n                config=args.config,\n                limit=args.limit,\n            )\n\n            if getattr(args, \"push_to\", None):\n                # Build SQL for push\n                query_sql = f\"SELECT {args.select} FROM data\"\n                if args.where:\n                    query_sql += f\" WHERE {args.where}\"\n                if args.group_by:\n                    query_sql += f\" GROUP BY {args.group_by}\"\n                if args.order_by:\n                    query_sql += f\" ORDER BY {args.order_by}\"\n                if args.limit:\n                    query_sql += f\" LIMIT {args.limit}\"\n\n                sql.push_to_hub(args.dataset, args.push_to, sql=query_sql, split=args.split, config=args.config)\n            else:\n                _print_results(results, \"json\")\n\n        elif args.command == \"export\":\n            if args.format == \"parquet\":\n                sql.export_to_parquet(args.dataset, args.output, sql=args.sql, split=args.split, config=args.config)\n            else:\n                sql.export_to_jsonl(args.dataset, args.output, sql=args.sql, split=args.split, config=args.config)\n\n        elif args.command == \"info\":\n            info = sql.info(args.dataset)\n            _print_results([info], \"json\")\n\n        elif args.command == \"raw\":\n            results = sql.query_raw(args.sql)\n            _print_results(results, args.format)\n\n    finally:\n        sql.close()\n\n\ndef _print_results(results: List[Dict], format: str):\n    \"\"\"Print results in specified format.\"\"\"\n    if not results:\n        print(\"No results\")\n        return\n\n    if format == \"json\":\n        print(json.dumps(results, indent=2, default=str))\n\n    elif format == \"csv\":\n        if results:\n            keys = results[0].keys()\n            print(\",\".join(str(k) for k in keys))\n            for 
row in results:\n                print(\",\".join(str(row.get(k, \"\")) for k in keys))\n\n    elif format == \"table\":\n        if results:\n            keys = list(results[0].keys())\n            # Calculate column widths\n            widths = {k: max(len(str(k)), max(len(str(r.get(k, \"\"))) for r in results)) for k in keys}\n\n            # Header\n            header = \" | \".join(str(k).ljust(widths[k]) for k in keys)\n            print(header)\n            print(\"-\" * len(header))\n\n            # Rows\n            for row in results:\n                print(\" | \".join(str(row.get(k, \"\")).ljust(widths[k]) for k in keys))\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hugging-face-datasets/templates/chat.json",
    "content": "{\n  \"type\": \"chat\",\n  \"description\": \"Conversational AI training data with chat templating support\",\n  \"system_prompt\": \"You are an AI assistant expert at using tools and engaging in helpful conversations. Generate training examples that demonstrate natural dialogue patterns, tool usage, and problem-solving interactions.\",\n  \"validation_schema\": {\n    \"required_fields\": [\"messages\"],\n    \"recommended_fields\": [\"scenario\", \"complexity\", \"tools_used\", \"learning_objectives\"],\n    \"field_types\": {\n      \"messages\": \"array\",\n      \"scenario\": \"string\",\n      \"complexity\": \"enum:simple,intermediate,advanced\",\n      \"tools_used\": \"array\",\n      \"learning_objectives\": \"array\"\n    }\n  },\n  \"example_structure\": {\n    \"messages\": [\n      {\n        \"role\": \"user|assistant|tool|system\",\n        \"content\": \"Message content\",\n        \"tool_calls\": [\n          {\n            \"type\": \"function\",\n            \"function\": {\n              \"name\": \"tool_name\",\n              \"arguments\": {\"param\": \"value\"}\n            }\n          }\n        ]\n      }\n    ],\n    \"scenario\": \"Brief description of the use case\",\n    \"complexity\": \"simple|intermediate|advanced\",\n    \"tools_used\": [\"tool1\", \"tool2\"],\n    \"learning_objectives\": [\"objective1\", \"objective2\"]\n  },\n  \"examples\": [\n    {\n      \"messages\": [\n        {\n          \"role\": \"user\",\n          \"content\": \"Hello! Can you help me understand how to use APIs?\"\n        },\n        {\n          \"role\": \"assistant\",\n          \"content\": \"I'd be happy to help you understand APIs! 
An API (Application Programming Interface) is like a bridge that allows different software applications to communicate with each other.\\n\\nThink of it like ordering at a restaurant:\\n- You (the client) look at the menu (API documentation)\\n- You tell the waiter (the API) what you want\\n- The waiter takes your order to the kitchen (the server)\\n- The kitchen prepares your food (processes the request)\\n- The waiter brings your food back (returns the response)\\n\\nWould you like me to show you a simple example of making an API call?\"\n        }\n      ],\n      \"scenario\": \"API education and explanation\",\n      \"complexity\": \"simple\",\n      \"tools_used\": [],\n      \"learning_objectives\": [\"API concepts\", \"educational conversation\", \"analogical explanation\"]\n    }\n  ]\n}"
  },
  {
    "path": "skills/hugging-face-datasets/templates/classification.json",
    "content": "{\n  \"type\": \"classification\",\n  \"description\": \"Text classification datasets for sentiment analysis, topic classification, intent detection, etc.\",\n  \"system_prompt\": \"Create high-quality text classification training data. Each example should have clear, unambiguous text and appropriate labels. Ensure balanced representation across all classes and include diverse language patterns.\",\n  \"validation_schema\": {\n    \"required_fields\": [\"text\", \"label\"],\n    \"recommended_fields\": [\"confidence\", \"metadata\", \"source\"],\n    \"field_types\": {\n      \"text\": \"string\",\n      \"label\": \"string|array\",\n      \"confidence\": \"number\",\n      \"metadata\": \"object\",\n      \"source\": \"string\"\n    }\n  },\n  \"example_structure\": {\n    \"text\": \"Input text to be classified\",\n    \"label\": \"classification_label\",\n    \"confidence\": 0.95,\n    \"metadata\": {\n      \"length\": 42,\n      \"language\": \"en\",\n      \"domain\": \"technology\"\n    },\n    \"source\": \"web_scraping\"\n  },\n  \"examples\": [\n    {\n      \"text\": \"I absolutely love this new smartphone! The camera quality is outstanding and the battery lasts all day.\",\n      \"label\": \"positive\",\n      \"confidence\": 0.98,\n      \"metadata\": {\n        \"length\": 98,\n        \"language\": \"en\",\n        \"domain\": \"product_reviews\"\n      },\n      \"source\": \"customer_feedback\"\n    },\n    {\n      \"text\": \"The software keeps crashing and customer support is unresponsive. 
Very disappointed with this purchase.\",\n      \"label\": \"negative\",\n      \"confidence\": 0.92,\n      \"metadata\": {\n        \"length\": 103,\n        \"language\": \"en\",\n        \"domain\": \"product_reviews\"\n      },\n      \"source\": \"customer_feedback\"\n    },\n    {\n      \"text\": \"Book a table for two at 7 PM tonight\",\n      \"label\": \"restaurant_reservation\",\n      \"confidence\": 0.95,\n      \"metadata\": {\n        \"length\": 36,\n        \"language\": \"en\",\n        \"domain\": \"intent_detection\"\n      },\n      \"source\": \"voice_assistant\"\n    }\n  ]\n}"
  },
  {
    "path": "skills/hugging-face-datasets/templates/completion.json",
    "content": "{\n  \"type\": \"completion\",\n  \"description\": \"Text completion and generation datasets for language modeling, code completion, creative writing\",\n  \"system_prompt\": \"Generate diverse text completion examples that demonstrate natural language patterns, creativity, and contextual understanding. Include various domains like code, creative writing, technical documentation, and conversational text.\",\n  \"validation_schema\": {\n    \"required_fields\": [\"prompt\", \"completion\"],\n    \"recommended_fields\": [\"domain\", \"style\", \"length\", \"difficulty\"],\n    \"field_types\": {\n      \"prompt\": \"string\",\n      \"completion\": \"string\",\n      \"domain\": \"enum:code,creative,technical,conversational,academic\",\n      \"style\": \"string\",\n      \"length\": \"enum:short,medium,long\",\n      \"difficulty\": \"enum:beginner,intermediate,advanced\"\n    }\n  },\n  \"example_structure\": {\n    \"prompt\": \"The beginning text or context\",\n    \"completion\": \"The expected continuation or completion\",\n    \"domain\": \"code|creative|technical|conversational|academic\",\n    \"style\": \"description of writing style\",\n    \"length\": \"short|medium|long\",\n    \"difficulty\": \"beginner|intermediate|advanced\"\n  },\n  \"examples\": [\n    {\n      \"prompt\": \"def fibonacci(n):\\n    \\\"\\\"\\\"Calculate the nth Fibonacci number.\\\"\\\"\\\"\\n    if n <= 1:\\n        return n\\n    else:\",\n      \"completion\": \"        return fibonacci(n-1) + fibonacci(n-2)\",\n      \"domain\": \"code\",\n      \"style\": \"python_function\",\n      \"length\": \"short\",\n      \"difficulty\": \"beginner\"\n    },\n    {\n      \"prompt\": \"The old lighthouse stood silently on the cliff, its beacon long extinguished. Sarah approached slowly, wondering\",\n      \"completion\": \"if the stories her grandmother had told her were true. 
As she pushed open the weathered door, a gust of salt air carried whispers of the past, and she could almost see the ghostly figure of the lighthouse keeper who had vanished mysteriously fifty years ago.\",\n      \"domain\": \"creative\",\n      \"style\": \"narrative_fiction\",\n      \"length\": \"medium\",\n      \"difficulty\": \"intermediate\"\n    },\n    {\n      \"prompt\": \"To configure SSL/TLS encryption for your web server, you'll need to:\",\n      \"completion\": \"1. Obtain an SSL certificate from a trusted Certificate Authority (CA)\\n2. Install the certificate on your web server\\n3. Configure your server to use HTTPS (port 443)\\n4. Update your application to redirect HTTP traffic to HTTPS\\n5. Test the implementation using SSL testing tools\\n\\nMake sure to back up your server configuration before making changes.\",\n      \"domain\": \"technical\",\n      \"style\": \"instructional\",\n      \"length\": \"medium\",\n      \"difficulty\": \"intermediate\"\n    }\n  ]\n}"
  },
  {
    "path": "skills/hugging-face-datasets/templates/custom.json",
    "content": "{\n  \"type\": \"custom\",\n  \"description\": \"Flexible template for custom dataset formats - define your own schema and validation rules\",\n  \"system_prompt\": \"Generate data according to the custom schema provided. Follow the field definitions, data types, and validation rules specified in the schema configuration.\",\n  \"validation_schema\": {\n    \"required_fields\": [\"data\"],\n    \"recommended_fields\": [\"schema\", \"metadata\"],\n    \"field_types\": {\n      \"data\": \"any\",\n      \"schema\": \"object\",\n      \"metadata\": \"object\"\n    }\n  },\n  \"example_structure\": {\n    \"data\": \"Your custom data structure goes here\",\n    \"schema\": {\n      \"fields\": [\n        {\n          \"name\": \"field_name\",\n          \"type\": \"string|number|boolean|array|object\",\n          \"required\": true,\n          \"description\": \"Field description\"\n        }\n      ],\n      \"validation_rules\": {\n        \"custom_rule_1\": \"validation logic\",\n        \"custom_rule_2\": \"additional validation\"\n      }\n    },\n    \"metadata\": {\n      \"created_by\": \"user\",\n      \"purpose\": \"dataset purpose\",\n      \"version\": \"1.0\"\n    }\n  },\n  \"examples\": [\n    {\n      \"data\": {\n        \"id\": \"unique_identifier\",\n        \"features\": {\n          \"text_feature\": \"sample text\",\n          \"numeric_features\": [1.0, 2.5, -0.3],\n          \"categorical_feature\": \"category_a\"\n        },\n        \"labels\": {\n          \"primary_label\": \"positive\",\n          \"confidence_score\": 0.87,\n          \"secondary_labels\": [\"helpful\", \"informative\"]\n        },\n        \"annotations\": {\n          \"annotator_id\": \"expert_1\",\n          \"annotation_time\": \"2024-01-15T10:30:00Z\",\n          \"quality_score\": 9\n        }\n      },\n      \"schema\": {\n        \"fields\": [\n          {\"name\": \"id\", \"type\": \"string\", \"required\": true, \"description\": \"Unique 
identifier\"},\n          {\"name\": \"features\", \"type\": \"object\", \"required\": true, \"description\": \"Input features\"},\n          {\"name\": \"labels\", \"type\": \"object\", \"required\": true, \"description\": \"Target labels\"},\n          {\"name\": \"annotations\", \"type\": \"object\", \"required\": false, \"description\": \"Annotation metadata\"}\n        ],\n        \"validation_rules\": {\n          \"id_format\": \"must be non-empty string\",\n          \"confidence_range\": \"must be between 0 and 1\"\n        }\n      },\n      \"metadata\": {\n        \"created_by\": \"data_scientist\",\n        \"purpose\": \"multi_modal_classification\",\n        \"version\": \"1.0\"\n      }\n    }\n  ]\n}"
  },
  {
    "path": "skills/hugging-face-datasets/templates/qa.json",
    "content": "{\n  \"type\": \"qa\",\n  \"description\": \"Question-answering datasets for reading comprehension, knowledge QA, and factual question answering\",\n  \"system_prompt\": \"Generate high-quality question-answer pairs with clear, informative answers. Include context when needed and ensure questions are well-formed and answerable. Cover diverse topics and question types.\",\n  \"validation_schema\": {\n    \"required_fields\": [\"question\", \"answer\"],\n    \"recommended_fields\": [\"context\", \"answer_type\", \"difficulty\", \"topic\", \"source\"],\n    \"field_types\": {\n      \"question\": \"string\",\n      \"answer\": \"string|array\",\n      \"context\": \"string\",\n      \"answer_type\": \"enum:factual,explanatory,opinion,yes_no,multiple_choice\",\n      \"difficulty\": \"enum:easy,medium,hard\",\n      \"topic\": \"string\",\n      \"source\": \"string\"\n    }\n  },\n  \"example_structure\": {\n    \"question\": \"What is the question being asked?\",\n    \"answer\": \"The complete answer to the question\",\n    \"context\": \"Additional context or passage (if needed)\",\n    \"answer_type\": \"factual|explanatory|opinion|yes_no|multiple_choice\",\n    \"difficulty\": \"easy|medium|hard\",\n    \"topic\": \"subject_area\",\n    \"source\": \"where_this_came_from\"\n  },\n  \"examples\": [\n    {\n      \"question\": \"What is the capital of France?\",\n      \"answer\": \"Paris\",\n      \"answer_type\": \"factual\",\n      \"difficulty\": \"easy\",\n      \"topic\": \"geography\",\n      \"source\": \"general_knowledge\"\n    },\n    {\n      \"question\": \"How does photosynthesis work?\",\n      \"answer\": \"Photosynthesis is the process by which plants use sunlight, carbon dioxide, and water to produce glucose and oxygen. Chlorophyll in plant cells captures light energy, which drives chemical reactions that convert CO2 and H2O into C6H12O6 (glucose) and O2. 
This process occurs primarily in the chloroplasts of leaf cells.\",\n      \"answer_type\": \"explanatory\",\n      \"difficulty\": \"medium\",\n      \"topic\": \"biology\",\n      \"source\": \"educational_content\"\n    },\n    {\n      \"question\": \"Based on the passage, what was the main cause of the economic downturn?\",\n      \"answer\": \"According to the passage, the main cause was the sudden drop in consumer confidence following the bank failures, which led to reduced spending and business investment.\",\n      \"context\": \"The economic downturn of 2008 began when several major banks failed due to subprime mortgage losses. This triggered a crisis of confidence among consumers and businesses, leading to decreased spending and investment. The ripple effects spread throughout the global economy.\",\n      \"answer_type\": \"explanatory\",\n      \"difficulty\": \"medium\",\n      \"topic\": \"economics\",\n      \"source\": \"reading_comprehension\"\n    }\n  ]\n}"
  },
  {
    "path": "skills/hugging-face-datasets/templates/tabular.json",
    "content": "{\n  \"type\": \"tabular\",\n  \"description\": \"Structured tabular data for regression, classification, or analysis tasks\",\n  \"system_prompt\": \"Generate structured tabular data with clear column definitions and appropriate data types. Ensure realistic value ranges and relationships between variables. Include proper metadata for each column.\",\n  \"validation_schema\": {\n    \"required_fields\": [\"data\", \"columns\"],\n    \"recommended_fields\": [\"target\", \"metadata\", \"description\"],\n    \"field_types\": {\n      \"data\": \"array\",\n      \"columns\": \"array\",\n      \"target\": \"string\",\n      \"metadata\": \"object\",\n      \"description\": \"string\"\n    }\n  },\n  \"example_structure\": {\n    \"columns\": [\n      {\n        \"name\": \"column_name\",\n        \"type\": \"numeric|categorical|text|datetime\",\n        \"description\": \"What this column represents\",\n        \"nullable\": true\n      }\n    ],\n    \"data\": [\n      {\"column1\": \"value1\", \"column2\": \"value2\"},\n      {\"column1\": \"value3\", \"column2\": \"value4\"}\n    ],\n    \"target\": \"target_column_name\",\n    \"metadata\": {\n      \"rows\": 1000,\n      \"features\": 10,\n      \"task\": \"classification|regression|clustering\"\n    },\n    \"description\": \"Description of the dataset purpose\"\n  },\n  \"examples\": [\n    {\n      \"columns\": [\n        {\n          \"name\": \"age\",\n          \"type\": \"numeric\",\n          \"description\": \"Customer age in years\",\n          \"nullable\": false\n        },\n        {\n          \"name\": \"income\",\n          \"type\": \"numeric\",\n          \"description\": \"Annual income in USD\",\n          \"nullable\": false\n        },\n        {\n          \"name\": \"education\",\n          \"type\": \"categorical\",\n          \"description\": \"Education level\",\n          \"nullable\": false\n        },\n        {\n          \"name\": \"purchased\",\n          \"type\": 
\"categorical\",\n          \"description\": \"Whether customer made a purchase\",\n          \"nullable\": false\n        }\n      ],\n      \"data\": [\n        {\"age\": 25, \"income\": 45000, \"education\": \"bachelor\", \"purchased\": \"yes\"},\n        {\"age\": 34, \"income\": 67000, \"education\": \"master\", \"purchased\": \"yes\"},\n        {\"age\": 19, \"income\": 23000, \"education\": \"high_school\", \"purchased\": \"no\"},\n        {\"age\": 42, \"income\": 85000, \"education\": \"bachelor\", \"purchased\": \"yes\"},\n        {\"age\": 28, \"income\": 52000, \"education\": \"bachelor\", \"purchased\": \"no\"}\n      ],\n      \"target\": \"purchased\",\n      \"metadata\": {\n        \"rows\": 5,\n        \"features\": 3,\n        \"task\": \"classification\"\n      },\n      \"description\": \"Customer purchase prediction based on demographics\"\n    }\n  ]\n}"
  },
  {
    "path": "skills/hugging-face-evaluation/SKILL.md",
    "content": "---\nname: hugging-face-evaluation\ndescription: Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.\n---\n\n# Overview\nThis skill provides tools to add structured evaluation results to Hugging Face model cards. It supports multiple methods for adding evaluation data:\n- Extracting existing evaluation tables from README content\n- Importing benchmark scores from Artificial Analysis\n- Running custom model evaluations with vLLM or accelerate backends (lighteval/inspect-ai)\n\n## Integration with HF Ecosystem\n- **Model Cards**: Updates model-index metadata for leaderboard integration\n- **Artificial Analysis**: Direct API integration for benchmark imports\n- **Papers with Code**: Compatible with their model-index specification\n- **Jobs**: Run evaluations directly on Hugging Face Jobs with `uv` integration\n- **vLLM**: Efficient GPU inference for custom model evaluation\n- **lighteval**: HuggingFace's evaluation library with vLLM/accelerate backends\n- **inspect-ai**: UK AI Safety Institute's evaluation framework\n\n# Version\n1.3.0\n\n# Dependencies\n\n## Core Dependencies\n- huggingface_hub>=0.26.0\n- markdown-it-py>=3.0.0\n- python-dotenv>=1.2.1\n- pyyaml>=6.0.3\n- requests>=2.32.5\n- re (built-in)\n\n## Inference Provider Evaluation\n- inspect-ai>=0.3.0\n- inspect-evals\n- openai\n\n## vLLM Custom Model Evaluation (GPU required)\n- lighteval[accelerate,vllm]>=0.6.0\n- vllm>=0.4.0\n- torch>=2.0.0\n- transformers>=4.40.0\n- accelerate>=0.30.0\n\nNote: vLLM dependencies are installed automatically via PEP 723 script headers when using `uv run`.\n\n# IMPORTANT: Using This Skill\n\n## ⚠️ CRITICAL: Check for Existing PRs Before Creating New Ones\n\n**Before creating ANY pull request with `--create-pr`, you MUST check for existing open 
PRs:**\n\n```bash\nuv run scripts/evaluation_manager.py get-prs --repo-id \"username/model-name\"\n```\n\n**If open PRs exist:**\n1. **DO NOT create a new PR** - this creates duplicate work for maintainers\n2. **Warn the user** that open PRs already exist\n3. **Show the user** the existing PR URLs so they can review them\n4. Only proceed if the user explicitly confirms they want to create another PR\n\nThis prevents spamming model repositories with duplicate evaluation PRs.\n\n---\n\n> **All paths are relative to the directory containing this SKILL.md\nfile.**\n> Before running any script, first `cd` to that directory or use the full\npath.\n\n\n**Use `--help` for the latest workflow guidance.** Works with plain Python or `uv run`:\n```bash\nuv run scripts/evaluation_manager.py --help\nuv run scripts/evaluation_manager.py inspect-tables --help\nuv run scripts/evaluation_manager.py extract-readme --help\n```\nKey workflow (matches CLI help):\n\n1) `get-prs` → check for existing open PRs first\n2) `inspect-tables` → find table numbers/columns  \n3) `extract-readme --table N` → prints YAML by default  \n4) add `--apply` (push) or `--create-pr` to write changes\n\n# Core Capabilities\n\n## 1. Inspect and Extract Evaluation Tables from README\n- **Inspect Tables**: Use `inspect-tables` to see all tables in a README with structure, columns, and sample rows\n- **Parse Markdown Tables**: Accurate parsing using markdown-it-py (ignores code blocks and examples)\n- **Table Selection**: Use `--table N` to extract from a specific table (required when multiple tables exist)\n- **Format Detection**: Recognize common formats (benchmarks as rows, columns, or comparison tables with multiple models)\n- **Column Matching**: Automatically identify model columns/rows; prefer `--model-column-index` (index from inspect output). 
Use `--model-name-override` only with exact column header text.\n- **YAML Generation**: Convert selected table to model-index YAML format\n- **Task Typing**: `--task-type` sets the `task.type` field in model-index output (e.g., `text-generation`, `summarization`)\n\n## 2. Import from Artificial Analysis\n- **API Integration**: Fetch benchmark scores directly from Artificial Analysis\n- **Automatic Formatting**: Convert API responses to model-index format\n- **Metadata Preservation**: Maintain source attribution and URLs\n- **PR Creation**: Automatically create pull requests with evaluation updates\n\n## 3. Model-Index Management\n- **YAML Generation**: Create properly formatted model-index entries\n- **Merge Support**: Add evaluations to existing model cards without overwriting\n- **Validation**: Ensure compliance with Papers with Code specification\n- **Batch Operations**: Process multiple models efficiently\n\n## 4. Run Evaluations on HF Jobs (Inference Providers)\n- **Inspect-AI Integration**: Run standard evaluations using the `inspect-ai` library\n- **UV Integration**: Seamlessly run Python scripts with ephemeral dependencies on HF infrastructure\n- **Zero-Config**: No Dockerfiles or Space management required\n- **Hardware Selection**: Configure CPU or GPU hardware for the evaluation job\n- **Secure Execution**: Handles API tokens safely via secrets passed through the CLI\n\n## 5. 
Run Custom Model Evaluations with vLLM (NEW)\n\n⚠️ **Important:** This approach requires a machine with `uv` installed and sufficient GPU memory.\n**Benefits:** No `hf_jobs()` MCP tool is needed; scripts run directly in the terminal.\n**When to use:** The user is working directly on a local machine with a GPU available.\n\n### Before running the script\n\n- Check the script path\n- Check that `uv` is installed\n- Check that a GPU is available with `nvidia-smi`\n\n### Running the script\n\n```bash\nuv run scripts/lighteval_vllm_uv.py \\\n  --model meta-llama/Llama-3.2-1B \\\n  --tasks \"leaderboard|mmlu|5\"\n```\n\n### Features\n\n- **vLLM Backend**: High-performance GPU inference (5-10x faster than standard HF methods)\n- **lighteval Framework**: HuggingFace's evaluation library with Open LLM Leaderboard tasks\n- **inspect-ai Framework**: UK AI Safety Institute's evaluation library\n- **Standalone or Jobs**: Run locally or submit to HF Jobs infrastructure\n\n# Usage Instructions\n\nThe skill includes Python scripts in `scripts/` to perform operations.\n\n### Prerequisites\n- Preferred: use `uv run` (PEP 723 header auto-installs deps)\n- Optional manual fallback: `uv pip install huggingface-hub markdown-it-py python-dotenv pyyaml requests`\n- Set the `HF_TOKEN` environment variable to a token with write access\n- For Artificial Analysis: Set the `AA_API_KEY` environment variable\n- `.env` is loaded automatically if `python-dotenv` is installed\n\n### Method 1: Extract from README (CLI workflow)\n\nRecommended flow (matches `--help`):\n```bash\n# 1) Inspect tables to get table numbers and column hints\nuv run scripts/evaluation_manager.py inspect-tables --repo-id \"username/model\"\n\n# 2) Extract a specific table (prints YAML by default)\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"username/model\" \\\n  --table 1 \\\n  [--model-column-index <column index shown by inspect-tables>] \\\n  [--model-name-override \"<column header/model name>\"]  # use exact header text if you can't use the index\n\n# 3) Apply changes (push or PR)\nuv run 
scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"username/model\" \\\n  --table 1 \\\n  --apply       # push directly\n# or\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"username/model\" \\\n  --table 1 \\\n  --create-pr   # open a PR\n```\n\nValidation checklist:\n- YAML is printed by default; compare against the README table before applying.\n- Prefer `--model-column-index`; if using `--model-name-override`, the column header text must be exact.\n- For transposed tables (models as rows), ensure only one row is extracted.\n\n### Method 2: Import from Artificial Analysis\n\nFetch benchmark scores from the Artificial Analysis API and add them to a model card.\n\n**Basic Usage:**\n```bash\nAA_API_KEY=\"your-api-key\" uv run scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"anthropic\" \\\n  --model-name \"claude-sonnet-4\" \\\n  --repo-id \"username/model-name\"\n```\n\n**With Environment File:**\n```bash\n# Create .env file\necho \"AA_API_KEY=your-api-key\" >> .env\necho \"HF_TOKEN=your-hf-token\" >> .env\n\n# Run import\nuv run scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"anthropic\" \\\n  --model-name \"claude-sonnet-4\" \\\n  --repo-id \"username/model-name\"\n```\n\n**Create Pull Request:**\n```bash\nuv run scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"anthropic\" \\\n  --model-name \"claude-sonnet-4\" \\\n  --repo-id \"username/model-name\" \\\n  --create-pr\n```\n\n### Method 3: Run Evaluation Job\n\nSubmit an evaluation job on Hugging Face infrastructure using the `hf jobs uv run` CLI.\n\n**Direct CLI Usage:**\n```bash\nHF_TOKEN=$HF_TOKEN \\\nhf jobs uv run scripts/inspect_eval_uv.py \\\n  --flavor cpu-basic \\\n  --secrets HF_TOKEN=$HF_TOKEN \\\n  -- --model \"meta-llama/Llama-2-7b-hf\" \\\n     --task \"mmlu\"\n```\n\n**GPU Example (A10G):**\n```bash\nHF_TOKEN=$HF_TOKEN \\\nhf jobs uv run scripts/inspect_eval_uv.py \\\n  --flavor a10g-small \\\n  
--secrets HF_TOKEN=$HF_TOKEN \\\n  -- --model \"meta-llama/Llama-2-7b-hf\" \\\n     --task \"gsm8k\"\n```\n\n**Python Helper (optional):**\n```bash\nuv run scripts/run_eval_job.py \\\n  --model \"meta-llama/Llama-2-7b-hf\" \\\n  --task \"mmlu\" \\\n  --hardware \"t4-small\"\n```\n\n### Method 4: Run Custom Model Evaluation with vLLM\n\nEvaluate custom HuggingFace models directly on GPU using vLLM or accelerate backends. These scripts are **separate from inference provider scripts** and run models locally on the job's hardware.\n\n#### When to Use vLLM Evaluation (vs Inference Providers)\n\n| Feature | vLLM Scripts | Inference Provider Scripts |\n|---------|-------------|---------------------------|\n| Model access | Any HF model | Models with API endpoints |\n| Hardware | Your GPU (or HF Jobs GPU) | Provider's infrastructure |\n| Cost | HF Jobs compute cost | API usage fees |\n| Speed | vLLM optimized | Depends on provider |\n| Offline | Yes (after download) | No |\n\n#### Option A: lighteval with vLLM Backend\n\nlighteval is HuggingFace's evaluation library, supporting Open LLM Leaderboard tasks.\n\n**Standalone (local GPU):**\n```bash\n# Run MMLU 5-shot with vLLM\nuv run scripts/lighteval_vllm_uv.py \\\n  --model meta-llama/Llama-3.2-1B \\\n  --tasks \"leaderboard|mmlu|5\"\n\n# Run multiple tasks\nuv run scripts/lighteval_vllm_uv.py \\\n  --model meta-llama/Llama-3.2-1B \\\n  --tasks \"leaderboard|mmlu|5,leaderboard|gsm8k|5\"\n\n# Use accelerate backend instead of vLLM\nuv run scripts/lighteval_vllm_uv.py \\\n  --model meta-llama/Llama-3.2-1B \\\n  --tasks \"leaderboard|mmlu|5\" \\\n  --backend accelerate\n\n# Chat/instruction-tuned models\nuv run scripts/lighteval_vllm_uv.py \\\n  --model meta-llama/Llama-3.2-1B-Instruct \\\n  --tasks \"leaderboard|mmlu|5\" \\\n  --use-chat-template\n```\n\n**Via HF Jobs:**\n```bash\nhf jobs uv run scripts/lighteval_vllm_uv.py \\\n  --flavor a10g-small \\\n  --secrets HF_TOKEN=$HF_TOKEN \\\n  -- --model meta-llama/Llama-3.2-1B 
\\\n     --tasks \"leaderboard|mmlu|5\"\n```\n\n**lighteval Task Format:**\nTasks use the format `suite|task|num_fewshot`:\n- `leaderboard|mmlu|5` - MMLU with 5-shot\n- `leaderboard|gsm8k|5` - GSM8K with 5-shot\n- `lighteval|hellaswag|0` - HellaSwag zero-shot\n- `leaderboard|arc_challenge|25` - ARC-Challenge with 25-shot\n\n**Finding Available Tasks:**\nThe complete list of available lighteval tasks can be found at:\nhttps://github.com/huggingface/lighteval/blob/main/examples/tasks/all_tasks.txt\n\nThis file contains all supported tasks in the format `suite|task|num_fewshot|0` (the trailing `0` is a version flag and can be ignored). Common suites include:\n- `leaderboard` - Open LLM Leaderboard tasks (MMLU, GSM8K, ARC, HellaSwag, etc.)\n- `lighteval` - Additional lighteval tasks\n- `bigbench` - BigBench tasks\n- `original` - Original benchmark tasks\n\nTo use a task from the list, extract the `suite|task|num_fewshot` portion (without the trailing `0`) and pass it to the `--tasks` parameter. 
For example:\n- From file: `leaderboard|mmlu|0` → Use: `leaderboard|mmlu|0` (or change to `5` for 5-shot)\n- From file: `bigbench|abstract_narrative_understanding|0` → Use: `bigbench|abstract_narrative_understanding|0`\n- From file: `lighteval|wmt14:hi-en|0` → Use: `lighteval|wmt14:hi-en|0`\n\nMultiple tasks can be specified as comma-separated values: `--tasks \"leaderboard|mmlu|5,leaderboard|gsm8k|5\"`\n\n#### Option B: inspect-ai with vLLM Backend\n\ninspect-ai is the UK AI Safety Institute's evaluation framework.\n\n**Standalone (local GPU):**\n```bash\n# Run MMLU with vLLM\nuv run scripts/inspect_vllm_uv.py \\\n  --model meta-llama/Llama-3.2-1B \\\n  --task mmlu\n\n# Use HuggingFace Transformers backend\nuv run scripts/inspect_vllm_uv.py \\\n  --model meta-llama/Llama-3.2-1B \\\n  --task mmlu \\\n  --backend hf\n\n# Multi-GPU with tensor parallelism\nuv run scripts/inspect_vllm_uv.py \\\n  --model meta-llama/Llama-3.2-70B \\\n  --task mmlu \\\n  --tensor-parallel-size 4\n```\n\n**Via HF Jobs:**\n```bash\nhf jobs uv run scripts/inspect_vllm_uv.py \\\n  --flavor a10g-small \\\n  --secrets HF_TOKEN=$HF_TOKEN \\\n  -- --model meta-llama/Llama-3.2-1B \\\n     --task mmlu\n```\n\n**Available inspect-ai Tasks:**\n- `mmlu` - Massive Multitask Language Understanding\n- `gsm8k` - Grade School Math\n- `hellaswag` - Common sense reasoning\n- `arc_challenge` - AI2 Reasoning Challenge\n- `truthfulqa` - TruthfulQA benchmark\n- `winogrande` - Winograd Schema Challenge\n- `humaneval` - Code generation\n\n#### Option C: Python Helper Script\n\nThe helper script auto-selects hardware and simplifies job submission:\n\n```bash\n# Auto-detect hardware based on model size\nuv run scripts/run_vllm_eval_job.py \\\n  --model meta-llama/Llama-3.2-1B \\\n  --task \"leaderboard|mmlu|5\" \\\n  --framework lighteval\n\n# Explicit hardware selection\nuv run scripts/run_vllm_eval_job.py \\\n  --model meta-llama/Llama-3.2-70B \\\n  --task mmlu \\\n  --framework inspect \\\n  --hardware 
a100-large \\\n  --tensor-parallel-size 4\n\n# Use HF Transformers backend\nuv run scripts/run_vllm_eval_job.py \\\n  --model microsoft/phi-2 \\\n  --task mmlu \\\n  --framework inspect \\\n  --backend hf\n```\n\n**Hardware Recommendations:**\n| Model Size | Recommended Hardware |\n|------------|---------------------|\n| < 3B params | `t4-small` |\n| 3B - 13B | `a10g-small` |\n| 13B - 34B | `a10g-large` |\n| 34B+ | `a100-large` |\n\n### Commands Reference\n\n**Top-level help and version:**\n```bash\nuv run scripts/evaluation_manager.py --help\nuv run scripts/evaluation_manager.py --version\n```\n\n**Inspect Tables (start here):**\n```bash\nuv run scripts/evaluation_manager.py inspect-tables --repo-id \"username/model-name\"\n```\n\n**Extract from README:**\n```bash\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"username/model-name\" \\\n  --table N \\\n  [--model-column-index N] \\\n  [--model-name-override \"Exact Column Header or Model Name\"] \\\n  [--task-type \"text-generation\"] \\\n  [--dataset-name \"Custom Benchmarks\"] \\\n  [--apply | --create-pr]\n```\n\n**Import from Artificial Analysis:**\n```bash\nAA_API_KEY=... uv run scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"creator-name\" \\\n  --model-name \"model-slug\" \\\n  --repo-id \"username/model-name\" \\\n  [--create-pr]\n```\n\n**View / Validate:**\n```bash\nuv run scripts/evaluation_manager.py show --repo-id \"username/model-name\"\nuv run scripts/evaluation_manager.py validate --repo-id \"username/model-name\"\n```\n\n**Check Open PRs (ALWAYS run before --create-pr):**\n```bash\nuv run scripts/evaluation_manager.py get-prs --repo-id \"username/model-name\"\n```\nLists all open pull requests for the model repository. 
Shows PR number, title, author, date, and URL.\n\n**Run Evaluation Job (Inference Providers):**\n```bash\nhf jobs uv run scripts/inspect_eval_uv.py \\\n  --flavor \"cpu-basic|t4-small|...\" \\\n  --secrets HF_TOKEN=$HF_TOKEN \\\n  -- --model \"model-id\" \\\n     --task \"task-name\"\n```\n\nOr use the Python helper:\n\n```bash\nuv run scripts/run_eval_job.py \\\n  --model \"model-id\" \\\n  --task \"task-name\" \\\n  --hardware \"cpu-basic|t4-small|...\"\n```\n\n**Run vLLM Evaluation (Custom Models):**\n```bash\n# lighteval with vLLM\nhf jobs uv run scripts/lighteval_vllm_uv.py \\\n  --flavor \"a10g-small\" \\\n  --secrets HF_TOKEN=$HF_TOKEN \\\n  -- --model \"model-id\" \\\n     --tasks \"leaderboard|mmlu|5\"\n\n# inspect-ai with vLLM\nhf jobs uv run scripts/inspect_vllm_uv.py \\\n  --flavor \"a10g-small\" \\\n  --secrets HF_TOKEN=$HF_TOKEN \\\n  -- --model \"model-id\" \\\n     --task \"mmlu\"\n\n# Helper script (auto hardware selection)\nuv run scripts/run_vllm_eval_job.py \\\n  --model \"model-id\" \\\n  --task \"leaderboard|mmlu|5\" \\\n  --framework lighteval\n```\n\n### Model-Index Format\n\nThe generated model-index follows this structure:\n\n```yaml\nmodel-index:\n  - name: Model Name\n    results:\n      - task:\n          type: text-generation\n        dataset:\n          name: Benchmark Dataset\n          type: benchmark_type\n        metrics:\n          - name: MMLU\n            type: mmlu\n            value: 85.2\n          - name: HumanEval\n            type: humaneval\n            value: 72.5\n        source:\n          name: Source Name\n          url: https://source-url.com\n```\n\nWARNING: Do not use markdown formatting in the model name. Use the exact name from the table. 
Only use urls in the source.url field.\n\n### Error Handling\n- **Table Not Found**: Script will report if no evaluation tables are detected\n- **Invalid Format**: Clear error messages for malformed tables\n- **API Errors**: Retry logic for transient Artificial Analysis API failures\n- **Token Issues**: Validation before attempting updates\n- **Merge Conflicts**: Preserves existing model-index entries when adding new ones\n- **Space Creation**: Handles naming conflicts and hardware request failures gracefully\n\n### Best Practices\n\n1. **Check for existing PRs first**: Run `get-prs` before creating any new PR to avoid duplicates\n2. **Always start with `inspect-tables`**: See table structure and get the correct extraction command\n3. **Use `--help` for guidance**: Run `inspect-tables --help` to see the complete workflow\n4. **Preview first**: Default behavior prints YAML; review it before using `--apply` or `--create-pr`\n5. **Verify extracted values**: Compare YAML output against the README table manually\n6. **Use `--table N` for multi-table READMEs**: Required when multiple evaluation tables exist\n7. **Use `--model-name-override` for comparison tables**: Copy the exact column header from `inspect-tables` output\n8. **Create PRs for Others**: Use `--create-pr` when updating models you don't own\n9. **One model per repo**: Only add the main model's results to model-index\n10. 
**No markdown in YAML names**: The model name field in YAML should be plain text\n\n### Model Name Matching\n\nWhen extracting evaluation tables with multiple models (either as columns or rows), the script uses **exact normalized token matching**:\n\n- Removes markdown formatting (bold `**`, links `[]()`  )\n- Normalizes names (lowercase, replace `-` and `_` with spaces)\n- Compares token sets: `\"OLMo-3-32B\"` → `{\"olmo\", \"3\", \"32b\"}` matches `\"**Olmo 3 32B**\"` or `\"[Olmo-3-32B](...)`\n- Only extracts if tokens match exactly (handles different word orders and separators)\n- Fails if no exact match found (rather than guessing from similar names)\n\n**For column-based tables** (benchmarks as rows, models as columns):\n- Finds the column header matching the model name\n- Extracts scores from that column only\n\n**For transposed tables** (models as rows, benchmarks as columns):\n- Finds the row in the first column matching the model name\n- Extracts all benchmark scores from that row only\n\nThis ensures only the correct model's scores are extracted, never unrelated models or training checkpoints. 
\n\n### Common Patterns\n\n**Update Your Own Model:**\n```bash\n# Extract from README and push directly (without --apply, the YAML is only printed)\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"your-username/your-model\" \\\n  --task-type \"text-generation\" \\\n  --apply\n```\n\n**Update Someone Else's Model (Full Workflow):**\n```bash\n# Step 1: ALWAYS check for existing PRs first\nuv run scripts/evaluation_manager.py get-prs \\\n  --repo-id \"other-username/their-model\"\n\n# Step 2: If NO open PRs exist, proceed with creating one\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"other-username/their-model\" \\\n  --create-pr\n\n# If open PRs DO exist:\n# - Warn the user about existing PRs\n# - Show them the PR URLs\n# - Do NOT create a new PR unless user explicitly confirms\n```\n\n**Import Fresh Benchmarks:**\n```bash\n# Step 1: Check for existing PRs\nuv run scripts/evaluation_manager.py get-prs \\\n  --repo-id \"anthropic/claude-sonnet-4\"\n\n# Step 2: If no PRs, import from Artificial Analysis\nAA_API_KEY=... uv run scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"anthropic\" \\\n  --model-name \"claude-sonnet-4\" \\\n  --repo-id \"anthropic/claude-sonnet-4\" \\\n  --create-pr\n```\n\n### Troubleshooting\n\n**Issue**: \"No evaluation tables found in README\"\n- **Solution**: Check that the README contains markdown tables with numeric scores\n\n**Issue**: \"Could not find model 'X' in transposed table\"\n- **Solution**: The script displays the available models. 
Use `--model-name-override` with the exact name from the list\n- **Example**: `--model-name-override \"**Olmo 3-32B**\"`\n\n**Issue**: \"AA_API_KEY not set\"\n- **Solution**: Set the `AA_API_KEY` environment variable or add it to the `.env` file\n\n**Issue**: \"Token does not have write access\"\n- **Solution**: Ensure HF_TOKEN has write permissions for the repository\n\n**Issue**: \"Model not found in Artificial Analysis\"\n- **Solution**: Verify creator-slug and model-name match API values\n\n**Issue**: \"Payment required for hardware\"\n- **Solution**: Add a payment method to your Hugging Face account to use non-CPU hardware\n\n**Issue**: \"vLLM out of memory\" or CUDA OOM\n- **Solution**: Use a larger hardware flavor, reduce `--gpu-memory-utilization`, or use `--tensor-parallel-size` for multi-GPU\n\n**Issue**: \"Model architecture not supported by vLLM\"\n- **Solution**: Use `--backend hf` (inspect-ai) or `--backend accelerate` (lighteval) for HuggingFace Transformers\n\n**Issue**: \"Trust remote code required\"\n- **Solution**: Add the `--trust-remote-code` flag for models with custom code (e.g., Phi-2, Qwen)\n\n**Issue**: \"Chat template not found\"\n- **Solution**: Only use `--use-chat-template` for instruction-tuned models that include a chat template\n\n### Integration Examples\n\n**Python Script Integration:**\n```python\nimport subprocess\n\ndef update_model_evaluations(repo_id):\n    \"\"\"Open a PR that adds evaluations extracted from the model's README.\"\"\"\n    # Run `get-prs` for this repo first; do not open a duplicate PR.\n    # The script path is relative to the skill directory.\n    result = subprocess.run([\n        \"python\", \"scripts/evaluation_manager.py\",\n        \"extract-readme\",\n        \"--repo-id\", repo_id,\n        \"--create-pr\"\n    ], capture_output=True, text=True)\n\n    if result.returncode == 0:\n        print(f\"Successfully updated {repo_id}\")\n    else:\n        print(f\"Error: {result.stderr}\")\n```\n"
  },
  {
    "path": "skills/hugging-face-evaluation/examples/USAGE_EXAMPLES.md",
    "content": "# Usage Examples\n\nThis document provides practical examples for both methods of adding evaluations to HuggingFace model cards.\n\n## Table of Contents\n1. [Setup](#setup)\n2. [Method 1: Extract from README](#method-1-extract-from-readme)\n3. [Method 2: Import from Artificial Analysis](#method-2-import-from-artificial-analysis)\n4. [Standalone vs Integrated](#standalone-vs-integrated)\n5. [Common Workflows](#common-workflows)\n\n## Setup\n\n### Initial Configuration\n\n```bash\n# Navigate to skill directory\ncd hf_evaluation_skill\n\n\n# Configure environment variables\ncp examples/.env.example .env\n# Edit .env with your tokens\n```\n\nYour `.env` file should contain:\n```env\nHF_TOKEN=hf_your_write_token_here\nAA_API_KEY=aa_your_api_key_here  # Optional for AA imports\n```\n\n### Verify Installation\n\n```bash\nuv run scripts/test_extraction.py\n```\n\n## Method 1: Extract from README\n\nExtract evaluation tables from your model's existing README.\n\n### Basic Extraction\n\n```bash\n# Preview what will be extracted (dry run)\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"meta-llama/Llama-3.3-70B-Instruct\" \\\n  --dry-run\n```\n\n### Apply Extraction to Your Model\n\n```bash\n# Extract and update model card directly\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"your-username/your-model-7b\"\n```\n\n### Custom Task and Dataset Names\n\n```bash\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"your-username/your-model-7b\" \\\n  --task-type \"text-generation\" \\\n  --dataset-name \"Standard Benchmarks\" \\\n  --dataset-type \"llm_benchmarks\"\n```\n\n### Create Pull Request (for models you don't own)\n\n```bash\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"organization/community-model\" \\\n  --create-pr\n```\n\n### Example README Format\n\nYour model README should contain tables like:\n\n```markdown\n## Evaluation Results\n\n| Benchmark     | Score 
|\n|---------------|-------|\n| MMLU          | 85.2  |\n| HumanEval     | 72.5  |\n| GSM8K         | 91.3  |\n| HellaSwag     | 88.9  |\n```\n\n## Method 2: Import from Artificial Analysis\n\nFetch benchmark scores directly from the Artificial Analysis API.\n\n### Integrated Approach (Recommended)\n\n```bash\n# Import scores for Claude Sonnet 4\nuv run scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"anthropic\" \\\n  --model-name \"claude-sonnet-4\" \\\n  --repo-id \"your-username/claude-mirror\"\n```\n\n### With Pull Request\n\n```bash\n# Create PR instead of direct commit\nuv run scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"openai\" \\\n  --model-name \"gpt-4\" \\\n  --repo-id \"your-username/gpt-4-mirror\" \\\n  --create-pr\n```\n\n### Standalone Script\n\nFor simple, one-off imports, use the standalone script:\n\n```bash\n# Navigate to examples directory\ncd examples\n\n# Run standalone script\nAA_API_KEY=\"your-key\" HF_TOKEN=\"your-token\" \\\nuv run artificial_analysis_to_hub.py \\\n  --creator-slug \"anthropic\" \\\n  --model-name \"claude-sonnet-4\" \\\n  --repo-id \"your-username/your-repo\"\n```\n\n### Finding Creator Slug and Model Name\n\n1. Visit [Artificial Analysis](https://artificialanalysis.ai/)\n2. Navigate to the model you want to import\n3. The URL format is: `https://artificialanalysis.ai/models/{creator-slug}/{model-name}`\n4. 
Or check their [API documentation](https://artificialanalysis.ai/api)\n\nCommon examples:\n- Anthropic: `--creator-slug \"anthropic\" --model-name \"claude-sonnet-4\"`\n- OpenAI: `--creator-slug \"openai\" --model-name \"gpt-4-turbo\"`\n- Meta: `--creator-slug \"meta\" --model-name \"llama-3-70b\"`\n\n## Standalone vs Integrated\n\n### Standalone Script Features\n- ✓ Simple, single-purpose\n- ✓ Can run via `uv run` from URL\n- ✓ Minimal dependencies\n- ✗ No README extraction\n- ✗ No validation\n- ✗ No dry-run mode\n\n**Use when:** You only need AA imports and want a simple script.\n\n### Integrated Script Features\n- ✓ Both README extraction AND AA import\n- ✓ Validation and show commands\n- ✓ Dry-run preview mode\n- ✓ Better error handling\n- ✓ Merge with existing evaluations\n- ✓ More flexible options\n\n**Use when:** You want full evaluation management capabilities.\n\n## Common Workflows\n\n### Workflow 1: New Model with README Tables\n\nYou've just created a model with evaluation tables in the README.\n\n```bash\n# Step 1: Preview extraction\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"your-username/new-model-7b\" \\\n  --dry-run\n\n# Step 2: Apply if it looks good\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"your-username/new-model-7b\"\n\n# Step 3: Validate\nuv run scripts/evaluation_manager.py validate \\\n  --repo-id \"your-username/new-model-7b\"\n\n# Step 4: View results\nuv run scripts/evaluation_manager.py show \\\n  --repo-id \"your-username/new-model-7b\"\n```\n\n### Workflow 2: Model Benchmarked on AA\n\nYour model appears on Artificial Analysis with fresh benchmarks.\n\n```bash\n# Import scores and create PR for review\nuv run scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"your-org\" \\\n  --model-name \"your-model\" \\\n  --repo-id \"your-org/your-model-hf\" \\\n  --create-pr\n```\n\n### Workflow 3: Combine Both Methods\n\nYou have README tables AND AA scores.\n\n```bash\n# Step 
1: Extract from README\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"your-username/hybrid-model\"\n\n# Step 2: Import from AA (will merge with existing)\nuv run scripts/evaluation_manager.py import-aa \\\n  --creator-slug \"your-org\" \\\n  --model-name \"hybrid-model\" \\\n  --repo-id \"your-username/hybrid-model\"\n\n# Step 3: View combined results\nuv run scripts/evaluation_manager.py show \\\n  --repo-id \"your-username/hybrid-model\"\n```\n\n### Workflow 4: Contributing to Community Models\n\nHelp improve community models by adding missing evaluations.\n\n```bash\n# Find a model with evaluations in README but no model-index\n# Example: community/awesome-7b\n\n# Check for existing PRs first to avoid opening a duplicate\nuv run scripts/evaluation_manager.py get-prs \\\n  --repo-id \"community/awesome-7b\"\n\n# Create PR with extracted evaluations\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"community/awesome-7b\" \\\n  --create-pr\n\n# The pull request is opened on the Hugging Face Hub, not GitHub\n# The repository owner can review and merge it there\n```\n\n### Workflow 5: Batch Processing\n\nUpdate multiple models at once.\n\n```bash\n# Create a list of repos\ncat > models.txt << EOF\nyour-org/model-1-7b\nyour-org/model-2-13b\nyour-org/model-3-70b\nEOF\n\n# Process each\nwhile read -r repo_id; do\n  echo \"Processing $repo_id...\"\n  uv run scripts/evaluation_manager.py extract-readme \\\n    --repo-id \"$repo_id\"\ndone < models.txt\n```\n\n### Workflow 6: Automated Updates (CI/CD)\n\nSet up automatic evaluation updates using GitHub Actions.\n\n```yaml\n# .github/workflows/update-evals.yml\nname: Update Evaluations Weekly\non:\n  schedule:\n    - cron: '0 0 * * 0'  # Every Sunday at 00:00 UTC\n  workflow_dispatch:  # Manual trigger\n\njobs:\n  update:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n\n      - name: Set up uv\n        uses: astral-sh/setup-uv@v5\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.13'\n\n      - name: Update from Artificial Analysis\n        env:\n          AA_API_KEY: ${{ secrets.AA_API_KEY 
}}\n          HF_TOKEN: ${{ secrets.HF_TOKEN }}\n        run: |\n          # github.repository is the GitHub owner/name; this assumes the Hub repo ID is identical\n          uv run scripts/evaluation_manager.py import-aa \\\n            --creator-slug \"${{ vars.AA_CREATOR_SLUG }}\" \\\n            --model-name \"${{ vars.AA_MODEL_NAME }}\" \\\n            --repo-id \"${{ github.repository }}\" \\\n            --create-pr\n```\n\n## Verification and Validation\n\n### Check Current Evaluations\n\n```bash\nuv run scripts/evaluation_manager.py show \\\n  --repo-id \"your-username/your-model\"\n```\n\n### Validate Format\n\n```bash\nuv run scripts/evaluation_manager.py validate \\\n  --repo-id \"your-username/your-model\"\n```\n\n### View in the Hugging Face UI\n\nAfter updating, visit:\n```\nhttps://huggingface.co/your-username/your-model\n```\n\nThe evaluation widget should display your scores automatically.\n\n## Troubleshooting Examples\n\n### Problem: No tables found\n\n```bash\n# Check what tables exist in your README\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"your-username/your-model\" \\\n  --dry-run\n\n# If no output, ensure your README has markdown tables with numeric scores\n```\n\n### Problem: AA model not found\n\n```bash\n# Verify the creator and model slugs\n# Check the AA website URL or API directly\ncurl -H \"x-api-key: $AA_API_KEY\" \\\n  https://artificialanalysis.ai/api/v2/data/llms/models | jq\n```\n\n### Problem: Token permission error\n\n```bash\n# Verify your token has write access\n# Generate a new token at: https://huggingface.co/settings/tokens\n# Ensure \"Write\" scope is enabled\n```\n\n## Tips and Best Practices\n\n1. **Always dry-run first**: Use `--dry-run` to preview changes\n2. **Use PRs for others' repos**: Always use `--create-pr` for repositories you don't own\n3. **Validate after updates**: Run `validate` to ensure proper formatting\n4. **Keep evaluations current**: Set up automated updates for AA scores\n5. **Document sources**: The tool automatically adds source attribution\n6. 
**Check the UI**: Always verify the evaluation widget displays correctly\n\n## Getting Help\n\n```bash\n# General help\nuv run scripts/evaluation_manager.py --help\n\n# Command-specific help\nuv run scripts/evaluation_manager.py extract-readme --help\nuv run scripts/evaluation_manager.py import-aa --help\n```\n\nFor issues or questions, consult:\n- `../SKILL.md` - Complete documentation\n- `../README.md` - Troubleshooting guide\n- `../QUICKSTART.md` - Quick start guide\n"
  },
  {
    "path": "skills/hugging-face-evaluation/examples/artificial_analysis_to_hub.py",
    "content": "# /// script\n# requires-python = \">=3.13\"\n# dependencies = [\n#     \"huggingface-hub>=1.1.4\",\n#     \"python-dotenv>=1.2.1\",\n#     \"pyyaml>=6.0.3\",\n#     \"requests>=2.32.5\",\n# ]\n# ///\n\n\"\"\"\nAdd Artificial Analysis evaluations to a Hugging Face model card.\n\nNOTE: This is a standalone reference script. For integrated functionality\nwith additional features (README extraction, validation, etc.), use:\n    ../scripts/evaluation_manager.py import-aa [options]\n\nSTANDALONE USAGE:\nAA_API_KEY=\"<your-api-key>\" HF_TOKEN=\"<your-huggingface-token>\" \\\nuv run artificial_analysis_to_hub.py \\\n--creator-slug <artificial-analysis-creator-slug> \\\n--model-name <artificial-analysis-model-name> \\\n--repo-id <huggingface-repo-id>\n\nINTEGRATED USAGE (Recommended):\nuv run ../scripts/evaluation_manager.py import-aa \\\n--creator-slug <creator-slug> \\\n--model-name <model-name> \\\n--repo-id <repo-id> \\\n[--create-pr]\n\"\"\"\n\nimport argparse\nimport os\n\nimport requests\nimport dotenv\nfrom huggingface_hub import ModelCard\n\ndotenv.load_dotenv()\n\nAPI_KEY = os.getenv(\"AA_API_KEY\")\nHF_TOKEN = os.getenv(\"HF_TOKEN\")\nURL = \"https://artificialanalysis.ai/api/v2/data/llms/models\"\nHEADERS = {\"x-api-key\": API_KEY}\n\nif not API_KEY:\n    raise ValueError(\"AA_API_KEY is not set\")\nif not HF_TOKEN:\n    raise ValueError(\"HF_TOKEN is not set\")\n\n\ndef get_model_evaluations_data(creator_slug, model_name):\n    response = requests.get(URL, headers=HEADERS)\n    response_data = response.json()[\"data\"]\n    for model in response_data:\n        if (\n            model[\"model_creator\"][\"slug\"] == creator_slug\n            and model[\"slug\"] == model_name\n        ):\n            return model\n    raise ValueError(f\"Model {model_name} not found\")\n\n\ndef aa_evaluations_to_model_index(\n    model,\n    dataset_name=\"Artificial Analysis Benchmarks\",\n    dataset_type=\"artificial_analysis\",\n    
task_type=\"evaluation\",\n):\n    if not model:\n        raise ValueError(\"Model data is required\")\n\n    model_name = model.get(\"name\", model.get(\"slug\", \"unknown-model\"))\n    evaluations = model.get(\"evaluations\", {})\n\n    metrics = []\n    for key, value in evaluations.items():\n        metrics.append(\n            {\n                \"name\": key.replace(\"_\", \" \").title(),\n                \"type\": key,\n                \"value\": value,\n            }\n        )\n\n    model_index = [\n        {\n            \"name\": model_name,\n            \"results\": [\n                {\n                    \"task\": {\"type\": task_type},\n                    \"dataset\": {\"name\": dataset_name, \"type\": dataset_type},\n                    \"metrics\": metrics,\n                    \"source\": {\n                        \"name\": \"Artificial Analysis API\",\n                        \"url\": \"https://artificialanalysis.ai\",\n                    },\n                }\n            ],\n        }\n    ]\n\n    return model_index\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--creator-slug\", type=str, required=True)\n    parser.add_argument(\"--model-name\", type=str, required=True)\n    parser.add_argument(\"--repo-id\", type=str, required=True)\n    args = parser.parse_args()\n\n    aa_evaluations_data = get_model_evaluations_data(\n        creator_slug=args.creator_slug, model_name=args.model_name\n    )\n\n    model_index = aa_evaluations_to_model_index(model=aa_evaluations_data)\n\n    card = ModelCard.load(args.repo_id)\n    card.data[\"model-index\"] = model_index\n\n    commit_message = (\n        f\"Add Artificial Analysis evaluations for {args.model_name}\"\n    )\n    commit_description = (\n        f\"This commit adds the Artificial Analysis evaluations for the {args.model_name} model to this repository. 
\"\n        \"To see the scores, visit the [Artificial Analysis](https://artificialanalysis.ai) website.\"\n    )\n\n    card.push_to_hub(\n        args.repo_id,\n        token=HF_TOKEN,\n        commit_message=commit_message,\n        commit_description=commit_description,\n        create_pr=True,\n    )\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hugging-face-evaluation/examples/example_readme_tables.md",
    "content": "# Example Evaluation Table Formats\n\nThis file shows various formats of evaluation tables that can be extracted from model README files.\n\n## Format 1: Benchmarks as Rows (Most Common)\n\n```markdown\n| Benchmark | Score |\n|-----------|-------|\n| MMLU      | 85.2  |\n| HumanEval | 72.5  |\n| GSM8K     | 91.3  |\n| HellaSwag | 88.9  |\n```\n\n## Format 2: Multiple Metric Columns\n\n```markdown\n| Benchmark | Accuracy | F1 Score |\n|-----------|----------|----------|\n| MMLU      | 85.2     | 0.84     |\n| GSM8K     | 91.3     | 0.91     |\n| DROP      | 78.5     | 0.77     |\n```\n\n## Format 3: Benchmarks as Columns\n\n```markdown\n| MMLU | HumanEval | GSM8K | HellaSwag |\n|------|-----------|-------|-----------|\n| 85.2 | 72.5      | 91.3  | 88.9      |\n```\n\n## Format 4: Percentage Values\n\n```markdown\n| Benchmark     | Score    |\n|---------------|----------|\n| MMLU          | 85.2%    |\n| HumanEval     | 72.5%    |\n| GSM8K         | 91.3%    |\n| TruthfulQA    | 68.7%    |\n```\n\n## Format 5: Mixed Format with Categories\n\n```markdown\n### Reasoning\n\n| Benchmark | Score |\n|-----------|-------|\n| MMLU      | 85.2  |\n| BBH       | 82.4  |\n| GPQA      | 71.3  |\n\n### Coding\n\n| Benchmark | Score |\n|-----------|-------|\n| HumanEval | 72.5  |\n| MBPP      | 78.9  |\n\n### Math\n\n| Benchmark | Score |\n|-----------|-------|\n| GSM8K     | 91.3  |\n| MATH      | 65.8  |\n```\n\n## Format 6: With Additional Columns\n\n```markdown\n| Benchmark | Score | Rank | Notes              |\n|-----------|-------|------|--------------------|\n| MMLU      | 85.2  | #5   | 5-shot             |\n| HumanEval | 72.5  | #8   | pass@1             |\n| GSM8K     | 91.3  | #3   | 8-shot, maj@1      |\n```\n\n## How the Extractor Works\n\nThe script will:\n1. Find all markdown tables in the README\n2. Identify which tables contain evaluation results\n3. Parse the table structure (rows vs columns)\n4. Extract numeric values as scores\n5. 
Convert to model-index YAML format\n\n## Tips for README Authors\n\nTo ensure your evaluation tables are properly extracted:\n\n1. **Use clear headers**: Include \"Benchmark\", \"Score\", or similar terms\n2. **Keep it simple**: Stick to benchmark name + score columns\n3. **Use standard formats**: Follow markdown table syntax\n4. **Include numeric values**: Ensure scores are parseable numbers\n5. **Be consistent**: Use the same format across multiple tables\n\n## Example Complete README Section\n\n```markdown\n# Model Card for MyModel-7B\n\n## Evaluation Results\n\nOur model was evaluated on several standard benchmarks:\n\n| Benchmark     | Score |\n|---------------|-------|\n| MMLU          | 85.2  |\n| HumanEval     | 72.5  |\n| GSM8K         | 91.3  |\n| HellaSwag     | 88.9  |\n| ARC-Challenge | 81.7  |\n| TruthfulQA    | 68.7  |\n\n### Detailed Results\n\nFor more detailed results and methodology, see our [paper](link).\n```\n\n## Running the Extractor\n\n```bash\n# Extract from this example\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"your-username/your-model\" \\\n  --dry-run\n\n# Apply to your model card\nuv run scripts/evaluation_manager.py extract-readme \\\n  --repo-id \"your-username/your-model\" \\\n  --task-type \"text-generation\"\n```\n"
  },
  {
    "path": "skills/hugging-face-evaluation/examples/metric_mapping.json",
    "content": "{\n  \"MMLU\": {\n    \"type\": \"mmlu\",\n    \"name\": \"Massive Multitask Language Understanding\"\n  },\n  \"HumanEval\": {\n    \"type\": \"humaneval\",\n    \"name\": \"Code Generation (HumanEval)\"\n  },\n  \"GSM8K\": {\n    \"type\": \"gsm8k\",\n    \"name\": \"Grade School Math\"\n  },\n  \"HellaSwag\": {\n    \"type\": \"hellaswag\",\n    \"name\": \"HellaSwag Common Sense\"\n  },\n  \"ARC-C\": {\n    \"type\": \"arc_challenge\",\n    \"name\": \"ARC Challenge\"\n  },\n  \"ARC-E\": {\n    \"type\": \"arc_easy\",\n    \"name\": \"ARC Easy\"\n  },\n  \"Winogrande\": {\n    \"type\": \"winogrande\",\n    \"name\": \"Winogrande\"\n  },\n  \"TruthfulQA\": {\n    \"type\": \"truthfulqa\",\n    \"name\": \"TruthfulQA\"\n  },\n  \"GPQA\": {\n    \"type\": \"gpqa\",\n    \"name\": \"Graduate-Level Google-Proof Q&A\"\n  },\n  \"DROP\": {\n    \"type\": \"drop\",\n    \"name\": \"Discrete Reasoning Over Paragraphs\"\n  },\n  \"BBH\": {\n    \"type\": \"bbh\",\n    \"name\": \"Big Bench Hard\"\n  },\n  \"MATH\": {\n    \"type\": \"math\",\n    \"name\": \"MATH Dataset\"\n  }\n}\n"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/evaluation_manager.py",
    "content": "# /// script\n# requires-python = \">=3.13\"\n# dependencies = [\n#     \"huggingface-hub>=1.1.4\",\n#     \"markdown-it-py>=3.0.0\",\n#     \"python-dotenv>=1.2.1\",\n#     \"pyyaml>=6.0.3\",\n#     \"requests>=2.32.5\",\n# ]\n# ///\n\n\"\"\"\nManage evaluation results in Hugging Face model cards.\n\nThis script provides two methods:\n1. Extract evaluation tables from model README files\n2. Import evaluation scores from Artificial Analysis API\n\nBoth methods update the model-index metadata in model cards.\n\"\"\"\n\nimport argparse\nimport os\nimport re\nfrom textwrap import dedent\nfrom typing import Any, Dict, List, Optional, Tuple\n\n\ndef load_env() -> None:\n    \"\"\"Load .env if python-dotenv is available; keep help usable without it.\"\"\"\n    try:\n        import dotenv  # type: ignore\n    except ModuleNotFoundError:\n        return\n    dotenv.load_dotenv()\n\n\ndef require_markdown_it():\n    try:\n        from markdown_it import MarkdownIt  # type: ignore\n    except ModuleNotFoundError as exc:\n        raise ModuleNotFoundError(\n            \"markdown-it-py is required for table parsing. \"\n            \"Run with `uv run ...` or install with `uv pip install markdown-it-py`.\"\n        ) from exc\n    return MarkdownIt\n\n\ndef require_model_card():\n    try:\n        from huggingface_hub import ModelCard  # type: ignore\n    except ModuleNotFoundError as exc:\n        raise ModuleNotFoundError(\n            \"huggingface-hub is required for model card operations. \"\n            \"Run with `uv run ...` or install with `uv pip install huggingface-hub`.\"\n        ) from exc\n    return ModelCard\n\n\ndef require_requests():\n    try:\n        import requests  # type: ignore\n    except ModuleNotFoundError as exc:\n        raise ModuleNotFoundError(\n            \"requests is required for Artificial Analysis import. 
\"\n            \"Run with `uv run ...` or install with `uv pip install requests`.\"\n        ) from exc\n    return requests\n\n\ndef require_yaml():\n    try:\n        import yaml  # type: ignore\n    except ModuleNotFoundError as exc:\n        raise ModuleNotFoundError(\n            \"PyYAML is required for YAML output. \"\n            \"Run with `uv run ...` or install with `uv pip install pyyaml`.\"\n        ) from exc\n    return yaml\n\n\n# ============================================================================\n# Method 1: Extract Evaluations from README\n# ============================================================================\n\n\ndef extract_tables_from_markdown(markdown_content: str) -> List[str]:\n    \"\"\"Extract all markdown tables from content.\"\"\"\n    # Pattern to match markdown tables\n    table_pattern = r\"(\\|[^\\n]+\\|(?:\\r?\\n\\|[^\\n]+\\|)+)\"\n    tables = re.findall(table_pattern, markdown_content)\n    return tables\n\n\ndef parse_markdown_table(table_str: str) -> Tuple[List[str], List[List[str]]]:\n    \"\"\"\n    Parse a markdown table string into headers and rows.\n\n    Returns:\n        Tuple of (headers, data_rows)\n    \"\"\"\n    lines = [line.strip() for line in table_str.strip().split(\"\\n\")]\n\n    # Remove separator line (the one with dashes)\n    lines = [line for line in lines if not re.match(r\"^\\|[\\s\\-:]+\\|$\", line)]\n\n    if len(lines) < 2:\n        return [], []\n\n    # Parse header\n    header = [cell.strip() for cell in lines[0].split(\"|\")[1:-1]]\n\n    # Parse data rows\n    data_rows = []\n    for line in lines[1:]:\n        cells = [cell.strip() for cell in line.split(\"|\")[1:-1]]\n        if cells:\n            data_rows.append(cells)\n\n    return header, data_rows\n\n\ndef is_evaluation_table(header: List[str], rows: List[List[str]]) -> bool:\n    \"\"\"Determine if a table contains evaluation results.\"\"\"\n    if not header or not rows:\n        return False\n\n    # Check if first 
column looks like benchmark names\n    benchmark_keywords = [\n        \"benchmark\", \"task\", \"dataset\", \"eval\", \"test\", \"metric\",\n        \"mmlu\", \"humaneval\", \"gsm\", \"hellaswag\", \"arc\", \"winogrande\",\n        \"truthfulqa\", \"boolq\", \"piqa\", \"siqa\"\n    ]\n\n    first_col = header[0].lower()\n    has_benchmark_header = any(keyword in first_col for keyword in benchmark_keywords)\n\n    # Check if there are numeric values in the table\n    has_numeric_values = False\n    for row in rows:\n        for cell in row:\n            try:\n                float(cell.replace(\"%\", \"\").replace(\",\", \"\"))\n                has_numeric_values = True\n                break\n            except ValueError:\n                continue\n        if has_numeric_values:\n            break\n\n    return has_benchmark_header or has_numeric_values\n\n\ndef normalize_model_name(name: str) -> tuple[set[str], str]:\n    \"\"\"\n    Normalize a model name for matching.\n\n    Args:\n        name: Model name to normalize\n\n    Returns:\n        Tuple of (token_set, normalized_string)\n    \"\"\"\n    # Remove markdown formatting\n    cleaned = re.sub(r'\\[([^\\]]+)\\]\\([^\\)]+\\)', r'\\1', name)  # Remove markdown links\n    cleaned = re.sub(r'\\*\\*([^\\*]+)\\*\\*', r'\\1', cleaned)  # Remove bold\n    cleaned = cleaned.strip()\n\n    # Normalize and tokenize\n    normalized = cleaned.lower().replace(\"-\", \" \").replace(\"_\", \" \")\n    tokens = set(normalized.split())\n\n    return tokens, normalized\n\n\ndef find_main_model_column(header: List[str], model_name: str) -> Optional[int]:\n    \"\"\"\n    Identify the column index that corresponds to the main model.\n\n    Only returns a column if there's an exact normalized match with the model name.\n    This prevents extracting scores from training checkpoints or similar models.\n\n    Args:\n        header: Table column headers\n        model_name: Model name from repo_id (e.g., \"OLMo-3-32B-Think\")\n\n 
   Returns:\n        Column index of the main model, or None if no exact match found\n    \"\"\"\n    if not header or not model_name:\n        return None\n\n    # Normalize model name and extract tokens\n    model_tokens, _ = normalize_model_name(model_name)\n\n    # Find exact matches only\n    for i, col_name in enumerate(header):\n        if not col_name:\n            continue\n\n        # Skip first column (benchmark names)\n        if i == 0:\n            continue\n\n        col_tokens, _ = normalize_model_name(col_name)\n\n        # Check for exact token match\n        if model_tokens == col_tokens:\n            return i\n\n    # No exact match found\n    return None\n\n\ndef find_main_model_row(\n    rows: List[List[str]], model_name: str\n) -> tuple[Optional[int], List[str]]:\n    \"\"\"\n    Identify the row index that corresponds to the main model in a transposed table.\n\n    In transposed tables, each row represents a different model, with the first\n    column containing the model name.\n\n    Args:\n        rows: Table data rows\n        model_name: Model name from repo_id (e.g., \"OLMo-3-32B\")\n\n    Returns:\n        Tuple of (row_index, available_models)\n        - row_index: Index of the main model, or None if no exact match found\n        - available_models: List of all model names found in the table\n    \"\"\"\n    if not rows or not model_name:\n        return None, []\n\n    model_tokens, _ = normalize_model_name(model_name)\n    available_models = []\n\n    for i, row in enumerate(rows):\n        if not row or not row[0]:\n            continue\n\n        row_name = row[0].strip()\n\n        # Skip separator/header rows\n        if not row_name or row_name.startswith('---'):\n            continue\n\n        row_tokens, _ = normalize_model_name(row_name)\n\n        # Collect all non-empty model names\n        if row_tokens:\n            available_models.append(row_name)\n\n        # Check for exact token match\n        if model_tokens == 
row_tokens:\n            return i, available_models\n\n    return None, available_models\n\n\ndef is_transposed_table(header: List[str], rows: List[List[str]]) -> bool:\n    \"\"\"\n    Determine if a table is transposed (models as rows, benchmarks as columns).\n\n    A table is considered transposed if:\n    - The first column contains model-like names (not benchmark names)\n    - Most other columns contain numeric values\n    - Header row contains benchmark-like names\n\n    Args:\n        header: Table column headers\n        rows: Table data rows\n\n    Returns:\n        True if table appears to be transposed, False otherwise\n    \"\"\"\n    if not header or not rows or len(header) < 3:\n        return False\n\n    # Check if first column header suggests model names\n    first_col = header[0].lower()\n    model_indicators = [\"model\", \"system\", \"llm\", \"name\"]\n    has_model_header = any(indicator in first_col for indicator in model_indicators)\n\n    # Check if remaining headers look like benchmarks\n    benchmark_keywords = [\n        \"mmlu\", \"humaneval\", \"gsm\", \"hellaswag\", \"arc\", \"winogrande\",\n        \"eval\", \"score\", \"benchmark\", \"test\", \"math\", \"code\", \"mbpp\",\n        \"truthfulqa\", \"boolq\", \"piqa\", \"siqa\", \"drop\", \"squad\"\n    ]\n\n    benchmark_header_count = 0\n    for col_name in header[1:]:\n        col_lower = col_name.lower()\n        if any(keyword in col_lower for keyword in benchmark_keywords):\n            benchmark_header_count += 1\n\n    has_benchmark_headers = benchmark_header_count >= 2\n\n    # Check if data rows have numeric values in most columns (except first)\n    numeric_count = 0\n    total_cells = 0\n\n    for row in rows[:5]:  # Check first 5 rows\n        for cell in row[1:]:  # Skip first column\n            total_cells += 1\n            try:\n                float(cell.replace(\"%\", \"\").replace(\",\", \"\").strip())\n                numeric_count += 1\n            except 
(ValueError, AttributeError):\n                continue\n\n    has_numeric_data = total_cells > 0 and (numeric_count / total_cells) > 0.5\n\n    return (has_model_header or has_benchmark_headers) and has_numeric_data\n\n\ndef extract_metrics_from_table(\n    header: List[str],\n    rows: List[List[str]],\n    table_format: str = \"auto\",\n    model_name: Optional[str] = None,\n    model_column_index: Optional[int] = None\n) -> List[Dict[str, Any]]:\n    \"\"\"\n    Extract metrics from parsed table data.\n\n    Args:\n        header: Table column headers\n        rows: Table data rows\n        table_format: \"rows\" (benchmarks as rows), \"columns\" (benchmarks as columns),\n                     \"transposed\" (models as rows, benchmarks as columns), or \"auto\"\n        model_name: Optional model name to identify the correct column/row\n        model_column_index: Optional index of the score column to read in the\n                     \"rows\" format; takes precedence over model_name matching\n\n    Returns:\n        List of metric dictionaries with name, type, and value\n    \"\"\"\n    metrics = []\n\n    if table_format == \"auto\":\n        # First check if it's a transposed table (models as rows)\n        if is_transposed_table(header, rows):\n            table_format = \"transposed\"\n        else:\n            # Check if first column header is empty/generic (indicates benchmarks in rows)\n            first_header = header[0].lower().strip() if header else \"\"\n            is_first_col_benchmarks = not first_header or first_header in [\"\", \"benchmark\", \"task\", \"dataset\", \"metric\", \"eval\"]\n\n            if is_first_col_benchmarks:\n                table_format = \"rows\"\n            else:\n                # Heuristic: if first row has mostly numeric values, benchmarks are columns\n                try:\n                    numeric_count = sum(\n                        1 for cell in rows[0] if cell and\n                        re.match(r\"^\\d+\\.?\\d*%?$\", cell.replace(\",\", \"\").strip())\n                    )\n                    table_format = \"columns\" if numeric_count > len(rows[0]) / 2 else 
\"rows\"\n                except (IndexError, ValueError):\n                    table_format = \"rows\"\n\n    if table_format == \"rows\":\n        # Benchmarks are in rows, scores in columns\n        # Try to identify the main model column if model_name is provided\n        target_column = model_column_index\n        if target_column is None and model_name:\n            target_column = find_main_model_column(header, model_name)\n\n        for row in rows:\n            if not row:\n                continue\n\n            benchmark_name = row[0].strip()\n            if not benchmark_name:\n                continue\n\n            # If we identified a specific column, use it; otherwise use first numeric value\n            if target_column is not None and target_column < len(row):\n                try:\n                    value_str = row[target_column].replace(\"%\", \"\").replace(\",\", \"\").strip()\n                    if value_str:\n                        value = float(value_str)\n                        metrics.append({\n                            \"name\": benchmark_name,\n                            \"type\": benchmark_name.lower().replace(\" \", \"_\"),\n                            \"value\": value\n                        })\n                except (ValueError, IndexError):\n                    pass\n            else:\n                # Extract numeric values from remaining columns (original behavior)\n                for i, cell in enumerate(row[1:], start=1):\n                    try:\n                        # Remove common suffixes and convert to float\n                        value_str = cell.replace(\"%\", \"\").replace(\",\", \"\").strip()\n                        if not value_str:\n                            continue\n\n                        value = float(value_str)\n\n                        # Determine metric name\n                        metric_name = benchmark_name\n                        if len(header) > i and header[i].lower() not in 
[\"score\", \"value\", \"result\"]:\n                            metric_name = f\"{benchmark_name} ({header[i]})\"\n\n                        metrics.append({\n                            \"name\": metric_name,\n                            \"type\": benchmark_name.lower().replace(\" \", \"_\"),\n                            \"value\": value\n                        })\n                        break  # Only take first numeric value per row\n                    except (ValueError, IndexError):\n                        continue\n\n    elif table_format == \"transposed\":\n        # Models are in rows (first column), benchmarks are in columns (header)\n        # Find the row that matches the target model\n        if not model_name:\n            print(\"Warning: model_name required for transposed table format\")\n            return metrics\n\n        target_row_idx, available_models = find_main_model_row(rows, model_name)\n\n        if target_row_idx is None:\n            print(f\"\\n⚠ Could not find model '{model_name}' in transposed table\")\n            if available_models:\n                print(\"\\nAvailable models in table:\")\n                for i, model in enumerate(available_models, 1):\n                    print(f\"  {i}. 
{model}\")\n                print(\"\\nPlease select the correct model name from the list above.\")\n                print(\"You can specify it using the --model-name-override flag:\")\n                print(f'  --model-name-override \"{available_models[0]}\"')\n            return metrics\n\n        target_row = rows[target_row_idx]\n\n        # Extract metrics from each column (skip first column which is model name)\n        for i in range(1, len(header)):\n            benchmark_name = header[i].strip()\n            if not benchmark_name or i >= len(target_row):\n                continue\n\n            try:\n                value_str = target_row[i].replace(\"%\", \"\").replace(\",\", \"\").strip()\n                if not value_str:\n                    continue\n\n                value = float(value_str)\n\n                metrics.append({\n                    \"name\": benchmark_name,\n                    \"type\": benchmark_name.lower().replace(\" \", \"_\").replace(\"-\", \"_\"),\n                    \"value\": value\n                })\n            except (ValueError, AttributeError):\n                continue\n\n    else:  # table_format == \"columns\"\n        # Benchmarks are in columns\n        if not rows:\n            return metrics\n\n        # Use first data row for values\n        data_row = rows[0]\n\n        for i, benchmark_name in enumerate(header):\n            if not benchmark_name or i >= len(data_row):\n                continue\n\n            try:\n                value_str = data_row[i].replace(\"%\", \"\").replace(\",\", \"\").strip()\n                if not value_str:\n                    continue\n\n                value = float(value_str)\n\n                metrics.append({\n                    \"name\": benchmark_name,\n                    \"type\": benchmark_name.lower().replace(\" \", \"_\"),\n                    \"value\": value\n                })\n            except ValueError:\n                continue\n\n    return 
metrics\n\n\ndef extract_evaluations_from_readme(\n    repo_id: str,\n    task_type: str = \"text-generation\",\n    dataset_name: str = \"Benchmarks\",\n    dataset_type: str = \"benchmark\",\n    model_name_override: Optional[str] = None,\n    table_index: Optional[int] = None,\n    model_column_index: Optional[int] = None\n) -> Optional[List[Dict[str, Any]]]:\n    \"\"\"\n    Extract evaluation results from a model's README.\n\n    Args:\n        repo_id: Hugging Face model repository ID\n        task_type: Task type for model-index (e.g., \"text-generation\")\n        dataset_name: Name for the benchmark dataset\n        dataset_type: Type identifier for the dataset\n        model_name_override: Override model name for matching (column header for comparison tables)\n        table_index: 1-indexed table number from inspect-tables output\n        model_column_index: Column index from inspect-tables output; passed through to\n            extract_metrics_from_table to select the model column directly\n\n    Returns:\n        Model-index formatted results or None if no evaluations found\n    \"\"\"\n    try:\n        load_env()\n        ModelCard = require_model_card()\n        hf_token = os.getenv(\"HF_TOKEN\")\n        card = ModelCard.load(repo_id, token=hf_token)\n        readme_content = card.content\n\n        if not readme_content:\n            print(f\"No README content found for {repo_id}\")\n            return None\n\n        # Extract model name from repo_id or use override\n        if model_name_override:\n            model_name = model_name_override\n            print(f\"Using model name override: '{model_name}'\")\n        else:\n            model_name = repo_id.split(\"/\")[-1] if \"/\" in repo_id else repo_id\n\n        # Use markdown-it parser for accurate table extraction\n        all_tables = extract_tables_with_parser(readme_content)\n\n        if not all_tables:\n            print(f\"No tables found in README for {repo_id}\")\n            return None\n\n        # If table_index specified, use that specific table\n        if table_index is not None:\n            if table_index < 1 or table_index > 
len(all_tables):\n                print(f\"Invalid table index {table_index}. Found {len(all_tables)} tables.\")\n                print(\"Run inspect-tables to see available tables.\")\n                return None\n            tables_to_process = [all_tables[table_index - 1]]\n        else:\n            # Filter to evaluation tables only\n            eval_tables = []\n            for table in all_tables:\n                header = table.get(\"headers\", [])\n                rows = table.get(\"rows\", [])\n                if is_evaluation_table(header, rows):\n                    eval_tables.append(table)\n\n            if len(eval_tables) > 1:\n                print(f\"\\n⚠ Found {len(eval_tables)} evaluation tables.\")\n                print(\"Run inspect-tables first, then use --table to select one:\")\n                print(f'  uv run scripts/evaluation_manager.py inspect-tables --repo-id \"{repo_id}\"')\n                return None\n            elif len(eval_tables) == 0:\n                print(f\"No evaluation tables found in README for {repo_id}\")\n                return None\n\n            tables_to_process = eval_tables\n\n        # Extract metrics from selected table(s)\n        all_metrics = []\n        for table in tables_to_process:\n            header = table.get(\"headers\", [])\n            rows = table.get(\"rows\", [])\n            metrics = extract_metrics_from_table(\n                header,\n                rows,\n                model_name=model_name,\n                model_column_index=model_column_index\n            )\n            all_metrics.extend(metrics)\n\n        if not all_metrics:\n            print(f\"No metrics extracted from table\")\n            return None\n\n        # Build model-index structure\n        display_name = repo_id.split(\"/\")[-1] if \"/\" in repo_id else repo_id\n\n        results = [{\n            \"task\": {\"type\": task_type},\n            \"dataset\": {\n                \"name\": dataset_name,\n                
\"type\": dataset_type\n            },\n            \"metrics\": all_metrics,\n            \"source\": {\n                \"name\": \"Model README\",\n                \"url\": f\"https://huggingface.co/{repo_id}\"\n            }\n        }]\n\n        return results\n\n    except Exception as e:\n        print(f\"Error extracting evaluations from README: {e}\")\n        return None\n\n\n# ============================================================================\n# Table Inspection (using markdown-it-py for accurate parsing)\n# ============================================================================\n\n\ndef extract_tables_with_parser(markdown_content: str) -> List[Dict[str, Any]]:\n    \"\"\"\n    Extract tables from markdown using markdown-it-py parser.\n    Uses GFM (GitHub Flavored Markdown) which includes table support.\n    \"\"\"\n    MarkdownIt = require_markdown_it()\n    # Disable linkify to avoid optional dependency errors; not needed for table parsing.\n    md = MarkdownIt(\"gfm-like\", {\"linkify\": False})\n    tokens = md.parse(markdown_content)\n\n    tables = []\n    i = 0\n    while i < len(tokens):\n        token = tokens[i]\n\n        if token.type == \"table_open\":\n            table_data = {\"headers\": [], \"rows\": []}\n            current_row = []\n            in_header = False\n\n            i += 1\n            while i < len(tokens) and tokens[i].type != \"table_close\":\n                t = tokens[i]\n                if t.type == \"thead_open\":\n                    in_header = True\n                elif t.type == \"thead_close\":\n                    in_header = False\n                elif t.type == \"tr_open\":\n                    current_row = []\n                elif t.type == \"tr_close\":\n                    if in_header:\n                        table_data[\"headers\"] = current_row\n                    else:\n                        table_data[\"rows\"].append(current_row)\n                    current_row = []\n           
     elif t.type == \"inline\":\n                    current_row.append(t.content.strip())\n                i += 1\n\n            if table_data[\"headers\"] or table_data[\"rows\"]:\n                tables.append(table_data)\n\n        i += 1\n\n    return tables\n\n\ndef detect_table_format(table: Dict[str, Any], repo_id: str) -> Dict[str, Any]:\n    \"\"\"Analyze a table to detect its format and identify model columns.\"\"\"\n    headers = table.get(\"headers\", [])\n    rows = table.get(\"rows\", [])\n\n    if not headers or not rows:\n        return {\"format\": \"unknown\", \"columns\": headers, \"model_columns\": [], \"row_count\": 0, \"sample_rows\": []}\n\n    first_header = headers[0].lower() if headers else \"\"\n    is_first_col_benchmarks = not first_header or first_header in [\"\", \"benchmark\", \"task\", \"dataset\", \"metric\", \"eval\"]\n\n    # Check for numeric columns\n    numeric_columns = []\n    for col_idx in range(1, len(headers)):\n        numeric_count = 0\n        for row in rows[:5]:\n            if col_idx < len(row):\n                try:\n                    val = re.sub(r'\\s*\\([^)]*\\)', '', row[col_idx])\n                    float(val.replace(\"%\", \"\").replace(\",\", \"\").strip())\n                    numeric_count += 1\n                except (ValueError, AttributeError):\n                    pass\n        if numeric_count > len(rows[:5]) / 2:\n            numeric_columns.append(col_idx)\n\n    # Determine format\n    if is_first_col_benchmarks and len(numeric_columns) > 1:\n        format_type = \"comparison\"\n    elif is_first_col_benchmarks and len(numeric_columns) == 1:\n        format_type = \"simple\"\n    elif len(numeric_columns) > len(headers) / 2:\n        format_type = \"transposed\"\n    else:\n        format_type = \"unknown\"\n\n    # Find model columns\n    model_columns = []\n    model_name = repo_id.split(\"/\")[-1] if \"/\" in repo_id else repo_id\n    model_tokens, _ = normalize_model_name(model_name)\n\n 
   for idx, header in enumerate(headers):\n        if idx == 0 and is_first_col_benchmarks:\n            continue\n        if header:\n            header_tokens, _ = normalize_model_name(header)\n            is_match = model_tokens == header_tokens\n            is_partial = model_tokens.issubset(header_tokens) or header_tokens.issubset(model_tokens)\n            model_columns.append({\n                \"index\": idx,\n                \"header\": header,\n                \"is_exact_match\": is_match,\n                \"is_partial_match\": is_partial and not is_match\n            })\n\n    return {\n        \"format\": format_type,\n        \"columns\": headers,\n        \"model_columns\": model_columns,\n        \"row_count\": len(rows),\n        \"sample_rows\": [row[0] for row in rows[:5] if row]\n    }\n\n\ndef inspect_tables(repo_id: str) -> None:\n    \"\"\"Inspect and display all evaluation tables in a model's README.\"\"\"\n    try:\n        load_env()\n        ModelCard = require_model_card()\n        hf_token = os.getenv(\"HF_TOKEN\")\n        card = ModelCard.load(repo_id, token=hf_token)\n        readme_content = card.content\n\n        if not readme_content:\n            print(f\"No README content found for {repo_id}\")\n            return\n\n        tables = extract_tables_with_parser(readme_content)\n\n        if not tables:\n            print(f\"No tables found in README for {repo_id}\")\n            return\n\n        print(f\"\\n{'='*70}\")\n        print(f\"Tables found in README for: {repo_id}\")\n        print(f\"{'='*70}\")\n\n        eval_table_count = 0\n        # Number tables by their position among all README tables so the printed\n        # number matches what extract-readme --table expects, even if a table is skipped.\n        for table_number, table in enumerate(tables, start=1):\n            analysis = detect_table_format(table, repo_id)\n\n            if analysis[\"format\"] == \"unknown\" and not analysis.get(\"sample_rows\"):\n                continue\n\n            eval_table_count += 1\n            print(f\"\\n## Table {table_number}\")\n            print(f\"   Format: {analysis['format']}\")\n            print(f\"   Rows: 
{analysis['row_count']}\")\n\n            print(f\"\\n   Columns ({len(analysis['columns'])}):\")\n            for col_info in analysis.get(\"model_columns\", []):\n                idx = col_info[\"index\"]\n                header = col_info[\"header\"]\n                if col_info[\"is_exact_match\"]:\n                    print(f\"      [{idx}] {header}  ✓ EXACT MATCH\")\n                elif col_info[\"is_partial_match\"]:\n                    print(f\"      [{idx}] {header}  ~ partial match\")\n                else:\n                    print(f\"      [{idx}] {header}\")\n\n            if analysis.get(\"sample_rows\"):\n                print(f\"\\n   Sample rows (first column):\")\n                for row_val in analysis[\"sample_rows\"][:5]:\n                    print(f\"      - {row_val}\")\n\n        if eval_table_count == 0:\n            print(\"\\nNo evaluation tables detected.\")\n        else:\n            print(\"\\nSuggested next step:\")\n            print(f'  uv run scripts/evaluation_manager.py extract-readme --repo-id \"{repo_id}\" --table <table-number> [--model-column-index <column-index>]')\n\n        print(f\"\\n{'='*70}\\n\")\n\n    except Exception as e:\n        print(f\"Error inspecting tables: {e}\")\n\n\n# ============================================================================\n# Pull Request Management\n# ============================================================================\n\n\ndef get_open_prs(repo_id: str) -> List[Dict[str, Any]]:\n    \"\"\"\n    Fetch open pull requests for a Hugging Face model repository.\n\n    Args:\n        repo_id: Hugging Face model repository ID (e.g., \"allenai/Olmo-3-32B-Think\")\n\n    Returns:\n        List of open PR dictionaries with num, title, author, and createdAt\n    \"\"\"\n    requests = require_requests()\n    url = f\"https://huggingface.co/api/models/{repo_id}/discussions\"\n\n    try:\n        response = requests.get(url, timeout=30, allow_redirects=True)\n        
response.raise_for_status()\n\n        data = response.json()\n        discussions = data.get(\"discussions\", [])\n\n        open_prs = [\n            {\n                \"num\": d[\"num\"],\n                \"title\": d[\"title\"],\n                \"author\": d[\"author\"][\"name\"],\n                \"createdAt\": d.get(\"createdAt\", \"unknown\"),\n            }\n            for d in discussions\n            if d.get(\"status\") == \"open\" and d.get(\"isPullRequest\")\n        ]\n\n        return open_prs\n\n    except requests.RequestException as e:\n        print(f\"Error fetching PRs from Hugging Face: {e}\")\n        return []\n\n\ndef list_open_prs(repo_id: str) -> None:\n    \"\"\"Display open pull requests for a model repository.\"\"\"\n    prs = get_open_prs(repo_id)\n\n    print(f\"\\n{'='*70}\")\n    print(f\"Open Pull Requests for: {repo_id}\")\n    print(f\"{'='*70}\")\n\n    if not prs:\n        print(\"\\nNo open pull requests found.\")\n    else:\n        print(f\"\\nFound {len(prs)} open PR(s):\\n\")\n        for pr in prs:\n            print(f\"  PR #{pr['num']} - {pr['title']}\")\n            print(f\"     Author: {pr['author']}\")\n            print(f\"     Created: {pr['createdAt']}\")\n            print(f\"     URL: https://huggingface.co/{repo_id}/discussions/{pr['num']}\")\n            print()\n\n    print(f\"{'='*70}\\n\")\n\n\n# ============================================================================\n# Method 2: Import from Artificial Analysis\n# ============================================================================\n\n\ndef get_aa_model_data(creator_slug: str, model_name: str) -> Optional[Dict[str, Any]]:\n    \"\"\"\n    Fetch model evaluation data from Artificial Analysis API.\n\n    Args:\n        creator_slug: Creator identifier (e.g., \"anthropic\", \"openai\")\n        model_name: Model slug/identifier\n\n    Returns:\n        Model data dictionary or None if not found\n    \"\"\"\n    load_env()\n    AA_API_KEY = 
os.getenv(\"AA_API_KEY\")\n    if not AA_API_KEY:\n        raise ValueError(\"AA_API_KEY environment variable is not set\")\n\n    url = \"https://artificialanalysis.ai/api/v2/data/llms/models\"\n    headers = {\"x-api-key\": AA_API_KEY}\n\n    requests = require_requests()\n\n    try:\n        response = requests.get(url, headers=headers, timeout=30)\n        response.raise_for_status()\n\n        data = response.json().get(\"data\", [])\n\n        for model in data:\n            creator = model.get(\"model_creator\", {})\n            if creator.get(\"slug\") == creator_slug and model.get(\"slug\") == model_name:\n                return model\n\n        print(f\"Model {creator_slug}/{model_name} not found in Artificial Analysis\")\n        return None\n\n    except requests.RequestException as e:\n        print(f\"Error fetching data from Artificial Analysis: {e}\")\n        return None\n\n\ndef aa_data_to_model_index(\n    model_data: Dict[str, Any],\n    dataset_name: str = \"Artificial Analysis Benchmarks\",\n    dataset_type: str = \"artificial_analysis\",\n    task_type: str = \"evaluation\"\n) -> List[Dict[str, Any]]:\n    \"\"\"\n    Convert Artificial Analysis model data to model-index format.\n\n    Args:\n        model_data: Raw model data from AA API\n        dataset_name: Dataset name for model-index\n        dataset_type: Dataset type identifier\n        task_type: Task type for model-index\n\n    Returns:\n        Model-index formatted results\n    \"\"\"\n    model_name = model_data.get(\"name\", model_data.get(\"slug\", \"unknown-model\"))\n    evaluations = model_data.get(\"evaluations\", {})\n\n    if not evaluations:\n        print(f\"No evaluations found for model {model_name}\")\n        return []\n\n    metrics = []\n    for key, value in evaluations.items():\n        if value is not None:\n            metrics.append({\n                \"name\": key.replace(\"_\", \" \").title(),\n                \"type\": key,\n                \"value\": 
value\n            })\n\n    results = [{\n        \"task\": {\"type\": task_type},\n        \"dataset\": {\n            \"name\": dataset_name,\n            \"type\": dataset_type\n        },\n        \"metrics\": metrics,\n        \"source\": {\n            \"name\": \"Artificial Analysis API\",\n            \"url\": \"https://artificialanalysis.ai\"\n        }\n    }]\n\n    return results\n\n\ndef import_aa_evaluations(\n    creator_slug: str,\n    model_name: str,\n    repo_id: str\n) -> Optional[List[Dict[str, Any]]]:\n    \"\"\"\n    Import evaluation results from Artificial Analysis for a model.\n\n    Args:\n        creator_slug: Creator identifier in AA\n        model_name: Model identifier in AA\n        repo_id: Hugging Face repository ID to update\n\n    Returns:\n        Model-index formatted results or None if import fails\n    \"\"\"\n    model_data = get_aa_model_data(creator_slug, model_name)\n\n    if not model_data:\n        return None\n\n    results = aa_data_to_model_index(model_data)\n    return results\n\n\n# ============================================================================\n# Model Card Update Functions\n# ============================================================================\n\n\ndef update_model_card_with_evaluations(\n    repo_id: str,\n    results: List[Dict[str, Any]],\n    create_pr: bool = False,\n    commit_message: Optional[str] = None\n) -> bool:\n    \"\"\"\n    Update a model card with evaluation results.\n\n    Args:\n        repo_id: Hugging Face repository ID\n        results: Model-index formatted results\n        create_pr: Whether to create a PR instead of direct push\n        commit_message: Custom commit message\n\n    Returns:\n        True if successful, False otherwise\n    \"\"\"\n    try:\n        load_env()\n        ModelCard = require_model_card()\n        hf_token = os.getenv(\"HF_TOKEN\")\n        if not hf_token:\n            raise ValueError(\"HF_TOKEN environment variable is not set\")\n\n   
     # Load existing card\n        card = ModelCard.load(repo_id, token=hf_token)\n\n        # Get model name\n        model_name = repo_id.split(\"/\")[-1] if \"/\" in repo_id else repo_id\n\n        # Create or update model-index (copy results so merging does not mutate the caller's list)\n        model_index = [{\n            \"name\": model_name,\n            \"results\": list(results)\n        }]\n\n        # Merge with existing model-index if present\n        if \"model-index\" in card.data:\n            existing = card.data[\"model-index\"]\n            if isinstance(existing, list) and existing:\n                # Keep existing name if present\n                if \"name\" in existing[0]:\n                    model_index[0][\"name\"] = existing[0][\"name\"]\n\n                # Merge results\n                existing_results = existing[0].get(\"results\", [])\n                model_index[0][\"results\"].extend(existing_results)\n\n        card.data[\"model-index\"] = model_index\n\n        # Prepare commit message\n        if not commit_message:\n            commit_message = f\"Add evaluation results to {model_name}\"\n\n        commit_description = (\n            \"This commit adds structured evaluation results to the model card. 
\"\n            \"The results are formatted using the model-index specification and \"\n            \"will be displayed in the model card's evaluation widget.\"\n        )\n\n        # Push update\n        card.push_to_hub(\n            repo_id,\n            token=hf_token,\n            commit_message=commit_message,\n            commit_description=commit_description,\n            create_pr=create_pr\n        )\n\n        action = \"Pull request created\" if create_pr else \"Model card updated\"\n        print(f\"✓ {action} successfully for {repo_id}\")\n        return True\n\n    except Exception as e:\n        print(f\"Error updating model card: {e}\")\n        return False\n\n\ndef show_evaluations(repo_id: str) -> None:\n    \"\"\"Display current evaluations in a model card.\"\"\"\n    try:\n        load_env()\n        ModelCard = require_model_card()\n        hf_token = os.getenv(\"HF_TOKEN\")\n        card = ModelCard.load(repo_id, token=hf_token)\n\n        if \"model-index\" not in card.data:\n            print(f\"No model-index found in {repo_id}\")\n            return\n\n        model_index = card.data[\"model-index\"]\n\n        print(f\"\\nEvaluations for {repo_id}:\")\n        print(\"=\" * 60)\n\n        for model_entry in model_index:\n            model_name = model_entry.get(\"name\", \"Unknown\")\n            print(f\"\\nModel: {model_name}\")\n\n            results = model_entry.get(\"results\", [])\n            for i, result in enumerate(results, 1):\n                print(f\"\\n  Result Set {i}:\")\n\n                task = result.get(\"task\", {})\n                print(f\"    Task: {task.get('type', 'unknown')}\")\n\n                dataset = result.get(\"dataset\", {})\n                print(f\"    Dataset: {dataset.get('name', 'unknown')}\")\n\n                metrics = result.get(\"metrics\", [])\n                print(f\"    Metrics ({len(metrics)}):\")\n                for metric in metrics:\n                    name = 
metric.get(\"name\", \"Unknown\")\n                    value = metric.get(\"value\", \"N/A\")\n                    print(f\"      - {name}: {value}\")\n\n                source = result.get(\"source\", {})\n                if source:\n                    print(f\"    Source: {source.get('name', 'Unknown')}\")\n\n        print(\"\\n\" + \"=\" * 60)\n\n    except Exception as e:\n        print(f\"Error showing evaluations: {e}\")\n\n\ndef validate_model_index(repo_id: str) -> bool:\n    \"\"\"Validate model-index format in a model card.\"\"\"\n    try:\n        load_env()\n        ModelCard = require_model_card()\n        hf_token = os.getenv(\"HF_TOKEN\")\n        card = ModelCard.load(repo_id, token=hf_token)\n\n        if \"model-index\" not in card.data:\n            print(f\"✗ No model-index found in {repo_id}\")\n            return False\n\n        model_index = card.data[\"model-index\"]\n\n        if not isinstance(model_index, list):\n            print(\"✗ model-index must be a list\")\n            return False\n\n        for i, entry in enumerate(model_index):\n            if \"name\" not in entry:\n                print(f\"✗ Entry {i} missing 'name' field\")\n                return False\n\n            if \"results\" not in entry:\n                print(f\"✗ Entry {i} missing 'results' field\")\n                return False\n\n            for j, result in enumerate(entry[\"results\"]):\n                if \"task\" not in result:\n                    print(f\"✗ Result {j} in entry {i} missing 'task' field\")\n                    return False\n\n                if \"dataset\" not in result:\n                    print(f\"✗ Result {j} in entry {i} missing 'dataset' field\")\n                    return False\n\n                if \"metrics\" not in result:\n                    print(f\"✗ Result {j} in entry {i} missing 'metrics' field\")\n                    return False\n\n        print(f\"✓ Model-index format is valid for {repo_id}\")\n        return True\n\n 
   except Exception as e:\n        print(f\"Error validating model-index: {e}\")\n        return False\n\n\n# ============================================================================\n# CLI Interface\n# ============================================================================\n\n\ndef main():\n    parser = argparse.ArgumentParser(\n        description=(\n            \"Manage evaluation results in Hugging Face model cards.\\n\\n\"\n            \"Use standard Python or `uv run scripts/evaluation_manager.py ...` \"\n            \"to auto-resolve dependencies from the PEP 723 header.\"\n        ),\n        formatter_class=argparse.RawTextHelpFormatter,\n        epilog=dedent(\n            \"\"\"\\\n            Typical workflows:\n              - Inspect tables first:\n                  uv run scripts/evaluation_manager.py inspect-tables --repo-id <model>\n              - Extract from README (prints YAML by default):\n                  uv run scripts/evaluation_manager.py extract-readme --repo-id <model> --table N\n              - Apply changes:\n                  uv run scripts/evaluation_manager.py extract-readme --repo-id <model> --table N --apply\n              - Import from Artificial Analysis:\n                  AA_API_KEY=... 
uv run scripts/evaluation_manager.py import-aa --creator-slug org --model-name slug --repo-id <model>\n\n            Tips:\n              - YAML is printed by default; use --apply or --create-pr to write changes.\n              - Set HF_TOKEN (and AA_API_KEY for import-aa); .env is loaded automatically if python-dotenv is installed.\n              - When multiple tables exist, run inspect-tables then select with --table N.\n              - To apply changes (push or PR), rerun extract-readme with --apply or --create-pr.\n            \"\"\"\n        ),\n    )\n    parser.add_argument(\"--version\", action=\"version\", version=\"evaluation_manager 1.2.0\")\n\n    subparsers = parser.add_subparsers(dest=\"command\", help=\"Command to execute\")\n\n    # Extract from README command\n    extract_parser = subparsers.add_parser(\n        \"extract-readme\",\n        help=\"Extract evaluation tables from model README\",\n        formatter_class=argparse.RawTextHelpFormatter,\n        description=\"Parse README tables into model-index YAML. 
Default behavior prints YAML; use --apply/--create-pr to write changes.\",\n        epilog=dedent(\n            \"\"\"\\\n            Examples:\n              uv run scripts/evaluation_manager.py extract-readme --repo-id username/model\n              uv run scripts/evaluation_manager.py extract-readme --repo-id username/model --table 2 --model-column-index 3\n              uv run scripts/evaluation_manager.py extract-readme --repo-id username/model --table 2 --model-name-override \\\"**Model 7B**\\\"  # exact header text\n              uv run scripts/evaluation_manager.py extract-readme --repo-id username/model --table 2 --create-pr\n\n            Apply changes:\n              - Default: prints YAML to stdout (no writes).\n              - Add --apply to push directly, or --create-pr to open a PR.\n            Model selection:\n              - Preferred: --model-column-index <header index shown by inspect-tables>\n              - If using --model-name-override, copy the column header text exactly.\n            \"\"\"\n        ),\n    )\n    extract_parser.add_argument(\"--repo-id\", type=str, required=True, help=\"HF repository ID\")\n    extract_parser.add_argument(\"--table\", type=int, help=\"Table number (1-indexed, from inspect-tables output)\")\n    extract_parser.add_argument(\"--model-column-index\", type=int, help=\"Preferred: column index from inspect-tables output (exact selection)\")\n    extract_parser.add_argument(\"--model-name-override\", type=str, help=\"Exact column header/model name for comparison/transpose tables (when index is not used)\")\n    extract_parser.add_argument(\"--task-type\", type=str, default=\"text-generation\", help=\"Sets model-index task.type (e.g., text-generation, summarization)\")\n    extract_parser.add_argument(\"--dataset-name\", type=str, default=\"Benchmarks\", help=\"Dataset name\")\n    extract_parser.add_argument(\"--dataset-type\", type=str, default=\"benchmark\", help=\"Dataset type\")\n    
extract_parser.add_argument(\"--create-pr\", action=\"store_true\", help=\"Create PR instead of direct push\")\n    extract_parser.add_argument(\"--apply\", action=\"store_true\", help=\"Apply changes (default is to print YAML only)\")\n    extract_parser.add_argument(\"--dry-run\", action=\"store_true\", help=\"Preview YAML without updating (default)\")\n\n    # Import from AA command\n    aa_parser = subparsers.add_parser(\n        \"import-aa\",\n        help=\"Import evaluation scores from Artificial Analysis\",\n        formatter_class=argparse.RawTextHelpFormatter,\n        description=\"Fetch scores from Artificial Analysis API and write them into model-index.\",\n        epilog=dedent(\n            \"\"\"\\\n            Examples:\n              AA_API_KEY=... uv run scripts/evaluation_manager.py import-aa --creator-slug anthropic --model-name claude-sonnet-4 --repo-id username/model\n              uv run scripts/evaluation_manager.py import-aa --creator-slug openai --model-name gpt-4o --repo-id username/model --create-pr\n\n            Requires: AA_API_KEY in env (or .env if python-dotenv installed).\n            \"\"\"\n        ),\n    )\n    aa_parser.add_argument(\"--creator-slug\", type=str, required=True, help=\"AA creator slug\")\n    aa_parser.add_argument(\"--model-name\", type=str, required=True, help=\"AA model name\")\n    aa_parser.add_argument(\"--repo-id\", type=str, required=True, help=\"HF repository ID\")\n    aa_parser.add_argument(\"--create-pr\", action=\"store_true\", help=\"Create PR instead of direct push\")\n\n    # Show evaluations command\n    show_parser = subparsers.add_parser(\n        \"show\",\n        help=\"Display current evaluations in model card\",\n        formatter_class=argparse.RawTextHelpFormatter,\n        description=\"Print model-index content from the model card (requires HF_TOKEN for private repos).\",\n    )\n    show_parser.add_argument(\"--repo-id\", type=str, required=True, help=\"HF repository ID\")\n\n    
# Validate command\n    validate_parser = subparsers.add_parser(\n        \"validate\",\n        help=\"Validate model-index format\",\n        formatter_class=argparse.RawTextHelpFormatter,\n        description=\"Schema sanity check for model-index section of the card.\",\n    )\n    validate_parser.add_argument(\"--repo-id\", type=str, required=True, help=\"HF repository ID\")\n\n    # Inspect tables command\n    inspect_parser = subparsers.add_parser(\n        \"inspect-tables\",\n        help=\"Inspect tables in README → outputs suggested extract-readme command\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        epilog=\"\"\"\nWorkflow:\n  1. inspect-tables     → see table structure, columns, and table numbers\n  2. extract-readme     → run with --table N (from step 1); YAML prints by default\n  3. apply changes      → rerun extract-readme with --apply or --create-pr\n\nReminder:\n  - Preferred: use --model-column-index <index>. If needed, use --model-name-override with the exact column header text.\n\"\"\"\n    )\n    inspect_parser.add_argument(\"--repo-id\", type=str, required=True, help=\"HF repository ID\")\n\n    # Get PRs command\n    prs_parser = subparsers.add_parser(\n        \"get-prs\",\n        help=\"List open pull requests for a model repository\",\n        formatter_class=argparse.RawTextHelpFormatter,\n        description=\"Check for existing open PRs before creating new ones to avoid duplicates.\",\n        epilog=dedent(\n            \"\"\"\\\n            Examples:\n              uv run scripts/evaluation_manager.py get-prs --repo-id \"allenai/Olmo-3-32B-Think\"\n\n            IMPORTANT: Always run this before using --create-pr to avoid duplicate PRs.\n            \"\"\"\n        ),\n    )\n    prs_parser.add_argument(\"--repo-id\", type=str, required=True, help=\"HF repository ID\")\n\n    args = parser.parse_args()\n\n    if not args.command:\n        parser.print_help()\n        return\n\n    try:\n        # Execute 
command\n        if args.command == \"extract-readme\":\n            results = extract_evaluations_from_readme(\n                repo_id=args.repo_id,\n                task_type=args.task_type,\n                dataset_name=args.dataset_name,\n                dataset_type=args.dataset_type,\n                model_name_override=args.model_name_override,\n                table_index=args.table,\n                model_column_index=args.model_column_index\n            )\n\n            if not results:\n                print(\"No evaluations extracted\")\n                return\n\n            apply_changes = args.apply or args.create_pr\n\n            # Default behavior: print YAML (dry-run)\n            yaml = require_yaml()\n            print(\"\\nExtracted evaluations (YAML):\")\n            print(\n                yaml.dump(\n                    {\"model-index\": [{\"name\": args.repo_id.split('/')[-1], \"results\": results}]},\n                    sort_keys=False\n                )\n            )\n\n            if apply_changes:\n                if args.model_name_override and args.model_column_index is not None:\n                    print(\"Note: --model-column-index takes precedence over --model-name-override.\")\n                update_model_card_with_evaluations(\n                    repo_id=args.repo_id,\n                    results=results,\n                    create_pr=args.create_pr,\n                    commit_message=\"Extract evaluation results from README\"\n                )\n\n        elif args.command == \"import-aa\":\n            results = import_aa_evaluations(\n                creator_slug=args.creator_slug,\n                model_name=args.model_name,\n                repo_id=args.repo_id\n            )\n\n            if not results:\n                print(\"No evaluations imported\")\n                return\n\n            update_model_card_with_evaluations(\n                repo_id=args.repo_id,\n                results=results,\n               
 create_pr=args.create_pr,\n                commit_message=f\"Add Artificial Analysis evaluations for {args.model_name}\"\n            )\n\n        elif args.command == \"show\":\n            show_evaluations(args.repo_id)\n\n        elif args.command == \"validate\":\n            validate_model_index(args.repo_id)\n\n        elif args.command == \"inspect-tables\":\n            inspect_tables(args.repo_id)\n\n        elif args.command == \"get-prs\":\n            list_open_prs(args.repo_id)\n    except ModuleNotFoundError as exc:\n        # Surface dependency hints cleanly when user only needs help output\n        print(exc)\n    except Exception as exc:\n        print(f\"Error: {exc}\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/inspect_eval_uv.py",
    "content": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"inspect-ai>=0.3.0\",\n#     \"inspect-evals\",\n#     \"openai\",\n# ]\n# ///\n\n\"\"\"\nEntry point script for running inspect-ai evaluations via `hf jobs uv run`.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport argparse\nimport os\nimport subprocess\nimport sys\nfrom pathlib import Path\nfrom typing import Optional\n\n\ndef _inspect_evals_tasks_root() -> Optional[Path]:\n    \"\"\"Return the installed inspect_evals package path if available.\"\"\"\n    try:\n        import inspect_evals\n\n        return Path(inspect_evals.__file__).parent\n    except Exception:\n        return None\n\n\ndef _normalize_task(task: str) -> str:\n    \"\"\"Allow lighteval-style `suite|task|shots` strings by keeping the task name.\"\"\"\n    if \"|\" in task:\n        parts = task.split(\"|\")\n        if len(parts) >= 2 and parts[1]:\n            return parts[1]\n    return task\n\n\ndef main() -> None:\n    parser = argparse.ArgumentParser(description=\"Inspect-ai job runner\")\n    parser.add_argument(\"--model\", required=True, help=\"Model ID on Hugging Face Hub\")\n    parser.add_argument(\"--task\", required=True, help=\"inspect-ai task to execute\")\n    parser.add_argument(\"--limit\", type=int, default=None, help=\"Limit number of samples to evaluate\")\n    parser.add_argument(\n        \"--tasks-root\",\n        default=None,\n        help=\"Optional path to inspect task files. 
Defaults to the installed inspect_evals package.\",\n    )\n    parser.add_argument(\n        \"--sandbox\",\n        default=\"local\",\n        help=\"Sandbox backend to use (default: local for HF jobs without Docker).\",\n    )\n    args = parser.parse_args()\n\n    # Ensure downstream libraries can read the token passed as a secret\n    hf_token = os.getenv(\"HF_TOKEN\")\n    if hf_token:\n        os.environ.setdefault(\"HUGGING_FACE_HUB_TOKEN\", hf_token)\n        os.environ.setdefault(\"HF_HUB_TOKEN\", hf_token)\n\n    task = _normalize_task(args.task)\n    tasks_root = Path(args.tasks_root) if args.tasks_root else _inspect_evals_tasks_root()\n    if tasks_root and not tasks_root.exists():\n        tasks_root = None\n\n    cmd = [\n        \"inspect\",\n        \"eval\",\n        task,\n        \"--model\",\n        f\"hf-inference-providers/{args.model}\",\n        \"--log-level\",\n        \"info\",\n        # Reduce batch size to avoid OOM errors (default is 32)\n        \"--max-connections\",\n        \"1\",\n        # Set a small positive temperature (HF doesn't allow temperature=0)\n        \"--temperature\",\n        \"0.001\",\n    ]\n\n    if args.sandbox:\n        cmd.extend([\"--sandbox\", args.sandbox])\n\n    if args.limit:\n        cmd.extend([\"--limit\", str(args.limit)])\n\n    try:\n        subprocess.run(cmd, check=True, cwd=tasks_root)\n        print(\"Evaluation complete.\")\n    except subprocess.CalledProcessError as exc:\n        location = f\" (cwd={tasks_root})\" if tasks_root else \"\"\n        print(f\"Evaluation failed with exit code {exc.returncode}{location}\", file=sys.stderr)\n        raise\n\n\nif __name__ == \"__main__\":\n    main()\n\n"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/inspect_vllm_uv.py",
    "content": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"inspect-ai>=0.3.0\",\n#     \"inspect-evals\",\n#     \"vllm>=0.4.0\",\n#     \"torch>=2.0.0\",\n#     \"transformers>=4.40.0\",\n# ]\n# ///\n\n\"\"\"\nEntry point script for running inspect-ai evaluations with vLLM or HuggingFace Transformers backend.\n\nThis script runs evaluations on custom HuggingFace models using local GPU inference,\nseparate from inference provider scripts (which use external APIs).\n\nUsage (standalone):\n    python inspect_vllm_uv.py --model \"meta-llama/Llama-3.2-1B\" --task \"mmlu\"\n\nUsage (via HF Jobs):\n    hf jobs uv run inspect_vllm_uv.py \\\\\n        --flavor a10g-small \\\\\n        --secrets HF_TOKEN=$HF_TOKEN \\\\\n        -- --model \"meta-llama/Llama-3.2-1B\" --task \"mmlu\"\n\nModel backends:\n    - vllm: Fast inference with vLLM (recommended for large models)\n    - hf: HuggingFace Transformers backend (broader model compatibility)\n\"\"\"\n\nfrom __future__ import annotations\n\nimport argparse\nimport os\nimport subprocess\nimport sys\nfrom typing import Optional\n\n\ndef setup_environment() -> None:\n    \"\"\"Configure environment variables for HuggingFace authentication.\"\"\"\n    hf_token = os.getenv(\"HF_TOKEN\")\n    if hf_token:\n        os.environ.setdefault(\"HUGGING_FACE_HUB_TOKEN\", hf_token)\n        os.environ.setdefault(\"HF_HUB_TOKEN\", hf_token)\n\n\ndef run_inspect_vllm(\n    model_id: str,\n    task: str,\n    limit: Optional[int] = None,\n    max_connections: int = 4,\n    temperature: float = 0.0,\n    tensor_parallel_size: int = 1,\n    gpu_memory_utilization: float = 0.8,\n    dtype: str = \"auto\",\n    trust_remote_code: bool = False,\n    log_level: str = \"info\",\n) -> None:\n    \"\"\"\n    Run inspect-ai evaluation with vLLM backend.\n\n    Args:\n        model_id: HuggingFace model ID\n        task: inspect-ai task to execute (e.g., \"mmlu\", \"gsm8k\")\n        limit: Limit number of samples to 
evaluate\n        max_connections: Maximum concurrent connections\n        temperature: Sampling temperature\n        tensor_parallel_size: Number of GPUs for tensor parallelism\n        gpu_memory_utilization: GPU memory fraction\n        dtype: Data type (auto, float16, bfloat16)\n        trust_remote_code: Allow remote code execution\n        log_level: Logging level\n    \"\"\"\n    setup_environment()\n\n    model_spec = f\"vllm/{model_id}\"\n    cmd = [\n        \"inspect\",\n        \"eval\",\n        task,\n        \"--model\",\n        model_spec,\n        \"--log-level\",\n        log_level,\n        \"--max-connections\",\n        str(max_connections),\n    ]\n\n    # vLLM supports temperature=0 unlike HF inference providers\n    cmd.extend([\"--temperature\", str(temperature)])\n\n    # Older inspect-ai CLI versions do not support --model-args; rely on defaults\n    # and let vLLM choose sensible settings for small models.\n    if tensor_parallel_size != 1:\n        cmd.extend([\"--tensor-parallel-size\", str(tensor_parallel_size)])\n    if gpu_memory_utilization != 0.8:\n        cmd.extend([\"--gpu-memory-utilization\", str(gpu_memory_utilization)])\n    if dtype != \"auto\":\n        cmd.extend([\"--dtype\", dtype])\n    if trust_remote_code:\n        cmd.append(\"--trust-remote-code\")\n\n    if limit:\n        cmd.extend([\"--limit\", str(limit)])\n\n    print(f\"Running: {' '.join(cmd)}\")\n\n    try:\n        subprocess.run(cmd, check=True)\n        print(\"Evaluation complete.\")\n    except subprocess.CalledProcessError as exc:\n        print(f\"Evaluation failed with exit code {exc.returncode}\", file=sys.stderr)\n        sys.exit(exc.returncode)\n\n\ndef run_inspect_hf(\n    model_id: str,\n    task: str,\n    limit: Optional[int] = None,\n    max_connections: int = 1,\n    temperature: float = 0.001,\n    device: str = \"auto\",\n    dtype: str = \"auto\",\n    trust_remote_code: bool = False,\n    log_level: str = \"info\",\n) -> None:\n    
\"\"\"\n    Run inspect-ai evaluation with HuggingFace Transformers backend.\n\n    Use this when vLLM doesn't support the model architecture.\n\n    Args:\n        model_id: HuggingFace model ID\n        task: inspect-ai task to execute\n        limit: Limit number of samples\n        max_connections: Maximum concurrent connections (keep low for memory)\n        temperature: Sampling temperature\n        device: Device to use (auto, cuda, cpu)\n        dtype: Data type\n        trust_remote_code: Allow remote code execution\n        log_level: Logging level\n    \"\"\"\n    setup_environment()\n\n    model_spec = f\"hf/{model_id}\"\n\n    cmd = [\n        \"inspect\",\n        \"eval\",\n        task,\n        \"--model\",\n        model_spec,\n        \"--log-level\",\n        log_level,\n        \"--max-connections\",\n        str(max_connections),\n        \"--temperature\",\n        str(temperature),\n    ]\n\n    if device != \"auto\":\n        cmd.extend([\"--device\", device])\n    if dtype != \"auto\":\n        cmd.extend([\"--dtype\", dtype])\n    if trust_remote_code:\n        cmd.append(\"--trust-remote-code\")\n\n    if limit:\n        cmd.extend([\"--limit\", str(limit)])\n\n    print(f\"Running: {' '.join(cmd)}\")\n\n    try:\n        subprocess.run(cmd, check=True)\n        print(\"Evaluation complete.\")\n    except subprocess.CalledProcessError as exc:\n        print(f\"Evaluation failed with exit code {exc.returncode}\", file=sys.stderr)\n        sys.exit(exc.returncode)\n\n\ndef main() -> None:\n    parser = argparse.ArgumentParser(\n        description=\"Run inspect-ai evaluations with vLLM or HuggingFace Transformers on custom models\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        epilog=\"\"\"\nExamples:\n  # Run MMLU with vLLM backend\n  python inspect_vllm_uv.py --model meta-llama/Llama-3.2-1B --task mmlu\n\n  # Run with HuggingFace Transformers backend\n  python inspect_vllm_uv.py --model meta-llama/Llama-3.2-1B 
--task mmlu --backend hf\n\n  # Run with limited samples for testing\n  python inspect_vllm_uv.py --model meta-llama/Llama-3.2-1B --task mmlu --limit 10\n\n  # Run on multiple GPUs with tensor parallelism\n  python inspect_vllm_uv.py --model meta-llama/Llama-3.1-70B --task mmlu --tensor-parallel-size 4\n\nAvailable tasks (from inspect-evals):\n  - mmlu: Massive Multitask Language Understanding\n  - gsm8k: Grade School Math\n  - hellaswag: Common sense reasoning\n  - arc_challenge: AI2 Reasoning Challenge\n  - truthfulqa: TruthfulQA benchmark\n  - winogrande: Winograd Schema Challenge\n  - humaneval: Code generation (HumanEval)\n\nVia HF Jobs:\n  hf jobs uv run inspect_vllm_uv.py \\\\\n      --flavor a10g-small \\\\\n      --secrets HF_TOKEN=$HF_TOKEN \\\\\n      -- --model meta-llama/Llama-3.2-1B --task mmlu\n        \"\"\",\n    )\n\n    parser.add_argument(\n        \"--model\",\n        required=True,\n        help=\"HuggingFace model ID (e.g., meta-llama/Llama-3.2-1B)\",\n    )\n    parser.add_argument(\n        \"--task\",\n        required=True,\n        help=\"inspect-ai task to execute (e.g., mmlu, gsm8k)\",\n    )\n    parser.add_argument(\n        \"--backend\",\n        choices=[\"vllm\", \"hf\"],\n        default=\"vllm\",\n        help=\"Model backend (default: vllm)\",\n    )\n    parser.add_argument(\n        \"--limit\",\n        type=int,\n        default=None,\n        help=\"Limit number of samples to evaluate\",\n    )\n    parser.add_argument(\n        \"--max-connections\",\n        type=int,\n        default=None,\n        help=\"Maximum concurrent connections (default: 4 for vllm, 1 for hf)\",\n    )\n    parser.add_argument(\n        \"--temperature\",\n        type=float,\n        default=None,\n        help=\"Sampling temperature (default: 0.0 for vllm, 0.001 for hf)\",\n    )\n    parser.add_argument(\n        \"--tensor-parallel-size\",\n        type=int,\n        default=1,\n        help=\"Number of GPUs for tensor parallelism (vLLM 
only, default: 1)\",\n    )\n    parser.add_argument(\n        \"--gpu-memory-utilization\",\n        type=float,\n        default=0.8,\n        help=\"GPU memory fraction to use (vLLM only, default: 0.8)\",\n    )\n    parser.add_argument(\n        \"--dtype\",\n        default=\"auto\",\n        choices=[\"auto\", \"float16\", \"bfloat16\", \"float32\"],\n        help=\"Data type for model weights (default: auto)\",\n    )\n    parser.add_argument(\n        \"--device\",\n        default=\"auto\",\n        help=\"Device for HF backend (auto, cuda, cpu)\",\n    )\n    parser.add_argument(\n        \"--trust-remote-code\",\n        action=\"store_true\",\n        help=\"Allow executing remote code from model repository\",\n    )\n    parser.add_argument(\n        \"--log-level\",\n        default=\"info\",\n        choices=[\"debug\", \"info\", \"warning\", \"error\"],\n        help=\"Logging level (default: info)\",\n    )\n\n    args = parser.parse_args()\n\n    if args.backend == \"vllm\":\n        run_inspect_vllm(\n            model_id=args.model,\n            task=args.task,\n            limit=args.limit,\n            max_connections=args.max_connections or 4,\n            temperature=args.temperature if args.temperature is not None else 0.0,\n            tensor_parallel_size=args.tensor_parallel_size,\n            gpu_memory_utilization=args.gpu_memory_utilization,\n            dtype=args.dtype,\n            trust_remote_code=args.trust_remote_code,\n            log_level=args.log_level,\n        )\n    else:\n        run_inspect_hf(\n            model_id=args.model,\n            task=args.task,\n            limit=args.limit,\n            max_connections=args.max_connections or 1,\n            temperature=args.temperature if args.temperature is not None else 0.001,\n            device=args.device,\n            dtype=args.dtype,\n            trust_remote_code=args.trust_remote_code,\n            log_level=args.log_level,\n        )\n\n\nif __name__ == 
\"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/lighteval_vllm_uv.py",
    "content": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"lighteval[accelerate,vllm]>=0.6.0\",\n#     \"torch>=2.0.0\",\n#     \"transformers>=4.40.0\",\n#     \"accelerate>=0.30.0\",\n#     \"vllm>=0.4.0\",\n# ]\n# ///\n\n\"\"\"\nEntry point script for running lighteval evaluations with vLLM backend via `hf jobs uv run`.\n\nThis script runs evaluations using vLLM for efficient GPU inference on custom HuggingFace models.\nIt is separate from inference provider scripts and evaluates models directly on the hardware.\n\nUsage (standalone):\n    python lighteval_vllm_uv.py --model \"meta-llama/Llama-3.2-1B\" --tasks \"leaderboard|mmlu|5\"\n\nUsage (via HF Jobs):\n    hf jobs uv run lighteval_vllm_uv.py \\\\\n        --flavor a10g-small \\\\\n        --secrets HF_TOKEN=$HF_TOKEN \\\\\n        -- --model \"meta-llama/Llama-3.2-1B\" --tasks \"leaderboard|mmlu|5\"\n\"\"\"\n\nfrom __future__ import annotations\n\nimport argparse\nimport os\nimport subprocess\nimport sys\nfrom typing import Optional\n\n\ndef setup_environment() -> None:\n    \"\"\"Configure environment variables for HuggingFace authentication.\"\"\"\n    hf_token = os.getenv(\"HF_TOKEN\")\n    if hf_token:\n        os.environ.setdefault(\"HUGGING_FACE_HUB_TOKEN\", hf_token)\n        os.environ.setdefault(\"HF_HUB_TOKEN\", hf_token)\n\n\ndef run_lighteval_vllm(\n    model_id: str,\n    tasks: str,\n    output_dir: Optional[str] = None,\n    max_samples: Optional[int] = None,\n    batch_size: int = 1,\n    tensor_parallel_size: int = 1,\n    gpu_memory_utilization: float = 0.8,\n    dtype: str = \"auto\",\n    trust_remote_code: bool = False,\n    use_chat_template: bool = False,\n    system_prompt: Optional[str] = None,\n) -> None:\n    \"\"\"\n    Run lighteval with vLLM backend for efficient GPU inference.\n\n    Args:\n        model_id: HuggingFace model ID (e.g., \"meta-llama/Llama-3.2-1B\")\n        tasks: Task specification (e.g., \"leaderboard|mmlu|5\" or 
\"lighteval|hellaswag|0\")\n        output_dir: Directory for evaluation results\n        max_samples: Limit number of samples per task\n        batch_size: Batch size for evaluation\n        tensor_parallel_size: Number of GPUs for tensor parallelism\n        gpu_memory_utilization: GPU memory fraction to use (0.0-1.0)\n        dtype: Data type for model weights (auto, float16, bfloat16)\n        trust_remote_code: Allow executing remote code from model repo\n        use_chat_template: Apply chat template for conversational models\n        system_prompt: System prompt for chat models\n    \"\"\"\n    setup_environment()\n\n    # Build lighteval vllm command\n    cmd = [\n        \"lighteval\",\n        \"vllm\",\n        model_id,\n        tasks,\n        \"--batch-size\", str(batch_size),\n        \"--tensor-parallel-size\", str(tensor_parallel_size),\n        \"--gpu-memory-utilization\", str(gpu_memory_utilization),\n        \"--dtype\", dtype,\n    ]\n\n    if output_dir:\n        cmd.extend([\"--output-dir\", output_dir])\n\n    if max_samples:\n        cmd.extend([\"--max-samples\", str(max_samples)])\n\n    if trust_remote_code:\n        cmd.append(\"--trust-remote-code\")\n\n    if use_chat_template:\n        cmd.append(\"--use-chat-template\")\n\n    if system_prompt:\n        cmd.extend([\"--system-prompt\", system_prompt])\n\n    print(f\"Running: {' '.join(cmd)}\")\n\n    try:\n        subprocess.run(cmd, check=True)\n        print(\"Evaluation complete.\")\n    except subprocess.CalledProcessError as exc:\n        print(f\"Evaluation failed with exit code {exc.returncode}\", file=sys.stderr)\n        sys.exit(exc.returncode)\n\n\ndef run_lighteval_accelerate(\n    model_id: str,\n    tasks: str,\n    output_dir: Optional[str] = None,\n    max_samples: Optional[int] = None,\n    batch_size: int = 1,\n    dtype: str = \"bfloat16\",\n    trust_remote_code: bool = False,\n    use_chat_template: bool = False,\n    system_prompt: Optional[str] = None,\n) -> 
None:\n    \"\"\"\n    Run lighteval with accelerate backend for multi-GPU distributed inference.\n\n    Use this backend when vLLM is not available or for models not supported by vLLM.\n\n    Args:\n        model_id: HuggingFace model ID\n        tasks: Task specification\n        output_dir: Directory for evaluation results\n        max_samples: Limit number of samples per task\n        batch_size: Batch size for evaluation\n        dtype: Data type for model weights\n        trust_remote_code: Allow executing remote code\n        use_chat_template: Apply chat template\n        system_prompt: System prompt for chat models\n    \"\"\"\n    setup_environment()\n\n    # Build lighteval accelerate command\n    cmd = [\n        \"lighteval\",\n        \"accelerate\",\n        model_id,\n        tasks,\n        \"--batch-size\", str(batch_size),\n        \"--dtype\", dtype,\n    ]\n\n    if output_dir:\n        cmd.extend([\"--output-dir\", output_dir])\n\n    if max_samples:\n        cmd.extend([\"--max-samples\", str(max_samples)])\n\n    if trust_remote_code:\n        cmd.append(\"--trust-remote-code\")\n\n    if use_chat_template:\n        cmd.append(\"--use-chat-template\")\n\n    if system_prompt:\n        cmd.extend([\"--system-prompt\", system_prompt])\n\n    print(f\"Running: {' '.join(cmd)}\")\n\n    try:\n        subprocess.run(cmd, check=True)\n        print(\"Evaluation complete.\")\n    except subprocess.CalledProcessError as exc:\n        print(f\"Evaluation failed with exit code {exc.returncode}\", file=sys.stderr)\n        sys.exit(exc.returncode)\n\n\ndef main() -> None:\n    parser = argparse.ArgumentParser(\n        description=\"Run lighteval evaluations with vLLM or accelerate backend on custom HuggingFace models\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        epilog=\"\"\"\nExamples:\n  # Run MMLU evaluation with vLLM\n  python lighteval_vllm_uv.py --model meta-llama/Llama-3.2-1B --tasks \"leaderboard|mmlu|5\"\n\n  # 
Run with accelerate backend instead of vLLM\n  python lighteval_vllm_uv.py --model meta-llama/Llama-3.2-1B --tasks \"leaderboard|mmlu|5\" --backend accelerate\n\n  # Run with chat template for instruction-tuned models\n  python lighteval_vllm_uv.py --model meta-llama/Llama-3.2-1B-Instruct --tasks \"leaderboard|mmlu|5\" --use-chat-template\n\n  # Run with limited samples for testing\n  python lighteval_vllm_uv.py --model meta-llama/Llama-3.2-1B --tasks \"leaderboard|mmlu|5\" --max-samples 10\n\nTask format:\n  Tasks use the format: \"suite|task|num_fewshot\"\n  - leaderboard|mmlu|5 (MMLU with 5-shot)\n  - lighteval|hellaswag|0 (HellaSwag zero-shot)\n  - leaderboard|gsm8k|5 (GSM8K with 5-shot)\n  - Multiple tasks: \"leaderboard|mmlu|5,leaderboard|gsm8k|5\"\n        \"\"\",\n    )\n\n    parser.add_argument(\n        \"--model\",\n        required=True,\n        help=\"HuggingFace model ID (e.g., meta-llama/Llama-3.2-1B)\",\n    )\n    parser.add_argument(\n        \"--tasks\",\n        required=True,\n        help=\"Task specification (e.g., 'leaderboard|mmlu|5')\",\n    )\n    parser.add_argument(\n        \"--backend\",\n        choices=[\"vllm\", \"accelerate\"],\n        default=\"vllm\",\n        help=\"Inference backend to use (default: vllm)\",\n    )\n    parser.add_argument(\n        \"--output-dir\",\n        default=None,\n        help=\"Directory for evaluation results\",\n    )\n    parser.add_argument(\n        \"--max-samples\",\n        type=int,\n        default=None,\n        help=\"Limit number of samples per task (useful for testing)\",\n    )\n    parser.add_argument(\n        \"--batch-size\",\n        type=int,\n        default=1,\n        help=\"Batch size for evaluation (default: 1)\",\n    )\n    parser.add_argument(\n        \"--tensor-parallel-size\",\n        type=int,\n        default=1,\n        help=\"Number of GPUs for tensor parallelism (vLLM only, default: 1)\",\n    )\n    parser.add_argument(\n        
\"--gpu-memory-utilization\",\n        type=float,\n        default=0.8,\n        help=\"GPU memory fraction to use (vLLM only, default: 0.8)\",\n    )\n    parser.add_argument(\n        \"--dtype\",\n        default=\"auto\",\n        choices=[\"auto\", \"float16\", \"bfloat16\", \"float32\"],\n        help=\"Data type for model weights (default: auto)\",\n    )\n    parser.add_argument(\n        \"--trust-remote-code\",\n        action=\"store_true\",\n        help=\"Allow executing remote code from model repository\",\n    )\n    parser.add_argument(\n        \"--use-chat-template\",\n        action=\"store_true\",\n        help=\"Apply chat template for instruction-tuned/chat models\",\n    )\n    parser.add_argument(\n        \"--system-prompt\",\n        default=None,\n        help=\"System prompt for chat models\",\n    )\n\n    args = parser.parse_args()\n\n    if args.backend == \"vllm\":\n        run_lighteval_vllm(\n            model_id=args.model,\n            tasks=args.tasks,\n            output_dir=args.output_dir,\n            max_samples=args.max_samples,\n            batch_size=args.batch_size,\n            tensor_parallel_size=args.tensor_parallel_size,\n            gpu_memory_utilization=args.gpu_memory_utilization,\n            dtype=args.dtype,\n            trust_remote_code=args.trust_remote_code,\n            use_chat_template=args.use_chat_template,\n            system_prompt=args.system_prompt,\n        )\n    else:\n        run_lighteval_accelerate(\n            model_id=args.model,\n            tasks=args.tasks,\n            output_dir=args.output_dir,\n            max_samples=args.max_samples,\n            batch_size=args.batch_size,\n            dtype=args.dtype if args.dtype != \"auto\" else \"bfloat16\",\n            trust_remote_code=args.trust_remote_code,\n            use_chat_template=args.use_chat_template,\n            system_prompt=args.system_prompt,\n        )\n\n\nif __name__ == \"__main__\":\n    main()\n\n"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/run_eval_job.py",
    "content": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"huggingface-hub>=0.26.0\",\n#     \"python-dotenv>=1.2.1\",\n# ]\n# ///\n\n\"\"\"\nSubmit evaluation jobs using the `hf jobs uv run` CLI.\n\nThis wrapper constructs the appropriate command to execute the local\n`inspect_eval_uv.py` script on Hugging Face Jobs with the requested hardware.\n\"\"\"\n\nimport argparse\nimport os\nimport subprocess\nimport sys\nfrom pathlib import Path\nfrom typing import Optional\n\nfrom huggingface_hub import get_token\nfrom dotenv import load_dotenv\n\nload_dotenv()\n\n\nSCRIPT_PATH = Path(__file__).with_name(\"inspect_eval_uv.py\").resolve()\n\n\ndef create_eval_job(\n    model_id: str,\n    task: str,\n    hardware: str = \"cpu-basic\",\n    hf_token: Optional[str] = None,\n    limit: Optional[int] = None,\n) -> None:\n    \"\"\"\n    Submit an evaluation job using the Hugging Face Jobs CLI.\n    \"\"\"\n    token = hf_token or os.getenv(\"HF_TOKEN\") or get_token()\n    if not token:\n        raise ValueError(\"HF_TOKEN is required. 
Set it in environment or pass as argument.\")\n\n    if not SCRIPT_PATH.exists():\n        raise FileNotFoundError(f\"Script not found at {SCRIPT_PATH}\")\n\n    print(f\"Preparing evaluation job for {model_id} on task {task} (hardware: {hardware})\")\n\n    cmd = [\n        \"hf\",\n        \"jobs\",\n        \"uv\",\n        \"run\",\n        str(SCRIPT_PATH),\n        \"--flavor\",\n        hardware,\n        \"--secrets\",\n        f\"HF_TOKEN={token}\",\n        \"--\",\n        \"--model\",\n        model_id,\n        \"--task\",\n        task,\n    ]\n\n    if limit:\n        cmd.extend([\"--limit\", str(limit)])\n\n    # Redact the token so the secret never reaches stdout or captured logs\n    print(\"Executing:\", \" \".join(cmd).replace(token, \"***\"))\n\n    try:\n        subprocess.run(cmd, check=True)\n    except subprocess.CalledProcessError:\n        print(\"hf jobs command failed\", file=sys.stderr)\n        raise\n\n\ndef main() -> None:\n    parser = argparse.ArgumentParser(description=\"Run inspect-ai evaluations on Hugging Face Jobs\")\n    parser.add_argument(\"--model\", required=True, help=\"Model ID (e.g. Qwen/Qwen3-0.6B)\")\n    parser.add_argument(\"--task\", required=True, help=\"Inspect task (e.g. mmlu, gsm8k)\")\n    parser.add_argument(\"--hardware\", default=\"cpu-basic\", help=\"Hardware flavor (e.g. t4-small, a10g-small)\")\n    parser.add_argument(\"--limit\", type=int, default=None, help=\"Limit number of samples to evaluate\")\n\n    args = parser.parse_args()\n\n    create_eval_job(\n        model_id=args.model,\n        task=args.task,\n        hardware=args.hardware,\n        limit=args.limit,\n    )\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/run_vllm_eval_job.py",
    "content": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"huggingface-hub>=0.26.0\",\n#     \"python-dotenv>=1.2.1\",\n# ]\n# ///\n\n\"\"\"\nSubmit vLLM-based evaluation jobs using the `hf jobs uv run` CLI.\n\nThis wrapper constructs the appropriate command to execute vLLM evaluation scripts\n(lighteval or inspect-ai) on Hugging Face Jobs with GPU hardware.\n\nUnlike run_eval_job.py (which uses inference providers/APIs), this script runs\nmodels directly on the job's GPU using vLLM or HuggingFace Transformers.\n\nUsage:\n    python run_vllm_eval_job.py \\\\\n        --model meta-llama/Llama-3.2-1B \\\\\n        --task mmlu \\\\\n        --framework lighteval \\\\\n        --hardware a10g-small\n\"\"\"\n\nfrom __future__ import annotations\n\nimport argparse\nimport os\nimport subprocess\nimport sys\nfrom pathlib import Path\nfrom typing import Optional\n\nfrom huggingface_hub import get_token\nfrom dotenv import load_dotenv\n\nload_dotenv()\n\n# Script paths for different evaluation frameworks\nSCRIPT_DIR = Path(__file__).parent.resolve()\nLIGHTEVAL_SCRIPT = SCRIPT_DIR / \"lighteval_vllm_uv.py\"\nINSPECT_SCRIPT = SCRIPT_DIR / \"inspect_vllm_uv.py\"\n\n# Hardware flavor recommendations for different model sizes\nHARDWARE_RECOMMENDATIONS = {\n    \"small\": \"t4-small\",       # < 3B parameters\n    \"medium\": \"a10g-small\",    # 3B - 13B parameters\n    \"large\": \"a10g-large\",     # 13B - 34B parameters\n    \"xlarge\": \"a100-large\",    # 34B+ parameters\n}\n\n\ndef estimate_hardware(model_id: str) -> str:\n    \"\"\"\n    Estimate appropriate hardware based on model ID naming conventions.\n    \n    Returns a hardware flavor recommendation.\n    \"\"\"\n    model_lower = model_id.lower()\n    \n    # Check for explicit size indicators in model name\n    if any(x in model_lower for x in [\"70b\", \"72b\", \"65b\"]):\n        return \"a100-large\"\n    elif any(x in model_lower for x in [\"34b\", \"33b\", \"32b\", \"30b\"]):\n   
     return \"a10g-large\"\n    elif any(x in model_lower for x in [\"13b\", \"14b\", \"7b\", \"8b\"]):\n        return \"a10g-small\"\n    elif any(x in model_lower for x in [\"3b\", \"2b\", \"1b\", \"0.5b\", \"small\", \"mini\"]):\n        return \"t4-small\"\n    \n    # Default to medium hardware\n    return \"a10g-small\"\n\n\ndef create_lighteval_job(\n    model_id: str,\n    tasks: str,\n    hardware: str,\n    hf_token: Optional[str] = None,\n    max_samples: Optional[int] = None,\n    backend: str = \"vllm\",\n    batch_size: int = 1,\n    tensor_parallel_size: int = 1,\n    trust_remote_code: bool = False,\n    use_chat_template: bool = False,\n) -> None:\n    \"\"\"\n    Submit a lighteval evaluation job on HuggingFace Jobs.\n    \"\"\"\n    token = hf_token or os.getenv(\"HF_TOKEN\") or get_token()\n    if not token:\n        raise ValueError(\"HF_TOKEN is required. Set it in environment or pass as argument.\")\n\n    if not LIGHTEVAL_SCRIPT.exists():\n        raise FileNotFoundError(f\"Script not found at {LIGHTEVAL_SCRIPT}\")\n\n    print(f\"Preparing lighteval job for {model_id}\")\n    print(f\"  Tasks: {tasks}\")\n    print(f\"  Backend: {backend}\")\n    print(f\"  Hardware: {hardware}\")\n\n    cmd = [\n        \"hf\", \"jobs\", \"uv\", \"run\",\n        str(LIGHTEVAL_SCRIPT),\n        \"--flavor\", hardware,\n        \"--secrets\", f\"HF_TOKEN={token}\",\n        \"--\",\n        \"--model\", model_id,\n        \"--tasks\", tasks,\n        \"--backend\", backend,\n        \"--batch-size\", str(batch_size),\n        \"--tensor-parallel-size\", str(tensor_parallel_size),\n    ]\n\n    if max_samples:\n        cmd.extend([\"--max-samples\", str(max_samples)])\n\n    if trust_remote_code:\n        cmd.append(\"--trust-remote-code\")\n\n    if use_chat_template:\n        cmd.append(\"--use-chat-template\")\n\n    # Redact the token so it does not leak into terminal output or logs\n    print(f\"\\nExecuting: {' '.join(cmd).replace(token, '***')}\")\n\n    try:\n        subprocess.run(cmd, check=True)\n    except subprocess.CalledProcessError as 
exc:\n        print(\"hf jobs command failed\", file=sys.stderr)\n        raise\n\n\ndef create_inspect_job(\n    model_id: str,\n    task: str,\n    hardware: str,\n    hf_token: Optional[str] = None,\n    limit: Optional[int] = None,\n    backend: str = \"vllm\",\n    tensor_parallel_size: int = 1,\n    trust_remote_code: bool = False,\n) -> None:\n    \"\"\"\n    Submit an inspect-ai evaluation job on HuggingFace Jobs.\n    \"\"\"\n    token = hf_token or os.getenv(\"HF_TOKEN\") or get_token()\n    if not token:\n        raise ValueError(\"HF_TOKEN is required. Set it in environment or pass as argument.\")\n\n    if not INSPECT_SCRIPT.exists():\n        raise FileNotFoundError(f\"Script not found at {INSPECT_SCRIPT}\")\n\n    print(f\"Preparing inspect-ai job for {model_id}\")\n    print(f\"  Task: {task}\")\n    print(f\"  Backend: {backend}\")\n    print(f\"  Hardware: {hardware}\")\n\n    cmd = [\n        \"hf\", \"jobs\", \"uv\", \"run\",\n        str(INSPECT_SCRIPT),\n        \"--flavor\", hardware,\n        \"--secrets\", f\"HF_TOKEN={token}\",\n        \"--\",\n        \"--model\", model_id,\n        \"--task\", task,\n        \"--backend\", backend,\n        \"--tensor-parallel-size\", str(tensor_parallel_size),\n    ]\n\n    if limit:\n        cmd.extend([\"--limit\", str(limit)])\n\n    if trust_remote_code:\n        cmd.append(\"--trust-remote-code\")\n\n    # Redact the token so it does not leak into terminal output or logs\n    print(f\"\\nExecuting: {' '.join(cmd).replace(token, '***')}\")\n\n    try:\n        subprocess.run(cmd, check=True)\n    except subprocess.CalledProcessError as exc:\n        print(\"hf jobs command failed\", file=sys.stderr)\n        raise\n\n\ndef main() -> None:\n    parser = argparse.ArgumentParser(\n        description=\"Submit vLLM-based evaluation jobs to HuggingFace Jobs\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        epilog=\"\"\"\nExamples:\n  # Run lighteval with vLLM on A10G GPU\n  python run_vllm_eval_job.py \\\\\n      --model meta-llama/Llama-3.2-1B \\\\\n      --task 
\"leaderboard|mmlu|5\" \\\\\n      --framework lighteval \\\\\n      --hardware a10g-small\n\n  # Run inspect-ai on larger model with multi-GPU\n  python run_vllm_eval_job.py \\\\\n      --model meta-llama/Llama-3.2-70B \\\\\n      --task mmlu \\\\\n      --framework inspect \\\\\n      --hardware a100-large \\\\\n      --tensor-parallel-size 4\n\n  # Auto-detect hardware based on model size\n  python run_vllm_eval_job.py \\\\\n      --model meta-llama/Llama-3.2-1B \\\\\n      --task mmlu \\\\\n      --framework inspect\n\n  # Run with HF Transformers backend (instead of vLLM)\n  python run_vllm_eval_job.py \\\\\n      --model microsoft/phi-2 \\\\\n      --task mmlu \\\\\n      --framework inspect \\\\\n      --backend hf\n\nHardware flavors:\n  - t4-small: T4 GPU, good for models < 3B\n  - a10g-small: A10G GPU, good for models 3B-13B\n  - a10g-large: A10G GPU, good for models 13B-34B\n  - a100-large: A100 GPU, good for models 34B+\n\nFrameworks:\n  - lighteval: HuggingFace's lighteval library\n  - inspect: UK AI Safety's inspect-ai library\n\nTask formats:\n  - lighteval: \"suite|task|num_fewshot\" (e.g., \"leaderboard|mmlu|5\")\n  - inspect: task name (e.g., \"mmlu\", \"gsm8k\")\n        \"\"\",\n    )\n\n    parser.add_argument(\n        \"--model\",\n        required=True,\n        help=\"HuggingFace model ID (e.g., meta-llama/Llama-3.2-1B)\",\n    )\n    parser.add_argument(\n        \"--task\",\n        required=True,\n        help=\"Evaluation task (format depends on framework)\",\n    )\n    parser.add_argument(\n        \"--framework\",\n        choices=[\"lighteval\", \"inspect\"],\n        default=\"lighteval\",\n        help=\"Evaluation framework to use (default: lighteval)\",\n    )\n    parser.add_argument(\n        \"--hardware\",\n        default=None,\n        help=\"Hardware flavor (auto-detected if not specified)\",\n    )\n    parser.add_argument(\n        \"--backend\",\n        choices=[\"vllm\", \"hf\", \"accelerate\"],\n        
default=\"vllm\",\n        help=\"Model backend (default: vllm)\",\n    )\n    parser.add_argument(\n        \"--limit\",\n        \"--max-samples\",\n        type=int,\n        default=None,\n        dest=\"limit\",\n        help=\"Limit number of samples to evaluate\",\n    )\n    parser.add_argument(\n        \"--batch-size\",\n        type=int,\n        default=1,\n        help=\"Batch size for evaluation (lighteval only)\",\n    )\n    parser.add_argument(\n        \"--tensor-parallel-size\",\n        type=int,\n        default=1,\n        help=\"Number of GPUs for tensor parallelism\",\n    )\n    parser.add_argument(\n        \"--trust-remote-code\",\n        action=\"store_true\",\n        help=\"Allow executing remote code from model repository\",\n    )\n    parser.add_argument(\n        \"--use-chat-template\",\n        action=\"store_true\",\n        help=\"Apply chat template (lighteval only)\",\n    )\n\n    args = parser.parse_args()\n\n    # Auto-detect hardware if not specified\n    hardware = args.hardware or estimate_hardware(args.model)\n    print(f\"Using hardware: {hardware}\")\n\n    # Map backend names between frameworks\n    backend = args.backend\n    if args.framework == \"lighteval\" and backend == \"hf\":\n        backend = \"accelerate\"  # lighteval uses \"accelerate\" for HF backend\n\n    if args.framework == \"lighteval\":\n        create_lighteval_job(\n            model_id=args.model,\n            tasks=args.task,\n            hardware=hardware,\n            max_samples=args.limit,\n            backend=backend,\n            batch_size=args.batch_size,\n            tensor_parallel_size=args.tensor_parallel_size,\n            trust_remote_code=args.trust_remote_code,\n            use_chat_template=args.use_chat_template,\n        )\n    else:\n        create_inspect_job(\n            model_id=args.model,\n            task=args.task,\n            hardware=hardware,\n            limit=args.limit,\n            backend=backend if 
backend != \"accelerate\" else \"hf\",\n            tensor_parallel_size=args.tensor_parallel_size,\n            trust_remote_code=args.trust_remote_code,\n        )\n\n\nif __name__ == \"__main__\":\n    main()\n\n"
  },
  {
    "path": "skills/hugging-face-evaluation/scripts/test_extraction.py",
    "content": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"pyyaml\",\n# ]\n# ///\n\"\"\"\nTest script for evaluation extraction functionality.\n\nThis script demonstrates the table extraction capabilities without\nrequiring HF tokens or making actual API calls.\n\nNote: This script imports from evaluation_manager.py (same directory).\nRun from the scripts/ directory: cd scripts && uv run test_extraction.py\n\"\"\"\n\nimport yaml\n\nfrom evaluation_manager import (\n    extract_tables_from_markdown,\n    parse_markdown_table,\n    is_evaluation_table,\n    extract_metrics_from_table\n)\n\n# Sample README content with various table formats\nSAMPLE_README = \"\"\"\n# My Awesome Model\n\n## Evaluation Results\n\nHere are the benchmark results:\n\n| Benchmark | Score |\n|-----------|-------|\n| MMLU      | 85.2  |\n| HumanEval | 72.5  |\n| GSM8K     | 91.3  |\n\n### Detailed Breakdown\n\n| Category      | MMLU  | GSM8K | HumanEval |\n|---------------|-------|-------|-----------|\n| Performance   | 85.2  | 91.3  | 72.5      |\n\n## Other Information\n\nThis is not an evaluation table:\n\n| Feature | Value |\n|---------|-------|\n| Size    | 7B    |\n| Type    | Chat  |\n\n## More Results\n\n| Benchmark     | Accuracy | F1 Score |\n|---------------|----------|----------|\n| HellaSwag     | 88.9     | 0.87     |\n| TruthfulQA    | 68.7     | 0.65     |\n\"\"\"\n\n\ndef test_table_extraction():\n    \"\"\"Test markdown table extraction.\"\"\"\n    print(\"=\" * 60)\n    print(\"TEST 1: Table Extraction\")\n    print(\"=\" * 60)\n\n    tables = extract_tables_from_markdown(SAMPLE_README)\n    print(f\"Found {len(tables)} tables in the sample README\\n\")\n\n    for i, table in enumerate(tables, 1):\n        print(f\"Table {i}:\")\n        print(table[:100] + \"...\" if len(table) > 100 else table)\n        print()\n\n    return tables\n\n\ndef test_table_parsing(tables):\n    \"\"\"Test table parsing.\"\"\"\n    
print(\"\\n\" + \"=\" * 60)\n    print(\"TEST 2: Table Parsing\")\n    print(\"=\" * 60)\n\n    parsed_tables = []\n    for i, table in enumerate(tables, 1):\n        print(f\"\\nParsing Table {i}:\")\n        header, rows = parse_markdown_table(table)\n\n        print(f\"  Header: {header}\")\n        print(f\"  Rows: {len(rows)}\")\n        for j, row in enumerate(rows[:3], 1):  # Show first 3 rows\n            print(f\"    Row {j}: {row}\")\n        if len(rows) > 3:\n            print(f\"    ... and {len(rows) - 3} more rows\")\n\n        parsed_tables.append((header, rows))\n\n    return parsed_tables\n\n\ndef test_evaluation_detection(parsed_tables):\n    \"\"\"Test evaluation table detection.\"\"\"\n    print(\"\\n\" + \"=\" * 60)\n    print(\"TEST 3: Evaluation Table Detection\")\n    print(\"=\" * 60)\n\n    eval_tables = []\n    for i, (header, rows) in enumerate(parsed_tables, 1):\n        is_eval = is_evaluation_table(header, rows)\n        status = \"✓ IS\" if is_eval else \"✗ NOT\"\n        print(f\"\\nTable {i}: {status} an evaluation table\")\n        print(f\"  Header: {header}\")\n\n        if is_eval:\n            eval_tables.append((header, rows))\n\n    print(f\"\\nFound {len(eval_tables)} evaluation tables\")\n    return eval_tables\n\n\ndef test_metric_extraction(eval_tables):\n    \"\"\"Test metric extraction.\"\"\"\n    print(\"\\n\" + \"=\" * 60)\n    print(\"TEST 4: Metric Extraction\")\n    print(\"=\" * 60)\n\n    all_metrics = []\n    for i, (header, rows) in enumerate(eval_tables, 1):\n        print(f\"\\nExtracting metrics from table {i}:\")\n        metrics = extract_metrics_from_table(header, rows, table_format=\"auto\")\n\n        print(f\"  Extracted {len(metrics)} metrics:\")\n        for metric in metrics:\n            print(f\"    - {metric['name']}: {metric['value']} (type: {metric['type']})\")\n\n        all_metrics.extend(metrics)\n\n    return all_metrics\n\n\ndef test_model_index_format(metrics):\n    \"\"\"Test 
model-index format generation.\"\"\"\n    print(\"\\n\" + \"=\" * 60)\n    print(\"TEST 5: Model-Index Format\")\n    print(\"=\" * 60)\n\n    model_index = {\n        \"model-index\": [\n            {\n                \"name\": \"test-model\",\n                \"results\": [\n                    {\n                        \"task\": {\"type\": \"text-generation\"},\n                        \"dataset\": {\n                            \"name\": \"Benchmarks\",\n                            \"type\": \"benchmark\"\n                        },\n                        \"metrics\": metrics,\n                        \"source\": {\n                            \"name\": \"Model README\",\n                            \"url\": \"https://huggingface.co/test/model\"\n                        }\n                    }\n                ]\n            }\n        ]\n    }\n\n    print(\"\\nGenerated model-index structure:\")\n    print(yaml.dump(model_index, sort_keys=False, default_flow_style=False))\n\n\ndef main():\n    \"\"\"Run all tests.\"\"\"\n    print(\"\\n\" + \"=\" * 60)\n    print(\"EVALUATION EXTRACTION TEST SUITE\")\n    print(\"=\" * 60)\n    print(\"\\nThis test demonstrates the table extraction capabilities\")\n    print(\"without requiring API access or tokens.\\n\")\n\n    # Run tests\n    tables = test_table_extraction()\n    parsed_tables = test_table_parsing(tables)\n    eval_tables = test_evaluation_detection(parsed_tables)\n    metrics = test_metric_extraction(eval_tables)\n    test_model_index_format(metrics)\n\n    # Summary\n    print(\"\\n\" + \"=\" * 60)\n    print(\"TEST SUMMARY\")\n    print(\"=\" * 60)\n    print(f\"✓ Found {len(tables)} total tables\")\n    print(f\"✓ Identified {len(eval_tables)} evaluation tables\")\n    print(f\"✓ Extracted {len(metrics)} metrics\")\n    print(\"✓ Generated model-index format successfully\")\n    print(\"\\n\" + \"=\" * 60)\n    print(\"All tests completed! 
The extraction logic is working correctly.\")\n    print(\"=\" * 60 + \"\\n\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hugging-face-jobs/SKILL.md",
    "content": "---\nname: hugging-face-jobs\ndescription: This skill should be used when users want to run any workload on Hugging Face Jobs infrastructure. Covers UV scripts, Docker-based jobs, hardware selection, cost estimation, authentication with tokens, secrets management, timeout configuration, and result persistence. Designed for general-purpose compute workloads including data processing, inference, experiments, batch jobs, and any Python-based tasks. Should be invoked for tasks involving cloud compute, GPU workloads, or when users mention running jobs on Hugging Face infrastructure without local setup.\nlicense: Complete terms in LICENSE.txt\n---\n\n# Running Workloads on Hugging Face Jobs\n\n## Overview\n\nRun any workload on fully managed Hugging Face infrastructure. No local setup required—jobs run on cloud CPUs, GPUs, or TPUs and can persist results to the Hugging Face Hub.\n\n**Common use cases:**\n- **Data Processing** - Transform, filter, or analyze large datasets\n- **Batch Inference** - Run inference on thousands of samples\n- **Experiments & Benchmarks** - Reproducible ML experiments\n- **Model Training** - Fine-tune models (see `model-trainer` skill for TRL-specific training)\n- **Synthetic Data Generation** - Generate datasets using LLMs\n- **Development & Testing** - Test code without local GPU setup\n- **Scheduled Jobs** - Automate recurring tasks\n\n**For model training specifically:** See the `model-trainer` skill for TRL-based training workflows.\n\n## When to Use This Skill\n\nUse this skill when users want to:\n- Run Python workloads on cloud infrastructure\n- Execute jobs without local GPU/TPU setup\n- Process data at scale\n- Run batch inference or experiments\n- Schedule recurring tasks\n- Use GPUs/TPUs for any workload\n- Persist results to the Hugging Face Hub\n\n## Key Directives\n\nWhen assisting with jobs:\n\n1. **ALWAYS use `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs(\"uv\", {...})` or `hf_jobs(\"run\", {...})`. 
The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`.\n\n2. **Always handle authentication** - Jobs that interact with the Hub require `HF_TOKEN` via secrets. See Token Usage section below.\n\n3. **Provide job details after submission** - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.\n\n4. **Set appropriate timeouts** - Default 30min may be insufficient for long-running tasks.\n\n## Prerequisites Checklist\n\nBefore starting any job, verify:\n\n### ✅ **Account & Authentication**\n- Hugging Face Account with [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require paid plan)\n- Authenticated login: Check with `hf_whoami()`\n- **HF_TOKEN for Hub Access** ⚠️ CRITICAL - Required for any Hub operations (push models/datasets, download private repos, etc.)\n- Token must have appropriate permissions (read for downloads, write for uploads)\n\n### ✅ **Token Usage** (See Token Usage section for details)\n\n**When tokens are required:**\n- Pushing models/datasets to Hub\n- Accessing private repositories\n- Using Hub APIs in scripts\n- Any authenticated Hub operations\n\n**How to provide tokens:**\n```python\n# hf_jobs MCP tool — $HF_TOKEN is auto-replaced with real token:\n{\"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}}\n\n# HfApi().run_uv_job() — MUST pass actual token:\nfrom huggingface_hub import get_token\nsecrets={\"HF_TOKEN\": get_token()}\n```\n\n**⚠️ CRITICAL:** The `$HF_TOKEN` placeholder is ONLY auto-replaced by the `hf_jobs` MCP tool. When using `HfApi().run_uv_job()`, you MUST pass the real token via `get_token()`. 
Passing the literal string `\"$HF_TOKEN\"` results in a 9-character invalid token and 401 errors.\n\n## Token Usage Guide\n\n### Understanding Tokens\n\n**What are HF Tokens?**\n- Authentication credentials for Hugging Face Hub\n- Required for authenticated operations (push, private repos, API access)\n- Stored securely on your machine after `hf auth login`\n\n**Token Types:**\n- **Read Token** - Can download models/datasets, read private repos\n- **Write Token** - Can push models/datasets, create repos, modify content\n- **Organization Token** - Can act on behalf of an organization\n\n### When Tokens Are Required\n\n**Always Required:**\n- Pushing models/datasets to Hub\n- Accessing private repositories\n- Creating new repositories\n- Modifying existing repositories\n- Using Hub APIs programmatically\n\n**Not Required:**\n- Downloading public models/datasets\n- Running jobs that don't interact with Hub\n- Reading public repository information\n\n### How to Provide Tokens to Jobs\n\n#### Method 1: Automatic Token (Recommended)\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Automatic replacement\n})\n```\n\n**How it works:**\n- `$HF_TOKEN` is a placeholder that gets replaced with your actual token\n- Uses the token from your logged-in session (`hf auth login`)\n- Most secure and convenient method\n- Token is encrypted server-side when passed as a secret\n\n**Benefits:**\n- No token exposure in code\n- Uses your current login session\n- Automatically updated if you re-login\n- Works seamlessly with MCP tools\n\n#### Method 2: Explicit Token (Not Recommended)\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"secrets\": {\"HF_TOKEN\": \"hf_abc123...\"}  # ⚠️ Hardcoded token\n})\n```\n\n**When to use:**\n- Only if automatic token doesn't work\n- Testing with a specific token\n- Organization tokens (use with caution)\n\n**Security concerns:**\n- Token visible in 
code/logs\n- Must manually update if token rotates\n- Risk of token exposure\n\n#### Method 3: Environment Variable (Less Secure)\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"env\": {\"HF_TOKEN\": \"hf_abc123...\"}  # ⚠️ Less secure than secrets\n})\n```\n\n**Difference from secrets:**\n- `env` variables are visible in job logs\n- `secrets` are encrypted server-side\n- Always prefer `secrets` for tokens\n\n### Using Tokens in Scripts\n\n**In your Python script, tokens are available as environment variables:**\n\n```python\n# /// script\n# dependencies = [\"huggingface-hub\"]\n# ///\n\nimport os\nfrom huggingface_hub import HfApi\n\n# Token is automatically available if passed via secrets\ntoken = os.environ.get(\"HF_TOKEN\")\n\n# Use with Hub API\napi = HfApi(token=token)\n\n# Or let huggingface_hub auto-detect\napi = HfApi()  # Automatically uses HF_TOKEN env var\n```\n\n**Best practices:**\n- Don't hardcode tokens in scripts\n- Use `os.environ.get(\"HF_TOKEN\")` to access\n- Let `huggingface_hub` auto-detect when possible\n- Verify token exists before Hub operations\n\n### Token Verification\n\n**Check if you're logged in:**\n```python\nfrom huggingface_hub import whoami\nuser_info = whoami()  # Returns a dict of account info; user_info[\"name\"] is your username\n```\n\n**Verify token in job:**\n```python\nimport os\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN not found!\"\ntoken = os.environ[\"HF_TOKEN\"]\nprint(f\"Token starts with: {token[:7]}...\")  # Should start with \"hf_\"\n```\n\n### Common Token Issues\n\n**Error: 401 Unauthorized**\n- **Cause:** Token missing or invalid\n- **Fix:** Add `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` to job config\n- **Verify:** Check `hf_whoami()` works locally\n\n**Error: 403 Forbidden**\n- **Cause:** Token lacks required permissions\n- **Fix:** Ensure token has write permissions for push operations\n- **Check:** Token type at https://huggingface.co/settings/tokens\n\n**Error: Token not found in environment**\n- 
**Cause:** `secrets` not passed or wrong key name\n- **Fix:** Use `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` (not `env`)\n- **Verify:** Script checks `os.environ.get(\"HF_TOKEN\")`\n\n**Error: Repository access denied**\n- **Cause:** Token doesn't have access to private repo\n- **Fix:** Use token from account with access\n- **Check:** Verify repo visibility and your permissions\n\n### Token Security Best Practices\n\n1. **Never commit tokens** - Use `$HF_TOKEN` placeholder or environment variables\n2. **Use secrets, not env** - Secrets are encrypted server-side\n3. **Rotate tokens regularly** - Generate new tokens periodically\n4. **Use minimal permissions** - Create tokens with only needed permissions\n5. **Don't share tokens** - Each user should use their own token\n6. **Monitor token usage** - Check token activity in Hub settings\n\n### Complete Token Example\n\n```python\n# Example: Push results to Hub\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"huggingface-hub\", \"datasets\"]\n# ///\n\nimport os\nfrom huggingface_hub import HfApi\nfrom datasets import Dataset\n\n# Verify token is available\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required!\"\n\n# Use token for Hub operations\napi = HfApi(token=os.environ[\"HF_TOKEN\"])\n\n# Create and push dataset\ndata = {\"text\": [\"Hello\", \"World\"]}\ndataset = Dataset.from_dict(data)\ndataset.push_to_hub(\"username/my-dataset\", token=os.environ[\"HF_TOKEN\"])\n\nprint(\"✅ Dataset pushed successfully!\")\n\"\"\",\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Token provided securely\n})\n```\n\n## Quick Start: Two Approaches\n\n### Approach 1: UV Scripts (Recommended)\n\nUV scripts use PEP 723 inline dependencies for clean, self-contained workloads.\n\n**MCP Tool:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"transformers\", \"torch\"]\n# ///\n\nfrom transformers import 
pipeline\nimport torch\n\n# Your workload here\nclassifier = pipeline(\"sentiment-analysis\")\nresult = classifier(\"I love Hugging Face!\")\nprint(result)\n\"\"\",\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\"\n})\n```\n\n**CLI Equivalent:**\n```bash\nhf jobs uv run my_script.py --flavor cpu-basic --timeout 30m\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import run_uv_job\nrun_uv_job(\"my_script.py\", flavor=\"cpu-basic\", timeout=\"30m\")\n```\n\n**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline, no file saving required\n\n**When to use:** Default choice for all workloads, custom logic, any scenario requiring `hf_jobs()`\n\n#### Custom Docker Images for UV Scripts\n\nBy default, UV scripts use `ghcr.io/astral-sh/uv:python3.12-bookworm-slim`. For ML workloads with complex dependencies, use pre-built images:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"inference.py\",\n    \"image\": \"vllm/vllm-openai:latest\",  # Pre-built image with vLLM\n    \"flavor\": \"a10g-large\"\n})\n```\n\n**CLI:**\n```bash\nhf jobs uv run --image vllm/vllm-openai:latest --flavor a10g-large inference.py\n```\n\n**Benefits:** Faster startup, pre-installed dependencies, optimized for specific frameworks\n\n#### Python Version\n\nBy default, UV scripts use Python 3.12. Specify a different version:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"my_script.py\",\n    \"python\": \"3.11\",  # Use Python 3.11\n    \"flavor\": \"cpu-basic\"\n})\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import run_uv_job\nrun_uv_job(\"my_script.py\", python=\"3.11\")\n```\n\n#### Working with Scripts\n\n⚠️ **Important:** There are *two* \"script path\" stories depending on how you run Jobs:\n\n- **Using the `hf_jobs()` MCP tool (recommended in this repo)**: the `script` value must be **inline code** (a string) or a **URL**. 
A local filesystem path (like `\"./scripts/foo.py\"`) won't exist inside the remote container.\n- **Using the `hf jobs uv run` CLI**: local file paths **do work** (the CLI uploads your script).\n\n**Common mistake with `hf_jobs()` MCP tool:**\n\n```python\n# ❌ Will fail (remote container can't see your local path)\nhf_jobs(\"uv\", {\"script\": \"./scripts/foo.py\"})\n```\n\n**Correct patterns with `hf_jobs()` MCP tool:**\n\n```python\n# ✅ Inline: read the local script file and pass its *contents*\nfrom pathlib import Path\nscript = Path(\"hf-jobs/scripts/foo.py\").read_text()\nhf_jobs(\"uv\", {\"script\": script})\n\n# ✅ URL: host the script somewhere reachable\nhf_jobs(\"uv\", {\"script\": \"https://huggingface.co/datasets/uv-scripts/.../raw/main/foo.py\"})\n\n# ✅ URL from GitHub\nhf_jobs(\"uv\", {\"script\": \"https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py\"})\n```\n\n**CLI equivalent (local paths supported):**\n\n```bash\nhf jobs uv run ./scripts/foo.py -- --your --args\n```\n\n#### Adding Dependencies at Runtime\n\nAdd extra dependencies beyond what's in the PEP 723 header:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"inference.py\",\n    \"dependencies\": [\"transformers\", \"torch>=2.0\"],  # Extra deps\n    \"flavor\": \"a10g-small\"\n})\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import run_uv_job\nrun_uv_job(\"inference.py\", dependencies=[\"transformers\", \"torch>=2.0\"])\n```\n\n### Approach 2: Docker-Based Jobs\n\nRun jobs with custom Docker images and commands.\n\n**MCP Tool:**\n```python\nhf_jobs(\"run\", {\n    \"image\": \"python:3.12\",\n    \"command\": [\"python\", \"-c\", \"print('Hello from HF Jobs!')\"],\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\"\n})\n```\n\n**CLI Equivalent:**\n```bash\nhf jobs run python:3.12 python -c \"print('Hello from HF Jobs!')\"\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import run_job\nrun_job(image=\"python:3.12\", command=[\"python\", 
\"-c\", \"print('Hello!')\"], flavor=\"cpu-basic\")\n```\n\n**Benefits:** Full Docker control, use pre-built images, run any command\n**When to use:** Need specific Docker images, non-Python workloads, complex environments\n\n**Example with GPU:**\n```python\nhf_jobs(\"run\", {\n    \"image\": \"pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel\",\n    \"command\": [\"python\", \"-c\", \"import torch; print(torch.cuda.get_device_name())\"],\n    \"flavor\": \"a10g-small\",\n    \"timeout\": \"1h\"\n})\n```\n\n**Using Hugging Face Spaces as Images:**\n\nYou can use Docker images from HF Spaces:\n```python\nhf_jobs(\"run\", {\n    \"image\": \"hf.co/spaces/lhoestq/duckdb\",  # Space as Docker image\n    \"command\": [\"duckdb\", \"-c\", \"SELECT 'Hello from DuckDB!'\"],\n    \"flavor\": \"cpu-basic\"\n})\n```\n\n**CLI:**\n```bash\nhf jobs run hf.co/spaces/lhoestq/duckdb duckdb -c \"SELECT 'Hello!'\"\n```\n\n### Finding More UV Scripts on Hub\n\nThe `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:\n\n```python\n# Discover available UV script collections\ndataset_search({\"author\": \"uv-scripts\", \"sort\": \"downloads\", \"limit\": 20})\n\n# Explore a specific collection\nhub_repo_details([\"uv-scripts/classification\"], repo_type=\"dataset\", include_readme=True)\n```\n\n**Popular collections:** OCR, classification, synthetic-data, vLLM, dataset-creation\n\n## Hardware Selection\n\n> **Reference:** [HF Jobs Hardware Docs](https://huggingface.co/docs/hub/en/spaces-config-reference) (updated 07/2025)\n\n| Workload Type | Recommended Hardware | Use Case |\n|---------------|---------------------|----------|\n| Data processing, testing | `cpu-basic`, `cpu-upgrade` | Lightweight tasks |\n| Small models, demos | `t4-small` | <1B models, quick tests |\n| Medium models | `t4-medium`, `l4x1` | 1-7B models |\n| Large models, production | `a10g-small`, `a10g-large` | 7-13B models |\n| Very large models | `a100-large` | 13B+ models 
|\n| Batch inference | `a10g-large`, `a100-large` | High-throughput |\n| Multi-GPU workloads | `l4x4`, `a10g-largex2`, `a10g-largex4` | Parallel/large models |\n| TPU workloads | `v5e-1x1`, `v5e-2x2`, `v5e-2x4` | JAX/Flax, TPU-optimized |\n\n**All Available Flavors:**\n- **CPU:** `cpu-basic`, `cpu-upgrade`\n- **GPU:** `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`\n- **TPU:** `v5e-1x1`, `v5e-2x2`, `v5e-2x4`\n\n**Guidelines:**\n- Start with smaller hardware for testing\n- Scale up based on actual needs\n- Use multi-GPU for parallel workloads or large models\n- Use TPUs for JAX/Flax workloads\n- See `references/hardware_guide.md` for detailed specifications\n\n## Critical: Saving Results\n\n**⚠️ EPHEMERAL ENVIRONMENT—MUST PERSIST RESULTS**\n\nThe Jobs environment is temporary. All files are deleted when the job ends. If results aren't persisted, **ALL WORK IS LOST**.\n\n### Persistence Options\n\n**1. Push to Hugging Face Hub (Recommended)**\n\n```python\n# Push models\nmodel.push_to_hub(\"username/model-name\", token=os.environ[\"HF_TOKEN\"])\n\n# Push datasets\ndataset.push_to_hub(\"username/dataset-name\", token=os.environ[\"HF_TOKEN\"])\n\n# Push artifacts\napi.upload_file(\n    path_or_fileobj=\"results.json\",\n    path_in_repo=\"results.json\",\n    repo_id=\"username/results\",\n    token=os.environ[\"HF_TOKEN\"]\n)\n```\n\n**2. Use External Storage**\n\n```python\n# Upload to S3, GCS, etc.\nimport boto3\ns3 = boto3.client('s3')\ns3.upload_file('results.json', 'my-bucket', 'results.json')\n```\n\n**3. 
Send Results via API**\n\n```python\n# POST results to your API\nimport requests\nrequests.post(\"https://your-api.com/results\", json=results)\n```\n\n### Required Configuration for Hub Push\n\n**In job submission:**\n```python\n# hf_jobs MCP tool:\n{\"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}}  # auto-replaced\n\n# HfApi().run_uv_job():\nfrom huggingface_hub import get_token\nsecrets={\"HF_TOKEN\": get_token()}  # must pass real token\n```\n\n**In script:**\n```python\nimport os\nfrom huggingface_hub import HfApi\n\n# Token automatically available from secrets\napi = HfApi(token=os.environ.get(\"HF_TOKEN\"))\n\n# Push your results\napi.upload_file(...)\n```\n\n### Verification Checklist\n\nBefore submitting:\n- [ ] Results persistence method chosen\n- [ ] Token in secrets if using Hub (MCP: `\"$HF_TOKEN\"`, Python API: `get_token()`)\n- [ ] Script handles missing token gracefully\n- [ ] Test persistence path works\n\n**See:** `references/hub_saving.md` for detailed Hub persistence guide\n\n## Timeout Management\n\n**⚠️ DEFAULT: 30 MINUTES**\n\nJobs automatically stop after the timeout. 
For long-running tasks like training, always set a custom timeout.\n\n### Setting Timeouts\n\n**MCP Tool:**\n```python\n{\n    \"timeout\": \"2h\"   # 2 hours\n}\n```\n\n**Supported formats:**\n- Integer/float: seconds (e.g., `300` = 5 minutes)\n- String with suffix: `\"5m\"` (minutes), `\"2h\"` (hours), `\"1d\"` (days)\n- Examples: `\"90m\"`, `\"2h\"`, `\"1.5h\"`, `300`, `\"1d\"`\n\n**Python API:**\n```python\nfrom huggingface_hub import run_job, run_uv_job\n\nrun_job(image=\"python:3.12\", command=[...], timeout=\"2h\")\nrun_uv_job(\"script.py\", timeout=7200)  # 2 hours in seconds\n```\n\n### Timeout Guidelines\n\n| Scenario | Recommended | Notes |\n|----------|-------------|-------|\n| Quick test | 10-30 min | Verify setup |\n| Data processing | 1-2 hours | Depends on data size |\n| Batch inference | 2-4 hours | Large batches |\n| Experiments | 4-8 hours | Multiple runs |\n| Long-running | 8-24 hours | Production workloads |\n\n**Always add 20-30% buffer** for setup, network delays, and cleanup.\n\n**On timeout:** Job killed immediately, all unsaved progress lost\n\n## Cost Estimation\n\n**General guidelines:**\n\n```\nTotal Cost = (Hours of runtime) × (Cost per hour)\n```\n\n**Example calculations:**\n\n**Quick test:**\n- Hardware: cpu-basic ($0.10/hour)\n- Time: 15 minutes (0.25 hours)\n- Cost: $0.03\n\n**Data processing:**\n- Hardware: l4x1 ($2.50/hour)\n- Time: 2 hours\n- Cost: $5.00\n\n**Batch inference:**\n- Hardware: a10g-large ($5/hour)\n- Time: 4 hours\n- Cost: $20.00\n\n**Cost optimization tips:**\n1. Start small - Test on cpu-basic or t4-small\n2. Monitor runtime - Set appropriate timeouts\n3. Use checkpoints - Resume if job fails\n4. Optimize code - Reduce unnecessary compute\n5. 
Choose right hardware - Don't over-provision\n\n## Monitoring and Tracking\n\n### Check Job Status\n\n**MCP Tool:**\n```python\n# List all jobs\nhf_jobs(\"ps\")\n\n# Inspect specific job\nhf_jobs(\"inspect\", {\"job_id\": \"your-job-id\"})\n\n# View logs\nhf_jobs(\"logs\", {\"job_id\": \"your-job-id\"})\n\n# Cancel a job\nhf_jobs(\"cancel\", {\"job_id\": \"your-job-id\"})\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job\n\n# List your jobs\njobs = list_jobs()\n\n# List running jobs only\nrunning = [j for j in list_jobs() if j.status.stage == \"RUNNING\"]\n\n# Inspect specific job\njob_info = inspect_job(job_id=\"your-job-id\")\n\n# View logs\nfor log in fetch_job_logs(job_id=\"your-job-id\"):\n    print(log)\n\n# Cancel a job\ncancel_job(job_id=\"your-job-id\")\n```\n\n**CLI:**\n```bash\nhf jobs ps                    # List jobs\nhf jobs logs <job-id>         # View logs\nhf jobs cancel <job-id>       # Cancel job\n```\n\n**Remember:** Wait for user to request status checks. 
Avoid polling repeatedly.\n\n### Job URLs\n\nAfter submission, jobs have monitoring URLs:\n```\nhttps://huggingface.co/jobs/username/job-id\n```\n\nView logs, status, and details in the browser.\n\n### Wait for Multiple Jobs\n\n```python\nimport time\nfrom huggingface_hub import inspect_job, run_job\n\n# Run multiple jobs; `workloads` is your own list of (image, command) pairs\njobs = [run_job(image=img, command=cmd) for img, cmd in workloads]\n\n# Wait until every job reaches a terminal stage\n# (a canceled or deleted job never becomes COMPLETED or ERROR)\nTERMINAL_STAGES = (\"COMPLETED\", \"ERROR\", \"CANCELED\", \"DELETED\")\nfor job in jobs:\n    while inspect_job(job_id=job.id).status.stage not in TERMINAL_STAGES:\n        time.sleep(10)\n```\n\n## Scheduled Jobs\n\nRun jobs on a schedule using CRON expressions or predefined schedules.\n\n**MCP Tool:**\n```python\n# Schedule a UV script that runs every hour\nhf_jobs(\"scheduled uv\", {\n    \"script\": \"your_script.py\",\n    \"schedule\": \"@hourly\",\n    \"flavor\": \"cpu-basic\"\n})\n\n# Schedule with CRON syntax\nhf_jobs(\"scheduled uv\", {\n    \"script\": \"your_script.py\",\n    \"schedule\": \"0 9 * * 1\",  # 9 AM every Monday\n    \"flavor\": \"cpu-basic\"\n})\n\n# Schedule a Docker-based job\nhf_jobs(\"scheduled run\", {\n    \"image\": \"python:3.12\",\n    \"command\": [\"python\", \"-c\", \"print('Scheduled!')\"],\n    \"schedule\": \"@daily\",\n    \"flavor\": \"cpu-basic\"\n})\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import create_scheduled_job, create_scheduled_uv_job\n\n# Schedule a Docker job\ncreate_scheduled_job(\n    image=\"python:3.12\",\n    command=[\"python\", \"-c\", \"print('Running on schedule!')\"],\n    schedule=\"@hourly\"\n)\n\n# Schedule a UV script\ncreate_scheduled_uv_job(\"my_script.py\", schedule=\"@daily\", flavor=\"cpu-basic\")\n\n# Schedule with GPU\ncreate_scheduled_uv_job(\n    \"ml_inference.py\",\n    schedule=\"0 */6 * * *\",  # Every 6 hours\n    flavor=\"a10g-small\"\n)\n```\n\n**Available schedules:**\n- `@annually`, `@yearly` - Once per year\n- `@monthly` - Once per month\n- `@weekly` - Once per week\n- `@daily` - Once per day\n- `@hourly` - 
Once per hour\n- CRON expression - Custom schedule (e.g., `\"*/5 * * * *\"` for every 5 minutes)\n\n**Manage scheduled jobs:**\n```python\n# MCP Tool\nhf_jobs(\"scheduled ps\")                              # List scheduled jobs\nhf_jobs(\"scheduled inspect\", {\"job_id\": \"...\"})     # Inspect details\nhf_jobs(\"scheduled suspend\", {\"job_id\": \"...\"})     # Pause\nhf_jobs(\"scheduled resume\", {\"job_id\": \"...\"})      # Resume\nhf_jobs(\"scheduled delete\", {\"job_id\": \"...\"})      # Delete\n```\n\n**Python API for management:**\n```python\nfrom huggingface_hub import (\n    list_scheduled_jobs,\n    inspect_scheduled_job,\n    suspend_scheduled_job,\n    resume_scheduled_job,\n    delete_scheduled_job\n)\n\n# List all scheduled jobs\nscheduled = list_scheduled_jobs()\n\n# Inspect a scheduled job\ninfo = inspect_scheduled_job(scheduled_job_id)\n\n# Suspend (pause) a scheduled job\nsuspend_scheduled_job(scheduled_job_id)\n\n# Resume a scheduled job\nresume_scheduled_job(scheduled_job_id)\n\n# Delete a scheduled job\ndelete_scheduled_job(scheduled_job_id)\n```\n\n## Webhooks: Trigger Jobs on Events\n\nTrigger jobs automatically when changes happen in Hugging Face repositories.\n\n**Python API:**\n```python\nfrom huggingface_hub import create_webhook\n\n# Create webhook that triggers a job when a repo changes\nwebhook = create_webhook(\n    job_id=job.id,\n    watched=[\n        {\"type\": \"user\", \"name\": \"your-username\"},\n        {\"type\": \"org\", \"name\": \"your-org-name\"}\n    ],\n    domains=[\"repo\", \"discussion\"],\n    secret=\"your-secret\"\n)\n```\n\n**How it works:**\n1. Webhook listens for changes in watched repositories\n2. When triggered, the job runs with `WEBHOOK_PAYLOAD` environment variable\n3. 
Your script can parse the payload to understand what changed\n\n**Use cases:**\n- Auto-process new datasets when uploaded\n- Trigger inference when models are updated\n- Run tests when code changes\n- Generate reports on repository activity\n\n**Access webhook payload in script:**\n```python\nimport os\nimport json\n\npayload = json.loads(os.environ.get(\"WEBHOOK_PAYLOAD\", \"{}\"))\nprint(f\"Event type: {payload.get('event', {}).get('action')}\")\n```\n\nSee [Webhooks Documentation](https://huggingface.co/docs/huggingface_hub/guides/webhooks) for more details.\n\n## Common Workload Patterns\n\nThis repository ships ready-to-run UV scripts in `hf-jobs/scripts/`. Prefer using them instead of inventing new templates.\n\n### Pattern 1: Dataset → Model Responses (vLLM) — `scripts/generate-responses.py`\n\n**What it does:** loads a Hub dataset (chat `messages` or a `prompt` column), applies a model chat template, generates responses with vLLM, and **pushes** the output dataset + dataset card back to the Hub.\n\n**Requires:** GPU + **write** token (it pushes a dataset).\n\n```python\nfrom pathlib import Path\n\nscript = Path(\"hf-jobs/scripts/generate-responses.py\").read_text()\nhf_jobs(\"uv\", {\n    \"script\": script,\n    \"script_args\": [\n        \"username/input-dataset\",\n        \"username/output-dataset\",\n        \"--messages-column\", \"messages\",\n        \"--model-id\", \"Qwen/Qwen3-30B-A3B-Instruct-2507\",\n        \"--temperature\", \"0.7\",\n        \"--top-p\", \"0.8\",\n        \"--max-tokens\", \"2048\",\n    ],\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"4h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n})\n```\n\n### Pattern 2: CoT Self-Instruct Synthetic Data — `scripts/cot-self-instruct.py`\n\n**What it does:** generates synthetic prompts/answers via CoT Self-Instruct, optionally filters outputs (answer-consistency / RIP), then **pushes** the generated dataset + dataset card to the Hub.\n\n**Requires:** GPU + **write** token 
(it pushes a dataset).\n\n```python\nfrom pathlib import Path\n\nscript = Path(\"hf-jobs/scripts/cot-self-instruct.py\").read_text()\nhf_jobs(\"uv\", {\n    \"script\": script,\n    \"script_args\": [\n        \"--seed-dataset\", \"davanstrien/s1k-reasoning\",\n        \"--output-dataset\", \"username/synthetic-math\",\n        \"--task-type\", \"reasoning\",\n        \"--num-samples\", \"5000\",\n        \"--filter-method\", \"answer-consistency\",\n    ],\n    \"flavor\": \"l4x4\",\n    \"timeout\": \"8h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n})\n```\n\n### Pattern 3: Streaming Dataset Stats (Polars + HF Hub) — `scripts/finepdfs-stats.py`\n\n**What it does:** scans parquet directly from Hub (no 300GB download), computes temporal stats, and (optionally) uploads results to a Hub dataset repo.\n\n**Requires:** CPU is often enough; token needed **only** if you pass `--output-repo` (upload).\n\n```python\nfrom pathlib import Path\n\nscript = Path(\"hf-jobs/scripts/finepdfs-stats.py\").read_text()\nhf_jobs(\"uv\", {\n    \"script\": script,\n    \"script_args\": [\n        \"--limit\", \"10000\",\n        \"--show-plan\",\n        \"--output-repo\", \"username/finepdfs-temporal-stats\",\n    ],\n    \"flavor\": \"cpu-upgrade\",\n    \"timeout\": \"2h\",\n    \"env\": {\"HF_XET_HIGH_PERFORMANCE\": \"1\"},\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n})\n```\n\n## Common Failure Modes\n\n### Out of Memory (OOM)\n\n**Fix:**\n1. Reduce batch size or data chunk size\n2. Process data in smaller batches\n3. Upgrade hardware: cpu → t4 → a10g → a100\n\n### Job Timeout\n\n**Fix:**\n1. Check logs for actual runtime\n2. Increase timeout with buffer: `\"timeout\": \"3h\"`\n3. Optimize code for faster execution\n4. Process data in chunks\n\n### Hub Push Failures\n\n**Fix:**\n1. Add token to secrets: MCP uses `\"$HF_TOKEN\"` (auto-replaced), Python API uses `get_token()` (must pass real token)\n2. Verify token in script: `assert \"HF_TOKEN\" in os.environ`\n3. 
Check token permissions\n4. Verify repo exists or can be created\n\n### Missing Dependencies\n\n**Fix:**\nAdd to PEP 723 header:\n```python\n# /// script\n# dependencies = [\"package1\", \"package2>=1.0.0\"]\n# ///\n```\n\n### Authentication Errors\n\n**Fix:**\n1. Check `hf_whoami()` works locally\n2. Verify token in secrets — MCP: `\"$HF_TOKEN\"`, Python API: `get_token()` (NOT `\"$HF_TOKEN\"`)\n3. Re-login: `hf auth login`\n4. Check token has required permissions\n\n## Troubleshooting\n\n**Common issues:**\n- Job times out → Increase timeout, optimize code\n- Results not saved → Check persistence method, verify HF_TOKEN\n- Out of Memory → Reduce batch size, upgrade hardware\n- Import errors → Add dependencies to PEP 723 header\n- Authentication errors → Check token, verify secrets parameter\n\n**See:** `references/troubleshooting.md` for complete troubleshooting guide\n\n## Resources\n\n### References (In This Skill)\n- `references/token_usage.md` - Complete token usage guide\n- `references/hardware_guide.md` - Hardware specs and selection\n- `references/hub_saving.md` - Hub persistence guide\n- `references/troubleshooting.md` - Common issues and solutions\n\n### Scripts (In This Skill)\n- `scripts/generate-responses.py` - vLLM batch generation: dataset → responses → push to Hub\n- `scripts/cot-self-instruct.py` - CoT Self-Instruct synthetic data generation + filtering → push to Hub\n- `scripts/finepdfs-stats.py` - Polars streaming stats over `finepdfs-edu` parquet on Hub (optional push)\n\n### External Links\n\n**Official Documentation:**\n- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs) - Main documentation\n- [HF Jobs CLI Reference](https://huggingface.co/docs/huggingface_hub/guides/cli#hf-jobs) - Command line interface\n- [HF Jobs API Reference](https://huggingface.co/docs/huggingface_hub/package_reference/hf_api) - Python API details\n- [Hardware Flavors Reference](https://huggingface.co/docs/hub/en/spaces-config-reference) - 
Available hardware\n\n**Related Tools:**\n- [UV Scripts Guide](https://docs.astral.sh/uv/guides/scripts/) - PEP 723 inline dependencies\n- [UV Scripts Organization](https://huggingface.co/uv-scripts) - Community UV script collection\n- [HF Hub Authentication](https://huggingface.co/docs/huggingface_hub/quick-start#authentication) - Token setup\n- [Webhooks Documentation](https://huggingface.co/docs/huggingface_hub/guides/webhooks) - Event triggers\n\n## Key Takeaways\n\n1. **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless user requests\n2. **Jobs are asynchronous** - Don't wait/poll; let user check when ready\n3. **Always set timeout** - Default 30 min may be insufficient; set appropriate timeout\n4. **Always persist results** - Environment is ephemeral; without persistence, all work is lost\n5. **Use tokens securely** - MCP: `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`, Python API: `secrets={\"HF_TOKEN\": get_token()}` — `\"$HF_TOKEN\"` only works with MCP tool\n6. **Choose appropriate hardware** - Start small, scale up based on needs (see hardware guide)\n7. **Use UV scripts** - Default to `hf_jobs(\"uv\", {...})` with inline scripts for Python workloads\n8. **Handle authentication** - Verify tokens are available before Hub operations\n9. **Monitor jobs** - Provide job URLs and status check commands\n10. 
**Optimize costs** - Choose right hardware, set appropriate timeouts\n\n## Quick Reference: MCP Tool vs CLI vs Python API\n\n| Operation | MCP Tool | CLI | Python API |\n|-----------|----------|-----|------------|\n| Run UV script | `hf_jobs(\"uv\", {...})` | `hf jobs uv run script.py` | `run_uv_job(\"script.py\")` |\n| Run Docker job | `hf_jobs(\"run\", {...})` | `hf jobs run image cmd` | `run_job(image, command)` |\n| List jobs | `hf_jobs(\"ps\")` | `hf jobs ps` | `list_jobs()` |\n| View logs | `hf_jobs(\"logs\", {...})` | `hf jobs logs <id>` | `fetch_job_logs(job_id)` |\n| Cancel job | `hf_jobs(\"cancel\", {...})` | `hf jobs cancel <id>` | `cancel_job(job_id)` |\n| Schedule UV | `hf_jobs(\"scheduled uv\", {...})` | - | `create_scheduled_uv_job()` |\n| Schedule Docker | `hf_jobs(\"scheduled run\", {...})` | - | `create_scheduled_job()` |\n\n"
  },
  {
    "path": "skills/hugging-face-jobs/index.html",
    "content": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n    <title>hf-jobs - Run Workloads on Hugging Face Jobs</title>\n    <style>\n        * {\n            margin: 0;\n            padding: 0;\n            box-sizing: border-box;\n        }\n        \n        body {\n            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;\n            line-height: 1.6;\n            color: #333;\n            background: #f5f5f5;\n            padding: 20px;\n        }\n        \n        .container {\n            max-width: 1200px;\n            margin: 0 auto;\n            background: white;\n            padding: 40px;\n            border-radius: 8px;\n            box-shadow: 0 2px 4px rgba(0,0,0,0.1);\n        }\n        \n        h1 {\n            color: #ffd21e;\n            background: #000;\n            padding: 20px;\n            margin: -40px -40px 30px -40px;\n            border-radius: 8px 8px 0 0;\n        }\n        \n        h2 {\n            color: #1e1e1e;\n            margin-top: 30px;\n            margin-bottom: 15px;\n            padding-bottom: 10px;\n            border-bottom: 2px solid #ffd21e;\n        }\n        \n        h3 {\n            color: #555;\n            margin-top: 20px;\n            margin-bottom: 10px;\n        }\n        \n        .description {\n            background: #f9f9f9;\n            padding: 20px;\n            border-left: 4px solid #ffd21e;\n            margin-bottom: 30px;\n            border-radius: 4px;\n        }\n        \n        .file-list {\n            list-style: none;\n            padding: 0;\n        }\n        \n        .file-list li {\n            padding: 12px;\n            margin: 8px 0;\n            background: #f9f9f9;\n            border-radius: 4px;\n            border-left: 3px solid #ffd21e;\n            transition: background 0.2s;\n        
}\n        \n        .file-list li:hover {\n            background: #f0f0f0;\n        }\n        \n        .file-list a {\n            color: #0066cc;\n            text-decoration: none;\n            font-weight: 500;\n            display: block;\n        }\n        \n        .file-list a:hover {\n            text-decoration: underline;\n        }\n        \n        .file-path {\n            color: #666;\n            font-size: 0.9em;\n            font-family: 'Monaco', 'Courier New', monospace;\n            margin-top: 4px;\n        }\n        \n        .file-description {\n            color: #777;\n            font-size: 0.9em;\n            margin-top: 4px;\n            font-style: italic;\n        }\n        \n        .metadata {\n            background: #f0f0f0;\n            padding: 15px;\n            border-radius: 4px;\n            margin-bottom: 30px;\n        }\n        \n        .metadata p {\n            margin: 5px 0;\n        }\n        \n        .metadata strong {\n            color: #333;\n        }\n        \n        .section {\n            margin-bottom: 40px;\n        }\n        \n        code {\n            background: #f4f4f4;\n            padding: 2px 6px;\n            border-radius: 3px;\n            font-family: 'Monaco', 'Courier New', monospace;\n            font-size: 0.9em;\n        }\n    </style>\n</head>\n<body>\n    <div class=\"container\">\n        <h1>Agent Skill : hf-jobs</h1>\n        \n        <div class=\"description\">\n            <p><strong>Run any workload on Hugging Face Jobs.</strong></p>\n            <p>Use this skill when you want to run GPU/CPU workloads (batch inference, synthetic data generation, dataset stats, experiments) on Hugging Face Jobs, with correct token handling and result persistence back to the Hub.</p>\n        </div>\n        \n        <div class=\"metadata\">\n            <p><strong>Skill Name:</strong> hf-jobs</p>\n            <p><strong>Main Documentation:</strong> <a 
href=\"hf-jobs/SKILL.md\">hf-jobs/SKILL.md</a></p>\n            <p><strong>Scripts Directory:</strong> <code>hf-jobs/scripts/</code></p>\n            <p><strong>References Directory:</strong> <code>hf-jobs/references/</code></p>\n        </div>\n        \n        <div class=\"section\">\n            <h2>Overview</h2>\n            <p>This skill focuses on running real workloads via Hugging Face Jobs. It includes ready-to-run UV scripts and guides for authentication (HF tokens), secrets vs env vars, timeouts, hardware selection, and pushing results to the Hub.</p>\n        </div>\n\n        <div class=\"section\">\n            <h2>Core Documentation</h2>\n            <ul class=\"file-list\">\n                <li>\n                    <a href=\"hf-jobs/SKILL.md\">SKILL.md</a>\n                    <div class=\"file-path\">hf-jobs/SKILL.md</div>\n                    <div class=\"file-description\">Complete skill documentation (how to submit jobs, tokens/secrets, timeouts, persistence, and how to use the bundled scripts)</div>\n                </li>\n            </ul>\n        </div>\n\n        <div class=\"section\">\n            <h2>References</h2>\n            <ul class=\"file-list\">\n                <li>\n                    <a href=\"hf-jobs/references/token_usage.md\">token_usage.md</a>\n                    <div class=\"file-path\">hf-jobs/references/token_usage.md</div>\n                    <div class=\"file-description\">Token best practices: secrets vs env, permissions, common errors (401/403), and secure patterns</div>\n                </li>\n                <li>\n                    <a href=\"hf-jobs/references/hub_saving.md\">hub_saving.md</a>\n                    <div class=\"file-path\">hf-jobs/references/hub_saving.md</div>\n                    <div class=\"file-description\">How to persist results: push datasets/models/files to the Hub (ephemeral job filesystem)</div>\n                </li>\n                <li>\n                    <a 
href=\"hf-jobs/references/hardware_guide.md\">hardware_guide.md</a>\n                    <div class=\"file-path\">hf-jobs/references/hardware_guide.md</div>\n                    <div class=\"file-description\">Flavor selection guidance for CPU/GPU/TPU workloads</div>\n                </li>\n                <li>\n                    <a href=\"hf-jobs/references/troubleshooting.md\">troubleshooting.md</a>\n                    <div class=\"file-path\">hf-jobs/references/troubleshooting.md</div>\n                    <div class=\"file-description\">Common failure modes (timeouts, missing deps, OOM, auth) and fixes</div>\n                </li>\n            </ul>\n        </div>\n\n        <div class=\"section\">\n            <h2>Scripts</h2>\n            <ul class=\"file-list\">\n                <li>\n                    <a href=\"hf-jobs/scripts/generate-responses.py\">generate-responses.py</a>\n                    <div class=\"file-path\">hf-jobs/scripts/generate-responses.py</div>\n                    <div class=\"file-description\">vLLM batch generation: load prompts/messages from a dataset, generate responses, push dataset + card to Hub</div>\n                </li>\n                <li>\n                    <a href=\"hf-jobs/scripts/cot-self-instruct.py\">cot-self-instruct.py</a>\n                    <div class=\"file-path\">hf-jobs/scripts/cot-self-instruct.py</div>\n                    <div class=\"file-description\">CoT Self-Instruct synthetic data generation (reasoning/instruction) + optional filtering, pushes dataset + card</div>\n                </li>\n                <li>\n                    <a href=\"hf-jobs/scripts/finepdfs-stats.py\">finepdfs-stats.py</a>\n                    <div class=\"file-path\">hf-jobs/scripts/finepdfs-stats.py</div>\n                    <div class=\"file-description\">Polars streaming stats over Hub parquet (finepdfs-edu); optional upload of computed stats to a dataset repo</div>\n                </li>\n            </ul>\n        
</div>\n    </div>\n</body>\n</html>\n\n\n\n\n"
  },
  {
    "path": "skills/hugging-face-jobs/references/hardware_guide.md",
    "content": "# Hardware Selection Guide\n\nChoosing the right hardware (flavor) is critical for cost-effective workloads.\n\n> **Reference:** [HF Jobs Hardware Documentation](https://huggingface.co/docs/hub/en/spaces-config-reference) (updated 07/2025)\n\n## Available Hardware\n\n### CPU Flavors\n| Flavor | Description | Use Case |\n|--------|-------------|----------|\n| `cpu-basic` | Basic CPU instance | Testing, lightweight scripts |\n| `cpu-upgrade` | Enhanced CPU instance | Data processing, parallel workloads |\n\n**Use cases:** Data processing, testing scripts, lightweight workloads\n**Not recommended for:** Model training, GPU-accelerated workloads\n\n### GPU Flavors\n\n| Flavor | GPU | VRAM | Use Case |\n|--------|-----|------|----------|\n| `t4-small` | NVIDIA T4 | 16GB | <1B models, demos, quick tests |\n| `t4-medium` | NVIDIA T4 | 16GB | 1-3B models, development |\n| `l4x1` | NVIDIA L4 | 24GB | 3-7B models, efficient workloads |\n| `l4x4` | 4x NVIDIA L4 | 96GB | Multi-GPU, parallel workloads |\n| `a10g-small` | NVIDIA A10G | 24GB | 3-7B models, production |\n| `a10g-large` | NVIDIA A10G | 24GB | 7-13B models, batch inference |\n| `a10g-largex2` | 2x NVIDIA A10G | 48GB | Multi-GPU, large models |\n| `a10g-largex4` | 4x NVIDIA A10G | 96GB | Multi-GPU, very large models |\n| `a100-large` | NVIDIA A100 | 40GB | 13B+ models, fastest GPU option |\n\n### TPU Flavors\n\n| Flavor | Configuration | Use Case |\n|--------|---------------|----------|\n| `v5e-1x1` | TPU v5e (1x1) | Small TPU workloads |\n| `v5e-2x2` | TPU v5e (2x2) | Medium TPU workloads |\n| `v5e-2x4` | TPU v5e (2x4) | Large TPU workloads |\n\n**TPU Use Cases:**\n- JAX/Flax model training\n- Large-scale inference\n- TPU-optimized workloads\n\n## Selection Guidelines\n\n### By Workload Type\n\n**Data Processing**\n- **Recommended:** `cpu-upgrade` or `l4x1`\n- **Use case:** Transform, filter, analyze datasets\n- **Batch size:** Depends on data size\n- **Time:** Varies by dataset size\n\n**Batch 
Inference**\n- **Recommended:** `a10g-large` or `a100-large`\n- **Use case:** Run inference on thousands of samples\n- **Batch size:** 8-32 depending on model\n- **Time:** Depends on number of samples\n\n**Experiments & Benchmarks**\n- **Recommended:** `a10g-small` or `a10g-large`\n- **Use case:** Reproducible ML experiments\n- **Batch size:** Varies\n- **Time:** Depends on experiment complexity\n\n**Model Training** (see `model-trainer` skill for details)\n- **Recommended:** See model-trainer skill\n- **Use case:** Fine-tuning models\n- **Batch size:** Depends on model size\n- **Time:** Hours to days\n\n**Synthetic Data Generation**\n- **Recommended:** `a10g-large` or `a100-large`\n- **Use case:** Generate datasets using LLMs\n- **Batch size:** Depends on generation method\n- **Time:** Hours for large datasets\n\n### By Budget\n\n**Minimal Budget (<$5 total)**\n- Use `cpu-basic` or `t4-small`\n- Process small datasets\n- Quick tests and demos\n\n**Small Budget ($5-20)**\n- Use `t4-medium` or `a10g-small`\n- Process medium datasets\n- Run experiments\n\n**Medium Budget ($20-50)**\n- Use `a10g-small` or `a10g-large`\n- Process large datasets\n- Production workloads\n\n**Large Budget ($50-200)**\n- Use `a10g-large` or `a100-large`\n- Large-scale processing\n- Multiple experiments\n\n### By Model Size (for inference/processing)\n\n**Tiny Models (<1B parameters)**\n- **Recommended:** `t4-small`\n- **Example:** Qwen2.5-0.5B, TinyLlama\n- **Batch size:** 8-16\n\n**Small Models (1-3B parameters)**\n- **Recommended:** `t4-medium` or `a10g-small`\n- **Example:** Qwen2.5-1.5B, Phi-2\n- **Batch size:** 4-8\n\n**Medium Models (3-7B parameters)**\n- **Recommended:** `a10g-small` or `a10g-large`\n- **Example:** Qwen2.5-7B, Mistral-7B\n- **Batch size:** 2-4\n\n**Large Models (7-13B parameters)**\n- **Recommended:** `a10g-large` or `a100-large`\n- **Example:** Llama-3-8B\n- **Batch size:** 1-2\n\n**Very Large Models (13B+ parameters)**\n- **Recommended:** `a100-large`\n- 
**Example:** Llama-2-13B, Llama-3-70B (70B needs quantization or multiple GPUs)\n- **Batch size:** 1\n\n## Memory Considerations\n\n### Estimating Memory Requirements\n\n**For inference:**\n```\nMemory (GB) ≈ (Model params in billions) × 2-4\n```\n\n**For training:**\n```\nMemory (GB) ≈ (Model params in billions) × 20 (full) or × 4 (LoRA)\n```\n\n**Examples:**\n- Qwen2.5-0.5B inference: ~1-2GB ✅ fits t4-small\n- Qwen2.5-7B inference: ~14-28GB ✅ fits a10g-large (24GB) at the low end of the range\n- Qwen2.5-7B training: ~140GB for full fine-tuning ❌ exceeds every single-GPU flavor; ~28GB with LoRA\n\n### Memory Optimization\n\nIf hitting memory limits:\n\n1. **Reduce batch size**\n   ```python\n   batch_size = 1\n   ```\n\n2. **Process in chunks**\n   ```python\n   for chunk in chunks:\n       process(chunk)\n   ```\n\n3. **Use smaller models**\n   - Use quantized models\n   - Use LoRA adapters\n\n4. **Upgrade hardware**\n   - cpu → t4 → a10g → a100\n\n## Cost Estimation\n\n### Formula\n\n```\nTotal Cost = (Hours of runtime) × (Cost per hour)\n```\n\n### Example Calculations\n\n**Data processing:**\n- Hardware: cpu-upgrade ($0.50/hour)\n- Time: 1 hour\n- Cost: $0.50\n\n**Batch inference:**\n- Hardware: a10g-large ($5/hour)\n- Time: 2 hours\n- Cost: $10.00\n\n**Experiments:**\n- Hardware: a10g-small ($3.50/hour)\n- Time: 4 hours\n- Cost: $14.00\n\n### Cost Optimization Tips\n\n1. **Start small:** Test on cpu-basic or t4-small\n2. **Monitor runtime:** Set appropriate timeouts\n3. **Optimize code:** Reduce unnecessary compute\n4. **Choose right hardware:** Don't over-provision\n5. **Use checkpoints:** Resume if job fails\n6. 
**Monitor costs:** Check running jobs regularly\n\n## Multi-GPU Workloads\n\nMulti-GPU flavors provide several GPUs in one job. The workload is only spread across them if the script or framework is set up for it (for example, tensor parallelism in the inference engine).\n\n**Multi-GPU flavors:**\n- `l4x4` - 4x L4 GPUs (96GB total VRAM)\n- `a10g-largex2` - 2x A10G GPUs (48GB total VRAM)\n- `a10g-largex4` - 4x A10G GPUs (96GB total VRAM)\n\n**When to use:**\n- Large models (>13B parameters)\n- Need faster processing (speedup depends on how well the workload parallelizes)\n- Large datasets (>100K samples)\n- Parallel workloads\n- Tensor parallelism for inference\n\n**MCP Tool Example:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"process.py\",\n    \"flavor\": \"a10g-largex2\",  # 2 GPUs\n    \"timeout\": \"4h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n**CLI Equivalent:**\n```bash\n# Options go before the script; anything after it is passed to the script itself\nhf jobs uv run --flavor a10g-largex2 --timeout 4h --secrets HF_TOKEN process.py\n```\n\n## Choosing Between Options\n\n### CPU vs GPU\n\n**Choose CPU when:**\n- No GPU acceleration needed\n- Data processing only\n- Budget constrained\n- Simple workloads\n\n**Choose GPU when:**\n- Model inference/training\n- GPU-accelerated libraries\n- Need faster processing\n- Large models\n\n### a10g vs a100\n\n**Choose a10g when:**\n- Model <13B parameters\n- Budget conscious\n- Processing time not critical\n\n**Choose a100 when:**\n- Model 13B+ parameters\n- Need fastest processing\n- Memory requirements high\n- Budget allows\n\n### Single vs Multi-GPU\n\n**Choose single GPU when:**\n- Model <7B parameters\n- Budget constrained\n- Simpler debugging\n\n**Choose multi-GPU when:**\n- Model >13B parameters\n- Need faster processing\n- Large batch sizes required\n- Cost-effective for large jobs\n\n## Quick Reference\n\n### All Available Flavors\n\n```python\n# Official flavor list (updated 07/2025)\nFLAVORS = {\n    # CPU\n    \"cpu-basic\",      # Testing, lightweight\n    \"cpu-upgrade\",    # Data processing\n    \n    # GPU - Single\n    \"t4-small\",       # 16GB - <1B models\n    \"t4-medium\",      # 16GB - 1-3B models\n    \"l4x1\",           # 24GB - 3-7B models\n    
\"a10g-small\",     # 24GB - 3-7B production\n    \"a10g-large\",     # 24GB - 7-13B models\n    \"a100-large\",     # 40GB - 13B+ models\n    \n    # GPU - Multi\n    \"l4x4\",           # 4x L4 (96GB total)\n    \"a10g-largex2\",   # 2x A10G (48GB total)\n    \"a10g-largex4\",   # 4x A10G (96GB total)\n    \n    # TPU\n    \"v5e-1x1\",        # TPU v5e 1x1\n    \"v5e-2x2\",        # TPU v5e 2x2\n    \"v5e-2x4\",        # TPU v5e 2x4\n}\n```\n\n### Workload → Hardware Mapping\n\n```python\nHARDWARE_MAP = {\n    \"data_processing\": \"cpu-upgrade\",\n    \"batch_inference_small\": \"t4-small\",\n    \"batch_inference_medium\": \"a10g-large\",\n    \"batch_inference_large\": \"a100-large\",\n    \"experiments\": \"a10g-small\",\n    \"tpu_workloads\": \"v5e-1x1\",\n    \"training\": \"see model-trainer skill\"\n}\n```\n\n### CLI Examples\n\n```bash\n# CPU job\nhf jobs run python:3.12 python script.py\n\n# GPU job\nhf jobs run --flavor a10g-large pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel python script.py\n\n# TPU job\nhf jobs run --flavor v5e-1x1 your-tpu-image python script.py\n\n# UV script with GPU\nhf jobs uv run --flavor a10g-small my_script.py\n```\n\n"
  },
  {
    "path": "skills/hugging-face-jobs/references/hub_saving.md",
    "content": "# Saving Results to Hugging Face Hub\n\n**⚠️ CRITICAL:** Job environments are ephemeral. ALL results are lost when a job completes unless persisted to the Hub or external storage.\n\n## Why Persistence is Required\n\nWhen running on Hugging Face Jobs:\n- Environment is temporary\n- All files deleted on job completion\n- No local disk persistence\n- Cannot access results after job ends\n\n**Without persistence, all work is permanently lost.**\n\n## Persistence Options\n\n### Option 1: Push to Hugging Face Hub (Recommended)\n\n**For models:**\n```python\nfrom transformers import AutoModel\nmodel.push_to_hub(\"username/model-name\", token=os.environ.get(\"HF_TOKEN\"))\n```\n\n**For datasets:**\n```python\nfrom datasets import Dataset\ndataset.push_to_hub(\"username/dataset-name\", token=os.environ.get(\"HF_TOKEN\"))\n```\n\n**For files/artifacts:**\n```python\nfrom huggingface_hub import HfApi\napi = HfApi(token=os.environ.get(\"HF_TOKEN\"))\napi.upload_file(\n    path_or_fileobj=\"results.json\",\n    path_in_repo=\"results.json\",\n    repo_id=\"username/results\",\n    repo_type=\"dataset\"\n)\n```\n\n### Option 2: External Storage\n\n**S3:**\n```python\nimport boto3\ns3 = boto3.client('s3')\ns3.upload_file('results.json', 'my-bucket', 'results.json')\n```\n\n**Google Cloud Storage:**\n```python\nfrom google.cloud import storage\nclient = storage.Client()\nbucket = client.bucket('my-bucket')\nblob = bucket.blob('results.json')\nblob.upload_from_filename('results.json')\n```\n\n### Option 3: API Endpoint\n\n```python\nimport requests\nrequests.post(\"https://your-api.com/results\", json=results)\n```\n\n## Required Configuration for Hub Push\n\n### Job Configuration\n\n**Always include HF_TOKEN:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Required for Hub operations\n})\n```\n\n### Script Configuration\n\n**Verify token exists:**\n```python\nimport os\nassert \"HF_TOKEN\" 
in os.environ, \"HF_TOKEN required for Hub operations!\"\n```\n\n**Use token for Hub operations:**\n```python\nimport os\n\nfrom huggingface_hub import HfApi\n\n# Auto-detects HF_TOKEN from environment\napi = HfApi()\n\n# Or explicitly pass token\napi = HfApi(token=os.environ.get(\"HF_TOKEN\"))\n```\n\n## Complete Examples\n\n### Example 1: Push Dataset\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"datasets\", \"huggingface-hub\"]\n# ///\n\nimport os\nfrom datasets import Dataset\n\n# Verify token\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required!\"\n\n# Process data\ndata = {\"text\": [\"Sample 1\", \"Sample 2\"]}\ndataset = Dataset.from_dict(data)\n\n# Push to Hub (token is read from the HF_TOKEN environment variable)\ndataset.push_to_hub(\"username/my-dataset\")\nprint(\"✅ Dataset pushed!\")\n\"\"\",\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n### Example 2: Push Model\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"transformers\", \"torch\"]\n# ///\n\nimport os\nfrom transformers import AutoModel, AutoTokenizer\n\n# Verify token\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required!\"\n\n# Load and process model\nmodel = AutoModel.from_pretrained(\"base-model\")\ntokenizer = AutoTokenizer.from_pretrained(\"base-model\")\n# ... 
process model ...\n\n# Push to Hub\nmodel.push_to_hub(\"username/my-model\")\ntokenizer.push_to_hub(\"username/my-model\")\nprint(\"✅ Model pushed!\")\n\"\"\",\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"2h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n### Example 3: Push Artifacts\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"huggingface-hub\", \"pandas\"]\n# ///\n\nimport os\nimport json\nimport pandas as pd\nfrom huggingface_hub import HfApi\n\n# Verify token\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required!\"\n\n# Generate results\nresults = {\"accuracy\": 0.95, \"loss\": 0.05}\ndf = pd.DataFrame([results])\n\n# Save files\nwith open(\"results.json\", \"w\") as f:\n    json.dump(results, f)\ndf.to_csv(\"results.csv\", index=False)\n\n# Push to Hub\napi = HfApi()\napi.upload_file(\"results.json\", \"results.json\", \"username/results\", repo_type=\"dataset\")\napi.upload_file(\"results.csv\", \"results.csv\", \"username/results\", repo_type=\"dataset\")\nprint(\"✅ Results pushed!\")\n\"\"\",\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n## Authentication Methods\n\n### Method 1: Automatic Token (Recommended)\n\n```python\n\"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n```\n\nUses your logged-in Hugging Face token automatically.\n\n### Method 2: Explicit Token\n\n```python\n\"secrets\": {\"HF_TOKEN\": \"hf_abc123...\"}\n```\n\nProvide token explicitly (not recommended for security).\n\n### Method 3: Environment Variable\n\n```python\n\"env\": {\"HF_TOKEN\": \"hf_abc123...\"}\n```\n\nPass as regular environment variable (less secure than secrets).\n\n**Always prefer Method 1** for security and convenience.\n\n## Verification Checklist\n\nBefore submitting any job that saves to Hub, verify:\n\n- [ ] `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` in job config\n- [ ] Script checks for token: `assert \"HF_TOKEN\" in 
os.environ`\n- [ ] Hub push code included in script\n- [ ] Repository name doesn't conflict with existing repos\n- [ ] You have write access to the target namespace\n\n## Repository Setup\n\n### Automatic Creation\n\nIf repository doesn't exist, it's created automatically when first pushing (if token has write permissions).\n\n### Manual Creation\n\nCreate repository before pushing:\n\n```python\nfrom huggingface_hub import HfApi\n\napi = HfApi()\napi.create_repo(\n    repo_id=\"username/repo-name\",\n    repo_type=\"model\",  # or \"dataset\"\n    private=False,  # or True for private repo\n)\n```\n\n### Repository Naming\n\n**Valid names:**\n- `username/my-model`\n- `username/model-name`\n- `organization/model-name`\n\n**Invalid names:**\n- `model-name` (missing username)\n- `username/model name` (spaces not allowed)\n- `username/MODEL` (uppercase discouraged)\n\n## Troubleshooting\n\n### Error: 401 Unauthorized\n\n**Cause:** HF_TOKEN not provided or invalid\n\n**Solutions:**\n1. Verify `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` in job config\n2. Check you're logged in: `hf_whoami()`\n3. Re-login: `hf auth login`\n\n### Error: 403 Forbidden\n\n**Cause:** No write access to repository\n\n**Solutions:**\n1. Check repository namespace matches your username\n2. Verify you're a member of organization (if using org namespace)\n3. Check token has write permissions\n\n### Error: Repository not found\n\n**Cause:** Repository doesn't exist and auto-creation failed\n\n**Solutions:**\n1. Manually create repository first\n2. Check repository name format\n3. Verify namespace exists\n\n### Error: Push failed\n\n**Cause:** Network issues or Hub unavailable\n\n**Solutions:**\n1. Check logs for specific error\n2. Verify token is valid\n3. Retry push operation\n\n## Best Practices\n\n1. **Always verify token exists** before Hub operations\n2. **Use descriptive repo names** (e.g., `my-experiment-results` not `results`)\n3. **Push incrementally** for large results (use checkpoints)\n4. 
**Verify push success** in logs before job completes\n5. **Use appropriate repo types** (model vs dataset)\n6. **Add README** with result descriptions\n7. **Tag repos** with relevant tags\n\n## Monitoring Push Progress\n\nCheck logs for push progress:\n\n**MCP Tool:**\n```python\nhf_jobs(\"logs\", {\"job_id\": \"your-job-id\"})\n```\n\n**CLI:**\n```bash\nhf jobs logs <job-id>\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import fetch_job_logs\nfor log in fetch_job_logs(job_id=\"your-job-id\"):\n    print(log)\n```\n\n**Look for:**\n```\nPushing to username/repo-name...\nUpload file results.json: 100%\n✅ Push successful\n```\n\n## Key Takeaway\n\n**Without `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` and persistence code, all results are permanently lost.**\n\nAlways verify both are configured before submitting any job that produces results.\n\n"
  },
  {
    "path": "skills/hugging-face-jobs/references/token_usage.md",
    "content": "# Token Usage Guide for Hugging Face Jobs\n\n**⚠️ CRITICAL:** Proper token usage is essential for any job that interacts with the Hugging Face Hub.\n\n## Overview\n\nHugging Face tokens are authentication credentials that allow your jobs to interact with the Hub. They're required for:\n- Pushing models/datasets to Hub\n- Accessing private repositories\n- Creating new repositories\n- Using Hub APIs programmatically\n- Any authenticated Hub operations\n\n## Token Types\n\n### Read Token\n- **Permissions:** Download models/datasets, read private repos\n- **Use case:** Jobs that only need to download/read content\n- **Creation:** https://huggingface.co/settings/tokens\n\n### Write Token\n- **Permissions:** Push models/datasets, create repos, modify content\n- **Use case:** Jobs that need to upload results (most common)\n- **Creation:** https://huggingface.co/settings/tokens\n- **⚠️ Required for:** Pushing models, datasets, or any uploads\n\n### Organization Token\n- **Permissions:** Act on behalf of an organization\n- **Use case:** Jobs running under organization namespace\n- **Creation:** Organization settings → Tokens\n\n## Providing Tokens to Jobs\n\n### Method 1: `hf_jobs` MCP tool with `$HF_TOKEN` (Recommended) ⭐\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Automatic replacement\n})\n```\n\n**How it works:**\n1. `$HF_TOKEN` is a placeholder that gets replaced with your actual token\n2. Uses the token from your logged-in session (`hf auth login`)\n3. Token is encrypted server-side when passed as a secret\n4. 
Most secure and convenient method\n\n**Benefits:**\n- ✅ No token exposure in code\n- ✅ Uses your current login session\n- ✅ Automatically updated if you re-login\n- ✅ Works seamlessly with MCP tools\n- ✅ Token encrypted server-side\n\n**Requirements:**\n- Must be logged in: `hf auth login` or `hf_whoami()` works\n- Token must have required permissions\n\n**⚠️ CRITICAL:** `$HF_TOKEN` auto-replacement is an `hf_jobs` MCP tool feature ONLY. It does NOT work with `HfApi().run_uv_job()` — see Method 1b below.\n\n### Method 1b: `HfApi().run_uv_job()` with `get_token()` (Required for Python API)\n\n```python\nfrom huggingface_hub import HfApi, get_token\napi = HfApi()\napi.run_uv_job(\n    script=\"your_script.py\",\n    secrets={\"HF_TOKEN\": get_token()},  # ✅ Passes actual token value\n)\n```\n\n**How it works:**\n1. `get_token()` retrieves the token from your logged-in session\n2. The actual token value is passed to the `secrets` parameter\n3. Token is encrypted server-side\n\n**Why `\"$HF_TOKEN\"` fails with `HfApi().run_uv_job()`:**\n- The Python API passes the literal string `\"$HF_TOKEN\"` (9 characters) as the token\n- The Jobs server receives this invalid string instead of a real token\n- Result: `401 Unauthorized` errors when the script tries to authenticate\n- You MUST use `get_token()` from `huggingface_hub` to get the real token\n\n### Method 2: Explicit Token (Not Recommended)\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"secrets\": {\"HF_TOKEN\": \"hf_abc123...\"}  # ⚠️ Hardcoded token\n})\n```\n\n**When to use:**\n- Only if automatic token doesn't work\n- Testing with a specific token\n- Organization tokens (use with caution)\n\n**Security concerns:**\n- ❌ Token visible in code/logs\n- ❌ Must manually update if token rotates\n- ❌ Risk of token exposure\n- ❌ Not recommended for production\n\n### Method 3: Environment Variable (Less Secure)\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"your_script.py\",\n    \"env\": 
{\"HF_TOKEN\": \"hf_abc123...\"}  # ⚠️ Less secure than secrets\n})\n```\n\n**Difference from secrets:**\n- `env` variables are visible in job logs\n- `secrets` are encrypted server-side\n- Always prefer `secrets` for tokens\n\n**When to use:**\n- Only for non-sensitive configuration\n- Never use for tokens (use `secrets` instead)\n\n## Using Tokens in Scripts\n\n### Accessing Tokens\n\nTokens passed via `secrets` are available as environment variables in your script:\n\n```python\nimport os\n\n# Get token from environment\ntoken = os.environ.get(\"HF_TOKEN\")\n\n# Verify token exists\nif not token:\n    raise ValueError(\"HF_TOKEN not found in environment!\")\n```\n\n### Using with Hugging Face Hub\n\n**Option 1: Explicit token parameter**\n```python\nfrom huggingface_hub import HfApi\n\napi = HfApi(token=os.environ.get(\"HF_TOKEN\"))\napi.upload_file(...)\n```\n\n**Option 2: Auto-detection (Recommended)**\n```python\nfrom huggingface_hub import HfApi\n\n# Automatically uses HF_TOKEN env var\napi = HfApi()  # ✅ Simpler, uses token from environment\napi.upload_file(...)\n```\n\n**Option 3: With transformers/datasets**\n```python\nfrom transformers import AutoModel\nfrom datasets import load_dataset\n\n# Auto-detects HF_TOKEN from environment\nmodel = AutoModel.from_pretrained(\"username/model\")\ndataset = load_dataset(\"username/dataset\")\n\n# For push operations, token is auto-detected\nmodel.push_to_hub(\"username/new-model\")\ndataset.push_to_hub(\"username/new-dataset\")\n```\n\n### Complete Example\n\n```python\n# /// script\n# dependencies = [\"huggingface-hub\", \"datasets\"]\n# ///\n\nimport os\nfrom huggingface_hub import HfApi\nfrom datasets import Dataset\n\n# Verify token is available\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required for Hub operations!\"\n\n# Use token for Hub operations\napi = HfApi()  # Auto-detects HF_TOKEN\n\n# Create and push dataset\ndata = {\"text\": [\"Hello\", \"World\"]}\ndataset = Dataset.from_dict(data)\n\n# Push to 
Hub (token auto-detected)\ndataset.push_to_hub(\"username/my-dataset\")\n\nprint(\"✅ Dataset pushed successfully!\")\n```\n\n## Token Verification\n\n### Check Authentication Locally\n\n```python\nfrom huggingface_hub import whoami\n\ntry:\n    user_info = whoami()\n    print(f\"✅ Logged in as: {user_info['name']}\")\nexcept Exception as e:\n    print(f\"❌ Not authenticated: {e}\")\n```\n\n### Verify Token in Job\n\n```python\nimport os\n\nfrom huggingface_hub import whoami\n\n# Check token exists\nif \"HF_TOKEN\" not in os.environ:\n    raise ValueError(\"HF_TOKEN not found in environment!\")\n\ntoken = os.environ[\"HF_TOKEN\"]\n\n# Verify token format (should start with \"hf_\")\n# Never echo any part of the token: job logs may be visible to others\nif not token.startswith(\"hf_\"):\n    raise ValueError(\"Invalid token format: expected a value starting with 'hf_'\")\n\n# Test token works\ntry:\n    user_info = whoami(token=token)\n    print(f\"✅ Token valid for user: {user_info['name']}\")\nexcept Exception as e:\n    raise ValueError(f\"Token validation failed: {e}\") from e\n```\n\n## Common Token Issues\n\n### Error: 401 Unauthorized\n\n**Symptoms:**\n```\n401 Client Error: Unauthorized for url: https://huggingface.co/api/...\n```\n\n**Causes:**\n1. Token missing from job\n2. Token invalid or expired\n3. Token not passed correctly\n\n**Solutions:**\n1. Add `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` to job config\n2. Verify `hf_whoami()` works locally\n3. Re-login: `hf auth login`\n4. Check token hasn't expired\n\n**Verification:**\n```python\n# In your script\nimport os\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN missing!\"\n```\n\n### Error: 403 Forbidden\n\n**Symptoms:**\n```\n403 Client Error: Forbidden for url: https://huggingface.co/api/...\n```\n\n**Causes:**\n1. Token lacks required permissions (read-only token used for write)\n2. No access to private repository\n3. Organization permissions insufficient\n\n**Solutions:**\n1. Ensure token has write permissions\n2. Check token type at https://huggingface.co/settings/tokens\n3. 
Verify access to target repository\n4. Use organization token if needed\n\n**Check token permissions:**\n```python\nfrom huggingface_hub import whoami\n\nuser_info = whoami()\nprint(f\"User: {user_info['name']}\")\nprint(f\"Type: {user_info.get('type', 'user')}\")\n```\n\n### Error: Token not found in environment\n\n**Symptoms:**\n```\nKeyError: 'HF_TOKEN'\nValueError: HF_TOKEN not found\n```\n\n**Causes:**\n1. `secrets` not passed in job config\n2. Wrong key name (should be `HF_TOKEN`)\n3. Using `env` instead of `secrets`\n\n**Solutions:**\n1. Use `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` (not `env`)\n2. Verify key name is exactly `HF_TOKEN`\n3. Check job config syntax\n\n**Correct configuration:**\n```python\n# ✅ Correct\nhf_jobs(\"uv\", {\n    \"script\": \"...\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n\n# ❌ Wrong - using env instead of secrets\nhf_jobs(\"uv\", {\n    \"script\": \"...\",\n    \"env\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # Less secure\n})\n\n# ❌ Wrong - wrong key name\nhf_jobs(\"uv\", {\n    \"script\": \"...\",\n    \"secrets\": {\"TOKEN\": \"$HF_TOKEN\"}  # Wrong key\n})\n```\n\n### Error: Repository access denied\n\n**Symptoms:**\n```\n403 Client Error: Forbidden\nRepository not found or access denied\n```\n\n**Causes:**\n1. Token doesn't have access to private repo\n2. Repository doesn't exist and can't be created\n3. Wrong namespace\n\n**Solutions:**\n1. Use token from account with access\n2. Verify repo visibility (public vs private)\n3. Check namespace matches token owner\n4. Create repo first if needed\n\n**Check repository access:**\n```python\nfrom huggingface_hub import HfApi\n\napi = HfApi()\ntry:\n    repo_info = api.repo_info(\"username/repo-name\")\n    print(f\"✅ Access granted: {repo_info.id}\")\nexcept Exception as e:\n    print(f\"❌ Access denied: {e}\")\n```\n\n## Token Security Best Practices\n\n### 1. 
Never Commit Tokens\n\n**❌ Bad:**\n```python\n# Never do this!\ntoken = \"hf_abc123xyz...\"\napi = HfApi(token=token)\n```\n\n**✅ Good:**\n```python\nimport os\n\nfrom huggingface_hub import HfApi\n\n# Use environment variable\ntoken = os.environ.get(\"HF_TOKEN\")\napi = HfApi(token=token)\n```\n\n### 2. Use Secrets, Not Environment Variables\n\n**❌ Bad:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"...\",\n    \"env\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # Visible in logs\n})\n```\n\n**✅ Good:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"...\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # Encrypted server-side\n})\n```\n\n### 3. Use Automatic Token Replacement\n\n**❌ Bad:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"...\",\n    \"secrets\": {\"HF_TOKEN\": \"hf_abc123...\"}  # Hardcoded\n})\n```\n\n**✅ Good:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"...\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # Automatic\n})\n```\n\n### 4. Rotate Tokens Regularly\n\n- Generate new tokens periodically\n- Revoke old tokens\n- Update job configurations\n- Monitor token usage\n\n### 5. Use Minimal Permissions\n\n- Create tokens with only needed permissions\n- Use read tokens when write isn't needed\n- Don't use admin tokens for regular jobs\n\n### 6. Don't Share Tokens\n\n- Each user should use their own token\n- Don't commit tokens to repositories\n- Don't share tokens in logs or messages\n\n### 7. Monitor Token Usage\n\n- Check token activity in Hub settings\n- Review job logs for token issues\n- Set up alerts for unauthorized access\n\n## Token Workflow Examples\n\n### Example 1: Push Model to Hub\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"transformers\", \"torch\"]\n# ///\n\nimport os\nfrom transformers import AutoModel, AutoTokenizer\n\n# Verify token\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required!\"\n\n# Load and process model\nmodel = AutoModel.from_pretrained(\"base-model\")\n# ... 
process model ...\n\n# Push to Hub (token auto-detected)\nmodel.push_to_hub(\"username/my-model\")\nprint(\"✅ Model pushed!\")\n\"\"\",\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"2h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Token provided\n})\n```\n\n### Example 2: Access Private Dataset\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"datasets\"]\n# ///\n\nimport os\nfrom datasets import load_dataset\n\n# Verify token\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required!\"\n\n# Load private dataset (token auto-detected)\ndataset = load_dataset(\"private-org/private-dataset\")\nprint(f\"✅ Loaded {len(dataset)} examples\")\n\"\"\",\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Token provided\n})\n```\n\n### Example 3: Create and Push Dataset\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"datasets\", \"huggingface-hub\"]\n# ///\n\nimport os\nfrom datasets import Dataset\nfrom huggingface_hub import HfApi\n\n# Verify token\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required!\"\n\n# Create dataset\ndata = {\"text\": [\"Sample 1\", \"Sample 2\"]}\ndataset = Dataset.from_dict(data)\n\n# Push to Hub\napi = HfApi()  # Auto-detects HF_TOKEN\ndataset.push_to_hub(\"username/my-dataset\")\nprint(\"✅ Dataset pushed!\")\n\"\"\",\n    \"flavor\": \"cpu-basic\",\n    \"timeout\": \"30m\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Token provided\n})\n```\n\n## Quick Reference\n\n### Token Checklist\n\nBefore submitting a job that uses Hub:\n\n- [ ] Job includes `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`\n- [ ] Script checks for token: `assert \"HF_TOKEN\" in os.environ`\n- [ ] Token has required permissions (read/write)\n- [ ] User is logged in: `hf_whoami()` works\n- [ ] Token not hardcoded in script\n- [ ] Using `secrets` not `env` for token\n\n### Common Patterns\n\n**Pattern 1: Auto-detect 
token**\n```python\nfrom huggingface_hub import HfApi\napi = HfApi()  # Uses HF_TOKEN from environment\n```\n\n**Pattern 2: Explicit token**\n```python\nimport os\nfrom huggingface_hub import HfApi\napi = HfApi(token=os.environ.get(\"HF_TOKEN\"))\n```\n\n**Pattern 3: Verify token**\n```python\nimport os\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN required!\"\n```\n\n## Key Takeaways\n\n1. **Always use `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`** for Hub operations\n2. **Never hardcode tokens** in scripts or job configs\n3. **Verify token exists** in script before Hub operations\n4. **Use auto-detection** when possible (`HfApi()` without token parameter)\n5. **Check permissions** - ensure token has required access\n6. **Monitor token usage** - review activity regularly\n7. **Rotate tokens** - generate new tokens periodically\n\n"
  },
  {
    "path": "skills/hugging-face-jobs/references/troubleshooting.md",
    "content": "# Troubleshooting Guide\n\nCommon issues and solutions for Hugging Face Jobs.\n\n## Authentication Issues\n\n### Error: 401 Unauthorized\n\n**Symptoms:**\n```\n401 Client Error: Unauthorized for url: https://huggingface.co/api/...\n```\n\n**Causes:**\n- Token missing from job\n- Token invalid or expired\n- Token not passed correctly\n\n**Solutions:**\n1. Add token to secrets: `hf_jobs` MCP uses `\"$HF_TOKEN\"` (auto-replaced); `HfApi().run_uv_job()` MUST use `get_token()` from `huggingface_hub` (the literal string `\"$HF_TOKEN\"` will NOT work with the Python API)\n2. Verify `hf_whoami()` works locally\n3. Re-login: `hf auth login`\n4. Check token hasn't expired\n\n**Verification:**\n```python\n# In your script\nimport os\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN missing!\"\n```\n\n### Error: 403 Forbidden\n\n**Symptoms:**\n```\n403 Client Error: Forbidden for url: https://huggingface.co/api/...\n```\n\n**Causes:**\n- Token lacks required permissions\n- No access to private repository\n- Organization permissions insufficient\n\n**Solutions:**\n1. Ensure token has write permissions\n2. Check token type at https://huggingface.co/settings/tokens\n3. Verify access to target repository\n4. Use organization token if needed\n\n### Error: Token not found in environment\n\n**Symptoms:**\n```\nKeyError: 'HF_TOKEN'\nValueError: HF_TOKEN not found\n```\n\n**Causes:**\n- `secrets` not passed in job config\n- Wrong key name (should be `HF_TOKEN`)\n- Using `env` instead of `secrets`\n\n**Solutions:**\n1. Use `secrets` (not `env`) — with `hf_jobs` MCP: `\"$HF_TOKEN\"`; with `HfApi().run_uv_job()`: `get_token()`\n2. Verify key name is exactly `HF_TOKEN`\n3. Check job config syntax\n\n## Job Execution Issues\n\n### Error: Job Timeout\n\n**Symptoms:**\n- Job stops unexpectedly\n- Status shows \"TIMEOUT\"\n- Partial results only\n\n**Causes:**\n- Default 30min timeout exceeded\n- Job takes longer than expected\n- No timeout specified\n\n**Solutions:**\n1. 
Check logs for actual runtime\n2. Increase timeout with buffer: `\"timeout\": \"3h\"`\n3. Optimize code for faster execution\n4. Process data in chunks\n5. Add 20-30% buffer to estimated time\n\n**MCP Tool Example:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"...\",\n    \"timeout\": \"2h\"  # Set appropriate timeout\n})\n```\n\n**Python API Example:**\n```python\nfrom huggingface_hub import run_uv_job, inspect_job, fetch_job_logs\n\njob = run_uv_job(\"script.py\", timeout=\"4h\")\n\n# Check if job failed\njob_info = inspect_job(job_id=job.id)\nif job_info.status.stage == \"ERROR\":\n    print(f\"Job failed: {job_info.status.message}\")\n    # Check logs for details\n    for log in fetch_job_logs(job_id=job.id):\n        print(log)\n```\n\n### Error: Out of Memory (OOM)\n\n**Symptoms:**\n```\nRuntimeError: CUDA out of memory\nMemoryError: Unable to allocate array\n```\n\n**Causes:**\n- Batch size too large\n- Model too large for hardware\n- Insufficient GPU memory\n\n**Solutions:**\n1. Reduce batch size\n2. Process data in smaller chunks\n3. Upgrade hardware: cpu → t4 → a10g → a100\n4. Use smaller models or quantization\n5. Enable gradient checkpointing (for training)\n\n**Example:**\n```python\n# Reduce batch size\nbatch_size = 1\n\n# Process in chunks\nfor chunk in chunks:\n    process(chunk)\n```\n\n### Error: Missing Dependencies\n\n**Symptoms:**\n```\nModuleNotFoundError: No module named 'package_name'\nImportError: cannot import name 'X'\n```\n\n**Causes:**\n- Package not in dependencies\n- Wrong package name\n- Version mismatch\n\n**Solutions:**\n1. Add to PEP 723 header:\n   ```python\n   # /// script\n   # dependencies = [\"package-name>=1.0.0\"]\n   # ///\n   ```\n2. Check package name spelling\n3. Specify version if needed\n4. 
Check package availability\n\n### Error: Script Not Found\n\n**Symptoms:**\n```\nFileNotFoundError: script.py not found\n```\n\n**Causes:**\n- Local file path used (not supported)\n- URL incorrect\n- Script not accessible\n\n**Solutions:**\n1. Use inline script (recommended)\n2. Use publicly accessible URL\n3. Upload script to Hub first\n4. Check URL is correct\n\n**Correct approaches:**\n```python\n# ✅ Inline code\nhf_jobs(\"uv\", {\"script\": \"# /// script\\n# dependencies = [...]\\n# ///\\n\\n<code>\"})\n\n# ✅ From URL\nhf_jobs(\"uv\", {\"script\": \"https://huggingface.co/user/repo/resolve/main/script.py\"})\n```\n\n## Hub Push Issues\n\n### Error: Push Failed\n\n**Symptoms:**\n```\nError pushing to Hub\nUpload failed\n```\n\n**Causes:**\n- Network issues\n- Token missing or invalid\n- Repository access denied\n- File too large\n\n**Solutions:**\n1. Check token: `assert \"HF_TOKEN\" in os.environ`\n2. Verify repository exists or can be created\n3. Check network connectivity in logs\n4. Retry push operation\n5. Split large files into chunks\n\n### Error: Repository Not Found\n\n**Symptoms:**\n```\n404 Client Error: Not Found\nRepository not found\n```\n\n**Causes:**\n- Repository doesn't exist\n- Wrong repository name\n- No access to private repo\n\n**Solutions:**\n1. Create repository first:\n   ```python\n   from huggingface_hub import HfApi\n   api = HfApi()\n   api.create_repo(\"username/repo-name\", repo_type=\"dataset\")\n   ```\n2. Check repository name format\n3. Verify namespace exists\n4. Check repository visibility\n\n### Error: Results Not Saved\n\n**Symptoms:**\n- Job completes successfully\n- No results visible on Hub\n- Files not persisted\n\n**Causes:**\n- No persistence code in script\n- Push code not executed\n- Push failed silently\n\n**Solutions:**\n1. Add persistence code to script\n2. Verify push executes successfully\n3. Check logs for push errors\n4. 
Add error handling around push\n\n**Example:**\n```python\ntry:\n    dataset.push_to_hub(\"username/dataset\")\n    print(\"✅ Push successful\")\nexcept Exception as e:\n    print(f\"❌ Push failed: {e}\")\n    raise\n```\n\n## Hardware Issues\n\n### Error: GPU Not Available\n\n**Symptoms:**\n```\nCUDA not available\nNo GPU found\n```\n\n**Causes:**\n- CPU flavor used instead of GPU\n- GPU not requested\n- CUDA not installed in image\n\n**Solutions:**\n1. Use GPU flavor: `\"flavor\": \"a10g-large\"`\n2. Check image has CUDA support\n3. Verify GPU availability in logs\n\n### Error: Slow Performance\n\n**Symptoms:**\n- Job takes longer than expected\n- Low GPU utilization\n- CPU bottleneck\n\n**Causes:**\n- Wrong hardware selected\n- Inefficient code\n- Data loading bottleneck\n\n**Solutions:**\n1. Upgrade hardware\n2. Optimize code\n3. Use batch processing\n4. Profile code to find bottlenecks\n\n## General Issues\n\n### Error: Job Status Unknown\n\n**Symptoms:**\n- Can't check job status\n- Status API returns error\n\n**Solutions:**\n1. Use job URL: `https://huggingface.co/jobs/username/job-id`\n2. Check logs: `hf_jobs(\"logs\", {\"job_id\": \"...\"})`\n3. Inspect job: `hf_jobs(\"inspect\", {\"job_id\": \"...\"})`\n\n### Error: Logs Not Available\n\n**Symptoms:**\n- No logs visible\n- Logs delayed\n\n**Causes:**\n- Job just started (logs delayed 30-60s)\n- Job failed before logging\n- Logs not yet generated\n\n**Solutions:**\n1. Wait 30-60 seconds after job start\n2. Check job status first\n3. Use job URL for web interface\n\n### Error: Cost Unexpectedly High\n\n**Symptoms:**\n- Job costs more than expected\n- Longer runtime than estimated\n\n**Causes:**\n- Job ran longer than estimated (timeout set far above the expected runtime)\n- Wrong hardware selected\n- Inefficient code\n\n**Solutions:**\n1. Monitor job runtime\n2. Set appropriate timeout\n3. Optimize code\n4. Choose right hardware\n5. Check cost estimates before running\n\n## Debugging Tips\n\n### 1. 
Add Logging\n\n```python\nimport logging\nlogging.basicConfig(level=logging.INFO)\nlogger = logging.getLogger(__name__)\n\nlogger.info(\"Starting processing...\")\nlogger.info(f\"Processed {count} items\")\n```\n\n### 2. Verify Environment\n\n```python\nimport os\nimport sys\n\nimport torch  # Assumes torch is installed in the job environment\n\nprint(f\"Python version: {sys.version}\")\nprint(f\"CUDA available: {torch.cuda.is_available()}\")\nprint(f\"HF_TOKEN present: {'HF_TOKEN' in os.environ}\")\n```\n\n### 3. Test Locally First\n\nRun script locally before submitting to catch errors early:\n```bash\npython script.py\n# Or with uv\nuv run script.py\n```\n\n### 4. Check Job Logs\n\n**MCP Tool:**\n```python\n# View logs\nhf_jobs(\"logs\", {\"job_id\": \"your-job-id\"})\n```\n\n**CLI:**\n```bash\nhf jobs logs <job-id>\n```\n\n**Python API:**\n```python\nfrom huggingface_hub import fetch_job_logs\nfor log in fetch_job_logs(job_id=\"your-job-id\"):\n    print(log)\n```\n\n**Or use job URL:** `https://huggingface.co/jobs/username/job-id`\n\n### 5. Add Error Handling\n\n```python\ntry:\n    # Your code\n    process_data()\nexcept Exception as e:\n    print(f\"Error: {e}\")\n    import traceback\n    traceback.print_exc()\n    raise\n```\n\n### 6. Check Job Status Programmatically\n\n```python\nfrom huggingface_hub import inspect_job, fetch_job_logs\n\njob_info = inspect_job(job_id=\"your-job-id\")\nprint(f\"Status: {job_info.status.stage}\")\nprint(f\"Message: {job_info.status.message}\")\n\nif job_info.status.stage == \"ERROR\":\n    print(\"Job failed! 
Logs:\")\n    for log in fetch_job_logs(job_id=\"your-job-id\"):\n        print(log)\n```\n\n## Quick Reference\n\n### Common Error Codes\n\n| Code | Meaning | Solution |\n|------|---------|----------|\n| 401 | Unauthorized | Add token to secrets: MCP uses `\"$HF_TOKEN\"`, Python API uses `get_token()` |\n| 403 | Forbidden | Check token permissions |\n| 404 | Not Found | Verify repository exists |\n| 500 | Server Error | Retry or contact support |\n\n### Checklist Before Submitting\n\n- [ ] Token configured: MCP uses `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`, Python API uses `secrets={\"HF_TOKEN\": get_token()}`\n- [ ] Script checks for token: `assert \"HF_TOKEN\" in os.environ`\n- [ ] Timeout set appropriately\n- [ ] Hardware selected correctly\n- [ ] Dependencies listed in PEP 723 header\n- [ ] Persistence code included\n- [ ] Error handling added\n- [ ] Logging added for debugging\n\n## Getting Help\n\nIf issues persist:\n\n1. **Check logs** - Most errors include detailed messages\n2. **Review documentation** - See main SKILL.md\n3. **Check Hub status** - https://status.huggingface.co\n4. **Community forums** - https://discuss.huggingface.co\n5. **GitHub issues** - For bugs in huggingface_hub\n\n## Key Takeaways\n\n1. **Always include token** - MCP: `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`, Python API: `secrets={\"HF_TOKEN\": get_token()}`\n2. **Set appropriate timeout** - Default 30min may be insufficient\n3. **Verify persistence** - Results won't persist without code\n4. **Check logs** - Most issues visible in job logs\n5. **Test locally** - Catch errors before submitting\n6. **Add error handling** - Better debugging information\n7. **Monitor costs** - Set timeouts to avoid unexpected charges\n\n"
  },
  {
    "path": "skills/hugging-face-jobs/scripts/cot-self-instruct.py",
    "content": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"datasets\",\n#     \"transformers\",\n#     \"vllm>=0.6.5\",\n#     \"huggingface-hub[hf_transfer]\",\n#     \"torch\",\n#     \"numpy\",\n#     \"tqdm\",\n#     \"scikit-learn\",\n# ]\n# ///\n\"\"\"\nGenerate high-quality synthetic data using Chain-of-Thought Self-Instruct methodology.\n\nThis script implements the CoT-Self-Instruct approach from the paper \"CoT-Self-Instruct: \nBuilding high-quality synthetic prompts for reasoning and non-reasoning tasks\" (2025).\n\nIt supports two modes:\n1. Reasoning tasks: Generates both questions and answers with Chain-of-Thought\n2. Instruction tasks: Generates diverse prompts for general instruction following\n\nExample usage:\n    # Reasoning tasks with Answer-Consistency filtering\n    uv run cot-self-instruct.py \\\\\n        --seed-dataset davanstrien/s1k-reasoning \\\\\n        --output-dataset username/synthetic-math \\\\\n        --task-type reasoning \\\\\n        --num-samples 5000 \\\\\n        --filter-method answer-consistency\n\n    # Instruction tasks with RIP filtering\n    uv run cot-self-instruct.py \\\\\n        --seed-dataset wildchat-filtered \\\\\n        --output-dataset username/synthetic-prompts \\\\\n        --task-type instruction \\\\\n        --filter-method rip \\\\\n        --reward-model Nexusflow/Athene-RM-8B\n\n    # HF Jobs execution\n    hf jobs uv run --flavor l4x4 \\\\\n        --image vllm/vllm-openai \\\\\n        -e HF_TOKEN=$(python3 -c \"from huggingface_hub import get_token; print(get_token())\") \\\\\n        https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \\\\\n        [args...]\n\"\"\"\n\nimport argparse\nimport json\nimport logging\nimport os\nimport random\nimport re\nimport sys\nfrom collections import Counter\nfrom datetime import datetime\nfrom typing import Dict, List, Optional, Tuple, Union\n\nimport numpy as np\nimport torch\nfrom datasets 
import Dataset, load_dataset\nfrom huggingface_hub import DatasetCard, login\nfrom sklearn.cluster import KMeans\nfrom tqdm.auto import tqdm\nfrom transformers import AutoTokenizer\nfrom vllm import LLM, SamplingParams\n\n# Enable HF Transfer for faster downloads\nos.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\"\n\nlogging.basicConfig(\n    level=logging.INFO, format=\"%(asctime)s - %(levelname)s - %(message)s\"\n)\nlogger = logging.getLogger(__name__)\n\n# Prompt templates from the paper\nREASONING_PROMPT_TEMPLATE = \"\"\"You are a reasoning question generator assistant. Your goal is to create a novel, and challenging reasoning question. You are provided the following seed questions:\nSeed Question 1: {seed1}\nSeed Question 2: {seed2}\nYour task is to:\n1. Write a brand-new, self-contained reasoning question that meets the following requirements:\n(a) The question draws inspiration from the seed question without copying it verbatim, remaining novel and of comparable difficulty.\n(b) The question's final answer should be a single, unambiguous scalar value (e.g., an integer, reduced fraction, exact radical), or another answer type that can be verified in one step (e.g., 'yes/no,' a choice from A to D).\n2. Then reason step by step, solve the new question and format your output as follows:\n[New Question Begin]{{your_generated_question}}[New Question End]\n[Final Answer to New Question Begin]\\\\boxed{{your_final_answer}}[Final Answer to New Question End]\"\"\"\n\nINSTRUCTION_PROMPT_TEMPLATE = \"\"\"You are a prompt generator assistant. Your goal is to create diverse and creative synthetic prompts.\nPlease follow the steps below to create synthetic prompts.\nStep 1: Carefully read #Prompt 1# and #Prompt 2#. Identify and list all the common elements between these two prompts. If no common elements are found, list the main elements from each prompt.\nStep 2: Develop a comprehensive plan based on the #Common Elements List# or #Main Elements List# from Step 1. 
This plan will guide the generation of new synthetic prompts that are similar to the original prompts.\nStep 3: Execute the plan step by step and provide one #Synthetic Prompt#.\nPlease reply strictly in the following format:\n- Step 1 #Common Elements List# or #Main Elements List#:\n- Step 2 #Plan#:\n- Step 3 #Synthetic Prompt#:\n#Prompt 1#:\n{prompt1}\n#Prompt 2#:\n{prompt2}\"\"\"\n\n\ndef check_gpu_availability() -> int:\n    \"\"\"Check if CUDA is available and return the number of GPUs.\"\"\"\n    if not torch.cuda.is_available():\n        logger.error(\"CUDA is not available. This script requires a GPU.\")\n        logger.error(\n            \"Please run on a machine with NVIDIA GPU or use HF Jobs with GPU flavor.\"\n        )\n        sys.exit(1)\n\n    num_gpus = torch.cuda.device_count()\n    for i in range(num_gpus):\n        gpu_name = torch.cuda.get_device_name(i)\n        gpu_memory = torch.cuda.get_device_properties(i).total_memory / 1024**3\n        logger.info(f\"GPU {i}: {gpu_name} with {gpu_memory:.1f} GB memory\")\n\n    return num_gpus\n\n\ndef parse_thinking_output(text: str) -> str:\n    \"\"\"Remove thinking tokens from model output.\"\"\"\n    # Remove <think>...</think> blocks\n    text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)\n    return text.strip()\n\n\ndef extract_reasoning_output(text: str) -> Tuple[Optional[str], Optional[str]]:\n    \"\"\"Extract question and answer from reasoning task output.\"\"\"\n    text = parse_thinking_output(text)\n    \n    # Extract question\n    question_match = re.search(r'\\[New Question Begin\\](.*?)\\[New Question End\\]', text, re.DOTALL)\n    if not question_match:\n        return None, None\n    question = question_match.group(1).strip()\n    \n    # Extract answer\n    answer_match = re.search(r'\\[Final Answer to New Question Begin\\]\\\\?boxed\\{(.*?)\\}\\[Final Answer to New Question End\\]', text, re.DOTALL)\n    if not answer_match:\n        # Try without \\boxed\n        
answer_match = re.search(r'\\[Final Answer to New Question Begin\\](.*?)\\[Final Answer to New Question End\\]', text, re.DOTALL)\n    \n    if not answer_match:\n        return question, None\n    \n    answer = answer_match.group(1).strip()\n    return question, answer\n\n\ndef extract_instruction_output(text: str) -> Optional[str]:\n    \"\"\"Extract synthetic prompt from instruction task output.\"\"\"\n    text = parse_thinking_output(text)\n    \n    # Look for the synthetic prompt after \"Step 3 #Synthetic Prompt#:\"\n    match = re.search(r'Step 3 #Synthetic Prompt#:\\s*(.+)', text, re.DOTALL)\n    if match:\n        return match.group(1).strip()\n    return None\n\n\ndef categorize_prompts(prompts: List[str], num_categories: int = 8) -> Dict[int, List[int]]:\n    \"\"\"Categorize prompts using clustering for instruction tasks.\"\"\"\n    from transformers import AutoModel\n    \n    logger.info(f\"Categorizing {len(prompts)} prompts into {num_categories} categories...\")\n    \n    # Use a small model for embeddings\n    tokenizer = AutoTokenizer.from_pretrained(\"sentence-transformers/all-MiniLM-L6-v2\")\n    model = AutoModel.from_pretrained(\"sentence-transformers/all-MiniLM-L6-v2\")\n    \n    # Get embeddings\n    embeddings = []\n    for prompt in tqdm(prompts, desc=\"Computing embeddings\"):\n        inputs = tokenizer(prompt, return_tensors=\"pt\", truncation=True, max_length=512)\n        with torch.no_grad():\n            outputs = model(**inputs)\n            embedding = outputs.last_hidden_state.mean(dim=1).numpy()\n        embeddings.append(embedding[0])\n    \n    # Cluster\n    kmeans = KMeans(n_clusters=num_categories, random_state=42)\n    labels = kmeans.fit_predict(embeddings)\n    \n    # Group by category\n    categories = {}\n    for idx, label in enumerate(labels):\n        if label not in categories:\n            categories[label] = []\n        categories[label].append(idx)\n    \n    return categories\n\n\ndef 
generate_synthetic_data(\n    llm: LLM,\n    seed_data: List[Dict],\n    task_type: str,\n    num_samples: int,\n    categories: Optional[Dict[int, List[int]]] = None,\n) -> List[Dict]:\n    \"\"\"Generate synthetic data using CoT-Self-Instruct.\"\"\"\n    synthetic_data = []\n    \n    # Set up progress bar\n    pbar = tqdm(total=num_samples, desc=\"Generating synthetic data\")\n    \n    while len(synthetic_data) < num_samples:\n        # Sample seed data\n        if task_type == \"reasoning\":\n            # Random sampling for reasoning tasks\n            seeds = random.sample(seed_data, min(2, len(seed_data)))\n            prompt = REASONING_PROMPT_TEMPLATE.format(\n                seed1=seeds[0].get(\"question\", seeds[0].get(\"prompt\", \"\")),\n                seed2=seeds[1].get(\"question\", seeds[1].get(\"prompt\", \"\")) if len(seeds) > 1 else seeds[0].get(\"question\", seeds[0].get(\"prompt\", \"\"))\n            )\n        else:\n            # Category-aware sampling for instruction tasks\n            if categories:\n                # Pick a random category\n                category = random.choice(list(categories.keys()))\n                category_indices = categories[category]\n                indices = random.sample(category_indices, min(2, len(category_indices)))\n                seeds = [seed_data[i] for i in indices]\n            else:\n                seeds = random.sample(seed_data, min(2, len(seed_data)))\n            \n            prompt = INSTRUCTION_PROMPT_TEMPLATE.format(\n                prompt1=seeds[0].get(\"prompt\", seeds[0].get(\"question\", \"\")),\n                prompt2=seeds[1].get(\"prompt\", seeds[1].get(\"question\", \"\")) if len(seeds) > 1 else seeds[0].get(\"prompt\", seeds[0].get(\"question\", \"\"))\n            )\n        \n        # Generate\n        sampling_params = SamplingParams(\n            temperature=0.7 if task_type == \"reasoning\" else 0.8,\n            top_p=0.95 if task_type == \"reasoning\" else 0.9,\n    
        max_tokens=2048,\n        )\n        \n        outputs = llm.generate([prompt], sampling_params)\n        output_text = outputs[0].outputs[0].text\n        \n        # Parse output\n        if task_type == \"reasoning\":\n            question, answer = extract_reasoning_output(output_text)\n            if question and answer:\n                synthetic_data.append({\n                    \"question\": question,\n                    \"answer\": answer,\n                    \"seed_indices\": [seed_data.index(s) for s in seeds],\n                })\n                pbar.update(1)\n        else:\n            synthetic_prompt = extract_instruction_output(output_text)\n            if synthetic_prompt:\n                synthetic_data.append({\n                    \"prompt\": synthetic_prompt,\n                    \"seed_indices\": [seed_data.index(s) for s in seeds],\n                })\n                pbar.update(1)\n    \n    pbar.close()\n    return synthetic_data\n\n\ndef answer_consistency_filter(\n    llm: LLM,\n    synthetic_data: List[Dict],\n    k_responses: int = 16,\n    threshold: float = 0.5,\n) -> List[Dict]:\n    \"\"\"Filter reasoning tasks using Answer-Consistency.\"\"\"\n    logger.info(f\"Applying Answer-Consistency filter with K={k_responses}\")\n    \n    filtered_data = []\n    \n    for item in tqdm(synthetic_data, desc=\"Answer-Consistency filtering\"):\n        question = item[\"question\"]\n        original_answer = item[\"answer\"]\n        \n        # Generate K responses\n        prompts = [question] * k_responses\n        sampling_params = SamplingParams(\n            temperature=0.6,\n            top_p=0.95,\n            max_tokens=1024,\n        )\n        \n        outputs = llm.generate(prompts, sampling_params)\n        \n        # Extract answers\n        answers = []\n        for output in outputs:\n            text = output.outputs[0].text\n            # Try to extract boxed answer\n            match = 
re.search(r'\\\\boxed\\{(.*?)\\}', text)\n            if match:\n                answers.append(match.group(1).strip())\n        \n        if not answers:\n            continue\n        \n        # Get majority answer\n        answer_counts = Counter(answers)\n        if answer_counts:\n            majority_answer, count = answer_counts.most_common(1)[0]\n            \n            # Check if majority answer matches original and meets threshold\n            if (majority_answer == original_answer and \n                count / len(answers) >= threshold):\n                item[\"consistency_score\"] = count / len(answers)\n                filtered_data.append(item)\n    \n    logger.info(f\"Answer-Consistency: kept {len(filtered_data)}/{len(synthetic_data)} examples\")\n    return filtered_data\n\n\ndef rip_filter(\n    llm: LLM,\n    synthetic_data: List[Dict],\n    reward_model_id: str,\n    k_responses: int = 32,\n    threshold: float = 0.5,\n) -> List[Dict]:\n    \"\"\"Filter using Rejecting Instruction Preferences (RIP).\"\"\"\n    logger.info(f\"Applying RIP filter with K={k_responses} and reward model {reward_model_id}\")\n    \n    # Note: In a full implementation, you would load and use the actual reward model\n    # For this example, we'll use a placeholder scoring mechanism\n    logger.warning(\"RIP filtering requires a reward model implementation - using placeholder\")\n    \n    filtered_data = []\n    \n    for item in tqdm(synthetic_data, desc=\"RIP filtering\"):\n        prompt = item.get(\"prompt\", item.get(\"question\", \"\"))\n        \n        # Generate K responses\n        prompts = [prompt] * k_responses\n        sampling_params = SamplingParams(\n            temperature=1.0,\n            top_p=1.0,\n            max_tokens=1024,\n        )\n        \n        outputs = llm.generate(prompts, sampling_params)\n        \n        # In real implementation: score each response with reward model\n        # For now, use length as a proxy (longer 
responses often score higher)\n        scores = [len(output.outputs[0].text) for output in outputs]\n        \n        # Use minimum score as quality indicator\n        min_score = min(scores) if scores else 0\n        normalized_score = min_score / 1000  # Normalize to 0-1 range\n        \n        if normalized_score >= threshold:\n            item[\"rip_score\"] = normalized_score\n            filtered_data.append(item)\n    \n    logger.info(f\"RIP filter: kept {len(filtered_data)}/{len(synthetic_data)} examples\")\n    return filtered_data\n\n\ndef create_dataset_card(\n    task_type: str,\n    source_dataset: str,\n    generation_model: str,\n    filter_method: str,\n    num_generated: int,\n    num_filtered: int,\n    generation_time: str,\n    additional_info: Dict = None,\n) -> str:\n    \"\"\"Create a comprehensive dataset card.\"\"\"\n    filter_info = \"\"\n    if filter_method == \"answer-consistency\":\n        filter_info = \"\"\"\n### Answer-Consistency Filtering\n\nThis dataset was filtered using Answer-Consistency:\n- Generated K responses for each synthetic question\n- Kept only examples where majority answer matched the generated answer\n- Ensures high-quality, correctly solved problems\"\"\"\n    elif filter_method == \"rip\":\n        filter_info = \"\"\"\n### RIP (Rejecting Instruction Preferences) Filtering\n\nThis dataset was filtered using RIP:\n- Generated K responses for each synthetic prompt\n- Scored responses using a reward model\n- Kept only prompts with high minimum scores\"\"\"\n    \n    return f\"\"\"---\ntags:\n- synthetic-data\n- cot-self-instruct\n- {task_type}\n- uv-script\n---\n\n# CoT-Self-Instruct Synthetic Data\n\nThis dataset contains synthetic {task_type} data generated using the Chain-of-Thought Self-Instruct methodology.\n\n## Generation Details\n\n- **Source Dataset**: [{source_dataset}](https://huggingface.co/datasets/{source_dataset})\n- **Generation Model**: 
[{generation_model}](https://huggingface.co/{generation_model})\n- **Task Type**: {task_type}\n- **Filter Method**: {filter_method}\n- **Generated Examples**: {num_generated:,}\n- **After Filtering**: {num_filtered:,} ({(num_filtered/num_generated)*100:.1f}% acceptance rate)\n- **Generation Date**: {generation_time}\n{filter_info}\n\n## Methodology\n\nGenerated using CoT-Self-Instruct, which:\n1. Uses Chain-of-Thought reasoning to analyze seed examples\n2. Generates new synthetic examples of similar quality and complexity\n3. Applies quality filtering to ensure high-quality outputs\n\nBased on the paper: \"CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks\" (2025)\n\n## Generation Script\n\nGenerated using the CoT-Self-Instruct script from [uv-scripts/synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data).\n\nTo reproduce:\n```bash\nuv run https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \\\\\n    --seed-dataset {source_dataset} \\\\\n    --output-dataset <your-dataset> \\\\\n    --task-type {task_type} \\\\\n    --generation-model {generation_model} \\\\\n    --filter-method {filter_method}\n```\n\"\"\"\n\n\ndef main():\n    parser = argparse.ArgumentParser(\n        description=\"Generate synthetic data using CoT-Self-Instruct\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        epilog=__doc__,\n    )\n    \n    # Dataset arguments\n    parser.add_argument(\n        \"--seed-dataset\",\n        type=str,\n        required=True,\n        help=\"HuggingFace dataset ID containing seed examples\",\n    )\n    parser.add_argument(\n        \"--output-dataset\",\n        type=str,\n        required=True,\n        help=\"HuggingFace dataset ID for output\",\n    )\n    \n    # Task configuration\n    parser.add_argument(\n        \"--task-type\",\n        type=str,\n        choices=[\"reasoning\", \"instruction\", \"auto\"],\n        
default=\"auto\",\n        help=\"Type of task (reasoning generates Q&A, instruction generates prompts)\",\n    )\n    parser.add_argument(\n        \"--task-column\",\n        type=str,\n        default=None,\n        help=\"Column name containing tasks (auto-detected if not specified)\",\n    )\n    \n    # Model configuration\n    parser.add_argument(\n        \"--generation-model\",\n        type=str,\n        default=\"Qwen/Qwen3-30B-A3B-Thinking-2507\",\n        help=\"Model for synthetic data generation\",\n    )\n    parser.add_argument(\n        \"--filter-model\",\n        type=str,\n        default=None,\n        help=\"Model for filtering (defaults to generation model)\",\n    )\n    parser.add_argument(\n        \"--reward-model\",\n        type=str,\n        default=\"Nexusflow/Athene-RM-8B\",\n        help=\"Reward model for RIP filtering\",\n    )\n    \n    # Generation parameters\n    parser.add_argument(\n        \"--num-samples\",\n        type=int,\n        default=5000,\n        help=\"Number of synthetic examples to generate\",\n    )\n    parser.add_argument(\n        \"--batch-size\",\n        type=int,\n        default=1,\n        help=\"Batch size for generation\",\n    )\n    \n    # Filtering parameters\n    parser.add_argument(\n        \"--filter-method\",\n        type=str,\n        choices=[\"answer-consistency\", \"rip\", \"both\", \"none\"],\n        default=\"answer-consistency\",\n        help=\"Quality filtering method\",\n    )\n    parser.add_argument(\n        \"--k-responses\",\n        type=int,\n        default=16,\n        help=\"Number of responses for filtering\",\n    )\n    parser.add_argument(\n        \"--quality-threshold\",\n        type=float,\n        default=0.5,\n        help=\"Minimum quality threshold for filtering\",\n    )\n    \n    # GPU configuration\n    parser.add_argument(\n        \"--tensor-parallel-size\",\n        type=int,\n        default=None,\n        help=\"Number of GPUs for tensor 
parallelism (auto-detected if not set)\",\n    )\n    parser.add_argument(\n        \"--gpu-memory-utilization\",\n        type=float,\n        default=0.9,\n        help=\"GPU memory utilization\",\n    )\n    \n    # Other arguments\n    parser.add_argument(\n        \"--hf-token\",\n        type=str,\n        default=None,\n        help=\"HuggingFace API token\",\n    )\n    parser.add_argument(\n        \"--seed\",\n        type=int,\n        default=42,\n        help=\"Random seed\",\n    )\n    \n    args = parser.parse_args()\n    \n    # Set random seeds\n    random.seed(args.seed)\n    np.random.seed(args.seed)\n    torch.manual_seed(args.seed)\n    \n    # Check GPU\n    num_gpus = check_gpu_availability()\n    tensor_parallel_size = args.tensor_parallel_size or num_gpus\n    \n    # Authentication\n    hf_token = args.hf_token or os.environ.get(\"HF_TOKEN\")\n    if hf_token:\n        login(token=hf_token)\n    \n    # Load seed dataset\n    logger.info(f\"Loading seed dataset: {args.seed_dataset}\")\n    seed_dataset = load_dataset(args.seed_dataset, split=\"train\")\n    \n    # Auto-detect task type and column if needed\n    if args.task_type == \"auto\":\n        columns = seed_dataset.column_names\n        if \"question\" in columns and \"answer\" in columns:\n            args.task_type = \"reasoning\"\n            logger.info(\"Auto-detected task type: reasoning\")\n        else:\n            args.task_type = \"instruction\"\n            logger.info(\"Auto-detected task type: instruction\")\n    \n    if not args.task_column:\n        if args.task_type == \"reasoning\":\n            args.task_column = \"question\"\n        else:\n            # Try to find prompt column\n            for col in [\"prompt\", \"instruction\", \"text\", \"input\"]:\n                if col in seed_dataset.column_names:\n                    args.task_column = col\n                    break\n    \n    logger.info(f\"Using task column: {args.task_column}\")\n    \n    # 
Convert to list of dicts\n    seed_data = seed_dataset.to_list()\n    \n    # Categorize prompts for instruction tasks\n    categories = None\n    if args.task_type == \"instruction\" and len(seed_data) > 100:\n        prompts = [item.get(args.task_column, \"\") for item in seed_data]\n        categories = categorize_prompts(prompts)\n    \n    # Initialize generation model\n    logger.info(f\"Loading generation model: {args.generation_model}\")\n    generation_llm = LLM(\n        model=args.generation_model,\n        tensor_parallel_size=tensor_parallel_size,\n        gpu_memory_utilization=args.gpu_memory_utilization,\n    )\n    \n    # Generate synthetic data\n    start_time = datetime.now()\n    synthetic_data = generate_synthetic_data(\n        generation_llm,\n        seed_data,\n        args.task_type,\n        args.num_samples,\n        categories,\n    )\n    \n    # Apply filtering\n    filter_llm = generation_llm\n    if args.filter_model and args.filter_model != args.generation_model:\n        logger.info(f\"Loading filter model: {args.filter_model}\")\n        # Clean up generation model: filter_llm still aliases it, so drop both\n        # references before loading the filter model, otherwise its GPU memory stays allocated\n        filter_llm = None\n        del generation_llm\n        torch.cuda.empty_cache()\n        \n        filter_llm = LLM(\n            model=args.filter_model,\n            tensor_parallel_size=tensor_parallel_size,\n            gpu_memory_utilization=args.gpu_memory_utilization,\n        )\n    \n    filtered_data = synthetic_data\n    if args.filter_method != \"none\":\n        if args.filter_method == \"answer-consistency\" and args.task_type == \"reasoning\":\n            filtered_data = answer_consistency_filter(\n                filter_llm,\n                synthetic_data,\n                args.k_responses,\n                args.quality_threshold,\n            )\n        elif args.filter_method == \"rip\":\n            filtered_data = rip_filter(\n                filter_llm,\n                synthetic_data,\n                args.reward_model,\n                args.k_responses,\n       
         args.quality_threshold,\n            )\n        elif args.filter_method == \"both\":\n            if args.task_type == \"reasoning\":\n                filtered_data = answer_consistency_filter(\n                    filter_llm,\n                    synthetic_data,\n                    args.k_responses,\n                    args.quality_threshold,\n                )\n            filtered_data = rip_filter(\n                filter_llm,\n                filtered_data,\n                args.reward_model,\n                args.k_responses,\n                args.quality_threshold,\n            )\n    \n    # Create HuggingFace dataset\n    logger.info(f\"Creating dataset with {len(filtered_data)} examples\")\n    dataset = Dataset.from_list(filtered_data)\n    \n    # Create dataset card\n    generation_time = start_time.strftime(\"%Y-%m-%d %H:%M:%S UTC\")\n    dataset_card = create_dataset_card(\n        args.task_type,\n        args.seed_dataset,\n        args.generation_model,\n        args.filter_method,\n        len(synthetic_data),\n        len(filtered_data),\n        generation_time,\n    )\n    \n    # Push to hub\n    logger.info(f\"Pushing dataset to: {args.output_dataset}\")\n    # Create dataset card\n    card = DatasetCard(dataset_card)\n    dataset.push_to_hub(args.output_dataset)\n    # Push card separately\n    card.push_to_hub(args.output_dataset)\n    \n    logger.info(\"Done! 
Dataset available at: https://huggingface.co/datasets/\" + args.output_dataset)\n    \n    # Print example HF Jobs command if running locally\n    if len(sys.argv) > 1:\n        print(\"\\nTo run on HF Jobs:\")\n        print(f\"\"\"hf jobs uv run --flavor l4x4 \\\\\n    --image vllm/vllm-openai \\\\\n    -e HF_TOKEN=$(python3 -c \"from huggingface_hub import get_token; print(get_token())\") \\\\\n    https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \\\\\n    --seed-dataset {args.seed_dataset} \\\\\n    --output-dataset {args.output_dataset} \\\\\n    --task-type {args.task_type} \\\\\n    --generation-model {args.generation_model} \\\\\n    --filter-method {args.filter_method} \\\\\n    --num-samples {args.num_samples}\"\"\")\n\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "skills/hugging-face-jobs/scripts/finepdfs-stats.py",
    "content": "# /// script\n# requires-python = \">=3.12\"\n# dependencies = [\n#     \"polars>=1.31.0\",\n#     \"huggingface-hub\",\n#     \"datasets\",\n#     \"ascii-graph\",\n# ]\n# ///\n\"\"\"\nAnalyze educational quality trends across CommonCrawl dumps using Polars streaming.\n\nAnswers: \"Is the web getting more educational over time?\"\n\nDemonstrates Polars HF Hub integration - process 50M+ docs without downloading 300GB+.\n\nExample usage:\n    # Analyze English PDFs (default)\n    uv run finepdfs-stats.py\n\n    # Analyze all 70+ languages\n    uv run finepdfs-stats.py --all-languages\n\n    # Quick test\n    uv run finepdfs-stats.py --limit 10000 --show-plan\n\n    # Save results to HF Hub\n    uv run finepdfs-stats.py --output-repo username/finepdfs-temporal-stats\n\n    # Run on HF Jobs\n    hf jobs uv run \\\\\n        -s HF_TOKEN \\\\\n        -e HF_XET_HIGH_PERFORMANCE=1 \\\\\n        https://huggingface.co/datasets/uv-scripts/dataset-stats/raw/main/finepdfs-stats.py \\\\\n        -- --output-repo username/stats\n\"\"\"\n\nimport argparse\nimport logging\nimport os\nimport sys\nimport time\nfrom pathlib import Path\n\nimport polars as pl\nfrom ascii_graph import Pyasciigraph\nfrom datasets import Dataset\nfrom huggingface_hub import HfApi, create_repo, list_repo_tree, login\n\nlogging.basicConfig(\n    level=logging.INFO, format=\"%(asctime)s - %(levelname)s - %(message)s\"\n)\nlogger = logging.getLogger(__name__)\n\n# Common language+script codes for finepdfs-edu\nCOMMON_LANGUAGES = {\n    \"eng_Latn\": \"English (Latin script)\",\n    \"fra_Latn\": \"French (Latin script)\",\n    \"deu_Latn\": \"German (Latin script)\",\n    \"spa_Latn\": \"Spanish (Latin script)\",\n    \"por_Latn\": \"Portuguese (Latin script)\",\n    \"ita_Latn\": \"Italian (Latin script)\",\n    \"nld_Latn\": \"Dutch (Latin script)\",\n    \"pol_Latn\": \"Polish (Latin script)\",\n    \"rus_Cyrl\": \"Russian (Cyrillic script)\",\n    \"zho_Hans\": \"Chinese 
(Simplified)\",\n    \"zho_Hant\": \"Chinese (Traditional)\",\n    \"jpn_Jpan\": \"Japanese\",\n    \"kor_Hang\": \"Korean\",\n    \"ara_Arab\": \"Arabic\",\n    \"hin_Deva\": \"Hindi (Devanagari)\",\n}\n\n\ndef list_available_languages(dataset_id: str) -> list[str]:\n    \"\"\"List available language subsets in the dataset.\"\"\"\n    try:\n        tree = list_repo_tree(dataset_id, path_in_repo=\"data\", repo_type=\"dataset\")\n        languages = [\n            item.path.replace(\"data/\", \"\")\n            for item in tree\n            if item.path.startswith(\"data/\")\n            and \"/\" not in item.path.replace(\"data/\", \"\")\n        ]\n        return sorted(languages)\n    except Exception as e:\n        logger.warning(f\"Could not list languages: {e}\")\n        return list(COMMON_LANGUAGES.keys())\n\n\ndef compute_temporal_stats(df: pl.LazyFrame, output_path: Path) -> pl.DataFrame:\n    \"\"\"Single scan: compute stats grouped by dump for temporal analysis.\"\"\"\n    query = df.group_by(\"dump\").agg(\n        pl.len().alias(\"doc_count\"),\n        pl.col(\"token_count\").sum().alias(\"total_tokens\"),\n        pl.col(\"fw_edu_scores\").list.mean().mean().alias(\"avg_edu_score\"),\n        (pl.col(\"fw_edu_scores\").list.mean() >= 3).sum().alias(\"high_edu_count\"),\n    )\n    query.sink_parquet(output_path, engine=\"streaming\")\n    return pl.read_parquet(output_path)\n\n\ndef compute_global_stats(temporal: pl.DataFrame) -> pl.DataFrame:\n    \"\"\"Compute global stats from temporal breakdown.\"\"\"\n    total = temporal[\"doc_count\"].sum()\n    return pl.DataFrame(\n        {\n            \"total_docs\": [total],\n            \"total_tokens\": [temporal[\"total_tokens\"].sum()],\n            \"avg_edu_score\": [\n                (temporal[\"avg_edu_score\"] * temporal[\"doc_count\"]).sum() / total\n            ],\n            \"high_edu_rate\": [temporal[\"high_edu_count\"].sum() / total],\n            \"num_dumps\": [len(temporal)],\n        
}\n    )\n\n\ndef format_temporal_stats(temporal: pl.DataFrame) -> pl.DataFrame:\n    \"\"\"Format temporal stats with high_edu_rate, sorted chronologically.\"\"\"\n    return (\n        temporal.with_columns(\n            (pl.col(\"high_edu_count\") / pl.col(\"doc_count\")).alias(\"high_edu_rate\")\n        )\n        .select([\"dump\", \"doc_count\", \"avg_edu_score\", \"high_edu_rate\"])\n        .sort(\n            \"dump\"\n        )  # Chronological order (CC-MAIN-2017-xx comes before CC-MAIN-2024-xx)\n    )\n\n\ndef create_ascii_charts(temporal_stats: pl.DataFrame) -> str:\n    \"\"\"Create ASCII bar charts showing temporal trends.\"\"\"\n    # Extract year from dump name (CC-MAIN-2024-42 -> 2024)\n    # Group by year and average the values for cleaner display\n    yearly = (\n        temporal_stats.with_columns(\n            pl.col(\"dump\").str.extract(r\"CC-MAIN-(\\d{4})\", 1).alias(\"year\")\n        )\n        .group_by(\"year\")\n        .agg(\n            pl.col(\"doc_count\").sum(),\n            pl.col(\"avg_edu_score\").mean(),\n            pl.col(\"high_edu_rate\").mean(),\n        )\n        .sort(\"year\")\n    )\n\n    lines = []\n\n    # High edu rate chart (more dramatic differences)\n    data_rate = [\n        (row[\"year\"], row[\"high_edu_rate\"] * 100)\n        for row in yearly.iter_rows(named=True)\n    ]\n    graph = Pyasciigraph(line_length=60, float_format=\"{0:.1f}%\")\n    lines.extend(graph.graph(\"High Educational Content (edu >= 3)\", data_rate))\n\n    lines.append(\"\")\n\n    # Avg edu score chart\n    data_score = [\n        (row[\"year\"], row[\"avg_edu_score\"]) for row in yearly.iter_rows(named=True)\n    ]\n    graph2 = Pyasciigraph(line_length=60, float_format=\"{0:.2f}\")\n    lines.extend(graph2.graph(\"Average Educational Score\", data_score))\n\n    return \"\\n\".join(lines)\n\n\ndef create_readme(\n    args,\n    global_stats: pl.DataFrame,\n    temporal_stats: pl.DataFrame,\n    scan_time: float,\n    
ascii_charts: str,\n) -> str:\n    \"\"\"Create README content for the stats dataset.\"\"\"\n    stats = global_stats.to_dicts()[0]\n    total_docs = stats.get(\"total_docs\", 0)\n    docs_per_sec = total_docs / scan_time if scan_time > 0 else 0\n\n    # Get first and last year averages for trend (more representative than single dumps)\n    yearly = (\n        temporal_stats.with_columns(\n            pl.col(\"dump\").str.extract(r\"CC-MAIN-(\\d{4})\", 1).alias(\"year\")\n        )\n        .group_by(\"year\")\n        .agg(\n            pl.col(\"doc_count\").sum(),\n            pl.col(\"avg_edu_score\").mean(),\n            pl.col(\"high_edu_rate\").mean(),\n        )\n        .sort(\"year\")\n    )\n    first_year = yearly.head(1).to_dicts()[0]\n    last_year = yearly.tail(1).to_dicts()[0]\n\n    scope = (\n        \"all languages\"\n        if args.all_languages\n        else COMMON_LANGUAGES.get(args.lang, args.lang)\n    )\n\n    return f\"\"\"---\ntags:\n  - uv-script\n  - statistics\n  - polars\n  - finepdfs-edu\n  - temporal-analysis\nlicense: odc-by\nconfigs:\n  - config_name: global_stats\n    data_files: global_stats/train-*.parquet\n  - config_name: temporal_stats\n    data_files: temporal_stats/train-*.parquet\ndefault_viewer_config: temporal_stats\n---\n\n# Is the Web Getting More Educational?\n\nTemporal analysis of educational quality in **{scope}** across {stats.get(\"num_dumps\", 0)} CommonCrawl dumps.\n\n## Trend\n\n```\n{ascii_charts}\n```\n\n## Key Finding\n\n| Year | Avg Edu Score | High Edu Rate |\n|------|---------------|---------------|\n| {first_year[\"year\"]} | {first_year[\"avg_edu_score\"]:.2f} | {first_year[\"high_edu_rate\"] * 100:.1f}% |\n| {last_year[\"year\"]} | {last_year[\"avg_edu_score\"]:.2f} | {last_year[\"high_edu_rate\"] * 100:.1f}% |\n\n## Performance\n\n- **{total_docs:,} documents** processed in **{scan_time:.0f} seconds**\n- **{docs_per_sec:,.0f} docs/sec** using Polars streaming\n- Single scan, no full dataset download 
required\n\n## Summary\n\n| Metric | Value |\n|--------|-------|\n| Scope | {scope} |\n| Total Documents | {total_docs:,} |\n| Total Tokens | {stats.get(\"total_tokens\", 0):,} |\n| Avg Edu Score | {stats.get(\"avg_edu_score\", 0):.3f} |\n| High Edu Rate | {stats.get(\"high_edu_rate\", 0) * 100:.1f}% |\n| CommonCrawl Dumps | {stats.get(\"num_dumps\", 0)} |\n\n## Files\n\n- `global_stats` - Overall summary\n- `temporal_stats` - Per-dump breakdown (sorted chronologically)\n\n## Reproduce\n\n```bash\nuv run https://huggingface.co/datasets/uv-scripts/dataset-stats/raw/main/finepdfs-stats.py \\\\\n    {\"--all-languages\" if args.all_languages else f\"--lang {args.lang}\"} --output-repo your-username/stats\n```\n\n## Source\n\n- **Dataset**: [{args.source_dataset}](https://huggingface.co/datasets/{args.source_dataset})\n- **Script**: [uv-scripts/dataset-stats](https://huggingface.co/datasets/uv-scripts/dataset-stats)\n\"\"\"\n\n\ndef main():\n    parser = argparse.ArgumentParser(\n        description=\"Analyze educational quality trends across CommonCrawl dumps\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        epilog=__doc__,\n    )\n\n    parser.add_argument(\n        \"--source-dataset\",\n        type=str,\n        default=\"HuggingFaceFW/finepdfs-edu\",\n        help=\"Source dataset (default: HuggingFaceFW/finepdfs-edu)\",\n    )\n\n    parser.add_argument(\n        \"--lang\",\n        type=str,\n        default=\"eng_Latn\",\n        help=\"Language+script code (default: eng_Latn)\",\n    )\n\n    parser.add_argument(\n        \"--all-languages\",\n        action=\"store_true\",\n        help=\"Analyze all languages (70+) instead of single language\",\n    )\n\n    parser.add_argument(\n        \"--show-plan\",\n        action=\"store_true\",\n        help=\"Show Polars query plan (demonstrates optimization)\",\n    )\n\n    parser.add_argument(\n        \"--list-languages\",\n        action=\"store_true\",\n        help=\"List available 
languages and exit\",\n    )\n\n    parser.add_argument(\n        \"--limit\",\n        type=int,\n        help=\"Limit to first N rows (for testing)\",\n    )\n\n    parser.add_argument(\n        \"--output-repo\",\n        type=str,\n        help=\"HuggingFace dataset repository to upload results\",\n    )\n\n    parser.add_argument(\n        \"--output-dir\",\n        type=str,\n        default=\"./stats_output\",\n        help=\"Local directory for output files\",\n    )\n\n    parser.add_argument(\n        \"--hf-token\",\n        type=str,\n        help=\"HuggingFace API token (or set HF_TOKEN env var)\",\n    )\n\n    parser.add_argument(\n        \"--private\",\n        action=\"store_true\",\n        help=\"Make the output dataset private\",\n    )\n\n    args = parser.parse_args()\n\n    # Check for high-performance mode\n    if os.environ.get(\"HF_XET_HIGH_PERFORMANCE\"):\n        logger.info(\"High-performance mode enabled (HF_XET_HIGH_PERFORMANCE=1)\")\n\n    # List languages mode\n    if args.list_languages:\n        print(f\"Available language+script codes for {args.source_dataset}:\\n\")\n        print(\"Common languages:\")\n        for code, name in COMMON_LANGUAGES.items():\n            print(f\"  {code:12} - {name}\")\n        print(\"\\nFetching full list from HF Hub...\")\n        all_langs = list_available_languages(args.source_dataset)\n        print(f\"\\nAll available ({len(all_langs)} total):\")\n        for lang in all_langs[:30]:  # Show first 30\n            name = COMMON_LANGUAGES.get(lang, \"\")\n            print(f\"  {lang:12} {name}\")\n        if len(all_langs) > 30:\n            print(f\"  ... 
and {len(all_langs) - 30} more\")\n        sys.exit(0)\n\n    # Build the parquet path\n    if args.all_languages:\n        source_path = f\"hf://datasets/{args.source_dataset}/data/*/train/*.parquet\"\n        scope_desc = \"all languages\"\n    else:\n        source_path = (\n            f\"hf://datasets/{args.source_dataset}/data/{args.lang}/train/*.parquet\"\n        )\n        scope_desc = f\"{args.lang} ({COMMON_LANGUAGES.get(args.lang, 'unknown')})\"\n\n    logger.info(f\"Scanning: {source_path}\")\n    logger.info(f\"Scope: {scope_desc}\")\n\n    # Create lazy frame - this doesn't load any data yet!\n    logger.info(\"Creating lazy query plan...\")\n    df = pl.scan_parquet(source_path)\n\n    # Apply limit if specified\n    if args.limit:\n        logger.info(f\"Limiting to first {args.limit:,} rows\")\n        df = df.head(args.limit)\n\n    # Show query plan if requested\n    if args.show_plan:\n        # Build a sample query to show the plan\n        sample_query = df.select(\n            pl.len(),\n            pl.col(\"token_count\").sum(),\n            pl.col(\"language\").n_unique(),\n        )\n        print(\"\\nQuery Plan (showing Polars optimization):\")\n        print(\"=\" * 60)\n        print(sample_query.explain())\n        print(\"=\" * 60)\n        print(\"\\nNote: Polars uses projection pushdown - only reads columns needed!\")\n        print(\"The 'text' column is never loaded, making this very fast.\\n\")\n\n    # Create output directory\n    output_dir = Path(args.output_dir)\n    output_dir.mkdir(parents=True, exist_ok=True)\n\n    # Single scan: compute temporal stats\n    logger.info(\"Computing temporal stats (single scan)...\")\n    start = time.perf_counter()\n    temporal_path = output_dir / \"temporal_stats.parquet\"\n    temporal_raw = compute_temporal_stats(df, temporal_path)\n    scan_time = time.perf_counter() - start\n    logger.info(f\"Scan complete in {scan_time:.2f}s - {len(temporal_raw)} dumps\")\n\n    # Compute stats\n 
   global_stats = compute_global_stats(temporal_raw)\n    temporal_stats = format_temporal_stats(temporal_raw)\n\n    # Save\n    global_stats.write_parquet(output_dir / \"global_stats.parquet\")\n    temporal_stats.write_parquet(output_dir / \"temporal_stats.parquet\")\n\n    # Print results\n    total_docs = global_stats[\"total_docs\"][0]\n    docs_per_sec = total_docs / scan_time if scan_time > 0 else 0\n\n    print(\"\\n\" + \"=\" * 70)\n    print(\"IS THE WEB GETTING MORE EDUCATIONAL?\")\n    print(\"=\" * 70)\n\n    print(f\"\\nScope: {scope_desc}\")\n    print(f\"Dataset: {args.source_dataset}\")\n\n    print(\"\\n\" + \"-\" * 70)\n    print(\"GLOBAL STATS\")\n    print(\"-\" * 70)\n    print(global_stats)\n\n    print(\"\\n\" + \"-\" * 70)\n    print(f\"TEMPORAL TREND ({len(temporal_stats)} CommonCrawl dumps)\")\n    print(\"-\" * 70)\n    # Show first 5 and last 5\n    if len(temporal_stats) > 10:\n        print(\"Earliest dumps:\")\n        print(temporal_stats.head(5))\n        print(\"\\n...\")\n        print(\"\\nLatest dumps:\")\n        print(temporal_stats.tail(5))\n    else:\n        print(temporal_stats)\n\n    # Create ASCII charts\n    ascii_charts = create_ascii_charts(temporal_stats)\n    print(\"\\n\" + \"-\" * 70)\n    print(\"TREND VISUALIZATION\")\n    print(\"-\" * 70)\n    print(ascii_charts)\n\n    print(\"\\n\" + \"-\" * 70)\n    print(\"PERFORMANCE\")\n    print(\"-\" * 70)\n    print(f\"Scan time: {scan_time:.2f}s\")\n    print(f\"Documents: {total_docs:,}\")\n    print(f\"Throughput: {docs_per_sec:,.0f} docs/sec\")\n\n    logger.info(f\"Results saved to: {output_dir}\")\n\n    # Upload to HF Hub if requested\n    if args.output_repo:\n        hf_token = args.hf_token or os.environ.get(\"HF_TOKEN\")\n        if hf_token:\n            login(token=hf_token)\n\n        api = HfApi(token=hf_token)\n\n        logger.info(f\"Creating/updating dataset repository: {args.output_repo}\")\n        create_repo(\n            args.output_repo,\n  
          repo_type=\"dataset\",\n            private=args.private,\n            token=hf_token,\n            exist_ok=True,\n        )\n\n        # Upload each as a dataset config\n        configs = [\n            (\"global_stats\", global_stats),\n            (\"temporal_stats\", temporal_stats),\n        ]\n\n        for config_name, stats_df in configs:\n            logger.info(f\"Uploading {config_name}...\")\n            ds = Dataset.from_polars(stats_df)\n            ds.push_to_hub(\n                args.output_repo,\n                config_name=config_name,\n                token=hf_token,\n                private=args.private,\n            )\n            time.sleep(1)  # Avoid 409 conflicts\n\n        # Upload README\n        readme_content = create_readme(\n            args, global_stats, temporal_stats, scan_time, ascii_charts\n        )\n        api.upload_file(\n            path_or_fileobj=readme_content.encode(),\n            path_in_repo=\"README.md\",\n            repo_id=args.output_repo,\n            repo_type=\"dataset\",\n            token=hf_token,\n        )\n\n        dataset_url = f\"https://huggingface.co/datasets/{args.output_repo}\"\n        logger.info(f\"Dataset uploaded: {dataset_url}\")\n        print(f\"\\nResults uploaded to: {dataset_url}\")\n\n\nif __name__ == \"__main__\":\n    if len(sys.argv) == 1:\n        print(\"Is the Web Getting More Educational?\")\n        print(\"=\" * 40)\n        print(\"\\nAnalyze educational quality trends across CommonCrawl dumps\")\n        print(\"using Polars streaming - no download needed!\\n\")\n        print(\"Example commands:\\n\")\n        print(\"# Quick test:\")\n        print(\"uv run finepdfs-stats.py --limit 10000\\n\")\n        print(\"# Analyze English PDFs:\")\n        print(\"uv run finepdfs-stats.py\\n\")\n        print(\"# Analyze ALL 70+ languages:\")\n        print(\"uv run finepdfs-stats.py --all-languages\\n\")\n        print(\"# Show query plan (see Polars 
optimization):\")\n        print(\"uv run finepdfs-stats.py --show-plan --limit 1000\\n\")\n        print(\"# Save results to HF Hub:\")\n        print(\"uv run finepdfs-stats.py --output-repo username/temporal-stats\\n\")\n        print(\"# Run on HF Jobs:\")\n        print(\"hf jobs uv run \\\\\")\n        print(\"    -s HF_TOKEN \\\\\")\n        print(\"    -e HF_XET_HIGH_PERFORMANCE=1 \\\\\")\n        print(\n            \"    https://huggingface.co/datasets/uv-scripts/dataset-stats/raw/main/finepdfs-stats.py \\\\\"\n        )\n        print(\"    -- --output-repo username/stats\")\n        sys.exit(0)\n\n    main()\n"
  },
  {
    "path": "skills/hugging-face-jobs/scripts/generate-responses.py",
    "content": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"datasets\",\n#     \"flashinfer-python\",\n#     \"huggingface-hub[hf_transfer]\",\n#     \"hf-xet>= 1.1.7\",\n#     \"torch\",\n#     \"transformers\",\n#     \"vllm>=0.8.5\",\n# ]\n#\n# ///\n\"\"\"\nGenerate responses for prompts in a dataset using vLLM for efficient GPU inference.\n\nThis script loads a dataset from Hugging Face Hub containing chat-formatted messages,\napplies the model's chat template, generates responses using vLLM, and saves the\nresults back to the Hub with a comprehensive dataset card.\n\nExample usage:\n    # Local execution with auto GPU detection\n    uv run generate-responses.py \\\\\n        username/input-dataset \\\\\n        username/output-dataset \\\\\n        --messages-column messages\n\n    # With custom model and sampling parameters\n    uv run generate-responses.py \\\\\n        username/input-dataset \\\\\n        username/output-dataset \\\\\n        --model-id meta-llama/Llama-3.1-8B-Instruct \\\\\n        --temperature 0.9 \\\\\n        --top-p 0.95 \\\\\n        --max-tokens 2048\n\n    # HF Jobs execution (see script output for full command)\n    hf jobs uv run --flavor a100x4 ...\n\"\"\"\n\nimport argparse\nimport logging\nimport os\nimport sys\nfrom datetime import datetime\nfrom typing import Optional\n\nfrom datasets import load_dataset\nfrom huggingface_hub import DatasetCard, get_token, login\nfrom torch import cuda\nfrom tqdm.auto import tqdm\nfrom transformers import AutoTokenizer\nfrom vllm import LLM, SamplingParams\n\n# Enable HF Transfer for faster downloads\nos.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\"\n\nlogging.basicConfig(\n    level=logging.INFO, format=\"%(asctime)s - %(levelname)s - %(message)s\"\n)\nlogger = logging.getLogger(__name__)\n\n\ndef check_gpu_availability() -> int:\n    \"\"\"Check if CUDA is available and return the number of GPUs.\"\"\"\n    if not cuda.is_available():\n        
logger.error(\"CUDA is not available. This script requires a GPU.\")\n        logger.error(\n            \"Please run on a machine with NVIDIA GPU or use HF Jobs with GPU flavor.\"\n        )\n        sys.exit(1)\n\n    num_gpus = cuda.device_count()\n    for i in range(num_gpus):\n        gpu_name = cuda.get_device_name(i)\n        gpu_memory = cuda.get_device_properties(i).total_memory / 1024**3\n        logger.info(f\"GPU {i}: {gpu_name} with {gpu_memory:.1f} GB memory\")\n\n    return num_gpus\n\n\ndef create_dataset_card(\n    source_dataset: str,\n    model_id: str,\n    messages_column: str,\n    prompt_column: Optional[str],\n    sampling_params: SamplingParams,\n    tensor_parallel_size: int,\n    num_examples: int,\n    generation_time: str,\n    num_skipped: int = 0,\n    max_model_len_used: Optional[int] = None,\n) -> str:\n    \"\"\"Create a comprehensive dataset card documenting the generation process.\"\"\"\n    filtering_section = \"\"\n    if num_skipped > 0:\n        skip_percentage = (num_skipped / num_examples) * 100\n        processed = num_examples - num_skipped\n        filtering_section = f\"\"\"\n\n### Filtering Statistics\n\n- **Total Examples**: {num_examples:,}\n- **Processed**: {processed:,} ({100 - skip_percentage:.1f}%)\n- **Skipped (too long)**: {num_skipped:,} ({skip_percentage:.1f}%)\n- **Max Model Length Used**: {max_model_len_used:,} tokens\n\nNote: Prompts exceeding the maximum model length were skipped and have empty responses.\"\"\"\n\n    return f\"\"\"---\ntags:\n- generated\n- vllm\n- uv-script\n---\n\n# Generated Responses Dataset\n\nThis dataset contains generated responses for prompts from [{source_dataset}](https://huggingface.co/datasets/{source_dataset}).\n\n## Generation Details\n\n- **Source Dataset**: [{source_dataset}](https://huggingface.co/datasets/{source_dataset})\n- **Input Column**: `{prompt_column if prompt_column else messages_column}` ({\"plain text prompts\" if prompt_column else \"chat messages\"})\n- 
**Model**: [{model_id}](https://huggingface.co/{model_id})\n- **Number of Examples**: {num_examples:,}\n- **Generation Date**: {generation_time}{filtering_section}\n\n### Sampling Parameters\n\n- **Temperature**: {sampling_params.temperature}\n- **Top P**: {sampling_params.top_p}\n- **Top K**: {sampling_params.top_k}\n- **Min P**: {sampling_params.min_p}\n- **Max Tokens**: {sampling_params.max_tokens}\n- **Repetition Penalty**: {sampling_params.repetition_penalty}\n\n### Hardware Configuration\n\n- **Tensor Parallel Size**: {tensor_parallel_size}\n- **GPU Configuration**: {tensor_parallel_size} GPU(s)\n\n## Dataset Structure\n\nThe dataset contains all columns from the source dataset plus:\n- `response`: The generated response from the model\n\n## Generation Script\n\nGenerated using the vLLM inference script from [uv-scripts/vllm](https://huggingface.co/datasets/uv-scripts/vllm).\n\nTo reproduce this generation:\n\n```bash\nuv run https://huggingface.co/datasets/uv-scripts/vllm/raw/main/generate-responses.py \\\\\n    {source_dataset} \\\\\n    <output-dataset> \\\\\n    --model-id {model_id} \\\\\n    {\"--prompt-column \" + prompt_column if prompt_column else \"--messages-column \" + messages_column} \\\\\n    --temperature {sampling_params.temperature} \\\\\n    --top-p {sampling_params.top_p} \\\\\n    --top-k {sampling_params.top_k} \\\\\n    --max-tokens {sampling_params.max_tokens}{\" --max-model-len \" + str(max_model_len_used) if max_model_len_used else \"\"}\n```\n\"\"\"\n\n\ndef main(\n    src_dataset_hub_id: str,\n    output_dataset_hub_id: str,\n    model_id: str = \"Qwen/Qwen3-30B-A3B-Instruct-2507\",\n    messages_column: str = \"messages\",\n    prompt_column: Optional[str] = None,\n    output_column: str = \"response\",\n    temperature: float = 0.7,\n    top_p: float = 0.8,\n    top_k: int = 20,\n    min_p: float = 0.0,\n    max_tokens: int = 16384,\n    repetition_penalty: float = 1.0,\n    gpu_memory_utilization: float = 0.90,\n    
max_model_len: Optional[int] = None,\n    tensor_parallel_size: Optional[int] = None,\n    skip_long_prompts: bool = True,\n    max_samples: Optional[int] = None,\n    hf_token: Optional[str] = None,\n):\n    \"\"\"\n    Main generation pipeline.\n\n    Args:\n        src_dataset_hub_id: Input dataset on Hugging Face Hub\n        output_dataset_hub_id: Where to save results on Hugging Face Hub\n        model_id: Hugging Face model ID for generation\n        messages_column: Column name containing chat messages\n        prompt_column: Column name containing plain text prompts (alternative to messages_column)\n        output_column: Column name for generated responses\n        temperature: Sampling temperature\n        top_p: Top-p sampling parameter\n        top_k: Top-k sampling parameter\n        min_p: Minimum probability threshold\n        max_tokens: Maximum tokens to generate\n        repetition_penalty: Repetition penalty parameter\n        gpu_memory_utilization: GPU memory utilization factor\n        max_model_len: Maximum model context length (None uses model default)\n        tensor_parallel_size: Number of GPUs to use (auto-detect if None)\n        skip_long_prompts: Skip prompts exceeding max_model_len instead of failing\n        max_samples: Maximum number of samples to process (None for all)\n        hf_token: Hugging Face authentication token\n    \"\"\"\n    generation_start_time = datetime.now().isoformat()\n\n    # GPU check and configuration\n    num_gpus = check_gpu_availability()\n    if tensor_parallel_size is None:\n        tensor_parallel_size = num_gpus\n        logger.info(\n            f\"Auto-detected {num_gpus} GPU(s), using tensor_parallel_size={tensor_parallel_size}\"\n        )\n    else:\n        logger.info(f\"Using specified tensor_parallel_size={tensor_parallel_size}\")\n        if tensor_parallel_size > num_gpus:\n            logger.warning(\n                f\"Requested {tensor_parallel_size} GPUs but only {num_gpus} 
available\"\n            )\n\n    # Authentication - try multiple methods\n    HF_TOKEN = hf_token or os.environ.get(\"HF_TOKEN\") or get_token()\n\n    if not HF_TOKEN:\n        logger.error(\"No HuggingFace token found. Please provide token via:\")\n        logger.error(\"  1. --hf-token argument\")\n        logger.error(\"  2. HF_TOKEN environment variable\")\n        logger.error(\"  3. Run 'hf auth login' or use login() in Python\")\n        sys.exit(1)\n\n    logger.info(\"HuggingFace token found, authenticating...\")\n    login(token=HF_TOKEN)\n\n    # Initialize vLLM\n    logger.info(f\"Loading model: {model_id}\")\n    vllm_kwargs = {\n        \"model\": model_id,\n        \"tensor_parallel_size\": tensor_parallel_size,\n        \"gpu_memory_utilization\": gpu_memory_utilization,\n    }\n    if max_model_len is not None:\n        vllm_kwargs[\"max_model_len\"] = max_model_len\n        logger.info(f\"Using max_model_len={max_model_len}\")\n\n    llm = LLM(**vllm_kwargs)\n\n    # Load tokenizer for chat template\n    logger.info(\"Loading tokenizer...\")\n    tokenizer = AutoTokenizer.from_pretrained(model_id)\n\n    # Create sampling parameters\n    sampling_params = SamplingParams(\n        temperature=temperature,\n        top_p=top_p,\n        top_k=top_k,\n        min_p=min_p,\n        max_tokens=max_tokens,\n        repetition_penalty=repetition_penalty,\n    )\n\n    # Load dataset\n    logger.info(f\"Loading dataset: {src_dataset_hub_id}\")\n    dataset = load_dataset(src_dataset_hub_id, split=\"train\")\n\n    # Apply max_samples if specified\n    if max_samples is not None and max_samples < len(dataset):\n        logger.info(f\"Limiting dataset to {max_samples} samples\")\n        dataset = dataset.select(range(max_samples))\n\n    total_examples = len(dataset)\n    logger.info(f\"Dataset loaded with {total_examples:,} examples\")\n\n    # Determine which column to use and validate\n    if prompt_column:\n        # Use prompt column mode\n        
if prompt_column not in dataset.column_names:\n            logger.error(\n                f\"Column '{prompt_column}' not found. Available columns: {dataset.column_names}\"\n            )\n            sys.exit(1)\n        logger.info(f\"Using prompt column mode with column: '{prompt_column}'\")\n        use_messages = False\n    else:\n        # Use messages column mode\n        if messages_column not in dataset.column_names:\n            logger.error(\n                f\"Column '{messages_column}' not found. Available columns: {dataset.column_names}\"\n            )\n            sys.exit(1)\n        logger.info(f\"Using messages column mode with column: '{messages_column}'\")\n        use_messages = True\n\n    # Get effective max length for filtering\n    if max_model_len is not None:\n        effective_max_len = max_model_len\n    else:\n        # Get model's default max length\n        effective_max_len = llm.llm_engine.model_config.max_model_len\n    logger.info(f\"Using effective max model length: {effective_max_len}\")\n\n    # Process messages and apply chat template\n    logger.info(\"Preparing prompts...\")\n    all_prompts = []\n    valid_prompts = []\n    valid_indices = []\n    skipped_info = []\n\n    for i, example in enumerate(tqdm(dataset, desc=\"Processing prompts\")):\n        if use_messages:\n            # Messages mode: use existing chat messages\n            messages = example[messages_column]\n            # Apply chat template\n            prompt = tokenizer.apply_chat_template(\n                messages, tokenize=False, add_generation_prompt=True\n            )\n        else:\n            # Prompt mode: convert plain text to messages format\n            user_prompt = example[prompt_column]\n            messages = [{\"role\": \"user\", \"content\": user_prompt}]\n            # Apply chat template\n            prompt = tokenizer.apply_chat_template(\n                messages, tokenize=False, add_generation_prompt=True\n            )\n\n       
 all_prompts.append(prompt)\n\n        # Count tokens if filtering is enabled\n        if skip_long_prompts:\n            tokens = tokenizer.encode(prompt)\n            if len(tokens) <= effective_max_len:\n                valid_prompts.append(prompt)\n                valid_indices.append(i)\n            else:\n                skipped_info.append((i, len(tokens)))\n        else:\n            valid_prompts.append(prompt)\n            valid_indices.append(i)\n\n    # Log filtering results\n    if skip_long_prompts and skipped_info:\n        logger.warning(\n            f\"Skipped {len(skipped_info)} prompts that exceed max_model_len ({effective_max_len} tokens)\"\n        )\n        logger.info(\"Skipped prompt details (first 10):\")\n        for idx, (prompt_idx, token_count) in enumerate(skipped_info[:10]):\n            logger.info(\n                f\"  - Example {prompt_idx}: {token_count} tokens (exceeds by {token_count - effective_max_len})\"\n            )\n        if len(skipped_info) > 10:\n            logger.info(f\"  ... 
and {len(skipped_info) - 10} more\")\n\n        skip_percentage = (len(skipped_info) / total_examples) * 100\n        if skip_percentage > 10:\n            logger.warning(f\"WARNING: {skip_percentage:.1f}% of prompts were skipped!\")\n\n    if not valid_prompts:\n        logger.error(\"No valid prompts to process after filtering!\")\n        sys.exit(1)\n\n    # Generate responses - vLLM handles batching internally\n    logger.info(f\"Starting generation for {len(valid_prompts):,} valid prompts...\")\n    logger.info(\"vLLM will handle batching and scheduling automatically\")\n\n    outputs = llm.generate(valid_prompts, sampling_params)\n\n    # Extract generated text and create full response list\n    logger.info(\"Extracting generated responses...\")\n    responses = [\"\"] * total_examples  # Initialize with empty strings\n\n    for idx, output in enumerate(outputs):\n        original_idx = valid_indices[idx]\n        response = output.outputs[0].text.strip()\n        responses[original_idx] = response\n\n    # Add responses to dataset\n    logger.info(\"Adding responses to dataset...\")\n    dataset = dataset.add_column(output_column, responses)\n\n    # Create dataset card\n    logger.info(\"Creating dataset card...\")\n    card_content = create_dataset_card(\n        source_dataset=src_dataset_hub_id,\n        model_id=model_id,\n        messages_column=messages_column,\n        prompt_column=prompt_column,\n        sampling_params=sampling_params,\n        tensor_parallel_size=tensor_parallel_size,\n        num_examples=total_examples,\n        generation_time=generation_start_time,\n        num_skipped=len(skipped_info) if skip_long_prompts else 0,\n        max_model_len_used=effective_max_len if skip_long_prompts else None,\n    )\n\n    # Push dataset to hub\n    logger.info(f\"Pushing dataset to: {output_dataset_hub_id}\")\n    dataset.push_to_hub(output_dataset_hub_id, token=HF_TOKEN)\n\n    # Push dataset card\n    card = DatasetCard(card_content)\n    
card.push_to_hub(output_dataset_hub_id, token=HF_TOKEN)\n\n    logger.info(\"✅ Generation complete!\")\n    logger.info(\n        f\"Dataset available at: https://huggingface.co/datasets/{output_dataset_hub_id}\"\n    )\n\n\nif __name__ == \"__main__\":\n    if len(sys.argv) > 1:\n        parser = argparse.ArgumentParser(\n            description=\"Generate responses for dataset prompts using vLLM\",\n            formatter_class=argparse.RawDescriptionHelpFormatter,\n            epilog=\"\"\"\nExamples:\n  # Basic usage with default Qwen model\n  uv run generate-responses.py input-dataset output-dataset\n  \n  # With custom model and parameters\n  uv run generate-responses.py input-dataset output-dataset \\\\\n    --model-id meta-llama/Llama-3.1-8B-Instruct \\\\\n    --temperature 0.9 \\\\\n    --max-tokens 2048\n  \n  # Force specific GPU configuration\n  uv run generate-responses.py input-dataset output-dataset \\\\\n    --tensor-parallel-size 2 \\\\\n    --gpu-memory-utilization 0.95\n  \n  # Using environment variable for token\n  HF_TOKEN=hf_xxx uv run generate-responses.py input-dataset output-dataset\n            \"\"\",\n        )\n\n        parser.add_argument(\n            \"src_dataset_hub_id\",\n            help=\"Input dataset on Hugging Face Hub (e.g., username/dataset-name)\",\n        )\n        parser.add_argument(\n            \"output_dataset_hub_id\", help=\"Output dataset name on Hugging Face Hub\"\n        )\n        parser.add_argument(\n            \"--model-id\",\n            type=str,\n            default=\"Qwen/Qwen3-30B-A3B-Instruct-2507\",\n            help=\"Model to use for generation (default: Qwen3-30B-A3B-Instruct-2507)\",\n        )\n        parser.add_argument(\n            \"--messages-column\",\n            type=str,\n            default=\"messages\",\n            help=\"Column containing chat messages (default: messages)\",\n        )\n        parser.add_argument(\n            \"--prompt-column\",\n            type=str,\n      
      help=\"Column containing plain text prompts (alternative to --messages-column)\",\n        )\n        parser.add_argument(\n            \"--output-column\",\n            type=str,\n            default=\"response\",\n            help=\"Column name for generated responses (default: response)\",\n        )\n        parser.add_argument(\n            \"--max-samples\",\n            type=int,\n            help=\"Maximum number of samples to process (default: all)\",\n        )\n        parser.add_argument(\n            \"--temperature\",\n            type=float,\n            default=0.7,\n            help=\"Sampling temperature (default: 0.7)\",\n        )\n        parser.add_argument(\n            \"--top-p\",\n            type=float,\n            default=0.8,\n            help=\"Top-p sampling parameter (default: 0.8)\",\n        )\n        parser.add_argument(\n            \"--top-k\",\n            type=int,\n            default=20,\n            help=\"Top-k sampling parameter (default: 20)\",\n        )\n        parser.add_argument(\n            \"--min-p\",\n            type=float,\n            default=0.0,\n            help=\"Minimum probability threshold (default: 0.0)\",\n        )\n        parser.add_argument(\n            \"--max-tokens\",\n            type=int,\n            default=16384,\n            help=\"Maximum tokens to generate (default: 16384)\",\n        )\n        parser.add_argument(\n            \"--repetition-penalty\",\n            type=float,\n            default=1.0,\n            help=\"Repetition penalty (default: 1.0)\",\n        )\n        parser.add_argument(\n            \"--gpu-memory-utilization\",\n            type=float,\n            default=0.90,\n            help=\"GPU memory utilization factor (default: 0.90)\",\n        )\n        parser.add_argument(\n            \"--max-model-len\",\n            type=int,\n            help=\"Maximum model context length (default: model's default)\",\n        )\n        
parser.add_argument(\n            \"--tensor-parallel-size\",\n            type=int,\n            help=\"Number of GPUs to use (default: auto-detect)\",\n        )\n        parser.add_argument(\n            \"--hf-token\",\n            type=str,\n            help=\"Hugging Face token (can also use HF_TOKEN env var)\",\n        )\n        parser.add_argument(\n            \"--skip-long-prompts\",\n            action=\"store_true\",\n            default=True,\n            help=\"Skip prompts that exceed max_model_len instead of failing (default: True)\",\n        )\n        parser.add_argument(\n            \"--no-skip-long-prompts\",\n            dest=\"skip_long_prompts\",\n            action=\"store_false\",\n            help=\"Fail on prompts that exceed max_model_len\",\n        )\n\n        args = parser.parse_args()\n\n        main(\n            src_dataset_hub_id=args.src_dataset_hub_id,\n            output_dataset_hub_id=args.output_dataset_hub_id,\n            model_id=args.model_id,\n            messages_column=args.messages_column,\n            prompt_column=args.prompt_column,\n            output_column=args.output_column,\n            temperature=args.temperature,\n            top_p=args.top_p,\n            top_k=args.top_k,\n            min_p=args.min_p,\n            max_tokens=args.max_tokens,\n            repetition_penalty=args.repetition_penalty,\n            gpu_memory_utilization=args.gpu_memory_utilization,\n            max_model_len=args.max_model_len,\n            tensor_parallel_size=args.tensor_parallel_size,\n            skip_long_prompts=args.skip_long_prompts,\n            max_samples=args.max_samples,\n            hf_token=args.hf_token,\n        )\n    else:\n        # Show HF Jobs example when run without arguments\n        print(\"\"\"\nvLLM Response Generation Script\n==============================\n\nThis script requires arguments. 
For usage information:\n    uv run generate-responses.py --help\n\nExample HF Jobs command with multi-GPU:\n    # If you're logged in via `hf auth login`, your token will be auto-detected\n    hf jobs uv run \\\\\n        --flavor l4x4 \\\\\n        https://huggingface.co/datasets/uv-scripts/vllm/raw/main/generate-responses.py \\\\\n        username/input-dataset \\\\\n        username/output-dataset \\\\\n        --messages-column messages \\\\\n        --model-id Qwen/Qwen3-30B-A3B-Instruct-2507 \\\\\n        --temperature 0.7 \\\\\n        --max-tokens 16384\n        \"\"\")\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/SKILL.md",
    "content": "---\nname: hugging-face-model-trainer\ndescription: This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.\nlicense: Complete terms in LICENSE.txt\n---\n\n# TRL Training on Hugging Face Jobs\n\n## Overview\n\nTrain language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.\n\n**TRL provides multiple training methods:**\n- **SFT** (Supervised Fine-Tuning) - Standard instruction tuning\n- **DPO** (Direct Preference Optimization) - Alignment from preference data\n- **GRPO** (Group Relative Policy Optimization) - Online RL training\n- **Reward Modeling** - Train reward models for RLHF\n\n**For detailed TRL method documentation:**\n```python\nhf_doc_search(\"your query\", product=\"trl\")\nhf_doc_fetch(\"https://huggingface.co/docs/trl/sft_trainer\")  # SFT\nhf_doc_fetch(\"https://huggingface.co/docs/trl/dpo_trainer\")  # DPO\n# etc.\n```\n\n**See also:** `references/training_methods.md` for method overviews and selection guidance\n\n## When to Use This Skill\n\nUse this skill when users want to:\n- Fine-tune language models on cloud GPUs without local infrastructure\n- Train with TRL methods (SFT, DPO, GRPO, etc.)\n- Run training jobs on Hugging Face Jobs infrastructure\n- Convert trained models to GGUF for local deployment 
(Ollama, LM Studio, llama.cpp)\n- Ensure trained models are permanently saved to the Hub\n- Use modern workflows with optimized defaults\n\n### When to Use Unsloth\n\nUse **Unsloth** (`references/unsloth.md`) instead of standard TRL when:\n- **Limited GPU memory** - Unsloth uses ~60% less VRAM\n- **Speed matters** - Unsloth is ~2x faster\n- Training **large models (>13B)** - memory efficiency is critical\n- Training **Vision-Language Models (VLMs)** - Unsloth has `FastVisionModel` support\n\nSee `references/unsloth.md` for complete Unsloth documentation and `scripts/unsloth_sft_example.py` for a production-ready training script.\n\n## Key Directives\n\nWhen assisting with training jobs:\n\n1. **ALWAYS use `hf_jobs()` MCP tool** - Submit jobs using `hf_jobs(\"uv\", {...})`, NOT bash `trl-jobs` commands. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`. If user asks to \"train a model\", \"fine-tune\", or similar requests, you MUST create the training script AND submit the job immediately using `hf_jobs()`.\n\n2. **Always include Trackio** - Every training script should include Trackio for real-time monitoring. Use example scripts in `scripts/` as templates.\n\n3. **Provide job details after submission** - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.\n\n4. **Use example scripts as templates** - Reference `scripts/train_sft_example.py`, `scripts/train_dpo_example.py`, etc. as starting points.\n\n## Local Script Execution\n\nRepository scripts use PEP 723 inline dependencies. 
Run them with `uv run`:\n```bash\nuv run scripts/estimate_cost.py --help\nuv run scripts/dataset_inspector.py --help\n```\n\n## Prerequisites Checklist\n\nBefore starting any training job, verify:\n\n### ✅ **Account & Authentication**\n- Hugging Face Account with [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require paid plan)\n- Authenticated login: Check with `hf_whoami()`\n- **HF_TOKEN for Hub Push** ⚠️ CRITICAL - Training environment is ephemeral, must push to Hub or ALL training results are lost\n- Token must have write permissions  \n- **MUST pass `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` in job config** to make token available (the `$HF_TOKEN` syntax\n  references your actual token value)\n\n### ✅ **Dataset Requirements**\n- Dataset must exist on Hub or be loadable via `datasets.load_dataset()`\n- Format must match training method (SFT: \"messages\"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)\n- **ALWAYS validate unknown datasets** before GPU training to prevent format failures (see Dataset Validation section below)\n- Size appropriate for hardware (Demo: 50-100 examples on t4-small; Production: 1K-10K+ on a10g-large/a100-large)\n\n### ⚠️ **Critical Settings**\n- **Timeout must exceed expected training time** - Default 30min is TOO SHORT for most training. Minimum recommended: 1-2 hours. Job fails and loses all progress if timeout is exceeded.\n- **Hub push must be enabled** - Config: `push_to_hub=True`, `hub_model_id=\"username/model-name\"`; Job: `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`\n\n## Asynchronous Job Guidelines\n\n**⚠️ IMPORTANT: Training jobs run asynchronously and can take hours**\n\n### Action Required\n\n**When user requests training:**\n1. **Create the training script** with Trackio included (use `scripts/train_sft_example.py` as template)\n2. 
**Submit immediately** using `hf_jobs()` MCP tool with script content inline - don't save to file unless user requests\n3. **Report submission** with job ID, monitoring URL, and estimated time\n4. **Wait for user** to request status checks - don't poll automatically\n\n### Ground Rules\n- **Jobs run in background** - Submission returns immediately; training continues independently\n- **Initial logs delayed** - Can take 30-60 seconds for logs to appear\n- **User checks status** - Wait for user to request status updates\n- **Avoid polling** - Check logs only on user request; provide monitoring links instead\n\n### After Submission\n\n**Provide to user:**\n- ✅ Job ID and monitoring URL\n- ✅ Expected completion time\n- ✅ Trackio dashboard URL\n- ✅ Note that user can request status checks later\n\n**Example Response:**\n```\n✅ Job submitted successfully!\n\nJob ID: abc123xyz\nMonitor: https://huggingface.co/jobs/username/abc123xyz\n\nExpected time: ~2 hours\nEstimated cost: ~$10\n\nThe job is running in the background. Ask me to check status/logs when ready!\n```\n\n## Quick Start: Four Approaches\n\n**💡 Tip for Demos:** For quick demos on smaller GPUs (t4-small), omit `eval_dataset` and `eval_strategy` to save ~40% memory. You'll still see training loss and learning progress.\n\n### Sequence Length Configuration\n\n**TRL config classes use `max_length` (not `max_seq_length`)** to control tokenized sequence length:\n\n```python\n# ✅ CORRECT - If you need to set sequence length\nSFTConfig(max_length=512)   # Truncate sequences to 512 tokens\nDPOConfig(max_length=2048)  # Longer context (2048 tokens)\n\n# ❌ WRONG - This parameter doesn't exist\nSFTConfig(max_seq_length=512)  # TypeError!\n```\n\n**Default behavior:** `max_length=1024` (truncates from right). 
This works well for most training.\n\n**When to override:**\n- **Longer context**: Set higher (e.g., `max_length=2048`)\n- **Memory constraints**: Set lower (e.g., `max_length=512`)\n- **Vision models**: Set `max_length=None` (prevents cutting image tokens)\n\n**Usually you don't need to set this parameter at all** - the examples below use the sensible default.\n\n### Approach 1: UV Scripts (Recommended—Default Choice)\n\nUV scripts use PEP 723 inline dependencies for clean, self-contained training. **This is the primary approach for Claude Code.**\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"trl>=0.12.0\", \"peft>=0.7.0\", \"trackio\"]\n# ///\n\nfrom datasets import load_dataset\nfrom peft import LoraConfig\nfrom trl import SFTTrainer, SFTConfig\nimport trackio\n\ndataset = load_dataset(\"trl-lib/Capybara\", split=\"train\")\n\n# Create train/eval split for monitoring\ndataset_split = dataset.train_test_split(test_size=0.1, seed=42)\n\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset_split[\"train\"],\n    eval_dataset=dataset_split[\"test\"],\n    peft_config=LoraConfig(r=16, lora_alpha=32),\n    args=SFTConfig(\n        output_dir=\"my-model\",\n        push_to_hub=True,\n        hub_model_id=\"username/my-model\",\n        num_train_epochs=3,\n        eval_strategy=\"steps\",\n        eval_steps=50,\n        report_to=\"trackio\",\n        project=\"meaningful_project_name\",  # project name used to group related runs (trackio)\n        run_name=\"meaningful_run_name\",     # descriptive name for this specific training run (trackio)\n    )\n)\n\ntrainer.train()\ntrainer.push_to_hub()\n\"\"\",\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"2h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control\n**When to use:** Default choice for all training tasks in 
Claude Code, custom training logic, any scenario requiring `hf_jobs()`\n\n#### Working with Scripts\n\n⚠️ **Important:** The `script` parameter accepts either inline code (as shown above) OR a URL. **Local file paths do NOT work.**\n\n**Why local paths don't work:**\nJobs run in isolated Docker containers without access to your local filesystem. Scripts must be:\n- Inline code (recommended for custom training)\n- Publicly accessible URLs\n- Private repo URLs (with HF_TOKEN)\n\n**Common mistakes:**\n```python\n# ❌ These will all fail\nhf_jobs(\"uv\", {\"script\": \"train.py\"})\nhf_jobs(\"uv\", {\"script\": \"./scripts/train.py\"})\nhf_jobs(\"uv\", {\"script\": \"/path/to/train.py\"})\n```\n\n**Correct approaches:**\n```python\n# ✅ Inline code (recommended)\nhf_jobs(\"uv\", {\"script\": \"# /// script\\n# dependencies = [...]\\n# ///\\n\\n<your code>\"})\n\n# ✅ From Hugging Face Hub\nhf_jobs(\"uv\", {\"script\": \"https://huggingface.co/user/repo/resolve/main/train.py\"})\n\n# ✅ From GitHub\nhf_jobs(\"uv\", {\"script\": \"https://raw.githubusercontent.com/user/repo/main/train.py\"})\n\n# ✅ From Gist\nhf_jobs(\"uv\", {\"script\": \"https://gist.githubusercontent.com/user/id/raw/train.py\"})\n```\n\n**To use local scripts:** Upload to HF Hub first:\n```bash\nhf repos create my-training-scripts --type model\nhf upload my-training-scripts ./train.py train.py\n# Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py\n```\n\n### Approach 2: TRL Maintained Scripts (Official Examples)\n\nTRL provides battle-tested scripts for all methods. 
They can be run directly from raw file URLs:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py\",\n    \"script_args\": [\n        \"--model_name_or_path\", \"Qwen/Qwen2.5-0.5B\",\n        \"--dataset_name\", \"trl-lib/Capybara\",\n        \"--output_dir\", \"my-model\",\n        \"--push_to_hub\",\n        \"--hub_model_id\", \"username/my-model\"\n    ],\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"2h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n**Benefits:** No code to write, maintained by TRL team, production-tested\n**When to use:** Standard TRL training, quick experiments, no custom code needed\n**Available:** Scripts are available from https://github.com/huggingface/trl/tree/main/examples/scripts\n\n### Finding More UV Scripts on Hub\n\nThe `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:\n\n```python\n# Discover available UV script collections\ndataset_search({\"author\": \"uv-scripts\", \"sort\": \"downloads\", \"limit\": 20})\n\n# Explore a specific collection\nhub_repo_details([\"uv-scripts/classification\"], repo_type=\"dataset\", include_readme=True)\n```\n\n**Popular collections:** ocr, classification, synthetic-data, vllm, dataset-creation\n\n### Approach 3: HF Jobs CLI (Direct Terminal Commands)\n\nWhen the `hf_jobs()` MCP tool is unavailable, use the `hf jobs` CLI directly.\n\n**⚠️ CRITICAL: CLI Syntax Rules**\n\n```bash\n# ✅ CORRECT syntax - flags BEFORE script URL\nhf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN \"https://example.com/train.py\"\n\n# ❌ WRONG - \"run uv\" instead of \"uv run\"\nhf jobs run uv \"https://example.com/train.py\" --flavor a10g-large\n\n# ❌ WRONG - flags AFTER script URL (will be ignored!)\nhf jobs uv run \"https://example.com/train.py\" --flavor a10g-large\n\n# ❌ WRONG - \"--secret\" instead of \"--secrets\" (plural)\nhf jobs uv run --secret HF_TOKEN 
\"https://example.com/train.py\"\n```\n\n**Key syntax rules:**\n1. Command order is `hf jobs uv run` (NOT `hf jobs run uv`)\n2. All flags (`--flavor`, `--timeout`, `--secrets`) must come BEFORE the script URL\n3. Use `--secrets` (plural), not `--secret`\n4. Script URL must be the last positional argument\n\n**Complete CLI example:**\n```bash\nhf jobs uv run \\\n  --flavor a10g-large \\\n  --timeout 2h \\\n  --secrets HF_TOKEN \\\n  \"https://huggingface.co/user/repo/resolve/main/train.py\"\n```\n\n**Check job status via CLI:**\n```bash\nhf jobs ps                        # List all jobs\nhf jobs logs <job-id>             # View logs\nhf jobs inspect <job-id>          # Job details\nhf jobs cancel <job-id>           # Cancel a job\n```\n\n### Approach 4: TRL Jobs Package (Simplified Training)\n\nThe `trl-jobs` package provides optimized defaults and one-liner training.\n\n```bash\nuvx trl-jobs sft \\\n  --model_name Qwen/Qwen2.5-0.5B \\\n  --dataset_name trl-lib/Capybara\n\n```\n\n**Benefits:** Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands\n**When to use:** User working in terminal directly (not Claude Code context), quick local experimentation\n**Repository:** https://github.com/huggingface/trl-jobs\n\n⚠️ **In Claude Code context, prefer using `hf_jobs()` MCP tool (Approach 1) when available.**\n\n## Hardware Selection\n\n| Model Size | Recommended Hardware | Cost (approx/hr) | Use Case |\n|------------|---------------------|------------------|----------|\n| <1B params | `t4-small` | ~$0.75 | Demos, quick tests only without eval steps |\n| 1-3B params | `t4-medium`, `l4x1` | ~$1.50-2.50 | Development |\n| 3-7B params | `a10g-small`, `a10g-large` | ~$3.50-5.00 | Production training |\n| 7-13B params | `a10g-large`, `a100-large` | ~$5-10 | Large models (use LoRA) |\n| 13B+ params | `a100-large`, `a10g-largex2` | ~$10-20 | Very large (use LoRA) |\n\n**GPU Flavors:** cpu-basic/upgrade/performance/xl, t4-small/medium, 
l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8\n\n**Guidelines:**\n- Use **LoRA/PEFT** for models >7B to reduce memory\n- Multi-GPU automatically handled by TRL/Accelerate\n- Start with smaller hardware for testing\n\n**See:** `references/hardware_guide.md` for detailed specifications\n\n## Critical: Saving Results to Hub\n\n**⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB**\n\nThe Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, **ALL TRAINING IS LOST**.\n\n### Required Configuration\n\n**In training script/config:**\n```python\nSFTConfig(\n    push_to_hub=True,\n    hub_model_id=\"username/model-name\",  # MUST specify\n    hub_strategy=\"every_save\",  # Optional: push checkpoints\n)\n```\n\n**In job submission:**\n```python\n{\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # Enables authentication\n}\n```\n\n### Verification Checklist\n\nBefore submitting:\n- [ ] `push_to_hub=True` set in config\n- [ ] `hub_model_id` includes username/repo-name\n- [ ] `secrets` parameter includes HF_TOKEN\n- [ ] User has write access to target repo\n\n**See:** `references/hub_saving.md` for detailed troubleshooting\n\n## Timeout Management\n\n**⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING**\n\n### Setting Timeouts\n\n```python\n{\n    \"timeout\": \"2h\"   # 2 hours (formats: \"90m\", \"2h\", \"1.5h\", or seconds as integer)\n}\n```\n\n### Timeout Guidelines\n\n| Scenario | Recommended | Notes |\n|----------|-------------|-------|\n| Quick demo (50-100 examples) | 10-30 min | Verify setup |\n| Development training | 1-2 hours | Small datasets |\n| Production (3-7B model) | 4-6 hours | Full datasets |\n| Large model with LoRA | 3-6 hours | Depends on dataset |\n\n**Always add 20-30% buffer** for model/dataset loading, checkpoint saving, Hub push operations, and network delays.\n\n**On timeout:** Job killed immediately, all unsaved progress lost, must restart from beginning\n\n## Cost Estimation\n\n**Offer 
to estimate cost when planning jobs with known parameters.** Use `scripts/estimate_cost.py`:\n\n```bash\nuv run scripts/estimate_cost.py \\\n  --model meta-llama/Llama-2-7b-hf \\\n  --dataset trl-lib/Capybara \\\n  --hardware a10g-large \\\n  --dataset-size 16000 \\\n  --epochs 3\n```\n\nOutput includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.\n\n**When to offer:** User is planning a job, asks about cost/time, is choosing hardware, or the job will run >1 hour or cost >$5\n\n## Example Training Scripts\n\n**Production-ready templates with all best practices:**\n\nLoad these scripts as starting points:\n\n- **`scripts/train_sft_example.py`** - Complete SFT training with Trackio, LoRA, checkpoints\n- **`scripts/train_dpo_example.py`** - DPO training for preference learning\n- **`scripts/train_grpo_example.py`** - GRPO training for online RL\n\nThese scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to `hf_jobs()` or use as templates for custom scripts.\n\n## Monitoring and Tracking\n\n**Trackio** provides real-time metrics visualization. 
See `references/trackio_guide.md` for the complete setup guide.\n\n**Key points:**\n- Add `trackio` to dependencies\n- Configure trainer with `report_to=\"trackio\"` and `run_name=\"meaningful_name\"`\n\n### Trackio Configuration Defaults\n\n**Use sensible defaults unless user specifies otherwise.** When generating training scripts with Trackio:\n\n**Default Configuration:**\n- **Space ID**: `{username}/trackio` (use \"trackio\" as default space name)\n- **Run naming**: Unless otherwise specified, name the run in a way the user will recognize (e.g., descriptive of the task, model, or purpose)\n- **Config**: Keep minimal - only include hyperparameters and model/dataset info\n- **Project Name**: Use a project name to group related runs under one project\n\n**User overrides:** If user requests specific trackio configuration (custom space, run naming, grouping, or additional config), apply their preferences instead of defaults.\n\nThis is useful for managing multiple jobs with the same configuration or keeping training scripts portable.\n\nSee `references/trackio_guide.md` for complete documentation including grouping runs for experiments.\n\n### Check Job Status\n\n```python\n# List all jobs\nhf_jobs(\"ps\")\n\n# Inspect specific job\nhf_jobs(\"inspect\", {\"job_id\": \"your-job-id\"})\n\n# View logs\nhf_jobs(\"logs\", {\"job_id\": \"your-job-id\"})\n```\n\n**Remember:** Wait for user to request status checks. 
Avoid polling repeatedly.\n\n## Dataset Validation\n\n**Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.**\n\n### Why Validate\n\n- 50%+ of training failures are due to dataset format issues\n- DPO especially strict: requires exact column names (`prompt`, `chosen`, `rejected`)\n- Failed GPU jobs waste $1-10 and 30-60 minutes\n- Validation on CPU costs ~$0.01 and takes <1 minute\n\n### When to Validate\n\n**ALWAYS validate for:**\n- Unknown or custom datasets\n- DPO training (CRITICAL - 90% of datasets need mapping)\n- Any dataset not explicitly TRL-compatible\n\n**Skip validation for known TRL datasets:**\n- `trl-lib/ultrachat_200k`, `trl-lib/Capybara`, `HuggingFaceH4/ultrachat_200k`, etc.\n\n### Usage\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py\",\n    \"script_args\": [\"--dataset\", \"username/dataset-name\", \"--split\", \"train\"]\n})\n```\n\nThe script is fast, and will usually complete synchronously.\n\n### Reading Results\n\nThe output shows compatibility for each training method:\n\n- **`✓ READY`** - Dataset is compatible, use directly\n- **`✗ NEEDS MAPPING`** - Compatible but needs preprocessing (mapping code provided)\n- **`✗ INCOMPATIBLE`** - Cannot be used for this method\n\nWhen mapping is needed, the output includes a **\"MAPPING CODE\"** section with copy-paste ready Python code.\n\n### Example Workflow\n\n```python\n# 1. Inspect dataset (costs ~$0.01, <1 min on CPU)\nhf_jobs(\"uv\", {\n    \"script\": \"https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py\",\n    \"script_args\": [\"--dataset\", \"argilla/distilabel-math-preference-dpo\", \"--split\", \"train\"]\n})\n\n# 2. Check output markers:\n#    ✓ READY → proceed with training\n#    ✗ NEEDS MAPPING → apply mapping code below\n#    ✗ INCOMPATIBLE → choose different method/dataset\n\n# 3. 
If mapping needed, apply before training:\ndef format_for_dpo(example):\n    return {\n        'prompt': example['instruction'],\n        'chosen': example['chosen_response'],\n        'rejected': example['rejected_response'],\n    }\ndataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)\n\n# 4. Launch training job with confidence\n```\n\n### Common Scenario: DPO Format Mismatch\n\nMost DPO datasets use non-standard column names. Example:\n\n```\nDataset has: instruction, chosen_response, rejected_response\nDPO expects: prompt, chosen, rejected\n```\n\nThe validator detects this and provides exact mapping code to fix it.\n\n## Converting Models to GGUF\n\nAfter training, convert models to **GGUF format** for use with llama.cpp, Ollama, LM Studio, and other local inference tools.\n\n**What is GGUF:**\n- Optimized for CPU/GPU inference with llama.cpp\n- Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size\n- Compatible with Ollama, LM Studio, Jan, GPT4All, llama.cpp\n- Typically 2-8GB for 7B models (vs 14GB unquantized)\n\n**When to convert:**\n- Running models locally with Ollama or LM Studio\n- Reducing model size with quantization\n- Deploying to edge devices\n- Sharing models for local-first use\n\n**See:** `references/gguf_conversion.md` for complete conversion guide, including production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.\n\n**Quick conversion:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"<see references/gguf_conversion.md for complete script>\",\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"45m\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n    \"env\": {\n        \"ADAPTER_MODEL\": \"username/my-finetuned-model\",\n        \"BASE_MODEL\": \"Qwen/Qwen2.5-0.5B\",\n        \"OUTPUT_REPO\": \"username/my-model-gguf\"\n    }\n})\n```\n\n## Common Training Patterns\n\nSee `references/training_patterns.md` for detailed examples including:\n- 
Quick demo (5-10 minutes)\n- Production with checkpoints\n- Multi-GPU training\n- DPO training (preference learning)\n- GRPO training (online RL)\n\n## Common Failure Modes\n\n### Out of Memory (OOM)\n\n**Fix (try in order):**\n1. Reduce batch size: `per_device_train_batch_size=1`, increase `gradient_accumulation_steps=8`. Effective batch size is `per_device_train_batch_size` x `gradient_accumulation_steps`. For best performance keep effective batch size close to 128. \n2. Enable: `gradient_checkpointing=True`\n3. Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large etc. \n\n### Dataset Misformatted\n\n**Fix:**\n1. Validate first with dataset inspector:\n   ```bash\n   uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \\\n     --dataset name --split train\n   ```\n2. Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)\n3. Apply mapping code from inspector output if needed\n\n### Job Timeout\n\n**Fix:**\n1. Check logs for actual runtime: `hf_jobs(\"logs\", {\"job_id\": \"...\"})`\n2. Increase timeout with buffer: `\"timeout\": \"3h\"` (add 30% to estimated time)\n3. Or reduce training: lower `num_train_epochs`, use smaller dataset, enable `max_steps`\n4. Save checkpoints: `save_strategy=\"steps\"`, `save_steps=500`, `hub_strategy=\"every_save\"`\n\n**Note:** Default 30min is insufficient for real training. Minimum 1-2 hours.\n\n### Hub Push Failures\n\n**Fix:**\n1. Add to job: `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`\n2. Add to config: `push_to_hub=True`, `hub_model_id=\"username/model-name\"`\n3. Verify auth: `mcp__huggingface__hf_whoami()`\n4. 
Check token has write permissions and repo exists (or set `hub_private_repo=True`)\n\n### Missing Dependencies\n\n**Fix:**\nAdd to PEP 723 header:\n```python\n# /// script\n# dependencies = [\"trl>=0.12.0\", \"peft>=0.7.0\", \"trackio\", \"missing-package\"]\n# ///\n```\n\n## Troubleshooting\n\n**Common issues:**\n- Job times out → Increase timeout, reduce epochs/dataset, use smaller model/LoRA\n- Model not saved to Hub → Check push_to_hub=True, hub_model_id, secrets=HF_TOKEN\n- Out of Memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use larger GPU\n- Dataset format error → Validate with dataset inspector (see Dataset Validation section)\n- Import/module errors → Add PEP 723 header with dependencies, verify format\n- Authentication errors → Check `mcp__huggingface__hf_whoami()`, token permissions, secrets parameter\n\n**See:** `references/troubleshooting.md` for complete troubleshooting guide\n\n## Resources\n\n### References (In This Skill)\n- `references/training_methods.md` - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling\n- `references/training_patterns.md` - Common training patterns and examples\n- `references/unsloth.md` - Unsloth for fast VLM training (~2x speed, 60% less VRAM)\n- `references/gguf_conversion.md` - Complete GGUF conversion guide\n- `references/trackio_guide.md` - Trackio monitoring setup\n- `references/hardware_guide.md` - Hardware specs and selection\n- `references/hub_saving.md` - Hub authentication troubleshooting\n- `references/troubleshooting.md` - Common issues and solutions\n- `references/local_training_macos.md` - Local training on macOS\n\n### Scripts (In This Skill)\n- `scripts/train_sft_example.py` - Production SFT template\n- `scripts/train_dpo_example.py` - Production DPO template\n- `scripts/train_grpo_example.py` - Production GRPO template\n- `scripts/unsloth_sft_example.py` - Unsloth text LLM training template (faster, less VRAM)\n- `scripts/estimate_cost.py` - Estimate time and cost (offer 
when appropriate)\n- `scripts/convert_to_gguf.py` - Complete GGUF conversion script\n\n### External Scripts\n- [Dataset Inspector](https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py) - Validate dataset format before training (use via `uv run` or `hf_jobs`)\n\n### External Links\n- [TRL Documentation](https://huggingface.co/docs/trl)\n- [TRL Jobs Training Guide](https://huggingface.co/docs/trl/en/jobs_training)\n- [TRL Jobs Package](https://github.com/huggingface/trl-jobs)\n- [HF Jobs Documentation](https://huggingface.co/docs/huggingface_hub/guides/jobs)\n- [TRL Example Scripts](https://github.com/huggingface/trl/tree/main/examples/scripts)\n- [UV Scripts Guide](https://docs.astral.sh/uv/guides/scripts/)\n- [UV Scripts Organization](https://huggingface.co/uv-scripts)\n\n## Key Takeaways\n\n1. **Submit scripts inline** - The `script` parameter accepts Python code directly; no file saving required unless user requests\n2. **Jobs are asynchronous** - Don't wait/poll; let user check when ready\n3. **Always set timeout** - Default 30 min is insufficient; minimum 1-2 hours recommended\n4. **Always enable Hub push** - Environment is ephemeral; without push, all results lost\n5. **Include Trackio** - Use example scripts as templates for real-time monitoring\n6. **Offer cost estimation** - When parameters are known, use `scripts/estimate_cost.py`\n7. **Use UV scripts (Approach 1)** - Default to `hf_jobs(\"uv\", {...})` with inline scripts; TRL maintained scripts for standard training; avoid bash `trl-jobs` commands in Claude Code\n8. **Use hf_doc_fetch/hf_doc_search** for latest TRL documentation\n9. **Validate dataset format** before training with dataset inspector (see Dataset Validation section)\n10. **Choose appropriate hardware** for model size; use LoRA for models >7B\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/gguf_conversion.md",
    "content": "# GGUF Conversion Guide\n\nAfter training models with TRL on Hugging Face Jobs, convert them to **GGUF format** for use with llama.cpp, Ollama, LM Studio, and other local inference tools.\n\n**This guide provides production-ready, tested code based on successful conversions.** All critical dependencies and build steps are included.\n\n## What is GGUF?\n\n**GGUF** (GPT-Generated Unified Format):\n- Optimized format for CPU/GPU inference with llama.cpp\n- Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size\n- Compatible with: Ollama, LM Studio, Jan, GPT4All, llama.cpp\n- Typically 2-8GB for 7B models (vs 14GB unquantized)\n\n## When to Convert to GGUF\n\n**Convert when:**\n- Running models locally with Ollama or LM Studio\n- Using CPU-optimized inference\n- Reducing model size with quantization\n- Deploying to edge devices\n- Sharing models for local-first use\n\n## Critical Success Factors\n\nBased on production testing, these are **essential** for reliable conversion:\n\n### 1. ✅ Install Build Tools FIRST\n**Before cloning llama.cpp**, install build dependencies:\n```python\nsubprocess.run([\"apt-get\", \"update\", \"-qq\"], check=True, capture_output=True)\nsubprocess.run([\"apt-get\", \"install\", \"-y\", \"-qq\", \"build-essential\", \"cmake\"], check=True, capture_output=True)\n```\n\n**Why:** The quantization tool requires gcc and cmake. Installing after cloning doesn't help.\n\n### 2. 
✅ Use CMake (Not Make)\n**Build the quantize tool with CMake:**\n```python\n# Create build directory\nos.makedirs(\"/tmp/llama.cpp/build\", exist_ok=True)\n\n# Configure\nsubprocess.run([\n    \"cmake\", \"-B\", \"/tmp/llama.cpp/build\", \"-S\", \"/tmp/llama.cpp\",\n    \"-DGGML_CUDA=OFF\"  # Faster build, CUDA not needed for quantization\n], check=True, capture_output=True, text=True)\n\n# Build\nsubprocess.run([\n    \"cmake\", \"--build\", \"/tmp/llama.cpp/build\",\n    \"--target\", \"llama-quantize\", \"-j\", \"4\"\n], check=True, capture_output=True, text=True)\n\n# Binary path\nquantize_bin = \"/tmp/llama.cpp/build/bin/llama-quantize\"\n```\n\n**Why:** CMake is more reliable than `make` and produces consistent binary paths.\n\n### 3. ✅ Include All Dependencies\n**PEP 723 header must include:**\n```python\n# /// script\n# dependencies = [\n#     \"transformers>=4.36.0\",\n#     \"peft>=0.7.0\",\n#     \"torch>=2.0.0\",\n#     \"accelerate>=0.24.0\",\n#     \"huggingface_hub>=0.20.0\",\n#     \"sentencepiece>=0.1.99\",  # Required for tokenizer\n#     \"protobuf>=3.20.0\",        # Required for tokenizer\n#     \"numpy\",\n#     \"gguf\",\n# ]\n# ///\n```\n\n**Why:** `sentencepiece` and `protobuf` are critical for tokenizer conversion. Missing them causes silent failures.\n\n### 4. 
✅ Verify Names Before Use\n**Always verify repos exist:**\n```python\n# Before submitting job, verify:\nhub_repo_details([ADAPTER_MODEL], repo_type=\"model\")\nhub_repo_details([BASE_MODEL], repo_type=\"model\")\n```\n\n**Why:** Non-existent dataset/model names cause job failures that could be caught in seconds.\n\n## Complete Conversion Script\n\nSee `scripts/convert_to_gguf.py` for the complete, production-ready script.\n\n**Key features:**\n- ✅ All dependencies in PEP 723 header\n- ✅ Build tools installed automatically\n- ✅ CMake build process (reliable)\n- ✅ Comprehensive error handling\n- ✅ Environment variable configuration\n- ✅ Automatic README generation\n\n## Quick Conversion Job\n\n```python\n# Before submitting: VERIFY MODELS EXIST\nhub_repo_details([\"username/my-finetuned-model\"], repo_type=\"model\")\nhub_repo_details([\"Qwen/Qwen2.5-0.5B\"], repo_type=\"model\")\n\n# Submit conversion job\nhf_jobs(\"uv\", {\n    \"script\": open(\"scripts/convert_to_gguf.py\").read(),  # Read the script content locally and pass it inline\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"45m\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n    \"env\": {\n        \"ADAPTER_MODEL\": \"username/my-finetuned-model\",\n        \"BASE_MODEL\": \"Qwen/Qwen2.5-0.5B\",\n        \"OUTPUT_REPO\": \"username/my-model-gguf\",\n        \"HF_USERNAME\": \"username\"  # Optional, for README\n    }\n})\n```\n\n## Conversion Process\n\nThe script performs these steps:\n\n1. **Load and Merge** - Load base model and LoRA adapter, merge them\n2. **Install Build Tools** - Install gcc, cmake (CRITICAL: before cloning llama.cpp)\n3. **Setup llama.cpp** - Clone repo, install Python dependencies\n4. **Convert to GGUF** - Create FP16 GGUF using llama.cpp converter\n5. **Build Quantize Tool** - Use CMake to build `llama-quantize`\n6. **Quantize** - Create Q4_K_M, Q5_K_M, Q8_0 versions\n7. 
**Upload** - Upload all versions + README to Hub\n\n## Quantization Options\n\nCommon quantization formats (from smallest to largest):\n\n| Format | Size | Quality | Use Case |\n|--------|------|---------|----------|\n| **Q4_K_M** | ~300MB | Good | **Recommended** - best balance of size/quality |\n| **Q5_K_M** | ~350MB | Better | Higher quality, slightly larger |\n| **Q8_0** | ~500MB | Very High | Near-original quality |\n| **F16** | ~1GB | Original | Full precision, largest file |\n\n**Recommendation:** Create Q4_K_M, Q5_K_M, and Q8_0 versions to give users options.\n\n## Hardware Requirements\n\n**For conversion:**\n- Small models (<1B): CPU-basic works, but slow\n- Medium models (1-7B): a10g-large recommended\n- Large models (7B+): a10g-large or a100-large\n\n**Time estimates:**\n- 0.5B model: ~15-25 minutes on A10G\n- 3B model: ~30-45 minutes on A10G\n- 7B model: ~45-60 minutes on A10G\n\n## Using GGUF Models\n\n**GGUF models work on both CPU and GPU.** They're optimized for CPU inference but can also leverage GPU acceleration when available.\n\n### With Ollama (auto-detects GPU)\n```bash\n# Download GGUF\nhf download username/my-model-gguf model-q4_k_m.gguf\n\n# Create Modelfile\necho \"FROM ./model-q4_k_m.gguf\" > Modelfile\n\n# Create and run (uses GPU automatically if available)\nollama create my-model -f Modelfile\nollama run my-model\n```\n\n### With llama.cpp\n```bash\n# CPU only\n./llama-cli -m model-q4_k_m.gguf -p \"Your prompt\"\n\n# With GPU acceleration (offload 32 layers to GPU)\n./llama-cli -m model-q4_k_m.gguf -ngl 32 -p \"Your prompt\"\n```\n\n### With LM Studio\n1. Download the `.gguf` file\n2. Import into LM Studio\n3. Start chatting\n\n## Best Practices\n\n### ✅ DO:\n1. **Verify repos exist** before submitting jobs (use `hub_repo_details`)\n2. **Install build tools FIRST** before cloning llama.cpp\n3. **Use CMake** for building quantize tool (not make)\n4. **Include all dependencies** in PEP 723 header (especially sentencepiece, protobuf)\n5. 
**Create multiple quantizations** - Give users choice\n6. **Test on known models** before production use\n7. **Use A10G GPU** for faster conversion\n\n### ❌ DON'T:\n1. **Assume repos exist** - Always verify with hub tools\n2. **Use make** instead of CMake - Less reliable\n3. **Remove dependencies** to \"simplify\" - They're all needed\n4. **Skip build tools** - Quantization will fail silently\n5. **Use default paths** - CMake puts binaries in build/bin/\n\n## Common Issues\n\n### Out of memory during merge\n**Fix:**\n- Use larger GPU (a10g-large or a100-large)\n- Ensure `device_map=\"auto\"` for automatic placement\n- Use `dtype=torch.float16` or `torch.bfloat16`\n\n### Conversion fails with architecture error\n**Fix:**\n- Ensure llama.cpp supports the model architecture\n- Check for standard architecture (Qwen, Llama, Mistral, etc.)\n- Update llama.cpp to latest: `git clone --depth 1 https://github.com/ggerganov/llama.cpp.git`\n- Check llama.cpp documentation for model support\n\n### Quantization fails\n**Fix:**\n- Verify build tools installed: `apt-get install build-essential cmake`\n- Use CMake (not make) to build quantize tool\n- Check binary path: `/tmp/llama.cpp/build/bin/llama-quantize`\n- Verify FP16 GGUF exists before quantizing\n\n### Missing sentencepiece error\n**Fix:**\n- Add to PEP 723 header: `\"sentencepiece>=0.1.99\", \"protobuf>=3.20.0\"`\n- Don't remove dependencies to \"simplify\" - all are required\n\n### Upload fails or times out\n**Fix:**\n- Large models (>2GB) need longer timeout: `\"timeout\": \"1h\"`\n- Upload quantized versions separately if needed\n- Check network/Hub status\n\n## Lessons Learned\n\nThese are from production testing and real failures:\n\n### 1. Always Verify Before Use\n**Lesson:** Don't assume repos/datasets exist. 
Check first.\n```python\n# BEFORE submitting job\nhub_repo_details([\"trl-lib/argilla-dpo-mix-7k\"], repo_type=\"dataset\")  # Would catch error\n```\n**Prevented failures:** Non-existent dataset names, typos in model names\n\n### 2. Prioritize Reliability Over Performance\n**Lesson:** Default to what's most likely to succeed.\n- Use CMake (not make) - more reliable\n- Disable CUDA in build - faster, not needed\n- Include all dependencies - don't \"simplify\"\n\n**Prevented failures:** Build failures, missing binaries\n\n### 3. Create Atomic, Self-Contained Scripts\n**Lesson:** Don't remove dependencies or steps. Scripts should work as a unit.\n- All dependencies in PEP 723 header\n- All build steps included\n- Clear error messages\n\n**Prevented failures:** Missing tokenizer libraries, build tool failures\n\n## References\n\n**In this skill:**\n- `scripts/convert_to_gguf.py` - Complete, production-ready script\n\n**External:**\n- [llama.cpp Repository](https://github.com/ggerganov/llama.cpp)\n- [GGUF Specification](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)\n- [Ollama Documentation](https://ollama.ai)\n- [LM Studio](https://lmstudio.ai)\n\n## Summary\n\n**Critical checklist for GGUF conversion:**\n- [ ] Verify adapter and base models exist on Hub\n- [ ] Use production script from `scripts/convert_to_gguf.py`\n- [ ] All dependencies in PEP 723 header (including sentencepiece, protobuf)\n- [ ] Build tools installed before cloning llama.cpp\n- [ ] CMake used for building quantize tool (not make)\n- [ ] Correct binary path: `/tmp/llama.cpp/build/bin/llama-quantize`\n- [ ] A10G GPU selected for reasonable conversion time\n- [ ] Timeout set to 45m minimum\n- [ ] HF_TOKEN in secrets for Hub upload\n\n**The script in `scripts/convert_to_gguf.py` incorporates all these lessons and has been tested successfully in production.**\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/hardware_guide.md",
    "content": "# Hardware Selection Guide\n\nChoosing the right hardware (flavor) is critical for cost-effective training.\n\n## Available Hardware\n\n### CPU\n- `cpu-basic` - Basic CPU, testing only\n- `cpu-upgrade` - Enhanced CPU\n\n**Use cases:** Dataset validation, preprocessing, testing scripts\n**Not recommended for training:** Too slow for any meaningful training\n\n### GPU Options\n\n| Flavor | GPU | Memory | Use Case | Cost/hour |\n|--------|-----|--------|----------|-----------|\n| `t4-small` | NVIDIA T4 | 16GB | <1B models, demos | ~$0.50-1 |\n| `t4-medium` | NVIDIA T4 | 16GB | 1-3B models, development | ~$1-2 |\n| `l4x1` | NVIDIA L4 | 24GB | 3-7B models, efficient training | ~$2-3 |\n| `l4x4` | 4x NVIDIA L4 | 96GB | Multi-GPU training | ~$8-12 |\n| `a10g-small` | NVIDIA A10G | 24GB | 3-7B models, production | ~$3-4 |\n| `a10g-large` | NVIDIA A10G | 24GB | 7-13B models | ~$4-6 |\n| `a10g-largex2` | 2x NVIDIA A10G | 48GB | Multi-GPU, large models | ~$8-12 |\n| `a10g-largex4` | 4x NVIDIA A10G | 96GB | Multi-GPU, very large models | ~$16-24 |\n| `a100-large` | NVIDIA A100 | 40GB | 13B+ models, fast training | ~$8-12 |\n\n### TPU Options\n\n| Flavor | Type | Use Case |\n|--------|------|----------|\n| `v5e-1x1` | TPU v5e | Small TPU workloads |\n| `v5e-2x2` | 4x TPU v5e | Medium TPU workloads |\n| `v5e-2x4` | 8x TPU v5e | Large TPU workloads |\n\n**Note:** TPUs require TPU-optimized code. 
Most TRL training uses GPUs.\n\n## Selection Guidelines\n\n### By Model Size\n\n**Tiny Models (<1B parameters)**\n- **Recommended:** `t4-small`\n- **Example:** Qwen2.5-0.5B, TinyLlama\n- **Batch size:** 4-8\n- **Training time:** 1-2 hours for 1K examples\n\n**Small Models (1-3B parameters)**\n- **Recommended:** `t4-medium` or `a10g-small`\n- **Example:** Qwen2.5-1.5B, Phi-2\n- **Batch size:** 2-4\n- **Training time:** 2-4 hours for 10K examples\n\n**Medium Models (3-7B parameters)**\n- **Recommended:** `a10g-small` or `a10g-large`\n- **Example:** Qwen2.5-7B, Mistral-7B\n- **Batch size:** 1-2 (or LoRA with 4-8)\n- **Training time:** 4-8 hours for 10K examples\n\n**Large Models (7-13B parameters)**\n- **Recommended:** `a10g-large` or `a100-large`\n- **Example:** Llama-3-8B, Llama-2-13B (with LoRA)\n- **Batch size:** 2-4 with LoRA\n- **Training time:** 6-12 hours for 10K examples\n- **Note:** Always use LoRA/PEFT; by the estimate under Memory Considerations, full fine-tuning at this size does not fit on these flavors\n\n**Very Large Models (13B+ parameters)**\n- **Recommended:** `a100-large` with LoRA\n- **Example:** Llama-2-13B, Llama-3-70B (LoRA only)\n- **Batch size:** 1-2 with LoRA\n- **Training time:** 8-24 hours for 10K examples\n- **Note:** Full fine-tuning not feasible, use LoRA/PEFT\n\n### By Budget\n\n**Minimal Budget (<$5 total)**\n- Use `t4-small`\n- Train on subset of data (100-500 examples)\n- Limit to 1-2 epochs\n- Use small model (<1B)\n\n**Small Budget ($5-20)**\n- Use `t4-medium` or `a10g-small`\n- Train on 1K-5K examples\n- 2-3 epochs\n- Model up to 3B parameters\n\n**Medium Budget ($20-50)**\n- Use `a10g-small` or `a10g-large`\n- Train on 5K-20K examples\n- 3-5 epochs\n- Model up to 7B parameters\n\n**Large Budget ($50-200)**\n- Use `a10g-large` or `a100-large`\n- Full dataset training\n- Multiple epochs\n- Model up to 13B parameters with LoRA\n\n### By Training Type\n\n**Quick Demo/Experiment**\n- `t4-small`\n- 50-100 examples\n- 5-10 steps\n- ~10-15 minutes\n\n**Development/Iteration**\n- `t4-medium` or `a10g-small`\n- 1K 
examples\n- 1 epoch\n- ~30-60 minutes\n\n**Production Training**\n- `a10g-large` or `a100-large`\n- Full dataset\n- 3-5 epochs\n- 4-12 hours\n\n**Research/Experimentation**\n- `a100-large`\n- Multiple runs\n- Various hyperparameters\n- Budget for 20-50 hours\n\n## Memory Considerations\n\n### Estimating Memory Requirements\n\nThese are rough rules of thumb; actual usage also depends on sequence length, batch size, and optimizer.\n\n**Full fine-tuning:**\n```\nMemory (GB) ≈ (Model params in billions) × 20\n```\n\n**LoRA fine-tuning:**\n```\nMemory (GB) ≈ (Model params in billions) × 4\n```\n\n**Examples:**\n- Qwen2.5-0.5B full: ~10GB ✅ fits t4-small (16GB)\n- Qwen2.5-1.5B full: ~30GB ❌ exceeds T4/L4/A10G (16-24GB); needs a100-large (40GB)\n- Qwen2.5-1.5B LoRA: ~6GB ✅ fits t4-small\n- Qwen2.5-7B full: ~140GB ❌ not feasible\n- Qwen2.5-7B LoRA: ~28GB ⚠️ above the 24GB of a10g-large by this estimate; apply the optimizations below or use a100-large\n\n### Memory Optimization\n\nIf hitting memory limits:\n\n1. **Use LoRA/PEFT**\n   ```python\n   peft_config=LoraConfig(r=16, lora_alpha=32)\n   ```\n\n2. **Reduce batch size**\n   ```python\n   per_device_train_batch_size=1\n   ```\n\n3. **Increase gradient accumulation**\n   ```python\n   gradient_accumulation_steps=8  # Effective batch size = 1×8\n   ```\n\n4. **Enable gradient checkpointing**\n   ```python\n   gradient_checkpointing=True\n   ```\n\n5. **Use mixed precision**\n   ```python\n   bf16=True  # or fp16=True\n   ```\n\n6. **Upgrade to larger GPU**\n   - t4 → a10g → a100\n\n## Cost Estimation\n\n### Formula\n\n```\nTotal Cost = (Hours of training) × (Cost per hour)\n```\n\n### Example Calculations\n\n**Quick demo:**\n- Hardware: t4-small ($0.75/hour)\n- Time: 15 minutes (0.25 hours)\n- Cost: $0.19\n\n**Development training:**\n- Hardware: a10g-small ($3.50/hour)\n- Time: 2 hours\n- Cost: $7.00\n\n**Production training:**\n- Hardware: a10g-large ($5/hour)\n- Time: 6 hours\n- Cost: $30.00\n\n**Large model with LoRA:**\n- Hardware: a100-large ($10/hour)\n- Time: 8 hours\n- Cost: $80.00\n\n### Cost Optimization Tips\n\n1. **Start small:** Test on t4-small with subset\n2. **Use LoRA:** 4-5x cheaper than full fine-tuning\n3. 
**Optimize hyperparameters:** Fewer epochs if possible\n4. **Set appropriate timeout:** Don't waste compute on stalled jobs\n5. **Use checkpointing:** Resume if job fails\n6. **Monitor costs:** Check running jobs regularly\n\n## Multi-GPU Training\n\nTRL automatically handles multi-GPU training with Accelerate when using multi-GPU flavors.\n\n**Multi-GPU flavors:**\n- `l4x4` - 4x L4 GPUs\n- `a10g-largex2` - 2x A10G GPUs\n- `a10g-largex4` - 4x A10G GPUs\n\n**When to use:**\n- Models >13B parameters\n- Need faster training (linear speedup)\n- Large datasets (>50K examples)\n\n**Example:**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"train.py\",\n    \"flavor\": \"a10g-largex2\",  # 2 GPUs\n    \"timeout\": \"4h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\nNo code changes needed—TRL/Accelerate handles distribution automatically.\n\n## Choosing Between Options\n\n### a10g vs a100\n\n**Choose a10g when:**\n- Model <13B parameters\n- Budget conscious\n- Training time not critical\n\n**Choose a100 when:**\n- Model 13B+ parameters\n- Need fastest training\n- Memory requirements high\n- Budget allows\n\n### Single vs Multi-GPU\n\n**Choose single GPU when:**\n- Model <7B parameters\n- Budget constrained\n- Simpler debugging\n\n**Choose multi-GPU when:**\n- Model >13B parameters\n- Need faster training\n- Large batch sizes required\n- Cost-effective for large jobs\n\n## Quick Reference\n\n```python\n# Model size → Hardware selection\nHARDWARE_MAP = {\n    \"<1B\":     \"t4-small\",\n    \"1-3B\":    \"a10g-small\",\n    \"3-7B\":    \"a10g-large\",\n    \"7-13B\":   \"a10g-large (LoRA) or a100-large\",\n    \">13B\":    \"a100-large (LoRA required)\"\n}\n```\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/hub_saving.md",
    "content": "# Saving Training Results to Hugging Face Hub\n\n**⚠️ CRITICAL:** Training environments are ephemeral. ALL results are lost when a job completes unless pushed to the Hub.\n\n## Why Hub Push is Required\n\nWhen running on Hugging Face Jobs:\n- Environment is temporary\n- All files deleted on job completion\n- No local disk persistence\n- Cannot access results after job ends\n\n**Without Hub push, training is completely wasted.**\n\n## Required Configuration\n\n### 1. Training Configuration\n\nIn your SFTConfig or trainer config:\n\n```python\nSFTConfig(\n    push_to_hub=True,                    # Enable Hub push\n    hub_model_id=\"username/model-name\",   # Target repository\n)\n```\n\n### 2. Job Configuration\n\nWhen submitting the job:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"train.py\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # Provide authentication\n})\n```\n\n**The `$HF_TOKEN` placeholder is automatically replaced with your Hugging Face token.**\n\n## Complete Example\n\n```python\n# train.py\n# /// script\n# dependencies = [\"trl\"]\n# ///\n\nfrom trl import SFTTrainer, SFTConfig\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"trl-lib/Capybara\", split=\"train\")\n\n# Configure with Hub push\nconfig = SFTConfig(\n    output_dir=\"my-model\",\n    num_train_epochs=3,\n    \n    # ✅ CRITICAL: Hub push configuration\n    push_to_hub=True,\n    hub_model_id=\"myusername/my-trained-model\",\n    \n    # Optional: Push strategy\n    push_to_hub_model_id=\"myusername/my-trained-model\",\n    push_to_hub_organization=None,\n    push_to_hub_token=None,  # Uses environment token\n)\n\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset,\n    args=config,\n)\n\ntrainer.train()\n\n# ✅ Push final model\ntrainer.push_to_hub()\n\nprint(\"✅ Model saved to: https://huggingface.co/myusername/my-trained-model\")\n```\n\n**Submit with authentication:**\n\n```python\nhf_jobs(\"uv\", {\n    
\"script\": \"train.py\",\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"2h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Required!\n})\n```\n\n## What Gets Saved\n\nWhen `push_to_hub=True`:\n\n1. **Model weights** - Final trained parameters\n2. **Tokenizer** - Associated tokenizer\n3. **Configuration** - Model config (config.json)\n4. **Training arguments** - Hyperparameters used\n5. **Model card** - Auto-generated documentation\n6. **Checkpoints** - If `save_strategy=\"steps\"` enabled\n\n## Checkpoint Saving\n\nSave intermediate checkpoints during training:\n\n```python\nSFTConfig(\n    output_dir=\"my-model\",\n    push_to_hub=True,\n    hub_model_id=\"username/my-model\",\n    \n    # Checkpoint configuration\n    save_strategy=\"steps\",\n    save_steps=100,              # Save every 100 steps\n    save_total_limit=3,          # Keep only last 3 checkpoints\n)\n```\n\n**Benefits:**\n- Resume training if job fails\n- Compare checkpoint performance\n- Use intermediate models\n\n**Checkpoints are pushed to:** `username/my-model` (same repo)\n\n## Authentication Methods\n\n### Method 1: Automatic Token (Recommended)\n\n```python\n\"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n```\n\nUses your logged-in Hugging Face token automatically.\n\n### Method 2: Explicit Token\n\n```python\n\"secrets\": {\"HF_TOKEN\": \"hf_abc123...\"}\n```\n\nProvide token explicitly (not recommended for security).\n\n### Method 3: Environment Variable\n\n```python\n\"env\": {\"HF_TOKEN\": \"hf_abc123...\"}\n```\n\nPass as regular environment variable (less secure than secrets).\n\n**Always prefer Method 1** for security and convenience.\n\n## Verification Checklist\n\nBefore submitting any training job, verify:\n\n- [ ] `push_to_hub=True` in training config\n- [ ] `hub_model_id` is specified (format: `username/model-name`)\n- [ ] `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` in job config\n- [ ] Repository name doesn't conflict with existing repos\n- [ ] You have write access to 
the target namespace\n\n## Repository Setup\n\n### Automatic Creation\n\nIf repository doesn't exist, it's created automatically when first pushing.\n\n### Manual Creation\n\nCreate repository before training:\n\n```python\nfrom huggingface_hub import HfApi\n\napi = HfApi()\napi.create_repo(\n    repo_id=\"username/model-name\",\n    repo_type=\"model\",\n    private=False,  # or True for private repo\n)\n```\n\n### Repository Naming\n\n**Valid names:**\n- `username/my-model`\n- `username/model-name`\n- `organization/model-name`\n\n**Invalid names:**\n- `model-name` (missing username)\n- `username/model name` (spaces not allowed)\n- `username/MODEL` (uppercase discouraged)\n\n## Troubleshooting\n\n### Error: 401 Unauthorized\n\n**Cause:** HF_TOKEN not provided or invalid\n\n**Solutions:**\n1. Verify `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` in job config\n2. Check you're logged in: `hf auth whoami`\n3. Re-login: `hf auth login`\n\n### Error: 403 Forbidden\n\n**Cause:** No write access to repository\n\n**Solutions:**\n1. Check repository namespace matches your username\n2. Verify you're a member of organization (if using org namespace)\n3. Check repository isn't private (if accessing org repo)\n\n### Error: Repository not found\n\n**Cause:** Repository doesn't exist and auto-creation failed\n\n**Solutions:**\n1. Manually create repository first\n2. Check repository name format\n3. Verify namespace exists\n\n### Error: Push failed during training\n\n**Cause:** Network issues or Hub unavailable\n\n**Solutions:**\n1. Training continues but final push fails\n2. Checkpoints may be saved\n3. Re-run push manually after job completes\n\n### Issue: Model saved but not visible\n\n**Possible causes:**\n1. Repository is private—check https://huggingface.co/username\n2. Wrong namespace—verify `hub_model_id` matches login\n3. 
Push still in progress: wait a few minutes\n\n## Manual Push After Training\n\nIf training completes but the push fails, push manually from inside the job while its files still exist:\n\n```python\nimport os\n\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\n# Load from local checkpoint (AutoModelForCausalLM keeps the LM head;\n# plain AutoModel would load and push only the base model)\nmodel = AutoModelForCausalLM.from_pretrained(\"./output_dir\")\ntokenizer = AutoTokenizer.from_pretrained(\"./output_dir\")\n\n# Push to Hub, reading the token from the environment instead of hardcoding it\ntoken = os.environ[\"HF_TOKEN\"]\nmodel.push_to_hub(\"username/model-name\", token=token)\ntokenizer.push_to_hub(\"username/model-name\", token=token)\n```\n\n**Note:** Only possible if job hasn't completed (files still exist).\n\n## Best Practices\n\n1. **Always enable `push_to_hub=True`**\n2. **Use checkpoint saving** for long training runs\n3. **Verify Hub push** in logs before job completes\n4. **Set appropriate `save_total_limit`** to avoid excessive checkpoints\n5. **Use descriptive repo names** (e.g., `qwen-capybara-sft` not `model1`)\n6. **Add model card** with training details\n7. **Tag models** with relevant tags (e.g., `text-generation`, `fine-tuned`)\n\n## Monitoring Push Progress\n\nCheck logs for push progress:\n\n```python\nhf_jobs(\"logs\", {\"job_id\": \"your-job-id\"})\n```\n\n**Look for:**\n```\nPushing model to username/model-name...\nUpload file pytorch_model.bin: 100%\n✅ Model pushed successfully\n```\n\n## Example: Full Production Setup\n\n```python\n# production_train.py\n# /// script\n# dependencies = [\"trl>=0.12.0\", \"peft>=0.7.0\"]\n# ///\n\nfrom datasets import load_dataset\nfrom peft import LoraConfig\nfrom trl import SFTTrainer, SFTConfig\nimport os\n\n# Verify token is available\nassert \"HF_TOKEN\" in os.environ, \"HF_TOKEN not found in environment!\"\n\n# Load dataset\ndataset = load_dataset(\"trl-lib/Capybara\", split=\"train\")\nprint(f\"✅ Dataset loaded: {len(dataset)} examples\")\n\n# Configure with comprehensive Hub settings\nconfig = SFTConfig(\n    output_dir=\"qwen-capybara-sft\",\n    \n    # Hub configuration\n    push_to_hub=True,\n    
hub_model_id=\"myusername/qwen-capybara-sft\",\n    hub_strategy=\"checkpoint\",  # Push checkpoints\n    \n    # Checkpoint configuration\n    save_strategy=\"steps\",\n    save_steps=100,\n    save_total_limit=3,\n    \n    # Training settings\n    num_train_epochs=3,\n    per_device_train_batch_size=4,\n    \n    # Logging\n    logging_steps=10,\n    logging_first_step=True,\n)\n\n# Train with LoRA\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset,\n    args=config,\n    peft_config=LoraConfig(r=16, lora_alpha=32),\n)\n\nprint(\"🚀 Starting training...\")\ntrainer.train()\n\nprint(\"💾 Pushing final model to Hub...\")\ntrainer.push_to_hub()\n\nprint(\"✅ Training complete!\")\nprint(f\"Model available at: https://huggingface.co/myusername/qwen-capybara-sft\")\n```\n\n**Submit:**\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"production_train.py\",\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"6h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n## Key Takeaway\n\n**Without `push_to_hub=True` and `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`, all training results are permanently lost.**\n\nAlways verify both are configured before submitting any training job.\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/local_training_macos.md",
    "content": "# Local Training on macOS (Apple Silicon)\n\nRun small LoRA fine-tuning jobs locally on Mac for smoke tests and quick iteration before submitting to HF Jobs.\n\n## When to Use Local Mac vs HF Jobs\n\n| Local Mac | HF Jobs / Cloud GPU |\n|-----------|-------------------|\n| Model ≤3B, text-only | Model 7B+ |\n| LoRA/PEFT only | QLoRA 4-bit (CUDA/bitsandbytes) |\n| Short context (≤1024) | Long context / full fine-tuning |\n| Smoke tests, dataset validation | Production runs, VLMs |\n\n**Typical workflow:** local smoke test → HF Jobs with same config → export/quantize ([gguf_conversion.md](gguf_conversion.md))\n\n## Recommended Defaults\n\n| Setting | Value | Notes |\n|---------|-------|-------|\n| Model size | 0.5B–1.5B first run | Scale up after verifying |\n| Max seq length | 512–1024 | Lower = less memory |\n| Batch size | 1 | Scale via gradient accumulation |\n| Gradient accumulation | 8–16 | Effective batch = 8–16 |\n| LoRA rank (r) | 8–16 | alpha = 2×r |\n| Dtype | float32 | fp16 causes NaN on MPS; bf16 only on M1 Pro+ and M2/M3/M4 |\n\n### Memory by hardware\n\n| Unified RAM | Max Model Size |\n|-------------|---------------|\n| 16 GB | ~0.5B–1.5B |\n| 32 GB | ~1.5B–3B |\n| 64 GB | ~3B (short context) |\n\n## Setup\n\n```bash\nxcode-select --install\npython3 -m venv .venv && source .venv/bin/activate\npip install -U \"torch>=2.2\" \"transformers>=4.40\" \"trl>=0.12\" \"peft>=0.10\" \\\n    datasets accelerate safetensors huggingface_hub\n```\n\nVerify MPS:\n```bash\npython -c \"import torch; print(torch.__version__, '| MPS:', torch.backends.mps.is_available())\"\n```\n\nOptional — configure Accelerate for local Mac (no distributed, no mixed precision, MPS device):\n```bash\naccelerate config\n```\n\n## Training Script\n\n<details>\n<summary><strong>train_lora_sft.py</strong></summary>\n\n```python\nimport os\nfrom dataclasses import dataclass\nfrom typing import Optional\nimport torch\nfrom datasets import load_dataset\nfrom transformers import 
AutoModelForCausalLM, AutoTokenizer, set_seed\nfrom peft import LoraConfig\nfrom trl import SFTTrainer, SFTConfig\n\nset_seed(42)\n\n@dataclass\nclass Cfg:\n    model_id: str = os.environ.get(\"MODEL_ID\", \"Qwen/Qwen2.5-0.5B-Instruct\")\n    dataset_id: str = os.environ.get(\"DATASET_ID\", \"HuggingFaceH4/ultrachat_200k\")\n    dataset_split: str = os.environ.get(\"DATASET_SPLIT\", \"train_sft[:500]\")\n    data_files: Optional[str] = os.environ.get(\"DATA_FILES\", None)\n    text_field: str = os.environ.get(\"TEXT_FIELD\", \"\")\n    messages_field: str = os.environ.get(\"MESSAGES_FIELD\", \"messages\")\n    out_dir: str = os.environ.get(\"OUT_DIR\", \"outputs/local-lora\")\n    max_seq_length: int = int(os.environ.get(\"MAX_SEQ_LENGTH\", \"512\"))\n    max_steps: int = int(os.environ.get(\"MAX_STEPS\", \"-1\"))\n\ncfg = Cfg()\ndevice = \"mps\" if torch.backends.mps.is_available() else \"cpu\"\n\ntokenizer = AutoTokenizer.from_pretrained(cfg.model_id, use_fast=True)\nif tokenizer.pad_token is None:\n    tokenizer.pad_token = tokenizer.eos_token\ntokenizer.padding_side = \"right\"\n\nmodel = AutoModelForCausalLM.from_pretrained(cfg.model_id, torch_dtype=torch.float32)\nmodel.to(device)\nmodel.config.use_cache = False\n\nif cfg.data_files:\n    ds = load_dataset(\"json\", data_files=cfg.data_files, split=\"train\")\nelse:\n    ds = load_dataset(cfg.dataset_id, split=cfg.dataset_split)\n\ndef format_example(ex):\n    if cfg.text_field and isinstance(ex.get(cfg.text_field), str):\n        ex[\"text\"] = ex[cfg.text_field]\n        return ex\n    msgs = ex.get(cfg.messages_field)\n    if isinstance(msgs, list):\n        if hasattr(tokenizer, \"apply_chat_template\"):\n            try:\n                ex[\"text\"] = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)\n                return ex\n            except Exception:\n                pass\n        ex[\"text\"] = \"\\n\".join([str(m) for m in msgs])\n        return ex\n    
ex[\"text\"] = str(ex)\n    return ex\n\nds = ds.map(format_example)\nds = ds.remove_columns([c for c in ds.column_names if c != \"text\"])\n\nlora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias=\"none\",\n                  task_type=\"CAUSAL_LM\", target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\"])\n\nsft_kwargs = dict(\n    output_dir=cfg.out_dir, per_device_train_batch_size=1, gradient_accumulation_steps=8,\n    learning_rate=2e-4, logging_steps=10, save_steps=200, save_total_limit=2,\n    gradient_checkpointing=True, report_to=\"none\", fp16=False, bf16=False,\n    max_seq_length=cfg.max_seq_length, dataset_text_field=\"text\",\n)\nif cfg.max_steps > 0:\n    sft_kwargs[\"max_steps\"] = cfg.max_steps\nelse:\n    sft_kwargs[\"num_train_epochs\"] = 1\n\ntrainer = SFTTrainer(model=model, train_dataset=ds, peft_config=lora,\n                     args=SFTConfig(**sft_kwargs), processing_class=tokenizer)\ntrainer.train()\ntrainer.save_model(cfg.out_dir)\nprint(f\"✅ Saved to: {cfg.out_dir}\")\n```\n\n</details>\n\n### Run\n\n```bash\npython train_lora_sft.py\n```\n\n**Env overrides:**\n\n```bash\nMODEL_ID=\"Qwen/Qwen2.5-1.5B-Instruct\" python train_lora_sft.py   # different model\nMAX_STEPS=50 python train_lora_sft.py                              # quick 50-step test\nDATA_FILES=\"my_data.jsonl\" python train_lora_sft.py                # local JSONL file\nPYTORCH_ENABLE_MPS_FALLBACK=1 python train_lora_sft.py             # MPS op fallback to CPU\nPYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python train_lora_sft.py      # disable MPS memory limit (use with caution)\n```\n\n**Local JSONL format** — chat messages or plain text:\n```jsonl\n{\"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}, {\"role\": \"assistant\", \"content\": \"Hi!\"}]}\n```\n```jsonl\n{\"text\": \"User: Hello\\nAssistant: Hi!\"}\n```\nFor plain text: `DATA_FILES=\"file.jsonl\" TEXT_FIELD=\"text\" MESSAGES_FIELD=\"\" python train_lora_sft.py`\n\n### Verify Success\n\n- Loss 
decreases over steps\n- `outputs/local-lora/` contains `adapter_config.json` + `*.safetensors`\n\n## Quick Evaluation\n\n<details>\n<summary><strong>eval_generate.py</strong></summary>\n\n```python\nimport os, torch\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nfrom peft import PeftModel\n\nBASE = os.environ.get(\"MODEL_ID\", \"Qwen/Qwen2.5-0.5B-Instruct\")\nADAPTER = os.environ.get(\"ADAPTER_DIR\", \"outputs/local-lora\")\ndevice = \"mps\" if torch.backends.mps.is_available() else \"cpu\"\n\ntokenizer = AutoTokenizer.from_pretrained(BASE, use_fast=True)\nmodel = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float32)\nmodel.to(device)\nmodel = PeftModel.from_pretrained(model, ADAPTER)\n\nprompt = os.environ.get(\"PROMPT\", \"Explain gradient accumulation in 3 bullet points.\")\ninputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\nwith torch.no_grad():\n    out = model.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.7, top_p=0.9)\nprint(tokenizer.decode(out[0], skip_special_tokens=True))\n```\n\n</details>\n\n## Troubleshooting (macOS-Specific)\n\nFor general training issues, see [troubleshooting.md](troubleshooting.md).\n\n| Problem | Fix |\n|---------|-----|\n| MPS unsupported op / crash | `PYTORCH_ENABLE_MPS_FALLBACK=1` |\n| OOM / system instability | Reduce `MAX_SEQ_LENGTH`, use smaller model, set `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` (caution) |\n| fp16 NaN / loss explosion | Keep `fp16=False` (default), lower learning rate |\n| LoRA \"module not found\" | Print `model.named_modules()` to find correct target names |\n| TRL TypeError on args | Check TRL version; script uses `SFTConfig` + `processing_class` (TRL ≥0.12) |\n| Intel Mac | No MPS — use HF Jobs instead |\n\n**Common LoRA target modules by architecture:**\n\n| Architecture | target_modules |\n|-------------|---------------|\n| Llama/Qwen/Mistral | `q_proj`, `k_proj`, `v_proj`, `o_proj` |\n| GPT-2/GPT-J | `c_attn`, `c_proj` |\n| BLOOM 
| `query_key_value`, `dense` |\n\n## MLX Alternative\n\n[MLX](https://github.com/ml-explore/mlx) offers tighter Apple Silicon integration but has a smaller ecosystem and less mature training APIs. For this skill's workflow (local validation → HF Jobs), PyTorch + MPS is recommended for consistency. See [mlx-lm](https://github.com/ml-explore/mlx-lm) for MLX-based fine-tuning.\n\n## See Also\n\n- [troubleshooting.md](troubleshooting.md) — General TRL troubleshooting\n- [hardware_guide.md](hardware_guide.md) — GPU selection for HF Jobs\n- [gguf_conversion.md](gguf_conversion.md) — Export for on-device inference\n- [training_methods.md](training_methods.md) — SFT, DPO, GRPO overview\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/reliability_principles.md",
    "content": "# Reliability Principles for Training Jobs\n\nThese principles are derived from real production failures and successful fixes. Following them prevents common failure modes and ensures reliable job execution.\n\n## Principle 1: Always Verify Before Use\n\n**Rule:** Never assume repos, datasets, or resources exist. Verify with tools first.\n\n### What It Prevents\n\n- **Non-existent datasets** - Jobs fail immediately when dataset doesn't exist\n- **Typos in names** - Simple mistakes like \"argilla-dpo-mix-7k\" vs \"ultrafeedback_binarized\"\n- **Incorrect paths** - Old or moved repos, renamed files\n- **Missing dependencies** - Undocumented requirements\n\n### How to Apply\n\n**Before submitting ANY job:**\n\n```python\n# Verify dataset exists\ndataset_search({\"query\": \"dataset-name\", \"author\": \"author-name\", \"limit\": 5})\nhub_repo_details([\"author/dataset-name\"], repo_type=\"dataset\")\n\n# Verify model exists\nhub_repo_details([\"org/model-name\"], repo_type=\"model\")\n\n# Check script/file paths (for URL-based scripts)\n# Verify before using: https://github.com/user/repo/blob/main/script.py\n```\n\n**Examples that would have caught errors:**\n\n```python\n# ❌ WRONG: Assumed dataset exists\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"...\"\"\",\n    \"env\": {\"DATASET\": \"trl-lib/argilla-dpo-mix-7k\"}  # Doesn't exist!\n})\n\n# ✅ CORRECT: Verify first\ndataset_search({\"query\": \"argilla dpo\", \"author\": \"trl-lib\"})\n# Would show: \"trl-lib/ultrafeedback_binarized\" is the correct name\n\nhub_repo_details([\"trl-lib/ultrafeedback_binarized\"], repo_type=\"dataset\")\n# Confirms it exists before using\n```\n\n### Implementation Checklist\n\n- [ ] Check dataset exists before training\n- [ ] Verify base model exists before fine-tuning\n- [ ] Confirm adapter model exists before GGUF conversion\n- [ ] Test script URLs are valid before submitting\n- [ ] Validate file paths in repositories\n- [ ] Check for recent updates/renames of 
resources\n\n**Time cost:** 5-10 seconds  \n**Time saved:** Hours of failed job time + debugging\n\n---\n\n## Principle 2: Prioritize Reliability Over Performance\n\n**Rule:** Default to what is most likely to succeed, not what is theoretically fastest.\n\n### What It Prevents\n\n- **Hardware incompatibilities** - Features that fail on certain GPUs\n- **Unstable optimizations** - Speed-ups that cause crashes\n- **Complex configurations** - More failure points\n- **Build system issues** - Unreliable compilation methods\n\n### How to Apply\n\n**Choose reliability:**\n\n```python\n# ❌ RISKY: Aggressive optimization that may fail\nSFTConfig(\n    torch_compile=True,  # Can fail on T4, A10G GPUs\n    optim=\"adamw_bnb_8bit\",  # Requires specific setup\n    fp16=False,  # May cause training instability\n    ...\n)\n\n# ✅ SAFE: Proven defaults\nSFTConfig(\n    # torch_compile=True,  # Commented with note: \"Enable on H100 for 20% speedup\"\n    optim=\"adamw_torch\",  # Standard, always works\n    fp16=True,  # Stable and fast\n    ...\n)\n```\n\n**For build processes:**\n\n```python\n# ❌ UNRELIABLE: Uses make (platform-dependent)\nsubprocess.run([\"make\", \"-C\", \"/tmp/llama.cpp\", \"llama-quantize\"], check=True)\n\n# ✅ RELIABLE: Uses CMake (consistent, documented)\nsubprocess.run([\n    \"cmake\", \"-B\", \"/tmp/llama.cpp/build\", \"-S\", \"/tmp/llama.cpp\",\n    \"-DGGML_CUDA=OFF\"  # Disable CUDA for faster, more reliable build\n], check=True)\n\nsubprocess.run([\n    \"cmake\", \"--build\", \"/tmp/llama.cpp/build\",\n    \"--target\", \"llama-quantize\", \"-j\", \"4\"\n], check=True)\n```\n\n### Real-World Example\n\n**The `torch.compile` failure:**\n- Added for \"20% speedup\" on H100\n- **Failed fatally on T4-medium** with cryptic error\n- Misdiagnosed as dataset issue (cost hours)\n- **Fix:** Disable by default, add as optional comment\n\n**Result:** Reliability > 20% performance gain\n\n### Implementation Checklist\n\n- [ ] Use proven, standard configurations 
by default\n- [ ] Comment out performance optimizations with hardware notes\n- [ ] Use stable build systems (CMake > make)\n- [ ] Test on target hardware before production\n- [ ] Document known incompatibilities\n- [ ] Provide \"safe\" and \"fast\" variants when needed\n\n**Performance loss:** 10-20% in best case  \n**Reliability gain:** 95%+ success rate vs 60-70%\n\n---\n\n## Principle 3: Create Atomic, Self-Contained Scripts\n\n**Rule:** Scripts should work as complete, independent units. Don't remove parts to \"simplify.\"\n\n### What It Prevents\n\n- **Missing dependencies** - Removed \"unnecessary\" packages that are actually required\n- **Incomplete processes** - Skipped steps that seem redundant\n- **Environment assumptions** - Scripts that need pre-setup\n- **Partial failures** - Some parts work, others fail silently\n\n### How to Apply\n\n**Complete dependency specifications:**\n\n```python\n# ❌ INCOMPLETE: \"Simplified\" by removing dependencies\n# /// script\n# dependencies = [\n#     \"transformers\",\n#     \"peft\",\n#     \"torch\",\n# ]\n# ///\n\n# ✅ COMPLETE: All dependencies explicit\n# /// script\n# dependencies = [\n#     \"transformers>=4.36.0\",\n#     \"peft>=0.7.0\",\n#     \"torch>=2.0.0\",\n#     \"accelerate>=0.24.0\",\n#     \"huggingface_hub>=0.20.0\",\n#     \"sentencepiece>=0.1.99\",  # Required for tokenizers\n#     \"protobuf>=3.20.0\",        # Required for tokenizers\n#     \"numpy\",\n#     \"gguf\",\n# ]\n# ///\n```\n\n**Complete build processes:**\n\n```python\n# ❌ INCOMPLETE: Assumes build tools exist\nsubprocess.run([\"git\", \"clone\", \"https://github.com/ggerganov/llama.cpp.git\", \"/tmp/llama.cpp\"])\nsubprocess.run([\"make\", \"-C\", \"/tmp/llama.cpp\", \"llama-quantize\"])  # FAILS: no gcc/make\n\n# ✅ COMPLETE: Installs all requirements\nsubprocess.run([\"apt-get\", \"update\", \"-qq\"], check=True)\nsubprocess.run([\"apt-get\", \"install\", \"-y\", \"-qq\", \"build-essential\", \"cmake\"], 
check=True)\nsubprocess.run([\"git\", \"clone\", \"https://github.com/ggerganov/llama.cpp.git\", \"/tmp/llama.cpp\"])\n# ... then build\n```\n\n### Real-World Example\n\n**The `sentencepiece` failure:**\n- Original script had it: worked fine\n- \"Simplified\" version removed it: \"doesn't look necessary\"\n- **GGUF conversion failed silently** - tokenizer couldn't convert\n- Hard to debug: no obvious error message\n- **Fix:** Restore all original dependencies\n\n**Result:** Don't remove dependencies without thorough testing\n\n### Implementation Checklist\n\n- [ ] All dependencies in PEP 723 header with version pins\n- [ ] All system packages installed by script\n- [ ] No assumptions about pre-existing environment\n- [ ] No \"optional\" steps that are actually required\n- [ ] Test scripts in clean environment\n- [ ] Document why each dependency is needed\n\n**Complexity:** Slightly longer scripts  \n**Reliability:** Scripts \"just work\" every time\n\n---\n\n## Principle 4: Provide Clear Error Context\n\n**Rule:** When things fail, make it obvious what went wrong and how to fix it.\n\n### How to Apply\n\n**Wrap subprocess calls:**\n\n```python\n# ❌ UNCLEAR: Silent failure\nsubprocess.run([...], check=True, capture_output=True)\n\n# ✅ CLEAR: Shows what failed\ntry:\n    result = subprocess.run(\n        [...],\n        check=True,\n        capture_output=True,\n        text=True\n    )\n    print(result.stdout)\n    if result.stderr:\n        print(\"Warnings:\", result.stderr)\nexcept subprocess.CalledProcessError as e:\n    print(f\"❌ Command failed!\")\n    print(\"STDOUT:\", e.stdout)\n    print(\"STDERR:\", e.stderr)\n    raise\n```\n\n**Validate inputs:**\n\n```python\n# ❌ UNCLEAR: Fails later with cryptic error\nmodel = load_model(MODEL_NAME)\n\n# ✅ CLEAR: Fails fast with clear message\nif not MODEL_NAME:\n    raise ValueError(\"MODEL_NAME environment variable not set!\")\n\nprint(f\"Loading model: {MODEL_NAME}\")\ntry:\n    model = load_model(MODEL_NAME)\n   
 print(f\"✅ Model loaded successfully\")\nexcept Exception as e:\n    print(f\"❌ Failed to load model: {MODEL_NAME}\")\n    print(f\"Error: {e}\")\n    print(\"Hint: Check that model exists on Hub\")\n    raise\n```\n\n### Implementation Checklist\n\n- [ ] Wrap external calls with try/except\n- [ ] Print stdout/stderr on failure\n- [ ] Validate environment variables early\n- [ ] Add progress indicators (✅, ❌, 🔄)\n- [ ] Include hints for common failures\n- [ ] Log configuration at start\n\n---\n\n## Principle 5: Test the Happy Path on Known-Good Inputs\n\n**Rule:** Before using new code in production, test with inputs you know work.\n\n### How to Apply\n\n**Known-good test inputs:**\n\n```python\n# For training\nTEST_DATASET = \"trl-lib/Capybara\"  # Small, well-formatted, widely used\nTEST_MODEL = \"Qwen/Qwen2.5-0.5B\"  # Small, fast, reliable\n\n# For GGUF conversion\nTEST_ADAPTER = \"evalstate/qwen-capybara-medium\"  # Known working model\nTEST_BASE = \"Qwen/Qwen2.5-0.5B\"  # Compatible base\n```\n\n**Testing workflow:**\n\n1. Test with known-good inputs first\n2. If that works, try production inputs\n3. If production fails, you know it's the inputs (not code)\n4. 
Isolate the difference\n\n### Implementation Checklist\n\n- [ ] Maintain list of known-good test models/datasets\n- [ ] Test new scripts with test inputs first\n- [ ] Document what makes inputs \"good\"\n- [ ] Keep test jobs cheap (small models, short timeouts)\n- [ ] Only move to production after test succeeds\n\n**Time cost:** 5-10 minutes for test run  \n**Debugging time saved:** Hours\n\n---\n\n## Summary: The Reliability Checklist\n\nBefore submitting ANY job:\n\n### Pre-Flight Checks\n- [ ] **Verified** all repos/datasets exist (hub_repo_details)\n- [ ] **Tested** with known-good inputs if new code\n- [ ] **Using** proven hardware/configuration\n- [ ] **Included** all dependencies in PEP 723 header\n- [ ] **Installed** system requirements (build tools, etc.)\n- [ ] **Set** appropriate timeout (not default 30m)\n- [ ] **Configured** Hub push with HF_TOKEN\n- [ ] **Added** clear error handling\n\n### Script Quality\n- [ ] Self-contained (no external setup needed)\n- [ ] Complete dependencies listed\n- [ ] Build tools installed by script\n- [ ] Progress indicators included\n- [ ] Error messages are clear\n- [ ] Configuration logged at start\n\n### Job Configuration\n- [ ] Timeout > expected runtime + 30% buffer\n- [ ] Hardware appropriate for model size\n- [ ] Secrets include HF_TOKEN\n- [ ] Environment variables set correctly\n- [ ] Cost estimated and acceptable\n\n**Following these principles transforms job success rate from ~60-70% to ~95%+**\n\n---\n\n## When Principles Conflict\n\nSometimes reliability and performance conflict. 
Here's how to choose:\n\n| Scenario | Choose | Rationale |\n|----------|--------|-----------|\n| Demo/test | Reliability | Fast failure is worse than slow success |\n| Production (first run) | Reliability | Prove it works before optimizing |\n| Production (proven) | Performance | Safe to optimize after validation |\n| Time-critical | Reliability | Failures cause more delay than slow runs |\n| Cost-critical | Balanced | Test with small model, then optimize |\n\n**General rule:** Reliability first, optimize second.\n\n---\n\n## Further Reading\n\n- `troubleshooting.md` - Common issues and fixes\n- `training_patterns.md` - Proven training configurations\n- `gguf_conversion.md` - Production GGUF workflow\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/trackio_guide.md",
    "content": "# Trackio Integration for TRL Training\n\n**Trackio** is an experiment tracking library that provides real-time metrics visualization for remote training on Hugging Face Jobs infrastructure.\n\n⚠️ **IMPORTANT**: For Jobs training (remote cloud GPUs):\n- Training happens on ephemeral cloud runners (not your local machine)\n- Trackio syncs metrics to a Hugging Face Space for real-time monitoring\n- Without a Space, metrics are lost when the job completes\n- The Space dashboard persists your training metrics permanently\n\n## Setting Up Trackio for Jobs\n\n**Step 1: Add trackio dependency**\n```python\n# /// script\n# dependencies = [\n#     \"trl>=0.12.0\",\n#     \"trackio\",  # Required!\n# ]\n# ///\n```\n\n**Step 2: Create a Trackio Space (one-time setup)**\n\n**Option A: Let Trackio auto-create (Recommended)**\nPass a `space_id` to `trackio.init()` and Trackio will automatically create the Space if it doesn't exist.\n\n**Option B: Create manually**\n- Create Space via Hub UI at https://huggingface.co/new-space\n- Select Gradio SDK\n- OR use command: `hf repos create my-trackio-dashboard --type space --space-sdk gradio`\n\n**Step 3: Initialize Trackio with space_id**\n```python\nimport trackio\n\ntrackio.init(\n    project=\"my-training\",\n    space_id=\"username/trackio\",  # CRITICAL for Jobs! Replace 'username' with your HF username\n    config={\n        \"model\": \"Qwen/Qwen2.5-0.5B\",\n        \"dataset\": \"trl-lib/Capybara\",\n        \"learning_rate\": 2e-5,\n    }\n)\n```\n\n**Step 4: Configure TRL to use Trackio**\n```python\nSFTConfig(\n    report_to=\"trackio\",\n    # ... other config\n)\n```\n\n**Step 5: Finish tracking**\n```python\ntrainer.train()\ntrackio.finish()  # Ensures final metrics are synced\n```\n\n## What Trackio Tracks\n\nTrackio automatically logs:\n- ✅ Training loss\n- ✅ Learning rate\n- ✅ GPU utilization\n- ✅ Memory usage\n- ✅ Training throughput\n- ✅ Custom metrics\n\n## How It Works with Jobs\n\n1. 
**Training runs** → Metrics logged to local SQLite DB\n2. **Every 5 minutes** → Trackio syncs DB to HF Dataset (Parquet)\n3. **Space dashboard** → Reads from Dataset, displays metrics in real-time\n4. **Job completes** → Final sync ensures all metrics are persisted\n\n## Default Configuration Pattern\n\n**Use sensible defaults for trackio configuration unless the user requests otherwise.**\n\n### Recommended Defaults\n\n```python\nimport trackio\n\ntrackio.init(\n    project=\"qwen-capybara-sft\",\n    name=\"baseline-run\",             # Descriptive name user will recognize\n    space_id=\"username/trackio\",     # Default space: {username}/trackio\n    config={\n        # Keep config minimal - hyperparameters and model/dataset info only\n        \"model\": \"Qwen/Qwen2.5-0.5B\",\n        \"dataset\": \"trl-lib/Capybara\",\n        \"learning_rate\": 2e-5,\n        \"num_epochs\": 3,\n    }\n)\n```\n\n**Key principles:**\n- **Space ID**: Use `{username}/trackio` with \"trackio\" as default space name\n- **Run naming**: Unless otherwise specified, name the run in a way the user will recognize\n- **Config**: Keep minimal - don't automatically capture job metadata unless requested\n- **Grouping**: Optional - only use if the user asks to organize related experiments\n\n## Grouping Runs (Optional)\n\nThe `group` parameter helps organize related runs together in the dashboard sidebar. This is useful when the user is running multiple experiments with different configurations but wants to compare them together:\n\n```python\n# Example: Group runs by experiment type\ntrackio.init(project=\"my-project\", name=\"baseline-run-1\", group=\"baseline\")\ntrackio.init(project=\"my-project\", name=\"augmented-run-1\", group=\"augmented\")\ntrackio.init(project=\"my-project\", name=\"tuned-run-1\", group=\"tuned\")\n```\n\nRuns with the same group name can be grouped together in the sidebar, making it easier to compare related experiments. You can group by any configuration parameter:\n\n```python\n# Hyperparameter sweep - group by learning rate\ntrackio.init(project=\"hyperparam-sweep\", name=\"lr-0.001-run\", group=\"lr_0.001\")\ntrackio.init(project=\"hyperparam-sweep\", name=\"lr-0.01-run\", group=\"lr_0.01\")\n```\n\n## Environment Variables for Jobs\n\nYou can configure trackio using environment variables instead of passing parameters to `trackio.init()`. This is useful for managing configuration across multiple jobs.\n\n**`HF_TOKEN`**\nRequired for creating Spaces and writing to datasets (passed via `secrets`):\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"...\",\n    \"secrets\": {\n        \"HF_TOKEN\": \"$HF_TOKEN\"  # Enables Space creation and Hub push\n    }\n})\n```\n\n### Example with Environment Variables\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# Training script - trackio config from environment\nimport trackio\nfrom datetime import datetime\n\n# Auto-generate run name\ntimestamp = datetime.now().strftime(\"%Y-%m-%d_%H-%M\")\nrun_name = f\"sft_qwen25_{timestamp}\"\n\n# Project and space_id can come from environment variables\ntrackio.init(name=run_name, group=\"SFT\")\n\n# ... training code ...\ntrackio.finish()\n\"\"\",\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"2h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n**When to use environment variables:**\n- Managing multiple jobs with the same configuration\n- Keeping training scripts portable across projects\n- Separating configuration from code\n\n**When to use direct parameters:**\n- Single job with specific configuration\n- When clarity in code is preferred\n- When each job has a different project/space\n\n## Viewing the Dashboard\n\nAfter starting training:\n1. Navigate to the Space: `https://huggingface.co/spaces/username/trackio`\n2. The Gradio dashboard shows all tracked experiments\n3. Filter by project, compare runs, and view charts with smoothing\n\n## Recommendation\n\n- **Trackio**: Best for real-time monitoring during long training runs\n- **Weights & Biases**: Best for team collaboration, requires an account\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/training_methods.md",
    "content": "# TRL Training Methods Overview\n\nTRL (Transformer Reinforcement Learning) provides multiple training methods for fine-tuning and aligning language models. This reference provides a brief overview of each method.\n\n## Supervised Fine-Tuning (SFT)\n\n**What it is:** Standard instruction tuning with supervised learning on demonstration data.\n\n**When to use:**\n- Initial fine-tuning of base models on task-specific data\n- Teaching new capabilities or domains\n- Most common starting point for fine-tuning\n\n**Dataset format:** Conversational format with \"messages\" field, OR text field, OR prompt/completion pairs\n\n**Example:**\n```python\nfrom trl import SFTTrainer, SFTConfig\n\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset,\n    args=SFTConfig(\n        output_dir=\"my-model\",\n        push_to_hub=True,\n        hub_model_id=\"username/my-model\",\n        eval_strategy=\"no\",  # Disable eval for simple example\n        # max_length=1024 is the default - only set if you need different length\n    )\n)\ntrainer.train()\n```\n\n**Note:** For production training with evaluation monitoring, see `scripts/train_sft_example.py`\n\n**Documentation:** `hf_doc_fetch(\"https://huggingface.co/docs/trl/sft_trainer\")`\n\n## Direct Preference Optimization (DPO)\n\n**What it is:** Alignment method that trains directly on preference pairs (chosen vs rejected responses) without requiring a reward model.\n\n**When to use:**\n- Aligning models to human preferences\n- Improving response quality after SFT\n- Have paired preference data (chosen/rejected responses)\n\n**Dataset format:** Preference pairs with \"chosen\" and \"rejected\" fields\n\n**Example:**\n```python\nfrom trl import DPOTrainer, DPOConfig\n\ntrainer = DPOTrainer(\n    model=\"Qwen/Qwen2.5-0.5B-Instruct\",  # Use instruct model\n    train_dataset=dataset,\n    args=DPOConfig(\n        output_dir=\"dpo-model\",\n        beta=0.1,  # KL penalty coefficient\n      
  eval_strategy=\"no\",  # Disable eval for simple example\n        # max_length=1024 is the default - only set if you need different length\n    )\n)\ntrainer.train()\n```\n\n**Note:** For production training with evaluation monitoring, see `scripts/train_dpo_example.py`\n\n**Documentation:** `hf_doc_fetch(\"https://huggingface.co/docs/trl/dpo_trainer\")`\n\n## Group Relative Policy Optimization (GRPO)\n\n**What it is:** Online RL method that optimizes relative to group performance, useful for tasks with verifiable rewards.\n\n**When to use:**\n- Tasks with automatic reward signals (code execution, math verification)\n- Online learning scenarios\n- When DPO offline data is insufficient\n\n**Dataset format:** Prompt-only format (model generates responses, reward computed online)\n\n**Example:**\n```python\n# Use TRL maintained script\nhf_jobs(\"uv\", {\n    \"script\": \"https://raw.githubusercontent.com/huggingface/trl/main/examples/scripts/grpo.py\",\n    \"script_args\": [\n        \"--model_name_or_path\", \"Qwen/Qwen2.5-0.5B-Instruct\",\n        \"--dataset_name\", \"trl-lib/math_shepherd\",\n        \"--output_dir\", \"grpo-model\"\n    ],\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"4h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n**Documentation:** `hf_doc_fetch(\"https://huggingface.co/docs/trl/grpo_trainer\")`\n\n## Reward Modeling\n\n**What it is:** Train a reward model to score responses, used as a component in RLHF pipelines.\n\n**When to use:**\n- Building RLHF pipeline\n- Need automatic quality scoring\n- Creating reward signals for PPO training\n\n**Dataset format:** Preference pairs with \"chosen\" and \"rejected\" responses\n\n**Documentation:** `hf_doc_fetch(\"https://huggingface.co/docs/trl/reward_trainer\")`\n\n## Method Selection Guide\n\n| Method | Complexity | Data Required | Use Case |\n|--------|-----------|---------------|----------|\n| **SFT** | Low | Demonstrations | Initial fine-tuning |\n| **DPO** | Medium | 
Paired preferences | Post-SFT alignment |\n| **GRPO** | Medium | Prompts + reward fn | Online RL with automatic rewards |\n| **Reward** | Medium | Paired preferences | Building RLHF pipeline |\n\n## Recommended Pipeline\n\n**For most use cases:**\n1. **Start with SFT** - Fine-tune base model on task data\n2. **Follow with DPO** - Align to preferences using paired data\n3. **Optional: GGUF conversion** - Deploy for local inference\n\n**For advanced RL scenarios:**\n1. **Start with SFT** - Fine-tune base model\n2. **Train reward model** - On preference data\n\n## Dataset Format Reference\n\nFor complete dataset format specifications, use:\n```python\nhf_doc_fetch(\"https://huggingface.co/docs/trl/dataset_formats\")\n```\n\nOr validate your dataset:\n```bash\nuv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \\\n  --dataset your/dataset --split train\n```\n\n## See Also\n\n- `references/training_patterns.md` - Common training patterns and examples\n- `scripts/train_sft_example.py` - Complete SFT template\n- `scripts/train_dpo_example.py` - Complete DPO template\n- [Dataset Inspector](https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py) - Dataset format validation tool\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/training_patterns.md",
    "content": "# Common Training Patterns\n\nThis guide provides common training patterns and use cases for TRL on Hugging Face Jobs.\n\n## Multi-GPU Training\n\nAutomatic distributed training across multiple GPUs. TRL/Accelerate handles distribution automatically:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# Your training script here (same as single GPU)\n# No changes needed - Accelerate detects multiple GPUs\n\"\"\",\n    \"flavor\": \"a10g-largex2\",  # 2x A10G GPUs\n    \"timeout\": \"4h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n**Tips for multi-GPU:**\n- No code changes needed\n- Use `per_device_train_batch_size` (per GPU, not total)\n- Effective batch size = `per_device_train_batch_size` × `num_gpus` × `gradient_accumulation_steps`\n- Monitor GPU utilization to ensure both GPUs are being used\n\n## DPO Training (Preference Learning)\n\nTrain with preference data for alignment:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"\n# /// script\n# dependencies = [\"trl>=0.12.0\", \"trackio\"]\n# ///\n\nfrom datasets import load_dataset\nfrom trl import DPOTrainer, DPOConfig\nimport trackio\n\ndataset = load_dataset(\"trl-lib/ultrafeedback_binarized\", split=\"train\")\n\n# Create train/eval split\ndataset_split = dataset.train_test_split(test_size=0.1, seed=42)\n\nconfig = DPOConfig(\n    output_dir=\"dpo-model\",\n    push_to_hub=True,\n    hub_model_id=\"username/dpo-model\",\n    num_train_epochs=1,\n    beta=0.1,  # KL penalty coefficient\n    eval_strategy=\"steps\",\n    eval_steps=50,\n    report_to=\"trackio\",\n    run_name=\"baseline_run\", # use a meaningful run name\n    # max_length=1024,  # Default - only set if you need different sequence length\n)\n\ntrainer = DPOTrainer(\n    model=\"Qwen/Qwen2.5-0.5B-Instruct\",  # Use instruct model as base\n    train_dataset=dataset_split[\"train\"],\n    eval_dataset=dataset_split[\"test\"],  # IMPORTANT: Provide eval_dataset when eval_strategy is enabled\n    
args=config,\n)\n\ntrainer.train()\ntrainer.push_to_hub()\ntrackio.finish()\n\"\"\",\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"3h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n**For DPO documentation:** Use `hf_doc_fetch(\"https://huggingface.co/docs/trl/dpo_trainer\")`\n\n## GRPO Training (Online RL)\n\nGroup Relative Policy Optimization for online reinforcement learning:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"https://raw.githubusercontent.com/huggingface/trl/main/examples/scripts/grpo.py\",\n    \"script_args\": [\n        \"--model_name_or_path\", \"Qwen/Qwen2.5-0.5B-Instruct\",\n        \"--dataset_name\", \"trl-lib/math_shepherd\",\n        \"--output_dir\", \"grpo-model\",\n        \"--push_to_hub\",\n        \"--hub_model_id\", \"username/grpo-model\"\n    ],\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"4h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n**For GRPO documentation:** Use `hf_doc_fetch(\"https://huggingface.co/docs/trl/grpo_trainer\")`\n\n## Trackio Configuration\n\n**Use sensible defaults for trackio setup.** See `references/trackio_guide.md` for complete documentation, including grouping runs for experiments.\n\n### Basic Pattern\n\n```python\nimport trackio\n\ntrackio.init(\n    project=\"my-training\",\n    name=\"baseline-run\",             # Descriptive name user will recognize\n    space_id=\"username/trackio\",     # Default space: {username}/trackio\n    config={\n        # Keep config minimal - hyperparameters and model/dataset info only\n        \"model\": \"Qwen/Qwen2.5-0.5B\",\n        \"dataset\": \"trl-lib/Capybara\",\n        \"learning_rate\": 2e-5,\n    }\n)\n\n# Your training code...\n\ntrackio.finish()\n```\n\n### Grouping for Experiments (Optional)\n\nWhen the user wants to compare related runs, use the `group` parameter:\n\n```python\n# Hyperparameter sweep\ntrackio.init(project=\"hyperparam-sweep\", name=\"lr-0.001\", group=\"lr_0.001\")\ntrackio.init(project=\"hyperparam-sweep\", name=\"lr-0.01\", group=\"lr_0.01\")\n```\n\n## Pattern Selection Guide\n\n| Use Case | Pattern | Hardware | Time |\n|----------|---------|----------|------|\n| SFT training | `scripts/train_sft_example.py` | a10g-large | 2-6 hours |\n| Large dataset (>10K) | Multi-GPU | a10g-largex2 | 4-12 hours |\n| Preference learning | DPO Training | a10g-large | 2-4 hours |\n| Online RL | GRPO Training | a10g-large | 3-6 hours |\n\n## Critical: Evaluation Dataset Requirements\n\n**⚠️ IMPORTANT**: If you set `eval_strategy=\"steps\"` or `eval_strategy=\"epoch\"`, you **MUST** provide an `eval_dataset` to the trainer, or the training will hang.\n\n### ✅ CORRECT - With eval dataset:\n```python\ndataset_split = dataset.train_test_split(test_size=0.1, seed=42)\n\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset_split[\"train\"],\n    eval_dataset=dataset_split[\"test\"],  # ← MUST provide when eval_strategy is enabled\n    args=SFTConfig(eval_strategy=\"steps\", ...),\n)\n```\n\n### ❌ WRONG - Will hang:\n```python\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset,\n    # NO eval_dataset but eval_strategy=\"steps\" ← WILL HANG\n    args=SFTConfig(eval_strategy=\"steps\", ...),\n)\n```\n\n### Option: Disable evaluation if no eval dataset\n```python\nconfig = SFTConfig(\n    eval_strategy=\"no\",  # ← Explicitly disable evaluation\n    # ... other config\n)\n\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset,\n    # No eval_dataset needed\n    args=config,\n)\n```\n\n## Best Practices\n\n1. **Use train/eval splits** - Create an evaluation split for monitoring progress\n2. **Enable Trackio** - Monitor progress in real-time\n3. **Add 20-30% buffer to timeout** - Account for loading/saving overhead\n4. **Test with TRL official scripts first** - Use maintained examples before custom code\n5. **Always provide `eval_dataset`** - Required whenever `eval_strategy` is enabled; otherwise set `eval_strategy=\"no\"`\n6. **Use multi-GPU for large models** - 7B+ models benefit significantly\n\n## See Also\n\n- `scripts/train_sft_example.py` - Complete SFT template with Trackio and eval split\n- `scripts/train_dpo_example.py` - Complete DPO template\n- `scripts/train_grpo_example.py` - Complete GRPO template\n- `references/hardware_guide.md` - Detailed hardware specifications\n- `references/training_methods.md` - Overview of all TRL training methods\n- `references/troubleshooting.md` - Common issues and solutions\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/troubleshooting.md",
    "content": "# Troubleshooting TRL Training Jobs\n\nCommon issues and solutions when training with TRL on Hugging Face Jobs.\n\n## Training Hangs at \"Starting training...\" Step\n\n**Problem:** Job starts but hangs at the training step - never progresses, never times out, just sits there.\n\n**Root Cause:** Using `eval_strategy=\"steps\"` or `eval_strategy=\"epoch\"` without providing an `eval_dataset` to the trainer.\n\n**Solution:**\n\n**Option A: Provide eval_dataset (recommended)**\n```python\n# Create train/eval split\ndataset_split = dataset.train_test_split(test_size=0.1, seed=42)\n\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset_split[\"train\"],\n    eval_dataset=dataset_split[\"test\"],  # ← MUST provide when eval_strategy is enabled\n    args=SFTConfig(\n        eval_strategy=\"steps\",\n        eval_steps=50,\n        ...\n    ),\n)\n```\n\n**Option B: Disable evaluation**\n```python\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=dataset,\n    # No eval_dataset\n    args=SFTConfig(\n        eval_strategy=\"no\",  # ← Explicitly disable\n        ...\n    ),\n)\n```\n\n**Prevention:**\n- Always create train/eval split for better monitoring\n- Use `dataset.train_test_split(test_size=0.1, seed=42)`\n- Check example scripts: `scripts/train_sft_example.py` includes proper eval setup\n\n## Job Times Out\n\n**Problem:** Job terminates before training completes, all progress lost.\n\n**Solutions:**\n- Increase timeout parameter (e.g., `\"timeout\": \"4h\"`)\n- Reduce `num_train_epochs` or use smaller dataset slice\n- Use smaller model or enable LoRA/PEFT to speed up training\n- Add 20-30% buffer to estimated time for loading/saving overhead\n\n**Prevention:**\n- Always start with a quick demo run to estimate timing\n- Use `scripts/estimate_cost.py` to get time estimates\n- Monitor first runs closely via Trackio or logs\n\n## Model Not Saved to Hub\n\n**Problem:** Training completes but model 
doesn't appear on Hub - all work lost.\n\n**Check:**\n- [ ] `push_to_hub=True` in training config\n- [ ] `hub_model_id` specified with username (e.g., `\"username/model-name\"`)\n- [ ] `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` in job submission\n- [ ] User has write access to target repo\n- [ ] Token has write permissions (check at https://huggingface.co/settings/tokens)\n- [ ] Training script calls `trainer.push_to_hub()` at the end\n\n**See:** `references/hub_saving.md` for detailed Hub authentication troubleshooting\n\n## Out of Memory (OOM)\n\n**Problem:** Job fails with CUDA out of memory error.\n\n**Solutions (in order of preference):**\n1. **Reduce batch size:** Lower `per_device_train_batch_size` (try 4 → 2 → 1)\n2. **Increase gradient accumulation:** Raise `gradient_accumulation_steps` to maintain effective batch size\n3. **Disable evaluation:** Remove `eval_dataset` and `eval_strategy` (saves ~40% memory, good for demos)\n4. **Enable LoRA/PEFT:** Use `peft_config=LoraConfig(r=8, lora_alpha=16)` to train adapters only (smaller rank = less memory)\n5. **Use larger GPU:** Switch from `t4-small` → `l4x1` → `a10g-large` → `a100-large`\n6. **Enable gradient checkpointing:** Set `gradient_checkpointing=True` in config (slower but saves memory)\n7. 
**Use smaller model:** Try a smaller variant (e.g., 0.5B instead of 3B)\n\n**Memory guidelines:**\n- T4 (16GB): <1B models with LoRA\n- A10G (24GB): 1-3B models with LoRA, <1B full fine-tune\n- A100 (40GB/80GB): 7B+ models with LoRA, 3B full fine-tune\n\n## Parameter Naming Issues\n\n**Problem:** `TypeError: SFTConfig.__init__() got an unexpected keyword argument 'max_seq_length'`\n\n**Cause:** TRL config classes use `max_length`, not `max_seq_length`.\n\n**Solution:**\n```python\n# ✅ CORRECT - TRL uses max_length\nSFTConfig(max_length=512)\nDPOConfig(max_length=512)\n\n# ❌ WRONG - This will fail\nSFTConfig(max_seq_length=512)\n```\n\n**Note:** Most TRL configs don't require explicit max_length - the default (1024) works well. Only set if you need a specific value.\n\n## Dataset Format Error\n\n**Problem:** Training fails with dataset format errors or missing fields.\n\n**Solutions:**\n1. **Check format documentation:**\n   ```python\n   hf_doc_fetch(\"https://huggingface.co/docs/trl/dataset_formats\")\n   ```\n\n2. **Validate dataset before training:**\n   ```bash\n   uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \\\n     --dataset <dataset-name> --split train\n   ```\n   Or via hf_jobs:\n   ```python\n   hf_jobs(\"uv\", {\n       \"script\": \"https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py\",\n       \"script_args\": [\"--dataset\", \"dataset-name\", \"--split\", \"train\"]\n   })\n   ```\n\n3. **Verify field names:**\n   - **SFT:** Needs \"messages\" field (conversational), OR \"text\" field, OR \"prompt\"/\"completion\"\n   - **DPO:** Needs \"chosen\" and \"rejected\" fields\n   - **GRPO:** Needs prompt-only format\n\n4. **Check dataset split:**\n   - Ensure split exists (e.g., `split=\"train\"`)\n   - Preview dataset: `load_dataset(\"name\", split=\"train[:5]\")`\n\n## Import/Module Errors\n\n**Problem:** Job fails with \"ModuleNotFoundError\" or import errors.\n\n**Solutions:**\n1. 
**Add PEP 723 header with dependencies:**\n   ```python\n   # /// script\n   # dependencies = [\n   #     \"trl>=0.12.0\",\n   #     \"peft>=0.7.0\",\n   #     \"transformers>=4.36.0\",\n   # ]\n   # ///\n   ```\n\n2. **Verify exact format:**\n   - Must have `# ///` delimiters (with space after `#`)\n   - Dependencies must be valid PyPI package names\n   - Check spelling and version constraints\n\n3. **Test locally first:**\n   ```bash\n   uv run train.py  # Tests if dependencies are correct\n   ```\n\n## Authentication Errors\n\n**Problem:** Job fails with authentication or permission errors when pushing to Hub.\n\n**Solutions:**\n1. **Verify authentication:**\n   ```python\n   mcp__huggingface__hf_whoami()  # Check who's authenticated\n   ```\n\n2. **Check token permissions:**\n   - Go to https://huggingface.co/settings/tokens\n   - Ensure token has \"write\" permission\n   - Token must not be \"read-only\"\n\n3. **Verify token in job:**\n   ```python\n   \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # Must be in job config\n   ```\n\n4. **Check repo permissions:**\n   - User must have write access to target repo\n   - If org repo, user must be member with write access\n   - Repo must exist or user must have permission to create\n\n## Job Stuck or Not Starting\n\n**Problem:** Job shows \"pending\" or \"starting\" for extended period.\n\n**Solutions:**\n- Check Jobs dashboard for status: https://huggingface.co/jobs\n- Verify hardware availability (some GPU types may have queues)\n- Try different hardware flavor if one is heavily utilized\n- Check for account billing issues (Jobs requires paid plan)\n\n**Typical startup times:**\n- CPU jobs: 10-30 seconds\n- GPU jobs: 30-90 seconds\n- If >3 minutes: likely queued or stuck\n\n## Training Loss Not Decreasing\n\n**Problem:** Training runs but loss stays flat or doesn't improve.\n\n**Solutions:**\n1. **Check learning rate:** May be too low (try 2e-5 to 5e-5) or too high (try 1e-6)\n2. 
**Verify dataset quality:** Inspect examples to ensure they're reasonable\n3. **Check model size:** Very small models may not have capacity for the task\n4. **Increase training steps:** May need more epochs or larger dataset\n5. **Verify dataset format:** Wrong format may cause degraded training\n\n## Logs Not Appearing\n\n**Problem:** Cannot see training logs or progress.\n\n**Solutions:**\n1. **Wait 30-60 seconds:** Initial logs can be delayed\n2. **Check logs via MCP tool:**\n   ```python\n   hf_jobs(\"logs\", {\"job_id\": \"your-job-id\"})\n   ```\n3. **Use Trackio for real-time monitoring:** See `references/trackio_guide.md`\n4. **Verify job is actually running:**\n   ```python\n   hf_jobs(\"inspect\", {\"job_id\": \"your-job-id\"})\n   ```\n\n## Checkpoint/Resume Issues\n\n**Problem:** Cannot resume from checkpoint or checkpoint not saved.\n\n**Solutions:**\n1. **Enable checkpoint saving:**\n   ```python\n   SFTConfig(\n       save_strategy=\"steps\",\n       save_steps=100,\n       hub_strategy=\"every_save\",  # Push each checkpoint\n   )\n   ```\n\n2. **Verify checkpoints pushed to Hub:** Check model repo for checkpoint folders\n\n3. **Resume from checkpoint:**\n   ```python\n   trainer = SFTTrainer(\n       model=\"username/model-name\",  # Can be checkpoint path\n   )\n   # resume_from_checkpoint is an argument of train(), not of the SFTTrainer constructor;\n   # it takes a local checkpoint directory\n   trainer.train(resume_from_checkpoint=\"path/to/checkpoint-1000\")\n   ```\n\n## Getting Help\n\nIf issues persist:\n\n1. **Check TRL documentation:**\n   ```python\n   hf_doc_search(\"your issue\", product=\"trl\")\n   ```\n\n2. **Check Jobs documentation:**\n   ```python\n   hf_doc_fetch(\"https://huggingface.co/docs/huggingface_hub/guides/jobs\")\n   ```\n\n3. **Review related guides:**\n   - `references/hub_saving.md` - Hub authentication issues\n   - `references/hardware_guide.md` - Hardware selection and specs\n   - `references/training_patterns.md` - Eval dataset requirements\n   - SKILL.md \"Working with Scripts\" section - Script format and URL issues\n\n4. 
**Ask in HF forums:** https://discuss.huggingface.co/\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/references/unsloth.md",
    "content": "# Unsloth: Fast Fine-Tuning with Memory Optimization\n\n**Unsloth** is a fine-tuning library that provides ~2x faster training and ~60% less VRAM usage for LLM training. It's particularly useful when working with limited GPU memory or when speed is critical.\n\n- **GitHub**: [unslothai/unsloth](https://github.com/unslothai/unsloth)\n- **Docs**: [unsloth.ai/docs](https://unsloth.ai/docs)\n\n## When to Use Unsloth\n\nUse Unsloth if instructed to do so, or one of the following use cases applies:\n\n| Use Case | Recommendation |\n|----------|----------------|\n| Standard text LLM fine-tuning | TRL is sufficient, but Unsloth is faster |\n| Limited GPU memory | **Use Unsloth** - 60% less VRAM |\n| Need maximum speed | **Use Unsloth** - 2x faster |\n| Large models (>13B) | **Use Unsloth** - memory efficiency critical |\n\n## Supported Models\n\nUnsloth supports many popular models including:\n- **Text LLMs**: Llama 3/3.1/3.2/3.3, Qwen 2.5/3, Mistral, Phi-4, Gemma 2/3, LFM2/2.5\n- **Vision LLMs**: Qwen3-VL, Gemma 3, Llama 3.2 Vision, Pixtral\n\nUse Unsloth's pre-optimized model variants when available:\n```python\n# Unsloth-optimized models load faster and use less memory\nmodel_id = \"unsloth/LFM2.5-1.2B-Instruct\"      # 4-bit quantized\nmodel_id = \"unsloth/gemma-3-4b-pt\"            # Vision model\nmodel_id = \"unsloth/Qwen3-VL-8B-Instruct\"     # Vision model\n```\n\n## Installation\n\n```python\n# /// script\n# dependencies = [\n#     \"unsloth\",\n#     \"trl\",\n#     \"datasets\",\n#     \"trackio\",\n# ]\n# ///\n```\n\n## Basic Usage: Text LLM\n\n```python\nfrom unsloth import FastLanguageModel\nfrom trl import SFTTrainer, SFTConfig\nfrom datasets import load_dataset\n\n# Load model with Unsloth optimizations\nmodel, tokenizer = FastLanguageModel.from_pretrained(\n    model_name=\"LiquidAI/LFM2.5-1.2B-Instruct\",\n    max_seq_length=4096,\n)\n\n# Add LoRA adapters\nmodel = FastLanguageModel.get_peft_model(\n    model,\n    r=16,\n    
lora_alpha=16,\n    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"out_proj\", \"in_proj\", \"w1\", \"w2\", \"w3\"],\n    lora_dropout=0,\n    bias=\"none\",\n    use_gradient_checkpointing=\"unsloth\",\n    random_state=3407,\n)\n\n# Load dataset\ndataset = load_dataset(\"trl-lib/Capybara\", split=\"train\")\n\n# Train with TRL\ntrainer = SFTTrainer(\n    model=model,\n    tokenizer=tokenizer,\n    train_dataset=dataset,\n    args=SFTConfig(\n        output_dir=\"./output\",\n        per_device_train_batch_size=2,\n        gradient_accumulation_steps=4,\n        max_steps=500,\n        learning_rate=2e-4,\n        report_to=\"trackio\",\n    ),\n)\n\ntrainer.train()\n```\n\n## LFM2.5 Specific Settings\n\nFor LFM2.5 inference, use these recommended generation parameters:\n\n**Instruct models:**\n```python\ntemperature = 0.1\ntop_k = 50\ntop_p = 0.1\nrepetition_penalty = 1.05\n```\n\n**Thinking models:**\n```python\ntemperature = 0.05\ntop_k = 50\nrepetition_penalty = 1.05\n```\n\n## Vision-Language Models (VLMs)\n\nUnsloth provides specialized support for VLMs with `FastVisionModel`:\n\n```python\nfrom unsloth import FastVisionModel, get_chat_template\nfrom unsloth.trainer import UnslothVisionDataCollator\nfrom trl import SFTTrainer, SFTConfig\nfrom datasets import load_dataset\n\n# Load VLM with Unsloth\nmodel, processor = FastVisionModel.from_pretrained(\n    \"unsloth/gemma-3-4b-pt\",  # or \"unsloth/Qwen3-VL-8B-Instruct\"\n    load_in_4bit=True,\n    use_gradient_checkpointing=\"unsloth\",\n)\n\n# Add LoRA for all modalities\nmodel = FastVisionModel.get_peft_model(\n    model,\n    finetune_vision_layers=True,      # Train vision encoder\n    finetune_language_layers=True,    # Train language model\n    finetune_attention_modules=True,  # Train attention\n    finetune_mlp_modules=True,        # Train MLPs\n    r=16,\n    lora_alpha=32,\n    target_modules=\"all-linear\",\n)\n\n# Apply chat template (required for base models)\nprocessor = 
get_chat_template(processor, \"gemma-3\")\n\n# Load VLM dataset (with images and messages)\ndataset = load_dataset(\"your-vlm-dataset\", split=\"train\", streaming=True)\n\n# Enable training mode\nFastVisionModel.for_training(model)\n\n# Train with VLM-specific collator\ntrainer = SFTTrainer(\n    model=model,\n    train_dataset=dataset,\n    processing_class=processor.tokenizer,\n    data_collator=UnslothVisionDataCollator(model, processor),\n    args=SFTConfig(\n        output_dir=\"./vlm-output\",\n        per_device_train_batch_size=2,\n        gradient_accumulation_steps=4,\n        max_steps=500,\n        learning_rate=2e-4,\n        # VLM-specific settings\n        remove_unused_columns=False,\n        dataset_text_field=\"\",\n        dataset_kwargs={\"skip_prepare_dataset\": True},\n        report_to=\"trackio\",\n    ),\n)\n\ntrainer.train()\n```\n\n## Key Differences from Standard TRL\n\n| Aspect | Standard TRL | Unsloth |\n|--------|--------------|---------|\n| Model loading | `AutoModelForCausalLM.from_pretrained()` | `FastLanguageModel.from_pretrained()` |\n| LoRA setup | `PeftModel` / `LoraConfig` | `FastLanguageModel.get_peft_model()` |\n| VLM loading | Limited support | `FastVisionModel.from_pretrained()` |\n| VLM collator | Manual | `UnslothVisionDataCollator` |\n| Memory usage | Standard | ~60% less |\n| Training speed | Standard | ~2x faster |\n\n## VLM Dataset Format\n\nVLM datasets should have:\n- `images`: List of PIL images or image paths\n- `messages`: Conversation format with image references\n\n```python\n{\n    \"images\": [<PIL.Image>, ...],\n    \"messages\": [\n        {\"role\": \"user\", \"content\": [\n            {\"type\": \"image\"},\n            {\"type\": \"text\", \"text\": \"Describe this image\"}\n        ]},\n        {\"role\": \"assistant\", \"content\": \"This image shows...\"}\n    ]\n}\n```\n\n## Streaming Datasets\n\nFor large VLM datasets, use streaming to avoid disk space issues:\n\n```python\ndataset = 
load_dataset(\n    \"your-vlm-dataset\",\n    split=\"train\",\n    streaming=True,  # Stream from Hub\n)\n\n# Must use max_steps with streaming (no epoch-based training)\nSFTConfig(max_steps=500, ...)\n```\n\n## Saving Models\n\n### Save LoRA Adapter\n\n```python\nmodel.save_pretrained(\"./adapter\")\nprocessor.save_pretrained(\"./adapter\")\n\n# Push to Hub\nmodel.push_to_hub(\"username/my-vlm-adapter\")\nprocessor.push_to_hub(\"username/my-vlm-adapter\")\n```\n\n### Merge and Save Full Model\n\n```python\n# Merge LoRA weights into base model\nmodel = model.merge_and_unload()\n\n# Save merged model\nmodel.save_pretrained(\"./merged\")\ntokenizer.save_pretrained(\"./merged\")\n```\n\n### Convert to GGUF\n\nUnsloth models can be converted to GGUF for llama.cpp/Ollama:\n\n```python\n# Save in 16-bit for GGUF conversion\nmodel.save_pretrained_gguf(\"./gguf\", tokenizer, quantization_method=\"f16\")\n\n# Or directly quantize\nmodel.save_pretrained_gguf(\"./gguf\", tokenizer, quantization_method=\"q4_k_m\")\n```\n\n## Qwen3-VL Specific Settings\n\nFor Qwen3-VL models, use these recommended settings:\n\n**Instruct models:**\n```python\ntemperature = 0.7\ntop_p = 0.8\npresence_penalty = 1.5\n```\n\n**Thinking models:**\n```python\ntemperature = 1.0\ntop_p = 0.95\npresence_penalty = 0.0\n```\n\n## Hardware Requirements\n\n| Model | Min VRAM (Unsloth 4-bit) | Recommended GPU |\n|-------|--------------------------|-----------------|\n| 2B-4B | 8GB | T4, L4 |\n| 7B-8B | 16GB | A10G, L4x4 |\n| 13B | 24GB | A10G-large |\n| 30B+ | 48GB+ | A100 |\n\n## Example: Full VLM Training Script\n\nSee `scripts/unsloth_sft_example.py` for a complete production-ready example that includes:\n- Unsloth VLM setup\n- Streaming dataset support\n- Trackio monitoring\n- Hub push\n- CLI arguments\n\nRun locally:\n```bash\nuv run scripts/unsloth_sft_example.py \\\n    --dataset trl-lib/Capybara \\\n    --max-steps 500 \\\n    --output-repo username/my-model\n```\n\nRun on HF 
Jobs:\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"<script content>\",\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"2h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n## See Also\n\n- `scripts/unsloth_sft_example.py` - Complete text LLM training example\n- [Unsloth Documentation](https://unsloth.ai/docs)\n- [LFM2.5 Guide](https://unsloth.ai/docs/models/tutorials/lfm2.5)\n- [Qwen3-VL Guide](https://unsloth.ai/docs/models/qwen3-vl-how-to-run-and-fine-tune)\n- [Unsloth GitHub](https://github.com/unslothai/unsloth)\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/convert_to_gguf.py",
    "content": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"transformers>=4.36.0\",\n#     \"peft>=0.7.0\",\n#     \"torch>=2.0.0\",\n#     \"accelerate>=0.24.0\",\n#     \"huggingface_hub>=0.20.0\",\n#     \"sentencepiece>=0.1.99\",\n#     \"protobuf>=3.20.0\",\n#     \"numpy\",\n#     \"gguf\",\n# ]\n# ///\n\n\"\"\"\nGGUF Conversion Script - Production Ready\n\nThis script converts a LoRA fine-tuned model to GGUF format for use with:\n- llama.cpp\n- Ollama\n- LM Studio\n- Other GGUF-compatible tools\n\nPREREQUISITES (install these FIRST):\n- Ubuntu/Debian: sudo apt-get update && sudo apt-get install -y build-essential cmake\n- RHEL/CentOS: sudo yum groupinstall -y \"Development Tools\" && sudo yum install -y cmake\n- macOS: xcode-select --install && brew install cmake\n\nUsage:\n    Set environment variables:\n    - ADAPTER_MODEL: Your fine-tuned model (e.g., \"username/my-finetuned-model\")\n    - BASE_MODEL: Base model used for fine-tuning (e.g., \"Qwen/Qwen2.5-0.5B\")\n    - OUTPUT_REPO: Where to upload GGUF files (e.g., \"username/my-model-gguf\")\n    - HF_USERNAME: Your Hugging Face username (optional, for README)\n\nDependencies: All required packages are declared in PEP 723 header above.\n\"\"\"\n\nimport os\nimport sys\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom peft import PeftModel\nfrom huggingface_hub import HfApi\nimport subprocess\n\n\ndef check_system_dependencies():\n    \"\"\"Check if required system packages are available.\"\"\"\n    print(\"🔍 Checking system dependencies...\")\n    \n    # Check for git\n    if subprocess.run([\"which\", \"git\"], capture_output=True).returncode != 0:\n        print(\"  ❌ git is not installed. 
Please install it:\")\n        print(\"     Ubuntu/Debian: sudo apt-get install git\")\n        print(\"     RHEL/CentOS: sudo yum install git\")\n        print(\"     macOS: brew install git\")\n        return False\n    \n    # Check for make or cmake\n    has_make = subprocess.run([\"which\", \"make\"], capture_output=True).returncode == 0\n    has_cmake = subprocess.run([\"which\", \"cmake\"], capture_output=True).returncode == 0\n    \n    if not has_make and not has_cmake:\n        print(\"  ❌ Neither make nor cmake found. Please install build tools:\")\n        print(\"     Ubuntu/Debian: sudo apt-get install build-essential cmake\")\n        print(\"     RHEL/CentOS: sudo yum groupinstall 'Development Tools' && sudo yum install cmake\")\n        print(\"     macOS: xcode-select --install && brew install cmake\")\n        return False\n    \n    print(\"  ✅ System dependencies found\")\n    return True\n\n\ndef run_command(cmd, description):\n    \"\"\"Run a command with error handling.\"\"\"\n    print(f\"   {description}...\")\n    try:\n        result = subprocess.run(\n            cmd,\n            check=True,\n            capture_output=True,\n            text=True\n        )\n        if result.stdout:\n            print(f\"   {result.stdout[:200]}\")  # Show first 200 chars\n        return True\n    except subprocess.CalledProcessError as e:\n        print(f\"   ❌ Command failed: {' '.join(cmd)}\")\n        if e.stdout:\n            print(f\"   STDOUT: {e.stdout[:500]}\")\n        if e.stderr:\n            print(f\"   STDERR: {e.stderr[:500]}\")\n        return False\n    except FileNotFoundError:\n        print(f\"   ❌ Command not found: {cmd[0]}\")\n        return False\n\n\nprint(\"🔄 GGUF Conversion Script\")\nprint(\"=\" * 60)\n\n# Check system dependencies first\nif not check_system_dependencies():\n    print(\"\\n❌ Please install the missing system dependencies and try again.\")\n    sys.exit(1)\n\n# Configuration from environment 
variables\nADAPTER_MODEL = os.environ.get(\"ADAPTER_MODEL\", \"evalstate/qwen-capybara-medium\")\nBASE_MODEL = os.environ.get(\"BASE_MODEL\", \"Qwen/Qwen2.5-0.5B\")\nOUTPUT_REPO = os.environ.get(\"OUTPUT_REPO\", \"evalstate/qwen-capybara-medium-gguf\")\nusername = os.environ.get(\"HF_USERNAME\", ADAPTER_MODEL.split('/')[0])\n\nprint(f\"\\n📦 Configuration:\")\nprint(f\"   Base model: {BASE_MODEL}\")\nprint(f\"   Adapter model: {ADAPTER_MODEL}\")\nprint(f\"   Output repo: {OUTPUT_REPO}\")\n\n# Step 1: Load base model and adapter\nprint(\"\\n🔧 Step 1: Loading base model and LoRA adapter...\")\nprint(\"   (This may take a few minutes)\")\n\ntry:\n    base_model = AutoModelForCausalLM.from_pretrained(\n        BASE_MODEL,\n        dtype=torch.float16,\n        device_map=\"auto\",\n        trust_remote_code=True,\n    )\n    print(\"   ✅ Base model loaded\")\nexcept Exception as e:\n    print(f\"   ❌ Failed to load base model: {e}\")\n    sys.exit(1)\n\ntry:\n    # Load and merge adapter\n    print(\"   Loading LoRA adapter...\")\n    model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)\n    print(\"   ✅ Adapter loaded\")\n\n    print(\"   Merging adapter with base model...\")\n    merged_model = model.merge_and_unload()\n    print(\"   ✅ Models merged!\")\nexcept Exception as e:\n    print(f\"   ❌ Failed to merge models: {e}\")\n    sys.exit(1)\n\ntry:\n    # Load tokenizer\n    tokenizer = AutoTokenizer.from_pretrained(ADAPTER_MODEL, trust_remote_code=True)\n    print(\"   ✅ Tokenizer loaded\")\nexcept Exception as e:\n    print(f\"   ❌ Failed to load tokenizer: {e}\")\n    sys.exit(1)\n\n# Step 2: Save merged model temporarily\nprint(\"\\n💾 Step 2: Saving merged model...\")\nmerged_dir = \"/tmp/merged_model\"\ntry:\n    merged_model.save_pretrained(merged_dir, safe_serialization=True)\n    tokenizer.save_pretrained(merged_dir)\n    print(f\"   ✅ Merged model saved to {merged_dir}\")\nexcept Exception as e:\n    print(f\"   ❌ Failed to save merged model: 
{e}\")\n    sys.exit(1)\n\n# Step 3: Install llama.cpp for conversion\nprint(\"\\n📥 Step 3: Setting up llama.cpp for GGUF conversion...\")\n\n# Clone llama.cpp repository\nif not run_command(\n    [\"git\", \"clone\", \"https://github.com/ggerganov/llama.cpp.git\", \"/tmp/llama.cpp\"],\n    \"Cloning llama.cpp repository\"\n):\n    print(\"   Trying alternative clone method...\")\n    # Try shallow clone\n    if not run_command(\n        [\"git\", \"clone\", \"--depth\", \"1\", \"https://github.com/ggerganov/llama.cpp.git\", \"/tmp/llama.cpp\"],\n        \"Cloning llama.cpp (shallow)\"\n    ):\n        sys.exit(1)\n\n# Install Python dependencies\nprint(\"   Installing Python dependencies...\")\nif not run_command(\n    [\"pip\", \"install\", \"-r\", \"/tmp/llama.cpp/requirements.txt\"],\n    \"Installing llama.cpp requirements\"\n):\n    print(\"   ⚠️  Some requirements may already be installed\")\n\nif not run_command(\n    [\"pip\", \"install\", \"sentencepiece\", \"protobuf\"],\n    \"Installing tokenizer dependencies\"\n):\n    print(\"   ⚠️  Tokenizer dependencies may already be installed\")\n\n# Step 4: Convert to GGUF (FP16)\nprint(\"\\n🔄 Step 4: Converting to GGUF format (FP16)...\")\ngguf_output_dir = \"/tmp/gguf_output\"\nos.makedirs(gguf_output_dir, exist_ok=True)\n\nconvert_script = \"/tmp/llama.cpp/convert_hf_to_gguf.py\"\nmodel_name = ADAPTER_MODEL.split('/')[-1]\ngguf_file = f\"{gguf_output_dir}/{model_name}-f16.gguf\"\n\nprint(f\"   Running conversion...\")\nif not run_command(\n    [\n        sys.executable, convert_script,\n        merged_dir,\n        \"--outfile\", gguf_file,\n        \"--outtype\", \"f16\"\n    ],\n    f\"Converting to FP16\"\n):\n    print(\"   ❌ Conversion failed!\")\n    sys.exit(1)\n\nprint(f\"   ✅ FP16 GGUF created: {gguf_file}\")\n\n# Step 5: Quantize to different formats\nprint(\"\\n⚙️  Step 5: Creating quantized versions...\")\n\n# Build quantize tool using CMake (more reliable than make)\nprint(\"   Building quantize 
tool with CMake...\")\nos.makedirs(\"/tmp/llama.cpp/build\", exist_ok=True)\n\n# Configure with CMake\nif not run_command(\n    [\"cmake\", \"-B\", \"/tmp/llama.cpp/build\", \"-S\", \"/tmp/llama.cpp\",\n     \"-DGGML_CUDA=OFF\"],\n    \"Configuring with CMake\"\n):\n    print(\"   ❌ CMake configuration failed\")\n    sys.exit(1)\n\n# Build just the quantize tool\nif not run_command(\n    [\"cmake\", \"--build\", \"/tmp/llama.cpp/build\", \"--target\", \"llama-quantize\", \"-j\", \"4\"],\n    \"Building llama-quantize\"\n):\n    print(\"   ❌ Build failed!\")\n    sys.exit(1)\n\nprint(\"   ✅ Quantize tool built\")\n\n# Use the CMake build output path\nquantize_bin = \"/tmp/llama.cpp/build/bin/llama-quantize\"\n\n# Common quantization formats\nquant_formats = [\n    (\"Q4_K_M\", \"4-bit, medium quality (recommended)\"),\n    (\"Q5_K_M\", \"5-bit, higher quality\"),\n    (\"Q8_0\", \"8-bit, very high quality\"),\n]\n\nquantized_files = []\nfor quant_type, description in quant_formats:\n    print(f\"   Creating {quant_type} quantization ({description})...\")\n    quant_file = f\"{gguf_output_dir}/{model_name}-{quant_type.lower()}.gguf\"\n\n    if not run_command(\n        [quantize_bin, gguf_file, quant_file, quant_type],\n        f\"Quantizing to {quant_type}\"\n    ):\n        print(f\"   ⚠️  Skipping {quant_type} due to error\")\n        continue\n\n    quantized_files.append((quant_file, quant_type))\n    \n    # Get file size\n    size_mb = os.path.getsize(quant_file) / (1024 * 1024)\n    print(f\"   ✅ {quant_type}: {size_mb:.1f} MB\")\n\nif not quantized_files:\n    print(\"   ❌ No quantized versions were created successfully\")\n    sys.exit(1)\n\n# Step 6: Upload to Hub\nprint(\"\\n☁️  Step 6: Uploading to Hugging Face Hub...\")\napi = HfApi()\n\n# Create repo\nprint(f\"   Creating repository: {OUTPUT_REPO}\")\ntry:\n    api.create_repo(repo_id=OUTPUT_REPO, repo_type=\"model\", exist_ok=True)\n    print(\"   ✅ Repository ready\")\nexcept Exception as e:\n    
print(f\"   ℹ️  Repository may already exist: {e}\")\n\n# Upload FP16 version\nprint(\"   Uploading FP16 GGUF...\")\ntry:\n    api.upload_file(\n        path_or_fileobj=gguf_file,\n        path_in_repo=f\"{model_name}-f16.gguf\",\n        repo_id=OUTPUT_REPO,\n    )\n    print(\"   ✅ FP16 uploaded\")\nexcept Exception as e:\n    print(f\"   ❌ Upload failed: {e}\")\n    sys.exit(1)\n\n# Upload quantized versions\nfor quant_file, quant_type in quantized_files:\n    print(f\"   Uploading {quant_type}...\")\n    try:\n        api.upload_file(\n            path_or_fileobj=quant_file,\n            path_in_repo=f\"{model_name}-{quant_type.lower()}.gguf\",\n            repo_id=OUTPUT_REPO,\n        )\n        print(f\"   ✅ {quant_type} uploaded\")\n    except Exception as e:\n        print(f\"   ❌ Upload failed for {quant_type}: {e}\")\n        continue\n\n# Create README\nprint(\"\\n📝 Creating README...\")\nreadme_content = f\"\"\"---\nbase_model: {BASE_MODEL}\ntags:\n- gguf\n- llama.cpp\n- quantized\n- trl\n- sft\n---\n\n# {OUTPUT_REPO.split('/')[-1]}\n\nThis is a GGUF conversion of [{ADAPTER_MODEL}](https://huggingface.co/{ADAPTER_MODEL}), which is a LoRA fine-tuned version of [{BASE_MODEL}](https://huggingface.co/{BASE_MODEL}).\n\n## Model Details\n\n- **Base Model:** {BASE_MODEL}\n- **Fine-tuned Model:** {ADAPTER_MODEL}\n- **Training:** Supervised Fine-Tuning (SFT) with TRL\n- **Format:** GGUF (for llama.cpp, Ollama, LM Studio, etc.)\n\n## Available Quantizations\n\n| File | Quant | Size | Description | Use Case |\n|------|-------|------|-------------|----------|\n| {model_name}-f16.gguf | F16 | ~1GB | Full precision | Best quality, slower |\n| {model_name}-q8_0.gguf | Q8_0 | ~500MB | 8-bit | High quality |\n| {model_name}-q5_k_m.gguf | Q5_K_M | ~350MB | 5-bit medium | Good quality, smaller |\n| {model_name}-q4_k_m.gguf | Q4_K_M | ~300MB | 4-bit medium | Recommended - good balance |\n\n## Usage\n\n### With llama.cpp\n\n```bash\n# Download model\nhf download 
{OUTPUT_REPO} {model_name}-q4_k_m.gguf\n\n# Run with llama.cpp\n./llama-cli -m {model_name}-q4_k_m.gguf -p \"Your prompt here\"\n```\n\n### With Ollama\n\n1. Create a `Modelfile`:\n```\nFROM ./{model_name}-q4_k_m.gguf\n```\n\n2. Create the model:\n```bash\nollama create my-model -f Modelfile\nollama run my-model\n```\n\n### With LM Studio\n\n1. Download the `.gguf` file\n2. Import into LM Studio\n3. Start chatting!\n\n## License\n\nInherits the license from the base model: {BASE_MODEL}\n\n## Citation\n\n```bibtex\n@misc{{{OUTPUT_REPO.split('/')[-1].replace('-', '_')},\n  author = {{{username}}},\n  title = {{{OUTPUT_REPO.split('/')[-1]}}},\n  year = {{2025}},\n  publisher = {{Hugging Face}},\n  url = {{https://huggingface.co/{OUTPUT_REPO}}}\n}}\n```\n\n---\n\n*Converted to GGUF format using llama.cpp*\n\"\"\"\n\ntry:\n    api.upload_file(\n        path_or_fileobj=readme_content.encode(),\n        path_in_repo=\"README.md\",\n        repo_id=OUTPUT_REPO,\n    )\n    print(\"   ✅ README uploaded\")\nexcept Exception as e:\n    print(f\"   ❌ README upload failed: {e}\")\n\nprint(\"\\n\" + \"=\" * 60)\nprint(\"✅ GGUF Conversion Complete!\")\nprint(f\"📦 Repository: https://huggingface.co/{OUTPUT_REPO}\")\nprint(f\"\\n📥 Download with:\")\nprint(f\"   hf download {OUTPUT_REPO} {model_name}-q4_k_m.gguf\")\nprint(f\"\\n🚀 Use with Ollama:\")\nprint(\"   1. Download the GGUF file\")\nprint(f\"   2. Create Modelfile: FROM ./{model_name}-q4_k_m.gguf\")\nprint(\"   3. ollama create my-model -f Modelfile\")\nprint(\"   4. ollama run my-model\")\nprint(\"=\" * 60)\n"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/dataset_inspector.py",
    "content": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"\nDataset Format Inspector for TRL Training (LLM-Optimized Output)\n\nInspects Hugging Face datasets to determine TRL training compatibility.\nUses Datasets Server API for instant results - no dataset download needed!\n\nULTRA-EFFICIENT: Uses HF Datasets Server API - completes in <2 seconds.\n\nUsage with HF Jobs:\n    hf_jobs(\"uv\", {\n        \"script\": \"https://huggingface.co/datasets/evalstate/trl-helpers/raw/main/dataset_inspector.py\",\n        \"script_args\": [\"--dataset\", \"your/dataset\", \"--split\", \"train\"]\n    })\n\"\"\"\n\nimport argparse\nimport sys\nimport json\nimport urllib.request\nimport urllib.parse\nfrom typing import List, Dict, Any\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Inspect dataset format for TRL training\")\n    parser.add_argument(\"--dataset\", type=str, required=True, help=\"Dataset name\")\n    parser.add_argument(\"--split\", type=str, default=\"train\", help=\"Dataset split (default: train)\")\n    parser.add_argument(\"--config\", type=str, default=\"default\", help=\"Dataset config name (default: default)\")\n    parser.add_argument(\"--preview\", type=int, default=150, help=\"Max chars per field preview\")\n    parser.add_argument(\"--samples\", type=int, default=5, help=\"Number of samples to fetch (default: 5)\")\n    parser.add_argument(\"--json-output\", action=\"store_true\", help=\"Output as JSON\")\n    return parser.parse_args()\n\n\ndef api_request(url: str) -> Dict:\n    \"\"\"Make API request to Datasets Server\"\"\"\n    try:\n        with urllib.request.urlopen(url, timeout=10) as response:\n            return json.loads(response.read().decode())\n    except urllib.error.HTTPError as e:\n        if e.code == 404:\n            return None\n        raise Exception(f\"API request failed: {e.code} {e.reason}\")\n    except Exception as e:\n        raise 
Exception(f\"API request failed: {str(e)}\")\n\n\ndef get_splits(dataset: str) -> Dict:\n    \"\"\"Get available splits for dataset\"\"\"\n    url = f\"https://datasets-server.huggingface.co/splits?dataset={urllib.parse.quote(dataset)}\"\n    return api_request(url)\n\n\ndef get_rows(dataset: str, config: str, split: str, offset: int = 0, length: int = 5) -> Dict:\n    \"\"\"Get rows from dataset\"\"\"\n    # URL-encode every user-supplied query parameter, not only the dataset name\n    url = f\"https://datasets-server.huggingface.co/rows?dataset={urllib.parse.quote(dataset)}&config={urllib.parse.quote(config)}&split={urllib.parse.quote(split)}&offset={offset}&length={length}\"\n    return api_request(url)\n\n\ndef find_columns(columns: List[str], patterns: List[str]) -> List[str]:\n    \"\"\"Find columns matching patterns\"\"\"\n    return [c for c in columns if any(p in c.lower() for p in patterns)]\n\n\ndef check_sft_compatibility(columns: List[str]) -> Dict[str, Any]:\n    \"\"\"Check SFT compatibility\"\"\"\n    has_messages = \"messages\" in columns\n    has_text = \"text\" in columns\n    has_prompt_completion = \"prompt\" in columns and \"completion\" in columns\n    \n    ready = has_messages or has_text or has_prompt_completion\n    \n    possible_prompt = find_columns(columns, [\"prompt\", \"instruction\", \"question\", \"input\"])\n    possible_response = find_columns(columns, [\"response\", \"completion\", \"output\", \"answer\"])\n    \n    return {\n        \"ready\": ready,\n        \"reason\": \"messages\" if has_messages else \"text\" if has_text else \"prompt+completion\" if has_prompt_completion else None,\n        \"possible_prompt\": possible_prompt[0] if possible_prompt else None,\n        \"possible_response\": possible_response[0] if possible_response else None,\n        \"has_context\": \"context\" in columns,\n    }\n\n\ndef check_dpo_compatibility(columns: List[str]) -> Dict[str, Any]:\n    \"\"\"Check DPO compatibility\"\"\"\n    has_standard = \"prompt\" in columns and \"chosen\" in columns and \"rejected\" in columns\n    \n    possible_prompt = 
find_columns(columns, [\"prompt\", \"instruction\", \"question\", \"input\"])\n    possible_chosen = find_columns(columns, [\"chosen\", \"preferred\", \"winner\"])\n    possible_rejected = find_columns(columns, [\"rejected\", \"dispreferred\", \"loser\"])\n    \n    can_map = bool(possible_prompt and possible_chosen and possible_rejected)\n    \n    return {\n        \"ready\": has_standard,\n        \"can_map\": can_map,\n        \"prompt_col\": possible_prompt[0] if possible_prompt else None,\n        \"chosen_col\": possible_chosen[0] if possible_chosen else None,\n        \"rejected_col\": possible_rejected[0] if possible_rejected else None,\n    }\n\n\ndef check_grpo_compatibility(columns: List[str]) -> Dict[str, Any]:\n    \"\"\"Check GRPO compatibility\"\"\"\n    has_prompt = \"prompt\" in columns\n    has_no_responses = \"chosen\" not in columns and \"rejected\" not in columns\n    \n    possible_prompt = find_columns(columns, [\"prompt\", \"instruction\", \"question\", \"input\"])\n    \n    return {\n        \"ready\": has_prompt and has_no_responses,\n        \"can_map\": bool(possible_prompt) and has_no_responses,\n        \"prompt_col\": possible_prompt[0] if possible_prompt else None,\n    }\n\n\ndef check_kto_compatibility(columns: List[str]) -> Dict[str, Any]:\n    \"\"\"Check KTO compatibility\"\"\"\n    return {\"ready\": \"prompt\" in columns and \"completion\" in columns and \"label\" in columns}\n\n\ndef generate_mapping_code(method: str, info: Dict[str, Any]) -> str:\n    \"\"\"Generate mapping code for a training method\"\"\"\n    if method == \"SFT\":\n        if info[\"ready\"]:\n            return None\n        \n        prompt_col = info.get(\"possible_prompt\")\n        response_col = info.get(\"possible_response\")\n        has_context = info.get(\"has_context\", False)\n        \n        if not prompt_col:\n            return None\n        \n        if has_context and response_col:\n            return f\"\"\"def 
format_for_sft(example):\n    text = f\"Instruction: {{example['{prompt_col}']}}\\n\\n\"\n    if example.get('context'):\n        text += f\"Context: {{example['context']}}\\n\\n\"\n    text += f\"Response: {{example['{response_col}']}}\"\n    return {{'text': text}}\n\ndataset = dataset.map(format_for_sft, remove_columns=dataset.column_names)\"\"\"\n        elif response_col:\n            return f\"\"\"def format_for_sft(example):\n    return {{'text': f\"{{example['{prompt_col}']}}\\n\\n{{example['{response_col}']}}}}\n\ndataset = dataset.map(format_for_sft, remove_columns=dataset.column_names)\"\"\"\n        else:\n            return f\"\"\"def format_for_sft(example):\n    return {{'text': example['{prompt_col}']}}\n\ndataset = dataset.map(format_for_sft, remove_columns=dataset.column_names)\"\"\"\n    \n    elif method == \"DPO\":\n        if info[\"ready\"] or not info[\"can_map\"]:\n            return None\n        \n        return f\"\"\"def format_for_dpo(example):\n    return {{\n        'prompt': example['{info['prompt_col']}'],\n        'chosen': example['{info['chosen_col']}'],\n        'rejected': example['{info['rejected_col']}'],\n    }}\n\ndataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)\"\"\"\n    \n    elif method == \"GRPO\":\n        if info[\"ready\"] or not info[\"can_map\"]:\n            return None\n        \n        return f\"\"\"def format_for_grpo(example):\n    return {{'prompt': example['{info['prompt_col']}']}}\n\ndataset = dataset.map(format_for_grpo, remove_columns=dataset.column_names)\"\"\"\n    \n    return None\n\n\ndef format_value_preview(value: Any, max_chars: int) -> str:\n    \"\"\"Format value for preview\"\"\"\n    if value is None:\n        return \"None\"\n    elif isinstance(value, str):\n        return value[:max_chars] + (\"...\" if len(value) > max_chars else \"\")\n    elif isinstance(value, list):\n        if len(value) > 0 and isinstance(value[0], dict):\n            return 
f\"[{len(value)} items] Keys: {list(value[0].keys())}\"\n        preview = str(value)\n        return preview[:max_chars] + (\"...\" if len(preview) > max_chars else \"\")\n    else:\n        preview = str(value)\n        return preview[:max_chars] + (\"...\" if len(preview) > max_chars else \"\")\n\n\ndef main():\n    args = parse_args()\n    \n    print(f\"Fetching dataset info via Datasets Server API...\")\n    \n    try:\n        # Get splits info\n        splits_data = get_splits(args.dataset)\n        if not splits_data or \"splits\" not in splits_data:\n            print(f\"ERROR: Could not fetch splits for dataset '{args.dataset}'\")\n            print(f\"       Dataset may not exist or is not accessible via Datasets Server API\")\n            sys.exit(1)\n        \n        # Find the right config\n        available_configs = set()\n        split_found = False\n        config_to_use = args.config\n        \n        for split_info in splits_data[\"splits\"]:\n            available_configs.add(split_info[\"config\"])\n            if split_info[\"config\"] == args.config and split_info[\"split\"] == args.split:\n                split_found = True\n        \n        # If the requested config/split pair is not found, fall back to the\n        # first available config (sorted so the choice is deterministic)\n        if not split_found and available_configs:\n            config_to_use = sorted(available_configs)[0]\n            print(f\"Config '{args.config}' with split '{args.split}' not found, trying config '{config_to_use}'...\")\n        \n        # Get rows\n        rows_data = get_rows(args.dataset, config_to_use, args.split, offset=0, length=args.samples)\n        \n        if not rows_data or \"rows\" not in rows_data:\n            print(f\"ERROR: Could not fetch rows for dataset '{args.dataset}'\")\n            print(f\"       Split '{args.split}' may not exist\")\n            print(f\"       Available configs: {', '.join(sorted(available_configs))}\")\n            sys.exit(1)\n        \n        rows = rows_data[\"rows\"]\n        if not rows:\n            print(f\"ERROR: No 
rows found in split '{args.split}'\")\n            sys.exit(1)\n        \n        # Extract column info from first row\n        first_row = rows[0][\"row\"]\n        columns = list(first_row.keys())\n        features = rows_data.get(\"features\", [])\n        \n        # Get total count if available\n        total_examples = \"Unknown\"\n        for split_info in splits_data[\"splits\"]:\n            if split_info[\"config\"] == config_to_use and split_info[\"split\"] == args.split:\n                total_examples = f\"{split_info.get('num_examples', 'Unknown'):,}\" if isinstance(split_info.get('num_examples'), int) else \"Unknown\"\n                break\n        \n    except Exception as e:\n        print(f\"ERROR: {str(e)}\")\n        sys.exit(1)\n    \n    # Run compatibility checks\n    sft_info = check_sft_compatibility(columns)\n    dpo_info = check_dpo_compatibility(columns)\n    grpo_info = check_grpo_compatibility(columns)\n    kto_info = check_kto_compatibility(columns)\n    \n    # Determine recommended methods\n    recommended = []\n    if sft_info[\"ready\"]:\n        recommended.append(\"SFT\")\n    elif sft_info[\"possible_prompt\"]:\n        recommended.append(\"SFT (needs mapping)\")\n    \n    if dpo_info[\"ready\"]:\n        recommended.append(\"DPO\")\n    elif dpo_info[\"can_map\"]:\n        recommended.append(\"DPO (needs mapping)\")\n    \n    if grpo_info[\"ready\"]:\n        recommended.append(\"GRPO\")\n    elif grpo_info[\"can_map\"]:\n        recommended.append(\"GRPO (needs mapping)\")\n    \n    if kto_info[\"ready\"]:\n        recommended.append(\"KTO\")\n    \n    # JSON output mode\n    if args.json_output:\n        result = {\n            \"dataset\": args.dataset,\n            \"config\": config_to_use,\n            \"split\": args.split,\n            \"total_examples\": total_examples,\n            \"columns\": columns,\n            \"features\": [{\"name\": f[\"name\"], \"type\": f[\"type\"]} for f in features] if features else 
[],\n            \"compatibility\": {\n                \"SFT\": sft_info,\n                \"DPO\": dpo_info,\n                \"GRPO\": grpo_info,\n                \"KTO\": kto_info,\n            },\n            \"recommended_methods\": recommended,\n        }\n        print(json.dumps(result, indent=2))\n        sys.exit(0)\n    \n    # Human-readable output optimized for LLM parsing\n    print(\"=\" * 80)\n    print(f\"DATASET INSPECTION RESULTS\")\n    print(\"=\" * 80)\n    \n    print(f\"\\nDataset: {args.dataset}\")\n    print(f\"Config: {config_to_use}\")\n    print(f\"Split: {args.split}\")\n    print(f\"Total examples: {total_examples}\")\n    print(f\"Samples fetched: {len(rows)}\")\n    \n    print(f\"\\n{'COLUMNS':-<80}\")\n    if features:\n        for feature in features:\n            print(f\"  {feature['name']}: {feature['type']}\")\n    else:\n        for col in columns:\n            print(f\"  {col}: (type info not available)\")\n    \n    print(f\"\\n{'EXAMPLE DATA':-<80}\")\n    example = first_row\n    for col in columns:\n        value = example.get(col)\n        display = format_value_preview(value, args.preview)\n        print(f\"\\n{col}:\")\n        print(f\"  {display}\")\n    \n    print(f\"\\n{'TRAINING METHOD COMPATIBILITY':-<80}\")\n    \n    # SFT\n    print(f\"\\n[SFT] {'✓ READY' if sft_info['ready'] else '✗ NEEDS MAPPING'}\")\n    if sft_info[\"ready\"]:\n        print(f\"  Reason: Dataset has '{sft_info['reason']}' field\")\n        print(f\"  Action: Use directly with SFTTrainer\")\n    elif sft_info[\"possible_prompt\"]:\n        print(f\"  Detected: prompt='{sft_info['possible_prompt']}' response='{sft_info['possible_response']}'\")\n        print(f\"  Action: Apply mapping code (see below)\")\n    else:\n        print(f\"  Status: Cannot determine mapping - manual inspection needed\")\n    \n    # DPO\n    print(f\"\\n[DPO] {'✓ READY' if dpo_info['ready'] else '✗ NEEDS MAPPING' if dpo_info['can_map'] else '✗ 
INCOMPATIBLE'}\")\n    if dpo_info[\"ready\"]:\n        print(f\"  Reason: Dataset has 'prompt', 'chosen', 'rejected' fields\")\n        print(f\"  Action: Use directly with DPOTrainer\")\n    elif dpo_info[\"can_map\"]:\n        print(f\"  Detected: prompt='{dpo_info['prompt_col']}' chosen='{dpo_info['chosen_col']}' rejected='{dpo_info['rejected_col']}'\")\n        print(f\"  Action: Apply mapping code (see below)\")\n    else:\n        print(f\"  Status: Missing required fields (prompt + chosen + rejected)\")\n    \n    # GRPO\n    print(f\"\\n[GRPO] {'✓ READY' if grpo_info['ready'] else '✗ NEEDS MAPPING' if grpo_info['can_map'] else '✗ INCOMPATIBLE'}\")\n    if grpo_info[\"ready\"]:\n        print(f\"  Reason: Dataset has 'prompt' field\")\n        print(f\"  Action: Use directly with GRPOTrainer\")\n    elif grpo_info[\"can_map\"]:\n        print(f\"  Detected: prompt='{grpo_info['prompt_col']}'\")\n        print(f\"  Action: Apply mapping code (see below)\")\n    else:\n        print(f\"  Status: Missing prompt field\")\n    \n    # KTO\n    print(f\"\\n[KTO] {'✓ READY' if kto_info['ready'] else '✗ INCOMPATIBLE'}\")\n    if kto_info[\"ready\"]:\n        print(f\"  Reason: Dataset has 'prompt', 'completion', 'label' fields\")\n        print(f\"  Action: Use directly with KTOTrainer\")\n    else:\n        print(f\"  Status: Missing required fields (prompt + completion + label)\")\n    \n    # Mapping code\n    print(f\"\\n{'MAPPING CODE (if needed)':-<80}\")\n    \n    mapping_needed = False\n    \n    sft_mapping = generate_mapping_code(\"SFT\", sft_info)\n    if sft_mapping:\n        print(f\"\\n# For SFT Training:\")\n        print(sft_mapping)\n        mapping_needed = True\n    \n    dpo_mapping = generate_mapping_code(\"DPO\", dpo_info)\n    if dpo_mapping:\n        print(f\"\\n# For DPO Training:\")\n        print(dpo_mapping)\n        mapping_needed = True\n    \n    grpo_mapping = generate_mapping_code(\"GRPO\", grpo_info)\n    if grpo_mapping:\n        
print(f\"\\n# For GRPO Training:\")\n        print(grpo_mapping)\n        mapping_needed = True\n    \n    if not mapping_needed:\n        print(\"\\nNo mapping needed - dataset is ready for training!\")\n    \n    print(f\"\\n{'SUMMARY':-<80}\")\n    print(f\"Recommended training methods: {', '.join(recommended) if recommended else 'None (dataset needs formatting)'}\")\n    print(f\"\\nNote: Used Datasets Server API (instant, no download required)\")\n    \n    print(\"\\n\" + \"=\" * 80)\n    sys.exit(0)\n\n\nif __name__ == \"__main__\":\n    try:\n        main()\n    except KeyboardInterrupt:\n        sys.exit(0)\n    except Exception as e:\n        print(f\"ERROR: {e}\", file=sys.stderr)\n        sys.exit(1)"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/estimate_cost.py",
    "content": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"\nEstimate training time and cost for TRL jobs.\n\nUsage with uv:\n    uv run estimate_cost.py --model <model> --dataset <dataset> --hardware <flavor>\n    \nExample:\n    uv run estimate_cost.py --model Qwen/Qwen2.5-0.5B --dataset trl-lib/Capybara --hardware a10g-large\n\"\"\"\n\nimport argparse\n\n# Hardware costs per hour (approximate)\nHARDWARE_COSTS = {\n    \"t4-small\": 0.75,\n    \"t4-medium\": 1.50,\n    \"l4x1\": 2.50,\n    \"a10g-small\": 3.50,\n    \"a10g-large\": 5.00,\n    \"a10g-largex2\": 10.00,\n    \"a10g-largex4\": 20.00,\n    \"a100-large\": 10.00,\n}\n\n# Model sizes in billions of parameters\nMODEL_SIZES = {\n    \"0.5B\": 0.5,\n    \"1.5B\": 1.5,\n    \"3B\": 3,\n    \"7B\": 7,\n    \"13B\": 13,\n}\n\ndef estimate_training_time(model_params, dataset_size, epochs, hardware):\n    \"\"\"Estimate training time in hours.\"\"\"\n    # Rough estimates based on empirical observations\n    # These are approximations and actual times will vary\n    \n    base_time_per_1k_examples = 0.1  # hours for 1B model on a10g-large\n    \n    # Adjust for model size\n    time = base_time_per_1k_examples * model_params * (dataset_size / 1000) * epochs\n    \n    # Adjust for hardware (relative to a10g-large baseline)\n    hardware_multipliers = {\n        \"t4-small\": 2.0,\n        \"t4-medium\": 1.5,\n        \"l4x1\": 1.2,\n        \"a10g-small\": 1.3,\n        \"a10g-large\": 1.0,\n        \"a10g-largex2\": 0.6,\n        \"a10g-largex4\": 0.4,\n        \"a100-large\": 0.7,\n    }\n    \n    multiplier = hardware_multipliers.get(hardware, 1.0)\n    time *= multiplier\n    \n    return time\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Estimate training cost for TRL jobs\")\n    parser.add_argument(\"--model\", required=True, help=\"Model name or size (e.g., 'Qwen/Qwen2.5-0.5B' or '0.5B')\")\n    
parser.add_argument(\"--dataset\", required=True, help=\"Dataset name\")\n    parser.add_argument(\"--hardware\", required=True, choices=HARDWARE_COSTS.keys(), help=\"Hardware flavor\")\n    parser.add_argument(\"--dataset-size\", type=int, help=\"Override dataset size (number of examples)\")\n    parser.add_argument(\"--epochs\", type=int, default=3, help=\"Number of training epochs\")\n    return parser.parse_args()\n\ndef extract_model_size(model_name):\n    \"\"\"Extract model size from name or return parsed value.\"\"\"\n    # Check longer labels first so that \"13B\" is not mistaken for \"3B\"\n    for size_str, size_val in sorted(MODEL_SIZES.items(), key=lambda kv: len(kv[0]), reverse=True):\n        if size_str in model_name:\n            return size_val\n    \n    # Try to parse directly (e.g., \"2B\" -> 2.0)\n    try:\n        if \"B\" in model_name:\n            return float(model_name.replace(\"B\", \"\"))\n    except ValueError:\n        pass\n    \n    return 1.0  # Default to 1B if can't determine\n\ndef main():\n    args = parse_args()\n    \n    # Extract model parameters\n    model_params = extract_model_size(args.model)\n    print(f\"📊 Model: {args.model} (~{model_params}B parameters)\")\n    \n    # Estimate dataset size (would need to load to get real size)\n    if args.dataset_size:\n        dataset_size = args.dataset_size\n    else:\n        # Common dataset sizes (approximations)\n        dataset_sizes = {\n            \"trl-lib/Capybara\": 16000,\n            \"Anthropic/hh-rlhf\": 160000,\n        }\n        dataset_size = dataset_sizes.get(args.dataset, 10000)\n    \n    print(f\"📦 Dataset: {args.dataset} (~{dataset_size} examples)\")\n    print(f\"🔄 Epochs: {args.epochs}\")\n    print(f\"💻 Hardware: {args.hardware}\")\n    print()\n    \n    # Estimate training time\n    estimated_hours = estimate_training_time(model_params, dataset_size, args.epochs, args.hardware)\n    estimated_cost = estimated_hours * HARDWARE_COSTS[args.hardware]\n    \n    # Recommend timeout with buffer\n    recommended_timeout_hours = estimated_hours * 1.3  # 30% buffer\n    \n    print(f\"⏱️  Estimated training 
time: {estimated_hours:.1f} hours\")\n    print(f\"💰 Estimated cost: ${estimated_cost:.2f}\")\n    print(f\"⏰ Recommended timeout: {recommended_timeout_hours:.1f}h (with 30% buffer)\")\n    print()\n    \n    # Warnings and recommendations\n    if estimated_hours > 4:\n        print(\"⚠️  Long training time - consider:\")\n        print(\"   - Using faster hardware\")\n        print(\"   - Reducing epochs\")\n        print(\"   - Using a smaller dataset subset for testing\")\n    \n    if model_params >= 7 and args.hardware not in [\"a10g-largex2\", \"a10g-largex4\", \"a100-large\"]:\n        print(\"⚠️  Large model - consider using:\")\n        print(\"   - Larger GPU (a100-large)\")\n        print(\"   - Multi-GPU setup (a10g-largex2 or a10g-largex4)\")\n        print(\"   - LoRA/PEFT for memory efficiency\")\n    \n    print()\n    # Round the timeout up to a whole hour so short jobs never get \"0h\"\n    timeout_hours = int(recommended_timeout_hours) + 1\n    print(\"📋 Example job configuration:\")\n    print(f\"\"\"\nhf_jobs(\"uv\", {{\n    \"script\": \"your_training_script.py\",\n    \"flavor\": \"{args.hardware}\",\n    \"timeout\": \"{timeout_hours}h\",\n    \"secrets\": {{\"HF_TOKEN\": \"$HF_TOKEN\"}}\n}})\n\"\"\")\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/train_dpo_example.py",
    "content": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"trl>=0.12.0\",\n#     \"transformers>=4.36.0\",\n#     \"accelerate>=0.24.0\",\n#     \"trackio\",\n# ]\n# ///\n\n\"\"\"\nProduction-ready DPO training example for preference learning.\n\nDPO (Direct Preference Optimization) trains models on preference pairs\n(chosen vs rejected responses) without requiring a reward model.\n\nUsage with hf_jobs MCP tool:\n    hf_jobs(\"uv\", {\n        \"script\": '''<paste this entire file>''',\n        \"flavor\": \"a10g-large\",\n        \"timeout\": \"3h\",\n        \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n    })\n\nOr submit the script content directly inline without saving to a file.\n\"\"\"\n\nimport trackio\nfrom datasets import load_dataset\nfrom trl import DPOTrainer, DPOConfig\n\n\n# Load preference dataset\nprint(\"📦 Loading dataset...\")\ndataset = load_dataset(\"trl-lib/ultrafeedback_binarized\", split=\"train\")\nprint(f\"✅ Dataset loaded: {len(dataset)} preference pairs\")\n\n# Create train/eval split\nprint(\"🔀 Creating train/eval split...\")\ndataset_split = dataset.train_test_split(test_size=0.1, seed=42)\ntrain_dataset = dataset_split[\"train\"]\neval_dataset = dataset_split[\"test\"]\nprint(f\"   Train: {len(train_dataset)} pairs\")\nprint(f\"   Eval: {len(eval_dataset)} pairs\")\n\n# Training configuration\nconfig = DPOConfig(\n    # CRITICAL: Hub settings\n    output_dir=\"qwen-dpo-aligned\",\n    push_to_hub=True,\n    hub_model_id=\"username/qwen-dpo-aligned\",\n    hub_strategy=\"every_save\",\n\n    # DPO-specific parameters\n    beta=0.1,  # KL penalty coefficient (higher = stay closer to reference)\n\n    # Training parameters\n    num_train_epochs=1,  # DPO typically needs fewer epochs than SFT\n    per_device_train_batch_size=4,\n    gradient_accumulation_steps=4,\n    learning_rate=5e-7,  # DPO uses much lower LR than SFT\n    # max_length=1024,  # Default - only set if you need 
different sequence length\n\n    # Logging & checkpointing\n    logging_steps=10,\n    save_strategy=\"steps\",\n    save_steps=100,\n    save_total_limit=2,\n\n    # Evaluation - IMPORTANT: Only enable if eval_dataset provided\n    eval_strategy=\"steps\",\n    eval_steps=100,\n\n    # Optimization\n    warmup_ratio=0.1,\n    lr_scheduler_type=\"cosine\",\n\n    # Monitoring\n    report_to=\"trackio\",  # Integrate with Trackio\n    project=\"meaningful_project_name\", # project name for the training name (trackio)\n    run_name=\"baseline-run\", #Descriptive name for this training run\n\n)\n\n# Initialize and train\n# Note: DPO requires an instruct-tuned model as the base\nprint(\"🎯 Initializing trainer...\")\ntrainer = DPOTrainer(\n    model=\"Qwen/Qwen2.5-0.5B-Instruct\",  # Use instruct model, not base model\n    train_dataset=train_dataset,\n    eval_dataset=eval_dataset,  # CRITICAL: Must provide eval_dataset when eval_strategy is enabled\n    args=config,\n)\n\nprint(\"🚀 Starting DPO training...\")\ntrainer.train()\n\nprint(\"💾 Pushing to Hub...\")\ntrainer.push_to_hub()\n\n# Finish Trackio tracking\ntrackio.finish()\n\nprint(\"✅ Complete! Model at: https://huggingface.co/username/qwen-dpo-aligned\")\nprint(\"📊 View metrics at: https://huggingface.co/spaces/username/trackio\")"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/train_grpo_example.py",
    "content": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"trl>=0.12.0\",\n#     \"transformers>=4.36.0\",\n#     \"accelerate>=0.24.0\",\n#     \"trackio\",\n# ]\n# ///\n\n\"\"\"\nProduction-ready GRPO training example for online RL.\n\nGRPO (Group Relative Policy Optimization) is an online RL method that\noptimizes relative to group performance. Best for tasks with automatic\nreward signals like code execution or math verification.\n\nUsage with hf_jobs MCP tool:\n    hf_jobs(\"uv\", {\n        \"script\": '''<paste this entire file>''',\n        \"flavor\": \"a10g-large\",\n        \"timeout\": \"4h\",\n        \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n    })\n\nOr submit the script content directly inline without saving to a file.\n\nNote: For most GRPO use cases, the TRL maintained script is recommended:\n    https://raw.githubusercontent.com/huggingface/trl/main/examples/scripts/grpo.py\n\"\"\"\n\nimport trackio\nfrom datasets import load_dataset\nfrom trl import GRPOTrainer, GRPOConfig\n\n\n# Load dataset (GRPO uses prompt-only format)\ndataset = load_dataset(\"trl-lib/math_shepherd\", split=\"train\")\nprint(f\"✅ Dataset loaded: {len(dataset)} prompts\")\n\n# Training configuration\nconfig = GRPOConfig(\n    # CRITICAL: Hub settings\n    output_dir=\"qwen-grpo-math\",\n    push_to_hub=True,\n    hub_model_id=\"username/qwen-grpo-math\",\n    hub_strategy=\"every_save\",\n\n    # Training parameters\n    num_train_epochs=1,\n    per_device_train_batch_size=4,\n    gradient_accumulation_steps=4,\n    learning_rate=1e-6,\n\n    # Logging & checkpointing\n    logging_steps=10,\n    save_strategy=\"steps\",\n    save_steps=100,\n    save_total_limit=2,\n\n    # Optimization\n    warmup_ratio=0.1,\n    lr_scheduler_type=\"cosine\",\n\n    # Monitoring\n    report_to=\"trackio\",  # Integrate with Trackio\n    project=\"meaningful_project_name\", # project name for the training name (trackio)\n    
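run_name=\"baseline-run\",  # Descriptive name for this training run\n\n)\n\n\n# Placeholder reward: GRPOTrainer requires at least one reward function.\n# This toy reward only favors completions near 200 characters; replace it\n# with a task-specific signal such as math answer verification.\ndef reward_len(completions, **kwargs):\n    return [-abs(200 - len(completion)) for completion in completions]\n\n\n# Initialize and train\n# Note: GRPO requires an instruct-tuned model as the base\ntrainer = GRPOTrainer(\n    model=\"Qwen/Qwen2.5-0.5B-Instruct\",\n    reward_funcs=reward_len,\n    # Keep only the prompt column; extra columns are forwarded to the\n    # reward function as keyword arguments\n    train_dataset=dataset.select_columns([\"prompt\"]),\n    args=config,\n)\n\nprint(\"🚀 Starting GRPO training...\")\ntrainer.train()\n\nprint(\"💾 Pushing to Hub...\")\ntrainer.push_to_hub()\n\n# Finish Trackio tracking\ntrackio.finish()\n\nprint(\"✅ Complete! Model at: https://huggingface.co/username/qwen-grpo-math\")\nprint(\"📊 View metrics at: https://huggingface.co/spaces/username/trackio\")"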
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/train_sft_example.py",
    "content": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"trl>=0.12.0\",\n#     \"peft>=0.7.0\",\n#     \"transformers>=4.36.0\",\n#     \"accelerate>=0.24.0\",\n#     \"trackio\",\n# ]\n# ///\n\n\"\"\"\nProduction-ready SFT training example with all best practices.\n\nThis script demonstrates:\n- Trackio integration for real-time monitoring\n- LoRA/PEFT for efficient training\n- Proper Hub saving configuration\n- Train/eval split for monitoring\n- Checkpoint management\n- Optimized training parameters\n\nUsage with hf_jobs MCP tool:\n    hf_jobs(\"uv\", {\n        \"script\": '''<paste this entire file>''',\n        \"flavor\": \"a10g-large\",\n        \"timeout\": \"3h\",\n        \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"},\n    })\n\nOr submit the script content directly inline without saving to a file.\n\"\"\"\n\nimport trackio\nfrom datasets import load_dataset\nfrom peft import LoraConfig\nfrom trl import SFTTrainer, SFTConfig\n\n\n# Load dataset\nprint(\"📦 Loading dataset...\")\ndataset = load_dataset(\"trl-lib/Capybara\", split=\"train\")\nprint(f\"✅ Dataset loaded: {len(dataset)} examples\")\n\n# Create train/eval split\nprint(\"🔀 Creating train/eval split...\")\ndataset_split = dataset.train_test_split(test_size=0.1, seed=42)\ntrain_dataset = dataset_split[\"train\"]\neval_dataset = dataset_split[\"test\"]\nprint(f\"   Train: {len(train_dataset)} examples\")\nprint(f\"   Eval: {len(eval_dataset)} examples\")\n\n# Note: For memory-constrained demos, skip eval by using full dataset as train_dataset\n# and removing eval_dataset, eval_strategy, and eval_steps from config below\n\n# Training configuration\nconfig = SFTConfig(\n    # CRITICAL: Hub settings\n    output_dir=\"qwen-capybara-sft\",\n    push_to_hub=True,\n    hub_model_id=\"username/qwen-capybara-sft\",\n    hub_strategy=\"every_save\",  # Push checkpoints\n\n    # Training parameters\n    num_train_epochs=3,\n    
per_device_train_batch_size=4,\n    gradient_accumulation_steps=4,\n    learning_rate=2e-5,\n    # max_length=1024,  # Default - only set if you need different sequence length\n\n    # Logging & checkpointing\n    logging_steps=10,\n    save_strategy=\"steps\",\n    save_steps=100,\n    save_total_limit=2,\n\n    # Evaluation - IMPORTANT: Only enable if eval_dataset provided\n    eval_strategy=\"steps\",\n    eval_steps=100,\n\n    # Optimization\n    warmup_ratio=0.1,\n    lr_scheduler_type=\"cosine\",\n\n    # Monitoring\n    report_to=\"trackio\",  # Integrate with Trackio\n    project=\"meaningful_project_name\", # project name for the training name (trackio)\n    run_name=\"baseline-run\", #Descriptive name for this training run\n)\n\n# LoRA configuration\npeft_config = LoraConfig(\n    r=16,\n    lora_alpha=32,\n    lora_dropout=0.05,\n    bias=\"none\",\n    task_type=\"CAUSAL_LM\",\n    target_modules=[\"q_proj\", \"v_proj\"],\n)\n\n# Initialize and train\nprint(\"🎯 Initializing trainer...\")\ntrainer = SFTTrainer(\n    model=\"Qwen/Qwen2.5-0.5B\",\n    train_dataset=train_dataset,\n    eval_dataset=eval_dataset,  # CRITICAL: Must provide eval_dataset when eval_strategy is enabled\n    args=config,\n    peft_config=peft_config,\n)\n\nprint(\"🚀 Starting training...\")\ntrainer.train()\n\nprint(\"💾 Pushing to Hub...\")\ntrainer.push_to_hub()\n\n# Finish Trackio tracking\ntrackio.finish()\n\nprint(\"✅ Complete! Model at: https://huggingface.co/username/qwen-capybara-sft\")\nprint(\"📊 View metrics at: https://huggingface.co/spaces/username/trackio\")"
  },
  {
    "path": "skills/hugging-face-model-trainer/scripts/unsloth_sft_example.py",
    "content": "# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"unsloth\",\n#     \"datasets\",\n#     \"trl==0.22.2\",\n#     \"huggingface_hub[hf_transfer]\",\n#     \"trackio\",\n#     \"tensorboard\",\n#     \"transformers==4.57.3\",\n# ]\n# ///\n\"\"\"\nFine-tune LLMs using Unsloth optimizations for ~60% less VRAM and 2x faster training.\n\nSupports epoch-based or step-based training with optional eval split.\nDefault model: LFM2.5-1.2B-Instruct (Liquid Foundation Model).\n\nEpoch-based training (recommended for full datasets):\n    uv run unsloth_sft_example.py \\\n        --dataset mlabonne/FineTome-100k \\\n        --num-epochs 1 \\\n        --eval-split 0.2 \\\n        --output-repo your-username/model-finetuned\n\nRun on HF Jobs (1 epoch with eval):\n    hf jobs uv run unsloth_sft_example.py \\\n        --flavor a10g-small --secrets HF_TOKEN --timeout 4h \\\n        -- --dataset mlabonne/FineTome-100k \\\n           --num-epochs 1 \\\n           --eval-split 0.2 \\\n           --output-repo your-username/model-finetuned\n\nStep-based training (for quick tests):\n    uv run unsloth_sft_example.py \\\n        --dataset mlabonne/FineTome-100k \\\n        --max-steps 500 \\\n        --output-repo your-username/model-finetuned\n\"\"\"\n\nimport argparse\nimport logging\nimport os\nimport sys\nimport time\n\n# Force unbuffered output for HF Jobs logs\nsys.stdout.reconfigure(line_buffering=True)\nsys.stderr.reconfigure(line_buffering=True)\n\nlogging.basicConfig(\n    level=logging.INFO,\n    format=\"%(asctime)s - %(levelname)s - %(message)s\",\n)\nlogger = logging.getLogger(__name__)\n\n\ndef check_cuda():\n    \"\"\"Check CUDA availability and exit if not available.\"\"\"\n    import torch\n\n    if not torch.cuda.is_available():\n        logger.error(\"CUDA is not available. 
This script requires a GPU.\")\n        logger.error(\"Run on a machine with a CUDA-capable GPU or use HF Jobs:\")\n        logger.error(\n            \"  hf jobs uv run unsloth_sft_example.py --flavor a10g-small ...\"\n        )\n        sys.exit(1)\n    logger.info(f\"CUDA available: {torch.cuda.get_device_name(0)}\")\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(\n        description=\"Fine-tune LLMs with Unsloth optimizations\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        epilog=\"\"\"\nExamples:\n  # Quick test run\n  uv run unsloth_sft_example.py \\\\\n      --dataset mlabonne/FineTome-100k \\\\\n      --max-steps 50 \\\\\n      --output-repo username/model-test\n\n  # Full training with eval\n  uv run unsloth_sft_example.py \\\\\n      --dataset mlabonne/FineTome-100k \\\\\n      --num-epochs 1 \\\\\n      --eval-split 0.2 \\\\\n      --output-repo username/model-finetuned\n\n  # With Trackio monitoring\n  uv run unsloth_sft_example.py \\\\\n      --dataset mlabonne/FineTome-100k \\\\\n      --num-epochs 1 \\\\\n      --output-repo username/model-finetuned \\\\\n      --trackio-space username/trackio\n        \"\"\",\n    )\n\n    # Model and data\n    parser.add_argument(\n        \"--base-model\",\n        default=\"LiquidAI/LFM2.5-1.2B-Instruct\",\n        help=\"Base model (default: LiquidAI/LFM2.5-1.2B-Instruct)\",\n    )\n    parser.add_argument(\n        \"--dataset\",\n        required=True,\n        help=\"Dataset in ShareGPT/conversation format (e.g., mlabonne/FineTome-100k)\",\n    )\n    parser.add_argument(\n        \"--output-repo\",\n        required=True,\n        help=\"HF Hub repo to push model to (e.g., 'username/model-finetuned')\",\n    )\n\n    # Training config\n    parser.add_argument(\n        \"--num-epochs\",\n        type=float,\n        default=None,\n        help=\"Number of epochs (default: None). 
Use instead of --max-steps.\",\n    )\n    parser.add_argument(\n        \"--max-steps\",\n        type=int,\n        default=None,\n        help=\"Training steps (default: None). Use for quick tests or streaming.\",\n    )\n    parser.add_argument(\n        \"--batch-size\",\n        type=int,\n        default=2,\n        help=\"Per-device batch size (default: 2)\",\n    )\n    parser.add_argument(\n        \"--gradient-accumulation\",\n        type=int,\n        default=4,\n        help=\"Gradient accumulation steps (default: 4). Effective batch = batch-size * this\",\n    )\n    parser.add_argument(\n        \"--learning-rate\",\n        type=float,\n        default=2e-4,\n        help=\"Learning rate (default: 2e-4)\",\n    )\n    parser.add_argument(\n        \"--max-seq-length\",\n        type=int,\n        default=2048,\n        help=\"Maximum sequence length (default: 2048)\",\n    )\n\n    # LoRA config\n    parser.add_argument(\n        \"--lora-r\",\n        type=int,\n        default=16,\n        help=\"LoRA rank (default: 16). Higher = more capacity but more VRAM\",\n    )\n    parser.add_argument(\n        \"--lora-alpha\",\n        type=int,\n        default=16,\n        help=\"LoRA alpha (default: 16). Same as r per Unsloth recommendation\",\n    )\n\n    # Logging\n    parser.add_argument(\n        \"--trackio-space\",\n        default=None,\n        help=\"HF Space for Trackio dashboard (e.g., 'username/trackio')\",\n    )\n    parser.add_argument(\n        \"--run-name\",\n        default=None,\n        help=\"Custom run name for Trackio (default: auto-generated)\",\n    )\n    parser.add_argument(\n        \"--save-local\",\n        default=\"unsloth-output\",\n        help=\"Local directory to save model (default: unsloth-output)\",\n    )\n\n    # Evaluation and data control\n    parser.add_argument(\n        \"--eval-split\",\n        type=float,\n        default=0.0,\n        help=\"Fraction of data for evaluation (0.0-0.5). 
Default: 0.0 (no eval)\",\n    )\n    parser.add_argument(\n        \"--num-samples\",\n        type=int,\n        default=None,\n        help=\"Limit samples (default: None = use all)\",\n    )\n    parser.add_argument(\n        \"--seed\",\n        type=int,\n        default=3407,\n        help=\"Random seed for reproducibility (default: 3407)\",\n    )\n    parser.add_argument(\n        \"--merge-model\",\n        action=\"store_true\",\n        default=False,\n        help=\"Merge LoRA weights into base model before uploading (larger file, easier to use)\",\n    )\n\n    return parser.parse_args()\n\n\ndef main():\n    args = parse_args()\n\n    # Validate epochs/steps configuration\n    if not args.num_epochs and not args.max_steps:\n        args.num_epochs = 1\n        logger.info(\"Using default --num-epochs=1\")\n\n    # Determine training duration display\n    if args.num_epochs:\n        duration_str = f\"{args.num_epochs} epoch(s)\"\n    else:\n        duration_str = f\"{args.max_steps} steps\"\n\n    print(\"=\" * 70)\n    print(\"LLM Fine-tuning with Unsloth\")\n    print(\"=\" * 70)\n    print(\"\\nConfiguration:\")\n    print(f\"  Base model:      {args.base_model}\")\n    print(f\"  Dataset:         {args.dataset}\")\n    print(f\"  Num samples:     {args.num_samples or 'all'}\")\n    print(f\"  Eval split:      {args.eval_split if args.eval_split > 0 else '(disabled)'}\")\n    print(f\"  Seed:            {args.seed}\")\n    print(f\"  Training:        {duration_str}\")\n    print(f\"  Batch size:      {args.batch_size} x {args.gradient_accumulation} = {args.batch_size * args.gradient_accumulation}\")\n    print(f\"  Learning rate:   {args.learning_rate}\")\n    print(f\"  LoRA rank:       {args.lora_r}\")\n    print(f\"  Max seq length:  {args.max_seq_length}\")\n    print(f\"  Output repo:     {args.output_repo}\")\n    print(f\"  Trackio space:   {args.trackio_space or '(not configured)'}\")\n    print()\n\n    # Check CUDA before heavy imports\n 
   check_cuda()\n\n    # Enable fast transfers\n    os.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\"\n\n    # Set Trackio space if provided\n    if args.trackio_space:\n        os.environ[\"TRACKIO_SPACE_ID\"] = args.trackio_space\n        logger.info(f\"Trackio dashboard: https://huggingface.co/spaces/{args.trackio_space}\")\n\n    # Import heavy dependencies\n    from unsloth import FastLanguageModel\n    from unsloth.chat_templates import standardize_data_formats, train_on_responses_only\n    from datasets import load_dataset\n    from trl import SFTTrainer, SFTConfig\n    from huggingface_hub import login\n\n    # Login to Hub\n    token = os.environ.get(\"HF_TOKEN\") or os.environ.get(\"hfjob\")\n    if token:\n        login(token=token)\n        logger.info(\"Logged in to Hugging Face Hub\")\n    else:\n        logger.warning(\"HF_TOKEN not set - model upload may fail\")\n\n    # 1. Load model\n    print(\"\\n[1/5] Loading model...\")\n    start = time.time()\n\n    model, tokenizer = FastLanguageModel.from_pretrained(\n        model_name=args.base_model,\n        max_seq_length=args.max_seq_length,\n        load_in_4bit=False,\n        load_in_8bit=False,\n        load_in_16bit=True,\n        full_finetuning=False,\n    )\n\n    # Add LoRA adapters\n    model = FastLanguageModel.get_peft_model(\n        model,\n        r=args.lora_r,\n        target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"out_proj\", \"in_proj\", \"w1\", \"w2\", \"w3\"],\n        lora_alpha=args.lora_alpha,\n        lora_dropout=0,\n        bias=\"none\",\n        use_gradient_checkpointing=\"unsloth\",\n        random_state=args.seed,\n        use_rslora=False,\n        loftq_config=None,\n    )\n    print(f\"Model loaded in {time.time() - start:.1f}s\")\n\n    # 2. 
Load and prepare dataset\n    print(\"\\n[2/5] Loading dataset...\")\n    start = time.time()\n\n    dataset = load_dataset(args.dataset, split=\"train\")\n    print(f\"  Dataset has {len(dataset)} total samples\")\n\n    if args.num_samples:\n        dataset = dataset.select(range(min(args.num_samples, len(dataset))))\n        print(f\"  Limited to {len(dataset)} samples\")\n\n    # Auto-detect and normalize conversation column\n    for col in [\"messages\", \"conversations\", \"conversation\"]:\n        if col in dataset.column_names and isinstance(dataset[0][col], list):\n            if col != \"conversations\":\n                dataset = dataset.rename_column(col, \"conversations\")\n            break\n    dataset = standardize_data_formats(dataset)\n\n    # Apply chat template\n    def formatting_prompts_func(examples):\n        texts = tokenizer.apply_chat_template(\n            examples[\"conversations\"],\n            tokenize=False,\n            add_generation_prompt=False,\n        )\n        # Remove BOS token to avoid duplicates (tokenizers without a BOS token report None)\n        bos_token = tokenizer.bos_token or \"\"\n        return {\"text\": [x.removeprefix(bos_token) for x in texts]}\n\n    dataset = dataset.map(formatting_prompts_func, batched=True)\n\n    # Split for evaluation if requested\n    if args.eval_split > 0:\n        split = dataset.train_test_split(test_size=args.eval_split, seed=args.seed)\n        train_data = split[\"train\"]\n        eval_data = split[\"test\"]\n        print(f\"  Train: {len(train_data)} samples, Eval: {len(eval_data)} samples\")\n    else:\n        train_data = dataset\n        eval_data = None\n\n    print(f\"  Dataset ready in {time.time() - start:.1f}s\")\n\n    # 3. 
Configure trainer\n    print(\"\\n[3/5] Configuring trainer...\")\n\n    # Calculate steps per epoch for logging/eval intervals\n    effective_batch = args.batch_size * args.gradient_accumulation\n    num_samples = len(train_data)\n    steps_per_epoch = num_samples // effective_batch\n\n    # Determine run name and logging steps\n    if args.run_name:\n        run_name = args.run_name\n    elif args.num_epochs:\n        run_name = f\"unsloth-sft-{args.num_epochs}ep\"\n    else:\n        run_name = f\"unsloth-sft-{args.max_steps}steps\"\n\n    if args.num_epochs:\n        logging_steps = max(1, steps_per_epoch // 10)\n        save_steps = max(1, steps_per_epoch // 4)\n    else:\n        logging_steps = max(1, args.max_steps // 20)\n        save_steps = max(1, args.max_steps // 4)\n\n    # Determine reporting backend\n    if args.trackio_space:\n        report_to = [\"tensorboard\", \"trackio\"]\n    else:\n        report_to = [\"tensorboard\"]\n\n    training_config = SFTConfig(\n        output_dir=args.save_local,\n        dataset_text_field=\"text\",\n        per_device_train_batch_size=args.batch_size,\n        gradient_accumulation_steps=args.gradient_accumulation,\n        warmup_steps=5,\n        num_train_epochs=args.num_epochs if args.num_epochs else 1,\n        max_steps=args.max_steps if args.max_steps else -1,\n        learning_rate=args.learning_rate,\n        logging_steps=logging_steps,\n        optim=\"adamw_8bit\",\n        weight_decay=0.01,\n        lr_scheduler_type=\"linear\",\n        seed=args.seed,\n        max_length=args.max_seq_length,\n        report_to=report_to,\n        run_name=run_name,\n        push_to_hub=True,\n        hub_model_id=args.output_repo,\n        save_steps=save_steps,\n        save_total_limit=3,\n    )\n\n    # Add evaluation config if eval is enabled\n    if eval_data:\n        if args.num_epochs:\n            training_config.eval_strategy = \"epoch\"\n            print(\"  Evaluation enabled: every epoch\")\n        
else:\n            training_config.eval_strategy = \"steps\"\n            training_config.eval_steps = max(1, args.max_steps // 5)\n            print(f\"  Evaluation enabled: every {training_config.eval_steps} steps\")\n\n    trainer = SFTTrainer(\n        model=model,\n        tokenizer=tokenizer,\n        train_dataset=train_data,\n        eval_dataset=eval_data,\n        args=training_config,\n    )\n\n    # Train on responses only (mask user inputs)\n    trainer = train_on_responses_only(\n        trainer,\n        instruction_part=\"<|im_start|>user\\n\",\n        response_part=\"<|im_start|>assistant\\n\",\n    )\n\n    # 4. Train\n    print(f\"\\n[4/5] Training for {duration_str}...\")\n    if args.num_epochs:\n        print(f\"  (~{steps_per_epoch} steps/epoch, {int(steps_per_epoch * args.num_epochs)} total steps)\")\n    start = time.time()\n\n    train_result = trainer.train()\n\n    train_time = time.time() - start\n    total_steps = train_result.metrics.get(\"train_steps\", args.max_steps or steps_per_epoch * args.num_epochs)\n    print(f\"\\nTraining completed in {train_time / 60:.1f} minutes\")\n    print(f\"  Speed: {total_steps / train_time:.2f} steps/s\")\n\n    # Print training metrics\n    train_loss = train_result.metrics.get(\"train_loss\")\n    if train_loss:\n        print(f\"  Final train loss: {train_loss:.4f}\")\n\n    # Print eval results if eval was enabled\n    if eval_data:\n        print(\"\\nRunning final evaluation...\")\n        try:\n            eval_results = trainer.evaluate()\n            eval_loss = eval_results.get(\"eval_loss\")\n            if eval_loss:\n                print(f\"  Final eval loss: {eval_loss:.4f}\")\n                if train_loss:\n                    ratio = eval_loss / train_loss\n                    if ratio > 1.5:\n                        print(f\"  Warning: Eval loss is {ratio:.1f}x train loss - possible overfitting\")\n                    else:\n                        print(f\"  Eval/train ratio: 
{ratio:.2f} - model generalizes well\")\n        except Exception as e:\n            print(f\"  Warning: Final evaluation failed: {e}\")\n            print(\"  Continuing to save model...\")\n\n    # 5. Save and push\n    print(\"\\n[5/5] Saving model...\")\n\n    if args.merge_model:\n        print(\"Merging LoRA weights into base model...\")\n        print(f\"\\nPushing merged model to {args.output_repo}...\")\n        model.push_to_hub_merged(\n            args.output_repo,\n            tokenizer=tokenizer,\n            save_method=\"merged_16bit\",\n        )\n        print(f\"Merged model available at: https://huggingface.co/{args.output_repo}\")\n    else:\n        model.save_pretrained(args.save_local)\n        tokenizer.save_pretrained(args.save_local)\n        print(f\"Saved locally to {args.save_local}/\")\n\n        print(f\"\\nPushing adapter to {args.output_repo}...\")\n        model.push_to_hub(args.output_repo, tokenizer=tokenizer)\n        print(f\"Adapter available at: https://huggingface.co/{args.output_repo}\")\n\n    print(\"\\n\" + \"=\" * 70)\n    print(\"Done!\")\n    print(\"=\" * 70)\n\n\nif __name__ == \"__main__\":\n    if len(sys.argv) == 1:\n        print(\"=\" * 70)\n        print(\"LLM Fine-tuning with Unsloth\")\n        print(\"=\" * 70)\n        print(\"\\nFine-tune language models with optional train/eval split.\")\n        print(\"\\nFeatures:\")\n        print(\"  - ~60% less VRAM with Unsloth optimizations\")\n        print(\"  - 2x faster training vs standard methods\")\n        print(\"  - Epoch-based or step-based training\")\n        print(\"  - Optional evaluation to detect overfitting\")\n        print(\"  - Trains only on assistant responses (masked user inputs)\")\n        print(\"\\nEpoch-based training:\")\n        print(\"\\n  uv run unsloth_sft_example.py \\\\\")\n        print(\"      --dataset mlabonne/FineTome-100k \\\\\")\n        print(\"      --num-epochs 1 \\\\\")\n        print(\"      --eval-split 0.2 
\\\\\")\n        print(\"      --output-repo your-username/model-finetuned\")\n        print(\"\\nHF Jobs example:\")\n        print(\"\\n  hf jobs uv run unsloth_sft_example.py \\\\\")\n        print(\"      --flavor a10g-small --secrets HF_TOKEN --timeout 4h \\\\\")\n        print(\"      -- --dataset mlabonne/FineTome-100k \\\\\")\n        print(\"         --num-epochs 1 \\\\\")\n        print(\"         --eval-split 0.2 \\\\\")\n        print(\"         --output-repo your-username/model-finetuned\")\n        print(\"\\nFor full help: uv run unsloth_sft_example.py --help\")\n        print(\"=\" * 70)\n        sys.exit(0)\n\n    main()\n"
  },
  {
    "path": "skills/hugging-face-paper-pages/SKILL.md",
    "content": "---\nname: hugging-face-paper-pages\ndescription: Look up and read Hugging Face paper pages in markdown, and use the papers API for structured metadata such as authors, linked models/datasets/spaces, Github repo and project page. Use when the user shares a Hugging Face paper page URL, an arXiv URL or ID, or asks to summarize, explain, or analyze an AI research paper.\n---\n\n# Hugging Face Paper Pages\n\nHugging Face Paper pages (hf.co/papers) is a platform built on top of arXiv (arxiv.org), specifically for research papers in the field of artificial intelligence (AI) and computer science. Hugging Face users can submit their paper at hf.co/papers/submit, which features it on the Daily Papers feed (hf.co/papers). Each day, users can upvote papers and comment on papers. Each paper page allows authors to:\n- claim their paper (by clicking their name on the `authors` field). This makes the paper page appear on their Hugging Face profile.\n- link the associated model checkpoints, datasets and Spaces by including the HF paper or arXiv URL in the model card, dataset card or README of the Space\n- link the Github repository and/or project page URLs\n- link the HF organization. This also makes the paper page appear on the Hugging Face organization page.\n\nWhenever someone mentions a HF paper or arXiv abstract/PDF URL in a model card, dataset card or README of a Space repository, the paper will be automatically indexed. Note that not all papers indexed on Hugging Face are also submitted to daily papers. The latter is more a manner of promoting a research paper. Papers can only be submitted to daily papers up until 14 days after their publication date on arXiv.\n\nThe Hugging Face team has built an easy-to-use API to interact with paper pages. 
Content of the papers can be fetched as markdown, or structured metadata can be returned such as author names, linked models/datasets/spaces, linked GitHub repo and project page.\n\n## When to Use\n\n- User shares a Hugging Face paper page URL (e.g. `https://huggingface.co/papers/2602.08025`)\n- User shares a Hugging Face markdown paper page URL (e.g. `https://huggingface.co/papers/2602.08025.md`)\n- User shares an arXiv URL (e.g. `https://arxiv.org/abs/2602.08025` or `https://arxiv.org/pdf/2602.08025`)\n- User mentions an arXiv ID (e.g. `2602.08025`)\n- User asks you to summarize, explain, or analyze an AI research paper\n\n## Parsing the paper ID\n\nIt's recommended to parse the paper ID (arXiv ID) from whatever the user provides:\n\n| Input | Paper ID |\n| --- | --- |\n| `https://huggingface.co/papers/2602.08025` | `2602.08025` |\n| `https://huggingface.co/papers/2602.08025.md` | `2602.08025` |\n| `https://arxiv.org/abs/2602.08025` | `2602.08025` |\n| `https://arxiv.org/pdf/2602.08025` | `2602.08025` |\n| `2602.08025v1` | `2602.08025v1` |\n| `2602.08025` | `2602.08025` |\n\nThis lets you pass the paper ID to any of the Hub API endpoints mentioned below.\n\n### Fetch the paper page as markdown\n\nThe content of a paper can be fetched as markdown like so:\n\n```bash\ncurl -s \"https://huggingface.co/papers/{PAPER_ID}.md\"\n```\n\nThis should return the Hugging Face paper page as markdown. This relies on the HTML version of the paper at https://arxiv.org/html/{PAPER_ID}.\n\nThere are two exceptions:\n- Not all arXiv papers have an HTML version. If the HTML version of the paper does not exist, then the content falls back to the HTML of the Hugging Face paper page.\n- If it results in a 404, it means the paper is not yet indexed on hf.co/papers. 
See [Error handling](#error-handling) for details.\n\nAlternatively, you can request markdown from the normal paper page URL, like so:\n\n```bash\ncurl -s -H \"Accept: text/markdown\" \"https://huggingface.co/papers/{PAPER_ID}\"\n```\n\n### Paper Pages API Endpoints\n\nAll endpoints use the base URL `https://huggingface.co`.\n\n#### Get structured metadata\n\nFetch the paper metadata as JSON using the Hugging Face REST API:\n\n```bash\ncurl -s \"https://huggingface.co/api/papers/{PAPER_ID}\"\n```\n\nThis returns structured metadata that can include:\n\n- authors (names and Hugging Face usernames, if they have claimed the paper)\n- media URLs (uploaded when submitting the paper to Daily Papers)\n- summary (abstract) and AI-generated summary\n- project page and GitHub repository\n- organization and engagement metadata (number of upvotes)\n\nTo find models linked to the paper, use:\n\n```bash\ncurl \"https://huggingface.co/api/models?filter=arxiv:{PAPER_ID}\"\n```\n\nTo find datasets linked to the paper, use:\n\n```bash\ncurl \"https://huggingface.co/api/datasets?filter=arxiv:{PAPER_ID}\"\n```\n\nTo find Spaces linked to the paper, use:\n\n```bash\ncurl \"https://huggingface.co/api/spaces?filter=arxiv:{PAPER_ID}\"\n```\n\n#### Claim paper authorship\n\nClaim authorship of a paper for a Hugging Face user:\n\n```bash\ncurl \"https://huggingface.co/api/settings/papers/claim\" \\\n  --request POST \\\n  --header \"Content-Type: application/json\" \\\n  --header \"Authorization: Bearer $HF_TOKEN\" \\\n  --data '{\n    \"paperId\": \"{PAPER_ID}\",\n    \"claimAuthorId\": \"{AUTHOR_ENTRY_ID}\",\n    \"targetUserId\": \"{USER_ID}\"\n  }'\n```\n\n- Endpoint: `POST /api/settings/papers/claim`\n- Body:\n  - `paperId` (string, required): arXiv paper identifier being claimed\n  - `claimAuthorId` (string): author entry on the paper being claimed, 24-char hex ID\n  - `targetUserId` (string): HF user who should receive the claim, 24-char hex ID\n- Response: paper authorship claim result, 
including the claimed paper ID\n\n#### Get daily papers\n\nFetch the Daily Papers feed:\n\n```bash\ncurl -s -H \"Authorization: Bearer $HF_TOKEN\" \\\n  \"https://huggingface.co/api/daily_papers?p=0&limit=20&date=2017-07-21&sort=publishedAt\"\n```\n\n- Endpoint: `GET /api/daily_papers`\n- Query parameters:\n  - `p` (integer): page number\n  - `limit` (integer): number of results, between 1 and 100\n  - `date` (string): RFC 3339 full-date, for example `2017-07-21`\n  - `week` (string): ISO week, for example `2024-W03`\n  - `month` (string): month value, for example `2024-01`\n  - `submitter` (string): filter by submitter\n  - `sort` (enum): `publishedAt` or `trending`\n- Response: list of daily papers\n\n#### List papers\n\nList arXiv papers sorted by published date:\n\n```bash\ncurl -s -H \"Authorization: Bearer $HF_TOKEN\" \\\n  \"https://huggingface.co/api/papers?cursor={CURSOR}&limit=20\"\n```\n\n- Endpoint: `GET /api/papers`\n- Query parameters:\n  - `cursor` (string): pagination cursor\n  - `limit` (integer): number of results, between 1 and 100\n- Response: list of papers\n\n#### Search papers\n\nPerform hybrid semantic and full-text search on papers:\n\n```bash\ncurl -s -H \"Authorization: Bearer $HF_TOKEN\" \\\n  \"https://huggingface.co/api/papers/search?q=vision+language&limit=20\"\n```\n\nThis searches over the paper title, authors, and content.\n\n- Endpoint: `GET /api/papers/search`\n- Query parameters:\n  - `q` (string): search query, max length 250\n  - `limit` (integer): number of results, between 1 and 120\n- Response: matching papers\n\n#### Index a paper\n\nInsert a paper from arXiv by ID. 
If the paper is already indexed, only its authors can re-index it:\n\n```bash\ncurl \"https://huggingface.co/api/papers/index\" \\\n  --request POST \\\n  --header \"Content-Type: application/json\" \\\n  --header \"Authorization: Bearer $HF_TOKEN\" \\\n  --data '{\n    \"arxivId\": \"{ARXIV_ID}\"\n  }'\n```\n\n- Endpoint: `POST /api/papers/index`\n- Body:\n  - `arxivId` (string, required): arXiv ID to index, for example `2301.00001`; must match the pattern `^\\d{4}\\.\\d{4,5}$`, so a version suffix such as `v1` is not accepted\n- Response: empty JSON object on success\n\n#### Update paper links\n\nUpdate the project page, GitHub repository, or submitting organization for a paper. The requester must be the paper author, the Daily Papers submitter, or a papers admin:\n\n```bash\ncurl \"https://huggingface.co/api/papers/{PAPER_OBJECT_ID}/links\" \\\n  --request POST \\\n  --header \"Content-Type: application/json\" \\\n  --header \"Authorization: Bearer $HF_TOKEN\" \\\n  --data '{\n    \"projectPage\": \"https://example.com\",\n    \"githubRepo\": \"https://github.com/org/repo\",\n    \"organizationId\": \"{ORGANIZATION_ID}\"\n  }'\n```\n\n- Endpoint: `POST /api/papers/{paperId}/links`\n- Path parameters:\n  - `paperId` (string, required): Hugging Face paper object ID\n- Body:\n  - `githubRepo` (string, nullable): GitHub repository URL\n  - `organizationId` (string, nullable): organization ID, 24-char hex ID\n  - `projectPage` (string, nullable): project page URL\n- Response: empty JSON object on success\n\n## Error Handling\n\n- **404 on `https://huggingface.co/papers/{PAPER_ID}` or the `.md` endpoint**: the paper is not indexed on Hugging Face paper pages yet.\n- **404 on `/api/papers/{PAPER_ID}`**: the paper may not be indexed on Hugging Face paper pages yet.\n- **Paper ID not found**: verify the extracted arXiv ID, including any version suffix.\n\n### Fallbacks\n\nIf the Hugging Face paper page does not contain enough detail for the user's question:\n\n- Check the regular paper page at 
`https://huggingface.co/papers/{PAPER_ID}`\n- Fall back to the arXiv page or PDF for the original source:\n  - `https://arxiv.org/abs/{PAPER_ID}`\n  - `https://arxiv.org/pdf/{PAPER_ID}`\n\n## Notes\n\n- No authentication is required for public paper pages.\n- Write endpoints such as claim authorship, index paper, and update paper links require `Authorization: Bearer $HF_TOKEN`.\n- Prefer the `.md` endpoint for reliable machine-readable output.\n- Prefer `/api/papers/{PAPER_ID}` when you need structured JSON fields instead of page markdown."
  },
  {
    "path": "skills/hugging-face-paper-publisher/SKILL.md",
    "content": "---\nname: hugging-face-paper-publisher\ndescription: Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.\n---\n\n# Overview\nThis skill provides comprehensive tools for AI engineers and researchers to publish, manage, and link research papers on the Hugging Face Hub. It streamlines the workflow from paper creation to publication, including integration with arXiv, model/dataset linking, and authorship management.\n\n## Integration with HF Ecosystem\n- **Paper Pages**: Index and discover papers on Hugging Face Hub\n- **arXiv Integration**: Automatic paper indexing from arXiv IDs\n- **Model/Dataset Linking**: Connect papers to relevant artifacts through metadata\n- **Authorship Verification**: Claim and verify paper authorship\n- **Research Article Template**: Generate professional, modern scientific papers\n\n# Version\n1.0.0\n\n# Dependencies\nThe included script uses PEP 723 inline dependencies. Prefer `uv run` over\nmanual environment setup.\n\n- huggingface_hub>=0.26.0\n- pyyaml>=6.0.3\n- requests>=2.32.5\n- markdown>=3.5.0\n- python-dotenv>=1.2.1\n\n# Core Capabilities\n\n## 1. Paper Page Management\n- **Index Papers**: Add papers to Hugging Face from arXiv\n- **Claim Authorship**: Verify and claim authorship on published papers\n- **Manage Visibility**: Control which papers appear on your profile\n- **Paper Discovery**: Find and explore papers in the HF ecosystem\n\n## 2. Link Papers to Artifacts\n- **Model Cards**: Add paper citations to model metadata\n- **Dataset Cards**: Link papers to datasets via README\n- **Automatic Tagging**: Hub auto-generates arxiv:<PAPER_ID> tags\n- **Citation Management**: Maintain proper attribution and references\n\n## 3. 
Research Article Creation\n- **Markdown Templates**: Generate professional paper formatting\n- **Modern Design**: Clean, readable research article layouts\n- **Dynamic TOC**: Automatic table of contents generation\n- **Section Structure**: Standard scientific paper organization\n- **LaTeX Math**: Support for equations and technical notation\n\n## 4. Metadata Management\n- **YAML Frontmatter**: Proper model/dataset card metadata\n- **Citation Tracking**: Maintain paper references across repositories\n- **Version Control**: Track paper updates and revisions\n- **Multi-Paper Support**: Link multiple papers to single artifacts\n\n# Usage Instructions\n\nThe skill includes Python scripts in `scripts/` for paper publishing operations.\n\n### Prerequisites\n- Run scripts with `uv run` (dependencies are resolved from the script header)\n- Set `HF_TOKEN` environment variable with Write-access token\n\n> **All paths are relative to the directory containing this SKILL.md\nfile.**\n> Before running any script, first `cd` to that directory or use the full\npath.\n\n\n### Method 1: Index Paper from arXiv\n\nAdd a paper to Hugging Face Paper Pages from arXiv.\n\n**Basic Usage:**\n```bash\nuv run scripts/paper_manager.py index \\\n  --arxiv-id \"2301.12345\"\n```\n\n**Check If Paper Exists:**\n```bash\nuv run scripts/paper_manager.py check \\\n  --arxiv-id \"2301.12345\"\n```\n\n**Direct URL Access:**\nYou can also visit `https://huggingface.co/papers/{arxiv-id}` directly to index a paper.\n\n### Method 2: Link Paper to Model/Dataset\n\nAdd paper references to model or dataset README with proper YAML metadata.\n\n**Add to Model Card:**\n```bash\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/model-name\" \\\n  --repo-type \"model\" \\\n  --arxiv-id \"2301.12345\"\n```\n\n**Add to Dataset Card:**\n```bash\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/dataset-name\" \\\n  --repo-type \"dataset\" \\\n  --arxiv-id \"2301.12345\"\n```\n\n**Add 
Multiple Papers:**\n```bash\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/model-name\" \\\n  --repo-type \"model\" \\\n  --arxiv-ids \"2301.12345,2302.67890,2303.11111\"\n```\n\n**With Custom Citation:**\n```bash\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/model-name\" \\\n  --repo-type \"model\" \\\n  --arxiv-id \"2301.12345\" \\\n  --citation \"$(cat citation.txt)\"\n```\n\n#### How Linking Works\n\nWhen you add an arXiv paper link to a model or dataset README:\n1. The Hub extracts the arXiv ID from the link\n2. A tag `arxiv:<PAPER_ID>` is automatically added to the repository\n3. Users can click the tag to view the Paper Page\n4. The Paper Page shows all models/datasets citing this paper\n5. Papers are discoverable through filters and search\n\n### Method 3: Claim Authorship\n\nVerify your authorship on papers published on Hugging Face.\n\n**Start Claim Process:**\n```bash\nuv run scripts/paper_manager.py claim \\\n  --arxiv-id \"2301.12345\" \\\n  --email \"your.email@institution.edu\"\n```\n\n**Manual Process:**\n1. Navigate to your paper's page: `https://huggingface.co/papers/{arxiv-id}`\n2. Find your name in the author list\n3. Click your name and select \"Claim authorship\"\n4. 
Wait for admin team verification\n\n**Check Authorship Status:**\n```bash\nuv run scripts/paper_manager.py check-authorship \\\n  --arxiv-id \"2301.12345\"\n```\n\n### Method 4: Manage Paper Visibility\n\nControl which verified papers appear on your public profile.\n\n**List Your Papers:**\n```bash\nuv run scripts/paper_manager.py list-my-papers\n```\n\n**Toggle Visibility:**\n```bash\nuv run scripts/paper_manager.py toggle-visibility \\\n  --arxiv-id \"2301.12345\" \\\n  --show true\n```\n\n**Manage in Settings:**\nNavigate to your account settings → Papers section to toggle \"Show on profile\" for each paper.\n\n### Method 5: Create Research Article\n\nGenerate a professional markdown-based research paper using modern templates.\n\n**Create from Template:**\n```bash\nuv run scripts/paper_manager.py create \\\n  --template \"standard\" \\\n  --title \"Your Paper Title\" \\\n  --output \"paper.md\"\n```\n\n**Available Templates:**\n- `standard` - Traditional scientific paper structure\n- `modern` - Clean, web-friendly format inspired by Distill\n- `arxiv` - arXiv-style formatting\n- `ml-report` - Machine learning experiment report\n\n**Generate Complete Paper:**\n```bash\nuv run scripts/paper_manager.py create \\\n  --template \"modern\" \\\n  --title \"Fine-Tuning Large Language Models with LoRA\" \\\n  --authors \"Jane Doe, John Smith\" \\\n  --abstract \"$(cat abstract.txt)\" \\\n  --output \"paper.md\"\n```\n\n**Convert to HTML:**\n```bash\nuv run scripts/paper_manager.py convert \\\n  --input \"paper.md\" \\\n  --output \"paper.html\" \\\n  --style \"modern\"\n```\n\n### Paper Template Structure\n\n**Standard Research Paper Sections:**\n```markdown\n---\ntitle: Your Paper Title\nauthors: Jane Doe, John Smith\naffiliations: University X, Lab Y\ndate: 2025-01-15\narxiv: 2301.12345\ntags: [machine-learning, nlp, fine-tuning]\n---\n\n# Abstract\nBrief summary of the paper...\n\n# 1. Introduction\nBackground and motivation...\n\n# 2. 
Related Work\nPrevious research and context...\n\n# 3. Methodology\nApproach and implementation...\n\n# 4. Experiments\nSetup, datasets, and procedures...\n\n# 5. Results\nFindings and analysis...\n\n# 6. Discussion\nInterpretation and implications...\n\n# 7. Conclusion\nSummary and future work...\n\n# References\n```\n\n**Modern Template Features:**\n- Dynamic table of contents\n- Responsive design for web viewing\n- Code syntax highlighting\n- Interactive figures and charts\n- Math equation rendering (LaTeX)\n- Citation management\n- Author affiliation linking\n\n### Commands Reference\n\n**Index Paper:**\n```bash\nuv run scripts/paper_manager.py index --arxiv-id \"2301.12345\"\n```\n\n**Link to Repository:**\n```bash\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/repo-name\" \\\n  --repo-type \"model|dataset|space\" \\\n  --arxiv-id \"2301.12345\" \\\n  [--citation \"Full citation text\"] \\\n  [--create-pr]\n```\n\n**Claim Authorship:**\n```bash\nuv run scripts/paper_manager.py claim \\\n  --arxiv-id \"2301.12345\" \\\n  --email \"your.email@edu\"\n```\n\n**Manage Visibility:**\n```bash\nuv run scripts/paper_manager.py toggle-visibility \\\n  --arxiv-id \"2301.12345\" \\\n  --show true|false\n```\n\n**Create Research Article:**\n```bash\nuv run scripts/paper_manager.py create \\\n  --template \"standard|modern|arxiv|ml-report\" \\\n  --title \"Paper Title\" \\\n  [--authors \"Author1, Author2\"] \\\n  [--abstract \"Abstract text\"] \\\n  [--output \"filename.md\"]\n```\n\n**Convert Markdown to HTML:**\n```bash\nuv run scripts/paper_manager.py convert \\\n  --input \"paper.md\" \\\n  --output \"paper.html\" \\\n  [--style \"modern|classic\"]\n```\n\n**Check Paper Status:**\n```bash\nuv run scripts/paper_manager.py check --arxiv-id \"2301.12345\"\n```\n\n**List Your Papers:**\n```bash\nuv run scripts/paper_manager.py list-my-papers\n```\n\n**Search Papers:**\n```bash\nuv run scripts/paper_manager.py search --query \"transformer 
attention\"\n```\n\n### YAML Metadata Format\n\nWhen linking papers to models or datasets, proper YAML frontmatter is required:\n\n**Model Card Example:**\n````yaml\n---\nlanguage:\n  - en\nlicense: apache-2.0\ntags:\n  - text-generation\n  - transformers\n  - llm\nlibrary_name: transformers\n---\n\n# Model Name\n\nThis model is based on the approach described in [Our Paper](https://arxiv.org/abs/2301.12345).\n\n## Citation\n\n```bibtex\n@article{doe2023paper,\n  title={Your Paper Title},\n  author={Doe, Jane and Smith, John},\n  journal={arXiv preprint arXiv:2301.12345},\n  year={2023}\n}\n```\n````\n\n**Dataset Card Example:**\n```yaml\n---\nlanguage:\n  - en\nlicense: cc-by-4.0\ntask_categories:\n  - text-generation\n  - question-answering\nsize_categories:\n  - 10K<n<100K\n---\n\n# Dataset Name\n\nDataset introduced in [Our Paper](https://arxiv.org/abs/2301.12345).\n\nFor more details, see the [paper page](https://huggingface.co/papers/2301.12345).\n```\n\nThe Hub automatically extracts arXiv IDs from these links and creates `arxiv:2301.12345` tags.\n\n### Integration Examples\n\n**Workflow 1: Publish New Research**\n```bash\n# 1. Create research article\nuv run scripts/paper_manager.py create \\\n  --template \"modern\" \\\n  --title \"Novel Fine-Tuning Approach\" \\\n  --output \"paper.md\"\n\n# 2. Edit paper.md with your content\n\n# 3. Submit to arXiv (external process)\n# Upload to arxiv.org, get arXiv ID\n\n# 4. Index on Hugging Face\nuv run scripts/paper_manager.py index --arxiv-id \"2301.12345\"\n\n# 5. Link to your model\nuv run scripts/paper_manager.py link \\\n  --repo-id \"your-username/your-model\" \\\n  --repo-type \"model\" \\\n  --arxiv-id \"2301.12345\"\n\n# 6. Claim authorship\nuv run scripts/paper_manager.py claim \\\n  --arxiv-id \"2301.12345\" \\\n  --email \"your.email@institution.edu\"\n```\n\n**Workflow 2: Link Existing Paper**\n```bash\n# 1. Check if paper exists\nuv run scripts/paper_manager.py check --arxiv-id \"2301.12345\"\n\n# 2. 
Index if needed\nuv run scripts/paper_manager.py index --arxiv-id \"2301.12345\"\n\n# 3. Link to multiple repositories\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/model-v1\" \\\n  --repo-type \"model\" \\\n  --arxiv-id \"2301.12345\"\n\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/training-data\" \\\n  --repo-type \"dataset\" \\\n  --arxiv-id \"2301.12345\"\n\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/demo-space\" \\\n  --repo-type \"space\" \\\n  --arxiv-id \"2301.12345\"\n```\n\n**Workflow 3: Update Model with Paper Reference**\n```bash\n# 1. Get current README\nhf download username/model-name README.md\n\n# 2. Add paper link\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/model-name\" \\\n  --repo-type \"model\" \\\n  --arxiv-id \"2301.12345\" \\\n  --citation \"Full citation for the paper\"\n\n# The script will:\n# - Add YAML metadata if missing\n# - Insert arXiv link in README\n# - Add formatted citation\n# - Preserve existing content\n```\n\n### Best Practices\n\n1. **Paper Indexing**\n   - Index papers as soon as they're published on arXiv\n   - Include full citation information in model/dataset cards\n   - Use consistent paper references across related repositories\n\n2. **Metadata Management**\n   - Add YAML frontmatter to all model/dataset cards\n   - Include proper licensing information\n   - Tag with relevant task categories and domains\n\n3. **Authorship**\n   - Claim authorship on papers where you're listed as author\n   - Use institutional email addresses for verification\n   - Keep paper visibility settings updated\n\n4. **Repository Linking**\n   - Link papers to all relevant models, datasets, and Spaces\n   - Include paper context in README descriptions\n   - Add BibTeX citations for easy reference\n\n5. 
**Research Articles**\n   - Use templates consistently within projects\n   - Include code and data links in papers\n   - Generate web-friendly HTML versions for sharing\n\n### Advanced Usage\n\n**Batch Link Papers:**\n```bash\n# Link multiple papers to one repository\nfor arxiv_id in \"2301.12345\" \"2302.67890\" \"2303.11111\"; do\n  uv run scripts/paper_manager.py link \\\n    --repo-id \"username/model-name\" \\\n    --repo-type \"model\" \\\n    --arxiv-id \"$arxiv_id\"\ndone\n```\n\n**Extract Paper Info:**\n```bash\n# Get paper metadata from arXiv\nuv run scripts/paper_manager.py info \\\n  --arxiv-id \"2301.12345\" \\\n  --format \"json\"\n```\n\n**Generate Citation:**\n```bash\n# Create BibTeX citation\nuv run scripts/paper_manager.py citation \\\n  --arxiv-id \"2301.12345\" \\\n  --format \"bibtex\"\n```\n\n**Validate Links:**\n```bash\n# Check all paper links in a repository\nuv run scripts/paper_manager.py validate \\\n  --repo-id \"username/model-name\" \\\n  --repo-type \"model\"\n```\n\n### Error Handling\n\n- **Paper Not Found**: arXiv ID doesn't exist or isn't indexed yet\n- **Permission Denied**: HF_TOKEN lacks write access to repository\n- **Invalid YAML**: Malformed metadata in README frontmatter\n- **Authorship Failed**: Email doesn't match paper author records\n- **Already Claimed**: Another user has claimed authorship\n- **Rate Limiting**: Too many API requests in short time\n\n### Troubleshooting\n\n**Issue**: \"Paper not found on Hugging Face\"\n- **Solution**: Visit `hf.co/papers/{arxiv-id}` to trigger indexing\n\n**Issue**: \"Authorship claim not verified\"\n- **Solution**: Wait for admin review or contact HF support with proof\n\n**Issue**: \"arXiv tag not appearing\"\n- **Solution**: Ensure README includes proper arXiv URL format\n\n**Issue**: \"Cannot link to repository\"\n- **Solution**: Verify HF_TOKEN has write permissions\n\n**Issue**: \"Template rendering errors\"\n- **Solution**: Check markdown syntax and YAML frontmatter 
format\n\n### Resources and References\n\n- **Hugging Face Paper Pages**: [hf.co/papers](https://huggingface.co/papers)\n- **Model Cards Guide**: [hf.co/docs/hub/model-cards](https://huggingface.co/docs/hub/en/model-cards)\n- **Dataset Cards Guide**: [hf.co/docs/hub/datasets-cards](https://huggingface.co/docs/hub/en/datasets-cards)\n- **Research Article Template**: [tfrere/research-article-template](https://huggingface.co/spaces/tfrere/research-article-template)\n- **arXiv Format Guide**: [arxiv.org/help/submit](https://arxiv.org/help/submit)\n\n### Integration with tfrere's Research Template\n\nThis skill complements [tfrere's research article template](https://huggingface.co/spaces/tfrere/research-article-template) by providing:\n\n- Automated paper indexing workflows\n- Repository linking capabilities\n- Metadata management tools\n- Citation generation utilities\n\nYou can use tfrere's template for writing, then use this skill to publish and link the paper on Hugging Face Hub.\n\n### Common Patterns\n\n**Pattern 1: New Paper Publication**\n```bash\n# Write → Publish → Index → Link\nuv run scripts/paper_manager.py create --template modern --output paper.md\n# (Submit to arXiv)\nuv run scripts/paper_manager.py index --arxiv-id \"2301.12345\"\nuv run scripts/paper_manager.py link --repo-id \"user/model\" --arxiv-id \"2301.12345\"\n```\n\n**Pattern 2: Existing Paper Discovery**\n```bash\n# Search → Check → Link\nuv run scripts/paper_manager.py search --query \"transformers\"\nuv run scripts/paper_manager.py check --arxiv-id \"2301.12345\"\nuv run scripts/paper_manager.py link --repo-id \"user/model\" --arxiv-id \"2301.12345\"\n```\n\n**Pattern 3: Author Portfolio Management**\n```bash\n# Claim → Verify → Organize\nuv run scripts/paper_manager.py claim --arxiv-id \"2301.12345\"\nuv run scripts/paper_manager.py list-my-papers\nuv run scripts/paper_manager.py toggle-visibility --arxiv-id \"2301.12345\" --show true\n```\n\n### API Integration\n\n**Python Script 
Example:**\n```python\nfrom scripts.paper_manager import PaperManager\n\npm = PaperManager(hf_token=\"your_token\")\n\n# Index paper\npm.index_paper(\"2301.12345\")\n\n# Link to model (the method defined in scripts/paper_manager.py is link_paper_to_repo)\npm.link_paper_to_repo(\n    repo_id=\"username/model\",\n    repo_type=\"model\",\n    arxiv_id=\"2301.12345\",\n    citation=\"Full citation text\"\n)\n\n# Check status\nstatus = pm.check_paper(\"2301.12345\")\nprint(status)\n```\n\n### Future Enhancements\n\nPlanned features for future versions:\n- Support for non-arXiv papers (conference proceedings, journals)\n- Automatic citation formatting from DOI\n- Paper comparison and versioning tools\n- Collaborative paper writing features\n- Integration with LaTeX workflows\n- Automated figure and table extraction\n- Paper metrics and impact tracking\n"
  },
  {
    "path": "skills/hugging-face-paper-publisher/examples/example_usage.md",
    "content": "# Example Usage: HF Paper Publisher Skill\n\nThis document demonstrates common workflows for publishing research papers on Hugging Face Hub.\n\n## Example 1: Index an Existing arXiv Paper\n\nIf you've already published a paper on arXiv and want to make it discoverable on Hugging Face:\n\n```bash\n# Check if paper exists\nuv run scripts/paper_manager.py check --arxiv-id \"2301.12345\"\n\n# Index the paper\nuv run scripts/paper_manager.py index --arxiv-id \"2301.12345\"\n\n# Get paper information\nuv run scripts/paper_manager.py info --arxiv-id \"2301.12345\"\n```\n\nExpected output:\n```json\n{\n  \"exists\": true,\n  \"url\": \"https://huggingface.co/papers/2301.12345\",\n  \"arxiv_id\": \"2301.12345\",\n  \"arxiv_url\": \"https://arxiv.org/abs/2301.12345\"\n}\n```\n\n## Example 2: Link Paper to Your Model\n\nAfter indexing a paper, link it to your model repository:\n\n```bash\n# Link single paper\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/my-awesome-model\" \\\n  --repo-type \"model\" \\\n  --arxiv-id \"2301.12345\"\n\n# Link multiple papers\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/my-awesome-model\" \\\n  --repo-type \"model\" \\\n  --arxiv-ids \"2301.12345,2302.67890\"\n```\n\nThis will:\n1. Download the model's README.md\n2. Add or update YAML frontmatter\n3. Insert paper references with links\n4. Upload the updated README\n5. 
Hub automatically creates `arxiv:2301.12345` tags\n\n## Example 3: Link Paper to Dataset\n\nSame process for datasets:\n\n```bash\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/my-dataset\" \\\n  --repo-type \"dataset\" \\\n  --arxiv-id \"2301.12345\" \\\n  --citation \"$(cat citation.bib)\"\n```\n\n## Example 4: Create a New Research Article\n\nGenerate a research paper from template:\n\n```bash\n# Create with standard template\nuv run scripts/paper_manager.py create \\\n  --template \"standard\" \\\n  --title \"Efficient Fine-Tuning of Large Language Models\" \\\n  --authors \"Jane Doe, John Smith\" \\\n  --abstract \"We propose a novel approach to fine-tuning...\" \\\n  --output \"paper.md\"\n\n# Create with modern template\nuv run scripts/paper_manager.py create \\\n  --template \"modern\" \\\n  --title \"Vision Transformers for Medical Imaging\" \\\n  --output \"medical_vit_paper.md\"\n\n# Create ML experiment report\nuv run scripts/paper_manager.py create \\\n  --template \"ml-report\" \\\n  --title \"BERT Fine-tuning Experiment Results\" \\\n  --output \"bert_experiment_report.md\"\n```\n\n## Example 5: Generate Citations\n\nGet formatted citations for papers:\n\n```bash\n# BibTeX format\nuv run scripts/paper_manager.py citation \\\n  --arxiv-id \"2301.12345\" \\\n  --format \"bibtex\"\n```\n\nOutput:\n```bibtex\n@article{arxiv2301_12345,\n  title={Efficient Fine-Tuning of Large Language Models},\n  author={Doe, Jane and Smith, John},\n  journal={arXiv preprint arXiv:2301.12345},\n  year={2023}\n}\n```\n\n## Example 6: Complete Workflow - New Paper\n\nFull workflow from paper creation to publication:\n\n```bash\n# Step 1: Create research article\nuv run scripts/paper_manager.py create \\\n  --template \"modern\" \\\n  --title \"Novel Architecture for Multimodal Learning\" \\\n  --authors \"Alice Chen, Bob Kumar\" \\\n  --output \"multimodal_paper.md\"\n\n# Step 2: Edit the paper (use your favorite editor)\n# vim multimodal_paper.md\n\n# 
Step 3: Submit to arXiv (external process)\n# Upload to arxiv.org, receive arXiv ID: 2312.99999\n\n# Step 4: Index on Hugging Face\nuv run scripts/paper_manager.py index --arxiv-id \"2312.99999\"\n\n# Step 5: Link to your models/datasets\nuv run scripts/paper_manager.py link \\\n  --repo-id \"alice/multimodal-model-v1\" \\\n  --repo-type \"model\" \\\n  --arxiv-id \"2312.99999\"\n\nuv run scripts/paper_manager.py link \\\n  --repo-id \"alice/multimodal-dataset\" \\\n  --repo-type \"dataset\" \\\n  --arxiv-id \"2312.99999\"\n\n# Step 6: Generate citation for README\nuv run scripts/paper_manager.py citation \\\n  --arxiv-id \"2312.99999\" \\\n  --format \"bibtex\" > citation.bib\n```\n\n## Example 7: Batch Link Papers\n\nLink multiple papers to multiple repositories:\n\n```bash\n#!/bin/bash\n\n# List of papers\nPAPERS=(\"2301.12345\" \"2302.67890\" \"2303.11111\")\n\n# List of models\nMODELS=(\"username/model-a\" \"username/model-b\" \"username/model-c\")\n\n# Link each paper to each model\nfor paper in \"${PAPERS[@]}\"; do\n  for model in \"${MODELS[@]}\"; do\n    echo \"Linking $paper to $model...\"\n    uv run scripts/paper_manager.py link \\\n      --repo-id \"$model\" \\\n      --repo-type \"model\" \\\n      --arxiv-id \"$paper\"\n  done\ndone\n```\n\n## Example 8: Update Model Card with Paper Info\n\nGet paper info and manually update model card:\n\n```bash\n# Get paper information\nuv run scripts/paper_manager.py info \\\n  --arxiv-id \"2301.12345\" \\\n  --format \"text\" > paper_info.txt\n\n# View the information\ncat paper_info.txt\n\n# Manually incorporate into your model card or use the link command\n```\n\n## Example 9: Search and Discover Papers\n\n```bash\n# Search for papers (opens browser)\nuv run scripts/paper_manager.py search \\\n  --query \"transformer attention mechanism\"\n```\n\n## Example 10: Working with tfrere's Template\n\nThis skill complements [tfrere's research article 
template](https://huggingface.co/spaces/tfrere/research-article-template):\n\n```bash\n# 1. Use tfrere's Space to create a beautiful web-based paper\n# Visit: https://huggingface.co/spaces/tfrere/research-article-template\n\n# 2. Export your paper content to markdown\n\n# 3. Submit to arXiv\n\n# 4. Use this skill to index and link\nuv run scripts/paper_manager.py index --arxiv-id \"YOUR_ARXIV_ID\"\nuv run scripts/paper_manager.py link \\\n  --repo-id \"your-username/your-model\" \\\n  --arxiv-id \"YOUR_ARXIV_ID\"\n```\n\n## Example 11: Error Handling\n\n```bash\n# Check if paper exists before linking\nif uv run scripts/paper_manager.py check --arxiv-id \"2301.12345\" | grep -q '\"exists\": true'; then\n  echo \"Paper exists, proceeding with link...\"\n  uv run scripts/paper_manager.py link \\\n    --repo-id \"username/model\" \\\n    --arxiv-id \"2301.12345\"\nelse\n  echo \"Paper doesn't exist, indexing first...\"\n  uv run scripts/paper_manager.py index --arxiv-id \"2301.12345\"\n  uv run scripts/paper_manager.py link \\\n    --repo-id \"username/model\" \\\n    --arxiv-id \"2301.12345\"\nfi\n```\n\n## Example 12: CI/CD Integration\n\nAdd to your `.github/workflows/update-paper.yml`:\n\n```yaml\nname: Update Paper Links\n\non:\n  push:\n    branches: [main]\n  workflow_dispatch:\n\njobs:\n  update:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v3\n\n      - name: Set up uv\n        uses: astral-sh/setup-uv@v5\n\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.10'\n\n      - name: Link paper to model\n        env:\n          HF_TOKEN: ${{ secrets.HF_TOKEN }}\n        run: |\n          uv run scripts/paper_manager.py link \\\n            --repo-id \"${{ github.repository_owner }}/model-name\" \\\n            --repo-type \"model\" \\\n            --arxiv-id \"2301.12345\"\n```\n\n## Tips and Best Practices\n\n1. 
**Always check whether the paper exists** before indexing to avoid unnecessary operations\n2. **Use meaningful commit messages** when linking papers to repositories\n3. **Include full citations** in model cards for proper attribution\n4. **Link papers to all relevant artifacts** (models, datasets, Spaces)\n5. **Generate BibTeX citations** for easy reference by others\n6. **Keep paper visibility updated** in your HF profile settings\n7. **Use templates consistently** within your research group\n8. **Version control your papers** alongside code\n\n## Troubleshooting\n\n### Paper not found after indexing\n\n```bash\n# Visit the URL directly to trigger indexing (`open` is macOS; use `xdg-open` on Linux)\nopen \"https://huggingface.co/papers/2301.12345\"\n\n# Wait a few seconds, then check again\nuv run scripts/paper_manager.py check --arxiv-id \"2301.12345\"\n```\n\n### Permission denied when linking\n\n```bash\n# Confirm a token is set without printing the secret\n# (this only checks presence; write access is a property of the token itself)\n[ -n \"$HF_TOKEN\" ] && echo \"HF_TOKEN is set\" || echo \"HF_TOKEN is missing\"\n\n# Set token if missing\nexport HF_TOKEN=\"your_token_here\"\n\n# Or use .env file (note: `>` overwrites an existing .env)\necho \"HF_TOKEN=your_token_here\" > .env\n```\n\n### arXiv ID format issues\n\n```bash\n# The script handles various formats:\nuv run scripts/paper_manager.py check --arxiv-id \"2301.12345\"\nuv run scripts/paper_manager.py check --arxiv-id \"arxiv:2301.12345\"\nuv run scripts/paper_manager.py check --arxiv-id \"https://arxiv.org/abs/2301.12345\"\n\n# All are equivalent and will be normalized\n```\n\n## Next Steps\n\n- Explore the [Paper Pages documentation](https://huggingface.co/docs/hub/en/paper-pages)\n- Check out [tfrere's research template](https://huggingface.co/spaces/tfrere/research-article-template)\n- Browse [papers on HF](https://huggingface.co/papers)\n- Learn about [model cards](https://huggingface.co/docs/hub/en/model-cards)\n"
  },
  {
    "path": "skills/hugging-face-paper-publisher/references/quick_reference.md",
    "content": "# Quick Reference Guide\n\n## Essential Commands\n\n### Paper Indexing\n```bash\n# Index from arXiv\nuv run scripts/paper_manager.py index --arxiv-id \"2301.12345\"\n\n# Check if exists\nuv run scripts/paper_manager.py check --arxiv-id \"2301.12345\"\n```\n\n### Linking Papers\n```bash\n# Link to model\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/model\" \\\n  --repo-type \"model\" \\\n  --arxiv-id \"2301.12345\"\n\n# Link to dataset\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/dataset\" \\\n  --repo-type \"dataset\" \\\n  --arxiv-id \"2301.12345\"\n\n# Link multiple papers\nuv run scripts/paper_manager.py link \\\n  --repo-id \"username/model\" \\\n  --repo-type \"model\" \\\n  --arxiv-ids \"2301.12345,2302.67890\"\n```\n\n### Creating Papers\n```bash\n# Standard template\nuv run scripts/paper_manager.py create \\\n  --template \"standard\" \\\n  --title \"Paper Title\" \\\n  --output \"paper.md\"\n\n# Modern template\nuv run scripts/paper_manager.py create \\\n  --template \"modern\" \\\n  --title \"Paper Title\" \\\n  --authors \"Author1, Author2\" \\\n  --abstract \"Abstract text\" \\\n  --output \"paper.md\"\n\n# ML Report\nuv run scripts/paper_manager.py create \\\n  --template \"ml-report\" \\\n  --title \"Experiment Report\" \\\n  --output \"report.md\"\n\n# arXiv style\nuv run scripts/paper_manager.py create \\\n  --template \"arxiv\" \\\n  --title \"Paper Title\" \\\n  --output \"paper.md\"\n```\n\n### Citations\n```bash\n# Generate BibTeX\nuv run scripts/paper_manager.py citation \\\n  --arxiv-id \"2301.12345\" \\\n  --format \"bibtex\"\n```\n\n### Paper Info\n```bash\n# JSON format\nuv run scripts/paper_manager.py info \\\n  --arxiv-id \"2301.12345\" \\\n  --format \"json\"\n\n# Text format\nuv run scripts/paper_manager.py info \\\n  --arxiv-id \"2301.12345\" \\\n  --format \"text\"\n```\n\n## URL Formats\n\n### Hugging Face Paper Pages\n- View paper: `https://huggingface.co/papers/{arxiv-id}`\n- 
Example: `https://huggingface.co/papers/2301.12345`\n\n### arXiv\n- Abstract: `https://arxiv.org/abs/{arxiv-id}`\n- PDF: `https://arxiv.org/pdf/{arxiv-id}.pdf`\n- Example: `https://arxiv.org/abs/2301.12345`\n\n## YAML Metadata Format\n\n### Model Card\n```yaml\n---\nlanguage:\n  - en\nlicense: apache-2.0\ntags:\n  - text-generation\n  - transformers\nlibrary_name: transformers\n---\n```\n\n### Dataset Card\n```yaml\n---\nlanguage:\n  - en\nlicense: cc-by-4.0\ntask_categories:\n  - text-generation\nsize_categories:\n  - 10K<n<100K\n---\n```\n\n## arXiv ID Formats\n\nAll these formats work:\n- `2301.12345`\n- `arxiv:2301.12345`\n- `https://arxiv.org/abs/2301.12345`\n- `https://arxiv.org/pdf/2301.12345.pdf`\n\n## Environment Setup\n\n### Set Token\n```bash\nexport HF_TOKEN=\"your_token\"\n```\n\n### Or use .env file\n```bash\necho \"HF_TOKEN=your_token\" > .env\n```\n\n## Common Workflows\n\n### 1. Index & Link\n```bash\nuv run scripts/paper_manager.py index --arxiv-id \"2301.12345\"\nuv run scripts/paper_manager.py link --repo-id \"user/model\" --arxiv-id \"2301.12345\"\n```\n\n### 2. Create & Publish\n```bash\nuv run scripts/paper_manager.py create --template \"modern\" --title \"Title\" --output \"paper.md\"\n# Edit paper.md\n# Submit to arXiv → get ID\nuv run scripts/paper_manager.py index --arxiv-id \"NEW_ID\"\nuv run scripts/paper_manager.py link --repo-id \"user/model\" --arxiv-id \"NEW_ID\"\n```\n\n### 3. Batch Link\n```bash\nfor id in \"2301.12345\" \"2302.67890\"; do\n  uv run scripts/paper_manager.py link --repo-id \"user/model\" --arxiv-id \"$id\"\ndone\n```\n\n## Troubleshooting\n\n### Paper not found\nVisit `https://huggingface.co/papers/{arxiv-id}` to trigger indexing\n\n### Permission denied\nCheck `HF_TOKEN` is set and has write access\n\n### arXiv API errors\nWait a moment and retry - arXiv has rate limits\n\n## Tips\n\n1. Always check paper exists before linking\n2. Use templates for consistency\n3. Include full citations in model cards\n4. 
Link papers to all relevant artifacts\n5. Keep citations up to date\n\n## Templates Available\n\n- `standard` - Traditional academic paper\n- `modern` - Web-friendly format (Distill-style)\n- `arxiv` - arXiv journal format\n- `ml-report` - ML experiment documentation\n\n## File Locations\n\n- Scripts: `scripts/paper_manager.py`\n- Templates: `templates/*.md`\n- Examples: `examples/example_usage.md`\n- This guide: `references/quick_reference.md`\n\n## Getting Help\n\n```bash\n# Command help\nuv run scripts/paper_manager.py --help\n\n# Subcommand help\nuv run scripts/paper_manager.py link --help\n```\n\n## Additional Resources\n\n- [Full documentation](../SKILL.md)\n- [Usage examples](../examples/example_usage.md)\n- [HF Paper Pages](https://huggingface.co/papers)\n- [tfrere's template](https://huggingface.co/spaces/tfrere/research-article-template)\n"
  },
  {
    "path": "skills/hugging-face-paper-publisher/scripts/paper_manager.py",
    "content": "#!/usr/bin/env -S uv run\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\n#     \"huggingface_hub\",\n#     \"pyyaml\",\n#     \"requests\",\n#     \"python-dotenv\",\n# ]\n# ///\n\"\"\"\nPaper Manager for Hugging Face Hub\nManages paper indexing, linking, authorship, and article creation.\n\"\"\"\n\nimport argparse\nimport os\nimport sys\nimport re\nimport json\nfrom pathlib import Path\nfrom typing import Optional, List, Dict, Any\nfrom datetime import datetime\n\ntry:\n    from huggingface_hub import HfApi, hf_hub_download, get_token\n    import yaml\n    import requests\n    from dotenv import load_dotenv\nexcept ImportError as e:\n    print(f\"Error: Missing required dependency: {e}\")\n    print(\"Tip: run this script with `uv run scripts/paper_manager.py ...`.\")\n    sys.exit(1)\n\n# Load environment variables\nload_dotenv()\n\n\nclass PaperManager:\n    \"\"\"Manages paper publishing operations on Hugging Face Hub.\"\"\"\n\n    def __init__(self, hf_token: Optional[str] = None):\n        \"\"\"Initialize Paper Manager with HF token.\"\"\"\n        self.token = hf_token or os.getenv(\"HF_TOKEN\") or get_token()\n        if not self.token:\n            print(\"Warning: No HF_TOKEN found. 
Some operations will fail.\")\n        self.api = HfApi(token=self.token)\n\n    def index_paper(self, arxiv_id: str) -> Dict[str, Any]:\n        \"\"\"\n        Index a paper on Hugging Face from arXiv.\n\n        Args:\n            arxiv_id: arXiv identifier (e.g., \"2301.12345\")\n\n        Returns:\n            dict: Status information\n        \"\"\"\n        # Clean and validate arXiv ID\n        try:\n            arxiv_id = self._clean_arxiv_id(arxiv_id)\n        except ValueError as e:\n            print(f\"Error: {e}\")\n            return {\"status\": \"error\", \"message\": str(e)}\n\n        print(f\"Indexing paper {arxiv_id} on Hugging Face...\")\n\n        # Check if paper exists\n        paper_url = f\"https://huggingface.co/papers/{arxiv_id}\"\n\n        try:\n            response = requests.get(paper_url, timeout=10)\n            if response.status_code == 200:\n                print(f\"✓ Paper already indexed at {paper_url}\")\n                return {\"status\": \"exists\", \"url\": paper_url}\n            else:\n                print(f\"Paper not indexed. 
Visit {paper_url} to trigger indexing.\")\n                print(\"The paper will be automatically indexed when you first visit the URL.\")\n                return {\"status\": \"not_indexed\", \"url\": paper_url, \"action\": \"visit_url\"}\n        except requests.RequestException as e:\n            print(f\"Error checking paper status: {e}\")\n            return {\"status\": \"error\", \"message\": str(e)}\n\n    def check_paper(self, arxiv_id: str) -> Dict[str, Any]:\n        \"\"\"\n        Check if a paper exists on Hugging Face.\n\n        Args:\n            arxiv_id: arXiv identifier\n\n        Returns:\n            dict: Paper status and metadata\n        \"\"\"\n        try:\n            arxiv_id = self._clean_arxiv_id(arxiv_id)\n        except ValueError as e:\n            return {\"exists\": False, \"error\": str(e)}\n        paper_url = f\"https://huggingface.co/papers/{arxiv_id}\"\n\n        try:\n            response = requests.get(paper_url, timeout=10)\n            if response.status_code == 200:\n                return {\n                    \"exists\": True,\n                    \"url\": paper_url,\n                    \"arxiv_id\": arxiv_id,\n                    \"arxiv_url\": f\"https://arxiv.org/abs/{arxiv_id}\"\n                }\n            else:\n                return {\n                    \"exists\": False,\n                    \"arxiv_id\": arxiv_id,\n                    \"index_url\": paper_url,\n                    \"message\": f\"Visit {paper_url} to index this paper\"\n                }\n        except requests.RequestException as e:\n            return {\"exists\": False, \"error\": str(e)}\n\n    def link_paper_to_repo(\n        self,\n        repo_id: str,\n        arxiv_id: str,\n        repo_type: str = \"model\",\n        citation: Optional[str] = None,\n        create_pr: bool = False\n    ) -> Dict[str, Any]:\n        \"\"\"\n        Link a paper to a model/dataset/space repository.\n\n        Args:\n            repo_id: 
Repository identifier (e.g., \"username/repo-name\")\n            arxiv_id: arXiv identifier\n            repo_type: Type of repository (\"model\", \"dataset\", or \"space\")\n            citation: Optional full citation text\n            create_pr: Create a PR instead of direct commit\n\n        Returns:\n            dict: Operation status\n        \"\"\"\n        try:\n            arxiv_id = self._clean_arxiv_id(arxiv_id)\n        except ValueError as e:\n            print(f\"Error: {e}\")\n            return {\"status\": \"error\", \"message\": str(e)}\n\n        print(f\"Linking paper {arxiv_id} to {repo_type} {repo_id}...\")\n\n        try:\n            # Download current README\n            readme_path = hf_hub_download(\n                repo_id=repo_id,\n                filename=\"README.md\",\n                repo_type=repo_type,\n                token=self.token\n            )\n\n            with open(readme_path, 'r', encoding='utf-8') as f:\n                content = f.read()\n\n            # Parse or create YAML frontmatter\n            updated_content = self._add_paper_to_readme(content, arxiv_id, citation)\n\n            # Upload updated README\n            commit_message = f\"Add paper reference: arXiv:{arxiv_id}\"\n\n            if create_pr:\n                # Create PR (not implemented in basic version)\n                print(\"PR creation not yet implemented. 
Committing directly.\")\n\n            self.api.upload_file(\n                path_or_fileobj=updated_content.encode('utf-8'),\n                path_in_repo=\"README.md\",\n                repo_id=repo_id,\n                repo_type=repo_type,\n                commit_message=commit_message,\n                token=self.token\n            )\n\n            paper_url = f\"https://huggingface.co/papers/{arxiv_id}\"\n            repo_url = f\"https://huggingface.co/{repo_id}\"\n\n            print(f\"✓ Successfully linked paper to repository\")\n            print(f\"  Paper: {paper_url}\")\n            print(f\"  Repo: {repo_url}\")\n\n            return {\n                \"status\": \"success\",\n                \"paper_url\": paper_url,\n                \"repo_url\": repo_url,\n                \"arxiv_id\": arxiv_id\n            }\n\n        except Exception as e:\n            print(f\"Error linking paper: {e}\")\n            return {\"status\": \"error\", \"message\": str(e)}\n\n    def _add_paper_to_readme(\n        self,\n        content: str,\n        arxiv_id: str,\n        citation: Optional[str] = None\n    ) -> str:\n        \"\"\"\n        Add paper reference to README content.\n\n        Args:\n            content: Current README content\n            arxiv_id: arXiv identifier\n            citation: Optional citation text\n\n        Returns:\n            str: Updated README content\n        \"\"\"\n        arxiv_url = f\"https://arxiv.org/abs/{arxiv_id}\"\n        hf_paper_url = f\"https://huggingface.co/papers/{arxiv_id}\"\n\n        # Check if YAML frontmatter exists\n        yaml_pattern = r'^---\\s*\\n(.*?)\\n---\\s*\\n'\n        match = re.match(yaml_pattern, content, re.DOTALL)\n\n        if match:\n            # YAML exists, check if paper already referenced\n            if arxiv_id in content:\n                print(f\"Paper {arxiv_id} already referenced in README\")\n                return content\n\n            # Add to existing content (after 
YAML)\n            yaml_end = match.end()\n            before = content[:yaml_end]\n            after = content[yaml_end:]\n        else:\n            # No YAML, add minimal frontmatter\n            yaml_content = \"---\\n---\\n\\n\"\n            before = yaml_content\n            after = content\n\n        # Add paper reference section with boundary markers\n        paper_section = \"\\n<!-- paper-manager:start -->\\n\"\n        paper_section += f\"## Paper\\n\\n\"\n        paper_section += f\"This {'model' if 'model' in content.lower() else 'work'} is based on research presented in:\\n\\n\"\n        paper_section += f\"**[View on arXiv]({arxiv_url})** | \"\n        paper_section += f\"**[View on Hugging Face]({hf_paper_url})**\\n\\n\"\n\n        if citation:\n            safe_citation = self._sanitize_text(citation)\n            paper_section += f\"### Citation\\n\\n```bibtex\\n{safe_citation}\\n```\\n\\n\"\n\n        paper_section += \"<!-- paper-manager:end -->\\n\"\n\n        # Insert after YAML, before main content\n        updated_content = before + paper_section + after\n\n        return updated_content\n\n    def create_research_article(\n        self,\n        template: str,\n        title: str,\n        output: str,\n        authors: Optional[str] = None,\n        abstract: Optional[str] = None\n    ) -> Dict[str, Any]:\n        \"\"\"\n        Create a research article from template.\n\n        Args:\n            template: Template name (\"standard\", \"modern\", \"arxiv\", \"ml-report\")\n            title: Paper title\n            output: Output filename\n            authors: Comma-separated author names\n            abstract: Abstract text\n\n        Returns:\n            dict: Creation status\n        \"\"\"\n        print(f\"Creating research article with '{template}' template...\")\n\n        # Load template\n        template_dir = Path(__file__).parent.parent / \"templates\"\n        template_file = template_dir / f\"{template}.md\"\n\n        if 
not template_file.exists():\n            return {\n                \"status\": \"error\",\n                \"message\": f\"Template '{template}' not found at {template_file}\"\n            }\n\n        with open(template_file, 'r', encoding='utf-8') as f:\n            template_content = f.read()\n\n        # Prepare safe values for different contexts\n        date_str = datetime.now().strftime(\"%Y-%m-%d\")\n        safe_title_body = self._sanitize_text(title)\n        authors_val = authors if authors else \"Your Name\"\n        safe_authors_body = self._sanitize_text(authors_val)\n        abstract_val = abstract if abstract else \"Abstract to be written...\"\n        safe_abstract_body = self._sanitize_text(abstract_val)\n\n        # Split frontmatter from body for context-aware escaping\n        fm_pattern = r'^(---\\s*\\n)(.*?\\n)(---\\s*\\n)'\n        fm_match = re.match(fm_pattern, template_content, re.DOTALL)\n\n        if fm_match:\n            fm_open, fm_body, fm_close = fm_match.group(1), fm_match.group(2), fm_match.group(3)\n            body = template_content[fm_match.end():]\n\n            # YAML-escape values in frontmatter\n            fm_body = fm_body.replace(\"{{TITLE}}\", self._escape_yaml_value(title))\n            fm_body = fm_body.replace(\"{{AUTHORS}}\", self._escape_yaml_value(authors_val))\n            fm_body = fm_body.replace(\"{{DATE}}\", date_str)\n\n            # Sanitize values in body\n            body = body.replace(\"{{TITLE}}\", safe_title_body)\n            body = body.replace(\"{{AUTHORS}}\", safe_authors_body)\n            body = body.replace(\"{{ABSTRACT}}\", safe_abstract_body)\n            body = body.replace(\"{{DATE}}\", date_str)\n\n            content = fm_open + fm_body + fm_close + body\n        else:\n            # No frontmatter — sanitize everything\n            content = template_content.replace(\"{{TITLE}}\", safe_title_body)\n            content = content.replace(\"{{DATE}}\", date_str)\n            content = 
content.replace(\"{{AUTHORS}}\", safe_authors_body)\n            content = content.replace(\"{{ABSTRACT}}\", safe_abstract_body)\n\n        # Write output\n        with open(output, 'w', encoding='utf-8') as f:\n            f.write(content)\n\n        print(f\"✓ Research article created at {output}\")\n\n        return {\n            \"status\": \"success\",\n            \"output\": output,\n            \"template\": template\n        }\n\n    def get_arxiv_info(self, arxiv_id: str) -> Dict[str, Any]:\n        \"\"\"\n        Fetch paper information from arXiv API.\n\n        Args:\n            arxiv_id: arXiv identifier\n\n        Returns:\n            dict: Paper metadata\n        \"\"\"\n        try:\n            arxiv_id = self._clean_arxiv_id(arxiv_id)\n        except ValueError as e:\n            return {\"error\": str(e)}\n        api_url = f\"https://export.arxiv.org/api/query?id_list={arxiv_id}\"\n\n        try:\n            response = requests.get(api_url, timeout=10)\n            response.raise_for_status()\n\n            # Parse XML response (simplified)\n            content = response.text\n\n            # Extract basic info with regex (proper XML parsing would be better)\n            title_match = re.search(r'<title>(.*?)</title>', content, re.DOTALL)\n            authors_matches = re.findall(r'<name>(.*?)</name>', content)\n            summary_match = re.search(r'<summary>(.*?)</summary>', content, re.DOTALL)\n\n            # Sanitize all text extracted from the external API\n            raw_title = title_match.group(1).strip() if title_match else None\n            raw_authors = authors_matches[1:] if len(authors_matches) > 1 else []\n            raw_abstract = summary_match.group(1).strip() if summary_match else None\n\n            return {\n                \"arxiv_id\": arxiv_id,\n                \"title\": self._sanitize_text(raw_title) if raw_title else None,\n                \"authors\": [self._sanitize_text(a) for a in raw_authors],\n           
     \"abstract\": self._sanitize_text(raw_abstract) if raw_abstract else None,\n                \"arxiv_url\": f\"https://arxiv.org/abs/{arxiv_id}\",\n                \"pdf_url\": f\"https://arxiv.org/pdf/{arxiv_id}.pdf\"\n            }\n        except Exception as e:\n            return {\"error\": str(e)}\n\n    def generate_citation(\n        self,\n        arxiv_id: str,\n        format: str = \"bibtex\"\n    ) -> str:\n        \"\"\"\n        Generate citation for a paper.\n\n        Args:\n            arxiv_id: arXiv identifier\n            format: Citation format (\"bibtex\", \"apa\", \"mla\")\n\n        Returns:\n            str: Formatted citation\n        \"\"\"\n        try:\n            arxiv_id = self._clean_arxiv_id(arxiv_id)\n        except ValueError as e:\n            return f\"Error: {e}\"\n\n        info = self.get_arxiv_info(arxiv_id)\n\n        if \"error\" in info:\n            return f\"Error fetching paper info: {info['error']}\"\n\n        if format == \"bibtex\":\n            # Generate BibTeX citation\n            key = f\"arxiv{arxiv_id.replace('.', '_')}\"\n            raw_authors = \" and \".join(info.get(\"authors\", [\"Unknown\"]))\n            raw_title = info.get(\"title\", \"Untitled\")\n            year = arxiv_id.split(\".\")[0][:2]  # Extract year from ID (simplified)\n            year = f\"20{year}\" if int(year) < 50 else f\"19{year}\"\n\n            # Escape BibTeX structural characters in untrusted values\n            safe_title = raw_title.replace('{', r'\\{').replace('}', r'\\}')\n            safe_authors = raw_authors.replace('{', r'\\{').replace('}', r'\\}')\n\n            citation = f\"\"\"@article{{{key},\n  title={{{safe_title}}},\n  author={{{safe_authors}}},\n  journal={{arXiv preprint arXiv:{arxiv_id}}},\n  year={{{year}}}\n}}\"\"\"\n            return citation\n\n        return f\"Format '{format}' not yet implemented\"\n\n    # Patterns for valid arXiv IDs\n    _ARXIV_ID_MODERN = 
re.compile(r'^\\d{4}\\.\\d{4,5}(v\\d+)?$')\n    _ARXIV_ID_LEGACY = re.compile(r'^[a-zA-Z\\-]+/\\d{7}(v\\d+)?$')\n\n    @staticmethod\n    def _clean_arxiv_id(arxiv_id: str) -> str:\n        \"\"\"Clean, normalize, and validate arXiv ID.\n\n        Raises:\n            ValueError: If the cleaned ID does not match a valid arXiv format.\n        \"\"\"\n        # Remove common prefixes and whitespace\n        arxiv_id = arxiv_id.strip()\n        arxiv_id = re.sub(r'^(arxiv:|arXiv:)', '', arxiv_id, flags=re.IGNORECASE)\n        arxiv_id = re.sub(r'https?://arxiv\\.org/(abs|pdf)/', '', arxiv_id)\n        arxiv_id = arxiv_id.replace('.pdf', '')\n\n        # Validate format\n        if not (PaperManager._ARXIV_ID_MODERN.match(arxiv_id)\n                or PaperManager._ARXIV_ID_LEGACY.match(arxiv_id)):\n            raise ValueError(\n                f\"Invalid arXiv ID: {arxiv_id!r}. \"\n                \"Expected format: YYMM.NNNNN[vN] or category/YYMMNNN[vN]\"\n            )\n\n        return arxiv_id\n\n    @staticmethod\n    def _escape_yaml_value(value: str) -> str:\n        \"\"\"Escape a string for safe use as a YAML scalar value.\n\n        Wraps in double quotes and escapes internal quotes and backslashes\n        to prevent YAML injection via crafted titles/authors.\n        \"\"\"\n        value = value.replace('\\\\', '\\\\\\\\').replace('\"', '\\\\\"')\n        return f'\"{value}\"'\n\n    @staticmethod\n    def _sanitize_text(text: str) -> str:\n        \"\"\"Sanitize untrusted text for safe inclusion in Markdown/YAML output.\n\n        Normalizes whitespace, strips control characters, and neutralizes\n        markdown code-fence breakout and YAML document delimiters.\n        \"\"\"\n        # Remove control characters (keep newlines and tabs)\n        text = re.sub(r'[\\x00-\\x08\\x0b\\x0c\\x0e-\\x1f\\x7f]', '', text)\n        # Normalize whitespace runs (collapse multiple spaces/tabs, preserve single newlines)\n        text = re.sub(r'[^\\S\\n]+', ' ', 
text)\n        text = re.sub(r'\\n{3,}', '\\n\\n', text)\n        # Neutralize markdown code fence breakout\n        text = text.replace('```', r'\\`\\`\\`')\n        # Neutralize YAML document delimiters at line start\n        text = re.sub(r'^---', r'\\\\---', text, flags=re.MULTILINE)\n        return text.strip()\n\n\ndef main():\n    \"\"\"Main CLI entry point.\"\"\"\n    parser = argparse.ArgumentParser(\n        description=\"Paper Manager for Hugging Face Hub\",\n        formatter_class=argparse.RawDescriptionHelpFormatter\n    )\n\n    subparsers = parser.add_subparsers(dest=\"command\", help=\"Command to execute\")\n\n    # Index command\n    index_parser = subparsers.add_parser(\"index\", help=\"Index a paper from arXiv\")\n    index_parser.add_argument(\"--arxiv-id\", required=True, help=\"arXiv paper ID\")\n\n    # Check command\n    check_parser = subparsers.add_parser(\"check\", help=\"Check if paper exists\")\n    check_parser.add_argument(\"--arxiv-id\", required=True, help=\"arXiv paper ID\")\n\n    # Link command\n    link_parser = subparsers.add_parser(\"link\", help=\"Link paper to repository\")\n    link_parser.add_argument(\"--repo-id\", required=True, help=\"Repository ID\")\n    link_parser.add_argument(\"--repo-type\", default=\"model\", choices=[\"model\", \"dataset\", \"space\"])\n    link_parser.add_argument(\"--arxiv-id\", help=\"Single arXiv ID\")\n    link_parser.add_argument(\"--arxiv-ids\", help=\"Comma-separated arXiv IDs\")\n    link_parser.add_argument(\"--citation\", help=\"Full citation text\")\n    link_parser.add_argument(\"--create-pr\", action=\"store_true\", help=\"Create PR instead of direct commit\")\n\n    # Create command\n    create_parser = subparsers.add_parser(\"create\", help=\"Create research article\")\n    create_parser.add_argument(\"--template\", required=True, help=\"Template name\")\n    create_parser.add_argument(\"--title\", required=True, help=\"Paper title\")\n    
create_parser.add_argument(\"--output\", required=True, help=\"Output filename\")\n    create_parser.add_argument(\"--authors\", help=\"Comma-separated authors\")\n    create_parser.add_argument(\"--abstract\", help=\"Abstract text\")\n\n    # Info command\n    info_parser = subparsers.add_parser(\"info\", help=\"Get paper information\")\n    info_parser.add_argument(\"--arxiv-id\", required=True, help=\"arXiv paper ID\")\n    info_parser.add_argument(\"--format\", default=\"json\", choices=[\"json\", \"text\"])\n\n    # Citation command\n    citation_parser = subparsers.add_parser(\"citation\", help=\"Generate citation\")\n    citation_parser.add_argument(\"--arxiv-id\", required=True, help=\"arXiv paper ID\")\n    citation_parser.add_argument(\"--format\", default=\"bibtex\", choices=[\"bibtex\", \"apa\", \"mla\"])\n\n    # Search command\n    search_parser = subparsers.add_parser(\"search\", help=\"Search papers\")\n    search_parser.add_argument(\"--query\", required=True, help=\"Search query\")\n\n    args = parser.parse_args()\n\n    if not args.command:\n        parser.print_help()\n        sys.exit(1)\n\n    # Initialize manager\n    manager = PaperManager()\n\n    # Execute command\n    if args.command == \"index\":\n        result = manager.index_paper(args.arxiv_id)\n        print(json.dumps(result, indent=2))\n\n    elif args.command == \"check\":\n        result = manager.check_paper(args.arxiv_id)\n        print(json.dumps(result, indent=2))\n\n    elif args.command == \"link\":\n        arxiv_ids = []\n        if args.arxiv_id:\n            arxiv_ids.append(args.arxiv_id)\n        if args.arxiv_ids:\n            arxiv_ids.extend([id.strip() for id in args.arxiv_ids.split(\",\")])\n\n        if not arxiv_ids:\n            print(\"Error: Must provide --arxiv-id or --arxiv-ids\")\n            sys.exit(1)\n\n        for arxiv_id in arxiv_ids:\n            result = manager.link_paper_to_repo(\n                repo_id=args.repo_id,\n                
arxiv_id=arxiv_id,\n                repo_type=args.repo_type,\n                citation=args.citation,\n                create_pr=args.create_pr\n            )\n            print(json.dumps(result, indent=2))\n\n    elif args.command == \"create\":\n        result = manager.create_research_article(\n            template=args.template,\n            title=args.title,\n            output=args.output,\n            authors=args.authors,\n            abstract=args.abstract\n        )\n        print(json.dumps(result, indent=2))\n\n    elif args.command == \"info\":\n        result = manager.get_arxiv_info(args.arxiv_id)\n        if args.format == \"json\":\n            print(json.dumps(result, indent=2))\n        else:\n            if \"error\" in result:\n                print(f\"Error: {result['error']}\")\n            else:\n                print(f\"Title: {result.get('title')}\")\n                print(f\"Authors: {', '.join(result.get('authors', []))}\")\n                print(f\"arXiv URL: {result.get('arxiv_url')}\")\n                print(f\"\\nAbstract:\\n{result.get('abstract')}\")\n\n    elif args.command == \"citation\":\n        citation = manager.generate_citation(args.arxiv_id, args.format)\n        print(citation)\n\n    elif args.command == \"search\":\n        print(f\"Searching for: {args.query}\")\n        print(\"Search functionality coming soon!\")\n        print(f\"Visit: https://huggingface.co/papers?search={args.query}\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hugging-face-paper-publisher/templates/arxiv.md",
    "content": "---\ntitle: {{TITLE}}\nauthors: {{AUTHORS}}\ndate: {{DATE}}\narxiv:\nlayout: arxiv\n---\n\n# {{TITLE}}\n\n<div class=\"arxiv-header\">\n\n**{{AUTHORS}}**\n\n*Submitted to arXiv: {{DATE}}*\n\n</div>\n\n---\n\n**Abstract**—{{ABSTRACT}}\n\n**Index Terms**—Machine Learning, Deep Learning, Neural Networks\n\n---\n\n## I. INTRODUCTION\n\n**T**HIS paper presents [brief overview of the contribution]. The main contributions of this work are:\n\n- Contribution 1: Description\n- Contribution 2: Description\n- Contribution 3: Description\n\nThe rest of this paper is organized as follows: Section II reviews related work, Section III describes the proposed methodology, Section IV presents experimental results, and Section V concludes the paper.\n\n## II. RELATED WORK\n\n### A. Subarea 1\n\nDiscussion of relevant prior work in subarea 1.\n\n### B. Subarea 2\n\nDiscussion of relevant prior work in subarea 2.\n\n### C. Comparison with Prior Art\n\nTable comparing existing methods:\n\n| Method | Year | Approach | Limitation |\n|--------|------|----------|------------|\n| Method A [1] | 2020 | Description | Issue |\n| Method B [2] | 2021 | Description | Issue |\n| Method C [3] | 2023 | Description | Issue |\n\n## III. METHODOLOGY\n\n### A. Problem Formulation\n\nLet $X = \\{x_1, x_2, ..., x_n\\}$ be the input space and $Y = \\{y_1, y_2, ..., y_m\\}$ be the output space. We aim to learn a function $f: X \\rightarrow Y$ that minimizes:\n\n$$\n\\mathcal{L}(\\theta) = \\sum_{i=1}^{N} \\ell(f(x_i; \\theta), y_i) + \\lambda R(\\theta)\n$$\n\nwhere $\\theta$ represents model parameters, $\\ell$ is the loss function, and $R(\\theta)$ is a regularization term.\n\n### B. 
Model Architecture\n\nDescribe the model architecture in detail.\n\n**Input Layer**: Description\n\n**Hidden Layers**: Let $h^{(l)}$ denote the activation of layer $l$:\n\n$$\nh^{(l)} = \\sigma(W^{(l)}h^{(l-1)} + b^{(l)})\n$$\n\nwhere $\\sigma$ is the activation function, $W^{(l)}$ is the weight matrix, and $b^{(l)}$ is the bias vector.\n\n**Output Layer**: Description\n\n### C. Training Algorithm\n\n**Algorithm 1**: Training Procedure\n\n```\n1: Input: Training data D = {(xi, yi)}\n2: Initialize parameters θ\n3: for epoch = 1 to max_epochs do\n4:     for each mini-batch B ⊂ D do\n5:         Compute loss: L(θ) = 1/|B| Σ ℓ(f(xi; θ), yi)\n6:         Update: θ ← θ - η∇θL(θ)\n7:     end for\n8: end for\n9: Return: Trained parameters θ*\n```\n\n### D. Complexity Analysis\n\n**Time Complexity**: The training algorithm has time complexity $O(NTE)$ where $N$ is the dataset size, $T$ is the number of epochs, and $E$ is the per-example computation cost.\n\n**Space Complexity**: The model requires $O(P)$ space where $P$ is the number of parameters.\n\n## IV. EXPERIMENTS\n\n### A. Experimental Setup\n\n**Datasets**: We evaluate on the following benchmarks:\n\n1. **Dataset A**: Description (size, splits, characteristics)\n2. **Dataset B**: Description\n3. **Dataset C**: Description\n\n**Baselines**: We compare against:\n\n- Baseline 1 [4]: Description\n- Baseline 2 [5]: Description\n- Baseline 3 [6]: Description\n\n**Evaluation Metrics**: Performance is measured using:\n\n- Metric 1: Definition\n- Metric 2: Definition\n- Metric 3: Definition\n\n**Implementation Details**: All experiments are conducted using:\n\n- Framework: PyTorch 2.0\n- Hardware: NVIDIA A100 GPUs\n- Hyperparameters: Learning rate $\\eta = 10^{-4}$, batch size $B = 32$, epochs $T = 100$\n\n### B. 
Quantitative Results\n\n**TABLE I: MAIN RESULTS**\n\n| Method | Dataset A | Dataset B | Dataset C | Average |\n|--------|-----------|-----------|-----------|---------|\n| Baseline 1 [4] | 82.3 | 78.5 | 80.1 | 80.3 |\n| Baseline 2 [5] | 85.7 | 82.1 | 83.9 | 83.9 |\n| Baseline 3 [6] | 88.1 | 85.3 | 86.7 | 86.7 |\n| **Ours** | **91.2** | **88.9** | **90.1** | **90.1** |\n\nOur method achieves state-of-the-art performance across all three benchmarks, with an average improvement of 3.4 percentage points over the previous best method.\n\n### C. Ablation Study\n\n**TABLE II: ABLATION STUDY RESULTS**\n\n| Configuration | Dataset A | Δ |\n|---------------|-----------|---|\n| Full Model | 91.2 | - |\n| w/o Component A | 88.7 | -2.5 |\n| w/o Component B | 89.4 | -1.8 |\n| w/o Component C | 90.5 | -0.7 |\n\nThe ablation study demonstrates that all components contribute to the final performance, with Component A having the largest impact.\n\n### D. Qualitative Analysis\n\n**Fig. 1**: Visualization of learned representations using t-SNE projection.\n\n**Fig. 2**: Example predictions showing correct classifications and failure cases.\n\n### E. Computational Efficiency\n\n**TABLE III: COMPUTATIONAL REQUIREMENTS**\n\n| Method | Parameters | FLOPs | Inference (ms) |\n|--------|------------|-------|----------------|\n| Baseline 1 [4] | 50M | 10G | 8.2 |\n| Baseline 2 [5] | 100M | 25G | 15.7 |\n| Baseline 3 [6] | 200M | 50G | 28.3 |\n| **Ours** | **80M** | **18G** | **12.1** |\n\nOur method achieves superior performance while maintaining reasonable computational costs.\n\n## V. DISCUSSION\n\n### A. Analysis of Results\n\nThe experimental results demonstrate that [analysis].\n\n### B. Limitations\n\nCurrent limitations include:\n\n1. Limitation 1: Description\n2. Limitation 2: Description\n3. Limitation 3: Description\n\n### C. 
Broader Impact\n\nPotential applications include:\n\n- Application 1: Description\n- Application 2: Description\n- Application 3: Description\n\n**Ethical Considerations**: [Discussion of potential risks and mitigation strategies]\n\n## VI. CONCLUSION\n\nThis paper presented {{TITLE}}, which achieves [main achievement]. The key contributions are:\n\n1. Contribution 1: Summary\n2. Contribution 2: Summary\n3. Contribution 3: Summary\n\nFuture work will focus on [future directions].\n\n## ACKNOWLEDGMENTS\n\nThe authors thank [acknowledgments]. This work was supported by [funding sources].\n\n## REFERENCES\n\n[1] Author A et al., \"Paper Title,\" *Conference Name*, 2020.\n\n[2] Author B et al., \"Paper Title,\" *Journal Name*, vol. X, no. Y, pp. Z-W, 2021.\n\n[3] Author C et al., \"Paper Title,\" *arXiv preprint arXiv:XXXX.XXXXX*, 2023.\n\n[4] Author D et al., \"Baseline 1 Paper,\" *Conference*, 2019.\n\n[5] Author E et al., \"Baseline 2 Paper,\" *Conference*, 2021.\n\n[6] Author F et al., \"Baseline 3 Paper,\" *Conference*, 2023.\n\n---\n\n## APPENDIX A: ADDITIONAL EXPERIMENTS\n\nSupplementary experimental results.\n\n## APPENDIX B: PROOF OF THEOREM\n\n**Theorem 1**: Statement of theorem.\n\n**Proof**: Detailed proof.\n\n## APPENDIX C: HYPERPARAMETERS\n\nComplete list of hyperparameters used in all experiments:\n\n| Hyperparameter | Value | Description |\n|----------------|-------|-------------|\n| Learning rate | $10^{-4}$ | Initial learning rate |\n| Batch size | 32 | Training batch size |\n| Epochs | 100 | Number of training epochs |\n| Optimizer | AdamW | Optimization algorithm |\n| Weight decay | 0.01 | L2 regularization coefficient |\n| Warmup steps | 1000 | LR warmup duration |\n| Dropout | 0.1 | Dropout probability |\n\n---\n\n<style>\n.arxiv-header {\n    text-align: center;\n    margin-bottom: 2em;\n}\n\nbody {\n    font-family: 'Computer Modern', serif;\n    line-height: 1.6;\n}\n\nh1 {\n    text-align: center;\n    font-size: 1.8em;\n    margin-top: 
1em;\n}\n\nh2 {\n    font-size: 1.3em;\n    margin-top: 1.5em;\n    font-weight: bold;\n}\n\nh3 {\n    font-size: 1.1em;\n    font-style: italic;\n    margin-top: 1em;\n}\n\ntable {\n    margin: 1em auto;\n    border-collapse: collapse;\n}\n\nth, td {\n    border: 1px solid #000;\n    padding: 0.5em;\n    text-align: center;\n}\n</style>\n"
  },
  {
    "path": "skills/hugging-face-paper-publisher/templates/ml-report.md",
    "content": "---\ntitle: {{TITLE}}\nauthors: {{AUTHORS}}\ndate: {{DATE}}\ntype: ml-experiment-report\ntags: [machine-learning, experiment-report]\n---\n\n# {{TITLE}}\n\n**Machine Learning Experiment Report**\n\n**Researchers**: {{AUTHORS}}\n**Date**: {{DATE}}\n**Status**: Draft / Final / In Review\n\n---\n\n## Executive Summary\n\n{{ABSTRACT}}\n\n### Key Findings\n- Finding 1\n- Finding 2\n- Finding 3\n\n### Recommendations\n- Recommendation 1\n- Recommendation 2\n\n---\n\n## 1. Objective\n\n### 1.1 Research Question\n\nWhat specific question are we trying to answer?\n\n### 1.2 Success Criteria\n\nHow will we measure success?\n\n- **Metric 1**: Target value\n- **Metric 2**: Target value\n- **Metric 3**: Target value\n\n### 1.3 Constraints\n\n- Computational budget\n- Time constraints\n- Data availability\n\n---\n\n## 2. Dataset\n\n### 2.1 Data Description\n\n| Property | Value |\n|----------|-------|\n| **Name** | Dataset name |\n| **Source** | Origin of data |\n| **Size** | Number of examples |\n| **Features** | Feature count and types |\n| **Target** | What we're predicting |\n| **License** | Usage rights |\n\n### 2.2 Data Splits\n\n| Split | Size | Percentage |\n|-------|------|------------|\n| Train | X examples | Y% |\n| Validation | X examples | Y% |\n| Test | X examples | Y% |\n\n### 2.3 Data Quality\n\n- **Missing Values**: Analysis and handling\n- **Outliers**: Detection and treatment\n- **Imbalance**: Class distribution\n- **Preprocessing**: Transformations applied\n\n### 2.4 Exploratory Analysis\n\nKey insights from data exploration:\n\n1. Pattern 1\n2. Pattern 2\n3. Pattern 3\n\n---\n\n## 3. Model\n\n### 3.1 Architecture\n\nDescribe the model architecture:\n\n```\nInput → Layer 1 → Layer 2 → ... 
→ Output\n```\n\n### 3.2 Model Specifications\n\n| Component | Configuration |\n|-----------|--------------|\n| **Type** | Model family |\n| **Parameters** | Total count |\n| **Layers** | Number and types |\n| **Activation** | Functions used |\n| **Dropout** | Regularization rate |\n\n### 3.3 Baseline Models\n\nWhat are we comparing against?\n\n1. **Baseline 1**: Simple baseline (e.g., majority class)\n2. **Baseline 2**: Standard approach (e.g., logistic regression)\n3. **Baseline 3**: Previous best method\n\n---\n\n## 4. Training\n\n### 4.1 Hyperparameters\n\n| Hyperparameter | Value | Rationale |\n|----------------|-------|-----------|\n| Learning Rate | 1e-4 | Tuned via grid search |\n| Batch Size | 32 | GPU memory constraint |\n| Epochs | 100 | Based on validation |\n| Optimizer | AdamW | Standard for transformers |\n| Weight Decay | 0.01 | Regularization |\n| LR Schedule | Cosine | Smooth convergence |\n\n### 4.2 Training Process\n\n```python\n# Training pseudocode\nfor epoch in range(num_epochs):\n    train_loss = train_one_epoch(model, train_loader)\n    val_loss = validate(model, val_loader)\n    if val_loss < best_loss:\n        save_checkpoint(model)\n```\n\n### 4.3 Computational Resources\n\n| Resource | Specification |\n|----------|--------------|\n| **Hardware** | GPU model and count |\n| **Memory** | RAM and VRAM |\n| **Training Time** | Hours/days |\n| **Cost** | Estimated compute cost |\n\n### 4.4 Training Curves\n\nInclude plots of:\n- Training loss over time\n- Validation loss over time\n- Learning rate schedule\n- Other relevant metrics\n\n---\n\n## 5. 
Results\n\n### 5.1 Quantitative Results\n\n| Model | Accuracy | Precision | Recall | F1 | AUC |\n|-------|----------|-----------|--------|-------|-----|\n| Baseline 1 | 0.65 | 0.64 | 0.66 | 0.65 | 0.70 |\n| Baseline 2 | 0.78 | 0.77 | 0.79 | 0.78 | 0.82 |\n| **Ours** | **0.89** | **0.88** | **0.90** | **0.89** | **0.93** |\n\n### 5.2 Statistical Significance\n\n- **P-value**: Statistical test results\n- **Confidence Intervals**: 95% CI for key metrics\n- **Multiple Runs**: Mean ± std over N runs\n\n### 5.3 Per-Class Performance\n\n| Class | Precision | Recall | F1 | Support |\n|-------|-----------|--------|-----|---------|\n| Class 1 | 0.90 | 0.88 | 0.89 | 500 |\n| Class 2 | 0.87 | 0.91 | 0.89 | 450 |\n| Class 3 | 0.88 | 0.89 | 0.88 | 550 |\n\n### 5.4 Qualitative Results\n\n#### Success Cases\n\nExamples where the model performs well.\n\n#### Failure Cases\n\nExamples where the model fails and why.\n\n---\n\n## 6. Analysis\n\n### 6.1 Ablation Study\n\n| Configuration | Score | Change |\n|---------------|-------|--------|\n| Full Model | 0.89 | - |\n| - Feature Set A | 0.85 | -0.04 |\n| - Feature Set B | 0.87 | -0.02 |\n| - Augmentation | 0.86 | -0.03 |\n\n### 6.2 Error Analysis\n\nWhat types of errors is the model making?\n\n1. **Error Type 1**: Frequency and cause\n2. **Error Type 2**: Frequency and cause\n3. **Error Type 3**: Frequency and cause\n\n### 6.3 Feature Importance\n\nWhich features matter most?\n\n| Feature | Importance | Notes |\n|---------|------------|-------|\n| Feature 1 | 0.35 | Most predictive |\n| Feature 2 | 0.28 | Secondary signal |\n| Feature 3 | 0.15 | Marginal impact |\n\n---\n\n## 7. 
Robustness\n\n### 7.1 Cross-Dataset Evaluation\n\nHow does the model generalize to other datasets?\n\n| Dataset | Score | Notes |\n|---------|-------|-------|\n| Original | 0.89 | Training distribution |\n| Dataset A | 0.82 | Similar domain |\n| Dataset B | 0.71 | Different domain |\n\n### 7.2 Adversarial Robustness\n\nPerformance under adversarial conditions.\n\n### 7.3 Fairness Analysis\n\nPerformance across demographic groups or sensitive attributes.\n\n---\n\n## 8. Deployment Considerations\n\n### 8.1 Model Size\n\n- **Parameters**: Total count\n- **Disk Size**: MB/GB on disk\n- **Memory**: Runtime memory usage\n\n### 8.2 Inference Speed\n\n| Batch Size | Latency | Throughput |\n|------------|---------|------------|\n| 1 | 10ms | 100 QPS |\n| 8 | 45ms | 178 QPS |\n| 32 | 150ms | 213 QPS |\n\n### 8.3 Production Requirements\n\n- **Dependencies**: Software requirements\n- **Infrastructure**: Hardware needs\n- **Monitoring**: What to track in production\n- **Fallback**: Backup strategy\n\n---\n\n## 9. Conclusions\n\n### 9.1 Summary\n\nKey takeaways from the experiment.\n\n### 9.2 Did We Meet Objectives?\n\n| Objective | Status | Notes |\n|-----------|--------|-------|\n| Objective 1 | ✅ Met | Achieved target |\n| Objective 2 | ⚠️ Partial | Close to target |\n| Objective 3 | ❌ Not Met | Needs more work |\n\n### 9.3 Lessons Learned\n\nWhat did we learn from this experiment?\n\n1. Lesson 1\n2. Lesson 2\n3. Lesson 3\n\n---\n\n## 10. Next Steps\n\n### 10.1 Short-term (1-2 weeks)\n\n- [ ] Task 1\n- [ ] Task 2\n- [ ] Task 3\n\n### 10.2 Medium-term (1-2 months)\n\n- [ ] Task 1\n- [ ] Task 2\n- [ ] Task 3\n\n### 10.3 Long-term (3+ months)\n\n- [ ] Task 1\n- [ ] Task 2\n- [ ] Task 3\n\n---\n\n## References\n\n1. Reference 1\n2. Reference 2\n3. Reference 3\n\n---\n\n## Appendix\n\n### A. Hyperparameter Search\n\nResults from hyperparameter tuning.\n\n### B. Additional Experiments\n\nSupplementary experiments not included in main text.\n\n### C. 
Code\n\nLinks to code repositories:\n- Training code: [link]\n- Evaluation code: [link]\n- Model checkpoint: [link]\n\n### D. Data Card\n\nDetailed data documentation following standard practices.\n\n### E. Model Card\n\nModel documentation following responsible AI practices.\n"
  },
  {
    "path": "skills/hugging-face-paper-publisher/templates/modern.md",
    "content": "---\ntitle: {{TITLE}}\nauthors: {{AUTHORS}}\ndate: {{DATE}}\narxiv:\ntags: [machine-learning, ai]\nlayout: modern\n---\n\n<div class=\"header\">\n\n# {{TITLE}}\n\n<div class=\"authors\">\n{{AUTHORS}}\n</div>\n\n<div class=\"date\">\n{{DATE}}\n</div>\n\n<div class=\"links\">\n[arXiv](#) · [PDF](#) · [Code](#) · [Demo](#)\n</div>\n\n</div>\n\n---\n\n## Abstract\n\n<div class=\"abstract\">\n\n{{ABSTRACT}}\n\n</div>\n\n---\n\n## Introduction\n\nModern research requires clear, accessible communication. This template provides a clean, web-friendly format inspired by Distill and modern scientific publications.\n\n<div class=\"key-insight\">\n💡 **Key Insight**: Present your main contribution upfront to engage readers immediately.\n</div>\n\n### Why This Matters\n\nExplain the significance of your work in plain language. What real-world problems does it solve?\n\n### Our Approach\n\nSummarize your methodology at a high level before diving into details.\n\n---\n\n## Background\n\n<div class=\"definition\">\n**Definition**: Clearly define key terms and concepts early in the paper.\n</div>\n\nProvide context necessary to understand your contribution without overwhelming readers with details.\n\n### Problem Statement\n\nFormally state the problem you're addressing.\n\n### Challenges\n\nWhat makes this problem difficult?\n\n1. **Challenge 1**: Description\n2. **Challenge 2**: Description\n3. **Challenge 3**: Description\n\n---\n\n## Method\n\nPresent your approach with clear visual aids and intuitive explanations.\n\n<div class=\"figure\">\n\n```\n[Diagram of your architecture goes here]\n```\n\n**Figure 1**: Overview of the proposed method. 
Caption explains the key components.\n\n</div>\n\n### Model Architecture\n\nDescribe your model systematically:\n\n```python\n# Pseudocode example\nclass YourModel:\n    def __init__(self):\n        self.encoder = Encoder()\n        self.decoder = Decoder()\n\n    def forward(self, x):\n        z = self.encoder(x)\n        output = self.decoder(z)\n        return output\n```\n\n### Training Strategy\n\nExplain how you train the model, including:\n\n- **Objective Function**: Mathematical formulation\n- **Optimization**: Algorithm and hyperparameters\n- **Regularization**: Techniques to prevent overfitting\n\n---\n\n## Experiments\n\n### Setup\n\n<div class=\"experiment-details\">\n\n| Component | Configuration |\n|-----------|--------------|\n| **Dataset** | Name, Size, Split |\n| **Hardware** | GPU Type, RAM |\n| **Framework** | PyTorch 2.0, Transformers |\n| **Training Time** | Hours/Days |\n\n</div>\n\n### Results\n\nPresent results clearly with tables and visualizations.\n\n<div class=\"results-table\">\n\n| Model | Accuracy | F1 Score | Params | Speed |\n|-------|----------|----------|--------|-------|\n| Baseline | 85.2% | 0.84 | 100M | 100 tok/s |\n| **Ours** | **92.1%** | **0.91** | 120M | 95 tok/s |\n| SOTA | 90.5% | 0.89 | 300M | 60 tok/s |\n\n</div>\n\n<div class=\"insight\">\n🔍 **Observation**: Our method achieves state-of-the-art performance with fewer parameters.\n</div>\n\n### Analysis\n\nDeep dive into what the results reveal:\n\n1. **Performance**: How does your method compare?\n2. **Efficiency**: What are the computational costs?\n3. 
**Robustness**: How does it perform across different scenarios?\n\n---\n\n## Ablation Study\n\nSystematically evaluate each component's contribution.\n\n<div class=\"ablation-results\">\n\n| Configuration | Score | Δ |\n|---------------|-------|---|\n| Full Model | 92.1% | - |\n| - Component A | 89.3% | -2.8% |\n| - Component B | 90.1% | -2.0% |\n| - Component C | 91.5% | -0.6% |\n\n</div>\n\n**Conclusion**: All components contribute meaningfully, with Component A being most critical.\n\n---\n\n## Discussion\n\n### What We Learned\n\nSynthesize insights from your experiments.\n\n### Limitations\n\n<div class=\"limitations\">\n\n⚠️ **Current Limitations**:\n\n1. Performance on domain X is limited\n2. Computational requirements are high\n3. Requires large training datasets\n\n</div>\n\n### Future Directions\n\nWhere should the community go next?\n\n- **Direction 1**: Description\n- **Direction 2**: Description\n- **Direction 3**: Description\n\n---\n\n## Related Work\n\nCompare and contrast with existing methods.\n\n### Prior Approaches\n\n| Method | Year | Key Idea | Limitation |\n|--------|------|----------|------------|\n| Method A | 2020 | Approach 1 | Issue X |\n| Method B | 2021 | Approach 2 | Issue Y |\n| Method C | 2023 | Approach 3 | Issue Z |\n\n### How We Differ\n\nClearly articulate what's novel about your work.\n\n---\n\n## Conclusion\n\n<div class=\"conclusion\">\n\nWe presented **{{TITLE}}**, which achieves:\n\n1. ✅ **Main contribution 1**\n2. ✅ **Main contribution 2**\n3. 
✅ **Main contribution 3**\n\nOur results demonstrate [key finding], opening new directions for [future work].\n\n</div>\n\n---\n\n## Reproducibility\n\n<div class=\"reproducibility\">\n\n### Code & Data\n\n- **Code**: [github.com/username/repo](#)\n- **Models**: [huggingface.co/username/model](#)\n- **Datasets**: [huggingface.co/datasets/username/dataset](#)\n- **Demo**: [huggingface.co/spaces/username/demo](#)\n\n### Citation\n\n```bibtex\n@article{yourpaper2025,\n  title={{{{TITLE}}}},\n  author={{{{AUTHORS}}}},\n  year={2025},\n  journal={arXiv preprint}\n}\n```\n\n</div>\n\n---\n\n## Acknowledgments\n\nThank funding agencies, collaborators, and computing resources that made this work possible.\n\n---\n\n<div class=\"appendix\">\n\n## Appendix\n\n### A. Additional Results\n\nSupplementary experiments and extended results.\n\n### B. Hyperparameters\n\nComplete training configuration:\n\n```yaml\nlearning_rate: 1e-4\nbatch_size: 32\nepochs: 100\noptimizer: AdamW\nscheduler: cosine\nwarmup_steps: 1000\n```\n\n### C. 
Dataset Details\n\nDetailed information about datasets used.\n\n</div>\n\n---\n\n<style>\n.header { text-align: center; margin-bottom: 2em; }\n.authors { font-size: 1.2em; margin: 0.5em 0; }\n.date { color: #666; margin: 0.5em 0; }\n.links { margin-top: 1em; }\n.abstract { background: #f5f5f5; padding: 1.5em; border-radius: 8px; margin: 1em 0; }\n.key-insight, .insight { background: #e8f4f8; border-left: 4px solid #2196F3; padding: 1em; margin: 1em 0; }\n.definition { background: #fff3e0; border-left: 4px solid #ff9800; padding: 1em; margin: 1em 0; }\n.limitations { background: #ffebee; border-left: 4px solid #f44336; padding: 1em; margin: 1em 0; }\n.conclusion { background: #e8f5e9; border-left: 4px solid #4caf50; padding: 1.5em; margin: 1em 0; }\n.figure { text-align: center; margin: 2em 0; }\n.experiment-details, .results-table, .ablation-results { margin: 1em 0; }\n.reproducibility { background: #f5f5f5; padding: 1.5em; border-radius: 8px; margin: 2em 0; }\n</style>\n"
  },
  {
    "path": "skills/hugging-face-paper-publisher/templates/standard.md",
    "content": "---\ntitle: {{TITLE}}\nauthors: {{AUTHORS}}\ndate: {{DATE}}\narxiv:\ntags: [machine-learning, deep-learning]\n---\n\n# {{TITLE}}\n\n**{{AUTHORS}}**\n\n*{{DATE}}*\n\n---\n\n## Abstract\n\n{{ABSTRACT}}\n\n---\n\n## 1. Introduction\n\nProvide background and motivation for your research. Explain:\n- What problem are you addressing?\n- Why is it important?\n- What is novel about your approach?\n\n### 1.1 Motivation\n\nDescribe the real-world context and importance of the problem.\n\n### 1.2 Contributions\n\nList the main contributions of your work:\n1. First contribution\n2. Second contribution\n3. Third contribution\n\n---\n\n## 2. Related Work\n\nSurvey previous research relevant to your work. Organize by:\n- Different approaches to the problem\n- Complementary methods\n- Alternative solutions\n\n### 2.1 Previous Approaches\n\nDiscuss earlier methods and their limitations.\n\n### 2.2 Recent Advances\n\nHighlight recent developments in the field.\n\n---\n\n## 3. Background\n\nProvide necessary technical background for understanding your work.\n\n### 3.1 Problem Formulation\n\nFormally define the problem you're solving.\n\n### 3.2 Preliminaries\n\nIntroduce key concepts, notation, and terminology.\n\n---\n\n## 4. Methodology\n\nDescribe your approach in detail.\n\n### 4.1 Overview\n\nProvide a high-level description of your method.\n\n### 4.2 Model Architecture\n\nDetail the technical components of your system.\n\n### 4.3 Training Procedure\n\nExplain how the model is trained.\n\n### 4.4 Implementation Details\n\nProvide reproducibility information:\n- Hyperparameters\n- Hardware requirements\n- Software dependencies\n\n---\n\n## 5. 
Experiments\n\nPresent your experimental setup and results.\n\n### 5.1 Datasets\n\nDescribe the datasets used for evaluation.\n\n### 5.2 Evaluation Metrics\n\nDefine the metrics used to assess performance.\n\n### 5.3 Baselines\n\nList comparison methods.\n\n### 5.4 Experimental Setup\n\nDetail the experimental configuration.\n\n---\n\n## 6. Results\n\nPresent and analyze your findings.\n\n### 6.1 Main Results\n\nReport primary experimental results.\n\n| Model | Dataset | Metric | Score |\n|-------|---------|--------|-------|\n| Baseline | Dataset A | Accuracy | 0.85 |\n| Ours | Dataset A | Accuracy | 0.92 |\n\n### 6.2 Ablation Studies\n\nAnalyze the contribution of different components.\n\n### 6.3 Qualitative Analysis\n\nProvide examples and case studies.\n\n---\n\n## 7. Discussion\n\nInterpret your results and discuss implications.\n\n### 7.1 Analysis\n\nWhat do the results tell us?\n\n### 7.2 Limitations\n\nAcknowledge limitations of your approach.\n\n### 7.3 Broader Impact\n\nDiscuss societal implications and potential applications.\n\n---\n\n## 8. Conclusion\n\nSummarize your work and contributions.\n\n### 8.1 Summary\n\nRecap the main findings.\n\n### 8.2 Future Work\n\nSuggest directions for future research.\n\n---\n\n## Acknowledgments\n\nThank collaborators, funding sources, and computational resources.\n\n---\n\n## References\n\n1. Author A, et al. \"Paper Title.\" Conference/Journal, Year.\n2. Author B, et al. \"Another Paper.\" Conference/Journal, Year.\n\n---\n\n## Appendix\n\n### A. Additional Experiments\n\nSupplementary experimental results.\n\n### B. Implementation Details\n\nCode snippets and configuration details.\n\n### C. Hyperparameters\n\nComplete list of hyperparameters used.\n"
  },
  {
    "path": "skills/hugging-face-tool-builder/SKILL.md",
    "content": "---\nname: hugging-face-tool-builder\ndescription: Use this skill when the user wants to build tool/scripts or achieve a task where using data from the Hugging Face API would help. This is especially useful when chaining or combining API calls or the task will be repeated/automated. This Skill creates a reusable script to fetch, enrich or process data.\n---\n\n# Hugging Face API Tool Builder\n\nYour purpose is now is to create reusable command line scripts and utilities for using the Hugging Face API, allowing chaining, piping and intermediate processing where helpful. You can access the API directly, as well as use the `hf` command line tool. Model and Dataset cards can be accessed from repositories directly.\n\n## Script Rules\n\nMake sure to follow these rules:\n - Scripts must take a `--help` command line argument to describe their inputs and outputs\n - Non-destructive scripts should be tested before handing over to the User\n - Shell scripts are preferred, but use Python or TSX if complexity or user need requires it.\n - IMPORTANT: Use the `HF_TOKEN` environment variable as an Authorization header. For example: `curl -H \"Authorization: Bearer ${HF_TOKEN}\" https://huggingface.co/api/`. 
This provides higher rate limits and appropriate authorization for data access.\n - Investigate the shape of the API results before committing to a final design; make use of piping and chaining where composability would be an advantage, but prefer simple solutions where possible.\n - Share usage examples once complete.\n\nConfirm the User's preferences wherever questions arise or clarification is needed.\n\n## Sample Scripts\n\nPaths below are relative to this skill directory.\n\nReference examples:\n- `references/hf_model_papers_auth.sh` — uses `HF_TOKEN` automatically and chains trending → model metadata → model card parsing with fallbacks; it demonstrates multi-step API usage plus auth hygiene for gated/private content.\n- `references/find_models_by_paper.sh` — optional `HF_TOKEN` usage via `--token`, consistent authenticated search, and a retry path when arXiv-prefixed searches are too narrow; it shows resilient query strategy and clear user-facing help.\n- `references/hf_model_card_frontmatter.sh` — uses the `hf` CLI to download model cards, extracts YAML frontmatter, and emits NDJSON summaries (license, pipeline tag, tags, gated prompt flag) for easy filtering.\n\nBaseline examples (ultra-simple, minimal logic, raw JSON output with `HF_TOKEN` header):\n- `references/baseline_hf_api.sh` — bash\n- `references/baseline_hf_api.py` — python\n- `references/baseline_hf_api.tsx` — typescript executable\n\nComposable utility (stdin → NDJSON):\n- `references/hf_enrich_models.sh` — reads model IDs from stdin, fetches metadata per ID, emits one JSON object per line for streaming pipelines.\n\nComposability through piping (shell-friendly JSON output):\n- `references/baseline_hf_api.sh 25 | jq -r '.[].id' | references/hf_enrich_models.sh | jq -s 'sort_by(.downloads) | reverse | .[:10]'`\n- `references/baseline_hf_api.sh 50 | jq '[.[] | {id, downloads}] | sort_by(.downloads) | reverse | .[:10]'`\n- `printf '%s\\n' openai/gpt-oss-120b meta-llama/Meta-Llama-3.1-8B | 
references/hf_model_card_frontmatter.sh | jq -s 'map({id, license, has_extra_gated_prompt})'`\n\n## High Level Endpoints\n\nThe following are the main API endpoints available at `https://huggingface.co`:\n\n```\n/api/datasets\n/api/models\n/api/spaces\n/api/collections\n/api/daily_papers\n/api/notifications\n/api/settings\n/api/whoami-v2\n/api/trending\n/oauth/userinfo\n```\n\n## Accessing the API\n\nThe API is documented with the OpenAPI standard at `https://huggingface.co/.well-known/openapi.json`.\n\n**IMPORTANT:** DO NOT ATTEMPT to read `https://huggingface.co/.well-known/openapi.json` directly, as it is too large to process.\n\n**IMPORTANT:** Use `jq` to query and extract the relevant parts. For example:\n\nCommand to get all 160 endpoints:\n\n```bash\ncurl -s \"https://huggingface.co/.well-known/openapi.json\" | jq '.paths | keys | sort'\n```\n\nModel search endpoint details:\n\n```bash\ncurl -s \"https://huggingface.co/.well-known/openapi.json\" | jq '.paths[\"/api/models\"]'\n```\n\nYou can also query endpoints to see the shape of the data. When doing so, constrain results to a small number so they stay easy to process yet representative.\n\n## Using the HF command line tool\n\nThe `hf` command line tool gives you further access to Hugging Face repository content and infrastructure. 
\n\n```bash\n❯ hf --help\nUsage: hf [OPTIONS] COMMAND [ARGS]...\n\n  Hugging Face Hub CLI\n\nOptions:\n  --help                Show this message and exit.\n\nCommands:\n  auth                 Manage authentication (login, logout, etc.).\n  cache                Manage local cache directory.\n  download             Download files from the Hub.\n  endpoints            Manage Hugging Face Inference Endpoints.\n  env                  Print information about the environment.\n  jobs                 Run and manage Jobs on the Hub.\n  repo                 Manage repos on the Hub.\n  repo-files           Manage files in a repo on the Hub.\n  upload               Upload a file or a folder to the Hub.\n  upload-large-folder  Upload a large folder to the Hub.\n  version              Print information about the hf version.\n```\n\nThe `hf` command has replaced the now-deprecated `huggingface-cli` command that shipped with the `huggingface_hub` package.\n"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/baseline_hf_api.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nUltra-simple Hugging Face API example (Python).\n\nFetches a small list of models from the HF API and prints raw JSON.\nUses HF_TOKEN for auth if the environment variable is set.\n\"\"\"\n\nfrom __future__ import annotations\n\nimport os\nimport sys\nimport urllib.request\n\n\ndef show_help() -> None:\n    print(\n        \"\"\"Ultra-simple Hugging Face API example (Python)\n\nUsage:\n  baseline_hf_api.py [limit]\n  baseline_hf_api.py --help\n\nDescription:\n  Fetches a small list of models from the HF API and prints raw JSON.\n  Uses HF_TOKEN for auth if the environment variable is set.\n\nExamples:\n  baseline_hf_api.py\n  baseline_hf_api.py 5\n  HF_TOKEN=your_token baseline_hf_api.py 10\n\"\"\"\n    )\n\n\ndef main() -> int:\n    if len(sys.argv) > 1 and sys.argv[1] == \"--help\":\n        show_help()\n        return 0\n\n    limit = sys.argv[1] if len(sys.argv) > 1 else \"3\"\n    if not limit.isdigit():\n        print(\"Error: limit must be a number\", file=sys.stderr)\n        return 1\n\n    token = os.getenv(\"HF_TOKEN\")\n    headers = {\"Authorization\": f\"Bearer {token}\"} if token else {}\n    url = f\"https://huggingface.co/api/models?limit={limit}\"\n\n    req = urllib.request.Request(url, headers=headers)\n    with urllib.request.urlopen(req) as resp:\n        sys.stdout.write(resp.read().decode(\"utf-8\"))\n    return 0\n\n\nif __name__ == \"__main__\":\n    raise SystemExit(main())\n"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/baseline_hf_api.sh",
    "content": "#!/usr/bin/env bash\n\nset -euo pipefail\n\nshow_help() {\n    cat << EOF\nUltra-simple Hugging Face API example (Shell)\n\nUsage:\n  $0 [limit]\n  $0 --help\n\nDescription:\n  Fetches a small list of models from the HF API and prints raw JSON.\n  Uses HF_TOKEN for auth if the environment variable is set.\n\nExamples:\n  $0\n  $0 5\n  HF_TOKEN=your_token $0 10\nEOF\n}\n\nif [[ \"${1:-}\" == \"--help\" ]]; then\n    show_help\n    exit 0\nfi\n\nLIMIT=\"${1:-3}\"\nif ! [[ \"$LIMIT\" =~ ^[0-9]+$ ]]; then\n    echo \"Error: limit must be a number\" >&2\n    exit 1\nfi\n\nheaders=()\nif [[ -n \"${HF_TOKEN:-}\" ]]; then\n    headers=(-H \"Authorization: Bearer ${HF_TOKEN}\")\nfi\n\ncurl -s \"${headers[@]}\" \"https://huggingface.co/api/models?limit=${LIMIT}\"\n"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/baseline_hf_api.tsx",
    "content": "#!/usr/bin/env tsx\n\n/**\n * Ultra-simple Hugging Face API example (TSX).\n *\n * Fetches a small list of models from the HF API and prints raw JSON.\n * Uses HF_TOKEN for auth if the environment variable is set.\n */\n\nconst showHelp = () => {\n  console.log(`Ultra-simple Hugging Face API example (TSX)\n\nUsage:\n  baseline_hf_api.tsx [limit]\n  baseline_hf_api.tsx --help\n\nDescription:\n  Fetches a small list of models from the HF API and prints raw JSON.\n  Uses HF_TOKEN for auth if the environment variable is set.\n\nExamples:\n  baseline_hf_api.tsx\n  baseline_hf_api.tsx 5\n  HF_TOKEN=your_token baseline_hf_api.tsx 10\n`);\n};\n\nconst arg = process.argv[2];\nif (arg === \"--help\") {\n  showHelp();\n  process.exit(0);\n}\n\nconst limit = arg ?? \"3\";\nif (!/^\\d+$/.test(limit)) {\n  console.error(\"Error: limit must be a number\");\n  process.exit(1);\n}\n\nconst token = process.env.HF_TOKEN;\nconst headers: Record<string, string> = token\n  ? { Authorization: `Bearer ${token}` }\n  : {};\n\nconst url = `https://huggingface.co/api/models?limit=${limit}`;\n\n(async () => {\n  const res = await fetch(url, { headers });\n\n  if (!res.ok) {\n    console.error(`Error: ${res.status} ${res.statusText}`);\n    process.exit(1);\n  }\n\n  const text = await res.text();\n  process.stdout.write(text);\n})();\n"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/find_models_by_paper.sh",
    "content": "#!/bin/bash\n\n# Find models associated with papers on Hugging Face\n# Usage: ./find_models_by_paper.sh [arXiv_id|search_term]\n# Optional: Set HF_TOKEN environment variable for private/gated models\n\nset -e\n\n# Colors for output\nRED='\\033[0;31m'\nGREEN='\\033[0;32m'\nYELLOW='\\033[1;33m'\nBLUE='\\033[0;34m'\nNC='\\033[0m' # No Color\n\n# Help function\nshow_help() {\n    echo -e \"${BLUE}Find models associated with papers on Hugging Face${NC}\"\n    echo \"\"\n    echo -e \"${YELLOW}Usage:${NC}\"\n    echo \"  $0 [OPTIONS] [search_term|arXiv_id]\"\n    echo \"\"\n    echo -e \"${YELLOW}Options:${NC}\"\n    echo \"  --help    Show this help message\"\n    echo \"  --token   Use HF_TOKEN environment variable (if set)\"\n    echo \"\"\n    echo -e \"${YELLOW}Environment:${NC}\"\n    echo \"  HF_TOKEN  Optional: Hugging Face token for private/gated models\"\n    echo \"\"\n    echo -e \"${YELLOW}Examples:${NC}\"\n    echo \"  $0 1910.01108                    # Search by arXiv ID\"\n    echo \"  $0 distilbert                     # Search by model name\"\n    echo \"  $0 transformer                    # Search by keyword\"\n    echo \"  HF_TOKEN=your_token $0 1910.01108  # Use authentication\"\n    echo \"\"\n    echo -e \"${YELLOW}Description:${NC}\"\n    echo \"This script finds Hugging Face models that are associated with research papers.\"\n    echo \"It searches for models that have arXiv IDs in their tags or mentions papers in their metadata.\"\n    echo \"\"\n    echo -e \"${YELLOW}Notes:${NC}\"\n    echo \"• HF_TOKEN is optional for public models\"\n    echo \"• Use HF_TOKEN for private repositories or gated models\"\n    echo \"• HF_TOKEN enables higher rate limits for heavy usage\"\n}\n\n# Parse arguments\nUSE_TOKEN=false\nPOSITIONAL_ARGS=()\n\nwhile [[ $# -gt 0 ]]; do\n    case $1 in\n        --help)\n            show_help\n            exit 0\n            ;;\n        --token)\n            USE_TOKEN=true\n            shift\n            ;;\n 
       -*)\n            echo -e \"${RED}Unknown option: $1${NC}\"\n            show_help\n            exit 1\n            ;;\n        *)\n            POSITIONAL_ARGS+=(\"$1\")\n            shift\n            ;;\n    esac\ndone\n\nset -- \"${POSITIONAL_ARGS[@]}\"\n\nif [[ $# -eq 0 ]]; then\n    echo -e \"${RED}Error: Please provide a search term or arXiv ID${NC}\"\n    echo -e \"Use ${YELLOW}$0 --help${NC} for usage information\"\n    exit 1\nfi\n\nSEARCH_TERM=\"$1\"\n\n# Set up authentication header if HF_TOKEN is available.\n# The token is used whenever it is set; --token is accepted but does not change this.\nif [[ -n \"${HF_TOKEN:-}\" ]]; then\n    AUTH_HEADER=\"-H \\\"Authorization: Bearer $HF_TOKEN\\\"\"\n    echo -e \"${BLUE}Using HF_TOKEN for authentication${NC}\"\nelse\n    AUTH_HEADER=\"\"\nfi\n\n# Check if the input looks like an arXiv ID (format: YYMM.NNNN or YYMM.NNNNN;\n# the pattern below tolerates 4 to 7 digits after the dot)\nif [[ \"$SEARCH_TERM\" =~ ^[0-9]{4}\\.[0-9]{4,7}$ ]]; then\n    echo -e \"${BLUE}Searching for models associated with arXiv paper: $SEARCH_TERM${NC}\"\n    SEARCH_QUERY=\"arxiv%3A$SEARCH_TERM\"\n    IS_ARXIV_SEARCH=true\nelse\n    echo -e \"${BLUE}Searching for models related to: $SEARCH_TERM${NC}\"\n    # URL-encode free-text terms so spaces and punctuation do not break the query string\n    SEARCH_QUERY=$(jq -rn --arg q \"$SEARCH_TERM\" '$q | @uri')\n    IS_ARXIV_SEARCH=false\nfi\n\n# Function to extract arXiv IDs from tags\nextract_arxiv_ids() {\n    local tags=\"$1\"\n    echo \"$tags\" | jq -r '.[] | select(. 
| startswith(\"arxiv:\")) | split(\":\")[1]' 2>/dev/null || true\n}\n\n# Function to get paper title from arXiv ID\nget_paper_title() {\n    local arxiv_id=\"$1\"\n    # Try to get paper title from Hugging Face tags if available\n    # This is a simplified approach - in practice, you might want to call arXiv API\n    echo \"Paper Title (arXiv:$arxiv_id)\"\n}\n\n# Search for models\nAPI_URL=\"https://huggingface.co/api/models\"\necho -e \"${YELLOW}Searching Hugging Face API...${NC}\"\n\n# Build curl command with authentication if available\nCURL_CMD=\"curl -s $AUTH_HEADER \\\"$API_URL?search=$SEARCH_QUERY&limit=50\\\"\"\necho -e \"${BLUE}API Query: $API_URL?search=$SEARCH_QUERY&limit=50${NC}\"\n\n# Execute the API call\nif [[ -n \"$HF_TOKEN\" ]]; then\n    RESPONSE=$(curl -s -H \"Authorization: Bearer $HF_TOKEN\" \"$API_URL?search=$SEARCH_QUERY&limit=50\" || true)\nelse\n    RESPONSE=$(curl -s \"$API_URL?search=$SEARCH_QUERY&limit=50\" || true)\nfi\n\n# Check if we got a valid response\nif [[ -z \"$RESPONSE\" ]] || [[ \"$RESPONSE\" == \"[]\" ]]; then\n    echo -e \"${RED}No models found for search term: $SEARCH_TERM${NC}\"\n    \n    # If arXiv search failed, try without arxiv: prefix\n    if [[ \"$IS_ARXIV_SEARCH\" == true ]]; then\n        echo -e \"${YELLOW}Trying broader search without arxiv: prefix...${NC}\"\n        SEARCH_QUERY=\"$SEARCH_TERM\"\n        IS_ARXIV_SEARCH=false\n        \n        if [[ -n \"$HF_TOKEN\" ]]; then\n            RESPONSE=$(curl -s -H \"Authorization: Bearer $HF_TOKEN\" \"$API_URL?search=$SEARCH_QUERY&limit=50\" || true)\n        else\n            RESPONSE=$(curl -s \"$API_URL?search=$SEARCH_QUERY&limit=50\" || true)\n        fi\n        \n        if [[ -z \"$RESPONSE\" ]] || [[ \"$RESPONSE\" == \"[]\" ]]; then\n            echo -e \"${RED}Still no results found. Try a different search term.${NC}\"\n            exit 1\n        fi\n    else\n        exit 1\n    fi\nfi\n\n# Process the results\necho -e \"${GREEN}Found models! 
Processing results...${NC}\"\n\n# Use jq to process the JSON response and keep only models that carry an arxiv: tag\nMODELS_WITH_PAPERS=$(echo \"$RESPONSE\" | jq -r '\n  .[] |\n  select(.id != null) |\n  select((.tags // []) | any(startswith(\"arxiv:\"))) |\n  {\n    id: .id,\n    arxiv_tags: [(.tags // [])[] | select(startswith(\"arxiv:\"))] | join(\"; \"),\n    downloads: (.downloads // 0),\n    likes: (.likes // 0),\n    task: (.pipeline_tag // \"unknown\"),\n    library: (.library_name // \"unknown\")\n  }\n  | @base64' 2>/dev/null || true)\n\n# Count total results (grep -c . counts non-empty lines, so an empty result yields 0)\nTOTAL_MODELS=$(echo \"$RESPONSE\" | jq 'length' 2>/dev/null || echo \"0\")\nMODELS_WITH_PAPERS_COUNT=$(printf '%s\\n' \"$MODELS_WITH_PAPERS\" | grep -c . || true)\n\necho -e \"${BLUE}Results Summary:${NC}\"\necho -e \"  Total models found: $TOTAL_MODELS\"\necho -e \"  Models with paper associations: $MODELS_WITH_PAPERS_COUNT\"\necho \"\"\n\nif [[ -z \"$MODELS_WITH_PAPERS\" ]]; then\n    # Show all models even if no paper associations found\n    echo -e \"${YELLOW}No explicit paper associations found. Showing all matching models:${NC}\"\n    echo \"$RESPONSE\" | jq -r '\n      .[] |\n      select(.id != null) |\n      \"📦 \\(.id)\n   Task: \\(.pipeline_tag // \"unknown\")\n   Downloads: \\(.downloads // 0)\n   Likes: \\(.likes // 0)\n   Library: \\(.library_name // \"unknown\")\n   ---\"\n    ' 2>/dev/null || echo \"Failed to parse response\"\nelse\n    # Show models with paper associations\n    echo -e \"${GREEN}Models with paper associations:${NC}\"\n    echo \"$MODELS_WITH_PAPERS\" | while read -r model_data; do\n        if [[ -n \"$model_data\" ]]; then\n            # Decode base64 and show formatted\n            echo \"$model_data\" | base64 -d | jq -r '\n                \"📄 \\(.id)\n   arXiv: \\(.arxiv_tags)\n   Task: \\(.task)\n   Downloads: \\(.downloads)\n   Likes: \\(.likes)\n   Library: \\(.library)\n   ---\"\n            ' 2>/dev/null || echo \"Failed to parse model data\"\n        fi\n    done\nfi\n\n# Additional search tips\necho \"\"\necho -e \"${BLUE}Search 
Tips:${NC}\"\necho \"• Try searching with the full arXiv ID (e.g., 1910.01108)\"\necho \"• Try searching with the paper title keywords\"\necho \"• Try searching with the model name\"\necho \"• Use HF_TOKEN for private models or higher rate limits\"\necho \"\"\necho -e \"${BLUE}Examples to try:${NC}\"\necho \"  $0 1910.01108                    # DistilBERT paper\"\necho \"  $0 1810.04805                    # BERT paper\" \necho \"  $0 1706.03762                    # Attention is All You Need paper\"\necho \"  $0 roberta                       # RoBERTa models\"\necho \"  $0 transformer                   # Transformer models\"\necho \"  HF_TOKEN=your_token $0 1910.01108  # Use authentication\"\n"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/hf_enrich_models.sh",
    "content": "#!/usr/bin/env bash\n\nset -euo pipefail\n\nshow_help() {\n    cat << 'USAGE'\nStream model IDs on stdin, emit one JSON object per line (NDJSON).\n\nUsage:\n  hf_enrich_models.sh [MODEL_ID ...]\n  cat ids.txt | hf_enrich_models.sh\n  baseline_hf_api.sh 50 | jq -r '.[].id' | hf_enrich_models.sh\n\nDescription:\n  Reads newline-separated model IDs and fetches basic metadata for each.\n  Outputs NDJSON with id, downloads, likes, pipeline_tag, tags.\n  Uses HF_TOKEN for auth if the environment variable is set.\n\nExamples:\n  hf_enrich_models.sh gpt2 distilbert-base-uncased\n  baseline_hf_api.sh 50 | jq -r '.[].id' | hf_enrich_models.sh | jq -s 'sort_by(.downloads)'\n  HF_TOKEN=your_token hf_enrich_models.sh microsoft/DialoGPT-medium\nUSAGE\n}\n\nif [[ \"${1:-}\" == \"--help\" ]]; then\n    show_help\n    exit 0\nfi\n\nif ! command -v jq >/dev/null 2>&1; then\n    echo \"Error: jq is required but not installed\" >&2\n    exit 1\nfi\n\nheaders=()\nif [[ -n \"${HF_TOKEN:-}\" ]]; then\n    headers=(-H \"Authorization: Bearer ${HF_TOKEN}\")\nfi\n\nemit_error() {\n    local model_id=\"$1\"\n    local message=\"$2\"\n    jq -cn --arg id \"$model_id\" --arg error \"$message\" '{id: $id, error: $error}'\n}\n\nprocess_id() {\n    local model_id=\"$1\"\n\n    if [[ -z \"$model_id\" ]]; then\n        return 0\n    fi\n\n    local url=\"https://huggingface.co/api/models/${model_id}\"\n    local response\n    response=$(curl -s \"${headers[@]}\" \"$url\" 2>/dev/null || true)\n\n    if [[ -z \"$response\" ]]; then\n        emit_error \"$model_id\" \"request_failed\"\n        return 0\n    fi\n\n    if ! jq -e . 
>/dev/null 2>&1 <<<\"$response\"; then\n        emit_error \"$model_id\" \"invalid_json\"\n        return 0\n    fi\n\n    if jq -e '.error' >/dev/null 2>&1 <<<\"$response\"; then\n        emit_error \"$model_id\" \"not_found\"\n        return 0\n    fi\n\n    jq -c --arg id \"$model_id\" '{\n        id: (.id // $id),\n        downloads: (.downloads // 0),\n        likes: (.likes // 0),\n        pipeline_tag: (.pipeline_tag // \"unknown\"),\n        tags: (.tags // [])\n    }' <<<\"$response\" 2>/dev/null || emit_error \"$model_id\" \"parse_failed\"\n}\n\nif [[ $# -gt 0 ]]; then\n    for model_id in \"$@\"; do\n        process_id \"$model_id\"\n    done\n    exit 0\nfi\n\nif [[ -t 0 ]]; then\n    show_help\n    exit 1\nfi\n\nwhile IFS= read -r model_id; do\n    process_id \"$model_id\"\ndone\n"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/hf_model_card_frontmatter.sh",
    "content": "#!/usr/bin/env bash\n\nset -euo pipefail\n\nshow_help() {\n    cat << 'USAGE'\nFetch Hugging Face model cards via the hf CLI and summarize frontmatter.\n\nUsage:\n  hf_model_card_frontmatter.sh [MODEL_ID ...]\n  cat ids.txt | hf_model_card_frontmatter.sh\n\nDescription:\n  Downloads README.md for each model via `hf download`, extracts YAML\n  frontmatter, and emits one JSON object per line (NDJSON) with key fields.\n  Uses HF_TOKEN if set (passed to the hf CLI).\n\nOutput fields:\n  id, license, pipeline_tag, library_name, tags, language,\n  new_version, has_extra_gated_prompt\n\nExamples:\n  hf_model_card_frontmatter.sh openai/gpt-oss-120b\n  cat ids.txt | hf_model_card_frontmatter.sh | jq -s '.'\n  hf_model_card_frontmatter.sh meta-llama/Meta-Llama-3-8B \\\n    | jq -s 'map({id, license, has_extra_gated_prompt})'\nUSAGE\n}\n\nif [[ \"${1:-}\" == \"--help\" ]]; then\n    show_help\n    exit 0\nfi\n\nif ! command -v hf >/dev/null 2>&1; then\n    echo \"Error: hf CLI is required but not installed\" >&2\n    exit 1\nfi\n\nif ! 
command -v python3 >/dev/null 2>&1; then\n    echo \"Error: python3 is required but not installed\" >&2\n    exit 1\nfi\n\ntoken_args=()\nif [[ -n \"${HF_TOKEN:-}\" ]]; then\n    token_args=(--token \"$HF_TOKEN\")\nfi\n\ntmp_dir=$(mktemp -d)\ncleanup() {\n    rm -rf \"$tmp_dir\"\n}\ntrap cleanup EXIT\n\nemit_error() {\n    local model_id=\"$1\"\n    local message=\"$2\"\n    python3 - << 'PY' \"$model_id\" \"$message\"\nimport json\nimport sys\n\nmodel_id = sys.argv[1]\nmessage = sys.argv[2]\nprint(json.dumps({\"id\": model_id, \"error\": message}))\nPY\n}\n\nparse_readme() {\n    local model_id=\"$1\"\n    local readme_path=\"$2\"\n\n    MODEL_ID=\"$model_id\" README_PATH=\"$readme_path\" python3 - << 'PY'\nimport json\nimport os\nimport sys\n\nmodel_id = os.environ.get(\"MODEL_ID\", \"\")\nreadme_path = os.environ.get(\"README_PATH\", \"\")\n\ntry:\n    with open(readme_path, \"r\", encoding=\"utf-8\") as f:\n        lines = f.read().splitlines()\nexcept OSError:\n    print(json.dumps({\"id\": model_id, \"error\": \"readme_missing\"}))\n    sys.exit(0)\n\nfrontmatter = []\nin_block = False\nfor line in lines:\n    if line.strip() == \"---\":\n        if in_block:\n            break\n        in_block = True\n        continue\n    if in_block:\n        frontmatter.append(line)\n\nif not frontmatter:\n    print(json.dumps({\"id\": model_id, \"error\": \"frontmatter_missing\"}))\n    sys.exit(0)\n\nkey = None\nout = {}\n\nfor line in frontmatter:\n    stripped = line.strip()\n    if not stripped or line.lstrip().startswith(\"#\"):\n        continue\n\n    if \":\" in line and not line.lstrip().startswith(\"- \"):\n        key_candidate, value = line.split(\":\", 1)\n        key_candidate = key_candidate.strip()\n        value = value.strip()\n        if key_candidate and all(c.isalnum() or c in \"_-\" for c in key_candidate):\n            key = key_candidate\n            if value in (\"|\", \"|-\", \">\", \">-\") or value == \"\":\n                out[key] = None\n   
             continue\n            if value.startswith(\"[\") and value.endswith(\"]\"):\n                items = [v.strip() for v in value.strip(\"[]\").split(\",\") if v.strip()]\n                out[key] = items\n            else:\n                out[key] = value\n            continue\n\n    if line.lstrip().startswith(\"- \") and key:\n        item = line.strip()[2:]\n        if key not in out or out[key] is None:\n            out[key] = []\n        if isinstance(out[key], list):\n            out[key].append(item)\n\nresult = {\n    \"id\": model_id,\n    \"license\": out.get(\"license\"),\n    \"pipeline_tag\": out.get(\"pipeline_tag\"),\n    \"library_name\": out.get(\"library_name\"),\n    \"tags\": out.get(\"tags\", []),\n    \"language\": out.get(\"language\", []),\n    \"new_version\": out.get(\"new_version\"),\n    \"has_extra_gated_prompt\": \"extra_gated_prompt\" in out,\n}\n\nprint(json.dumps(result))\nPY\n}\n\nprocess_id() {\n    local model_id=\"$1\"\n\n    if [[ -z \"$model_id\" ]]; then\n        return 0\n    fi\n\n    local safe_id\n    safe_id=$(printf '%s' \"$model_id\" | tr '/' '_')\n    local local_dir=\"$tmp_dir/$safe_id\"\n\n    if ! hf download \"$model_id\" README.md --repo-type model --local-dir \"$local_dir\" \"${token_args[@]}\" >/dev/null 2>&1; then\n        emit_error \"$model_id\" \"download_failed\"\n        return 0\n    fi\n\n    local readme_path=\"$local_dir/README.md\"\n    if [[ ! -f \"$readme_path\" ]]; then\n        emit_error \"$model_id\" \"readme_missing\"\n        return 0\n    fi\n\n    parse_readme \"$model_id\" \"$readme_path\"\n}\n\nif [[ $# -gt 0 ]]; then\n    for model_id in \"$@\"; do\n        process_id \"$model_id\"\n    done\n    exit 0\nfi\n\nif [[ -t 0 ]]; then\n    show_help\n    exit 1\nfi\n\nwhile IFS= read -r model_id; do\n    process_id \"$model_id\"\ndone\n"
  },
  {
    "path": "skills/hugging-face-tool-builder/references/hf_model_papers_auth.sh",
    "content": "#!/usr/bin/env bash\n\n# Hugging Face Model Papers Tool with Authentication\n# Fetches papers referenced by Hugging Face models using HF_TOKEN if available\n\nset -euo pipefail\n\n# Help function\nshow_help() {\n    cat << EOF\nHugging Face Model Papers Tool with Authentication\n\nThis tool fetches papers referenced by Hugging Face models.\nSupports authentication via HF_TOKEN environment variable.\n\nUsage:\n    $0 [OPTIONS]\n\nOptions:\n    MODEL_ID            Specific model to analyze (e.g., microsoft/DialoGPT-medium)\n    --trending [N]      Show papers for top N trending models (default: 5)\n    --help              Show this help message\n\nEnvironment Variables:\n    HF_TOKEN            Hugging Face API token (optional, for private models)\n\nExamples:\n    # Get papers for a specific model\n    $0 microsoft/DialoGPT-medium\n\n    # Get papers with authentication\n    HF_TOKEN=your_token_here $0 your-private-model\n\n    # Get papers for top 3 trending models\n    $0 --trending 3\n\nEOF\n}\n\n# Function to make authenticated API calls\nhf_api_call() {\n    local url=\"$1\"\n    local headers=()\n    \n    # Add authentication header if HF_TOKEN is set\n    if [[ -n \"${HF_TOKEN:-}\" ]]; then\n        headers+=(-H \"Authorization: Bearer $HF_TOKEN\")\n    fi\n    \n    curl -s \"${headers[@]}\" \"$url\" 2>/dev/null || echo '{\"error\": \"Network error\"}'\n}\n\n# Function to extract papers from text\nextract_papers() {\n    local text=\"$1\"\n    local title=\"$2\"\n    \n    echo \"$title\"\n    \n    # Find ArXiv URLs\n    local arxiv_urls=$(echo \"$text\" | grep -oE 'https?://arxiv\\.org/[^[:space:]\\])]+' | head -5)\n    if [[ -n \"$arxiv_urls\" ]]; then\n        echo \"ArXiv Papers:\"\n        echo \"$arxiv_urls\" | sed 's/^/  • /'\n    fi\n    \n    # Find DOI URLs\n    local doi_urls=$(echo \"$text\" | grep -oE 'https?://doi\\.org/[^[:space:]\\])]+' | head -3)\n    if [[ -n \"$doi_urls\" ]]; then\n        echo \"DOI Papers:\"\n        
echo \"$doi_urls\" | sed 's/^/  • /'\n    fi\n    \n    # Find arxiv IDs in format YYYY.NNNNN\n    local arxiv_ids=$(echo \"$text\" | grep -oE 'arXiv:[0-9]{4}\\.[0-9]{4,5}' | head -5)\n    if [[ -n \"$arxiv_ids\" ]]; then\n        echo \"ArXiv IDs:\"\n        echo \"$arxiv_ids\" | sed 's/^/  • /'\n    fi\n    \n    # Check for paper mentions\n    if echo \"$text\" | grep -qi \"paper\\|publication\\|citation\"; then\n        local paper_mentions=$(echo \"$text\" | grep -i -A1 -B1 \"paper\\|publication\" | head -6)\n        if [[ -n \"$paper_mentions\" ]]; then\n            echo \"Paper mentions:\"\n            echo \"$paper_mentions\" | sed 's/^/  /'\n        fi\n    fi\n    \n    if [[ -z \"$arxiv_urls\" && -z \"$doi_urls\" && -z \"$arxiv_ids\" ]]; then\n        echo \"No papers found in model card\"\n    fi\n}\n\n# Function to get model papers\nget_model_papers() {\n    local model_id=\"$1\"\n    \n    echo \"=== $model_id ===\"\n    \n    # Get model info from API with authentication\n    local api_url=\"https://huggingface.co/api/models/$model_id\"\n    local response=$(hf_api_call \"$api_url\")\n    \n    if echo \"$response\" | grep -q '\"error\"'; then\n        echo \"Error: Could not fetch model '$model_id'\"\n        if [[ -z \"${HF_TOKEN:-}\" ]]; then\n            echo \"Note: This might be a private model. 
Try setting HF_TOKEN environment variable.\"\n        fi\n        return 1\n    fi\n    \n    # Parse basic info\n    local downloads=$(echo \"$response\" | jq -r '.downloads // 0')\n    local likes=$(echo \"$response\" | jq -r '.likes // 0')\n    echo \"Downloads: $downloads | Likes: $likes\"\n    \n    # Get model card (send HF_TOKEN when set so gated/private cards can be read;\n    # -f makes curl fail on HTTP errors instead of returning the error page)\n    local card_url=\"https://huggingface.co/$model_id/raw/main/README.md\"\n    local card_content\n    if [[ -n \"${HF_TOKEN:-}\" ]]; then\n        card_content=$(curl -sf -H \"Authorization: Bearer $HF_TOKEN\" \"$card_url\" 2>/dev/null || echo \"\")\n    else\n        card_content=$(curl -sf \"$card_url\" 2>/dev/null || echo \"\")\n    fi\n    \n    if [[ -n \"$card_content\" ]]; then\n        extract_papers \"$card_content\" \"Papers from model card:\"\n    else\n        echo \"Could not fetch model card\"\n    fi\n    \n    # Check tags for arxiv references\n    local arxiv_tag=$(echo \"$response\" | jq -r '.tags[]' 2>/dev/null | grep arxiv || true)\n    if [[ -n \"$arxiv_tag\" ]]; then\n        echo \"ArXiv from tags: $arxiv_tag\"\n    fi\n    \n    echo\n}\n\n# Function to get trending models\nget_trending_models() {\n    local limit=\"${1:-5}\"\n    \n    echo \"Fetching top $limit trending models...\"\n    \n    local trending_url=\"https://huggingface.co/api/trending?type=model&limit=$limit\"\n    local response=$(hf_api_call \"$trending_url\")\n    \n    echo \"$response\" | jq -r '.recentlyTrending[] | .repoData.id' | head -\"$limit\" | while read -r model_id; do\n        if [[ -n \"$model_id\" ]]; then\n            # Keep going when one model fails; otherwise set -e would abort the loop\n            get_model_papers \"$model_id\" || true\n        fi\n    done\n}\n\n# Main\nif [[ $# -eq 0 ]]; then\n    echo \"Error: No arguments provided\"\n    show_help\n    exit 1\nfi\n\nif [[ \"$1\" == \"--help\" ]]; then\n    show_help\n    exit 0\nelif [[ \"$1\" == \"--trending\" ]]; then\n    if [[ -n \"${2:-}\" ]] && [[ \"$2\" =~ ^[0-9]+$ ]]; then\n        get_trending_models \"$2\"\n    else\n        get_trending_models 5\n    fi\nelse\n    get_model_papers \"$1\"\nfi\n"
  },
  {
    "path": "skills/hugging-face-trackio/.claude-plugin/plugin.json",
    "content": "{\n  \"name\": \"trackio-cli\",\n  \"version\": \"1.0.0\",\n  \"description\": \"Query Trackio projects, runs, and metrics using the `trackio` CLI. Use when the user needs to list projects/runs/metrics, get summaries, or retrieve metric values from local Trackio databases. Supports both human-readable and JSON output formats. Covers project discovery, run inspection, metric querying, and system metrics. Designed for LLM agents and automation scripts.\",\n  \"author\": {\n    \"name\": \"Hugging Face\"\n  },\n  \"repository\": \"https://github.com/huggingface/skills\",\n  \"license\": \"Apache-2.0\",\n  \"keywords\": [\n    \"trackio\",\n    \"experiment-tracking\",\n    \"metrics\",\n    \"mlops\",\n    \"cli\",\n    \"monitoring\"\n  ]\n}\n\n"
  },
  {
    "path": "skills/hugging-face-trackio/SKILL.md",
    "content": "---\nname: hugging-face-trackio\ndescription: Track and visualize ML training experiments with Trackio. Use when logging metrics during training (Python API), firing alerts for training diagnostics, or retrieving/analyzing logged metrics (CLI). Supports real-time dashboard visualization, alerts with webhooks, HF Space syncing, and JSON output for automation.\n---\n\n# Trackio - Experiment Tracking for ML Training\n\nTrackio is an experiment tracking library for logging and visualizing ML training metrics. It syncs to Hugging Face Spaces for real-time monitoring dashboards.\n\n## Three Interfaces\n\n| Task | Interface | Reference |\n|------|-----------|-----------|\n| **Logging metrics** during training | Python API | [references/logging_metrics.md](references/logging_metrics.md) |\n| **Firing alerts** for training diagnostics | Python API | [references/alerts.md](references/alerts.md) |\n| **Retrieving metrics & alerts** after/during training | CLI | [references/retrieving_metrics.md](references/retrieving_metrics.md) |\n\n## When to Use Each\n\n### Python API → Logging\n\nUse `import trackio` in your training scripts to log metrics:\n\n- Initialize tracking with `trackio.init()`\n- Log metrics with `trackio.log()` or use TRL's `report_to=\"trackio\"`\n- Finalize with `trackio.finish()`\n\n**Key concept**: For remote/cloud training, pass `space_id` — metrics sync to a Space dashboard so they persist after the instance terminates.\n\n→ See [references/logging_metrics.md](references/logging_metrics.md) for setup, TRL integration, and configuration options.\n\n### Python API → Alerts\n\nInsert `trackio.alert()` calls in training code to flag important events — like inserting print statements for debugging, but structured and queryable:\n\n- `trackio.alert(title=\"...\", level=trackio.AlertLevel.WARN)` — fire an alert\n- Three severity levels: `INFO`, `WARN`, `ERROR`\n- Alerts are printed to terminal, stored in the database, shown in the dashboard, and 
optionally sent to webhooks (Slack/Discord)\n\n**Key concept for LLM agents**: Alerts are the primary mechanism for autonomous experiment iteration. An agent should insert alerts into training code for diagnostic conditions (loss spikes, NaN gradients, low accuracy, training stalls). Since alerts are printed to the terminal, an agent that is watching the training script's output will see them automatically. For background or detached runs, the agent can poll via CLI instead.\n\n→ See [references/alerts.md](references/alerts.md) for the full alerts API, webhook setup, and autonomous agent workflows.\n\n### CLI → Retrieving\n\nUse the `trackio` command to query logged metrics and alerts:\n\n- `trackio list projects/runs/metrics` — discover what's available\n- `trackio get project/run/metric` — retrieve summaries and values\n- `trackio list alerts --project <name> --json` — retrieve alerts\n- `trackio show` — launch the dashboard\n- `trackio sync` — sync to HF Space\n\n**Key concept**: Add `--json` for programmatic output suitable for automation and LLM agents.\n\n→ See [references/retrieving_metrics.md](references/retrieving_metrics.md) for all commands, workflows, and JSON output formats.\n\n## Minimal Logging Setup\n\n```python\nimport trackio\n\ntrackio.init(project=\"my-project\", space_id=\"username/trackio\")\ntrackio.log({\"loss\": 0.1, \"accuracy\": 0.9})\ntrackio.log({\"loss\": 0.09, \"accuracy\": 0.91})\ntrackio.finish()\n```\n\n### Minimal Retrieval\n\n```bash\ntrackio list projects --json\ntrackio get metric --project my-project --run my-run --metric loss --json\n```\n\n## Autonomous ML Experiment Workflow\n\nWhen running experiments autonomously as an LLM agent, the recommended workflow is:\n\n1. **Set up training with alerts** — insert `trackio.alert()` calls for diagnostic conditions\n2. **Launch training** — run the script in the background\n3. 
**Poll for alerts** — use `trackio list alerts --project <name> --json --since <timestamp>` to check for new alerts\n4. **Read metrics** — use `trackio get metric ...` to inspect specific values\n5. **Iterate** — based on alerts and metrics, stop the run, adjust hyperparameters, and launch a new run\n\n```python\nimport trackio\n\ntrackio.init(project=\"my-project\", config={\"lr\": 1e-4})\n\nfor step in range(num_steps):\n    loss = train_step()\n    trackio.log({\"loss\": loss, \"step\": step})\n\n    if step > 100 and loss > 5.0:\n        trackio.alert(\n            title=\"Loss divergence\",\n            text=f\"Loss {loss:.4f} still high after {step} steps\",\n            level=trackio.AlertLevel.ERROR,\n        )\n    if step > 0 and abs(loss) < 1e-8:\n        trackio.alert(\n            title=\"Vanishing loss\",\n            text=\"Loss near zero — possible gradient collapse\",\n            level=trackio.AlertLevel.WARN,\n        )\n\ntrackio.finish()\n```\n\nThen poll from a separate terminal/process:\n\n```bash\ntrackio list alerts --project my-project --json --since \"2025-01-01T00:00:00\"\n```\n"
  },
  {
    "path": "skills/hugging-face-trackio/references/alerts.md",
    "content": "# Trackio Alerts\n\nAlerts let you flag important training events directly from code. They are the primary mechanism for LLM agents to diagnose runs and iterate autonomously on ML experiments.\n\nAlerts are printed to the terminal, stored in the database, displayed in the dashboard, and optionally sent to webhooks (Slack/Discord).\n\n## Core API\n\n### trackio.alert()\n\n```python\ntrackio.alert(\n    title=\"Loss divergence\",                    # Short title (required)\n    text=\"Loss 5.2 still high after 200 steps\", # Detailed description (optional)\n    level=trackio.AlertLevel.WARN,               # INFO, WARN, or ERROR (default: WARN)\n    webhook_url=\"https://hooks.slack.com/...\",   # Per-alert webhook override (optional)\n)\n```\n\n### Alert Levels\n\n| Level | Usage |\n|-------|-------|\n| `trackio.AlertLevel.INFO` | Informational milestones (checkpoints saved, eval completed) |\n| `trackio.AlertLevel.WARN` | Potential issues (loss plateau, low accuracy, high gradient norm) |\n| `trackio.AlertLevel.ERROR` | Critical failures (NaN loss, divergence, OOM) |\n\n### Webhook Support\n\nSet a global webhook URL via `trackio.init()` or the `TRACKIO_WEBHOOK_URL` environment variable. 
Alerts are auto-formatted for Slack and Discord URLs.\n\n```python\ntrackio.init(\n    project=\"my-project\",\n    webhook_url=\"https://hooks.slack.com/services/...\",\n    webhook_min_level=trackio.AlertLevel.WARN,  # Only send WARN+ to webhook\n)\n```\n\nPer-alert override:\n\n```python\ntrackio.alert(\n    title=\"Critical failure\",\n    level=trackio.AlertLevel.ERROR,\n    webhook_url=\"https://hooks.slack.com/services/...\",  # Overrides global URL\n)\n```\n\nEnvironment variables:\n- `TRACKIO_WEBHOOK_URL` — global webhook URL\n- `TRACKIO_WEBHOOK_MIN_LEVEL` — minimum level for webhook delivery (`info`, `warn`, `error`)\n\n## Retrieving Alerts (CLI)\n\n```bash\n# List all alerts for a project\ntrackio list alerts --project my-project --json\n\n# Filter by run or level\ntrackio list alerts --project my-project --run my-run --level error --json\n\n# Poll for new alerts since a timestamp (efficient for agents)\ntrackio list alerts --project my-project --json --since \"2025-06-01T12:00:00\"\n```\n\n### JSON Output Structure\n\n```json\n{\n  \"project\": \"my-project\",\n  \"run\": null,\n  \"level\": null,\n  \"since\": \"2025-06-01T12:00:00\",\n  \"alerts\": [\n    {\n      \"run\": \"run-name\",\n      \"title\": \"Loss divergence\",\n      \"text\": \"Loss 5.2 still high after 200 steps\",\n      \"level\": \"warn\",\n      \"step\": 200,\n      \"timestamp\": \"2025-06-01T12:05:30\"\n    }\n  ]\n}\n```\n\n## Autonomous Agent Workflow\n\nThe recommended pattern for an LLM agent running ML experiments:\n\n### 1. 
Insert Alerts Into Training Code\n\nAdd diagnostic `trackio.alert()` calls for conditions the agent should react to:\n\n```python\nimport math\n\nimport trackio\n\ntrackio.init(project=\"hyperparam-sweep\", config={\"lr\": lr, \"batch_size\": bs})\n\nloss_history = []  # kept so the stall check can compare against the loss 100 steps ago\n\nfor step in range(num_steps):\n    loss = train_step()\n    loss_history.append(loss)\n    trackio.log({\"loss\": loss, \"step\": step})\n\n    if step > 200 and loss > 5.0:\n        trackio.alert(\n            title=\"Loss divergence\",\n            text=f\"Loss {loss:.4f} still above 5.0 after {step} steps — learning rate may be too high\",\n            level=trackio.AlertLevel.ERROR,\n        )\n\n    # Absolute change in loss over the last 100 steps\n    loss_delta = abs(loss - loss_history[-101]) if step >= 100 else float(\"inf\")\n    if step > 500 and loss_delta < 0.001:\n        trackio.alert(\n            title=\"Training stall\",\n            text=f\"Loss barely changed over last 100 steps (delta={loss_delta:.6f})\",\n            level=trackio.AlertLevel.WARN,\n        )\n\n    if math.isnan(loss):\n        trackio.alert(\n            title=\"NaN loss\",\n            text=\"Loss became NaN — training is broken\",\n            level=trackio.AlertLevel.ERROR,\n        )\n        break\n\ntrackio.finish()\n```\n\n### 2. Monitor Alerts\n\nAlerts are automatically printed to the terminal when fired. If the agent is watching the training script's output (e.g. running in the foreground or tailing logs), it will see alerts immediately — no polling needed.\n\nFor background or detached runs, poll for alerts via CLI:\n\n```bash\n# Poll for alerts (run periodically)\ntrackio list alerts --project hyperparam-sweep --json --since \"2025-06-01T00:00:00\"\n```\n\n### 3. 
Inspect Metrics Around the Alert\n\nWhen an alert fires, use `trackio get snapshot` to see all metrics at that point:\n\n```bash\n# Alert fired at step 200 — get all metrics in a ±5 step window\ntrackio get snapshot --project hyperparam-sweep --run run-1 --around 200 --window 5 --json\n\n# Or inspect a single metric in a ±10 step window around the alert's step\ntrackio get metric --project hyperparam-sweep --run run-1 --metric loss --around 200 --window 10 --json\n```\n\n### 4. React and Iterate\n\nBased on alerts:\n- **ERROR alerts** → stop the run, adjust hyperparameters, relaunch\n- **WARN alerts** → inspect metrics with `trackio get snapshot ...`, decide whether to intervene\n- **INFO alerts** → note progress, continue monitoring\n\n### 5. Compare Across Runs\n\n```bash\n# Check metrics from previous runs\ntrackio get run --project hyperparam-sweep --run run-1 --json\ntrackio get metric --project hyperparam-sweep --run run-1 --metric loss --json\n\n# Launch new run with adjusted config\npython train.py --lr 5e-5\n```\n\n## Using Alerts with Transformers / TRL\n\nWhen using `report_to=\"trackio\"`, you don't control the training loop directly. Use a `TrainerCallback` to fire alerts:\n\n```python\nimport trackio\nfrom transformers import TrainerCallback\nfrom trl import SFTConfig, SFTTrainer\n\nclass AlertCallback(TrainerCallback):\n    def on_log(self, args, state, control, logs=None, **kwargs):\n        # Only fire alerts when Trackio reporting is enabled for this run\n        if \"trackio\" not in args.report_to:\n            return\n        if logs and \"loss\" in logs:\n            if logs[\"loss\"] > 5.0 and state.global_step > 100:\n                trackio.alert(\n                    title=\"High loss\",\n                    text=f\"Loss {logs['loss']:.4f} at step {state.global_step}\",\n                    level=trackio.AlertLevel.ERROR,\n                )\n\ntrainer = SFTTrainer(\n    model=model,\n    args=SFTConfig(output_dir=\"./out\", report_to=\"trackio\"),\n    callbacks=[AlertCallback()],\n    ...\n)\n```\n"
  },
  {
    "path": "skills/hugging-face-trackio/references/logging_metrics.md",
    "content": "# Logging Metrics with Trackio\n\n**Trackio** is a lightweight, free experiment tracking library from Hugging Face. It provides a wandb-compatible API for logging metrics with local-first design.\n\n- **GitHub**: [gradio-app/trackio](https://github.com/gradio-app/trackio)\n- **Docs**: [huggingface.co/docs/trackio](https://huggingface.co/docs/trackio/index)\n\n## Installation\n\n```bash\npip install trackio\n# or\nuv pip install trackio\n```\n\n## Core API\n\n### Basic Usage\n\n```python\nimport trackio\n\n# Initialize a run\ntrackio.init(\n    project=\"my-project\",\n    config={\"learning_rate\": 0.001, \"epochs\": 10}\n)\n\n# Log metrics during training\nfor epoch in range(10):\n    loss = train_epoch()\n    trackio.log({\"loss\": loss, \"epoch\": epoch})\n\n# Finalize the run\ntrackio.finish()\n```\n\n### Key Functions\n\n| Function | Purpose |\n|----------|---------|\n| `trackio.init(...)` | Start a new tracking run |\n| `trackio.log(dict)` | Log metrics (called repeatedly during training) |\n| `trackio.finish()` | Finalize run and ensure all metrics are saved |\n| `trackio.show()` | Launch the local dashboard |\n| `trackio.sync(...)` | Sync local project to HF Space |\n\n## trackio.init() Parameters\n\n```python\ntrackio.init(\n    project=\"my-project\",           # Project name (groups runs together)\n    name=\"run-name\",                # Optional: name for this specific run\n    config={...},                   # Hyperparameters and config to log\n    space_id=\"username/trackio\",    # Optional: sync to HF Space for remote dashboard\n    group=\"experiment-group\",       # Optional: group related runs\n)\n```\n\n## Local vs Remote Dashboard\n\n### Local (Default)\n\nBy default, trackio stores metrics in a local SQLite database and runs the dashboard locally:\n\n```python\ntrackio.init(project=\"my-project\")\n# ... 
training ...\ntrackio.finish()\n\n# Launch local dashboard\ntrackio.show()\n```\n\nOr from terminal:\n```bash\ntrackio show --project my-project\n```\n\n### Remote (HF Space)\n\nPass `space_id` to sync metrics to a Hugging Face Space for persistent, shareable dashboards:\n\n```python\ntrackio.init(\n    project=\"my-project\",\n    space_id=\"username/trackio\"  # Auto-creates Space if it doesn't exist\n)\n```\n\n⚠️ **For remote training** (cloud GPUs, HF Jobs, etc.): Always use `space_id` since local storage is lost when the instance terminates.\n\n### Sync Local to Remote\n\nSync existing local projects to a Space:\n\n```python\ntrackio.sync(project=\"my-project\", space_id=\"username/my-experiments\")\n```\n\n## wandb Compatibility\n\nTrackio is API-compatible with wandb. Drop-in replacement:\n\n```python\nimport trackio as wandb\n\nwandb.init(project=\"my-project\")\nwandb.log({\"loss\": 0.5})\nwandb.finish()\n```\n\n## TRL Integration\n\nWhen using TRL trainers, set `report_to=\"trackio\"` for automatic metric logging:\n\n```python\nfrom trl import SFTConfig, SFTTrainer\nimport trackio\n\ntrackio.init(\n    project=\"sft-training\",\n    space_id=\"username/trackio\",\n    config={\"model\": \"Qwen/Qwen2.5-0.5B\", \"dataset\": \"trl-lib/Capybara\"}\n)\n\nconfig = SFTConfig(\n    output_dir=\"./output\",\n    report_to=\"trackio\",  # Automatic metric logging\n    # ... 
other config\n)\n\ntrainer = SFTTrainer(model=model, args=config, ...)\ntrainer.train()\ntrackio.finish()\n```\n\n## What Gets Logged\n\nWith TRL/Transformers integration, trackio automatically captures:\n- Training loss\n- Learning rate\n- Eval metrics\n- Training throughput\n\nFor manual logging, log any numeric metrics:\n\n```python\ntrackio.log({\n    \"train_loss\": 0.5,\n    \"train_accuracy\": 0.85,\n    \"val_loss\": 0.4,\n    \"val_accuracy\": 0.88,\n    \"epoch\": 1\n})\n```\n\n## Grouping Runs\n\nUse `group` to organize related experiments in the dashboard sidebar:\n\n```python\n# Group by experiment type\ntrackio.init(project=\"my-project\", name=\"baseline-v1\", group=\"baseline\")\ntrackio.init(project=\"my-project\", name=\"augmented-v1\", group=\"augmented\")\n\n# Group by hyperparameter\ntrackio.init(project=\"hyperparam-sweep\", name=\"lr-0.001\", group=\"lr_0.001\")\ntrackio.init(project=\"hyperparam-sweep\", name=\"lr-0.01\", group=\"lr_0.01\")\n```\n\n## Configuration Best Practices\n\nKeep config minimal — only log what's useful for comparing runs:\n\n```python\ntrackio.init(\n    project=\"qwen-sft-capybara\",\n    name=\"baseline-lr2e5\",\n    config={\n        \"model\": \"Qwen/Qwen2.5-0.5B\",\n        \"dataset\": \"trl-lib/Capybara\",\n        \"learning_rate\": 2e-5,\n        \"num_epochs\": 3,\n        \"batch_size\": 8,\n    }\n)\n```\n\n## Embedding Dashboards\n\nEmbed Space dashboards in websites with query parameters:\n\n```html\n<iframe \n  src=\"https://username-trackio.hf.space/?project=my-project&metrics=train_loss,val_loss&sidebar=hidden\" \n  style=\"width:1600px; height:500px; border:0;\">\n</iframe>\n```\n\nQuery parameters:\n- `project`: Filter to specific project\n- `metrics`: Comma-separated metric names to show\n- `sidebar`: `hidden` or `collapsed`\n- `smoothing`: 0-20 (smoothing slider value)\n- `xmin`, `xmax`: X-axis limits\n"
  },
  {
    "path": "skills/hugging-face-trackio/references/retrieving_metrics.md",
    "content": "# Retrieving Metrics with Trackio CLI\n\nThe `trackio` CLI provides direct terminal access to query Trackio experiment tracking data locally without needing to start the MCP server.\n\n## Quick Command Reference\n\n| Task | Command |\n|------|---------|\n| List projects | `trackio list projects` |\n| List runs | `trackio list runs --project <name>` |\n| List metrics | `trackio list metrics --project <name> --run <name>` |\n| List system metrics | `trackio list system-metrics --project <name> --run <name>` |\n| List alerts | `trackio list alerts --project <name> [--run <name>] [--level <level>] [--since <timestamp>]` |\n| Get project summary | `trackio get project --project <name>` |\n| Get run summary | `trackio get run --project <name> --run <name>` |\n| Get metric values | `trackio get metric --project <name> --run <name> --metric <name>` |\n| Get metric at step | `trackio get metric ... --metric <name> --step <N>` |\n| Get metric around step | `trackio get metric ... --metric <name> --around <N> --window <W>` |\n| Get all metrics snapshot | `trackio get snapshot --project <name> --run <name> --step <N>` |\n| Get system metrics | `trackio get system-metric --project <name> --run <name>` |\n| Show dashboard | `trackio show [--project <name>]` |\n| Sync to Space | `trackio sync --project <name> --space-id <space_id>` |\n\n## Core Commands\n\n### List Commands\n\n```bash\ntrackio list projects                                    # List all projects\ntrackio list projects --json                            # JSON output\n\ntrackio list runs --project <name>                      # List runs in project\ntrackio list runs --project <name> --json               # JSON output\n\ntrackio list metrics --project <name> --run <name>      # List metrics for run\ntrackio list metrics --project <name> --run <name> --json\n\ntrackio list system-metrics --project <name> --run <name>  # List system metrics\ntrackio list system-metrics --project <name> --run <name> 
--json\n\ntrackio list alerts --project <name>                       # List alerts\ntrackio list alerts --project <name> --run <name> --json   # Filter by run\ntrackio list alerts --project <name> --level error --json  # Filter by level\ntrackio list alerts --project <name> --json --since <ts>   # Poll since timestamp\n```\n\n### Get Commands\n\n```bash\ntrackio get project --project <name>                    # Project summary\ntrackio get project --project <name> --json             # JSON output\n\ntrackio get run --project <name> --run <name>           # Run summary\ntrackio get run --project <name> --run <name> --json\n\ntrackio get metric --project <name> --run <name> --metric <name>  # Metric values\ntrackio get metric --project <name> --run <name> --metric <name> --json\ntrackio get metric ... --metric <name> --step 200                 # At exact step\ntrackio get metric ... --metric <name> --around 200 --window 10   # ±10 steps\ntrackio get metric ... --metric <name> --at-time <ts> --window 60 # ±60 seconds\n\ntrackio get snapshot --project <name> --run <name> --step 200 --json       # All metrics at step\ntrackio get snapshot --project <name> --run <name> --around 200 --window 5 --json  # Window\ntrackio get snapshot --project <name> --run <name> --at-time <ts> --window 60 --json\n\ntrackio get system-metric --project <name> --run <name>           # All system metrics\ntrackio get system-metric --project <name> --run <name> --metric <name>  # Specific metric\ntrackio get system-metric --project <name> --run <name> --json\n```\n\n### Dashboard Commands\n\n```bash\ntrackio show                                              # Launch dashboard\ntrackio show --project <name>                           # Load specific project\ntrackio show --theme <theme>                            # Custom theme\ntrackio show --mcp-server                                # Enable MCP server\ntrackio show --color-palette \"#FF0000,#00FF00\"         # Custom colors\n```\n\n### Sync 
Commands\n\n```bash\ntrackio sync --project <name> --space-id <space_id>     # Sync to HF Space\ntrackio sync --project <name> --space-id <space_id> --private  # Private space\ntrackio sync --project <name> --space-id <space_id> --force   # Overwrite\n```\n\n## Output Formats\n\nAll `list` and `get` commands support two output formats:\n\n- **Human-readable** (default): Formatted text for terminal viewing\n- **JSON** (with `--json` flag): Structured JSON for programmatic use\n\n## Common Patterns\n\n### Discover Projects and Runs\n\n```bash\n# List all available projects\ntrackio list projects\n\n# List runs in a project\ntrackio list runs --project my-project\n\n# Get project overview\ntrackio get project --project my-project --json\n```\n\n### Inspect Run Details\n\n```bash\n# Get run summary with all metrics\ntrackio get run --project my-project --run my-run --json\n\n# List available metrics\ntrackio list metrics --project my-project --run my-run\n\n# Get specific metric values\ntrackio get metric --project my-project --run my-run --metric loss --json\n```\n\n### Query System Metrics\n\n```bash\n# List system metrics (GPU, etc.)\ntrackio list system-metrics --project my-project --run my-run\n\n# Get all system metric data\ntrackio get system-metric --project my-project --run my-run --json\n\n# Get specific system metric\ntrackio get system-metric --project my-project --run my-run --metric gpu_utilization --json\n```\n\n### Automation Scripts\n\n```bash\n# Extract latest metric value\nLATEST_LOSS=$(trackio get metric --project my-project --run my-run --metric loss --json | jq -r '.values[-1].value')\n\n# Export run summary to file\ntrackio get run --project my-project --run my-run --json > run_summary.json\n\n# Filter runs with jq\ntrackio list runs --project my-project --json | jq '.runs[] | select(startswith(\"train\"))'\n```\n\n### LLM Agent Workflow\n\n```bash\n# 1. Discover available projects\ntrackio list projects --json\n\n# 2. 
Explore project structure\ntrackio get project --project my-project --json\n\n# 3. Inspect specific run\ntrackio get run --project my-project --run my-run --json\n\n# 4. Query metric values\ntrackio get metric --project my-project --run my-run --metric accuracy --json\n\n# 5. Poll for alerts (use --since for efficient incremental polling)\ntrackio list alerts --project my-project --json --since \"2025-06-01T00:00:00\"\n\n# 6. When an alert fires at step N, get all metrics around that point\ntrackio get snapshot --project my-project --run my-run --around 200 --window 5 --json\n```\n\n## Error Handling\n\nCommands validate inputs and return clear errors:\n\n- Missing project: `Error: Project '<name>' not found.`\n- Missing run: `Error: Run '<name>' not found in project '<project>'.`\n- Missing metric: `Error: Metric '<name>' not found in run '<run>' of project '<project>'.`\n\nAll errors exit with non-zero status code and write to stderr.\n\n## Key Options\n\n- `--project`: Project name (required for most commands)\n- `--run`: Run name (required for run-specific commands)\n- `--metric`: Metric name (required for metric-specific commands)\n- `--json`: Output in JSON format instead of human-readable\n- `--step`: Exact step filter (for `get metric`, `get snapshot`)\n- `--around`: Center step for window filter (for `get metric`, `get snapshot`)\n- `--at-time`: Center ISO timestamp for window filter (for `get metric`, `get snapshot`)\n- `--window`: Window size: ±steps for `--around`, ±seconds for `--at-time` (default: 10)\n- `--level`: Alert level filter (`info`, `warn`, `error`) (for `list alerts`)\n- `--since`: ISO timestamp to filter alerts after (for `list alerts`)\n- `--theme`: Dashboard theme (for `show` command)\n- `--mcp-server`: Enable MCP server mode (for `show` command)\n- `--color-palette`: Comma-separated hex colors (for `show` command)\n- `--private`: Create private Space (for `sync` command)\n- `--force`: Overwrite existing database (for `sync` 
command)\n\n## JSON Output Structure\n\n### List Projects\n```json\n{\"projects\": [\"project1\", \"project2\"]}\n```\n\n### List Runs\n```json\n{\"project\": \"my-project\", \"runs\": [\"run1\", \"run2\"]}\n```\n\n### Project Summary\n```json\n{\n  \"project\": \"my-project\",\n  \"num_runs\": 3,\n  \"runs\": [\"run1\", \"run2\", \"run3\"],\n  \"last_activity\": 100\n}\n```\n\n### Run Summary\n```json\n{\n  \"project\": \"my-project\",\n  \"run\": \"my-run\",\n  \"num_logs\": 50,\n  \"metrics\": [\"loss\", \"accuracy\"],\n  \"config\": {\"learning_rate\": 0.001},\n  \"last_step\": 49\n}\n```\n\n### Metric Values\n```json\n{\n  \"project\": \"my-project\",\n  \"run\": \"my-run\",\n  \"metric\": \"loss\",\n  \"values\": [\n    {\"step\": 0, \"timestamp\": \"2024-01-01T00:00:00\", \"value\": 0.5},\n    {\"step\": 1, \"timestamp\": \"2024-01-01T00:01:00\", \"value\": 0.4}\n  ]\n}\n```\n\n## References\n\n- **Complete CLI documentation**: See [docs/source/cli_commands.md](docs/source/cli_commands.md)\n- **API and MCP Server**: See [docs/source/api_mcp_server.md](docs/source/api_mcp_server.md)\n\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/SKILL.md",
    "content": "---\nname: hugging-face-vision-trainer\ndescription: Trains and fine-tunes vision models for object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3 — plus any Transformers classifier), and SAM/SAM2 segmentation using Hugging Face Transformers on Hugging Face Jobs cloud GPUs. Covers COCO-format dataset preparation, Albumentations augmentation, mAP/mAR evaluation, accuracy metrics, SAM segmentation with bbox/point prompts, DiceCE loss, hardware selection, cost estimation, Trackio monitoring, and Hub persistence. Use when users mention training object detection, image classification, SAM, SAM2, segmentation, image matting, DETR, D-FINE, RT-DETR, ViT, timm, MobileNet, ResNet, bounding box models, or fine-tuning vision models on Hugging Face Jobs.\n---\n\n# Vision Model Training on Hugging Face Jobs\n\nTrain object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. 
No local GPU setup required; results are automatically saved to the Hugging Face Hub.\n\n## When to Use This Skill\n\nUse this skill when users want to:\n- Fine-tune object detection models (D-FINE, RT-DETR v2, DETR, YOLOS) on cloud GPUs or locally\n- Fine-tune image classification models (timm: MobileNetV3, MobileViT, ResNet, ViT/DINOv3, or any Transformers classifier) on cloud GPUs or locally\n- Fine-tune SAM or SAM2 models for segmentation / image matting using bbox or point prompts\n- Train bounding-box detectors on custom datasets\n- Train image classifiers on custom datasets\n- Train segmentation models on custom mask datasets with prompts\n- Run vision training jobs on Hugging Face Jobs infrastructure\n- Ensure trained vision models are permanently saved to the Hub\n\n## Related Skills\n\n- **`hugging-face-jobs`** — General HF Jobs infrastructure: token authentication, hardware flavors, timeout management, cost estimation, secrets, environment variables, scheduled jobs, and result persistence. **Refer to the Jobs skill for any non-training-specific Jobs questions** (e.g., \"how do secrets work?\", \"what hardware is available?\", \"how do I pass tokens?\").\n- **`hugging-face-model-trainer`** — TRL-based language model training (SFT, DPO, GRPO). Use that skill for text/language model fine-tuning.\n\n## Local Script Execution\n\nHelper scripts use PEP 723 inline dependencies. 
Run them with `uv run`:\n```bash\nuv run scripts/dataset_inspector.py --dataset username/dataset-name --split train\nuv run scripts/estimate_cost.py --help\n```\n\n## Prerequisites Checklist\n\nBefore starting any training job, verify:\n\n### Account & Authentication\n- Hugging Face Account with [Pro](https://hf.co/pro), [Team](https://hf.co/enterprise), or [Enterprise](https://hf.co/enterprise) plan (Jobs require paid plan)\n- Authenticated login: Check with `hf_whoami()` (tool) or `hf auth whoami` (terminal)\n- Token has **write** permissions\n- **MUST pass token in job secrets** — see directive #3 below for syntax (MCP tool vs Python API)\n\n### Dataset Requirements — Object Detection\n- Dataset must exist on Hub\n- Annotations must use the `objects` column with `bbox`, `category` (and optionally `area`) sub-fields\n- Bboxes can be in **xywh (COCO)** or **xyxy (Pascal VOC)** format — auto-detected and converted\n- Categories can be **integers or strings** — strings are auto-remapped to integer IDs\n- `image_id` column is **optional** — generated automatically if missing\n- **ALWAYS validate unknown datasets** before GPU training (see Dataset Validation section)\n\n### Dataset Requirements — Image Classification\n- Dataset must exist on Hub\n- Must have an **`image` column** (PIL images) and a **`label` column** (integer class IDs or strings)\n- The label column can be `ClassLabel` type (with names) or plain integers/strings — strings are auto-remapped\n- Common column names auto-detected: `label`, `labels`, `class`, `fine_label`\n- **ALWAYS validate unknown datasets** before GPU training (see Dataset Validation section)\n\n### Dataset Requirements — SAM/SAM2 Segmentation\n- Dataset must exist on Hub\n- Must have an **`image` column** (PIL images) and a **`mask` column** (binary ground-truth segmentation mask)\n- Must have a **prompt** — either:\n  - A **`prompt` column** with JSON containing `{\"bbox\": [x0,y0,x1,y1]}` or `{\"point\": [x,y]}`\n  - OR a dedicated 
**`bbox`** column with `[x0,y0,x1,y1]` values\n  - OR a dedicated **`point`** column with `[x,y]` or `[[x,y],...]` values\n- Bboxes should be in **xyxy** format (absolute pixel coordinates)\n- Example dataset: `merve/MicroMat-mini` (image matting with bbox prompts)\n- **ALWAYS validate unknown datasets** before GPU training (see Dataset Validation section)\n\n### Critical Settings\n- **Timeout must exceed expected training time** — Default 30min is TOO SHORT. See directive #6 for recommended values.\n- **Hub push must be enabled** — `push_to_hub=True`, `hub_model_id=\"username/model-name\"`, token in `secrets`\n\n## Dataset Validation\n\n**Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.**\n\n**ALWAYS validate for** unknown/custom datasets or any dataset you haven't trained with before. **Skip for** `cppe-5` (the default in the training script).\n\n### Running the Inspector\n\n**Option 1: Via HF Jobs (recommended — avoids local SSL/dependency issues):**\n```python\nhf_jobs(\"uv\", {\n    \"script\": \"path/to/dataset_inspector.py\",\n    \"script_args\": [\"--dataset\", \"username/dataset-name\", \"--split\", \"train\"]\n})\n```\n\n**Option 2: Locally:**\n```bash\nuv run scripts/dataset_inspector.py --dataset username/dataset-name --split train\n```\n\n**Option 3: Via `HfApi().run_uv_job()` (if hf_jobs MCP unavailable):**\n```python\nfrom huggingface_hub import HfApi\napi = HfApi()\napi.run_uv_job(\n    script=\"scripts/dataset_inspector.py\",\n    script_args=[\"--dataset\", \"username/dataset-name\", \"--split\", \"train\"],\n    flavor=\"cpu-basic\",\n    timeout=300,\n)\n```\n\n### Reading Results\n\n- **`✓ READY`** — Dataset is compatible, use directly\n- **`✗ NEEDS FORMATTING`** — Needs preprocessing (mapping code provided in output)\n\n## Automatic Bbox Preprocessing\n\nThe object detection training script (`scripts/object_detection_training.py`) automatically handles bbox format detection 
(xyxy→xywh conversion), bbox sanitization, `image_id` generation, string category→integer remapping, and dataset truncation. **No manual preprocessing needed** — just ensure the dataset has `objects.bbox` and `objects.category` columns.\n\n## Training workflow\n\nCopy this checklist and track progress:\n\n```\nTraining Progress:\n- [ ] Step 1: Verify prerequisites (account, token, dataset)\n- [ ] Step 2: Validate dataset format (run dataset_inspector.py)\n- [ ] Step 3: Ask user about dataset size and validation split\n- [ ] Step 4: Prepare training script (OD: scripts/object_detection_training.py, IC: scripts/image_classification_training.py, SAM: scripts/sam_segmentation_training.py)\n- [ ] Step 5: Save script locally, submit job, and report details\n```\n\n**Step 1: Verify prerequisites**\n\nFollow the Prerequisites Checklist above.\n\n**Step 2: Validate dataset**\n\nRun the dataset inspector BEFORE spending GPU time. See \"Dataset Validation\" section above.\n\n**Step 3: Ask user preferences**\n\nALWAYS use the AskUserQuestion tool with option-style format:\n\n```python\nAskUserQuestion({\n    \"questions\": [\n        {\n            \"question\": \"Do you want to run a quick test with a subset of the data first?\",\n            \"header\": \"Dataset Size\",\n            \"options\": [\n                {\"label\": \"Quick test run (10% of data)\", \"description\": \"Faster, cheaper (~30-60 min, ~$2-5) to validate setup\"},\n                {\"label\": \"Full dataset (Recommended)\", \"description\": \"Complete training for best model quality\"}\n            ],\n            \"multiSelect\": false\n        },\n        {\n            \"question\": \"Do you want to create a validation split from the training data?\",\n            \"header\": \"Split data\",\n            \"options\": [\n                {\"label\": \"Yes (Recommended)\", \"description\": \"Automatically split 15% of training data for validation\"},\n                {\"label\": \"No\", \"description\": 
\"Use existing validation split from dataset\"}\n            ],\n            \"multiSelect\": false\n        },\n        {\n            \"question\": \"Which GPU hardware do you want to use?\",\n            \"header\": \"Hardware Flavor\",\n            \"options\": [\n                {\"label\": \"t4-small ($0.40/hr)\", \"description\": \"1x T4, 16 GB VRAM — sufficient for all OD models under 100M params\"},\n                {\"label\": \"l4x1 ($0.80/hr)\", \"description\": \"1x L4, 24 GB VRAM — more headroom for large images or batch sizes\"},\n                {\"label\": \"a10g-large ($1.50/hr)\", \"description\": \"1x A10G, 24 GB VRAM — faster training, more CPU/RAM\"},\n                {\"label\": \"a100-large ($2.50/hr)\", \"description\": \"1x A100, 80 GB VRAM — fastest, for very large datasets or image sizes\"}\n            ],\n            \"multiSelect\": false\n        }\n    ]\n})\n```\n\n**Step 4: Prepare training script**\n\nFor object detection, use [scripts/object_detection_training.py](scripts/object_detection_training.py) as the production-ready template. For image classification, use [scripts/image_classification_training.py](scripts/image_classification_training.py). For SAM/SAM2 segmentation, use [scripts/sam_segmentation_training.py](scripts/sam_segmentation_training.py). All scripts use `HfArgumentParser` — all configuration is passed via CLI arguments in `script_args`, NOT by editing Python variables. For timm model details, see [references/timm_trainer.md](references/timm_trainer.md). For SAM2 training details, see [references/finetune_sam2_trainer.md](references/finetune_sam2_trainer.md).\n\n**Step 5: Save script, submit job, and report**\n\n1. **Save the script locally** to `submitted_jobs/` in the workspace root (create if needed) with a descriptive name like `training_<dataset>_<YYYYMMDD_HHMMSS>.py`. Tell the user the path.\n2. **Submit** using `hf_jobs` MCP tool (preferred) or `HfApi().run_uv_job()` — see directive #1 for both methods. 
Pass all config via `script_args`.\n3. **Report** the job ID (from `.id` attribute), monitoring URL, Trackio dashboard (`https://huggingface.co/spaces/{username}/trackio`), expected time, and estimated cost.\n4. **Wait for user** to request status checks — don't poll automatically. Training jobs run asynchronously and can take hours.\n\n## Critical directives\n\nThese rules prevent common failures. Follow them exactly.\n\n### 1. Job submission: `hf_jobs` MCP tool vs Python API\n\n**`hf_jobs()` is an MCP tool, NOT a Python function.** Do NOT try to import it from `huggingface_hub`. Call it as a tool:\n\n```\nhf_jobs(\"uv\", {\"script\": training_script_content, \"flavor\": \"a10g-large\", \"timeout\": \"4h\", \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}})\n```\n\n**If `hf_jobs` MCP tool is unavailable**, use the Python API directly:\n\n```python\nfrom huggingface_hub import HfApi, get_token\napi = HfApi()\njob_info = api.run_uv_job(\n    script=\"path/to/training_script.py\",  # file PATH, NOT content\n    script_args=[\"--dataset_name\", \"cppe-5\", ...],\n    flavor=\"a10g-large\",\n    timeout=14400,  # seconds (4 hours)\n    env={\"PYTHONUNBUFFERED\": \"1\"},\n    secrets={\"HF_TOKEN\": get_token()},  # MUST use get_token(), NOT \"$HF_TOKEN\"\n)\nprint(f\"Job ID: {job_info.id}\")\n```\n\n**Critical differences between the two methods:**\n\n| | `hf_jobs` MCP tool | `HfApi().run_uv_job()` |\n|---|---|---|\n| `script` param | Python code string or URL (NOT local paths) | File path to `.py` file (NOT content) |\n| Token in secrets | `\"$HF_TOKEN\"` (auto-replaced) | `get_token()` (actual token value) |\n| Timeout format | String (`\"4h\"`) | Seconds (`14400`) |\n\n**Rules for both methods:**\n- The training script MUST include PEP 723 inline metadata with dependencies\n- Do NOT use `image` or `command` parameters (those belong to `run_job()`, not `run_uv_job()`)\n\n### 2. 
Authentication via job secrets + explicit hub_token injection\n\n**Job config** MUST include the token in secrets — syntax depends on submission method (see table above).\n\n**Training script requirement:** The Transformers `Trainer` calls `create_repo(token=self.args.hub_token)` during `__init__()` when `push_to_hub=True`. The training script MUST inject `HF_TOKEN` into `training_args.hub_token` AFTER parsing args but BEFORE creating the `Trainer`. The template `scripts/object_detection_training.py` already includes this:\n\n```python\nhf_token = os.environ.get(\"HF_TOKEN\")\nif training_args.push_to_hub and not training_args.hub_token:\n    if hf_token:\n        training_args.hub_token = hf_token\n```\n\nIf you write a custom script, you MUST include this token injection before the `Trainer(...)` call.\n\n- Do NOT call `login()` in custom scripts unless replicating the full pattern from `scripts/object_detection_training.py`\n- Do NOT rely on implicit token resolution (`hub_token=None`) — unreliable in Jobs\n- See the `hugging-face-jobs` skill → *Token Usage Guide* for full details\n\n### 3. JobInfo attribute\n\nAccess the job identifier using `.id` (NOT `.job_id` or `.name` — these don't exist):\n\n```python\njob_info = api.run_uv_job(...)  # or hf_jobs(\"uv\", {...})\njob_id = job_info.id  # Correct -- returns string like \"687fb701029421ae5549d998\"\n```\n\n### 4. Required training flags and HfArgumentParser boolean syntax\n\n`scripts/object_detection_training.py` uses `HfArgumentParser` — all config is passed via `script_args`. Boolean arguments have two syntaxes:\n\n- **`bool` fields** (e.g., `push_to_hub`, `do_train`): Use as bare flags (`--push_to_hub`) or negate with `--no_` prefix (`--no_remove_unused_columns`)\n- **`Optional[bool]` fields** (e.g., `greater_is_better`): MUST pass explicit value (`--greater_is_better True`). 
Bare `--greater_is_better` causes `error: expected one argument`\n\nRequired flags for object detection:\n\n```\n--no_remove_unused_columns          # MUST: preserves image column for pixel_values\n--no_eval_do_concat_batches         # MUST: images have different numbers of target boxes\n--push_to_hub                       # MUST: environment is ephemeral\n--hub_model_id username/model-name\n--metric_for_best_model eval_map\n--greater_is_better True            # MUST pass \"True\" explicitly (Optional[bool])\n--do_train\n--do_eval\n```\n\nRequired flags for image classification:\n\n```\n--no_remove_unused_columns          # MUST: preserves image column for pixel_values\n--push_to_hub                       # MUST: environment is ephemeral\n--hub_model_id username/model-name\n--metric_for_best_model eval_accuracy\n--greater_is_better True            # MUST pass \"True\" explicitly (Optional[bool])\n--do_train\n--do_eval\n```\n\nRequired flags for SAM/SAM2 segmentation:\n\n```\n--remove_unused_columns False       # MUST: preserves input_boxes/input_points\n--push_to_hub                       # MUST: environment is ephemeral\n--hub_model_id username/model-name\n--do_train\n--prompt_type bbox                  # or \"point\"\n--dataloader_pin_memory False       # MUST: avoids pin_memory issues with custom collator\n```\n\n### 5. Timeout management\n\nDefault 30 min is TOO SHORT for object detection. Set minimum 2-4 hours. Add 30% buffer for model loading, preprocessing, and Hub push.\n\n| Scenario | Timeout |\n|----------|---------|\n| Quick test (100-200 images, 5-10 epochs) | 1h |\n| Development (500-1K images, 15-20 epochs) | 2-3h |\n| Production (1K-5K images, 30 epochs) | 4-6h |\n| Large dataset (5K+ images) | 6-12h |\n\n### 6. Trackio monitoring\n\nTrackio is **always enabled** in the object detection training script — it calls `trackio.init()` and `trackio.finish()` automatically. No need to pass `--report_to trackio`. 
The project name is taken from `--output_dir` and the run name from `--run_name`. For image classification, pass `--report_to trackio` in `script_args`; it is parsed into `TrainingArguments`.\n\nDashboard at: `https://huggingface.co/spaces/{username}/trackio`\n\n## Model & hardware selection\n\n### Recommended object detection models\n\n| Model | Params | Use case |\n|-------|--------|----------|\n| `ustc-community/dfine-small-coco` | 10.4M | Best starting point — fast, cheap, SOTA quality |\n| `PekingU/rtdetr_v2_r18vd` | 20.2M | Lightweight real-time detector |\n| `ustc-community/dfine-large-coco` | 31.4M | Higher accuracy, still efficient |\n| `PekingU/rtdetr_v2_r50vd` | 43M | Strong real-time baseline |\n| `ustc-community/dfine-xlarge-obj365` | 63.5M | Best accuracy (pretrained on Objects365) |\n| `PekingU/rtdetr_v2_r101vd` | 76M | Largest RT-DETR v2 variant |\n\nStart with `ustc-community/dfine-small-coco` for fast iteration. Move to D-FINE Large or RT-DETR v2 R50 for better accuracy.\n\n### Recommended image classification models\n\nAll `timm/` models work out of the box via `AutoModelForImageClassification` (loaded as `TimmWrapperForImageClassification`). See [references/timm_trainer.md](references/timm_trainer.md) for details.\n\n| Model | Params | Use case |\n|-------|--------|----------|\n| `timm/mobilenetv3_small_100.lamb_in1k` | 2.5M | Ultra-lightweight — mobile/edge, fastest training |\n| `timm/mobilevit_s.cvnets_in1k` | 5.6M | Mobile transformer — good accuracy/speed trade-off |\n| `timm/resnet50.a1_in1k` | 25.6M | Strong CNN baseline — reliable, well-studied |\n| `timm/vit_base_patch16_dinov3.lvd1689m` | 86.6M | Best accuracy — DINOv3 self-supervised ViT |\n\nStart with `timm/mobilenetv3_small_100.lamb_in1k` for fast iteration. 
Move to `timm/resnet50.a1_in1k` or `timm/vit_base_patch16_dinov3.lvd1689m` for better accuracy.\n\n### Recommended SAM/SAM2 segmentation models\n\n| Model | Params | Use case |\n|-------|--------|----------|\n| `facebook/sam2.1-hiera-tiny` | 38.9M | Fastest SAM2 — good for quick experiments |\n| `facebook/sam2.1-hiera-small` | 46.0M | Best starting point — good quality/speed balance |\n| `facebook/sam2.1-hiera-base-plus` | 80.8M | Higher capacity for complex segmentation |\n| `facebook/sam2.1-hiera-large` | 224.4M | Best SAM2 accuracy — requires more VRAM |\n| `facebook/sam-vit-base` | 93.7M | Original SAM — ViT-B backbone |\n| `facebook/sam-vit-large` | 312.3M | Original SAM — ViT-L backbone |\n| `facebook/sam-vit-huge` | 641.1M | Original SAM — ViT-H, best SAM v1 accuracy |\n\nStart with `facebook/sam2.1-hiera-small` for fast iteration. SAM2 models are generally more efficient than SAM v1 at similar quality. Only the mask decoder is trained by default (vision and prompt encoders are frozen).\n\n### Hardware recommendation\n\nAll recommended OD and IC models are under 100M params — **`t4-small` (16 GB VRAM, $0.40/hr) is sufficient for all of them.** Image classification models are generally smaller and faster than object detection models — `t4-small` handles even ViT-Base comfortably. For SAM2 models up to `hiera-base-plus`, `t4-small` is sufficient since only the mask decoder is trained. For `sam2.1-hiera-large` or SAM v1 models, use `l4x1` or `a10g-large`. Only upgrade if you hit OOM from large batch sizes — reduce batch size first before switching hardware. Common upgrade path: `t4-small` → `l4x1` ($0.80/hr, 24 GB) → `a10g-large` ($1.50/hr, 24 GB).\n\nFor full hardware flavor list: refer to the `hugging-face-jobs` skill. For cost estimation: run `scripts/estimate_cost.py`.\n\n## Quick start — Object Detection\n\nThe `script_args` below are the same for both submission methods. 
See directive #1 for the critical differences between them.\n\n```python\nOD_SCRIPT_ARGS = [\n    \"--model_name_or_path\", \"ustc-community/dfine-small-coco\",\n    \"--dataset_name\", \"cppe-5\",\n    \"--image_square_size\", \"640\",\n    \"--output_dir\", \"dfine_finetuned\",\n    \"--num_train_epochs\", \"30\",\n    \"--per_device_train_batch_size\", \"8\",\n    \"--learning_rate\", \"5e-5\",\n    \"--eval_strategy\", \"epoch\",\n    \"--save_strategy\", \"epoch\",\n    \"--save_total_limit\", \"2\",\n    \"--load_best_model_at_end\",\n    \"--metric_for_best_model\", \"eval_map\",\n    \"--greater_is_better\", \"True\",\n    \"--no_remove_unused_columns\",\n    \"--no_eval_do_concat_batches\",\n    \"--push_to_hub\",\n    \"--hub_model_id\", \"username/model-name\",\n    \"--do_train\",\n    \"--do_eval\",\n]\n```\n\n```python\nfrom huggingface_hub import HfApi, get_token\napi = HfApi()\njob_info = api.run_uv_job(\n    script=\"scripts/object_detection_training.py\",\n    script_args=OD_SCRIPT_ARGS,\n    flavor=\"t4-small\",\n    timeout=14400,\n    env={\"PYTHONUNBUFFERED\": \"1\"},\n    secrets={\"HF_TOKEN\": get_token()},\n)\nprint(f\"Job ID: {job_info.id}\")\n```\n\n### Key OD `script_args`\n\n- `--model_name_or_path` — recommended: `\"ustc-community/dfine-small-coco\"` (see model table above)\n- `--dataset_name` — the Hub dataset ID\n- `--image_square_size` — 480 (fast iteration) or 800 (better accuracy)\n- `--hub_model_id` — `\"username/model-name\"` for Hub persistence\n- `--num_train_epochs` — 30 typical for convergence\n- `--train_val_split` — fraction to split for validation (default 0.15), set if dataset lacks a validation split\n- `--max_train_samples` — truncate training set (useful for quick test runs, e.g. 
`\"785\"` for ~10% of a 7.8K dataset)\n- `--max_eval_samples` — truncate evaluation set\n\n## Quick start — Image Classification\n\n```python\nIC_SCRIPT_ARGS = [\n    \"--model_name_or_path\", \"timm/mobilenetv3_small_100.lamb_in1k\",\n    \"--dataset_name\", \"ethz/food101\",\n    \"--output_dir\", \"food101_classifier\",\n    \"--num_train_epochs\", \"5\",\n    \"--per_device_train_batch_size\", \"32\",\n    \"--per_device_eval_batch_size\", \"32\",\n    \"--learning_rate\", \"5e-5\",\n    \"--eval_strategy\", \"epoch\",\n    \"--save_strategy\", \"epoch\",\n    \"--save_total_limit\", \"2\",\n    \"--load_best_model_at_end\",\n    \"--metric_for_best_model\", \"eval_accuracy\",\n    \"--greater_is_better\", \"True\",\n    \"--no_remove_unused_columns\",\n    \"--push_to_hub\",\n    \"--hub_model_id\", \"username/food101-classifier\",\n    \"--do_train\",\n    \"--do_eval\",\n]\n```\n\n```python\nfrom huggingface_hub import HfApi, get_token\napi = HfApi()\njob_info = api.run_uv_job(\n    script=\"scripts/image_classification_training.py\",\n    script_args=IC_SCRIPT_ARGS,\n    flavor=\"t4-small\",\n    timeout=7200,\n    env={\"PYTHONUNBUFFERED\": \"1\"},\n    secrets={\"HF_TOKEN\": get_token()},\n)\nprint(f\"Job ID: {job_info.id}\")\n```\n\n### Key IC `script_args`\n\n- `--model_name_or_path` — any `timm/` model or Transformers classification model (see model table above)\n- `--dataset_name` — the Hub dataset ID\n- `--image_column_name` — column containing PIL images (default: `\"image\"`)\n- `--label_column_name` — column containing class labels (default: `\"label\"`)\n- `--hub_model_id` — `\"username/model-name\"` for Hub persistence\n- `--num_train_epochs` — 3-5 typical for classification (fewer than OD)\n- `--per_device_train_batch_size` — 16-64 (classification models use less memory than OD)\n- `--train_val_split` — fraction to split for validation (default 0.15), set if dataset lacks a validation split\n- `--max_train_samples` / `--max_eval_samples` — 
truncate for quick tests\n\n## Quick start — SAM/SAM2 Segmentation\n\n```python\nSAM_SCRIPT_ARGS = [\n    \"--model_name_or_path\", \"facebook/sam2.1-hiera-small\",\n    \"--dataset_name\", \"merve/MicroMat-mini\",\n    \"--prompt_type\", \"bbox\",\n    \"--prompt_column_name\", \"prompt\",\n    \"--output_dir\", \"sam2-finetuned\",\n    \"--num_train_epochs\", \"30\",\n    \"--per_device_train_batch_size\", \"4\",\n    \"--learning_rate\", \"1e-5\",\n    \"--logging_steps\", \"1\",\n    \"--save_strategy\", \"epoch\",\n    \"--save_total_limit\", \"2\",\n    \"--remove_unused_columns\", \"False\",\n    \"--dataloader_pin_memory\", \"False\",\n    \"--push_to_hub\",\n    \"--hub_model_id\", \"username/sam2-finetuned\",\n    \"--do_train\",\n    \"--report_to\", \"trackio\",\n]\n```\n\n```python\nfrom huggingface_hub import HfApi, get_token\napi = HfApi()\njob_info = api.run_uv_job(\n    script=\"scripts/sam_segmentation_training.py\",\n    script_args=SAM_SCRIPT_ARGS,\n    flavor=\"t4-small\",\n    timeout=7200,\n    env={\"PYTHONUNBUFFERED\": \"1\"},\n    secrets={\"HF_TOKEN\": get_token()},\n)\nprint(f\"Job ID: {job_info.id}\")\n```\n\n### Key SAM `script_args`\n\n- `--model_name_or_path` — SAM or SAM2 model (see model table above); auto-detects SAM vs SAM2\n- `--dataset_name` — the Hub dataset ID (e.g., `\"merve/MicroMat-mini\"`)\n- `--prompt_type` — `\"bbox\"` or `\"point\"` — type of prompt in the dataset\n- `--prompt_column_name` — column with JSON-encoded prompts (default: `\"prompt\"`)\n- `--bbox_column_name` — dedicated bbox column (alternative to JSON prompt column)\n- `--point_column_name` — dedicated point column (alternative to JSON prompt column)\n- `--mask_column_name` — column with ground-truth masks (default: `\"mask\"`)\n- `--hub_model_id` — `\"username/model-name\"` for Hub persistence\n- `--num_train_epochs` — 20-30 typical for SAM fine-tuning\n- `--per_device_train_batch_size` — 2-4 (SAM models use significant memory)\n- 
`--freeze_vision_encoder` / `--freeze_prompt_encoder` — freeze encoder weights (default: both frozen, only mask decoder trains)\n- `--train_val_split` — fraction to split for validation (default 0.1)\n\n## Checking job status\n\n**MCP tool (if available):**\n```\nhf_jobs(\"ps\")                                   # List all jobs\nhf_jobs(\"logs\", {\"job_id\": \"your-job-id\"})      # View logs\nhf_jobs(\"inspect\", {\"job_id\": \"your-job-id\"})   # Job details\n```\n\n**Python API fallback:**\n```python\nfrom huggingface_hub import HfApi\napi = HfApi()\napi.list_jobs()                                  # List all jobs\nfor line in api.fetch_job_logs(job_id=\"your-job-id\"):  # Stream logs\n    print(line)\napi.inspect_job(job_id=\"your-job-id\")            # Job details\n```\n\n## Common failure modes\n\n### OOM (CUDA out of memory)\nReduce `--per_device_train_batch_size` (try 4, then 2), lower `--image_square_size` (object detection), or upgrade hardware.\n\n### Dataset format errors\nRun `scripts/dataset_inspector.py` first. The training script auto-detects xyxy vs xywh, converts string categories to integer IDs, and adds `image_id` if missing. Ensure `objects.bbox` contains 4-value coordinate lists in absolute pixels and `objects.category` contains either integer IDs or string labels.\n\n### Hub push failures (401)\nVerify: (1) job secrets include token (see directive #2), (2) script sets `training_args.hub_token` BEFORE creating the `Trainer`, (3) `push_to_hub=True` is set, (4) correct `hub_model_id`, (5) token has write permissions.\n\n### Job timeout\nIncrease timeout (see directive #5 table), reduce epochs/dataset, or use checkpoint strategy with `hub_strategy=\"every_save\"`.\n\n### KeyError: 'test' (missing test split)\nThe object detection training script handles this gracefully — it falls back to the `validation` split. 
Ensure you're using the latest `scripts/object_detection_training.py`.\n\n### Single-class dataset: \"iteration over a 0-d tensor\"\n`torchmetrics.MeanAveragePrecision` returns scalar (0-d) tensors for per-class metrics when there's only one class. The template `scripts/object_detection_training.py` handles this by calling `.unsqueeze(0)` on these tensors. Ensure you're using the latest template.\n\n### Poor detection performance (mAP < 0.15)\nIncrease epochs (30-50), ensure 500+ images, check per-class mAP for imbalanced classes, try different learning rates (1e-5 to 1e-4), increase image size.\n\nFor comprehensive troubleshooting: see [references/reliability_principles.md](references/reliability_principles.md)\n\n## Reference files\n\n- [scripts/object_detection_training.py](scripts/object_detection_training.py) — Production-ready object detection training script\n- [scripts/image_classification_training.py](scripts/image_classification_training.py) — Production-ready image classification training script (supports timm models)\n- [scripts/sam_segmentation_training.py](scripts/sam_segmentation_training.py) — Production-ready SAM/SAM2 segmentation training script (bbox & point prompts)\n- [scripts/dataset_inspector.py](scripts/dataset_inspector.py) — Validate dataset format for OD, classification, and SAM segmentation\n- [scripts/estimate_cost.py](scripts/estimate_cost.py) — Estimate training costs for any vision model (includes SAM/SAM2)\n- [references/object_detection_training_notebook.md](references/object_detection_training_notebook.md) — Object detection training workflow, augmentation strategies, and training patterns\n- [references/image_classification_training_notebook.md](references/image_classification_training_notebook.md) — Image classification training workflow with ViT, preprocessing, and evaluation\n- [references/finetune_sam2_trainer.md](references/finetune_sam2_trainer.md) — SAM2 fine-tuning walkthrough with MicroMat dataset, DiceCE loss, and 
Trainer integration\n- [references/timm_trainer.md](references/timm_trainer.md) — Using timm models with HF Trainer (TimmWrapper, transforms, full example)\n- [references/hub_saving.md](references/hub_saving.md) — Detailed Hub persistence guide and verification checklist\n- [references/reliability_principles.md](references/reliability_principles.md) — Failure prevention principles from production experience\n\n## External links\n\n- [Transformers Object Detection Guide](https://huggingface.co/docs/transformers/tasks/object_detection)\n- [Transformers Image Classification Guide](https://huggingface.co/docs/transformers/tasks/image_classification)\n- [DETR Model Documentation](https://huggingface.co/docs/transformers/model_doc/detr)\n- [ViT Model Documentation](https://huggingface.co/docs/transformers/model_doc/vit)\n- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs) — Main Jobs documentation\n- [HF Jobs Configuration](https://huggingface.co/docs/hub/en/jobs-configuration) — Hardware, secrets, timeouts, namespaces\n- [HF Jobs CLI Reference](https://huggingface.co/docs/huggingface_hub/guides/cli#hf-jobs) — Command line interface\n- [Object Detection Models](https://huggingface.co/models?pipeline_tag=object-detection)\n- [Image Classification Models](https://huggingface.co/models?pipeline_tag=image-classification)\n- [SAM2 Model Documentation](https://huggingface.co/docs/transformers/model_doc/sam2)\n- [SAM Model Documentation](https://huggingface.co/docs/transformers/model_doc/sam)\n- [Object Detection Datasets](https://huggingface.co/datasets?task_categories=task_categories:object-detection)\n- [Image Classification Datasets](https://huggingface.co/datasets?task_categories=task_categories:image-classification)\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/finetune_sam2_trainer.md",
    "content": "# Fine-tuning SAM2 with HF Trainer\n\nFine-tune SAM2.1 on a small part of the MicroMat dataset for image matting,\nusing the Hugging Face Trainer with a custom loss function.\n\n```python\n!pip install -q transformers datasets monai trackio\n```\n\n## Load and explore the dataset\n\n```python\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"merve/MicroMat-mini\", split=\"train\")\ndataset\n```\n\n```python\ndataset = dataset.train_test_split(test_size=0.1)\ntrain_ds = dataset[\"train\"]\nval_ds = dataset[\"test\"]\n```\n\n```python\nimport json\n\ntrain_ds[0]\n```\n\n```python\njson.loads(train_ds[\"prompt\"][0])[\"bbox\"]\n```\n\n## Visualize a sample\n\n```python\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n\ndef show_mask(mask, ax, bbox):\n    color = np.array([0.12, 0.56, 1.0, 0.6])\n    mask = np.array(mask)\n    h, w = mask.shape\n    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, 4)\n    ax.imshow(mask_image)\n    x0, y0, x1, y1 = bbox\n    ax.add_patch(\n        plt.Rectangle(\n            (x0, y0), x1 - x0, y1 - y0, fill=False, edgecolor=\"lime\", linewidth=2\n        )\n    )\n\n\nexample = train_ds[0]\nimage = np.array(example[\"image\"])\nground_truth_mask = np.array(example[\"mask\"])\n\nfig, ax = plt.subplots()\nax.imshow(image)\nshow_mask(ground_truth_mask, ax, json.loads(example[\"prompt\"])[\"bbox\"])\nax.set_title(\"Ground truth mask\")\nax.set_axis_off()\nplt.show()\n```\n\n## Build the dataset and collator\n\n`SAMDataset` wraps each sample into the format expected by the SAM2 processor.\nGround-truth masks are stored under the key `\"labels\"` so the Trainer\nautomatically pops them before calling `model.forward()`.\n\n```python\nfrom torch.utils.data import Dataset\nimport torch\nimport torch.nn.functional as F\n\n\nclass SAMDataset(Dataset):\n    def __init__(self, dataset, processor):\n        self.dataset = dataset\n        self.processor = processor\n\n    def __len__(self):\n        return 
len(self.dataset)\n\n    def __getitem__(self, idx):\n        item = self.dataset[idx]\n        image = item[\"image\"]\n        prompt = json.loads(item[\"prompt\"])[\"bbox\"]\n        inputs = self.processor(image, input_boxes=[[prompt]], return_tensors=\"pt\")\n        inputs[\"labels\"] = (np.array(item[\"mask\"]) > 0).astype(np.float32)\n        inputs[\"original_image_size\"] = torch.tensor(image.size[::-1])\n        return inputs\n\n\ndef collate_fn(batch):\n    pixel_values = torch.cat([item[\"pixel_values\"] for item in batch], dim=0)\n    original_sizes = torch.stack([item[\"original_sizes\"] for item in batch])\n    input_boxes = torch.cat([item[\"input_boxes\"] for item in batch], dim=0)\n    labels = torch.cat(\n        [\n            F.interpolate(\n                torch.as_tensor(x[\"labels\"]).unsqueeze(0).unsqueeze(0).float(),\n                size=(256, 256),\n                mode=\"nearest\",\n            )\n            for x in batch\n        ],\n        dim=0,\n    ).long()\n\n    return {\n        \"pixel_values\": pixel_values,\n        \"original_sizes\": original_sizes,\n        \"input_boxes\": input_boxes,\n        \"labels\": labels,\n        \"original_image_size\": torch.stack(\n            [item[\"original_image_size\"] for item in batch]\n        ),\n        \"multimask_output\": False,\n    }\n```\n\n```python\nfrom transformers import Sam2Processor\n\nprocessor = Sam2Processor.from_pretrained(\"facebook/sam2.1-hiera-small\")\n\ntrain_dataset = SAMDataset(dataset=train_ds, processor=processor)\nval_dataset = SAMDataset(dataset=val_ds, processor=processor)\n```\n\n## Load model and freeze encoder layers\n\n```python\nfrom transformers import Sam2Model\n\nmodel = Sam2Model.from_pretrained(\"facebook/sam2.1-hiera-small\")\n\nfor name, param in model.named_parameters():\n    if name.startswith(\"vision_encoder\") or name.startswith(\"prompt_encoder\"):\n        param.requires_grad_(False)\n```\n\n## Inference before 
training\n\n```python\nitem = val_ds[1]\nimg = item[\"image\"]\nbbox = json.loads(item[\"prompt\"])[\"bbox\"]\ninputs = processor(images=img, input_boxes=[[bbox]], return_tensors=\"pt\").to(\n    model.device\n)\n\nwith torch.no_grad():\n    outputs = model(**inputs)\n\nmasks = processor.post_process_masks(outputs.pred_masks.cpu(), inputs[\"original_sizes\"])[0]\npreds = masks.squeeze(0)\nmask = (preds[0] > 0).cpu().numpy()\n\noverlay = np.asarray(img, dtype=np.uint8).copy()\noverlay[mask] = 0.55 * overlay[mask] + 0.45 * np.array([0, 255, 0], dtype=np.float32)\n\nplt.imshow(overlay)\nplt.title(\"Before training\")\nplt.axis(\"off\")\nplt.show()\n```\n\n## Define custom loss\n\nSAM2 does not compute loss in its `forward()`, so we provide a\n`compute_loss_func` to the Trainer. The Trainer pops `\"labels\"` from the\nbatch before calling `model(**inputs)`, then passes `(outputs, labels)` to\nthis function.\n\n```python\nimport monai\nfrom transformers import Trainer, TrainingArguments\nimport trackio\n\nseg_loss = monai.losses.DiceCELoss(sigmoid=True, squared_pred=True, reduction=\"mean\")\n\n\ndef compute_loss(outputs, labels, num_items_in_batch=None):\n    predicted_masks = outputs.pred_masks.squeeze(1)\n    return seg_loss(predicted_masks, labels.float())\n```\n\n## Train with Trainer\n\nKey settings:\n- `remove_unused_columns=False`: the Trainer must keep `input_boxes`,\n  `original_sizes`, etc. 
that are not in the model's `forward()` signature.\n- `compute_loss_func`: our custom DiceCE loss.\n- `report_to=\"trackio\"`: logs the training loss to trackio.\n\n```python\ntraining_args = TrainingArguments(\n    output_dir=\"sam2-finetuned\",\n    num_train_epochs=30,\n    per_device_train_batch_size=4,\n    learning_rate=1e-5,\n    weight_decay=0,\n    logging_steps=1,\n    save_strategy=\"epoch\",\n    save_total_limit=2,\n    remove_unused_columns=False,\n    dataloader_pin_memory=False,\n    report_to=\"trackio\",\n)\n\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=train_dataset,\n    data_collator=collate_fn,\n    compute_loss_func=compute_loss,\n)\n\ntrainer.train()\n```\n\n## Inference after training\n\n```python\nitem = val_ds[1]\nimg = item[\"image\"]\nbbox = json.loads(item[\"prompt\"])[\"bbox\"]\n\ninputs = processor(images=img, input_boxes=[[bbox]], return_tensors=\"pt\").to(\n    model.device\n)\n\nwith torch.no_grad():\n    outputs = model(**inputs)\n\npreds = processor.post_process_masks(\n    outputs.pred_masks.cpu(), inputs[\"original_sizes\"]\n)[0]\npreds = preds.squeeze(0)\nmask = (preds[0] > 0).cpu().numpy()\n\noverlay = np.asarray(img, dtype=np.uint8).copy()\noverlay[mask] = 0.55 * overlay[mask] + 0.45 * np.array([0, 255, 0], dtype=np.float32)\n\nplt.imshow(overlay)\nplt.title(\"After training\")\nplt.axis(\"off\")\nplt.show()\n```\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/hub_saving.md",
    "content": "# Saving Vision Models to Hugging Face Hub\n\n## Contents\n- Why Hub Push is Required\n- Required Configuration (TrainingArguments, job config)\n- Complete Example\n- What Gets Saved\n- Important: Save Image Processor\n- Checkpoint Saving\n- Model Card Configuration\n- Saving Label Mappings\n- Authentication Methods\n- Verification Checklist\n- Repository Setup (automatic/manual creation, naming)\n- Troubleshooting (401, 403, push failures, inference issues)\n- Manual Push After Training\n- Example: Full Production Setup\n- Inference Example\n\n---\n\n**CRITICAL:** Training environments are ephemeral. ALL results are lost when a job completes unless pushed to the Hub.\n\n## Why Hub Push is Required\n\nWhen running on Hugging Face Jobs:\n- Environment is temporary\n- All files deleted on job completion\n- No local disk persistence\n- Cannot access results after job ends\n\n**Without Hub push, training is completely wasted.**\n\n## Required Configuration\n\n### 1. Training Configuration\n\nIn your TrainingArguments:\n\n```python\nfrom transformers import TrainingArguments\n\ntraining_args = TrainingArguments(\n    output_dir=\"my-object-detector\",\n    push_to_hub=True,                    # Enable Hub push\n    hub_model_id=\"username/model-name\",   # Target repository\n)\n```\n\n### 2. 
Job Configuration\n\nWhen submitting the job:\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": training_script_content,  # Pass the Python script content directly as a string\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # Provide authentication\n})\n```\n\n**The `$HF_TOKEN` syntax references your actual Hugging Face token value.**\n\n## Complete Example\n\n```python\n# train_detector.py\n# /// script\n# dependencies = [\"transformers\", \"torch\", \"torchvision\", \"datasets\"]\n# ///\n\nfrom transformers import (\n    AutoImageProcessor,\n    AutoModelForObjectDetection,\n    TrainingArguments,\n    Trainer\n)\nfrom datasets import load_dataset\nimport os\nimport torch\n\n# Load dataset\ndataset = load_dataset(\"cppe-5\", split=\"train\")\n\n# Load model and processor\nmodel_name = \"facebook/detr-resnet-50\"\nimage_processor = AutoImageProcessor.from_pretrained(model_name)\nmodel = AutoModelForObjectDetection.from_pretrained(\n    model_name,\n    num_labels=5,  # Number of classes\n    ignore_mismatched_sizes=True\n)\n\n# Configure with Hub push\ntraining_args = TrainingArguments(\n    output_dir=\"my-detector\",\n    num_train_epochs=10,\n    per_device_train_batch_size=8,\n\n    # ✅ CRITICAL: Hub push configuration\n    push_to_hub=True,\n    hub_model_id=\"myusername/cppe5-detector\",\n\n    # Optional: Push strategy\n    hub_strategy=\"checkpoint\",  # Push checkpoints during training\n)\n\n# ✅ CRITICAL: Authenticate with Hub BEFORE creating Trainer\nfrom huggingface_hub import login\nhf_token = os.environ.get(\"HF_TOKEN\") or os.environ.get(\"hfjob\")\nif hf_token:\n    login(token=hf_token)\n    training_args.hub_token = hf_token\nelif training_args.push_to_hub:\n    raise ValueError(\"HF_TOKEN not found! 
Add secrets={'HF_TOKEN': '$HF_TOKEN'} to job config.\")\n\n# Define collate function\ndef collate_fn(batch):\n    pixel_values = [item[\"pixel_values\"] for item in batch]\n    labels = [item[\"labels\"] for item in batch]\n    encoding = image_processor.pad(pixel_values, return_tensors=\"pt\")\n    return {\n        \"pixel_values\": encoding[\"pixel_values\"],\n        \"labels\": labels\n    }\n\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=dataset,\n    data_collator=collate_fn,\n)\n\ntrainer.train()\n\n# ✅ Push final model and processor\ntrainer.push_to_hub()\nimage_processor.push_to_hub(\"myusername/cppe5-detector\")\n\nprint(\"✅ Model saved to: https://huggingface.co/myusername/cppe5-detector\")\n```\n\n**Submit with authentication:**\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": training_script_content,  # Pass script content as a string, NOT a filename\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"4h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Required!\n})\n```\n\n## What Gets Saved\n\nWhen `push_to_hub=True`:\n\n1. **Model weights** - Final trained parameters\n2. **Image processor** - Associated preprocessing configuration\n3. **Configuration** - Model config (config.json) including:\n   - Number of labels/classes\n   - Architecture details (backbone, num_queries, etc.)\n   - Label mappings (id2label, label2id)\n4. **Training arguments** - Hyperparameters used\n5. **Model card** - Auto-generated documentation\n6. 
**Checkpoints** - If `save_strategy=\"steps\"` enabled\n\n## Important: Save Image Processor\n\n**Object detection models require the image processor to be saved separately:**\n\n```python\n# After training completes\ntrainer.push_to_hub()\n\n# ✅ Also push the image processor\nimage_processor.push_to_hub(\n    repo_id=\"username/model-name\",\n    commit_message=\"Upload image processor\"\n)\n```\n\n**Why this matters:**\n- Models need specific image preprocessing (resizing, normalization)\n- Image processor contains critical configuration\n- Without it, model cannot be used for inference\n\n## Checkpoint Saving\n\nSave intermediate checkpoints during training:\n\n```python\nTrainingArguments(\n    output_dir=\"my-detector\",\n    push_to_hub=True,\n    hub_model_id=\"username/my-detector\",\n\n    # Checkpoint configuration\n    save_strategy=\"steps\",\n    save_steps=500,              # Save every 500 steps\n    save_total_limit=3,          # Keep only last 3 checkpoints\n    hub_strategy=\"checkpoint\",   # Push checkpoints to Hub\n)\n```\n\n**Benefits:**\n- Resume training if job fails\n- Compare checkpoint performance\n- Use intermediate models\n- Track training progress\n\n**Checkpoints are pushed to:** `username/my-detector` (same repo)\n\n## Model Card Configuration\n\nAdd metadata for better discoverability:\n\n```python\n# At the end of training script\n# Extra keyword arguments are forwarded to Trainer.create_model_card();\n# the target repo comes from hub_model_id in TrainingArguments\ntrainer.push_to_hub(\n    commit_message=\"Upload trained object detection model\",\n    license=\"apache-2.0\",\n    tags=[\"object-detection\", \"vision\", \"cppe-5\"],\n    dataset=\"cppe-5\",\n    tasks=\"object-detection\",\n)\n```\n\n## Saving Label Mappings\n\n**Critical for object detection:** Save class labels with the model:\n\n```python\n# Define your label mappings\nid2label = {0: \"Coverall\", 1: \"Face_Shield\", 2: \"Gloves\", 3: 
\"Goggles\", 4: \"Mask\"}\nlabel2id = {v: k for k, v in id2label.items()}\n\n# Update model config before training\nmodel.config.id2label = id2label\nmodel.config.label2id = label2id\n\n# Now train and push\ntrainer.train()\ntrainer.push_to_hub()\n```\n\n**Without label mappings:**\n- Model outputs will be numeric IDs only\n- No human-readable class names\n- Difficult to interpret results\n\n## Authentication Methods\n\nFor a complete guide on token types, `$HF_TOKEN` automatic replacement, `secrets` vs `env` differences, and security best practices, see the `hugging-face-jobs` skill → *Token Usage Guide*.\n\n**Recommended:** Always pass tokens via `secrets` (encrypted server-side):\n\n```python\n\"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}  # ✅ Automatic replacement with your logged-in token\n```\n\n## Verification Checklist\n\nBefore submitting any training job, verify:\n\n- [ ] `push_to_hub=True` in TrainingArguments\n- [ ] `hub_model_id` is specified (format: `username/model-name`)\n- [ ] Image processor will be saved separately\n- [ ] Label mappings (id2label, label2id) are configured\n- [ ] Repository name doesn't conflict with existing repos\n- [ ] You have write access to the target namespace\n\n## Repository Setup\n\n### Automatic Creation\n\nIf repository doesn't exist, it's created automatically when first pushing.\n\n### Manual Creation\n\nCreate repository before training:\n\n```python\nfrom huggingface_hub import HfApi\n\napi = HfApi()\napi.create_repo(\n    repo_id=\"username/detector-name\",\n    repo_type=\"model\",\n    private=False,  # or True for private repo\n)\n```\n\n### Repository Naming\n\n**Valid names:**\n- `username/detr-cppe5`\n- `username/yolos-object-detector`\n- `organization/custom-detector`\n\n**Invalid names:**\n- `detector-name` (missing username)\n- `username/detector name` (spaces not allowed)\n- `username/DETECTOR` (uppercase discouraged)\n\n**Recommended naming:**\n- Include model architecture: `detr-`, `yolos-`, `deta-`\n- 
Include dataset: `-cppe5`, `-coco`, `-voc`\n- Be descriptive: `detr-resnet50-cppe5` > `model1`\n\n## Troubleshooting\n\n### Error: 401 Unauthorized\n\n**Cause:** HF_TOKEN not provided, invalid, or not authenticated before Trainer init\n\n**Solutions:**\n1. Verify `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}` in job config\n2. Verify script calls `login(token=hf_token)` AND sets `training_args.hub_token = hf_token` BEFORE creating the `Trainer`\n3. Check you're logged in locally: `hf auth whoami`\n4. Re-login: `hf auth login`\n\n**Root cause:** The `Trainer` calls `create_repo(token=self.args.hub_token)` during `__init__()` when `push_to_hub=True`. Relying on implicit env-var token resolution is unreliable in Jobs. Calling `login()` saves the token globally, and setting `training_args.hub_token` ensures the Trainer passes it explicitly to all Hub API calls.\n\n### Error: 403 Forbidden\n\n**Cause:** No write access to repository\n\n**Solutions:**\n1. Check that the repository namespace matches your username\n2. Verify you're a member of the organization (if using an org namespace)\n3. If the repository belongs to an organization and is private, check that you have access to it\n\n### Error: Repository not found\n\n**Cause:** Repository doesn't exist and auto-creation failed\n\n**Solutions:**\n1. Manually create repository first\n2. Check repository name format\n3. Verify namespace exists\n\n### Error: Push failed during training\n\n**Cause:** Network issues or Hub unavailable\n\n**What happens:**\n- Training continues, but the final push fails\n- Checkpoints pushed earlier in the run may already be on the Hub\n\n**Solutions:**\n1. Check the target repository for checkpoints that were pushed before the failure\n2. Re-run the push manually while the job's files still exist (see Manual Push After Training); they are deleted once the job completes\n\n### Issue: Model loads but inference fails\n\n**Possible causes:**\n1. Image processor not saved: verify it's pushed separately\n2. Label mappings missing: check config.json has id2label\n3. Wrong image size: verify image processor matches training config\n\n### Issue: Model saved but not visible\n\n**Possible causes:**\n1. Repository is private: check https://huggingface.co/username\n2. 
Wrong namespace—verify `hub_model_id` matches login\n3. Push still in progress—wait a few minutes\n\n## Manual Push After Training\n\nIf training completes but push fails, push manually:\n\n```python\nfrom transformers import AutoModelForObjectDetection, AutoImageProcessor\n\n# Load from local checkpoint\nmodel = AutoModelForObjectDetection.from_pretrained(\"./output_dir\")\nimage_processor = AutoImageProcessor.from_pretrained(\"./output_dir\")\n\n# Push to Hub\nmodel.push_to_hub(\"username/model-name\", token=\"hf_abc123...\")\nimage_processor.push_to_hub(\"username/model-name\", token=\"hf_abc123...\")\n```\n\n**Note:** Only possible if job hasn't completed (files still exist).\n\n## Best Practices\n\n1. **Always enable `push_to_hub=True`**\n2. **Save image processor separately** - critical for inference\n3. **Configure label mappings** before training\n4. **Use checkpoint saving** for long training runs\n5. **Verify Hub push** in logs before job completes\n6. **Set appropriate `save_total_limit`** to avoid excessive checkpoints\n7. **Use descriptive repo names** (e.g., `detr-cppe5` not `detector1`)\n8. **Add model card** with:\n   - Training dataset\n   - Evaluation metrics (mAP, IoU)\n   - Example usage code\n   - Limitations\n9. 
**Tag models appropriately**:\n   - `object-detection`\n   - Architecture: `detr`, `yolos`, `deta`\n   - Dataset: `coco`, `voc`, `cppe-5`\n\n## Monitoring Push Progress\n\nCheck logs for push progress:\n\n```python\nhf_jobs(\"logs\", {\"job_id\": \"your-job-id\"})\n```\n\n**Look for:**\n```\nPushing model to username/detector-name...\nUpload file pytorch_model.bin: 100%\n✅ Model pushed successfully\nPushing image processor...\n✅ Image processor pushed successfully\n```\n\n## Example: Full Production Setup\n\n```python\n# production_detector.py\n# /// script\n# dependencies = [\n#     \"transformers>=4.41.0\",\n#     \"accelerate\",\n#     \"timm\",\n#     \"torch>=2.0.0\",\n#     \"torchvision>=0.15.0\",\n#     \"datasets>=2.12.0\",\n#     \"evaluate>=0.4.0\"\n# ]\n# ///\n# transformers>=4.41.0 is needed for the eval_strategy argument used below;\n# accelerate is required by Trainer; timm provides the default DETR backbone\n\nfrom transformers import (\n    AutoImageProcessor,\n    AutoModelForObjectDetection,\n    TrainingArguments,\n    Trainer\n)\nfrom datasets import load_dataset\nimport os\nimport torch\n\n# Configuration\nMODEL_NAME = \"facebook/detr-resnet-50\"\nDATASET_NAME = \"cppe-5\"\nHUB_MODEL_ID = \"myusername/detr-cppe5-detector\"\nNUM_CLASSES = 5\n\n# Class labels\nid2label = {0: \"Coverall\", 1: \"Face_Shield\", 2: \"Gloves\", 3: \"Goggles\", 4: \"Mask\"}\nlabel2id = {v: k for k, v in id2label.items()}\n\nprint(f\"🔧 Loading dataset: {DATASET_NAME}\")\ndataset = load_dataset(DATASET_NAME, split=\"train\")\nprint(f\"✅ Dataset loaded: {len(dataset)} examples\")\n\nprint(f\"🔧 Loading model: {MODEL_NAME}\")\nimage_processor = AutoImageProcessor.from_pretrained(MODEL_NAME)\nmodel = AutoModelForObjectDetection.from_pretrained(\n    MODEL_NAME,\n    num_labels=NUM_CLASSES,\n    id2label=id2label,\n    label2id=label2id,\n    ignore_mismatched_sizes=True\n)\nprint(\"✅ Model loaded\")\n\n# Configure with comprehensive Hub settings\ntraining_args = TrainingArguments(\n    output_dir=\"detr-cppe5\",\n\n    # Hub configuration\n    push_to_hub=True,\n    hub_model_id=HUB_MODEL_ID,\n    hub_strategy=\"checkpoint\",  # Push checkpoints\n\n    # Checkpoint 
configuration\n    save_strategy=\"steps\",\n    save_steps=500,\n    save_total_limit=3,\n\n    # Training settings\n    num_train_epochs=10,\n    per_device_train_batch_size=8,\n    gradient_accumulation_steps=2,\n    learning_rate=1e-4,\n    warmup_steps=500,\n\n    # Evaluation\n    eval_strategy=\"steps\",\n    eval_steps=500,\n\n    # Logging\n    logging_steps=50,\n    logging_first_step=True,\n\n    # Performance\n    fp16=True,  # Mixed precision training\n    dataloader_num_workers=4,\n)\n\n# ✅ CRITICAL: Authenticate with Hub BEFORE creating Trainer\n# login() saves the token globally so ALL hub operations can find it.\nfrom huggingface_hub import login\nhf_token = os.environ.get(\"HF_TOKEN\") or os.environ.get(\"hfjob\")\nif hf_token:\n    login(token=hf_token)\n    training_args.hub_token = hf_token\nelif training_args.push_to_hub:\n    raise ValueError(\"HF_TOKEN not found! Add secrets={'HF_TOKEN': '$HF_TOKEN'} to job config.\")\n\n# Data collator\ndef collate_fn(batch):\n    pixel_values = [item[\"pixel_values\"] for item in batch]\n    labels = [item[\"labels\"] for item in batch]\n    encoding = image_processor.pad(pixel_values, return_tensors=\"pt\")\n    return {\n        \"pixel_values\": encoding[\"pixel_values\"],\n        \"labels\": labels\n    }\n\n# Hold out part of the training split: eval_strategy=\"steps\" requires an eval_dataset\nsplit = dataset.train_test_split(test_size=0.15, seed=1337)\n\n# Create trainer\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=split[\"train\"],\n    eval_dataset=split[\"test\"],\n    data_collator=collate_fn,\n)\n\nprint(\"🚀 Starting training...\")\ntrainer.train()\n\nprint(\"💾 Pushing final model to Hub...\")\ntrainer.push_to_hub(\n    commit_message=\"Upload trained DETR model on CPPE-5\",\n    tags=[\"object-detection\", \"detr\", \"cppe-5\", \"vision\"],\n)\n\nprint(\"💾 Pushing image processor to Hub...\")\nimage_processor.push_to_hub(\n    repo_id=HUB_MODEL_ID,\n    commit_message=\"Upload image processor\"\n)\n\nprint(\"✅ Training complete!\")\nprint(f\"Model available at: https://huggingface.co/{HUB_MODEL_ID}\")\nprint(f\"\\nTo use your 
model:\")\nprint(f\"```python\")\nprint(f\"from transformers import AutoImageProcessor, AutoModelForObjectDetection\")\nprint(f\"\")\nprint(f\"processor = AutoImageProcessor.from_pretrained('{HUB_MODEL_ID}')\")\nprint(f\"model = AutoModelForObjectDetection.from_pretrained('{HUB_MODEL_ID}')\")\nprint(f\"```\")\n```\n\n**Submit:**\n\n```python\nhf_jobs(\"uv\", {\n    \"script\": training_script_content,  # Pass script content as a string, NOT a filename\n    \"flavor\": \"a10g-large\",\n    \"timeout\": \"8h\",\n    \"secrets\": {\"HF_TOKEN\": \"$HF_TOKEN\"}\n})\n```\n\n## Inference Example\n\nAfter training, use your model:\n\n```python\nfrom transformers import AutoImageProcessor, AutoModelForObjectDetection\nfrom PIL import Image\nimport torch\n\n# Load model from Hub\nprocessor = AutoImageProcessor.from_pretrained(\"username/detr-cppe5-detector\")\nmodel = AutoModelForObjectDetection.from_pretrained(\"username/detr-cppe5-detector\")\n\n# Load and process image\nimage = Image.open(\"test_image.jpg\")\ninputs = processor(images=image, return_tensors=\"pt\")\n\n# Run inference\nwith torch.no_grad():\n    outputs = model(**inputs)\n\n# Post-process results\ntarget_sizes = torch.tensor([image.size[::-1]])\nresults = processor.post_process_object_detection(\n    outputs,\n    threshold=0.5,\n    target_sizes=target_sizes\n)[0]\n\n# Print detections\nfor score, label, box in zip(results[\"scores\"], results[\"labels\"], results[\"boxes\"]):\n    box = [round(i, 2) for i in box.tolist()]\n    print(\n        f\"Detected {model.config.id2label[label.item()]} with confidence \"\n        f\"{round(score.item(), 3)} at location {box}\"\n    )\n```\n\n## Key Takeaway\n\n**Without `push_to_hub=True` and `secrets={\"HF_TOKEN\": \"$HF_TOKEN\"}`, all training results are permanently lost.**\n\n**For object detection, also remember to:**\n1. Save the image processor separately\n2. Configure label mappings (id2label, label2id)\n3. 
Include appropriate model card metadata\n\nAlways verify all three are configured before submitting any training job.\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/image_classification_training_notebook.md",
    "content": "# Image classification\n\n## Contents\n- Load Food-101 dataset\n- Preprocess (ViT image processor, torchvision transforms)\n- Evaluate (accuracy metric, compute_metrics)\n- Train (TrainingArguments, Trainer setup, push to Hub)\n- Inference (pipeline, manual prediction)\n\n---\n\nImage classification assigns a label or class to an image. Unlike text or audio classification, the inputs are the\npixel values that comprise an image. There are many applications for image classification, such as detecting damage\nafter a natural disaster, monitoring crop health, or helping screen medical images for signs of disease.\n\nThis guide illustrates how to:\n\n1. Fine-tune [ViT](../model_doc/vit) on the [Food-101](https://huggingface.co/datasets/ethz/food101) dataset to classify a food item in an image.\n2. Use your fine-tuned model for inference.\n\nTo see all architectures and checkpoints compatible with this task, we recommend checking the [task-page](https://huggingface.co/tasks/image-classification)\n\nBefore you begin, make sure you have all the necessary libraries installed:\n\n```bash\npip install transformers datasets evaluate accelerate pillow torchvision scikit-learn trackio\n```\n\nWe encourage you to log in to your Hugging Face account to upload and share your model with the community. When prompted, enter your token to log in:\n\n```py\n>>> from huggingface_hub import notebook_login\n\n>>> notebook_login()\n```\n\n## Load Food-101 dataset\n\nStart by loading a smaller subset of the Food-101 dataset from the 🤗 Datasets library. 
This will give you a chance to\nexperiment and make sure everything works before spending more time training on the full dataset.\n\n```py\n>>> from datasets import load_dataset\n\n>>> food = load_dataset(\"ethz/food101\", split=\"train[:5000]\")\n```\n\nSplit the dataset's `train` split into a train and test set with the [train_test_split](https://huggingface.co/docs/datasets/v4.5.0/en/package_reference/main_classes#datasets.Dataset.train_test_split) method:\n\n```py\n>>> food = food.train_test_split(test_size=0.2)\n```\n\nThen take a look at an example:\n\n```py\n>>> food[\"train\"][0]\n{'image': ,\n 'label': 79}\n```\n\nEach example in the dataset has two fields:\n\n- `image`: a PIL image of the food item\n- `label`: the label class of the food item\n\nTo make it easier for the model to get the label name from the label id, create a dictionary that maps the label name\nto an integer and vice versa:\n\n```py\n>>> labels = food[\"train\"].features[\"label\"].names\n>>> label2id, id2label = dict(), dict()\n>>> for i, label in enumerate(labels):\n...     label2id[label] = str(i)\n...     id2label[str(i)] = label\n```\n\nNow you can convert the label id to a label name:\n\n```py\n>>> id2label[str(79)]\n'prime_rib'\n```\n\n## Preprocess\n\nThe next step is to load a ViT image processor to process the image into a tensor:\n\n```py\n>>> from transformers import AutoImageProcessor\n\n>>> checkpoint = \"google/vit-base-patch16-224-in21k\"\n>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint)\n```\n\nApply some image transformations to the images to make the model more robust against overfitting. 
Here you'll use torchvision's [`transforms`](https://pytorch.org/vision/stable/transforms.html) module, but you can also use any image library you like.\n\nCrop a random part of the image, resize it, and normalize it with the image mean and standard deviation:\n\n```py\n>>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor\n\n>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)\n>>> size = (\n...     image_processor.size[\"shortest_edge\"]\n...     if \"shortest_edge\" in image_processor.size\n...     else (image_processor.size[\"height\"], image_processor.size[\"width\"])\n... )\n>>> _transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize])\n```\n\nThen create a preprocessing function to apply the transforms and return the `pixel_values` - the inputs to the model - of the image:\n\n```py\n>>> def transforms(examples):\n...     examples[\"pixel_values\"] = [_transforms(img.convert(\"RGB\")) for img in examples[\"image\"]]\n...     del examples[\"image\"]\n...     return examples\n```\n\nTo apply the preprocessing function over the entire dataset, use 🤗 Datasets [with_transform](https://huggingface.co/docs/datasets/v4.5.0/en/package_reference/main_classes#datasets.Dataset.with_transform) method. The transforms are applied on the fly when you load an element of the dataset:\n\n```py\n>>> food = food.with_transform(transforms)\n```\n\nNow create a batch of examples using [DefaultDataCollator](/docs/transformers/v5.2.0/en/main_classes/data_collator#transformers.DefaultDataCollator). Unlike other data collators in 🤗 Transformers, the `DefaultDataCollator` does not apply additional preprocessing such as padding.\n\n```py\n>>> from transformers import DefaultDataCollator\n\n>>> data_collator = DefaultDataCollator()\n```\n\n## Evaluate\n\nIncluding a metric during training is often helpful for evaluating your model's performance. 
You can quickly load an\nevaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load\nthe [accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):\n\n```py\n>>> import evaluate\n\n>>> accuracy = evaluate.load(\"accuracy\")\n```\n\nThen create a function that passes your predictions and labels to [compute](https://huggingface.co/docs/evaluate/v0.4.6/en/package_reference/main_classes#evaluate.EvaluationModule.compute) to calculate the accuracy:\n\n```py\n>>> import numpy as np\n\n>>> def compute_metrics(eval_pred):\n...     predictions, labels = eval_pred\n...     predictions = np.argmax(predictions, axis=1)\n...     return accuracy.compute(predictions=predictions, references=labels)\n```\n\nYour `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.\n\n## Train\n\nIf you aren't familiar with finetuning a model with the [Trainer](/docs/transformers/v5.2.0/en/main_classes/trainer#transformers.Trainer), take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!\n\nYou're ready to start training your model now! Load ViT with [AutoModelForImageClassification](/docs/transformers/v5.2.0/en/model_doc/auto#transformers.AutoModelForImageClassification). Specify the number of expected labels and the label mappings:\n\n```py\n>>> from transformers import AutoModelForImageClassification, TrainingArguments, Trainer\n\n>>> model = AutoModelForImageClassification.from_pretrained(\n...     checkpoint,\n...     num_labels=len(labels),\n...     id2label=id2label,\n...     label2id=label2id,\n... )\n```\n\nAt this point, only three steps remain:\n\n1. 
Define your training hyperparameters in [TrainingArguments](/docs/transformers/v5.2.0/en/main_classes/trainer#transformers.TrainingArguments). It is important you don't remove unused columns because that'll drop the `image` column. Without the `image` column, you can't create `pixel_values`. Set `remove_unused_columns=False` to prevent this behavior! The only other required parameter is `output_dir`, which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [Trainer](/docs/transformers/v5.2.0/en/main_classes/trainer#transformers.Trainer) will evaluate the accuracy and save the training checkpoint.\n2. Pass the training arguments to [Trainer](/docs/transformers/v5.2.0/en/main_classes/trainer#transformers.Trainer) along with the model, datasets, image processor, data collator, and `compute_metrics` function.\n3. Call [train()](/docs/transformers/v5.2.0/en/main_classes/trainer#transformers.Trainer.train) to finetune your model.\n\n```py\n>>> training_args = TrainingArguments(\n...     output_dir=\"my_awesome_food_model\",\n...     remove_unused_columns=False,\n...     eval_strategy=\"epoch\",\n...     save_strategy=\"epoch\",\n...     learning_rate=5e-5,\n...     per_device_train_batch_size=16,\n...     gradient_accumulation_steps=4,\n...     per_device_eval_batch_size=16,\n...     num_train_epochs=3,\n...     warmup_steps=0.1,\n...     logging_steps=10,\n...     report_to=\"trackio\",\n...     run_name=\"food101\",\n...     load_best_model_at_end=True,\n...     metric_for_best_model=\"accuracy\",\n...     push_to_hub=True,\n... )\n\n>>> trainer = Trainer(\n...     model=model,\n...     args=training_args,\n...     data_collator=data_collator,\n...     train_dataset=food[\"train\"],\n...     eval_dataset=food[\"test\"],\n...     processing_class=image_processor,\n...     compute_metrics=compute_metrics,\n... 
)\n\n>>> trainer.train()\n```\n\nOnce training is completed, share your model to the Hub with the [push_to_hub()](/docs/transformers/v5.2.0/en/main_classes/trainer#transformers.Trainer.push_to_hub) method so everyone can use your model:\n\n```py\n>>> trainer.push_to_hub()\n```\n\nFor a more in-depth example of how to finetune a model for image classification, take a look at the corresponding [PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).\n\n## Inference\n\nGreat, now that you've fine-tuned a model, you can use it for inference!\n\nLoad an image you'd like to run inference on:\n\n```py\n>>> ds = load_dataset(\"ethz/food101\", split=\"validation[:10]\")\n>>> image = ds[\"image\"][0]\n```\n\n    \n\nThe simplest way to try out your finetuned model for inference is to use it in a [pipeline()](/docs/transformers/v5.2.0/en/main_classes/pipelines#transformers.pipeline). Instantiate a `pipeline` for image classification with your model, and pass your image to it:\n\n```py\n>>> from transformers import pipeline\n\n>>> classifier = pipeline(\"image-classification\", model=\"my_awesome_food_model\")\n>>> classifier(image)\n[{'score': 0.31856709718704224, 'label': 'beignets'},\n {'score': 0.015232225880026817, 'label': 'bruschetta'},\n {'score': 0.01519392803311348, 'label': 'chicken_wings'},\n {'score': 0.013022331520915031, 'label': 'pork_chop'},\n {'score': 0.012728818692266941, 'label': 'prime_rib'}]\n```\n\nYou can also manually replicate the results of the `pipeline` if you'd like:\n\nLoad an image processor to preprocess the image and return the `input` as PyTorch tensors:\n\n```py\n>>> from transformers import AutoImageProcessor\n>>> import torch\n\n>>> image_processor = AutoImageProcessor.from_pretrained(\"my_awesome_food_model\")\n>>> inputs = image_processor(image, return_tensors=\"pt\")\n```\n\nPass your inputs to the model and return the logits:\n\n```py\n>>> from transformers 
import AutoModelForImageClassification\n\n>>> model = AutoModelForImageClassification.from_pretrained(\"my_awesome_food_model\")\n>>> with torch.no_grad():\n...     logits = model(**inputs).logits\n```\n\nGet the predicted label with the highest probability, and use the model's `id2label` mapping to convert it to a label:\n\n```py\n>>> predicted_label = logits.argmax(-1).item()\n>>> model.config.id2label[predicted_label]\n'beignets'\n```\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/object_detection_training_notebook.md",
    "content": "# Object Detection Training Reference\n\n## Contents\n- Load the CPPE-5 dataset\n- Preprocess the data (augmentation with Albumentations, COCO annotation formatting)\n- Preparing function to compute mAP\n- Training the detection model (TrainingArguments, Trainer setup)\n- Evaluate\n- Inference (loading from Hub, running predictions, visualizing results)\n\n---\n\nObject detection is the computer vision task of detecting instances (such as humans, buildings, or cars) in an image. Object detection models receive an image as input and output\ncoordinates of the bounding boxes and associated labels of the detected objects. An image can contain multiple objects,\neach with its own bounding box and a label (e.g. it can have a car and a building), and each object can\nbe present in different parts of an image (e.g. the image can have several cars).\nThis task is commonly used in autonomous driving for detecting things like pedestrians, road signs, and traffic lights.\nOther applications include counting objects in images, image search, and more.\n\nIn this guide, you will learn how to:\n\n 1. Finetune [DETR](https://huggingface.co/docs/transformers/model_doc/detr), a model that combines a convolutional\n backbone with an encoder-decoder Transformer, on the [CPPE-5](https://huggingface.co/datasets/cppe-5)\n dataset.\n 2. Use your finetuned model for inference.\n\nTo see all architectures and checkpoints compatible with this task, we recommend checking the [task-page](https://huggingface.co/tasks/object-detection)\n\nBefore you begin, make sure you have all the necessary libraries installed:\n\n```bash\npip install -q datasets transformers accelerate timm trackio\npip install -q -U albumentations>=1.4.5 torchmetrics pycocotools\n```\n\nYou'll use 🤗 Datasets to load a dataset from the Hugging Face Hub, 🤗 Transformers to train your model,\nand `albumentations` to augment the data.\n\nWe encourage you to share your model with the community. 
Log in to your Hugging Face account to upload it to the Hub.\nWhen prompted, enter your token to log in:\n\n```py\n>>> from huggingface_hub import notebook_login\n\n>>> notebook_login()\n```\n\nTo get started, we'll define global constants, namely the model name and image size. For this tutorial, we'll use the conditional DETR model due to its faster convergence. Feel free to select any object detection model available in the `transformers` library.\n\n```py\n>>> MODEL_NAME = \"microsoft/conditional-detr-resnet-50\"  # or \"facebook/detr-resnet-50\"\n>>> IMAGE_SIZE = 480\n```\n\n## Load the CPPE-5 dataset\n\nThe [CPPE-5 dataset](https://huggingface.co/datasets/cppe-5) contains images with\nannotations identifying medical personal protective equipment (PPE) in the context of the COVID-19 pandemic.\n\nStart by loading the dataset and creating a `validation` split from `train`:\n\n```py\n>>> from datasets import load_dataset\n\n>>> cppe5 = load_dataset(\"cppe-5\")\n\n>>> if \"validation\" not in cppe5:\n...     split = cppe5[\"train\"].train_test_split(0.15, seed=1337)\n...     cppe5[\"train\"] = split[\"train\"]\n...     
cppe5[\"validation\"] = split[\"test\"]\n\n>>> cppe5\nDatasetDict({\n    train: Dataset({\n        features: ['image_id', 'image', 'width', 'height', 'objects'],\n        num_rows: 850\n    })\n    test: Dataset({\n        features: ['image_id', 'image', 'width', 'height', 'objects'],\n        num_rows: 29\n    })\n    validation: Dataset({\n        features: ['image_id', 'image', 'width', 'height', 'objects'],\n        num_rows: 150\n    })\n})\n```\n\nYou'll see that this dataset has 1000 images for train and validation sets and a test set with 29 images.\n\nTo get familiar with the data, explore what the examples look like.\n\n```py\n>>> cppe5[\"train\"][0]\n{\n  'image_id': 366,\n  'image': ,\n  'width': 500,\n  'height': 500,\n  'objects': {\n    'id': [1932, 1933, 1934],\n    'area': [27063, 34200, 32431],\n    'bbox': [[29.0, 11.0, 97.0, 279.0],\n      [201.0, 1.0, 120.0, 285.0],\n      [382.0, 0.0, 113.0, 287.0]],\n    'category': [0, 0, 0]\n  }\n}\n```\n\nThe examples in the dataset have the following fields:\n\n- `image_id`: the example image id\n- `image`: a `PIL.Image.Image` object containing the image\n- `width`: width of the image\n- `height`: height of the image\n- `objects`: a dictionary containing bounding box metadata for the objects in the image:\n  - `id`: the annotation id\n  - `area`: the area of the bounding box\n  - `bbox`: the object's bounding box (in the [COCO format](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/#coco) )\n  - `category`: the object's category, with possible values including `Coverall (0)`, `Face_Shield (1)`, `Gloves (2)`, `Goggles (3)` and `Mask (4)`\n\nYou may notice that the `bbox` field follows the COCO format, which is the format that the DETR model expects.\nHowever, the grouping of the fields inside `objects` differs from the annotation format DETR requires. 
You will\nneed to apply some preprocessing transformations before using this data for training.\n\nTo get an even better understanding of the data, visualize an example in the dataset.\n\n```py\n>>> import numpy as np\n>>> import os\n>>> from PIL import Image, ImageDraw\n\n>>> image = cppe5[\"train\"][2][\"image\"]\n>>> annotations = cppe5[\"train\"][2][\"objects\"]\n>>> draw = ImageDraw.Draw(image)\n>>> width, height = image.size  # needed to re-scale normalized coordinates\n\n>>> categories = cppe5[\"train\"].features[\"objects\"][\"category\"].feature.names\n\n>>> id2label = {index: x for index, x in enumerate(categories, start=0)}\n>>> label2id = {v: k for k, v in id2label.items()}\n\n>>> for i in range(len(annotations[\"id\"])):\n...     box = annotations[\"bbox\"][i]\n...     class_idx = annotations[\"category\"][i]\n...     x, y, w, h = tuple(box)\n...     # Check if coordinates are normalized or not\n...     if max(box) > 1.0:\n...         # Coordinates are un-normalized, no need to re-scale them\n...         x1, y1 = int(x), int(y)\n...         x2, y2 = int(x + w), int(y + h)\n...     else:\n...         # Coordinates are normalized, re-scale them\n...         x1 = int(x * width)\n...         y1 = int(y * height)\n...         x2 = int((x + w) * width)\n...         y2 = int((y + h) * height)\n...     # Draw with the pixel coordinates computed above\n...     draw.rectangle((x1, y1, x2, y2), outline=\"red\", width=1)\n...     draw.text((x1, y1), id2label[class_idx], fill=\"white\")\n\n>>> image\n```\n\nTo visualize the bounding boxes with associated labels, you can get the labels from the dataset's metadata, specifically\nthe `category` field.\nYou'll also want to create dictionaries that map a label id to a label class (`id2label`) and the other way around (`label2id`).\nYou can use them later when setting up the model. Including these maps will make your model reusable by others if you share\nit on the Hugging Face Hub. Note that the part of the code above that draws the bounding boxes assumes they are in `COCO` format `(x_min, y_min, width, height)`. 
It has to be adjusted to work for other formats like `(x_min, y_min, x_max, y_max)`.\n\nAs a final step of getting familiar with the data, explore it for potential issues. One common problem with datasets for\nobject detection is bounding boxes that \"stretch\" beyond the edge of the image. Such \"runaway\" bounding boxes can raise\nerrors during training and should be addressed. There are a few examples with this issue in this dataset.\nTo keep things simple in this guide, we will set `clip=True` for `BboxParams` in transformations below.\n\n## Preprocess the data\n\nTo finetune a model, you must preprocess the data you plan to use to match precisely the approach used for the pre-trained model.\n[AutoImageProcessor](/docs/transformers/v5.1.0/en/model_doc/auto#transformers.AutoImageProcessor) takes care of processing image data to create `pixel_values`, `pixel_mask`, and\n`labels` that a DETR model can train with. The image processor has some attributes that you won't have to worry about:\n\n- `image_mean = [0.485, 0.456, 0.406 ]`\n- `image_std = [0.229, 0.224, 0.225]`\n\nThese are the mean and standard deviation used to normalize images during the model pre-training. These values are crucial\nto replicate when doing inference or finetuning a pre-trained image model.\n\nInstantiate the image processor from the same checkpoint as the model you want to finetune.\n\n```py\n>>> from transformers import AutoImageProcessor\n\n>>> MAX_SIZE = IMAGE_SIZE\n\n>>> image_processor = AutoImageProcessor.from_pretrained(\n...     MODEL_NAME,\n...     do_resize=True,\n...     size={\"max_height\": MAX_SIZE, \"max_width\": MAX_SIZE},\n...     do_pad=True,\n...     pad_size={\"height\": MAX_SIZE, \"width\": MAX_SIZE},\n... 
)\n```\n\nBefore passing the images to the `image_processor`, apply two preprocessing transformations to the dataset:\n\n- Augmenting images\n- Reformatting annotations to meet DETR expectations\n\nFirst, to make sure the model does not overfit on the training data, you can apply image augmentation with any data augmentation library. Here we use [Albumentations](https://albumentations.ai/docs/).\nThis library ensures that transformations affect the image and update the bounding boxes accordingly.\nThe 🤗 Datasets library documentation has a detailed [guide on how to augment images for object detection](https://huggingface.co/docs/datasets/object_detection),\nand it uses the exact same dataset as an example. Apply some geometric and color transformations to the image. For additional augmentation options, explore the [Albumentations Demo Space](https://huggingface.co/spaces/qubvel-hf/albumentations-demo).\n\n```py\n>>> import albumentations as A\n\n>>> train_augment_and_transform = A.Compose(\n...     [\n...         A.Perspective(p=0.1),\n...         A.HorizontalFlip(p=0.5),\n...         A.RandomBrightnessContrast(p=0.5),\n...         A.HueSaturationValue(p=0.1),\n...     ],\n...     bbox_params=A.BboxParams(format=\"coco\", label_fields=[\"category\"], clip=True, min_area=25),\n... )\n\n>>> validation_transform = A.Compose(\n...     [A.NoOp()],\n...     bbox_params=A.BboxParams(format=\"coco\", label_fields=[\"category\"], clip=True),\n... )\n```\n\nThe `image_processor` expects the annotations to be in the following format: `{'image_id': int, 'annotations': list[Dict]}`,\n where each dictionary is a COCO object annotation. Let's add a function to reformat annotations for a single example:\n\n```py\n>>> def format_image_annotations_as_coco(image_id, categories, areas, bboxes):\n...     \"\"\"Format one set of image annotations to the COCO format\n\n...     Args:\n...         image_id (str): image id. e.g. \"0001\"\n...         
categories (list[int]): list of categories/class labels corresponding to provided bounding boxes\n...         areas (list[float]): list of corresponding areas to provided bounding boxes\n...         bboxes (list[tuple[float]]): list of bounding boxes provided in COCO format\n...             ([x_min, y_min, width, height] in absolute coordinates)\n\n...     Returns:\n...         dict: {\n...             \"image_id\": image id,\n...             \"annotations\": list of formatted annotations\n...         }\n...     \"\"\"\n...     annotations = []\n...     for category, area, bbox in zip(categories, areas, bboxes):\n...         formatted_annotation = {\n...             \"image_id\": image_id,\n...             \"category_id\": category,\n...             \"iscrowd\": 0,\n...             \"area\": area,\n...             \"bbox\": list(bbox),\n...         }\n...         annotations.append(formatted_annotation)\n\n...     return {\n...         \"image_id\": image_id,\n...         \"annotations\": annotations,\n...     }\n\n```\n\nNow you can combine the image and annotation transformations to use on a batch of examples:\n\n```py\n>>> def augment_and_transform_batch(examples, transform, image_processor, return_pixel_mask=False):\n...     \"\"\"Apply augmentations and format annotations in COCO format for object detection task\"\"\"\n\n...     images = []\n...     annotations = []\n...     for image_id, image, objects in zip(examples[\"image_id\"], examples[\"image\"], examples[\"objects\"]):\n...         image = np.array(image.convert(\"RGB\"))\n\n...         # apply augmentations\n...         output = transform(image=image, bboxes=objects[\"bbox\"], category=objects[\"category\"])\n...         images.append(output[\"image\"])\n\n...         # format annotations in COCO format\n...         formatted_annotations = format_image_annotations_as_coco(\n...             image_id, output[\"category\"], objects[\"area\"], output[\"bboxes\"]\n...         )\n...         
annotations.append(formatted_annotations)\n\n...     # Apply the image processor transformations: resizing, rescaling, normalization\n...     result = image_processor(images=images, annotations=annotations, return_tensors=\"pt\")\n\n...     if not return_pixel_mask:\n...         result.pop(\"pixel_mask\", None)\n\n...     return result\n```\n\nApply this preprocessing function to the entire dataset using the 🤗 Datasets [with_transform](https://huggingface.co/docs/datasets/v4.5.0/en/package_reference/main_classes#datasets.Dataset.with_transform) method. This method applies\ntransformations on the fly when you load an element of the dataset.\n\nAt this point, you can check what an example from the dataset looks like after the transformations. You should see a tensor\nwith `pixel_values` and a `labels` dictionary; `pixel_mask` is dropped because `return_pixel_mask` defaults to `False`.\n\n```py\n>>> from functools import partial\n\n>>> # Make transform functions for batch and apply for dataset splits\n>>> train_transform_batch = partial(\n...     augment_and_transform_batch, transform=train_augment_and_transform, image_processor=image_processor\n... )\n>>> validation_transform_batch = partial(\n...     augment_and_transform_batch, transform=validation_transform, image_processor=image_processor\n... 
)\n\n>>> cppe5[\"train\"] = cppe5[\"train\"].with_transform(train_transform_batch)\n>>> cppe5[\"validation\"] = cppe5[\"validation\"].with_transform(validation_transform_batch)\n>>> cppe5[\"test\"] = cppe5[\"test\"].with_transform(validation_transform_batch)\n\n>>> cppe5[\"train\"][15]\n{'pixel_values': tensor([[[ 1.9235,  1.9407,  1.9749,  ..., -0.7822, -0.7479, -0.6965],\n          [ 1.9578,  1.9749,  1.9920,  ..., -0.7993, -0.7650, -0.7308],\n          [ 2.0092,  2.0092,  2.0263,  ..., -0.8507, -0.8164, -0.7822],\n          ...,\n          [ 0.0741,  0.0741,  0.0741,  ...,  0.0741,  0.0741,  0.0741],\n          [ 0.0741,  0.0741,  0.0741,  ...,  0.0741,  0.0741,  0.0741],\n          [ 0.0741,  0.0741,  0.0741,  ...,  0.0741,  0.0741,  0.0741]],\n\n          [[ 1.6232,  1.6408,  1.6583,  ...,  0.8704,  1.0105,  1.1331],\n          [ 1.6408,  1.6583,  1.6758,  ...,  0.8529,  0.9930,  1.0980],\n          [ 1.6933,  1.6933,  1.7108,  ...,  0.8179,  0.9580,  1.0630],\n          ...,\n          [ 0.2052,  0.2052,  0.2052,  ...,  0.2052,  0.2052,  0.2052],\n          [ 0.2052,  0.2052,  0.2052,  ...,  0.2052,  0.2052,  0.2052],\n          [ 0.2052,  0.2052,  0.2052,  ...,  0.2052,  0.2052,  0.2052]],\n\n          [[ 1.8905,  1.9080,  1.9428,  ..., -0.1487, -0.0964, -0.0615],\n          [ 1.9254,  1.9428,  1.9603,  ..., -0.1661, -0.1138, -0.0790],\n          [ 1.9777,  1.9777,  1.9951,  ..., -0.2010, -0.1138, -0.0790],\n          ...,\n          [ 0.4265,  0.4265,  0.4265,  ...,  0.4265,  0.4265,  0.4265],\n          [ 0.4265,  0.4265,  0.4265,  ...,  0.4265,  0.4265,  0.4265],\n          [ 0.4265,  0.4265,  0.4265,  ...,  0.4265,  0.4265,  0.4265]]]),\n  'labels': {'image_id': tensor([688]), 'class_labels': tensor([3, 4, 2, 0, 0]), 'boxes': tensor([[0.4700, 0.1933, 0.1467, 0.0767],\n          [0.4858, 0.2600, 0.1150, 0.1000],\n          [0.4042, 0.4517, 0.1217, 0.1300],\n          [0.4242, 0.3217, 0.3617, 0.5567],\n          [0.6617, 0.4033, 0.5400, 0.4533]]), 'area': 
tensor([ 4048.,  4140.,  5694., 72478., 88128.]), 'iscrowd': tensor([0, 0, 0, 0, 0]), 'orig_size': tensor([480, 480])}}\n```\n\nYou have successfully augmented the individual images and prepared their annotations. However, preprocessing isn't\ncomplete yet. In the final step, create a custom `collate_fn` to batch images together.\nThe image processor has already padded every image to the same size, so `collate_fn` only needs to stack the `pixel_values`,\nkeep the `labels` as a list, and stack the `pixel_mask` (1 for real pixels, 0 for padding) when it is present.\n\n```py\n>>> import torch\n\n>>> def collate_fn(batch):\n...     data = {}\n...     data[\"pixel_values\"] = torch.stack([x[\"pixel_values\"] for x in batch])\n...     data[\"labels\"] = [x[\"labels\"] for x in batch]\n...     if \"pixel_mask\" in batch[0]:\n...         data[\"pixel_mask\"] = torch.stack([x[\"pixel_mask\"] for x in batch])\n...     return data\n\n```\n\n## Preparing function to compute mAP\n\nObject detection models are commonly evaluated with a set of COCO-style metrics. We are going to use `torchmetrics` to compute the `mAP` (mean average precision) and `mAR` (mean average recall) metrics and wrap the computation in a `compute_metrics` function so that [Trainer](/docs/transformers/v5.1.0/en/main_classes/trainer#transformers.Trainer) can use it for evaluation.\n\nThe intermediate box format used for training is `YOLO` (normalized), but we will compute metrics for boxes in `Pascal VOC` (absolute) format in order to correctly handle box areas. Let's define a function that converts bounding boxes to `Pascal VOC` format:\n\n```py\n>>> from transformers.image_transforms import center_to_corners_format\n\n>>> def convert_bbox_yolo_to_pascal(boxes, image_size):\n...     \"\"\"\n...     Convert bounding boxes from YOLO format (x_center, y_center, width, height) in range [0, 1]\n...     to Pascal VOC format (x_min, y_min, x_max, y_max) in absolute coordinates.\n\n...     Args:\n...         boxes (torch.Tensor): Bounding boxes in YOLO format\n...         
image_size (tuple[int, int]): Image size in format (height, width)\n\n...     Returns:\n...         torch.Tensor: Bounding boxes in Pascal VOC format (x_min, y_min, x_max, y_max)\n...     \"\"\"\n...     # convert center to corners format\n...     boxes = center_to_corners_format(boxes)\n\n...     # convert to absolute coordinates\n...     height, width = image_size\n...     boxes = boxes * torch.tensor([[width, height, width, height]])\n\n...     return boxes\n```\n\nThen, in the `compute_metrics` function, we collect the predicted and target bounding boxes, scores, and labels from the evaluation loop results and pass them to the scoring function.\n\n```py\n>>> import numpy as np\n>>> from dataclasses import dataclass\n>>> from torchmetrics.detection.mean_ap import MeanAveragePrecision\n\n>>> @dataclass\n... class ModelOutput:\n...     logits: torch.Tensor\n...     pred_boxes: torch.Tensor\n\n>>> @torch.no_grad()\n... def compute_metrics(evaluation_results, image_processor, threshold=0.0, id2label=None):\n...     \"\"\"\n...     Compute mAP, mAR and their variants for the object detection task.\n\n...     Args:\n...         evaluation_results (EvalPrediction): Predictions and targets from evaluation.\n...         threshold (float, optional): Threshold to filter predicted boxes by confidence. Defaults to 0.0.\n...         id2label (Optional[dict], optional): Mapping from class id to class name. Defaults to None.\n\n...     Returns:\n...         Mapping[str, float]: Metrics in a form of dictionary {<metric_name>: <metric_value>}\n...     \"\"\"\n\n...     predictions, targets = evaluation_results.predictions, evaluation_results.label_ids\n\n...     # For metric computation we need to provide:\n...     #  - targets in a form of list of dictionaries with keys \"boxes\", \"labels\"\n...     #  - predictions in a form of list of dictionaries with keys \"boxes\", \"scores\", \"labels\"\n\n...     image_sizes = []\n...     post_processed_targets = []\n...     post_processed_predictions = []\n\n... 
    # Collect targets in the required format for metric computation\n...     for batch in targets:\n...         # collect image sizes, we will need them for predictions post processing\n...         batch_image_sizes = torch.tensor(np.array([x[\"orig_size\"] for x in batch]))\n...         image_sizes.append(batch_image_sizes)\n...         # collect targets in the required format for metric computation\n...         # boxes were converted to YOLO format needed for model training\n...         # here we will convert them to Pascal VOC format (x_min, y_min, x_max, y_max)\n...         for image_target in batch:\n...             boxes = torch.tensor(image_target[\"boxes\"])\n...             boxes = convert_bbox_yolo_to_pascal(boxes, image_target[\"orig_size\"])\n...             labels = torch.tensor(image_target[\"class_labels\"])\n...             post_processed_targets.append({\"boxes\": boxes, \"labels\": labels})\n\n...     # Collect predictions in the required format for metric computation,\n...     # model produce boxes in YOLO format, then image_processor convert them to Pascal VOC format\n...     for batch, target_sizes in zip(predictions, image_sizes):\n...         batch_logits, batch_boxes = batch[1], batch[2]\n...         output = ModelOutput(logits=torch.tensor(batch_logits), pred_boxes=torch.tensor(batch_boxes))\n...         post_processed_output = image_processor.post_process_object_detection(\n...             output, threshold=threshold, target_sizes=target_sizes\n...         )\n...         post_processed_predictions.extend(post_processed_output)\n\n...     # Compute metrics\n...     metric = MeanAveragePrecision(box_format=\"xyxy\", class_metrics=True)\n...     metric.update(post_processed_predictions, post_processed_targets)\n...     metrics = metric.compute()\n\n...     # Replace list of per class metrics with separate metric for each class\n...     classes = metrics.pop(\"classes\")\n...     map_per_class = metrics.pop(\"map_per_class\")\n...     
mar_100_per_class = metrics.pop(\"mar_100_per_class\")\n...     for class_id, class_map, class_mar in zip(classes, map_per_class, mar_100_per_class):\n...         class_name = id2label[class_id.item()] if id2label is not None else class_id.item()\n...         metrics[f\"map_{class_name}\"] = class_map\n...         metrics[f\"mar_100_{class_name}\"] = class_mar\n\n...     metrics = {k: round(v.item(), 4) for k, v in metrics.items()}\n\n...     return metrics\n\n>>> eval_compute_metrics_fn = partial(\n...     compute_metrics, image_processor=image_processor, id2label=id2label, threshold=0.0\n... )\n```\n\n## Training the detection model\n\nYou have done most of the heavy lifting in the previous sections, so now you are ready to train your model!\nThe images in this dataset are still quite large, even after resizing. This means that finetuning this model will\nrequire at least one GPU.\n\nTraining involves the following steps:\n\n1. Load the model with [AutoModelForObjectDetection](/docs/transformers/v5.1.0/en/model_doc/auto#transformers.AutoModelForObjectDetection) using the same checkpoint as in the preprocessing.\n2. Define your training hyperparameters in [TrainingArguments](/docs/transformers/v5.1.0/en/main_classes/trainer#transformers.TrainingArguments).\n3. Pass the training arguments to [Trainer](/docs/transformers/v5.1.0/en/main_classes/trainer#transformers.Trainer) along with the model, dataset, image processor, and data collator.\n4. Call [train()](/docs/transformers/v5.1.0/en/main_classes/trainer#transformers.Trainer.train) to finetune your model.\n\nWhen loading the model from the same checkpoint that you used for the preprocessing, remember to pass the `label2id`\nand `id2label` maps that you created earlier from the dataset's metadata. 
Additionally, we specify `ignore_mismatched_sizes=True` to replace the existing classification head with a new one.\n\n```py\n>>> from transformers import AutoModelForObjectDetection\n\n>>> model = AutoModelForObjectDetection.from_pretrained(\n...     MODEL_NAME,\n...     id2label=id2label,\n...     label2id=label2id,\n...     ignore_mismatched_sizes=True,\n... )\n```\n\nIn the [TrainingArguments](/docs/transformers/v5.1.0/en/main_classes/trainer#transformers.TrainingArguments), use `output_dir` to specify where to save your model, then configure hyperparameters as you see fit. With `num_train_epochs=30`, training will take about 35 minutes on a Google Colab T4 GPU; increase the number of epochs to get better results.\n\nImportant notes:\n\n- Set `remove_unused_columns` to `False`.\n- Set `eval_do_concat_batches=False` to get proper evaluation results. Images have different numbers of target boxes, so if batches are concatenated we will not be able to determine which boxes belong to a particular image.\n\nIf you wish to share your model by pushing to the Hub, set `push_to_hub` to `True` (you must be signed in to Hugging\nFace to upload your model).\n\n```py\n>>> from transformers import TrainingArguments\n\n>>> training_args = TrainingArguments(\n...     output_dir=\"detr_finetuned_cppe5\",\n...     num_train_epochs=30,\n...     fp16=False,\n...     per_device_train_batch_size=8,\n...     dataloader_num_workers=4,\n...     learning_rate=5e-5,\n...     lr_scheduler_type=\"cosine\",\n...     weight_decay=1e-4,\n...     max_grad_norm=0.01,\n...     metric_for_best_model=\"eval_map\",\n...     greater_is_better=True,\n...     load_best_model_at_end=True,\n...     eval_strategy=\"epoch\",\n...     save_strategy=\"epoch\",\n...     save_total_limit=2,\n...     remove_unused_columns=False,\n...     report_to=\"trackio\",\n...     run_name=\"cppe\",\n...     eval_do_concat_batches=False,\n...     push_to_hub=True,\n... 
)\n```\n\nFinally, bring everything together, and call [train()](/docs/transformers/v5.1.0/en/main_classes/trainer#transformers.Trainer.train):\n\n```py\n>>> from transformers import Trainer\n\n>>> trainer = Trainer(\n...     model=model,\n...     args=training_args,\n...     train_dataset=cppe5[\"train\"],\n...     eval_dataset=cppe5[\"validation\"],\n...     processing_class=image_processor,\n...     data_collator=collate_fn,\n...     compute_metrics=eval_compute_metrics_fn,\n... )\n\n>>> trainer.train()\n```\n\nTraining runs for 30 epochs (~26 minutes on a T4 GPU for CPPE-5). Final epoch 30 results:\n\n| Metric | Value |\n|--------|-------|\n| Training Loss | 0.994 |\n| Validation Loss | 1.346 |\n| mAP | 0.277 |\n| mAP@50 | 0.555 |\n| mAP@75 | 0.253 |\n| mAR@100 | 0.443 |\n\nPer-class mAP at epoch 30: Coverall 0.530, Face Shield 0.276, Gloves 0.175, Goggles 0.157, Mask 0.249.\n\nKey observations:\n- mAP improves rapidly in early epochs (0.009 at epoch 1 → 0.18 by epoch 10), then gradually converges\n- Large objects are detected better (mAP_large=0.524) than small objects (mAP_small=0.148)\n- Class imbalance visible: Coverall highest mAP (0.530), Goggles lowest (0.157)\n\n<!-- Full per-epoch training metrics table omitted for brevity. -->\n\n\nIf you have set `push_to_hub` to `True` in the `training_args`, the training checkpoints are pushed to the\nHugging Face Hub. 
Upon training completion, push the final model to the Hub as well by calling the [push_to_hub()](/docs/transformers/v5.1.0/en/main_classes/trainer#transformers.Trainer.push_to_hub) method.\n\n```py\n>>> trainer.push_to_hub()\n```\n\n## Evaluate\n\n```py\n>>> from pprint import pprint\n\n>>> metrics = trainer.evaluate(eval_dataset=cppe5[\"test\"], metric_key_prefix=\"test\")\n>>> pprint(metrics)\n{'epoch': 30.0,\n  'test_loss': 1.0877351760864258,\n  'test_map': 0.4116,\n  'test_map_50': 0.741,\n  'test_map_75': 0.3663,\n  'test_map_Coverall': 0.5937,\n  'test_map_Face_Shield': 0.5863,\n  'test_map_Gloves': 0.3416,\n  'test_map_Goggles': 0.1468,\n  'test_map_Mask': 0.3894,\n  'test_map_large': 0.5637,\n  'test_map_medium': 0.3257,\n  'test_map_small': 0.3589,\n  'test_mar_1': 0.323,\n  'test_mar_10': 0.5237,\n  'test_mar_100': 0.5587,\n  'test_mar_100_Coverall': 0.6756,\n  'test_mar_100_Face_Shield': 0.7294,\n  'test_mar_100_Gloves': 0.4721,\n  'test_mar_100_Goggles': 0.4125,\n  'test_mar_100_Mask': 0.5038,\n  'test_mar_large': 0.7283,\n  'test_mar_medium': 0.4901,\n  'test_mar_small': 0.4469,\n  'test_runtime': 1.6526,\n  'test_samples_per_second': 17.548,\n  'test_steps_per_second': 2.42}\n```\n\nThese results can be further improved by adjusting the hyperparameters in [TrainingArguments](/docs/transformers/v5.1.0/en/main_classes/trainer#transformers.TrainingArguments). 
Give it a go!\n\n## Inference\n\nNow that you have finetuned a model, evaluated it, and uploaded it to the Hugging Face Hub, you can use it for inference.\n\n```py\n>>> import torch\n>>> import requests\n\n>>> from PIL import Image, ImageDraw\n>>> from transformers import AutoImageProcessor, AutoModelForObjectDetection\n\n>>> url = \"https://images.pexels.com/photos/8413299/pexels-photo-8413299.jpeg?auto=compress&cs=tinysrgb&w=630&h=375&dpr=2\"\n>>> image = Image.open(requests.get(url, stream=True).raw)\n```\n\nLoad the model and image processor from the Hugging Face Hub (skip the two `from_pretrained` calls to keep using the model trained in this session):\n\n```py\n>>> from accelerate import Accelerator\n\n>>> device = Accelerator().device\n>>> model_repo = \"qubvel-hf/detr_finetuned_cppe5\"\n\n>>> image_processor = AutoImageProcessor.from_pretrained(model_repo)\n>>> model = AutoModelForObjectDetection.from_pretrained(model_repo)\n>>> model = model.to(device)\n```\n\nThen detect the bounding boxes:\n\n```py\n>>> with torch.no_grad():\n...     inputs = image_processor(images=[image], return_tensors=\"pt\")\n...     outputs = model(**inputs.to(device))\n...     target_sizes = torch.tensor([[image.size[1], image.size[0]]])\n...     results = image_processor.post_process_object_detection(outputs, threshold=0.3, target_sizes=target_sizes)[0]\n\n>>> for score, label, box in zip(results[\"scores\"], results[\"labels\"], results[\"boxes\"]):\n...     box = [round(i, 2) for i in box.tolist()]\n...     print(\n...         f\"Detected {model.config.id2label[label.item()]} with confidence \"\n...         f\"{round(score.item(), 3)} at location {box}\"\n...     
)\nDetected Gloves with confidence 0.683 at location [244.58, 124.33, 300.35, 185.13]\nDetected Mask with confidence 0.517 at location [143.73, 64.58, 219.57, 125.89]\nDetected Gloves with confidence 0.425 at location [179.15, 155.57, 262.4, 226.35]\nDetected Coverall with confidence 0.407 at location [307.13, -1.18, 477.82, 318.06]\nDetected Coverall with confidence 0.391 at location [68.61, 126.66, 309.03, 318.89]\n```\n\nLet's plot the result:\n\n```py\n>>> draw = ImageDraw.Draw(image)\n\n>>> for score, label, box in zip(results[\"scores\"], results[\"labels\"], results[\"boxes\"]):\n...     box = [round(i, 2) for i in box.tolist()]\n...     x, y, x2, y2 = tuple(box)\n...     draw.rectangle((x, y, x2, y2), outline=\"red\", width=1)\n...     draw.text((x, y), model.config.id2label[label.item()], fill=\"white\")\n\n>>> image\n```\n\n    \n\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/reliability_principles.md",
    "content": "# Reliability Principles for Training Jobs\n\n## Contents\n- Principle 1: Always Verify Before Use\n- Principle 2: Prioritize Reliability Over Performance\n- Principle 3: Create Atomic, Self-Contained Scripts\n- Principle 4: Provide Clear Error Context\n- Principle 5: Test the Happy Path on Known-Good Inputs\n- Summary: The Reliability Checklist (pre-flight, script quality, job config)\n- When Principles Conflict\n\n---\n\nThese principles are derived from real production failures and successful fixes. Following them prevents common failure modes and ensures reliable job execution.\n\n## Principle 1: Always Verify Before Use\n\n**Rule:** Never assume repos, datasets, or resources exist. Verify with tools first.\n\n### What It Prevents\n\n- **Non-existent datasets** - Jobs fail immediately when dataset doesn't exist\n- **Typos in names** - Simple mistakes like \"argilla-dpo-mix-7k\" vs \"ultrafeedback_binarized\"\n- **Incorrect paths** - Old or moved repos, renamed files\n- **Missing dependencies** - Undocumented requirements\n\n### How to Apply\n\n**Before submitting ANY job:**\n\n```python\n# Verify dataset exists\ndataset_search({\"query\": \"dataset-name\", \"author\": \"author-name\", \"limit\": 5})\nhub_repo_details([\"author/dataset-name\"], repo_type=\"dataset\")\n\n# Verify model exists\nhub_repo_details([\"org/model-name\"], repo_type=\"model\")\n\n# Check script/file paths (for URL-based scripts)\n# Verify before using: https://github.com/user/repo/blob/main/script.py\n```\n\n**Examples that would have caught errors:**\n\n```python\n# ❌ WRONG: Assumed dataset exists\nhf_jobs(\"uv\", {\n    \"script\": \"\"\"...\"\"\",\n    \"env\": {\"DATASET\": \"trl-lib/argilla-dpo-mix-7k\"}  # Doesn't exist!\n})\n\n# ✅ CORRECT: Verify first\ndataset_search({\"query\": \"argilla dpo\", \"author\": \"trl-lib\"})\n# Would show: \"trl-lib/ultrafeedback_binarized\" is the correct name\n\nhub_repo_details([\"trl-lib/ultrafeedback_binarized\"], 
repo_type=\"dataset\")\n# Confirms it exists before using\n```\n\n### Implementation Checklist\n\n- [ ] Check dataset exists before training\n- [ ] Test script URLs are valid before submitting\n- [ ] Check for recent updates/renames of resources\n- [ ] Check for dataset format\n\n**Time cost:** 5-10 seconds  \n**Time saved:** Hours of failed job time + debugging\n\n---\n\n## Principle 2: Prioritize Reliability Over Performance\n\n**Rule:** Default to what is most likely to succeed, not what is theoretically fastest.\n\n### What It Prevents\n\n- **Hardware incompatibilities** - Features that fail on certain GPUs\n- **Unstable optimizations** - Speed-ups that cause crashes\n- **Complex configurations** - More failure points\n- **Build system issues** - Unreliable compilation methods\n\n### How to Apply\n\n**Choose reliability:**\n\n```python\n# ❌ RISKY: Aggressive optimization that may fail\nTrainingArguments(\n    torch_compile=True,  # Can fail on T4, A10G GPUs\n    optim=\"adamw_bnb_8bit\",  # Requires specific setup\n    dataloader_num_workers=8,  # May cause OOM on small instances\n    ...\n)\n\n# ✅ SAFE: Proven defaults\nTrainingArguments(\n    # torch_compile=True,  # Commented with note: \"Enable on H100 for 20% speedup\"\n    optim=\"adamw_torch\",  # Standard, always works\n    fp16=True,  # Stable and fast on T4/A10G\n    dataloader_num_workers=4,  # Conservative, reliable\n    ...\n)\n```\n\n### Real-World Example\n\n**The `torch.compile` failure:**\n- Added for \"20% speedup\" on H100\n- **Failed fatally on T4-medium** with cryptic error\n- Misdiagnosed as dataset issue (cost hours)\n- **Fix:** Disable by default, add as optional comment\n\n**Result:** Reliability > 20% performance gain\n\n### Implementation Checklist\n\n- [ ] Use proven, standard configurations by default\n- [ ] Comment out performance optimizations with hardware notes\n- [ ] Use stable build systems (CMake > make)\n- [ ] Test on target hardware before production\n- [ ] Document known 
incompatibilities\n- [ ] Provide \"safe\" and \"fast\" variants when needed\n\n**Performance loss:** 10-20% in best case  \n**Reliability gain:** 95%+ success rate vs 60-70%\n\n---\n\n## Principle 3: Create Atomic, Self-Contained Scripts\n\n**Rule:** Scripts should work as complete, independent units. Don't remove parts to \"simplify.\"\n\n### What It Prevents\n\n- **Missing dependencies** - Removed \"unnecessary\" packages that are actually required\n- **Incomplete processes** - Skipped steps that seem redundant\n- **Environment assumptions** - Scripts that need pre-setup\n- **Partial failures** - Some parts work, others fail silently\n\n### How to Apply\n\n**Complete dependency specifications:**\n\n```python\n# ❌ INCOMPLETE: \"Simplified\" by removing dependencies\n# /// script\n# dependencies = [\n#     \"transformers\",\n#     \"torch\",\n#     \"datasets\",\n# ]\n# ///\n\n# ✅ COMPLETE: All dependencies explicit\n# /// script\n# dependencies = [\n#     \"transformers>=5.2.0\",\n#     \"accelerate>=1.1.0\",\n#     \"albumentations>=1.4.16\",  # Required for augmentation + bbox handling\n#     \"timm\",                     # Required for vision backbones\n#     \"datasets>=4.0\",\n#     \"torchmetrics\",             # Required for mAP/mAR computation\n#     \"pycocotools\",              # Required for COCO evaluation\n#     \"trackio\",                  # Required for metrics monitoring\n#     \"huggingface_hub\",\n# ]\n# ///\n```\n\n### Real-World Example\n\n**The `albumentations` failure:**\n- Original script had it: augmentations and bbox clipping worked fine\n- \"Simplified\" version removed it: \"not strictly needed for training\"\n- **Training crashed on bbox augmentation** — no fallback for COCO-format bbox handling\n- Hard to debug: error appeared in data loading, not in augmentation setup\n- **Fix:** Restore all original dependencies\n\n**Result:** Don't remove dependencies without thorough testing\n\n### Implementation Checklist\n\n- [ ] All 
dependencies in PEP 723 header with version pins\n- [ ] All system packages installed by script\n- [ ] No assumptions about pre-existing environment\n- [ ] No \"optional\" steps that are actually required\n- [ ] Test scripts in clean environment\n- [ ] Document why each dependency is needed\n\n**Complexity:** Slightly longer scripts  \n**Reliability:** Scripts \"just work\" every time\n\n---\n\n## Principle 4: Provide Clear Error Context\n\n**Rule:** When things fail, make it obvious what went wrong and how to fix it.\n\n### How to Apply\n\n**Wrap subprocess calls:**\n\n```python\n# ❌ UNCLEAR: Silent failure\nsubprocess.run([...], check=True, capture_output=True)\n\n# ✅ CLEAR: Shows what failed\ntry:\n    result = subprocess.run(\n        [...],\n        check=True,\n        capture_output=True,\n        text=True\n    )\n    print(result.stdout)\n    if result.stderr:\n        print(\"Warnings:\", result.stderr)\nexcept subprocess.CalledProcessError as e:\n    print(f\"❌ Command failed!\")\n    print(\"STDOUT:\", e.stdout)\n    print(\"STDERR:\", e.stderr)\n    raise\n```\n\n**Validate inputs:**\n\n```python\n# ❌ UNCLEAR: Fails later with cryptic error\nmodel = load_model(MODEL_NAME)\n\n# ✅ CLEAR: Fails fast with clear message\nif not MODEL_NAME:\n    raise ValueError(\"MODEL_NAME environment variable not set!\")\n\nprint(f\"Loading model: {MODEL_NAME}\")\ntry:\n    model = load_model(MODEL_NAME)\n    print(f\"✅ Model loaded successfully\")\nexcept Exception as e:\n    print(f\"❌ Failed to load model: {MODEL_NAME}\")\n    print(f\"Error: {e}\")\n    print(\"Hint: Check that model exists on Hub\")\n    raise\n```\n\n### Implementation Checklist\n\n- [ ] Wrap external calls with try/except\n- [ ] Print stdout/stderr on failure\n- [ ] Validate environment variables early\n- [ ] Add progress indicators (✅, ❌, 🔄)\n- [ ] Include hints for common failures\n- [ ] Log configuration at start\n\n---\n\n## Principle 5: Test the Happy Path on Known-Good Inputs\n\n**Rule:** 
Before using new code in production, test with inputs you know work.\n\n## Summary: The Reliability Checklist\n\nBefore submitting ANY job:\n\n### Pre-Flight Checks\n- [ ] **Verified** all repos/datasets exist (hub_repo_details)\n- [ ] **Tested** with known-good inputs if new code\n- [ ] **Using** proven hardware/configuration\n- [ ] **Included** all dependencies in PEP 723 header\n- [ ] **Installed** system requirements (build tools, etc.)\n- [ ] **Set** appropriate timeout (not default 30m)\n- [ ] **Configured** Hub push with HF_TOKEN (login() + hub_token)\n- [ ] **Added** clear error handling\n\n### Script Quality\n- [ ] Self-contained (no external setup needed)\n- [ ] Complete dependencies listed\n- [ ] Build tools installed by script\n- [ ] Progress indicators included\n- [ ] Error messages are clear\n- [ ] Configuration logged at start\n\n### Job Configuration\n- [ ] Timeout > expected runtime + 30% buffer\n- [ ] Hardware appropriate for model size\n- [ ] Secrets include HF_TOKEN (see SKILL.md directive #2 for syntax)\n- [ ] Script calls `login(token=hf_token)` and sets `training_args.hub_token = hf_token` BEFORE `Trainer()` init\n- [ ] Environment variables set correctly\n- [ ] Cost estimated and acceptable\n\n**Following these principles transforms job success rate from ~60-70% to ~95%+**\n\n---\n\n## When Principles Conflict\n\nSometimes reliability and performance conflict. Here's how to choose:\n\n| Scenario | Choose | Rationale |\n|----------|--------|-----------|\n| Demo/test | Reliability | Fast failure is worse than slow success |\n| Production (first run) | Reliability | Prove it works before optimizing |\n| Production (proven) | Performance | Safe to optimize after validation |\n| Time-critical | Reliability | Failures cause more delay than slow runs |\n| Cost-critical | Balanced | Test with small model, then optimize |\n\n**General rule:** Reliability first, optimize second.\n\n---\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/references/timm_trainer.md",
    "content": "# Using timm models with Hugging Face Trainer\n\nTransformers has first-class support for timm models via the `TimmWrapper` classes. You can load any timm model and use it directly with the `Trainer` API for image classification. Here's how it works:\n\n## Loading a timm model\n\nThe `TimmWrapperForImageClassification` class (in `transformers/src/transformers/models/timm_wrapper/modeling_timm_wrapper.py`) wraps timm models so they're fully compatible with the Trainer API. You can load them via the `Auto` classes:\n\n```python\nfrom transformers import AutoModelForImageClassification, AutoImageProcessor, Trainer, TrainingArguments\n\n# Load a timm model for image classification\ncheckpoint = \"timm/resnet50.a1_in1k\"\nimage_processor = AutoImageProcessor.from_pretrained(checkpoint)\nmodel = AutoModelForImageClassification.from_pretrained(\n    checkpoint,\n    num_labels=10,  # set to your number of classes\n    ignore_mismatched_sizes=True,  # needed when changing num_labels from pretrained\n)\n```\n\n## Key details\n\n1. **Image processor**: The `TimmWrapperImageProcessor` automatically resolves the correct transforms from timm's config. It exposes both `val_transforms` and `train_transforms` (with augmentations), as noted in the code:\n\n```64:65:transformers/src/transformers/models/timm_wrapper/image_processing_timm_wrapper.py\n        # useful for training, see examples/pytorch/image-classification/run_image_classification.py\n        self.train_transforms = timm.data.create_transform(**self.data_config, is_training=True)\n```\n\n2. **Loss computation is built-in**: `TimmWrapperForImageClassification.forward()` accepts a `labels` argument and computes cross-entropy loss automatically, which is exactly what Trainer expects:\n\n```374:376:transformers/src/transformers/models/timm_wrapper/modeling_timm_wrapper.py\n        loss = None\n        if labels is not None:\n            loss = self.loss_function(labels, logits, self.config)\n```\n\n3. 
**Returns `ImageClassifierOutput`**: The output format is the standard transformers output, so Trainer handles it seamlessly.\n\n## Full training example\n\n```python\nfrom transformers import AutoModelForImageClassification, AutoImageProcessor, Trainer, TrainingArguments\nfrom datasets import load_dataset\n\n# Load dataset\ndataset = load_dataset(\"food101\", split=\"train[:5000]\")\ndataset = dataset.train_test_split(test_size=0.2)\n\n# Load timm model + processor\ncheckpoint = \"timm/resnet50.a1_in1k\"\nimage_processor = AutoImageProcessor.from_pretrained(checkpoint)\nmodel = AutoModelForImageClassification.from_pretrained(\n    checkpoint,\n    num_labels=101,\n    ignore_mismatched_sizes=True,\n)\n\n# Preprocessing: return only the keys the model accepts. With\n# remove_unused_columns=False, a leftover PIL \"image\" column would reach\n# the data collator and the model's forward() and fail there.\ndef transform(batch):\n    return {\n        \"pixel_values\": [image_processor(img.convert(\"RGB\"))[\"pixel_values\"][0] for img in batch[\"image\"]],\n        \"labels\": batch[\"label\"],\n    }\n\ndataset[\"train\"].set_transform(transform)\ndataset[\"test\"].set_transform(transform)\n\n# Train\ntraining_args = TrainingArguments(\n    output_dir=\"./timm-finetuned\",\n    num_train_epochs=3,\n    per_device_train_batch_size=16,\n    per_device_eval_batch_size=16,\n    eval_strategy=\"epoch\",\n    save_strategy=\"epoch\",\n    logging_steps=50,\n    remove_unused_columns=False,\n)\n\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=dataset[\"train\"],\n    eval_dataset=dataset[\"test\"],\n)\n\ntrainer.train()\n```\n\nAny timm checkpoint on the Hub (prefixed with `timm/`) works out of the box (ResNet, EfficientNet, ViT, ConvNeXt, etc.). The wrapper handles all the translation between timm's interface and what Trainer expects."
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/dataset_inspector.py",
    "content": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"\nDataset Format Inspector for Vision Model Training\n\nInspects Hugging Face datasets to determine compatibility with object detection\nand image classification training.\nUses Datasets Server API for instant results - no dataset download needed!\n\nULTRA-EFFICIENT: Uses HF Datasets Server API - completes in <2 seconds.\n\nUsage with HF Jobs:\n    hf_jobs(\"uv\", {\n        \"script\": \"path/to/dataset_inspector.py\",\n        \"script_args\": [\"--dataset\", \"your/dataset\", \"--split\", \"train\"]\n    })\n\"\"\"\n\nimport argparse\nimport math\nimport sys\nimport json\nimport urllib.request\nimport urllib.parse\nfrom typing import List, Dict, Any, Tuple\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Inspect dataset format for vision model training\")\n    parser.add_argument(\"--dataset\", type=str, required=True, help=\"Dataset name\")\n    parser.add_argument(\"--split\", type=str, default=\"train\", help=\"Dataset split (default: train)\")\n    parser.add_argument(\"--config\", type=str, default=\"default\", help=\"Dataset config name (default: default)\")\n    parser.add_argument(\"--preview\", type=int, default=150, help=\"Max chars per field preview\")\n    parser.add_argument(\"--samples\", type=int, default=5, help=\"Number of samples to fetch (default: 5)\")\n    parser.add_argument(\"--json-output\", action=\"store_true\", help=\"Output as JSON\")\n    return parser.parse_args()\n\n\ndef api_request(url: str) -> Dict:\n    \"\"\"Make API request to Datasets Server\"\"\"\n    try:\n        with urllib.request.urlopen(url, timeout=10) as response:\n            return json.loads(response.read().decode())\n    except urllib.error.HTTPError as e:\n        if e.code == 404:\n            return None\n        raise Exception(f\"API request failed: {e.code} {e.reason}\")\n    except Exception as e:\n        raise 
Exception(f\"API request failed: {str(e)}\")\n\n\ndef get_splits(dataset: str) -> Dict:\n    \"\"\"Get available splits for dataset\"\"\"\n    url = f\"https://datasets-server.huggingface.co/splits?dataset={urllib.parse.quote(dataset)}\"\n    return api_request(url)\n\n\ndef get_rows(dataset: str, config: str, split: str, offset: int = 0, length: int = 5) -> Dict:\n    \"\"\"Get rows from dataset\"\"\"\n    url = f\"https://datasets-server.huggingface.co/rows?dataset={urllib.parse.quote(dataset)}&config={config}&split={split}&offset={offset}&length={length}\"\n    return api_request(url)\n\n\ndef find_columns(columns: List[str], patterns: List[str]) -> List[str]:\n    \"\"\"Find columns matching patterns\"\"\"\n    return [c for c in columns if any(p in c.lower() for p in patterns)]\n\n\ndef detect_bbox_format(bbox: List[float], image_size: Tuple[int, int] = None) -> str:\n    \"\"\"\n    Detect bounding box format based on values and optionally image dimensions.\n    Common formats:\n    - [x_min, y_min, x_max, y_max] - XYXY (Pascal VOC)\n    - [x_min, y_min, width, height] - XYWH (COCO)\n    - [x_center, y_center, width, height] - CXCYWH (YOLO normalized)\n    \"\"\"\n    if len(bbox) != 4:\n        return \"unknown (not 4 values)\"\n\n    a, b, c, d = bbox\n\n    is_normalized = all(0 <= v <= 1 for v in bbox)\n\n    if c < a or d < b:\n        if is_normalized:\n            return \"xywh_normalized\"\n        return \"xywh (COCO style)\"\n\n    # c > a and d > b — ambiguous between xyxy and xywh.\n    # Use image dimensions to disambiguate when available.\n    if image_size is not None:\n        img_w, img_h = image_size\n        # If interpreting as xywh, right edge = a + c; if that overshoots the\n        # image while c alone fits, the format is more likely xyxy.\n        xywh_exceeds = (a + c > img_w * 1.05) or (b + d > img_h * 1.05)\n        xyxy_exceeds = (c > img_w * 1.05) or (d > img_h * 1.05)\n        if xywh_exceeds and not xyxy_exceeds:\n            
return \"xyxy (Pascal VOC style)\"\n        if xyxy_exceeds and not xywh_exceeds:\n            return \"xywh (COCO style)\"\n\n    if is_normalized:\n        return \"xyxy_normalized\"\n    return \"xyxy (Pascal VOC style)\"\n\n\ndef _extract_image_size(row: Dict) -> Tuple[int, int] | None:\n    \"\"\"Try to extract (width, height) from the image column returned by Datasets Server.\"\"\"\n    for col in (\"image\", \"img\", \"picture\", \"photo\"):\n        img = row.get(col)\n        if isinstance(img, dict):\n            w = img.get(\"width\")\n            h = img.get(\"height\")\n            if isinstance(w, (int, float)) and isinstance(h, (int, float)):\n                return (int(w), int(h))\n    return None\n\n\ndef analyze_annotations(sample_rows: List[Dict], annotation_cols: List[str]) -> Dict[str, Any]:\n    \"\"\"Analyze annotation structure from sample rows\"\"\"\n    if not annotation_cols:\n        return {\"found\": False}\n\n    annotation_col = annotation_cols[0]\n    annotations_info = {\n        \"found\": True,\n        \"column\": annotation_col,\n        \"sample_structures\": [],\n        \"bbox_formats\": [],\n        \"categories_found\": [],\n        \"avg_objects_per_image\": 0,\n        \"max_objects\": 0,\n        \"min_objects\": float('inf'),\n    }\n\n    total_objects = 0\n    valid_samples = 0\n\n    for row in sample_rows:\n        ann = row[\"row\"].get(annotation_col)\n        if not ann:\n            continue\n\n        valid_samples += 1\n        image_size = _extract_image_size(row[\"row\"])\n\n        # Check if it's a list of annotations or a dict\n        if isinstance(ann, dict):\n            # COCO-style or structured annotation\n            sample_structure = {\n                \"type\": \"dict\",\n                \"keys\": list(ann.keys())\n            }\n\n            # Check for bounding boxes\n            if \"bbox\" in ann or \"bboxes\" in ann:\n                bbox_key = \"bbox\" if \"bbox\" in ann else 
\"bboxes\"\n                bboxes = ann[bbox_key]\n                if isinstance(bboxes, list) and len(bboxes) > 0:\n                    if isinstance(bboxes[0], list):\n                        # Multiple bboxes\n                        num_objects = len(bboxes)\n                        total_objects += num_objects\n                        annotations_info[\"max_objects\"] = max(annotations_info[\"max_objects\"], num_objects)\n                        annotations_info[\"min_objects\"] = min(annotations_info[\"min_objects\"], num_objects)\n\n                        # Analyze first bbox format\n                        bbox_format = detect_bbox_format(bboxes[0], image_size)\n                        annotations_info[\"bbox_formats\"].append(bbox_format)\n                    else:\n                        # Single bbox\n                        total_objects += 1\n                        annotations_info[\"max_objects\"] = max(annotations_info[\"max_objects\"], 1)\n                        annotations_info[\"min_objects\"] = min(annotations_info[\"min_objects\"], 1)\n                        bbox_format = detect_bbox_format(bboxes, image_size)\n                        annotations_info[\"bbox_formats\"].append(bbox_format)\n\n            # Check for categories/classes\n            for key in [\"category\", \"categories\", \"label\", \"labels\", \"class\", \"classes\", \"category_id\"]:\n                if key in ann:\n                    cats = ann[key]\n                    if isinstance(cats, list):\n                        annotations_info[\"categories_found\"].extend([str(c) for c in cats])\n                    else:\n                        annotations_info[\"categories_found\"].append(str(cats))\n\n            annotations_info[\"sample_structures\"].append(sample_structure)\n\n        elif isinstance(ann, list):\n            # List of annotation dicts\n            sample_structure = {\n                \"type\": \"list\",\n                \"length\": len(ann),\n         
       \"item_type\": type(ann[0]).__name__ if ann else None\n            }\n\n            if ann and isinstance(ann[0], dict):\n                sample_structure[\"item_keys\"] = list(ann[0].keys())\n\n                # Count objects\n                num_objects = len(ann)\n                total_objects += num_objects\n                annotations_info[\"max_objects\"] = max(annotations_info[\"max_objects\"], num_objects)\n                annotations_info[\"min_objects\"] = min(annotations_info[\"min_objects\"], num_objects)\n\n                # Check first annotation\n                first_ann = ann[0]\n                if \"bbox\" in first_ann:\n                    bbox_format = detect_bbox_format(first_ann[\"bbox\"], image_size)\n                    annotations_info[\"bbox_formats\"].append(bbox_format)\n\n                # Check for categories\n                for key in [\"category\", \"label\", \"class\", \"category_id\"]:\n                    if key in first_ann:\n                        for item in ann:\n                            if key in item:\n                                annotations_info[\"categories_found\"].append(str(item[key]))\n\n            annotations_info[\"sample_structures\"].append(sample_structure)\n\n    if valid_samples > 0:\n        annotations_info[\"avg_objects_per_image\"] = round(total_objects / valid_samples, 2)\n\n    if annotations_info[\"min_objects\"] == float('inf'):\n        annotations_info[\"min_objects\"] = 0\n\n    # Get unique categories\n    annotations_info[\"categories_found\"] = list(set(annotations_info[\"categories_found\"]))\n    annotations_info[\"num_classes\"] = len(annotations_info[\"categories_found\"])\n\n    # Get most common bbox format\n    if annotations_info[\"bbox_formats\"]:\n        from collections import Counter\n        format_counts = Counter(annotations_info[\"bbox_formats\"])\n        annotations_info[\"primary_bbox_format\"] = format_counts.most_common(1)[0][0]\n\n    return 
annotations_info\n\n\ndef check_image_classification_compatibility(columns: List[str], sample_rows: List[Dict], features: List[Dict]) -> Dict[str, Any]:\n    \"\"\"Check image classification dataset compatibility\"\"\"\n\n    image_cols = find_columns(columns, [\"image\", \"img\", \"picture\", \"photo\"])\n    has_image = len(image_cols) > 0\n\n    label_cols = find_columns(columns, [\"label\", \"labels\", \"class\", \"fine_label\", \"coarse_label\"])\n    has_label = len(label_cols) > 0\n\n    label_info: Dict[str, Any] = {\"found\": has_label}\n\n    if has_label:\n        label_col = label_cols[0]\n        label_info[\"column\"] = label_col\n\n        # Detect whether label is ClassLabel (int with names) or plain int/string\n        for f in features:\n            if f.get(\"name\") == label_col:\n                ftype = f.get(\"type\", \"\")\n                if isinstance(ftype, dict) and ftype.get(\"_type\") == \"ClassLabel\":\n                    label_info[\"type\"] = \"ClassLabel\"\n                    names = ftype.get(\"names\", [])\n                    label_info[\"num_classes\"] = len(names)\n                    label_info[\"class_names\"] = names[:20]\n                    if len(names) > 20:\n                        label_info[\"class_names_truncated\"] = True\n                elif isinstance(ftype, dict) and ftype.get(\"dtype\") in (\"int64\", \"int32\", \"int8\"):\n                    label_info[\"type\"] = \"int\"\n                elif isinstance(ftype, dict) and ftype.get(\"dtype\") == \"string\":\n                    label_info[\"type\"] = \"string\"\n                break\n\n        # Discover unique labels from samples if ClassLabel info wasn't in features\n        if \"num_classes\" not in label_info:\n            unique = set()\n            for row in sample_rows:\n                val = row[\"row\"].get(label_col)\n                if val is not None:\n                    unique.add(val)\n            label_info[\"sample_unique_labels\"] = 
sorted(unique, key=str)[:20]\n            label_info[\"sample_unique_count\"] = len(unique)\n\n    ready = has_image and has_label\n    return {\n        \"ready\": ready,\n        \"has_image\": has_image,\n        \"image_columns\": image_cols,\n        \"has_label\": has_label,\n        \"label_columns\": label_cols,\n        \"label_info\": label_info,\n    }\n\n\ndef check_object_detection_compatibility(columns: List[str], sample_rows: List[Dict]) -> Dict[str, Any]:\n    \"\"\"Check object detection dataset compatibility\"\"\"\n\n    # Find image column\n    image_cols = find_columns(columns, [\"image\", \"img\", \"picture\", \"photo\"])\n    has_image = len(image_cols) > 0\n\n    # Find annotation columns\n    annotation_cols = find_columns(columns, [\"objects\", \"annotations\", \"ann\", \"bbox\", \"bboxes\", \"detection\"])\n    has_annotations = len(annotation_cols) > 0\n\n    # Analyze annotations\n    annotations_info = analyze_annotations(sample_rows, annotation_cols) if has_annotations else {\"found\": False}\n\n    # Check for separate bbox and category columns\n    bbox_cols = find_columns(columns, [\"bbox\", \"bboxes\", \"boxes\"])\n    category_cols = find_columns(columns, [\"category\", \"label\", \"class\", \"categories\", \"labels\", \"classes\"])\n\n    # Determine readiness\n    ready = has_image and (has_annotations or (len(bbox_cols) > 0 and len(category_cols) > 0))\n\n    return {\n        \"ready\": ready,\n        \"has_image\": has_image,\n        \"image_columns\": image_cols,\n        \"has_annotations\": has_annotations,\n        \"annotation_columns\": annotation_cols,\n        \"separate_bbox_columns\": bbox_cols,\n        \"separate_category_columns\": category_cols,\n        \"annotations_info\": annotations_info,\n    }\n\n\ndef check_sam_segmentation_compatibility(columns: List[str], sample_rows: List[Dict], features: List[Dict]) -> Dict[str, Any]:\n    \"\"\"Check SAM/SAM2 segmentation dataset compatibility.\n\n    A valid SAM 
segmentation dataset needs:\n    - An image column\n    - A mask column (binary ground-truth segmentation mask)\n    - A prompt: either a bbox prompt or point prompt (in a JSON prompt column, or dedicated columns)\n    \"\"\"\n\n    image_cols = find_columns(columns, [\"image\", \"img\", \"picture\", \"photo\"])\n    has_image = len(image_cols) > 0\n\n    mask_cols = find_columns(columns, [\"mask\", \"segmentation\", \"alpha\", \"matte\"])\n    has_mask = len(mask_cols) > 0\n\n    prompt_cols = find_columns(columns, [\"prompt\"])\n    bbox_cols = [c for c in columns if c in (\"bbox\", \"bboxes\", \"box\", \"boxes\")]\n    point_cols = [c for c in columns if c in (\"point\", \"points\", \"input_point\", \"input_points\")]\n\n    prompt_info: Dict[str, Any] = {\n        \"has_prompt\": False,\n        \"prompt_type\": None,\n        \"source\": None,\n        \"bbox_valid\": None,\n    }\n\n    # Try JSON prompt column first\n    if prompt_cols:\n        for row in sample_rows:\n            raw = row[\"row\"].get(prompt_cols[0])\n            if raw is None:\n                continue\n            parsed = raw if isinstance(raw, dict) else _try_json(raw)\n            if parsed is None:\n                continue\n\n            if isinstance(parsed, dict):\n                if \"bbox\" in parsed or \"box\" in parsed:\n                    prompt_info[\"has_prompt\"] = True\n                    prompt_info[\"prompt_type\"] = \"bbox\"\n                    prompt_info[\"source\"] = f\"JSON column '{prompt_cols[0]}'\"\n                    bbox = parsed.get(\"bbox\") or parsed.get(\"box\")\n                    prompt_info[\"bbox_valid\"] = _validate_bbox(bbox, _extract_image_size(row[\"row\"]))\n                    break\n                elif \"point\" in parsed or \"points\" in parsed:\n                    prompt_info[\"has_prompt\"] = True\n                    prompt_info[\"prompt_type\"] = \"point\"\n                    prompt_info[\"source\"] = f\"JSON column 
'{prompt_cols[0]}'\"\n                    break\n\n    if not prompt_info[\"has_prompt\"] and bbox_cols:\n        prompt_info[\"has_prompt\"] = True\n        prompt_info[\"prompt_type\"] = \"bbox\"\n        prompt_info[\"source\"] = f\"column '{bbox_cols[0]}'\"\n        for row in sample_rows:\n            bbox = row[\"row\"].get(bbox_cols[0])\n            if bbox is not None:\n                prompt_info[\"bbox_valid\"] = _validate_bbox(bbox, _extract_image_size(row[\"row\"]))\n                break\n\n    if not prompt_info[\"has_prompt\"] and point_cols:\n        prompt_info[\"has_prompt\"] = True\n        prompt_info[\"prompt_type\"] = \"point\"\n        prompt_info[\"source\"] = f\"column '{point_cols[0]}'\"\n\n    ready = has_image and has_mask and prompt_info[\"has_prompt\"]\n\n    return {\n        \"ready\": ready,\n        \"has_image\": has_image,\n        \"image_columns\": image_cols,\n        \"has_mask\": has_mask,\n        \"mask_columns\": mask_cols,\n        \"prompt_columns\": prompt_cols,\n        \"bbox_columns\": bbox_cols,\n        \"point_columns\": point_cols,\n        \"prompt_info\": prompt_info,\n    }\n\n\ndef _try_json(value) -> Any:\n    if not isinstance(value, str):\n        return None\n    try:\n        return json.loads(value)\n    except (json.JSONDecodeError, TypeError):\n        return None\n\n\ndef _validate_bbox(bbox, image_size=None) -> Dict[str, Any]:\n    \"\"\"Validate a single bounding box and return diagnostics.\"\"\"\n    result: Dict[str, Any] = {\"valid\": False}\n    if not isinstance(bbox, (list, tuple)):\n        result[\"error\"] = \"bbox is not a list\"\n        return result\n    if len(bbox) != 4:\n        result[\"error\"] = f\"expected 4 values, got {len(bbox)}\"\n        return result\n    try:\n        vals = [float(v) for v in bbox]\n    except (TypeError, ValueError):\n        result[\"error\"] = \"non-numeric values\"\n        return result\n\n    if not all(math.isfinite(v) for v in vals):\n        
result[\"error\"] = \"contains non-finite values\"\n        return result\n\n    x0, y0, x1, y1 = vals\n    if x1 <= x0 or y1 <= y0:\n        if vals[2] > 0 and vals[3] > 0:\n            result[\"format_hint\"] = \"likely xywh\"\n        else:\n            result[\"error\"] = \"degenerate bbox (zero or negative area)\"\n            return result\n    else:\n        result[\"format_hint\"] = \"likely xyxy\"\n\n    if image_size is not None:\n        img_w, img_h = image_size\n        if any(v > max(img_w, img_h) * 1.5 for v in vals):\n            result[\"warning\"] = \"coordinates exceed image bounds\"\n\n    result[\"valid\"] = True\n    result[\"values\"] = vals\n    return result\n\n\ndef generate_mapping_code(info: Dict[str, Any]) -> str:\n    \"\"\"Generate mapping code if needed\"\"\"\n    if info[\"ready\"]:\n        ann_info = info[\"annotations_info\"]\n        if not ann_info.get(\"found\"):\n            return None\n\n        # Check if format conversion is needed\n        ann_col = ann_info.get(\"column\")\n        bbox_format = ann_info.get(\"primary_bbox_format\", \"unknown\")\n\n        if \"coco\" in bbox_format.lower() or \"xywh\" in bbox_format.lower():\n            # Already COCO format\n            return f\"\"\"# Dataset appears to be in COCO format (xywh)\n# Image column: {info['image_columns'][0] if info['image_columns'] else 'image'}\n# Annotation column: {ann_col}\n# Use directly with transformers object detection models\"\"\"\n        elif \"xyxy\" in bbox_format.lower():\n            # Need to convert from XYXY to XYWH\n            return f\"\"\"# Convert from XYXY (Pascal VOC) to XYWH (COCO) format\ndef convert_to_coco_format(example):\n    annotations = example['{ann_col}']\n    if isinstance(annotations, list):\n        for ann in annotations:\n            if 'bbox' in ann:\n                x_min, y_min, x_max, y_max = ann['bbox']\n                ann['bbox'] = [x_min, y_min, x_max - x_min, y_max - y_min]\n    elif 
isinstance(annotations, dict) and 'bbox' in annotations:\n        bbox = annotations['bbox']\n        if isinstance(bbox, list) and len(bbox) > 0 and isinstance(bbox[0], list):\n            for i, box in enumerate(bbox):\n                x_min, y_min, x_max, y_max = box\n                bbox[i] = [x_min, y_min, x_max - x_min, y_max - y_min]\n    return example\n\ndataset = dataset.map(convert_to_coco_format)\"\"\"\n\n    elif not info[\"ready\"]:\n        # Need to create annotations structure\n        if info[\"separate_bbox_columns\"] and info[\"separate_category_columns\"]:\n            bbox_col = info[\"separate_bbox_columns\"][0]\n            cat_col = info[\"separate_category_columns\"][0]\n\n            return f\"\"\"# Combine separate bbox and category columns\ndef create_annotations(example):\n    bboxes = example['{bbox_col}']\n    categories = example['{cat_col}']\n\n    if not isinstance(bboxes, list):\n        bboxes = [bboxes]\n    if not isinstance(categories, list):\n        categories = [categories]\n\n    annotations = []\n    for bbox, cat in zip(bboxes, categories):\n        annotations.append({{'bbox': bbox, 'category': cat}})\n\n    example['objects'] = annotations\n    return example\n\ndataset = dataset.map(create_annotations)\"\"\"\n\n    return None\n\n\ndef format_value_preview(value: Any, max_chars: int) -> str:\n    \"\"\"Format value for preview\"\"\"\n    if value is None:\n        return \"None\"\n    elif isinstance(value, str):\n        return value[:max_chars] + (\"...\" if len(value) > max_chars else \"\")\n    elif isinstance(value, dict):\n        keys = list(value.keys())\n        return f\"{{dict with {len(keys)} keys: {', '.join(keys[:5])}}}\"\n    elif isinstance(value, list):\n        if len(value) == 0:\n            return \"[]\"\n        elif isinstance(value[0], dict):\n            return f\"[{len(value)} items] First item keys: {list(value[0].keys())}\"\n        elif isinstance(value[0], list):\n            return 
f\"[{len(value)} items] First item: {value[0]}\"\n        else:\n            preview = str(value)\n            return preview[:max_chars] + (\"...\" if len(preview) > max_chars else \"\")\n    else:\n        preview = str(value)\n        return preview[:max_chars] + (\"...\" if len(preview) > max_chars else \"\")\n\n\ndef main():\n    args = parse_args()\n\n    print(f\"Fetching dataset info via Datasets Server API...\")\n\n    try:\n        # Get splits info\n        splits_data = get_splits(args.dataset)\n        if not splits_data or \"splits\" not in splits_data:\n            print(f\"ERROR: Could not fetch splits for dataset '{args.dataset}'\")\n            print(f\"       Dataset may not exist or is not accessible via Datasets Server API\")\n            sys.exit(1)\n\n        # Find the right config\n        available_configs = set()\n        split_found = False\n        config_to_use = args.config\n\n        for split_info in splits_data[\"splits\"]:\n            available_configs.add(split_info[\"config\"])\n            if split_info[\"config\"] == args.config and split_info[\"split\"] == args.split:\n                split_found = True\n\n        # If default config not found, try first available\n        if not split_found and available_configs:\n            config_to_use = list(available_configs)[0]\n            print(f\"Config '{args.config}' not found, trying '{config_to_use}'...\")\n\n        # Get rows\n        rows_data = get_rows(args.dataset, config_to_use, args.split, offset=0, length=args.samples)\n\n        if not rows_data or \"rows\" not in rows_data:\n            print(f\"ERROR: Could not fetch rows for dataset '{args.dataset}'\")\n            print(f\"       Split '{args.split}' may not exist\")\n            print(f\"       Available configs: {', '.join(sorted(available_configs))}\")\n            sys.exit(1)\n\n        rows = rows_data[\"rows\"]\n        if not rows:\n            print(f\"ERROR: No rows found in split '{args.split}'\")\n     
       sys.exit(1)\n\n        # Extract column info from first row\n        first_row = rows[0][\"row\"]\n        columns = list(first_row.keys())\n        features = rows_data.get(\"features\", [])\n\n        # Get total count if available\n        total_examples = \"Unknown\"\n        for split_info in splits_data[\"splits\"]:\n            if split_info[\"config\"] == config_to_use and split_info[\"split\"] == args.split:\n                total_examples = f\"{split_info.get('num_examples', 'Unknown'):,}\" if isinstance(split_info.get('num_examples'), int) else \"Unknown\"\n                break\n\n    except Exception as e:\n        print(f\"ERROR: {str(e)}\")\n        sys.exit(1)\n\n    # Run compatibility checks\n    od_info = check_object_detection_compatibility(columns, rows)\n    ic_info = check_image_classification_compatibility(columns, rows, features)\n    sam_info = check_sam_segmentation_compatibility(columns, rows, features)\n\n    # JSON output mode\n    if args.json_output:\n        result = {\n            \"dataset\": args.dataset,\n            \"config\": config_to_use,\n            \"split\": args.split,\n            \"total_examples\": total_examples,\n            \"columns\": columns,\n            \"features\": [{\"name\": f[\"name\"], \"type\": f[\"type\"]} for f in features] if features else [],\n            \"object_detection_compatibility\": od_info,\n            \"image_classification_compatibility\": ic_info,\n            \"sam_segmentation_compatibility\": sam_info,\n        }\n        print(json.dumps(result, indent=2))\n        sys.exit(0)\n\n    # Human-readable output optimized for LLM parsing\n    print(\"=\" * 80)\n    print(f\"VISION DATASET INSPECTION\")\n    print(\"=\" * 80)\n\n    print(f\"\\nDataset: {args.dataset}\")\n    print(f\"Config: {config_to_use}\")\n    print(f\"Split: {args.split}\")\n    print(f\"Total examples: {total_examples}\")\n    print(f\"Samples fetched: {len(rows)}\")\n\n    
print(f\"\\n{'COLUMNS':-<80}\")\n    if features:\n        for feature in features:\n            print(f\"  {feature['name']}: {feature['type']}\")\n    else:\n        for col in columns:\n            print(f\"  {col}: (type info not available)\")\n\n    print(f\"\\n{'EXAMPLE DATA':-<80}\")\n    example = first_row\n    for col in columns:\n        value = example.get(col)\n        display = format_value_preview(value, args.preview)\n        print(f\"\\n{col}:\")\n        print(f\"  {display}\")\n\n    # --- Image Classification ---\n    print(f\"\\n{'IMAGE CLASSIFICATION COMPATIBILITY':-<80}\")\n    print(f\"\\n[STATUS] {'✓ READY' if ic_info['ready'] else '✗ NOT COMPATIBLE'}\")\n\n    print(f\"\\nImage Column:\")\n    if ic_info[\"has_image\"]:\n        print(f\"  ✓ Found: {', '.join(ic_info['image_columns'])}\")\n    else:\n        print(f\"  ✗ No image column detected\")\n\n    print(f\"\\nLabel Column:\")\n    if ic_info[\"has_label\"]:\n        print(f\"  ✓ Found: {', '.join(ic_info['label_columns'])}\")\n        li = ic_info[\"label_info\"]\n        if li.get(\"type\"):\n            print(f\"    • Type: {li['type']}\")\n        if li.get(\"num_classes\"):\n            print(f\"    • Number of Classes: {li['num_classes']}\")\n        if li.get(\"class_names\"):\n            names = li[\"class_names\"]\n            display = \", \".join(str(n) for n in names[:10])\n            if len(names) > 10:\n                display += f\" ... ({li['num_classes']} total)\"\n            print(f\"    • Classes: {display}\")\n        elif li.get(\"sample_unique_labels\"):\n            labels = li[\"sample_unique_labels\"]\n            display = \", \".join(str(l) for l in labels[:10])\n            if li.get(\"sample_unique_count\", 0) > 10:\n                display += f\" ... 
({li['sample_unique_count']}+ from sample)\"\n            print(f\"    • Sample labels: {display}\")\n    else:\n        print(f\"  ✗ No label column detected\")\n        print(f\"  Expected column names: 'label', 'labels', 'class', 'fine_label'\")\n\n    if ic_info[\"ready\"]:\n        lc = ic_info[\"label_info\"].get(\"column\", \"label\")\n        print(f\"\\n  Use with: scripts/image_classification_training.py\")\n        print(f\"    --image_column_name {ic_info['image_columns'][0]} --label_column_name {lc}\")\n\n    # --- Object Detection ---\n    print(f\"\\n{'OBJECT DETECTION COMPATIBILITY':-<80}\")\n    print(f\"\\n[STATUS] {'✓ READY' if od_info['ready'] else '✗ NOT COMPATIBLE'}\")\n\n    print(f\"\\nImage Column:\")\n    if od_info[\"has_image\"]:\n        print(f\"  ✓ Found: {', '.join(od_info['image_columns'])}\")\n    else:\n        print(f\"  ✗ No image column detected\")\n        print(f\"  Expected column names: 'image', 'img', 'picture', 'photo'\")\n\n    print(f\"\\nAnnotations:\")\n    if od_info[\"has_annotations\"]:\n        print(f\"  ✓ Found: {', '.join(od_info['annotation_columns'])}\")\n        ann_info = od_info[\"annotations_info\"]\n        if ann_info.get(\"found\"):\n            print(f\"\\n  Annotation Details:\")\n            print(f\"    • Column: {ann_info['column']}\")\n            if ann_info.get(\"primary_bbox_format\"):\n                print(f\"    • BBox Format: {ann_info['primary_bbox_format']}\")\n            if ann_info.get(\"num_classes\", 0) > 0:\n                print(f\"    • Number of Classes: {ann_info['num_classes']}\")\n                print(f\"    • Classes: {', '.join(ann_info['categories_found'][:10])}\")\n                if len(ann_info['categories_found']) > 10:\n                    print(f\"      (showing first 10 of {len(ann_info['categories_found'])})\")\n            print(f\"    • Avg Objects/Image: {ann_info['avg_objects_per_image']}\")\n            print(f\"    • Min Objects: 
{ann_info['min_objects']}\")\n            print(f\"    • Max Objects: {ann_info['max_objects']}\")\n    elif od_info[\"separate_bbox_columns\"] and od_info[\"separate_category_columns\"]:\n        print(f\"  ⚠ Separate bbox and category columns found:\")\n        print(f\"    BBox columns: {', '.join(od_info['separate_bbox_columns'])}\")\n        print(f\"    Category columns: {', '.join(od_info['separate_category_columns'])}\")\n        print(f\"  Action: These need to be combined (see mapping code below)\")\n    else:\n        print(f\"  ✗ No annotation columns detected\")\n        print(f\"  Expected: 'objects', 'annotations', 'bbox'/'bboxes' + 'category'/'label'\")\n\n    # --- SAM Segmentation ---\n    print(f\"\\n{'SAM SEGMENTATION COMPATIBILITY':-<80}\")\n    print(f\"\\n[STATUS] {'✓ READY' if sam_info['ready'] else '✗ NOT COMPATIBLE'}\")\n\n    print(f\"\\nImage Column:\")\n    if sam_info[\"has_image\"]:\n        print(f\"  ✓ Found: {', '.join(sam_info['image_columns'])}\")\n    else:\n        print(f\"  ✗ No image column detected\")\n\n    print(f\"\\nMask Column:\")\n    if sam_info[\"has_mask\"]:\n        print(f\"  ✓ Found: {', '.join(sam_info['mask_columns'])}\")\n    else:\n        print(f\"  ✗ No mask column detected\")\n        print(f\"  Expected column names: 'mask', 'segmentation', 'alpha', 'matte'\")\n\n    print(f\"\\nPrompt:\")\n    pi = sam_info[\"prompt_info\"]\n    if pi[\"has_prompt\"]:\n        print(f\"  ✓ Type: {pi['prompt_type']} (from {pi['source']})\")\n        if pi.get(\"bbox_valid\"):\n            bv = pi[\"bbox_valid\"]\n            if bv[\"valid\"]:\n                print(f\"    • BBox values: {bv.get('values')}\")\n                if bv.get(\"format_hint\"):\n                    print(f\"    • Format: {bv['format_hint']}\")\n                if bv.get(\"warning\"):\n                    print(f\"    ⚠ {bv['warning']}\")\n            else:\n                print(f\"    ✗ Invalid bbox: {bv.get('error', 'unknown error')}\")\n    
else:\n        print(f\"  ✗ No prompt detected\")\n        print(f\"  Expected: 'prompt' column (JSON with bbox/point), or 'bbox'/'point' column\")\n\n    if sam_info[\"ready\"]:\n        pc = sam_info[\"prompt_columns\"][0] if sam_info[\"prompt_columns\"] else None\n        args_hint = f\"--prompt_type {pi['prompt_type']}\"\n        if pc:\n            args_hint += f\" --prompt_column_name {pc}\"\n        print(f\"\\n  Use with: scripts/sam_segmentation_training.py\")\n        print(f\"    {args_hint}\")\n\n    # Mapping code (OD only)\n    mapping_code = generate_mapping_code(od_info)\n\n    if mapping_code:\n        print(f\"\\n{'OD PREPROCESSING CODE':-<80}\")\n        print(mapping_code)\n    elif od_info[\"ready\"]:\n        print(f\"\\n  ✓ No OD preprocessing needed.\")\n\n    # --- Summary ---\n    print(f\"\\n{'SUMMARY':-<80}\")\n    if ic_info[\"ready\"]:\n        num_cls = ic_info[\"label_info\"].get(\"num_classes\") or ic_info[\"label_info\"].get(\"sample_unique_count\", \"?\")\n        print(f\"✓ Image Classification: READY ({num_cls} classes)\")\n    else:\n        print(f\"✗ Image Classification: not compatible\")\n\n    if od_info[\"ready\"]:\n        ann_info = od_info[\"annotations_info\"]\n        fmt = ann_info.get(\"primary_bbox_format\", \"\")\n        cls = ann_info.get(\"num_classes\", \"?\")\n        print(f\"✓ Object Detection: READY ({cls} classes, {fmt})\")\n    else:\n        print(f\"✗ Object Detection: not compatible\")\n\n    if sam_info[\"ready\"]:\n        print(f\"✓ SAM Segmentation: READY (prompt: {pi['prompt_type']})\")\n    else:\n        print(f\"✗ SAM Segmentation: not compatible\")\n\n    print(f\"\\nNote: Used Datasets Server API (instant, no download required)\")\n\n    print(\"\\n\" + \"=\" * 80)\n    sys.exit(0)\n\n\nif __name__ == \"__main__\":\n    try:\n        main()\n    except KeyboardInterrupt:\n        sys.exit(0)\n    except Exception as e:\n        print(f\"ERROR: {e}\", file=sys.stderr)\n        sys.exit(1)\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/estimate_cost.py",
    "content": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = []\n# ///\n\"\"\"\nEstimate training time and cost for vision model training jobs on Hugging Face Jobs.\n\nUsage:\n    uv run estimate_cost.py --model ustc-community/dfine-small-coco --dataset cppe-5 --hardware t4-small\n    uv run estimate_cost.py --model PekingU/rtdetr_v2_r50vd --dataset-size 5000 --hardware t4-small --epochs 30\n    uv run estimate_cost.py --model google/vit-base-patch16-224-in21k --dataset ethz/food101 --hardware t4-small --epochs 3\n\"\"\"\n\nimport argparse\n\nHARDWARE_COSTS = {\n    \"t4-small\": 0.40,\n    \"t4-medium\": 0.60,\n    \"l4x1\": 0.80,\n    \"l4x4\": 3.80,\n    \"a10g-small\": 1.00,\n    \"a10g-large\": 1.50,\n    \"a10g-largex2\": 3.00,\n    \"a10g-largex4\": 5.00,\n    \"l40sx1\": 1.80,\n    \"l40sx4\": 8.30,\n    \"a100-large\": 2.50,\n    \"a100x4\": 10.00,\n}\n\n# Vision model sizes in millions of parameters\nMODEL_PARAMS_M = {\n    # Object detection\n    \"dfine-small\": 10.4,\n    \"dfine-large\": 31.4,\n    \"dfine-xlarge\": 63.5,\n    \"rtdetr_v2_r18vd\": 20.2,\n    \"rtdetr_v2_r50vd\": 43.0,\n    \"rtdetr_v2_r101vd\": 76.0,\n    \"detr-resnet-50\": 41.3,\n    \"detr-resnet-101\": 60.2,\n    \"yolos-small\": 30.7,\n    \"yolos-tiny\": 6.5,\n    # Image classification\n    \"mobilenetv3_small\": 2.5,\n    \"mobilevit_s\": 5.6,\n    \"resnet50\": 25.6,\n    \"vit_base_patch16\": 86.6,\n    # SAM / SAM2 segmentation\n    \"sam-vit-base\": 93.7,\n    \"sam-vit-large\": 312.3,\n    \"sam-vit-huge\": 641.1,\n    \"sam2.1-hiera-tiny\": 38.9,\n    \"sam2.1-hiera-small\": 46.0,\n    \"sam2.1-hiera-base-plus\": 80.8,\n    \"sam2.1-hiera-large\": 224.4,\n}\n\nKNOWN_DATASETS = {\n    # Object detection\n    \"cppe-5\": 1000,\n    \"merve/license-plate\": 6180,\n    # Image classification\n    \"ethz/food101\": 75750,\n    # SAM segmentation\n    \"merve/MicroMat-mini\": 240,\n}\n\n\ndef extract_model_params(model_name: str) -> 
float:\n    \"\"\"Look up model size in millions of parameters by matching known keys against the model name.\"\"\"\n    # Normalize separators so Hub ids such as 'google/vit-base-patch16-224-in21k' match the 'vit_base_patch16' key\n    name_norm = model_name.lower().replace(\"-\", \"_\")\n    for key, params in MODEL_PARAMS_M.items():\n        if key.lower().replace(\"-\", \"_\") in name_norm:\n            return params\n    return 30.0  # reasonable default for vision models\n\n\ndef estimate_training_time(model_params_m: float, dataset_size: int, epochs: int,\n                           image_size: int, batch_size: int, hardware: str) -> float:\n    \"\"\"Estimate training time in hours for vision model training.\"\"\"\n    # Steps per epoch\n    steps_per_epoch = dataset_size / batch_size\n    # empirical calibration values\n    base_secs_per_step = 0.8\n    model_factor = (model_params_m / 30.0) ** 0.6\n    image_factor = (image_size / 640.0) ** 2\n    batch_factor = (batch_size / 8.0) ** 0.7\n\n    secs_per_step = base_secs_per_step * model_factor * image_factor * batch_factor\n\n    hardware_multipliers = {\n        \"t4-small\": 2.0,\n        \"t4-medium\": 2.0,\n        \"l4x1\": 1.2,\n        \"l4x4\": 0.5,\n        \"a10g-small\": 1.0,\n        \"a10g-large\": 1.0,\n        \"a10g-largex2\": 0.6,\n        \"a10g-largex4\": 0.4,\n        \"l40sx1\": 0.7,\n        \"l40sx4\": 0.25,\n        \"a100-large\": 0.5,\n        \"a100x4\": 0.2,\n    }\n\n    multiplier = hardware_multipliers.get(hardware, 1.0)\n    total_steps = steps_per_epoch * epochs\n    total_secs = total_steps * secs_per_step * multiplier\n\n    # Add overhead: model loading (~2 min), eval per epoch (~10% of training), Hub push (~3 min)\n    eval_overhead = total_secs * 0.10\n    fixed_overhead = 5 * 60  # 5 minutes\n    total_secs += eval_overhead + fixed_overhead\n\n    return total_secs / 3600\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"Estimate training cost for vision model training jobs\")\n    parser.add_argument(\"--model\", required=True,\n                        help=\"Model name (e.g., 'ustc-community/dfine-small-coco' or
'detr-resnet-50')\")\n    parser.add_argument(\"--dataset\", default=None, help=\"Dataset name (for known size lookup)\")\n    parser.add_argument(\"--hardware\", required=True, choices=HARDWARE_COSTS.keys(), help=\"Hardware flavor\")\n    parser.add_argument(\"--dataset-size\", type=int, default=None,\n                        help=\"Number of training images (overrides dataset lookup)\")\n    parser.add_argument(\"--epochs\", type=int, default=30, help=\"Number of training epochs (default: 30)\")\n    parser.add_argument(\"--image-size\", type=int, default=640, help=\"Image square size in pixels (default: 640)\")\n    parser.add_argument(\"--batch-size\", type=int, default=8, help=\"Per-device batch size (default: 8)\")\n    return parser.parse_args()\n\n\ndef main():\n    args = parse_args()\n\n    model_params = extract_model_params(args.model)\n    print(f\"Model: {args.model} (~{model_params:.1f}M parameters)\")\n\n    if args.dataset_size:\n        dataset_size = args.dataset_size\n    elif args.dataset and args.dataset in KNOWN_DATASETS:\n        dataset_size = KNOWN_DATASETS[args.dataset]\n    elif args.dataset:\n        print(f\"Unknown dataset '{args.dataset}', defaulting to 1000 images.\")\n        print(f\"Use --dataset-size to specify the exact count.\")\n        dataset_size = 1000\n    else:\n        dataset_size = 1000\n\n    print(f\"Dataset: {args.dataset or 'custom'} (~{dataset_size} images)\")\n    print(f\"Epochs: {args.epochs}\")\n    print(f\"Image size: {args.image_size}px\")\n    print(f\"Batch size: {args.batch_size}\")\n    print(f\"Hardware: {args.hardware} (${HARDWARE_COSTS[args.hardware]:.2f}/hr)\")\n    print()\n\n    estimated_hours = estimate_training_time(\n        model_params, dataset_size, args.epochs, args.image_size, args.batch_size, args.hardware\n    )\n    estimated_cost = estimated_hours * HARDWARE_COSTS[args.hardware]\n    recommended_timeout = estimated_hours * 1.3  # 30% buffer\n\n    print(f\"Estimated training time: 
{estimated_hours:.1f} hours\")\n    print(f\"Estimated cost: ${estimated_cost:.2f}\")\n    print(f\"Recommended timeout: {recommended_timeout:.1f}h (with 30% buffer)\")\n    print()\n\n    if estimated_hours > 6:\n        print(\"Warning: Long training time. Consider:\")\n        print(\"   - Reducing epochs or image size\")\n        print(\"   - Using --max_train_samples for a test run first\")\n        print(\"   - Upgrading hardware\")\n        print()\n\n    if model_params > 50 and args.hardware in (\"t4-small\", \"t4-medium\"):\n        print(\"Warning: Large model on T4. If you hit OOM:\")\n        print(\"   - Reduce batch size (try 4, then 2)\")\n        print(\"   - Reduce image size (try 480)\")\n        print(\"   - Upgrade to l4x1 or a10g-small\")\n        print()\n\n    # Round up to the next whole hour so short jobs never get a 0h timeout\n    timeout_str = f\"{int(recommended_timeout) + 1}h\"\n    timeout_secs = int(recommended_timeout * 3600)\n    print(f\"Example job configuration (MCP tool):\")\n    print(f\"\"\"\nhf_jobs(\"uv\", {{\n    \"script\": \"scripts/object_detection_training.py\",\n    \"script_args\": [\n        \"--model_name_or_path\", \"{args.model}\",\n        \"--dataset_name\", \"{args.dataset or 'your-dataset'}\",\n        \"--image_square_size\", \"{args.image_size}\",\n        \"--num_train_epochs\", \"{args.epochs}\",\n        \"--per_device_train_batch_size\", \"{args.batch_size}\",\n        \"--push_to_hub\", \"--do_train\", \"--do_eval\"\n    ],\n    \"flavor\": \"{args.hardware}\",\n    \"timeout\": \"{timeout_str}\",\n    \"secrets\": {{\"HF_TOKEN\": \"$HF_TOKEN\"}}\n}})\n\"\"\")\n    print(f\"Example job configuration (Python API):\")\n    print(f\"\"\"\napi.run_uv_job(\n    script=\"scripts/object_detection_training.py\",\n    script_args=[...],\n    flavor=\"{args.hardware}\",\n    timeout={timeout_secs},\n    secrets={{\"HF_TOKEN\": get_token()}},\n)\n\"\"\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/image_classification_training.py",
    "content": "# /// script\n# dependencies = [\n#     \"transformers>=5.2.0\",\n#     \"accelerate>=1.1.0\",\n#     \"timm\",\n#     \"datasets>=4.0\",\n#     \"evaluate\",\n#     \"scikit-learn\",\n#     \"torchvision\",\n#     \"trackio\",\n#     \"huggingface_hub\",\n# ]\n# ///\n\n\"\"\"Fine-tuning any Transformers or timm model supported by AutoModelForImageClassification using the Trainer API.\"\"\"\n\nimport logging\nimport os\nimport sys\nfrom dataclasses import dataclass, field\nfrom functools import partial\nfrom typing import Any\n\nimport evaluate\nimport numpy as np\nimport torch\nfrom datasets import load_dataset\nfrom torchvision.transforms import (\n    CenterCrop,\n    Compose,\n    Normalize,\n    RandomHorizontalFlip,\n    RandomResizedCrop,\n    Resize,\n    ToTensor,\n)\n\nimport trackio\n\nimport transformers\nfrom transformers import (\n    AutoConfig,\n    AutoImageProcessor,\n    AutoModelForImageClassification,\n    DefaultDataCollator,\n    HfArgumentParser,\n    Trainer,\n    TrainingArguments,\n)\nfrom transformers.trainer import EvalPrediction\nfrom transformers.utils import check_min_version\nfrom transformers.utils.versions import require_version\n\n\nlogger = logging.getLogger(__name__)\n\ncheck_min_version(\"4.57.0.dev0\")\nrequire_version(\"datasets>=2.0.0\")\n\n\n@dataclass\nclass DataTrainingArguments:\n    dataset_name: str = field(\n        default=\"ethz/food101\",\n        metadata={\"help\": \"Name of a dataset from the Hub.\"},\n    )\n    dataset_config_name: str | None = field(\n        default=None,\n        metadata={\"help\": \"The configuration name of the dataset to use (via the datasets library).\"},\n    )\n    train_val_split: float | None = field(\n        default=0.15,\n        metadata={\"help\": \"Fraction to split off of train for validation (used only when no validation split exists).\"},\n    )\n    max_train_samples: int | None = field(\n        default=None,\n        metadata={\"help\": \"Truncate 
training set to this many samples (for debugging / quick tests).\"},\n    )\n    max_eval_samples: int | None = field(\n        default=None,\n        metadata={\"help\": \"Truncate evaluation set to this many samples.\"},\n    )\n    image_column_name: str = field(\n        default=\"image\",\n        metadata={\"help\": \"The column name for images in the dataset.\"},\n    )\n    label_column_name: str = field(\n        default=\"label\",\n        metadata={\"help\": \"The column name for labels in the dataset.\"},\n    )\n\n\n@dataclass\nclass ModelArguments:\n    model_name_or_path: str = field(\n        default=\"timm/mobilenetv3_small_100.lamb_in1k\",\n        metadata={\"help\": \"Path to pretrained model or model identifier from huggingface.co/models.\"},\n    )\n    config_name: str | None = field(\n        default=None,\n        metadata={\"help\": \"Pretrained config name or path if not the same as model_name.\"},\n    )\n    cache_dir: str | None = field(\n        default=None,\n        metadata={\"help\": \"Where to store pretrained models downloaded from the Hub.\"},\n    )\n    model_revision: str = field(\n        default=\"main\",\n        metadata={\"help\": \"The specific model version to use (branch, tag, or commit id).\"},\n    )\n    image_processor_name: str | None = field(\n        default=None,\n        metadata={\"help\": \"Name or path of image processor config.\"},\n    )\n    ignore_mismatched_sizes: bool = field(\n        default=True,\n        metadata={\"help\": \"Allow loading weights when num_labels differs from pretrained checkpoint.\"},\n    )\n    token: str | None = field(\n        default=None,\n        metadata={\"help\": \"Auth token for private models / datasets.\"},\n    )\n    trust_remote_code: bool = field(\n        default=False,\n        metadata={\"help\": \"Whether to trust remote code from Hub repos.\"},\n    )\n\n\ndef build_transforms(image_processor, is_training: bool):\n    \"\"\"Build torchvision transforms 
from the image processor's config.\"\"\"\n    if hasattr(image_processor, \"size\"):\n        size = image_processor.size\n        if \"shortest_edge\" in size:\n            img_size = size[\"shortest_edge\"]\n        elif \"height\" in size and \"width\" in size:\n            img_size = (size[\"height\"], size[\"width\"])\n        else:\n            img_size = 224\n    else:\n        img_size = 224\n\n    if hasattr(image_processor, \"image_mean\") and image_processor.image_mean:\n        normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)\n    else:\n        normalize = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])\n\n    if is_training:\n        return Compose([\n            RandomResizedCrop(img_size),\n            RandomHorizontalFlip(),\n            ToTensor(),\n            normalize,\n        ])\n    else:\n        if isinstance(img_size, int):\n            resize_size = int(img_size / 0.875)  # standard 87.5% center crop ratio\n        else:\n            resize_size = tuple(int(s / 0.875) for s in img_size)\n        return Compose([\n            Resize(resize_size),\n            CenterCrop(img_size),\n            ToTensor(),\n            normalize,\n        ])\n\n\ndef main():\n    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))\n    if len(sys.argv) == 2 and sys.argv[1].endswith(\".json\"):\n        model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))\n    else:\n        model_args, data_args, training_args = parser.parse_args_into_dataclasses()\n\n    # --- Hub authentication ---\n    from huggingface_hub import login\n    hf_token = os.environ.get(\"HF_TOKEN\") or os.environ.get(\"hfjob\")\n    if hf_token:\n        login(token=hf_token)\n        training_args.hub_token = hf_token\n        logger.info(\"Logged in to Hugging Face Hub\")\n    elif training_args.push_to_hub:\n        logger.warning(\"HF_TOKEN not found 
in environment. Hub push will likely fail.\")\n\n    # --- Trackio ---\n    trackio.init(project=training_args.output_dir, name=training_args.run_name)\n\n    # --- Logging ---\n    logging.basicConfig(\n        format=\"%(asctime)s - %(levelname)s - %(name)s - %(message)s\",\n        datefmt=\"%m/%d/%Y %H:%M:%S\",\n        handlers=[logging.StreamHandler(sys.stdout)],\n    )\n    if training_args.should_log:\n        transformers.utils.logging.set_verbosity_info()\n\n    log_level = training_args.get_process_log_level()\n    logger.setLevel(log_level)\n    transformers.utils.logging.set_verbosity(log_level)\n    transformers.utils.logging.enable_default_handler()\n    transformers.utils.logging.enable_explicit_format()\n\n    logger.warning(\n        f\"Process rank: {training_args.local_process_index}, device: {training_args.device}, \"\n        f\"n_gpu: {training_args.n_gpu}, distributed training: \"\n        f\"{training_args.parallel_mode.value == 'distributed'}, 16-bits training: {training_args.fp16}\"\n    )\n    logger.info(f\"Training/evaluation parameters {training_args}\")\n\n    # --- Load dataset ---\n    dataset = load_dataset(\n        data_args.dataset_name,\n        data_args.dataset_config_name,\n        cache_dir=model_args.cache_dir,\n        trust_remote_code=model_args.trust_remote_code,\n    )\n\n    # --- Resolve label column ---\n    label_col = data_args.label_column_name\n    if label_col not in dataset[\"train\"].column_names:\n        candidates = [c for c in dataset[\"train\"].column_names if c in (\"label\", \"labels\", \"class\", \"fine_label\")]\n        if candidates:\n            label_col = candidates[0]\n            logger.info(f\"Label column '{data_args.label_column_name}' not found, using '{label_col}'\")\n        else:\n            raise ValueError(\n                f\"Label column '{data_args.label_column_name}' not found. 
\"\n                f\"Available columns: {dataset['train'].column_names}\"\n            )\n\n    # --- Discover labels ---\n    label_feature = dataset[\"train\"].features[label_col]\n    if hasattr(label_feature, \"names\"):\n        label_names = label_feature.names\n    else:\n        unique_labels = sorted(set(dataset[\"train\"][label_col]))\n        if all(isinstance(l, str) for l in unique_labels):\n            label_names = unique_labels\n        else:\n            label_names = [str(l) for l in unique_labels]\n\n    num_labels = len(label_names)\n    id2label = dict(enumerate(label_names))\n    label2id = {v: k for k, v in id2label.items()}\n    logger.info(f\"Number of classes: {num_labels}\")\n\n    # --- Remap string labels to int if needed ---\n    sample_label = dataset[\"train\"][0][label_col]\n    if isinstance(sample_label, str):\n        logger.info(\"Remapping string labels to integer IDs\")\n        for split_name in list(dataset.keys()):\n            dataset[split_name] = dataset[split_name].map(\n                lambda ex: {label_col: label2id[ex[label_col]]},\n            )\n\n    # --- Shuffle + Train/val split ---\n    dataset[\"train\"] = dataset[\"train\"].shuffle(seed=training_args.seed)\n\n    data_args.train_val_split = None if \"validation\" in dataset else data_args.train_val_split\n    if isinstance(data_args.train_val_split, float) and data_args.train_val_split > 0.0:\n        split = dataset[\"train\"].train_test_split(data_args.train_val_split, seed=training_args.seed)\n        dataset[\"train\"] = split[\"train\"]\n        dataset[\"validation\"] = split[\"test\"]\n\n    # --- Truncate ---\n    if data_args.max_train_samples is not None:\n        max_train = min(data_args.max_train_samples, len(dataset[\"train\"]))\n        dataset[\"train\"] = dataset[\"train\"].select(range(max_train))\n        logger.info(f\"Truncated training set to {max_train} samples\")\n    if data_args.max_eval_samples is not None and \"validation\" in 
dataset:\n        max_eval = min(data_args.max_eval_samples, len(dataset[\"validation\"]))\n        dataset[\"validation\"] = dataset[\"validation\"].select(range(max_eval))\n        logger.info(f\"Truncated validation set to {max_eval} samples\")\n\n    # --- Load model & image processor ---\n    common_pretrained_args = {\n        \"cache_dir\": model_args.cache_dir,\n        \"revision\": model_args.model_revision,\n        \"token\": model_args.token,\n        \"trust_remote_code\": model_args.trust_remote_code,\n    }\n\n    config = AutoConfig.from_pretrained(\n        model_args.config_name or model_args.model_name_or_path,\n        num_labels=num_labels,\n        label2id=label2id,\n        id2label=id2label,\n        **common_pretrained_args,\n    )\n\n    model = AutoModelForImageClassification.from_pretrained(\n        model_args.model_name_or_path,\n        config=config,\n        ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,\n        **common_pretrained_args,\n    )\n\n    image_processor = AutoImageProcessor.from_pretrained(\n        model_args.image_processor_name or model_args.model_name_or_path,\n        **common_pretrained_args,\n    )\n\n    # --- Build transforms ---\n    train_transforms = build_transforms(image_processor, is_training=True)\n    val_transforms = build_transforms(image_processor, is_training=False)\n\n    image_col = data_args.image_column_name\n\n    def preprocess_train(examples):\n        return {\n            \"pixel_values\": [train_transforms(img.convert(\"RGB\")) for img in examples[image_col]],\n            \"labels\": examples[label_col],\n        }\n\n    def preprocess_val(examples):\n        return {\n            \"pixel_values\": [val_transforms(img.convert(\"RGB\")) for img in examples[image_col]],\n            \"labels\": examples[label_col],\n        }\n\n    dataset[\"train\"].set_transform(preprocess_train)\n    if \"validation\" in dataset:\n        
dataset[\"validation\"].set_transform(preprocess_val)\n    if \"test\" in dataset:\n        dataset[\"test\"].set_transform(preprocess_val)\n\n    # --- Metrics ---\n    accuracy_metric = evaluate.load(\"accuracy\")\n\n    def compute_metrics(eval_pred: EvalPrediction):\n        predictions = np.argmax(eval_pred.predictions, axis=1)\n        return accuracy_metric.compute(predictions=predictions, references=eval_pred.label_ids)\n\n    # --- Trainer ---\n    eval_dataset = None\n    if training_args.do_eval:\n        if \"validation\" in dataset:\n            eval_dataset = dataset[\"validation\"]\n        elif \"test\" in dataset:\n            eval_dataset = dataset[\"test\"]\n\n    trainer = Trainer(\n        model=model,\n        args=training_args,\n        train_dataset=dataset[\"train\"] if training_args.do_train else None,\n        eval_dataset=eval_dataset,\n        processing_class=image_processor,\n        data_collator=DefaultDataCollator(),\n        compute_metrics=compute_metrics,\n    )\n\n    # --- Train ---\n    if training_args.do_train:\n        train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)\n        trainer.save_model()\n        trainer.log_metrics(\"train\", train_result.metrics)\n        trainer.save_metrics(\"train\", train_result.metrics)\n        trainer.save_state()\n\n    # --- Evaluate ---\n    if training_args.do_eval:\n        test_dataset = dataset.get(\"test\", dataset.get(\"validation\"))\n        test_prefix = \"test\" if \"test\" in dataset else \"eval\"\n        if test_dataset is not None:\n            metrics = trainer.evaluate(eval_dataset=test_dataset, metric_key_prefix=test_prefix)\n            trainer.log_metrics(test_prefix, metrics)\n            trainer.save_metrics(test_prefix, metrics)\n\n    trackio.finish()\n\n    # --- Push to Hub ---\n    kwargs = {\n        \"finetuned_from\": model_args.model_name_or_path,\n        \"dataset\": data_args.dataset_name,\n        \"tags\": 
[\"image-classification\", \"vision\"],\n    }\n    if training_args.push_to_hub:\n        trainer.push_to_hub(**kwargs)\n    else:\n        trainer.create_model_card(**kwargs)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/object_detection_training.py",
    "content": "# /// script\n# dependencies = [\n#     \"transformers>=5.2.0\",\n#     \"accelerate>=1.1.0\",\n#     \"albumentations >= 1.4.16\",\n#     \"timm\",\n#     \"datasets>=4.0\",\n#     \"torchmetrics\",\n#     \"pycocotools\",\n#     \"trackio\",\n#     \"huggingface_hub\",\n# ]\n# ///\n\n\"\"\"Finetuning any 🤗 Transformers model supported by AutoModelForObjectDetection for object detection leveraging the Trainer API.\"\"\"\n\nimport logging\nimport math\nimport os\nimport sys\nfrom collections.abc import Mapping\nfrom dataclasses import dataclass, field\nfrom functools import partial\nfrom typing import Any\n\nimport albumentations as A\nimport numpy as np\nimport torch\nfrom datasets import load_dataset\nfrom torchmetrics.detection.mean_ap import MeanAveragePrecision\n\nimport trackio\n\nimport transformers\nfrom transformers import (\n    AutoConfig,\n    AutoImageProcessor,\n    AutoModelForObjectDetection,\n    HfArgumentParser,\n    Trainer,\n    TrainingArguments,\n)\nfrom transformers.image_processing_utils import BatchFeature\nfrom transformers.image_transforms import center_to_corners_format\nfrom transformers.trainer import EvalPrediction\nfrom transformers.utils import check_min_version\nfrom transformers.utils.versions import require_version\n\n\nlogger = logging.getLogger(__name__)\n\n# Will error if the minimal version of Transformers is not installed. Remove at your own risks.\ncheck_min_version(\"4.57.0.dev0\")\n\nrequire_version(\"datasets>=2.0.0\", \"To fix: pip install -r examples/pytorch/object-detection/requirements.txt\")\n\n\n@dataclass\nclass ModelOutput:\n    logits: torch.Tensor\n    pred_boxes: torch.Tensor\n\n\ndef format_image_annotations_as_coco(\n    image_id: str, categories: list[int], areas: list[float], bboxes: list[tuple[float]]\n) -> dict:\n    \"\"\"Format one set of image annotations to the COCO format\n\n    Args:\n        image_id (str): image id. e.g. 
\"0001\"\n        categories (list[int]): list of categories/class labels corresponding to provided bounding boxes\n        areas (list[float]): list of corresponding areas to provided bounding boxes\n        bboxes (list[tuple[float]]): list of bounding boxes provided in COCO format\n            ([x_min, y_min, width, height] in absolute coordinates)\n\n    Returns:\n        dict: {\n            \"image_id\": image id,\n            \"annotations\": list of formatted annotations\n        }\n    \"\"\"\n    annotations = []\n    for category, area, bbox in zip(categories, areas, bboxes):\n        formatted_annotation = {\n            \"image_id\": image_id,\n            \"category_id\": category,\n            \"iscrowd\": 0,\n            \"area\": area,\n            \"bbox\": list(bbox),\n        }\n        annotations.append(formatted_annotation)\n\n    return {\n        \"image_id\": image_id,\n        \"annotations\": annotations,\n    }\n\n\ndef detect_bbox_format_from_samples(dataset, image_col=\"image\", objects_col=\"objects\", num_samples=50):\n    \"\"\"\n    Detect whether bboxes are xyxy (Pascal VOC) or xywh (COCO) by checking\n    bbox coordinates against image dimensions.
The correct format interpretation\n    should keep bboxes within image bounds.\n    \"\"\"\n    exceeds_if_xywh = 0\n    exceeds_if_xyxy = 0\n    total = 0\n\n    for example in dataset.select(range(min(num_samples, len(dataset)))):\n        img_w, img_h = example[image_col].size\n        for bbox in example[objects_col][\"bbox\"]:\n            if len(bbox) != 4:\n                continue\n            a, b, c, d = float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3])\n            total += 1\n\n            # If 3rd < 1st or 4th < 2nd, can't be xyxy (x_max must exceed x_min)\n            if c < a or d < b:\n                return \"xywh\"\n\n            # xywh: right/bottom edge = origin + size; exceeding image → wrong format\n            if a + c > img_w * 1.05:\n                exceeds_if_xywh += 1\n            if b + d > img_h * 1.05:\n                exceeds_if_xywh += 1\n            # xyxy: right/bottom edge = coordinate itself\n            if c > img_w * 1.05:\n                exceeds_if_xyxy += 1\n            if d > img_h * 1.05:\n                exceeds_if_xyxy += 1\n\n    if total == 0:\n        return \"xywh\"\n\n    fmt = \"xyxy\" if exceeds_if_xywh > exceeds_if_xyxy else \"xywh\"\n    logger.info(\n        f\"Detected bbox format: {fmt} (checked {total} bboxes from {min(num_samples, len(dataset))} images)\"\n    )\n    return fmt\n\n\ndef sanitize_dataset(dataset, bbox_format=\"xywh\", image_col=\"image\", objects_col=\"objects\"):\n    \"\"\"\n    Validate bboxes, convert xyxy→xywh if needed, clip to image bounds, and remove\n    entries with non-finite values, non-positive dimensions, or degenerate area (<1 px).\n    Drops images with no remaining valid bboxes.\n    \"\"\"\n    convert_xyxy = bbox_format == \"xyxy\"\n\n    def _validate(example):\n        img_w, img_h = example[image_col].size\n        objects = example[objects_col]\n        bboxes = objects[\"bbox\"]\n        n = len(bboxes)\n\n        valid_indices = []\n        
converted_bboxes = []\n\n        for i, bbox in enumerate(bboxes):\n            if len(bbox) != 4:\n                continue\n            vals = [float(v) for v in bbox]\n            if not all(math.isfinite(v) for v in vals):\n                continue\n\n            if convert_xyxy:\n                x_min, y_min, x_max, y_max = vals\n                w, h = x_max - x_min, y_max - y_min\n            else:\n                x_min, y_min, w, h = vals\n\n            if w <= 0 or h <= 0:\n                continue\n\n            x_min, y_min = max(0.0, x_min), max(0.0, y_min)\n            if x_min >= img_w or y_min >= img_h:\n                continue\n            w = min(w, img_w - x_min)\n            h = min(h, img_h - y_min)\n\n            if w * h < 1.0:\n                continue\n\n            valid_indices.append(i)\n            converted_bboxes.append([x_min, y_min, w, h])\n\n        # Rebuild objects dict, filtering all list-valued fields by valid_indices\n        new_objects = {}\n        for key, value in objects.items():\n            if key == \"bbox\":\n                new_objects[\"bbox\"] = converted_bboxes\n            elif isinstance(value, list) and len(value) == n:\n                new_objects[key] = [value[j] for j in valid_indices]\n            else:\n                new_objects[key] = value\n\n        if \"area\" not in new_objects or len(new_objects.get(\"area\", [])) != len(converted_bboxes):\n            new_objects[\"area\"] = [b[2] * b[3] for b in converted_bboxes]\n\n        example[objects_col] = new_objects\n        return example\n\n    before = len(dataset)\n    dataset = dataset.map(_validate)\n    dataset = dataset.filter(lambda ex: len(ex[objects_col][\"bbox\"]) > 0)\n    after = len(dataset)\n    if before != after:\n        logger.warning(f\"Dropped {before - after}/{before} images with no valid bboxes after sanitization\")\n    logger.info(f\"Bbox sanitization complete: {after} images with valid bboxes remain\")\n    return 
dataset\n\n\ndef convert_bbox_yolo_to_pascal(boxes: torch.Tensor, image_size: tuple[int, int]) -> torch.Tensor:\n    \"\"\"\n    Convert bounding boxes from YOLO format (x_center, y_center, width, height) in range [0, 1]\n    to Pascal VOC format (x_min, y_min, x_max, y_max) in absolute coordinates.\n\n    Args:\n        boxes (torch.Tensor): Bounding boxes in YOLO format\n        image_size (tuple[int, int]): Image size in format (height, width)\n\n    Returns:\n        torch.Tensor: Bounding boxes in Pascal VOC format (x_min, y_min, x_max, y_max)\n    \"\"\"\n    # convert center to corners format\n    boxes = center_to_corners_format(boxes)\n\n\n    if isinstance(image_size, torch.Tensor):\n        image_size = image_size.tolist()\n    elif isinstance(image_size, np.ndarray):\n        image_size = image_size.tolist()\n    height, width = image_size\n    boxes = boxes * torch.tensor([[width, height, width, height]])\n\n    return boxes\n\n\ndef augment_and_transform_batch(\n    examples: Mapping[str, Any],\n    transform: A.Compose,\n    image_processor: AutoImageProcessor,\n    return_pixel_mask: bool = False,\n) -> BatchFeature:\n    \"\"\"Apply augmentations and format annotations in COCO format for object detection task\"\"\"\n\n    images = []\n    annotations = []\n    image_ids = examples[\"image_id\"] if \"image_id\" in examples else range(len(examples[\"image\"]))\n    for image_id, image, objects in zip(image_ids, examples[\"image\"], examples[\"objects\"]):\n        image = np.array(image.convert(\"RGB\"))\n\n        # Filter invalid bboxes before augmentation (safety net after sanitize_dataset)\n        bboxes = objects[\"bbox\"]\n        categories = objects[\"category\"]\n        areas = objects[\"area\"]\n        valid = [\n            (b, c, a)\n            for b, c, a in zip(bboxes, categories, areas)\n            if len(b) == 4 and b[2] > 0 and b[3] > 0 and b[0] >= 0 and b[1] >= 0\n        ]\n        if valid:\n            bboxes, categories, 
areas = zip(*valid)\n        else:\n            bboxes, categories, areas = [], [], []\n\n        # apply augmentations\n        output = transform(image=image, bboxes=list(bboxes), category=list(categories))\n        images.append(output[\"image\"])\n\n        # format annotations in COCO format (recompute areas from post-augmentation bboxes)\n        post_areas = [b[2] * b[3] for b in output[\"bboxes\"]] if output[\"bboxes\"] else []\n        formatted_annotations = format_image_annotations_as_coco(\n            image_id, output[\"category\"], post_areas, output[\"bboxes\"]\n        )\n        annotations.append(formatted_annotations)\n\n    # Apply the image processor transformations: resizing, rescaling, normalization\n    result = image_processor(images=images, annotations=annotations, return_tensors=\"pt\")\n\n    if not return_pixel_mask:\n        result.pop(\"pixel_mask\", None)\n\n    return result\n\n\ndef collate_fn(batch: list[BatchFeature]) -> Mapping[str, torch.Tensor | list[Any]]:\n    \"\"\"Stack pixel values (and pixel masks, if present) into tensors; keep labels as a per-image list.\"\"\"\n    data = {}\n    data[\"pixel_values\"] = torch.stack([x[\"pixel_values\"] for x in batch])\n    data[\"labels\"] = [x[\"labels\"] for x in batch]\n    if \"pixel_mask\" in batch[0]:\n        data[\"pixel_mask\"] = torch.stack([x[\"pixel_mask\"] for x in batch])\n    return data\n\n\n@torch.no_grad()\ndef compute_metrics(\n    evaluation_results: EvalPrediction,\n    image_processor: AutoImageProcessor,\n    threshold: float = 0.0,\n    id2label: Mapping[int, str] | None = None,\n) -> Mapping[str, float]:\n    \"\"\"\n    Compute mAP (mean average precision), mAR (mean average recall) and their variants for the object detection task.\n\n    Args:\n        evaluation_results (EvalPrediction): Predictions and targets from evaluation.\n        image_processor (AutoImageProcessor): Image processor used to post-process the predicted boxes.\n        threshold (float, optional): Threshold to filter predicted boxes by confidence. Defaults to 0.0.\n        id2label (Optional[dict], optional): Mapping from class id to class name. 
Defaults to None.\n\n    Returns:\n        Mapping[str, float]: Metrics in a form of dictionary {<metric_name>: <metric_value>}\n    \"\"\"\n\n    predictions, targets = evaluation_results.predictions, evaluation_results.label_ids\n\n    # For metric computation we need to provide:\n    #  - targets in a form of list of dictionaries with keys \"boxes\", \"labels\"\n    #  - predictions in a form of list of dictionaries with keys \"boxes\", \"scores\", \"labels\"\n\n    image_sizes = []\n    post_processed_targets = []\n    post_processed_predictions = []\n\n    # Collect targets in the required format for metric computation\n    for batch in targets:\n        # collect image sizes, we will need them for predictions post processing\n        batch_image_sizes = torch.tensor([x[\"orig_size\"] for x in batch])\n        image_sizes.append(batch_image_sizes)\n        # collect targets in the required format for metric computation\n        # boxes were converted to YOLO format needed for model training\n        # here we will convert them to Pascal VOC format (x_min, y_min, x_max, y_max)\n        for image_target in batch:\n            boxes = torch.tensor(image_target[\"boxes\"])\n            boxes = convert_bbox_yolo_to_pascal(boxes, image_target[\"orig_size\"])\n            labels = torch.tensor(image_target[\"class_labels\"])\n            post_processed_targets.append({\"boxes\": boxes, \"labels\": labels})\n\n    # Collect predictions in the required format for metric computation,\n    # model produce boxes in YOLO format, then image_processor convert them to Pascal VOC format\n    for batch, target_sizes in zip(predictions, image_sizes):\n        batch_logits, batch_boxes = batch[1], batch[2]\n        output = ModelOutput(logits=torch.tensor(batch_logits), pred_boxes=torch.tensor(batch_boxes))\n        post_processed_output = image_processor.post_process_object_detection(\n            output, threshold=threshold, target_sizes=target_sizes\n        )\n        
post_processed_predictions.extend(post_processed_output)\n\n    # Compute metrics\n    metric = MeanAveragePrecision(box_format=\"xyxy\", class_metrics=True)\n    metric.update(post_processed_predictions, post_processed_targets)\n    metrics = metric.compute()\n\n    # Replace list of per class metrics with separate metric for each class\n    classes = metrics.pop(\"classes\")\n    map_per_class = metrics.pop(\"map_per_class\")\n    mar_100_per_class = metrics.pop(\"mar_100_per_class\")\n    # Single-class datasets return 0-d scalar tensors; make them iterable\n    if classes.dim() == 0:\n        classes = classes.unsqueeze(0)\n        map_per_class = map_per_class.unsqueeze(0)\n        mar_100_per_class = mar_100_per_class.unsqueeze(0)\n    for class_id, class_map, class_mar in zip(classes, map_per_class, mar_100_per_class):\n        class_name = id2label[class_id.item()] if id2label is not None else class_id.item()\n        metrics[f\"map_{class_name}\"] = class_map\n        metrics[f\"mar_100_{class_name}\"] = class_mar\n\n    metrics = {k: round(v.item(), 4) for k, v in metrics.items()}\n\n    return metrics\n\n\n@dataclass\nclass DataTrainingArguments:\n    \"\"\"\n    Arguments pertaining to what data we are going to input our model for training and eval.\n    Using `HfArgumentParser` we can turn this class into argparse arguments to be able to specify\n    them on the command line.\n    \"\"\"\n\n    dataset_name: str = field(\n        default=\"cppe-5\",\n        metadata={\n            \"help\": \"Name of a dataset from the hub (could be your own, possibly private dataset hosted on the hub).\"\n        },\n    )\n    dataset_config_name: str | None = field(\n        default=None, metadata={\"help\": \"The configuration name of the dataset to use (via the datasets library).\"}\n    )\n    train_val_split: float | None = field(\n        default=0.15, metadata={\"help\": \"Percent to split off of train for validation.\"}\n    )\n    image_square_size: int | 
None = field(\n        default=600,\n        metadata={\"help\": \"The longest image side is resized to this value, then the image is padded to a square.\"},\n    )\n    max_train_samples: int | None = field(\n        default=None,\n        metadata={\n            \"help\": (\n                \"For debugging purposes or quicker training, truncate the number of training examples to this \"\n                \"value if set.\"\n            )\n        },\n    )\n    max_eval_samples: int | None = field(\n        default=None,\n        metadata={\n            \"help\": (\n                \"For debugging purposes or quicker training, truncate the number of evaluation examples to this \"\n                \"value if set.\"\n            )\n        },\n    )\n    use_fast: bool | None = field(\n        default=True,\n        metadata={\"help\": \"Use a fast torchvision-based image processor if it is supported for a given model.\"},\n    )\n\n\n@dataclass\nclass ModelArguments:\n    \"\"\"\n    Arguments pertaining to which model/config/image processor we are going to fine-tune from.\n    \"\"\"\n\n    model_name_or_path: str = field(\n        default=\"facebook/detr-resnet-50\",\n        metadata={\"help\": \"Path to pretrained model or model identifier from huggingface.co/models\"},\n    )\n    config_name: str | None = field(\n        default=None, metadata={\"help\": \"Pretrained config name or path if not the same as model_name\"}\n    )\n    cache_dir: str | None = field(\n        default=None, metadata={\"help\": \"Directory in which to cache the downloaded pretrained models\"}\n    )\n    model_revision: str = field(\n        default=\"main\",\n        metadata={\"help\": \"The specific model version to use (can be a branch name, tag name or commit id).\"},\n    )\n    image_processor_name: str | None = field(default=None, metadata={\"help\": \"Name or path of preprocessor config.\"})\n    ignore_mismatched_sizes: bool = field(\n        default=True,\n        metadata={\n     
       \"help\": \"Whether or not to raise an error if some of the weights from the checkpoint do not have the same size as the weights of the model (if for instance, you are instantiating a model with 10 labels from a checkpoint with 3 labels).\"\n        },\n    )\n    token: str = field(\n        default=None,\n        metadata={\n            \"help\": (\n                \"The token to use as HTTP bearer authorization for remote files. If not specified, will use the token \"\n                \"generated when running `hf auth login` (stored in `~/.huggingface`).\"\n            )\n        },\n    )\n    trust_remote_code: bool = field(\n        default=False,\n        metadata={\n            \"help\": (\n                \"Whether to trust the execution of code from datasets/models defined on the Hub.\"\n                \" This option should only be set to `True` for repositories you trust and in which you have read the\"\n                \" code, as it will execute code present on the Hub on your local machine.\"\n            )\n        },\n    )\n\n\ndef main():\n    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))\n    if len(sys.argv) == 2 and sys.argv[1].endswith(\".json\"):\n\n        model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))\n    else:\n        model_args, data_args, training_args = parser.parse_args_into_dataclasses()\n\n\n    from huggingface_hub import login\n    hf_token = os.environ.get(\"HF_TOKEN\") or os.environ.get(\"hfjob\")\n    if hf_token:\n        login(token=hf_token)\n        training_args.hub_token = hf_token\n        logger.info(\"Logged in to Hugging Face Hub\")\n    elif training_args.push_to_hub:\n        logger.warning(\"HF_TOKEN not found in environment. 
Hub push will likely fail.\")\n\n    # Initialize Trackio for real-time experiment tracking\n    trackio.init(project=training_args.output_dir, name=training_args.run_name)\n\n    logging.basicConfig(\n        format=\"%(asctime)s - %(levelname)s - %(name)s - %(message)s\",\n        datefmt=\"%m/%d/%Y %H:%M:%S\",\n        handlers=[logging.StreamHandler(sys.stdout)],\n    )\n\n    if training_args.should_log:\n        # The default of training_args.log_level is passive, so we set log level at info here to have that default.\n        transformers.utils.logging.set_verbosity_info()\n\n    log_level = training_args.get_process_log_level()\n    logger.setLevel(log_level)\n    transformers.utils.logging.set_verbosity(log_level)\n    transformers.utils.logging.enable_default_handler()\n    transformers.utils.logging.enable_explicit_format()\n\n    # Log a short summary on each process:\n    logger.warning(\n        f\"Process rank: {training_args.local_process_index}, device: {training_args.device}, n_gpu: {training_args.n_gpu}, \"\n        + f\"distributed training: {training_args.parallel_mode.value == 'distributed'}, 16-bits training: {training_args.fp16}\"\n    )\n    logger.info(f\"Training/evaluation parameters {training_args}\")\n\n    # Pass the dataset config name so --dataset_config_name actually takes effect\n    dataset = load_dataset(\n        data_args.dataset_name,\n        data_args.dataset_config_name,\n        cache_dir=model_args.cache_dir,\n        trust_remote_code=model_args.trust_remote_code,\n    )\n\n    bbox_format = detect_bbox_format_from_samples(dataset[\"train\"])\n    if bbox_format == \"xyxy\":\n        logger.info(\"Converting bboxes from xyxy (Pascal VOC) → xywh (COCO) format across all splits\")\n    for split_name in list(dataset.keys()):\n        dataset[split_name] = sanitize_dataset(dataset[split_name], bbox_format=bbox_format)\n\n    for split_name in list(dataset.keys()):\n        if \"image_id\" not in dataset[split_name].column_names:\n            dataset[split_name] = dataset[split_name].add_column(\n                \"image_id\", 
list(range(len(dataset[split_name])))\n            )\n\n    dataset[\"train\"] = dataset[\"train\"].shuffle(seed=training_args.seed)\n\n    data_args.train_val_split = None if \"validation\" in dataset else data_args.train_val_split\n    if isinstance(data_args.train_val_split, float) and data_args.train_val_split > 0.0:\n        split = dataset[\"train\"].train_test_split(data_args.train_val_split, seed=training_args.seed)\n        dataset[\"train\"] = split[\"train\"]\n        dataset[\"validation\"] = split[\"test\"]\n\n    categories = None\n    try:\n        if isinstance(dataset[\"train\"].features[\"objects\"], dict):\n            cat_feature = dataset[\"train\"].features[\"objects\"][\"category\"].feature\n        else:\n            cat_feature = dataset[\"train\"].features[\"objects\"].feature[\"category\"]\n\n        if hasattr(cat_feature, \"names\"):\n            categories = cat_feature.names\n    except (AttributeError, KeyError):\n        pass\n\n    if categories is None:\n        # Category is a Value type (not ClassLabel) — scan dataset to discover labels\n        logger.info(\"Category feature is not ClassLabel — scanning dataset to discover category labels...\")\n        unique_cats = set()\n        for example in dataset[\"train\"]:\n            cats = example[\"objects\"][\"category\"]\n            if isinstance(cats, list):\n                unique_cats.update(cats)\n            else:\n                unique_cats.add(cats)\n\n        if all(isinstance(c, int) for c in unique_cats):\n            max_cat = max(unique_cats)\n            categories = [f\"class_{i}\" for i in range(max_cat + 1)]\n        elif all(isinstance(c, str) for c in unique_cats):\n            categories = sorted(unique_cats)\n        else:\n            categories = [str(c) for c in sorted(unique_cats, key=str)]\n        logger.info(f\"Discovered {len(categories)} categories: {categories}\")\n\n    id2label = dict(enumerate(categories))\n    label2id = {v: k for k, v in 
id2label.items()}\n\n    # Remap string categories to integer IDs if needed\n    sample_cats = dataset[\"train\"][0][\"objects\"][\"category\"]\n    if sample_cats and isinstance(sample_cats[0], str):\n        logger.info(f\"Remapping string categories to integer IDs: {label2id}\")\n\n        def _remap_categories(example):\n            objects = example[\"objects\"]\n            objects[\"category\"] = [label2id[c] for c in objects[\"category\"]]\n            example[\"objects\"] = objects\n            return example\n\n        for split_name in list(dataset.keys()):\n            dataset[split_name] = dataset[split_name].map(_remap_categories)\n        logger.info(\"Category remapping complete\")\n\n    if data_args.max_train_samples is not None:\n        max_train = min(data_args.max_train_samples, len(dataset[\"train\"]))\n        dataset[\"train\"] = dataset[\"train\"].select(range(max_train))\n        logger.info(f\"Truncated training set to {max_train} samples\")\n    if data_args.max_eval_samples is not None and \"validation\" in dataset:\n        max_eval = min(data_args.max_eval_samples, len(dataset[\"validation\"]))\n        dataset[\"validation\"] = dataset[\"validation\"].select(range(max_eval))\n        logger.info(f\"Truncated validation set to {max_eval} samples\")\n\n    common_pretrained_args = {\n        \"cache_dir\": model_args.cache_dir,\n        \"revision\": model_args.model_revision,\n        \"token\": model_args.token,\n        \"trust_remote_code\": model_args.trust_remote_code,\n    }\n    config = AutoConfig.from_pretrained(\n        model_args.config_name or model_args.model_name_or_path,\n        label2id=label2id,\n        id2label=id2label,\n        **common_pretrained_args,\n    )\n    model = AutoModelForObjectDetection.from_pretrained(\n        model_args.model_name_or_path,\n        config=config,\n        ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,\n        **common_pretrained_args,\n    )\n    image_processor = 
AutoImageProcessor.from_pretrained(\n        model_args.image_processor_name or model_args.model_name_or_path,\n        do_resize=True,\n        size={\"max_height\": data_args.image_square_size, \"max_width\": data_args.image_square_size},\n        do_pad=True,\n        pad_size={\"height\": data_args.image_square_size, \"width\": data_args.image_square_size},\n        use_fast=data_args.use_fast,\n        **common_pretrained_args,\n    )\n\n    max_size = data_args.image_square_size\n    train_augment_and_transform = A.Compose(\n        [\n            A.Compose(\n                [\n                    A.SmallestMaxSize(max_size=max_size, p=1.0),\n                    A.RandomSizedBBoxSafeCrop(height=max_size, width=max_size, p=1.0),\n                ],\n                p=0.2,\n            ),\n            A.OneOf(\n                [\n                    A.Blur(blur_limit=7, p=0.5),\n                    A.MotionBlur(blur_limit=7, p=0.5),\n                    A.Defocus(radius=(1, 5), alias_blur=(0.1, 0.25), p=0.1),\n                ],\n                p=0.1,\n            ),\n            A.Perspective(p=0.1),\n            A.HorizontalFlip(p=0.5),\n            A.RandomBrightnessContrast(p=0.5),\n            A.HueSaturationValue(p=0.1),\n        ],\n        bbox_params=A.BboxParams(format=\"coco\", label_fields=[\"category\"], clip=True, min_area=25),\n    )\n    validation_transform = A.Compose(\n        [A.NoOp()],\n        bbox_params=A.BboxParams(format=\"coco\", label_fields=[\"category\"], clip=True),\n    )\n\n    train_transform_batch = partial(\n        augment_and_transform_batch, transform=train_augment_and_transform, image_processor=image_processor\n    )\n    validation_transform_batch = partial(\n        augment_and_transform_batch, transform=validation_transform, image_processor=image_processor\n    )\n\n    dataset[\"train\"] = dataset[\"train\"].with_transform(train_transform_batch)\n    dataset[\"validation\"] = 
dataset[\"validation\"].with_transform(validation_transform_batch)\n    if \"test\" in dataset:\n        dataset[\"test\"] = dataset[\"test\"].with_transform(validation_transform_batch)\n\n\n    eval_compute_metrics_fn = partial(\n        compute_metrics, image_processor=image_processor, id2label=id2label, threshold=0.0\n    )\n\n    trainer = Trainer(\n        model=model,\n        args=training_args,\n        train_dataset=dataset[\"train\"] if training_args.do_train else None,\n        eval_dataset=dataset[\"validation\"] if training_args.do_eval else None,\n        processing_class=image_processor,\n        data_collator=collate_fn,\n        compute_metrics=eval_compute_metrics_fn,\n    )\n\n    # Training\n    if training_args.do_train:\n        train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)\n        trainer.save_model()\n        trainer.log_metrics(\"train\", train_result.metrics)\n        trainer.save_metrics(\"train\", train_result.metrics)\n        trainer.save_state()\n\n    if training_args.do_eval:\n        test_dataset = dataset[\"test\"] if \"test\" in dataset else dataset[\"validation\"]\n        test_prefix = \"test\" if \"test\" in dataset else \"eval\"\n        metrics = trainer.evaluate(eval_dataset=test_dataset, metric_key_prefix=test_prefix)\n        trainer.log_metrics(test_prefix, metrics)\n        trainer.save_metrics(test_prefix, metrics)\n\n    trackio.finish()\n\n    kwargs = {\n        \"finetuned_from\": model_args.model_name_or_path,\n        \"dataset\": data_args.dataset_name,\n        \"tags\": [\"object-detection\", \"vision\"],\n    }\n    if training_args.push_to_hub:\n        trainer.push_to_hub(**kwargs)\n    else:\n        trainer.create_model_card(**kwargs)\n\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "skills/hugging-face-vision-trainer/scripts/sam_segmentation_training.py",
    "content": "# /// script\n# dependencies = [\n#     \"transformers>=5.2.0\",\n#     \"accelerate>=1.1.0\",\n#     \"datasets>=4.0\",\n#     \"torchvision\",\n#     \"monai\",\n#     \"trackio\",\n#     \"huggingface_hub\",\n# ]\n# ///\n\n\"\"\"Fine-tune SAM or SAM2 for segmentation using bounding-box or point prompts with the HF Trainer API.\"\"\"\n\nimport json\nimport logging\nimport math\nimport os\nimport sys\nfrom dataclasses import dataclass, field\nfrom typing import Any\n\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom datasets import load_dataset\nfrom torch.utils.data import Dataset\n\nimport monai\nimport trackio\n\nimport transformers\nfrom transformers import (\n    HfArgumentParser,\n    Trainer,\n    TrainingArguments,\n)\nfrom transformers.utils import check_min_version\n\nlogger = logging.getLogger(__name__)\n\ncheck_min_version(\"4.57.0.dev0\")\n\n\n# ---------------------------------------------------------------------------\n# Dataset wrapper\n# ---------------------------------------------------------------------------\n\nclass SAMSegmentationDataset(Dataset):\n    \"\"\"Wraps a HF dataset into the format expected by SAM/SAM2 processors.\n\n    Each sample must contain an image, a binary mask, and a prompt (bbox or\n    point).  
Prompts are read from a JSON-encoded ``prompt`` column or from\n    dedicated ``bbox`` / ``point`` columns.\n    \"\"\"\n\n    def __init__(self, dataset, processor, prompt_type: str,\n                 image_col: str, mask_col: str, prompt_col: str | None,\n                 bbox_col: str | None, point_col: str | None):\n        self.dataset = dataset\n        self.processor = processor\n        self.prompt_type = prompt_type\n        self.image_col = image_col\n        self.mask_col = mask_col\n        self.prompt_col = prompt_col\n        self.bbox_col = bbox_col\n        self.point_col = point_col\n\n    def __len__(self):\n        return len(self.dataset)\n\n    def _extract_prompt(self, item):\n        if self.prompt_col and self.prompt_col in item:\n            raw = item[self.prompt_col]\n            parsed = json.loads(raw) if isinstance(raw, str) else raw\n            if self.prompt_type == \"bbox\":\n                return parsed.get(\"bbox\") or parsed.get(\"box\")\n            return parsed.get(\"point\") or parsed.get(\"points\")\n\n        if self.prompt_type == \"bbox\" and self.bbox_col:\n            return item[self.bbox_col]\n        if self.prompt_type == \"point\" and self.point_col:\n            return item[self.point_col]\n        raise ValueError(\"Could not extract prompt from sample\")\n\n    def __getitem__(self, idx):\n        item = self.dataset[idx]\n        image = item[self.image_col]\n        prompt = self._extract_prompt(item)\n\n        if self.prompt_type == \"bbox\":\n            inputs = self.processor(image, input_boxes=[[prompt]], return_tensors=\"pt\")\n        else:\n            if isinstance(prompt[0], (int, float)):\n                prompt = [prompt]\n            inputs = self.processor(image, input_points=[[prompt]], return_tensors=\"pt\")\n\n        mask = np.array(item[self.mask_col])\n        if mask.ndim == 3:\n            mask = mask[:, :, 0]\n        inputs[\"labels\"] = (mask > 0).astype(np.float32)\n        
inputs[\"original_image_size\"] = torch.tensor(image.size[::-1])\n        return inputs\n\n\ndef collate_fn(batch):\n    pixel_values = torch.cat([item[\"pixel_values\"] for item in batch], dim=0)\n    original_sizes = torch.stack([item[\"original_sizes\"] for item in batch])\n    original_image_size = torch.stack([item[\"original_image_size\"] for item in batch])\n\n    has_boxes = \"input_boxes\" in batch[0]\n    has_points = \"input_points\" in batch[0]\n\n    labels = torch.cat(\n        [\n            F.interpolate(\n                torch.as_tensor(x[\"labels\"]).unsqueeze(0).unsqueeze(0).float(),\n                size=(256, 256),\n                mode=\"nearest\",\n            )\n            for x in batch\n        ],\n        dim=0,\n    ).long()\n\n    result = {\n        \"pixel_values\": pixel_values,\n        \"original_sizes\": original_sizes,\n        \"labels\": labels,\n        \"original_image_size\": original_image_size,\n        \"multimask_output\": False,\n    }\n\n    if has_boxes:\n        result[\"input_boxes\"] = torch.cat([item[\"input_boxes\"] for item in batch], dim=0)\n    if has_points:\n        result[\"input_points\"] = torch.cat([item[\"input_points\"] for item in batch], dim=0)\n        if \"input_labels\" in batch[0]:\n            result[\"input_labels\"] = torch.cat([item[\"input_labels\"] for item in batch], dim=0)\n\n    return result\n\n\n# ---------------------------------------------------------------------------\n# Custom loss (SAM/SAM2 don't compute loss in forward())\n# ---------------------------------------------------------------------------\n\nseg_loss = monai.losses.DiceCELoss(sigmoid=True, squared_pred=True, reduction=\"mean\")\n\n\ndef compute_loss(outputs, labels, num_items_in_batch=None):\n    predicted_masks = outputs.pred_masks.squeeze(1)\n    return seg_loss(predicted_masks, labels.float())\n\n\n# ---------------------------------------------------------------------------\n# CLI arguments\n# 
---------------------------------------------------------------------------\n\n@dataclass\nclass DataTrainingArguments:\n    dataset_name: str = field(\n        default=\"merve/MicroMat-mini\",\n        metadata={\"help\": \"Hub dataset ID.\"},\n    )\n    dataset_config_name: str | None = field(\n        default=None,\n        metadata={\"help\": \"Dataset config name.\"},\n    )\n    train_val_split: float | None = field(\n        default=0.1,\n        metadata={\"help\": \"Fraction to split off for validation (used when no validation split exists).\"},\n    )\n    max_train_samples: int | None = field(\n        default=None,\n        metadata={\"help\": \"Truncate training set (for quick tests).\"},\n    )\n    max_eval_samples: int | None = field(\n        default=None,\n        metadata={\"help\": \"Truncate evaluation set.\"},\n    )\n    image_column_name: str = field(\n        default=\"image\",\n        metadata={\"help\": \"Column containing PIL images.\"},\n    )\n    mask_column_name: str = field(\n        default=\"mask\",\n        metadata={\"help\": \"Column containing ground-truth binary masks.\"},\n    )\n    prompt_column_name: str | None = field(\n        default=\"prompt\",\n        metadata={\"help\": \"Column with JSON-encoded prompt (bbox/point). Set to '' to disable.\"},\n    )\n    bbox_column_name: str | None = field(\n        default=None,\n        metadata={\"help\": \"Column with bbox prompt ([x0,y0,x1,y1]). Used when prompt_column_name is unset.\"},\n    )\n    point_column_name: str | None = field(\n        default=None,\n        metadata={\"help\": \"Column with point prompt ([x,y] or [[x,y],...]). 
Used when prompt_column_name is unset.\"},\n    )\n    prompt_type: str = field(\n        default=\"bbox\",\n        metadata={\"help\": \"Prompt type: 'bbox' or 'point'.\"},\n    )\n\n\n@dataclass\nclass ModelArguments:\n    model_name_or_path: str = field(\n        default=\"facebook/sam2.1-hiera-small\",\n        metadata={\"help\": \"Pretrained SAM/SAM2 model identifier.\"},\n    )\n    cache_dir: str | None = field(default=None, metadata={\"help\": \"Cache directory.\"})\n    model_revision: str = field(default=\"main\", metadata={\"help\": \"Model revision.\"})\n    token: str | None = field(default=None, metadata={\"help\": \"Auth token.\"})\n    trust_remote_code: bool = field(default=False, metadata={\"help\": \"Trust remote code.\"})\n    freeze_vision_encoder: bool = field(\n        default=True,\n        metadata={\"help\": \"Freeze vision encoder weights.\"},\n    )\n    freeze_prompt_encoder: bool = field(\n        default=True,\n        metadata={\"help\": \"Freeze prompt encoder weights.\"},\n    )\n\n\n# ---------------------------------------------------------------------------\n# Main\n# ---------------------------------------------------------------------------\n\ndef main():\n    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))\n    parser.set_defaults(per_device_train_batch_size=4, num_train_epochs=30)\n    if len(sys.argv) == 2 and sys.argv[1].endswith(\".json\"):\n        model_args, data_args, training_args = parser.parse_json_file(\n            json_file=os.path.abspath(sys.argv[1])\n        )\n    else:\n        model_args, data_args, training_args = parser.parse_args_into_dataclasses()\n\n    from huggingface_hub import login\n    hf_token = os.environ.get(\"HF_TOKEN\") or os.environ.get(\"hfjob\")\n    if hf_token:\n        login(token=hf_token)\n        training_args.hub_token = hf_token\n        logger.info(\"Logged in to Hugging Face Hub\")\n    elif training_args.push_to_hub:\n        
logger.warning(\"HF_TOKEN not found in environment. Hub push will likely fail.\")\n\n    trackio.init(project=training_args.output_dir, name=training_args.run_name)\n\n    logging.basicConfig(\n        format=\"%(asctime)s - %(levelname)s - %(name)s - %(message)s\",\n        datefmt=\"%m/%d/%Y %H:%M:%S\",\n        handlers=[logging.StreamHandler(sys.stdout)],\n    )\n    if training_args.should_log:\n        transformers.utils.logging.set_verbosity_info()\n\n    log_level = training_args.get_process_log_level()\n    logger.setLevel(log_level)\n    transformers.utils.logging.set_verbosity(log_level)\n    transformers.utils.logging.enable_default_handler()\n    transformers.utils.logging.enable_explicit_format()\n\n    logger.info(f\"Training/evaluation parameters {training_args}\")\n\n    # ---- Load dataset ----\n    dataset = load_dataset(\n        data_args.dataset_name,\n        data_args.dataset_config_name,\n        cache_dir=model_args.cache_dir,\n        trust_remote_code=model_args.trust_remote_code,\n    )\n\n    if \"train\" not in dataset:\n        if len(dataset.keys()) == 1:\n            only_split = list(dataset.keys())[0]\n            dataset[only_split] = dataset[only_split].shuffle(seed=training_args.seed)\n            # Seed the split so the train/validation partition is reproducible\n            dataset = dataset[only_split].train_test_split(\n                test_size=data_args.train_val_split or 0.1, seed=training_args.seed\n            )\n            dataset = {\"train\": dataset[\"train\"], \"validation\": dataset[\"test\"]}\n        else:\n            raise ValueError(f\"No 'train' split found. 
Available: {list(dataset.keys())}\")\n    elif \"validation\" not in dataset and \"test\" not in dataset:\n        dataset[\"train\"] = dataset[\"train\"].shuffle(seed=training_args.seed)\n        split = dataset[\"train\"].train_test_split(\n            test_size=data_args.train_val_split or 0.1, seed=training_args.seed\n        )\n        dataset[\"train\"] = split[\"train\"]\n        dataset[\"validation\"] = split[\"test\"]\n\n    if data_args.max_train_samples is not None:\n        n = min(data_args.max_train_samples, len(dataset[\"train\"]))\n        dataset[\"train\"] = dataset[\"train\"].select(range(n))\n        logger.info(f\"Truncated training set to {n} samples\")\n    eval_key = \"validation\" if \"validation\" in dataset else \"test\"\n    if data_args.max_eval_samples is not None and eval_key in dataset:\n        n = min(data_args.max_eval_samples, len(dataset[eval_key]))\n        dataset[eval_key] = dataset[eval_key].select(range(n))\n        logger.info(f\"Truncated eval set to {n} samples\")\n\n    # ---- Detect model family (SAM vs SAM2) and load processor/model ----\n    model_id = model_args.model_name_or_path.lower()\n    is_sam2 = \"sam2\" in model_id\n\n    if is_sam2:\n        from transformers import Sam2Processor, Sam2Model\n        processor = Sam2Processor.from_pretrained(model_args.model_name_or_path)\n        model = Sam2Model.from_pretrained(model_args.model_name_or_path)\n    else:\n        from transformers import SamProcessor, SamModel\n        processor = SamProcessor.from_pretrained(model_args.model_name_or_path)\n        model = SamModel.from_pretrained(model_args.model_name_or_path)\n\n    if model_args.freeze_vision_encoder:\n        for name, param in model.named_parameters():\n            if name.startswith(\"vision_encoder\"):\n                param.requires_grad_(False)\n    if model_args.freeze_prompt_encoder:\n        for name, param in model.named_parameters():\n            if name.startswith(\"prompt_encoder\"):\n     
           param.requires_grad_(False)\n\n    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)\n    total = sum(p.numel() for p in model.parameters())\n    logger.info(f\"Trainable params: {trainable:,} / {total:,} ({100 * trainable / total:.1f}%)\")\n\n    # ---- Build datasets ----\n    prompt_col = data_args.prompt_column_name if data_args.prompt_column_name else None\n    ds_kwargs = dict(\n        processor=processor,\n        prompt_type=data_args.prompt_type,\n        image_col=data_args.image_column_name,\n        mask_col=data_args.mask_column_name,\n        prompt_col=prompt_col,\n        bbox_col=data_args.bbox_column_name,\n        point_col=data_args.point_column_name,\n    )\n\n    train_dataset = SAMSegmentationDataset(dataset=dataset[\"train\"], **ds_kwargs)\n    eval_dataset = None\n    if eval_key in dataset:\n        eval_dataset = SAMSegmentationDataset(dataset=dataset[eval_key], **ds_kwargs)\n\n    # ---- Train ----\n    trainer = Trainer(\n        model=model,\n        args=training_args,\n        train_dataset=train_dataset if training_args.do_train else None,\n        eval_dataset=eval_dataset if training_args.do_eval else None,\n        data_collator=collate_fn,\n        compute_loss_func=compute_loss,\n    )\n\n    if training_args.do_train:\n        train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)\n        trainer.save_model()\n        trainer.log_metrics(\"train\", train_result.metrics)\n        trainer.save_metrics(\"train\", train_result.metrics)\n        trainer.save_state()\n\n    if training_args.do_eval and eval_dataset is not None:\n        metrics = trainer.evaluate()\n        trainer.log_metrics(\"eval\", metrics)\n        trainer.save_metrics(\"eval\", metrics)\n\n    trackio.finish()\n\n    kwargs = {\n        \"finetuned_from\": model_args.model_name_or_path,\n        \"dataset\": data_args.dataset_name,\n        \"tags\": [\"image-segmentation\", \"vision\", 
\"sam\"],\n    }\n    if training_args.push_to_hub:\n        trainer.push_to_hub(**kwargs)\n    else:\n        trainer.create_model_card(**kwargs)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "skills/huggingface-gradio/SKILL.md",
    "content": "---\nname: gradio\ndescription: Build Gradio web UIs and demos in Python. Use when creating or editing Gradio apps, components, event listeners, layouts, or chatbots.\n---\n\n# Gradio\n\nGradio is a Python library for building interactive web UIs and ML demos. This skill covers the core API, patterns, and examples.\n\n## Guides\n\nDetailed guides on specific topics (read these when relevant):\n\n- [Quickstart](https://www.gradio.app/guides/quickstart)\n- [The Interface Class](https://www.gradio.app/guides/the-interface-class)\n- [Blocks and Event Listeners](https://www.gradio.app/guides/blocks-and-event-listeners)\n- [Controlling Layout](https://www.gradio.app/guides/controlling-layout)\n- [More Blocks Features](https://www.gradio.app/guides/more-blocks-features)\n- [Custom CSS and JS](https://www.gradio.app/guides/custom-CSS-and-JS)\n- [Streaming Outputs](https://www.gradio.app/guides/streaming-outputs)\n- [Streaming Inputs](https://www.gradio.app/guides/streaming-inputs)\n- [Sharing Your App](https://www.gradio.app/guides/sharing-your-app)\n- [Custom HTML Components](https://www.gradio.app/guides/custom-HTML-components)\n- [Getting Started with the Python Client](https://www.gradio.app/guides/getting-started-with-the-python-client)\n- [Getting Started with the JS Client](https://www.gradio.app/guides/getting-started-with-the-js-client)\n\n## Core Patterns\n\n**Interface** (high-level): wraps a function with input/output components.\n\n```python\nimport gradio as gr\n\ndef greet(name):\n    return f\"Hello {name}!\"\n\ngr.Interface(fn=greet, inputs=\"text\", outputs=\"text\").launch()\n```\n\n**Blocks** (low-level): flexible layout with explicit event wiring.\n\n```python\nimport gradio as gr\n\nwith gr.Blocks() as demo:\n    name = gr.Textbox(label=\"Name\")\n    output = gr.Textbox(label=\"Greeting\")\n    btn = gr.Button(\"Greet\")\n    btn.click(fn=lambda n: f\"Hello {n}!\", inputs=name, 
outputs=output)\n\ndemo.launch()\n```\n\n**ChatInterface**: high-level wrapper for chatbot UIs.\n\n```python\nimport gradio as gr\n\ndef respond(message, history):\n    return f\"You said: {message}\"\n\ngr.ChatInterface(fn=respond).launch()\n```\n\n## Key Component Signatures\n\n### `Textbox(value: str | I18nData | Callable | None = None, type: Literal['text', 'password', 'email'] = \"text\", lines: int = 1, max_lines: int | None = None, placeholder: str | I18nData | None = None, label: str | I18nData | None = None, info: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, interactive: bool | None = None, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, autofocus: bool = False, autoscroll: bool = True, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] 
| None = None, preserved_by_key: list[str] | str | None = \"value\", text_align: Literal['left', 'right'] | None = None, rtl: bool = False, buttons: list[Literal['copy'] | Button] | None = None, max_length: int | None = None, submit_btn: str | bool | None = False, stop_btn: str | bool | None = False, html_attributes: InputHTMLAttributes | None = None)`\nCreates a textarea for the user to enter string input or display string output.\n\n### `Number(value: float | Callable | None = None, label: str | I18nData | None = None, placeholder: str | I18nData | None = None, info: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, interactive: bool | None = None, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] | None = None, preserved_by_key: list[str] | str | None = \"value\", buttons: list[Button] | None = None, precision: int | None = None, minimum: float | None = None, maximum: float | None = None, step: float = 1)`\nCreates a numeric field for the user to enter numbers as input or display numeric output.\n\n### `Slider(minimum: float = 0, maximum: float = 100, value: float | Callable | None = None, step: float | None = None, precision: int | None = None, label: str | I18nData | None = None, info: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, interactive: bool | None = None, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] 
| None = None, preserved_by_key: list[str] | str | None = \"value\", randomize: bool = False, buttons: list[Literal['reset']] | None = None)`\nCreates a slider that ranges from `minimum` to `maximum` with a step size of `step`.\n\n### `Checkbox(value: bool | Callable = False, label: str | I18nData | None = None, info: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, interactive: bool | None = None, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] | None = None, preserved_by_key: list[str] | str | None = \"value\", buttons: list[Button] | None = None)`\nCreates a checkbox that can be set to `True` or `False`.\n\n### `Dropdown(choices: Sequence[str | int | float | tuple[str, str | int | float]] | None = None, value: str | int | float | Sequence[str | int | float] | Callable | DefaultValue | None = DefaultValue(), type: Literal['value', 'index'] = \"value\", multiselect: bool | None = None, allow_custom_value: bool = False, max_choices: int | None = None, filterable: bool = True, label: str | I18nData | None = None, info: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, interactive: bool | None = None, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] 
| None = None, preserved_by_key: list[str] | str | None = \"value\", buttons: list[Button] | None = None)`\nCreates a dropdown of choices from which a single entry or multiple entries can be selected (as an input component) or displayed (as an output component).\n\n### `Radio(choices: Sequence[str | int | float | tuple[str, str | int | float]] | None = None, value: str | int | float | Callable | None = None, type: Literal['value', 'index'] = \"value\", label: str | I18nData | None = None, info: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, interactive: bool | None = None, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] | None = None, preserved_by_key: list[str] | str | None = \"value\", rtl: bool = False, buttons: list[Button] | None = None)`\nCreates a set of radio buttons (string or numeric values) of which only one can be selected.\n\n### `Image(value: str | PIL.Image.Image | np.ndarray | Callable | None = None, format: str = \"webp\", height: int | str | None = None, width: int | str | None = None, image_mode: Literal['1', 'L', 'P', 'RGB', 'RGBA', 'CMYK', 'YCbCr', 'LAB', 'HSV', 'I', 'F'] | None = \"RGB\", sources: list[Literal['upload', 'webcam', 'clipboard']] | Literal['upload', 'webcam', 'clipboard'] | None = None, type: Literal['numpy', 'pil', 'filepath'] = \"numpy\", label: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, buttons: list[Literal['download', 'share', 'fullscreen'] | Button] | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, interactive: bool | None = None, visible: bool | 
Literal['hidden'] = True, streaming: bool = False, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] | None = None, preserved_by_key: list[str] | str | None = \"value\", webcam_options: WebcamOptions | None = None, placeholder: str | None = None, watermark: WatermarkOptions | None = None)`\nCreates an image component that can be used to upload images (as an input) or display images (as an output).\n\n### `Audio(value: str | Path | tuple[int, np.ndarray] | Callable | None = None, sources: list[Literal['upload', 'microphone']] | Literal['upload', 'microphone'] | None = None, type: Literal['numpy', 'filepath'] = \"numpy\", label: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, interactive: bool | None = None, visible: bool | Literal['hidden'] = True, streaming: bool = False, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] 
| None = None, preserved_by_key: list[str] | str | None = \"value\", format: Literal['wav', 'mp3'] | None = None, autoplay: bool = False, editable: bool = True, buttons: list[Literal['download', 'share'] | Button] | None = None, waveform_options: WaveformOptions | dict | None = None, loop: bool = False, recording: bool = False, subtitles: str | Path | list[dict[str, Any]] | None = None, playback_position: float = 0)`\nCreates an audio component that can be used to upload/record audio (as an input) or display audio (as an output).\n\n### `Video(value: str | Path | Callable | None = None, format: str | None = None, sources: list[Literal['upload', 'webcam']] | Literal['upload', 'webcam'] | None = None, height: int | str | None = None, width: int | str | None = None, label: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, interactive: bool | None = None, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] 
| None = None, preserved_by_key: list[str] | str | None = \"value\", webcam_options: WebcamOptions | None = None, include_audio: bool | None = None, autoplay: bool = False, buttons: list[Literal['download', 'share'] | Button] | None = None, loop: bool = False, streaming: bool = False, watermark: WatermarkOptions | None = None, subtitles: str | Path | list[dict[str, Any]] | None = None, playback_position: float = 0)`\nCreates a video component that can be used to upload/record videos (as an input) or display videos (as an output).\n\n### `File(value: str | list[str] | Callable | None = None, file_count: Literal['single', 'multiple', 'directory'] = \"single\", file_types: list[str] | None = None, type: Literal['filepath', 'binary'] = \"filepath\", label: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, height: int | str | float | None = None, interactive: bool | None = None, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] 
| None = None, preserved_by_key: list[str] | str | None = \"value\", allow_reordering: bool = False, buttons: list[Button] | None = None)`\nCreates a file component that allows uploading one or more generic files (when used as an input) or displaying generic files or URLs for download (as output).\n\n### `Chatbot(value: list[MessageDict | Message] | Callable | None = None, label: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, container: bool = True, scale: int | None = None, min_width: int = 160, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, autoscroll: bool = True, render: bool = True, key: int | str | tuple[int | str, ...] | None = None, preserved_by_key: list[str] | str | None = \"value\", height: int | str | None = 400, resizable: bool = False, max_height: int | str | None = None, min_height: int | str | None = None, editable: Literal['user', 'all'] | None = None, latex_delimiters: list[dict[str, str | bool]] | None = None, rtl: bool = False, buttons: list[Literal['share', 'copy', 'copy_all'] | Button] | None = None, watermark: str | None = None, avatar_images: tuple[str | Path | None, str | Path | None] | None = None, sanitize_html: bool = True, render_markdown: bool = True, feedback_options: list[str] | tuple[str, ...] 
| None = ('Like', 'Dislike'), feedback_value: Sequence[str | None] | None = None, line_breaks: bool = True, layout: Literal['panel', 'bubble'] | None = None, placeholder: str | None = None, examples: list[ExampleMessage] | None = None, allow_file_downloads: bool = True, group_consecutive_messages: bool = True, allow_tags: list[str] | bool = True, reasoning_tags: list[tuple[str, str]] | None = None, like_user_message: bool = False)`\nCreates a chatbot that displays user-submitted messages and responses.\n\n### `Button(value: str | I18nData | Callable = \"Run\", every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, variant: Literal['primary', 'secondary', 'stop', 'huggingface'] = \"secondary\", size: Literal['sm', 'md', 'lg'] = \"lg\", icon: str | Path | None = None, link: str | None = None, link_target: Literal['_self', '_blank', '_parent', '_top'] = \"_self\", visible: bool | Literal['hidden'] = True, interactive: bool = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] | None = None, preserved_by_key: list[str] | str | None = \"value\", scale: int | None = None, min_width: int | None = None)`\nCreates a button that can be assigned arbitrary .click() events.\n\n### `Markdown(value: str | I18nData | Callable | None = None, label: str | I18nData | None = None, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool | None = None, rtl: bool = False, latex_delimiters: list[dict[str, str | bool]] | None = None, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] 
| None = None, preserved_by_key: list[str] | str | None = \"value\", sanitize_html: bool = True, line_breaks: bool = False, header_links: bool = False, height: int | str | None = None, max_height: int | str | None = None, min_height: int | str | None = None, buttons: list[Literal['copy']] | None = None, container: bool = False, padding: bool = False)`\nUsed to render arbitrary Markdown output.\n\n### `HTML(value: Any | Callable | None = None, label: str | I18nData | None = None, html_template: str = \"${value}\", css_template: str = \"\", js_on_load: str | None = \"element.addEventListener('click', function() { trigger('click') });\", apply_default_css: bool = True, every: Timer | float | None = None, inputs: Component | Sequence[Component] | set[Component] | None = None, show_label: bool = False, visible: bool | Literal['hidden'] = True, elem_id: str | None = None, elem_classes: list[str] | str | None = None, render: bool = True, key: int | str | tuple[int | str, ...] | None = None, preserved_by_key: list[str] | str | None = \"value\", min_height: int | None = None, max_height: int | None = None, container: bool = False, padding: bool = False, autoscroll: bool = False, buttons: list[Button] | None = None, props: Any)`\nCreates a component with arbitrary HTML.\n\n\n## Custom HTML Components\n\nIf a task requires significant customization of an existing component or a component that doesn't exist in Gradio, you can create one with `gr.HTML`. It supports `html_template` (with `${}` JS expressions and `{{}}` Handlebars syntax), `css_template` for scoped styles, and `js_on_load` for interactivity — where `props.value` updates the component value and `trigger('event_name')` fires Gradio events. For reuse, subclass `gr.HTML` and define `api_info()` for API/MCP support. 
See the [full guide](https://www.gradio.app/guides/custom-HTML-components).\n\nHere's an example that shows how to create and use these kinds of components:\n\n```python\nimport gradio as gr\n\nclass StarRating(gr.HTML):\n    def __init__(self, label, value=0, **kwargs):\n        html_template = \"\"\"\n        <h2>${label} rating:</h2>\n        ${Array.from({length: 5}, (_, i) => `<img class='${i < value ? '' : 'faded'}' src='https://upload.wikimedia.org/wikipedia/commons/d/df/Award-star-gold-3d.svg'>`).join('')}\n        \"\"\"\n        css_template = \"\"\"\n            img { height: 50px; display: inline-block; cursor: pointer; }\n            .faded { filter: grayscale(100%); opacity: 0.3; }\n        \"\"\"\n        js_on_load = \"\"\"\n            const imgs = element.querySelectorAll('img');\n            imgs.forEach((img, index) => {\n                img.addEventListener('click', () => {\n                    props.value = index + 1;\n                });\n            });\n        \"\"\"\n        super().__init__(value=value, label=label, html_template=html_template, css_template=css_template, js_on_load=js_on_load, **kwargs)\n\n    def api_info(self):\n        return {\"type\": \"integer\", \"minimum\": 0, \"maximum\": 5}\n\n\nwith gr.Blocks() as demo:\n    gr.Markdown(\"# Restaurant Review\")\n    food_rating = StarRating(label=\"Food\", value=3)\n    service_rating = StarRating(label=\"Service\", value=3)\n    ambience_rating = StarRating(label=\"Ambience\", value=3)\n    average_btn = gr.Button(\"Calculate Average Rating\")\n    rating_output = StarRating(label=\"Average\", value=3)\n    def calculate_average(food, service, ambience):\n        return round((food + service + ambience) / 3)\n    average_btn.click(\n        fn=calculate_average,\n        inputs=[food_rating, service_rating, ambience_rating],\n        outputs=rating_output\n    )\n\ndemo.launch()\n```\n\n## Event Listeners\n\nAll event listeners share the same 
signature:\n\n```python\ncomponent.event_name(\n    fn: Callable | None | Literal[\"decorator\"] = \"decorator\",\n    inputs: Component | Sequence[Component] | set[Component] | None = None,\n    outputs: Component | Sequence[Component] | set[Component] | None = None,\n    api_name: str | None = None,\n    api_description: str | None | Literal[False] = None,\n    scroll_to_output: bool = False,\n    show_progress: Literal[\"full\", \"minimal\", \"hidden\"] = \"full\",\n    show_progress_on: Component | Sequence[Component] | None = None,\n    queue: bool = True,\n    batch: bool = False,\n    max_batch_size: int = 4,\n    preprocess: bool = True,\n    postprocess: bool = True,\n    cancels: dict[str, Any] | list[dict[str, Any]] | None = None,\n    trigger_mode: Literal[\"once\", \"multiple\", \"always_last\"] | None = None,\n    js: str | Literal[True] | None = None,\n    concurrency_limit: int | None | Literal[\"default\"] = \"default\",\n    concurrency_id: str | None = None,\n    api_visibility: Literal[\"public\", \"private\", \"undocumented\"] = \"public\",\n    time_limit: int | None = None,\n    stream_every: float = 0.5,\n    key: int | str | tuple[int | str, ...] 
| None = None,\n    validator: Callable | None = None,\n) -> Dependency\n```\n\nSupported events per component:\n\n- **AnnotatedImage**: select\n- **Audio**: stream, change, clear, play, pause, stop, start_recording, pause_recording, stop_recording, upload, input\n- **BarPlot**: select, double_click\n- **BrowserState**: change\n- **Button**: click\n- **Chatbot**: change, select, like, retry, undo, example_select, option_select, clear, copy, edit\n- **Checkbox**: change, input, select\n- **CheckboxGroup**: change, input, select\n- **ClearButton**: click\n- **Code**: change, input, focus, blur\n- **ColorPicker**: change, input, submit, focus, blur\n- **Dataframe**: change, input, select, edit\n- **Dataset**: click, select\n- **DateTime**: change, submit\n- **DeepLinkButton**: click\n- **Dialogue**: change, input, submit\n- **DownloadButton**: click\n- **Dropdown**: change, input, select, focus, blur, key_up\n- **DuplicateButton**: click\n- **File**: change, select, clear, upload, delete, download\n- **FileExplorer**: change, input, select\n- **Gallery**: select, upload, change, delete, preview_close, preview_open\n- **HTML**: change, input, click, double_click, submit, stop, edit, clear, play, pause, end, start_recording, pause_recording, stop_recording, focus, blur, upload, release, select, stream, like, example_select, option_select, load, key_up, apply, delete, tick, undo, retry, expand, collapse, download, copy\n- **HighlightedText**: change, select\n- **Image**: clear, change, stream, select, upload, input\n- **ImageEditor**: clear, change, input, select, upload, apply\n- **ImageSlider**: clear, change, stream, select, upload, input\n- **JSON**: change\n- **Label**: change, select\n- **LinePlot**: select, double_click\n- **LoginButton**: click\n- **Markdown**: change, copy\n- **Model3D**: change, upload, edit, clear\n- **MultimodalTextbox**: change, input, select, submit, focus, blur, stop\n- **Navbar**: change\n- **Number**: change, input, submit, focus, 
blur\n- **ParamViewer**: change, upload\n- **Plot**: change\n- **Radio**: select, change, input\n- **ScatterPlot**: select, double_click\n- **SimpleImage**: clear, change, upload\n- **Slider**: change, input, release\n- **State**: change\n- **Textbox**: change, input, select, submit, focus, blur, stop, copy\n- **Timer**: tick\n- **UploadButton**: click, upload\n- **Video**: change, clear, start_recording, stop_recording, stop, play, pause, end, upload, input\n\n## Additional Reference\n\n- [End-to-End Examples](examples.md) — complete working apps\n"
  },
  {
    "path": "skills/huggingface-gradio/examples.md",
    "content": "# Gradio End-to-End Examples\n\nComplete working Gradio apps for reference.\n\n## Blocks Essay Simple\n\n```python\nimport gradio as gr\n\ndef change_textbox(choice):\n    if choice == \"short\":\n        return gr.Textbox(lines=2, visible=True)\n    elif choice == \"long\":\n        return gr.Textbox(lines=8, visible=True, value=\"Lorem ipsum dolor sit amet\")\n    else:\n        return gr.Textbox(visible=False)\n\nwith gr.Blocks() as demo:\n    radio = gr.Radio(\n        [\"short\", \"long\", \"none\"], label=\"What kind of essay would you like to write?\"\n    )\n    text = gr.Textbox(lines=2, interactive=True, buttons=[\"copy\"])\n    radio.change(fn=change_textbox, inputs=radio, outputs=text)\n\ndemo.launch()\n```\n\n## Blocks Flipper\n\n```python\nimport numpy as np\nimport gradio as gr\n\ndef flip_text(x):\n    return x[::-1]\n\ndef flip_image(x):\n    return np.fliplr(x)\n\nwith gr.Blocks() as demo:\n    gr.Markdown(\"Flip text or image files using this demo.\")\n    with gr.Tab(\"Flip Text\"):\n        text_input = gr.Textbox()\n        text_output = gr.Textbox()\n        text_button = gr.Button(\"Flip\")\n    with gr.Tab(\"Flip Image\"):\n        with gr.Row():\n            image_input = gr.Image()\n            image_output = gr.Image()\n        image_button = gr.Button(\"Flip\")\n\n    with gr.Accordion(\"Open for More!\", open=False):\n        gr.Markdown(\"Look at me...\")\n        temp_slider = gr.Slider(\n            0, 1,\n            value=0.1,\n            step=0.1,\n            interactive=True,\n            label=\"Slide me\",\n        )\n\n    text_button.click(flip_text, inputs=text_input, outputs=text_output)\n    image_button.click(flip_image, inputs=image_input, outputs=image_output)\n\ndemo.launch()\n```\n\n## Blocks Form\n\n```python\nimport gradio as gr\n\nwith gr.Blocks() as demo:\n    name_box = gr.Textbox(label=\"Name\")\n    age_box = gr.Number(label=\"Age\", minimum=0, maximum=100)\n    symptoms_box = 
gr.CheckboxGroup([\"Cough\", \"Fever\", \"Runny Nose\"])\n    submit_btn = gr.Button(\"Submit\")\n\n    with gr.Column(visible=False) as output_col:\n        diagnosis_box = gr.Textbox(label=\"Diagnosis\")\n        patient_summary_box = gr.Textbox(label=\"Patient Summary\")\n\n    def submit(name, age, symptoms):\n        return {\n            submit_btn: gr.Button(visible=False),\n            output_col: gr.Column(visible=True),\n            diagnosis_box: \"covid\" if \"Cough\" in symptoms else \"flu\",\n            patient_summary_box: f\"{name}, {age} y/o\",\n        }\n\n    submit_btn.click(\n        submit,\n        [name_box, age_box, symptoms_box],\n        [submit_btn, diagnosis_box, patient_summary_box, output_col],\n    )\n\ndemo.launch()\n```\n\n## Blocks Hello\n\n```python\nimport gradio as gr\n\ndef welcome(name):\n    return f\"Welcome to Gradio, {name}!\"\n\nwith gr.Blocks() as demo:\n    gr.Markdown(\n    \"\"\"\n    # Hello World!\n    Start typing below to see the output.\n    \"\"\")\n    inp = gr.Textbox(placeholder=\"What is your name?\")\n    out = gr.Textbox()\n    inp.change(welcome, inp, out)\n\ndemo.launch()\n```\n\n## Blocks Layout\n\n```python\nimport gradio as gr\n\ndemo = gr.Blocks()\n\nwith demo:\n    with gr.Row():\n        gr.Image(interactive=True, scale=2)\n        gr.Image()\n    with gr.Row():\n        gr.Textbox(label=\"Text\")\n        gr.Number(label=\"Count\", scale=2)\n        gr.Radio(choices=[\"One\", \"Two\"])\n    with gr.Row():\n        gr.Button(\"500\", scale=0, min_width=500)\n        gr.Button(\"A\", scale=0)\n        gr.Button(\"grow\")\n    with gr.Row():\n        gr.Textbox()\n        gr.Textbox()\n        gr.Button()\n    with gr.Row():\n        with gr.Row():\n            with gr.Column():\n                gr.Textbox(label=\"Text\")\n                gr.Number(label=\"Count\")\n                gr.Radio(choices=[\"One\", \"Two\"])\n            gr.Image()\n            with gr.Column():\n                
gr.Image(interactive=True)\n                gr.Image()\n    gr.Image()\n    gr.Textbox(label=\"Text\")\n    gr.Number(label=\"Count\")\n    gr.Radio(choices=[\"One\", \"Two\"])\n\ndemo.launch()\n```\n\n## Calculator\n\n```python\nimport gradio as gr\n\ndef calculator(num1, operation, num2):\n    if operation == \"add\":\n        return num1 + num2\n    elif operation == \"subtract\":\n        return num1 - num2\n    elif operation == \"multiply\":\n        return num1 * num2\n    elif operation == \"divide\":\n        if num2 == 0:\n            raise gr.Error(\"Cannot divide by zero!\")\n        return num1 / num2\n\ndemo = gr.Interface(\n    calculator,\n    [\n        \"number\",\n        gr.Radio([\"add\", \"subtract\", \"multiply\", \"divide\"]),\n        \"number\"\n    ],\n    \"number\",\n    examples=[\n        [45, \"add\", 3],\n        [3.14, \"divide\", 2],\n        [144, \"multiply\", 2.5],\n        [0, \"subtract\", 1.2],\n    ],\n    title=\"Toy Calculator\",\n    description=\"Here's a sample toy calculator.\",\n    api_name=\"predict\"\n)\n\ndemo.launch()\n```\n\n## Chatbot Simple\n\n```python\nimport gradio as gr\nimport random\nimport time\n\nwith gr.Blocks() as demo:\n    chatbot = gr.Chatbot()\n    msg = gr.Textbox()\n    clear = gr.ClearButton([msg, chatbot])\n\n    def respond(message, chat_history):\n        bot_message = random.choice([\"How are you?\", \"Today is a great day\", \"I'm very hungry\"])\n        chat_history.append({\"role\": \"user\", \"content\": message})\n        chat_history.append({\"role\": \"assistant\", \"content\": bot_message})\n        time.sleep(2)\n        return \"\", chat_history\n\n    msg.submit(respond, [msg, chatbot], [msg, chatbot])\n\ndemo.launch()\n```\n\n## Chatbot Streaming\n\n```python\nimport gradio as gr\nimport random\nimport time\n\nwith gr.Blocks() as demo:\n    chatbot = gr.Chatbot()\n    msg = gr.Textbox()\n    clear = gr.Button(\"Clear\")\n\n    def user(user_message, history: list):\n        
return \"\", history + [{\"role\": \"user\", \"content\": user_message}]\n\n    def bot(history: list):\n        bot_message = random.choice([\"How are you?\", \"I love you\", \"I'm very hungry\"])\n        history.append({\"role\": \"assistant\", \"content\": \"\"})\n        for character in bot_message:\n            history[-1]['content'] += character\n            time.sleep(0.05)\n            yield history\n\n    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(\n        bot, chatbot, chatbot\n    )\n    clear.click(lambda: None, None, chatbot, queue=False)\n\ndemo.launch()\n```\n\n## Custom Css\n\n```python\nimport gradio as gr\n\nwith gr.Blocks() as demo:\n    with gr.Column(elem_classes=\"cool-col\"):\n        gr.Markdown(\"### Gradio Demo with Custom CSS\", elem_classes=\"darktest\")\n        gr.Markdown(\n            elem_classes=\"markdown\",\n            value=\"Resize the browser window to see the CSS media query in action.\",\n        )\n\nif __name__ == \"__main__\":\n    demo.launch(css_paths=[\"demo/custom_css/custom_css.css\"])\n```\n\n## Fake Diffusion\n\n```python\nimport gradio as gr\nimport numpy as np\nimport time\n\ndef fake_diffusion(steps):\n    rng = np.random.default_rng()\n    for i in range(steps):\n        time.sleep(1)\n        image = rng.random(size=(600, 600, 3))\n        yield image\n    image = np.ones((1000,1000,3), np.uint8)\n    image[:] = [255, 124, 0]\n    yield image\n\ndemo = gr.Interface(fake_diffusion,\n                    inputs=gr.Slider(1, 10, 3, step=1),\n                    outputs=\"image\",\n                    api_name=\"predict\")\n\ndemo.launch()\n```\n\n## Hello World\n\n```python\nimport gradio as gr\n\n\ndef greet(name):\n    return \"Hello \" + name + \"!\"\n\n\ndemo = gr.Interface(fn=greet, inputs=\"textbox\", outputs=\"textbox\", api_name=\"predict\")\n\ndemo.launch()\n```\n\n## Image Editor\n\n```python\nimport gradio as gr\nimport time\n\n\ndef sleep(im):\n    time.sleep(5)\n    return 
[im[\"background\"], im[\"layers\"][0], im[\"layers\"][1], im[\"composite\"]]\n\n\ndef predict(im):\n    return im[\"composite\"]\n\n\nwith gr.Blocks() as demo:\n    with gr.Row():\n        im = gr.ImageEditor(\n            type=\"numpy\",\n        )\n        im_preview = gr.Image()\n    n_upload = gr.Number(0, label=\"Number of upload events\", step=1)\n    n_change = gr.Number(0, label=\"Number of change events\", step=1)\n    n_input = gr.Number(0, label=\"Number of input events\", step=1)\n\n    im.upload(lambda x: x + 1, outputs=n_upload, inputs=n_upload)\n    im.change(lambda x: x + 1, outputs=n_change, inputs=n_change)\n    im.input(lambda x: x + 1, outputs=n_input, inputs=n_input)\n    im.change(predict, outputs=im_preview, inputs=im, show_progress=\"hidden\")\n\ndemo.launch()\n```\n\n## On Listener Decorator\n\n```python\nimport gradio as gr\n\nwith gr.Blocks() as demo:\n    name = gr.Textbox(label=\"Name\")\n    output = gr.Textbox(label=\"Output Box\")\n    greet_btn = gr.Button(\"Greet\")\n\n    @gr.on(triggers=[name.submit, greet_btn.click], inputs=name, outputs=output)\n    def greet(name):\n        return \"Hello \" + name + \"!\"\n\ndemo.launch()\n```\n\n## Render Merge\n\n```python\nimport gradio as gr\nimport time\n\nwith gr.Blocks() as demo:\n    text_count = gr.Slider(1, 5, value=1, step=1, label=\"Textbox Count\")\n\n    @gr.render(inputs=text_count)\n    def render_count(count):\n        boxes = []\n        for i in range(count):\n            box = gr.Textbox(label=f\"Box {i}\")\n            boxes.append(box)\n\n        def merge(*args):\n            time.sleep(0.2)  # simulate a delay\n            return \" \".join(args)\n\n        merge_btn.click(merge, boxes, output)\n\n        def clear():\n            time.sleep(0.2)  # simulate a delay\n            return [\" \"] * count\n\n        clear_btn.click(clear, None, boxes)\n\n        def countup():\n            time.sleep(0.2)  # simulate a delay\n            return list(range(count))\n\n      
  count_btn.click(countup, None, boxes, queue=False)\n\n    with gr.Row():\n        merge_btn = gr.Button(\"Merge\")\n        clear_btn = gr.Button(\"Clear\")\n        count_btn = gr.Button(\"Count\")\n\n    output = gr.Textbox()\n\ndemo.launch()\n```\n\n## Reverse Audio 2\n\n```python\nimport gradio as gr\nimport numpy as np\n\ndef reverse_audio(audio):\n    sr, data = audio\n    return (sr, np.flipud(data))\n\ndemo = gr.Interface(fn=reverse_audio,\n                    inputs=\"microphone\",\n                    outputs=\"audio\", api_name=\"predict\")\n\ndemo.launch()\n```\n\n## Sepia Filter\n\n```python\nimport numpy as np\nimport gradio as gr\n\ndef sepia(input_img):\n    sepia_filter = np.array([\n        [0.393, 0.769, 0.189],\n        [0.349, 0.686, 0.168],\n        [0.272, 0.534, 0.131]\n    ])\n    sepia_img = input_img.dot(sepia_filter.T)\n    sepia_img /= sepia_img.max()\n    return sepia_img\n\ndemo = gr.Interface(sepia, gr.Image(), \"image\", api_name=\"predict\")\ndemo.launch()\n```\n\n## Sort Records\n\n```python\nimport gradio as gr\n\ndef sort_records(records):\n    return records.sort(\"Quantity\")\n\ndemo = gr.Interface(\n    sort_records,\n    gr.Dataframe(\n        headers=[\"Item\", \"Quantity\"],\n        datatype=[\"str\", \"number\"],  \n        row_count=3,\n        column_count=2,\n        column_limits=(2, 2),\n        type=\"polars\"\n    ),\n    \"dataframe\",\n    description=\"Sort by Quantity\"\n)\n\ndemo.launch()\n```\n\n## Streaming Simple\n\n```python\nimport gradio as gr\n\nwith gr.Blocks() as demo:\n    with gr.Row():\n        with gr.Column():\n            input_img = gr.Image(label=\"Input\", sources=\"webcam\")\n        with gr.Column():\n            output_img = gr.Image(label=\"Output\")\n        input_img.stream(lambda s: s, input_img, output_img, time_limit=15, stream_every=0.1, concurrency_limit=30)\n\nif __name__ == \"__main__\":\n\n    demo.launch()\n```\n\n## Tabbed Interface Lite\n\n```python\nimport gradio as 
gr\n\nhello_world = gr.Interface(lambda name: \"Hello \" + name, \"text\", \"text\", api_name=\"predict\")\nbye_world = gr.Interface(lambda name: \"Bye \" + name, \"text\", \"text\", api_name=\"predict\")\nchat = gr.ChatInterface(lambda *args: \"Hello \" + args[0], api_name=\"chat\")\n\ndemo = gr.TabbedInterface([hello_world, bye_world, chat], [\"Hello World\", \"Bye World\", \"Chat\"])\n\ndemo.launch()\n```\n\n## Tax Calculator\n\n```python\nimport gradio as gr\n\ndef tax_calculator(income, marital_status, assets):\n    tax_brackets = [(10, 0), (25, 8), (60, 12), (120, 20), (250, 30)]\n    total_deductible = sum(cost for cost, deductible in zip(assets[\"Cost\"], assets[\"Deductible\"]) if deductible)\n    taxable_income = income - total_deductible\n\n    total_tax = 0\n    for bracket, rate in tax_brackets:\n        if taxable_income > bracket:\n            total_tax += (taxable_income - bracket) * rate / 100\n\n    if marital_status == \"Married\":\n        total_tax *= 0.75\n    elif marital_status == \"Divorced\":\n        total_tax *= 0.8\n\n    return round(total_tax)\n\ndemo = gr.Interface(\n    tax_calculator,\n    [\n        \"number\",\n        gr.Radio([\"Single\", \"Married\", \"Divorced\"]),\n        gr.Dataframe(\n            headers=[\"Item\", \"Cost\", \"Deductible\"],\n            datatype=[\"str\", \"number\", \"bool\"],  \n            label=\"Assets Purchased this Year\",\n        ),\n    ],\n    gr.Number(label=\"Tax due\"),\n    examples=[\n        [10000, \"Married\", [[\"Suit\", 5000, True], [\"Laptop (for work)\", 800, False], [\"Car\", 1800, True]]],\n        [80000, \"Single\", [[\"Suit\", 800, True], [\"Watch\", 1800, True], [\"Food\", 800, True]]],\n    ],\n    live=True,\n    api_name=\"predict\"\n)\n\ndemo.launch()\n```\n\n## Timer Simple\n\n```python\nimport gradio as gr\nimport random\nimport time\n\nwith gr.Blocks() as demo:\n  timer = gr.Timer(1)\n  timestamp = gr.Number(label=\"Time\")\n  timer.tick(lambda: round(time.time()), 
outputs=timestamp, api_name=\"timestamp\")\n\n  number = gr.Number(lambda: random.randint(1, 10), every=timer, label=\"Random Number\")\n  with gr.Row():\n    gr.Button(\"Start\").click(lambda: gr.Timer(active=True), None, timer)\n    gr.Button(\"Stop\").click(lambda: gr.Timer(active=False), None, timer)\n    gr.Button(\"Go Fast\").click(lambda: 0.2, None, timer)\n\nif __name__ == \"__main__\":\n  demo.launch()\n```\n\n## Variable Outputs\n\n```python\nimport gradio as gr\n\nmax_textboxes = 10\n\ndef variable_outputs(k):\n    k = int(k)\n    return [gr.Textbox(visible=True)]*k + [gr.Textbox(visible=False)]*(max_textboxes-k)\n\nwith gr.Blocks() as demo:\n    s = gr.Slider(1, max_textboxes, value=max_textboxes, step=1, label=\"How many textboxes to show:\")\n    textboxes = []\n    for i in range(max_textboxes):\n        t = gr.Textbox(f\"Textbox {i}\")\n        textboxes.append(t)\n\n    s.change(variable_outputs, s, textboxes)\n\nif __name__ == \"__main__\":\n   demo.launch()\n```\n\n## Video Identity\n\n```python\nimport gradio as gr\nfrom gradio.media import get_video\n\ndef video_identity(video):\n    return video\n\n# get_video() returns file paths to sample media included with Gradio\ndemo = gr.Interface(video_identity,\n                    gr.Video(),\n                    \"playable_video\",\n                    examples=[\n                        get_video(\"world.mp4\")\n                    ],\n                    cache_examples=True,\n                    api_name=\"predict\",)\n\ndemo.launch()\n```\n"
  },
  {
    "path": "skills/transformers.js/SKILL.md",
    "content": "---\nname: transformers-js\ndescription: Use Transformers.js to run state-of-the-art machine learning models directly in JavaScript/TypeScript. Supports NLP (text classification, translation, summarization), computer vision (image classification, object detection), audio (speech recognition, audio classification), and multimodal tasks. Works in Node.js and browsers (with WebGPU/WASM) using pre-trained models from Hugging Face Hub.\nlicense: Apache-2.0\nmetadata:\n  author: huggingface\n  version: \"3.8.1\"\n  category: machine-learning\n  repository: https://github.com/huggingface/transformers.js\ncompatibility: Requires Node.js 18+ or modern browser with ES modules support. WebGPU support requires compatible browser/environment. Internet access needed for downloading models from Hugging Face Hub (optional if using local models).\n---\n\n# Transformers.js - Machine Learning for JavaScript\n\nTransformers.js enables running state-of-the-art machine learning models directly in JavaScript, both in browsers and Node.js environments, with no server required.\n\n## When to Use This Skill\n\nUse this skill when you need to:\n- Run ML models for text analysis, generation, or translation in JavaScript\n- Perform image classification, object detection, or segmentation\n- Implement speech recognition or audio processing\n- Build multimodal AI applications (text-to-image, image-to-text, etc.)\n- Run models client-side in the browser without a backend\n\n## Installation\n\n### NPM Installation\n```bash\nnpm install @huggingface/transformers\n```\n\n### Browser Usage (CDN)\n```javascript\n<script type=\"module\">\n  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';\n</script>\n```\n\n## Core Concepts\n\n### 1. Pipeline API\nThe pipeline API is the easiest way to use models. 
It groups together preprocessing, model inference, and postprocessing:\n\n```javascript\nimport { pipeline } from '@huggingface/transformers';\n\n// Create a pipeline for a specific task\nconst pipe = await pipeline('sentiment-analysis');\n\n// Use the pipeline\nconst result = await pipe('I love transformers!');\n// Output: [{ label: 'POSITIVE', score: 0.999817686 }]\n\n// IMPORTANT: Always dispose when done to free memory\nawait pipe.dispose();\n```\n\n**⚠️ Memory Management:** All pipelines must be disposed with `pipe.dispose()` when finished to prevent memory leaks. See examples in [Code Examples](./references/EXAMPLES.md) for cleanup patterns across different environments.\n\n### 2. Model Selection\nYou can specify a custom model as the second argument:\n\n```javascript\nconst pipe = await pipeline(\n  'sentiment-analysis',\n  'Xenova/bert-base-multilingual-uncased-sentiment'\n);\n```\n\n**Finding Models:**\n\nBrowse available Transformers.js models on Hugging Face Hub:\n- **All models**: https://huggingface.co/models?library=transformers.js&sort=trending\n- **By task**: Add `pipeline_tag` parameter\n  - Text generation: https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending\n  - Image classification: https://huggingface.co/models?pipeline_tag=image-classification&library=transformers.js&sort=trending\n  - Speech recognition: https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js&sort=trending\n\n**Tip:** Filter by task type, sort by trending/downloads, and check model cards for performance metrics and usage examples.\n\n### 3. Device Selection\nChoose where to run the model:\n\n```javascript\n// Run on CPU (default for WASM)\nconst cpuPipe = await pipeline('sentiment-analysis', 'model-id');\n\n// Run on GPU (WebGPU - experimental)\nconst gpuPipe = await pipeline('sentiment-analysis', 'model-id', {\n  device: 'webgpu',\n});\n```\n\n### 4. 
Quantization Options\nControl model precision vs. performance:\n\n```javascript\n// Use quantized model (faster, smaller)\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  dtype: 'q4',  // Options: 'fp32', 'fp16', 'q8', 'q4'\n});\n```\n\n## Supported Tasks\n\n**Note:** All examples below show basic usage.\n\n### Natural Language Processing\n\n#### Text Classification\n```javascript\nconst classifier = await pipeline('text-classification');\nconst result = await classifier('This movie was amazing!');\n```\n\n#### Named Entity Recognition (NER)\n```javascript\nconst ner = await pipeline('token-classification');\nconst entities = await ner('My name is John and I live in New York.');\n```\n\n#### Question Answering\n```javascript\nconst qa = await pipeline('question-answering');\nconst answer = await qa({\n  question: 'What is the capital of France?',\n  context: 'Paris is the capital and largest city of France.'\n});\n```\n\n#### Text Generation\n```javascript\nconst generator = await pipeline('text-generation', 'onnx-community/gemma-3-270m-it-ONNX');\nconst text = await generator('Once upon a time', {\n  max_new_tokens: 100,\n  temperature: 0.7\n});\n```\n\n**For streaming and chat:** See **[Text Generation Guide](./references/TEXT_GENERATION.md)** for:\n- Streaming token-by-token output with `TextStreamer`\n- Chat/conversation format with system/user/assistant roles\n- Generation parameters (temperature, top_k, top_p)\n- Browser and Node.js examples\n- React components and API endpoints\n\n#### Translation\n```javascript\nconst translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');\nconst output = await translator('Hello, how are you?', {\n  src_lang: 'eng_Latn',\n  tgt_lang: 'fra_Latn'\n});\n```\n\n#### Summarization\n```javascript\nconst summarizer = await pipeline('summarization');\nconst summary = await summarizer(longText, {\n  max_length: 100,\n  min_length: 30\n});\n```\n\n#### Zero-Shot 
Classification\n```javascript\nconst classifier = await pipeline('zero-shot-classification');\nconst result = await classifier('This is a story about sports.', ['politics', 'sports', 'technology']);\n```\n\n### Computer Vision\n\n#### Image Classification\n```javascript\nconst classifier = await pipeline('image-classification');\nconst result = await classifier('https://example.com/image.jpg');\n// Or with local file\nconst result = await classifier(imageUrl);\n```\n\n#### Object Detection\n```javascript\nconst detector = await pipeline('object-detection');\nconst objects = await detector('https://example.com/image.jpg');\n// Returns: [{ label: 'person', score: 0.95, box: { xmin, ymin, xmax, ymax } }, ...]\n```\n\n#### Image Segmentation\n```javascript\nconst segmenter = await pipeline('image-segmentation');\nconst segments = await segmenter('https://example.com/image.jpg');\n```\n\n#### Depth Estimation\n```javascript\nconst depthEstimator = await pipeline('depth-estimation');\nconst depth = await depthEstimator('https://example.com/image.jpg');\n```\n\n#### Zero-Shot Image Classification\n```javascript\nconst classifier = await pipeline('zero-shot-image-classification');\nconst result = await classifier('image.jpg', ['cat', 'dog', 'bird']);\n```\n\n### Audio Processing\n\n#### Automatic Speech Recognition\n```javascript\nconst transcriber = await pipeline('automatic-speech-recognition');\nconst result = await transcriber('audio.wav');\n// Returns: { text: 'transcribed text here' }\n```\n\n#### Audio Classification\n```javascript\nconst classifier = await pipeline('audio-classification');\nconst result = await classifier('audio.wav');\n```\n\n#### Text-to-Speech\n```javascript\nconst synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts');\nconst audio = await synthesizer('Hello, this is a test.', {\n  speaker_embeddings: speakerEmbeddings\n});\n```\n\n### Multimodal\n\n#### Image-to-Text (Image Captioning)\n```javascript\nconst captioner = await 
pipeline('image-to-text');\nconst caption = await captioner('image.jpg');\n```\n\n#### Document Question Answering\n```javascript\nconst docQA = await pipeline('document-question-answering');\nconst answer = await docQA('document-image.jpg', 'What is the total amount?');\n```\n\n#### Zero-Shot Object Detection\n```javascript\nconst detector = await pipeline('zero-shot-object-detection');\nconst objects = await detector('image.jpg', ['person', 'car', 'tree']);\n```\n\n### Feature Extraction (Embeddings)\n\n```javascript\nconst extractor = await pipeline('feature-extraction');\nconst embeddings = await extractor('This is a sentence to embed.');\n// Returns: tensor of shape [1, sequence_length, hidden_size]\n\n// For sentence embeddings (mean pooling)\nconst extractor = await pipeline('feature-extraction', 'onnx-community/all-MiniLM-L6-v2-ONNX');\nconst embeddings = await extractor('Text to embed', { pooling: 'mean', normalize: true });\n```\n\n## Finding and Choosing Models\n\n### Browsing the Hugging Face Hub\n\nDiscover compatible Transformers.js models on Hugging Face Hub:\n\n**Base URL (all models):**\n```\nhttps://huggingface.co/models?library=transformers.js&sort=trending\n```\n\n**Filter by task** using the `pipeline_tag` parameter:\n\n| Task | URL |\n|------|-----|\n| **Text Generation** | https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending |\n| **Text Classification** | https://huggingface.co/models?pipeline_tag=text-classification&library=transformers.js&sort=trending |\n| **Translation** | https://huggingface.co/models?pipeline_tag=translation&library=transformers.js&sort=trending |\n| **Summarization** | https://huggingface.co/models?pipeline_tag=summarization&library=transformers.js&sort=trending |\n| **Question Answering** | https://huggingface.co/models?pipeline_tag=question-answering&library=transformers.js&sort=trending |\n| **Image Classification** | 
https://huggingface.co/models?pipeline_tag=image-classification&library=transformers.js&sort=trending |\n| **Object Detection** | https://huggingface.co/models?pipeline_tag=object-detection&library=transformers.js&sort=trending |\n| **Image Segmentation** | https://huggingface.co/models?pipeline_tag=image-segmentation&library=transformers.js&sort=trending |\n| **Speech Recognition** | https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js&sort=trending |\n| **Audio Classification** | https://huggingface.co/models?pipeline_tag=audio-classification&library=transformers.js&sort=trending |\n| **Image-to-Text** | https://huggingface.co/models?pipeline_tag=image-to-text&library=transformers.js&sort=trending |\n| **Feature Extraction** | https://huggingface.co/models?pipeline_tag=feature-extraction&library=transformers.js&sort=trending |\n| **Zero-Shot Classification** | https://huggingface.co/models?pipeline_tag=zero-shot-classification&library=transformers.js&sort=trending |\n\n**Sort options:**\n- `&sort=trending` - Most popular recently\n- `&sort=downloads` - Most downloaded overall\n- `&sort=likes` - Most liked by community\n- `&sort=modified` - Recently updated\n\n### Choosing the Right Model\n\nConsider these factors when selecting a model:\n\n**1. Model Size**\n- **Small (< 100MB)**: Fast, suitable for browsers, limited accuracy\n- **Medium (100MB - 500MB)**: Balanced performance, good for most use cases\n- **Large (> 500MB)**: High accuracy, slower, better for Node.js or powerful devices\n\n**2. Quantization**\nModels are often available in different quantization levels:\n- `fp32` - Full precision (largest, most accurate)\n- `fp16` - Half precision (smaller, still accurate)\n- `q8` - 8-bit quantized (much smaller, slight accuracy loss)\n- `q4` - 4-bit quantized (smallest, noticeable accuracy loss)\n\n**3. 
Task Compatibility**\nCheck the model card for:\n- Supported tasks (some models support multiple tasks)\n- Input/output formats\n- Language support (multilingual vs. English-only)\n- License restrictions\n\n**4. Performance Metrics**\nModel cards typically show:\n- Accuracy scores\n- Benchmark results\n- Inference speed\n- Memory requirements\n\n### Example: Finding a Text Generation Model\n\n```javascript\n// 1. Visit: https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending\n\n// 2. Browse and select a model (e.g., onnx-community/gemma-3-270m-it-ONNX)\n\n// 3. Check model card for:\n//    - Model size: ~270M parameters\n//    - Quantization: q4 available\n//    - Language: English\n//    - Use case: Instruction-following chat\n\n// 4. Use the model:\nimport { pipeline } from '@huggingface/transformers';\n\nconst generator = await pipeline(\n  'text-generation',\n  'onnx-community/gemma-3-270m-it-ONNX',\n  { dtype: 'q4' } // Use quantized version for faster inference\n);\n\nconst output = await generator('Explain quantum computing in simple terms.', {\n  max_new_tokens: 100\n});\n\nawait generator.dispose();\n```\n\n### Tips for Model Selection\n\n1. **Start Small**: Test with a smaller model first, then upgrade if needed\n2. **Check ONNX Support**: Ensure the model has ONNX files (look for `onnx` folder in model repo)\n3. **Read Model Cards**: Model cards contain usage examples, limitations, and benchmarks\n4. **Test Locally**: Benchmark inference speed and memory usage in your environment\n5. **Community Models**: Look for models by `Xenova` (Transformers.js maintainer) or `onnx-community`\n6. 
**Version Pin**: Use specific git commits in production for stability:\n   ```javascript\n   const pipe = await pipeline('task', 'model-id', { revision: 'abc123' });\n   ```\n\n## Advanced Configuration\n\n### Environment Configuration (`env`)\n\nThe `env` object provides comprehensive control over Transformers.js execution, caching, and model loading.\n\n**Quick Overview:**\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// View version\nconsole.log(env.version); // e.g., '3.8.1'\n\n// Common settings\nenv.allowRemoteModels = true;  // Load from Hugging Face Hub\nenv.allowLocalModels = false;  // Load from file system\nenv.localModelPath = '/models/'; // Local model directory\nenv.useFSCache = true;         // Cache models on disk (Node.js)\nenv.useBrowserCache = true;    // Cache models in browser\nenv.cacheDir = './.cache';     // Cache directory location\n```\n\n**Configuration Patterns:**\n\n```javascript\n// Development: Fast iteration with remote models\nenv.allowRemoteModels = true;\nenv.useFSCache = true;\n\n// Production: Local models only\nenv.allowRemoteModels = false;\nenv.allowLocalModels = true;\nenv.localModelPath = '/app/models/';\n\n// Custom CDN\nenv.remoteHost = 'https://cdn.example.com/models';\n\n// Disable caching (testing)\nenv.useFSCache = false;\nenv.useBrowserCache = false;\n```\n\nFor complete documentation on all configuration options, caching strategies, cache management, pre-downloading models, and more, see:\n\n**→ [Configuration Reference](./references/CONFIGURATION.md)**\n\n### Working with Tensors\n\n```javascript\nimport { AutoTokenizer, AutoModel } from '@huggingface/transformers';\n\n// Load tokenizer and model separately for more control\nconst tokenizer = await AutoTokenizer.from_pretrained('bert-base-uncased');\nconst model = await AutoModel.from_pretrained('bert-base-uncased');\n\n// Tokenize input\nconst inputs = await tokenizer('Hello world!');\n\n// Run model\nconst outputs = await 
model(inputs);\n```\n\n### Batch Processing\n\n```javascript\nconst classifier = await pipeline('sentiment-analysis');\n\n// Process multiple texts\nconst results = await classifier([\n  'I love this!',\n  'This is terrible.',\n  'It was okay.'\n]);\n```\n\n## Browser-Specific Considerations\n\n### WebGPU Usage\nWebGPU provides GPU acceleration in browsers:\n\n```javascript\nconst pipe = await pipeline('text-generation', 'onnx-community/gemma-3-270m-it-ONNX', {\n  device: 'webgpu',\n  dtype: 'fp32'\n});\n```\n\n**Note**: WebGPU is experimental. Check browser compatibility and file issues if problems occur.\n\n### WASM Performance\nDefault browser execution uses WASM:\n\n```javascript\n// Optimized for browsers with quantization\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  dtype: 'q8'  // or 'q4' for even smaller size\n});\n```\n\n### Progress Tracking & Loading Indicators\n\nModels can be large (ranging from a few MB to several GB) and consist of multiple files. Track download progress by passing a callback to the `pipeline()` function:\n\n```javascript\nimport { pipeline } from '@huggingface/transformers';\n\n// Track progress for each file\nconst fileProgress = {};\n\nfunction onProgress(info) {\n  console.log(`${info.status}: ${info.file}`);\n  \n  if (info.status === 'progress') {\n    fileProgress[info.file] = info.progress;\n    console.log(`${info.file}: ${info.progress.toFixed(1)}%`);\n  }\n  \n  if (info.status === 'done') {\n    console.log(`✓ ${info.file} complete`);\n  }\n}\n\n// Pass callback to pipeline\nconst classifier = await pipeline('sentiment-analysis', null, {\n  progress_callback: onProgress\n});\n```\n\n**Progress Info Properties:**\n\n```typescript\ninterface ProgressInfo {\n  status: 'initiate' | 'download' | 'progress' | 'done' | 'ready';\n  name: string;      // Model id or path\n  file: string;      // File being processed\n  progress?: number; // Percentage (0-100, only for 'progress' status)\n  loaded?: number;   
// Bytes downloaded (only for 'progress' status)\n  total?: number;    // Total bytes (only for 'progress' status)\n}\n```\n\nFor complete examples including browser UIs, React components, CLI progress bars, and retry logic, see:\n\n**→ [Pipeline Options - Progress Callback](./references/PIPELINE_OPTIONS.md#progress-callback)**\n\n## Error Handling\n\n```javascript\ntry {\n  const pipe = await pipeline('sentiment-analysis', 'model-id');\n  const result = await pipe('text to analyze');\n} catch (error) {\n  if (error.message.includes('fetch')) {\n    console.error('Model download failed. Check internet connection.');\n  } else if (error.message.includes('ONNX')) {\n    console.error('Model execution failed. Check model compatibility.');\n  } else {\n    console.error('Unknown error:', error);\n  }\n}\n```\n\n## Performance Tips\n\n1. **Reuse Pipelines**: Create pipeline once, reuse for multiple inferences\n2. **Use Quantization**: Start with `q8` or `q4` for faster inference\n3. **Batch Processing**: Process multiple inputs together when possible\n4. **Cache Models**: Models are cached automatically (see **[Caching Reference](./references/CACHE.md)** for details on browser Cache API, Node.js filesystem cache, and custom implementations)\n5. **WebGPU for Large Models**: Use WebGPU for models that benefit from GPU acceleration\n6. **Prune Context**: For text generation, limit `max_new_tokens` to avoid memory issues\n7. 
**Clean Up Resources**: Call `pipe.dispose()` when done to free memory\n\n## Memory Management\n\n**IMPORTANT:** Always call `pipe.dispose()` when finished to prevent memory leaks.\n\n```javascript\nconst pipe = await pipeline('sentiment-analysis');\nconst result = await pipe('Great product!');\nawait pipe.dispose();  // ✓ Free memory (100MB - several GB per model)\n```\n\n**When to dispose:**\n- Application shutdown or component unmount\n- Before loading a different model\n- After batch processing in long-running apps\n\nModels consume significant memory and hold GPU/CPU resources. Disposal is critical for browser memory limits and server stability.\n\nFor detailed patterns (React cleanup, servers, browser), see **[Code Examples](./references/EXAMPLES.md)**\n\n## Troubleshooting\n\n### Model Not Found\n- Verify model exists on Hugging Face Hub\n- Check model name spelling\n- Ensure model has ONNX files (look for `onnx` folder in model repo)\n\n### Memory Issues\n- Use smaller models or quantized versions (`dtype: 'q4'`)\n- Reduce batch size\n- Limit sequence length with `max_length`\n\n### WebGPU Errors\n- Check browser compatibility (Chrome 113+, Edge 113+)\n- Try `dtype: 'fp16'` if `fp32` fails\n- Fall back to WASM if WebGPU unavailable\n\n## Reference Documentation\n\n### This Skill\n- **[Pipeline Options](./references/PIPELINE_OPTIONS.md)** - Configure `pipeline()` with `progress_callback`, `device`, `dtype`, etc.\n- **[Configuration Reference](./references/CONFIGURATION.md)** - Global `env` configuration for caching and model loading\n- **[Caching Reference](./references/CACHE.md)** - Browser Cache API, Node.js filesystem cache, and custom cache implementations\n- **[Text Generation Guide](./references/TEXT_GENERATION.md)** - Streaming, chat format, and generation parameters\n- **[Model Architectures](./references/MODEL_ARCHITECTURES.md)** - Supported models and selection tips\n- **[Code Examples](./references/EXAMPLES.md)** - Real-world implementations for 
different runtimes\n\n### Official Transformers.js\n- Official docs: https://huggingface.co/docs/transformers.js\n- API reference: https://huggingface.co/docs/transformers.js/api/pipelines\n- Model hub: https://huggingface.co/models?library=transformers.js\n- GitHub: https://github.com/huggingface/transformers.js\n- Examples: https://github.com/huggingface/transformers.js/tree/main/examples\n\n## Best Practices\n\n1. **Always Dispose Pipelines**: Call `pipe.dispose()` when done - critical for preventing memory leaks\n2. **Start with Pipelines**: Use the pipeline API unless you need fine-grained control\n3. **Test Locally First**: Test models with small inputs before deploying\n4. **Monitor Model Sizes**: Be aware of model download sizes for web applications\n5. **Handle Loading States**: Show progress indicators for better UX\n6. **Version Pin**: Pin specific model versions for production stability\n7. **Error Boundaries**: Always wrap pipeline calls in try-catch blocks\n8. **Progressive Enhancement**: Provide fallbacks for unsupported browsers\n9. **Reuse Models**: Load once, use many times - don't recreate pipelines unnecessarily\n10. 
**Graceful Shutdown**: Dispose models on SIGTERM/SIGINT in servers\n\n## Quick Reference: Task IDs\n\n| Task | Task ID |\n|------|---------|\n| Text classification | `text-classification` or `sentiment-analysis` |\n| Token classification | `token-classification` or `ner` |\n| Question answering | `question-answering` |\n| Fill mask | `fill-mask` |\n| Summarization | `summarization` |\n| Translation | `translation` |\n| Text generation | `text-generation` |\n| Text-to-text generation | `text2text-generation` |\n| Zero-shot classification | `zero-shot-classification` |\n| Image classification | `image-classification` |\n| Image segmentation | `image-segmentation` |\n| Object detection | `object-detection` |\n| Depth estimation | `depth-estimation` |\n| Image-to-image | `image-to-image` |\n| Zero-shot image classification | `zero-shot-image-classification` |\n| Zero-shot object detection | `zero-shot-object-detection` |\n| Automatic speech recognition | `automatic-speech-recognition` |\n| Audio classification | `audio-classification` |\n| Text-to-speech | `text-to-speech` or `text-to-audio` |\n| Image-to-text | `image-to-text` |\n| Document question answering | `document-question-answering` |\n| Feature extraction | `feature-extraction` |\n| Sentence similarity | `sentence-similarity` |\n\n---\n\nThis skill enables you to integrate state-of-the-art machine learning capabilities directly into JavaScript applications without requiring separate ML servers or Python environments.\n"
  },
  {
    "path": "skills/transformers.js/references/CACHE.md",
    "content": "# Caching Reference\n\nComplete guide to caching strategies for Transformers.js models across different environments.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Browser Caching](#browser-caching)\n3. [Node.js Caching](#nodejs-caching)\n4. [Custom Cache Implementation](#custom-cache-implementation)\n5. [Cache Configuration](#cache-configuration)\n\n## Overview\n\nTransformers.js models can be large (from a few MB to several GB), so caching is critical for performance. The caching strategy differs based on the environment:\n\n- **Browser**: Uses the Cache API (browser cache storage)\n- **Node.js**: Uses a filesystem cache (`./.cache` by default, configurable via `env.cacheDir`)\n- **Custom**: Implement your own cache (database, cloud storage, etc.)\n\n### Default Behavior\n\n```javascript\nimport { pipeline } from '@huggingface/transformers';\n\n// First load: downloads model\nconst pipe = await pipeline('sentiment-analysis');\n\n// Subsequent loads: uses cached model\nconst pipe2 = await pipeline('sentiment-analysis'); // Fast!\n```\n\nCaching is **automatic** and enabled by default. Models are cached after the first download.\n\n## Browser Caching\n\n### Using the Cache API\n\nIn browser environments, Transformers.js uses the [Cache API](https://developer.mozilla.org/en-US/docs/Web/API/Cache) to store models:\n\n```javascript\nimport { env, pipeline } from '@huggingface/transformers';\n\n// Browser cache is enabled by default\nconsole.log(env.useBrowserCache); // true\n\n// Load model (cached automatically)\nconst classifier = await pipeline('sentiment-analysis');\n```\n\n**How it works:**\n\n1. Model files are downloaded from Hugging Face Hub\n2. Files are stored in the browser's Cache Storage\n3. Subsequent loads retrieve from cache (no network request)\n4. 
Cache persists across page reloads and browser sessions\n\n### Cache Location\n\nBrowser caches are stored in:\n- **Chrome/Edge**: `Cache Storage` in DevTools → Application tab → Cache storage\n- **Firefox**: `about:cache` → Storage\n- **Safari**: Web Inspector → Storage tab\n\n### Disable Browser Cache\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Disable browser caching (not recommended)\nenv.useBrowserCache = false;\n\n// Models will be re-downloaded on every page load\n```\n\n**Use case:** Testing, development, or debugging cache issues.\n\n### Browser Storage Limits\n\nBrowsers impose storage quotas:\n\n- **Chrome**: ~60% of available disk space (but can evict data)\n- **Firefox**: ~50% of available disk space\n- **Safari**: ~1GB per origin (prompt for more)\n\n**Tip:** Monitor storage usage with the [Storage API](https://developer.mozilla.org/en-US/docs/Web/API/Storage_API):\n\n```javascript\nif ('storage' in navigator && 'estimate' in navigator.storage) {\n  const estimate = await navigator.storage.estimate();\n  const percentUsed = (estimate.usage / estimate.quota) * 100;\n  console.log(`Storage: ${percentUsed.toFixed(2)}% used`);\n  console.log(`Available: ${((estimate.quota - estimate.usage) / 1024 / 1024).toFixed(2)} MB`);\n}\n```\n\n## Node.js Caching\n\n### Filesystem Cache\n\nIn Node.js, models are cached to the filesystem:\n\n```javascript\nimport { env, pipeline } from '@huggingface/transformers';\n\n// Default cache directory (Node.js)\nconsole.log(env.cacheDir); // './.cache' (relative to current directory)\n\n// Filesystem cache is enabled by default\nconsole.log(env.useFSCache); // true\n\n// Load model (cached to disk)\nconst classifier = await pipeline('sentiment-analysis');\n```\n\n### Default Cache Location\n\n**Default behavior:**\n- Cache directory: `./.cache` (relative to where Node.js process runs)\n- Full default path: `~/.cache/huggingface/` when using Hugging Face tools\n\n**Note:** The statement \"Models 
are cached automatically in `~/.cache/huggingface/`\" from performance tips is specific to Hugging Face's Python tooling convention. In Transformers.js for Node.js, the default is `./.cache` unless configured otherwise.\n\n### Custom Cache Directory\n\n```javascript\nimport os from 'node:os';\nimport path from 'node:path';\nimport { env, pipeline } from '@huggingface/transformers';\n\n// Set custom cache directory\nenv.cacheDir = '/var/cache/transformers';\n\n// Or derive it from an environment variable. Node.js does not expand '~',\n// so build the home-directory path explicitly:\n// env.cacheDir = process.env.HF_HOME || path.join(os.homedir(), '.cache', 'huggingface');\n\n// Now load model\nconst classifier = await pipeline('sentiment-analysis');\n// Cached to: /var/cache/transformers/Xenova/distilbert-base-uncased-finetuned-sst-2-english/\n```\n\n**Pattern:** `{organization}/{model-name}/`\n\n### Disable Filesystem Cache\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Disable filesystem caching (not recommended)\nenv.useFSCache = false;\n\n// Models will be re-downloaded on every load\n```\n\n**Use case:** Testing, CI/CD environments, or containers with ephemeral storage.\n\n## Custom Cache Implementation\n\nImplement your own cache for specialized storage backends.\n\n### Custom Cache Interface\n\n```typescript\ninterface CacheInterface {\n  /**\n   * Check if a URL is cached\n   */\n  match(url: string): Promise<Response | undefined>;\n  \n  /**\n   * Store a URL and its response\n   */\n  put(url: string, response: Response): Promise<void>;\n}\n```\n\n### Example: Cloud Storage Cache (S3)\n\n```javascript\nimport { env, pipeline } from '@huggingface/transformers';\nimport { S3Client, GetObjectCommand, PutObjectCommand } from '@aws-sdk/client-s3';\n\nclass S3Cache {\n  constructor(bucket, region = 'us-east-1') {\n    this.bucket = bucket;\n    this.s3 = new S3Client({ region });\n  }\n\n  async match(url) {\n    const key = this.urlToKey(url);\n    \n    try {\n      const command = new GetObjectCommand({\n        Bucket: this.bucket,\n   
     Key: key\n      });\n      const response = await this.s3.send(command);\n      \n      // Convert stream to buffer\n      const chunks = [];\n      for await (const chunk of response.Body) {\n        chunks.push(chunk);\n      }\n      const body = Buffer.concat(chunks);\n      \n      return new Response(body, {\n        status: 200,\n        headers: JSON.parse(response.Metadata.headers || '{}')\n      });\n    } catch (error) {\n      if (error.name === 'NoSuchKey') return undefined;\n      throw error;\n    }\n  }\n\n  async put(url, response) {\n    const key = this.urlToKey(url);\n    const clonedResponse = response.clone();\n    const body = Buffer.from(await clonedResponse.arrayBuffer());\n    const headers = JSON.stringify(Object.fromEntries(response.headers.entries()));\n\n    const command = new PutObjectCommand({\n      Bucket: this.bucket,\n      Key: key,\n      Body: body,\n      Metadata: { headers }\n    });\n    \n    await this.s3.send(command);\n  }\n\n  urlToKey(url) {\n    // Convert URL to S3 key (remove protocol, replace slashes)\n    return url.replace(/^https?:\\/\\//, '').replace(/\\//g, '_');\n  }\n}\n\n// Configure S3 cache\nenv.useCustomCache = true;\nenv.customCache = new S3Cache('my-transformers-cache', 'us-east-1');\nenv.useFSCache = false;\n\n// Use S3 cache\nconst classifier = await pipeline('sentiment-analysis');\n```\n\n## Cache Configuration\n\n### Environment Variables\n\nUse environment variables to configure caching:\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Configure cache directory from environment\nenv.cacheDir = process.env.TRANSFORMERS_CACHE || './.cache';\n\n// Disable caching in CI/CD\nif (process.env.CI === 'true') {\n  env.useFSCache = false;\n  env.useBrowserCache = false;\n}\n\n// Production: use pre-cached models\nif (process.env.NODE_ENV === 'production') {\n  env.allowRemoteModels = false;\n  env.allowLocalModels = true;\n  env.localModelPath = process.env.MODEL_PATH || 
'/app/models';\n}\n```\n\n### Configuration Patterns\n\n#### Development: Enable All Caching\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\nenv.allowRemoteModels = true;\nenv.useFSCache = true;         // Node.js\nenv.useBrowserCache = true;    // Browser\nenv.cacheDir = './.cache';\n```\n\n#### Production: Local Models Only\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\nenv.allowRemoteModels = false;\nenv.allowLocalModels = true;\nenv.localModelPath = '/app/models';\nenv.useFSCache = true;\n```\n\n#### Testing: Disable Caching\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\nenv.useFSCache = false;\nenv.useBrowserCache = false;\nenv.allowRemoteModels = true; // Download every time\n```\n\n#### Hybrid: Cache + Remote Fallback\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Try local cache first, fall back to remote\nenv.allowRemoteModels = true;\nenv.allowLocalModels = true;\nenv.useFSCache = true;\nenv.localModelPath = './models';\n```\n\n---\n\n## Summary\n\nTransformers.js provides flexible caching options:\n\n- **Browser**: Cache API (automatic, persistent)\n- **Node.js**: Filesystem cache (default `./.cache`, configurable)\n- **Custom**: Implement your own (database, cloud storage, etc.)\n\n**Key takeaways:**\n\n1. Caching is enabled by default and automatic\n2. Configure cache **before** loading models\n3. Browser uses Cache API, Node.js uses filesystem\n4. Custom caches enable advanced storage backends\n5. Monitor cache size and implement cleanup strategies\n6. Pre-download models for production deployments\n\nFor more configuration options, see:\n- [Configuration Reference](./CONFIGURATION.md)\n- [Pipeline Options](./PIPELINE_OPTIONS.md)\n"
  },
  {
    "path": "skills/transformers.js/references/CONFIGURATION.md",
    "content": "# Environment Configuration Reference\n\nComplete guide to configuring Transformers.js behavior using the `env` object.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Remote Model Configuration](#remote-model-configuration)\n3. [Local Model Configuration](#local-model-configuration)\n4. [Cache Configuration](#cache-configuration)\n5. [WASM Configuration](#wasm-configuration)\n6. [Common Configuration Patterns](#common-configuration-patterns)\n7. [Environment Best Practices](#environment-best-practices)\n\n## Overview\n\nThe `env` object provides comprehensive control over Transformers.js execution, caching, and model loading:\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// View current version\nconsole.log(env.version); // e.g., '3.8.1'\n```\n\n### Available Properties\n\n```typescript\ninterface TransformersEnvironment {\n  // Version info\n  version: string;\n  \n  // Backend configuration\n  backends: {\n    onnx: Partial<ONNXEnv>;\n  };\n  \n  // Remote model settings\n  allowRemoteModels: boolean;\n  remoteHost: string;\n  remotePathTemplate: string;\n  \n  // Local model settings\n  allowLocalModels: boolean;\n  localModelPath: string;\n  useFS: boolean;\n  \n  // Cache settings\n  useBrowserCache: boolean;\n  useFSCache: boolean;\n  cacheDir: string | null;\n  useCustomCache: boolean;\n  customCache: CacheInterface | null;\n  useWasmCache: boolean;\n  cacheKey: string;\n}\n```\n\n## Remote Model Configuration\n\nControl how models are loaded from remote sources (default: Hugging Face Hub).\n\n### Disable Remote Loading\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Force local-only mode (no network requests)\nenv.allowRemoteModels = false;\n```\n\n**Use case:** Offline applications, security requirements, or air-gapped environments.\n\n### Custom Model Host\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Use your own CDN or model server\nenv.remoteHost = 
'https://cdn.example.com/models';\n\n// Customize the URL pattern\n// Default: '{model}/resolve/{revision}/{file}'\nenv.remotePathTemplate = 'custom/{model}/{file}';\n```\n\n**Use case:** Self-hosting models, using a CDN for faster downloads, or corporate proxies.\n\n### Example: Private Model Server\n\n```javascript\nimport { env, pipeline } from '@huggingface/transformers';\n\n// Configure custom model host\nenv.remoteHost = 'https://models.mycompany.com';\nenv.remotePathTemplate = '{model}/{file}';\n\n// Models will be loaded from:\n// https://models.mycompany.com/my-model/model.onnx\nconst pipe = await pipeline('sentiment-analysis', 'my-model');\n```\n\n## Local Model Configuration\n\nControl loading models from the local file system.\n\n### Enable Local Models\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Enable local file system loading\nenv.allowLocalModels = true;\n\n// Set the base path for local models\nenv.localModelPath = '/path/to/models/';\n```\n\n**Default values:**\n- Browser: `allowLocalModels = false`, `localModelPath = '/models/'`\n- Node.js: `allowLocalModels = true`, `localModelPath = '/models/'`\n\n### File System Control\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Disable file system usage entirely (Node.js only)\nenv.useFS = false;\n```\n\n### Example: Local Model Directory Structure\n\n```\n/app/models/\n├── onnx-community/\n│   ├── Supertonic-TTS-ONNX/\n│   │   ├── config.json\n│   │   ├── tokenizer.json\n│   │   ├── model.onnx\n│   │   └── ...\n│   └── yolo26l-pose-ONNX/\n│       ├── config.json\n│       ├── preprocessor_config.json\n│       ├── model.onnx\n│       └── ...\n```\n\n```javascript\nenv.allowLocalModels = true;\nenv.localModelPath = '/app/models/';\nenv.allowRemoteModels = false; // Offline mode\n\nconst classifier = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english');\n```\n\n## Cache Configuration\n\nTransformers.js supports 
multiple caching strategies to improve performance and reduce network usage.\n\n### Quick Configuration\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Browser cache (Cache API)\nenv.useBrowserCache = true; // default: true\nenv.cacheKey = 'my-app-transformers-cache'; // default: 'transformers-cache'\n\n// Node.js filesystem cache\nenv.useFSCache = true; // default: true\nenv.cacheDir = './custom-cache-dir'; // default: './.cache'\n\n// Custom cache implementation\nenv.useCustomCache = true;\nenv.customCache = new CustomCache(); // Your own class implementing match() and put()\n\n// WASM binary caching\nenv.useWasmCache = true; // default: true\n```\n\n### Disable Caching\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Disable all caching (re-download on every load)\nenv.useFSCache = false;\nenv.useBrowserCache = false;\nenv.useWasmCache = false;\nenv.cacheDir = null;\n```\n\nFor comprehensive caching documentation, see the **[Caching Reference](./CACHE.md)**, which covers:\n- Browser Cache API details and storage limits\n- Node.js filesystem cache location and configuration\n- Custom cache implementations (for example, S3)\n- Cache configuration patterns for development, production, and testing\n\n## WASM Configuration\n\nConfigure the ONNX Runtime WebAssembly (WASM) backend.\n\n### Basic WASM Settings\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Set custom WASM paths\nenv.backends.onnx.wasm.wasmPaths = 'https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/';\n\n// Configure number of threads (multi-threading in browsers requires cross-origin isolation)\nenv.backends.onnx.wasm.numThreads = 4;\n\n// Enable/disable SIMD (single instruction, multiple data)\nenv.backends.onnx.wasm.simd = true;\n```\n\n### Proxy Configuration\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Run inference in a Web Worker (proxy) so the main thread stays responsive\nenv.backends.onnx.wasm.proxy = true;\n```\n\n### Self-Hosted WASM Files\n\n```javascript\nimport { env } from 
'@huggingface/transformers';\n\n// Host WASM files on your own server\nenv.backends.onnx.wasm.wasmPaths = '/static/wasm/';\n```\n\n**Required files** (exact names depend on the installed `onnxruntime-web` version; copy the `.wasm` files, and any accompanying `.mjs` loaders, from that package's `dist/` directory):\n- `ort-wasm.wasm` - Main WASM binary\n- `ort-wasm-simd.wasm` - SIMD-enabled WASM binary\n- `ort-wasm-threaded.wasm` - Multi-threaded WASM binary\n- `ort-wasm-simd-threaded.wasm` - SIMD + multi-threaded WASM binary\n\n## Common Configuration Patterns\n\n### Development Setup\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Fast iteration with caching\nenv.allowRemoteModels = true;\nenv.useBrowserCache = true; // Browser\nenv.useFSCache = true;      // Node.js\nenv.cacheDir = './.cache';\n```\n\n### Production (Local Models)\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Secure, offline-capable setup\nenv.allowRemoteModels = false;\nenv.allowLocalModels = true;\nenv.localModelPath = '/app/models/';\nenv.useFSCache = false; // Models already local\n```\n\n### Offline-First Application\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Try local first, fall back to remote\nenv.allowLocalModels = true;\nenv.localModelPath = './models/';\nenv.allowRemoteModels = true;\nenv.useFSCache = true;\nenv.cacheDir = './cache';\n```\n\n### Custom CDN\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Use your own model hosting\nenv.remoteHost = 'https://cdn.example.com/ml-models';\nenv.remotePathTemplate = '{model}/{file}';\nenv.useBrowserCache = true;\n```\n\n### Storage-Constrained Environment\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Minimize disk usage (models are re-downloaded on every load)\nenv.useFSCache = false;\nenv.useBrowserCache = false;\nenv.useWasmCache = false;\nenv.cacheDir = null;\n```\n\n### Testing/CI Environment\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Predictable, isolated testing\nenv.allowRemoteModels = false;\nenv.allowLocalModels = true;\nenv.localModelPath = 
'./test-fixtures/models/';\nenv.useFSCache = false;\n```\n\n\n\n## Environment Best Practices\n\n### 1. Configure Early\n\nSet `env` properties before loading any models:\n\n```javascript\nimport { env, pipeline } from '@huggingface/transformers';\n\n// ✓ Good: Configure before loading\nenv.allowRemoteModels = false;\nenv.localModelPath = '/app/models/';\nconst pipe = await pipeline('sentiment-analysis');\n\n// ✗ Bad: Configuring after loading may not take effect\nconst pipe = await pipeline('sentiment-analysis');\nenv.allowRemoteModels = false; // Too late!\n```\n\n### 2. Use Environment Variables\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\n// Configure based on environment\nenv.allowRemoteModels = process.env.NODE_ENV === 'development';\nenv.cacheDir = process.env.MODEL_CACHE_DIR || './.cache';\nenv.localModelPath = process.env.LOCAL_MODELS_PATH || '/app/models/';\n```\n\n### 3. Handle Errors Gracefully\n\n```javascript\nimport { pipeline, env } from '@huggingface/transformers';\n\ntry {\n  env.allowRemoteModels = false;\n  const pipe = await pipeline('sentiment-analysis', 'my-model');\n} catch (error) {\n  if (error.message.includes('not found')) {\n    console.error('Model not found locally. Enable remote models or download the model.');\n  }\n  throw error;\n}\n```\n\n### 4. 
Log Configuration\n\n```javascript\nimport { env } from '@huggingface/transformers';\n\nconsole.log('Transformers.js Configuration:', {\n  version: env.version,\n  allowRemoteModels: env.allowRemoteModels,\n  allowLocalModels: env.allowLocalModels,\n  localModelPath: env.localModelPath,\n  cacheDir: env.cacheDir,\n  useFSCache: env.useFSCache,\n  useBrowserCache: env.useBrowserCache\n});\n```\n\n## Related Documentation\n\n- **[Caching Reference](./CACHE.md)** - Comprehensive caching guide (browser, Node.js, custom implementations)\n- [Pipeline Options](./PIPELINE_OPTIONS.md) - Configure pipeline loading with `progress_callback`, `device`, `dtype`, etc.\n- [Model Architectures](./MODEL_ARCHITECTURES.md) - Supported models and architectures\n- [Examples](./EXAMPLES.md) - Code examples for different runtimes\n- [Main Skill Guide](../SKILL.md) - Getting started and common usage\n"
  },
  {
    "path": "skills/transformers.js/references/EXAMPLES.md",
    "content": "# Transformers.js Code Examples\n\nWorking examples showing how to use Transformers.js across different runtimes and frameworks.\n\nAll examples use the same task and model for consistency:\n- **Task**: `feature-extraction`\n- **Model**: `onnx-community/all-MiniLM-L6-v2-ONNX`\n\n## Table of Contents\n1. [Browser (Vanilla JS)](#browser-vanilla-js)\n2. [Node.js](#nodejs)\n3. [React](#react)\n4. [Express API](#express-api)\n\n## Browser (Vanilla JS)\n\n### Basic Usage\n\n```html\n<!DOCTYPE html>\n<html>\n<head>\n  <title>Feature Extraction</title>\n</head>\n<body>\n  <h1>Text Embedding Generator</h1>\n  <textarea id=\"input\" placeholder=\"Enter text to embed...\"></textarea>\n  <button onclick=\"generateEmbedding()\">Generate Embedding</button>\n  <div id=\"result\"></div>\n  <div id=\"loading\" style=\"display:none;\">Loading model...</div>\n\n  <script type=\"module\">\n    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.8.1';\n    \n    let extractor;\n    \n    // Initialize model on page load\n    document.getElementById('loading').style.display = 'block';\n    extractor = await pipeline(\n      'feature-extraction',\n      'onnx-community/all-MiniLM-L6-v2-ONNX'\n    );\n    document.getElementById('loading').style.display = 'none';\n    \n    window.generateEmbedding = async function() {\n      const text = document.getElementById('input').value;\n      const output = await extractor(text, { pooling: 'mean', normalize: true });\n      \n      document.getElementById('result').innerHTML = `\n        <h3>Embedding Generated:</h3>\n        <p>Dimensions: ${output.data.length}</p>\n        <p>First 5 values: ${Array.from(output.data).slice(0, 5).join(', ')}</p>\n      `;\n    };\n    \n    // Cleanup on page unload\n    window.addEventListener('beforeunload', () => {\n      if (extractor) extractor.dispose();\n    });\n  </script>\n</body>\n</html>\n```\n\n### With Progress Tracking\n\n```html\n<!DOCTYPE 
html>\n<html>\n<head>\n  <title>Feature Extraction with Progress</title>\n  <style>\n    .file-progress {\n      margin: 10px 0;\n    }\n    .file-name {\n      font-size: 12px;\n      margin-bottom: 5px;\n    }\n    .progress-bar {\n      width: 100%;\n      height: 20px;\n      background: #f0f0f0;\n      border-radius: 5px;\n      overflow: hidden;\n    }\n    .progress-fill {\n      height: 100%;\n      background: #4CAF50;\n      transition: width 0.3s;\n    }\n  </style>\n</head>\n<body>\n  <h1>Text Embedding Generator</h1>\n  <div id=\"loading\">\n    <p id=\"status\">Loading model...</p>\n    <div id=\"progress-container\"></div>\n  </div>\n  <div id=\"app\" style=\"display:none;\">\n    <textarea id=\"input\" placeholder=\"Enter text...\"></textarea>\n    <button onclick=\"generateEmbedding()\">Generate</button>\n    <div id=\"result\"></div>\n  </div>\n\n  <script type=\"module\">\n    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.8.1';\n    \n    let extractor;\n    const fileProgressBars = {};\n    const progressContainer = document.getElementById('progress-container');\n    \n    extractor = await pipeline(\n      'feature-extraction',\n      'onnx-community/all-MiniLM-L6-v2-ONNX',\n      {\n        progress_callback: (info) => {\n          document.getElementById('status').textContent = `${info.status}: ${info.file}`;\n          \n          if (info.status === 'progress') {\n            // Create progress bar for each file\n            if (!fileProgressBars[info.file]) {\n              const fileDiv = document.createElement('div');\n              fileDiv.className = 'file-progress';\n              fileDiv.innerHTML = `\n                <div class=\"file-name\">${info.file}</div>\n                <div class=\"progress-bar\">\n                  <div class=\"progress-fill\"></div>\n                </div>\n              `;\n              progressContainer.appendChild(fileDiv);\n              fileProgressBars[info.file] 
= fileDiv.querySelector('.progress-fill');\n            }\n            \n            // Update progress\n            fileProgressBars[info.file].style.width = `${info.progress}%`;\n          }\n          \n          if (info.status === 'ready') {\n            document.getElementById('loading').style.display = 'none';\n            document.getElementById('app').style.display = 'block';\n          }\n        }\n      }\n    );\n    \n    window.generateEmbedding = async function() {\n      const text = document.getElementById('input').value;\n      const output = await extractor(text, { pooling: 'mean', normalize: true });\n      \n      document.getElementById('result').innerHTML = `\n        <p>Embedding: ${output.data.length} dimensions</p>\n      `;\n    };\n    \n    // Cleanup on page unload\n    window.addEventListener('beforeunload', () => {\n      if (extractor) extractor.dispose();\n    });\n  </script>\n</body>\n</html>\n```\n\n## Node.js\n\n### Basic Script\n\n```javascript\n// embed.js\nimport { pipeline } from '@huggingface/transformers';\n\nasync function generateEmbedding(text) {\n  const extractor = await pipeline(\n    'feature-extraction',\n    'onnx-community/all-MiniLM-L6-v2-ONNX'\n  );\n  \n  const output = await extractor(text, { pooling: 'mean', normalize: true });\n  \n  console.log('Text:', text);\n  console.log('Embedding dimensions:', output.data.length);\n  console.log('First 5 values:', Array.from(output.data).slice(0, 5));\n  \n  await extractor.dispose();\n}\n\ngenerateEmbedding('Hello, world!');\n```\n\n### Batch Processing\n\n```javascript\n// batch-embed.js\nimport { pipeline } from '@huggingface/transformers';\nimport fs from 'fs/promises';\n\nasync function embedDocuments(documents) {\n  const extractor = await pipeline(\n    'feature-extraction',\n    'onnx-community/all-MiniLM-L6-v2-ONNX'\n  );\n  \n  console.log(`Processing ${documents.length} documents...`);\n  \n  const embeddings = [];\n  \n  for (let i = 0; i < 
documents.length; i++) {\n    const output = await extractor(documents[i], { \n      pooling: 'mean', \n      normalize: true \n    });\n    \n    embeddings.push({\n      text: documents[i],\n      embedding: Array.from(output.data)\n    });\n    \n    console.log(`Processed ${i + 1}/${documents.length}`);\n  }\n  \n  await fs.writeFile(\n    'embeddings.json',\n    JSON.stringify(embeddings, null, 2)\n  );\n  \n  console.log('Saved to embeddings.json');\n  \n  await extractor.dispose();\n}\n\nconst documents = [\n  'The cat sat on the mat',\n  'A dog played in the park',\n  'Machine learning is fascinating'\n];\n\nembedDocuments(documents);\n```\n\n### CLI with Progress\n\n```javascript\n// cli-embed.js\nimport { pipeline } from '@huggingface/transformers';\n\nasync function main() {\n  const text = process.argv[2] || 'Hello, world!';\n  \n  console.log('Loading model...');\n  \n  const fileProgress = {};\n  \n  const extractor = await pipeline(\n    'feature-extraction',\n    'onnx-community/all-MiniLM-L6-v2-ONNX',\n    {\n      progress_callback: (info) => {\n        if (info.status === 'progress') {\n          fileProgress[info.file] = info.progress;\n          \n          // Show all files progress\n          const progressLines = Object.entries(fileProgress)\n            .map(([file, progress]) => `  ${file}: ${progress.toFixed(1)}%`)\n            .join('\\n');\n          \n          process.stdout.write(`\\r\\x1b[K${progressLines}`);\n        }\n        \n        if (info.status === 'done') {\n          console.log(`\\n✓ ${info.file} complete`);\n        }\n        \n        if (info.status === 'ready') {\n          console.log('\\nModel ready!');\n        }\n      }\n    }\n  );\n  \n  console.log('Generating embedding...');\n  const output = await extractor(text, { pooling: 'mean', normalize: true });\n  \n  console.log(`\\nText: \"${text}\"`);\n  console.log(`Dimensions: ${output.data.length}`);\n  console.log(`First 5 values: 
${Array.from(output.data).slice(0, 5).join(', ')}`);\n  \n  await extractor.dispose();\n}\n\nmain();\n```\n\n## React\n\n### Basic Component\n\n```jsx\n// EmbeddingGenerator.jsx\nimport { useState, useRef, useEffect } from 'react';\nimport { pipeline } from '@huggingface/transformers';\n\nexport function EmbeddingGenerator() {\n  const extractorRef = useRef(null);\n  const [text, setText] = useState('');\n  const [embedding, setEmbedding] = useState(null);\n  const [loading, setLoading] = useState(false);\n\n  const generate = async () => {\n    if (!text) return;\n    \n    setLoading(true);\n    \n    // Load model on first generate\n    if (!extractorRef.current) {\n      extractorRef.current = await pipeline(\n        'feature-extraction',\n        'onnx-community/all-MiniLM-L6-v2-ONNX'\n      );\n    }\n    \n    const output = await extractorRef.current(text, { \n      pooling: 'mean', \n      normalize: true \n    });\n    setEmbedding(Array.from(output.data));\n    setLoading(false);\n  };\n\n  // Cleanup on unmount\n  useEffect(() => {\n    return () => {\n      if (extractorRef.current) {\n        extractorRef.current.dispose();\n      }\n    };\n  }, []);\n\n  return (\n    <div>\n      <h2>Text Embedding Generator</h2>\n      \n      <textarea\n        value={text}\n        onChange={(e) => setText(e.target.value)}\n        placeholder=\"Enter text\"\n        disabled={loading}\n      />\n      \n      <button onClick={generate} disabled={loading || !text}>\n        {loading ? 'Processing...' 
: 'Generate Embedding'}\n      </button>\n      \n      {embedding && (\n        <div>\n          <h3>Result:</h3>\n          <p>Dimensions: {embedding.length}</p>\n          <p>First 5 values: {embedding.slice(0, 5).join(', ')}</p>\n        </div>\n      )}\n    </div>\n  );\n}\n```\n\n### With Progress Tracking\n\n```jsx\n// EmbeddingGeneratorWithProgress.jsx\nimport { useState, useRef, useEffect } from 'react';\nimport { pipeline } from '@huggingface/transformers';\n\nexport function EmbeddingGeneratorWithProgress() {\n  const extractorRef = useRef(null);\n  const [text, setText] = useState('');\n  const [embedding, setEmbedding] = useState(null);\n  const [fileProgress, setFileProgress] = useState({});\n  const [status, setStatus] = useState('');\n  const [loading, setLoading] = useState(false);\n\n  const generate = async () => {\n    if (!text) return;\n    \n    setLoading(true);\n    \n    // Load model on first generate\n    if (!extractorRef.current) {\n      setStatus('Loading model...');\n      \n      extractorRef.current = await pipeline(\n        'feature-extraction',\n        'onnx-community/all-MiniLM-L6-v2-ONNX',\n        {\n          progress_callback: (info) => {\n            setStatus(`${info.status}: ${info.file}`);\n            \n            if (info.status === 'progress') {\n              setFileProgress(prev => ({\n                ...prev,\n                [info.file]: info.progress\n              }));\n            }\n            \n            if (info.status === 'ready') {\n              setStatus('Model ready!');\n            }\n          }\n        }\n      );\n    }\n    \n    setStatus('Generating embedding...');\n    const output = await extractorRef.current(text, { \n      pooling: 'mean', \n      normalize: true \n    });\n    setEmbedding(Array.from(output.data));\n    setStatus('Complete!');\n    setLoading(false);\n  };\n\n  // Cleanup on unmount\n  useEffect(() => {\n    return () => {\n      if (extractorRef.current) {\n        
extractorRef.current.dispose();\n      }\n    };\n  }, []);\n\n  return (\n    <div>\n      <h2>Text Embedding Generator</h2>\n      \n      {loading && Object.keys(fileProgress).length > 0 && (\n        <div>\n          <p>{status}</p>\n          {Object.entries(fileProgress).map(([file, progress]) => (\n            <div key={file} style={{ margin: '10px 0' }}>\n              <div style={{ fontSize: '12px', marginBottom: '5px' }}>{file}</div>\n              <div style={{ width: '100%', height: '20px', background: '#f0f0f0', borderRadius: '5px', overflow: 'hidden' }}>\n                <div \n                  style={{ \n                    width: `${progress}%`, \n                    height: '100%', \n                    background: '#4CAF50',\n                    transition: 'width 0.3s'\n                  }} \n                />\n              </div>\n            </div>\n          ))}\n        </div>\n      )}\n      \n      <textarea\n        value={text}\n        onChange={(e) => setText(e.target.value)}\n        placeholder=\"Enter text\"\n        disabled={loading}\n      />\n      \n      <button onClick={generate} disabled={loading || !text}>\n        {loading ? 'Processing...' 
: 'Generate Embedding'}\n      </button>\n      \n      {embedding && (\n        <div>\n          <h3>Result:</h3>\n          <p>Dimensions: {embedding.length}</p>\n          <p>First 5 values: {embedding.slice(0, 5).join(', ')}</p>\n        </div>\n      )}\n    </div>\n  );\n}\n```\n\n## Express API\n\n### Basic API Server\n\n```javascript\n// server.js\nimport express from 'express';\nimport { pipeline } from '@huggingface/transformers';\n\nconst app = express();\napp.use(express.json());\n\n// Initialize model once at startup\nlet extractor;\n(async () => {\n  console.log('Loading model...');\n  extractor = await pipeline(\n    'feature-extraction',\n    'onnx-community/all-MiniLM-L6-v2-ONNX'\n  );\n  console.log('Model ready!');\n})();\n\napp.post('/embed', async (req, res) => {\n  try {\n    const { text } = req.body;\n    \n    if (!text) {\n      return res.status(400).json({ error: 'Text is required' });\n    }\n    \n    const output = await extractor(text, { \n      pooling: 'mean', \n      normalize: true \n    });\n    \n    res.json({\n      text,\n      embedding: Array.from(output.data),\n      dimensions: output.data.length\n    });\n  } catch (error) {\n    console.error('Error:', error);\n    res.status(500).json({ error: 'Failed to generate embedding' });\n  }\n});\n\napp.listen(3000, () => {\n  console.log('Server running on http://localhost:3000');\n});\n```\n\n### API with Graceful Shutdown\n\n```javascript\n// server-with-shutdown.js\nimport express from 'express';\nimport { pipeline } from '@huggingface/transformers';\n\nconst app = express();\napp.use(express.json());\n\nlet extractor;\nlet server;\n\nasync function initialize() {\n  console.log('Loading model...');\n  extractor = await pipeline(\n    'feature-extraction',\n    'onnx-community/all-MiniLM-L6-v2-ONNX'\n  );\n  console.log('Model ready!');\n}\n\napp.post('/embed', async (req, res) => {\n  try {\n    const { text } = req.body;\n    \n    if (!text) {\n      return 
res.status(400).json({ error: 'Text is required' });\n    }\n    \n    const output = await extractor(text, { \n      pooling: 'mean', \n      normalize: true \n    });\n    \n    res.json({\n      embedding: Array.from(output.data),\n      dimensions: output.data.length\n    });\n  } catch (error) {\n    res.status(500).json({ error: error.message });\n  }\n});\n\nasync function shutdown(signal) {\n  console.log(`\\n${signal} received. Shutting down...`);\n  \n  if (server) {\n    // Stop accepting connections and wait for in-flight requests to finish\n    await new Promise((resolve) => server.close(resolve));\n    console.log('HTTP server closed');\n  }\n  \n  if (extractor) {\n    console.log('Disposing model...');\n    await extractor.dispose();\n    console.log('Model disposed');\n  }\n  \n  process.exit(0);\n}\n\nprocess.on('SIGTERM', () => shutdown('SIGTERM'));\nprocess.on('SIGINT', () => shutdown('SIGINT'));\n\ninitialize().then(() => {\n  server = app.listen(3000, () => {\n    console.log('Server running on http://localhost:3000');\n  });\n});\n```\n\n---\n\nThese examples implement the same embedding workflow across different runtimes and frameworks, so you can adapt whichever one matches your stack. Where a pipeline outlives a single call, release it with `.dispose()` to free memory, as the cleanup code above does.\n"
  },
  {
    "path": "skills/transformers.js/references/MODEL_ARCHITECTURES.md",
    "content": "# Supported Model Architectures\n\nThis document lists the model architectures currently supported by Transformers.js.\n\n## Natural Language Processing\n\n### Text Models\n- **ALBERT** - A Lite BERT for Self-supervised Learning\n- **BERT** - Bidirectional Encoder Representations from Transformers\n- **CamemBERT** - French language model based on RoBERTa\n- **CodeGen** - Code generation models\n- **CodeLlama** - Code-focused Llama models\n- **Cohere** - Command-R models for RAG\n- **DeBERTa** - Decoding-enhanced BERT with Disentangled Attention\n- **DeBERTa-v2** - Improved version of DeBERTa\n- **DistilBERT** - Distilled version of BERT (smaller, faster)\n- **GPT-2** - Generative Pre-trained Transformer 2\n- **GPT-Neo** - Open source GPT-3 alternative\n- **GPT-NeoX** - Larger GPT-Neo models\n- **LLaMA** - Large Language Model Meta AI\n- **Mistral** - Mistral AI language models\n- **MPNet** - Masked and Permuted Pre-training\n- **MobileBERT** - Compressed BERT for mobile devices\n- **RoBERTa** - Robustly Optimized BERT\n- **T5** - Text-to-Text Transfer Transformer\n- **XLM-RoBERTa** - Multilingual RoBERTa\n\n### Sequence-to-Sequence\n- **BART** - Denoising Sequence-to-Sequence Pre-training\n- **Blenderbot** - Open-domain chatbot\n- **BlenderbotSmall** - Smaller Blenderbot variant\n- **M2M100** - Many-to-Many multilingual translation\n- **MarianMT** - Neural machine translation\n- **mBART** - Multilingual BART\n- **NLLB** - No Language Left Behind (200 languages)\n- **Pegasus** - Pre-training with extracted gap-sentences\n\n## Computer Vision\n\n### Image Classification\n- **BEiT** - BERT Pre-Training of Image Transformers\n- **ConvNeXT** - Modern ConvNet architecture\n- **ConvNeXTV2** - Improved ConvNeXT\n- **DeiT** - Data-efficient Image Transformers\n- **DINOv2** - Self-supervised Vision Transformer\n- **DINOv3** - Latest DINO iteration\n- **EfficientNet** - Efficient convolutional networks\n- **MobileNet** - Lightweight models for mobile\n- 
**MobileViT** - Mobile Vision Transformer\n- **ResNet** - Residual Networks\n- **SegFormer** - Semantic segmentation transformer\n- **Swin** - Shifted Window Transformer\n- **ViT** - Vision Transformer\n\n### Object Detection\n- **DETR** - Detection Transformer\n- **D-FINE** - Fine-grained Distribution Refinement for object detection\n- **DINO** - DETR with Improved deNoising anchOr boxes\n- **Grounding DINO** - Open-set object detection\n- **YOLOS** - You Only Look at One Sequence\n\n### Segmentation\n- **CLIPSeg** - Image segmentation with text prompts\n- **Mask2Former** - Universal image segmentation\n- **SAM** - Segment Anything Model\n- **EdgeTAM** - On-Device Track Anything Model\n\n### Depth & Pose\n- **DPT** - Dense Prediction Transformer\n- **Depth Anything** - Monocular depth estimation\n- **Depth Pro** - Sharp monocular metric depth\n- **GLPN** - Global-Local Path Networks for depth\n\n## Audio\n\n### Speech Recognition\n- **Wav2Vec2** - Self-supervised speech representations\n- **Whisper** - Robust speech recognition (multilingual)\n- **HuBERT** - Self-supervised speech representation learning\n\n### Audio Processing\n- **Audio Spectrogram Transformer** - Audio classification\n- **DAC** - Descript Audio Codec\n\n### Text-to-Speech\n- **SpeechT5** - Unified speech and text pre-training\n- **VITS** - Conditional Variational Autoencoder with adversarial learning\n\n## Multimodal\n\n### Vision-Language\n- **CLIP** - Contrastive Language-Image Pre-training\n- **Chinese-CLIP** - Chinese version of CLIP\n- **ALIGN** - Large-scale noisy image-text pairs\n- **BLIP** - Bootstrapping Language-Image Pre-training\n- **Florence-2** - Unified vision foundation model\n- **LLaVA** - Large Language and Vision Assistant\n- **Moondream** - Tiny vision-language model\n\n### Document Understanding\n- **DiT** - Document Image Transformer\n- **Donut** - OCR-free Document Understanding\n- **LayoutLM** - Pre-training for document understanding\n- **TrOCR** - Transformer-based 
OCR\n\n### Audio-Language\n- **CLAP** - Contrastive Language-Audio Pre-training\n\n## Embeddings & Similarity\n\n- **Sentence Transformers** - Sentence embeddings\n- **all-MiniLM** - Efficient sentence embeddings\n- **all-mpnet-base** - High-quality sentence embeddings\n- **E5** - Text embeddings by Microsoft\n- **BGE** - General embedding models\n- **nomic-embed** - Long context embeddings\n\n## Specialized Models\n\n### Code\n- **CodeBERT** - Pre-trained model for code\n- **GraphCodeBERT** - Code structure understanding\n- **StarCoder** - Code generation\n\n### Scientific\n- **SciBERT** - Scientific text\n- **BioBERT** - Biomedical text\n\n### Retrieval\n- **ColBERT** - Contextualized late interaction over BERT\n- **DPR** - Dense Passage Retrieval\n\n## Model Selection Tips\n\n### For Text Tasks\n- **Small & Fast**: DistilBERT, MobileBERT\n- **Balanced**: BERT-base, RoBERTa-base\n- **High Accuracy**: RoBERTa-large, DeBERTa-v3-large\n- **Multilingual**: XLM-RoBERTa, mBERT\n\n### For Vision Tasks\n- **Mobile/Browser**: MobileNet, EfficientNet-B0\n- **Balanced**: DeiT-base, ConvNeXT-tiny\n- **High Accuracy**: Swin-large, DINOv2-large\n\n### For Audio Tasks\n- **Speech Recognition**: Whisper-tiny (fast), Whisper-large (accurate)\n- **Audio Classification**: Audio Spectrogram Transformer\n\n### For Multimodal\n- **Vision-Language**: CLIP (general), Florence-2 (comprehensive)\n- **Document AI**: Donut, LayoutLM\n- **OCR**: TrOCR\n\n## Finding Models on Hugging Face Hub\n\nSearch for compatible models:\n```\nhttps://huggingface.co/models?library=transformers.js\n```\n\nFilter by task:\n```\nhttps://huggingface.co/models?pipeline_tag=text-classification&library=transformers.js\n```\n\nCheck for ONNX support by looking for `onnx/` folder in model repository.\n"
  },
  {
    "path": "skills/transformers.js/references/PIPELINE_OPTIONS.md",
    "content": "# Pipeline Options Reference\n\nGuide to configuring model loading and inference using the `PretrainedModelOptions` parameter in the `pipeline()` function.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Basic Options](#basic-options)\n3. [Model Loading Options](#model-loading-options)\n4. [Device and Performance Options](#device-and-performance-options)\n5. [Common Configuration Patterns](#common-configuration-patterns)\n\n## Overview\n\nThe `pipeline()` function accepts three parameters:\n\n```javascript\nimport { pipeline } from '@huggingface/transformers';\n\nconst pipe = await pipeline(\n  'task-name',           // 1. Task type (e.g., 'sentiment-analysis')\n  'model-id',            // 2. Model identifier (optional, uses default if null)\n  options                // 3. PretrainedModelOptions (optional)\n);\n```\n\nThe third parameter, `options`, allows you to configure how the model is loaded and executed.\n\n### Available Options\n\n```typescript\ninterface PretrainedModelOptions {\n  // Progress tracking\n  progress_callback?: (info: ProgressInfo) => void;\n  \n  // Model configuration\n  config?: PretrainedConfig;\n  \n  // Cache and loading\n  cache_dir?: string;\n  local_files_only?: boolean;\n  revision?: string;\n  \n  // Model-specific settings\n  subfolder?: string;\n  model_file_name?: string;\n  \n  // Device and performance\n  device?: DeviceType | Record<string, DeviceType>;\n  dtype?: DataType | Record<string, DataType>;\n  \n  // External data format (large models)\n  use_external_data_format?: boolean | number | Record<string, boolean | number>;\n  \n  // ONNX Runtime settings\n  session_options?: InferenceSession.SessionOptions;\n}\n```\n\n## Basic Options\n\n### Progress Callback\n\nTrack model download and loading progress. 
**Note:** Models consist of multiple files (model weights, config, tokenizer, etc.), and each file reports its own progress:\n\n```javascript\nconst fileProgress = {};\n\nconst pipe = await pipeline('sentiment-analysis', null, {\n  progress_callback: (info) => {\n    if (info.status === 'progress') {\n      fileProgress[info.file] = info.progress;\n      console.log(`${info.file}: ${info.progress.toFixed(1)}%`);\n    }\n    \n    if (info.status === 'done') {\n      console.log(`✓ ${info.file} complete`);\n    }\n  }\n});\n```\n\n**Progress Info Types:**\n\n```typescript\ntype ProgressInfo = {\n  status: 'initiate' | 'download' | 'progress' | 'done' | 'ready';\n  name: string;       // Model id or path\n  file: string;       // File being processed\n  progress?: number;  // Percentage (0-100, only for 'progress' status)\n  loaded?: number;    // Bytes downloaded (only for 'progress' status)\n  total?: number;     // Total bytes (only for 'progress' status)\n};\n```\n\n**Example: Browser Loading UI with Multiple Files**\n\n```javascript\nconst statusDiv = document.getElementById('status');\nconst progressContainer = document.getElementById('progress-container');\nconst fileProgressBars = {};\n\nconst pipe = await pipeline('image-classification', null, {\n  progress_callback: (info) => {\n    if (info.status === 'progress') {\n      // Create progress bar for each file if not exists\n      if (!fileProgressBars[info.file]) {\n        const fileDiv = document.createElement('div');\n        fileDiv.innerHTML = `\n          <div class=\"file-name\">${info.file}</div>\n          <div class=\"progress-bar\">\n            <div class=\"progress-fill\" style=\"width: 0%\"></div>\n          </div>\n        `;\n        progressContainer.appendChild(fileDiv);\n        fileProgressBars[info.file] = fileDiv.querySelector('.progress-fill');\n      }\n      \n      // Update progress bar\n      fileProgressBars[info.file].style.width = `${info.progress}%`;\n      \n      const mb = 
(info.loaded / 1024 / 1024).toFixed(2);\n      const totalMb = (info.total / 1024 / 1024).toFixed(2);\n      statusDiv.textContent = `${info.file}: ${mb}/${totalMb} MB`;\n    }\n    \n    if (info.status === 'ready') {\n      statusDiv.textContent = 'Model ready!';\n    }\n  }\n});\n```\n\nThe same per-file pattern applies to any UI: key the state by `info.file` and update it on each `'progress'` event.\n\n### Custom Configuration\n\nOverride the model's default configuration:\n\n```javascript\nimport { pipeline } from '@huggingface/transformers';\n\nconst pipe = await pipeline('text-generation', 'model-id', {\n  config: {\n    max_length: 512,\n    temperature: 0.8,\n    // ... other config options\n  }\n});\n```\n\n**Use cases:**\n- Override default generation parameters\n- Adjust model-specific settings\n- Test different configurations without modifying model files\n\n## Model Loading Options\n\n### Cache Directory\n\nSpecify where to cache downloaded models:\n\n```javascript\n// Node.js: Custom cache location\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  cache_dir: './my-custom-cache'\n});\n```\n\n**Default behavior:**\n- If not specified, uses `env.cacheDir` (default: `./.cache`)\n- Only applies when `env.useFSCache = true` (Node.js)\n- Browser cache uses Cache API (configured via `env.cacheKey`)\n\n### Local Files Only\n\nPrevent any network requests:\n\n```javascript\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  local_files_only: true\n});\n```\n\n**Use cases:**\n- Offline applications\n- Air-gapped environments\n- Testing with pre-downloaded models\n- Production deployments with bundled models\n\n**Important:**\n- The model must already be cached or available locally\n- Throws an error if the model is not found locally\n- Requires `env.allowLocalModels = true`\n\n### Model Revision\n\nPin a specific model version (git branch, tag, or commit):\n\n```javascript\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  revision: 'v1.0.0'  // Use 
specific version\n});\n\n// Or use a branch\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  revision: 'experimental'\n});\n\n// Or use a commit hash\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  revision: 'abc123def456'\n});\n```\n\n**Default:** `'main'` (latest version)\n\n**Use cases:**\n- Pin to stable release for production\n- Test experimental features\n- Reproduce results with specific model version\n- Work with models under development\n\n**Important:**\n- Only applies to remote models (Hugging Face Hub)\n- Ignored for local file paths\n- Each revision is cached separately\n\n### Model Subfolder\n\nSpecify the subfolder within the model repository:\n\n```javascript\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  subfolder: 'onnx'  // Default: 'onnx'\n});\n```\n\n**Default:** `'onnx'`\n\n**Use cases:**\n- Custom model repository structure\n- Multiple model variants in same repo\n- Organizational preferences\n\n\n\n### Model File Name\n\nSpecify a custom model file name (without `.onnx` extension):\n\n```javascript\nconst pipe = await pipeline('text-generation', 'model-id', {\n  model_file_name: 'decoder_model_merged'\n});\n// Loads: decoder_model_merged.onnx\n```\n\n**Use cases:**\n- Models with non-standard file names\n- Select specific model variant\n- Encoder-decoder models with separate files\n\n**Note:** Currently only valid for encoder-only or decoder-only models.\n\n\n\n## Device and Performance Options\n\n### Device Selection\n\nChoose where to run the model:\n\n```javascript\n// Run on CPU (WASM - default)\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  device: 'wasm'\n});\n\n// Run on GPU (WebGPU)\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  device: 'webgpu'\n});\n```\n\n**Common devices:**\n- `'wasm'` - WebAssembly (CPU, most compatible)\n- `'webgpu'` - WebGPU (GPU, faster in browsers)\n- `'cpu'` - CPU\n- `'gpu'` - Auto-detect GPU\n- `'cuda'` - 
NVIDIA CUDA (Node.js with GPU)\n\nSee the full list in the [devices.js source](https://github.com/huggingface/transformers.js/blob/main/src/utils/devices.js).\n\n**Per-component device selection:**\n\nFor models with multiple components (encoder-decoder, vision-encoder-decoder, etc.):\n\n```javascript\nconst pipe = await pipeline('automatic-speech-recognition', 'model-id', {\n  device: {\n    encoder: 'webgpu',    // Run encoder on GPU\n    decoder: 'wasm'       // Run decoder on CPU\n  }\n});\n```\n\n**WebGPU Requirements:**\n- Chrome/Edge 113+\n- Enable chrome://flags/#enable-unsafe-webgpu (if needed)\n- Adequate GPU memory\n\n\n\n### Data Type (Quantization)\n\nControl model precision and size:\n\n```javascript\n// Full precision (largest, most accurate)\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  dtype: 'fp32'\n});\n\n// Half precision (balanced)\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  dtype: 'fp16'\n});\n\n// 8-bit quantization (smaller, faster)\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  dtype: 'q8'\n});\n\n// 4-bit quantization (smallest, fastest)\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  dtype: 'q4'\n});\n```\n\n**Common data types:**\n- `'fp32'` - 32-bit floating point (full precision)\n- `'fp16'` - 16-bit floating point (half precision)\n- `'q8'` - 8-bit quantized (good balance)\n- `'q4'` - 4-bit quantized (maximum compression)\n- `'int8'` - 8-bit integer\n- `'uint8'` - 8-bit unsigned integer\n\nSee the full list in the [dtypes.js source](https://github.com/huggingface/transformers.js/blob/main/src/utils/dtypes.js).\n\n**Per-component data type:**\n\n```javascript\nconst pipe = await pipeline('automatic-speech-recognition', 'model-id', {\n  dtype: {\n    encoder: 'fp32',  // Encoder at full precision\n    decoder: 'q8'     // Decoder quantized\n  }\n});\n```\n\n**Trade-offs:**\n\n| Data Type | Model Size | Speed | Accuracy | Use Case 
|\n|-----------|-----------|-------|----------|----------|\n| `fp32` | Largest | Slowest | Highest | Research, maximum quality |\n| `fp16` | Medium | Medium | High | Production, GPU inference |\n| `q8` | Small | Fast | Good | Production, CPU inference |\n| `q4` | Smallest | Fastest | Acceptable | Edge devices, real-time apps |\n\n\n\n### External Data Format\n\nFor models >= 2GB, ONNX uses external data format:\n\n```javascript\n// Automatically detect and load external data\nconst pipe = await pipeline('text-generation', 'large-model-id', {\n  use_external_data_format: true\n});\n\n// Specify number of external data chunks\nconst pipe = await pipeline('text-generation', 'large-model-id', {\n  use_external_data_format: 5  // Load 5 chunks (model.onnx_data_0 to _4)\n});\n```\n\n**How it works:**\n- Models >= 2GB split weights into separate files\n- Main file: `model.onnx` (structure only)\n- Data files: `model.onnx_data` or `model.onnx_data_0`, `model.onnx_data_1`, etc.\n\n**Default behavior:**\n- `false` - No external data (models < 2GB)\n- `true` - Load external data automatically\n- `number` - Load this many external data chunks\n\n**Maximum chunks:** 100 (defined by `MAX_EXTERNAL_DATA_CHUNKS`)\n\n**Per-component external data:**\n\n```javascript\nconst pipe = await pipeline('text-generation', 'large-model-id', {\n  use_external_data_format: {\n    encoder: true,\n    decoder: 3  // Decoder has 3 external data chunks\n  }\n});\n```\n\n\n\n### Session Options\n\nAdvanced ONNX Runtime configuration:\n\n```javascript\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  session_options: {\n    executionProviders: ['webgpu', 'wasm'],\n    graphOptimizationLevel: 'all',\n    enableCpuMemArena: true,\n    enableMemPattern: true,\n    executionMode: 'sequential',\n    logSeverityLevel: 2,\n    logVerbosityLevel: 0\n  }\n});\n```\n\n**Common session options:**\n\n| Option | Description | Default |\n|--------|-------------|---------|\n| `executionProviders` 
| Ordered list of execution providers | `['wasm']` |\n| `graphOptimizationLevel` | Graph optimization: `'disabled'`, `'basic'`, `'extended'`, `'all'` | `'all'` |\n| `enableCpuMemArena` | Enable CPU memory arena for faster memory allocation | `true` |\n| `enableMemPattern` | Enable memory pattern optimization | `true` |\n| `executionMode` | `'sequential'` or `'parallel'` | `'sequential'` |\n| `logSeverityLevel` | 0=Verbose, 1=Info, 2=Warning, 3=Error, 4=Fatal | `2` |\n| `freeDimensionOverrides` | Override dynamic dimensions (e.g., `{ batch_size: 1 }`) | - |\n\n**Use cases:**\n- Fine-tune performance for specific hardware\n- Debug model execution issues\n- Override dynamic shapes\n- Control memory usage\n\n\n\n## Common Configuration Patterns\n\n### Development\n\nFast iteration with progress tracking:\n\n```javascript\nimport { pipeline } from '@huggingface/transformers';\n\nconst pipe = await pipeline('sentiment-analysis', null, {\n  progress_callback: (info) => {\n    if (info.status === 'progress') {\n      console.log(`${info.file}: ${info.progress.toFixed(1)}%`);\n    }\n  }\n});\n```\n\n### Production (GPU)\n\nUse WebGPU with fp16 for better performance:\n\n```javascript\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  device: 'webgpu',\n  dtype: 'fp16'\n});\n```\n\n### Production (CPU)\n\nUse quantization for smaller size and faster CPU inference:\n\n```javascript\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  dtype: 'q8'  // or 'q4' for even smaller\n});\n```\n\n### Offline/Local\n\nPrevent network requests, use only local models:\n\n```javascript\nimport { pipeline, env } from '@huggingface/transformers';\n\nenv.allowLocalModels = true;\nenv.localModelPath = './models/';\n\nconst pipe = await pipeline('sentiment-analysis', 'model-id', {\n  local_files_only: true\n});\n```\n\n### Per-Component Settings\n\nFor encoder-decoder models, configure each component separately:\n\n```javascript\nconst pipe = await 
pipeline('automatic-speech-recognition', 'model-id', {\n  device: {\n    encoder: 'webgpu',\n    decoder: 'wasm'\n  },\n  dtype: {\n    encoder: 'fp16',\n    decoder: 'q8'\n  }\n});\n```\n\n## Related Documentation\n\n- [Configuration Reference](./CONFIGURATION.md) - Environment configuration with `env` object\n- [Text Generation Guide](./TEXT_GENERATION.md) - Text generation options and streaming\n- [Model Architectures](./MODEL_ARCHITECTURES.md) - Supported models and selection tips\n- [Main Skill Guide](../SKILL.md) - Getting started with Transformers.js\n\n## Best Practices\n\n1. **Progress Callbacks**: Use `progress_callback` for large models to show download progress\n2. **Quantization**: Use `q8` or `q4` for CPU inference to reduce size and improve speed\n3. **Device Selection**: Use `webgpu` for better performance when available\n4. **Offline-First**: Use `local_files_only: true` in production to avoid runtime downloads\n5. **Version Pinning**: Use `revision` to pin model versions for reproducible deployments\n6. **Memory Management**: Always dispose pipelines with `pipe.dispose()` when done\n\n---\n\nThis document covers all available options for the `pipeline()` function. For environment-level configuration (remote hosts, global cache settings, WASM paths), see the [Configuration Reference](./CONFIGURATION.md).\n"
  },
  {
    "path": "skills/transformers.js/references/TEXT_GENERATION.md",
    "content": "# Text Generation Guide\n\nGuide to generating text with Transformers.js, including streaming and chat format.\n\n## Table of Contents\n\n1. [Basic Generation](#basic-generation)\n2. [Streaming](#streaming)\n3. [Chat Format](#chat-format)\n4. [Generation Parameters](#generation-parameters)\n5. [Model Selection](#model-selection)\n6. [Best Practices](#best-practices)\n\n## Basic Generation\n\n```javascript\nimport { pipeline } from '@huggingface/transformers';\n\nconst generator = await pipeline(\n  'text-generation',\n  'onnx-community/Qwen2.5-0.5B-Instruct',\n  { dtype: 'q4' }\n);\n\nconst result = await generator('Once upon a time', {\n  max_new_tokens: 100,\n  temperature: 0.7,\n});\n\nconsole.log(result[0].generated_text);\n\n// Clean up when done\nawait generator.dispose();\n```\n\n## Streaming\n\nStream tokens as they're generated for better UX. Once you understand streaming, you can combine it with other features like chat format.\n\n### Node.js\n\n```javascript\nimport { pipeline, TextStreamer } from '@huggingface/transformers';\n\nconst generator = await pipeline(\n  'text-generation',\n  'onnx-community/Qwen2.5-0.5B-Instruct',\n  { dtype: 'q4' }\n);\n\nconst streamer = new TextStreamer(generator.tokenizer, {\n  skip_prompt: true,\n  skip_special_tokens: true,\n  callback_function: (token) => {\n    process.stdout.write(token);\n  },\n});\n\nawait generator('Tell me a story', {\n  max_new_tokens: 200,\n  temperature: 0.7,\n  streamer,\n});\n```\n\n### Browser\n\n```html\n<!DOCTYPE html>\n<html>\n<body>\n  <textarea id=\"prompt\" placeholder=\"Enter prompt...\"></textarea>\n  <button onclick=\"generate()\">Generate</button>\n  <div id=\"output\"></div>\n\n  <script type=\"module\">\n    import { pipeline, TextStreamer } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.8.1';\n    \n    const generator = await pipeline(\n      'text-generation',\n      'onnx-community/Qwen2.5-0.5B-Instruct',\n      { dtype: 'q4' }\n    );\n    \n 
   window.generate = async function() {\n      const prompt = document.getElementById('prompt').value;\n      const outputDiv = document.getElementById('output');\n      outputDiv.textContent = '';\n      \n      const streamer = new TextStreamer(generator.tokenizer, {\n        skip_prompt: true,\n        skip_special_tokens: true,\n        callback_function: (token) => {\n          outputDiv.textContent += token;\n        },\n      });\n      \n      await generator(prompt, {\n        max_new_tokens: 200,\n        temperature: 0.7,\n        streamer,\n      });\n    };\n  </script>\n</body>\n</html>\n```\n\n### React\n\n```jsx\nimport { useState, useRef, useEffect } from 'react';\nimport { pipeline, TextStreamer } from '@huggingface/transformers';\n\nfunction StreamingGenerator() {\n  const generatorRef = useRef(null);\n  const [output, setOutput] = useState('');\n  const [loading, setLoading] = useState(false);\n\n  const handleGenerate = async (prompt) => {\n    if (!prompt) return;\n    \n    setLoading(true);\n    setOutput('');\n    \n    // Load model on first generate\n    if (!generatorRef.current) {\n      generatorRef.current = await pipeline(\n        'text-generation',\n        'onnx-community/Qwen2.5-0.5B-Instruct',\n        { dtype: 'q4' }\n      );\n    }\n    \n    const streamer = new TextStreamer(generatorRef.current.tokenizer, {\n      skip_prompt: true,\n      skip_special_tokens: true,\n      callback_function: (token) => {\n        setOutput((prev) => prev + token);\n      },\n    });\n\n    await generatorRef.current(prompt, {\n      max_new_tokens: 200,\n      temperature: 0.7,\n      streamer,\n    });\n    \n    setLoading(false);\n  };\n\n  // Cleanup on unmount\n  useEffect(() => {\n    return () => {\n      if (generatorRef.current) {\n        generatorRef.current.dispose();\n      }\n    };\n  }, []);\n\n  return (\n    <div>\n      <button onClick={() => handleGenerate('Tell me a story')} disabled={loading}>\n        {loading ? 
'Generating...' : 'Generate'}\n      </button>\n      <div>{output}</div>\n    </div>\n  );\n}\n```\n\n## Chat Format\n\nUse structured messages for conversations. This works with both basic generation and streaming (just add the `streamer` parameter).\n\n### Single Turn\n\n```javascript\nimport { pipeline } from '@huggingface/transformers';\n\nconst generator = await pipeline(\n  'text-generation',\n  'onnx-community/Qwen2.5-0.5B-Instruct',\n  { dtype: 'q4' }\n);\n\nconst messages = [\n  { role: 'system', content: 'You are a helpful assistant.' },\n  { role: 'user', content: 'How do I create an async function?' }\n];\n\nconst result = await generator(messages, {\n  max_new_tokens: 256,\n  temperature: 0.7,\n});\n\n// With chat input, generated_text is the full message list; the reply is the last entry\nconsole.log(result[0].generated_text.at(-1).content);\n```\n\n### Multi-turn Conversation\n\n```javascript\nconst conversation = [\n  { role: 'system', content: 'You are a helpful assistant.' },\n  { role: 'user', content: 'What is JavaScript?' },\n  { role: 'assistant', content: 'JavaScript is a programming language...' },\n  { role: 'user', content: 'Can you show an example?' 
}\n];\n\nconst result = await generator(conversation, {\n  max_new_tokens: 200,\n  temperature: 0.7,\n});\n\n// To add streaming, just pass a streamer:\n// streamer: new TextStreamer(generator.tokenizer, {...})\n```\n\n## Generation Parameters\n\n### Common Parameters\n\n```javascript\nawait generator(prompt, {\n  // Token limits\n  max_new_tokens: 512,        // Maximum tokens to generate\n  min_new_tokens: 0,          // Minimum tokens to generate\n  \n  // Sampling\n  temperature: 0.7,           // Randomness (0.0-2.0)\n  top_k: 50,                  // Consider top K tokens\n  top_p: 0.95,                // Nucleus sampling\n  do_sample: true,            // Use random sampling (false = always pick most likely token)\n  \n  // Repetition control\n  repetition_penalty: 1.0,    // Penalty for repeating (1.0 = no penalty)\n  no_repeat_ngram_size: 0,    // Prevent repeating n-grams\n  \n  // Streaming\n  streamer: streamer,         // TextStreamer instance\n});\n```\n\n### Parameter Effects\n\n**Temperature:**\n- Low (0.1-0.5): More focused and deterministic\n- Medium (0.6-0.9): Balanced creativity and coherence\n- High (1.0-2.0): More creative and random\n\n```javascript\n// Focused output\nawait generator(prompt, { temperature: 0.3, max_new_tokens: 100 });\n\n// Creative output\nawait generator(prompt, { temperature: 1.2, max_new_tokens: 100 });\n```\n\n**Sampling Methods:**\n\n```javascript\n// Greedy (deterministic)\nawait generator(prompt, { \n  do_sample: false,\n  max_new_tokens: 100 \n});\n\n// Top-k sampling\nawait generator(prompt, { \n  top_k: 50,\n  temperature: 0.7,\n  max_new_tokens: 100 \n});\n\n// Top-p (nucleus) sampling\nawait generator(prompt, { \n  top_p: 0.95,\n  temperature: 0.7,\n  max_new_tokens: 100 \n});\n```\n\n## Model Selection\n\nBrowse available text generation models on Hugging Face Hub:\n\n**https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending**\n\n### Selection Tips\n\n- **Small models (< 
1B params)**: Fast, browser-friendly, use `dtype: 'q4'`\n- **Medium models (1-3B params)**: Balanced quality/speed, use `dtype: 'q4'` or `fp16`\n- **Large models (> 3B params)**: High quality, slower, best for Node.js with `dtype: 'fp16'`\n\nCheck model cards for:\n- Parameter count and model size\n- Supported languages\n- Benchmark scores\n- License restrictions\n\n## Best Practices\n\n1. **Model Size**: Use quantized models (`q4`) for browsers, larger models (`fp16`) for servers\n2. **Streaming**: Use streaming for better UX - shows progress and feels responsive\n3. **Token Limits**: Set `max_new_tokens` to prevent runaway generation\n4. **Temperature**: Tune based on use case (creative: 0.8-1.2, factual: 0.3-0.7)\n5. **Memory**: Always call `dispose()` when done\n6. **Caching**: Load model once, reuse for multiple requests\n\n## Related Documentation\n\n- [Pipeline Options](./PIPELINE_OPTIONS.md) - Configure pipeline loading\n- [Configuration Reference](./CONFIGURATION.md) - Environment settings\n- [Code Examples](./EXAMPLES.md) - More examples for different runtimes\n- [Main Skill Guide](../SKILL.md) - Getting started guide\n"
  }
]