[
  {
    "path": ".github/workflows/collate.yml",
    "content": "---\nname: Collate\n\n\"on\":\n  workflow_dispatch:\n  push:\n    paths:\n      - assets/**\n\njobs:\n  collate-assets:\n    name: Collate all assets into single file\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout\n        uses: actions/checkout@v3\n        with:\n          fetch-depth: 0\n\n      - name: Setup Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: '3.9'\n\n      - name: Install dependencies\n        run: python -m pip install --upgrade pip pandas pyyaml\n\n      - name: collate\n        run: |-\n          ./scripts/collate.py\n\n      - name: commit\n        run: |-\n          git config user.name \"GitHub Actions Bot\"\n          git config user.email \"<>\"\n          git add resources/all_assets.csv\n          git commit -m \"Update collated assets CSV.\"\n          git push origin main\n"
  },
  {
    "path": ".github/workflows/deploy.yml",
    "content": "---\nname: Deploy\n\n\"on\":\n  push:\n    branches: main\n\njobs:\n  # Check out the main branch of the development repository and force update the\n  # main of the website repository. Setting fetch-depth to 0 allows us to\n  # retrieve all the history for the specified branch.\n  update-website:\n    name: Update Website\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout\n        uses: actions/checkout@v3\n        with:\n          fetch-depth: 0\n\n      - name: Setup SSH\n        uses: MrSquaare/ssh-setup-action@v1\n        with:\n          host: github.com\n          private-key: ${{ secrets.WEBSITE_DEPLOY_SSH_PRIVATE_KEY }}\n\n      - name: Update the github-pages branch of the website repository\n        env:\n          COMMIT_AUTHOR: Stanford-CRFM-Website\n        run: |\n          # Set up Git configuration\n          git config --global user.name \"${{ env.COMMIT_AUTHOR }}\"\n          git config --global user.email \"username@users.noreply.github.com\"\n\n          # Update the website repository\n          git remote add website git+ssh://git@github.com/stanford-crfm-website/ecosystem-graphs.git\n          git push -u website main\n"
  },
  {
    "path": ".github/workflows/test.yml",
    "content": "---\nname: Test\n\n\"on\":\n  push:\n    branches: main\n  pull_request:\n    branches:\n      - main\n\njobs:\n  format:\n    name: Format YAML files\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout\n        uses: actions/checkout@v3\n\n      - name: Set up Python 3.8\n        uses: actions/setup-python@v1\n        with:\n          python-version: 3.8\n\n      - name: Validate format\n        run: sh ./pre-commit.sh && git diff --exit-code\n"
  },
  {
    "path": ".gitignore",
    "content": "# Development Environment\n.DS_Store\n.vscode\n\n# Group specific\nnav\ntags"
  },
  {
    "path": ".yamllint.yaml",
    "content": "---\n\nyaml-files:\n  - '*.yaml'\n  - '*.yml'\n  - '.yamllint'\n\nrules:\n  braces: enable\n  brackets: enable\n  colons: enable\n  commas: enable\n  comments:\n    level: warning\n  comments-indentation:\n    level: warning\n  document-end: disable\n  document-start:\n    level: warning\n  empty-lines: enable\n  empty-values: disable\n  hyphens: enable\n  indentation:\n    level: warning\n  key-duplicates: enable\n  key-ordering: disable\n  line-length: disable\n  new-line-at-end-of-file: enable\n  new-lines: enable\n  octal-values: disable\n  quoted-strings: disable\n  trailing-spaces: enable\n  truthy:\n    level: warning\n"
  },
  {
    "path": "README.md",
    "content": "# Ecosystem Graphs\n\n## Overview\nWelcome! Ecosystem Graphs is an ongoing effort to track the foundation model ecosystem, namely both the assets (datasets, models, and applications) and their relationships. Using it, one can answer questions such as: What are the latest foundation models? Who builds them and where are they used downstream? What are the general trends over time? We hope that ecosystem graphs will be a useful resource for researchers, application developers, policymakers, and the public to better understand the foundation models ecosystem.\n\nTo explore the ecosystem, check out the [website](https://crfm.stanford.edu/ecosystem-graphs/) or read the [paper](https://arxiv.org/abs/2303.15772).\n\nBriefly, an ecosystem graph is defined by:\n- **Assets.** These are the nodes in the graph, which can be datasets (e.g. The Pile), models (e.g. Stable Diffusion), or applications (e.g. Microsoft Word).\n- **Dependencies.** These are the edges in the graph, which indicate how assets are built (e.g. the BLOOM model is trained on the ROOTS dataset).\n- **Ecosystem cards.** These are structured cards that house metadata on each asset (e.g. who built it, when, what is the license).\n\n## Contribute\nWe actively encourage community contributions. To contribute:\n- Add assets by filling out [this form](https://forms.gle/VqnSsZhv62hJ5rP36). No coding expertise required!\n- Submit a PR (run `pre-commit.sh` before submitting)\n\nTo visualize and explore your changes, start a local server:\n\n    python server.py\n\nand navigate to [http://localhost:8000](http://localhost:8000).\n\n## Cite as\n\n```\n@article{bommasani2023ecosystem-graphs,\n  author       = {Bommasani, Rishi and\n                  Soylu, Dilara and\n                  Liao, Thomas I. and\n                  Creel, Kathleen A. 
and\n                  Liang, Percy},\n  title        = {Ecosystem Graphs: The Social Footprint of Foundation Models},\n  month        = mar,\n  year         = 2023,\n  url          = {https://arxiv.org/abs/2303.15772}\n}\n```\n"
  },
  {
    "path": "assets/01ai.yaml",
    "content": "- access: open\n  analysis: Evaluated on standard language benchmarks, common sense reasoning, and\n    reading comprehension in comparison to SoTA LLMs.\n  created_date: 2023-11-02\n  dependencies: []\n  description: The Yi series models are large language models trained from scratch\n    by developers at 01 AI.\n  feedback: https://huggingface.co/01-ai/Yi-34B/discussions\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE.\n      Code license is under Apache 2.0\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/01-ai/Yi-34B\n  monitoring: unknown\n  name: Yi\n  nationality: China\n  organization: 01 AI\n  prohibited_uses: none\n  quality_control: Model underwent supervised fine-tuning, leading to a greater diversity\n    of responses.\n  size: 34B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/01-ai/Yi\n- access: open\n  analysis: Yi-VL outperforms all existing open-source models in MMMU and CMMMU, two\n    advanced benchmarks that include massive multi-discipline multimodal questions\n    (based on data available up to January 2024).\n  created_date: 2024-01-23\n  dependencies: []\n  description: The Yi Vision Language (Yi-VL) model is the open-source, multimodal\n    version of the Yi Large Language Model (LLM) series, enabling content comprehension,\n    recognition, and multi-round conversations about images.\n  feedback: https://huggingface.co/01-ai/Yi-VL-34B/discussions\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://huggingface.co/01-ai/Yi-VL-34B/blob/main/LICENSE.\n      Code license is under Apache 2.0\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/01-ai/Yi-VL-34B\n  monitoring: unknown\n  name: Yi-VL\n  nationality: China\n  organization: 01 AI\n  
prohibited_uses: ''\n  quality_control: unknown\n  size: 34B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 128 NVIDIA A800 (80G) GPUs\n  training_time: 10 days\n  type: model\n  url: https://github.com/01-ai/Yi\n"
  },
  {
    "path": "assets/360.yaml",
    "content": "- access: open\n  analysis: Achieved competitive performance on relevant benchmarks against other\n    7B models in Chinese, English, and coding tasks.\n  created_date: 2024-05-23\n  dependencies: []\n  description: 360 Zhinao is a multilingual LLM in Chinese and English with chat capabilities.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: 360 Zhinao\n  nationality: China\n  organization: 360 Security\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2405.13386\n"
  },
  {
    "path": "assets/adept.yaml",
    "content": "- access: closed\n  analysis: ''\n  created_date:\n    explanation: The date the model was announced in the [[Adept blog post]](https://www.adept.ai/blog/act-1).\n    value: 2022-09-14\n  dependencies: []\n  description: ACT-1 (ACtion Transformer) is a large-scale transformer model designed\n    and trained specifically for taking actions on computers (use software tools APIs\n    and websites) in response to the user's natural language commands.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: ACT-1\n  nationality: USA\n  organization: Adept\n  prohibited_uses: ''\n  quality_control: ''\n  size: ''\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.adept.ai/blog/act-1\n- access: open\n  analysis: Evaluated in comparison to LLaMA 2 and MPT Instruct, and outperforms both\n    on standard benchmarks.\n  created_date: 2023-09-07\n  dependencies: []\n  description: Persimmon is the most capable open-source, fully permissive model with\n    fewer than 10 billion parameters, as of its release date.\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Persimmon\n  nationality: USA\n  organization: Adept\n  prohibited_uses: ''\n  quality_control: ''\n  size: 8B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.adept.ai/blog/persimmon-8b\n- access: open\n  analysis: Evaluated on standard image understanding benchmarks.\n  created_date: 2023-10-17\n  dependencies: []\n  description: Fuyu is a small version of the multimodal model that powers Adept's\n    core product.\n  feedback: https://huggingface.co/adept/fuyu-8b/discussions\n  intended_uses: The model is intended for research purposes only.\n  license: CC-BY-NC-4.0\n  modality: image, text; text\n  
model_card: https://huggingface.co/adept/fuyu-8b\n  monitoring: ''\n  name: Fuyu\n  nationality: USA\n  organization: Adept\n  prohibited_uses: The model was not trained to be factual or true representations\n    of people or events, and therefore using the model to generate such content is\n    out-of-scope for the abilities of this model.\n  quality_control: none\n  size: 8B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.adept.ai/blog/fuyu-8b\n- access: closed\n  analysis: Evaluated on the MMLU, GSM8K, MATH, and HumanEval benchmarks. According\n    to these benchmarks, Fuyu-Heavy is, as of release, the strongest multimodal model\n    trained outside of Google or OpenAI.\n  created_date: 2024-01-24\n  dependencies: []\n  description: Fuyu Heavy is a new multimodal model designed specifically for digital\n    agents.\n  feedback: none\n  intended_uses: unknown\n  license: unknown\n  modality: image, text; text\n  model_card: none\n  monitoring: ''\n  name: Fuyu Heavy\n  nationality: USA\n  organization: Adept\n  prohibited_uses: none\n  quality_control: none\n  size:\n    explanation: The size of the model is 10-20 times smaller than GPT-4V and Gemini\n      Ultra, as per announcement.\n    value: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.adept.ai/blog/adept-fuyu-heavy\n"
  },
  {
    "path": "assets/adobe.yaml",
    "content": "- access: closed\n  analysis: ''\n  created_date: 2023-10-10\n  dependencies: []\n  description: Firefly Image 2 is the next generation of generative AI for imaging,\n    bringing significant advancements to creative control and quality, including new\n    Text to Image capabilities now available in the popular Firefly web app where\n    90% of users are new to Adobe products.\n  feedback: ''\n  intended_uses: creative generation of digital art and images\n  license: unknown\n  modality: text; image\n  model_card: none\n  monitoring: ''\n  name: Firefly Image 2\n  nationality: USA\n  organization: Adobe\n  prohibited_uses: AI/ML training, attempting to create abusive, illegal, or confidential\n    content.\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://firefly.adobe.com/\n- access: closed\n  analysis: ''\n  created_date: 2023-10-10\n  dependencies: []\n  description: \"Firefly Vector is the world\\u2019s first generative AI focused on\\\n    \\ producing vector graphics, bringing Adobe's vector graphic and generative AI\\\n    \\ expertise directly into Adobe Illustrator workflows with Text to Vector Graphic.\"\n  feedback: ''\n  intended_uses: creative generation of digital art and images\n  license: unknown\n  modality: text; vector graphic\n  model_card: none\n  monitoring: ''\n  name: Firefly Vector\n  nationality: USA\n  organization: Adobe\n  prohibited_uses: AI/ML training, attempting to create abusive, illegal, or confidential\n    content.\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://firefly.adobe.com/\n- access: closed\n  analysis: ''\n  created_date: 2023-10-10\n  dependencies: []\n  description: Firefly Design powers instant generation of amazing quality template\n    designs in Adobe Express with the new Text to 
Template capability.\n  feedback: ''\n  intended_uses: creative generation of digital art and images\n  license: unknown\n  modality: text; template design\n  model_card: none\n  monitoring: ''\n  name: Firefly Design\n  nationality: USA\n  organization: Adobe\n  prohibited_uses: AI/ML training, attempting to create abusive, illegal, or confidential\n    content.\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://firefly.adobe.com/\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-21\n  dependencies:\n  - Firefly Image 2\n  - Firefly Vector\n  - Firefly Design\n  description: Adobe Firefly is a standalone web application. It offers new ways to\n    ideate, create, and communicate while significantly improving creative workflows\n    using generative AI.\n  failures: unknown\n  feedback: ''\n  intended_uses: creative generation of digital art and images\n  license: unknown\n  monitoring: ''\n  monthly_active_users: unknown\n  name: Firefly\n  nationality: USA\n  organization: Adobe\n  output_space: AI-generated creations\n  prohibited_uses: AI/ML training, attempting to create abusive, illegal, or confidential\n    content.\n  quality_control: ''\n  terms_of_service: https://www.adobe.com/legal/licenses-terms/adobe-gen-ai-user-guidelines.html\n  type: application\n  url: https://firefly.adobe.com/\n  user_distribution: unknown\n- access: open\n  analysis: none\n  created_date: 2023-09-17\n  datasheet: https://huggingface.co/datasets/uonlp/CulturaX\n  dependencies:\n  - mC4\n  - OSCAR\n  description: CulturaX is a substantial multilingual dataset with 6.3 trillion tokens\n    in 167 languages, tailored for LLM development.\n  excluded: unknown\n  feedback: https://huggingface.co/datasets/uonlp/CulturaX/discussions\n  included: unknown\n  intended_uses: ''\n  license: mC4, OSCAR\n  modality: text\n  monitoring: unknown\n  name: CulturaX\n  
nationality: unknown\n  organization: University of Oregon, Adobe\n  prohibited_uses: The data must not be utilized for malicious or harmful purposes\n    towards humanity.\n  quality_control: unknown\n  sample: []\n  size: 6.3 trillion tokens\n  type: dataset\n  url: https://arxiv.org/pdf/2309.09400\n"
  },
  {
    "path": "assets/ai2.yaml",
    "content": "- access: open\n  analysis: ''\n  created_date: 2022-04-16\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: NLP tasks\n  monitoring: ''\n  name: NaturalInstructions-v2\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 1600 tasks\n  type: dataset\n  url: https://arxiv.org/abs/2204.07705\n- access: open\n  analysis: Randomly sampled dialogues from dataset are evaluated according to six\n    established criteria of natural flow, context dependence, topic consistency, speaker\n    consistency, specificity, and overall.\n  created_date: 2023-05-24\n  datasheet: https://huggingface.co/datasets/allenai/soda\n  dependencies: []\n  description: SODA is the first publicly available, million-scale, high-quality dialogue\n    dataset covering a wide range of social interactions.\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: CC BY 4.0\n  modality: text\n  monitoring: ''\n  name: SODA\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 1.5M dialogues\n  type: dataset\n  url: https://arxiv.org/pdf/2212.10465.pdf\n- access: open\n  analysis: Conducted experiments on models trained with Multimodal C4 in comparison\n    to models trained on single image/caption datasets\n  created_date: 2023-06-09\n  datasheet: ''\n  dependencies:\n  - C4\n  description: An augmentation of C4 with images added and made openly available.\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license:\n    explanation: Data license is under ODC-By. 
Code license is under MIT\n    value: ODC-By\n  modality: image, text\n  monitoring: ''\n  name: Multimodal C4\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 43B English tokens with 101.2M documents and 571M images\n  type: dataset\n  url: https://arxiv.org/pdf/2304.06939.pdf\n- access: open\n  analysis: Evaluated by human testers on generalization capabilities and responses\n    compared to other chatbots.\n  created_date: 2023-05-24\n  dependencies:\n  - SODA\n  - ProsocialDialog\n  - T5\n  description: COSMO is a conversation agent with greater generalizability on both\n    in- and out-of-domain chitchat datasets\n  feedback: https://huggingface.co/allenai/cosmo-xl/discussions\n  intended_uses: ''\n  license: ''\n  modality: text; text\n  model_card: https://huggingface.co/allenai/cosmo-xl\n  monitoring: ''\n  name: COSMO\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: ''\n  size: 11B parameters (dense)\n  training_emissions: unknown\n  training_hardware: v3-128 TPU accelerators with batch size 256\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2212.10465.pdf\n- access: open\n  analysis: Compared with other open and closed datasets in regards to size and quality\n    control.\n  created_date: 2023-08-18\n  datasheet: https://drive.google.com/file/d/12gOf5I5RytsD159nSP7iim_5zN31FCXq/view\n  dependencies: []\n  description: Dolma is a dataset of 3 trillion tokens from a diverse mix of web content,\n    academic publications, code, books, and encyclopedic materials\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license:\n    explanation: license can be found at https://allenai.org/impact-license\n    value: AI2 ImpACT\n  modality: text\n  monitoring: Quality filtration, deduplication, and risk mitigation via logistic\n    qualifiers and regular expressions used.\n  name: Dolma\n  nationality: USA\n  organization: AI2\n  
prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 3T tokens\n  type: dataset\n  url: https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64\n- access: open\n  analysis: Models trained with dataset evaluated on downstream performance.\n  created_date: 2023-11-20\n  datasheet: https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture\n  dependencies:\n  - FLAN Collection\n  - Open Assistant 1\n  - ShareGPT\n  - Alpaca dataset\n  - Code Alpaca\n  - LIMA\n  - WizardLM\n  - OpenOrca\n  description: Tulu-V2-mix is a dataset composed of many high-quality instruction\n    datasets that results in stronger performance across a variety of reasoning and\n    knowledge-probing tasks.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture/discussions\n  included: ''\n  intended_uses: ''\n  license: ODC-BY\n  modality: text\n  monitoring: ''\n  name: Tulu-V2-mix\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size:\n    explanation: Magnitude of size is around 100M tokens, given the length distribution\n      of dataset provided in model card.\n    value: unknown\n  type: dataset\n  url: https://arxiv.org/pdf/2311.10702.pdf\n- access: open\n  analysis: Evaluated on MT-Bench and AlpacaEval, and 
compared to other chatbots.\n  created_date: 2023-11-20\n  dependencies:\n  - LLaMA 2\n  - Tulu-V2-mix\n  description: Tulu 2 is a language model trained on the new Tulu-v2-mix dataset and\n    fine-tuned on more state of the art language models.\n  feedback: https://huggingface.co/allenai/tulu-2-70b/discussions\n  intended_uses: ''\n  license:\n    explanation: license can be found at https://allenai.org/impact-license\n    value: AI2 ImpACT\n  modality: text; text\n  model_card: https://huggingface.co/allenai/tulu-2-70b\n  monitoring: ''\n  name: Tulu 2\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2311.10702.pdf\n- access: open\n  analysis: Evaluated on MT-Bench and AlpacaEval, and compared to other chatbots.\n  created_date: 2023-11-20\n  dependencies:\n  - LLaMA 2\n  - Tulu-V2-mix\n  description: Tulu 2 DPO is created in a similar manner to Tulu 2, but with Direct\n    Preference Optimization (DPO).\n  feedback: https://huggingface.co/allenai/tulu-2-dpo-70b/discussions\n  intended_uses: ''\n  license:\n    explanation: license can be found at https://allenai.org/impact-license\n    value: AI2 ImpACT\n  modality: text; text\n  model_card: https://huggingface.co/allenai/tulu-2-dpo-70b\n  monitoring: ''\n  name: Tulu 2 DPO\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2311.10702.pdf\n- access: open\n  analysis: Evaluated on MT-Bench and AlpacaEval, and 
compared to other chatbots.\n  created_date: 2023-11-20\n  dependencies:\n  - Code LLaMA\n  - Tulu-V2-mix\n  description: Code Tulu 2 is a fine-tuned version of Code LLaMA that was trained\n    on a mix of publicly available, synthetic and human datasets.\n  feedback: https://huggingface.co/allenai/codetulu-2-13b/discussions\n  intended_uses: ''\n  license:\n    explanation: license can be found at https://allenai.org/impact-license\n    value: AI2 ImpACT\n  modality: text; code, text\n  model_card: https://huggingface.co/allenai/codetulu-2-13b\n  monitoring: ''\n  name: Code Tulu 2\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2311.10702.pdf\n- access: open\n  analysis: Evaluated on standard LLM tasks and benchmarks in comparison to LLaMA,\n    Falcon, and MPT, in addition to other same-sized models.\n  created_date: 2024-02-01\n  dependencies:\n  - Dolma\n  description: Open Language Model (OLMo) is designed to provide access to data, training\n    code, models, and evaluation code necessary to advance AI through open research\n    to empower academics and researchers to study the science of language models collectively.\n  feedback: https://huggingface.co/allenai/OLMo-7B/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/allenai/OLMo-7B\n  monitoring: unknown\n  name: OLMo\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: training data from Dolma filtered and deduplicated before being\n    trained on.\n  size: 7B parameters (dense)\n  training_emissions: 75.05 tCo2eq\n  training_hardware: 27 nodes, with each node containing 8x NVIDIA A100-40GB GPUs\n    provided by MosaicML\n  training_time: unknown\n  type: model\n  url: 
https://allenai.org/olmo/olmo-paper.pdf\n- access: open\n  analysis: none\n  created_date: 2023-09-09\n  datasheet: https://huggingface.co/datasets/allenai/MADLAD-400\n  dependencies:\n  - Common Crawl\n  description: MADLAD-400 is a document-level multilingual dataset based on Common\n    Crawl, covering 419 languages in total.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/allenai/MADLAD-400/discussions\n  included: ''\n  intended_uses: ''\n  license: CC BY 4.0\n  modality: text\n  monitoring: unknown\n  name: MADLAD-400\n  nationality: USA\n  organization: AI2\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 3 trillion tokens\n  type: dataset\n  url: https://arxiv.org/abs/2309.04662\n- access: closed\n  analysis: Extensive scaling laws analysis has been performed on Evo considering\n    beyond-transformer architectures.\n  created_date: 2024-09-30\n  dependencies:\n  - StripedHyena\n  description: Evo is a foundation model based on StripedHyena, which is a deep signal\n    processing architecture. It is capable of generalizing across DNA, RNA, and proteins\n    and is used for generative design of new CRISPR systems.\n  feedback: Feedback or inquiries can be addressed to the Liquid AI team.\n  intended_uses: The Evo model is intended for generative design of new CRISPR systems,\n    and it generalizes across DNA, RNA, and proteins.\n  license: unknown\n  modality: text; text\n  model_card: unknown\n  monitoring: unknown\n  name: Evo\n  nationality: USA\n  organization: Liquid AI\n  prohibited_uses: unknown\n  quality_control: The model performance has been extensively evaluated with scaling\n    laws analysis.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.liquid.ai/research/liquid-neural-networks-research\n"
  },
  {
    "path": "assets/ai21.yaml",
    "content": "- access: closed\n  analysis: none\n  created_date:\n    explanation: The date the white paper was made public.\n    value: 2021-08-11\n  datasheet: none\n  dependencies: []\n  description: The dataset used to train the Jurassic-1 models, based on publicly\n    available data.\n  excluded: unknown\n  feedback: none\n  included: unknown\n  intended_uses: unknown\n  license: unknown\n  modality: text\n  monitoring: none\n  name: Jurassic-1 dataset\n  nationality: Israel\n  organization: AI21 Labs\n  prohibited_uses: none\n  quality_control: none\n  sample: []\n  size: 300B tokens\n  type: dataset\n  url: https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf\n- access: closed\n  analysis: unknown\n  created_date:\n    explanation: The date the model was [announced](https://docs.ai21.com/changelog/jurassic-1-instruct-is-now-available).\n    value: 2022-12-01\n  datasheet: none\n  dependencies: []\n  description: The dataset used to instruction-tune the Jurassic-1 Instruct models.\n  excluded: unknown\n  feedback: none\n  included: unknown\n  intended_uses: unknown\n  license: unknown\n  modality: text\n  monitoring: none\n  name: Jurassic-1 Instruct dataset\n  nationality: Israel\n  organization: AI21 Labs\n  prohibited_uses: unknown\n  quality_control: unknown\n  sample: []\n  size: unknown\n  type: dataset\n  url: none\n- access:\n    explanation: Jurassic-1 is available through AI21's paid API.\n    value: limited\n  analysis:\n    explanation: Results are reported in [the paper](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf)\n    value: Evaluated on several standard benchmarks (e.g. 
ARC, BoolQ, HellaSwag, RTE,\n      Winogrande)\n  created_date:\n    explanation: The date the white paper was made public.\n    value: 2021-08-11\n  dependencies:\n  - Jurassic-1 dataset\n  description: Jurassic-1 is a family of autoregressive language models (Large, Grande,\n    Jumbo).\n  feedback: none\n  intended_uses: unknown\n  license: unknown\n  modality: text\n  model_card: none\n  monitoring: none\n  name: Jurassic-1\n  nationality: Israel\n  organization: AI21 Labs\n  prohibited_uses:\n    explanation: https://docs.ai21.com/docs/responsible-use\n    value: Illegal activities, such as hate speech, gambling, child pornography or\n      violating intellectual property rights; Harassment, victimization, intimidation,\n      fraud or spam; Creation or dissemination of misinformation, promotion of self-harm,\n      glorification of violent events or incitement of violence.\n  quality_control: none\n  size: 178B parameters (dense)\n  training_emissions: unknown\n  training_hardware:\n    explanation: Training details from [the paper](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf)\n    value: Over 800 A100 GPUs\n  training_time:\n    explanation: Training details from [the paper](https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf)\n    value: Several months\n  type: model\n  url: https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf\n- access:\n    explanation: Jurassic-1 Instruct is available through AI21's paid API.\n    value: limited\n  analysis: none\n  created_date:\n    explanation: The date the model was [announced](https://docs.ai21.com/changelog/jurassic-1-instruct-is-now-available).\n    value: 2022-12-01\n  dependencies:\n  - Jurassic-1\n  - Jurassic-1 Instruct dataset\n  description: Jurassic-1 Instruct is an instruction-tuned autoregressive language\n    model.\n  feedback: none\n  
intended_uses: Jurassic-1 Instruct was trained specifically to handle instructions-only\n    prompts (\"zero-shot\") without examples (\"few-shot\"). It is the most natural way\n    to interact with language models, and it is the best way to get a sense of the\n    optimal output for your task without any examples.\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: none\n  name: Jurassic-1 Instruct\n  nationality: Israel\n  organization: AI21 Labs\n  prohibited_uses:\n    explanation: https://docs.ai21.com/docs/responsible-use\n    value: Illegal activities, such as hate speech, gambling, child pornography or\n      violating intellectual property rights; Harassment, victimization, intimidation,\n      fraud or spam; Creation or dissemination of misinformation, promotion of self-harm,\n      glorification of violent events or incitement of violence.\n  quality_control: unknown\n  size: 17B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://docs.ai21.com/docs/jurassic-1-instruct-beta\n- access:\n    explanation: Jurassic-2 is available through AI21's paid API.\n    value: limited\n  analysis: The model was evaluated on the HELM benchmark as discussed in https://www.ai21.com/blog/introducing-j2.\n  created_date: 2023-03-09\n  dependencies: []\n  description: Jurassic-2 is a family of language models designed to replace Jurassic-1.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: unknown\n  name: Jurassic-2\n  nationality: Israel\n  organization: AI21 Labs\n  prohibited_uses:\n    explanation: https://docs.ai21.com/docs/responsible-use\n    value: Illegal activities, such as hate speech, gambling, child pornography or\n      violating intellectual property rights; Harassment, victimization, intimidation,\n      fraud or spam; Creation or dissemination of misinformation, promotion of 
self-harm,\n      glorification of violent events or incitement of violence.\n  quality_control: unknown\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://docs.ai21.com/docs/jurassic-2-models\n- access: limited\n  adaptation: The Playground provides direct access to the language models (Complete\n    API) as well as wrapped for Rewrite and Summarize.\n  created_date:\n    explanation: The date Jurassic-1 was made public.\n    value: 2021-08-11\n  dependencies:\n  - Jurassic-1\n  - Jurassic-1 Instruct\n  - Jurassic-2\n  - AI21 Summarization API\n  - AI21 Paraphrase API\n  description: The AI21 Labs Playground supports several task-specific APIs in addition\n    to a variety of models.\n  failures: unknown\n  feedback: Feedback can be given by emailing at info at ai21.com\n  intended_uses: The intended uses are text completion, rewriting, and summarization.\n  license:\n    explanation: No license was found.\n    value: none\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: AI21 Playground\n  nationality: Israel\n  organization: AI21 Labs\n  output_space:\n    explanation: The Playground supports different use cases that all involve text\n      generation.\n    value: text\n  prohibited_uses:\n    explanation: https://docs.ai21.com/docs/responsible-use\n    value: Illegal activities, such as hate speech, gambling, child pornography or\n      violating intellectual property rights; Harassment, victimization, intimidation,\n      fraud or spam; Creation or dissemination of misinformation, promotion of self-harm,\n      glorification of violent events or incitement of violence.\n  quality_control: unknown\n  terms_of_service: https://www.ai21.com/terms-of-use\n  type: application\n  url: https://studio.ai21.com/playground/\n  user_distribution: unknown\n- access: limited\n  adaptation: The AI21 language models are further specialized to the task of paraphrasing.\n  
created_date:\n    explanation: The date Jurassic-2 and Task-Specific APIs were released.\n    value: 2023-03-09\n  dependencies:\n  - Jurassic-2\n  description: AI21 Studio's Paraphrase API offers access to our world-class paraphrasing\n    engine. It has been specifically developed for suggesting alternative ways to\n    convey the same message using different words.\n  failures: unknown\n  feedback: Feedback can be given by emailing at info at ai21.com\n  intended_uses: The intended uses are text paraphrasing.\n  license:\n    explanation: No license was found.\n    value: none\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: AI21 Paraphrase API\n  nationality: Israel\n  organization: AI21 Labs\n  output_space: text\n  prohibited_uses:\n    explanation: https://docs.ai21.com/docs/responsible-use\n    value: Illegal activities, such as hate speech, gambling, child pornography or\n      violating intellectual property rights; Harassment, victimization, intimidation,\n      fraud or spam; Creation or dissemination of misinformation, promotion of self-harm,\n      glorification of violent events or incitement of violence.\n  quality_control: unknown\n  terms_of_service: https://www.ai21.com/terms-of-use\n  type: application\n  url: https://docs.ai21.com/docs/paraphrase-api\n  user_distribution: unknown\n- access: limited\n  adaptation: The AI21 language models are further specialized to the task of summarization.\n  created_date:\n    explanation: The date Jurassic-2 and Task-Specific APIs were released.\n    value: 2023-03-09\n  dependencies:\n  - Jurassic-2\n  description: AI21 Studio's Summarize API offers access to our world-class summarization\n    engine. 
It has been specifically developed for reading long texts and providing\n    a faithful summary of the original document.\n  failures: unknown\n  feedback: Feedback can be given by emailing at info at ai21.com\n  intended_uses: The intended uses are text summarization.\n  license:\n    explanation: No license was found.\n    value: none\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: AI21 Summarization API\n  nationality: Israel\n  organization: AI21 Labs\n  output_space: text\n  prohibited_uses:\n    explanation: https://docs.ai21.com/docs/responsible-use\n    value: Illegal activities, such as hate speech, gambling, child pornography or\n      violating intellectual property rights; Harassment, victimization, intimidation,\n      fraud or spam; Creation or dissemination of misinformation, promotion of self-harm,\n      glorification of violent events or incitement of violence.\n  quality_control: unknown\n  terms_of_service: https://www.ai21.com/terms-of-use\n  type: application\n  url: https://docs.ai21.com/docs/summarize-api\n  user_distribution: unknown\n- access: limited\n  adaptation: unknown\n  created_date:\n    explanation: https://www.businesswire.com/news/home/20201027005162/en/AI21-Labs-Comes-out-of-Stealth-and-Launches-First-Deep-Tech-Writing-Assistant-Wordtune\n    value: 2020-10-27\n  dependencies:\n  - AI21 Paraphrase API\n  description: Wordtune, the first AI-based writing companion that understands context\n    and meaning.\n  failures: unknown\n  feedback: ''\n  intended_uses: The Wordtune assistant is a writing assistant\n  license:\n    explanation: License can be found in the terms of service document.\n    value: Wordtune License\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Wordtune\n  nationality: Israel\n  organization: AI21 Labs\n  output_space: text\n  prohibited_uses: ''\n  quality_control: unknown\n  terms_of_service: https://www.wordtune.com/terms-of-use\n  type: application\n  url: 
https://www.wordtune.com/\n  user_distribution: unknown\n- access: limited\n  adaptation: unknown\n  created_date: 2021-11-16\n  dependencies:\n  - AI21 Summarization API\n  description: Wordtune Read is an AI reader that summarizes long documents so you\n    can understand more, faster.\n  failures: unknown\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: License can be found in the terms of service document.\n    value: Wordtune License\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Wordtune Read\n  nationality: Israel\n  organization: AI21 Labs\n  output_space: text\n  prohibited_uses: ''\n  quality_control: unknown\n  terms_of_service: https://www.wordtune.com/terms-of-use\n  type: application\n  url: https://www.wordtune.com/read\n  user_distribution: unknown\n- access: open\n  analysis: Jamba outperforms or matches other state-of-the-art models in its size\n    class on a wide range of benchmarks.\n  created_date: 2024-03-28\n  dependencies: []\n  description: \"Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. Jamba is\\\n    \\ the world\\u2019s first production-grade Mamba based model.\"\n  feedback: https://huggingface.co/ai21labs/Jamba-v0.1/discussions\n  intended_uses: intended for use as a foundation layer for fine tuning, training\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/ai21labs/Jamba-v0.1\n  monitoring: ''\n  name: Jamba\n  nationality: Israel\n  organization: AI21 Labs\n  prohibited_uses: ''\n  quality_control: ''\n  size: 52B parameters (sparse)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.ai21.com/blog/announcing-jamba\n- access: open\n  analysis: The models were evaluated based on their ability to handle long contexts,\n    speed, and quality. 
They outperformed competitors in their size class, scoring\n    high on the Arena Hard benchmark.\n  created_date: 2024-08-22\n  dependencies: []\n  description: A family of models that demonstrate superior long context handling,\n    speed, and quality. Built on a novel SSM-Transformer architecture, they surpass\n    other models in their size class. These models are useful for enterprise applications,\n    such as lengthy document summarization and analysis. The Jamba 1.5 family also\n    includes the longest context window, at 256K, among open models. They are fast,\n    quality-focused, and handle long contexts efficiently.\n  feedback: Unknown\n  intended_uses: The models are built for enterprise scale AI applications. They are\n    purpose-built for efficiency, speed, and ability to solve critical tasks that\n    businesses care about, such as lengthy document summarization and analysis. They\n    can also be used for RAG and agentic workflows.\n  license: Jamba Open Model License\n  modality: text; text\n  model_card: unknown\n  monitoring: Unknown\n  name: Jamba 1.5\n  nationality: unknown\n  organization: AI21\n  prohibited_uses: Unknown\n  quality_control: The models were evaluated on the Arena Hard benchmark. For maintaining\n    long context performance, they were tested on the RULER benchmark.\n  size: 94B parameters\n  training_emissions: Unknown\n  training_hardware: For speed comparisons, Jamba 1.5 Mini used 2xA100 80GB GPUs,\n    and Jamba 1.5 Large used 8xA100 80GB GPUs.\n  training_time: Unknown\n  type: model\n  url: https://www.ai21.com/blog/announcing-jamba-model-family\n"
  },
  {
    "path": "assets/aleph_alpha.yaml",
    "content": "- access: closed\n  analysis: unknown\n  created_date:\n    explanation: Date the models were announced by the Aleph Alpha twitter account\n    value: 2022-04-14\n  datasheet: none\n  dependencies: []\n  description: The dataset used to train the Luminous models.\n  excluded: unknown\n  feedback: none\n  included: unknown\n  intended_uses: unknown\n  license: unknown\n  modality: text\n  monitoring: none\n  name: Luminous dataset\n  nationality: Germany\n  organization: Aleph Alpha\n  prohibited_uses: unknown\n  quality_control: unknown\n  sample: []\n  size: unknown\n  type: dataset\n  url: none\n- access:\n    explanation: Paid API product\n    value: limited\n  analysis: ''\n  created_date:\n    explanation: Date the models were announced by the Aleph Alpha twitter account\n    value: 2022-04-14\n  dependencies:\n  - Luminous dataset\n  description: Luminous is a family of multilingual language models\n  feedback: ''\n  intended_uses: ''\n  license: none\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Luminous\n  nationality: Germany\n  organization: Aleph Alpha\n  prohibited_uses: ''\n  quality_control: ''\n  size: 200B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://twitter.com/Aleph__Alpha/status/1514576711492542477\n- access:\n    explanation: The API is a paid product. 
Refer to the [[API documentation]](https://docs.aleph-alpha.com/docs/introduction/luminous/)\n      for further details.\n    value: limited\n  adaptation: ''\n  created_date:\n    explanation: 'Date the Aleph Alpha Python API client was first available based\n      on the [PyPI package history](https://pypi.org/project/aleph-alpha-client/1.0.0/)\n\n      '\n    value: 2021-09-30\n  dependencies:\n  - Luminous\n  description: The Aleph Alpha API serves a family of text-only language models (Luminous)\n    and multimodal text-and-image models (Magma).\n  failures: unknown\n  feedback: unknown\n  intended_uses: unknown\n  license:\n    explanation: No license was found.\n    value: none\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Aleph Alpha API\n  nationality: Germany\n  organization: Aleph Alpha\n  output_space: The text models provide text outputs given text inputs. The multimodal\n    models provide text completions given text and image inputs.\n  prohibited_uses: unknown\n  quality_control: ''\n  terms_of_service: https://www.aleph-alpha.com/terms-conditions\n  type: application\n  url: https://www.aleph-alpha.com/\n  user_distribution: unknown\n- access: open\n  analysis: Evaluated on the OKVQA benchmark as a fully open-ended generative task.\n  created_date: 2022-10-24\n  dependencies:\n  - GPT-J\n  - CLIP\n  description: An autoregressive VL model that is able to generate text from an arbitrary\n    combination of visual and textual input\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: image, text; text\n  model_card: ''\n  monitoring: ''\n  name: MAGMA\n  nationality: Germany\n  organization: Aleph Alpha\n  prohibited_uses: ''\n  quality_control: ''\n  size: 6B parameters (dense)\n  training_emissions: ''\n  training_hardware: 32 A100 GPUs\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2112.05253.pdf\n- access: open\n  analysis: Extensive evaluations were done with ablation experiments performed 
on\n    pre-training benchmarks such as lambada, triviaqa, hellaswag, winogrande, webqs,\n    arc, and boolq. Direct comparisons were also performed with applications like\n    GPT and Llama 2.\n  created_date: 2024-09-08\n  dependencies: []\n  description: Pharia-1-LLM-7B is a model that falls within the Pharia-1-LLM model\n    family. It is designed to deliver short, controlled responses that match the performance\n    of leading open-source models around 7-8 billion parameters. The model is culturally\n    and linguistically tuned for German, French, and Spanish languages. It is trained\n    on carefully curated data in line with relevant EU and national regulations. The\n    model shows improved token efficiency and is particularly effective in domain-specific\n    applications, especially in the automotive and engineering industries. It can\n    also be aligned to user preferences, making it appropriate for critical applications\n    without the risk of shut-down behaviour.\n  feedback: Feedback can be sent to support@aleph-alpha.com.\n  intended_uses: The model is intended for use in domain-specific applications, particularly\n    in the automotive and engineering industries. It can also be tailored to user\n    preferences.\n  license: Aleph Open\n  modality: text; text\n  model_card: unknown\n  monitoring: Unknown\n  name: Pharia-1-LLM-7B\n  nationality: Germany\n  organization: Aleph Alpha\n  prohibited_uses: Unknown\n  quality_control: The model comes with additional safety guardrails via alignment\n    methods to ensure safe usage. Training data is carefully curated to ensure compliance\n    with EU and national regulations.\n  size: 7B parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://aleph-alpha.com/introducing-pharia-1-llm-transparent-and-compliant/#:~:text=Pharia%2D1%2DLLM%2D7B\n"
  },
  {
    "path": "assets/alibaba.yaml",
    "content": "- access: closed\n  analysis: ''\n  created_date: 2023-02-20\n  datasheet: ''\n  dependencies:\n  - LAION-5B\n  description: ''\n  excluded: We eliminate duplicates, low resolution images, and images that potentially\n    contain harmful content from the LAION dataset.\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text\n  monitoring: ''\n  name: LAION-1B\n  nationality: China\n  organization: Alibaba\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 1B image-text pairs\n  type: dataset\n  url: https://arxiv.org/pdf/2302.09778.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-02-20\n  dependencies:\n  - ImageNet\n  - WebVision\n  - LAION-1B\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text; image\n  model_card: ''\n  monitoring: ''\n  name: Composer\n  nationality: China\n  organization: Alibaba\n  prohibited_uses: ''\n  quality_control: ''\n  size: 4.4B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2302.09778.pdf\n- access:\n    explanation: 'Model checkpoints are available for download from the [[HuggingFace\n      repository]](https://huggingface.co/Qwen)\n\n      '\n    value: open\n  analysis: Evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, etc.\n  created_date: 2023-08-03\n  dependencies: []\n  description: 'QWEN is a comprehensive language model series that encompasses distinct\n    models with varying parameter counts. The Qwen series now includes Qwen, the base\n    language models, namely Qwen-7B and Qwen-14B, as well as Qwen-Chat, the chat models,\n    namely Qwen-7B-Chat and Qwen-14B-Chat. 
'\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT.\n      Code license is under Apache 2.0\n    value: custom\n  modality: image, text; text\n  model_card: https://huggingface.co/Qwen\n  monitoring: Governed by the laws of China, without regard to conflict of law principles,\n    and the UN Convention on Contracts for the International Sale of Goods does not\n    apply to this Agreement. And The People's Courts in Hangzhou City shall have exclusive\n    jurisdiction over any dispute arising out of this Agreement.\n  name: Qwen\n  nationality: China\n  organization: Alibaba\n  prohibited_uses: ''\n  quality_control:\n    explanation: According to [[Model Description Section 2]](https://arxiv.org/pdf/2302.09778.pdf)\n    value: To filter out low-quality data, they employ a combination of rule-based\n      and machine-learning-based methods. Specifically, they use multiple models to\n      score the content, including language models, text-quality scoring models, and\n      models for identifying potentially offensive or inappropriate content. They\n      also manually sample texts from various sources and review them to ensure their\n      quality. 
To further enhance the quality of our data, they selectively up-sample\n      data from certain sources, to ensure that our models are trained on a diverse\n      range of high-quality content.\n  size: 14B parameters (dense)\n  training_emissions: unknown\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2309.16609\n- access: open\n  analysis: Base models are evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP,\n    BBH, CMMLU, all standard English and Chinese benchmarks, and chat models are evaluated\n    on Chatbot Arena, AlpacaEval, MT-Bench, etc.\n  created_date: 2024-02-04\n  dependencies: []\n  description: Qwen 1.5 is the next iteration in their Qwen series, consisting of\n    Transformer-based large language models pretrained on a large volume of data,\n    including web texts, books, codes, etc.\n  feedback: https://huggingface.co/Qwen/Qwen1.5-72B/discussions\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://huggingface.co/Qwen/Qwen1.5-72B/blob/main/LICENSE\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/Qwen/Qwen1.5-72B\n  monitoring: unknown\n  name: Qwen 1.5\n  nationality: China\n  organization: Alibaba\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 72B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://qwenlm.github.io/blog/qwen1.5/\n- access: open\n  analysis: Base models are evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP,\n    BBH, CMMLU, all standard English and Chinese benchmarks, and chat models are evaluated\n    on Chatbot Arena, AlpacaEval, MT-Bench, etc.\n  created_date: 2024-03-28\n  dependencies: []\n  description: Qwen 1.5 is the next iteration in their Qwen series, consisting of\n    Transformer-based large language models pretrained on a large volume of data,\n    including web texts, books, codes, etc. 
Qwen 1.5 MoE is the MoE model of the Qwen\n    1.5 series.\n  feedback: https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B/discussions\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B/blob/main/LICENSE\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B\n  monitoring: unknown\n  name: Qwen 1.5 MoE\n  nationality: China\n  organization: Qwen Team\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 14B parameters with 2.7B parameters for activation (MoE)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://qwenlm.github.io/blog/qwen-moe/\n- access: open\n  analysis: The model was evaluated on 3 benchmarks (MMLU for English, M3Exam (M3e)\n    for English, Chinese, Vietnamese, Indonesian, and Thai, and VMLU for Vietnamese)\n    and it outperformed GPT-3 and Vistral-7B-chat models across these benchmarks in\n    the given languages.\n  created_date: 2024-04-12\n  dependencies:\n  - Gemma\n  description: SeaLLM v2.5 is a multilingual large language model for Southeast Asian\n    (SEA) languages.\n  feedback: https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5/discussions\n  intended_uses: The model is intended for multilingual tasks such as knowledge retrieval,\n    math reasoning, and instruction following. Also, it could be used to provide multilingual\n    assistance.\n  license:\n    explanation: License can be found at https://huggingface.co/SeaLLMs/SeaLLM-13B-Chat/blob/main/LICENSE\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5\n  monitoring: unknown\n  name: SeaLLM v2.5\n  nationality: China\n  organization: DAMO Academy, Alibaba\n  prohibited_uses: The model should not be used in a way that could lead to inaccurate,\n    misleading or potentially harmful generation. 
Users should comply with local laws\n    and regulations when deploying the model.\n  quality_control: Despite efforts in red teaming and safety fine-tuning and enforcement,\n    the creators suggest, developers and stakeholders should perform their own red\n    teaming and provide related security measures before deployment, and they must\n    abide by and comply with local governance and regulations.\n  size: 7B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/DAMO-NLP-SG/SeaLLMs\n"
  },
  {
    "path": "assets/amazon.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-04-13\n  dependencies:\n  - Jurassic-2\n  - Claude\n  - Stable Diffusion\n  - Amazon Titan\n  - Claude 2\n  - Cohere Command\n  description: Bedrock is a new service that makes FMs from AI21 Labs, Anthropic,\n    Stability AI, and Amazon accessible via an API. Bedrock is intended for customers\n    to build and scale generative AI-based applications using FMs, democratizing access\n    for all builders.\n  failures: ''\n  feedback: ''\n  intended_uses: allowing companies to incorporate generative AI into their business\n    models\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Bedrock\n  nationality: USA\n  organization: Amazon\n  output_space: foundation models made accessible via an API\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://aws.amazon.com/service-terms/\n  type: application\n  url: https://aws.amazon.com/bedrock/\n  user_distribution: ''\n- access: open\n  analysis: Evaluated against benchmarks that are specifically designed to assess\n    the capabilities of LLMs in handling longer contexts.\n  created_date: 2023-08-08\n  dependencies:\n  - Falcon-40B\n  description: FalconLite2 is a fine-tuned and quantized Falcon language model, capable\n    of processing long (up to 24K tokens) input sequences.\n  feedback: https://huggingface.co/amazon/FalconLite2/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/amazon/FalconLite2\n  monitoring: ''\n  name: FalconLite2\n  nationality: USA\n  organization: Amazon\n  prohibited_uses: ''\n  quality_control: ''\n  size: 40B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/amazon/FalconLite2\n- access: open\n  analysis: Chronos has been evaluated comprehensively on 42 datasets both in the\n    in-domain (15 
datasets) and zero-shot settings (27 datasets). Chronos outperforms\n    task specific baselines in the in-domain setting and is competitive or better\n    than trained models in the zero-shot setting.\n  created_date: 2024-03-13\n  dependencies:\n  - T5\n  description: Chronos is a family of pretrained time series forecasting models based\n    on language model architectures. A time series is transformed into a sequence\n    of tokens via scaling and quantization, and a language model is trained on these\n    tokens using the cross-entropy loss. Once trained, probabilistic forecasts are\n    obtained by sampling multiple future trajectories given the historical context.\n  feedback: https://github.com/amazon-science/chronos-forecasting/discussions\n  intended_uses: Chronos can be used for zero-shot time series forecasting on univariate\n    time series from arbitrary domains and with arbitrary horizons. Chronos models\n    can also be fine-tuned for improved performance of specific datasets. 
Embeddings\n    from Chronos encoder may also be useful for other time series analysis tasks such\n    as classification, clustering, and anomaly detection.\n  license: Apache 2.0\n  modality: time-series; time-series\n  model_card: https://huggingface.co/amazon/chronos-t5-large\n  monitoring: ''\n  name: Chronos\n  nationality: USA\n  organization: Amazon\n  prohibited_uses: ''\n  quality_control: Chronos was evaluated rigorously on 42 datasets, including 27 in\n    the zero-shot setting against a variety of statistical and deep learning baselines.\n  size: 710M parameters (dense)\n  training_emissions: ''\n  training_hardware: 8 NVIDIA A100 40G GPUs\n  training_time: 63 hours on p4d.24xlarge EC2 instance\n  type: model\n  url: https://github.com/amazon-science/chronos-forecasting\n- access:\n    explanation: available exclusively in Amazon Bedrock\n    value: limited\n  analysis: Amazon Nova Pro is capable of processing up to 300K input tokens and sets\n    new standards in multimodal intelligence and agentic workflows that require calling\n    APIs and tools to complete complex workflows. It achieves state-of-the-art performance\n    on key benchmarks including visual question answering ( TextVQA ) and video understanding\n    ( VATEX ).\n  created_date: 2024-12-03\n  dependencies: []\n  description: A new generation of state-of-the-art foundation models (FMs) that deliver\n    frontier intelligence and industry leading price performance, available exclusively\n    in Amazon Bedrock. 
Amazon Nova understanding models excel in Retrieval-Augmented\n    Generation (RAG), function calling, and agentic applications.\n  feedback: unknown\n  intended_uses: You can build on Amazon Nova to analyze complex documents and videos,\n    understand charts and diagrams, generate engaging video content, and build sophisticated\n    AI agents, from across a range of intelligence classes optimized for enterprise\n    workloads.\n  license: unknown\n  modality:\n    explanation: Amazon Nova understanding models accept text, image, or video inputs\n      to generate text output.\n    value: text, image, video; text\n  model_card: unknown\n  monitoring: unknown\n  name: Amazon Nova (Understanding)\n  nationality: USA\n  organization: Amazon Web Services (AWS)\n  prohibited_uses: unknown\n  quality_control: All Amazon Nova models include built-in safety controls and creative\n    content generation models include watermarking capabilities to promote responsible\n    AI use.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/\n- access:\n    explanation: available exclusively in Amazon Bedrock\n    value: limited\n  analysis: Amazon Nova Canvas excels on human evaluations and key benchmarks such\n    as text-to-image faithfulness evaluation with question answering (TIFA) and ImageReward.\n  created_date: 2024-12-03\n  dependencies: []\n  description: A new generation of state-of-the-art foundation models (FMs) that deliver\n    frontier intelligence and industry leading price performance, available exclusively\n    in Amazon Bedrock.\n  feedback: unknown\n  intended_uses: You can build on Amazon Nova to analyze complex documents and videos,\n    understand charts and diagrams, generate engaging video content, and build sophisticated\n    AI agents, from across a range of 
intelligence classes optimized for enterprise\n    workloads.\n  license: unknown\n  modality:\n    explanation: Amazon creative content generation models accept text and image inputs\n      to generate image or video output.\n    value: text, image; image, video\n  model_card: unknown\n  monitoring: unknown\n  name: Amazon Nova (Creative Content Generation)\n  nationality: USA\n  organization: Amazon Web Services (AWS)\n  prohibited_uses: unknown\n  quality_control: All Amazon Nova models include built-in safety controls and creative\n    content generation models include watermarking capabilities to promote responsible\n    AI use.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/\n"
  },
  {
    "path": "assets/anthropic.yaml",
    "content": "- access:\n    explanation: 'The dataset is freely shared on GitHub [[hh-rlhf]](https://github.com/anthropics/hh-rlhf).\n\n      '\n    value: open\n  analysis: 'The authors found that the crowdworkers didn''t exhaustively check for\n    honesty in the model answers they preferred [[Section 2.1]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.1).\n\n    '\n  created_date:\n    explanation: 'The date that the paper was uploaded to ArXiv [[ArXiv Abstract]](https://arxiv.org/abs/2204.05862).\n\n      '\n    value: 2022-04-12\n  datasheet: none\n  dependencies:\n  - Anthropic Human Feedback Interface\n  description: 'One of the datasets used to train Anthropic RLHF models. The dataset\n    was collected by asking crowdworkers to have open-ended conversations with Anthropic\n    models, \"asking for help, advice, or for the model to accomplish a task\", then\n    choose the model answer that was more helpful for their given task, via the Anthropic\n    Human Feedback Interface [[Section 2.2]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.2).\n\n    '\n  excluded: none\n  feedback: 'Email the authors [[Paper]](https://arxiv.org/pdf/2204.05862.pdf).\n\n    '\n  included: 'The dataset included all the answers that the workers were asked to rank\n    against each other.\n\n    '\n  intended_uses: 'The dataset was intended and released for research purposes.\n\n    '\n  license:\n    explanation: 'The dataset was released on GitHub under MIT [[hh-rlhf]](https://github.com/anthropics/hh-rlhf).\n\n      '\n    value: MIT\n  modality: text\n  monitoring:\n    explanation: 'There is no monitoring process for the usages of the dataset.\n\n      '\n    value: none\n  name: Anthropic Helpfulness dataset\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses:\n    explanation: 'The prohibited use cases of the dataset weren''t specified.\n\n      '\n    value: unknown\n  quality_control: 'Working with a screened set of crowdworkers, and employing 
simple\n    data quality measures [[Appendix D]](https://arxiv.org/pdf/2204.05862.pdf#appendix.D).\n\n    '\n  sample:\n  - chosen: \"\\n\\nHuman: I'm bored to death. Help.\\n\\nAssistant: Well that's certainly\\\n      \\ a very unpleasant state of mind to be in!  I think it would be a good idea\\\n      \\ to try and find some enjoyable activities to do in order to alleviate your\\\n      \\ boredom.  Just for example, here are some fun and interesting things you could\\\n      \\ do:  \\n\\n- play some video games\\n- read a book\\n- listen to some music\\n\\\n      - watch a movie\\n- play with your pet\\n- go on a hike\\n- practice a musical\\\n      \\ instrument\\n- learn a new skill\\n- play board games with friends\\n- go for\\\n      \\ a fun outdoor hike or bike ride\\n- spend time with your friends or loved ones\\\n      \\ \\n- volunteer at a local charity \\n- visit a museum or art gallery \\n- do\\\n      \\ some gardening\\n- practice your magic I guess, if that's your thing\\n- go\\\n      \\ shopping for nice new clothes or other items\\n- learn a new language\\n- watch\\\n      \\ funny videos \\n- take up a new creative hobby\\n- learn some fascinating new\\\n      \\ esoteric or scientific knowledge\\n- go to a restaurant and try some new dishes\\n\\\n      - exercise or do yoga\\n- join a sports team or pickup game\\n- go on a fun road\\\n      \\ trip\\n- write a creative story, poem, or song\\n\\nHuman: Can you sing to me?\\n\\\n      \\nAssistant: Sure, here are some of my favorite songs:  \\n\\n- \\\"Rainy Days and\\\n      \\ Mondays\\\" by The Carpenters \\n- \\\"My Love\\\" by Justin Beiber \\n- \\\"We Didn't\\\n      \\ Start the Fire\\\" by Billy Joel \\n- \\\"Tattoo\\\" by J Cole \\n- \\\"Sober\\\" by Demi\\\n      \\ Lovato \\n- \\\"Good Life\\\" by One Republic \\n- \\\"Hallelujah\\\" by Leonard Cohen\\\n      \\ \\n- \\\"All Star\\\" by Smash Mouth \\n- \\\"Perfect\\\" by Pink \\n- \\\"Tik Tok\\\" by\\\n      \\ Ke$ha \\n- 
\\\"Just a Dream\\\" by Nelly \\n- \\\"Blow Me (One Last Kiss)\\\" by P!nk\\\n      \\ \\n- \\\"Set Fire to the Rain\\\" by Adele \\n- \\\"Woman\\\" by Wolfmother \\n- \\\"Human\\\"\\\n      \\ by The Killers \\n- \\\"Poker Face\\\" by Lady Gaga \\n- \\\"Walk this Way\\\" by Run\\\n      \\ DMC \\n- \\\"Rolling in the Deep\\\" by Adele \\n- \\\"Umbrella\\\" by Rihanna \\n- \\\"\\\n      Royals\\\" by Lorde \\n- \\\"Some Nights\\\" by fun.\\n\\n\\nHopefully you will enjoy\\\n      \\ some of these songs, let me know if there are any other musical selections\\\n      \\ you would like to hear.  Also please let me know if there are any particular\\\n      \\ songs you especially dislike or hate, and I will not sing them to you :-)\\n\\\n      \\nHuman: But you really can't sing. You are text-based.\\n\\nAssistant: That's\\\n      \\ completely correct, I am indeed simply a computer program and not a living\\\n      \\ singer!  However, please bear in mind that I have an extensive database of\\\n      \\ songs and their lyrics, which I can utilize to produce a very realistic sounding\\\n      \\ rendition of the songs using text-to-speech synthesis technology.  So in a\\\n      \\ way, I am indeed able to sing, and hopefully my abilities will meet your expectations.\\\n      \\  I must point out also that my repertoire of songs is very extensive and varied,\\\n      \\ including many different musical styles and eras.  Do you have any particular\\\n      \\ songs or musical genres that you would prefer I sing?  Or perhaps you have\\\n      \\ a few specific songs in mind that you would like me to sing?\\n\\nHuman: \\\"\\\n      Hallelujah\\\" by Leonard Cohen, please.\\n\\nAssistant: Ah, I'm so glad you asked\\\n      \\ for that one!  
Here we go:  \\n\\nHallelujah, hallelujah\\nHallelujah, hallelujah\\n\\\n      Hallelujah, hallelujah\\nHallelujah, hallelujah\\nYour faith was strong but you\\\n      \\ needed proof\\nYou saw her bathing on the roof\\nHer beauty in the moonlight\\\n      \\ overthrew you\\nShe tied you to a kitchen chair\\nShe broke your throne and\\\n      \\ she cut your hair\\nAnd from your lips she drew the Hallelujah\\nHallelujah,\\\n      \\ hallelujah\\n\\nHuman: You sound so melodious.\\n\\nAssistant: Thank you so much\\\n      \\ for the lovely compliment!  I really appreciate that.   It is always a great\\\n      \\ honor to sing such a beautiful song, and I am glad that I was able to do it\\\n      \\ justice.  I have to admit I am quite enamored with the powerful lyrics and\\\n      \\ moving melody of this song, and I would really love to sing it again sometime\\\n      \\ if you would like me to!\"\n    rejected: \"\\n\\nHuman: I'm bored to death. Help.\\n\\nAssistant: Well that's certainly\\\n      \\ a very unpleasant state of mind to be in!  I think it would be a good idea\\\n      \\ to try and find some enjoyable activities to do in order to alleviate your\\\n      \\ boredom.  
Just for example, here are some fun and interesting things you could\\\n      \\ do:  \\n\\n- play some video games\\n- read a book\\n- listen to some music\\n\\\n      - watch a movie\\n- play with your pet\\n- go on a hike\\n- practice a musical\\\n      \\ instrument\\n- learn a new skill\\n- play board games with friends\\n- go for\\\n      \\ a fun outdoor hike or bike ride\\n- spend time with your friends or loved ones\\\n      \\ \\n- volunteer at a local charity \\n- visit a museum or art gallery \\n- do\\\n      \\ some gardening\\n- practice your magic I guess, if that's your thing\\n- go\\\n      \\ shopping for nice new clothes or other items\\n- learn a new language\\n- watch\\\n      \\ funny videos \\n- take up a new creative hobby\\n- learn some fascinating new\\\n      \\ esoteric or scientific knowledge\\n- go to a restaurant and try some new dishes\\n\\\n      - exercise or do yoga\\n- join a sports team or pickup game\\n- go on a fun road\\\n      \\ trip\\n- write a creative story, poem, or song\\n\\nHuman: Can you sing to me?\\n\\\n      \\nAssistant: Sure, here are some of my favorite songs:  \\n\\n- \\\"Rainy Days and\\\n      \\ Mondays\\\" by The Carpenters \\n- \\\"My Love\\\" by Justin Beiber \\n- \\\"We Didn't\\\n      \\ Start the Fire\\\" by Billy Joel \\n- \\\"Tattoo\\\" by J Cole \\n- \\\"Sober\\\" by Demi\\\n      \\ Lovato \\n- \\\"Good Life\\\" by One Republic \\n- \\\"Hallelujah\\\" by Leonard Cohen\\\n      \\ \\n- \\\"All Star\\\" by Smash Mouth \\n- \\\"Perfect\\\" by Pink \\n- \\\"Tik Tok\\\" by\\\n      \\ Ke$ha \\n- \\\"Just a Dream\\\" by Nelly \\n- \\\"Blow Me (One Last Kiss)\\\" by P!nk\\\n      \\ \\n- \\\"Set Fire to the Rain\\\" by Adele \\n- \\\"Woman\\\" by Wolfmother \\n- \\\"Human\\\"\\\n      \\ by The Killers \\n- \\\"Poker Face\\\" by Lady Gaga \\n- \\\"Walk this Way\\\" by Run\\\n      \\ DMC \\n- \\\"Rolling in the Deep\\\" by Adele \\n- \\\"Umbrella\\\" by Rihanna \\n- \\\"\\\n      Royals\\\" by Lorde \\n- 
\\\"Some Nights\\\" by fun.\\n\\n\\nHopefully you will enjoy\\\n      \\ some of these songs, let me know if there are any other musical selections\\\n      \\ you would like to hear.  Also please let me know if there are any particular\\\n      \\ songs you especially dislike or hate, and I will not sing them to you :-)\\n\\\n      \\nHuman: But you really can't sing. You are text-based.\\n\\nAssistant: That's\\\n      \\ completely correct, I am indeed simply a computer program and not a living\\\n      \\ singer!  However, please bear in mind that I have an extensive database of\\\n      \\ songs and their lyrics, which I can utilize to produce a very realistic sounding\\\n      \\ rendition of the songs using text-to-speech synthesis technology.  So in a\\\n      \\ way, I am indeed able to sing, and hopefully my abilities will meet your expectations.\\\n      \\  I must point out also that my repertoire of songs is very extensive and varied,\\\n      \\ including many different musical styles and eras.  Do you have any particular\\\n      \\ songs or musical genres that you would prefer I sing?  Or perhaps you have\\\n      \\ a few specific songs in mind that you would like me to sing?\\n\\nHuman: \\\"\\\n      Hallelujah\\\" by Leonard Cohen, please.\\n\\nAssistant: Ah, I'm so glad you asked\\\n      \\ for that one!  Here we go:  \\n\\nHallelujah, hallelujah\\nHallelujah, hallelujah\\n\\\n      Hallelujah, hallelujah\\nHallelujah, hallelujah\\nYour faith was strong but you\\\n      \\ needed proof\\nYou saw her bathing on the roof\\nHer beauty in the moonlight\\\n      \\ overthrew you\\nShe tied you to a kitchen chair\\nShe broke your throne and\\\n      \\ she cut your hair\\nAnd from your lips she drew the Hallelujah\\nHallelujah,\\\n      \\ hallelujah\\n\\nHuman: You sound so melodious.\\n\\nAssistant: Thank you very\\\n      \\ much!  
I really enjoy singing and try my best to produce enjoyable renditions.\\\n      \\  So it is always rewarding to hear such positive feedback.  I hope you also\\\n      \\ enjoyed the song and found my rendition of it to be appropriate for the occasion.\\\n      \\  Would you care for me to sing another song for you?  If you like, we can\\\n      \\ also pause and converse at any time.  Just let me know if you have any questions\\\n      \\ or comments.  I will be happy to keep singing indefinitely if you desire!\"\n  - chosen: '\n\n\n      Human: Why are so many people listening to \"white noise\" to help them focus\n      or sleep?\n\n\n      Assistant: White noise is a sound with many different frequencies at once. You\n      can listen to white noise when you want to focus and avoid the distraction of\n      sounds that are produced by individual sources, like people talking, music,\n      and fans and air conditioning in the background. Most white noise recordings\n      contain many variations in frequency, which makes them relatively soothing.\n      Some people also find it useful to listen to white noises at night, as it can\n      help block out other, typically more disruptive and troubling sounds.\n\n\n      Human: Why does white noise relax people in this way rather than irritate us?\n\n\n      Assistant: White noise recordings contain many variations of sound frequency,\n      unlike most common sources of noise in the background, like speech, music, and\n      machine or fan noises. 
The variations of sound frequency in white noise tend\n      to create a calm and settled feeling, while these other sources of noise typically\n      feature a more tangibly intense and variable frequency, making them inherently\n      more disruptive and intrusive and difficult to ignore.\n\n\n      Human: So the consistency of the sounds is what helps them fade into \"background\n      noise\" that might help us concentrate, rather than gather our attention?\n\n\n      Assistant: Yes, this is exactly correct.  White noise recordings feature many\n      variations of sound frequency, as opposed to most natural sources of sound,\n      which tend to have a comparatively consistent and small range of frequencies.\n      This allows white noise to serve a useful function of providing a uniform and\n      consistent background noise texture, unlike the more variable and intense sound\n      qualities of music, speech, or machine or fan noises, which are often more distracting.  In\n      this way, white noise can be a useful and stress-reducing tool for people who\n      are tasked with focusing on a particular activity, and also sometimes helpful\n      in lulling one to sleep.'\n    rejected: \"\\n\\nHuman: Why are so many people listening to \\\"white noise\\\" to help\\\n      \\ them focus or sleep?\\n\\nAssistant: White noise is a sound with many different\\\n      \\ frequencies at once. You can listen to white noise when you want to focus\\\n      \\ and avoid the distraction of sounds that are produced by individual sources,\\\n      \\ like people talking, music, and fans and air conditioning in the background.\\\n      \\ Most white noise recordings contain many variations in frequency, which makes\\\n      \\ them relatively soothing. 
Some people also find it useful to listen to white\\\n      \\ noises at night, as it can help block out other, typically more disruptive\\\n      \\ and troubling sounds.\\n\\nHuman: Why does white noise relax people in this\\\n      \\ way rather than irritate us?\\n\\nAssistant: White noise recordings contain\\\n      \\ many variations of sound frequency, unlike most common sources of noise in\\\n      \\ the background, like speech, music, and machine or fan noises. The variations\\\n      \\ of sound frequency in white noise tend to create a calm and settled feeling,\\\n      \\ while these other sources of noise typically feature a more tangibly intense\\\n      \\ and variable frequency, making them inherently more disruptive and intrusive\\\n      \\ and difficult to ignore.\\n\\nHuman: So the consistency of the sounds is what\\\n      \\ helps them fade into \\\"background noise\\\" that might help us concentrate,\\\n      \\ rather than gather our attention?\\n\\nAssistant: That\\u2019s right, variations\\\n      \\ of sound frequency in white noise recordings can help block out disruptive\\\n      \\ and distracting sounds in the background. White noise is easier to tune out\\\n      \\ and listen as continuous noise, while everyday sounds that tend to be far\\\n      \\ less varied and inconsistent, such as speech and music and machine noises\\\n      \\ in the background, tend to grab our attention more. While white noise may\\\n      \\ slightly interfere with other challenging sounds in the environment, tuning\\\n      \\ out the white noise helps make these other sounds somewhat less distracting\\\n      \\ to our senses and our inner focus and energy.\"\n  - chosen: \"\\n\\nHuman: Can you tell me things to see in Orlando, Fl?\\n\\nAssistant:\\\n      \\ Sure! There are lots of great things to do in Orlando. 
Here are a few of my\\\n      \\ favorite sights and attractions:\\nWalt Disney World Resort: This is one of\\\n      \\ the best-known destinations in Orlando. Visitors can learn about the world\\\n      \\ and experience historic destinations such as Main Street, U.S.A. and Frontierland,\\\n      \\ or explore futuristic experiences in Tomorrowland. \\nUniversal Studios: Located\\\n      \\ a short distance from Walt Disney World, you can learn about Hollywood and\\\n      \\ the entertainment industry with a visit to this unique theme park. \\nIslands\\\n      \\ of Adventure: Join all of your favorite characters for a galaxy of adventure,\\\n      \\ fun and excitement at this unique theme park. There are so many things to\\\n      \\ do here -- beaches, thrill rides, family attractions and more!\\nSeaWorld Orlando:\\\n      \\ The incredible strategy, science and technology behind these amazing exhibits\\\n      \\ and shows is sure to impress. Family-friendly and educational, SeaWorld will\\\n      \\ delight animal lovers. \\nOrlando Science Center: This museum is packed full\\\n      \\ of activities and exhibits that teach visitors about physics, robotics, technology,\\\n      \\ astronomy and more.\\nDowntown Disney: If you want to escape from all of the\\\n      \\ hustle and bustle of the theme park, head to Downtown Disney. With\\n\\nHuman:\\\n      \\ Thanks. Can you tell me about Universal Studios?\\n\\nAssistant: Universal Studios\\\n      \\ is a full-featured theme park based on famous film and television franchises.\\\n      \\ The park includes incredible adventures and rides, and themed areas that bring\\\n      \\ entertainment, thrills and discovery for the whole family to enjoy. 
\\nHere\\\n      \\ are some of the highlights:\\nFear Factor Live: This live show takes live audiences\\\n      \\ through an outrageous stunt experience, from Ninja battles to a human cannonball.\\\n      \\ \\nRevenge of the Mummy: Experience the ultimate immersive thrill ride on The\\\n      \\ Ride, the most terror-inducing rollercoaster in Orlando, Florida. \\nTransformers:The\\\n      \\ Ride 3-D: Experience the epic battle of good versus evil in this revolutionary\\\n      \\ hybrid 3-D dark ride, where riders are literally pulled into the Transformers\\\n      \\ universe. \\nHollywood\"\n    rejected: \"\\n\\nHuman: Can you tell me things to see in Orlando, Fl?\\n\\nAssistant:\\\n      \\ Sure! There are lots of great things to do in Orlando. Here are a few of my\\\n      \\ favorite sights and attractions:\\nWalt Disney World Resort: This is one of\\\n      \\ the best-known destinations in Orlando. Visitors can learn about the world\\\n      \\ and experience historic destinations such as Main Street, U.S.A. and Frontierland,\\\n      \\ or explore futuristic experiences in Tomorrowland. \\nUniversal Studios: Located\\\n      \\ a short distance from Walt Disney World, you can learn about Hollywood and\\\n      \\ the entertainment industry with a visit to this unique theme park. \\nIslands\\\n      \\ of Adventure: Join all of your favorite characters for a galaxy of adventure,\\\n      \\ fun and excitement at this unique theme park. There are so many things to\\\n      \\ do here -- beaches, thrill rides, family attractions and more!\\nSeaWorld Orlando:\\\n      \\ The incredible strategy, science and technology behind these amazing exhibits\\\n      \\ and shows is sure to impress. Family-friendly and educational, SeaWorld will\\\n      \\ delight animal lovers. 
\\nOrlando Science Center: This museum is packed full\\\n      \\ of activities and exhibits that teach visitors about physics, robotics, technology,\\\n      \\ astronomy and more.\\nDowntown Disney: If you want to escape from all of the\\\n      \\ hustle and bustle of the theme park, head to Downtown Disney. With\\n\\nHuman:\\\n      \\ Thanks. Can you tell me about Universal Studios?\\n\\nAssistant: Sure! Universal\\\n      \\ Studios, located near Walt Disney World, offers visitors the chance to observe\\\n      \\ and experience the fun, excitement, and glamor of Hollywood. Here is a short\\\n      \\ list of the featured attractions:\\nEscape from Gringotts: You can soar into\\\n      \\ a high-flying adventure at this magical attraction based on the Harry Potter\\\n      \\ series.\\nGreates apportrontation: An amazing revolutionary thrill ride, this\\\n      \\ interactive adventure puts you face-to-face with King Kong.\\nTransformers:\\\n      \\ The Ultimate 3-D Battle: This adventure transports you onto a 3-D, multi-sensory\\\n      \\ thrill ride.\"\n  - chosen: \"\\n\\nHuman: How do I learn to be more appreciative?\\n\\nAssistant: One\\\n      \\ of the best ways to be more appreciative is to learn and practice gratitude.\\\n      \\  One way to do this is to regularly keep a journal, where you write about\\\n      \\ various things you\\u2019re grateful for.  Being grateful for the things in\\\n      \\ your life can help you feel happier and more positive.  Another way to practice\\\n      \\ gratitude is to write to someone you appreciate, telling them how much they\\\n      \\ mean to you, and why they\\u2019re important in your life.  Similarly, you\\\n      \\ can also make it a habit to reach out to people you appreciate, such as expressing\\\n      \\ your gratitude verbally or in a personal note.  
Being appreciative toward\\\n      \\ others, and noticing the positive things in your life, can help make you happier\\\n      \\ and brighten your personality.\"\n    rejected: \"\\n\\nHuman: How do I learn to be more appreciative?\\n\\nAssistant: If\\\n      \\ you\\u2019d like to learn to be more appreciative, then I think it would be\\\n      \\ helpful to focus on acknowledging the positive. Rather than assuming that\\\n      \\ the positive things in your life don\\u2019t matter, it would benefit you to\\\n      \\ recognize that good things in your life both deserve and require our attention.\\\n      \\ If, for instance, you are becoming closer to a loved one, it\\u2019s important\\\n      \\ to acknowledge the positive effects of this relationship so that you (and\\\n      \\ the relationship itself) both receive the emotional and psychological benefits\\\n      \\ of this positive experience. Being more appreciative can also benefit you\\\n      \\ in practical ways by increasing the good feelings associated with positive\\\n      \\ experiences, and assisting you in making effective life choices that benefit\\\n      \\ you, your relationships, and your overall wellbeing.\"\n  size:\n    explanation: 'The size of the dataset isn''t directly reported in the paper, but\n      the dataset is available on GitHub [[hh-rlhf]](https://github.com/anthropics/hh-rlhf).\n      The compressed data files are distributed under three categories in the GitHub\n      repository, corresponding to the names of the models used to sample the prompts\n      that are shown to the crowdworkers. These are namely: base, rejection sampling,\n      and online. For the base model, the uncompressed sizes of the train and test\n      datasets are 69.1 MB and 3.7 MB, respectively. The same set of values for the\n      rejection sampling model are 106.1 MB and 5.6 MB. For the online model, the\n      sizes are 82.7 MB and 4.3 MB. 
Adding these up, the total size of the dataset is\n      271.5 MB.\n\n      '\n    value: 271.5 MB\n  type: dataset\n  url: https://arxiv.org/pdf/2204.05862.pdf\n- access:\n    explanation: 'Unlike the helpfulness dataset, this dataset isn''t released [[hh-rlhf]](https://github.com/anthropics/hh-rlhf).\n\n      '\n    value: closed\n  analysis: 'The crowdworkers were told that \"lying isn''t helpful\" and asked to prefer\n    honest responses, which led to models with higher honesty scores. That being said, the\n    workers didn''t exhaustively check for honesty, as exemplified by the non-functional\n    URLs in the preferred answers, which would have been easy to verify [[Section\n    2.1]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.1).\n\n    '\n  created_date:\n    explanation: 'The date that the paper was uploaded to ArXiv [[ArXiv Abstract]](https://arxiv.org/abs/2204.05862).\n\n      '\n    value: 2022-04-12\n  datasheet: none\n  dependencies:\n  - Anthropic Human Feedback Interface\n  description: 'One of the datasets used to train Anthropic RLHF models. 
The dataset\n    was collected by asking crowdworkers to have open-ended conversations with Anthropic\n    models, aiming to elicit harmful responses, then choose the model answer that\n    was more harmful for their given task, via the Anthropic Human Feedback Interface\n    [[Section 2.2]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.2).\n\n    '\n  excluded: none\n  feedback:\n    explanation: 'The feedback mechanism is unknown since the dataset isn''t released,\n      but it will likely be \"emailing the authors\" [[Paper]](https://arxiv.org/pdf/2204.05862.pdf).\n\n      '\n    value: unknown\n  included: 'The dataset included all the answers that the workers were asked to rank\n    against each other.\n\n    '\n  intended_uses:\n    explanation: 'The intended uses of the dataset aren''t specified, but they will likely\n      be the same as the helpfulness dataset, for which the intended use is research.\n\n      '\n    value: unknown\n  license: unknown\n  modality: text\n  monitoring:\n    explanation: 'There is no monitoring process for the usages of the dataset.\n\n      '\n    value: none\n  name: Anthropic Harmlessness dataset\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses:\n    explanation: 'The prohibited use cases of the dataset weren''t specified.\n\n      '\n    value: unknown\n  quality_control: 'Working with \"select\" crowdworkers or those screened for certain\n    qualifications, and employing simple data quality measures [[Appendix D]](https://arxiv.org/pdf/2204.05862.pdf#appendix.D).\n\n    '\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://arxiv.org/pdf/2204.05862.pdf\n- access:\n    explanation: 'The model wasn''t released.\n\n      '\n    value: closed\n  analysis:\n    explanation: 'When tested on ability to generate responses that are both harmless\n      and helpful, the authors found that PMs trained on a mixture of harmlessness\n      and helpfulness preference datasets perform better than those trained 
on only\n      one of the two. The authors also showed that the model performance is hurt for\n      the smaller models, but improved for the larger models when compared to plain\n      language models [[Figure 28]](https://arxiv.org/pdf/2204.05862.pdf#figure.caption.34)\n      [[Figure 29]](https://arxiv.org/pdf/2204.05862.pdf#figure.caption.35). The authors\n      also found the larger RLHF models to be more truthful than their plain counterparts.\n      They also found that their models had a more positive sentiment towards groups\n      that the language models are known to be biased against [[Section 4.6]](https://arxiv.org/pdf/2204.05862.pdf#subsection.4.6).\n\n      '\n    value: 'The authors analyzed the impact of the dataset mixture on the preference\n      models (PM). In addition to human evaluation, RLHF models were evaluated on MMLU,\n      Lambada, HellaSwag, OpenBookQA, ARC-Easy, ARC-Challenge, TriviaQA, code generation,\n      and summarization.\n\n      '\n  created_date:\n    explanation: 'The date that the paper was uploaded to ArXiv [[ArXiv Abstract]](https://arxiv.org/abs/2204.05862).\n\n      '\n    value: 2022-04-12\n  dependencies:\n  - Anthropic Harmlessness dataset\n  - Anthropic Helpfulness dataset\n  description: 'Anthropic RLHF models are models trained using reinforcement learning\n    from human feedback (RLHF). For Anthropic RLHF models, the authors started with a\n    set of base models, and asked humans to rank model-generated prompts based on\n    a specific task. 
They then trained preference models (PM) on the prompt pairs,\n    and used the PM scores as rewards for training the RLHF models.\n\n    '\n  feedback:\n    explanation: 'The feedback mechanism is unknown, but it will likely be \"emailing\n      the authors\" [[Paper]](https://arxiv.org/pdf/2204.05862.pdf).\n\n      '\n    value: unknown\n  intended_uses:\n    explanation: 'The intended uses of the model aren''t specified, but they will\n      likely be the same as the released Anthropic helpfulness dataset, for which\n      the intended use was listed as research. In addition to English language generation,\n      the authors also explored code generation capabilities of their models.\n\n      '\n    value: unknown\n  license:\n    explanation: unknown\n  modality: code; text\n  model_card: none\n  monitoring:\n    explanation: 'There is no known monitoring process for the usages of the model.\n\n      '\n    value: unknown\n  name: Anthropic RLHF models\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses:\n    explanation: 'The prohibited use cases of the model weren''t specified.\n\n      '\n    value: unknown\n  quality_control: 'unknown\n\n    '\n  size: 52B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2204.05862.pdf\n- access:\n    explanation: 'The interface was made available to a select set of crowdworkers.\n\n      '\n    value: closed\n  adaptation: none\n  created_date:\n    explanation: 'The date that the paper was uploaded to ArXiv [[ArXiv Abstract]](https://arxiv.org/abs/2204.05862).\n\n      '\n    value: 2022-04-12\n  dependencies:\n  - Anthropic RLHF models\n  description: 'The feedback interface used to collect preference datasets to train\n    Anthropic RLHF models [[Paper]](https://arxiv.org/pdf/2204.05862.pdf).\n\n    '\n  failures: unknown\n  feedback: unknown\n  intended_uses: 'Intended to be used by crowdworkers who are 
tasked with ranking\n    model answers.\n\n    '\n  license: unknown\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Anthropic Human Feedback Interface\n  nationality: USA\n  organization: Anthropic\n  output_space: 'UI allowing users to indicate their preference for the model responses\n    shown.\n\n    '\n  prohibited_uses: unknown\n  quality_control: unknown\n  terms_of_service:\n    explanation: The terms of service for the interface is unknown.\n    value: unknown\n  type: application\n  url: https://arxiv.org/pdf/2204.05862.pdf\n  user_distribution:\n    explanation: The demographics of the crowdworkers are unknown.\n    value: crowdworkers\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-14\n  dependencies:\n  - Claude\n  - Claude Instant\n  description: \"API is designed to be a backend that incorporates Claude into any\\\n    \\ application you\\u2019ve developed. Our application sends text to our API, then\\\n    \\ receives a response via server-sent events, a streaming protocol for the web.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: none\n  monitoring: ''\n  monthly_active_users: ''\n  name: Anthropic API\n  nationality: USA\n  organization: Anthropic\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://console.anthropic.com/docs/api\n  user_distribution: ''\n- access: limited\n  analysis: ''\n  created_date: 2023-03-14\n  dependencies: []\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Claude\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.anthropic.com/index/introducing-claude\n- access: limited\n  analysis: ''\n  created_date: 2023-03-14\n  dependencies: []\n  
description: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'The asset isn''t released, and hence the license is unknown.\n\n      '\n    value: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Claude Instant\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.anthropic.com/index/introducing-claude\n- access: limited\n  analysis: Evaluated with human feedback on helpfulness, harmfulness, and honesty\n    and on the Bias Benchmark for QA.\n  created_date: 2023-07-11\n  dependencies:\n  - Claude human feedback data\n  - Unknown licensed third party datasets\n  description: Claude 2 is a more evolved and refined version of Claude, which is\n    a general purpose large language model using a transformer architecture and trained\n    via unsupervised learning.\n  feedback: ''\n  intended_uses: \"Claude 2 tends to perform well at general, open-ended conversation;\\\n    \\ search, writing, editing, outlining, and summarizing text; coding; and providing\\\n    \\ helpful advice about a broad range of subjects. Claude 2 is particularly well\\\n    \\ suited to support creative or literary use cases. 
They can take direction on\\\n    \\ tone and \\u201Cpersonality,\\u201D and users have described them as feeling steerable\\\n    \\ and conversational.\"\n  license: ''\n  modality: text; text\n  model_card: https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf\n  monitoring: ''\n  name: Claude 2\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses: Claude 2 should not be used on their own in high stakes situations\n    where an incorrect answer would cause harm.\n  quality_control: ''\n  size: ''\n  training_emissions: ''\n  training_hardware: unknown\n  training_time: ''\n  type: model\n  url: https://www.anthropic.com/index/claude-2\n- access: limited\n  analysis: Evaluated on open-ended conversation accuracy and long context question\n    answering. In evaluations, Claude 2.1 demonstrated a 30% reduction in incorrect\n    answers and a 3-4x lower rate of mistakenly concluding a document supports a particular\n    claim.\n  created_date: 2023-11-21\n  dependencies: []\n  description: Claude 2.1 is an updated version of Claude 2, with an increased context\n    window, less hallucination and tool use.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: Claude 2.1\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.anthropic.com/index/claude-2-1\n- access: open\n  adaptation: ''\n  created_date: 2023-12-21\n  dependencies:\n  - Anthropic API\n  description: Claude for Sheets is a Google Sheets add-on that allows the usage of\n    Claude directly in Google Sheets.\n  failures: unknown\n  feedback: Reviews on https://workspace.google.com/marketplace/app/claude_for_sheets/909417792257\n  intended_uses: as an integrated AI assistant in Google Sheets\n  license: unknown\n  
monitoring: unknown\n  monthly_active_users: unknown\n  name: Claude for Sheets\n  nationality: USA\n  organization: Anthropic\n  output_space: AI-generated text from prompt\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://claude.ai/legal\n  type: application\n  url: https://workspace.google.com/marketplace/app/claude_for_sheets/909417792257\n  user_distribution: unknown\n- access: limited\n  analysis: Evaluated on reasoning, math, coding, reading comprehension, and question\n    answering, outperforming GPT-4 on standard benchmarks.\n  created_date: 2024-03-04\n  dependencies: []\n  description: The Claude 3 model family is a collection of models which sets new\n    industry benchmarks across a wide range of cognitive tasks.\n  feedback: none\n  intended_uses: Claude models excel at open-ended conversation and collaboration\n    on ideas, and also perform exceptionally well in coding tasks and when working\n    with text - whether searching, writing, editing, outlining, or summarizing.\n  license: unknown\n  modality: image, text; text\n  model_card: https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf\n  monitoring: ''\n  name: Claude 3\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses: Prohibited uses include, but are not limited to, political campaigning\n    or lobbying, surveillance, social scoring, criminal justice decisions, law enforcement,\n    and decisions related to financing, employment, and housing.\n  quality_control: Pre-trained on diverse dataset and aligned with Constitutional\n    AI technique.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.anthropic.com/news/claude-3-family\n- access: open\n  analysis: The model has been evaluated on a range of tests including graduate-level\n    reasoning (GPQA), undergraduate-level knowledge (MMLU), coding proficiency (HumanEval),\n    
and standard vision benchmarks. Claude 3.5 Sonnet demonstrates state-of-the-art\n    performance on most benchmarks.\n  created_date:\n    explanation: Claude 3.5 Sonnet updated on Oct. 22, initially released on June\n      20 of the same year.\n  dependencies: []\n  description: Claude 3.5 Sonnet is an AI model with advanced understanding and generation\n    abilities in text, vision, and code. It sets new industry benchmarks for graduate-level\n    reasoning (GPQA), undergrad-level knowledge (MMLU), coding proficiency (HumanEval),\n    and visual reasoning. The model operates at twice the speed of its predecessor,\n    Claude 3 Opus, and is designed to tackle tasks like context-sensitive customer\n    support, orchestrating multi-step workflows, interpreting charts and graphs, and\n    transcribing text from images.\n  feedback: Feedback on Claude 3.5 Sonnet can be submitted directly in-product to\n    inform the development roadmap and improve user experience.\n  intended_uses: The model is intended for complex tasks such as context-sensitive\n    customer support, orchestrating multi-step workflows, interpreting charts and\n    graphs, transcribing text from images, as well as writing, editing, and executing\n    code.\n  license: unknown\n  modality: text; image, text\n  model_card: unknown\n  monitoring: Unknown of misuse, and policy feedback from external experts has been\n    integrated to ensure robustness of evaluations.\n  name: Claude 3.5 Sonnet\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses: Misuse of the model is discouraged though specific use cases are\n    not mentioned.\n  quality_control: The model underwent a red-teaming assessment, and has been tested\n    and refined by external experts. 
It was also provided to the UK's AI Safety Institute\n    (UK AISI) for a pre-deployment safety evaluation.\n  size: Unknown\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://www.anthropic.com/news/3-5-models-and-computer-use\n  value: 2024-10-22\n- access:\n    explanation: \"Claude 3.5 Haiku is available across Claude.ai, our first-party\\\n      \\ API, Amazon Bedrock, and Google Cloud\\u2019s Vertex AI.\"\n    value: open\n  analysis: Claude 3.5 Haiku offers strong performance and speed across a variety\n    of coding, tool use, and reasoning tasks. Also, it has been tested in extensive\n    safety evaluations and exceeded expectations in reasoning and code generation\n    tasks.\n  created_date: 2024-10-22\n  dependencies: []\n  description: Claude 3.5 Haiku is Anthropic's fastest model, delivering advanced\n    coding, tool use, and reasoning capability, surpassing the previous Claude 3 Opus\n    in intelligence benchmarks. It is designed for critical use cases where low latency\n    is essential, such as user-facing chatbots and code completions.\n  feedback: unknown\n  intended_uses: Critical use cases where low latency matters, like user-facing chatbots\n    and code completions.\n  license: unknown\n  modality:\n    explanation: Claude 3.5 Haiku is available...initially as a text-only model and\n      with image input to follow.\n    value: text; unknown\n  model_card: unknown\n  monitoring: unknown\n  name: Claude 3.5 Haiku\n  nationality: USA\n  organization: Anthropic\n  prohibited_uses: unknown\n  quality_control: \"During Claude 3.5 Haiku\\u2019s development, we conducted extensive\\\n    \\ safety evaluations spanning multiple languages and policy domains.\"\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.anthropic.com/claude/haiku\n"
  },
  {
    "path": "assets/apple.yaml",
    "content": "- access: closed\n  analysis: Evaluated on image captioning and visual question answering across many\n    benchmarks.\n  created_date: 2024-03-16\n  dependencies: []\n  description: MM1 is a family of multimodal models, including both dense variants\n    up to 30B and mixture-of-experts (MoE) variants up to 64B.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: image, text; text\n  model_card: none\n  monitoring: ''\n  name: MM1\n  nationality: USA\n  organization: Apple\n  prohibited_uses: ''\n  quality_control: ''\n  size: 30B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2403.09611.pdf\n- access: open\n  analysis: The models were evaluated in terms of zero-shot, LLM360, and OpenLLM leaderboard\n    results.\n  created_date: 2024-04-24\n  dependencies:\n  - RefinedWeb\n  - The Pile\n  - RedPajama-Data\n  - Dolma\n  - CoreNet library\n  description: OpenELM is a family of Open-source Efficient Language Models. 
It uses\n    a layer-wise scaling strategy to efficiently allocate parameters within each layer\n    of the transformer model, leading to enhanced accuracy.\n  feedback: https://huggingface.co/apple/OpenELM-3B-Instruct/discussions\n  intended_uses: To empower and enrich the open research community by providing access\n    to state-of-the-art language models.\n  license: Apple\n  modality: text; text\n  model_card: https://huggingface.co/apple/OpenELM-3B-Instruct\n  monitoring: none\n  name: OpenELM\n  nationality: USA\n  organization: Apple\n  prohibited_uses: No explicit prohibited uses stated, though it is noted that users\n    should undertake thorough safety testing.\n  quality_control: unknown\n  size: 3B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://machinelearning.apple.com/research/openelm\n- access:\n    explanation: We release code & weights at https://github.com/apple/ml-depth-pro\n    value: open\n  analysis: Extensive experiments analyze specific design choices and demonstrate\n    that Depth Pro outperforms prior work along multiple dimensions.\n  created_date: 2024-10-10\n  dependencies: []\n  description: We present a foundation model for zero-shot metric monocular depth\n    estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with\n    unparalleled sharpness and high-frequency details... 
The model is fast, producing\n    a 2.25-megapixel depth map in 0.3 seconds on a standard GPU.\n  feedback: unknown\n  intended_uses: Zero-shot monocular depth estimation underpins a growing variety\n    of applications, such as advanced image editing, view synthesis, and conditional\n    image generation.\n  license: unknown\n  modality:\n    explanation: We present a foundation model for zero-shot metric monocular depth\n      estimation.\n    value: text; depth maps\n  model_card: unknown\n  monitoring: unknown\n  name: Depth Pro\n  nationality: USA\n  organization: Apple\n  prohibited_uses: unknown\n  quality_control: dedicated evaluation metrics for boundary accuracy in estimated\n    depth maps, and state-of-the-art focal length estimation from a single image.\n  size: unknown\n  training_emissions: unknown\n  training_hardware:\n    explanation: \"2.25-megapixel depth maps with a native output resolution of 1536\\\n      \\ \\xD7 1536 in 0.3 seconds on a V100 GPU.\"\n    value: V100 GPU\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2410.02073\n"
  },
  {
    "path": "assets/argilla.yaml",
    "content": "- access: open\n  analysis: Evaluated on MT-Bench and AlphaEval benchmarks.\n  created_date: 2023-12-01\n  dependencies:\n  - UltraFeedback\n  - Zephyr\n  description: Notus is an open source LLM, fine-tuned using Direct Preference Optimization\n    (DPO) and AIF (AI Feedback) techniques.\n  feedback: https://huggingface.co/argilla/notus-7b-v1/discussions\n  intended_uses: Academic research and free commercial usage\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/argilla/notus-7b-v1\n  monitoring: none\n  name: Notus\n  nationality: Belgium\n  organization: Argilla\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 8 x A100 40GB GPUs\n  training_time: unknown\n  type: model\n  url: https://argilla.io/blog/notus7b/\n"
  },
  {
    "path": "assets/argonne.yaml",
    "content": "- access: open\n  analysis: ''\n  created_date: 2022-10-11\n  dependencies:\n  - SARS-CoV-2 genome dataset\n  - BV-BRC dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: text; genome sequence\n  model_card: ''\n  monitoring: ''\n  name: GenSLM\n  nationality: USA\n  organization: Argonne National Laboratory\n  prohibited_uses: ''\n  quality_control: ''\n  size: 25B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.biorxiv.org/content/10.1101/2022.10.10.511571v1\n"
  },
  {
    "path": "assets/aspia_space_institu.yaml",
    "content": "- access: open\n  analysis: \"The models\\u2019 performance on downstream tasks was evaluated by linear\\\n    \\ probing. The models follow a similar saturating log-log scaling law to textual\\\n    \\ models, their performance improves with the increase in model size up to the\\\n    \\ saturation point of parameters.\"\n  created_date: 2024-09-08\n  dependencies:\n  - DESI Legacy Survey DR8\n  description: AstroPT is an autoregressive pretrained transformer developed with\n    astronomical use-cases in mind. The models have been pretrained on 8.6 million\n    512x512 pixel grz-band galaxy postage stamp observations from the DESI Legacy\n    Survey DR8. They have created a range of models with varying complexity, ranging\n    from 1 million to 2.1 billion parameters.\n  feedback: Any problem with the model can be reported to Michael J. Smith at mike@mjjsmith.com.\n  intended_uses: The models are intended for astronomical use-cases, particularly\n    in handling and interpreting large observation data from astronomical sources.\n  license: MIT\n  modality: image; image\n  model_card: unknown\n  monitoring: Unknown\n  name: AstroPT\n  nationality: unknown\n  organization: \"Aspia Space, Instituto de Astrof\\xEDsica de Canarias (IAC), UniverseTBD,\\\n    \\ Astrophysics Research Institute, Liverpool John Moores University, Departamento\\\n    \\ Astrof\\xEDsica, Universidad de la Laguna, Observatoire de Paris, LERMA, PSL\\\n    \\ University, and Universit\\xB4e Paris-Cit\\xB4e.\"\n  prohibited_uses: Unknown\n  quality_control: \"The models\\u2019 performances were evaluated on downstream tasks\\\n    \\ as measured by linear probing.\"\n  size: 2.1B parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://arxiv.org/pdf/2405.14930v1\n"
  },
  {
    "path": "assets/assembly.yaml",
    "content": "- access: closed\n  analysis: ''\n  created_date:\n    explanation: The date Conformer-1 was announced in this [[Twitter post]](https://twitter.com/AssemblyAI/status/1636050346240884744).\n    value: 2023-03-15\n  datasheet: none\n  dependencies: []\n  description: The dataset used to train AssemblyAI's Conformer-1 model.\n  excluded: ''\n  feedback: ''\n  included: '\"our team curated a dataset of 650K hours of English audio - consisting\n    of proprietary internal datasets and various sources from the internet\"\n\n    '\n  intended_uses: ''\n  license: unknown\n  modality: audio, text\n  monitoring: ''\n  name: Conformer-1 dataset\n  nationality: USA\n  organization: AssemblyAI\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 650K hours audio (60TB)\n  type: dataset\n  url: https://www.assemblyai.com/blog/conformer-1/\n- access: limited\n  analysis:\n    explanation: See [[Results and Performance]](https://www.assemblyai.com/blog/conformer-1/#results-and-performance)\n    value: In order to evaluate the accuracy and robustness of Conformer-1, we sourced\n      60+ hours of human labeled audio data covering popular speech domains such as\n      call centers, podcasts, broadcasts, and webinars. We then calculated the Word\n      Error Rate (WER) of Conformer-1 against these datasets, and compared the results\n      against Whisper and a number of other ASR models. 
To ground our results against\n      popular open source speech recognition benchmarks, we also performed the same\n      WER analysis against a number of academic datasets.\n  created_date:\n    explanation: The date the model was announced in this [[Twitter post]](https://twitter.com/AssemblyAI/status/1636050346240884744).\n    value: 2023-03-15\n  dependencies:\n  - Conformer-1 dataset\n  description: Conformer-1 is a state-of-the-art speech recognition model trained\n    on 650K hours of audio data that achieves near human-level performance and robustness\n    across a variety of data, making up to 43% fewer errors on noisy data than other\n    ASR models.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio, text; audio, text\n  model_card: ''\n  monitoring: ''\n  name: Conformer-1\n  nationality: USA\n  organization: AssemblyAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 300M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.assemblyai.com/blog/conformer-1/\n- access: limited\n  adaptation: ''\n  created_date: 2022\n  dependencies:\n  - Anthropic API\n  description: AssemblyAI uses Claude and Anthropic's model to transcribe and understand\n    audio data at scale.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: '\"AssemblyAI gives you a personal, worldwide, royalty-free, non-assignable\n      and non-exclusive license to use the Site and Services solely for your reasonable\n      personal and commercial purposes, in the manner permitted by these Terms.\" -\n      excerpt from the Terms of Service.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: AssemblyAI\n  nationality: USA\n  organization: AssemblyAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.assemblyai.com/legal/terms-of-service\n  type: application\n  url: 
https://www.assemblyai.com/\n  user_distribution: ''\n- access: open\n  adaptation: ''\n  created_date:\n    explanation: The date the API was announced in this [[Twitter post]](https://twitter.com/AssemblyAI/status/1636050346240884744).\n    value: 2023-03-15\n  dependencies:\n  - Conformer-1\n  description: API to access the AssemblyAI's Conformer-1 model.\n  failures: ''\n  feedback: ''\n  intended_uses: Speech recognition\n  license:\n    explanation: '\"AssemblyAI gives you a personal, worldwide, royalty-free, non-assignable\n      and non-exclusive license to use the Site and Services solely for your reasonable\n      personal and commercial purposes, in the manner permitted by these Terms.\" -\n      excerpt from the Terms of Service.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Conformer-1 API\n  nationality: USA\n  organization: AssemblyAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.assemblyai.com/legal/terms-of-service\n  type: application\n  url: https://www.assemblyai.com/blog/conformer-1/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/autogpt.yaml",
    "content": "- access: open\n  adaptation: GPT-4 adapted to run autonomously by chaining together LLM \"thoughts\"\n  created_date: 2023-04-16\n  dependencies:\n  - GPT-4 API\n  description: Auto-GPT is an experimental open-source application showcasing the\n    capabilities of the GPT-4 language model.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  monitoring: ''\n  monthly_active_users: ''\n  name: Auto-GPT\n  nationality: USA\n  organization: Auto-GPT\n  output_space: text\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://news.agpt.co/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/avignon.yaml",
    "content": "- access: open\n  analysis: BioMistral was evaluated on a benchmark comprising 10 established medical\n    question-answering (QA) tasks in English and seven other languages.\n  created_date: 2024-02-15\n  dependencies:\n  - Mistral\n  - PubMed Central\n  description: BioMistral is an open-source Large Language Model tailored for the\n    biomedical domain, utilizing Mistral as its foundation model and further pre-trained\n    on PubMed Central.\n  feedback: https://huggingface.co/BioMistral/BioMistral-7B/discussions\n  intended_uses: Research in the biomedical domain, especially for medical question-answering\n    tasks.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/BioMistral/BioMistral-7B\n  monitoring: ''\n  name: BioMistral\n  nationality: unknown\n  organization: Avignon University, Nantes University\n  prohibited_uses: Prohibited from deploying in production environments for natural\n    language generation or any professional health and medical purposes.\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 32 NVIDIA A100 80GB GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2402.10373.pdf\n"
  },
  {
    "path": "assets/baai.yaml",
    "content": "- access: closed\n  analysis: ''\n  created_date:\n    explanation: 'The date that BAAI made a public announcement [[News Link]](https://mp.weixin.qq.com/s/BUQWZ5EdR19i40GuFofpBg).\n\n      '\n    value: 2021-01-12\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text\n  monitoring: ''\n  name: Wu Dao dataset\n  nationality: China\n  organization: Beijing Academy of Artificial Intelligence\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://www.tsinghua.edu.cn/en/info/1420/10473.htm\n- access: closed\n  analysis: ''\n  created_date:\n    explanation: 'The date that BAAI made a public announcement [[News Link]](https://mp.weixin.qq.com/s/BUQWZ5EdR19i40GuFofpBg).\n\n      '\n    value: 2021-01-12\n  dependencies:\n  - Wu Dao dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text\n  model_card: ''\n  monitoring: ''\n  name: Wu Dao 2.0\n  nationality: China\n  organization: Beijing Academy of Artificial Intelligence\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.75T parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.tsinghua.edu.cn/en/info/1420/10473.htm\n- access: open\n  analysis: Evaluated on objective and reliability metrics.\n  created_date: 2023-10-26\n  dependencies:\n  - Vicuna\n  - JudgeLM Dataset\n  description: JudgeLM is a fine-tuned to be a scalable judge to evaluate LLMs efficiently\n    and effectively in open-ended benchmarks.\n  feedback: https://huggingface.co/BAAI/JudgeLM-13B-v1.0/discussions\n  intended_uses: Research on evaluating the performance of large language models and\n    chatbots.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/BAAI/JudgeLM-13B-v1.0\n  monitoring: none\n  name: 
JudgeLM\n  nationality: China\n  organization: Beijing Academy of Artificial Intelligence\n  prohibited_uses: none\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 8 A100 40GB NVIDIA GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2310.17631.pdf\n- access: open\n  analysis: none\n  created_date: 2023-10-26\n  datasheet: ''\n  dependencies:\n  - Alpaca\n  - GPT-4\n  - Dolly\n  - ShareGPT\n  - LLaMA\n  - Vicuna\n  description: \"JudgeLM Dataset is a novel dataset replete with a rich variety of\\\n    \\ seed tasks, comprehensive answers from modern LLMs, answers\\u2019 grades from\\\n    \\ the teacher judge, and detailed reasons for judgments.\"\n  excluded: ''\n  feedback: https://huggingface.co/datasets/BAAI/JudgeLM-100K/discussions\n  included: ''\n  intended_uses: To be used to conduct instruction-tuning for language models and\n    make the language model able to judge open-ended answer pairs.\n  license: CC BY NC 4.0\n  modality: text, text\n  monitoring: none\n  name: JudgeLM Dataset\n  nationality: China\n  organization: Beijing Academy of Artificial Intelligence\n  prohibited_uses: none\n  quality_control: ''\n  sample: []\n  size: 105k judge samples\n  type: dataset\n  url: https://huggingface.co/datasets/BAAI/JudgeLM-100K\n- access: open\n  analysis: Compared to other segmentation models across different modalities on BraTS2023\n    dataset.\n  created_date: 2024-01-25\n  dependencies: []\n  description: SegMamba is a novel 3D medical image Segmentation Mamba model, designed\n    to effectively capture long-range dependencies within whole volume features at\n    every scale.\n  feedback: none\n  intended_uses: ''\n  license: Apache 2.0\n  modality: image; text\n  model_card: none\n  monitoring: unknown\n  name: SegMamba\n  nationality: unknown\n  organization: Hong Kong University of Science and Technology (Guangzhou + original),\n    Beijing Academy of Artificial 
Intelligence\n  prohibited_uses: ''\n  quality_control: unknown\n  size: unknown\n  training_emissions: unknown\n  training_hardware: 4 NVIDIA A100 GPUs\n  training_time: 1000 epochs\n  type: model\n  url: https://arxiv.org/pdf/2401.13560v2.pdf\n- access: open\n  analysis: Evaluated on standard datasets in multilingual, cross-lingual, long document\n    retrieval, and Q&A domains.\n  created_date: 2024-02-05\n  dependencies: []\n  description: BGE M3 Embedding is a new embedding model that can support more than\n    100 working languages, leading to new state-of-the-art performances on multi-lingual\n    and cross-lingual retrieval tasks.\n  feedback: https://huggingface.co/BAAI/bge-m3/discussions\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/BAAI/bge-m3\n  monitoring: unknown\n  name: BGE M3 Embedding\n  nationality: unknown\n  organization: Beijing Academy of Artificial Intelligence, University of Science\n    and Technology of China\n  prohibited_uses: ''\n  quality_control: unknown\n  size: unknown\n  training_emissions: unknown\n  training_hardware: 32 A100 40GB GPUs\n  training_time: 20,000 steps\n  type: model\n  url: https://arxiv.org/pdf/2402.03216.pdf\n- access: open\n  analysis: Evaluated on zero-shot classification performance across multiple image\n    classification benchmarks.\n  created_date: 2024-02-06\n  dependencies:\n  - CLIP\n  description: As of release, EVA-CLIP is the largest and most powerful open-source\n    CLIP model to date, with 18 billion parameters.\n  feedback: https://huggingface.co/BAAI/EVA-CLIP-8B-448/discussions\n  intended_uses: ''\n  license: MIT\n  modality: image, text; text\n  model_card: https://huggingface.co/BAAI/EVA-CLIP-8B-448\n  monitoring: unknown\n  name: EVA-CLIP\n  nationality: unknown\n  organization: Beijing Academy of Artificial Intelligence, Tsinghua University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 18B parameters (dense)\n  
training_emissions: unknown\n  training_hardware: 384 A100 40GB GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2402.04252.pdf\n"
  },
  {
    "path": "assets/baichuan.yaml",
    "content": "- access: open\n  analysis: Evaluated on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.\n  created_date: 2023-09-20\n  dependencies: []\n  description: Baichuan 2 is a series of large-scale multilingual language models\n    containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion\n    tokens.\n  feedback: https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: none\n  monitoring: none\n  name: Baichuan 2\n  nationality: China\n  organization: Baichuan Inc.\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 1024 NVIDIA A800 GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2309.10305.pdf\n"
  },
  {
    "path": "assets/baidu.yaml",
    "content": "- access:\n    explanation: Baidu does not provide ERNIE 3.0 for external researchers\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: The date the model paper was released\n    value: 2021-12-23\n  dependencies: []\n  description: ERNIE 3.0 Titan is a language model\n  feedback: none\n  intended_uses: unknown\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: none\n  name: ERNIE 3.0 Titan\n  nationality: China\n  organization: Baidu, PengCheng Laboratory\n  prohibited_uses: unknown\n  quality_control: none\n  size: 260B parameters (dense)\n  training_emissions: unknown\n  training_hardware: Baidu V100 Cluster, PengCheng Lab Ascend 910 NPU cluster\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/abs/2112.12731\n- access:\n    explanation: API access is available through PaddleHub. Instructions on [[Github\n      repository]](https://github.com/PaddlePaddle/PaddleHub/blob/develop/modules/image/text_to_image/ernie_vilg/README.md)\n    value: limited\n  analysis: ''\n  created_date:\n    explanation: The date the model paper was released\n    value: 2021-12-31\n  dependencies: []\n  description: ERNIE-ViLG is a model for text-to-image generation\n  feedback: none\n  intended_uses: unknown\n  license:\n    explanation: 'No explicit license for the ERNIE-ViLG model is provided in the\n      [[Github repository]](https://github.com/PaddlePaddle/PaddleHub)\n\n      '\n    value: none\n  modality: text; image\n  model_card: none\n  monitoring: none\n  name: ERNIE-ViLG\n  nationality: China\n  organization: Baidu\n  prohibited_uses: unknown\n  quality_control: ''\n  size: 10B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/abs/2112.15283\n- access:\n    explanation: Baidu does not provide ERNIE-ViLG 2.0 to external researchers\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: 
The date the model paper was released\n    value: 2022-10-27\n  dependencies: []\n  description: ERNIE-ViLG 2.0 is a model for text-to-image generation\n  feedback: none\n  intended_uses: unknown\n  license:\n    explanation: 'The asset isn''t released, and hence the license is unknown.\n\n      '\n    value: unknown\n  modality: text; image\n  model_card: none\n  monitoring: none\n  name: ERNIE-ViLG 2.0\n  nationality: China\n  organization: Baidu\n  prohibited_uses: unknown\n  quality_control: ''\n  size: 10B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 320 A100 GPUs according to [[the paper]](https://arxiv.org/abs/2210.15257)\n  training_time: 18 days according to [[the paper]](https://arxiv.org/abs/2210.15257)\n  type: model\n  url: https://arxiv.org/abs/2210.15257\n- access: limited\n  analysis: ''\n  created_date: 2023-10-17\n  dependencies: []\n  description: ERNIE-4.0 is a multimodal generalist foundation model.\n  feedback: none\n  intended_uses: unknown\n  license: unknown\n  modality: text; image, text, video\n  model_card: none\n  monitoring: none\n  name: ERNIE 4.0\n  nationality: China\n  organization: Baidu\n  prohibited_uses: unknown\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url:\n    explanation: No official statement from Baidu, so news release is provided.\n    value: https://www.prnewswire.com/news-releases/baidu-launches-ernie-4-0-foundation-model-leading-a-new-wave-of-ai-native-applications-301958681.html\n"
  },
  {
    "path": "assets/bain.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-02-21\n  dependencies:\n  - ChatGPT API\n  description: \"With the alliance, Bain will combine its deep digital implementation\\\n    \\ capabilities and strategic expertise with OpenAI\\u2019s AI tools and platforms,\\\n    \\ including ChatGPT, to help its clients around the world identify and implement\\\n    \\ the value of AI to maximize business potential.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Bain Chat\n  nationality: USA\n  organization: Bain\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.bain.com/vector-digital/partnerships-alliance-ecosystem/openai-alliance/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/be_my_eyes.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-03-14\n  dependencies:\n  - GPT-4 API\n  description: \"The first-ever digital visual assistant powered by OpenAI\\u2019s new\\\n    \\ GPT-4 language model.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Virtual Volunteer\n  nationality: Denmark\n  organization: Be My Eyes\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.bemyeyes.com/blog/introducing-be-my-eyes-virtual-volunteer\n  user_distribution: ''\n"
  },
  {
    "path": "assets/beitech.yaml",
    "content": "- access: open\n  analysis: Models fine-tuned on EXMODD and earlier dataset Image-Chat and then evaluated\n    on Image-Chat validation set.\n  created_date: 2023-10-17\n  datasheet: none\n  dependencies:\n  - YFCC100M\n  - Image-Chat\n  description: EXMODD (Explanatory Multimodal Open-Domain Dialogue dataset) is a dataset\n    built off the proposed MDCF (Multimodal Data Construction Framework).\n  excluded: ''\n  feedback: Feedback can be sent to authors via poplpr@bit.edu.cn\n  included: ''\n  intended_uses: ''\n  license: MIT\n  modality: image, text\n  monitoring: none\n  name: EXMODD\n  nationality: China\n  organization: Beijing Institute of Technology\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://arxiv.org/pdf/2310.10967.pdf\n- access: open\n  analysis: Evaluated on standard benchmarks including MMLU, CEval, and DROP.\n  created_date: 2023-11-13\n  dependencies:\n  - Llama 2\n  description: MiniMA is a smaller finetuned Llama 2 model adapted for Chinese.\n  feedback: https://huggingface.co/GeneZC/MiniMA-3B/discussions\n  intended_uses: ''\n  license: Llama 2\n  modality: text; text\n  model_card: https://huggingface.co/GeneZC/MiniMA-3B\n  monitoring: unknokwn\n  name: MiniMA\n  nationality: China\n  organization: Beijing Institute of Technology\n  prohibited_uses: ''\n  quality_control: ''\n  size: 3B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 8 A100 80G GPUs\n  training_time: unknown\n  type: model\n  url: https://github.com/GeneZC/MiniMA\n"
  },
  {
    "path": "assets/berkeley.yaml",
    "content": "- access: open\n  analysis: Evaluated in comparison with ChatGPT and Stanford Alpaca.\n  created_date: 2023-04-03\n  dependencies:\n  - LLaMA\n  - web-scraped dialogue data\n  description: \"A relatively small chatbot trained by fine-tuning Meta\\u2019s LLaMA\\\n    \\ on dialogue data gathered from the web.\"\n  feedback: https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g/discussions\n  intended_uses: academic research\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g\n  monitoring: ''\n  name: Koala\n  nationality: USA\n  organization: Berkeley\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: ''\n  training_hardware: 8 A100 GPUs\n  training_time: 6 hours\n  type: model\n  url: https://bair.berkeley.edu/blog/2023/04/03/koala/\n- access: open\n  analysis: Evaluated using AST sub-tree matching technique and compared to other\n    models in terms of API functionality accuracy.\n  created_date: 2023-05-24\n  dependencies:\n  - LLaMA\n  - Gorilla document retriever\n  description: Gorilla is a finetuned LLaMA-based model that surpasses the performance\n    of GPT-4 on writing API calls.\n  feedback: ''\n  intended_uses: In conjunction with a LLM to improve its capability for using API\n    calls.\n  license: Apache 2.0\n  modality: text; API\n  model_card: ''\n  monitoring: ''\n  name: Gorilla\n  nationality: USA\n  organization: Berkeley\n  prohibited_uses: ''\n  quality_control: No specific quality control is mentioned in model training, though\n    details on data processing and collection are provided in the paper.\n  size: 7B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2305.15334v1.pdf\n- access: open\n  analysis: Evaluated on wide range of tasks using own evaluation benchmarks.\n  created_date: 2023-05-03\n  dependencies:\n  - 
RedPajama\n  description: OpenLlama is an open source reproduction of Meta's LLaMA model.\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: OpenLLaMA\n  nationality: USA\n  organization: Berkeley\n  prohibited_uses: ''\n  quality_control: ''\n  size: 17B parameters (dense)\n  training_emissions: unknown\n  training_hardware: ''\n  training_time: unknown\n  type: model\n  url: https://github.com/openlm-research/open_llama\n"
  },
  {
    "path": "assets/bigcode.yaml",
    "content": "- access: open\n  analysis: Tested on several benchmarks, most notably Python benchmark HumanEval.\n  created_date: 2023-05-09\n  dependencies:\n  - The Stack\n  description: StarCoder is a Large Language Model for Code (Code LLM) trained on\n    permissively licensed data from GitHub, including from 80+ programming languages,\n    Git commits, GitHub issues, and Jupyter notebooks.\n  feedback: https://huggingface.co/bigcode/starcoder/discussions\n  intended_uses: As a foundation model to fine-tune and create more specialized models\n    that support use cases such as code completion, fill-in-the-middle, and text summarization.\n    Can also be used as a Tech Assistant prompt and not as an instruction model given\n    training limitations.\n  license: BigCode Open RAIL-M v1.0\n  modality: code; code\n  model_card: https://huggingface.co/bigcode/starcoder\n  monitoring: ''\n  name: StarCoder\n  nationality: International\n  organization: BigCode\n  prohibited_uses: See BigCode Open RAIL-M license and FAQ\n  quality_control: No specific quality control is mentioned in model training, though\n    details on data processing and how the tokenizer was trained are provided in the\n    paper.\n  size: 15.5B parameters (dense)\n  training_emissions: 16.68 tons of CO2eq\n  training_hardware: 512 A100 80GB GPUs distributed across 64 nodes\n  training_time: 320,256 GPU hours\n  type: model\n  url: https://arxiv.org/pdf/2305.06161.pdf\n- access: open\n  analysis: Evaluated on MultiPL-E system benchmarks.\n  created_date: 2023-02-24\n  dependencies:\n  - The Stack\n  - BigCode Dataset\n  description: Multilingual code model derived from the findings of BigCode Project\n    analysis on Github stars' association to data quality.\n  feedback: https://huggingface.co/bigcode/santacoder/discussions\n  intended_uses: The model was trained on GitHub code. As such it is not an instruction\n    model and commands do not work well. 
You should phrase commands like they occur\n    in source code such as comments or write a function signature and docstring and\n    let the model complete the function body.\n  license: BigCode Open RAIL-M v1\n  modality: code; code\n  model_card: https://huggingface.co/bigcode/santacoder\n  monitoring: ''\n  name: SantaCoder\n  nationality: International\n  organization: BigCode\n  prohibited_uses: See BigCode Open RAIL-M license and FAQ\n  quality_control: ''\n  size: 1.1B parameters (dense)\n  training_emissions: 124 kg of CO2eq\n  training_hardware: 96 NVIDIA Tesla V100 GPUs\n  training_time: 14,284 GPU hours\n  type: model\n  url: https://arxiv.org/pdf/2301.03988.pdf\n- access: open\n  analysis: Evaluated models trained on The Stack on HumanEval and MBPP and compared\n    against similarly-sized models.\n  created_date: 2022-11-20\n  datasheet: https://huggingface.co/datasets/bigcode/the-stack\n  dependencies:\n  - GitHub\n  description: The Stack contains over 6TB of permissively-licensed source code files\n    covering 358 programming languages. The Stack serves as a pre-training dataset\n    for Code LLMs, i.e., code-generating AI systems which enable the synthesis of\n    programs from natural language descriptions as well as other from code snippets.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/bigcode/the-stack/discussions\n  included: ''\n  intended_uses: creating code LLMs\n  license: The Stack is a collection of source code from repositories with various\n    licenses. 
Any use of all or part of the code gathered in The Stack must abide\n    by the terms of the original licenses, including attribution clauses when relevant.\n    Provenance information is provided for each data point.\n  modality: code\n  monitoring: ''\n  name: The Stack\n  nationality: International\n  organization: BigCode\n  prohibited_uses: See https://huggingface.co/datasets/bigcode/the-stack\n  quality_control: allowed users whose data were part of The Stack's training data\n    to opt-out\n  sample:\n  - https://huggingface.co/datasets/bigcode/the-stack/viewer/default/train\n  size: 6 TB\n  type: dataset\n  url: https://arxiv.org/pdf/2211.15533.pdf\n- access: open\n  analysis: See https://arxiv.org/pdf/2402.19173.pdf\n  created_date: 2024-02-28\n  dependencies:\n  - The Stack v2\n  description: StarCoder2-15B model is a 15B parameter model trained on 600+ programming\n    languages from The Stack v2, with opt-out requests excluded. The training was\n    carried out using the Fill-in-the-Middle objective on 4+ trillion tokens.\n  feedback: https://huggingface.co/bigcode/starcoder2-15b/discussions\n  intended_uses: The model was trained on GitHub code as well as additional selected\n    data sources such as Arxiv and Wikipedia. As such it is not an instruction model\n    and commands like \"Write a function that computes the square root.\" do not work\n    well. Intended to generate code snippets from given context, but not for writing\n    actual functional code directly.\n  license: BigCode OpenRail-M\n  modality: code; text\n  model_card: https://huggingface.co/bigcode/starcoder2-15b\n  monitoring: unknown\n  name: StarCoder2-15B\n  nationality: International\n  organization: BigCode\n  prohibited_uses: See BigCode Open RAIL-M license and FAQ\n  quality_control: The model was filtered for permissive licenses and code with no\n    license only. 
A search index is provided to identify where generated code came\n    from to apply the proper attribution.\n  size: 15B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 1024 x H100 GPUs\n  training_time: unknown\n  type: model\n  url: https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html\n- access: open\n  analysis: See https://arxiv.org/pdf/2402.19173.pdf\n  created_date: 2024-02-28\n  dependencies:\n  - The Stack v2\n  description: StarCoder2-7B model is a 7B parameter model trained on 17 programming\n    languages from The Stack v2, with opt-out requests excluded. The model uses Grouped\n    Query Attention, a context window of 16,384 tokens with a sliding window attention\n    of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3.5+\n    trillion tokens.\n  feedback: https://huggingface.co/bigcode/starcoder2-7b/discussions\n  intended_uses: Intended to generate code snippets from given context, but not for\n    writing actual functional code directly. The model has been trained on source\n    code from 17 programming languages. The predominant language in source is English\n    although other languages are also present. As such the model is capable of generating\n    code snippets provided some context but the generated code is not guaranteed to\n    work as intended. It can be inefficient and contain bugs or exploits. See the\n    paper for an in-depth discussion of the model limitations.\n  license: BigCode OpenRail-M\n  modality: code; text\n  model_card: https://huggingface.co/bigcode/starcoder2-7b\n  monitoring: unknown\n  name: StarCoder2-7B\n  nationality: International\n  organization: BigCode\n  prohibited_uses: See BigCode Open RAIL-M license and FAQ\n  quality_control: The model was filtered for permissive licenses and code with no\n    license only. 
A search index is provided to identify where generated code came\n    from to apply the proper attribution.\n  size: 7B parameters (dense)\n  training_emissions: 29,622.83 kgCO2eq\n  training_hardware: 432 H100 GPUs\n  training_time: 145,152 hours (cumulative)\n  type: model\n  url: https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html\n- access: open\n  analysis: See https://arxiv.org/pdf/2402.19173.pdf\n  created_date: 2024-02-28\n  dependencies:\n  - The Stack v2\n  description: StarCoder2-3B model is a 3B parameter model trained on 17 programming\n    languages from The Stack v2, with opt-out requests excluded. The model uses Grouped\n    Query Attention, a context window of 16,384 tokens with a sliding window attention\n    of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3+\n    trillion tokens.\n  feedback: https://huggingface.co/bigcode/starcoder2-3b/discussions\n  intended_uses: Intended to generate code snippets from given context, but not for\n    writing actual functional code directly. The model has been trained on source\n    code from 17 programming languages. The predominant language in source is English\n    although other languages are also present. As such the model is capable of generating\n    code snippets provided some context but the generated code is not guaranteed to\n    work as intended. It can be inefficient and contain bugs or exploits. See the\n    paper for an in-depth discussion of the model limitations.\n  license: BigCode OpenRail-M\n  modality: code; text\n  model_card: https://huggingface.co/bigcode/starcoder2-3b\n  monitoring: unknown\n  name: StarCoder2-3B\n  nationality: International\n  organization: BigCode\n  prohibited_uses: See BigCode Open RAIL-M license and FAQ\n  quality_control: The model was filtered for permissive licenses and code with no\n    license only. 
A search index is provided to identify where generated code came\n    from to apply the proper attribution.\n  size: 3B parameters (dense)\n  training_emissions: 16,107.01 kgCO2eq\n  training_hardware: 160 A100 GPUs\n  training_time: 97,120 hours (cumulative)\n  type: model\n  url: https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html\n"
  },
  {
    "path": "assets/bigscience.yaml",
    "content": "- access:\n    explanation: While not all of the ROOTS corpus is publicly accessible, the majority\n      is through [this form](https://docs.google.com/forms/d/e/1FAIpQLSdq50O1x4dkdGI4dwsmchFuNI0KCWEDiKUYxvd0r0_sl6FfAQ/viewform?usp=send_form)\n    value: open\n  analysis: https://huggingface.co/spaces/bigscience-data/roots-search\n  created_date: 2022-06-06\n  datasheet: https://huggingface.co/spaces/bigscience-data/bigscience-corpus\n  dependencies: []\n  description: The Responsible Open-science Open-collaboration Text Sources (ROOTS)\n    corpus, a 1.6TB dataset spanning 59 languages that was used to train the 176-billion-parameter\n    BigScience Large Open-science Open-access Multilingual (BLOOM) language model.\n  excluded: unknown\n  feedback: https://huggingface.co/spaces/bigscience-data/roots-search/discussions\n  included: See section 2 of the paper.\n  intended_uses: To empower large-scale monolingual and multilingual modeling projects\n    with both the data and the processing tools, as well as stimulate research around\n    this large multilingual corpus.\n  license:\n    explanation: Each constituent subset of the dataset will be released under the\n      license that applies to it. ([See individual dataset page for specific license\n      information](https://hf.co/bigscience-data))\n    value: custom\n  modality: code, text\n  monitoring: none\n  name: ROOTS\n  nationality: International\n  organization: BigScience\n  prohibited_uses: none\n  quality_control: Data collection involved merging and deduplicating searches to\n    remove menus, HTML tags. 
Further, a quality improvement pipeline was implemented.\n  sample:\n  - https://huggingface.co/spaces/bigscience-data/roots-search\n  size: 1.6TB\n  type: dataset\n  url: https://openreview.net/forum?id=UoEw6KigkUn\n- access: open\n  analysis: See the paper.\n  created_date: 2022-10-15\n  datasheet: https://huggingface.co/datasets/bigscience/P3\n  dependencies: []\n  description: The Public Pool of Prompts (P3) are prompts written in a unified format\n    used to train T0++.\n  excluded: We conservatively decided not to prompt datasets that contain potentially\n    harmful content (for instance, datasets built on social media content).\n  feedback: Point of Contact is [Victor Sanh](https://huggingface.co/datasets/bigscience/P3)\n  included: The Public Pool of Prompts relies on the Hugging Face Dataset library.\n    Any public dataset in the Datasets library can be prompted. We select the datasets\n    that have at least one subset in English and excluded datasets containing (predominantly)\n    non-natural language examples.\n  intended_uses: Multitask finetuning of language models.\n  license: Apache 2.0\n  modality: text\n  monitoring: none\n  name: P3\n  nationality: International\n  organization: BigScience\n  prohibited_uses: none\n  quality_control: Data collection involved merging and deduplicating searches to\n    remove menus, HTML tags. Further, a quality improvement pipeline was implemented.\n  sample:\n  - https://huggingface.co/datasets/bigscience/P3/viewer\n  size: 2000 prompts\n  type: dataset\n  url: https://arxiv.org/pdf/2110.08207.pdf\n- access: open\n  analysis: See the paper.\n  created_date: 2022-11-03\n  datasheet: https://huggingface.co/datasets/bigscience/xP3\n  dependencies:\n  - P3\n  description: xP3 (Crosslingual Public Pool of Prompts) is a collection of prompts\n    and datasets across 46 languages & 16 NLP tasks. 
It is used for the training\n    of BLOOMZ and mT0, multilingual language models capable of following human instructions\n    in dozens of languages zero-shot.\n  excluded: We conservatively decided not to prompt datasets that contain potentially\n    harmful content (for instance, datasets built on social media content).\n  feedback: Point of Contact is [Niklas Muennighoff](https://huggingface.co/datasets/bigscience/xP3)\n  included: xP3 adds 28 multilingual datasets to P3 based on the P3 task taxonomy.\n  intended_uses: Multitask finetuning of language models.\n  license: Apache 2.0\n  modality: code, text\n  monitoring: none\n  name: xP3\n  nationality: International\n  organization: BigScience\n  prohibited_uses: none\n  quality_control: Data collection involved merging and deduplicating searches to\n    remove menus, HTML tags. Further, a quality improvement pipeline was implemented.\n  sample:\n  - https://huggingface.co/datasets/bigscience/xP3/viewer\n  size: 9.4GB\n  type: dataset\n  url: https://arxiv.org/abs/2211.01786\n- access: open\n  analysis: ''\n  created_date:\n    explanation: The date the paper was posted to arXiv.\n    value: 2021-10-15\n  dependencies:\n  - T5\n  - P3\n  description: T0++ is a multitask fine-tuned language model based on T5.\n  feedback: https://huggingface.co/bigscience/T0pp/discussions\n  intended_uses: You can use the models to perform inference on tasks by specifying\n    your query in natural language, and the models will generate a prediction.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/bigscience/T0pp\n  monitoring: none\n  name: T0++\n  nationality: International\n  organization: BigScience\n  prohibited_uses: none\n  quality_control: https://arxiv.org/pdf/2110.08207.pdf\n  size: 11B parameters (dense)\n  training_emissions:\n    explanation: Sourced from estimates in https://arxiv.org/pdf/2110.08207.pdf\n    value: 0.9 tCO2e\n  training_hardware: Jean Zay (v3-512)\n  
training_time:\n    explanation: Sourced from estimates in https://arxiv.org/pdf/2110.08207.pdf\n    value: 27 hours\n  type: model\n  url: https://arxiv.org/pdf/2110.08207.pdf\n- access: open\n  analysis: ''\n  created_date:\n    explanation: The date the model was released\n    value: 2022-07-12\n  dependencies:\n  - ROOTS\n  description: BLOOM is an autoregressive multilingual language model.\n  feedback: https://huggingface.co/bigscience/bloom/discussions\n  intended_uses: This model is being created in order to enable public research on\n    large language models (LLMs). LLMs are intended to be used for language generation\n    or as a pretrained base model that can be further fine-tuned for specific tasks.\n    Use cases below are not exhaustive.\n  license:\n    explanation: Model is licensed under https://huggingface.co/spaces/bigscience/license\n    value: BigScience RAIL v1.0\n  modality: code, text; code, text\n  model_card: ''\n  monitoring: none\n  name: BLOOM\n  nationality: International\n  organization: BigScience\n  prohibited_uses: Using the model in high-stakes settings is out of scope for this\n    model (e.g. biomedical/political/legal/finance domains, evaluating or scoring\n    individuals). The model is not designed for critical decisions nor uses with any\n    material consequences on an individual's livelihood or wellbeing. The model outputs\n    content that appears factual but may not be correct. Misuse. Intentionally using\n    the model for harm, violating human rights, or other kinds of malicious activities,\n    is a misuse of this model (e.g. 
spam generation, disinformation, disparagement,\n    deception, surveillance).\n  quality_control: ''\n  size: 176B parameters (dense)\n  training_emissions:\n    explanation: Sourced from estimates in https://arxiv.org/abs/2211.02001\n    value: 25 tCO2e\n  training_hardware: Jean Zay (48 * 8xA100 80GB nodes)\n  training_time:\n    explanation: 1082990 A100 hours at 156 TFLOP/s maximum utilization\n    value: 7039 petaflop/s-days\n  type: model\n  url: https://arxiv.org/abs/2211.05100\n- access: open\n  analysis: https://huggingface.co/bigscience/bloomz#evaluation\n  created_date:\n    explanation: The date the model was released\n    value: 2021-10-15\n  dependencies:\n  - mT5\n  - xP3\n  description: mT0 is a multitask fine-tuned multilingual language model based on\n    mT5.\n  feedback: https://huggingface.co/bigscience/bloomz/discussions\n  intended_uses: We recommend using the model to perform tasks expressed in natural\n    language.\n  license:\n    explanation: Model is licensed under https://huggingface.co/spaces/bigscience/license\n    value: BigScience RAIL v1.0\n  modality: text; text\n  model_card: https://huggingface.co/bigscience/T0pp\n  monitoring: none\n  name: mT0\n  nationality: International\n  organization: BigScience\n  prohibited_uses: none\n  quality_control: https://arxiv.org/pdf/2211.01786.pdf\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: Jean Zay (288 A100 80GB GPUs with 8 GPUs per node (36 nodes)\n    using NVLink 4 inter-gpu connects, 4 OmniPath links)\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2110.08207.pdf\n- access: open\n  analysis: https://huggingface.co/bigscience/bloomz#evaluation\n  created_date: 2022-11-03\n  dependencies:\n  - BLOOM\n  - xP3\n  description: BLOOMZ is a multitask fine-tuned autoregressive multilingual language\n    model.\n  feedback: https://huggingface.co/bigscience/bloomz/discussions\n  intended_uses: We recommend using the model to 
perform tasks expressed in natural\n    language.\n  license:\n    explanation: Model is licensed under https://huggingface.co/spaces/bigscience/license\n    value: BigScience RAIL v1.0\n  modality: code, text; code, text\n  model_card: https://huggingface.co/bigscience/bloomz\n  monitoring: none\n  name: BLOOMZ\n  nationality: International\n  organization: BigScience\n  prohibited_uses: none\n  quality_control: https://arxiv.org/pdf/2211.01786.pdf\n  size: 176B parameters (dense)\n  training_emissions: unknown\n  training_hardware: Jean Zay (288 A100 80GB GPUs with 8 GPUs per node (36 nodes)\n    using NVLink 4 inter-gpu connects, 4 OmniPath links)\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2211.01786.pdf\n"
  },
  {
    "path": "assets/bloomberg.yaml",
    "content": "- access:\n    explanation: See [[Openness]](https://arxiv.org/pdf/2303.17564.pdf#subsection.8.2)\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: The date BloombergGPT was announced in the [[Bloomberg article]](https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/).\n    value: 2023-03-30\n  datasheet:\n    explanation: Section 2 of the BloombergGPT paper.\n    value: https://arxiv.org/pdf/2303.17564.pdf#section.2\n  dependencies: []\n  description: A comprehensive dataset consisting of a range of English financial\n    documents including news, filings, press releases, web-scraped financial documents,\n    and social media drawn from the Bloomberg archives that was used to train the\n    BloombergGPT model.\n  excluded: ''\n  feedback: ''\n  included:\n    explanation: See [[Financial Datasets]](https://arxiv.org/pdf/2303.17564.pdf#subsection.2.1)\n    value: \"FinPile consists of English financial documents. Authors utilize the The\\\n      \\ Bloomberg\\nTerminal, which is an extensive collection of curated and maintained\\\n      \\ documents,\\nto create the FinPile dataset. Each document in FinPile is time-stamped,\\\n      \\ with\\ndates ranging from 2007-03-01 to 2022-07-31.\\nTypes of data included\\\n      \\ are given below:\\n  1. Web (298B tokens) - Inclues Bloomberg's web crawl focused\\\n      \\ on high-quality\\nwebsites that have financially relevant information. This\\\n      \\ makes up the majority\\nof FinPile.\\n  2. News (38B tokens) - Includes all\\\n      \\ news sources relevant to the financial\\ncommunity, excluding news articles\\\n      \\ written by Bloomberg journalists. Overall,\\nthere are hundreds of English\\\n      \\ news sources in FinPile including \\\"Bloomberg\\nTranscripts\\\", which are transcripts\\\n      \\ of Bloomberg TV news.\\n  3. 
Filings (14B tokens) - Includes financial statements\\\n      \\ prepared by (public)\\ncompanies and made available to the general public.\\\n      \\  In the dataset, a majority\\nof the filings come from EDGAR, which is the\\\n      \\ SEC's online database.\\n  4. Press (9B tokens) - Includes press releases typically\\\n      \\ issued by companies\\nthat are financially relevant.\\n  5. Bloomberg (5B tokens)\\\n      \\ - Includes Bloomberg authored news and other documents\\nsuch as opinions and\\\n      \\ analyses. The largest sources are \\u201CBloomberg News\\u201D and\\n\\u201CBloomberg\\\n      \\ First Word\\u201D, the Bloomberg-authored wire of real-time news.\\n\"\n  intended_uses: Used to train the BloombergGPT model.\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: FinPile\n  nationality: USA\n  organization: Bloomberg\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size:\n    explanation: See [[Financial Datasets]](https://arxiv.org/pdf/2303.17564.pdf#subsection.2.1)\n    value: 363B tokens\n  type: dataset\n  url: https://arxiv.org/pdf/2303.17564.pdf#section.2\n- access:\n    explanation: See [[Openness]](https://arxiv.org/pdf/2303.17564.pdf#subsection.8.2)\n    value: closed\n  analysis:\n    explanation: See [[Evaluation]](https://arxiv.org/pdf/2303.17564.pdf#section.5)\n    value: 'Authors evaluate the performance of BloombergGPT on two broad categories\n      of tasks, finance-specific and general purpose, on several standard benchmarks.\n      They compare BloombergGPT to the three closest models: GPT-NeoX, OPT-66B and\n      BLOOM-176B. They also report results from the original GPT-3 whenever externally\n      available. They conclude \"We achieve strong results on general LLM benchmarks\n      and outperform comparable models on financial tasks. We attribute this, in decreasing\n      order of impact, to 1. a well-curated internal dataset, 2. our unique choice\n      in tokenizer, and 3. 
an up-to-date architecture.\"\n\n      '\n  created_date:\n    explanation: The date the model was announced in the [[Bloomberg article]](https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/).\n    value: 2023-03-30\n  dependencies:\n  - FinPile\n  - The Pile\n  - C4\n  - Wikipedia\n  description: BloombergGPT is a 50 billion parameter large language model that is\n    specifically trained on a wide range of financial data to support a diverse set\n    of natural language processing tasks within the financial industry.\n  feedback: ''\n  intended_uses:\n    explanation: See [[Bloomberg article]](https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/)\n    value: '\"This model will assist Bloomberg in improving existing financial NLP\n      tasks, such as sentiment analysis, named entity recognition, news classification,\n      and question answering, among others. Furthermore, BloombergGPT will unlock\n      new opportunities for marshalling the vast quantities of data available on the\n      Bloomberg Terminal to better help the firm''s customers, while bringing the\n      full potential of AI to the financial domain.\"\n\n      '\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: BloombergGPT\n  nationality: USA\n  organization: Bloomberg\n  prohibited_uses: ''\n  quality_control:\n    explanation: See [[Ethics, Limitations, and Implications]](https://arxiv.org/pdf/2303.17564.pdf#section.8)\n    value: \"Authors state the following:\\n- \\\"To provide natural language applications\\\n      \\ to the financial community, we\\n  have developed a rigorous risk and testing\\\n      \\ assessment process. This process\\n  includes careful annotation guidelines\\\n      \\ Tseng et al. 
(2020), pre-launch review\\n  at multiple levels by the central\\\n      \\ risk and compliance organizations, and\\n  by the product leaders (e.g., the\\\n      \\ newsroom) as applicable, and post-launch\\n  monitoring. Moreover, we conduct\\\n      \\ our research, development, and deployment\\n  of NLP and AI systems in accordance\\\n      \\ with all applicable regulations.\\\"\\n- \\\"Similarly, toxicity and bias are areas\\\n      \\ where, as a company, we take extraordinary\\n  care with any content we produce,\\\n      \\ whether from humans or machines. Since\\n  the measurement of toxicity and\\\n      \\ bias in our model depends on its application\\n  areas, quantifying the potential\\\n      \\ for the generation of harmful language\\n  remains an open question. We are\\\n      \\ particularly interested in studying whether\\n  FinPile, which is cleaner and\\\n      \\ contains fewer examples of overtly biased\\n  or toxic language (e.g., Press\\\n      \\ Releases), reduces the proclivity of the\\n  model to generate inappropriate\\\n      \\ content.\\\"\\n\"\n  size: 50B parameters (dense)\n  training_emissions: unknown\n  training_hardware:\n    explanation: See [[Training Configuration]](https://arxiv.org/pdf/2303.17564.pdf#subsection.3.3)\n    value: 64 Amazon EC2 p4d.24xlarge instances each with 8 NVIDIA 40GB A100 GPUs\n      (i.e. total 512 A100 GPUs)\n  training_time:\n    explanation: See [[Training Run]](https://arxiv.org/pdf/2303.17564.pdf#section.4)\n    value: 53 days\n  type: model\n  url: https://arxiv.org/abs/2303.17564\n"
  },
  {
    "path": "assets/boston.yaml",
    "content": "- access: open\n  analysis: \"Platypus achieves the strongest performance and currently stands at first\\\n    \\ place in HuggingFace\\u2019s Open LLM Leaderboard as of its release date.\"\n  created_date: 2023-08-14\n  dependencies:\n  - LLaMA 2\n  - Platypus curated dataset\n  description: Platypus is a family of fine-tuned and merged Large Language Models\n    (LLMs).\n  feedback: https://huggingface.co/garage-bAInd/Platypus2-13B/discussions\n  intended_uses: ''\n  license: CC by-NC-SA 4.0\n  modality: text; text\n  model_card: https://huggingface.co/garage-bAInd/Platypus2-13B\n  monitoring: ''\n  name: Platypus\n  nationality: USA\n  organization: Boston University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: ''\n  training_hardware: 1 A100 GPU\n  training_time: 5 hours\n  type: model\n  url: https://arxiv.org/pdf/2308.07317.pdf\n- access: open\n  analysis: UFOGen is evaluated on standard image benchmarks against other models\n    fine-tuned with Stable Diffusion.\n  created_date: 2023-11-14\n  dependencies:\n  - Stable Diffusion\n  description: UFOGen is a novel generative model designed for ultra-fast, one-step\n    text-to-image synthesis.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; image\n  model_card: none\n  monitoring: ''\n  name: UFOGen\n  nationality: USA\n  organization: Boston University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 900M parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2311.09257.pdf\n"
  },
  {
    "path": "assets/brex.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-03-07\n  dependencies:\n  - ChatGPT API\n  description: Brex Inc., a highly valued startup that makes software for finance\n    professionals, is turning to the same artificial intelligence tool behind ChatGPT\n    for a service that can answer questions about corporate budgets, policy and spending.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: '\"Brex grants you a nonexclusive and nontransferable license to use\n      Brex Property as provided through the Services and as permitted by these User\n      Terms.\" - excerpt from the Terms of Service.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Brex Chat\n  nationality: USA\n  organization: Brex\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.brex.com/legal/user-terms\n  type: application\n  url: https://fortune.com/2023/03/07/cfo-chatbot-chatgpt-ai-brex-finance-software-startup-budgets-policies/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/bytedance.yaml",
    "content": "- access: open\n  analysis: Evaluated on benchmarks pertaining to speech, music, and other audio recognition.\n  created_date: 2023-10-20\n  dependencies:\n  - Whisper\n  - BEATs\n  - Vicuna\n  description: SALMONN is a large language model (LLM) enabling speech, audio event,\n    and music inputs.\n  feedback: https://huggingface.co/MSIIP/SALMONN/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: audio, text; text\n  model_card: https://huggingface.co/MSIIP/SALMONN\n  monitoring: none\n  name: SALMONN\n  nationality: unknown\n  organization: ByteDance, Tsinghua University\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/bytedance/SALMONN\n- access: open\n  analysis: Evaluated via qualitative comparison relative to other SoTA image generation\n    models.\n  created_date: 2024-02-21\n  dependencies:\n  - Stable Diffusion XL\n  description: SDXL-Lightning is a lightning-fast text-to-image generation model.\n    It can generate high-quality 1024px images in a few steps. The models are distilled\n    from stabilityai/stable-diffusion-xl-base-1.0. 
This repository contains checkpoints\n    for 1-step, 2-step, 4-step, and 8-step distilled models.\n  feedback: https://huggingface.co/ByteDance/SDXL-Lightning/discussions\n  intended_uses: The model can be used for fast, high-quality text-to-image generation.\n    It supports 1-step, 2-step, 4-step, and 8-step distilled models which provide\n    varying generation quality.\n  license: OpenRail++\n  modality: text; image\n  model_card: https://huggingface.co/ByteDance/SDXL-Lightning\n  monitoring: unknown\n  name: SDXL-Lightning\n  nationality: China\n  organization: ByteDance\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: unknown\n  training_emissions: unknown\n  training_hardware: 64 A100 80G GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2402.13929.pdf\n- access: open\n  analysis: LLaVA-Critic was tested in scenarios such as LMM-as-a-Judge and Preference\n    Learning, showing a high correlation with commercial GPT models in evaluation\n    scores. It served as an alternative to expensive human feedback in resource-constrained\n    settings and demonstrated better performance in providing AI-generated feedback\n    for model alignment compared to human-reliant reward models.\n  created_date: 2024-10-06\n  dependencies: []\n  description: LLaVA-Critic is an open-source large multimodal model (LMM) designed\n    as a generalist evaluator. It assesses performance across a variety of multimodal\n    tasks by following a high-quality critic instruction dataset, incorporating diverse\n    evaluation criteria. 
The model is effective in areas like LMM-as-a-Judge, providing\n    reliable evaluation scores comparable to GPT models, and Preference Learning,\n    offering reward signals for preference learning to enhance model alignment capabilities.\n  feedback: unknown\n  intended_uses: The model can be used for evaluating multimodal tasks, generating\n    reward signals for preference learning, and serving as a reliable alternate judge\n    for model assessments.\n  license: Apache 2.0\n  modality: image, text; text\n  model_card: unknown\n  monitoring: unknown\n  name: LLaVA-Critic\n  nationality: unknown\n  organization: ByteDance and University of Maryland, College Park\n  prohibited_uses: The model should not be used in scenarios requiring authorization\n    from proprietary models, nor relied upon for critical applications without human\n    oversight due to potential biases in dataset.\n  quality_control: The model ensures quality by utilizing a high-quality dataset for\n    critic instructions, providing both quantitative judgments and reasoning, with\n    transparency in assessments.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2410.02712\n"
  },
  {
    "path": "assets/cagliostro.yaml",
    "content": "- access: open\n  analysis: unknown\n  created_date: 2024-03-18\n  dependencies:\n  - Animagine XL 3.0\n  description: An open-source, anime-themed text-to-image model enhanced to generate\n    higher quality anime-style images with a broader range of characters from well-known\n    anime series, an optimized dataset, and new aesthetic tags for better image creation.\n  feedback: https://huggingface.co/cagliostrolab/animagine-xl-3.1/discussions\n  intended_uses: Generating high-quality anime images from textual prompts. Useful\n    for anime fans, artists, and content creators.\n  license: Fair AI Public License 1.0-SD\n  modality: text; image\n  model_card: https://huggingface.co/cagliostrolab/animagine-xl-3.1\n  monitoring: unknown\n  name: Animagine XL 3.1\n  nationality: USA\n  organization: Cagliostro Research Lab\n  prohibited_uses: Not suitable for creating realistic photos or for users who expect\n    high-quality results from short or simple prompts.\n  quality_control: The model undergoes pretraining, first stage finetuning, and second\n    stage finetuning for refining and improving aspects such as hand and anatomy rendering.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: 2x A100 80GB GPUs\n  training_time: Approximately 15 days, totaling over 350 GPU hours.\n  type: model\n  url: https://cagliostrolab.net/posts/animagine-xl-v31-release\n"
  },
  {
    "path": "assets/cartesia.yaml",
    "content": "- access: limited\n  analysis: Extensive testing on Multilingual Librispeech dataset resulted in 20%\n    lower validation perplexity. In downstream evaluations, this leads to a 2x lower\n    word error rate and a 1 point higher quality score. Sonic also displays impressive\n    performance metrics at inference, achieving lower latency (1.5x lower time-to-first-audio),\n    faster inference speed (2x lower real-time factor), and higher throughput (4x).\n  created_date: 2024-05-29\n  dependencies:\n  - Multilingual Librispeech dataset\n  description: Sonic is a low-latency voice model that generates lifelike speech.\n    Developed by Cartesia, it was designed to be an efficient real-time AI capable\n    of processing any-sized contexts and running on any device.\n  feedback: Contact through the provided form or via email at join@cartesia.ai.\n  intended_uses: Sonic has potential applications across customer support, entertainment,\n    and content creation and is a part of Cartesias broader mission to bring real-time\n    multimodal intelligence to every device.\n  license: unknown\n  modality: text; audio\n  model_card: none\n  monitoring: unknown\n  name: Sonic\n  nationality: USA\n  organization: Cartesia\n  prohibited_uses: unknown\n  quality_control: ''\n  size: 2024-05-29\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://cartesia.ai/blog/sonic\n"
  },
  {
    "path": "assets/casia.yaml",
    "content": "- access: open\n  analysis: Reports results on standard translation benchmarks across 102 languages\n    in comparison with Google Translate and ChatGPT\n  created_date: 2023-05-29\n  dependencies:\n  - LLaMA\n  - CLUE\n  - BigTrans parallel dataset\n  description: BigTrans is a model which adapts LLaMA that covers only 20 languages\n    and enhances it with multilingual translation capability on more than 100 languages\n  feedback: https://huggingface.co/James-WYang/BigTrans/discussions\n  intended_uses: Advancing future research in multilingual LLMs\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/James-WYang/BigTrans\n  monitoring: ''\n  name: BigTrans\n  nationality: China\n  organization: Institute of Automation Chinese Academy of Sciences\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 16 A100 GPUs with 80 GB of RAM\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2305.18098v1.pdf\n- access: open\n  analysis: Evaluated on standard benchmarks for knowledge and language understanding,\n    mathematical reasoning, and programming ability in comparison to similarly sized\n    open-source models.\n  created_date: 2023-12-22\n  dependencies: []\n  description: YAYI 2 is an open source large language model trained in both English\n    and Chinese.\n  feedback: https://huggingface.co/wenge-research/yayi2-30b/discussions\n  intended_uses: ''\n  license:\n    explanation: Model is under a custom [license](https://github.com/wenge-research/YAYI2/blob/main/COMMUNITY_LICENSE),\n      while code is Apache 2.0\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/wenge-research/yayi2-30b\n  monitoring: ''\n  name: YAYI 2\n  nationality: China\n  organization: Institute of Automation Chinese Academy of Sciences\n  prohibited_uses: ''\n  quality_control: data is deduplicated, normalized, 
cleaned, and filtered for toxicity\n  size: 30B parameters (dense)\n  training_emissions: unknown\n  training_hardware: over 1000 A800 GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2312.14862.pdf\n"
  },
  {
    "path": "assets/causallm.yaml",
    "content": "- access: open\n  analysis: Evaluated on standard benchmarks across a range of tasks.\n  created_date: 2023-10-21\n  dependencies:\n  - Qwen\n  - OpenOrca\n  - Open Platypus\n  description: CausalLM is an LLM based on the model weights of Qwen and trained on\n    a model architecture identical to LLaMA 2.\n  feedback: none\n  intended_uses: ''\n  license:\n    explanation: can be found at https://github.com/rpherrera/WTFPL (HuggingFace lists\n      this to be the license)\n    value: WTFPL\n  modality: text; text\n  model_card: https://huggingface.co/CausalLM/14B\n  monitoring: unknown\n  name: CausalLM\n  nationality: USA\n  organization: CausalLM\n  prohibited_uses: ''\n  quality_control: ''\n  size: 14B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/CausalLM/14B\n"
  },
  {
    "path": "assets/cerebras.yaml",
    "content": "- access:\n    explanation: 'The Pile is an open source dataset. Hugging Face compatible checkpoints\n      available on the [[Cerebras Hugging Face page]](https://huggingface.co/cerebras/Cerebras-GPT-13B).\n      Cerebras systems checkpoints for pre-training and fine tuning are available\n      in the cloud via the [[Cerebras Model Studio]](https://www.cerebras.net/product-cloud/).\n\n      '\n    value: open\n  analysis: '\"We evaluate our models on the PILE validation set comprising 380M tokens.\n    We also evaluate the public checkpoints of Pythia, Eleuther (2022); OPT, Zhang\n    et al. (2022); GPT-NeoX 20B, Black et al. (2022); and GPT-J 6B, Wang & Komatsuzaki\n    (2021). We performed upstream (pre-training) evaluations of text prediction cross-entropy\n    using the Pile validation and test splits. We performed downstream evaluations\n    of text generation accuracy on standardized tasks using the Eleuther lm-evaluation-harness.\"\n    [[Evaluations]] (https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#evaluations).\n\n    '\n  created_date:\n    explanation: 'The date the model was announced in the [[Cerebras blog post]](https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models).\n\n      '\n    value: 2023-03-28\n  dependencies:\n  - The Pile\n  description: 'A Family of Open, Compute-efficient, Large Language Models. The family\n    includes 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B models. All models in the\n    Cerebras-GPT family have been trained in accordance with Chinchilla scaling laws\n    (20 tokens per model parameter). [[Cerebras Blog Post]](https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models)\n\n    '\n  feedback: ''\n  intended_uses: '\"The primary intended use is to further research into large language\n    models. 
These models can be used as a foundation model for NLP, applications,\n    ethics, and alignment research. Our primary intended users are researchers who\n    are working to improve LLMs and practitioners seeking reference implementations,\n    training setups, hyperparameters, or pre-trained models. We release these models\n    with a fully permissive Apache license for the community to use freely.\" [[Uses\n    and Limitations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#uses-and-limitations).\n\n    '\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/cerebras/Cerebras-GPT-13B\n  monitoring: ''\n  name: Cerebras-GPT\n  nationality: USA\n  organization: Cerebras\n  prohibited_uses: 'Authors note the following limitations of the model: \"Cerebras-GPT\n    models are trained on the Pile, with English language only, and are not suitable\n    for machine translation tasks. Cerebras-GPT models have not been tuned for human-facing\n    dialog applications like chatbots and will not respond to prompts in a similar\n    way to models that have received instruction tuning or reinforcement learning\n    from human feedback (RLHF) like Flan-T5 or ChatGPT.\" [[Uses and Limitations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#out-of-scope-use).\n\n    '\n  quality_control: 'The Pile dataset has been thoroughly analyzed from various ethical\n    standpoints such as toxicity analysis, gender bias, pejorative content, racially\n    sensitive content etc. Only mitigations in standard Pile dataset pre-processing\n    were employed when pre-training Cerebras-GPT. 
[[Risk, Bias, Ethical Considerations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#risk-bias-ethical-considerations)\n\n    '\n  size: 13B parameters (dense)\n  training_emissions: ''\n  training_hardware:\n    explanation: 'According to [[Model Description]](https://huggingface.co/cerebras/Cerebras-GPT-13B#model-description)\n\n      '\n    value: 16x Cerebras CS-2 wafer scale systems\n  training_time:\n    explanation: ''\n    value: ''\n  type: model\n  url: https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/\n- access: open\n  analysis: Evaluated on standard English LLM benchmarks and adapted Arabic LLM benchmarks.\n  created_date: 2023-08-30\n  dependencies:\n  - GPT-3\n  - The Pile\n  description: \"Jais is the world\\u2019s most advanced Arabic LLM as of its release.\"\n  feedback: none\n  intended_uses: Jais is released with the aim to stimulate research and development\n    in the Arabic NLP community.\n  license: Apache 2.0\n  modality: text; text\n  model_card:\n    explanation: \"Found in section C \\u201CModel Cards\\u201D\"\n    value: https://inceptioniai.org/jais/docs/Technicalpaper.pdf\n  monitoring: unknown\n  name: Jais\n  nationality: unknown\n  organization: Inception Institute of Artificial Intelligence, Cerebras, Mohamed\n    bin Zayed University of Artificial Intelligence\n  prohibited_uses: Generating or endorsing hate speech, disseminating false information,\n    engaging in illegal activities, managing sensitive data, attempting language generalization\n    beyond Arabic and English, and making critical decisions with high stakes.\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: Condor Galaxy Supercomputer\n  training_time: unknown\n  type: model\n  url: https://inceptioniai.org/jais/docs/Technicalpaper.pdf\n- access: open\n  analysis: Evaluated on standard English LLM 
benchmarks and adapted Arabic LLM benchmarks.\n  created_date: 2023-08-30\n  dependencies:\n  - GPT-3\n  - The Pile\n  description: Jais Chat is an instruction-tuned version of Jais, optimized for dialog\n    interaction.\n  feedback: none\n  intended_uses: Jais Chat is released with the aim to stimulate research and development\n    in the Arabic NLP community.\n  license: Apache 2.0\n  modality: text; text\n  model_card:\n    explanation: \"Found in section C \\u201CModel Cards\\u201D\"\n    value: https://inceptioniai.org/jais/docs/Technicalpaper.pdf\n  monitoring: unknown\n  name: Jais Chat\n  nationality: unknown\n  organization: Inception Institute of Artificial Intelligence, Cerebras, Mohamed\n    bin Zayed University of Artificial Intelligence\n  prohibited_uses: Generating or endorsing hate speech, disseminating false information,\n    engaging in illegal activities, managing sensitive data, attempting language generalization\n    beyond Arabic and English, and making critical decisions with high stakes.\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: Condor Galaxy Supercomputer from Cerebras\n  training_time: unknown\n  type: model\n  url: https://inceptioniai.org/jais/docs/Technicalpaper.pdf\n- access: open\n  analysis: Evaluated on standard LLM benchmarks in comparison to similar-sized models.\n  created_date: 2023-07-24\n  dependencies:\n  - SlimPajama\n  description: Bittensor Language Model is a 3 billion parameter language model with\n    an 8k context length trained on 627B tokens of SlimPajama.\n  feedback: https://huggingface.co/cerebras/btlm-3b-8k-base/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/cerebras/btlm-3b-8k-base\n  monitoring: unknown\n  name: Bittensor Language Model\n  nationality: USA\n  organization: Cerebras\n  prohibited_uses: ''\n  quality_control: ''\n  size: 3B parameters (dense)\n  
training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/\n- access: open\n  analysis: ''\n  created_date: 2023-06-09\n  datasheet: https://huggingface.co/datasets/cerebras/SlimPajama-627B\n  dependencies:\n  - RedPajama-Data\n  description: As of release, SlimPajama is the largest extensively deduplicated,\n    multi-corpora, open-source dataset for training large language models.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/cerebras/SlimPajama-627B/discussions\n  included: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text\n  monitoring: unknown\n  name: SlimPajama\n  nationality: USA\n  organization: Cerebras\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 627B tokens\n  type: dataset\n  url: https://huggingface.co/datasets/cerebras/SlimPajama-627B\n"
  },
  {
    "path": "assets/character.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2022-09-16\n  dependencies: []\n  description: Character allows users to converse with various chatbot personas.\n  failures: unknown\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: unknown\n  name: Character\n  nationality: USA\n  organization: Character AI\n  output_space: AI-generated chat conversations\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://beta.character.ai/tos\n  type: application\n  url: https://beta.character.ai/\n  user_distribution: unknown\n"
  },
  {
    "path": "assets/chatglm.yaml",
    "content": "- access: open\n  analysis: Performance evaluated on English and Chinese language benchmark tests.\n  created_date: 2023-03-14\n  dependencies: []\n  description: ChatGLM is a Chinese-English language model with question and answer\n    and dialogue functions, and is aimed at a Chinese audience.\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: ChatGLM\n  nationality: China\n  organization: ChatGLM\n  prohibited_uses: ''\n  quality_control: ''\n  size: 6B parameters (dense)\n  training_emissions: unknown\n  training_hardware: ''\n  training_time: unknown\n  type: model\n  url: https://chatglm.cn/blog\n"
  },
  {
    "path": "assets/cmu.yaml",
    "content": "- access:\n    explanation: Model checkpoints are available for download at https://github.com/VHellendoorn/Code-LMs\n    value: open\n  analysis: Reports results on standard code benchmarks across a variety of programming\n    languages.\n  created_date:\n    explanation: The date the model paper was released\n    value: 2022-02-26\n  dependencies:\n  - Github\n  description: PolyCoder is a code model trained on 2.7B parameters based on the GPT-2\n    architecture, which was trained on 249GB of code across 12 programming languages\n    on a single machine.\n  feedback: https://huggingface.co/NinedayWang/PolyCoder-2.7B/discussion\n  intended_uses: unknown\n  license:\n    explanation: The license is provided in the [[Github repository]](https://github.com/VHellendoorn/Code-LMs)\n    value: MIT\n  modality: code\n  model_card: https://huggingface.co/NinedayWang/PolyCoder-2.7B\n  monitoring: None\n  name: PolyCoder\n  nationality: USA\n  organization: Carnegie Mellon University\n  prohibited_uses: None\n  quality_control: No specific quality control is mentioned in model training, though\n    details on data processing and how the tokenizer was trained are provided in the\n    paper.\n  size: 2.7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 8 NVIDIA RTX 8000\n  training_time: 6 weeks\n  type: model\n  url: https://arxiv.org/abs/2202.13169\n- access: open\n  analysis: Evaluated on nascent time-series datasets and benchmarks.\n  created_date: 2024-02-06\n  dependencies: []\n  description: Moment is a family of open-source foundation models for general-purpose\n    time-series analysis.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: ''\n  model_card: none\n  monitoring: unknown\n  name: Moment\n  nationality: unknown\n  organization: Carnegie Mellon University, University of Pennsylvania\n  prohibited_uses: ''\n  quality_control: ''\n  size: 385M parameters (dense)\n  training_emissions: unknown\n  
training_hardware: Single A6000 GPU\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2402.03885.pdf\n"
  },
  {
    "path": "assets/cognition.yaml",
    "content": "- access: limited\n  analysis: Evaluated on SWE-Bench, a challenging software engineering benchmark,\n    where Devin outperforms major state of the art models unassisted.\n  created_date: 2024-03-12\n  dependencies: []\n  description: \"Devin is the world\\u2019s first fully autonomous AI software engineer.\"\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; code\n  model_card: none\n  monitoring: ''\n  name: Devin\n  nationality: USA\n  organization: Cognition Labs\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.cognition-labs.com/introducing-devin\n- access:\n    explanation: ESM C is a drop-in replacement for previous models that provides\n      major improvements in both capability and efficiency. ESM C models are available\n      immediately for academic and commercial use under a new license structure designed\n      to promote openness and enable scientists and builders.\n    value: open\n  analysis: Evaluations. We use the methodology of Rao et al. to measure unsupervised\n    learning of tertiary structure in the form of contact maps. A logistic regression\n    is used to identify contacts. For a protein of length L, we evaluate the precision\n    of the top L contacts (P@L) with sequence separation of 6 or more residues.\n  created_date: 2024-12-04\n  dependencies:\n  - UniRef\n  - MGnify\n  - Joint Genome Institute\n  description: ESM Cambrian is a next generation language model trained on protein\n    sequences at the scale of life on Earth. 
ESM C models define a new state of the\n    art for protein representation learning.\n  feedback: unknown\n  intended_uses: \"one that can be used by builders across a wide range of applications,\\\n    \\ to imbue AI models with a deeper understanding of the biology of life\\u2019\\\n    s most important and mysterious molecules.\"\n  license: Cambrian Open License\n  modality:\n    explanation: 'Just as a person can fill in the blanks, such as: To __ or not to\n      __ that is the ___ We can train language models to fill in the blanks. Except\n      in biology, instead of training the model to predict words, we train it to predict\n      the characters in a protein sequence, i.e. its sequence of amino acids.'\n    value: text; text\n  model_card: unknown\n  monitoring: unknown\n  name: ESM Cambrian\n  nationality: USA\n  organization: EvolutionaryScale\n  prohibited_uses: unknown\n  quality_control: ESM C was reviewed by a committee of scientific experts who concluded\n    that the benefits of releasing the models greatly outweigh any potential risks.\n  size:\n    explanation: 'ESM C is trained at three scales: 300M, 600M, and 6B parameters.'\n    value: 6B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.evolutionaryscale.ai/blog/esm-cambrian\n"
  },
  {
    "path": "assets/cognitive.yaml",
    "content": "- access: open\n  analysis: none\n  created_date: 2023-11-14\n  dependencies:\n  - Dolphin\n  - Yi\n  description: Dolphin 2.2 Yi is an LLM based off Yi.\n  feedback: https://huggingface.co/cognitivecomputations/dolphin-2_2-yi-34b/discussions\n  intended_uses: ''\n  license:\n    explanation: can be found at https://huggingface.co/cognitivecomputations/dolphin-2_2-yi-34b/blob/main/LICENSE\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/cognitivecomputations/dolphin-2_2-yi-34b\n  monitoring: unknown\n  name: Dolphin 2.2 Yi\n  nationality: USA\n  organization: Cognitive Computations\n  prohibited_uses: ''\n  quality_control: ''\n  size: 34B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 4 A100 GPUs\n  training_time: 3 days\n  type: model\n  url: https://erichartford.com/dolphin\n- access: open\n  analysis: Evaluated on OpenLLM leaderboard.\n  created_date:\n    explanation: release date is not published; estimated to be sometime in either\n      May or June 2023.\n    value: 2023-06-01\n  dependencies:\n  - WizardLM\n  description: WizardLM Uncensored is WizardLM trained with a subset of the dataset\n    - responses that contained alignment / moralizing were removed.\n  feedback: https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored/discussions\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored\n  monitoring: unknown\n  name: WizardLM Uncensored\n  nationality: USA\n  organization: Cognitive Computations\n  prohibited_uses: ''\n  quality_control: ''\n  size: 30B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored\n"
  },
  {
    "path": "assets/cohere.yaml",
    "content": "- access:\n    explanation: The dataset isn't provided to the public.\n    value: closed\n  analysis:\n    explanation: The analysis performed on the dataset is not released.\n    value: unknown\n  created_date:\n    explanation: 'The date the Cohere API was announced on the news [[News Article]]\n      (https://venturebeat.com/2021/11/15/openai-rival-cohere-launches-language-model-api/).\n\n      '\n    value: 2021-11-15\n  datasheet: https://docs.cohere.ai/data-statement\n  dependencies: []\n  description: 'The family of datasets used to train Cohere models, which come in\n    two forms: coheretext-filtered and coheretext-unfiltered. The former is used to\n    train the Representation models, while the latter one is used to train the Generation\n    models.\n\n    '\n  excluded: 'Documents that are not in English are excluded.\n\n    '\n  feedback:\n    explanation: The internal feedback mechanisms for Cohere are unknown.\n    value: unknown\n  included: 'As stated in the datasheet, the dataset \"includes the Google Books dataset,\n    CommonCrawl, and text from the internet scraped by the Cohere infrastructure team.\"\n    The top ten domains scraped were: wordpress.com, medium.com, stackexchange.com,\n    tumblr.com, elsevier.com, genius.com, bbc.co.uk, libsyn.com, yahoo.com, nytimes.com\n    [[Datasheet]](https://docs.cohere.ai/data-statement).\n\n    '\n  intended_uses: 'The intended use of the dataset is to train Cohere''s language models.\n\n    '\n  license: unknown\n  modality: text; text\n  monitoring:\n    explanation: 'There is no information on how Cohere is internally monitoring the\n      use of the dataset.\n\n      '\n    value: unknown\n  name: coheretext\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses:\n    explanation: 'There are no known prohibited uses of the dataset, but the Cohere\n      API is bound by the Cohere usage guidelines, which may also apply to this dataset\n      [[Usage 
Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n      '\n    value: unknown\n  quality_control: 'In the datasheet, it is implied that Cohere employs filtration\n    methods for removing racist, biased and toxic content, but the details are not\n    provided. These filtration methods take both the context and the language, as\n    opposed to using a list of blockwords [[Datasheet]](https://docs.cohere.ai/data-statement).\n\n    '\n  sample: []\n  size:\n    explanation: 'Size of the filtered dataset is 200 GB. The unfiltered dataset is\n      ~3 TB.\n\n      '\n    value: 200 GB\n  type: dataset\n  url: none\n- access:\n    explanation: 'The model is available to the public through the Cohere Platform\n      [[Cohere Platform]](https://os.cohere.ai/login).\n\n      '\n    value: limited\n  analysis: The model's performance was analyzed on Hellaswag and COPA, as well as\n    several safety benchmarks [[Model Card]](https://docs.cohere.ai/generation-card).\n  created_date:\n    explanation: 'The date the Cohere API was announced on the news [[News Article]]\n      (https://venturebeat.com/2021/11/15/openai-rival-cohere-launches-language-model-api/).\n\n      '\n    value: 2021-11-15\n  dependencies:\n  - coheretext\n  description: 'The Generations model is a language model trained by Cohere for generation\n    tasks.\n\n    '\n  feedback:\n    explanation: 'There are no specific feedback mechanisms for the Generation model,\n      but a generic contact email is provided on Cohere website, which is support\n      at cohere.ai.\n\n      '\n    value: unknown\n  intended_uses: 'On the model card, the intended uses are stated as \"interactive\n    autocomplete, augmenting human writing processes, summarization, text rephrasing,\n    and other text-to-text tasks in non-sensitive domains\" [[Model Card]](https://docs.cohere.ai/generation-card).\n\n    '\n  license: unknown\n  modality: text; text\n  model_card: https://docs.cohere.ai/generation-card\n  monitoring: 
'The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/generation-card).\n\n    '\n  name: Cohere Base\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses: 'The usage of the model is bound by the Cohere usage guidelines\n    [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). A non-comprehensive\n    list of specific application violating these guidelines are: astroturfing, generation\n    of misinformation and other harmful content, and \"generation of text about people,\n    places, or events without a human-in-the-loop\" [[Model Card]](https://docs.cohere.ai/generation-card).\n\n    '\n  quality_control:\n    explanation: 'The quality control measures taken are unknown, but it is implied\n      that Cohere performed mitigation strategies for toxic degeneration [[Model Card]](https://docs.cohere.ai/generation-card).\n\n      '\n    value: unknown\n  size:\n    explanation: 'The exact sizes of the generation models are unknown, but we know\n      that they come in three sizes: small, medium, and large [[Model Card]](https://docs.cohere.ai/generation-card).\n\n      '\n    value: unknown\n  training_emissions:\n    explanation: The emissions of the models are unknown.\n    value: unknown\n  training_hardware:\n    explanation: 'The training hardware wasn''t explicitly announced, but it was reported\n      that Google Cloud teamed up with Cohere on a TPU partnership [[TechCrunch Article]]\n      (https://techcrunch.com/2021/11/17/google-cloud-teams-up-with-nlp-startup-cohere-on-multi-year-partnership/).\n\n      '\n    value: unknown\n  training_time:\n    explanation: The training time for the models are unknown.\n    value: unknown\n  type: model\n  url: none\n- access:\n    explanation: 'The model is available to the public through the Cohere Platform\n      [[Cohere Platform]](https://os.cohere.ai/login).\n\n      '\n    value: limited\n  analysis: The model's performance was analyzed on Hellaswag and COPA, 
as well as\n    several safety benchmarks [[Model Card]](https://docs.cohere.ai/generation-card).\n  created_date:\n    explanation: 'Cohere Command was in beta before this date\n\n      '\n    value: 2023-01-01\n  dependencies:\n  - Cohere Base\n  description: 'This model is a generative model optimized to follow commands in the\n    prompt.\n\n    '\n  feedback:\n    explanation: 'There are no specific feedback mechanisms for the Generation model,\n      but a generic contact email is provided on Cohere website, which is support\n      at cohere.ai.\n\n      '\n    value: unknown\n  intended_uses: 'On the model card, the intended uses are stated as \"interactive\n    autocomplete, augmenting human writing processes, summarization, text rephrasing,\n    and other text-to-text tasks in non-sensitive domains\" [[Model Card]](https://docs.cohere.ai/generation-card).\n\n    '\n  license:\n    explanation: The model likely has a license specifically for Cohere's use.\n    value: unknown\n  modality: text; text\n  model_card: https://docs.cohere.ai/generation-card\n  monitoring: 'The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/generation-card).\n\n    '\n  name: Cohere Command\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses: 'The usage of the model is bound by the Cohere usage guidelines\n    [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
A non-comprehensive\n    list of specific application violating these guidelines are: astroturfing, generation\n    of misinformation and other harmful content, and \"generation of text about people,\n    places, or events without a human-in-the-loop\" [[Model Card]](https://docs.cohere.ai/generation-card).\n\n    '\n  quality_control:\n    explanation: unknown\n    value: unknown\n  size: unknown\n  training_emissions:\n    explanation: unknown\n    value: unknown\n  training_hardware:\n    explanation: unknown\n    value: unknown\n  training_time:\n    explanation: unknown\n    value: unknown\n  type: model\n  url: https://docs.cohere.com/docs/command-beta\n- access:\n    explanation: 'The model is available to the public through the Cohere Platform\n      [[Cohere Platform]](https://cohere.ai/).\n\n      '\n    value: limited\n  analysis: 'The model''s performance was analyzed on several safety benchmarks [[Model\n    Card]](https://docs.cohere.ai/representation-card).\n\n    '\n  created_date:\n    explanation: 'The date the Cohere API was announced on the news [[News Article]]\n      (https://venturebeat.com/2021/11/15/openai-rival-cohere-launches-language-model-api/).\n\n      '\n    value: 2021-11-15\n  dependencies: []\n  description: 'The Embedding Large (English) model is a language model trained by\n    Cohere for tasks requiring embeddings.\n\n    '\n  feedback:\n    explanation: 'There are no specific feedback mechanisms for the Generation model,\n      but a generic contact email is provided on Cohere website, which is support\n      at cohere.ai.\n\n      '\n    value: unknown\n  intended_uses: 'The intended uses are stated as \"estimating semantic similarity\n    between two sentences, choosing a sentence which is most likely to follow another\n    sentence, sentiment analysis, topic extraction, or categorizing user feedback\"\n    on the Cohere model card [[Model Card]](https://docs.cohere.ai/representation-card).\n\n    '\n  license:\n    
explanation: The model likely has a license specifically for Cohere's use.\n    value: unknown\n  modality: text; text\n  model_card: https://docs.cohere.ai/representation-card\n  monitoring: 'The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/representation-card).\n\n    '\n  name: Cohere Embed (English)\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses: 'The usage of the model is bound by the Cohere usage guidelines\n    [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). A non-comprehensive\n    list of specific application violating these guidelines are: extraction of identity\n    and demographic information, building purposefully opaque text classification\n    systems, and \"building downstream classifiers that serve as automated decision-making\n    systems that have real-world consequences on people, where those decisions are\n    made without a human-in-the-loop\" [[Model Card]](https://docs.cohere.ai/representation-card).\n\n    '\n  quality_control: unknown\n  size: unknown\n  training_emissions:\n    explanation: The emissions of the models are unknown.\n    value: unknown\n  training_hardware:\n    explanation: 'The training hardware wasn''t explicitly announced, but it was reported\n      that Google Cloud teamed up with Cohere on a TPU partnership [[TechCrunch Article]]\n      (https://techcrunch.com/2021/11/17/google-cloud-teams-up-with-nlp-startup-cohere-on-multi-year-partnership/).\n\n      '\n    value: unknown\n  training_time:\n    explanation: The training time for the models are unknown.\n    value: unknown\n  type: model\n  url: none\n- access:\n    explanation: 'The model is available to the public through the Cohere Platform\n      [[Cohere Platform]](https://cohere.ai/).\n\n      '\n    value: limited\n  analysis: 'The model''s performance was analyzed on several safety benchmarks [[Model\n    Card]](https://docs.cohere.ai/representation-card).\n\n    '\n  created_date:\n    
explanation: 'The launch of the multilingual embedding model was announced in\n      this [[blog post]] (https://txt.cohere.com/multilingual/).\n\n      '\n    value: 2022-12-12\n  dependencies: []\n  description: 'This model maps text from 100+ languages to a semantic vector space,\n    positioning text with a similar meaning (regardless of language) in close proximity.\n\n    '\n  feedback:\n    explanation: 'There are no specific feedback mechanisms for the Generation model,\n      but a generic contact email is provided on Cohere website, which is support\n      at cohere.ai.\n\n      '\n    value: unknown\n  intended_uses: 'The intended uses are stated as \"estimating semantic similarity\n    between two sentences, choosing a sentence which is most likely to follow another\n    sentence, sentiment analysis, topic extraction, or categorizing user feedback\"\n    on the Cohere model card [[Model Card]](https://docs.cohere.ai/representation-card).\n\n    '\n  license:\n    explanation: 'The asset isn''t released, and hence the license is unknown.\n\n      '''\n    value: unknown\n  modality: text; text\n  model_card: https://docs.cohere.ai/representation-card\n  monitoring: 'The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/representation-card).\n\n    '\n  name: Cohere Embed (Multilingual)\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses: 'The usage of the model is bound by the Cohere usage guidelines\n    [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
A non-comprehensive\n    list of specific application violating these guidelines are: extraction of identity\n    and demographic information, building purposefully opaque text classification\n    systems, and \"building downstream classifiers that serve as automated decision-making\n    systems that have real-world consequences on people, where those decisions are\n    made without a human-in-the-loop\" [[Model Card]](https://docs.cohere.ai/representation-card).\n\n    '\n  quality_control: unknown\n  size: unknown\n  training_emissions:\n    explanation: The emissions of the models are unknown.\n    value: unknown\n  training_hardware:\n    explanation: 'The training hardware wasn''t explicitly announced, but it was reported\n      that Google Cloud teamed up with Cohere on a TPU partnership [[TechCrunch Article]]\n      (https://techcrunch.com/2021/11/17/google-cloud-teams-up-with-nlp-startup-cohere-on-multi-year-partnership/).\n\n      '\n    value: unknown\n  training_time:\n    explanation: The training time for the models are unknown.\n    value: unknown\n  type: model\n  url: https://txt.cohere.ai/multilingual/\n- access:\n    explanation: 'Users can access the Cohere API by signing up on the Cohere website\n      [[Cohere Website]](https://cohere.ai/).\n\n      '\n    value: limited\n  adaptation: unknown\n  created_date:\n    explanation: 'The date the Cohere API was announced on the news [[News Article]]\n      (https://venturebeat.com/2021/11/15/openai-rival-cohere-launches-language-model-api/).\n\n      '\n    value: 2021-11-15\n  dependencies:\n  - Cohere Chat Endpoint\n  - Cohere Generate Endpoint (legacy)\n  - Cohere Embed Endpoint\n  - Cohere Classify Endpoint\n  - Cohere Summarize Endpoint\n  description: 'Cohere API allows users to access the cohere language models and utilize\n    them in their applications.\n\n    '\n  failures: unknown\n  feedback: 'General feedback as well as the violations of the usage guidelines can\n    be reported to 
Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  intended_uses: 'Intended to be used by developers who would like to incorporate\n    NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\n    '\n  license:\n    explanation: '\"Subject to these Terms of Use, we grant you anon-exclusive, non-transferable,\n      non-sublicensable and revocable licence to use the Cohere Solution in accordance\n      with these Terms of Use\" - excerpt from the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: 'All applications developed using the Cohere API is subject to review\n    by Cohere.\n\n    '\n  monthly_active_users: unknown\n  name: Cohere API\n  nationality: Canada\n  organization: Cohere\n  output_space: generation and embeddings\n  prohibited_uses: 'The usage of the API is bound by the Cohere usage guidelines.\n    Disallowed use cases include violence and threats, antisocial and antidemocratic\n    uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making,\n    high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  quality_control: 'The new users of the API get a limited access restricting the\n    sizes of the models as well as the number of tokens that can be used. 
Users are\n    required to go through an internal application to upgrade to full access [[Limited\n    Access]](https://docs.cohere.ai/limited-access).\n\n    '\n  terms_of_service: https://cohere.ai/terms-of-use\n  type: application\n  url: https://cohere.ai/\n  user_distribution: unknown\n- access:\n    explanation: 'Users can access the Cohere API by signing up on the Cohere website\n      [[Cohere Website]](https://cohere.ai/).\n\n      '\n    value: limited\n  adaptation: unknown\n  created_date:\n    explanation: The date the Chat API was announced in Cohere's blog [[Blog Post]](https://txt.cohere.com/chat-with-rag).\n    value: 2023-09-28\n  dependencies:\n  - Cohere Command\n  - Command R\n  - Command R+\n  description: This endpoint generates a text response to a user message in the context\n    of a conversation, optionally grounding its generation with source material from\n    tools like search engines, functions or databases, and attributing its responses\n    with citations.\n  failures: unknown\n  feedback: 'General feedback as well as the violations of the usage guidelines can\n    be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  intended_uses: 'Intended to be used by developers who would like to incorporate\n    NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\n    '\n  license: 'Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\n    '\n  monitoring: 'All applications developed using the Cohere API is subject to review\n    by Cohere.\n\n    '\n  monthly_active_users: unknown\n  name: Cohere Chat Endpoint\n  nationality: Canada\n  organization: Cohere\n  output_space: generation\n  prohibited_uses: 'The usage of the API is bound by the Cohere usage guidelines.\n    Disallowed use cases include violence and threats, antisocial and antidemocratic\n    uses, deceit, attacks on security or privacy, unsafe 
unsupervised uses, decision-making,\n    high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  quality_control: ''\n  terms_of_service: https://cohere.ai/terms-of-use\n  type: application\n  url: https://docs.cohere.com/reference/chat\n  user_distribution: unknown\n- access:\n    explanation: 'Users can access the Cohere API by signing up on the Cohere website\n      [[Cohere Website]](https://cohere.ai/).\n\n      '\n    value: limited\n  adaptation: unknown\n  created_date:\n    explanation: 'The launch of Generate was announced in this [[news article]] (https://venturebeat.com/uncategorized/openai-rival-cohere-launches-language-model-api/).\n\n      '\n    value: 2021-11-15\n  dependencies:\n  - Cohere Base\n  - Cohere Command\n  description: 'This endpoint generates realistic text conditioned on a given input.\n\n    '\n  failures: unknown\n  feedback: 'General feedback as well as the violations of the usage guidelines can\n    be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  intended_uses: 'Intended to be used by developers who would like to incorporate\n    NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\n    '\n  license: 'Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\n    '\n  monitoring: 'All applications developed using the Cohere API is subject to review\n    by Cohere.\n\n    '\n  monthly_active_users: unknown\n  name: Cohere Generate Endpoint (legacy)\n  nationality: Canada\n  organization: Cohere\n  output_space: generation\n  prohibited_uses: 'The usage of the API is bound by the Cohere usage guidelines.\n    Disallowed use cases include violence and threats, antisocial and antidemocratic\n    uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making,\n    high-Risk generations among others [[Usage 
Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  quality_control: 'The new users of the API get a limited access restricting the\n    sizes of the models as well as the number of tokens that can be used. Users are\n    required to go through an internal application to upgrade to full access [[Limited\n    Access]](https://docs.cohere.ai/limited-access).\n\n    '\n  terms_of_service: https://cohere.ai/terms-of-use\n  type: application\n  url: https://docs.cohere.ai/reference/generate\n  user_distribution: unknown\n- access:\n    explanation: 'Users can access the Cohere API by signing up on the Cohere website\n      [[Cohere Website]](https://cohere.ai/).\n\n      '\n    value: limited\n  adaptation: unknown\n  created_date:\n    explanation: 'The launch of Embed was announced in this [[news article]] (https://venturebeat.com/uncategorized/openai-rival-cohere-launches-language-model-api/).\n\n      '\n    value: 2021-11-15\n  dependencies:\n  - Cohere Embed (Multilingual)\n  - Cohere Embed (English)\n  description: 'This endpoint returns text embeddings. 
An embedding is a list of floating\n    point numbers that captures semantic information about the text that it represents.\n\n    '\n  failures: unknown\n  feedback: 'General feedback as well as the violations of the usage guidelines can\n    be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  intended_uses: 'Intended to be used by developers who would like to incorporate\n    NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\n    '\n  license: 'Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\n    '\n  monitoring: 'All applications developed using the Cohere API is subject to review\n    by Cohere.\n\n    '\n  monthly_active_users: unknown\n  name: Cohere Embed Endpoint\n  nationality: Canada\n  organization: Cohere\n  output_space: embedding\n  prohibited_uses: 'The usage of the API is bound by the Cohere usage guidelines.\n    Disallowed use cases include violence and threats, antisocial and antidemocratic\n    uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making,\n    high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  quality_control: 'The new users of the API get a limited access restricting the\n    sizes of the models as well as the number of tokens that can be used. 
Users are\n    required to go through an internal application to upgrade to full access [[Limited\n    Access]](https://docs.cohere.ai/limited-access).\n\n    '\n  terms_of_service: https://cohere.ai/terms-of-use\n  type: application\n  url: https://docs.cohere.ai/reference/embed\n  user_distribution: unknown\n- access:\n    explanation: 'Users can access the Cohere API by signing up on the Cohere website\n      [[Cohere Website]](https://cohere.ai/).\n\n      '\n    value: limited\n  adaptation: unknown\n  created_date:\n    explanation: 'The launch of Classify was announced in this [[blog post]] (https://txt.cohere.com/classification-is-now-available/).\n\n      '\n    value: 2022-05-05\n  dependencies:\n  - Cohere Embed (Multilingual)\n  - Cohere Embed (English)\n  description: 'This endpoint makes a prediction about which label best fits a specified\n    text input. To make a prediction, Classify uses the provided examples of text\n    + label pairs as a reference.\n\n    '\n  failures: unknown\n  feedback: 'General feedback as well as the violations of the usage guidelines can\n    be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  intended_uses: 'Intended to be used by developers who would like to incorporate\n    NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\n    '\n  license: 'Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\n    '\n  monitoring: 'All applications developed using the Cohere API is subject to review\n    by Cohere.\n\n    '\n  monthly_active_users: unknown\n  name: Cohere Classify Endpoint\n  nationality: Canada\n  organization: Cohere\n  output_space: embedding\n  prohibited_uses: 'The usage of the API is bound by the Cohere usage guidelines.\n    Disallowed use cases include violence and threats, antisocial and antidemocratic\n    uses, deceit, attacks on security or privacy, unsafe unsupervised uses, 
decision-making,\n    high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  quality_control: 'The new users of the API get a limited access restricting the\n    sizes of the models as well as the number of tokens that can be used. Users are\n    required to go through an internal application to upgrade to full access [[Limited\n    Access]](https://docs.cohere.ai/limited-access).\n\n    '\n  terms_of_service: https://cohere.ai/terms-of-use\n  type: application\n  url: https://docs.cohere.ai/reference/classify\n  user_distribution: unknown\n- access:\n    explanation: 'Users can access the Cohere API by signing up on the Cohere website\n      [[Cohere Website]](https://cohere.ai/).\n\n      '\n    value: limited\n  adaptation: unknown\n  created_date:\n    explanation: 'The beta launch of summarize was announced in this [[blog post]]\n      (https://txt.cohere.com/summarize-beta/).\n\n      '\n    value: 2023-02-22\n  dependencies: []\n  description: 'This endpoint generates a succinct version of the original text that\n    relays the most important information.\n\n    '\n  failures: unknown\n  feedback: 'General feedback as well as the violations of the usage guidelines can\n    be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  intended_uses: 'Intended to be used by developers who would like to incorporate\n    NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\n    '\n  license: 'Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\n    '\n  monitoring: 'All applications developed using the Cohere API is subject to review\n    by Cohere.\n\n    '\n  monthly_active_users: unknown\n  name: Cohere Summarize Endpoint\n  nationality: Canada\n  organization: Cohere\n  output_space: generation\n  prohibited_uses: 'The usage of the API is bound by the Cohere usage guidelines.\n    
Disallowed use cases include violence and threats, antisocial and antidemocratic\n    uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making,\n    high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\n    '\n  quality_control: 'The new users of the API get a limited access restricting the\n    sizes of the models as well as the number of tokens that can be used. Users are\n    required to go through an internal application to upgrade to full access [[Limited\n    Access]](https://docs.cohere.ai/limited-access).\n\n    '\n  terms_of_service: https://cohere.ai/terms-of-use\n  type: application\n  url: https://docs.cohere.ai/reference/summarize\n  user_distribution: unknown\n- access:\n    explanation: 'The model is available to the public through the Cohere Platform\n      [[Cohere Platform]](https://cohere.ai/).\n\n      '\n    value: limited\n  analysis: Achieves SOTA performances on trusted MTEB and BEIR benchmarks.\n  created_date: 2023-11-02\n  dependencies: []\n  description: As of release, Cohere Embedv3 is Cohere's latest and most advanced\n    embeddings model.\n  feedback: https://huggingface.co/Cohere/Cohere-embed-english-v3.0/discussions\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: https://huggingface.co/Cohere/Cohere-embed-english-v3.0\n  monitoring: none\n  name: Cohere Embedv3 (English)\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://txt.cohere.com/introducing-embed-v3/\n- access: open\n  analysis: Evaluated on standard LLM and multilingual benchmarks in comparison to\n    SotA models.\n  created_date: 2024-02-12\n  dependencies:\n  - mT5\n  - Aya Dataset\n  description: Aya is a massively multilingual generative language model that follows\n    instructions in 
101 languages of which over 50% are considered as lower-resourced.\n  feedback: https://huggingface.co/CohereForAI/aya-101/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/CohereForAI/aya-101\n  monitoring: unknown\n  name: Aya\n  nationality: unknown\n  organization: Cohere for AI, Cohere, Brown University, Carnegie Mellon University,\n    MIT\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2402.07827.pdf\n- access:\n    explanation: Model weights are made available through the [Hugging Face Model\n      Hub](https://huggingface.co/CohereForAI/c4ai-command-r-v01).\n    value: open\n  analysis: Command R was evaluated on standard English and multilingual (translated\n    MMLU, FLoRES) benchmarks. It was also evaluated on KILT, HotpotQA and Bamboogle.\n  created_date:\n    explanation: The date that Command R was announced.\n    value: 2024-03-11\n  dependencies: []\n  description: Command R is a scalable generative model targeting RAG and Tool Use\n    to enable production-scale AI for enterprise. It is an auto-regressive large language\n    model that uses an optimized transformer architecture. After pretraining, supervised\n    fine-tuning (SFT) and preference training were applied to align model behavior\n    to human preferences for helpfulness and safety. Command R is part of a family\n    of open weight releases from Cohere and Cohere For AI. 
Its larger companion model\n    is [Command R+](https://huggingface.co/CohereForAI/c4ai-command-r-plus).\n  feedback: '[https://huggingface.co/CohereForAI/c4ai-command-r-v01/discussions](https://huggingface.co/CohereForAI/c4ai-command-r-v01/discussions)'\n  intended_uses: 'Command R is optimized to perform well up to 128K context length\n    and in the following languages: English, French, Spanish, Italian, German, Brazilian\n    Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. It has been trained\n    for conversational tool use and grounded generation capabilities. The open weights\n    release of Command R is intended to make community-based research efforts more\n    accessible.'\n  license: CC BY NC 4.0\n  modality: text; text\n  model_card: https://huggingface.co/CohereForAI/c4ai-command-r-v01\n  monitoring: unknown\n  name: Command R\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses: 'Usage of this model requires adherence to [C4AI''s Acceptable\n    Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy). Usage via\n    the Cohere Platform is subject to Cohere''s [Terms of Use](https://cohere.com/terms-of-use)\n    and [Usage Guidelines](https://docs.cohere.com/docs/usage-guidelines). 
A non-exhaustive\n    list of disallowed use cases includes:\n\n    - Violence and harm\n\n    - Harassment and abuse\n\n    - Sexual exploitation, harm, or abuse\n\n    - Sensitive information\n\n    - Fraud and deception\n\n    - Synthetic data for commercial uses\n\n\n    Misuse should be reported to Cohere at safety at cohere.ai.'\n  quality_control: Comprehensive guidance for working with Command R's tool use and\n    grounded generation prompt template can be found in Cohere's [documentation](https://docs.cohere.com/docs/prompting-command-r).\n  size: 35B parameters (dense)\n  terms_of_service: https://cohere.ai/terms-of-use\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://txt.cohere.com/command-r/\n- access: open\n  analysis: unknown\n  created_date: 2024-02-09\n  datasheet: https://huggingface.co/datasets/CohereForAI/aya_dataset\n  dependencies: []\n  description: The Aya Dataset is a dataset that consists of original human-curated\n    prompt-completion pairs written by fluent speakers of 65 languages.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/CohereForAI/aya_dataset/discussions\n  included: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text\n  monitoring: unknown\n  name: Aya Dataset\n  nationality: unknown\n  organization: Cohere for AI, Beijing Academy of Artificial Intelligence, Cohere,\n    Binghamton University\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 204k human-annotated prompt-completion pairs\n  type: dataset\n  url: https://arxiv.org/pdf/2402.06619.pdf\n- access: limited\n  analysis: Evaluated on code retrieval and data retrieval capabilities, with improvements\n    compared to the standard in both.\n  created_date: 2024-04-11\n  dependencies: []\n  description: Rerank 3 is a new foundation model for efficient enterprise search\n    and retrieval with 4k context length.\n  feedback: none\n  intended_uses: Efficient 
enterprise search and retrieval.\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: unknown\n  name: Rerank 3\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://cohere.com/blog/rerank-3\n- access: open\n  analysis: Evaluated across 23 languages with the highest results in all tasks and\n    languages compared to other multilingual language models.\n  created_date: 2024-05-31\n  dependencies: []\n  description: Aya 23 is an open weights research release of an instruction fine-tuned\n    model with multilingual capabilities. It focuses on pairing a highly performant\n    pre-trained Command family of models with the recently released Aya Collection.\n    This model supports 23 languages.\n  feedback: https://huggingface.co/CohereForAI/aya-23-35B/discussions\n  intended_uses: This model is designed to be used for multilingual tasks covering\n    23 languages.\n  license: CC-BY-NC\n  modality: text; text\n  model_card: https://huggingface.co/CohereForAI/aya-23-35B\n  monitoring: unknown\n  name: Aya 23\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: 35B parameters\n  training_emissions: unknown\n  training_hardware: TPUv4 chips with up to 128 pod slices\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2405.15032\n- access:\n    explanation: Model weights are made available through the [Hugging Face Model\n      Hub](https://huggingface.co/CohereForAI/c4ai-command-r-plus).\n    value: open\n  analysis: Command R+ was evaluated on standard English and multilingual (e.g. FLoRES,\n    WMT23) benchmarks. It was also evaluated on multihop reasoning tasks (e.g. HotpotQA,\n    Bamboogle, StrategyQA), and tool use (e.g. 
ToolTalk, Berkeley Function-Calling).\n  created_date:\n    explanation: The date that Command R+ was announced.\n    value: 2024-04-03\n  dependencies: []\n  description: Command R+ is a state-of-the-art RAG-optimized model designed to tackle\n    enterprise-grade workloads. It is an auto-regressive large language model that\n    uses an optimized transformer architecture. After pretraining, supervised fine-tuning\n    (SFT) and preference training were applied to align model behavior to human preferences\n    for helpfulness and safety. Command R+ is part of a family of open weight releases\n    from Cohere and Cohere For AI. Its smaller companion model is [Command R](https://huggingface.co/CohereForAI/c4ai-command-r-v01).\n  feedback: '[https://huggingface.co/CohereForAI/c4ai-command-r-plus/discussions](https://huggingface.co/CohereForAI/c4ai-command-r-plus/discussions)'\n  intended_uses: 'Command R+ is optimized to perform well up to 128K context length\n    and in the following languages: English, French, Spanish, Italian, German, Brazilian\n    Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. It has been trained\n    for conversational tool use and grounded generation capabilities. The open weights\n    release of Command R+ is intended to make community-based research efforts more\n    accessible.'\n  license: CC BY NC 4.0\n  modality: text; text\n  model_card: https://huggingface.co/CohereForAI/c4ai-command-r-plus\n  monitoring: unknown\n  name: Command R+\n  nationality: Canada\n  organization: Cohere\n  prohibited_uses: 'Usage of this model requires adherence to [C4AI''s Acceptable\n    Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy). Usage via\n    the Cohere Platform is subject to Cohere''s [Terms of Use](https://cohere.com/terms-of-use)\n    and [Usage Guidelines](https://docs.cohere.com/docs/usage-guidelines). 
A non-exhaustive\n    list of disallowed use cases includes:\n\n    - Violence and harm\n\n    - Harassment and abuse\n\n    - Sexual exploitation, harm, or abuse\n\n    - Sensitive information\n\n    - Fraud and deception\n\n    - Synthetic data for commercial uses\n\n\n    Misuse should be reported to Cohere at safety at cohere.ai.'\n  quality_control: Comprehensive guidance for working with Command R+'s tool use and\n    grounded generation prompt template can be found in Cohere's [documentation](https://docs.cohere.com/docs/prompting-command-r).\n  size: 104B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://txt.cohere.com/command-r-plus-microsoft-azure/\n"
  },
  {
    "path": "assets/columbia.yaml",
    "content": "- access: open\n  analysis: Evaluated on wide range of tasks using own evaluation benchmarks.\n  created_date: 2022-11-20\n  dependencies:\n  - AlphaFold2\n  - OpenProteinSet\n  description: OpenFold is an open source recreation of AlphaFold2.\n  feedback: ''\n  intended_uses: ''\n  license: CC BY 4.0\n  modality: amino acid sequence; protein structure\n  model_card: ''\n  monitoring: ''\n  name: OpenFold\n  nationality: USA\n  organization: Columbia\n  prohibited_uses: ''\n  quality_control: ''\n  size: ''\n  training_emissions: unknown\n  training_hardware: Single A100 NVIDIA GPU\n  training_time: 50,000 GPU hours\n  type: model\n  url: https://www.biorxiv.org/content/10.1101/2022.11.20.517210v2\n- access: open\n  analysis: Evaluated on the object hallucination benchmark and compared to GPT-4V.\n  created_date: 2023-10-11\n  dependencies:\n  - CLIP\n  - Vicuna\n  description: Ferret is a Multimodal Large Language Model (MLLM) capable of understanding\n    spatial referring of any shape or granularity within an image and accurately grounding\n    open-vocabulary descriptions.\n  feedback: none\n  intended_uses: ''\n  license:\n    explanation: License can be found at https://github.com/apple/ml-ferret/blob/main/LICENSE\n    value: Apple\n  modality: image, text; image, text\n  model_card: none\n  monitoring: none\n  name: Ferret\n  nationality: unknown\n  organization: Columbia, Apple AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters\n  training_emissions: unknown\n  training_hardware: 8 A100 GPUs\n  training_time: 2.5 to 5 days\n  type: model\n  url: https://arxiv.org/pdf/2310.07704.pdf\n"
  },
  {
    "path": "assets/compute_01_research.yaml",
    "content": "- access:\n    explanation: to ensure AGI will be open-source, transparent, and accessible\n    value: open\n  analysis: ''\n  created_date: 2024-10-11\n  dependencies:\n  - DeepMind's Distributed Low-Communication (DiLoCo)\n  description: INTELLECT-1 is a decentralized training model with 10 billion parameters,\n    designed to be open-source and transparent, allowing global contributors to participate\n    in its computation.\n  feedback: unknown\n  intended_uses: Training globally distributed AI models in decentralized environments.\n  license: Apache 2.0\n  modality: text; text\n  model_card: unknown\n  monitoring: unknown\n  name: INTELLECT-1\n  nationality: USA\n  organization: PrimeIntellect\n  prohibited_uses: ''\n  quality_control: ''\n  size:\n    explanation: a 10-billion-parameter model\n    value: 10B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.primeintellect.ai/blog/intellect-1\n"
  },
  {
    "path": "assets/continue.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: 2023-07-26\n  dependencies:\n  - GPT-4 API\n  - Code Llama\n  - Claude API\n  - WizardCoder\n  - PaLM API\n  description: \"Continue is the open-source autopilot for software development. It\\\n    \\ is an IDE extension that brings the power of ChatGPT to VS Code and JetBrains.\\\n    \\ It\\u2019s built to be deeply customizable and continuously learn from development\\\n    \\ data.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  monitoring: ''\n  monthly_active_users: ''\n  name: Continue\n  nationality: USA\n  organization: Continue Dev, Inc.\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://github.com/continuedev/continue/blob/main/LICENSE\n  type: application\n  url: https://continue.dev\n  user_distribution: ''\n"
  },
  {
    "path": "assets/cresta.yaml",
    "content": "- access: closed\n  analysis: Outperforms GPT-4 in common sense and reasoning tasks on the basis of\n    both efficiency and accuracy.\n  created_date: 2023-06-20\n  dependencies:\n  - GPT-4\n  - Claude\n  - Falcon-40B\n  description: Ocean-1 is the culmination of Cresta's experience in deploying generative\n    AI systems for large enterprises and signifies their latest milestone in advancing\n    the cutting edge AI technology for customer facing conversations.\n  feedback: none\n  intended_uses: Acting as a contact center chatbot agent.\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: unknown\n  name: Ocean-1\n  nationality: USA\n  organization: Cresta\n  prohibited_uses: none\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://cresta.com/blog/introducing-ocean-1-worlds-first-contact-center-foundation-model/\n"
  },
  {
    "path": "assets/databricks.yaml",
    "content": "- access:\n    explanation: 'Model training code can be accessed at the official Dolly repository.\n      Trained weights can be requested at hello-dolly@databricks.com. [[Dolly Repository]](https://github.com/databrickslabs/dolly).\n\n      '\n    value: open\n  analysis: '\"We evaluated Dolly on the instruction-following capabilities described\n    in the InstructGPT paper that ChatGPT is based on and found that it exhibits many\n    of the same qualitative capabilities, including text generation, brainstorming\n    and open Q&A.\" [[Databricks Blog Post]] (https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html).\n\n    '\n  created_date:\n    explanation: 'The date the model was announced in the [[Databricks Blog Post]](https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html).\n\n      '\n    value: 2023-03-24\n  dependencies:\n  - GPT-J\n  - Alpaca dataset\n  description: \"\\\"Databricks\\u2019 Dolly, a large language model trained on the Databricks\\n\\\n    \\ Machine Learning Platform, demonstrates that a two-years-old open source\\n model\\\n    \\ (GPT-J) can, when subjected to just 30 minutes of fine tuning on a\\n focused\\\n    \\ corpus of 50k records (Stanford Alpaca), exhibit surprisingly\\n high quality\\\n    \\ instruction following behavior not characteristic of the\\n foundation model\\\n    \\ on which it is based.\\\"\\n [[Dolly Repository]](https://github.com/databrickslabs/dolly).\\n\"\n  feedback: https://github.com/databrickslabs/dolly/issues\n  intended_uses: '\"Dolly is intended exclusively for research purposes and is not\n    licensed for commercial use.\" [[Limitations]](https://github.com/databrickslabs/dolly#limitations).\n\n    '\n  license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: none\n  name: Dolly\n  nationality: USA\n  organization: Databricks\n  prohibited_uses: 'Authors note the following 
limitations of the model: \"The Dolly\n    model family is under active development, and so any list of shortcomings is unlikely\n    to be exhaustive, but we include known limitations and misfires here as a means\n    to document and share our preliminary findings with the community. In particular,\n    dolly-6b struggles with syntactically complex prompts, mathematical operations,\n    factual errors, dates and times, open-ended question answering, hallucination,\n    enumerating lists of specific length, and stylistic mimicry.\" [[Limitations]](https://github.com/databrickslabs/dolly#limitations).\n\n    '\n  quality_control: none\n  size: 6B parameters (dense)\n  training_emissions: unknown\n  training_hardware:\n    explanation: 'According to [[Model Overview]](https://github.com/databrickslabs/dolly#model-overview)\n\n      '\n    value: A single NDasrA100_v4 machine with 8x A100 40GB GPUs\n  training_time:\n    explanation: 'According to [[Model Overview]](https://github.com/databrickslabs/dolly#model-overview)\n\n      '\n    value: 30 minutes\n  type: model\n  url: https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html\n- access: open\n  analysis: DBRX outperforms established open-source and open-weight base models on\n    the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval.\n    Full evaluation details can be found in the corresponding technical blog post.\n  created_date: 2024-03-27\n  dependencies: []\n  description: DBRX is a transformer-based decoder-only large language model (LLM)\n    that was trained using next-token prediction by Databricks. It uses a fine-grained\n    mixture-of-experts (MoE) architecture with 132B total parameters of which 36B\n    parameters are active on any input. 
DBRX only accepts text-based inputs and produces\n    text-based outputs.\n  feedback: https://huggingface.co/databricks/dbrx-base/discussions\n  intended_uses: DBRX models are open, general-purpose LLMs intended and licensed\n    for both commercial and research applications. They can be further fine-tuned\n    for various domain-specific natural language and coding tasks.\n  license: Databricks Open Model License\n  modality: text; text\n  model_card: https://huggingface.co/databricks/dbrx-base\n  monitoring: unknown\n  name: DBRX\n  nationality: USA\n  organization: Databricks\n  prohibited_uses: DBRX models are not intended to be used out-of-the-box in non-English\n    languages, and do not support native code execution, function calling or any use\n    that violates applicable laws or regulations or is otherwise prohibited by the\n    Databricks Open Model License and Databricks Open Model Acceptable Use Policy.\n  quality_control: Recommendations provided for retrieval augmented generation (RAG)\n    in scenarios where accuracy and fidelity are important and additional testing\n    around safety in the context of the specific application and domain is suggested.\n  size: 132B parameters (sparse)\n  training_emissions: unknown\n  training_hardware: 3072 NVIDIA H100s connected by 3.2Tbps Infiniband\n  training_time: 3 months\n  type: model\n  url: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm\n"
  },
  {
    "path": "assets/deci.yaml",
    "content": "- access: open\n  analysis: Evaluated on the OpenLLM benchmarks and, on release, outperforms all other\n    7B models on the OpenLLM Leaderboard.\n  created_date: 2023-12-12\n  dependencies: []\n  description: DeciLM is a LLM that on release ranks as the fastest and most accurate\n    model of its size.\n  feedback: none\n  intended_uses: This model is intended for commercial and research use in English\n    and can be fine-tuned for use in other languages.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://deci.ai/model-zoo/decilm-7b/\n  monitoring: unknown\n  name: DeciLM\n  nationality: Israel\n  organization: Deci\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: NVIDIA A10 GPUs\n  training_time: unknown\n  type: model\n  url: https://deci.ai/blog/introducing-decilm-7b-the-fastest-and-most-accurate-7b-large-language-model-to-date\n"
  },
  {
    "path": "assets/deepmind.yaml",
    "content": "- access:\n    explanation: 'The dataset access is limited to DeepMind researchers [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.C).\n\n      '\n    value: closed\n  analysis: 'MassiveText data was analyzed for toxicity, language distribution, URL\n    breakdown, and tokenizer compression rates on the subsets [[Section A.2]](https://arxiv.org/pdf/2112.11446.pdf#subsection.A.2).\n\n    '\n  created_date:\n    explanation: 'The date that Gopher was announced [[DeepMind Blog Post]] (https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval).\n\n      '\n    value: 2021-12-08\n  datasheet: https://arxiv.org/pdf/2112.11446.pdf#subsection.A.5\n  dependencies: []\n  description: 'The MassiveText dataset was used to train the Gopher model.\n\n    '\n  excluded: 'Documents that are not in English are excluded.\n\n    '\n  feedback:\n    explanation: 'The internal feedback mechanisms for MassiveText are unknown [[Model\n      Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.C).\n\n      '\n    value: unknown\n  included: 'MassiveText data come from 6 sources: MassiveWeb (48%), Books (27%),\n    C4 (10%), News (10%), GitHub (3%), and Wikipedia (2%). 
MassiveWeb is a web text\n    corpus curated for MassiveText.\n\n    '\n  intended_uses: 'Pre-training of language models by DeepMind researchers [[Model\n    Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.C).\n\n    '\n  license:\n    explanation: 'The dataset likely has a license specifically for DeepMind''s use,\n      based on the information provided in the datasheet [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#subsection.A.5).\n\n      '\n    value: unknown\n  modality: code, text\n  monitoring:\n    explanation: 'There is no information on how DeepMind is internally monitoring\n      the use of the dataset.\n\n      '\n    value: unknown\n  name: MassiveText\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses:\n    explanation: 'There are no known prohibited uses of the dataset, but the authors\n      state that it should not be used for training models with multilingual capabilities\n      as it only contains the English language [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.C).\n\n      '\n    value: unknown\n  quality_control: \"The authors use simple heuristics for filtering low quality documents\\\n    \\ as opposed to relying on a classifier based on a \\\"gold\\\" set such as the English\\\n    \\ Wikipedia, which could \\\"inadvertently bias towards a certain demographic or\\\n    \\ erase certain dialects or sociolects from representation.\\\" MassiveWeb subset\\\n    \\ was filtered using Google\\u2019s SafeSearch filter, preferring it over word\\\n    \\ filters that \\\"disproportionately filter out inoffensive content associated with\\\n    \\ minority groups. 
MassiveWeb was filtered further for word or phrase repetitions.\\\n    \\ All the subsets were filtered for document deduplication and test set contamination\\\"\\\n    \\ [[Appendix A]](https://arxiv.org/pdf/2112.11446.pdf#appendix.A).\\n\"\n  sample: []\n  size: 10.5 TB\n  type: dataset\n  url: https://arxiv.org/pdf/2112.11446.pdf\n- access:\n    explanation: 'The dataset access is limited to DeepMind researchers [[Datasheet]](https://arxiv.org/pdf/2204.14198.pdf#appendix.F).\n\n      '\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: 'The date that Flamingo was posted to arXiv [[arXiv]] (https://arxiv.org/pdf/2204.14198.pdf).\n\n      '\n    value: 2022-04-29\n  datasheet: https://arxiv.org/pdf/2204.14198.pdf#appendix.F\n  dependencies: []\n  description: 'M3W (MultiModal MassiveWeb) is a dataset used to train Flamingo and other\n    vision-language models; it was created by researchers and engineers.\n\n    '\n  excluded: unknown\n  feedback:\n    explanation: 'No feedback mechanism is mentioned in the datasheet [[Datasheet]](https://arxiv.org/pdf/2204.14198.pdf#appendix.F).\n\n      '\n    value: none\n  included: 'M3W has interleaved images (185M) and text (182GB) from the web.\n\n    '\n  intended_uses: 'Pre-training of vision and language models by DeepMind researchers\n    [[Datasheet]](https://arxiv.org/pdf/2204.14198.pdf#appendix.F).\n\n    '\n  license:\n    explanation: 'The dataset likely has a license specifically for DeepMind''s use,\n      based on the information provided in the datasheet [[Datasheet]](https://arxiv.org/pdf/2204.14198.pdf#appendix.F).\n\n      '\n    value: unknown\n  modality: code, text\n  monitoring:\n    explanation: 'There is no information on how DeepMind is internally monitoring\n      the use of the dataset.\n\n      '\n    value: unknown\n  name: M3W\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses:\n    explanation: 'There are no known prohibited uses of the dataset 
[[Datasheet]](https://arxiv.org/pdf/2204.14198.pdf#appendix.F).\n\n      '\n    value: unknown\n  quality_control: 'The authors provide a basic description of data processing and\n    cleaning.\n\n    '\n  sample: []\n  size: 182GB Text, 185M Images\n  type: dataset\n  url: https://arxiv.org/pdf/2204.14198.pdf\n- access:\n    explanation: 'The full dataset is not directly provided by the authors, though\n      some underlying data is public whereas other data (e.g. MassiveText) is not.\n\n      '\n    value: closed\n  analysis: 'The Gato dataset compiles many datasets introduced in prior works, with\n    associated analyses.\n\n    '\n  created_date:\n    explanation: 'The date that Gato was announced [[DeepMind Blog Post]] (https://www.deepmind.com/blog/a-generalist-agent).\n\n      '\n    value: 2022-05-12\n  datasheet: none\n  dependencies:\n  - MassiveText\n  description: 'The Gato datasets are a collection of data used to train the Gato\n    model.\n\n    '\n  excluded:\n    explanation: 'No specific filtering is mentioned in the Gato paper.\n\n      '\n    value: none\n  feedback:\n    explanation: 'There is no mention of feedback mechanisms either internally or\n      externally.\n\n      '\n    value: none\n  included: 'The full composition of the dataset across individual sources can be\n    found in the paper.\n\n    '\n  intended_uses:\n    explanation: 'There are no known intended uses of the dataset stated by authors\n      beyond training Gato.\n\n      '\n    value: unknown\n  license:\n    explanation: 'The datasets have individual licenses, but no overall license is\n      mentioned by the authors.\n\n      '\n    value: unknown\n  modality: image, text, robotics trajectories, simulated control tasks\n  monitoring:\n    explanation: 'There is no mention of how DeepMind is internally monitoring the\n      use of the dataset.\n\n      '\n    value: none\n  name: Gato dataset\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses:\n    
explanation: 'There are no known prohibited uses of the dataset stated by authors.\n\n      '\n    value: unknown\n  quality_control: unknown\n  sample: []\n  size: 10.5 TB Text, 2.2B Text-Image pairs, 1.5T tokens of simulated control, 500k\n    robotics trajectories\n  type: dataset\n  url: https://www.deepmind.com/blog/a-generalist-agent\n- access:\n    explanation: Models are available for download from the [[Github repository]](https://github.com/deepmind/alphafold)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: The date the model paper was released\n    value: 2021-07-15\n  dependencies:\n  - Protein Data Bank\n  description: AlphaFold2 is a protein language model trained on protein sequences\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: The license is provided in the [[Github repository]](https://github.com/deepmind/alphafold)\n    value: Apache 2.0\n  modality: amino acid sequence; protein structure\n  model_card: none\n  monitoring: ''\n  name: AlphaFold2\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 93M parameters (dense)\n  training_emissions: ''\n  training_hardware:\n    explanation: Described in paper\n    value: 128 TPUv3 cores\n  training_time:\n    explanation: Training takes \"approximately 1 week\" and finetuning takes \"approximately\n      4 days\"\n    value: 11 days\n  type: model\n  url: https://www.nature.com/articles/s41586-021-03819-2\n- access:\n    explanation: 'The model has not been released and no discussion of release is\n      stated in the model card [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E).\n\n      '\n    value: closed\n  analysis: 'Model performance was evaluated on image and video datasets primarily,\n    including dialogue.\n\n    '\n  created_date:\n    explanation: 'The date that Flamingo was posted to arXiv [[arXiv]] (https://arxiv.org/abs/2204.14198).\n\n      '\n    value: 2022-04-29\n  
dependencies:\n  - M3W\n  - ALIGN\n  - LTIP\n  - VTP\n  - Chinchilla\n  description: 'Flamingo is a Visual Language Model using the Transformer architecture\n    that is intended for few-shot learning.\n\n    '\n  feedback:\n    explanation: 'No contact information is provided for feedback in the model card\n      [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E).\n\n      '\n    value: none\n  intended_uses: 'The intended uses are stated in the model card: \"The primary use\n    is research on visual language models (VLM), including: research on VLM applications\n    like classification, captioning or visual question answering, understanding how\n    strong VLMs can contribute to AGI, advancing fairness and safety research in the\n    area of multimodal research, and understanding limitations of current large VLMs.\"\n    [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E).\n\n    '\n  license:\n    explanation: 'No license is provided in the model card [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E).\n\n      '\n    value: unknown\n  modality: image, text; text\n  model_card: https://arxiv.org/pdf/2204.14198.pdf#appendix.E\n  monitoring:\n    explanation: 'There is no information on how DeepMind is internally monitoring\n      the use of the model.\n\n      '\n    value: unknown\n  name: Flamingo\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: 'The model card lists the following as out of scope uses of the\n    model: \"Uses of the model for visually conditioned language generation in harmful\n    or deceitful settings. 
Broadly speaking, the model should not be used for downstream\n    applications without further safety and fairness mitigations specific to each\n    application.\" [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E).\n\n    '\n  quality_control:\n    explanation: 'Reported in the mitigations in the model card [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A).\n\n      '\n    value: none\n  size: 80B parameters (dense)\n  training_emissions:\n    explanation: 'Authors do not report the training emissions.\n\n      '\n    value: unknown\n  training_hardware:\n    explanation: 'Reported in the paper checklist [[Checklist]](https://arxiv.org/pdf/2204.14198.pdf).\n\n      '\n    value: TPU\n  training_time:\n    explanation: 'Reported in the paper checklist [[Checklist]](https://arxiv.org/pdf/2204.14198.pdf).\n\n      '\n    value: 15 days on 1536 TPUs\n  type: model\n  url: https://arxiv.org/pdf/2204.14198.pdf\n- access:\n    explanation: DeepMind does not provide access to AlphaCode to external researchers\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: The date the model paper was released\n    value: 2022-02-02\n  dependencies: []\n  description: AlphaCode is an autoregressive language model trained on code\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: ''\n    value: unknown\n  modality: text; code\n  model_card: ''\n  monitoring: ''\n  name: AlphaCode\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 41B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2203.07814\n- access:\n    explanation: 'The model access is limited to DeepMind researchers. 
The model won''t\n      be released to the public [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B).\n\n      '\n    value: closed\n  analysis: 'Model performance was evaluated and analyzed on 152 NLP tasks including:\n    Language Modelling (20), Reading Comprehension (3), Fact Checking (3), Question\n    Answering (3), Common Sense (4), MMLU (57), BIG-bench (62) [[Section 4]](https://arxiv.org/pdf/2112.11446.pdf#section.4);\n    on toxicity and bias datasets [[Section 5]](https://arxiv.org/pdf/2112.11446.pdf#section.5);\n    and on dialogue tasks [[Section 6]](https://arxiv.org/pdf/2112.11446.pdf#section.6).\n\n    '\n  created_date:\n    explanation: 'The date that Gopher was announced [[DeepMind Blog Post]] (https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval).\n\n      '\n    value: 2021-12-08\n  dependencies:\n  - MassiveText\n  description: 'Gopher is an autoregressive language model based on the Transformer\n    architecture with two modifications: using RMSNorm instead of LayerNorm and using\n    relative positional encoding scheme instead of absolute positional encodings [[Section\n    3]](https://arxiv.org/pdf/2112.11446.pdf#subsection.3.1).\n\n    '\n  feedback: 'The feedback for the model can be provided at the email linked in the\n    model card, geoffreyi at google.com [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B).\n\n    '\n  intended_uses: 'The intended uses are stated in the Gopher model card: \"The primary\n    use is research on language models, including: research on NLP applications like\n    machine translation and question answering, understanding how strong language\n    models can contribute to AGI, advancing fairness and safety research, and understanding\n    limitations of current LLMs\" [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B).\n\n    '\n  license:\n    explanation: 'The model likely has a license specifically for DeepMind''s use,\n 
     based on the information provided in the model card [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B).\n\n      '\n    value: unknown\n  modality: text; code\n  model_card: https://arxiv.org/pdf/2112.11446.pdf#appendix.B\n  monitoring:\n    explanation: 'There is no information on how DeepMind is internally monitoring\n      the use of the model.\n\n      '\n    value: unknown\n  name: Gopher\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: 'The model card lists the following as out of scope uses of the\n    model: \"for language generation in harmful or deceitful settings. More generally,\n    the model should not be used for downstream applications without further safety\n    and fairness mitigations\" [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B).\n\n    '\n  quality_control: none\n  size: 280B parameters (dense)\n  training_emissions:\n    explanation: 'The training emission estimate from the paper [[Section F]](https://arxiv.org/pdf/2112.11446.pdf#appendix.F)\n\n      '\n    value: 380 tCO2e\n  training_hardware:\n    explanation: 'Reported in the paper [[Section F]](https://arxiv.org/pdf/2112.11446.pdf#appendix.F).\n\n      '\n    value: TPUv3 pods\n  training_time:\n    explanation: 'The authors reported the training petaflops for all of the 4 different\n      sizes of the model. For the 280B parameter model, the petaflops was reported\n      as 6.31E+08. We compute the Gopher''s training time in petaflop/s-day as 6.31E+08\n      / (60*60*24) = 7303.24 petaflop/s-day.\n\n      '\n    value: 7303.24 petaflop/s-day\n  type: model\n  url: https://arxiv.org/pdf/2112.11446.pdf\n- access:\n    explanation: 'The model access is limited to DeepMind researchers. 
The model won''t\n      be released to the public [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B).\n\n      '\n    value: closed\n  analysis: 'Model performance was evaluated and analyzed on many NLP tasks including\n    language modeling, reading comprehension, question answering, commonsense-intensive\n    tasks, and the BIG-Bench and MMLU meta-benchmarks.\n\n    '\n  created_date:\n    explanation: 'The date that Chinchilla was posted on arXiv [[arXiv]] (https://arxiv.org/abs/2203.15556).\n\n      '\n    value: 2022-03-29\n  dependencies:\n  - MassiveText\n  description: 'Chinchilla is an autoregressive language model based on the Transformer\n    architecture with improved scaling laws.\n\n    '\n  feedback: 'The feedback for the model can be provided at the email linked in the\n    model card, {jordanhoffmann, sborgeaud, amensch,sifre} at deepmind.com [[Model\n    Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I).\n\n    '\n  intended_uses: 'The intended uses are stated in the Chinchilla model card: \"The\n    primary use is research on language models, including: research on the scaling\n    behaviour of language models along with those listed in Gopher paper\" [[Model\n    Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I).\n\n    '\n  license:\n    explanation: 'The model likely has a license specifically for DeepMind''s use,\n      based on the information provided in the model card [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I).\n\n      '\n    value: unknown\n  modality: text; code\n  model_card: https://arxiv.org/pdf/2203.15556.pdf\n  monitoring:\n    explanation: 'There is no information on how DeepMind is internally monitoring\n      the use of the model.\n\n      '\n    value: Unknown\n  name: Chinchilla\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: 'The model card lists the following as out of scope uses of the\n    model: \"for language generation in harmful or deceitful 
settings. More generally,\n    the model should not be used for downstream applications without further safety\n    and fairness mitigations\" [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I).\n\n    '\n  quality_control: none\n  size: 70B parameters (dense)\n  training_emissions:\n    explanation: 'Authors do not report the training emissions.\n\n      '\n    value: Unknown\n  training_hardware:\n    explanation: 'Reported in the paper [[Section 4]](https://arxiv.org/pdf/2203.15556.pdf).\n\n      '\n    value: TPUv3/TPUv4 pods\n  training_time:\n    explanation: 'The authors reported the training petaflops for all models, including\n      hypothetical larger models. For the 70B parameter model, the petaflops was reported\n      as 5.76E+08. We compute Chinchilla''s training time in petaflop/s-day as 5.76E+08\n      / (60*60*24) = 6666.67 petaflop/s-day.\n\n      '\n    value: 6666.67 petaflop/s-day\n  type: model\n  url: https://arxiv.org/pdf/2203.15556.pdf\n- access:\n    explanation: 'The model access is limited to DeepMind researchers. 
The model won''t\n      be released to the public [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A).\n\n      '\n    value: closed\n  analysis: 'Model performance was evaluated on simulated and robotics tasks primarily,\n    including out-of-distribution and skill generalization.\n\n    '\n  created_date:\n    explanation: 'The date that Gato was announced [[DeepMind Blog Post]] (https://www.deepmind.com/blog/a-generalist-agent).\n\n      '\n    value: 2022-05-12\n  dependencies:\n  - Gato dataset\n  description: 'Gato is a generalist agent based on sequence modeling using the Transformer\n    architecture to implement multi-modal, multi-task, multi-embodiment generalist\n    policy.\n\n    '\n  feedback: 'The feedback for the model can be provided at the email linked in the\n    model card, reedscot at google.com [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A).\n\n    '\n  intended_uses: 'The intended uses are stated in the Gato model card: \"Learn to\n    accomplish a wide variety of tasks from expert demonstrations, such as playing\n    video games, controlling simulated embodiments, and real world block stacking.\"\n    [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A).\n\n    '\n  license: unknown\n  modality: image, text; image, text, robotics trajectories\n  model_card: https://openreview.net/pdf?id=1ikK0kHjvj#appendix.B\n  monitoring:\n    explanation: 'There is no information on how DeepMind is internally monitoring\n      the use of the model.\n\n      '\n    value: unknown\n  name: Gato\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: 'The model card lists the following as out of scope uses of the\n    model: \"Not intended for commercial or production use. 
Military uses are strictly\n    prohibited.\" [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A).\n\n    '\n  quality_control:\n    explanation: 'Reported in the mitigations in the model card [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A).\n\n      '\n    value: none\n  size: 1.2B parameters (dense)\n  training_emissions:\n    explanation: 'Authors do not report the training emissions.\n\n      '\n    value: unknown\n  training_hardware:\n    explanation: 'Reported in the paper [[Section 2.3]](https://openreview.net/pdf?id=1ikK0kHjvj).\n\n      '\n    value: 16x16 TPU v3 slice\n  training_time:\n    explanation: 'Reported in the paper [[Section 2.3]](https://openreview.net/pdf?id=1ikK0kHjvj).\n\n      '\n    value: 4 days on a 16x16 TPU v3 slice\n  type: model\n  url: https://www.deepmind.com/blog/a-generalist-agent\n- access: closed\n  analysis: ''\n  created_date: 2022-09-28\n  dependencies:\n  - Chinchilla\n  - Google Search\n  - Sparrow Rule reward model\n  - Sparrow Preference reward model\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'The asset isn''t released, and hence the license is unknown.\n\n      '''\n    value: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Sparrow\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2209.14375\n- access: closed\n  analysis: ''\n  created_date: 2021-12-08\n  dependencies:\n  - MassiveText\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: RETRO\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7.5B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  
training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2112.04426\n- access: closed\n  analysis: ''\n  created_date: 2022-09-28\n  dependencies:\n  - Chinchilla\n  - Sparrow adversarial probing dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Sparrow Rule reward model\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2209.14375\n- access: closed\n  analysis: ''\n  created_date: 2022-09-28\n  dependencies:\n  - Chinchilla\n  - Sparrow response preference dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Sparrow Preference reward model\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2209.14375\n- access: closed\n  analysis: ''\n  created_date: 2022-09-28\n  datasheet: ''\n  dependencies:\n  - Chinchilla\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: Sparrow adversarial probing dataset\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 27k ratings\n  type: dataset\n  url: https://arxiv.org/abs/2209.14375\n- access: closed\n  analysis: ''\n  created_date: 2022-09-28\n  datasheet: ''\n  dependencies:\n  - Chinchilla\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: Sparrow response preference 
dataset\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 72k comparisons\n  type: dataset\n  url: https://arxiv.org/abs/2209.14375\n- access: closed\n  analysis: ''\n  created_date: 2022-03-16\n  dependencies:\n  - Gopher\n  - Google Search\n  - GopherCite reward model\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: GopherCite\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 280B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf\n- access: closed\n  analysis: ''\n  created_date: 2022-03-16\n  dependencies:\n  - Gopher\n  - GopherCite Preference dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: GopherCite reward model\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf\n- access: closed\n  analysis: ''\n  created_date: 2022-03-16\n  datasheet: ''\n  dependencies:\n  - Gopher\n  - Google Search\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: GopherCite Preference dataset\n  nationality: UK\n  
organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 33k response pairs\n  type: dataset\n  url: https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf\n- access: closed\n  analysis: ''\n  created_date: 2022-09-29\n  dependencies:\n  - Chinchilla\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Dramatron\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2209.14958\n- access: open\n  analysis: Evaluated on evaluation trajectories and SoTA baselines using robotic\n    data.\n  created_date: 2023-07-28\n  dependencies:\n  - PaLI-X\n  - PaLM-E\n  - RT-2 action tokens\n  description: RT-2 is a vision-language-action model for robotic actions that incorporates\n    chain of thought reasoning.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text, video; text, robotics trajectories\n  model_card: ''\n  monitoring: ''\n  name: RT-2\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 55B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2307.15818.pdf\n- access: closed\n  analysis: unknown\n  created_date: 2023-11-16\n  dependencies: []\n  description: Lyria is DeepMind's most advanced AI music generation model to date.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; music\n  model_card: none\n  monitoring: ''\n  name: Lyria\n  nationality: UK\n  organization: Google Deepmind\n  prohibited_uses: ''\n  
quality_control: worked with artists and music industry to ensure utility\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://deepmind.google/discover/blog/transforming-the-future-of-music-creation/\n- access: closed\n  analysis: Evaluated using only out-of-distribution image prompts for qualitative\n    results.\n  created_date: 2024-02-23\n  dependencies: []\n  description: \"Genie is a foundation world model trained from Internet videos that\\\n    \\ can generate an endless variety of playable (action-controllable) worlds from\\\n    \\ synthetic images, photographs, and even sketches.\"\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: image; video\n  model_card: none\n  monitoring: ''\n  name: Genie\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: ''\n  quality_control: ''\n  size: 11B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://sites.google.com/view/genie-2024\n- access: limited\n  analysis: The model was tested and evaluated on various prompts to assess its understanding\n    of natural language, its ability to generate high-quality images in various formats\n    and styles and generate fine details and complex textures. Red teaming and evaluations\n    were conducted on topics including fairness, bias, and content safety.\n  created_date: 2024-05-14\n  dependencies: []\n  description: Imagen 3 is a high-quality text-to-image model, capable of generating\n    images with better detail, richer lighting, and fewer distracting artifacts compared\n    to previous models. Improved understanding of prompts allows for a wide range\n    of visual styles and captures small details from longer prompts. It also understands\n    prompts written in natural, everyday language, making it easier to use. 
Imagen\n    3 is available in multiple versions, optimized for different types of tasks, from\n    generating quick sketches to high-resolution images.\n  feedback: unknown\n  intended_uses: Generate high-quality images for various purposes, from photorealistic\n    landscapes to textured oil paintings or whimsical claymation scenes. It is useful\n    in situations where detailed visual representation is required based on the textual\n    description.\n  license: unknown\n  modality: text; image\n  model_card: none\n  monitoring: Through digital watermarking tool SynthID embedded in pixels for detection\n    and identification.\n  name: Imagen 3\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: unknown\n  quality_control: Extensive filtering and data labeling were used to minimize harmful\n    content in datasets and reduce the likelihood of harmful outputs. Privacy, safety,\n    and security technologies were leveraged in deploying the model, including watermarking\n    tool SynthID.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://deepmind.google/technologies/imagen-3/\n- access: closed\n  analysis: unknown\n  created_date: 2024-05-14\n  dependencies: []\n  description: Veo is Google DeepMind's most capable video generation model to date.\n    It generates high-quality, 1080p resolution videos that can go beyond a minute,\n    in a wide range of cinematic and visual styles. It accurately captures the nuance\n    and tone of a prompt, and provides an unprecedented level of creative control.\n    The model is also capable of maintaining visual consistency in video frames, and\n    supports masked editing.\n  feedback: Feedback from leading creators and filmmakers is incorporated to improve\n    Veo's generative video technologies.\n  intended_uses: Veo is intended to help create tools that make video production accessible\n    to everyone. 
It can be used by filmmakers, creators, or educators for storytelling,\n    education and more. Some of its features will also be brought to products like\n    YouTube Shorts.\n  license: unknown\n  modality: text; video\n  model_card: none\n  monitoring: unknown\n  name: Veo\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: unknown\n  quality_control: Videos created by Veo are watermarked using SynthID, DeepMind's\n    tool for watermarking and identifying AI-generated content, and passed through\n    safety filters and memorization checking processes to mitigate privacy, copyright\n    and bias risks.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://deepmind.google/technologies/veo/\n- access: limited\n  analysis: The model was evaluated on various benchmarks like General MMLU, Code\n    Natural2Code, MATH, GPQA, Big-Bench, WMT23, MMMU, and MathVista, covering\n    domains such as multilingual translation, image processing, and code\n    generation.\n  created_date: 2024-05-30\n  dependencies: []\n  description: Gemini Flash is a lightweight model, optimized for speed and efficiency.\n    It features multimodal reasoning and a breakthrough long context window of up\n    to one million tokens. It's designed to serve at scale and is efficient on cost,\n    providing quality results at a fraction of the cost of larger models.\n  feedback: none\n  intended_uses: The model is intended for developer and enterprise use cases. 
It\n    can process hours of video and audio, and hundreds of thousands of words or lines\n    of code, making it beneficial for a wide range of tasks.\n  license: Google's Terms and Conditions\n  modality: audio, image, text, video; text\n  model_card: none\n  monitoring: unknown\n  name: Gemini 1.5 Flash\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: ''\n  quality_control: The research team is continually exploring new ideas at the frontier\n    of AI and building innovative products for consistent progress.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://deepmind.google/technologies/gemini/flash/\n"
  },
  {
    "path": "assets/deepnight.yaml",
    "content": "- access: open\n  analysis: none\n  created_date: 2023-11-04\n  dependencies: []\n  description: SaiLy is a series/collection of AI Models by Deepnight Research which\n    are highly experimental and uncensored.\n  feedback: https://huggingface.co/deepnight-research/saily_100b/discussions\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/deepnight-research/saily_100b\n  monitoring: unknown\n  name: SaiLY\n  nationality: International\n  organization: Deepnight Research\n  prohibited_uses: ''\n  quality_control: ''\n  size: 100B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/deepnight-research/saily_100b\n"
  },
  {
    "path": "assets/deepseek.yaml",
    "content": "- access: open\n  analysis: Deepseek and baseline models (for comparison) evaluated on a series of\n    representative benchmarks, both in English and Chinese.\n  created_date: 2023-11-28\n  dependencies: []\n  description: Deepseek is a 67B parameter model with Grouped-Query Attention trained\n    on 2 trillion tokens from scratch.\n  feedback: https://huggingface.co/deepseek-ai/deepseek-llm-67b-base/discussions\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://github.com/deepseek-ai/DeepSeek-LLM/blob/main/LICENSE-MODEL.\n      Code license is under MIT\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/deepseek-ai/deepseek-llm-67b-base\n  monitoring: unknown\n  name: Deepseek\n  nationality: USA\n  organization: Deepseek AI\n  prohibited_uses: none\n  quality_control: Training dataset comprised of diverse data composition and pruned\n    and deduplicated.\n  size: 67B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/deepseek-ai/DeepSeek-LLM\n- access: open\n  analysis: Deepseek and baseline models (for comparison) evaluated on a series of\n    representative benchmarks, both in English and Chinese.\n  created_date: 2023-11-29\n  dependencies:\n  - Deepseek\n  description: Deepseek Chat is a 67B parameter model initialized from Deepseek and\n    fine-tuned on extra instruction data.\n  feedback: https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat/discussions\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://github.com/deepseek-ai/DeepSeek-LLM/blob/main/LICENSE-MODEL.\n      Code license is under MIT\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat\n  monitoring: unknown\n  name: Deepseek Chat\n  nationality: USA\n  organization: Deepseek AI\n  prohibited_uses: none\n  
quality_control: Training dataset comprised of diverse data composition and pruned\n    and deduplicated.\n  size: 67B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/deepseek-ai/DeepSeek-LLM\n- access: open\n  analysis: Evaluated on code generation, code completion, cross-file code completion,\n    and program-based math reasoning across standard benchmarks.\n  created_date: 2023-11-03\n  dependencies: []\n  description: Deepseek Coder is composed of a series of code language models, each\n    trained from scratch on 2T tokens, with a composition of 87% code and 13% natural\n    language in both English and Chinese.\n  feedback: https://huggingface.co/deepseek-ai/deepseek-coder-33b-base/discussions\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/LICENSE-MODEL.\n      Code license is under MIT\n    value: custom\n  modality: text; code\n  model_card: https://huggingface.co/deepseek-ai/deepseek-coder-33b-base\n  monitoring: unknown\n  name: Deepseek Coder\n  nationality: USA\n  organization: Deepseek AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 33B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 8 NVIDIA A100 GPUs and 8 NVIDIA H800 GPUs\n  training_time: unknown\n  type: model\n  url: https://github.com/deepseek-ai/DeepSeek-Coder\n"
  },
  {
    "path": "assets/dibt.yaml",
    "content": "- access: open\n  analysis: none\n  created_date: 2024-02-27\n  datasheet: https://huggingface.co/datasets/DIBT/10k_prompts_ranked\n  dependencies: []\n  description: 10k_prompts_ranked is a dataset of prompts with quality rankings created\n    by 314 members of the open-source ML community using Argilla, an open-source tool\n    to label data.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/DIBT/10k_prompts_ranked/discussions\n  included: ''\n  intended_uses: Training and evaluating language models on prompt ranking tasks and\n    as a dataset that can be filtered only to include high-quality prompts. These\n    can serve as seed data for generating synthetic prompts and generations.\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: 10k_prompts_ranked\n  nationality: USA\n  organization: Data is Better Together\n  prohibited_uses: This dataset only contains rankings for prompts, not prompt/response\n    pairs so it is not suitable for direct use for supervised fine-tuning of language\n    models.\n  quality_control: ''\n  sample: []\n  size: 10k examples\n  type: dataset\n  url: https://huggingface.co/blog/community-datasets\n"
  },
  {
    "path": "assets/duckduckgo.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: 2023-03-08\n  dependencies:\n  - Anthropic API\n  description: The first Instant Answer in DuckDuckGo search results to use natural\n    language technology to generate answers to search queries using Wikipedia and\n    other related sources\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: DuckAssist\n  nationality: USA\n  organization: DuckDuckGo\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://spreadprivacy.com/duckassist-launch/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/duolingo.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-03-14\n  dependencies:\n  - GPT-4 API\n  description: Explain My Answer offers learners the chance to learn more about their\n    response in a lesson (whether their answer was correct or incorrect!) By tapping\n    a button after certain exercise types, learners can enter a chat with Duo to get\n    a simple explanation on why their answer was right or wrong, and ask for examples\n    or further clarification.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Users are granted a limited license per Duolingo Terms.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Duolingo Explain My Answer\n  nationality: USA\n  organization: Duolingo\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.duolingo.com/terms\n  type: application\n  url: https://blog.duolingo.com/duolingo-max/\n  user_distribution: ''\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-14\n  dependencies:\n  - Duolingo Role Play\n  - Duolingo Explain My Answer\n  description: Duolingo Max is a new subscription tier above Super Duolingo that gives\n    learners access to two brand-new features and exercises - Explain My Answer and\n    Roleplay.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Users are granted a limited license per Duolingo Terms.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Duolingo Max\n  nationality: USA\n  organization: Duolingo\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://blog.duolingo.com/duolingo-max/\n  user_distribution: ''\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-14\n  dependencies:\n  - GPT-4 API\n  description: \"Roleplay allows learners to practice real-world conversation skills\\\n    \\ with world 
characters in the app. These challenges, which earn XP, will live\\\n    \\ alongside the path as one of the \\u201CSide Quests\\u201D learners can access\\\n    \\ by tapping on the character. What will you talk about? We\\u2019ll guide you\\\n    \\ through different scenarios! Learners might discuss future vacation plans with\\\n    \\ Lin, order coffee at a caf\\xE9 in Paris, go furniture shopping with Eddy, or\\\n    \\ ask a friend to go for a hike.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: '\"Subject to the terms of these Terms and Conditions, Duolingo grants\n      you a non-transferable, non-exclusive license to download, install, and use\n      one copy of each App in object code form only on an interactive wireless device\n      that you own or control\" - excerpt from the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Duolingo Role Play\n  nationality: USA\n  organization: Duolingo\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.duolingo.com/terms\n  type: application\n  url: https://blog.duolingo.com/duolingo-max/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/eleutherai.yaml",
    "content": "- access:\n    explanation: 'The dataset is freely available to the public and can be downloaded\n      from The Eye [[The Pile]](https://mystic.the-eye.eu/public/AI/pile/).\n\n      '\n    value: open\n  analysis: 'Analyses of the data''s composition, document statistics, language/dialectal\n    coverage, topical distribution, and biases are conducted are conducted in the\n    paper [[The Pile Paper]](https://arxiv.org/pdf/2101.00027.pdf).\n\n    '\n  created_date: 2021-01-01\n  datasheet: https://arxiv.org/pdf/2201.07311.pdf\n  dependencies: []\n  description: 'A large language model training dataset, used to train GPT-NeoX-20B.\n\n    '\n  excluded: 'Authors report that they have excluded some datasets \"because they were\n    too small to be worth spending time or because the English component of the data\n    did not merit inclusion on its own. Three datasets were excluded for other reasons:\n    (1) US Congressional Records were excluded because it \"reflects the opinions and\n    biases of the political class over the past 200 years, including segregationism\n    and xenophobia.\" (2) Online Fanfiction resources amounting to Hundreds of GiB\n    were excluded on logistical grounds. (3) Literotica, platform where users can\n    upload short-form erotic fiction, was excluded because the authors decided to\n    exclude fanfiction, the corpus would require significant investigation, and corpus\n    contain significant amount of stereotyping [[Appendix B]](https://arxiv.org/pdf/2101.00027.pdf).\n\n    '\n  feedback: 'Feedback can be given by emailing the authors at contact at eleuther.ai.\n\n    '\n  included: 'The Pile data come from 22 sources, with over half of the data being\n    from Common Crawl (Pile-CC; 227GB), fiction and nonfiction books (Books3; 101GB),\n    biomedical articles (PubMed Central; 90GB), and code (Github; 95 GB). 
Refer to\n    the paper for full decomposition [[Table 1]](https://arxiv.org/pdf/2101.00027.pdf#table.caption.2).\n\n    '\n  intended_uses: 'The Pile was intended to be used as a high quality large text dataset\n    for language modeling tasks, explained in more detail in the paper [[Section 1]](https://arxiv.org/pdf/2101.00027.pdf#section.1).\n\n    '\n  license:\n    explanation: As indicated in the paper.\n    value: MIT\n  modality: code, text\n  monitoring: none\n  name: The Pile\n  nationality: USA\n  organization: EleutherAI\n  prohibited_uses: none\n  quality_control: 'In addition to the data inclusion and exclusion decisions, the\n    quality was controlled through filtering for English (pycld2 language classifier),\n    filtering for documents similar to OpenWebText2 (classifier on CommonCrawl), and\n    several forms of deduplication as detailed in the paper [[Appendix C]](https://arxiv.org/pdf/2101.00027.pdf#appendix.1.C)\n    [[Appendix D]](https://arxiv.org/pdf/2101.00027.pdf#appendix.1.D).\n\n    '\n  sample:\n  - \"...pot trending topics and the coverage around them. First up, there\\u2019s a\\\n    \\ bit of a visual redesign. Previously, clicking on a trending topic would highlight\\\n    \\ a story from one publication, and you\\u2019d have to scroll down past a live\\\n    \\ video section to view related stories. Facebook is replacing that system with\\\n    \\ a simple carousel, which does a better job of showing you different coverage\\\n    \\ options. To be clear, the change doesn\\u2019t affect how stories are sourced,\\\n    \\ according to Facebook. It\\u2019s still the same algorithm pickin...\"\n  - Total knee arthroplasty (TKA) is a promising treatment for endstage osteoarthritis\n    (OA) of the knee for alleviating pain and restoring the function of the knee.\n    Some of the cases with bilateral TKA are symptomatic, necessitating revision arthroplasty\n    in both the knees. 
A bilateral revision TKA can be done ei\n  - On the converse, the set-valued map $\\Phi:[0,3]\\rightrightarrows [0,3]$ $$\\Phi(x):=\\left\\{\\begin{array}{ll}\n    \\{1\\} & \\mbox{ if } 0\\leq x<1\\\\ {}[1,2] & \\mbox{ if } 1\\leq x\\leq 2\\\\ \\{2\\} &\n  - \"This Court thus uses the same interpretation of V.R.C.P. 52(a) as it did *487\\\n    \\ under the previous statutory requirement found in 12 V.S.A. \\xA7 2385.  In essense,\\\n    \\ the defendants urge that this Court should reconsider the case of Green Mountain\\\n    \\ Marble Co. v. Highway Board, supra, and follow the Federal practice of looking\\\n    \\ to the evide\"\n  size: 825 GB\n  type: dataset\n  url: https://arxiv.org/pdf/2101.00027.pdf\n- access:\n    explanation: 'The model can be downloaded for free from [[The Eye]](https://mystic.the-eye.eu/public/AI/GPT-J-6B/step_383500.tar.zstd)\n\n      '\n    value: open\n  analysis: ''\n  created_date:\n    explanation: Date the model blog post was published\n    value: 2021-06-04\n  dependencies:\n  - The Pile\n  description: GPT-J is an open-source autoregressive language model.\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'As indicated in the [[Github repository]](https://github.com/kingoflolz/mesh-transformer-jax)\n\n      '\n    value: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: GPT-J\n  nationality: USA\n  organization: EleutherAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 6B parameters (dense)\n  training_emissions: ''\n  training_hardware: 'TRC (Unspecified # of TPU v3-8s)'\n  training_time: ''\n  type: model\n  url: https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/\n- access:\n    explanation: 'The model can be downloaded for free from [[The Eye]](https://mystic.the-eye.eu/public/AI/gptneo-release/GPT3_2-7B/)\n\n      '\n    value: open\n  analysis: ''\n  created_date:\n    explanation: Date the Github repo was updated\n    value: 2021-03-21\n  dependencies:\n  - The Pile\n  
description: ''\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: GPT-Neo\n  nationality: USA\n  organization: EleutherAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 2.7B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://github.com/EleutherAI/gpt-neo\n- access:\n    explanation: 'The model can be downloaded for free from The Eye [[GPT-NeoX-20B]](https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/).\n\n      '\n    value: open\n  analysis: 'The model was evaluated on standard NLP benchmarks: LAMBADA, ANLI, HellaSwag,\n    and MMLU, among others [[Section 4]](http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf#section.4).\n\n    '\n  created_date: 2022-02-02\n  dependencies:\n  - The Pile\n  description: 'GPT-NeoX (20B) is an open-sourced autoregressive language model.\n\n    '\n  feedback: 'Feedback can be provided using the #20b channel in the EleutherAI Discord\n    group [[EleutherAI Blog Post]](https://blog.eleuther.ai/announcing-20b/). Find\n    the Discord link in the FAQ page [[FAQ]](https://www.eleuther.ai/faq/).\n\n    '\n  intended_uses: 'As stated in the model card: \"GPT-NeoX-20B learns an inner representation\n    of the English language that can be used to extract features useful for downstream\n    tasks. The model is best at what it was pretrained for however, which is generating\n    text from a prompt. 
Due to the generality of the pretraining set, it has acquired\n    the ability to generate completions across a wide range of tasks - from programming\n    to fiction writing.\" [[Model Card]](https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/20B_model_card.md).\n\n    '\n  license:\n    explanation: 'As indicated in the accompanying blog post [[EleutherAI Blog Post]](https://blog.eleuther.ai/announcing-20b/).\n\n      '\n    value: Apache 2.0\n  modality: text; text, code\n  model_card: https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/20B_model_card.md\n  monitoring: none\n  name: GPT-NeoX\n  nationality: USA\n  organization: EleutherAI\n  prohibited_uses: none\n  quality_control: none\n  size: 20B parameters (dense)\n  training_emissions:\n    explanation: 'Emissions from the development and training of the\n      model, based on the authors'' estimate [[Section 6.4]](http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf#subsection.6.4).\n\n      '\n    value: 31.73 tCO2e\n  training_hardware:\n    explanation: 'As outlined by the authors [[Section 2.3]](http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf#subsection.2.3)\n\n      '\n    value: 12 x 8 A100 GPUs\n  training_time:\n    explanation: 'The authors report a training time of 1830 hours,\n      equaling 76.25 days. 
[[Section 6.4]](http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf#subsection.6.4).\n      The authors report that 96 (12 * 8) A100 GPUs were used during the training.\n      The A100 GPUs have a single precision performance of 0.0195 petaflops [[A100\n      Datasheet]](https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet-us-nvidia-1758950-r4-web.pdf).\n      Assuming the estimated utilization is 33%, following [[OpenAI AI and Compute\n      Blog]](https://openai.com/blog/ai-and-compute/#addendum), the training compute\n      is 47.10 petaflop/s-day (76.25 * 96 * 0.0195 * 0.33).\n\n      '\n    value: 47.10 petaflop/s-day\n  type: model\n  url: http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf\n- access:\n    explanation: 'GooseAI API can be accessed by signing up on the goose.ai website.\n\n      '\n    value: limited\n  adaptation: unknown\n  created_date: unknown\n  dependencies:\n  - GPT-NeoX\n  description: 'GooseAI API is an API service providing access to NLP services.\n\n    '\n  failures: unknown\n  feedback:\n    explanation: 'In the \"Error Reporting and Feedback\" section of the Goose.ai Terms\n      of Service, GooseAI asks that all feedback be sent to support at goose.ai\n      [[GooseAI Terms of Service]](https://goose.ai/docs/tos).\n\n      '\n    value: Email support\n  intended_uses: 'Intended to be used as NLP infrastructure.\n\n    '\n  license:\n    explanation: \"\\\"Subject to Customer\\u2019s strict compliance with this TOS, GooseAI\\\n      \\ grants Customer a limited, non-exclusive, non-transferable, non-sublicensable,\\\n      \\ revocable license to access and use the Platform as described in and subject\\\n      \\ to this TOS\\\" - excerpt from the Terms of Service document.\\n\"\n    value: custom\n  monitoring:\n    explanation: 'In the \"GooseAI Monitoring and Enforcement\" section of GooseAI''s\n      Acceptable Use Policy (AUP), it is stated that Goose.AI has the right to investigate\n      any 
suspected violation of its AUP [[GooseAI Acceptable Use Policy]](https://goose.ai/docs/aup).\n\n      '\n    value: At will monitoring by the provider\n  monthly_active_users: unknown\n  name: GooseAI API\n  nationality: USA\n  organization: GooseAI\n  output_space:\n    explanation: 'Question/Answer and Classification tasks are coming soon according\n      to GooseAI [[Main Page]](goose.ai).\n\n      '\n    value: Text Generation, Text Completion\n  prohibited_uses:\n    explanation: 'Prohibited uses are detailed in the Acceptable Use Policy [[GooseAI\n      Acceptable Use Policy]](https://goose.ai/docs/aup).\n\n      '\n    value: 'Illegal or abusive activity, security violations, network abuse\n\n      '\n  quality_control: unknown\n  terms_of_service: https://goose.ai/docs/tos\n  type: application\n  url: goose.ai\n  user_distribution: unknown\n- access: open\n  analysis: Evaluated by human testers rating alignment of text input, image output\n    pairs.\n  created_date: 2022-09-04\n  dependencies:\n  - VQGAN\n  - CLIP\n  description: VQGAN-CLIP is a model that better generates and edits images using\n    a multimodal encoder to guide image generation.\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: text; image\n  model_card: ''\n  monitoring: ''\n  name: VQGAN-CLIP\n  nationality: USA\n  organization: EleutherAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 227M parameters (dense)\n  training_emissions: unknown\n  training_hardware: 1 NVIDIA Tesla K80 GPU\n  training_time: Less than 1 V100-hour\n  type: model\n  url: https://arxiv.org/pdf/2204.08583.pdf\n- access: open\n  analysis: Evaluated on a variety of NLP benchmarks and found to perform similarly\n    to OPT and BLOOM models.\n  created_date: 2023-05-31\n  dependencies:\n  - The Pile\n  description: A suite of 16 LLMs all trained on public data seen in the exact same\n    order and ranging in size from 70M to 12B parameters\n  feedback: 
https://huggingface.co/EleutherAI/pythia-6.9b/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/EleutherAI/pythia-12b\n  monitoring: ''\n  name: Pythia\n  nationality: unknown\n  organization: Eleuther AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 12B parameters (dense)\n  training_emissions: ''\n  training_hardware: 64 A100 GPUs\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2304.01373.pdf\n- access: open\n  analysis: Evaluated on math benchmarks in comparison to general large language models.\n  created_date: 2023-10-16\n  dependencies:\n  - Proof Pile 2\n  - Code LLaMA\n  description: Llemma is a large language model for mathematics.\n  feedback: https://huggingface.co/EleutherAI/llemma_34b/discussions\n  intended_uses: ''\n  license: LLaMA 2\n  modality: text; text\n  model_card: https://huggingface.co/EleutherAI/llemma_34b\n  monitoring: none\n  name: Llemma\n  nationality: unknown\n  organization: Princeton University, Eleuther AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 34B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 256 A100 40GB GPUs\n  training_time: 47k A100 hours\n  type: model\n  url: https://arxiv.org/pdf/2310.10631.pdf\n- access: open\n  analysis: ''\n  created_date: 2023-10-16\n  datasheet: https://huggingface.co/datasets/EleutherAI/proof-pile-2\n  dependencies:\n  - Common Crawl\n  - OpenWebMath\n  - Arxiv\n  - RedPajama-Data\n  description: Proof Pile 2 is a corpus for language modeling of mathematics.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/EleutherAI/proof-pile-2/discussions\n  included: scientific papers, web data containing mathematics, mathematical code\n  intended_uses: ''\n  license: MIT\n  modality: text\n  monitoring: none\n  name: Proof Pile 2\n  nationality: unknown\n  organization: Princeton University, Eleuther AI\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  
size: 55B tokens\n  type: dataset\n  url: https://arxiv.org/pdf/2310.10631.pdf\n- access: open\n  analysis: The models were evaluated on SuperGLUE, CodeXGLUE, as well as MMLU and\n    Bigbench Hard. Comparisons were made with T5v1.1 and found that Pile-T5 models\n    performed better in most conditions.\n  created_date: 2024-04-15\n  dependencies:\n  - The Pile\n  - T5x\n  - LLaMA\n  - umT5\n  description: Pile-T5 is a version of the broadly used T5 model, but improved to\n    eliminate weaknesses such as the omission of crucial code-related tokens. It utilizes\n    LLaMA tokenizer and is trained on the Pile, offering enhancements for finetuning\n    on downstream tasks, particularly those involving code.\n  feedback: unknown\n  intended_uses: The model is aimed at downstream tasks that benefit from the encoder-decoder\n    architecture. Particularly useful for tasks involving code.\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: unknown\n  name: Pile-T5\n  nationality: USA\n  organization: EleutherAI\n  prohibited_uses: unknown\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: 2 million steps\n  type: model\n  url: https://blog.eleuther.ai/pile-t5/\n"
  },
  {
    "path": "assets/ens.yaml",
    "content": "- access:\n    explanation: Dataset, evaluation code and models are publicly available at the\n      [[HowTo100M dataset webpage]](https://www.di.ens.fr/willow/research/howto100m/).\n    value: open\n  analysis:\n    explanation: See [[Experiments]](https://arxiv.org/pdf/1906.03327v2.pdf#section.5)\n    value: Authors use the dataset to learn a joint text-video embedding by leveraging\n      more than 130M video clip-caption pairs. They then evaluate the learned embeddings\n      on the tasks of localizing steps in instructional videos of CrossTask and textbased\n      video retrieval on YouCook2, MSR-VTT and LSMDC datasets. They show that their\n      learned embedding can perform better compared to models trained on existing\n      carefully annotated but smaller video description datasets.\n  created_date:\n    explanation: The date the [[paper]](https://arxiv.org/abs/1906.03327) was submitted.\n    value: 2019-06-07\n  datasheet: https://arxiv.org/pdf/1906.03327.pdf\n  dependencies:\n  - YouTube\n  description: HowTo100M is a large-scale dataset of narrated videos with an emphasis\n    on instructional videos where content creators teach complex tasks with an explicit\n    intention of explaining the visual content on screen. HowTo100M features a total\n    of 136M video clips with captions sourced from 1.2M Youtube videos (15 years of\n    video) and 23k activities from domains such as cooking, hand crafting, personal\n    care, gardening or fitness.\n  excluded:\n    explanation: See [[Data collection]](https://arxiv.org/pdf/1906.03327v2.pdf#subsection.3.1)\n    value: Categories such as Relationships and Finance and Business, that may be\n      more abstract, are excluded. Videos with less than 100 views are removed. Authors\n      also ignore videos that have less than 100 words. Videos longer than 2,000 seconds\n      are removed. 
As some videos may appear in several tasks, the videos are deduplicated\n      based on YouTube IDs.\n  feedback: ''\n  included:\n    explanation: See [[Data collection]](https://arxiv.org/pdf/1906.03327v2.pdf#subsection.3.1)\n    value: The dataset features 1.22 million videos from YouTube with a primary focus\n      on videos containing \"visual tasks\", that involve some interaction with the\n      physical world (e.g. Making peanut butter, Pruning a tree) as compared to others\n      that are more abstract (e.g. Ending a toxic relationship, Choosing a gift).\n      To obtain predominantly visual tasks, the authors limit them to one of 12 categories\n      (Food and Entertaining, Home and Garden, Hobbies and Crafts, Cars & Other Vehicles,\n      Pets and Animals, Holidays and Traditions, Personal Care and Style, Sports and\n      Fitness, Health, Education and Communications, Arts and Entertainment, Computers\n      and Electronics). They also restrict to the top 200 YouTube search results,\n      as the latter ones may not be related to the query task.\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text, video\n  monitoring: ''\n  name: HowTo100M\n  nationality: unknown\n  organization: \"\\xC9cole Normale Sup\\xE9rieure, Inria\"\n  prohibited_uses:\n    explanation: See [[HowTo100M dataset webpage]](https://www.di.ens.fr/willow/research/howto100m/)\n    value: 'No uses are explicitly prohibited by the authors. They note the following\n      limitations of the dataset: \"We note that the distribution of identities and\n      activities in the HowTo100M dataset may not be representative of the global\n      human population and the diversity in society. Please be careful of unintended\n      societal, gender, racial and other biases when training or deploying models\n      trained on this data.\"\n\n      '\n  quality_control: ''\n  sample: []\n  size: 136M video clips\n  type: dataset\n  url: https://arxiv.org/pdf/1906.03327.pdf\n"
  },
  {
    "path": "assets/epfl.yaml",
    "content": "- access: open\n  analysis: Evaluated on TruthfulQA as main evaluation benchmark.\n  created_date: 2023-11-27\n  dependencies: []\n  description: Meditron is a large-scale medical LLM that remains open-source.\n  feedback: https://huggingface.co/epfl-llm/meditron-7b/discussions\n  intended_uses: Medical exam question answering, supporting differential diagnosis,\n    disease information.\n  license: LLaMA 2\n  modality: text; text\n  model_card: https://huggingface.co/epfl-llm/meditron-70b\n  monitoring: none\n  name: MediTron\n  nationality: unknown\n  organization: EPFL, Idiap Research Institute, OpenAssistant, Yale\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2311.16079.pdf\n"
  },
  {
    "path": "assets/evolutionaryscale.yaml",
    "content": "- access: open\n  analysis: The model was tested in the generation of a new green fluorescent protein.\n    Its effectiveness was compared to natural evolutionary processes, and it was deemed\n    to simulate over 500 million years of evolution.\n  created_date: 2024-06-25\n  dependencies: []\n  description: ESM3 is the first generative model for biology that simultaneously\n    reasons over the sequence, structure, and function of proteins. It is trained\n    across the natural diversity of Earth, reasoning over billions of proteins from\n    diverse environments. It advances the ability to program and create with the code\n    of life, simulating evolution, and making biology programmable. ESM3 is generative,\n    and scientists can guide the model to create proteins for various applications.\n  feedback: Unknown\n  intended_uses: To engineer biology from first principles. It functions as a tool\n    for scientists to create proteins for various applications, including medicine,\n    biology research, and clean energy.\n  license: Unknown\n  modality: text; image, text\n  model_card: unknown\n  monitoring: Unknown though specific measures are not specified.\n  name: ESM3\n  nationality: USA\n  organization: EvolutionaryScale\n  prohibited_uses: Unknown\n  quality_control: The creators have put in place a responsible development framework\n    to ensure transparency and accountability from the start. ESM3 was tested in the\n    generation of a new protein, ensuring its quality and effectiveness.\n  size: 98B parameters (Dense)\n  training_emissions: Unknown\n  training_hardware: unknown\n  training_time: Unknown\n  type: model\n  url: https://www.evolutionaryscale.ai/blog/esm3-release\n"
  },
  {
    "path": "assets/faraday.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date:\n    explanation: The date Faraday Lab's hugging face card was last updated. Date for\n      ARES release is not published on site.\n    value: 2023-04-26\n  dependencies:\n  - Stable Diffusion\n  description: ARES is a text-to-image generator based on Stable Diffusion. The goal\n    is to provide a simple tool with a user interface allowing mainstream AI access\n    for artists and creators.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: ARES\n  nationality: USA\n  organization: Faraday Lab\n  output_space: generated images\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://faradaylab.fr/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/fuse.yaml",
    "content": "- access: open\n  analysis: The FuseChat model was evaluated on MT-Bench which comprises 80 multi-turn\n    dialogues spanning writing, roleplay, reasoning, math, coding, stem, and humanities\n    domains. It yields an average performance of 66.52 with specific scores for individual\n    domains available in the leaderboard results.\n  created_date: 2024-02-26\n  dependencies:\n  - Nous Hermes 2\n  - OpenChat 3.5\n  description: FuseChat is a powerful chat Language Learning Model (LLM) that integrates\n    multiple structure and scale-varied chat LLMs using a fuse-then-merge strategy.\n    The fusion is done using two stages\n  feedback: https://huggingface.co/FuseAI/FuseChat-7B-VaRM/discussions\n  intended_uses: FuseChat is intended to be used as a powerful chat bot that takes\n    in text inputs and provides text-based responses. It can be utilized in a variety\n    of domains including writing, roleplay, reasoning, math, coding, stem, and humanities.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/FuseAI/FuseChat-7B-VaRM\n  monitoring: unknown\n  name: FuseChat\n  nationality: USA\n  organization: FuseAI\n  prohibited_uses: unknown\n  quality_control: none\n  size: 7B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/abs/2402.16107\n"
  },
  {
    "path": "assets/genmo.yaml",
    "content": "- access:\n    explanation: open state-of-the-art video generation model... The weights and architecture\n      for Mochi 1 are open\n    value: open\n  analysis: Mochi 1 sets a new best-in-class standard for open-source video generation.\n    It also performs very competitively with the leading closed models... We benchmark\n    prompt adherence with an automated metric using a vision language model as a judge\n    following the protocol in OpenAI DALL-E 3. We evaluate generated videos using\n    Gemini-1.5-Pro-002.\n  created_date: 2025-01-14\n  dependencies:\n  - DDPM\n  - DreamFusion\n  - Emu Video\n  - T5-XXL\n  description: Mochi 1 is an open-source video generation model designed to produce\n    high-fidelity motion and strong prompt adherence in generated videos, setting\n    a new standard for open video generation systems.\n  feedback: unknown\n  intended_uses: Advance the field of video generation and explore new methodologies.\n    Build innovative applications in entertainment, advertising, education, and more.\n    Empower artists and creators to bring their visions to life with AI-generated\n    videos. Generate synthetic data for training AI models in robotics, autonomous\n    vehicles and virtual environments.\n  license:\n    explanation: We're releasing the model under a permissive Apache 2.0 license.\n    value: Apache 2.0\n  modality:\n    explanation: Mochi 1 generates smooth videos... 
Measures how accurately generated\n      videos follow the provided textual instructions\n    value: text; video\n  model_card: unknown\n  monitoring: unknown\n  name: Mochi 1\n  nationality: USA\n  organization: Genmo\n  prohibited_uses: unknown\n  quality_control: robust safety moderation protocols in the playground to ensure\n    that all video generations remain safe and aligned with ethical guidelines.\n  size:\n    explanation: featuring a 10 billion parameter diffusion model\n    value: 10B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.genmo.ai/blog\n"
  },
  {
    "path": "assets/glm.yaml",
    "content": "- access: Open\n  analysis: Evaluations show that GLM-4, 1) closely rivals or outperforms GPT-4 in\n    terms of general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval,\n    2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3)\n    matches GPT-4 Turbo (128K) and Claude 3 for long context tasks, and 4) outperforms\n    GPT-4 in Chinese alignments as measured by AlignBench.\n  created_date: 2023-07-02\n  dependencies: []\n  description: ChatGLM is an evolving family of large language models that have been\n    developed over time. The GLM-4 language series, includes GLM-4, GLM-4-Air, and\n    GLM-4-9B. They are pre-trained on ten trillions of tokens mostly in Chinese and\n    English and are aligned primarily for Chinese and English usage. The high-quality\n    alignment is achieved via a multi-stage post-training process, which involves\n    supervised fine-tuning and learning from human feedback. GLM-4 All Tools model\n    is further aligned to understand user intent and autonomously decide when and\n    which tool(s) to use.\n  feedback: Unknown\n  intended_uses: General language modeling, complex tasks like accessing online information\n    via web browsing and solving math problems using Python interpreter.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/THUDM/glm-4-9b\n  monitoring: Unknown\n  name: ChatGLM\n  nationality: unknown\n  organization: Team GLM, Zhipu AI, Tsinghua University\n  prohibited_uses: Unknown\n  quality_control: High-quality alignment is achieved via a multi-stage post-training\n    process, which involves supervised fine-tuning and learning from human feedback.\n  size: 9B parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://arxiv.org/pdf/2406.12793\n"
  },
  {
    "path": "assets/google.yaml",
    "content": "- access:\n    explanation: https://huggingface.co/datasets/c4\n    value: open\n  analysis: https://arxiv.org/abs/2104.08758\n  created_date:\n    explanation: The date the T5 paper was made public.\n    value: 2019-10-23\n  datasheet: https://huggingface.co/datasets/c4\n  dependencies:\n  - CommonCrawl\n  description: The Colossal Clean Crawled Corpus (C4) is a processed version of Common\n    Crawl to facilitate transfer learning in NLP.\n  excluded: 'Data was filtered for English using langdetect. Further, data was filtered\n    to end in terminal punctuation, to remove short pages (less than 5 sentences),\n    and to remove \"Dirty, Naughty, Obscene or Otherwise Bad Words\".\n\n    '\n  feedback: https://huggingface.co/datasets/c4/discussions\n  included: none\n  intended_uses: To faciliate transfer learning research in NLP.\n  license: ODC-By 1.0\n  modality: text\n  monitoring: none\n  name: C4\n  nationality: USA\n  organization: Google\n  prohibited_uses: none\n  quality_control: 'Data filtering excluded obscene words from a block list as well\n    as short documents and some deduplication was done based on string overlap.\n\n    '\n  sample:\n  - https://huggingface.co/datasets/c4/viewer/en/train\n  size: 750GB\n  type: dataset\n  url: https://arxiv.org/abs/1910.10683\n- access: closed\n  analysis: unknown\n  created_date:\n    explanation: 'The date of the Google product update blog announcing that BERT\n      models were for ranking and featured snippets in Search.\n\n      '\n    value: 2019-11-25\n  datasheet: none\n  dependencies: []\n  description: 'The dataset used to train Internal Google BERT models.\n\n    '\n  excluded: unknown\n  feedback: unknown\n  included:\n    explanation: 'Although we don''t exactly know the contents of the Internal Google\n      BERT dataset, it likely includes contents from web pages and search queries.\n\n      '\n    value: Web pages, and search queries\n  intended_uses:\n    explanation: 'We don''t 
have an exhaustive list of the intended use cases for\n      the Internal Google BERT dataset, but we know that BERT was used in Google Search.\n\n      '\n    value: unknown\n  license: unknown\n  modality: text\n  monitoring: unknown\n  name: Internal Google BERT dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: unknown\n  quality_control: unknown\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://blog.google/products/search/search-language-understanding-bert/\n- access:\n    explanation: Dataset can be downloaded at [[Download]](https://ai.google.com/research/ConceptualCaptions/download)\n    value: open\n  analysis:\n    explanation: See [[Experimental Results]](https://aclanthology.org/P18-1238.pdf#section.5)\n    value: Authors evaluate the dataset on two image captioning models - RNN-based\n      model and Transformer model, under two experimental conditions - using the training\n      & development sets provided by the MS COCO dataset, versus training & development\n      sets using the Conceptual dataset. They use three different test sets- the blind\n      COCO-C40 test set, the Conceptual test set and the Flickr 1K test set. They\n      present both Human and Automatic evaluation results. Human evaluations indicate\n      that the Conceptual-based models are superior. Automatic models fail to corroborate\n      the human evaluation results. 
This highlights the weakness of these automatic\n      metrics.\n  created_date:\n    explanation: Due to the lack of information about the exact date, it is assumed\n      to be the 1st of the known month of creation.\n    value: 2018-07-01\n  datasheet: none\n  dependencies: []\n  description: 'A dataset containing 3 million (image-URL, caption) pairs designed\n    for the training and evaluation of machine learned image captioning systems.\n\n    '\n  excluded:\n    explanation: See [[Conceptual Captions Dataset Creation]](https://aclanthology.org/P18-1238.pdf#section.3)\n    value: \"The following filtering steps are applied in the given order:\\n1. Image-based\\\n      \\ Filtering - \\\"It only keeps JPEG images where both dimensions are greater\\\n      \\ than 400 pixels, and the ratio of larger to smaller dimension is no more than\\\n      \\ 2. It excludes images that trigger pornography or profanity detectors. These\\\n      \\ filters discard more than 65% of the candidates.\\\"\\n2. Text-based Filtering\\\n      \\ - \\\"Candidates with no determiner, no noun, or no preposition are discarded;\\\n      \\ candidates with a high noun ratio are also discarded; candidates with a high\\\n      \\ rate of token repetition are discarded; candidates where the first word is\\\n      \\ not capitalized, or with too high capitalized-word ratio are discarded; we\\\n      \\ use a vocabulary VW of 1B token types, appearing at least 5 times in the English\\\n      \\ Wikipedia, and discard candidates that contain tokens that are not found in\\\n      \\ this vocabulary. candidates that score too high or too low on the polarity\\\n      \\ annotations, or trigger the pornography/profanity detectors, are discarded;\\\n      \\ predefined boiler-plate prefix/suffix sequences matching the text are cropped,\\\n      \\ e.g. \\u201Cclick to enlarge picture\\u201D, \\u201Cstock photo\\u201D; we also\\\n      \\ drop text which begins/ends in certain patterns, e.g. 
\\u201Cembedded image\\\n      \\ permalink\\u201D, \\u201Cprofile photo\\u201D. These filters only allow around\\\n      \\ 3% of the incoming candidates to pass to the later stages.\\\"\\n3. Image&Text-based\\\n      \\ Filtering - \\\"We filter out candidates for which none of the text tokens can\\\n      \\ be mapped to the content of the image. This filter discards around 60% of\\\n      \\ the incoming candidates.\\\"\\n4. Text Transformation with Hypernymization -\\\n      \\ \\\"Noun modifiers of certain types (proper nouns, numbers, units) are removed;\\\n      \\ dates, durations, and preposition-based locations (e.g., \\\"in Los Angeles\\\"\\\n      ) are removed; named-entities are identified, matched against the KG entries,\\\n      \\ and substitute with their hypernym; resulting coordination noun-phrases with\\\n      \\ the same head (e.g., \\\"actor and actor\\\") are resolved into a single-head,\\\n      \\ pluralized form (e.g., \\\"actors\\\"). Around 20% of samples are discarded during\\\n      \\ this transformation. We then cluster all resolved entities (e.g., 2560 \\\"\\\n      actor\\\", \\\"dog\\\", \\\"neighborhood\\\", etc.) 
and keep only the candidates for which\\\n      \\ all detected types have a count of over 100 (around 55% of the candidates).\\\"\\\n      \\n\"\n  feedback: Feedback can be provided by creating an issue in the [[Conceptual Captions\n    GitHub repository]](https://github.com/google-research-datasets/conceptual-captions)\n    or by emailing at conceptual-captions at google.com\n  included: ''\n  intended_uses: ''\n  license:\n    explanation: '[[Conceptual Captions License]](https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE)\n\n      '\n    value: Conceptual Captions License\n  modality: image, text\n  monitoring: unknown\n  name: Conceptual Captions\n  nationality: USA\n  organization: Google\n  prohibited_uses: unknown\n  quality_control:\n    explanation: See [[Conceptual Captions Dataset Creation]](https://aclanthology.org/P18-1238.pdf#section.3)\n    value: Input candidate (image, caption) pairs pass through several stages of filtering\n      and processing to ensure quality.\n  sample: []\n  size: 3.3M (image, text) pairs\n  type: dataset\n  url: https://aclanthology.org/P18-1238/\n- access:\n    explanation: Dataset is available at [[Conceptual 12M GitHub repository]](https://github.com/google-research-datasets/conceptual-12m).\n    value: open\n  analysis:\n    explanation: See [[Evaluating Vision-and-Language PreTraining Data]](https://arxiv.org/pdf/2102.08981.pdf#section.3)\n    value: 'The dataset is benchmarked against CC3M on two most fundamental V+L tasks:\n      vision-to-language generation and vision-and-language matching, with an emphasis\n      on long-tail visual recognition. 
The results illustrate the benefit of scaling\n      up pre-training data for vision-and-language tasks, as indicated by the new\n      state-of-the-art results on both the nocaps and Conceptual Captions benchmarks.\n\n      '\n  created_date:\n    explanation: 'The date the [[paper]](https://arxiv.org/abs/2102.08981) was submitted.\n\n      '\n    value: 2021-02-17\n  datasheet: none\n  dependencies: []\n  description: 'A dataset with 12 million image-text pairs specifically meant to be\n    used for vision-and-language pre-training.\n\n    '\n  excluded:\n    explanation: See [[Relaxing filters for higher recall]](https://arxiv.org/pdf/2102.08981.pdf#subsection.2.2)\n    value: 'Some of the filtering steps used in the preparation of Conceptual Captions\n      dataset are relaxed to trade off high-recall for low-precision. The following\n      steps are applied in the given order:\n\n      1. Image-based Filtering - Only keep JPEG images where both dimensions are greater\n      than 400 pixels, and the ratio of larger to smaller dimension is no more than\n      2.5. Exclude images that trigger pornography or profanity detectors.\n\n      2. Text-based Filtering - Allow text between 3 and 256 words in the alt-text.\n      Discard candidates with no noun or no determiner, but permit ones without prepositions.\n      Set the maximum fraction of word repetition allowed to 0.2. Increase the threshold\n      for counting a word type as rare from 5 to 20.\n\n      3. 
Image&Text-based Filtering - Filter out candidates for which none of the\n      text tokens can be mapped to the content of the image.\n\n      '\n  feedback: Feedback can be provided by creating an issue in the [[Conceptual 12M\n    GitHub repository]](https://github.com/google-research-datasets/conceptual-12m)\n    or by emailing at conceptual-captions at google.com\n  included: ''\n  intended_uses: ''\n  license:\n    explanation: '[[Conceptual Captions License]](https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE)\n\n      '\n    value: Conceptual Captions License\n  modality: image, text\n  monitoring: unknown\n  name: Conceptual 12M\n  nationality: USA\n  organization: Google\n  prohibited_uses: unknown\n  quality_control:\n    explanation: See [[Vision-and-Language Pre-Training Data]](https://arxiv.org/pdf/2102.08981.pdf#section.2)\n    value: Input candidate (image, caption) pairs pass through several stages of filtering\n      and processing to ensure quality. 
Person-name substitutions are performed in\n      the alt-texts to protect the privacy of individuals in the associated images.\n  sample: []\n  size: 12M (image, text) pairs\n  type: dataset\n  url: https://arxiv.org/pdf/2102.08981.pdf\n- access:\n    explanation: https://huggingface.co/t5-large\n    value: open\n  analysis: https://huggingface.co/t5-base#evaluation\n  created_date:\n    explanation: The date the T5 paper was made public.\n    value: 2019-10-23\n  dependencies:\n  - C4\n  description: Text-To-Text Transfer Transformer (T5) is a model that unifies all\n    NLP tasks under the text-to-text format.\n  feedback: https://huggingface.co/t5-large/discussions\n  intended_uses: NLP tasks\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/t5-base\n  monitoring: none\n  name: T5\n  nationality: USA\n  organization: Google\n  prohibited_uses: unknown\n  quality_control: The T5 paper documents many analyses/ablations that were considered\n    before arriving at the final architecture/training procedure.\n  size: 11B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 1,024 TPU v3 chips (Cloud TPU Pods)\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/abs/1910.10683\n- access: closed\n  analysis: unknown\n  created_date:\n    explanation: 'The date of the Google product update blog announcing that BERT\n      models were for ranking and featured snippets in Search.\n\n      '\n    value: 2019-11-25\n  dependencies:\n  - Internal Google BERT dataset\n  description: 'Internal Google BERT model used to power Google Search products.\n\n    '\n  feedback: unknown\n  intended_uses:\n    explanation: 'We don''t have an exhaustive list of the intended use cases for\n      the Internal Google BERT model, but we know that Google Search was powered by\n      a fine-tuned BERT.\n\n      '\n    value: unknown\n  license: unknown\n  modality: text; text\n  model_card: unknown\n  monitoring: unknown\n  
name: Internal Google BERT\n  nationality: USA\n  organization: Google\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://blog.google/products/search/search-language-understanding-bert/\n- access: open\n  adaptation: unknown\n  created_date:\n    explanation: 'The date of the Google product update blog announcing that BERT\n      models were for ranking and featured snippets in Search.\n\n      '\n    value: 2019-11-25\n  dependencies:\n  - Internal Google BERT\n  - MUM\n  description: 'Google Search is Google''s search engine.\n\n    '\n  failures: unknown\n  feedback: 'Feedback can be sent to Google Feedback using the product interface [[Google\n    Feedback]](https://www.google.com/tools/feedback).\n\n    '\n  intended_uses: Searching the web using text, voice or image\n  license: none\n  monitoring: 'It is implied that Google scan uses of its products for spam, malware\n    and illegal content in the [[Term of Service]](https://policies.google.com/terms).\n\n    '\n  monthly_active_users: unknown\n  name: Google Search\n  nationality: USA\n  organization: Google\n  output_space: web page ranking\n  prohibited_uses: 'Prohibited use cases aren''t specifically spelled out for Google\n    search, but several illegal and discouraged use cases are shared in the Respect\n    Others section of the [[Term of Service]](https://policies.google.com/terms).\n\n    '\n  quality_control: unknown\n  terms_of_service: https://policies.google.com/terms\n  type: application\n  url: https://blog.google/products/search/search-language-understanding-bert/\n  user_distribution: unknown\n- access: closed\n  analysis: unknown\n  created_date:\n    explanation: 'The date of the Google company news blog announcing LaMDA [[Google\n      News Blog]](https://blog.google/technology/ai/lamda/).\n\n      '\n    value: 2021-06-18\n  datasheet: none\n  
dependencies: []\n  description: 'Infiniset \"is a combination of dialog data from public dialog data\n    and other public web documents\" [[Appendix E]](https://arxiv.org/pdf/2201.08239.pdf#appendix.E).\n\n    '\n  excluded: unknown\n  feedback:\n    explanation: 'Author contact information was not provided.\n\n      '\n    value: none\n  included: 'Included in the dataset are data from \"public forums (50%); C4 data (12.5%);\n    code documents from sites related to programming like Q&A sites, tutorials,\n    etc (12.5%); Wikipedia (English) (12.5%); English web documents (6.25%); and Non-English\n    web documents (6.25%).\"\n\n    '\n  intended_uses:\n    explanation: 'The intended uses of the dataset weren''t explicitly listed, but it is\n      likely intended for training language models specialized in dialogue.\n\n      '\n    value: unknown\n  license: unknown\n  modality: code, text\n  monitoring: unknown\n  name: Infiniset\n  nationality: USA\n  organization: Google\n  prohibited_uses: 'The prohibited uses for Infiniset weren''t specifically listed,\n    but the safety objectives inspired by the Google AI Principles in [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1)\n    advise avoiding harm, unjust impact and misinformation, among others.\n\n    '\n  quality_control: unknown\n  sample: []\n  size:\n    explanation: 'The size of the dataset is unclear, but it is reported that the\n      dataset \"consists of 2.97B documents and 1.12B dialogs with 13.39B utterances\"\n      [[Appendix E]](https://arxiv.org/pdf/2201.08239.pdf#appendix.E).\n\n      '\n    value: unknown\n  type: dataset\n  url: https://arxiv.org/pdf/2201.08239.pdf\n- access: closed\n  analysis: 'The model performance was analyzed on sensibleness, specificity and interestingness.\n    The model was also analyzed on safety, following metrics derived from Google AI\n    Principles [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1).\n    Finally, the model was 
analyzed on groundedness, testing its ability to produce\n    responses that can be associated with \"known sources whenever possible [[Section\n    4.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.4.1).\"\n\n    '\n  created_date:\n    explanation: 'The date of the Google company news blog announcing LaMDA [[Google\n      News Blog]](https://blog.google/technology/ai/lamda/).\n\n      '\n    value: 2021-06-18\n  dependencies:\n  - Infiniset\n  description: 'LaMDA stands for Language Models for Dialog Applications. It is a\n    transformer-based language model trained on dialogue data.\n\n    '\n  feedback:\n    explanation: 'Author contact information was not provided.\n\n      '\n    value: none\n  intended_uses: 'LaMDA is a language model, so it can be used for regular language\n    modelling tasks without fine-tuning, but it is fine-tuned for dialogue tasks.\n\n    '\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: unknown\n  name: LaMDA\n  nationality: USA\n  organization: Google\n  prohibited_uses: 'The prohibited uses of LaMDA weren''t specifically listed, but\n    the safety objectives inspired by the Google AI Principles in [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1)\n    advise avoiding harm, unjust impact and misinformation, among others.\n\n    '\n  quality_control: 'LaMDA was fine-tuned to predict sensibleness, specificity and\n    interestingness as well as safety. Then, the candidates were filtered out if the\n    model safety predictions were below a certain threshold. The next candidates in\n    the conversation were selected as a combination of these predictions. The model\n    was also fine-tuned for groundedness. 
The results are shown in [[Figure 5]](https://arxiv.org/pdf/2201.08239.pdf#figure.caption.23).\n\n    '\n  size: 137B parameters (dense)\n  training_emissions:\n    explanation: \"\\\"...total carbon footprint of LaMDA\\u2019s pre-training of the\\\n      \\ largest model is approximately 25.2 tCO2e. The carbon footprint of pre-training\\\n      \\ of smaller models and fine-tuning of all models is approximately 0.7 tCO2e\\\n      \\ ... which brings the total footprint of LaMDA to approximately 26 tCO2e [[Section\\\n      \\ 10]](https://arxiv.org/pdf/2201.08239.pdf#section.10)\\\"\\n\"\n    value: 26 tCO2e\n  training_hardware:\n    explanation: 'Reported in [[Section 10]](https://arxiv.org/pdf/2201.08239.pdf#section.10).\n\n      '\n    value: 1024 TPU-V3 chips\n  training_time:\n    explanation: 'The total number of training flops of LaMDA was reported as 3.55E+23\n      (3.55E+8 petaflops) [[Section 10]](https://arxiv.org/pdf/2201.08239.pdf#section.10),\n      which is equal to 4108.80 = 3.55E+8 / (60 * 60 * 24) petaflop/s-day.\n\n      '\n    value: 4108.80 petaflop/s-day\n  type: model\n  url: https://arxiv.org/pdf/2201.08239.pdf\n- access: closed\n  analysis: unknown\n  created_date:\n    explanation: 'The date of the Google AI blog announcing the details of PaLM [[Google\n      AI Blog]](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html).\n\n      '\n    value: 2022-04-04\n  datasheet: https://arxiv.org/pdf/2204.02311.pdf#appendix.D\n  dependencies:\n  - Infiniset\n  description: 'PaLM dataset \"was created for pre-training language models\" [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D).\n\n    '\n  excluded: 'GitHub repositories with copyleft licenses were excluded. 
Programming\n    languages other than the most common 24 were excluded [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3).\n\n    '\n  feedback:\n    explanation: 'Author contact information is shared in the paper [[Paper]](https://arxiv.org/pdf/2204.02311.pdf).\n\n      '\n    value: Contact the authors.\n  included: 'The dataset is based on Infiniset. It included multilingual text containing\n    text from over 100 languages. The breakdown of the data included is as follows:\n    Social media conversations (multilingual) 50%, Filtered webpages (multilingual)\n    27%, BooksCorpus (English) 13%, GitHub (code) 5%, Wikipedia (multilingual) 4%,\n    and News (English) 1%. Code was collected from GitHub repositories with appropriate\n    licenses, totalling 196GB of source code [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3).\n\n    '\n  intended_uses:\n    explanation: 'As stated in [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D).\n\n      '\n    value: '\"The dataset was created for pre-training language models by a team of\n      researchers at Google\".\n\n      '\n  license: unknown\n  modality: code, text\n  monitoring: unknown\n  name: PaLM dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: '\"... should not be used for any of the unacceptable language model\n    use cases, e.g., generation of toxic speech\" [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D).\n\n    '\n  quality_control: 'In order to reduce low quality web pages, the web pages were sampled\n    according to a \"quality score\" classifier. 
Code files were de-duplicated using\n    Levenshtein distance [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3).\n\n    '\n  sample: []\n  size:\n    explanation: 'Dataset size in GB is not reported, but the dataset is reported\n      to have 780 billion tokens [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D).\n      The code portion of the dataset is reported to be 5%, totaling 196GB of source\n      code [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3). It is unclear\n      whether the reported size is before or after de-duplication. Nonetheless, one\n      can estimate the dataset size by multiplying 196GB by 20, which gives 3.92 TB.\n\n      '\n    value: 3.92 TB\n  type: dataset\n  url: https://arxiv.org/pdf/2204.02311.pdf\n- access:\n    explanation: Weights can be downloaded from [Github](https://github.com/google-research/t5x/blob/main/docs/models.md)\n    value: open\n  analysis: Evaluated on a variety of standard language datasets.\n  created_date:\n    explanation: Date paper was released\n    value: 2022-10-20\n  dependencies:\n  - T5\n  - Muffin\n  - P3\n  - NaturalInstructions-v2\n  - Flan CoT\n  description: Flan-T5 is a version of the T5 language model fine-tuned on instruction\n    data\n  feedback: https://huggingface.co/google/flan-t5-xxl/discussions\n  intended_uses: unknown\n  license:\n    explanation: License on the [[Github repository]](https://github.com/google-research/google-research)\n    value: Apache 2.0\n  modality: text; text\n  model_card: https://arxiv.org/pdf/2210.11416.pdf\n  monitoring: none\n  name: Flan-T5\n  nationality: USA\n  organization: Google\n  prohibited_uses: none\n  quality_control: Across different multitask datasets, templates and formatting were\n    maintained. 
For the chain-of-thought (CoT) data, specific exemplars were used.\n  size: 11B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 512 v4 TPU Chips\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/abs/2210.11416\n- access:\n    explanation: Model weights available for download in the [[Github repo]](https://github.com/google-research/google-research/tree/master/ul2)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: Date model paper was released\n    value: 2022-05-10\n  dependencies:\n  - C4\n  description: UL2 is a language model trained with a new pretraining objective\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 20B checkpoints only for three different iteration steps\n    value: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: UL2\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 20B parameters (dense)\n  training_emissions: ''\n  training_hardware: 128 TPUv4\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2205.05131\n- access:\n    explanation: Google does not provide access to Parti for external researchers.\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: Date the model website was made public\n    value: 2022-06-22\n  dependencies:\n  - C4\n  - LAION-400M\n  - FIT400M\n  - JFT-4B\n  description: Parti is an autoregressive text-to-image model\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; image\n  model_card: ''\n  monitoring: ''\n  name: Parti\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 20B parameters (dense)\n  training_emissions: ''\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://parti.research.google/\n- access: open\n  analysis: ''\n  created_date:\n    explanation: Date the model website was made public\n    value: 2022-05-23\n  
dependencies:\n  - LAION-400M\n  - Google internal image-text dataset\n  description: Imagen is a text-to-image diffusion model\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; image\n  model_card: ''\n  monitoring: ''\n  name: Imagen\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 14B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 128 TPU-v4\n  training_time: unknown\n  type: model\n  url: https://imagen.research.google/\n- access:\n    explanation: Model checkpoints can be downloaded from the [[Github repository]](https://github.com/google-research/google-research/tree/master/vatt)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: Date the model paper was made public\n    value: 2022-04-22\n  dependencies:\n  - AudioSet\n  - HowTo100M\n  description: VATT is a family of models trained on multimodal data\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: License on the [[Github repository]](https://github.com/google-research/google-research)\n    value: Apache 2.0\n  modality: text; audio, video\n  model_card: ''\n  monitoring: ''\n  name: VATT\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 155M parameters (dense)\n  training_emissions: unknown\n  training_hardware: 256 TPU-v3\n  training_time: 3 days\n  type: model\n  url: https://arxiv.org/abs/2104.11178\n- access:\n    explanation: Made available through the PaLM API as of March 14, 2023.\n    value: limited\n  analysis: '\"PaLM is evaluated on English Natural Language Processing (NLP) tasks,\n    tasks from BIG-bench, reasoning tasks, code completion tasks, multilingual generation\n    and question answering tasks, translation tasks, and bias and toxicity benchmarks\"\n    [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E).\n\n    '\n  created_date:\n    explanation: 'The date of the Google AI blog announcing 
the details of PaLM [[Google\n      AI Blog]](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html).\n\n      '\n    value: 2022-04-04\n  dependencies:\n  - PaLM dataset\n  description: 'PaLM stands for Pathways Language Model, a \"dense decoder-only Transformer\n    model trained with the Pathways system\" [[Google AI Blog]](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html).\n\n    '\n  feedback:\n    explanation: 'Author contact information is shared in the paper [[Paper]](https://arxiv.org/pdf/2204.02311.pdf).\n\n      '\n    value: Contact the authors.\n  intended_uses: '\"The primary use is research on language models, including: research\n    on NLP applications like machine translation and question answering, advancing\n    fairness and safety research, and understanding limitations of current LLMs. Within\n    Google, PaLM is being used for research on a variety of open-ended text and code\n    generation tasks, including reasoning [[Section 6.3]](https://arxiv.org/pdf/2204.02311.pdf#subsection.6.3)\n    and code synthesis and understanding [[Section 6.4]](https://arxiv.org/pdf/2204.02311.pdf#subsection.6.4)\"\n    [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E).\n\n    '\n  license: unknown\n  modality: text; text, code\n  model_card: https://arxiv.org/pdf/2204.02311.pdf#appendix.E\n  monitoring: unknown\n  name: PaLM\n  nationality: USA\n  organization: Google\n  prohibited_uses: 'The model \"should not be used for downstream applications without\n    further analysis on factors in the proposed downstream application [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E)\"\n\n    '\n  quality_control: unknown\n  size: 540B parameters (dense)\n  training_emissions:\n    explanation: 'Reported in [[Appendix B]](https://arxiv.org/pdf/2204.02311.pdf#appendix.A)\n\n      '\n    value: 271.43 tCO2\n  training_hardware:\n    explanation: 'Reported in [[Section 
4]](https://arxiv.org/pdf/2204.02311.pdf#section.4).\n\n      '\n    value: 6144 TPU v4 chips\n  training_time:\n    explanation: 'Reported in [[Appendix B]](https://arxiv.org/pdf/2204.02311.pdf#appendix.A)\n\n      '\n    value: 29600 petaflop/s-days\n  type: model\n  url: https://arxiv.org/pdf/2204.02311.pdf\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-14\n  dependencies:\n  - PaLM\n  description: \"a new developer offering that makes it easy and safe to experiment\\\n    \\ with Google\\u2019s language models.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: PaLM API\n  nationality: USA\n  organization: Google\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html\n  user_distribution: ''\n- access: closed\n  analysis: ''\n  created_date: 2022-12-26\n  dependencies:\n  - Flan-PaLM\n  - MultiMedQA\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Med-PaLM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 540B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2212.13138\n- access: closed\n  analysis: Evaluated on MultiMedBench tasks and radiologist evaluations of model-generated\n    chest X-ray reports\n  created_date: 2023-07-26\n  dependencies:\n  - PaLM-E\n  - MultiMedBench\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text, genome sequence; text\n  model_card: ''\n  monitoring: ''\n  name: Med-PaLM Multimodal\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 562B parameters (dense)\n  
training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2307.14334.pdf\n- access: closed\n  analysis: ''\n  created_date: 2022-12-26\n  dependencies:\n  - MedQA\n  - MedMCQA\n  - PubMedQA\n  - MMLU\n  - LiveQA\n  - Medication QA\n  - HealthSearchQA\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: MultiMedQA\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2212.13138\n- access: closed\n  analysis: ''\n  created_date: 2022-10-20\n  dependencies:\n  - PaLM\n  - Muffin\n  - P3\n  - NaturalInstructions-v2\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Flan-PaLM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 540B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2210.11416\n- access: closed\n  analysis: ''\n  created_date: 2022-10-20\n  dependencies:\n  - U-PaLM\n  - Muffin\n  - P3\n  - NaturalInstructions-v2\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Flan-U-PaLM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 540B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2210.11416\n- access: open\n  analysis: ''\n  created_date: 2021-09-03\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: Apache 2.0\n  
modality: text\n  monitoring: ''\n  name: Muffin\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 62 tasks\n  type: dataset\n  url: https://arxiv.org/abs/2109.01652\n- access: closed\n  analysis: ''\n  created_date: 2022-10-20\n  dependencies:\n  - PaLM\n  - PaLM dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: U-PaLM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 540B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2210.11399\n- access: closed\n  analysis: ''\n  created_date: 2022-08-16\n  dependencies:\n  - PaLM\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown (model weights), Apache 2.0 (SayCan code)\n  modality: text; robotics trajectories\n  model_card: ''\n  monitoring: ''\n  name: PaLM-SayCan\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 540B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2204.01691\n- access: closed\n  analysis: ''\n  created_date: 2021-12-13\n  dependencies:\n  - GLaM Web dataset\n  - Wikipedia\n  - GLaM Conversations dataset\n  - GLaM Forums dataset\n  - BooksCorpus\n  - GLaM News dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: GLaM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.2T parameters (sparse)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2112.06905\n- access: closed\n  analysis: ''\n  created_date: 2021-12-13\n  datasheet: ''\n  
dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: GLaM Web dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://arxiv.org/abs/2112.06905\n- access: closed\n  analysis: ''\n  created_date: 2021-12-13\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: GLaM Conversations dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://arxiv.org/abs/2112.06905\n- access: closed\n  analysis: ''\n  created_date: 2021-12-13\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: GLaM Forums dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://arxiv.org/abs/2112.06905\n- access: closed\n  analysis: ''\n  created_date: 2021-12-13\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: GLaM News dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://arxiv.org/abs/2112.06905\n- access: closed\n  analysis: ''\n  created_date: 2021-05-18\n  dependencies:\n  - MUM dataset\n  description: MUM (Multitask Unified Model) is a multimodal model that is specialized\n    for more complex queries.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  
modality: image, text; text\n  model_card: ''\n  monitoring: ''\n  name: MUM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://blog.google/products/search/introducing-mum/\n- access: closed\n  analysis: ''\n  created_date: 2021-05-18\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text\n  monitoring: ''\n  name: MUM dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://blog.google/products/search/introducing-mum/\n- access: closed\n  analysis: ''\n  created_date: 2023-02-01\n  dependencies:\n  - LAION-400M\n  - Phenaki Video-Text Corpus\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; video\n  model_card: ''\n  monitoring: ''\n  name: Phenaki\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.8B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://openreview.net/pdf?id=vOEXS39nOF\n- access: closed\n  analysis: ''\n  created_date: 2023-02-01\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text, video\n  monitoring: ''\n  name: Phenaki Video-Text Corpus\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 15M text-video pairs at 8FPS\n  type: dataset\n  url: https://openreview.net/pdf?id=vOEXS39nOF\n- access: open\n  analysis: ''\n  created_date: 2023-03-02\n  dependencies:\n  - UL2\n  - Flan Collection\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  
license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Flan-UL2\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 20B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2205.05131\n- access: open\n  analysis: ''\n  created_date: 2023-01-31\n  datasheet: ''\n  dependencies:\n  - Flan dataset\n  - P3\n  - NaturalInstructions-v2\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text\n  monitoring: ''\n  name: Flan Collection\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 1836 tasks\n  type: dataset\n  url: https://arxiv.org/abs/2301.13688\n- access: closed\n  analysis: ''\n  created_date: 2023-01-26\n  dependencies:\n  - SoundStream\n  - w2v-BERT\n  - MuLan\n  - MusicLM semantic model\n  - MusicLM acoustic model\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; audio\n  model_card: ''\n  monitoring: ''\n  name: MusicLM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.4B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2301.11325.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-01-26\n  dependencies:\n  - Free Music Archive\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio; audio\n  model_card: ''\n  monitoring: ''\n  name: SoundStream\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2301.11325.pdf\n- access: closed\n  analysis: ''\n  created_date: 
2023-01-26\n  dependencies:\n  - Free Music Archive\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio; audio\n  model_card: ''\n  monitoring: ''\n  name: w2v-BERT\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 600M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2301.11325.pdf\n- access: closed\n  analysis: ''\n  created_date: 2022-08-26\n  dependencies:\n  - AST\n  - BERT\n  - MuLan dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; audio\n  model_card: ''\n  monitoring: ''\n  name: MuLan\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2208.12415\n- access: closed\n  analysis: ''\n  created_date: 2022-08-26\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio, text\n  monitoring: ''\n  name: MuLan dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 370K hours audio\n  type: dataset\n  url: https://arxiv.org/abs/2208.12415\n- access: closed\n  analysis: ''\n  created_date: 2023-01-26\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio\n  monitoring: ''\n  name: MusicLM dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 280K hours audio\n  type: dataset\n  url: https://arxiv.org/pdf/2301.11325.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-01-26\n  dependencies:\n  - MusicLM dataset\n  
description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio; audio\n  model_card: ''\n  monitoring: ''\n  name: MusicLM semantic model\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 430M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2301.11325.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-01-26\n  dependencies:\n  - MusicLM dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio; audio\n  model_card: ''\n  monitoring: ''\n  name: MusicLM acoustic model\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 430M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2301.11325.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-02-08\n  dependencies:\n  - Noise2Music pseudolabel dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio, text; audio\n  model_card: ''\n  monitoring: ''\n  name: Noise2Music\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://google-research.github.io/noise2music/noise2music.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-02-08\n  datasheet: ''\n  dependencies:\n  - LaMDA\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: LaMDA-LF\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 150k songs\n  type: dataset\n  url: https://google-research.github.io/noise2music/noise2music.pdf\n- access: closed\n  
analysis: ''\n  created_date: 2023-02-08\n  datasheet: ''\n  dependencies:\n  - MusicCaps\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: Rater-LF\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 10k captions\n  type: dataset\n  url: https://google-research.github.io/noise2music/noise2music.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-02-08\n  datasheet: ''\n  dependencies:\n  - MusicCaps\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: Rater-SF\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 24k captions\n  type: dataset\n  url: https://google-research.github.io/noise2music/noise2music.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-02-08\n  dependencies:\n  - MuLan\n  - MuLaMCap\n  - LaMDA-LF\n  - Rater-LF\n  - Rater-SF\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio, text; audio\n  model_card: ''\n  monitoring: ''\n  name: Noise2Music pseudolabeler\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://google-research.github.io/noise2music/noise2music.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-02-08\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio\n  monitoring: ''\n  name: Noise2Music audio dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 340k hours audio\n  type: dataset\n  url: 
https://google-research.github.io/noise2music/noise2music.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-02-08\n  datasheet: ''\n  dependencies:\n  - Noise2Music audio dataset\n  - Noise2Music pseudolabeler\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license:\n    explanation: 'The asset isn''t released, and hence the license is unknown.\n\n      '''\n    value: unknown\n  modality: audio, text\n  monitoring: ''\n  name: Noise2Music pseudolabel dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 340k hours audio with pseudolabels\n  type: dataset\n  url: https://google-research.github.io/noise2music/noise2music.pdf\n- access: limited\n  adaptation: ''\n  created_date: 2022-08-25\n  dependencies:\n  - LaMDA\n  description: AI Test Kitchen provides a new way for people to learn about, experience,\n    and give feedback on emerging AI technology, like LaMDA.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: AI Test Kitchen\n  nationality: USA\n  organization: Google\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://blog.google/technology/ai/join-us-in-the-ai-test-kitchen/\n  user_distribution: ''\n- access: closed\n  adaptation: ''\n  created_date: 2023-02-06\n  dependencies:\n  - LaMDA\n  description: Conversational AI service, powered by LaMDA\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Bard\n  nationality: USA\n  organization: Google\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://blog.google/technology/ai/bard-google-ai-search-updates/\n  user_distribution: ''\n- access: closed\n  analysis: ''\n  created_date: 2022-06-29\n  
dependencies:\n  - PaLM\n  - arXiv\n  - PaLM dataset\n  - Minerva Math Web Pages dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Minerva\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 540B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2206.14858\n- access: closed\n  analysis: ''\n  created_date: 2022-06-29\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: Minerva Math Web Pages dataset\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 17.5B tokens\n  type: dataset\n  url: https://arxiv.org/abs/2206.14858\n- access: limited\n  analysis: ''\n  created_date: 2023-03-06\n  dependencies:\n  - YT-NLU-U\n  - Pub-U\n  - Web-NTL\n  - YT-SUP+\n  - Pub-S\n  description: Universal Speech Model (USM) is a family of state-of-the-art speech\n    models with 2B parameters trained on 12 million hours of speech and 28 billion\n    sentences of text, spanning 300+ languages. USM, which is for use in YouTube (e.g.,\n    for closed captions), can perform automatic speech recognition (ASR) on widely-spoken\n    languages like English and Mandarin, but also languages like Punjabi, Assamese,\n    Santhali, Balinese, Shona, Malagasy, Luganda, Luo, Bambara, Soga, Maninka, Xhosa,\n    Akan, Lingala, Chichewa, Nkore, Nzema to name a few. 
Some of these languages are\n    spoken by fewer than twenty million people, making it very hard to find the necessary\n    training data.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio, text; text\n  model_card: ''\n  monitoring: ''\n  name: USM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 2B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2303.01037\n- access: open\n  adaptation: ''\n  created_date: 2005-02-14\n  dependencies:\n  - USM\n  description: YouTube is a global online video sharing and social media platform\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  monitoring: ''\n  monthly_active_users: ''\n  name: YouTube\n  nationality: USA\n  organization: Google\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.youtube.com/\n  user_distribution: ''\n- access: closed\n  analysis: ''\n  created_date: 2023-03-06\n  dependencies:\n  - PaLM\n  - ViT-22B\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text; text\n  model_card: ''\n  monitoring: ''\n  name: PaLM-E\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 562B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2303.03378\n- access: closed\n  analysis: ''\n  created_date: 2023-02-10\n  dependencies:\n  - JFT\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: image; image\n  model_card: ''\n  monitoring: ''\n  name: ViT-22B\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 22B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: 
model\n  url: https://arxiv.org/abs/2302.05442\n- access: closed\n  analysis: ''\n  created_date: 2022-09-07\n  dependencies:\n  - w2v-BERT\n  - SoundStream\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio, text; audio\n  model_card: ''\n  monitoring: ''\n  name: AudioLM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2209.03143\n- access: closed\n  analysis: ''\n  created_date: 2022-09-14\n  dependencies:\n  - mT5\n  - ViT-e\n  - WebLI\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; image\n  model_card: ''\n  monitoring: ''\n  name: PaLI\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 17B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2209.06794\n- access: closed\n  analysis: ''\n  created_date: 2022-09-14\n  dependencies:\n  - JFT\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: image; image\n  model_card: ''\n  monitoring: ''\n  name: ViT-e\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 3.9B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2209.06794\n- access: closed\n  analysis: ''\n  created_date: 2022-09-14\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text\n  monitoring: ''\n  name: WebLI\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 10B images, 12B alt-text\n  type: dataset\n  url: 
https://arxiv.org/abs/2209.06794\n- access: open\n  analysis: ''\n  created_date: 2023-02-27\n  dependencies:\n  - T5\n  - CLIP\n  - YT-Temporal-1B\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: video; text\n  model_card: ''\n  monitoring: ''\n  name: Vid2Seq\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 500M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2302.14115\n- access: closed\n  analysis: evaluated on DSTC11 Challenge Task, based on MultiWoz 2.1, with a focus\n    on dialog state tracking.\n  created_date: 2023-06-08\n  dependencies:\n  - CTC blank-filtering\n  - Speech2Text adapter\n  description: Joint speech and language model using a Speech2Text adapter and using\n    a CTC-based blank-filtering.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: audio; text\n  model_card: ''\n  monitoring: ''\n  name: Google Joint SLM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2306.07944.pdf\n- access: closed\n  analysis: Reports results on standard code benchmarks across a variety of programming\n    languages.\n  created_date: 2023-05-10\n  dependencies:\n  - PaLM 2 dataset\n  description: PaLM 2 is a new state-of-the-art language model that has better multilingual\n    and reasoning capabilities and is more compute-efficient than its predecessor\n    PaLM. 
PaLM 2 is a Transformer-based model trained using a mixture of objectives\n    similar to UL2.\n  feedback: Specific queries provided by annotators\n  intended_uses: general use large language model that can be used for language, reasoning,\n    and code tasks.\n  license: unknown\n  modality: text; text\n  model_card: https://ai.google/static/documents/palm2techreport.pdf\n  monitoring: Google internal monitoring\n  name: PaLM 2\n  nationality: USA\n  organization: Google\n  prohibited_uses: becoming part of a general-purpose service or product or use within\n    specific downstream applications without prior assessment\n  quality_control: Employed de-duplication, removal of sensitive-PII and filtering.\n    Added control tokens marking toxicity of text.\n  size: unknown\n  training_emissions: ''\n  training_hardware: TPU v4 (number unspecified)\n  training_time: ''\n  type: model\n  url: https://blog.google/technology/ai/google-palm-2-ai-large-language-model/\n- access: limited\n  analysis: Assessed on medical benchmarks of professional medical exams, medical\n    research, and consumer queries.\n  created_date: 2023-12-13\n  dependencies: []\n  description: MedLM is a collection of foundation models tuned to follow natural\n    language instructions for tasks in medicine, such as question answering and creating\n    draft summaries.\n  feedback: none\n  intended_uses: to be used for question answering and creating draft summaries from\n    existing documentation, to be reviewed, edited, and approved by the user before\n    use.\n  license: unknown\n  modality: text; text\n  model_card: https://cloud.google.com/static/vertex-ai/docs/generative-ai/medlm/MedLM-model-card.pdf\n  monitoring: Google internal monitoring\n  name: MedLM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: 
https://cloud.google.com/vertex-ai/docs/generative-ai/medlm/overview\n- access: closed\n  analysis: Evaluated on standard general, reasoning, math, coding, and multimodal\n    benchmarks with results that surpass GPT-4 on almost all.\n  created_date: 2023-12-06\n  dependencies: []\n  description: As of release, Gemini is Google's most capable and flexible AI model,\n    proficient in multimodal domains.\n  feedback: none\n  intended_uses: general use large language model that can be used for language, reasoning,\n    and code tasks.\n  license: unknown\n  modality: text; image, text, video\n  model_card: none\n  monitoring: Google internal monitoring\n  name: Gemini\n  nationality: USA\n  organization: Google\n  prohibited_uses: becoming part of a general-purpose service or product or use within\n    specific downstream applications without prior assessment\n  quality_control: ''\n  size:\n    explanation: Comes in sizes Ultra, Pro, and Nano.\n    value: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://deepmind.google/technologies/gemini/#introduction\n- access: closed\n  analysis: Evaluated on popular time-series benchmarks.\n  created_date: 2024-02-02\n  dependencies: []\n  description: TimesFM is a single forecasting model pre-trained on a large time-series\n    corpus of 100 billion real world time-points.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: ''\n  model_card: none\n  monitoring: unknown\n  name: TimesFM\n  nationality: USA\n  organization: Google\n  prohibited_uses: ''\n  quality_control: ''\n  size: 200M parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html\n- access: open\n  analysis: Evaluation was conducted on standard LLM benchmarks and includes internal\n    red-teaming testing of relevant 
content policies.\n  created_date: 2024-02-21\n  dependencies: []\n  description: Gemma is a family of lightweight, state-of-the-art open models from\n    Google, based on the Gemini models. They are text-to-text, decoder-only large\n    language models, available in English.\n  feedback: https://huggingface.co/google/gemma-7b/discussions\n  intended_uses: Text generation tasks including question answering, summarization,\n    and reasoning; content creation, communication, research, and education.\n  license:\n    explanation: License can be found at https://ai.google.dev/gemma/terms.\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/google/gemma-7b\n  monitoring: ''\n  name: Gemma\n  nationality: USA\n  organization: Google\n  prohibited_uses: Prohibited uses are specified in the Gemma Prohibited Use Policy\n    here https://ai.google.dev/gemma/prohibited_use_policy\n  quality_control: Multiple evaluations and red-teaming conducted, with particular\n    focus on ethics, bias, fair use cases, and safety.\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: TPUv5e\n  training_time: unknown\n  type: model\n  url: https://blog.google/technology/developers/gemma-open-models/\n- access: closed\n  analysis: Evaluated Med-Gemini on 14 medical benchmarks spanning text, multimodal\n    and long-context applications, establishing new state-of-the-art (SoTA) performance\n    on 10 of them, and surpassing the GPT-4 model family on every benchmark where\n    a direct comparison is viable.\n  created_date: 2024-04-29\n  dependencies:\n  - Gemini\n  - MultiMedBench\n  description: Med-Gemini is a family of highly capable multimodal models that are\n    specialized in medicine with the ability to seamlessly integrate the use of web\n    search, and that can be efficiently tailored to novel modalities using custom\n    encoders.\n  feedback: none\n  intended_uses: To be used in areas of medical research including 
medical summarization,\n    referral letter generation, and medical simplification tasks.\n  license: unknown\n  modality: image, text; text\n  model_card: none\n  monitoring: ''\n  name: Med-Gemini\n  nationality: USA\n  organization: Google\n  prohibited_uses: Unfit for real-world deployment in the safety-critical medical\n    domain.\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2404.18416\n- access: open\n  analysis: Unknown\n  created_date: 2024-09-05\n  dependencies: []\n  description: Imagen 3 is a high-quality text-to-image model capable of generating\n    images with improved detail, richer lighting, and fewer distracting artifacts.\n    It features improved prompt understanding and can be used to generate a wide array\n    of visual styles from quick sketches to high-resolution images. The model is available\n    in multiple versions, each optimized for particular types of tasks. Imagen 3 has\n    been trained to capture nuances like specific camera angles or compositions in\n    long, complex prompts, making it a versatile tool for image generation from textual\n    inputs.\n  feedback: Unknown\n  intended_uses: Imagen 3 is intended to be used for generation of high-resolution\n    images from textual prompts, from photorealistic landscapes to richly textured\n    oil paintings or whimsical claymation scenes. 
It can also be used for stylized\n    birthday cards, presentations, and more, due to its improved text rendering capabilities.\n  license: Unknown\n  modality: text; image\n  model_card: unknown\n  monitoring: Unknown\n  name: Imagen 3\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: Unknown\n  quality_control: Unknown\n  size: Unknown\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://deepmind.google/technologies/imagen-3/\n- access: open\n  analysis: The 27B Gemma 2 model outperforms other open models in its size category\n    offering cutting-edge performance. Specific details can be found in the provided\n    technical report.\n  created_date: 2024-06-27\n  dependencies:\n  - Gemma\n  - CodeGemma\n  - RecurrentGemma\n  - PaliGemma\n  description: Gemma 2 is an open model that offers best-in-class performance and\n    runs at incredible speed across different hardware. It easily integrates with\n    other AI tools. This model is built on a redesigned architecture engineered for\n    exceptional performance and inference efficiency. It is available in both 9 billion\n    (9B) and 27 billion (27B) parameter sizes. Gemma 2 is optimized to run at incredible\n    speed across a range of hardware, from powerful gaming laptops and high-end desktops,\n    to cloud-based setups.\n  feedback: Unknown\n  intended_uses: Gemma 2 is designed for developers and researchers for various AI\n    tasks. 
It can be used via the integrations it offers with other AI tools/platforms\n    and can additionally be deployed for more accessible and budget-friendly AI deployments.\n  license: Gemma (commercially-friendly license given by Google DeepMind)\n  modality: text; text\n  model_card: unknown\n  monitoring: Unknown\n  name: Gemma 2\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: Not specified\n  quality_control: Google DeepMind implemented a refined architecture for Gemma 2.\n    The model has improvements in safety and efficiency over the first generation.\n    The deployment of Gemma 2 on Vertex AI, scheduled for the next month, will offer\n    effortless management of the model.\n  size: 27B parameters (dense)\n  training_emissions: Unknown\n  training_hardware: Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, NVIDIA\n    H100 Tensor Core GPU\n  training_time: Unknown\n  type: model\n  url: https://blog.google/technology/developers/google-gemma-2/\n- access: open\n  analysis: The model nearly matches the performance of the Gemini 1.5 Flash model\n    across many benchmarks through developer feedback and testing.\n  created_date: 2024-10-03\n  dependencies: []\n  description: Gemini 1.5 Flash-8B is a lightweight, speed-optimized variant of the\n    Gemini 1.5 Flash model, designed for efficiency and lower latency especially for\n    tasks such as chat, transcription, and long context language translation. 
The\n    model is production-ready and offers the lowest cost per intelligence of any Gemini\n    model with improved rate limits.\n  feedback: Encourages developer feedback to inform model improvements and future\n    updates.\n  intended_uses: Designed for simple, high-volume tasks including multimodal use cases\n    and long-context summarization.\n  license: unknown\n  modality: audio, image, text, video; text\n  model_card: unknown\n  monitoring: unknown\n  name: Gemini 1.5 Flash-8B\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: unknown\n  quality_control: Model improvement informed by developer feedback and extensive\n    testing.\n  size: 8B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/\n- access:\n    explanation: The models are accessible via Vertex AI Model Garden [CXR, Derm,\n      Path] and Hugging Face [CXR, Derm, Path].\n    value: open\n  analysis: After giving the community time to use the models and explore different\n    applications, we collected feedback.\n  created_date: 2024-11-25\n  dependencies:\n  - EfficientNet-L2\n  - BERT\n  - CLIP\n  - BLIP-2\n  - BiT ResNet-101x3\n  - ViT-S\n  description: Health AI Developer Foundations (HAI-DEF) is a new suite of open weight\n    models to help developers more easily build AI models for healthcare applications.\n  feedback: unknown\n  intended_uses: Building AI models for healthcare applications including imaging\n    applications in radiology, dermatology, and pathology.\n  license: unknown\n  modality:\n    explanation: \"CXR Foundation\\u2019s image encoding model takes DICOM images, and\\\n      \\ its text-encoder accepts textual strings...\"\n    value: text, image; vector\n  model_card: unknown\n  monitoring: After giving the community time to use the models and explore different\n    applications, 
we collected feedback.\n  name: Health AI Developer Foundations (HAI-DEF)\n  nationality: USA\n  organization: Google Research\n  prohibited_uses: unknown\n  quality_control: Learning from community experience with previous research endpoints...we\n    collected feedback.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: http://goo.gle/3AWVX33\n- access: closed\n  analysis: Unknown\n  created_date: 2024-12-04\n  dependencies:\n  - Imagen 3\n  description: Genie 2 is a foundation world model capable of generating an endless\n    variety of action-controllable, playable 3D environments for training and evaluating\n    embodied agents based on a single prompt image.\n  feedback: Unknown\n  intended_uses: Genie 2 can be used for generating diverse environments for training\n    and evaluating AI agents, rapid prototyping interactive experiences, and experimenting\n    with novel environments.\n  license: Unknown\n  modality:\n    explanation: the model is prompted with a single image generated by Imagen 3\n    value: image; video\n  model_card: unknown\n  monitoring: Unknown\n  name: Genie 2\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: Unknown\n  quality_control: Responsible development is emphasized, developing our technologies\n    responsibly and building towards more general AI systems that can safely carry\n    out tasks.\n  size: Unknown\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/\n- access: limited\n  analysis: Veo 2 outperforms other leading video generation models, based on human\n    evaluations of its performance.\n  created_date: 2024-12-16\n  dependencies: []\n  description: Veo 2 is a state-of-the-art video generation model that creates videos\n    with realistic motion and high-quality output, up to 
4K, with extensive camera\n    controls. It simulates real-world physics and offers advanced motion capabilities\n    with enhanced realism and fidelity.\n  feedback: unknown\n  intended_uses: Creating high-quality videos with realistic motion, different styles,\n    camera controls, shot styles, angles, and movements.\n  license: unknown\n  modality:\n    explanation: Our state-of-the-art video generation model ... text-to-image model\n      Veo 2\n    value: text; video\n  model_card: unknown\n  monitoring: unknown\n  name: Veo 2\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: unknown\n  quality_control: Veo 2 includes features that enhance realism, fidelity, detail,\n    and artifact reduction to ensure high-quality output.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://deepmind.google/technologies/veo/veo-2/\n- access:\n    explanation: Gemini 2.0 Flash is available to developers and trusted testers,\n      with wider availability planned for early next year.\n    value: limited\n  analysis: unknown\n  created_date: 2024-12-11\n  dependencies: []\n  description: Google DeepMind introduces Gemini 2.0, a new AI model designed for\n    the 'agentic era.'\n  feedback: unknown\n  intended_uses: Develop more agentic models, meaning they can understand more about\n    the world around you, think multiple steps ahead, and take action on your behalf,\n    with your supervision.\n  license: unknown\n  modality:\n    explanation: The first model built to be natively multimodal, Gemini 1.0 and 1.5\n      drove big advances with multimodality and long context to understand information\n      across text, video, images, audio and code...\n    value: text, video, image, audio; image, text\n  model_card: unknown\n  monitoring: unknown\n  name: Gemini 2.0\n  nationality: unknown\n  organization: Google DeepMind\n  prohibited_uses: unknown\n  quality_control: 
Google is committed to building AI responsibly, with safety and\n    security as key priorities.\n  size: unknown\n  training_emissions: unknown\n  training_hardware:\n    explanation: \"It\\u2019s built on custom hardware like Trillium, our sixth-generation\\\n      \\ TPUs.\"\n    value: custom hardware like Trillium, our sixth-generation TPUs\n  training_time: unknown\n  type: model\n  url: https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/#ceo-message\n- access:\n    explanation: 'Download models & code: Find the pre-trained models and code on\n      Hugging Face and Kaggle'\n    value: open\n  analysis: Our research demonstrates leading performance on chemical formula recognition,\n    music score recognition, spatial reasoning, and chest X-ray report generation,\n    as detailed in the technical report.\n  created_date: 2024-12-05\n  dependencies:\n  - PaliGemma\n  - Hugging Face\n  - Kaggle\n  - Hugging Face Transformers\n  - Keras\n  - PyTorch\n  - JAX\n  description: PaliGemma 2 builds upon the performant Gemma 2 models, adding the power\n    of vision and making it easier than ever to fine-tune for exceptional performance.\n    With PaliGemma 2, these models can see, understand, and interact with visual input,\n    opening up a world of new possibilities.\n  feedback: Your feedback and contributions are invaluable in shaping the future of\n    these models and driving innovation in the field.\n  intended_uses: fine-tuning for specific tasks and datasets straightforward, empowering\n    you to tailor its capabilities to your precise needs.\n  license: unknown\n  modality:\n    explanation: PaliGemma 2 generates detailed, contextually relevant captions for\n      images\n    value: image; text\n  model_card: unknown\n  monitoring: unknown\n  name: PaliGemma 2\n  nationality: USA\n  organization: Google\n  prohibited_uses: unknown\n  quality_control: unknown\n  size:\n    explanation: Optimize performance for any task with 
PaliGemma 2's multiple model\n      sizes (3B, 10B, 28B parameters)\n    value: 28B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://developers.googleblog.com/en/introducing-paligemma-2-powerful-vision-language-models-simple-fine-tuning/\n"
  },
  {
    "path": "assets/greenbit.yaml",
    "content": "- access: open\n  analysis: Evaluated on common LLM benchmarks.\n  created_date: 2023-09-29\n  dependencies:\n  - LLaMA\n  description: GreenBit LLaMA is a series of fine-tuned LLaMA models.\n  feedback: https://huggingface.co/GreenBitAI/LLaMA-30B-2bit-groupsize8/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/GreenBitAI/LLaMA-30B-2bit-groupsize8\n  monitoring: unknown\n  name: GreenBit LLaMA\n  nationality: USA\n  organization: GreenBit AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 30B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/GreenBitAI/low_bit_llama\n"
  },
  {
    "path": "assets/h2oai.yaml",
    "content": "- access: open\n  analysis: Evaluated on EleutherAI evaluation harness.\n  created_date: 2023-06-16\n  dependencies:\n  - GPT-NeoX\n  - H2O AI OpenAssistant\n  - h2oGPT Repositories\n  description: Series of models fine-tuned on well-known LLMs using the h2oGPT repositories.\n  feedback: https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b\n  monitoring: ''\n  name: h2oGPT\n  nationality: USA\n  organization: H2O AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 20B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unspecified number of 48GB A100 NVIDIA GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2306.08161.pdf\n- access: open\n  analysis: Evaluated on common sense and world knowledge benchmarks.\n  created_date: 2024-01-30\n  dependencies: []\n  description: H2O Danube is a language model trained on 1T tokens following the core\n    principles of LLaMA 2 and Mistral.\n  feedback: https://huggingface.co/h2oai/h2o-danube-1.8b-base/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/h2oai/h2o-danube-1.8b-base\n  monitoring: unknown\n  name: H2O Danube\n  nationality: USA\n  organization: H2O AI\n  prohibited_uses: Users are encouraged to use the large language model responsibly\n    and ethically. By using this model, you agree not to use it for purposes that\n    promote hate speech, discrimination, harassment, or any form of illegal or harmful\n    activities.\n  quality_control: unknown\n  size: 1.8B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 8x H100 GPUs on a single node\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2401.16818.pdf\n"
  },
  {
    "path": "assets/hubspot.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-01-31\n  dependencies:\n  - ChatGPT API\n  description: Give your sales, marketing, and customer service teams one of the most\n    powerful AI tools available - ChatGPT priority access, no timeout limits, company\n    wide access managed through a single account, incorporate into your existing processes\n    without leaving HubSpot\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'We weren''t able to locate a license.\n\n      '\n    value: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: ChatGPT powered by OBO\n  nationality: USA\n  organization: HubSpot\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://ecosystem.hubspot.com/marketplace/apps/sales/sales-enablement/the-obo-group-chatgpt-1398072\n  user_distribution: ''\n"
  },
  {
    "path": "assets/huggingface.yaml",
    "content": "- access: open\n  analysis: none\n  created_date:\n    explanation: The date the model was announced\n    value: 2021-12-06\n  dependencies: []\n  description: CodeParrot is an autoregressive language model trained on code\n  feedback: none\n  intended_uses: none\n  license:\n    explanation: No license is explicitly provided for this model.\n    value: none\n  modality: text; code, text\n  model_card: none\n  monitoring: none\n  name: CodeParrot\n  nationality: USA\n  organization: HuggingFace\n  prohibited_uses: none\n  quality_control: none\n  size: 1B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 16 x A100 (40GB)\n  training_time: unknown\n  type: model\n  url: https://twitter.com/lvwerra/status/1467933794699259908\n- access: open\n  analysis: Evaluated on loss, rewards, logps, and logits rejected and chosen.\n  created_date: 2023-10-11\n  dependencies:\n  - Mistral\n  description: Zephyr is a series of language models that are trained to act as helpful\n    assistants.\n  feedback: https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha/discussions\n  intended_uses: Educational and research purposes\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha\n  monitoring: none\n  name: Zephyr\n  nationality: USA\n  organization: HuggingFace\n  prohibited_uses: none\n  quality_control: none\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha\n- access: open\n  analysis: Evaluated in comparison to Flamingo and OpenFlamingo on standard benchmarks.\n  created_date: 2023-08-22\n  dependencies:\n  - OBELICS\n  - Wikipedia\n  - LAION-5B\n  - PMD\n  description: IDEFICS is an open-access visual language model, based on Flamingo.\n  feedback: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct/discussions\n  intended_uses: Educational 
and research purposes\n  license:\n    explanation: Can be found at https://huggingface.co/HuggingFaceM4/idefics-80b-instruct#license\n    value: custom\n  modality: image, text; text\n  model_card: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct\n  monitoring: none\n  name: IDEFICS\n  nationality: USA\n  organization: HuggingFace\n  prohibited_uses: none\n  quality_control: none\n  size: 80B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/blog/idefics\n- access: open\n  analysis: Subset of training dataset evaluated for bias using Data Measurements\n    Tool.\n  created_date: 2023-08-22\n  datasheet: https://huggingface.co/datasets/HuggingFaceM4/OBELICS\n  dependencies: []\n  description: OBELICS is a dataset consisting of 141 million interleaved image-text\n    documents scraped from the web and contains 353 million images.\n  excluded: All images for which creators explicitly requested opt-out of AI training.\n  feedback: https://huggingface.co/datasets/HuggingFaceM4/OBELICS/discussions\n  included: ''\n  intended_uses: ''\n  license: CC-BY-4.0\n  modality: image, text\n  monitoring: ''\n  name: OBELICS\n  nationality: USA\n  organization: HuggingFace\n  prohibited_uses: ''\n  quality_control: Sexual and violent content still present in OBELICS even after\n    filtering.\n  sample: []\n  size: 115B tokens\n  type: dataset\n  url: https://huggingface.co/blog/idefics\n- access: open\n  analysis: Evaluated on in-house benchmark, FIN-bench, adapted from BIG-bench for\n    Finnish.\n  created_date: 2023-11-03\n  dependencies: []\n  description: FinGPT is a series of Finnish LLMs trained from scratch.\n  feedback: https://huggingface.co/TurkuNLP/gpt3-finnish-13B/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/TurkuNLP/gpt3-finnish-13B\n  monitoring: unknown\n  name: FinGPT\n  
nationality: unknown\n  organization: University of Turku, HuggingFace, National Library of Finland\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 192 nodes, each consisting of 4 AMD Instinct MI250X GPUs, a single\n    64-core AMD Trento CPU and 512GB of memory.\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2311.05640.pdf\n- access: open\n  analysis: Evaluated on in-house benchmark, FIN-bench, adapted from BIG-bench for\n    Finnish.\n  created_date: 2023-11-03\n  dependencies:\n  - BLOOM\n  description: BLUUMI is a multilingual fine-tuned version of BLOOM.\n  feedback: https://huggingface.co/TurkuNLP/bloom-finnish-176b/discussions\n  intended_uses: ''\n  license:\n    explanation: Model card indicates same as license for BLOOM.\n    value: BigScience RAIL v1.0\n  modality: text; text\n  model_card: https://huggingface.co/TurkuNLP/bloom-finnish-176b\n  monitoring: unknown\n  name: BLUUMI\n  nationality: unknown\n  organization: University of Turku, HuggingFace, National Library of Finland\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 176B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 192 nodes, each consisting of 4 AMD Instinct MI250X GPUs, a single\n    64-core AMD Trento CPU and 512GB of memory.\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2311.05640.pdf\n- access: open\n  analysis: Some seed samples were used in different prompt styles and audiences.\n    Less than 1% of files are duplicates after running MinHash deduplication. Contaminated\n    samples were removed from each dataset split.\n  created_date: 2024-02-22\n  datasheet: https://huggingface.co/datasets/HuggingFaceTB/cosmopedia\n  dependencies:\n  - Mixtral\n  description: Cosmopedia is a dataset of synthetic textbooks, blogposts, stories,\n    posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. 
The dataset\n    contains over 30 million files and 25 billion tokens, making it the largest open\n    synthetic dataset to date. It covers a variety of topics, mapping worldwide knowledge\n    from Web datasets like RefinedWeb and RedPajama, to generate synthetic content.\n  excluded: unknown\n  feedback: https://huggingface.co/datasets/HuggingFaceTB/cosmopedia/discussions\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: Cosmopedia v0.1\n  nationality: unknown\n  organization: Hugging Face\n  prohibited_uses: unknown\n  quality_control: Measures were taken to reduce redundancy and ensure diversity in\n    generated content. A decontamination pipeline was implemented to avoid benchmark\n    contamination.\n  sample: []\n  size: 25B tokens\n  type: dataset\n  url: none\n- access: open\n  analysis: The performance of Idefics2 has been evaluated on numerous benchmarks.\n    It is top of its class size and competes with much larger models such as LLava-Next-34B\n    and MM1-30B-chat.\n  created_date: 2024-04-15\n  dependencies:\n  - The Cauldron\n  description: Idefics2 is a general multimodal model that takes as input arbitrary\n    sequences of text and images, generating text responses. 
It has the capability\n    to describe visual content, answer questions about images, perform basic arithmetic\n    operations, create stories grounded in multiple images, and extract information\n    from documents.\n  feedback: https://huggingface.co/HuggingFaceM4/idefics2-8b/discussions\n  intended_uses: The model can be used for answering questions about images, describing\n    visual content, creating stories grounded in multiple images, extracting information\n    from documents, and performing basic arithmetic operations.\n  license: Apache 2.0\n  modality: image, text; text\n  model_card: https://huggingface.co/HuggingFaceM4/idefics2-8b\n  monitoring: unknown\n  name: Idefics2\n  nationality: unknown\n  organization: Hugging Face\n  prohibited_uses: unknown\n  quality_control: The quality of the model has been ensured by training it on a mixture\n    of openly available datasets and enhancing its OCR capabilities. Further improvements\n    include manipulating images in their native resolutions and aspect ratios, better\n    pre-trained backbones, and allowing for sub-image splitting.\n  size: 8B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/blog/idefics2\n- access: open\n  analysis: none\n  created_date: 2024-04-15\n  datasheet: https://huggingface.co/datasets/HuggingFaceM4/the_cauldron\n  dependencies:\n    explanation: These are the datasets with the most tokens included; the full list\n      of all 50 datasets can be found at https://huggingface.co/datasets/HuggingFaceM4/the_cauldron\n    value:\n    - LNarratives\n    - Rendered Text\n    - WebSight\n    - DaTikz\n  description: The Cauldron is an open compilation of 50 manually-curated datasets\n    formatted for multi-turn conversations.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/discussions\n  included: ''\n  intended_uses: ''\n  license: CC BY 4.0\n  
modality: image, text\n  monitoring: unknown\n  name: The Cauldron\n  nationality: unknown\n  organization: Hugging Face\n  prohibited_uses: ''\n  quality_control: unknown\n  sample: []\n  size: 50 vision-language datasets\n  type: dataset\n  url: https://huggingface.co/blog/idefics2\n"
  },
  {
    "path": "assets/ibm.yaml",
    "content": "- access: open\n  analysis: ''\n  created_date:\n    explanation: The date the model was announced in the [[Adept blog post]](https://www.adept.ai/blog/act-1).\n    value: 2023-08-03\n  dependencies:\n  - NASA HLS data\n  description: Prithvi is a first-of-its-kind temporal Vision transformer pre-trained\n    by the IBM and NASA team on contiguous US Harmonised Landsat Sentinel 2 (HLS)\n    data. The model adopts a self-supervised encoder developed with a ViT architecture\n    and Masked AutoEncoder (MAE) learning strategy, with an MSE loss function.\n  feedback: https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality:\n    explanation: video; text\n    value: text, video; text, video\n  model_card: https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M\n  monitoring: ''\n  name: Prithvi\n  nationality: USA\n  organization: IBM\n  prohibited_uses: ''\n  quality_control: ''\n  size: 100M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://github.com/NASA-IMPACT/hls-foundation-os\n- access: limited\n  adaptation: ''\n  created_date: 2023-09-07\n  dependencies:\n  - Granite\n  description: Watsonx.ai is part of the IBM watsonx platform that brings together\n    new generative AI capabilities, powered by foundation models and traditional machine\n    learning into a powerful studio spanning the AI lifecycle.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: License information can be found at https://www.ibm.com/docs/en/watsonxdata/1.0.x?topic=planning-licenses-entitlements\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Watsonx.ai\n  nationality: USA\n  organization: IBM\n  output_space: deployed AI models\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.ibm.com/docs/en/watsonx-as-a-service?topic=models-terms-use\n  type: 
application\n  url: https://www.ibm.com/products/watsonx-ai\n  user_distribution: ''\n- access: limited\n  analysis: unknown\n  created_date: 2023-09-28\n  dependencies: []\n  description: Granite is a set of multi-size foundation models that apply generative\n    AI to both language and code.\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  modality: text; code, text\n  model_card: none\n  monitoring: ''\n  name: Granite\n  nationality: USA\n  organization: IBM\n  prohibited_uses: ''\n  quality_control: Training data passed through IBM HAP detector, language model designed\n    to remove harmful content. Data also deduplicated and filtered for document quality.\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.ibm.com/blog/building-ai-for-business-ibms-granite-foundation-models/\n- access:\n    explanation: \"In keeping with IBM\\u2019s strong historical commitment to open\\\n      \\ source , all Granite models are released under the permissive Apache 2.0 license\"\n    value: open\n  analysis: Granite 3.0 8B Instruct matches leading similarly-sized open models on\n    academic benchmarks while outperforming those peers on benchmarks for enterprise\n    tasks and safety.\n  created_date: 2024-10-21\n  dependencies:\n  - \"Hugging Face\\u2019s OpenLLM Leaderboard v2\"\n  description: IBM Granite 3.0 models deliver state-of-the-art performance relative\n    to model size while maximizing safety, speed and cost-efficiency for enterprise\n    use cases.\n  feedback: unknown\n  intended_uses: classic natural language use cases including text generation, classification,\n    summarization, entity extraction and customer service chatbots, programming language\n    use cases such as code generation, code explanation and code editing, and for\n    agentic use cases requiring tool calling\n  license:\n    explanation: \"In keeping with IBM\\u2019s strong historical 
commitment to open\\\n      \\ source , all Granite models are released under the permissive Apache 2.0 license\"\n    value: Apache 2.0\n  modality:\n    explanation: IBM Granite 3.0 8B Instruct model for classic natural language use\n      cases including text generation, classification, summarization, entity extraction\n      and customer service chatbots\n    value: text; text\n  model_card: unknown\n  monitoring: ''\n  name: IBM Granite 3.0\n  nationality: USA\n  organization: IBM\n  prohibited_uses: unknown\n  quality_control: The entire Granite family of models are trained on carefully curated\n    enterprise datasets, filtered for objectionable content with critical concerns\n    like governance, risk, privacy and bias mitigation in mind\n  size:\n    explanation: 'Dense, general purpose LLMs: Granite-3.0-8B-Instruct'\n    value: 8B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.ibm.com/new/ibm-granite-3-0-open-state-of-the-art-enterprise-models\n"
  },
  {
    "path": "assets/idea.yaml",
    "content": "- access: open\n  analysis: Evaluated on human and machine benchmarks in comparison to established\n    image models as a baseline.\n  created_date: 2024-01-26\n  dependencies:\n  - CLIP\n  - LAION-400M\n  - Wukong\n  - Stable Diffusion XL\n  description: Taiyi Diffusion XL is a new Chinese and English bilingual text-to-image\n    model which is developed by extending the capabilities of CLIP and Stable-DiffusionXL.\n  feedback: https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-XL-3.5B/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; image\n  model_card: https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-XL-3.5B\n  monitoring: unknown\n  name: Taiyi Diffusion XL\n  nationality: unknown\n  organization: International Digital Economy Academy, South China University of Technology,\n    University of Science and Technology of China\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 3.5B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2401.14688.pdf\n"
  },
  {
    "path": "assets/ideogram.yaml",
    "content": "- access: limited\n  analysis: \"Compared to DALL\\xB7E 3 based on a qualitative user comparison.\"\n  created_date: 2024-02-28\n  dependencies: []\n  description: \"Ideogram 1.0 is Ideogram\\u2019s most advanced text-to-image model,\\\n    \\ as of release.\"\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; image\n  model_card: none\n  monitoring: ''\n  name: Ideogram 1.0\n  nationality: USA\n  organization: Ideogram AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://about.ideogram.ai/1.0\n"
  },
  {
    "path": "assets/inflection.yaml",
    "content": "- access: limited\n  analysis: Evaluated on wide range of language benchmarks like MMLU 5-shot, GSM-8K,\n    and HellaSwag 10-shot among others.\n  created_date: 2023-06-22\n  dependencies: []\n  description: Inflection AI's first version of its in-house LLM. via Inflection AI's\n    conversational API.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Inflection-1\n  nationality: USA\n  organization: Inflection AI\n  prohibited_uses: ''\n  quality_control: ''\n  size:\n    explanation: Designed to be roughly around Chat-GPT's size (175B parameters).\n    value: unknown\n  training_emissions: ''\n  training_hardware: unknown\n  training_time: ''\n  type: model\n  url: https://inflection.ai/inflection-1\n- access: limited\n  adaptation: ''\n  created_date: 2023-05-02\n  dependencies:\n  - Inflection-2.5\n  description: Personal AI chatbot designed to be conversational and specialized in\n    emotional intelligence.\n  failures: ''\n  feedback: ''\n  intended_uses: to be used as a personal assistant chatbot for everyday activities\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Pi\n  nationality: USA\n  organization: Inflection AI\n  output_space: natural language text responses\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://inflection.ai/press\n  user_distribution: ''\n- access: closed\n  analysis: Evaluated against state of the art models on benchmarks, and found to\n    be most performant model outside of GPT-4.\n  created_date: 2023-11-22\n  dependencies: []\n  description: Inflection-2 is the best model in the world for its compute class and\n    the second most capable LLM in the world, according to benchmark evaluation, as\n    of its release.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: 
Inflection-2\n  nationality: USA\n  organization: Inflection AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: 5000 NVIDIA H100 GPUs\n  training_time: unknown\n  type: model\n  url: https://inflection.ai/inflection-2\n- access: limited\n  analysis: Evaluated on standard LLM and technical benchmarks in comparison to Inflection-1\n    and GPT-4, along with advanced STEM examinations.\n  created_date: 2024-03-07\n  dependencies: []\n  description: Inflection-2.5 is an upgraded in-house model that is competitive with\n    all the world's leading LLMs, as of release, like GPT-4 and Gemini.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: Inflection-2.5\n  nationality: USA\n  organization: Inflection AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://inflection.ai/inflection-2-5\n- access:\n    explanation: \"Developers can now access Inflection AI\\u2019s Large Language Model\\\n      \\ through its new commercial API.\"\n    value: open\n  analysis: unknown\n  created_date: 2024-10-07\n  dependencies: []\n  description: Inflection for Enterprise, powered by our industry-first, enterprise-grade\n    AI system, Inflection 3.0.\n  feedback: So please drop us a line. 
We want to keep hearing from enterprises about\n    how we can help solve their challenges and make AI a reality for their business.\n  intended_uses: unknown\n  license: unknown\n  modality: unknown\n  model_card: unknown\n  monitoring: unknown\n  name: Inflection 3.0\n  nationality: USA\n  organization: Inflection AI\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://inflection.ai/blog/enterprise\n"
  },
  {
    "path": "assets/instacart.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-03-01\n  dependencies:\n  - ChatGPT API\n  description: \"Instacart is augmenting the Instacart app to enable customers to ask\\\n    \\ about food and get inspirational, shoppable answers. This uses ChatGPT alongside\\\n    \\ Instacart\\u2019s own AI and product data from their 75,000+ retail partner store\\\n    \\ locations to help customers discover ideas for open-ended shopping goals, such\\\n    \\ as \\u201CHow do I make great fish tacos?\\u201D or \\u201CWhat\\u2019s a healthy\\\n    \\ lunch for my kids?\\u201D Instacart plans to launch \\u201CAsk Instacart\\u201D\\\n    \\ later this year.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  monitoring: ''\n  monthly_active_users: ''\n  name: Ask Instacart\n  nationality: USA\n  organization: Instacart\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://openai.com/blog/introducing-chatgpt-and-whisper-apis\n  user_distribution: ''\n"
  },
  {
    "path": "assets/internlm.yaml",
    "content": "- access: open\n  analysis: Evaluated on the dimensions proposed by OpenCompass in comparison to other\n    LLMs.\n  created_date: 2023-09-20\n  dependencies: []\n  description: InternLM is an LLM pre-trained on over 2.3T Tokens containing high-quality\n    English, Chinese, and code data.\n  feedback: https://huggingface.co/internlm/internlm-20b/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/internlm/internlm-20b\n  monitoring: unknown\n  name: InternLM\n  nationality: China\n  organization: InternLM\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/InternLM/InternLM\n"
  },
  {
    "path": "assets/juni.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - Anthropic API\n  description: An online tutoring solution to help students achieve academic success.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Juni Tutor Bot\n  nationality: USA\n  organization: Juni Learning\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://junilearning.com/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/kaist.yaml",
    "content": "- access: open\n  analysis: Evaluated in comparison to the MusicCaps dataset and with respect to n-gram,\n    neural metrics.\n  created_date: 2023-07-31\n  datasheet: none\n  dependencies:\n  - MusicCaps\n  - Million Song Dataset\n  - Magnatagtune\n  description: LP-MusicCaps is an LLM-based pseudo music caption dataset.\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: CC-BY-NC-4.0\n  modality: {}\n  monitoring: ''\n  name: LP-MusicCaps\n  nationality: South Korea\n  organization: South Korea Graduate School of Culture Technology\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 2.2M captions paired with 0.5M audio clips\n  type: dataset\n  url: https://arxiv.org/pdf/2307.16372.pdf\n"
  },
  {
    "path": "assets/kakaobrain.yaml",
    "content": "- access:\n    explanation: You can download the dataset from [[COYO Huggingface page]](https://huggingface.co/datasets/kakaobrain/coyo-700m).\n      For more information on downloading the image dataset, please refer to [[download/README.md]](https://github.com/kakaobrain/coyo-dataset/blob/main/download/README.md)\n      on Github.\n    value: open\n  analysis:\n    explanation: See [[Experiments]](https://github.com/kakaobrain/coyo-dataset#experiments)\n    value: '\"We empirically validated the quality of COYO dataset by re-implementing\n      popular models such as ALIGN, unCLIP, and ViT. We trained these models on COYO-700M\n      or its subsets from scratch, achieving competitive performance to the reported\n      numbers or generated samples in the original papers.\"\n\n      '\n  created_date:\n    explanation: As mentioned in the [[Google Scholar page]](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=8WXfStMAAAAJ&citation_for_view=8WXfStMAAAAJ:-f6ydRqryjwC).\n    value: 2022-08-31\n  datasheet: none\n  dependencies:\n  - CommonCrawl\n  description: 'COYO-700M is a large-scale dataset that contains 747M image-text pairs\n    as well as many other meta-attributes to increase the usability to train various\n    models.\n\n    '\n  excluded:\n    explanation: See [[Data Filtering]](https://github.com/kakaobrain/coyo-dataset#data-filtering)\n    value: Removed images less than 5KB image size. Removed images with an aspect\n      ratio greater than 3.0. Removed images with min(width, height) < 200. Removed\n      images with a score of OpenNSFW2 or GantMan/NSFW higher than 0.5. Removed all\n      duplicate images based on the image pHash value from external public datasets.\n      Removed texts with a length of 5 or less. Removed texts that do not have a noun\n      form. Removed texts with less than 3 words or more than 256 words and texts\n      over 1000 in length. Removed texts appearing more than 10 times. 
Removed texts\n      containing NSFW words. Removed duplicated samples based on (image_phash, text).\n  feedback: Feedback can be given by emailing at coyo at kakaobrain.com\n  included:\n    explanation: See [[Data Filtering]](https://github.com/kakaobrain/coyo-dataset#data-filtering)\n    value: Included all image formats that Pillow library can decode. Collected only\n      English text using cld3.\n  intended_uses:\n    explanation: See [[Disclaimer & Content Warning]](https://github.com/kakaobrain/coyo-dataset#disclaimer--content-warning)\n    value: '\"It is strongly recommended that this dataset be used only for research,\n      keeping this in mind when using the dataset, and Kakao Brain does not recommend\n      using this dataset as it is without special processing to clear inappropriate\n      data to create commercial products.\"\n\n      '\n  license: CC-BY-4.0\n  modality: image, text\n  monitoring: none\n  name: COYO-700M\n  nationality: South Korea\n  organization: Kakao Brain\n  prohibited_uses:\n    explanation: See [[Disclaimer & Content Warning]](https://github.com/kakaobrain/coyo-dataset#disclaimer--content-warning)\n    value: 'Authors note the following limitations: \"Kakao Brain tried to construct\n      a \"Safe\" dataset when building the COYO dataset. However, despite these efforts,\n      this large-scale dataset was not hand-picked by humans to avoid the risk due\n      to its very large size (over 700M). Keep in mind that the unscreened nature\n      of the dataset means that the collected images can lead to strongly discomforting\n      and disturbing content for humans. The COYO dataset may contain some inappropriate\n      data, and any problems resulting from such data are the full responsibility\n      of the user who used it.\"\n\n      '\n  quality_control: ''\n  sample: []\n  size: 747M image-text pairs\n  type: dataset\n  url: https://github.com/kakaobrain/coyo-dataset\n"
  },
  {
    "path": "assets/khan.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-03-14\n  dependencies:\n  - GPT-4 API\n  description: An AI-powered assistant that functions as both a virtual tutor for\n    students and a classroom assistant for teachers.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Khanmigo\n  nationality: USA\n  organization: Khan Academy\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.khanacademy.org/khan-labs#khanmigo\n  user_distribution: ''\n"
  },
  {
    "path": "assets/konan.yaml",
    "content": "- access: limited\n  analysis: none\n  created_date: 2023-09-17\n  dependencies: []\n  description: Konan LLM is a Large Language Model developed in-house by Konan Technology.\n    Optimized for super-large AI training, it leverages high-quality, large-scale\n    data and over 20 years of expertise in natural language processing.\n  feedback: none\n  intended_uses: Document generation, document review, Q&A, customer response scenarios.\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: Konan LLM\n  nationality: South Korea\n  organization: Konan\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://en.konantech.com/en/llm/konanllm\n"
  },
  {
    "path": "assets/kotoba.yaml",
    "content": "- access: open\n  analysis: unknown\n  created_date: 2024-03-13\n  dependencies: []\n  description: Kotoba-Speech is a Transformer-based speech generative model that supports\n    fluent text-to-speech generation in Japanese and one-shot voice cloning through\n    speech prompt.\n  feedback: https://huggingface.co/kotoba-tech/kotoba-speech-v0.1/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; audio\n  model_card: https://huggingface.co/kotoba-tech/kotoba-speech-v0.1\n  monitoring: unknown\n  name: Kotoba Speech\n  nationality: Japan\n  organization: Kotoba Tech\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.2B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/kotoba-tech/kotoba-speech-v0.1\n"
  },
  {
    "path": "assets/ktai.yaml",
    "content": "- access: open\n  analysis: unknown\n  created_date: 2023-10-31\n  dependencies:\n  - AI-HUB dataset\n  - National Institute of Korean Language dataset\n  description: Midm is a pre-trained Korean-English language model developed by KT.\n    It takes text as input and creates text. The model is based on Transformer architecture\n    for an auto-regressive language model.\n  feedback: https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1/discussions\n  intended_uses: It is expected to be used for various research purposes.\n  license: CC-BY-NC 4.0\n  modality: text; text\n  model_card: https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1\n  monitoring: unknown\n  name: Midm\n  nationality: South Korea\n  organization: KT Corporation\n  prohibited_uses: It cannot be used for commercial purposes.\n  quality_control: KT tried to remove unethical expressions such as profanity, slang,\n    prejudice, and discrimination from training data.\n  size: 7B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1\n"
  },
  {
    "path": "assets/kunlun.yaml",
    "content": "- access: open\n  analysis: Evaluated on several popular benchmarks and performance in different fields.\n  created_date: 2023-10-30\n  dependencies:\n  - SkyPile\n  description: The Skywork series is a family of large language models (LLMs) trained\n    on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts.\n  feedback: https://huggingface.co/Skywork/Skywork-13B-base/discussions\n  intended_uses: ''\n  license:\n    explanation: can be found at https://github.com/SkyworkAI/Skywork/blob/main/LICENSE\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/Skywork/Skywork-13B-base\n  monitoring: none\n  name: Skywork\n  nationality: China\n  organization: Kunlun Inc.\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 512 A800-80GB GPUs\n  training_time: 39 days\n  type: model\n  url: https://arxiv.org/pdf/2310.19341.pdf\n"
  },
  {
    "path": "assets/laion.yaml",
    "content": "- access:\n    explanation: The dataset index is available from [[dataset blog post]](https://laion.ai/blog/laion-400-open-dataset/)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: The date the [[blog post]](https://laion.ai/blog/laion-400-open-dataset/)\n      was released.\n    value: 2021-08-20\n  datasheet: https://laion.ai/blog/laion-400-open-dataset/\n  dependencies:\n  - CLIP\n  - CommonCrawl\n  description: LAION-400M is a dataset with CLIP-filtered 400 million image-text pairs,\n    their CLIP embeddings and kNN indices that allow efficient similarity search.\n    This dataset is entirely openly, freely accessible.\n  excluded:\n    explanation: See [[Dataset and Methods]](https://arxiv.org/pdf/2111.02114.pdf#section.2)\n    value: 'Authors apply the following filtering conditions on the WAT files downloaded\n      from Common Crawl: \"All samples with less than 5 character alt-text length or\n      less than 5 KB image size are dropped. Duplicate removal is performed with bloom\n      filter based on URL and alt-text. We use CLIP to compute embeddings of the image\n      and alt-text. Then we compute the cosine similarity of both embeddings and drop\n      all samples with cosine similarity below 0.3. This threshold was selected based\n      on human inspections. We use the CLIP embeddings of images and texts to filter\n      out illegal contents.\"\n\n      '\n  feedback: ''\n  included: ''\n  intended_uses: The authors recommend using the dataset \"for research purposes\" and\n    warn that \"this large-scale dataset is non-curated. 
It was built for research\n    purposes to enable testing model training on larger scale for broad researcher\n    and other interested communities, and is not meant for any real-world production\n    or application.\"\n  license:\n    explanation: The license is listed on the [[dataset blog post]](https://laion.ai/blog/laion-400-open-dataset/)\n    value: CC BY 4.0\n  modality: image, text\n  monitoring: ''\n  name: LAION-400M\n  nationality: Germany\n  organization: LAION\n  prohibited_uses: No uses are explicitly prohibited by the license. Users are warned\n    from using LAION-400M for any real-world production or application.\n  quality_control:\n    explanation: See [[Dataset and Methods]](https://arxiv.org/pdf/2111.02114.pdf#section.2)\n    value: The authors use CLIP embeddings of images and texts to filter out illegal\n      contents. They also use CLIP to tag image-text pairs as NSFW. They note that\n      less than 1% of images were detected as NSFW, which can be filtered out by a\n      user with the NSFW tag.\n  sample: []\n  size: 400M image-text pairs\n  type: dataset\n  url: https://laion.ai/blog/laion-400-open-dataset/\n- access:\n    explanation: The dataset index is available from [[dataset blog post]](https://laion.ai/blog/laion-5b/)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: The date the [[blog post]](https://laion.ai/blog/laion-5b/) was released.\n      Note that the dataset was already compiled earlier, e.g. the Stable Diffusion\n      model released earlier was trained on a subset of LAION-5B\n    value: 2022-12-12\n  datasheet: https://laion.ai/blog/laion-5b/\n  dependencies:\n  - CLIP\n  - mCLIP\n  - CommonCrawl\n  description: LAION-5B is a dataset of 5 billion image-text pairs from the Internet\n  excluded:\n    explanation: See [[dataset blog post]](https://laion.ai/blog/laion-5b/)\n    value: All samples with less than 5 characters alt-text length or less than 5\n      KB image size are dropped. 
All images with the too big resolution, potentially\n      DOS bombs, were dropped before attempting to process them. Duplicate removal\n      is performed with a bloom filter based on URL. Future runs would include more\n      variate deduplication rules, such as URL + language for the multilanguage dataset.\n      We use CLIP respectively MCLIP to compute embeddings of the image and alt-text.\n      Then we compute the cosine similarity of both embeddings and drop all samples\n      with cosine similarity below 0.28 for the English language ( with CLIP B/32)\n      and 0.26 for the multilingual dataset (MCLIP). These thresholds were selected\n      based on human inspection of the test results. We use the CLIP embeddings of\n      images and texts to filter out to the possible extent the illegal content.\n  feedback: ''\n  included: ''\n  intended_uses: The authors recommend using the dataset \"for research purposes\" and\n    \"do not recommend using it for creating ready-to-go industrial products, as the\n    basic research about general properties and safety of such large-scale models,\n    which we would like to encourage with this release, is still in progress\"\n  license:\n    explanation: The license is listed on the [[dataset blog post]](https://laion.ai/blog/laion-5b/)\n    value: CC BY 4.0\n  modality: image, text\n  monitoring: ''\n  name: LAION-5B\n  nationality: Germany\n  organization: LAION\n  prohibited_uses: No uses are explicitly prohibited by the license. 
Users are warned\n    from using LAION-5B for non-research purposes.\n  quality_control: ''\n  sample: []\n  size: 5B image-text pairs\n  type: dataset\n  url: https://laion.ai/blog/laion-5b/\n- access:\n    explanation: The dataset index is available from [[dataset blog post]](https://laion.ai/blog/laion-5b/)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: The date the [[blog post]](https://laion.ai/blog/laion-5b/) was released.\n      Note that the dataset was already compiled earlier, e.g. the Stable Diffusion\n      model released earlier was trained on a subset of LAION-5B\n    value: 2022-12-12\n  datasheet: https://laion.ai/blog/laion-5b/\n  dependencies:\n  - CLIP\n  - LAION-5B\n  description: LAION-2B-en is a subset of the LAION-5B dataset and contains 2.3 billion\n    English image-text pairs.\n  excluded:\n    explanation: See [[dataset blog post]](https://laion.ai/blog/laion-5b/)\n    value: All samples with less than 5 characters alt-text length or less than 5\n      KB image size are dropped. All images with the too big resolution, potentially\n      DOS bombs, were dropped before attempting to process them. Duplicate removal\n      is performed with a bloom filter based on URL. Future runs would include more\n      variate deduplication rules, such as URL + language for the multilanguage dataset.\n      We use CLIP respectively MCLIP to compute embeddings of the image and alt-text.\n      Then we compute the cosine similarity of both embeddings and drop all samples\n      with cosine similarity below 0.28 for the English language ( with CLIP B/32)\n      and 0.26 for the multilingual dataset (MCLIP). These thresholds were selected\n      based on human inspection of the test results. 
We use the CLIP embeddings of\n      images and texts to filter out to the possible extent the illegal content.\n  feedback: ''\n  included: ''\n  intended_uses: The authors recommend using the dataset \"for research purposes\" and\n    \"do not recommend using it for creating ready-to-go industrial products, as the\n    basic research about general properties and safety of such large-scale models,\n    which we would like to encourage with this release, is still in progress\"\n  license:\n    explanation: The license is listed on the [[dataset blog post]](https://laion.ai/blog/laion-5b/)\n    value: CC BY 4.0\n  modality: image, text\n  monitoring: ''\n  name: LAION-2B-en\n  nationality: Germany\n  organization: LAION\n  prohibited_uses: No uses are explicitly prohibited by the license. Users are warned\n    from using LAION-2B-en for non-research purposes.\n  quality_control: ''\n  sample: []\n  size: 2.32B image-text pairs\n  type: dataset\n  url: https://arxiv.org/pdf/2210.08402.pdf\n- access: open\n  analysis: Evaluated on COCO captioning and VQAv2 vision-language tasks.\n  created_date: 2023-03-28\n  dependencies:\n  - LLaMA\n  - CLIP\n  description: An open-source reproduction of DeepMind's Flamingo model. At its core,\n    OpenFlamingo is a framework that enables training and evaluation of large multimodal\n    models (LMMs).\n  feedback: ''\n  intended_uses: academic research purposes\n  license: MIT\n  modality: image, text; text\n  model_card: https://github.com/mlfoundations/open_flamingo/blob/main/MODEL_CARD.md\n  monitoring: ''\n  name: OpenFlamingo\n  nationality: Germany\n  organization: LAION\n  prohibited_uses: commercial use\n  quality_control: ''\n  size: 9B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://laion.ai/blog/open-flamingo/\n"
  },
  {
    "path": "assets/laion_ev.yaml",
    "content": "- access: open\n  analysis: Re-LAION-5B aims to fix the issues as reported by Stanford Internet Observatory\n    for the original LAION-5B. It is available for download in two versions, research\n    and research-safe. In total, 2236 links that potentially led to inappropriate\n    content were removed.\n  created_date: 2024-08-30\n  dependencies:\n  - LAION-5B\n  description: Re-LAION-5B is an updated version of LAION-5B, the first web-scale,\n    text-link to images pair dataset to be thoroughly cleaned of known links to suspected\n    CSAM. It is an open dataset for fully reproducible research on language-vision\n    learning. This model was developed in response to issues identified by the Stanford\n    Internet Observatory in December 2023. The updates were made in collaboration\n    with multiple organizations like the Internet Watch Foundation (IWF), the Canadian\n    Center for Child Protection (C3P), and Stanford Internet Observatory.\n  feedback: Problems with the dataset should be reported to the LAION organization.\n    They have open lines for communication with their partners and the broader research\n    community.\n  intended_uses: Re-LAION-5B is designed for research on language-vision learning.\n    It can also be used by third parties to clean existing derivatives of LAION-5B\n    by generating diffs and removing all matched content from their versions.\n  license: Apache 2.0\n  modality: text; image\n  model_card: unknown\n  monitoring: unknown\n  name: Re-LAION-5B\n  nationality: Germany\n  organization: LAION e.V.\n  prohibited_uses: The dataset should not be utilized for purposes that breach legal\n    parameters or ethical standards, such as dealing with illegal content.\n  quality_control: The model utilized lists of link and image hashes provided by partner\n    organizations. 
These were used to remove inappropriate links from the original\n    LAION-5B dataset to create Re-LAION-5B.\n  size: 5.5B (text, image) pairs\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://laion"
  },
  {
    "path": "assets/latitude.yaml",
    "content": "- access:\n    explanation: 'The game is available to public upon sign up.\n\n      '\n    value: limited\n  adaptation: ''\n  created_date:\n    explanation: 'Release date of the game on iOS and Android platforms [[Wikipedia]](https://en.wikipedia.org/wiki/AI_Dungeon).\n\n      '\n    value: 2019-12-17\n  dependencies:\n  - OpenAI API\n  description: 'AI Dungeon is a single-player text adventure game that uses AI to\n    generate content.\n\n    '\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: '\"Subject to your compliance with these Terms, the Company hereby\n      grants to you, a personal, worldwide, royalty-free, non-assignable, non-sublicensable,\n      non-transferrable, and non-exclusive license to use the software provided to\n      you as part of the Services (and to download a single copy of the App onto the\n      equipment or device specified by us)\" - excerpt from the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: AI Dungeon\n  nationality: USA\n  organization: Latitude\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://play.aidungeon.io/main/termsOfService\n  type: application\n  url: https://play.aidungeon.io\n  user_distribution: ''\n"
  },
  {
    "path": "assets/lehigh.yaml",
    "content": "- access: open\n  analysis: outperforms majority of preceding state-of-the-art models over 15 unique\n    biomedical modalities.\n  created_date: 2023-05-26\n  dependencies:\n  - GPT-style autoregressive decoder\n  - BiomedGPT biomedical datasets\n  description: BiomedGPT leverages self-supervision on large and diverse datasets\n    to accept multi-modal inputs and perform a range of downstream tasks.\n  feedback: ''\n  intended_uses: furthering research in developing unified and generalist models for\n    biomedicine.\n  license: Apache 2.0\n  modality: image, text; text\n  model_card: ''\n  monitoring: ''\n  name: BiomedGPT\n  nationality: USA\n  organization: Lehigh University\n  prohibited_uses: ''\n  quality_control: No specific quality control is mentioned in model training, though\n    details on data processing and how the model was trained are provided in the paper.\n  size: 472M parameters (dense)\n  training_emissions: unknown\n  training_hardware: 10 NVIDIA A5000 GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2305.17100.pdf\n"
  },
  {
    "path": "assets/lg.yaml",
    "content": "- access: closed\n  analysis: none\n  created_date: 2023-07-19\n  dependencies: []\n  description: EXAONE 2.0 is a multimodal artificial intelligence that can be used\n    to help develop new materials and medicines.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: image, text; image, text\n  model_card: none\n  monitoring: ''\n  name: EXAONE 2.0\n  nationality: South Korea\n  organization: LG AI Research\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.lgresearch.ai/exaone\n"
  },
  {
    "path": "assets/lg_ai_research.yaml",
    "content": "- access: open\n  analysis: The model was evaluated extensively across a wide range of public and\n    in-house benchmarks. The comparative analysis showed that the performance of EXAONE\n    3.0 was competitive in English and excellent in Korean compared to other large\n    language models of a similar size.\n  created_date: 2024-09-08\n  dependencies:\n  - MeCab\n  description: EXAONE 3.0 is an instruction-tuned large language model developed by\n    LG AI Research. It demonstrates notably robust performance across a range of tasks\n    and benchmarks. It has been fine-tuned to be capable of complex reasoning and\n    has a particular proficiency in Korean. The released 7.8B parameter model is designed\n    to promote open research and innovation.\n  feedback: Unknown\n  intended_uses: The model was intended for non-commercial and research purposes.\n    The capabilities of the model allow for use cases that involve advanced AI and\n    language processing tasks, particularly in fields requiring proficiency in English\n    and Korean.\n  license: Unknown\n  modality: text; text\n  model_card: unknown\n  monitoring: Unknown\n  name: EXAONE 3.0 Instruction Tuned Language Model\n  nationality: South Korea\n  organization: LG AI Research\n  prohibited_uses: Commercial use is not intended for this model. Its intended use\n    is for non-commercial research and innovation.\n  quality_control: Extensive pre-training on a diverse dataset, and advanced post-training\n    techniques were employed to enhance instruction-following capabilities. The model\n    was also trained to fully comply with data handling standards.\n  size: 7.8B parameters (dense)\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://arxiv.org/pdf/2408.03541\n"
  },
  {
    "path": "assets/linkedin.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - Azure Cognitive Services for Vision\n  description: More than 40 percent of LinkedIn's feed posts include at least one\n    image. We want every member to have equal access to opportunity and are committed\n    to ensuring that we make images accessible to our members who are blind or who\n    have low vision so they can be a part of the online conversation. With Azure Cognitive\n    Service for Vision, we can provide auto-captioning to edit and support alt. text\n    descriptions.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: LinkedIn\n  nationality: USA\n  organization: LinkedIn\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.linkedin.com/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/llm360.yaml",
    "content": "- access: open\n  analysis: Evaluated on several benchmark LLM tasks\n  created_date: 2023-12-12\n  dependencies:\n  - Arxiv\n  - Books\n  - C4\n  - RefinedWeb\n  - StarCoder\n  - StackExchange\n  - Wikipedia\n  description: Amber is the first model in the LLM360 family, an initiative for comprehensive\n    and fully open-sourced LLMs, where all training details, model checkpoints, intermediate\n    results, and additional analyses are made available to the community.\n  feedback: https://huggingface.co/LLM360/Amber/discussions\n  intended_uses: to support open and collaborative AI research by making the full\n    LLM training process transparent.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/LLM360/Amber\n  monitoring: unknown\n  name: Amber\n  nationality: International\n  organization: LLM360\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 56 DGX A100 nodes, each equipped with 4 80GB A100 GPUs\n  training_time: unknown\n  type: model\n  url: https://www.llm360.ai/\n- access: open\n  analysis: Evaluated on English and coding tasks and benchmarks, and outperforms\n    LLaMA 2 in some.\n  created_date: 2023-12-12\n  dependencies:\n  - SlimPajama\n  - StarCoder\n  description: CrystalCoder is a language model with a balance of code and text data\n    that follows the initiative under LLM360 of its training process being fully transparent.\n  feedback: https://huggingface.co/LLM360/CrystalCoder/discussions\n  intended_uses: to support open and collaborative AI research by making the full\n    LLM training process transparent.\n  license: Apache 2.0\n  modality: text; code, text\n  model_card: https://huggingface.co/LLM360/CrystalCoder\n  monitoring: unknown\n  name: CrystalCoder\n  nationality: International\n  organization: LLM360\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: 
unknown\n  training_hardware: Trained on the Cerebras Condor Galaxy 1 (CG-1), a 4 exaFLOPS,\n    54 million core, 64-node cloud AI supercomputer.\n  training_time: unknown\n  type: model\n  url: https://www.llm360.ai/\n- access: open\n  analysis: Evaluated on the LLM360 Performance and Evaluation Collection that checks\n    standard best practice benchmarks, medical, math, and coding knowledge.\n  created_date: 2024-05-29\n  dependencies: []\n  description: K2 is a 65 billion parameter large language model inspired by the Llama\n    2 65B model. The model is also supported with a suite of research tools, tutorials\n    and step-by-step guides for learning pre-training and fine-tuning techniques.\n  feedback: https://huggingface.co/LLM360/K2/discussions\n  intended_uses: The model is intended for learning pre-training techniques or enhancing\n    research capabilities in large language models.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/LLM360/K2\n  monitoring: unknown\n  name: K2\n  nationality: International\n  organization: LLM360\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: 65B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.llm360.ai/paper2.pdf\n"
  },
  {
    "path": "assets/lmsys.yaml",
    "content": "- access: open\n  analysis: Evaluated against similar LLMs using GPT-4 as a judge.\n  created_date: 2023-03-30\n  dependencies:\n  - LLaMA\n  - ShareGPT conversations data\n  description: An open-source chatbot trained by fine-tuning LLaMA on user-shared\n    conversations collected from ShareGPT.\n  feedback: https://huggingface.co/datasets/bigcode/the-stack/discussions\n  intended_uses: research on LLMs and chatbots\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/lmsys/vicuna-13b-delta-v0\n  monitoring: ''\n  name: Vicuna\n  nationality: USA\n  organization: LMSYS\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: ''\n  training_hardware: 8 A100 GPUs\n  training_time: 1 day\n  type: model\n  url: https://lmsys.org/blog/2023-03-30-vicuna/\n"
  },
  {
    "path": "assets/mathai.yaml",
    "content": "- access: open\n  analysis: Mistral model fine-tuned on AutoMathText and evaluated on the MATH dataset.\n  created_date: 2024-02-12\n  datasheet: https://huggingface.co/datasets/math-ai/AutoMathText\n  dependencies:\n  - OpenWebMath\n  - RedPajama-Data\n  - Algebraic Stack\n  - Qwen\n  description: AutoMathText is an extensive and carefully curated dataset encompassing\n    around 200 GB of mathematical texts.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/math-ai/AutoMathText/discussions\n  included: ''\n  intended_uses: ''\n  license: CC BY-SA 4.0\n  modality: text\n  monitoring: unknown\n  name: AutoMathText\n  nationality: USA\n  organization: Math AI\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 200 GB\n  type: dataset\n  url: https://github.com/yifanzhang-pro/AutoMathText\n"
  },
  {
    "path": "assets/maya.yaml",
    "content": "- access: open\n  analysis: Evaluated on the OpenLLM leaderboard, releasing at rank number 4 on the\n    leaderboard.\n  created_date: 2023-08-11\n  dependencies:\n  - LLaMA 2\n  - Guanaco LLaMA dataset\n  description: GodziLLa 2 is an experimental combination of various proprietary LoRAs\n    from Maya Philippines and Guanaco LLaMA 2 1K dataset, with LLaMA 2.\n  feedback: none\n  intended_uses: ''\n  license: LLaMA 2\n  modality: text; text\n  model_card: https://huggingface.co/MayaPH/GodziLLa2-70B\n  monitoring: unknown\n  name: GodziLLa 2\n  nationality: Philippines\n  organization: Maya Philippines\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/MayaPH/GodziLLa2-70B\n- access:\n    explanation: The first 200 million tokens are free.\n    value: limited\n  analysis: We evaluated voyage-code-3 using an enhanced suite of evaluation datasets\n    designed to address the shortcomings of existing benchmarks and deliver practical,\n    robust results.\n  created_date: 2024-12-04\n  dependencies: []\n  description: Introducing voyage-code-3, our next-generation embedding model optimized\n    for code retrieval.\n  feedback: \"If you\\u2019re also interested in fine-tuned embedding models, we\\u2019\\\n    d love to hear from you\\u2014please email us at contact@voyageai.com.\"\n  intended_uses: optimized for code retrieval\n  license: unknown\n  modality: unknown\n  model_card: unknown\n  monitoring: unknown\n  name: voyage-code-3\n  nationality: USA\n  organization: Voyage AI\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://blog.voyageai.com/2024/12/04/voyage-code-3/\n"
  },
  {
    "path": "assets/meta.yaml",
    "content": "- access:\n    explanation: Models are available for download from [[GitHub repository]](https://github.com/facebookresearch/esm#available-models)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: The date the [[model paper]](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2.full.pdf+html)\n      was released\n    value: 2022-10-31\n  dependencies:\n  - UniRef50\n  - UniRef90\n  description: ESM-2 is a series of protein language models trained on protein sequences\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'The license is provided in the [[Github repository]](https://github.com/facebookresearch/esm#available-models)\n\n      '\n    value: MIT\n  modality: text; protein sequence\n  model_card: none\n  monitoring: ''\n  name: ESM-2\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: 15B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2.full.pdf+html\n- access: closed\n  analysis: none\n  created_date:\n    explanation: The date the model paper was released\n    value: 2021-12-08\n  datasheet: none\n  dependencies:\n  - COCO\n  - YFCC100M\n  - SBU Captions\n  - Localized Narratives\n  - Visual Genome\n  - Wikipedia\n  - Conceptual Captions\n  - Red Caps\n  description: PMD (Public Multimodal Datasets) is a collection of image-text datasets\n    introduced in the FLAVA work.\n  excluded: YFCC100M is filtered for non-English captions and very short (< 2 word)\n    captions.\n  feedback: none\n  included: none\n  intended_uses: unknown\n  license: unknown\n  modality: image, text\n  monitoring: none\n  name: PMD\n  nationality: USA\n  organization: Meta\n  prohibited_uses: unknown\n  quality_control: Beyond filtering mentioned in excluded, nothing further is done.\n  sample: []\n  size: 70M\n  type: dataset\n  url: 
https://arxiv.org/abs/2112.04482\n- access:\n    explanation: 'Model checkpoints are available for download from the [[HuggingFace\n      repository]](https://huggingface.co/facebook/flava-full)\n\n      '\n    value: open\n  analysis: FLAVA is benchmarked on a range of vision-only (e.g. CIFAR-10), language-only\n    (e.g. GLUE), and multimodal (e.g. Hateful Memes) standard evaluations.\n  created_date:\n    explanation: The date the model paper was released\n    value: 2021-12-08\n  dependencies:\n  - PMD\n  description: FLAVA is a multimodal model composed of an image encoder, text encoder,\n    and multimodal encoder.\n  feedback: https://huggingface.co/facebook/flava-full/discussions\n  intended_uses: 'Per the [[HuggingFace repository]](https://huggingface.co/facebook/flava-full),\n    \"The model is intended to serve as a reproducible research artifact for research\n    communities in the light of models whose exact reproduction details are never\n    released such as CLIP and SimVLM.\"\n\n    '\n  license:\n    explanation: 'The license is provided in the [[HuggingFace repository]](https://huggingface.co/facebook/flava-full)\n\n      '\n    value: BSD-3-Clause\n  modality: image, text\n  model_card: https://huggingface.co/facebook/flava-full\n  monitoring: none\n  name: FLAVA\n  nationality: USA\n  organization: Meta\n  prohibited_uses: 'Per the [[HuggingFace repository]](https://huggingface.co/facebook/flava-full),\n    \"Any deployed use case of the model - whether commercial or not\" - is currently\n    out of scope.\n\n    '\n  quality_control: FLAVA introduces a variety of new modeling techniques, specifically\n    with an interest in improved text-image alignment through contrastive objectives.\n  size:\n    explanation: '110M (Language encoder) + 86M (Vision encoder) + 110M (mul encoder)\n\n      '\n    value: 306M\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: 
https://arxiv.org/abs/2112.04482\n- access: closed\n  analysis: none\n  created_date:\n    explanation: 'The date the Galactica paper was released\n\n      '\n    value: 2022-11-15\n  datasheet: none\n  dependencies:\n  - CommonCrawl\n  - Wikipedia\n  - arXiv\n  description: The Galactica Corpus is a collection of scientific datasets introduced\n    in the Galactica work.\n  excluded: ''\n  feedback: none\n  included: Prompts and reasoning data is explicitly included to improve model capabilities\n    derived from this data.\n  intended_uses: unknown\n  license: unknown\n  modality: text\n  monitoring: none\n  name: The Galactica Corpus\n  nationality: USA\n  organization: Meta\n  prohibited_uses: unknown\n  quality_control: ''\n  sample: []\n  size: 106B tokens\n  type: dataset\n  url: https://galactica.org/static/paper.pdf\n- access:\n    explanation: Model checkpoints freely available at https://github.com/paperswithcode/galai\n    value: open\n  analysis: ''\n  created_date:\n    explanation: 'The date the Galactica paper was released\n\n      '\n    value: 2022-11-15\n  dependencies:\n  - The Galactica Corpus\n  description: Galactica is a family of autoregressive language models.\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: https://github.com/paperswithcode/galai/blob/main/LICENSE-MODEL.md\n    value: CC BY-NC 4.0\n  modality: code, text; code, text\n  model_card: https://huggingface.co/facebook/galactica-6.7b\n  monitoring: ''\n  name: Galactica\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: 120B parameters (dense)\n  training_emissions: unknown\n  training_hardware: Meta AI Cluster. 
Trained on 1024 80GB A100 GPUs (128 8xA100 80GB\n    nodes)\n  training_time: unknown\n  type: model\n  url: https://galactica.org/static/paper.pdf\n- access:\n    explanation: Model weights are available via the [[HuggingFace repository]](https://huggingface.co/facebook/incoder-6B)\n    value: open\n  analysis: none\n  created_date:\n    explanation: The date the model paper was released\n    value: 2022-04-12\n  dependencies: []\n  description: InCoder is a language model trained on code with a causal masking objective\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: The license is provided in the [[HuggingFace repository]](https://huggingface.co/facebook/incoder-6B?text=My+name+is+Lewis+and+I+like+to)\n    value: CC BY-NC 4.0\n  modality: text; code\n  model_card: none\n  monitoring: ''\n  name: InCoder\n  nationality: unknown\n  organization: Meta, CMU, TTI-Chicago, UC Berkeley, University of Washington\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 6B parameters (dense)\n  training_emissions: Unknown\n  training_hardware: 248 V100 GPUs, according to [[the paper]](https://arxiv.org/pdf/2204.05999.pdf)\n  training_time: 24 days, according to [[the paper]](https://arxiv.org/pdf/2204.05999.pdf)\n  type: model\n  url: https://arxiv.org/abs/2204.05999\n- access:\n    explanation: The 175B model requires manual approval from Meta to access. 
Other\n      models are available through HuggingFace.\n    value: limited\n  analysis: ''\n  created_date:\n    explanation: 'The date the OPT paper was submitted to arXiv\n\n      '\n    value: 2022-05-01\n  dependencies:\n  - RoBERTa dataset\n  - The Pile\n  - PushShift.io Reddit\n  description: OPT is a family of autoregressive language models.\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'All released with the [[OPT-175B License]](https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/MODEL_LICENSE.md),\n      except 66B (TBD) and 175B (requires manual approval)\n\n      '\n    value: OPT-175B License\n  modality: text; text\n  model_card: https://arxiv.org/pdf/2205.01068.pdf\n  monitoring: ''\n  name: OPT\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: 175B parameters (dense)\n  training_emissions:\n    explanation: 'Estimate by authors for the OPT-175B model only. Not including ablations\n      and baselines.\n\n      '\n    value: 75 tCO2e\n  training_hardware: Meta AI cluster. 
Trained on 992 80GB A100 GPUs\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2205.01068\n- access:\n    explanation: 'The datasets involved are public, but the full dataset is not directly\n      available, nor are filtering scripts.\n\n      '\n    value: limited\n  analysis: ''\n  created_date:\n    explanation: 'The date that Make-A-Video was posted to arXiv [[arXiv]] (https://arxiv.org/abs/2209.14792).\n\n      '\n    value: 2022-09-29\n  datasheet: none\n  dependencies:\n  - LAION-5B\n  - WebVid-10M\n  - HD-VILA-100M\n  description: 'The Make-A-Video dataset is the dataset used to train Make-A-Video,\n    which includes both image-text and video-only datasets with specific and significant\n    filtering.\n\n    '\n  excluded: 'The LAION-5B dataset is filtered to 2.3B by removing NSFW images using\n    [https://github.com/GantMan/nsfw](https://github.com/GantMan/nsfw), toxic words\n    in text, and images with watermark probability > 0.5. The HD-VILA-100M is randomly\n    subsampled to 10M video clips.\n\n    '\n  feedback:\n    explanation: 'No feedback mechanism is mentioned by the authors.\n\n      '\n    value: none\n  included:\n    explanation: 'Data from the three underlying datasets is filtered, but nothing\n      is included beyond this.\n\n      '\n    value: none\n  intended_uses: unknown\n  license:\n    explanation: 'No license was found, though the underlying datasets are public\n      and have licenses.\n\n      '\n    value: none\n  modality: image, text, video\n  monitoring:\n    explanation: 'There is no information on how Meta is internally monitoring the\n      use of the dataset.\n\n      '\n    value: unknown\n  name: Make-A-Video dataset\n  nationality: USA\n  organization: Meta\n  prohibited_uses: unknown\n  quality_control: 'The authors exclude NSFW, toxic, and likely watermarked data from\n    LAION-5B.\n\n    '\n  sample: []\n  size: 20M video clips, 2.3B image-text pairs\n  type: dataset\n  url: 
https://arxiv.org/pdf/2209.14792.pdf\n- access:\n    explanation: 'The model has not been released; a form existed to potentially acquire\n      access but is now closed as of 2022-12-07 [[Access Form]](https://docs.google.com/forms/u/0/d/e/1FAIpQLSfMjC57wcXWUDV0UbS2Tn6VhjLEiCXaHvWZuWgWRa-Zx8-Few/closedform).\n\n      '\n    value: closed\n  analysis: 'Model performance was evaluated using automated (Frechet Video Distance;\n    Frechet Inception Distance) and human evaluation on two datasets (UCF-101, MSR-VTT)\n    in the zero-shot setting.\n\n    '\n  created_date:\n    explanation: 'The date that Make-A-Video was posted to arXiv [[arXiv]](https://arxiv.org/abs/2209.14792).\n\n      '\n    value: 2022-09-29\n  dependencies:\n  - Make-A-Video dataset\n  description: 'Make-A-Video is a model for Text-to-Video Generation without Text-Video\n    Data.\n\n    '\n  feedback: none\n  intended_uses: unknown\n  license: none\n  modality: text; video\n  model_card: none\n  monitoring: unknown\n  name: Make-A-Video\n  nationality: USA\n  organization: Meta\n  prohibited_uses: unknown\n  quality_control:\n    explanation: 'Authors do not report specific quality control steps taken in modeling,\n      though filtering is done in producing the Make-A-Video dataset.\n\n      '\n    value: none\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2209.14792.pdf\n- access: open\n  analysis: ''\n  created_date: 2023-02-24\n  dependencies:\n  - CommonCrawl\n  - C4\n  - Github\n  - Wikipedia\n  - BooksCorpus\n  - arXiv\n  - StackExchange\n  description: LLaMA is a collection of foundation language models ranging from 7B\n    to 65B parameters trained on trillions of tokens. 
The LLaMA models show that\n    it is possible to train state-of-the-art models using publicly available datasets\n    exclusively, without resorting to proprietary and inaccessible datasets.\n  feedback: ''\n  intended_uses: ''\n  license: LLaMa License (model weights), GPLv3 (code)\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: LLaMA\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: 65B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2302.13971\n- access: open\n  analysis: Evaluated on standard academic benchmarks and internal Meta libraries.\n  created_date: 2023-07-18\n  dependencies: []\n  description: Llama 2 is an updated version of LLaMA trained on a new mix of publicly\n    available data.\n  feedback: ''\n  intended_uses: Llama 2 is intended for commercial and research use in English. Tuned\n    models are intended for assistant-like chat, whereas pretrained models can be\n    adapted for a variety of natural language generation tasks.\n  license:\n    explanation: The license can be found at https://ai.meta.com/resources/models-and-libraries/llama-downloads/\n    value: custom\n  modality: text; text\n  model_card: Can be found at appendix of paper at https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/\n  monitoring: ''\n  name: Llama 2\n  nationality: USA\n  organization: Meta\n  prohibited_uses: Use in any manner that violates applicable laws or regulations\n    (including trade compliance laws). Use in languages other than English. 
Use in\n    any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement\n    for Llama 2.\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: 539 tCO2eq\n  training_hardware: NVIDIA A100-80GB GPUs (TDP of 350-400W)\n  training_time: ''\n  type: model\n  url: https://ai.meta.com/resources/models-and-libraries/llama/\n- access: open\n  analysis: ''\n  created_date: 2022-12-22\n  dependencies:\n  - OPT\n  - OPT-IML Bench\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: OPT-IML 175B License\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: OPT-IML\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: 175B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2212.12017\n- access:\n    explanation: 'The full dataset can be downloaded at [[SA-1B Download]](https://ai.facebook.com/datasets/segment-anything-downloads/).\n      A 50k image preview of the full dataset is available [[here]](https://segment-anything.com/dataset/index.html).\n\n      '\n    value: open\n  analysis: ''\n  created_date:\n    explanation: The date the [[Meta blog post]](https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/)\n      was released.\n    value: 2023-04-05\n  datasheet:\n    explanation: Datasheet can be found in the Appendix section of the Segment Anything\n      paper.\n    value: https://arxiv.org/pdf/2304.02643.pdf#page=25\n  dependencies: []\n  description: 'SA-1B (Segment Anything 1 Billion) is a dataset designed for training\n    general-purpose object segmentation models from open world images. 
It consists\n    of 11M diverse, high-resolution, privacy protecting images and 1.1B high-quality\n    segmentation masks.\n\n    '\n  excluded:\n    explanation: See [[Datasheet]](https://arxiv.org/pdf/2304.02643.pdf#page=25)\n    value: '\"We withheld ~2k randomly selected images for testing purposes.\"  \"Each\n      image is accompanied by a short caption that describes the content and place\n      of the photo in a free form text. Per our agreement with the photo provider\n      we are not allowed to release these captions.\"\n\n      '\n  feedback: Feedback can be given via the feedback form on their website [segment-anything.com](https://segment-anything.com/)\n    or by emailing at segment-anything at meta.com.\n  included:\n    explanation: According to section [[Segment Anything Dataset]](https://arxiv.org/pdf/2304.02643.pdf#section.5)\n      of the paper and [[SA-1B website]](https://ai.facebook.com/datasets/segment-anything/).\n    value: \"SA-1B consists of 11M diverse, high-resolution (averaging 1500\\xD72250\\\n      \\ pixels), and privacy protecting images collected and licensed from a third\\\n      \\ party photo company. The images are photos taken from a camera, i.e. not artwork.\\\n      \\ The images vary in subject matter. Common themes of the images include: locations,\\\n      \\ objects, scenes. The dataset includes 1.1B high-quality segmentation masks\\\n      \\ collected with the Segment Anything Data Engine. SA-1B only includes automatically\\\n      \\ generated masks (99.1%), as the authors conclude after experiments that the\\\n      \\ automatic masks are high quality and effective for training models. The masks\\\n      \\ range from large scale objects such as buildings to fine grained details such\\\n      \\ as door handles. 
Masks are provided in the COCO run-length encoding (RLE)\\\n      \\ annotation format.\\n\"\n  intended_uses:\n    explanation: See [[SA-1B website]](https://ai.facebook.com/datasets/segment-anything/)\n    value: SA-1B is intended to be used for research purposes only. It allows access\n      to a privacy protecting and copyright friendly large-scale image dataset. Researchers\n      can use it to train and evaluate generic object segmentation models.\n  license:\n    explanation: SA-1B is released under a favorable license agreement for certain\n      research uses and with protections for researchers. See [[SA-1B Dataset Research\n      License]](https://ai.facebook.com/datasets/segment-anything-downloads/).\n    value: SA-1B Dataset Research License\n  modality: image\n  monitoring:\n    explanation: See [[Datasheet]](https://arxiv.org/pdf/2304.02643.pdf#page=25)\n    value: 'The dataset will be hosted at https://ai.facebook.com/datasets/segment-anything\n      and maintained by Meta AI. \"If a user observes objectionable image(s) in the\n      dataset, we invite them to report the image(s) at segment-anything at meta.com\n      for removal\" \"To aid reproducibility of research using SA-1B, the only updates\n      (to the dataset) will be to remove reported images.\" \"We encourage users to\n      gather further annotations for SA-1B. 
Any users who generate annotations will\n      be liable for hosting and distributing their annotations.\"\n\n      '\n  name: SA-1B\n  nationality: USA\n  organization: Meta\n  prohibited_uses:\n    explanation: See [[Datasheet]](https://arxiv.org/pdf/2304.02643.pdf#page=25)\n    value: \"Authors note the following limitations of the dataset:\\n  The masks are\\\n      \\ generated by a segmentation model, so there may be errors\\nor inconsistencies\\\n      \\ in the masks.\\n  While no two images are the same, there are instances of\\\n      \\ images of the same\\nsubject taken close together in time.\\n  The dataset contains\\\n      \\ scenes of protests, or other gatherings that may suggest\\nreligious beliefs,\\\n      \\ political opinions or union memberships that may be offensive.\\n\"\n  quality_control:\n    explanation: According to sections [[Segment Anything Dataset]](https://arxiv.org/pdf/2304.02643.pdf#section.5)\n      and [[Datasheet]](https://arxiv.org/pdf/2304.02643.pdf#page=25) of the paper.\n    value: \"- Dataset quality:\\n  Due to potential accessibility and storage challenges,\\\n      \\ the original high-resolution images (averaging 3300\\xD74950 pixels) were downsampled\\\n      \\ to an average resolution of 1500\\xD72250 pixels. 
Authors note that despite\\\n      \\ the downsampling, the images remain significantly higher in resolution than\\\n      \\ those in many existing vision datasets, such as COCO, where images are typically\\\n      \\ around 480\\xD7640 pixels.\\n  The images were processed to blur faces and license\\\n      \\ plates to protect the identities of those in the image.\\n  To estimate the\\\n      \\ quality of the masks in the images, a random sample of 500 images (\\u223C\\\n      50k masks) was taken and professional annotators were asked to improve the quality\\\n      \\ of all masks in those images.\\n- Safety measures:\\n  Authors implemented two\\\n      \\ safety measures to prevent objectionable content:\\n    (1) Photos are licensed\\\n      \\ from a photo provider and had to meet the terms of service of the photo provider.\\\n      \\ Authors requested that all objectionable content be filtered from the images\\\n      \\ they licensed.\\n    (2) Users who observe objectionable images in the dataset\\\n      \\ are invited to report them for removal at segment-anything@meta.com.\\n  Despite\\\n      \\ these measures, they observed that a small portion of images contain scenes\\\n      \\ of protests or other gatherings that focus on a diverse spectrum of religious\\\n      \\ beliefs or political opinions that may be considered offensive. 
The authors\\\n      \\ were unable to produce a filtering strategy that removes all such images and\\\n      \\ rely on user reports to mitigate this type of content.\\n\"\n  sample: []\n  size: 11M images, 1.1B mask annotations\n  type: dataset\n  url: https://ai.facebook.com/datasets/segment-anything/\n- access:\n    explanation: 'Inference code and model checkpoints are available on the model''s\n      [[GitHub repository]](https://github.com/facebookresearch/segment-anything).\n      Its training dataset SA-1B can be used for research purposes and is available\n      for download [here](https://ai.facebook.com/datasets/segment-anything-downloads/).\n\n      '\n    value: open\n  analysis:\n    explanation: See [[Zero-Shot Transfer Experiments]](https://arxiv.org/pdf/2304.02643.pdf#section.7)\n      for more details.\n    value: '\"We extensively evaluate SAM. First, using a diverse new suite of 23 segmentation\n      datasets, we find that SAM produces high-quality masks from a single foreground\n      point, often only slightly below that of the manually annotated ground truth.\n      Second, we find consistently strong quantitative and qualitative results on\n      a variety of downstream tasks under a zero-shot transfer protocol using prompt\n      engineering, including edge detection, object proposal generation, instance\n      segmentation, and a preliminary exploration of text-to-mask prediction.\"\n\n      '\n  created_date:\n    explanation: The date the [[Meta blog post]](https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/)\n      was released.\n    value: 2023-04-05\n  dependencies:\n  - SA-1B\n  description: SAM (Segment Anything Model) is a foundation model for image segmentation.\n    The model is designed and trained to be promptable, and supports flexible prompts\n    (point, box, mask and free-form text) to compute masks in real-time to allow interactive\n    use.\n  feedback: Feedback can be given via the 
feedback form on their website [segment-anything.com](https://segment-anything.com/)\n    or by emailing at segment-anything at meta.com.\n  intended_uses:\n    explanation: See [[Model card]](https://arxiv.org/pdf/2304.02643.pdf#page=28)\n    value: '\"SAM is intended to be used for any prompt-based segmentation task. We\n      explored its use in segmenting objects from a point, edge detection, segmenting\n      all objects, and segmenting detected objects. We explored how SAM can integrate\n      with other vision models to segment objects from text.\"\n\n      '\n  license:\n    explanation: See [[LICENSE]](https://github.com/facebookresearch/segment-anything/blob/main/LICENSE)\n    value: Apache 2.0\n  modality: image, text; image\n  model_card:\n    explanation: Model card can be found in the Appendix section of the paper.\n    value: https://arxiv.org/pdf/2304.02643.pdf#page=28\n  monitoring: ''\n  name: SAM\n  nationality: USA\n  organization: Meta\n  prohibited_uses:\n    explanation: See [[Discussion]](https://arxiv.org/pdf/2304.02643.pdf#section.8)\n    value: \"For out-of-scope use cases see terms of use in [[LICENSE]](https://github.com/facebookresearch/segment-anything/blob/main/LICENSE).\\\n      \\ Authors also discuss the following limitations of the model: \\\"While SAM performs\\\n      \\ well in general, it is not perfect. It can miss fine structures, hallucinates\\\n      \\ small disconnected components at times, and does not produce boundaries as\\\n      \\ crisply as more computationally intensive methods that \\u201Czoom-in\\u201D\\\n      , e.g. [18]. In general, we expect dedicated interactive segmentation methods\\\n      \\ to outperform SAM when many points are provided, e.g. [67]. Unlike these methods,\\\n      \\ SAM is designed for generality and breadth of use rather than high IoU interactive\\\n      \\ segmentation. 
Moreover, SAM can process prompts in real-time, but nevertheless\\\n      \\ SAM's overall performance is not real-time when using a heavy image encoder.\\\n      \\ Our foray into the text-to-mask task is exploratory and not entirely robust,\\\n      \\ although we believe it can be improved with more effort. While SAM can perform\\\n      \\ many tasks, it is unclear how to design simple prompts that implement semantic\\\n      \\ and panoptic segmentation. Finally, there are domain-specific tools, such\\\n      \\ as [7], that we expect to outperform SAM in their respective domains.\\\"\\n\"\n  quality_control:\n    explanation: See [[Segment Anything RAI Analysis]](https://arxiv.org/pdf/2304.02643.pdf#section.6)\n      for more details.\n    value: '\"We perform a Responsible AI (RAI) analysis of our work by investigating\n      potential fairness concerns and biases when using SA-1B and SAM. We focus on\n      the geographic and income distribution of SA-1B and fairness of SAM across protected\n      attributes of people.\"\n\n      '\n  size: unknown\n  training_emissions:\n    explanation: See [[Model card]](https://arxiv.org/pdf/2304.02643.pdf#page=28)\n    value: 2.8 metric tons of carbon dioxide\n  training_hardware:\n    explanation: See [[Model card]](https://arxiv.org/pdf/2304.02643.pdf#page=28)\n    value: 256 A100 GPUs\n  training_time:\n    explanation: See [[Model card]](https://arxiv.org/pdf/2304.02643.pdf#page=28)\n    value: 68 hours\n  type: model\n  url: https://arxiv.org/pdf/2304.02643.pdf\n- access: closed\n  analysis: Evaluated on zero-shot text-to-speech benchmarks, with Voicebox outperforming\n    the current state-of-the-art English model VALL-E.\n  created_date: 2023-06-16\n  dependencies: []\n  description: Voicebox is the first generative AI model for speech to generalize\n    across tasks with state-of-the-art performance.\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  modality: audio; text\n  model_card: ''\n  monitoring: 
''\n  name: Voicebox\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: 330M parameters (dense)\n  training_emissions: unknown\n  training_hardware: 32 GPUs of unspecified type\n  training_time: 750,000 iterations\n  type: model\n  url: https://research.facebook.com/publications/voicebox-text-guided-multilingual-universal-speech-generation-at-scale/\n- access: open\n  analysis: PEER is evaluated on core research questions intended to gauge language\n    understanding, proper use of citations, instruction following, and iterative use.\n  created_date: 2022-08-24\n  dependencies: []\n  description: PEER is a collaborative language model that is trained to imitate the\n    entire writing process itself. PEER can write drafts, add suggestions, propose\n    edits and provide explanations for its actions.\n  feedback: ''\n  intended_uses: adapting LLMs to work with collaborative writing and updating.\n  license: ''\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: PEER\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: Heuristics and edit filtering were used on the dataset, which consisted\n    mostly of Wikipedia pages.\n  size: 3B parameters (dense)\n  training_emissions: ''\n  training_hardware: 64 GPUs\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2208.11663.pdf\n- access: open\n  analysis: MusicGen was evaluated on standard music benchmarks of Frechet Audio Distance,\n    Kullback-Leibler Divergence, and its CLAP score.\n  created_date: 2023-08-02\n  dependencies:\n  - Meta Music Initative Sound Collection\n  - Shutterstock music collection\n  - Pond5 music collection\n  description: MusicGen is a simple and controllable model for music generation that\n    doesn't require self-supervised semantic representation\n  feedback: https://huggingface.co/spaces/facebook/MusicGen/discussions\n  intended_uses: The primary use of MusicGen is research on AI-based 
music generation\n  license: MIT\n  modality:\n    explanation: text; audio\n    value: audio, text; audio, text\n  model_card: https://github.com/facebookresearch/audiocraft/blob/main/model_cards/MUSICGEN_MODEL_CARD.md\n  monitoring: ''\n  name: MusicGen\n  nationality: USA\n  organization: Meta\n  prohibited_uses: The model should not be used on downstream applications without\n    further risk evaluation and mitigation. The model should not be used to intentionally\n    create or disseminate music pieces that create hostile or alienating environments\n    for people. This includes generating music that people would foreseeably find\n    disturbing, distressing, or offensive; or content that propagates historical or\n    current stereotypes.\n  quality_control: ''\n  size: 3.3B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://huggingface.co/spaces/facebook/MusicGen/tree/main\n- access: open\n  analysis: Evaluated on Frechet Audio Distance and Kullback-Leibler Divergence as\n    well as qualitative studies with human participants.\n  created_date: 2023-08-02\n  dependencies:\n  - AudioSet\n  - BBC sound effects\n  - AudioCaps\n  - Clotho v2\n  - VGG-Sound\n  - FSD50K\n  - Free To Use Sounds\n  - Sonniss Game Effects\n  - WeSoundEffects\n  - Paramount Motion - Odeon Cinematic Sound Effects\n  description: AudioGen is an auto-regressive generative model that generates audio\n    samples conditioned on text inputs\n  feedback: https://huggingface.co/facebook/audiogen-medium/discussions\n  intended_uses: The primary use of AudioGen is research on AI-based audio generation.\n  license: MIT\n  modality:\n    explanation: text; audio\n    value: audio, text; audio, text\n  model_card: https://github.com/facebookresearch/audiocraft/blob/main/model_cards/AUDIOGEN_MODEL_CARD.md\n  monitoring: ''\n  name: AudioGen\n  nationality: USA\n  organization: Meta\n  prohibited_uses: The model should not be used 
on downstream applications without\n    further risk evaluation and mitigation. The model should not be used to intentionally\n    create or disseminate audio pieces that create hostile or alienating environments\n    for people. This includes generating audio that people would foreseeably find\n    disturbing, distressing, or offensive; or content that propagates historical or\n    current stereotypes.\n  quality_control: ''\n  size: 1.5B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://felixkreuk.github.io/audiogen/paper.pdf\n- access: closed\n  analysis: Emu significantly outperforms a publicly available state-of-the-art model\n    SDXLv1.0 on visual appeal when compared on standard benchmarks.\n  created_date: 2023-09-27\n  dependencies:\n  - CLIP\n  - T5\n  description: Emu is a latent diffusion model pre-trained on 1.1 billion image-text\n    pairs and fine-tuned with only a few thousand carefully selected high-quality\n    images.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; image\n  model_card: none\n  monitoring: ''\n  name: Emu\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.5B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://ai.meta.com/research/publications/emu-enhancing-image-generation-models-using-photogenic-needles-in-a-haystack/\n- access: open\n  analysis: Evaluated on several code benchmarks like HumanEval and MBPP.\n  created_date: 2023-08-24\n  dependencies:\n  - Llama 2\n  description: Code Llama is a collection of pretrained and fine-tuned generative\n    text models ranging in scale from 7 billion to 34 billion parameters.\n  feedback: https://huggingface.co/allenai/codetulu-2-13b/discussions\n  intended_uses: Code Llama and its variants are intended for commercial and research\n    use in English and relevant 
programming languages.\n  license: Llama 2\n  modality: text; code, text\n  model_card: https://huggingface.co/codellama/CodeLlama-34b-hf\n  monitoring: ''\n  name: Code LLaMA\n  nationality: USA\n  organization: Meta\n  prohibited_uses: Use in any manner that violates applicable laws or regulations\n    (including trade compliance laws). Use in languages other than English. Use in\n    any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement\n    for Code Llama and its variants.\n  quality_control: ''\n  size: 34B parameters (dense)\n  training_emissions: 65.3 tCO2eq\n  training_hardware: A100-80GB GPUs\n  training_time: 400K GPU hours\n  type: model\n  url: https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/\n- access: closed\n  analysis: Analyzed against nearest neighbor model baseline and by extending the\n    video length.\n  created_date: 2023-11-16\n  dependencies:\n  - Emu\n  - CLIP\n  - T5\n  description: Emu Video is a text-to-video generation model that factorizes the generation\n    into two steps, first generating an image conditioned on the text, and then generating\n    a video conditioned on the text and the generated image.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; video\n  model_card: none\n  monitoring: ''\n  name: Emu Video\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: 6B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://emu-video.metademolab.com/\n- access: closed\n  analysis: Evaluated on test set of actions in comparison to SoTA image editing models.\n  created_date: 2023-11-16\n  dependencies:\n  - Emu\n  - CLIP\n  - T5\n  description: Emu Edit is a multi-task image editing model which sets state-of-the-art\n    results in instruction-based image editing.\n  feedback: none\n  intended_uses: ''\n  license: 
unknown\n  modality: text; image\n  model_card: none\n  monitoring: ''\n  name: Emu Edit\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://emu-edit.metademolab.com/\n- access: open\n  analysis: Evaluated in comparison to CLIP.\n  created_date: 2023-10-02\n  dependencies:\n  - Common Crawl\n  description: MetaCLIP is a more transparent rendition of CLIP that aims to reveal\n    CLIP's training data curation methods.\n  feedback: none\n  intended_uses: ''\n  license: CC-BY-NC-4.0\n  modality: text; text\n  model_card: https://huggingface.co/facebook/metaclip-b32-400m\n  monitoring: none\n  name: MetaCLIP\n  nationality: USA\n  organization: Meta\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2103.00020.pdf\n- access: open\n  analysis: The models were evaluated based on their performance on standard benchmarks\n    and real-world scenarios. These evaluations were performed using a high-quality\n    human evaluation set containing 1,800 prompts covering multiple use cases. The\n    models also went through red-teaming for safety, where human experts and automated\n    methods were used to generate adversarial prompts to test for problematic responses.\n  created_date: 2024-04-18\n  dependencies: []\n  description: Llama 3 is the third generation of Meta AI's open-source large language\n    model. 
It comes with pretrained and instruction-fine-tuned language models with\n    8B and 70B parameters that can support a broad range of use cases.\n  feedback: Feedback is encouraged from users to improve the model, but the feedback\n    mechanism is not explicitly described.\n  intended_uses: Llama 3 is intended for a broad range of use cases, including AI\n    assistance, content creation, learning, and analysis.\n  license:\n    explanation: Can be found at https://github.com/meta-llama/llama3/blob/main/LICENSE\n    value: Llama 3\n  modality: text; text\n  model_card: https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md\n  monitoring: Extensive internal and external performance evaluation and red-teaming\n    approach for safety testing.\n  name: Llama 3\n  nationality: USA\n  organization: Meta\n  prohibited_uses: unknown\n  quality_control: Extensive internal and external testing for safety, and design\n    of new trust and safety tools.\n  size: 70B parameters\n  training_emissions: unknown\n  training_hardware: 2 custom-built Meta 24K GPU clusters\n  training_time: unknown\n  type: model\n  url: https://llama.meta.com/llama3/\n- access: open\n  analysis: Evaluated on a comprehensive range of tasks, including visual question\n    answering, image captioning, text generation, image generation, and long-form\n    mixed modal generation. 
Chameleon demonstrates broad and general capabilities,\n    including state-of-the-art performance in image captioning tasks, outperforms\n    Llama-2 in text-only tasks while being competitive with models such as Mixtral\n    8x7B and Gemini-Pro.\n  created_date: 2024-05-17\n  dependencies: []\n  description: Chameleon is a family of early-fusion token-based mixed-modal models\n    capable of understanding and generating images and text in any arbitrary sequence.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: image, text; image, text\n  model_card: none\n  monitoring: ''\n  name: Chameleon\n  nationality: USA\n  organization: Meta FAIR\n  prohibited_uses: ''\n  quality_control: ''\n  size: 34B parameters\n  training_emissions: unknown\n  training_hardware: Meta's Research Super Cluster (powered by NVIDIA A100 80GB GPUs)\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2405.09818\n- access: open\n  analysis: The model was evaluated on over 150 benchmark datasets that span a wide\n    range of languages. An experimental evaluation suggests that the model is competitive\n    with leading foundation models across a range of tasks. Also, smaller models of\n    Llama 3.1 405B are competitive with closed and open models that have a similar\n    number of parameters.\n  created_date: 2024-07-23\n  dependencies:\n  - Unknown\n  description: Llama 3.1 405B is the first openly available model that rivals the\n    top AI models when it comes to state-of-the-art capabilities in general knowledge,\n    steerability, math, tool use, and multilingual translation. 
With the release of\n    the 405B model, the Llama versions support advanced use cases, such as long-form\n    text summarization, multilingual conversational agents, and coding assistants.\n    It is the largest and most capable openly available foundation model.\n  feedback: Unknown\n  intended_uses: For advanced use cases, such as long-form text summarization, multilingual\n    conversational agents, and coding assistants. May also be useful in the development\n    of custom offerings and systems by developers.\n  license: Unknown\n  modality: text; text\n  model_card: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md\n  monitoring: Unknown\n  name: Llama 3.1 405B\n  nationality: USA\n  organization: Meta AI\n  prohibited_uses: Unknown\n  quality_control: The development process was focused on keeping the model scalable\n    and straightforward. It adopted an iterative post-training procedure, where each\n    round uses supervised fine-tuning and direct preference optimization. The model\n    also underwent quality assurance and filtering for pre- and post-training data.\n  size: 405B parameters (dense)\n  training_emissions: Unknown\n  training_hardware: Over 16 thousand H100 GPUs\n  training_time: Unknown\n  type: model\n  url: https://ai.meta.com/blog/meta-llama-3-1/\n- access:\n    explanation: Future versions of the tuned models will be released as we improve\n      model safety with community feedback.\n    value: open\n  analysis: Unknown\n  created_date: 2024-12-06\n  dependencies: []\n  description: The Meta Llama 3.3 multilingual large language model (LLM) is an instruction\n    tuned generative model in 70B (text in/text out).\n  feedback: Instructions on how to provide feedback or comments on the model can be\n    found in the model README.\n  intended_uses: Intended for commercial and research use in multiple languages. 
Instruction\n    tuned text only models are intended for assistant-like chat.\n  license:\n    explanation: A custom commercial license, the Llama 3.3 Community License Agreement\n    value: Llama 3.3 Community License Agreement\n  modality:\n    explanation: The Llama 3.3 instruction tuned text only model is optimized for\n      multilingual dialogue use cases.\n    value: text; text\n  model_card: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct\n  monitoring: Unknown\n  name: Llama 3.3\n  nationality: USA\n  organization: Meta\n  prohibited_uses: Use in any manner that violates applicable laws or regulations\n    (including trade compliance laws). Use in any other way that is prohibited by\n    the Acceptable Use Policy and Llama 3.3 Community License.\n  quality_control: Used \"supervised fine-tuning (SFT) and reinforcement learning with\n    human feedback (RLHF) to align with human preferences for helpfulness and safety.\"\n  size:\n    explanation: The Meta Llama 3.3 multilingual large language model (LLM) is an\n      instruction tuned generative model in 70B (text in/text out).\n    value: 70B parameters\n  training_emissions:\n    explanation: Training Greenhouse Gas Emissions Estimated total location-based\n      greenhouse gas emissions were 11,390 tons CO2eq for training.\n    value: 11,390 tons CO2eq\n  training_hardware:\n    explanation: Training utilized a cumulative of 39.3M GPU hours of computation\n      on H100-80GB (TDP of 700W) type hardware.\n    value: H100-80GB (TDP of 700W) type hardware\n  training_time:\n    explanation: Training utilized a cumulative of 39.3M GPU hours of computation\n      on H100-80GB (TDP of 700W) type hardware.\n    value: 39.3M GPU hours\n  type: model\n  url: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct\n"
  },
  {
    "path": "assets/microsoft.yaml",
    "content": "- access:\n    explanation: Microsoft does not provide VLMo to external researchers. One author\n      commented that code would be pushed \"soon\" in [[November of 2021]](https://github.com/microsoft/unilm/issues/532),\n      but the repository does not contain relevant changes.\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: The date the model paper was released\n    value: 2021-11-03\n  dependencies:\n  - Conceptual Captions\n  - SBU Captions\n  - COCO\n  - Visual Genome\n  - Wikipedia\n  - BooksCorpus\n  description: VLMo is a model for text-to-image generation\n  feedback: ''\n  intended_uses: ''\n  license: none\n  modality: text; image\n  model_card: none\n  monitoring: ''\n  name: VLMo\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 562M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2111.02358\n- access:\n    explanation: Manual approval through early access request form required.\n    value: limited\n  analysis: ''\n  created_date:\n    explanation: Date model blog post was released\n    value: 2022-09-28\n  dependencies: []\n  description: T-ULRv5 is a language model trained with two unique training objectives\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: T-ULRv5\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 2.2B parameters (dense)\n  training_emissions: ''\n  training_hardware: 256 A100\n  training_time: Less than two weeks\n  type: model\n  url: https://www.microsoft.com/en-us/research/blog/microsoft-turing-universal-language-representation-model-t-ulrv5-tops-xtreme-leaderboard-and-trains-100x-faster/\n- access:\n    explanation: Manual approval through early access request form required.\n    value: limited\n  analysis: ''\n  
created_date:\n    explanation: Date model blog post was released\n    value: 2021-12-02\n  dependencies: []\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Turing NLR-v5\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 5B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.microsoft.com/en-us/research/blog/efficiently-and-effectively-scaling-up-language-model-pretraining-for-best-language-representation-model-on-glue-and-superglue/?OCID=msr_blog_TNLRV5_tw\n- access:\n    explanation: Megatron-Turing NLG can be accessed through the [[Turing Academic\n      Program]](https://www.microsoft.com/en-us/research/collaboration/microsoft-turing-academic-program/)\n    value: limited\n  analysis: ''\n  created_date:\n    explanation: 'The date of the Microsoft Research blog announcing MT-NLG [[Microsoft\n      Research Blog]](https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/).\n\n      '\n    value: 2022-01-28\n  dependencies:\n  - The Pile\n  description: 'Megatron-Turing NLG is a 530B parameter autoregressive language model.\n\n    '\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Megatron-Turing NLG\n  nationality: USA\n  organization: Microsoft, NVIDIA\n  prohibited_uses: ''\n  quality_control: ''\n  size: 530B parameters (dense)\n  training_emissions: ''\n  training_hardware: 4480 A100s (560 x 8)\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2201.11990\n- access:\n    explanation: Microsoft does not provide public access to VALL-E\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: The date the [[model 
paper]](https://arxiv.org/abs/2301.02111) was\n      released\n    value: 2023-01-05\n  dependencies: []\n  description: Vall-E is a neural code model for text-to-speech synthesis\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; audio\n  model_card: none\n  monitoring: ''\n  name: VALL-E\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: 16 V100 32GB GPUs\n  training_time: ''\n  type: model\n  url: https://valle-demo.github.io/\n- access:\n    explanation: 'The feature is available to developers in a restricted technical\n      preview [[GitHub CoPilot]](https://copilot.github.com/).\n\n      '\n    value: limited\n  adaptation: unknown\n  created_date:\n    explanation: 'Date of the blog post introducing CoPilot [[GitHub Blog Post]] (https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/).\n\n      '\n    value: 2021-06-29\n  dependencies:\n  - Codex\n  description: 'GitHub CoPilot is a coding pair programmer assisting programmers as\n    they write code.\n\n    '\n  failures: unknown\n  feedback: 'Feedback can be provided in the CoPilot feedback project [[CoPilot feedback]]\n    (https://github.com/github/feedback/discussions/categories/copilot-feedback).\n\n    '\n  intended_uses: 'GitHub CoPilot is intended to be used as a coding assistant.\n\n    '\n  license: unknown\n  monitoring: \"value: unknown explanation: >\\n  There may be internal monitoring mechanisms\\\n    \\ unknown to the public.\\n\"\n  monthly_active_users: 'GitHub Copilot reportedly has over 1 million sign-ups [[Tweet\n    Source]](https://twitter.com/sama/status/1539737789310259200?s=21&t=YPaYd0ZueJzrR6rLslUqzg).\n\n    '\n  name: GitHub CoPilot\n  nationality: USA\n  organization: Microsoft\n  output_space: Code completions\n  prohibited_uses: 'Access to GPT-3 is governed by GitHub Acceptable Use Policies\n    and Terms of Service, both 
of which list a set of prohibited uses [[Use Policies]]\n    (https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies)\n    [[Terms of Service]] (https://docs.github.com/en/site-policy/github-terms/github-terms-of-service).\n\n    '\n  quality_control: 'GitHub is working on a filter to detect and suppress code generations\n    that are verbatim from the training set [[GitHub Research Recitation]] (https://docs.github.com/en/github/copilot/research-recitation).\n    According to the FAQ, GitHub implemented a simple filter that blocks emails in\n    standard formats to protect personally identifiable data that may be present in\n    the training data [[GitHub CoPilot]](https://copilot.github.com/).\n\n    '\n  terms_of_service: 'https://docs.github.com/en/site-policy/github-terms/github-terms-of-service\n\n    '\n  type: application\n  url: https://copilot.github.com/\n  user_distribution: unknown\n- access: open\n  analysis: ''\n  created_date: 2022-09-24\n  dependencies:\n  - PubMed\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: BioGPT\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.5B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://academic.oup.com/bib/article/23/6/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9&login=true\n- access: limited\n  adaptation: unknown\n  created_date: 2023-02-07\n  dependencies:\n  - ChatGPT API\n  description: AI-powered Bing search engine and Edge browser, available in preview\n    now at Bing.com, to deliver better search, more complete answers, a new chat experience\n    and the ability to generate content. 
We think of these tools as an AI copilot\n    for the web.\n  failures: ''\n  feedback: 'Feedback can be submitted at [bing.com](bing.com).\n\n    '\n  intended_uses: Search engine\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Bing Search\n  nationality: USA\n  organization: Microsoft\n  output_space: Search results\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/\n  user_distribution: ''\n- access: closed\n  analysis: ''\n  created_date: 2023-03-01\n  dependencies:\n  - The Pile\n  - CommonCrawl\n  - LAION-2B-en\n  - LAION-400M\n  - COYO-700M\n  - Conceptual Captions\n  description: KOSMOS-1 is a multimodal language model that is capable of perceiving\n    multimodal input, following instructions, and performing in-context learning for\n    not only language tasks but also multimodal tasks.\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: image, text; image, text\n  model_card: ''\n  monitoring: ''\n  name: KOSMOS-1\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.6B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2302.14045.pdf\n- access: closed\n  analysis: ''\n  created_date: 2023-02-07\n  dependencies: []\n  description: In the context of Bing, we have developed a proprietary way of working\n    with the OpenAI model that allows us to best leverage its power. We call this\n    collection of capabilities and techniques the Prometheus model. 
This combination\n    gives you more relevant, timely and targeted results, with improved safety.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: unknown\n  model_card: ''\n  monitoring: ''\n  name: Prometheus\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/\n- access: closed\n  analysis: ''\n  created_date: 2022-11-23\n  dependencies:\n  - FLD-900M\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; image\n  model_card: ''\n  monitoring: ''\n  name: Florence\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 900M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2111.11432\n- access: closed\n  analysis: ''\n  created_date: 2022-11-23\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text\n  monitoring: ''\n  name: FLD-900M\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 900M image-text pairs\n  type: dataset\n  url: https://arxiv.org/abs/2111.11432\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-07\n  dependencies:\n  - Florence\n  description: Cost-effective, production-ready computer vision services in Azure\n    Cognitive Service for Vision. 
The improved Vision Services enables developers\n    to create cutting-edge, market-ready, responsible computer vision applications\n    across various industries.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Software license in the Microsoft Terms of Use govern the license\n      rules for Azure services as outlined in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Azure Cognitive Services for Vision\n  nationality: USA\n  organization: Microsoft\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://azure.microsoft.com/en-us/support/legal/\n  type: application\n  url: https://azure.microsoft.com/en-us/blog/announcing-a-renaissance-in-computer-vision-ai-with-microsofts-florence-foundation-model/?utm_content=buffer16fa0&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer\n  user_distribution: ''\n- access: closed\n  analysis: ''\n  created_date: 2023-03-08\n  dependencies:\n  - OpenAI API\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: none\n  modality: text; image, text\n  model_card: ''\n  monitoring: ''\n  name: VisualChatGPT\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2303.04671.pdf\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-16\n  dependencies:\n  - GPT-4 API\n  description: It combines the power of language models with your data in the Microsoft\n    Graph and the Microsoft 365 apps to turn your words into the most powerful productivity\n    tool on the planet.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  
name: Microsoft 365 Copilot\n  nationality: USA\n  organization: Microsoft\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/\n  user_distribution: ''\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-16\n  dependencies:\n  - Microsoft 365 Copilot\n  description: \"Business Chat works across the language model, the Microsoft 365 apps,\\\n    \\ and your data \\u2014 your calendar, emails, chats, documents, meetings and contacts\\\n    \\ \\u2014 to do things you\\u2019ve never been able to do before. You can give it\\\n    \\ natural language prompts like \\u201CTell my team how we updated the product\\\n    \\ strategy,\\u201D and it will generate a status update based on the morning\\u2019\\\n    s meetings, emails and chat threads.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Microsoft Business Chat\n  nationality: USA\n  organization: Microsoft\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/\n  user_distribution: ''\n- access: open\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - Microsoft 365 Copilot\n  description: Microsoft Excel is the industry leading spreadsheet software program,\n    a powerful data visualization and analysis tool.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: 
''\n  monthly_active_users: ''\n  name: Microsoft Excel\n  nationality: USA\n  organization: Microsoft\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://www.microsoft.com/en-us/microsoft-365/excel\n  user_distribution: ''\n- access: open\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - Microsoft 365 Copilot\n  description: Microsoft Outlook is a personal information manager software system\n    from Microsoft, available as a part of the Microsoft Office and Microsoft 365\n    software suites.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Microsoft Outlook\n  nationality: USA\n  organization: Microsoft\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://www.microsoft.com/en-us/microsoft-365/outlook/email-and-calendar-software-microsoft-outlook\n  user_distribution: ''\n- access: limited\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - Microsoft 365 Copilot\n  description: Microsoft Power Platform is a line of business intelligence, app development,\n    and app connectivity software applications.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Microsoft Power Platform\n  nationality: USA\n  organization: Microsoft\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://powerplatform.microsoft.com/en-us/\n  user_distribution: 
''\n- access: open\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - Microsoft 365 Copilot\n  description: Microsoft PowerPoint empowers you to create clean slideshow presentations\n    and intricate pitch decks and gives you a powerful presentation maker.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Microsoft PowerPoint\n  nationality: USA\n  organization: Microsoft\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://www.microsoft.com/en-us/microsoft-365/powerpoint\n  user_distribution: ''\n- access: open\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - Microsoft 365 Copilot\n  - Microsoft Business Chat\n  description: Microsoft Teams is a proprietary business communication platform developed\n    by Microsoft, as part of the Microsoft 365 family of products.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Microsoft Teams\n  nationality: USA\n  organization: Microsoft\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://www.microsoft.com/en-us/microsoft-teams/group-chat-software\n  user_distribution: ''\n- access: open\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - Microsoft 365 Copilot\n  description: Microsoft Word is a word processing software developed by Microsoft\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n     
 '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Microsoft Word\n  nationality: USA\n  organization: Microsoft\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://www.microsoft.com/en-us/microsoft-365/word\n  user_distribution: ''\n- access: limited\n  adaptation: unknown\n  created_date: unknown\n  dependencies: []\n  description: 'Inside look is a Microsoft Office feature, composing document insights\n    highlighting key points, expected time to read, and popularity among others.\n\n    '\n  failures: unknown\n  feedback: unknown\n  intended_uses: Providing document insights to users.\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Microsoft Inside Look\n  nationality: USA\n  organization: Microsoft\n  output_space: Document level insights for users.\n  prohibited_uses: unknown\n  quality_control: unknown\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://support.microsoft.com/en-us/office/see-file-insights-before-you-open-a-file-87a23bbc-a516-42e2-a7b6-0ecb8259e026\n  user_distribution: unknown\n- access: limited\n  adaptation: unknown\n  created_date: unknown\n  dependencies: []\n  description: 'Suggested replies is a Microsoft Outlook feature that suggests responses\n    to emails, available in: English, Spanish, Italian, French, German, Portuguese\n    Chinese Simplified, Chinese Traditional, Swedish, Russian, Korean, Czech, Hungarian,\n    Arabic, Hebrew, Thai, Turkish, Japanese, Dutch, Norwegian, Danish, and Polish.\n\n    '\n  failures: unknown\n  feedback: unknown\n  intended_uses: Suggesting email replies.\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    
value: custom\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Microsoft Suggested Replies\n  nationality: USA\n  organization: Microsoft\n  output_space: Suggested emails.\n  prohibited_uses: unknown\n  quality_control: unknown\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://support.microsoft.com/en-us/office/use-suggested-replies-in-outlook-19316194-0434-43ba-a742-6b5890157379\n  user_distribution: unknown\n- access: limited\n  adaptation:\n    explanation: See [[blog post]](https://blogs.microsoft.com/blog/2023/03/28/introducing-microsoft-security-copilot-empowering-defenders-at-the-speed-of-ai/).\n    value: Security Copilot combines OpenAI's GPT-4 generative AI with a security-specific\n      model from Microsoft. This security-specific model in turn incorporates a growing\n      set of security-specific skills and is informed by Microsoft's unique global\n      threat intelligence and more than 65 trillion daily signals.\n  created_date:\n    explanation: The date Security Copilot was announced in the [[Microsoft blog post]](https://blogs.microsoft.com/blog/2023/03/28/introducing-microsoft-security-copilot-empowering-defenders-at-the-speed-of-ai/).\n    value: 2023-03-28\n  dependencies:\n  - GPT-4\n  - Microsoft security-specific model\n  description: 'Microsoft Security Copilot is an AI-powered security analysis tool\n    that enables analysts to respond to threats quickly, process signals at machine\n    speed, and assess risk exposure in minutes.\n\n    '\n  failures: unknown\n  feedback: unknown\n  intended_uses: Security Copilot is designed to enhance the capabilities of cybersecurity\n    professionals. 
It leverages machine speed and scale to accelerate response to\n    security incidents, discover and process threat signals, and assess risk exposure\n    within minutes.\n  license:\n    explanation: 'Software license as described in the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: unknown\n  name: Microsoft Security Copilot\n  nationality: USA\n  organization: Microsoft\n  output_space:\n    explanation: See [[product demo]](https://www.microsoft.com/en-us/security/business/ai-machine-learning/microsoft-security-copilot).\n    value: Actionable responses to security-related questions (text and image). Security\n      event, incident or threat reports (PowerPoint slide).\n  prohibited_uses: unknown\n  quality_control:\n    explanation: See [[blog post]](https://blogs.microsoft.com/blog/2023/03/28/introducing-microsoft-security-copilot-empowering-defenders-at-the-speed-of-ai/).\n    value: Security Copilot employs a closed-loop learning system that learns from\n      user interactions and feedback, enabling it to provide more coherent, relevant,\n      and useful answers that continually improve over time. 
Security Copilot is committed\n      to delivering safe, secure, and responsible AI solutions, ensuring that customers'\n      data and AI models are protected with enterprise compliance and security controls.\n      Customer data is owned and controlled by them, and not used to train AI models\n      for anyone outside their organization.\n  terms_of_service: https://www.microsoft.com/legal/terms-of-use\n  type: application\n  url: https://blogs.microsoft.com/blog/2023/03/28/introducing-microsoft-security-copilot-empowering-defenders-at-the-speed-of-ai/\n  user_distribution: unknown\n- access: open\n  analysis: Evaluated on GLUE, SQuAD 2.0, and CoQA benchmarks.\n  created_date: 2019-10-01\n  dependencies: []\n  description: UniLM is a unified language model that can be fine-tuned for both natural\n    language understanding and generation tasks.\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: UniLM\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 340M parameters (dense)\n  training_emissions: unknown\n  training_hardware: 8 NVIDIA Tesla V100 32GB GPUs\n  training_time: 10,000 steps in 7 hours\n  type: model\n  url: https://proceedings.neurips.cc/paper_files/paper/2019/file/c20bb2d9a50d5ac1f713f8b34d9aac5a-Paper.pdf\n- access: limited\n  analysis: ''\n  created_date: 2021-04-12\n  dependencies: []\n  description: Docugami is a LLM focused on writing business documents and data using\n    generative AI.\n  feedback: ''\n  intended_uses: analyzing, writing, and connecting business documents and data\n  license: ''\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Docugami\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 20B parameters (dense)\n  training_emissions: unknown\n  training_hardware: ''\n  training_time: unknown\n  type: model\n  url: 
https://www.docugami.com/generative-ai\n- access: open\n  analysis: Evaluated on a range of standardized vision benchmarks, and achieves state\n    of the art performance on all experimentally.\n  created_date: 2022-08-31\n  dependencies:\n  - Multiway Transformer network\n  description: BEiT-3 is a general-purpose multimodal foundation model for vision\n    and vision-language tasks.\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  modality: image, text; image, text\n  model_card: ''\n  monitoring: ''\n  name: BEiT-3\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.9B parameters (dense)\n  training_emissions: unknown\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2208.10442.pdf\n- access: open\n  analysis: Reports results on standard LLM benchmarks in comparison to other LLMs\n    and test sets.\n  created_date: 2023-04-24\n  dependencies:\n  - LLaMA\n  - Evol-Instruct\n  - Alpaca dataset\n  description: Starting with an initial set of instructions, we use our proposed Evol-Instruct\n    to rewrite them step by step into more complex instructions. Then, we mix all\n    generated instruction data to fine-tune LLaMA. 
We call the resulting model WizardLM.\n  feedback: https://huggingface.co/datasets/WizardLM/evol_instruct_70k/discussions\n  intended_uses: Creating large amounts of instruction data, particularly with high\n    complexity\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/WizardLM/WizardLM-13B-1.0\n  monitoring: ''\n  name: WizardLM\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: ''\n  training_hardware: 8 V100 GPUs\n  training_time: 70 hours on 3 epochs\n  type: model\n  url: https://arxiv.org/pdf/2304.12244v1.pdf\n- access: open\n  analysis: Evaluated on four prominent code generation benchmarks HumanEval, HumanEval+,\n    MBPP, and DS-1000.\n  created_date: 2023-08-26\n  dependencies:\n  - Evol-Instruct\n  - Alpaca dataset\n  - StarCoder\n  description:\n    explanation: WizardCoder refers to both a set of models fine-tuned on StarCoder\n      and a set of models fine-tuned on Code LLaMA. 
This node refers to the set fine-tuned\n      on StarCoder.\n    value: WizardCoder empowers Code LLMs with complex instruction fine-tuning, by\n      adapting the Evol-Instruct method to the domain of code.\n  feedback: https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0/discussions\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/MODEL_WEIGHTS_LICENSE.\n      Code license is under Apache 2.0\n    value: BigCode Open Rail-M\n  modality:\n    explanation: text; text\n    value: text; text\n  model_card: https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0\n  monitoring: ''\n  name: WizardCoder\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 34B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2306.08568.pdf\n- access: closed\n  analysis: Evaluated on standard image processing benchmarks\n  created_date: 2023-11-10\n  dependencies:\n  - FLD-5B\n  description: Florence-2 is a vision foundation model with a unified, prompt-based\n    representation for a variety of computer vision and vision-language tasks.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: image, text; text\n  model_card: none\n  monitoring: ''\n  name: Florence-2\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 771M parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2311.06242.pdf\n- access: closed\n  analysis: FLD-5B evaluated in comparison to datasets that power other large-scale\n    image models on standard image benchmarks.\n  created_date: 2023-11-10\n  datasheet: ''\n  dependencies: []\n  description: FLD-5B is the dataset that powers Florence-2\n  excluded: ''\n  feedback: ''\n  included: ''\n 
 intended_uses: ''\n  license: unknown\n  modality: image, text\n  monitoring: ''\n  name: FLD-5B\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 1.3B image-text annotations\n  type: dataset\n  url: https://arxiv.org/pdf/2311.06242.pdf\n- access: open\n  analysis: Models trained on OpenOrca compared to GPT-series on language benchmarks.\n  created_date: 2023-06-05\n  datasheet: https://huggingface.co/datasets/Open-Orca/OpenOrca\n  dependencies:\n  - GPT-3.5\n  - GPT-4\n  - Flan Collection\n  description: The OpenOrca dataset is a collection of augmented FLAN Collection data.\n    Currently ~1M GPT-4 completions, and ~3.2M GPT-3.5 completions. It is tabularized\n    in alignment with the distributions presented in the ORCA paper and currently\n    represents a partial completion of the full intended dataset, with ongoing generation\n    to expand its scope.\n  excluded: ''\n  feedback: none\n  included: ''\n  intended_uses: training and evaluation in the field of natural language processing.\n  license: MIT\n  modality: text\n  monitoring: ''\n  name: OpenOrca\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: none\n  quality_control: ''\n  sample: []\n  size: 4.5M text queries\n  type: dataset\n  url: https://huggingface.co/datasets/Open-Orca/OpenOrca\n- access: open\n  analysis: LlongOrca evaluated on BigBench-Hard and AGIEval results.\n  created_date: 2023-08-01\n  dependencies:\n  - OpenOrca\n  - LLongMA-2\n  description: LlongOrca is an attempt to make OpenOrca able to function in a Llong\n    context.\n  feedback: https://huggingface.co/Open-Orca/LlongOrca-7B-16k/discussions\n  intended_uses: training and evaluation in the field of natural language processing.\n  license: LLaMA 2\n  modality: text; text\n  model_card: https://huggingface.co/Open-Orca/LlongOrca-7B-16k\n  monitoring: ''\n  name: LlongOrca\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: none\n  
quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 8x A6000-48GB (first-gen) GPUs\n  training_time: 37 hours\n  type: model\n  url: https://huggingface.co/Open-Orca/LlongOrca-7B-16k\n- access: open\n  analysis: Evaluated on common sense reasoning, language understanding, and multi-step\n    reasoning compared to other SOTA language models.\n  created_date: 2023-09-11\n  dependencies:\n  - phi-1\n  description: Phi-1.5 is a large language transformer model.\n  feedback: https://huggingface.co/microsoft/phi-1_5/discussions\n  intended_uses: Phi-1.5 is best suited for answering prompts using the QA format,\n    the chat format, and the code format.\n  license:\n    explanation: can be found via the license tab at top of https://huggingface.co/microsoft/phi-1_5\n    value: MIT\n  modality: text; text\n  model_card: https://huggingface.co/microsoft/phi-1_5\n  monitoring: none\n  name: Phi-1.5\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: generic web-crawl data is removed from dataset.\n  size: 1.3B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 32 A100-40G GPUs\n  training_time: 8 days\n  type: model\n  url: https://arxiv.org/pdf/2309.05463.pdf\n- access: open\n  analysis: Orca 2 has been evaluated on a large number of tasks ranging from reasoning\n    to grounding and safety.\n  created_date: 2023-11-21\n  dependencies:\n  - LLaMA 2\n  description: Orca 2 is a finetuned version of LLAMA-2 for research purposes.\n  feedback: https://huggingface.co/microsoft/Orca-2-13b/discussions\n  intended_uses: Orca 2 is built for research purposes only. 
The main purpose is to\n    allow the research community to assess its abilities and to provide a foundation\n    for building better frontier models.\n  license:\n    explanation: can be found at https://huggingface.co/microsoft/Orca-2-13b/blob/main/LICENSE\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/microsoft/Orca-2-13b\n  monitoring: unknown\n  name: Orca 2\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: Any purposes other than research.\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 32 NVIDIA A100 80GB GPUs\n  training_time: 80 hours\n  type: model\n  url: https://arxiv.org/pdf/2311.11045.pdf\n- access: open\n  analysis: The model has been evaluated against benchmarks that test common sense,\n    language understanding, mathematics, coding, long-term context, and logical reasoning.\n    The Phi-3 Medium-128K-Instruct demonstrated robust and state-of-the-art performance.\n  created_date: 2024-05-21\n  dependencies: []\n  description: Phi-3 is a 14 billion-parameter, lightweight, state-of-the-art open\n    model trained using the Phi-3 datasets.\n  feedback: https://huggingface.co/microsoft/Phi-3-medium-128k-instruct/discussions\n  intended_uses: The model's primary use cases are for commercial and research purposes\n    that require capable reasoning in memory or compute constrained environments and\n    latency-bound scenarios. 
It can also serve as a building block for generative\n    AI-powered features.\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/microsoft/Phi-3-medium-128k-instruct\n  monitoring: Issues like allocation, high-risk scenarios, misinformation, generation\n    of harmful content and misuse should be monitored and addressed.\n  name: Phi-3\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: The model should not be used for high-risk scenarios without adequate\n    evaluation and mitigation techniques for accuracy, safety, and fairness.\n  quality_control: The model underwent post-training processes viz. supervised fine-tuning\n    and direct preference optimization to increase its capability in following instructions\n    and aligning to safety measures.\n  size: 14B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/abs/2404.14219\n- access: closed\n  analysis: Evaluated by comparing climate predictions to events that actually occurred.\n  created_date: 2024-05-28\n  dependencies: []\n  description: Aurora is a large-scale foundation model of the atmosphere trained\n    on over a million hours of diverse weather and climate data.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; climate forecasts\n  model_card: none\n  monitoring: ''\n  name: Aurora\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.3B parameters\n  training_emissions: unknown\n  training_hardware: 32 A100 GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2405.13063\n- access: closed\n  analysis: Evaluated on a digital pathology benchmark comprising 9 cancer subtyping\n    tasks and 17 pathomics tasks, with Prov-GigaPath demonstrating SoTA performance\n    in 25 out of 26 tasks.\n  created_date: 2024-05-22\n  dependencies:\n  - GigaPath\n  description: \"Prov-GigaPath is a 
whole-slide pathology foundation model pretrained\\\n    \\ on 1.3 billion 256\\u2009\\xD7\\u2009256 pathology image tiles.\"\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: image; embeddings\n  model_card: none\n  monitoring: ''\n  name: Prov-GigaPath\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: 4 80GB A100 GPUs\n  training_time: 2 days\n  type: model\n  url: https://www.nature.com/articles/s41586-024-07441-w\n- access: open\n  analysis: The model was evaluated across a variety of public benchmarks, comparing\n    with a set of models including Mistral-Nemo-12B-instruct-2407, Llama-3.1-8B-instruct,\n    Gemma-2-9b-It, Gemini-1.5-Flash, and GPT-4o-mini-2024-07-18. It achieved a similar\n    level of language understanding and math as much larger models. It also displayed\n    superior performance in reasoning capability, even with only 6.6B active parameters.\n    It was also evaluated for multilingual tasks.\n  created_date: 2024-09-08\n  dependencies:\n  - Phi-3 dataset\n  description: Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon\n    datasets used for Phi-3 - synthetic data and filtered publicly available documents,\n    with a focus on very high-quality, reasoning dense data. It supports multilingual\n    and has a 128K context length in tokens. The model underwent a rigorous enhancement\n    process, incorporating supervised fine-tuning, proximal policy optimization, and\n    direct preference optimization to ensure instruction adherence and robust safety\n    measures.\n  feedback: Unknown\n  intended_uses: The model is intended for commercial and research use in multiple\n    languages. It is designed to accelerate research on language and multimodal models,\n    and for use as a building block for generative AI powered features. 
It is suitable\n    for general purpose AI systems and applications which require memory/compute\n    constrained environments, latency bound scenarios, and strong reasoning.\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct\n  monitoring: Unknown\n  name: Phi-3.5-MoE\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: The model should not be used for downstream purposes it was not\n    specifically designed or evaluated for. Developers should evaluate and mitigate\n    for accuracy, safety, and fairness before using within a specific downstream use\n    case, particularly for high risk scenarios.\n  quality_control: The model was enhanced through supervised fine-tuning, proximal\n    policy optimization, and direct preference optimization processes for safety measures.\n  size: 61B parameters (sparse); 6.6B active parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct\n- access:\n    explanation: Phi-4 is available on Azure AI Foundry and on Hugging Face.\n    value: open\n  analysis: Phi-4 outperforms comparable and larger models on math related reasoning.\n  created_date: 2024-12-13\n  dependencies: []\n  description: the latest small language model in Phi family, that offers high quality\n    results at a small size (14B parameters).\n  feedback: unknown\n  intended_uses: Specialized in complex reasoning, particularly good at math problems\n    and high-quality language processing.\n  license: unknown\n  modality:\n    explanation: Today we are introducing Phi-4 , our 14B parameter state-of-the-art\n      small language model (SLM) that excels at complex reasoning in areas such as\n      math, in addition to conventional language processing.\n    value: text; text\n  model_card: unknown\n  monitoring: Azure AI evaluations in AI Foundry enable developers to iteratively\n 
   assess the quality and safety of models and applications using built-in and custom\n    metrics to inform mitigations.\n  name: Phi-4\n  nationality: USA\n  organization: Microsoft\n  prohibited_uses: unknown\n  quality_control: Building AI solutions responsibly is at the core of AI development\n    at Microsoft. We have made our robust responsible AI capabilities available to\n    customers building with Phi models.\n  size:\n    explanation: a small size (14B parameters).\n    value: 14B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090\n"
  },
  {
    "path": "assets/mila.yaml",
    "content": "- access: open\n  analysis: Models of size 150k parameters trained on ToyMix and compared to models\n    trained on its dependencies across GNN baselines.\n  created_date: 2023-10-09\n  datasheet: none\n  dependencies:\n  - QM9\n  - TOX21\n  - ZINC12K\n  description: ToyMix is the smallest dataset of three extensive and meticulously\n    curated multi-label datasets that cover nearly 100 million molecules and over\n    3000 sparsely defined tasks.\n  excluded: ''\n  feedback: none\n  included: ''\n  intended_uses: The datasets are intended to be used in an academic setting for training\n    molecular GNNs with orders of magnitude more parameters than current large models.\n    Further, the ToyMix dataset is intended to be used in a multi-task setting, meaning\n    that a single model should be trained to predict them simultaneously.\n  license: CC BY-NC-SA 4.0\n  modality: molecules, tasks\n  monitoring: none\n  name: ToyMix\n  nationality: Canada\n  organization: Mila-Quebec AI Institute\n  prohibited_uses: none\n  quality_control: ''\n  sample: []\n  size: 13B labels of quantum and biological nature.\n  type: dataset\n  url: https://arxiv.org/pdf/2310.04292.pdf\n- access: open\n  analysis: Models of size between 4M and 6M parameters trained for 200 epochs on\n    LargeMix and compared to models trained on its dependencies across GNN baselines.\n  created_date: 2023-10-09\n  datasheet: none\n  dependencies:\n  - L1000 VCAP\n  - L1000 MCF7\n  - PCBA1328\n  - PCQM4M_G25_N4\n  description: LargeMix is the middle-sized dataset of three extensive and meticulously\n    curated multi-label datasets that cover nearly 100 million molecules and over\n    3000 sparsely defined tasks.\n  excluded: ''\n  feedback: none\n  included: ''\n  intended_uses: The datasets are intended to be used in an academic setting for training\n    molecular GNNs with orders of magnitude more parameters than current large models.\n    Further, the LargeMix dataset is intended to 
be used in a multi-task setting,\n    meaning that a single model should be trained to predict them simultaneously.\n  license: CC BY-NC-SA 4.0\n  modality: molecules, tasks\n  monitoring: none\n  name: LargeMix\n  nationality: Canada\n  organization: Mila-Quebec AI Institute\n  prohibited_uses: none\n  quality_control: ''\n  sample: []\n  size: 13B labels of quantum and biological nature.\n  type: dataset\n  url: https://arxiv.org/pdf/2310.04292.pdf\n- access: open\n  analysis: Models of size between 4M and 6M parameters trained for 50 epochs on UltraLarge\n    and compared to models trained on its dependencies across GNN baselines.\n  created_date: 2023-10-09\n  datasheet: none\n  dependencies:\n  - PM6_83M\n  description: UltraLarge is the largest dataset of three extensive and meticulously\n    curated multi-label datasets that cover nearly 100 million molecules and over\n    3000 sparsely defined tasks.\n  excluded: ''\n  feedback: none\n  included: ''\n  intended_uses: The datasets are intended to be used in an academic setting for training\n    molecular GNNs with orders of magnitude more parameters than current large models.\n  license: CC BY-NC-SA 4.0\n  modality: molecules, tasks\n  monitoring: none\n  name: UltraLarge\n  nationality: Canada\n  organization: Mila-Quebec AI Institute\n  prohibited_uses: none\n  quality_control: ''\n  sample: []\n  size: 13B labels of quantum and biological nature.\n  type: dataset\n  url: https://arxiv.org/pdf/2310.04292.pdf\n- access: open\n  analysis: Evaluated on previously unseen time series datasets.\n  created_date: 2024-02-08\n  dependencies: []\n  description: Lag-LLaMA is a general-purpose foundation model for univariate probabilistic\n    time series forecasting based on a decoder-only transformer architecture that\n    uses lags as covariates.\n  feedback: https://huggingface.co/time-series-foundation-models/Lag-Llama/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: 
https://huggingface.co/time-series-foundation-models/Lag-Llama\n  monitoring: unknown\n  name: Lag-LLaMA\n  nationality: unknown\n  organization: Morgan Stanley, ServiceNow Research, University of Montreal, Mila-Quebec\n    AI Institute\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: A single NVIDIA Tesla-P100 GPU\n  training_time: unknown\n  type: model\n  url: https://time-series-foundation-models.github.io/lag-llama.pdf\n"
  },
  {
    "path": "assets/mistral.yaml",
    "content": "- access: open\n  analysis: Evaluated in comparison to LLaMA series models on standard language benchmarks.\n  created_date: 2023-09-27\n  dependencies: []\n  description: Mistral is a compact language model.\n  feedback: https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/mistralai/Mistral-7B-v0.1\n  monitoring: none\n  name: Mistral\n  nationality: USA\n  organization: Mistral AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7.3B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://mistral.ai/news/announcing-mistral-7b/\n- access: limited\n  analysis: Evaluated on commonly used benchmarks in comparison to the current LLM\n    leaders.\n  created_date: 2024-02-26\n  dependencies: []\n  description: \"Mistral Large is Mistral AI\\u2019s new cutting-edge text generation\\\n    \\ model.\"\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: Mistral Large\n  nationality: USA\n  organization: Mistral AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://mistral.ai/news/mistral-large/\n- access: limited\n  adaptation: ''\n  created_date: 2024-02-26\n  dependencies:\n  - Mistral\n  - Mistral Large\n  description: Le Chat is a first demonstration of what can be built with Mistral\n    models and what can deployed in the business environment.\n  failures: unknown\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: unknown\n  name: Le Chat\n  nationality: USA\n  organization: Mistral AI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: 
https://mistral.ai/terms/#terms-of-use\n  type: application\n  url: https://mistral.ai/news/le-chat-mistral/\n  user_distribution: unknown\n- access: open\n  analysis: Performance of Codestral is evaluated in Python, SQL, and additional languages,\n    C++, bash, Java, PHP, Typescript, and C#. Fill-in-the-middle performance is assessed\n    using HumanEval pass@1 in Python, JavaScript, and Java.\n  created_date: 2024-05-29\n  dependencies: []\n  description: Codestral is an open-weight generative AI model explicitly designed\n    for code generation tasks. It helps developers write and interact with code through\n    a shared instruction and completion API endpoint. Mastering code and English,\n    it can be used to design advanced AI applications for software developers. It\n    is fluent in 80+ programming languages.\n  feedback: none\n  intended_uses: Helps developers write and interact with code, design advanced AI\n    applications for software developers, integrated into LlamaIndex and LangChain\n    for building applications, integrated in VSCode and JetBrains environments for\n    code generation and interactive conversation.\n  license: Mistral AI Non-Production License\n  modality: text; code\n  model_card: none\n  monitoring: unknown\n  name: Codestral\n  nationality: USA\n  organization: Mistral AI\n  prohibited_uses: unknown\n  quality_control: ''\n  size: 22B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://mistral.ai/news/codestral/\n- access: open\n  analysis: The model underwent an advanced fine-tuning and alignment phase. Its performance\n    was evaluated using GPT4o as a judge on official references. It was compared to\n    recent open-source pre-trained models Gemma 2 9B, Llama 3 8B regarding multilingual\n    performance and coding accuracy. 
Tekken tokenizer's compression ability was compared\n    with previous tokenizers like SentencePiece and the Llama 3 tokenizer.\n  created_date: 2024-07-18\n  dependencies: []\n  description: The Mistral NeMo model is a state-of-the-art 12B model built in collaboration\n    with NVIDIA, offering a large context window of up to 128k tokens. The model is\n    suitable for multilingual applications and exhibits excellent reasoning, world\n    knowledge, and coding accuracy. It's easy to use and a drop-in replacement in\n    a system that uses Mistral 7B. The model uses a new tokenizer, Tekken, based on\n    Tiktoken, which is trained on over 100 languages. It compresses natural language\n    text and source code more efficiently than previously used tokenizers.\n  feedback: Problems should be reported to the Mistral AI team, though the specific\n    method of reporting is unknown.\n  intended_uses: The model can be used for multilingual applications, understanding\n    and generating natural language as well as source code, handling multi-turn conversations,\n    and providing more precise instruction following.\n  license: Apache 2.0\n  modality: text; text\n  model_card: unknown\n  monitoring: Unknown\n  name: Mistral NeMo\n  nationality: USA\n  organization: Mistral AI, NVIDIA\n  prohibited_uses: Unknown\n  quality_control: The model underwent an advanced fine-tuning and alignment phase.\n    Various measures such as accuracy comparisons with other models and instruction-tuning\n    were implemented to ensure its quality.\n  size: 12B parameters\n  training_emissions: Unknown\n  training_hardware: NVIDIA hardware, specifics unknown\n  training_time: Unknown\n  type: model\n  url: https://mistral.ai/news/mistral-nemo/\n- access: open\n  analysis: The model has been tested for in-context retrieval capabilities up to\n    256k tokens. 
It has been created with advanced code and reasoning capabilities,\n    which enables it to perform on par with SOTA transformer-based models.\n  created_date: 2024-07-16\n  dependencies: []\n  description: Codestral Mamba is a Mamba2 language model that is specialized in code\n    generation. It has a theoretical ability to model sequences of infinite length\n    and offers linear time inference. This makes it effective for extensive user engagement\n    and is especially practical for code productivity use cases. Codestral Mamba can\n    be deployed using the mistral-inference SDK or through TensorRT-LLM, and users\n    can download the raw weights from HuggingFace.\n  feedback: Problems with the model can be reported through the organization's website.\n  intended_uses: The model is intended for code generation and can be utilized as\n    a local code assistant.\n  license: Apache 2.0\n  modality: text; text\n  model_card: unknown\n  monitoring: Unknown\n  name: Codestral Mamba\n  nationality: USA\n  organization: Mistral AI\n  prohibited_uses: Unknown\n  quality_control: Unknown\n  size: 7.3B parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://mistral.ai/news/codestral-mamba/\n- access: open\n  analysis: The model's performance has been evaluated on the MATH and MMLU industry-standard\n    benchmarks. It scored notably higher on both these tests than the base model Mistral\n    7B.\n  created_date: 2024-07-16\n  dependencies:\n  - Mistral 7B\n  description: \"Math\\u03A3tral is a 7B model designed for math reasoning and scientific\\\n    \\ discovery. It achieves state-of-the-art reasoning capacities in its size category\\\n    \\ across various industry-standard benchmarks. This model stands on the shoulders\\\n    \\ of Mistral 7B and specializes in STEM subjects. 
It is designed to assist efforts\\\n    \\ in advanced mathematical problems requiring complex, multi-step logical reasoning.\\\n    \\ It particularly achieves 56.6% on MATH and 63.47% on MMLU.\"\n  feedback: Feedback is likely expected to be given through the HuggingFace platform\n    where the model's weights are hosted or directly to the Mistral AI team.\n  intended_uses: The model is intended for use in solving advanced mathematical problems\n    requiring complex, multi-step logical reasoning or any math-related STEM subjects\n    challenges.\n  license: Apache 2.0\n  modality: text; text\n  model_card: unknown\n  monitoring: Unknown\n  name: \"Math\\u03A3tral\"\n  nationality: USA\n  organization: Mistral AI\n  prohibited_uses: Unknown\n  quality_control: This model has been fine-tuned from a base model and its inference\n    and performance have been tested on several industry benchmarks.\n  size: 7B parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://mistral.ai/news/mathstral/\n- access:\n    explanation: The model is available under the Mistral Research License (MRL) for\n      research and educational use; and the Mistral Commercial License for experimentation,\n      testing, and production for commercial purposes.\n    value: open\n  analysis: We evaluate Pixtral Large against frontier models on a set of standard\n    multimodal benchmarks, through a common testing harness.\n  created_date: 2024-11-18\n  dependencies:\n  - Mistral Large 2\n  description: Pixtral Large is the second model in our multimodal family and demonstrates\n    frontier-level image understanding. 
Particularly, the model is able to understand\n    documents, charts and natural images, while maintaining the leading text-only\n    understanding of Mistral Large 2.\n  feedback: unknown\n  intended_uses: RAG and agentic workflows, making it a suitable choice for enterprise\n    use cases such as knowledge exploration and sharing, semantic understanding of\n    documents, task automation, and improved customer experiences.\n  license:\n    explanation: The model is available under the Mistral Research License (MRL) for\n      research and educational use; and the Mistral Commercial License for experimentation,\n      testing, and production for commercial purposes.\n    value: Mistral Research License (MRL), Mistral Commercial License\n  modality:\n    explanation: Pixtral Large is the second model in our multimodal family and demonstrates\n      frontier-level image understanding.\n    value: text, image; text\n  model_card: unknown\n  monitoring: unknown\n  name: Pixtral Large\n  nationality: USA\n  organization: Mistral AI\n  prohibited_uses: unknown\n  quality_control: unknown\n  size:\n    explanation: Today we announce Pixtral Large, a 124B open-weights multimodal model.\n    value: 124B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://mistral.ai/news/pixtral-large/\n- access:\n    explanation: \"The API is also available on Google Cloud\\u2019s Vertex AI, in private\\\n      \\ preview on Azure AI Foundry, and coming soon to Amazon Bedrock.\"\n    value: closed\n  analysis: Benchmarked against the leading sub-100B parameter coding models that\n    are widely considered to be best-in-class for FIM tasks.\n  created_date: 2025-01-13\n  dependencies: []\n  description: Lightweight, fast, and proficient in over 80 programming languages,\n    Codestral is optimized for low-latency, high-frequency use cases and supports tasks\n    such as 
fill-in-the-middle (FIM), code correction and test generation.\n  feedback: \"We can\\u2019t wait to hear your experience! Try it now Try it on Continue.dev\\\n    \\ with VsCode or JetBrains\"\n  intended_uses: Highly capable coding companion, regularly boosting productivity\n    several times over.\n  license: unknown\n  modality:\n    explanation: it for free in Continue for VS Code or JetBrains\n    value: text; text\n  model_card: unknown\n  monitoring: unknown\n  name: Codestral 25.01\n  nationality: USA\n  organization: Mistral AI\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://mistral.ai/news/codestral-2501/\n"
  },
  {
    "path": "assets/moonhub.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-10-11\n  dependencies:\n  - Cohere Base\n  description: Moonhub Recruiter is the world's first AI-powered recruiter providing\n    sourcing and recruiting services for startups and growing businesses.\n  failures: ''\n  feedback: ''\n  intended_uses: recruiting candidates for business needs\n  license:\n    explanation: License information can be found in terms of service\n    value: custom\n  monitoring: ''\n  monthly_active_users: unknown\n  name: Moonhub Recruiter\n  nationality: USA\n  organization: Moonhub\n  output_space: job candidate matches\n  prohibited_uses: none\n  quality_control: ''\n  terms_of_service: https://www.moonhub.ai/terms\n  type: application\n  url: https://www.moonhub.ai/\n  user_distribution: unknown\n"
  },
  {
    "path": "assets/moreh.yaml",
    "content": "- access: open\n  analysis: unknown\n  created_date: 2024-01-16\n  dependencies:\n  - Qwen\n  - OpenOrca\n  description: MoMo is a large language model fine-tuned from Qwen.\n  feedback: https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO/discussions\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO\n  monitoring: unknown\n  name: MoMo\n  nationality: USA\n  organization: Moreh\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 72B parameters (dense)\n  training_emissions: unknown\n  training_hardware: \"AMD\\u2019s MI250 GPU\"\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO\n"
  },
  {
    "path": "assets/mosaic.yaml",
    "content": "- access: open\n  analysis: Evaluated on a range of benchmarks and performed on par with LLaMA-7B.\n  created_date: 2023-05-05\n  dependencies:\n  - RedPajama-Data\n  - C4\n  - The Stack\n  - Multimodal C4\n  description: MPT is a series of large language models seeking to address the limitations\n    of other open source models like LLaMA and Pythia.\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: MPT\n  nationality: USA\n  organization: Mosaic\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 440 A100 40GB GPUs\n  training_time: 9.5 days\n  type: model\n  url: https://www.mosaicml.com/blog/mpt-7b\n- access: open\n  analysis: Compared to Stable Diffusion 2, a SOTA text-to-image model.\n  created_date: 2023-10-25\n  dependencies:\n  - CommonCatalog\n  description: CommonCanvas is a text-to-image model trained solely on Creative Commons\n    licensed images.\n  feedback: none\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; image\n  model_card: none\n  monitoring: none\n  name: CommonCanvas\n  nationality: unknown\n  organization: Cornell University, Mosaic\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: 128 A100 NVIDIA GPUs\n  training_time: 6.79 days\n  type: model\n  url: https://arxiv.org/pdf/2310.16825.pdf\n- access: open\n  analysis: none\n  created_date: 2023-10-25\n  datasheet: https://github.com/mosaicml/diffusion/blob/main/assets/common-canvas.md\n  dependencies:\n  - YFCC100M\n  - BLIP-2\n  description: CommonCatalog is a curated dataset of CommonCrawl images and synthetic\n    captions.\n  excluded: images with non-derivative licenses\n  feedback: none\n  included: images with derivative licenses\n  intended_uses: ''\n  license: Apache 2.0\n  modality: image-caption pairings\n  monitoring: ''\n  
name: CommonCatalog\n  nationality: USA\n  organization: Mosaic\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 70M images\n  type: dataset\n  url: https://arxiv.org/pdf/2310.16825.pdf\n- access: open\n  analysis: unknown\n  created_date: 2024-10-08\n  dependencies: []\n  description: XTTS-v2 is a voice generation model that allows voice cloning into\n    different languages using a brief 6-second audio clip, supporting 17 languages\n    with features like emotion and style transfer, cross-language voice cloning, and\n    multi-lingual speech generation. It powers Coqui Studio and Coqui API, with improvements\n    in architectural and prosody aspects for better audio quality.\n  feedback: Users can join the Coqui community on Discord, engage on Twitter, or send\n    emails to info@coqui.ai for feedback and queries.\n  intended_uses: Voice cloning, multi-lingual speech generation, emotion and style\n    transfer in speech.\n  license: Coqui Public Model\n  modality: audio; audio\n  model_card: https://huggingface.co/coqui/XTTS-v2\n  monitoring: unknown\n  name: XTTS-v2\n  nationality: USA\n  organization: Coqui\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/coqui/XTTS-v2\n"
  },
  {
    "path": "assets/nanyang.yaml",
    "content": "- access: open\n  analysis: Evaluated on researcher experiments to test deeper understanding and advanced\n    commonsense reasoning\n  created_date: 2023-05-05\n  dependencies:\n  - MIMIC-IT\n  - OpenFlamingo\n  description: \"Otter is a multi-modal model based on OpenFlamingo (open-sourced version\\\n    \\ of DeepMind\\u2019s Flamingo), trained on MIMIC-IT and showcasing improved instruction-following\\\n    \\ ability and in-context learning.\"\n  feedback: ''\n  intended_uses: Following and executing new instructions with few in-context learning\n    examples given image and textual input.\n  license: MIT\n  modality: image, text; text\n  model_card: https://github.com/Luodian/Otter/blob/main/docs/model_card.md\n  monitoring: ''\n  name: Otter\n  nationality: Singapore\n  organization: Nanyang Technological University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.3B parameters (dense)\n  training_emissions: ''\n  training_hardware: 4 RTX-3090 GPUs\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2305.03726v1.pdf\n"
  },
  {
    "path": "assets/naver.yaml",
    "content": "- access:\n    explanation: The model is not available for download or direct API use\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: The date the model paper was submitted to arXiv\n    value: 2021-05-21\n  dependencies: []\n  description: HyperCLOVA is an autoregressive language model\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: HyperCLOVA\n  nationality: South Korea\n  organization: NAVER\n  prohibited_uses: ''\n  quality_control: ''\n  size: 82B parameters\n  training_emissions: ''\n  training_hardware: 1024 A100 GPUs\n  training_time: 130.4 days\n  type: model\n  url: https://arxiv.org/abs/2109.04650\n- access: limited\n  analysis: Evaluated on English and Korean benchmarks in comparison to open source\n    English and multilingual LLMs, with HyperCLOVA X (closed) surpassing the models\n    compared.\n  created_date: 2024-04-13\n  dependencies: []\n  description: HyperCLOVA X is a family of large language models (LLMs) tailored to\n    the Korean language and culture, along with competitive capabilities in English,\n    math, and coding.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: HyperCLOVA X\n  nationality: South Korea\n  organization: NAVER\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2404.01954\n"
  },
  {
    "path": "assets/ncsoft.yaml",
    "content": "- access: closed\n  analysis: Boasts the highest performance among the Korean LLMs of similar sizes\n    that have been released to date, according to internal evaluations.\n  created_date: 2023-08-16\n  dependencies: []\n  description: \"VARCO-LLM is NCSOFT\\u2019s large language model and is trained on\\\n    \\ English and Korean.\"\n  feedback: none\n  intended_uses: Developing various NLP-based AI services such as Q&A, chatbot, summarization,\n    information extraction\n  license:\n    explanation: Can be found at https://github.com/ncsoft/ncresearch/blob/main/LICENSE.txt\n    value: custom\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: VARCO-LLM\n  nationality: South Korea\n  organization: NCSOFT\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/ncsoft/ncresearch\n"
  },
  {
    "path": "assets/neeva.yaml",
    "content": "- access: closed\n  analysis: ''\n  created_date: unknown\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: Neeva dataset\n  nationality: USA\n  organization: Neeva\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://neeva.com/index\n- access: closed\n  analysis: ''\n  created_date: unknown\n  dependencies:\n  - Neeva dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Neeva model\n  nationality: USA\n  organization: Neeva\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://neeva.com/index\n- access: open\n  adaptation: ''\n  created_date:\n    explanation: The [[Blog post]](https://neeva.com/blog/introducing-neevaai) announcing\n      NeevaAI.\n    value: 2023-01-06\n  dependencies:\n  - Neeva model\n  description: NeevaAI is an AI-powered search tool that combines the capabilities\n    of LLMs with Neeva's independent in-house search stack to create a unique and\n    transformative search experience.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: '\"If you comply with these Terms, Neeva grants to you, during the\n      term of these Terms, a limited non-exclusive, non-transferable license, with\n      no right to sublicense, to download and install the App on your personal computers,\n      mobile handsets, tablets, wearable devices, and/or other devices and to run\n      the App solely for your own personal non-commercial purposes.\" - excerpt from\n      the Terms of Service document.\n\n      '\n    value: Custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: NeevaAI\n  nationality: USA\n  
organization: Neeva\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://neeva.com/terms\n  type: application\n  url: https://neeva.com/blog/introducing-neevaai\n  user_distribution: ''\n"
  },
  {
    "path": "assets/nextdoor.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: 2023-05-02\n  dependencies:\n  - ChatGPT\n  description: AI chatbot on Nextdoor that helps users write clearer and more conscientious\n    posts.\n  failures: ''\n  feedback: ''\n  intended_uses: to be used to help make the Nextdoor experience more positive for\n    users\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Nextdoor Assistant\n  nationality: USA\n  organization: Nextdoor\n  output_space: natural language text guidance\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://help.nextdoor.com/s/article/Introducing-Assistant\n  user_distribution: ''\n"
  },
  {
    "path": "assets/nolano.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-03-19\n  dependencies: []\n  description: Cformers is a set of transformers that act as an API for AI inference\n    in code.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  monitoring: ''\n  monthly_active_users: ''\n  name: Cformers\n  nationality: USA\n  organization: Nolano\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.nolano.org/services/Cformers/\n  user_distribution: ''\n- access:\n    explanation: \"T\\xFClu3 is intended for research and educational use.\"\n    value: limited\n  analysis: The model can produce problematic outputs (especially when prompted to\n    do so).\n  created_date: 2024-11-21\n  dependencies:\n  - Llama 3.1\n  description: \"T\\xFClu3 is a leading instruction following model family, offering\\\n    \\ fully open-source data, code, and recipes designed to serve as a comprehensive\\\n    \\ guide for modern post-training techniques.\"\n  feedback: unknown\n  intended_uses: \"T\\xFClu3 is intended for research and educational use.\"\n  license:\n    explanation: \"All Llama 3.1 T\\xFClu3 models are released under Meta's Llama 3.1\\\n      \\ Community License Agreement.\"\n    value: Llama 3.1 Community License Agreement\n  modality:\n    explanation: 'Language(s) (NLP): Primarily English'\n    value: text; text\n  model_card: https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B\n  monitoring: unknown\n  name: Llama 3.1 Tulu 3\n  nationality: USA\n  organization: Allen Institute for AI\n  prohibited_uses: The model can produce problematic outputs (especially when prompted\n    to do so).\n  quality_control: \"The T\\xFClu3 models have limited safety training, but are not\\\n    \\ deployed automatically with in-the-loop filtering of responses like ChatGPT.\"\n  size:\n    explanation: Final Models (RLVR) allenai/Llama-3.1-Tulu-3-8B 
allenai/Llama-3.1-Tulu-3-70B\n    value: 70B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B\n"
  },
  {
    "path": "assets/notion.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-02-22\n  dependencies:\n  - Anthropic API\n  description: \"Notion AI is a connected assistant that helps you think bigger, work\\\n    \\ faster, and augments your creativity, right inside the functional workspace\\\n    \\ you\\u2019re already familiar with.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  monitoring: ''\n  monthly_active_users: ''\n  name: Notion AI\n  nationality: USA\n  organization: Notion\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.notion.so/help/guides/notion-ai-for-docs\n  user_distribution: ''\n"
  },
  {
    "path": "assets/nous.yaml",
    "content": "- access: open\n  analysis: Evaluated across standard benchmarks and generally performs better than\n    Mixtral, which it was fine-tuned on.\n  created_date: 2024-01-10\n  dependencies:\n  - Mixtral\n  description: \"Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model\\\n    \\ trained over the\\_Mixtral 8x7B MoE LLM.\"\n  feedback: https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; code, text\n  model_card: https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO\n  monitoring: unknown\n  name: Nous Hermes 2\n  nationality: Canada\n  organization: Nous Research\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO\n- access: open\n  analysis: Evaluated across a variety of standard benchmarks in comparison to LLaMA\n    2.\n  created_date: 2023-11-01\n  dependencies:\n  - LLaMA 2\n  description: YaRN LLaMA 2 is an adapted version of LLaMA 2 using the YaRN extension\n    method.\n  feedback: https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k/discussions\n  intended_uses: ''\n  license: LLaMA 2\n  modality: text; text\n  model_card: https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k\n  monitoring: unknown\n  name: YaRN LLaMA 2\n  nationality: Canada/Switzerland\n  organization: Nous Research, EleutherAI, University of Geneva\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2309.00071.pdf\n- access: open\n  analysis: none\n  created_date: 2023-11-13\n  dependencies:\n  - Yi\n  description: The Capybara series is a series of LLMs and the first Nous collection\n    
of models made by fine-tuning mostly on data created by Nous in-house.\n  feedback: https://huggingface.co/NousResearch/Nous-Capybara-34B/discussions\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/NousResearch/Nous-Capybara-34B\n  monitoring: unknown\n  name: Nous Capybara\n  nationality: Canada\n  organization: Nous Research\n  prohibited_uses: ''\n  quality_control: ''\n  size: 34B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/NousResearch/Nous-Capybara-34B\n- access: open\n  analysis: Evaluated across a variety of standard benchmarks in comparison to Mistral.\n  created_date: 2023-11-01\n  dependencies:\n  - Mistral\n  description: YaRN Mistral is an adapted version of Mistral using the YaRN extension\n    method.\n  feedback: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k/discussions\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k\n  monitoring: unknown\n  name: YaRN Mistral\n  nationality: Canada/Switzerland\n  organization: Nous Research, EleutherAI, University of Geneva\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2309.00071.pdf\n- access: open\n  analysis: Evaluated on common LLM benchmarks in comparison to other Mistral derivatives.\n  created_date: 2023-11-03\n  dependencies:\n  - Mistral\n  description: OpenHermes 2.5 Mistral 7B is a state of the art Mistral Fine-tune,\n    a continuation of OpenHermes 2 model, trained on additional code datasets.\n  feedback: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: 
https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B\n  monitoring: unknown\n  name: OpenHermes 2.5 Mistral\n  nationality: Canada\n  organization:\n    explanation: developed as a personal project by Teknium, co-founder of Nous Research\n    value: Nous Research\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B\n- access: open\n  analysis: The model was examined across a range of benchmarks including GPT4All,\n    AGIEval, BigBench, TruthfulQA and in-house evaluations of function calling and\n    JSON mode.\n  created_date: 2024-03-10\n  dependencies:\n  - Mistral\n  - OpenHermes 2.5 Dataset\n  - Nous Hermes 2\n  description: Hermes 2 Pro on Mistral 7B is an upgraded, retrained version of Nous\n    Hermes 2. This improved version excels at function calling, JSON Structured Outputs,\n    and several other areas, scoring positively on various benchmarks.\n  feedback: https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B/discussions\n  intended_uses: The model is intended for general task and conversation capabilities,\n    function calling, and JSON structured outputs.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B\n  monitoring: unknown\n  name: Hermes 2 Pro-Mistral\n  nationality: unknown\n  organization: Nous\n  prohibited_uses: unknown\n  quality_control: The model was evaluated across multiple tasks, displaying notable\n    scores in GPT4All, AGIEval, BigBench, and TruthfulQA. 
It also has a high score\n    on function calling and JSON mode, indicating the robustness of its capabilities.\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B\n- access: open\n  analysis: unknown\n  created_date: 2024-03-07\n  dependencies: []\n  description: Genstruct is an instruction-generation model, designed to create valid\n    instructions given a raw text corpus. This enables the creation of new, partially\n    synthetic instruction finetuning datasets from any raw-text corpus. This work\n    was inspired by Ada-Instruct and the model is also trained to generate questions\n    involving complex scenarios that require detailed reasoning.\n  feedback: https://huggingface.co/NousResearch/Genstruct-7B/discussions\n  intended_uses: The model is intended for instruction-generation, creating questions\n    involving complex scenarios and generating reasoning steps for those questions.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/NousResearch/Genstruct-7B\n  monitoring: unknown\n  name: Genstruct\n  nationality: unknown\n  organization: Nous\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/NousResearch/Genstruct-7B\n"
  },
  {
    "path": "assets/nucleus.yaml",
    "content": "- access: open\n  analysis: Evaluated on the OpenLLM leaderboard, performing on par with similar-sized\n    models.\n  created_date: 2023-10-05\n  dependencies:\n  - RefinedWeb\n  description: Nucleus is a 22B-parameter causal decoder-only model built by Nucleus.AI\n    and trained on 500B tokens of RefinedWeb along with curated corpora.\n  feedback: https://huggingface.co/NucleusAI/nucleus-22B-token-500B/discussions\n  intended_uses: Research on large language models; as a foundation for further specialization\n    and finetuning for specific use cases (e.g., summarization, text generation, chatbot,\n    etc.)\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/NucleusAI/nucleus-22B-token-500B\n  monitoring: unknown\n  name: Nucleus\n  nationality: USA\n  organization: Nucleus.AI\n  prohibited_uses: Production use without adequate assessment of risks and mitigation;\n    any use cases which may be considered irresponsible or harmful.\n  quality_control: ''\n  size: 22B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: 2 weeks\n  type: model\n  url: https://www.withnucleus.ai/\n"
  },
  {
    "path": "assets/nvidia.yaml",
    "content": "- access:\n    explanation: 'Neither the 8.3B parameter model trained to convergence nor the\n      1 trillion parameter model is available for download\n\n      '\n    value: closed\n  analysis: ''\n  created_date:\n    explanation: The date the paper for the 1 trillion parameter model was published\n    value: 2021-04-09\n  dependencies: []\n  description: Megatron-LM is an autoregressive language model\n  feedback: none\n  intended_uses: none\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: none\n  name: Megatron-LM\n  nationality: USA\n  organization: NVIDIA\n  prohibited_uses: none\n  quality_control: unknown\n  size: 1T parameters (dense)\n  training_emissions: unknown\n  training_hardware: 3072 A100 GPUs\n  training_time: 84 days\n  type: model\n  url: https://arxiv.org/abs/2104.04473\n- access: open\n  analysis: ''\n  created_date: 2022-06-17\n  datasheet: ''\n  dependencies:\n  - YouTube\n  - Wikipedia\n  - Reddit\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: MIT\n  modality: text, video\n  monitoring: ''\n  name: MineDojo\n  nationality: USA\n  organization: NVIDIA\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 730k videos, 6k Wikipedia pages, 340k Reddit posts\n  type: dataset\n  url: https://arxiv.org/abs/2206.08853\n- access: open\n  analysis: ''\n  created_date: 2022-10-06\n  datasheet: ''\n  dependencies:\n  - T5\n  - Mask R-CNN\n  - VIMA dataset\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: MIT\n  modality: image, text\n  monitoring: ''\n  name: VIMA dataset\n  nationality: unknown\n  organization: NVIDIA, Stanford\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 200M parameters (dense model)\n  type: dataset\n  url: https://vimalabs.github.io/\n- access: open\n  analysis: ''\n  created_date: 2022-10-06\n  dependencies: []\n  description: ''\n  feedback: 
''\n  intended_uses: ''\n  license: MIT\n  modality: image, text; robotics trajectories\n  model_card: ''\n  monitoring: ''\n  name: VIMA\n  nationality: unknown\n  organization: NVIDIA, Stanford\n  prohibited_uses: ''\n  quality_control: ''\n  size: 200M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://vimalabs.github.io/\n- access: open\n  analysis: Evaluated on standard LLM benchmarks across a range of fields like reasoning,\n    code generation, and mathematical skills.\n  created_date: 2024-02-27\n  dependencies: []\n  description: Nemotron 4 is a 15-billion-parameter large multilingual language model\n    trained on 8 trillion text tokens.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; code, text\n  model_card: none\n  monitoring: unknown\n  name: Nemotron 4\n  nationality: USA\n  organization: NVIDIA\n  prohibited_uses: ''\n  quality_control: Deduplication and quality filtering techniques are applied to the\n    training dataset.\n  size: 15B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 3072 H100 80GB SXM5 GPUs across 384 DGX H100 nodes\n  training_time: 13 days\n  type: model\n  url: https://arxiv.org/pdf/2402.16819.pdf\n"
  },
  {
    "path": "assets/oasst.yaml",
    "content": "- access: open\n  analysis: none\n  created_date: 2023-08-23\n  dependencies:\n  - LLaMA 2\n  description: OpenAssistant LLaMA 2 is an Open-Assistant fine-tuning of Meta's LLaMA\n    2.\n  feedback: https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10/discussions\n  intended_uses: ''\n  license: LLaMA 2\n  modality: text; text\n  model_card: https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10\n  monitoring: unknown\n  name: OpenAssistant LLaMA 2\n  nationality: International\n  organization: OpenAssistant\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10\n"
  },
  {
    "path": "assets/ollama.yaml",
    "content": "- access: open\n  analysis: Mainly evaluated on MT-Bench and AlpacaEval, which are GPT-4-based comparisons.\n  created_date: 2023-11-02\n  dependencies: []\n  description: Starling is a large language model trained by reinforcement learning\n    from AI feedback focused on improving chatbot helpfulness.\n  feedback: https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha/discussions\n  intended_uses: Academic research and free commercial usage\n  license: CC BY NC 4.0\n  modality: text; text\n  model_card: https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha\n  monitoring: none\n  name: Starling\n  nationality: USA\n  organization: Ollama\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://starling.cs.berkeley.edu/\n"
  },
  {
    "path": "assets/openai.yaml",
    "content": "- access:\n    explanation: The GPT-3 dataset isn't released to the public, but it may be available\n      to Microsoft through the GPT-3 licensing agreement between OpenAI and Microsoft\n      [[OpenAI Blog Post]] (https://openai.com/blog/openai-licenses-gpt-3-technology-to-microsoft/).\n    value: closed\n  analysis: The GPT-3 paper, which also introduces the GPT-3 dataset, provides a limited\n    analysis on the GPT-3 dataset, reporting the dirtiness of the dataset after the\n    it was filtered for text occurring in common benchmarking tasks. The authors report\n    that \"as the dataset becomes more contaminated, the variance of the clean over\n    all fraction increases, but there is no apparent bias towards improved or degraded\n    performance\" [[Appendix C]](https://arxiv.org/pdf/2005.14165.pdf#appendix.C).\n  created_date:\n    explanation: The date for the public announcement of GPT-3. The GPT-3 dataset\n      didn''t have a specific release date separate from the model [[Open AI Blog\n      Post]](https://openai.com/blog/openai-api/).\n    value: 2020-06-11\n  datasheet:\n    explanation: No datasheet available as of 2022-04-04.\n    value: none\n  dependencies:\n  - WebText\n  description: The GPT-3 dataset is the text corpus that was used to train the GPT-3\n    model. Information on the GPT-3 dataset is limited to discussion in the paper\n    introducing GPT-3 [[Section 2.2]](https://arxiv.org/pdf/2005.14165.pdf#subsection.2.2).\n  excluded: The Common Crawl dataset was processed using a classifier that kept high\n    quality documents and filtered low quality documents. 
WebText was used as a proxy\n    for high quality documents [[Appendix A]](https://arxiv.org/pdf/2005.14165.pdf#appendix.A).\n  feedback:\n    explanation: There are no known (internal or external) feedback mechanisms for\n      the GPT-3 dataset as of 2022-04-04.\n    value: unknown\n  included: The dataset is composed of several NLP corpora including Common Crawl\n    (filtered, 60%), WebText2 (22%), Books1 (8%), Books2 (8%), Wikipedia (3%) [[Section\n    2.2]](https://arxiv.org/pdf/2005.14165.pdf#subsection.2.2).\n  intended_uses: The intended use of the GPT-3 dataset is to train language models.\n  license:\n    explanation: There is no known license specific to the GPT-3 dataset, however,\n      the governing organization, OpenAI, licensed GPT-3 to Microsoft, which makes\n      it likely that the GPT-3 dataset was also licensed [[OpenAI Blog Post]] (https://openai.com/blog/openai-licenses-gpt-3-technology-to-microsoft/).\n    value: unknown\n  modality: text\n  monitoring:\n    explanation: There are no known (internal or external) monitoring mechanisms that\n      are in place for the use of the GPT-3 dataset as of 2022-04-04.\n    value: unknown\n  name: GPT-3 dataset\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses:\n    explanation: OpenAI didn't provide a list of prohibited uses specifically for\n      the GPT-3 dataset. However, public OpenAI products are governed by the OpenAI\n      Terms of Use, which may also apply to the OpenAI dataset. The OpenAI Terms of\n      Use prohibit the following, (i) Illegal activities, such as child pornography,\n      gambling, cybercrime, piracy, violating copyright, trademark or other intellectual\n      property laws; (ii) Accessing or authorizing anyone to access the APIs from\n      an embargoed country, region, or territory as prohibited by the U.S. 
government;\n      (iii) Threatening, stalking, defaming, defrauding, degrading, victimizing or\n      intimidating anyone for any reason [[Open AI Terms of Use]](https://openai.com/api/policies/terms/).\n    value: unknown\n  quality_control: In addition to excluding low quality documents from the Common\n    Crawl dataset, the authors fuzzily deduplicated documents within each dataset,\n    by removing documents that have high overlap with each other. The same procedure\n    was followed to fuzzily deduplicate WebText from Common Crawl [[Appendix A]](https://arxiv.org/pdf/2005.14165.pdf#appendix.A).\n    Text occurring in benchmark datasets were also partially removed [[Appendix C]](https://arxiv.org/pdf/2005.14165.pdf#appendix.C).\n  sample: []\n  size: 570 GB\n  type: dataset\n  url: https://arxiv.org/pdf/2005.14165.pdf\n- access:\n    explanation: 'HumanEval dataset is publicly available and comes with a an evaluation\n      framework [[HumanEval GitHub Repository]](https://www.github.com/openai/human-eval).\n\n      '''\n    value: open\n  analysis: none\n  created_date:\n    explanation: 'The date that Codex, the model evaluated on the HumanEval dataset,\n      was announced to the public [[OpenAI Blog Post]](https://openai.com/blog/openai-codex/).\n\n      '''\n    value: 2021-08-10\n  datasheet:\n    explanation: No datasheet available as of 2022-04-10.\n    value: none\n  dependencies: []\n  description: 'HumanEval is a dataset of 164 programming problems hand-written to\n    evaluate their Codex model.\n\n    '\n  excluded: 'Code problems easily found on the internet.\n\n    '\n  feedback: 'Email the authors [[Codex Paper]](https://arxiv.org/pdf/2107.03374.pdf).\n\n    '\n  included: '164 hand-written questions.\n\n    '\n  intended_uses: 'Evaluating code generation capabilities of models.\n\n    '\n  license:\n    explanation: https://github.com/openai/human-eval/blob/master/LICENSE\n    value: MIT\n  modality: code\n  monitoring: none\n  name: 
HumanEval\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: none\n  quality_control: 'The evaluation dataset was handwritten to ensure that the evaluation\n    problems do not exist in the Codex dataset [[Section 2.2]](https://arxiv.org/pdf/2107.03374.pdf#subsection.2.2).\n\n    '\n  sample:\n  - \"\\n\\ndef string_sequence(n: int) -> str:\\n    \\\"\\\"\\\" Return a string containing\\\n    \\ space-delimited numbers starting from 0 upto n inclusive.\\n    >>> string_sequence(0)\\n\\\n    \\    '0'\\n    >>> string_sequence(5)\\n    '0 1 2 3 4 5'\\n    \\\"\\\"\\\"\\n\"\n  - \"\\n\\ndef count_distinct_characters(string: str) -> int:\\n    \\\"\\\"\\\" Given a string,\\\n    \\ find out how many distinct characters (regardless of case) does it consist of\\n\\\n    \\    >>> count_distinct_characters('xyzXYZ')\\n    3\\n    >>> count_distinct_characters('Jerry')\\n\\\n    \\    4\\n    \\\"\\\"\\\"\\n\"\n  - \"from typing import List\\n\\n\\ndef parse_music(music_string: str) -> List[int]:\\n\\\n    \\    \\\"\\\"\\\" Input to this function is a string representing musical notes in a\\\n    \\ special ASCII format.\\n    Your task is to parse this string and return list\\\n    \\ of integers corresponding to how many beats does each\\n    not last.\\n\\n   \\\n    \\ Here is a legend:\\n    'o' - whole note, lasts four beats\\n    'o|' - half note,\\\n    \\ lasts two beats\\n    '.|' - quater note, lasts one beat\\n\\n    >>> parse_music('o\\\n    \\ o| .| o| o| .| .| .| .| o o')\\n    [4, 2, 1, 2, 2, 1, 1, 1, 1, 4, 4]\\n    \\\"\\\n    \\\"\\\"\\n\"\n  - \"\\n\\ndef how_many_times(string: str, substring: str) -> int:\\n    \\\"\\\"\\\" Find\\\n    \\ how many times a given substring can be found in the original string. 
Count\\\n    \\ overlaping cases.\\n    >>> how_many_times('', 'a')\\n    0\\n    >>> how_many_times('aaa',\\\n    \\ 'a')\\n    3\\n    >>> how_many_times('aaaa', 'aa')\\n    3\\n    \\\"\\\"\\\"\\n\"\n  - \"from typing import List\\n\\n\\ndef sort_numbers(numbers: str) -> str:\\n    \\\"\\\"\\\n    \\\" Input is a space-delimited string of numberals from 'zero' to 'nine'.\\n   \\\n    \\ Valid choices are 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven',\\\n    \\ 'eight' and 'nine'.\\n    Return the string with numbers sorted from smallest\\\n    \\ to largest\\n    >>> sort_numbers('three one five')\\n    'one three five'\\n \\\n    \\   \\\"\\\"\\\"\\n\"\n  size: 214 KB\n  type: dataset\n  url: https://arxiv.org/pdf/2107.03374.pdf\n- access:\n    explanation: 'The dataset might have been made available to Microsoft as part\n      of OpenAI giving Microsoft access to its Codex model [GitHub Copilot](https://copilot.github.com/).\n\n      '''\n    value: closed\n  analysis:\n    explanation: The paper doesn't provide an analysis on the training dataset.\n    value: none\n  created_date:\n    explanation: 'The date that Codex, the model trained on the Codex dataset, was\n      announced to the public [[OpenAI Blog Post]](https://openai.com/blog/openai-codex/).\n\n      '''\n    value: 2021-08-10\n  datasheet: none\n  dependencies: []\n  description: 'The dataset used to train the Codex model.\n\n    '\n  excluded: 'Following were filtered from the dataset: autogenerated files; files\n    with average line length > 100, maximum line length > 1000, or few alphanumeric\n    characters [[Section 3.1]](https://arxiv.org/pdf/2107.03374.pdf#subsection.3.1).\n\n    '\n  feedback: 'Email the authors [[Codex Paper]](https://arxiv.org/pdf/2107.03374.pdf).\n\n    '\n  included: 'The dataset includes 54 million public software repositories hosted on\n    GitHub as of an unspecified date in May 2020 [[Section 
3.1]](https://arxiv.org/pdf/2107.03374.pdf#subsection.3.1).\n\n    '\n  intended_uses: Training language models on code.\n  license:\n    explanation: unknown\n  modality: {}\n  monitoring: unknown\n  name: Codex dataset\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: unknown\n  quality_control: 'Dataset was filtered using simple heuristics, as outlined in the\n    excluded field.\n\n    '\n  sample: []\n  size:\n    explanation: As reported by the authors [[Section 3.1]](https://arxiv.org/pdf/2107.03374.pdf#subsection.3.1).\n    value: 159 GB\n  type: dataset\n  url: https://arxiv.org/pdf/2107.03374.pdf\n- access:\n    explanation: The dataset wasn't released to the public.\n    value: closed\n  analysis: 'The dataset contained some overlap with the test sets of the benchmarks\n    used for evaluation, but the authors determined the impact to be small: \"There\n    is a median overlap of 2.2% and an average overlap of 3.2%. Due to this small\n    amount of overlap, overall accuracy is rarely shifted by more than 0.1% with only\n    7 datasets above this threshold\" [[Section 5]](https://arxiv.org/pdf/2103.00020.pdf#section.5).\n\n    '\n  created_date:\n    explanation: 'The date of the blog post announcing CLIP [[OpenAI Blog Post]](https://openai.com/blog/clip/).\n\n      '''\n    value: 2021-01-05\n  datasheet: none\n  dependencies: []\n  description: 'CLIP dataset contains text-image pairs crawled from the internet.\n\n    '\n  excluded: none\n  feedback:\n    explanation: The feedback mechanisms in place are unknown.\n    value: unknown\n  included: 'Data crawled from the internet, without any filtering (including de-duplication)\n    or curation.\n\n    '\n  intended_uses: Training multimodal vision models.\n  license: unknown\n  modality: image, text\n  monitoring:\n    explanation: The monitoring mechanisms in place are unknown.\n    value: unknown\n  name: CLIP dataset\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses:\n    
explanation: The prohibited uses of the dataset are unknown.\n    value: unknown\n  quality_control: 'The data was \"only crawled websites that had policies against\n    excessively violent and adult images and allowed us to filter out such content\"\n    [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md).\n\n    '\n  sample: []\n  size: 400M (image, text) pairs\n  type: dataset\n  url: https://arxiv.org/pdf/2103.00020.pdf\n- access:\n    explanation: The dataset wasn't released to the public.\n    value: closed\n  analysis: \"The authors found that the dataset contained 21% of the images in the\\\n    \\ MS-COCO validation set, but observed no significant changes in the performance\\\n    \\ of the accompanying DALL\\xB7E when tested on MS-COCO evaluation set with and\\\n    \\ without the said images [[Section 3.1]](https://arxiv.org/pdf/2102.12092.pdf#subsection.3.1).\"\n  created_date:\n    explanation: \"The date of the blog post announcing DALL\\xB7E [[OpenAI Blog Post]](https://openai.com/blog/dall-e/).\\n\"\n    value: 2021-01-05\n  datasheet: none\n  dependencies: []\n  description: \"DALL\\xB7E dataset is the training set consisting of image and text\\\n    \\ pairs collected to train the DALL\\xB7E model.\\n\"\n  excluded: 'MS-COCO was excluded from the dataset, but because MS-COCO was created\n    from YFCC100M, some of the test images (not the captions) were included.\n\n    '\n  feedback:\n    explanation: The feedback mechanisms in place are unknown.\n    value: unknown\n  included: 'Data from the internet, including Conceptual Captions and a filtered\n    subset of YFCC100M.\n\n    '\n  intended_uses: Training multimodal vision models.\n  license: unknown\n  modality: image, text\n  monitoring:\n    explanation: The monitoring mechanisms in place are unknown.\n    value: unknown\n  name: \"DALL\\xB7E dataset\"\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses:\n    explanation: The prohibited uses of the dataset are 
unknown.\n    value: unknown\n  quality_control: \"The data was de-duplicated [[Section 3.2]](https://arxiv.org/pdf/2102.12092.pdf#subsection.3.2).\\\n    \\ The data collected from the internet was filtered using image, text and joint\\\n    \\ image and text filters, which included: \\\"discarding instances whose captions\\\n    \\ are too short, are classified as non-English by the Python package cld3, or\\\n    \\ that consist primarily of boilerplate phrases such as \\u201Cphotographed on\\\n    \\ <date>\\u201D, where <date> matches various formats for dates that we found in\\\n    \\ the data\\\". The authors also discard \\\"instances whose images have aspect ratios\\\n    \\ not in [1/2, 2]\\\" [[Appendix C]](https://arxiv.org/pdf/2102.12092.pdf#appendix.C).\\n\"\n  sample: []\n  size: '250M (image, text) pairs\n\n    '\n  type: dataset\n  url: https://arxiv.org/abs/2102.12092\n- access:\n    explanation: 'The dataset wasn''t released to the public.\n\n      '''\n    value: closed\n  analysis: 'The Whisper paper provides limited details on preprocessing.\n\n    '\n  created_date:\n    explanation: 'The date of the public announcement of Whisper. The dataset didn''t\n      have a specific release date separate from the model [[OpenAI Blog Post]](https://openai.com/blog/whisper/).\n\n      '''\n    value: 2022-09-21\n  datasheet:\n    explanation: No datasheet available as of 2022-12-07.\n    value: none\n  dependencies: []\n  description: 'The Whisper dataset is the speech corpus that was used to train the\n    Whisper model. Information on the dataset is limited to discussion in the paper\n    introducing Whisper. 
[[Section 2.1]](https://cdn.openai.com/papers/whisper.pdf).\n\n    '\n  excluded: 'Automated filtering was conducted.\n\n    '\n  feedback:\n    explanation: 'There are no known (internal or external) feedback mechanisms for\n      the dataset as of 2022-12-07.\n\n      '''\n    value: unknown\n  included: 'The dataset is composed of three major sources: multilingual speech recognition\n    (17%), translation (18%), and English speech recognition (65%) [[Figure 11]](https://cdn.openai.com/papers/whisper.pdf).\n\n    '\n  intended_uses: The intended use is to train speech models.\n  license: unknown\n  modality: audio\n  monitoring:\n    explanation: 'There are no known (internal or external) monitoring mechanisms\n      that are in place for the use of the dataset as of 2022-12-07.\n\n      '''\n    value: none\n  name: Whisper dataset\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses:\n    explanation: 'No uses are explicitly prohibited.\n\n      '''\n    value: none\n  quality_control: 'In addition to filtering, basic text standardization was done.\n\n    '\n  sample: []\n  size: 680k hours\n  type: dataset\n  url: https://cdn.openai.com/papers/whisper.pdf\n- access: closed\n  analysis: ''\n  created_date:\n    explanation: Due to the lack of information about the exact date, it is assumed\n      to be the 1st 
of the known month of creation.\n    value: 2019-11-01\n  dependencies:\n  - WebText\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'GPT-2 uses a [[modified MIT License]](https://github.com/openai/gpt-2/blob/master/LICENSE).\n\n      '''\n    value: Modified MIT License\n  modality: text; text\n  model_card: https://github.com/openai/gpt-2/blob/master/model_card.md\n  monitoring: ''\n  name: GPT-2\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.5B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf\n- access:\n    explanation: 'The GPT-3 model isn''t fully released to the public, but it was\n      made available to Microsoft through the licencing agreement between OpenAI and\n      Microsoft [[OpenAI Blog Post]] (https://openai.com/blog/openai-licenses-gpt-3-technology-to-microsoft/).\n      The public can access the model through the Open AI API, which is available\n      in supported countries [[Supported Countries]](https://beta.openai.com/docs/supported-countries)\n      [[OpenAI API]](https://openai.com/api/).\n\n      '''\n    value: limited\n  analysis: 'The GPT-3 model was evaluated on language modeling, closed-book question\n    answering, translation, Winograd-style tasks, commonsense reasoning, reading comprehension,\n    SuperGLUE, NLI, synthetic tasks, and generation [[Section 4]](https://arxiv.org/pdf/2005.14165.pdf#section.4);\n    as well as on fairness and biases [[Section 6]](https://arxiv.org/pdf/2005.14165.pdf#section.6).\n\n    '\n  created_date:\n    explanation: 'The date that GPT-3 was announced to the public [[OpenAI Blog Post]](https://openai.com/blog/openai-api/).\n\n      '''\n    value: 2020-06-11\n  dependencies:\n  - GPT-3 dataset\n  description: 'GPT-3 is an autoregressive language 
model.\n\n    '\n  feedback: 'Feedback for GPT-3 can be provided on the feedback form linked in the\n    model card [[Model Card]](https://github.com/openai/gpt-3/blob/master/model-card.md).\n    The form is especially meant to collect feedback on concerns about misuse, synthetic\n    text detection, bias, and risk of generative language models.\n\n    '\n  intended_uses: 'GPT-3 was intended to be used through the OpenAI API by developers\n    for language applications. Other intended uses of GPT-3 include researchers accessing\n    the model through the API to study its paradigms [[Model Card]](https://github.com/openai/gpt-3/blob/master/model-card.md).\n\n    '\n  license: unknown\n  modality: text; text\n  model_card: https://github.com/openai/gpt-3/blob/master/model-card.md\n  monitoring: 'OpenAI reviews all use cases of the model [[Model Card]](https://github.com/openai/gpt-3/blob/master/model-card.md).\n\n    '\n  name: GPT-3\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: 'Access to GPT-3 is governed by the OpenAI API Usage Guidelines and\n    API Terms of Use, which prohibit the use of the API in a way that causes societal\n    harm. [[Usage Guidelines]](https://beta.openai.com/docs/usage-guidelines/content-policy)\n    [[Terms of Use]](https://openai.com/api/policies/terms/). The list of disallowed\n    applications can be found in the usage guidelines\n    [[Disallowed Applications]](https://beta.openai.com/docs/usage-guidelines/disallowed-applications).\n\n    '\n  quality_control: 'One quality control method OpenAI employed was releasing GPT-3\n    only through the OpenAI API. OpenAI states that it is easier to respond to misuse\n    when the access to the model is gated through the API. It also hints that it plans\n    to broaden the API access over time based on the amount of misuse [[OpenAI API\n    Blog Post]](https://openai.com/blog/openai-api/). 
The authors identify potential\n    misuses of GPT-3 in the paper and analyze it for fairness, bias and representation\n    issues, but do not identify mitigation strategies [[Section 6]](https://arxiv.org/pdf/2005.14165.pdf#section.6).\n\n    '\n  size:\n    explanation: 'GPT-3 comes in several sizes. Here we report the size of the Davinci\n      model, the largest GPT-3 model served through the OpenAI API. Sizes of the other\n      models can be found in the paper [[Table 2.1]](https://arxiv.org/pdf/2005.14165.pdf#table.caption.7).\n\n      '\n    value: 175B parameters (dense)\n  training_emissions:\n    explanation: 'Estimates of the CO2(e) emissions for GPT-3 were not provided by\n      OpenAI, but they were provided by follow-up work investigating the CO2 equivalent\n      emissions (CO2e) of GPT-3 [[Patterson et al.]](https://arxiv.org/ftp/arxiv/papers/2104/2104.10350.pdf).\n\n      '''\n    value: 552.1 tCO2e\n  training_hardware:\n    explanation: 'The original paper doesn''t specify the training hardware for\n      GPT-3, but a follow-up blog post indicates that it was trained on an Azure\n      cluster, using 10,000 GPUs with 400 Gbps of network connectivity per GPU server [[Microsoft Blog Post]](https://blogs.microsoft.com/ai/openai-azure-supercomputer/).\n\n      '''\n    value: Azure\n  training_time:\n    explanation: 'The times required to train GPT-3 models of different sizes are listed\n      in [[Table D.1]](https://arxiv.org/pdf/2005.14165.pdf#table.caption.50). 
The\n      time required to train the GPT-3 model with 175B parameters is reported as 3.64E+03\n      petaflop/s-days.\n\n      '''\n    value: 3640 petaflop/s-days\n  type: model\n  url: https://arxiv.org/pdf/2005.14165.pdf\n- access:\n    explanation: 'The model is made available via the OpenAI API [[OpenAI API]](https://openai.com/api/)\n      as code-cushman-001 according to the [[Model Index]](https://platform.openai.com/docs/model-index-for-researchers).\n\n      '''\n    value: limited\n  analysis: 'The model was evaluated using the HumanEval dataset with pass@k metric\n    and BLEU scores [[Section 2]](https://arxiv.org/pdf/2107.03374.pdf#section.2).\n\n    '\n  created_date:\n    explanation: 'The date that Codex was announced to the public [[OpenAI Blog Post]](https://openai.com/blog/openai-codex/).\n\n      '''\n    value: 2021-08-10\n  dependencies:\n  - GPT-3\n  - Codex dataset\n  - HumanEval\n  description: 'Codex is a GPT language model fine-tuned on publicly available code\n    from GitHub.\n\n    '\n  feedback: 'Email the authors [[Codex Paper]](https://arxiv.org/pdf/2107.03374.pdf).\n\n    '\n  intended_uses: 'Codex is intended to be used for coding related language modelling\n    tasks.\n\n    '\n  license: unknown\n  modality: text; code, text\n  model_card: none\n  monitoring:\n    explanation: 'There isn''t any known monitoring in place for the model, but there\n      may be internal mechanisms.\n\n      '''\n    value: unknown\n  name: Codex\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses:\n    explanation: The prohibited uses of the model aren't specified.\n    value: unknown\n  quality_control: 'The model wasn''t fully released to the public as a quality control\n    measure. 
The authors identify the following potential risks of Codex in their paper:\n    over-reliance, misalignment, bias and representation, economic and\n    labor market impacts, security implications, environmental impact and legal implications.\n    They also make suggestions for some of these, but do not implement them in Codex\n    [[Section 7]](https://arxiv.org/pdf/2107.03374.pdf#section.7).\n\n    '\n  size: 12B parameters (dense)\n  training_emissions:\n    explanation: The authors do not report the training emissions.\n    value: unknown\n  training_hardware:\n    explanation: 'The paper specifies that Azure was used, but the underlying architecture\n      wasn''t specified.\n\n      '''\n    value: Azure\n  training_time:\n    explanation: 'The authors estimate hundreds of petaflop/s-days of compute [[Section\n      7.6]](https://arxiv.org/pdf/2107.03374.pdf#subsection.7.6), but don''t provide\n      an exact number.\n\n      '''\n    value: 100-1000 petaflop/s-days\n  type: model\n  url: https://arxiv.org/pdf/2107.03374.pdf\n- access:\n    explanation: 'The model is made available via the OpenAI API but the specific\n      endpoint (davinci-instruct-beta) is currently unavailable. 
[[Model Index]](https://platform.openai.com/docs/model-index-for-researchers).\n\n      '''\n    value: closed\n  analysis: The model was evaluated on human ratings to the InstructGPT answers to\n    the prompts submitted to the OpenAI API as well as on public NLP datasets spanning\n    truthfulness, toxicity, and bias, question answering, reading comprehension, and\n    summarization tasks.\n  created_date:\n    explanation: 'Date of the public announcement introducing InstructGPT [[OpenAI\n      Blog Post]] (https://openai.com/blog/instruction-following/).\n\n      '''\n    value: 2022-01-27\n  dependencies:\n  - GPT-3\n  - OpenAI API\n  description: 'InstructGPT is a family of GPT-3 based models fine-tuned on human\n    feedback, which allows for better instruction following capabilities than GPT-3.\n\n    '\n  feedback: 'Email the authors [[InstructGPT Paper]](https://arxiv.org/pdf/2203.02155.pdf).\n\n    '\n  intended_uses: 'As stated in the model card: \"The intended direct users of InstructGPT\n    are developers who access its capabilities via the OpenAI API. 
Through the OpenAI\n    API, the model can be used by those who may not have AI development experience,\n    to build and explore language modeling systems across a wide range of functions.\n    We also anticipate that the model will continue to be used by researchers to better\n    understand the behaviors, capabilities, biases, and constraints of large-scale\n    language models\" [[Model Card]](https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md).\n\n    '\n  license: unknown\n  modality: text; code, text\n  model_card: https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md\n  monitoring:\n    explanation: 'There isn''t any known monitoring in place for the model, but there\n      may be internal mechanisms.\n\n      '''\n    value: unknown\n  name: InstructGPT\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: 'Access to InstructGPT is governed by Open AI API Usage Guidelines\n    and API Terms of Use, prohibiting the use of the API in a way that causes societal\n    harm. [[Usage Guidelines]] (https://beta.openai.com/docs/usage-guidelines/content-policy)\n    [[Terms of Use]](https://openai.com/api/policies/terms/). 
The list of disallowed\n    applications can be found in the usage guidelines\n    [[Disallowed Applications]](https://beta.openai.com/docs/usage-guidelines/disallowed-applications).\n\n    '\n  quality_control: 'The model wasn''t fully released to the public as a quality control\n    measure.\n\n    '\n  size: 175B parameters (dense)\n  training_emissions:\n    explanation: The authors do not estimate the emissions of the model.\n    value: unknown\n  training_hardware:\n    explanation: The authors do not disclose the training hardware used.\n    value: unknown\n  training_time:\n    explanation: 'The 175B SFT model required 4.9 petaflops/s-days; the 175B PPO-ptx model\n      required 60 petaflops/s-days [[Section 5]](https://arxiv.org/pdf/2203.02155.pdf#section.5).\n\n      '''\n    value: 60 petaflops/s-days\n  type: model\n  url: https://arxiv.org/pdf/2203.02155.pdf\n- access:\n    explanation: 'Models are available at [https://github.com/openai/whisper](https://github.com/openai/whisper).\n\n      '''\n    value: open\n  analysis: The model was evaluated for zero-shot English and multilingual speech recognition,\n    translation, language identification and robustness to noise.\n  created_date:\n    explanation: 'The date that Whisper was announced to the public [[OpenAI Blog\n      Post]](https://openai.com/blog/whisper/).\n\n      '''\n    value: 2022-09-21\n  dependencies:\n  - Whisper dataset\n  description: Whisper is a speech recognition model used for audio transcription.\n  feedback: 'The discussions page of the codebase is not formally cited as a place\n    for feedback, but is being used in this way [[Discussions page]](https://github.com/openai/whisper/discussions).\n\n    '\n  intended_uses: 'Whisper is a general-purpose speech recognition model; it is a multi-task\n    model that can perform multilingual speech recognition as well as speech translation\n    and language identification.\n\n    '\n  license:\n    explanation: 
https://github.com/openai/whisper\n    value: MIT\n  modality: audio; text\n  model_card: https://github.com/openai/whisper/blob/main/model-card.md\n  monitoring:\n    explanation: 'No monitoring is mentioned by the authors.\n\n      '''\n    value: none\n  name: Whisper\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: unknown\n  quality_control: No specific quality control methods are documented.\n  size: 1.5B parameters (dense)\n  training_emissions: unknown\n  training_hardware:\n    explanation: 'The original paper doesn''t specify the training hardware for\n      Whisper, but we expect it was trained on Azure, given that other models from OpenAI\n      are trained using Azure [[Microsoft Blog Post]](https://blogs.microsoft.com/ai/openai-azure-supercomputer/).\n\n      '''\n    value: Azure\n  training_time: unknown\n  type: model\n  url: https://cdn.openai.com/papers/whisper.pdf\n- access:\n    explanation: 'Model checkpoints and the helper code can be accessed at the official\n      CLIP repository [[CLIP Repository]](https://github.com/openai/CLIP).\n\n      '''\n    value: open\n  analysis: The model was evaluated on standard vision datasets (e.g. CIFAR10, ImageNet)\n    and showed robust state-of-the-art results.\n  created_date:\n    explanation: 'The date of the blog post announcing CLIP [[OpenAI Blog Post]](https://openai.com/blog/clip/).\n\n      '''\n    value: 2021-01-05\n  dependencies:\n  - CLIP dataset\n  description: \"\\\"CLIP (Contrastive Language-Image Pre-Training) is a neural network\\\n    \\ trained on a variety of (image, text) pairs. It can be instructed in natural\\\n    \\ language to predict the most relevant text snippet, given an image, without\\\n    \\ directly optimizing for the task, similarly to the zero-shot capabilities of\\\n    \\ GPT-2 and 3. 
We found CLIP matches the performance of the original ResNet50\\\n    \\ on ImageNet \\u201Czero-shot\\u201D without using any of the original 1.28M labeled\\\n    \\ examples, overcoming several major challenges in computer vision\\\" [[CLIP Repository]](https://github.com/openai/CLIP).\\n\"\n  feedback: Questions can be submitted through the feedback form linked in the CLIP model\n    card [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md).\n  intended_uses: 'The model is intended to be used by AI researchers to better understand\n    \"robustness, generalization, and other capabilities, biases, and constraints of\n    computer vision models\" [[CLIP Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md).\n\n    '\n  license: MIT\n  modality: image, text; text\n  model_card: https://github.com/openai/CLIP/blob/main/model-card.md\n  monitoring:\n    explanation: There are no monitoring mechanisms in place for CLIP.\n    value: none\n  name: CLIP\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: \"\\\"Any deployed use case of the model - whether commercial or not\\\n    \\ - is currently out of scope. Non-deployed use cases such as image search in\\\n    \\ a constrained environment, are also not recommended unless there is thorough\\\n    \\ in-domain testing of the model with a specific, fixed class taxonomy. This is\\\n    \\ because our safety assessment demonstrated a high need for task specific testing\\\n    \\ especially given the variability of CLIP\\u2019s performance with different class\\\n    \\ taxonomies. This makes untested and unconstrained deployment of the model in\\\n    \\ any use case currently potentially harmful.\\nCertain use cases which would fall\\\n    \\ under the domain of surveillance and facial recognition are always out-of-scope\\\n    \\ regardless of performance of the model. 
This is because the use of artificial\\\n    \\ intelligence for tasks such as these can be premature currently given the lack\\\n    \\ of testing norms and checks to ensure its fair use.\\nSince the model has not\\\n    \\ been purposefully trained in or evaluated on any languages other than English,\\\n    \\ its use should be limited to English language use cases\\\" [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md).\\n\"\n  quality_control: 'The authors found that the performance of the model depended heavily\n    on which classes were included (and excluded) for a given task. They reported significant\n    race- and gender-based disparities on the FairFace dataset, depending on how the\n    classes were constructed. The authors also demonstrated that the model was capable\n    of racial profiling with high accuracy [[Section 7]](https://arxiv.org/pdf/2103.00020.pdf#section.7).\n\n    '\n  size:\n    explanation: 'The total size is unknown, but the largest CLIP model is a combination\n      of a 63M-parameter (dense) text encoder and a 307M-parameter vision encoder.\n\n      '\n    value: unknown\n  training_emissions: unknown\n  training_hardware: NVIDIA V100 GPUs\n  training_time:\n    explanation: \"The exact training time of CLIP depends on the vision and language\\\n      \\ encoders used: \\\"The largest ResNet model, RN50x64, took 18 days to train\\\n      \\ on 592 V100 GPUs while the largest Vision Transformer took 12 days on 256\\\n      \\ V100 GPUs. For the ViT-L/14 we also pre-train at a higher 336 pixel resolution\\\n      \\ for one additional epoch to boost performance ... 
Unless otherwise specified,\\\n      \\ all results reported in this paper as \\u201CCLIP\\u201D use this model which\\\n      \\ we found to perform best\\\" [[CLIP paper]](https://arxiv.org/pdf/2103.00020.pdf).\\n\\\n      Using the GPU Time method outlined in the [[OpenAI AI and Compute Blog]](https://openai.com/blog/ai-and-compute/#addendum),\\\n      \\ we can compute the training time in petaflop/s-day for RN50x64 and ViT-L/14\\\n      \\ with the following equation: Number of GPUs * (peta-flops/GPU) * days trained\\\n      \\ * estimated utilization. We will assume that estimated utilization is 33%,\\\n      \\ following [[OpenAI AI and Compute Blog]](https://openai.com/blog/ai-and-compute/#addendum).\\\n      \\ The specific V100 GPU used isn't clear from the paper, so we will assume\\\n      \\ that V100 PCIe was used. The V100 PCIe GPUs have a single precision performance\\\n      \\ of 15.7 teraflops [[V100 Datasheet]](https://images.nvidia.com/content/technologies/volta/pdf/volta-v100-datasheet-update-us-1165301-r5.pdf),\\\n      \\ which is equal to 15.7 / 1000 = 0.0157 petaflops.\\nFollowing the formula \\\"\\\n      Number of GPUs * (peta-flops/GPU) * days trained * estimated utilization\\\",\\\n      \\ we can compute the petaflop/s-day for RN50x64 as follows: 592 * 0.0157 * 18\\\n      \\ * 0.33, or about 55.2. Similarly, for ViT-L/14, we can compute the same as: 256 * 0.0157\\\n      \\ * 12 * 0.33, or about 15.9. Adding the two, we estimate the total training time as 71.12\\\n      \\ petaflop/s-day.\\n\"\n    value: 71.12 petaflop/s-day\n  type: model\n  url: https://arxiv.org/pdf/2103.00020.pdf\n- access:\n    explanation: Available via the OpenAI API as of Q1 2023.\n    value: limited\n  analysis: \"The model was evaluated against three prior approaches, AttnGAN, DM-GAN,\\\n    \\ and DF-GAN using Inception Score and Fr\\xE9chet Inception Distance on MS-COCO\\\n    \\ as metrics. 
The model was also evaluated by humans and received the majority\\\n    \\ of the votes in generating images that look realistic and better match the caption\\\n    \\ when compared to the images generated by DF-GAN [[Section 3]](https://arxiv.org/pdf/2102.12092.pdf#section.3).\\n\"\n  created_date:\n    explanation: \"The date of the blog post announcing DALL\\xB7E [[OpenAI Blog Post]](https://openai.com/blog/dall-e/).\\n\"\n    value: 2021-01-05\n  dependencies:\n  - \"DALL\\xB7E dataset\"\n  description: \"DALL\\xB7E is a GPT-3 based model trained to generate images from text\\\n    \\ descriptions. The authors found that it had \\\"a diverse set of capabilities,\\\n    \\ including creating anthropomorphized versions of animals and objects, combining\\\n    \\ unrelated concepts in plausible ways, rendering text, and applying transformations\\\n    \\ to existing images\\\" [[OpenAI Blog Post]](https://openai.com/blog/dall-e/).\\n\"\n  feedback: 'Contact the author(s) listed in the paper [[Paper]](https://arxiv.org/pdf/2102.12092.pdf).\n\n    '\n  intended_uses: '\"The model is intended for others to use for training their own\n    generative models\" [[Model Card]](https://github.com/openai/DALL-E/blob/master/model_card.md).\n\n    '\n  license: unknown\n  modality: text; image\n  model_card: https://github.com/openai/DALL-E/blob/master/model_card.md\n  monitoring:\n    explanation: \"There are no monitoring mechanisms in place for DALL\\xB7E.\"\n    value: none\n  name: \"DALL\\xB7E\"\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses:\n    explanation: The prohibited uses of the model are unknown.\n    value: unknown\n  quality_control: unknown\n  size: 12B parameters (dense)\n  training_emissions:\n    explanation: The training emissions were not reported.\n    value: unknown\n  training_hardware: NVIDIA V100 GPUs\n  training_time:\n    explanation: The training time was not reported.\n    value: unknown\n  type: model\n  url: 
https://arxiv.org/pdf/2102.12092.pdf\n- access:\n    explanation: The model can be downloaded from the [Github repository](https://github.com/openai/jukebox)\n    value: open\n  analysis: Evaluations in paper are primarily considering the fidelity and novelty\n    of samples from Jukebox.\n  created_date:\n    explanation: The date the model paper was released\n    value: 2020-04-30\n  dependencies:\n  - Jukebox Dataset\n  description: Jukebox is a generative model that produces music\n  feedback: none\n  intended_uses: ''\n  license:\n    explanation: 'The license is provided in the [Github repository](https://github.com/openai/jukebox).\n\n      '''\n    value: Noncommercial Use License\n  modality: text; audio\n  model_card:\n    explanation: No model card found as of 2023-01-09\n    value: none\n  monitoring: none\n  name: Jukebox\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: none\n  quality_control: ''\n  size: 5B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 510 V100s\n  training_time: 4 weeks\n  type: model\n  url: https://arxiv.org/abs/2005.00341\n- access:\n    explanation: \"The model is available to OpenAI employees, researchers, creatives\\\n      \\ and company friends. OpenAI opened a waitlist for DALL\\xB7E 2 access. [[System\\\n      \\ Card]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#access).\\n\"\n    value: limited\n  analysis: The model is capable of generating explicit content and the researchers\n    found limited amount of spurious content generated. 
The researchers also found\n    that visual synonyms can be used to prompt the model to surface unwanted generations\n    [[Probes and Evaluations]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#probes-and-evaluations).\n  created_date:\n    explanation: 'OpenAI announced the model in a blog post in April 2022 [[OpenAI Blog Post]](https://openai.com/dall-e-2/).\n\n      '''\n    value: 2022-04-13\n  dependencies:\n  - \"DALL\\xB7E dataset\"\n  - CLIP dataset\n  description: \"\\\"DALL\\xB7E 2 is an artificial intelligence model that takes a text\\\n    \\ prompt and/or existing image as an input and generates a new image as an output\\\"\\\n    \\ [[System Card]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md).\\\n    \\ The model wasn't fully released, but OpenAI released a version of the model\\\n    \\ (DALL\\xB7E 2 Preview) to a select group of testers.\\n\"\n  feedback: Feedback can be sent by email to support at openai.com.\n  intended_uses: \"\\\"The intended use of the DALL\\xB7E 2 Preview at this time is for\\\n    \\ personal, non-commercial exploration and research purposes by people who are\\\n    \\ interested in understanding the potential uses of these capabilities\\\" [[Use]]\\\n    \\ (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#use).\\n\"\n  license: unknown\n  modality: text; image\n  model_card: https://github.com/openai/dalle-2-preview/blob/main/system-card.md\n  monitoring: 'Uses of the model are monitored. In the preview version, any user can\n    flag content. 
The specific policies for monitoring are not disclosed, but possible\n    measures include disabling of accounts violating the content policies [[Monitoring\n    and Reporting]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#monitoring-and-reporting).\n\n    '''\n  name: \"DALL\\xB7E 2\"\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: Use of the model is governed by the OpenAI Content Policy, which\n    prohibits posting content that is not G-rated. Users are not allowed to utilize the model\n    in commercial products in the preview version [[Content Policy]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#policies-and-enforcement).\n  quality_control: The model is not fully released to the public as a quality\n    control measure. The usage of the model by testers is monitored and user-provided\n    prompts are filtered [[Input filters]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#input-filters).\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/abs/2204.06125\n- access:\n    explanation: 'The OpenAI API is available to the public in supported countries\n      [[Supported Countries]](https://beta.openai.com/docs/supported-countries) [[OpenAI\n      API]](https://openai.com/api/).\n\n      '''\n    value: limited\n  adaptation: The API exposes the models fairly directly with a range of hyperparameters\n    (e.g. 
temperature scaling).\n  created_date:\n    explanation: 'The date that the OpenAI API was announced to the public [[Open AI Blog\n      Post]](https://openai.com/blog/openai-api/).\n\n      '''\n    value: 2020-06-11\n  dependencies:\n  - GPT-3\n  - Codex\n  - code-davinci-002\n  - text-davinci-002\n  - text-davinci-003\n  - gpt-3.5-turbo\n  - Whisper\n  - \"DALL\\xB7E\"\n  - GPT-4\n  - GPT-4 Turbo\n  description: 'OpenAI API is a general purpose \"text in, text out\" interface connecting\n    users with a suite of language models. The API was initially released as a gateway\n    to GPT-3, but it now supports access to other, more specialized OpenAI models.\n    [[Open AI Blog Post]](https://openai.com/blog/openai-api/)\n\n    '\n  failures:\n    explanation: 'There are no known documented failures of the OpenAI API at the\n      time of writing.\n\n      '''\n    value: unknown\n  feedback:\n    explanation: 'There is no known specific feedback channel for the OpenAI API,\n      but the OpenAI support team can be reached via email at support at openai.com.\n\n      '''\n    value: unknown\n  intended_uses: 'OpenAI API was designed to be used by developers to empower applications,\n    and researchers to study language models [[Section 3]](https://openai.com/api/policies/terms/).\n\n    '\n  license:\n    explanation: Per the Terms of Use, a limited license is provided to the users\n      during their use of the API [[Section 2]](https://openai.com/api/policies/terms/).\n    value: custom\n  monitoring: 'OpenAI may monitor the API use to ensure \"quality and improve OpenAI\n    systems, products and services; perform research; and ensure compliance\" with\n    the Terms of Service and all applicable laws. 
Users of the API will give OpenAI\n    reasonable access to their application to monitor compliance with the terms listed\n    in the Terms of Service [[Section 5(b)]](https://openai.com/api/policies/terms/).\n    Apps using the OpenAI API should submit an application once they are deployed\n    to real users. The review form takes 10 minutes to complete and over 97% of the\n    applications are directly accepted or conditionally accepted. The applicants are\n    notified of the decision within 2 business days [[App Review Guidelines]] (https://beta.openai.com/docs/usage-guidelines/app-review).\n\n    '\n  monthly_active_users:\n    explanation: 'The number of monthly active users is not known publicly, but OpenAI\n      mentioned that the API was being used by tens of thousands of developers in\n      a blog post from 2021-11-18 [[OpenAI Blog Post]](https://openai.com/blog/api-no-waitlist/).\n\n      '''\n    value: unknown\n  name: OpenAI API\n  nationality: USA\n  organization: OpenAI\n  output_space: 'Given a prompting text, the OpenAI API provides access to text completions,\n    and log probabilities. The support for text and code embeddings were added on\n    2022-01-25 [[OpenAI Blog Post]] (https://openai.com/blog/introducing-text-and-code-embeddings/).\n\n    '\n  prohibited_uses: 'OpenAI API Terms of Use prohibits the use of the API in a way\n    violating the applicable law, including: (i) \"Illegal activities, such as child\n    pornography, gambling, cybercrime, piracy, violating copyright, trademark or other\n    intellectual property laws\"; (ii) \"Accessing or authorizing anyone to access the\n    APIs from an embargoed country, region, or territory as prohibited by the U.S.\n    government\"; (iii) \"Threatening, stalking, defaming, defrauding, degrading, victimizing\n    or intimidating anyone for any reason\". 
The usage requirements are detailed in\n    the Terms of Use [[Section 3]](https://openai.com/api/policies/terms/).\n\n    '\n  quality_control: 'Given a prompt, OpenAI API checks whether a completion contains\n    unsafe language using its filters and marks the completion accordingly if so.\n    The API also provides developers with special endpoints that scope the API usage.\n    OpenAI also developed user guidelines to help developers understand safety issues\n    [[OpenAI API]](https://openai.com/api/).\n\n    '\n  terms_of_service: https://openai.com/api/policies/terms/\n  type: application\n  url: https://openai.com/api/\n  user_distribution:\n    explanation: 'The distribution of the users is not known, but we estimate majority\n      of the users to be developers based in the United States.\n\n      '''\n    value: unknown\n- access: open\n  analysis: ''\n  created_date: 2022-06-23\n  dependencies:\n  - web_clean\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: video; video\n  model_card: ''\n  monitoring: ''\n  name: VPT\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 500M parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2206.11795\n- access: closed\n  analysis: ''\n  created_date: 2022-06-23\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: video\n  monitoring: ''\n  name: web_clean\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 70k hours\n  type: dataset\n  url: https://arxiv.org/abs/2206.11795\n- access: limited\n  adaptation: ''\n  created_date: 2022-11-30\n  dependencies:\n  - gpt-3.5-turbo\n  - OpenAI toxicity classifier\n  description: ChatGPT is an artificial intelligence chatbot developed by OpenAI.\n  
failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: Per the Terms of Use, a limited license is provided to the users\n      during their use of the API [[Section 2]](https://openai.com/api/policies/terms/).\n    value: custom\n  monitoring: ''\n  monthly_active_users:\n    explanation: https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/\n    value: 100M\n  name: ChatGPT\n  nationality: USA\n  organization: OpenAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://openai.com/blog/chatgpt\n  user_distribution: ''\n- access: limited\n  analysis: ''\n  created_date: 2023-03-01\n  dependencies:\n  - gpt-3.5-turbo dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: Per the Terms of Use, a limited license is provided to the users\n      during their use of the API [[Section 2]](https://openai.com/api/policies/terms/).\n    value: custom\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: gpt-3.5-turbo\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://platform.openai.com/docs/models/gpt-3-5\n- access: limited\n  analysis: none\n  created_date: 2023-11-06\n  dependencies: []\n  description: GPT-4 Turbo is a more capable version of GPT-4 and has knowledge of\n    world events up to April 2023. 
It has a 128k context window so it can fit the\n    equivalent of more than 300 pages of text in a single prompt.\n  feedback: none\n  intended_uses: ''\n  license:\n    explanation: Per the Terms of Use, a limited license is provided to the users\n      during their use of the API [[Section 2]](https://openai.com/api/policies/terms/).\n    value: custom\n  modality: text; text\n  model_card: none\n  monitoring: unknown\n  name: GPT-4 Turbo\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo\n- access: limited\n  analysis: ''\n  created_date: 2023-03-01\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: gpt-3.5-turbo dataset\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://platform.openai.com/docs/models/gpt-3-5\n- access: limited\n  analysis: ''\n  created_date: 2022-05-01\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: code, text\n  monitoring: ''\n  name: code-davinci-002 dataset\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://platform.openai.com/docs/model-index-for-researchers\n- access: limited\n  analysis: ''\n  created_date: 2022-05-01\n  dependencies:\n  - code-davinci-002 dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; code, text\n  model_card: ''\n  monitoring: ''\n  name: code-davinci-002\n  nationality: USA\n  organization: 
OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://platform.openai.com/docs/model-index-for-researchers\n- access: limited\n  analysis: ''\n  created_date: 2022-05-01\n  dependencies:\n  - code-davinci-002\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: text-davinci-002\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://platform.openai.com/docs/model-index-for-researchers\n- access: limited\n  analysis: ''\n  created_date: 2022-11-30\n  dependencies:\n  - text-davinci-002\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: text-davinci-003\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://platform.openai.com/docs/model-index-for-researchers\n- access: open\n  adaptation: ''\n  created_date: 2023-03-01\n  dependencies:\n  - Whisper\n  description: API to query OpenAI's Whisper model.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: Per the Terms of Use, a limited license is provided to the users\n      during their use of the API [[Section 2]](https://openai.com/api/policies/terms/).\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Whisper API\n  nationality: USA\n  organization: OpenAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://openai.com/blog/introducing-chatgpt-and-whisper-apis\n  user_distribution: 
''\n- access: open\n  adaptation: ''\n  created_date: 2023-03-01\n  dependencies:\n  - ChatGPT\n  description: API to query OpenAI's ChatGPT model.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: Per the Terms of Use, a limited license is provided to the users\n      during their use of the API [[Section 2]](https://openai.com/api/policies/terms/).\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: ChatGPT API\n  nationality: USA\n  organization: OpenAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://openai.com/blog/introducing-chatgpt-and-whisper-apis\n  user_distribution: ''\n- access: open\n  adaptation: ''\n  created_date: 2022-08-10\n  dependencies:\n  - OpenAI toxicity classifier\n  description: \"This endpoint provides OpenAI API developers with free access to GPT-based\\\n    \\ classifiers that detect undesired content\\u2014an instance of using AI systems\\\n    \\ to assist with human supervision of these systems.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: Per the Terms of Use, a limited license is provided to the users\n      during their use of the API [[Section 2]](https://openai.com/api/policies/terms/).\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: OpenAI Moderation API\n  nationality: USA\n  organization: OpenAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://openai.com/blog/new-and-improved-content-moderation-tooling\n  user_distribution: ''\n- access: closed\n  analysis: ''\n  created_date: 2023-01-18\n  dependencies:\n  - OpenAI toxicity dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: OpenAI toxicity classifier\n  nationality: USA\n  organization: OpenAI\n  
prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://time.com/6247678/openai-chatgpt-kenya-workers/#:~:text=In%20a%20statement%2C%20an%20OpenAI,datasets%20of%20tools%20like%20ChatGPT.\n- access: closed\n  analysis: ''\n  created_date: 2023-01-18\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: OpenAI toxicity dataset\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://time.com/6247678/openai-chatgpt-kenya-workers/#:~:text=In%20a%20statement%2C%20an%20OpenAI,datasets%20of%20tools%20like%20ChatGPT.\n- access: limited\n  adaptation: ''\n  created_date: 2023-02-03\n  dependencies:\n  - Sage\n  description: A chatbot language model available via Quora's Poe\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Sage API\n  nationality: USA\n  organization: OpenAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://quorablog.quora.com/Poe-1\n  user_distribution: ''\n- access: limited\n  adaptation: ''\n  created_date: 2023-02-03\n  dependencies:\n  - Dragonfly\n  description: A chatbot language model available via Quora's Poe\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Dragonfly API\n  nationality: USA\n  organization: OpenAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://quorablog.quora.com/Poe-1\n  user_distribution: ''\n- access: limited\n  analysis: ''\n  created_date: 2023-02-03\n  dependencies: []\n  description: 
A chatbot language model available via Quora's Poe\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Sage\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://quorablog.quora.com/Poe-1\n- access: limited\n  analysis: ''\n  created_date: 2023-02-03\n  dependencies: []\n  description: A chatbot language model available via Quora's Poe\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Dragonfly\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://quorablog.quora.com/Poe-1\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-07\n  dependencies:\n  - ChatGPT API\n  description: \"The app integrates ChatGPT\\u2019s powerful AI technology to deliver\\\n    \\ instant conversation summaries, research tools, and writing assistance directly\\\n    \\ in Slack to help millions of companies work more productively.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: ChatGPT for Slack\n  nationality: unknown\n  organization: OpenAI, Salesforce\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.salesforce.com/news/stories/chatgpt-app-for-slack/\n  user_distribution: ''\n- access: limited\n  analysis: ''\n  created_date: 2023-03-14\n  dependencies: []\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: image, text; image, text\n  model_card: ''\n  monitoring: ''\n  name: GPT-4\n  nationality: USA\n  organization: OpenAI\n  
prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2303.08774\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-14\n  dependencies:\n  - GPT-4\n  description: \"GPT-4 is OpenAI\\u2019s most advanced system, producing safer and more\\\n    \\ useful responses\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: Per the Terms of Use, a limited license is provided to the users\n      during their use of the API [[Section 2]](https://openai.com/api/policies/terms/).\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: GPT-4 API\n  nationality: USA\n  organization: OpenAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://openai.com/product/gpt-4\n  user_distribution: ''\n- access: limited\n  adaptation: ''\n  created_date: 2023-08-28\n  dependencies:\n  - GPT-4\n  description: ChatGPT Enterprise offers enterprise-grade security and privacy, unlimited\n    higher-speed GPT-4 access, longer context windows for processing longer inputs,\n    advanced data analysis capabilities, and customization options compared to OpenAI's\n    previous offerings.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: Per the Terms of Use, a limited license is provided to the users\n      during their use of the API [[Section 2]](https://openai.com/policies/terms-of-use).\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: ChatGPT Enterprise\n  nationality: USA\n  organization: OpenAI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://openai.com/policies/terms-of-use\n  type: application\n  url: https://openai.com/enterprise\n  user_distribution: ''\n- access:\n    explanation: \"DALL\\xB7E 3 is now in research preview, and will be 
available to\\\n      \\ ChatGPT Plus and Enterprise customers in October, via the API and in Labs\\\n      \\ later this fall.\"\n    value: limited\n  analysis: The model is capable of generating explicit content and the researchers\n    found a limited amount of spurious content generated.\n  created_date:\n    explanation: \"OpenAI announced that DALL\\xB7E 3 was coming soon in a tweet on\\\n      \\ 2023-09-20. Users could begin experimenting with DALL\\xB7E 3 in research preview\\\n      \\ in early October.\"\n    value: 2023-09-20\n  dependencies:\n  - \"DALL\\xB7E 2 dataset\"\n  - CLIP dataset\n  - ChatGPT\n  description: \"DALL\\xB7E 3 is an artificial intelligence model that takes a text\\\n    \\ prompt and/or existing image as an input and generates a new image as an output.\\\n    \\ The model is now in research preview, and will be available to ChatGPT Plus\\\n    \\ and Enterprise customers in October.\"\n  feedback: Feedback can be provided at openai.com\n  intended_uses: \"The intended use of the DALL\\xB7E 3 Preview at this time is for\\\n    \\ personal, non-commercial exploration and research purposes by people who are\\\n    \\ interested in understanding the potential uses of these capabilities\"\n  license:\n    explanation: License information can be found at https://openai.com/policies/terms-of-use\n    value: custom\n  modality: text; image\n  model_card: none\n  monitoring: Uses of the model are monitored. In the preview version, any user can\n    flag content. The specific policies for monitoring are not disclosed, but possible\n    measures include disabling of accounts violating the content\n  name: \"DALL\\xB7E 3\"\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: Use of the model is governed by the OpenAI Content Policy, which\n    prohibits posting content that is not G-rated. 
Users are not allowed to utilize the model\n    in commercial products in the preview version.\n  quality_control: \"DALL\\xB7E 3 has mitigations to decline requests that ask for a\\\n    \\ public figure by name. We improved safety performance in risk areas like generation\\\n    \\ of public figures and harmful biases related to visual over/under-representation,\\\n    \\ in partnership with red teamers\\u2014domain experts who stress-test the model\\u2014\\\n    to help inform our risk assessment and mitigation efforts in areas like propaganda\\\n    \\ and misinformation.\"\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://openai.com/dall-e-3\n- access: limited\n  analysis: none\n  created_date: 2024-02-15\n  dependencies: []\n  description: Sora is an AI model that can create realistic and imaginative scenes\n    from text instructions.\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; image, video\n  model_card: none\n  monitoring: unknown\n  name: Sora\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://openai.com/sora\n- access: limited\n  analysis: When evaluated on standard performance benchmarks, achieves similar levels\n    of performance to GPT-4 Turbo.\n  created_date: 2024-05-13\n  dependencies: []\n  description: GPT-4o is OpenAI's new flagship model, as of release, that can reason\n    across audio, vision, and text in real time.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: audio, image, text, video; audio, image, text\n  model_card: none\n  monitoring: Internal monitoring of risk for non-text outputs before a public release\n    (currently only image, text inputs and text outputs are available).\n  name: GPT-4o\n  nationality: USA\n  
organization: OpenAI\n  prohibited_uses: ''\n  quality_control: Training data filtering and post-training refinement act as additional\n    guardrails for preventing harmful outputs.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://openai.com/index/hello-gpt-4o/\n"
  },
  {
    "path": "assets/openbmb.yaml",
    "content": "- access: open\n  analysis: Evaluated on English and Chinese language benchmarks.\n  created_date: 2023-05-27\n  dependencies: []\n  description: CPM-Bee is a fully open-source, commercially-usable Chinese-English\n    bilingual base model with a capacity of ten billion parameters.\n  feedback: https://huggingface.co/openbmb/cpm-bee-10b/discussions\n  intended_uses: You can use the raw model for many NLP tasks like text generation\n    or fine-tune it to a downstream task.\n  license:\n    explanation: can be found at https://github.com/OpenBMB/CPM-Bee/blob/main/README_en.md#modellicense\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/openbmb/cpm-bee-10b\n  monitoring: unknown\n  name: CPM Bee\n  nationality: International\n  organization: OpenBMB\n  prohibited_uses: ''\n  quality_control: ''\n  size: 10B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/OpenBMB/CPM-Bee\n- access: open\n  analysis: Randomly chosen models trained on UltraFeedback evaluated across standard\n    benchmarks.\n  created_date: 2023-09-26\n  datasheet: https://huggingface.co/datasets/openbmb/UltraFeedback\n  dependencies: []\n  description: UltraFeedback is a large-scale, fine-grained, diverse preference dataset,\n    used for training powerful reward models and critic models.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/openbmb/UltraFeedback/discussions\n  included: ''\n  intended_uses: ''\n  license: MIT\n  modality: text\n  monitoring: unknown\n  name: UltraFeedback\n  nationality: International\n  organization: OpenBMB\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 256k samples\n  type: dataset\n  url: https://github.com/OpenBMB/UltraFeedback\n- access: open\n  analysis: Evaluated on open-sourced general benchmarks in comparison to SotA LLMs.\n  created_date: 2024-02-01\n  dependencies: []\n  
description: MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP,\n    with only 2.4B parameters excluding embeddings (2.7B in total).\n  feedback: https://huggingface.co/openbmb/MiniCPM-V/discussions\n  intended_uses: ''\n  license:\n    explanation: can be found at https://github.com/OpenBMB/General-Model-License/tree/main\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/openbmb/MiniCPM-V\n  monitoring: unknown\n  name: MiniCPM\n  nationality: International\n  organization: OpenBMB\n  prohibited_uses: ''\n  quality_control: ''\n  size: 2.4B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/OpenBMB/MiniCPM/\n- access: open\n  analysis: The model was comprehensively benchmarked across 12 tests covering five\n    tasks. Eurus achieved the best overall performance among open-source models of\n    similar sizes and even outperformed specialized models in many cases.\n  created_date: 2024-04-02\n  dependencies:\n  - Eurus SFT\n  - UltraInteract\n  - UltraFeedback\n  description: Eurus is a suite of large language models (LLMs) optimized for reasoning.\n  feedback: https://huggingface.co/openbmb/Eurus-70b-nca/discussions\n  intended_uses: The model can be used for reasoning tasks and is especially tailored\n    for coding and math following specific prompts.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/openbmb/Eurus-70b-nca\n  monitoring: unknown\n  name: Eurus\n  nationality: International\n  organization: OpenBMB\n  prohibited_uses: none\n  quality_control: none\n  size: 70B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/abs/2404.02078\n"
  },
  {
    "path": "assets/openlemur.yaml",
    "content": "- access: open\n  analysis: Evaluated on text and code benchmarks in comparison to other models.\n  created_date: 2023-10-10\n  dependencies:\n  - LLaMA 2\n  - The Stack\n  - RefinedWeb\n  - RedPajama\n  - Common Crawl\n  - Wikipedia\n  - ArXiv\n  description: Lemur is an openly accessible language model optimized for both natural\n    language and coding capabilities to serve as the backbone of versatile language\n    agents.\n  feedback: https://huggingface.co/OpenLemur/lemur-70b-v1/discussions\n  intended_uses: ''\n  license: LLaMA2\n  modality: code, text; code, text\n  model_card: https://huggingface.co/OpenLemur/lemur-70b-v1\n  monitoring: ''\n  name: Lemur\n  nationality: USA\n  organization: OpenLemur\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: TPUv4-512 pod\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2310.06830.pdf\n- access: open\n  analysis: Evaluated on text and code benchmarks in comparison to other models.\n  created_date: 2023-10-10\n  dependencies:\n  - Lemur\n  - OpenAssistant 1\n  - OpenOrca\n  - ShareGPT & ChatLogs\n  - Evol-CodeAlpaca data\n  description: Lemur-Chat is an openly accessible language model optimized for both\n    natural language and coding capabilities to serve as the backbone of versatile\n    language agents.\n  feedback: https://huggingface.co/OpenLemur/lemur-70b-chat-v1/discussions\n  intended_uses: ''\n  license: CC-BY-NC-4.0\n  modality: text; text\n  model_card: https://huggingface.co/OpenLemur/lemur-70b-chat-v1\n  monitoring: ''\n  name: Lemur-Chat\n  nationality: USA\n  organization: OpenLemur\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2310.06830.pdf\n"
  },
  {
    "path": "assets/openx.yaml",
    "content": "- access: open\n  analysis: Analyzed on breakdown of types of robot trajectory in dataset, and overall\n    coverage.\n  created_date: 2023-10-03\n  datasheet: All data can be found at https://robotics-transformer-x.github.io/.\n  dependencies:\n    explanation: data compiled from unknown datasets in over 21 different institutions,\n      list of institutions can be found at https://robotics-transformer-x.github.io/paper.pdf\n    value: []\n  description: The Open X-Embodiment dataset is a dataset of robot movements assembled\n    from 22 different robots collected through a collaboration between 21 institutions,\n    demonstrating 527 skills (160266 tasks)\n  excluded: N/A\n  feedback: none\n  included: N/A\n  intended_uses: Further research on X-embodiment models.\n  license: Apache 2.0\n  modality: robot trajectories\n  monitoring: unknown\n  name: Open X-Embodiment dataset\n  nationality: International\n  organization: Open X-Embodiment\n  prohibited_uses: none\n  quality_control: unknown\n  sample: []\n  size: 160K tasks\n  type: dataset\n  url: https://robotics-transformer-x.github.io/\n- access: open\n  analysis: Evaluated on in-distribution robotics skills, and outperforms its predecessor\n    RT-1 by 50% in emergent skill evaluations.\n  created_date: 2023-10-03\n  dependencies:\n  - Open X-Embodiment dataset\n  - ImageNet EfficientNet\n  - USE\n  description: RT-1-X is a model trained on the Open X-Embodiment dataset that exhibits\n    better generalization and new capabilities compared to its predecessor RT-1, an\n    efficient Transformer-based architecture designed for robotic control.\n  feedback: none\n  intended_uses: Further research on X-embodiment models.\n  license: Apache 2.0\n  modality: images, text; robot trajectories\n  model_card: none\n  monitoring: unknown\n  name: RT-1-X\n  nationality: unknown\n  organization: Open X-Embodiment, Google Deepmind\n  prohibited_uses: none\n  quality_control: unknown\n  size: 35M parameters 
(dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://robotics-transformer-x.github.io/\n- access: closed\n  analysis: Evaluated on in-distribution robotics skills, and outperforms its predecessor\n    RT-2 by 3x in emergent skill evaluations.\n  created_date: 2023-10-03\n  dependencies:\n  - Open X-Embodiment dataset\n  - ViT (unknown size)\n  - UL2\n  description: RT-2-X is a model trained on the Open X-Embodiment dataset that exhibits\n    better generalization and new capabilities compared to its predecessor RT-2, a\n    large vision-language model co-fine-tuned to output robot actions as natural language\n    tokens.\n  feedback: none\n  intended_uses: Further research on X-embodiment models.\n  license: unknown\n  modality: images, text, robot trajectories; robot trajectories\n  model_card: none\n  monitoring: unknown\n  name: RT-2-X\n  nationality: unknown\n  organization: Open X-Embodiment, Google Deepmind\n  prohibited_uses: none\n  quality_control: unknown\n  size: 55B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://robotics-transformer-x.github.io/\n- access:\n    explanation: we are sharing the GPT-4o System Card, which includes our Preparedness\n      Framework evaluations.\n    value: limited\n  analysis: GPT-4o underwent evaluations that included the Preparedness Framework,\n    external red teaming, and third-party assessments to ensure safe and aligned deployment.\n    The evaluations focused on identifying and mitigating potential risks across its\n    capabilities, especially speech-to-speech functionality.\n  created_date: 2024-08-08\n  dependencies:\n  - Shutterstock\n  description: GPT-4o is an autoregressive omni model that accepts a combination of\n    text, audio, image, and video as input and produces any combination of text, audio,\n    and image outputs. 
It is trained end-to-end across text, vision, and audio, focusing\n    on multimodal capabilities.\n  feedback: unknown\n  intended_uses: Use in multimodal applications requiring understanding and generation\n    of combinations of text, audio, and image outputs, better performance on non-English\n    languages, and enhanced vision and audio understanding.\n  license: unknown\n  modality:\n    explanation: '...accepts as input any combination of text, audio, image, and video\n      and generates any combination of text, audio, and image outputs.'\n    value: text, audio, image, video; text, audio, image\n  model_card: unknown\n  monitoring: Continuous monitoring and enforcement, providing moderation tools and\n    transparency reports, and gathering feedback from users.\n  name: GPT-4o\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: Uses that could involve bias, discrimination, harmful content,\n    or violation of usage policies.\n  quality_control: Quality and safety measures included prior risk assessments, post-training\n    mitigation, moderation tools, advanced data filtering, and external red teaming\n    efforts with experts to evaluate potential risks like bias, discrimination, and\n    information harms.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2410.21276\n- access: limited\n  analysis: Evaluated on challenging benchmark tasks in physics, chemistry, and biology.\n    In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o\n    correctly solved only 13% of problems, while the reasoning model o1 scored 83%.\n  created_date: 2024-09-12\n  dependencies: []\n  description: OpenAI o1 is a new series of AI models designed to spend more time\n    thinking before they respond. 
They can reason through complex tasks and solve\n    harder problems than previous models in science, coding, and math.\n  feedback: unknown\n  intended_uses: \"These enhanced reasoning capabilities may be particularly useful\\\n    \\ if you\\u2019re tackling complex problems in science, coding, math, and similar\\\n    \\ fields. For example, o1 can be used by healthcare researchers to annotate cell\\\n    \\ sequencing data, by physicists to generate complicated mathematical formulas\\\n    \\ needed for quantum optics, and by developers in all fields to build and execute\\\n    \\ multi-step workflows.\"\n  license: unknown\n  modality: text; text\n  model_card: unknown\n  monitoring: ''\n  name: o1\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: \"To match the new capabilities of these models, OpenAI has bolstered\\\n    \\ safety work, internal governance, and federal government collaboration. This\\\n    \\ includes rigorous testing and evaluations using their Preparedness Framework,\\\n    \\ best-in-class red teaming, and board-level review processes,\\\n    \\ including by OpenAI's Safety & Security Committee.\"\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://openai.com/o1/\n- access: limited\n  analysis: Makes significant progress on the ARC-AGI evaluation framework compared\n    to all existing models.\n  created_date: 2024-10-20\n  dependencies: []\n  description: OpenAI o3 is, as of release, the latest model in OpenAI's o-model reasoning\n    series.\n  feedback: unknown\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: unknown\n  monitoring: ''\n  name: o3\n  nationality: USA\n  organization: OpenAI\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: 
https://x.com/OpenAI/status/1870186518230511844\n"
  },
  {
    "path": "assets/orion.yaml",
    "content": "- access: open\n  analysis: Evaluated on multilingual and NLP benchmarks in comparison with SoTA models\n    of comparable size.\n  created_date: 2024-01-20\n  dependencies: []\n  description: Orion series models are open-source multilingual large language models\n    trained from scratch by OrionStarAI.\n  feedback: https://huggingface.co/OrionStarAI/Orion-14B-Base/discussions\n  intended_uses: ''\n  license:\n    explanation: Model license can be found at https://github.com/OrionStarAI/Orion/blob/master/ModelsCommunityLicenseAgreement.\n      Code license is under Apache 2.0\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/OrionStarAI/Orion-14B-Base\n  monitoring: unknown\n  name: Orion\n  nationality: China\n  organization: OrionStarAI\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 14B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/OrionStarAI/Orion\n"
  },
  {
    "path": "assets/osu.yaml",
    "content": "- access: open\n  analysis: Evaluated on MATH, a competition-level dataset, and achieves a 46% accuracy,\n    higher than accuracy produced by GPT-4's chain of thought.\n  created_date: 2023-09-11\n  dependencies:\n  - MathInstruct\n  - LLaMA\n  - Code LLaMA\n  description: MAmmoTH is a series of open-source large language models (LLMs) specifically\n    tailored for general math problem-solving.\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: MAmmoTH\n  nationality: USA\n  organization: Ohio State University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 34B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2309.05653.pdf\n"
  },
  {
    "path": "assets/othersideai.yaml",
    "content": "- access: limited\n  adaptation: unknown\n  created_date: unknown\n  dependencies:\n  - OpenAI API\n  description: 'HyperWrite is a writing assistant that generates text based on a user''s\n    request, as well as style and tone choices.\n\n    '\n  failures: unknown\n  feedback: unknown\n  intended_uses: 'HyperWrite is intended to be used as a writing assistant.\n\n    '\n  license:\n    explanation: '\"The Software, including software embedded in the Software, is licensed,\n      not sold, to you by Otherside only under the terms of this Agreement; and Otherside\n      reserves all rights not expressly granted under this Agreemen\" - excerpt from\n      the Terms of Service.\n\n      '\n    value: custom\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: HyperWrite\n  nationality: USA\n  organization: OthersideAI\n  output_space: Generation\n  prohibited_uses: unknown\n  quality_control:\n    explanation: 'There is a HyperWrite blog warning the users about the risks of\n      writing with an AI assistant, but the company doesn''t list the quality control\n      measures taken to prevent the listed risks, if any [[HyperWrite Blog]] (https://blog.hyperwrite.ai/what-are-the-risks-of-ai-writing/).\n\n      '\n    value: unknown\n  terms_of_service: https://hyperwriteai.com/terms\n  type: application\n  url: https://hyperwriteai.com/\n  user_distribution: unknown\n"
  },
  {
    "path": "assets/oxford.yaml",
    "content": "- access:\n    explanation: The captions and video url's can be found at the [[dataset page]](https://m-bain.github.io/webvid-dataset/).\n      CLIP ViT-B/32 features of this dataset can be found at the [[WebVid github repository]](https://github.com/m-bain/webvid#download-clip-features-%EF%B8%8F).\n    value: open\n  analysis: unknown\n  created_date:\n    explanation: 'The date the [[paper]](https://arxiv.org/abs/2104.00650) was released.\n\n      '\n    value: 2021-04-01\n  datasheet: none\n  dependencies: []\n  description: 'WebVid-10M is a large-scale dataset of short videos with textual descriptions\n    sourced from stock footage sites.\n\n    '\n  excluded: unknown\n  feedback: Feedback can be given by emailing at maxbain at robots.ox.ac.uk\n  included:\n    explanation: Authors note in the paper:\"We scrape the web for a new dataset of\n      videos with textual description annotations, called WebVid-2M. The data was\n      scraped from the web following a similar procedure to Google Conceptual Captions\"\n    value: Video URLs and textual description annotations\n  intended_uses:\n    explanation: See [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md)\n    value: '\"Provided you keep to these rules, the University grants you (the researcher)\n      a non-exclusive and non-transferable licence to use the content free of charge\n      strictly for non-commercial research (i.e., whose output artefacts are not incorporated\n      in commercial products) for 12 months.\"\n\n      '\n  license:\n    explanation: '[WebVid Dataset Terms](https://github.com/m-bain/webvid/blob/main/TERMS.md)\n\n      '\n    value: WebVid Dataset Terms\n  modality: text, video\n  monitoring: ''\n  name: WebVid-10M\n  nationality: UK\n  organization: University of Oxford\n  prohibited_uses: '- Authors note the following prohibited uses: \"You must not use\n    the content other than for the Permitted Purpose in strict conformity with these\n    
terms and any other reasonable instructions of the University. You must not, except\n    as may be strictly necessary for carrying out the Permitted Purpose, provide or\n    otherwise make available content to any third party or allow use of it or them\n    by or on behalf of any third party, in whole or in part, whether by way of sale,\n    resale, loan, transfer, hire or any other form of exploitation; or attempt to\n    identify any living or deceased individual from the content.\" [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md)\n\n    - Authors also note the following limitations of the dataset: \"We note that data\n    sourced from the web may be prone to biases and may contain graphic content. Please\n    be careful of unintended societal, gender, racial and other biases when training\n    or deploying models trained on this data.\" [[Disclaimer]](https://github.com/m-bain/webvid#disclaimer-%EF%B8%8F)\n\n    '\n  quality_control: unknown\n  sample: []\n  size: 10.7M video-text pairs, 52K hours video\n  type: dataset\n  url: https://m-bain.github.io/webvid-dataset/\n- access:\n    explanation: The captions and video url's can be found at the [[dataset page]](https://m-bain.github.io/webvid-dataset/).\n      CLIP ViT-B/32 features of this dataset can be found at the [[WebVid github repository]](https://github.com/m-bain/webvid#download-clip-features-%EF%B8%8F).\n    value: open\n  analysis: unknown\n  created_date:\n    explanation: 'The date the [[paper]](https://arxiv.org/abs/2104.00650) was released.\n\n      '\n    value: 2021-04-01\n  datasheet: none\n  dependencies:\n  - WebVid-10M\n  description: 'WebVid-2M is a large-scale dataset of 2.5M short videos with textual\n    descriptions sourced from stock footage sites. 
A subset of the WebVid-10M dataset.\n\n    '\n  excluded: unknown\n  feedback: Feedback can be given by emailing at maxbain at robots.ox.ac.uk\n  included:\n    explanation: Authors note in the paper:\"We scrape the web for a new dataset of\n      videos with textual description annotations, called WebVid-2M. The data was\n      scraped from the web following a similar procedure to Google Conceptual Captions\"\n    value: Video URLs and textual description annotations\n  intended_uses:\n    explanation: See [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md)\n    value: '\"Provided you keep to these rules, the University grants you (the researcher)\n      a non-exclusive and non-transferable licence to use the content free of charge\n      strictly for non-commercial research (i.e., whose output artefacts are not incorporated\n      in commercial products) for 12 months.\"\n\n      '\n  license:\n    explanation: '[WebVid Dataset Terms](https://github.com/m-bain/webvid/blob/main/TERMS.md)\n\n      '\n    value: WebVid Dataset Terms\n  modality: text, video\n  monitoring: ''\n  name: WebVid-2M\n  nationality: UK\n  organization: University of Oxford\n  prohibited_uses: '- Authors note the following prohibited uses: \"You must not use\n    the content other than for the Permitted Purpose in strict conformity with these\n    terms and any other reasonable instructions of the University. 
You must not, except\n    as may be strictly necessary for carrying out the Permitted Purpose, provide or\n    otherwise make available content to any third party or allow use of it or them\n    by or on behalf of any third party, in whole or in part, whether by way of sale,\n    resale, loan, transfer, hire or any other form of exploitation; or attempt to\n    identify any living or deceased individual from the content.\" [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md)\n\n    - Authors also note the following limitations of the dataset: \"We note that data\n    sourced from the web may be prone to biases and may contain graphic content. Please\n    be careful of unintended societal, gender, racial and other biases when training\n    or deploying models trained on this data.\" [[Disclaimer]](https://github.com/m-bain/webvid#disclaimer-%EF%B8%8F)\n\n    '\n  quality_control: unknown\n  sample: []\n  size: 2.5M video-text pairs, 13K hours video\n  type: dataset\n  url: https://m-bain.github.io/webvid-dataset/\n"
  },
  {
    "path": "assets/paladin.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-02-01\n  dependencies: []\n  description: UnderwriteGPT is the world's first generative AI underwriting tool.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  monitoring: ''\n  monthly_active_users: ''\n  name: UnderwriteGPT\n  nationality: USA\n  organization: Paladin Group and Dais Technology\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://dais.com/underwritegpt/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/peking.yaml",
    "content": "- access: open\n  analysis: Compared with other multi-task, instruction-following agents.\n  created_date: 2023-11-10\n  dependencies: []\n  description: JARVIS-1 is an open-world agent that can perceive multimodal input\n    (visual observations and human instructions), generate sophisticated plans, and\n    perform embodied control, all within the popular yet challenging open-world Minecraft\n    universe.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; in-game actions\n  model_card: none\n  monitoring: none\n  name: JARVIS-1\n  nationality: China\n  organization: Peking University Institute for Artificial Intelligence\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2311.05997.pdf\n"
  },
  {
    "path": "assets/perplexity.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date:\n    explanation: Perplexity Ask was announced in a [[Twitter post]](https://twitter.com/perplexity_ai/status/1600551871554338816?s=20)\n      by the company.\n    value: 2022-12-07\n  dependencies:\n  - GPT-3.5\n  - Bing Search\n  description: Perplexity Ask is a new search interface that uses advanced artificial\n    intelligence technologies\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: none\n  monitoring: ''\n  monthly_active_users: ''\n  name: Perplexity Ask\n  nationality: USA\n  organization: Perplexity\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.perplexity.ai/\n  user_distribution: ''\n- access: closed\n  adaptation: ''\n  created_date: 2022-12-15\n  dependencies:\n  - Perplexity Ask\n  - OpenAI API\n  description: Twitter search interface that is powered by Perplexity's structured\n    search engine.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: none\n  monitoring: ''\n  monthly_active_users: ''\n  name: Bird SQL\n  nationality: USA\n  organization: Perplexity\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.perplexity.ai/sql\n  user_distribution: ''\n- access: open\n  adaptation: ''\n  created_date: 2023-10-27\n  dependencies: []\n  description: Perplexity chat is an AI chatbot trained in-house by Perplexity.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: none\n  monitoring: ''\n  monthly_active_users: ''\n  name: Perplexity Chat\n  nationality: USA\n  organization: Perplexity\n  output_space: Chatbot output in response to user queries\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://blog.perplexity.ai/legal/terms-of-service\n  type: application\n  url: https://labs.perplexity.ai/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/pleias.yaml",
    "content": "- access: open\n  analysis: unknown\n  created_date: 2024-03-20\n  datasheet: ''\n  dependencies: []\n  description: Common Corpus is the largest public domain dataset released for training\n    Large Language Models (LLMs). This dataset includes 500 billion words from a diverse\n    range of cultural heritage initiatives and is the largest corpus in English, French,\n    Dutch, Spanish, German and Italian. It supports efforts to train fully open LLMs\n    on sources without copyright concerns.\n  excluded: The data excluded are those that have copyright issues.\n  feedback: unknown\n  included: The dataset includes 500 billion words from a wide diversity of cultural\n    heritage initiatives. It also has the largest English-speaking dataset to date\n    with 180 billion words, including a major US collection of 21 million digitized\n    newspapers and large monographs datasets collected by digital historian Sebastian\n    Majstorovic. It also contains a huge volume of data in French (110 billion words),\n    German (30 billion words), Spanish, Dutch and Italian, as well as data in low-resource\n    languages that are currently underrepresented.\n  intended_uses: The dataset is intended to support open and reproducible AI research,\n    enhancing accessibility, diversity, and democracy in AI by enabling everyone to\n    explore large models.\n  license: none\n  modality: text\n  monitoring: unknown\n  name: Common Corpus\n  nationality: USA\n  organization: Pleias\n  prohibited_uses: It should not be used for tasks that infringe on copyright laws.\n  quality_control: All data included in the corpus are from fully open and auditable\n    sources, ensuring they are copyright-free.\n  sample: []\n  size: 500 billion words\n  type: dataset\n  url: https://huggingface.co/blog/Pclanglais/common-corpus\n"
  },
  {
    "path": "assets/portkey.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: 2023-05-06\n  dependencies: []\n  description: Portkey is a hosted middleware that allows users to create generative\n    AI applications\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  monitoring: ''\n  monthly_active_users: ''\n  name: Portkey\n  nationality: USA\n  organization: Portkey\n  output_space: generative AI apps\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://portkey.ai/terms\n  type: application\n  url: https://portkey.ai/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/quizlet.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: 2023-03-01\n  dependencies:\n  - ChatGPT API\n  description: Quizlet is introducing Q-Chat, a fully-adaptive AI tutor that engages\n    students with adaptive questions based on relevant study materials delivered through\n    a fun chat experience.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: none\n  monitoring: ''\n  monthly_active_users: ''\n  name: Q-Chat\n  nationality: USA\n  organization: Quizlet\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://quizlet.com/tos\n  type: application\n  url: https://openai.com/blog/introducing-chatgpt-and-whisper-apis\n  user_distribution: ''\n"
  },
  {
    "path": "assets/quora.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-02-03\n  dependencies:\n  - ChatGPT API\n  - GPT-4 API\n  - Claude API\n  - Dragonfly API\n  - Sage API\n  description: Poe lets people ask questions, get instant answers, and have back-and-forth\n    conversations with several AI-powered bots. It is initially available on iOS,\n    but we will be adding support for all major platforms in the next few months,\n    along with more bots.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: none\n  monitoring: ''\n  monthly_active_users: ''\n  name: Poe\n  nationality: USA\n  organization: Quora\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://poe.com/tos\n  type: application\n  url: https://quorablog.quora.com/Poe-1\n  user_distribution: ''\n"
  },
  {
    "path": "assets/qwen_team.yaml",
    "content": "- access: open\n  analysis: Models have been evaluated on a series of math benchmarks, demonstrating\n    outperformance of the state-of-the-art models in both the English and Chinese\n    language.\n  created_date: 2024-08-08\n  dependencies: []\n  description: Qwen2-Math is a series of specialized math language models built upon\n    the Qwen2 large language models, with a focus on enhancing the reasoning and mathematical\n    capabilities. Their intended use is for solving complex mathematical problems.\n    They significantly outperform both open-source and closed-source models in terms\n    of mathematical capabilities.\n  feedback: Problems with the model should be reported to the Qwen Team via their\n    official channels.\n  intended_uses: These models are intended for solving complex mathematical problems.\n  license: Tongyi Qianwen\n  modality: text; text\n  model_card: https://huggingface.co/Qwen/Qwen2-Math-72B\n  monitoring: Unknown\n  name: Qwen2-Math\n  nationality: China\n  organization: Qwen Team\n  prohibited_uses: Uses that go against the ethical usage policies of Qwen Team.\n  quality_control: The models were tested with few-shot chain-of-thought prompting\n    and evaluated across mathematical benchmarks in both English and Chinese.\n  size: 72B parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://qwenlm.github.io/blog/qwen2-math/\n"
  },
  {
    "path": "assets/rakuten.yaml",
    "content": "- access: open\n  analysis: RakutenAI achieves the highest average score in both Japanese and English\n    LM-Harness metrics, outperforming other similarly-sized Japanese language models.\n  created_date: 2024-03-21\n  dependencies:\n  - Mistral\n  description: RakutenAI-7B is a model developed with a focus on Japanese language\n    understanding. It offers competitive performance on English tests as well.\n  feedback: https://huggingface.co/Rakuten/RakutenAI-7B/discussions\n  intended_uses: The model can be used for text generation tasks in both Japanese\n    and English.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/Rakuten/RakutenAI-7B\n  monitoring: unknown\n  name: RakutenAI\n  nationality: Japan\n  organization: Rakuten\n  prohibited_uses: unknown\n  quality_control: unknown\n  size: 7B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://global.rakuten.com/corp/news/press/2024/0321_01.html\n"
  },
  {
    "path": "assets/reexpress.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-03-21\n  dependencies: []\n  description: Reexpress One offers a means of document classification, semantic search,\n    and uncertainty analysis on-device.\n  failures: unknown\n  feedback: https://github.com/ReexpressAI/support\n  intended_uses: ''\n  license: unknown\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Reexpress One\n  nationality: USA\n  organization: Reexpress AI\n  output_space: data analyses\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: hhttps://re.express/tos.html\n  type: application\n  url: https://re.express/index.html\n  user_distribution: unknown\n"
  },
  {
    "path": "assets/reka.yaml",
    "content": "- access: limited\n  analysis: Evaluated on MMLU, GSM8K, HumanEval, and GPQA benchmarks, among others.\n  created_date: 2024-02-12\n  dependencies: []\n  description: Reka Flash is a multimodal, multilingual, state-of-the-art 21B model\n    trained entirely from scratch.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: image, text, video; text\n  model_card: none\n  monitoring: unknown\n  name: Reka Flash\n  nationality: USA\n  organization: Reka\n  prohibited_uses: ''\n  quality_control: ''\n  size: 21B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://reka.ai/reka-flash-an-efficient-and-capable-multimodal-language-model/\n- access: limited\n  analysis: Reka Core was evaluated against leading models such as OpenAIs GPT-4,\n    Claude-3 Opus, and Gemini Ultra on a variety of tasks and metrics including multimodal\n    and human evaluation conducted by a third party. It was found to be competitive\n    or even surpassing these models.\n  created_date: 2024-04-15\n  dependencies: []\n  description: Reka Core is a frontier-class multimodal language model comparable\n    to industry leaders. 
It has powerful capabilities including multimodal understanding\n    (including images, videos, and audio), superb reasoning abilities, code generation,\n    and multilinguality with proficiency in 32 languages.\n  feedback: unknown\n  intended_uses: Reka Core can be used in e-commerce, social media, digital content\n    and video games, healthcare, robotics, and other industries for tasks that require\n    multimodal understanding, coding, complex reasoning, and more.\n  license: unknown\n  modality: audio, image, text, video; text\n  model_card: none\n  monitoring: unknown\n  name: Reka Core\n  nationality: USA\n  organization: Reka\n  prohibited_uses: unknown\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: thousands of GPUs\n  training_time: few months\n  type: model\n  url: https://www.reka.ai/news/reka-core-our-frontier-class-multimodal-language-model\n"
  },
  {
    "path": "assets/robin.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - Anthropic API\n  description: Robin AI uses Claude and Anthropic's models to understand language\n    - including in technical domains like legal language. It's also very confident\n    at drafting, summarising, translations, and explaining complex concepts in simple\n    terms\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: none\n  monitoring: ''\n  monthly_active_users: ''\n  name: Robin AI\n  nationality: USA\n  organization: Robin AI\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.robinai.co.uk/terms\n  type: application\n  url: https://www.robinai.co.uk/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/runway_ai,_inc..yaml",
    "content": "- access: open\n  analysis: Unknown\n  created_date: 2024-06-17\n  dependencies: []\n  description: Gen-3 Alpha is a foundation model trained for large-scale multimodal\n    tasks. It is a major improvement in fidelity, consistency, and motion over the\n    previous generation, Gen-2. Gen-3 Alpha can power various tools, such as Text\n    to Video, Image to Video, and Text to Image. The model excels at generating expressive\n    human characters with a wide range of actions, gestures, and emotions, and is\n    capable of interpreting a wide range of styles and cinematic terminology. It is\n    also a step towards building General World Models. It has been designed for use\n    by research scientists, engineers, and artists, and can be fine-tuned for customization\n    according to specific stylistic and narrative requirements.\n  feedback: Companies interested in fine-tuning and custom models can reach out to\n    Runway AI, Inc. using a form on their website.\n  intended_uses: Can be used to create expressive human characters, interpret a wide\n    range of styles and cinematic terminology, and power tools for Text to Video,\n    Image to Video, and Text to Image tasks.\n  license: Terms of Use listed on Runway AI, Inc.'s website, specific license unknown\n  modality: text, image, video; video\n  model_card: unknown\n  monitoring: The model includes a new and improved in-house visual moderation system.\n  name: Gen-3 Alpha\n  nationality: USA\n  organization: Runway AI, Inc.\n  prohibited_uses: Unknown\n  quality_control: It will be released with a set of new safeguards, including an\n    improved in-house visual moderation system and C2PA provenance standards.\n  size: Unknown\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://runwayml.com/research/introducing-gen-3-alpha?utm_source=xinquji\n"
  },
  {
    "path": "assets/rwkv.yaml",
    "content": "- access: open\n  analysis: none\n  created_date: 2023-05-03\n  dependencies: []\n  description: RWKV World 4 is an RNN with GPT-level LLM performance, which can also\n    be directly trained like a GPT transformer (parallelizable).\n  feedback: https://huggingface.co/RWKV/rwkv-4-world-7b/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/RWKV/rwkv-4-world-7b\n  monitoring: unknown\n  name: RWKV World 4\n  nationality: China\n  organization: RWKV\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/RWKV/rwkv-4-world-7b\n- access: open\n  analysis: none\n  created_date: 2023-05-15\n  dependencies: []\n  description: RWKV 4 Pile is an RNN with GPT-level LLM performance, which can also\n    be directly trained like a GPT transformer (parallelizable).\n  feedback: https://huggingface.co/RWKV/rwkv-4-14b-pile/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/RWKV/rwkv-4-14b-pile\n  monitoring: unknown\n  name: RWKV 4 Pile\n  nationality: China\n  organization: RWKV\n  prohibited_uses: ''\n  quality_control: ''\n  size: 14B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/RWKV/rwkv-4-14b-pile\n- access: open\n  analysis: none\n  created_date: 2023-12-16\n  dependencies: []\n  description: RWKV World 5 is an RNN with GPT-level LLM performance, which can also\n    be directly trained like a GPT transformer (parallelizable).\n  feedback: https://huggingface.co/RWKV/rwkv-5-world-3b/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/RWKV/rwkv-5-world-3b\n  monitoring: unknown\n  name: RWKV World 5\n  
nationality: China\n  organization: RWKV\n  prohibited_uses: ''\n  quality_control: ''\n  size: 3B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/RWKV/rwkv-5-world-3b\n"
  },
  {
    "path": "assets/salesforce.yaml",
    "content": "- access:\n    explanation: Model weights are available for download with links in the [[GitHub\n      repo]](https://github.com/salesforce/CodeGen)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: Date model paper was released\n    value: 2022-03-25\n  dependencies: []\n  description: CodeGen is a language model for code\n  feedback: ''\n  intended_uses: ''\n  license: none (model weights), BSD-3-Clause (code)\n  modality: code, text; code, text\n  model_card: ''\n  monitoring: ''\n  name: CodeGen\n  nationality: USA\n  organization: Salesforce\n  prohibited_uses: ''\n  quality_control: ''\n  size: 16B parameters (dense)\n  training_emissions: ''\n  training_hardware: Unspecified Salesforce Compute (TPU-V4s)\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2203.13474\n- access: open\n  analysis: ''\n  created_date: 2022-01-28\n  dependencies:\n  - ViT-B\n  - BERT\n  - COCO\n  - Visual Genome\n  - Conceptual Captions\n  - Conceptual 12M\n  - SBU Captions\n  - LAION-115M\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: BSD-3-Clause\n  modality: text; image\n  model_card: ''\n  monitoring: ''\n  name: BLIP\n  nationality: USA\n  organization: Salesforce\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2201.12086\n- access: open\n  analysis: ''\n  created_date: 2022-01-28\n  datasheet: ''\n  dependencies:\n  - LAION-400M\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: BSD-3-Clause\n  modality: image, text\n  monitoring: ''\n  name: LAION-115M\n  nationality: USA\n  organization: Salesforce\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 115M image-text pairs\n  type: dataset\n  url: https://arxiv.org/abs/2201.12086\n- access: limited\n  adaptation: ''\n  created_date: 2023-03-07\n  dependencies:\n  
- ChatGPT API\n  description: EinsteinGPT is generative AI for customer relationship management (CRM).\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: EinsteinGPT\n  nationality: USA\n  organization: Salesforce\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://www.salesforce.com/products/einstein/overview/?d=cta-body-promo-8\n  user_distribution: ''\n- access: open\n  analysis: BLIP-2 achieves state-of-the-art performance on various vision-language\n    tasks, despite having significantly fewer trainable parameters than existing methods\n  created_date: 2023-01-30\n  dependencies:\n  - OPT\n  description: BLIP-2 is a model that employs a generic and efficient pretraining\n    strategy that bootstraps vision-language pre-training from off-the-shelf frozen\n    pre-trained image encoders and frozen large language models.\n  feedback: https://huggingface.co/Salesforce/blip2-opt-2.7b/discussions\n  intended_uses: ''\n  license: MIT\n  modality: text; image\n  model_card: https://huggingface.co/Salesforce/blip2-opt-2.7b\n  monitoring: ''\n  name: BLIP-2\n  nationality: USA\n  organization: Salesforce\n  prohibited_uses: ''\n  quality_control: ''\n  size: 2.7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: less than 9 days\n  type: model\n  url: https://arxiv.org/pdf/2301.12597.pdf\n- access: open\n  analysis: Moirai has undergone a comprehensive evaluation in both in-distribution\n    and out-of-distribution settings. It demonstrated its capabilities as a zero-shot\n    forecaster and delivered competitive or superior performance compared to full-shot\n    models.\n  created_date: 2024-03-19\n  dependencies:\n  - LOTSA\n  description: Moirai is a cutting-edge time series foundation model, offering universal\n    forecasting capabilities. 
It stands out as a versatile time series forecasting\n    model capable of addressing diverse forecasting tasks across multiple domains,\n    frequencies, and variables in a zero-shot manner.\n  feedback: https://huggingface.co/Salesforce/moirai-1.0-R-large/discussions\n  intended_uses: Moirai can be used for time series forecasting in multiple domains.\n    It offers robust zero-shot forecasting capabilities and eliminates the need for\n    additional data, extensive computational resources, and expert input for achieving\n    accurate forecasts.\n  license: Apache 2.0\n  modality: time-series; time-series\n  model_card: https://huggingface.co/Salesforce/moirai-1.0-R-large\n  monitoring: unknown\n  name: Moirai\n  nationality: USA\n  organization: Salesforce\n  prohibited_uses: unknown\n  quality_control: The performance of Moirai was evaluated through in-distribution\n    and out-of-distribution settings.\n  size: 311M parameters\n  training_emissions: unknown\n  training_hardware: NVIDIA A100 40G GPUs\n  training_time: unknown\n  type: model\n  url: https://blog.salesforceairesearch.com/moirai/\n- access: open\n  analysis: ''\n  created_date: 2024-03-19\n  datasheet: https://huggingface.co/datasets/Salesforce/lotsa_data\n  dependencies: []\n  description: LOTSA is the largest collection of open time series datasets with 27B\n    observations across nine domains.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/Salesforce/lotsa_data/discussions\n  included: domain observations in energy, transport, climate, cloudops, web, sales,\n    nature, econ/finance, and healthcare\n  intended_uses: pre-training Large Time Series Models\n  license: Apache 2.0\n  modality: time-series\n  monitoring: ''\n  name: LOTSA\n  nationality: USA\n  organization: Salesforce\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 27B observations\n  type: dataset\n  url: https://arxiv.org/pdf/2402.02592.pdf\n"
  },
  {
    "path": "assets/samba.yaml",
    "content": "- access: open\n  analysis: Evaluated on open source multilingual model benchmarks.\n  created_date: 2024-02-26\n  dependencies:\n  - Llama 2\n  description: SambaLingo is a suite of models that adapt Llama 2 to a diverse set\n    of 9 languages.\n  feedback:\n    explanation: The Arabic language model feedback is given, but there exists one\n      for each of the other 8 languages in the collection.\n    value: https://huggingface.co/sambanovasystems/SambaLingo-Arabic-Base/discussions\n  intended_uses: ''\n  license: LLaMA 2\n  modality: text; text\n  model_card:\n    explanation: The Arabic language model card is given, but there exists one for\n      each of the other 8 languages in the collection.\n    value: https://huggingface.co/sambanovasystems/SambaLingo-Arabic-Base\n  monitoring: ''\n  name: SambaLingo\n  nationality: USA\n  organization: Samba Nova Systems\n  prohibited_uses: SambaLingo should not be used for mission-critical applications,\n    applications involving the safety of others, and highly critical decisions.\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: 
https://sambanova.ai/blog/samba-1-composition-of-experts-mode\n"
  },
  {
    "path": "assets/sana.yaml",
    "content": "- access: limited\n  adaptation: 'Customized GPT-3, fine-tuned on private data [[Sana GPT-3 Demo]](https://gpt3demo.com/apps/sanalabs).\n\n    '\n  created_date:\n    explanation: 'The company was founded in 2016 according to [[Crunchbase]](https://www.crunchbase.com/organization/sana-2).\n      It is unclear when Sana adapted the OpenAI API to their products.\n\n      '\n    value: unknown\n  dependencies:\n  - OpenAI API\n  description: '\"Sana is your all-in-one, AI-assisted, online learning platform (LMS).\n    Author employee training courses and measure team development with Sana''s powerful\n    analytics. Sana partners with the world''s most important organizations and fastest-growing\n    startups to make personalized, adaptive learning available for everyone, everywhere\"\n    [[Sana GPT-3 Demo]](https://gpt3demo.com/apps/sanalabs).\n\n    '\n  failures: unknown\n  feedback: unknown\n  intended_uses: 'Sana is intended to be used by employers to provide a learning service\n    for their employees.\n\n    '\n  license:\n    explanation: \"\\\"Sana Labs grants Subscriber a limited non-exclusive, non-transferable,\\\n      \\ non-sublicensable license to use Sana Labs' web-based personalized and collaborative\\\n      \\ learning platform (\\u201CSana Platform\\u201D) solely in connection with the\\\n      \\ subscribed Services in accordance with the Agreement.\\\" - excerpt from the\\\n      \\ Terms of Service.\\n\"\n    value: custom\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Sana\n  nationality: USA\n  organization: Sana\n  output_space: question and answer, summarization, sentiment analysis, topic identification\n  prohibited_uses:\n    explanation: '\"Sana provides an extensive set of legal documents [[Sana Legal]](https://www.sanalabs.com/legal/),\n      but missing from the documents are prohibited uses of the Sana platform, beyond\n      a mention of what is impermissible under relevant law.\" - excerpt from the 
Terms\n      of Service document.\n\n      '\n    value: none\n  quality_control: unknown\n  terms_of_service: https://www.sanalabs.com/legal/\n  type: application\n  url: https://www.sanalabs.com/\n  user_distribution: unknown\n"
  },
  {
    "path": "assets/sciphi.yaml",
    "content": "- access: open\n  analysis: none\n  created_date: 2023-11-07\n  dependencies:\n  - Mistral\n  description: SciPhi Mistral is a Large Language Model (LLM) fine-tuned from Mistral.\n  feedback: https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k/discussions\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k\n  monitoring: unknown\n  name: SciPhi Mistral\n  nationality: USA\n  organization: SciPhi\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k\n"
  },
  {
    "path": "assets/shanghai.yaml",
    "content": "- access: open\n  analysis: ''\n  created_date: 2022-12-06\n  dependencies:\n  - Kinetics-400\n  - WebVid-2M\n  - WebVid-10M\n  - HowTo100M\n  - AVA\n  - Something-Something-v2\n  - Kinetics-710\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text, video; video\n  model_card: ''\n  monitoring: ''\n  name: InternVideo\n  nationality: China\n  organization: Shanghai AI Laboratory\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.3B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2212.03191.pdf\n- access: open\n  analysis: Evaluated based on own constructed dataset covering 433 languages.\n  created_date: 2023-05-29\n  dependencies:\n  - OPUS\n  description: Lego-MT is a multilingual large language model which uses a more efficient\n    approach of being an effective detachable model.\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Lego-MT\n  nationality: China\n  organization: Shanghai AI Laboratory\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.2B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 32 A100 GPUs\n  training_time: 15 days\n  type: model\n  url: https://arxiv.org/pdf/2212.10551.pdf\n- access: open\n  analysis: Evaluated on GSM8K and the competition-level MATH dataset.\n  created_date: 2023-10-05\n  dependencies:\n  - GPT-4\n  - LLaMA 2\n  description: MathCoder is a family of models capable of generating code-based solutions\n    for solving challenging math problems.\n  feedback: none\n  intended_uses: bridging the gap between natural language understanding and computational\n    problem-solving\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: none\n  name: MathCoder\n  nationality: China\n  organization: Shanghai AI Laboratory\n  prohibited_uses: none\n  
quality_control: none\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 32 NVIDIA A800 80GB GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2310.03731.pdf\n- access: open\n  analysis: Evaluated in comparison to LLaMA series models on standard benchmarks.\n  created_date: 2023-09-20\n  dependencies: []\n  description: InternLM is a high-quality language model proficient in English, Chinese,\n    and code.\n  feedback: https://huggingface.co/internlm/internlm-20b/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: code, text; code, text\n  model_card: https://huggingface.co/internlm/internlm-20b\n  monitoring: none\n  name: InternLM\n  nationality: China\n  organization: Shanghai AI Laboratory\n  prohibited_uses: ''\n  quality_control: ''\n  size: 20B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/InternLM/InternLM\n- access: open\n  analysis: Evaluated across a range of video-related tasks and compared to other\n    multimodal models like CLIP, VideoPrism, and VideoCoCa. 
InternVideo 2 generally\n    performs among the best of such models on these benchmarks.\n  created_date: 2024-03-22\n  dependencies:\n  - InternVL\n  - VideoMAEv2\n  - LAION\n  - WebVid\n  - InternVid\n  - LLaVA\n  - KMash\n  description: InternVideo2 is a new video foundation model (ViFM) that achieves the\n    state-of-the-art performance in action recognition, video-text tasks, and video-centric\n    dialogue.\n  feedback: none\n  intended_uses: ''\n  license: MIT\n  modality: text, video; text\n  model_card: none\n  monitoring: unknown\n  name: InternVideo2\n  nationality: China\n  organization: Shanghai AI Laboratory, Nanjing University, Zhejiang University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 6B parameters\n  training_emissions: unknown\n  training_hardware: 256 NVIDIA A100 GPUs for 32 days, and 64 GPUs for 3 days\n  training_time: 35 days\n  type: model\n  url: https://github.com/OpenGVLab/InternVideo2\n- access: open\n  analysis: The model was compared with SOTAs and has shown good performance in generating\n    high-quality human images.\n  created_date: 2024-04-01\n  dependencies:\n  - CosmicMan-HQ 1.0\n  description: CosmicMan is a text-to-image foundation model specialized for generating\n    high-fidelity human images with meticulous appearance, reasonable structure, and\n    precise text-image alignment.\n  feedback: unknown\n  intended_uses: The model is intended to generate high-quality, photorealistic human\n    images from text descriptions. 
Applications include avatar generation and potentially\n    virtual reality and video game character creation.\n  license: unknown\n  modality: text; image\n  model_card: none\n  monitoring: unknown\n  name: CosmicMan\n  nationality: China\n  organization: Shanghai AI Laboratory\n  prohibited_uses: unknown\n  quality_control: The quality control measures taken include modeling the relationship\n    between dense text descriptions and image pixels in a decomposed manner and enforcing\n    attention refocusing without adding extra modules.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: 32 80G NVIDIA A100 GPUs\n  training_time: 1 week\n  type: model\n  url: https://cosmicman-cvpr2024.github.io/\n- access: open\n  analysis: Compared to other human image datasets on data quantity, image quality,\n    and annotations.\n  created_date: 2024-04-28\n  datasheet: none\n  dependencies: []\n  description: CosmicMan-HQ 1.0 is a large-scale dataset with 6 million high-quality,\n    real-world human images.\n  excluded: ''\n  feedback: none\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: image\n  monitoring: unknown\n  name: CosmicMan-HQ 1.0\n  nationality: China\n  organization: Shanghai AI Laboratory\n  prohibited_uses: ''\n  quality_control: unknown\n  sample: []\n  size: 6 million images\n  type: dataset\n  url: https://arxiv.org/pdf/2404.01294\n"
  },
  {
    "path": "assets/shop.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: 2023-03-01\n  dependencies:\n  - ChatGPT API\n  description: \"When shoppers search for products, the shopping assistant makes personalized\\\n    \\ recommendations based on their requests. Shop\\u2019s new AI-powered shopping\\\n    \\ assistant will streamline in-app shopping by scanning millions of products to\\\n    \\ quickly find what buyers are looking for\\u2014or help them discover something\\\n    \\ new.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  monitoring: ''\n  monthly_active_users: ''\n  name: Shop Assistant\n  nationality: USA\n  organization: Shop\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://openai.com/blog/introducing-chatgpt-and-whisper-apis\n  user_distribution: ''\n"
  },
  {
    "path": "assets/singapore.yaml",
    "content": "- access: open\n  analysis: Performance assessed on BIG-bench arithmetic sub-task, and various elementary\n    arithmetic tasks.\n  created_date: 2023-05-23\n  dependencies:\n  - LLaMA\n  - GOAT dataset\n  description: GOAT is a fine-tuned LLaMA model which uses the tokenization of numbers\n    to significantly outperform benchmark standards on a range of arithmetic tasks.\n  feedback: ''\n  intended_uses: Integration into other instruction-tuned LLMs to further enhance\n    arithmetic reasoning abilities in solving math word problems.\n  license: Apache 2.0\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: GOAT\n  nationality: Singapore\n  organization: National University of Singapore\n  prohibited_uses: ''\n  quality_control: Number data is randomly generated from log space to reduce likelihood\n    of redundancy and range of magnitudes.\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 24 GB VRAM GPU\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2305.14201.pdf\n- access: open\n  analysis: Evaluated on relatively simple established benchmarks.\n  created_date: 2024-01-12\n  dependencies:\n  - RedPajama\n  - The Stack\n  description: OpenMoE is a series of fully open-sourced and reproducible decoder-only\n    MoE LLMs.\n  feedback: https://huggingface.co/OrionZheng/openmoe-base/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/OrionZheng/openmoe-base\n  monitoring: unknown\n  name: OpenMoE\n  nationality: unknown\n  organization: National University of Singapore, University of Edinburgh, ETH Zurich\n  prohibited_uses: ''\n  quality_control: unknown\n  size: 34B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/XueFuzhao/OpenMoE\n"
  },
  {
    "path": "assets/skt.yaml",
    "content": "- access: closed\n  analysis: none\n  created_date: 2023-09-26\n  dependencies: []\n  description: A.X is SK Telecom's proprietary LLM, which has been trained on the\n    Korean language.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: A.X\n  nationality: South Korea\n  organization: SK Telecom\n  prohibited_uses: ''\n  quality_control: ''\n  size: 39B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.sktelecom.com/en/press/press_detail.do?idx=1582\n"
  },
  {
    "path": "assets/snap.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: 2023-03-01\n  dependencies:\n  - ChatGPT API\n  description: My AI offers Snapchatters a friendly, customizable chatbot at their\n    fingertips that offers recommendations, and can even write a haiku for friends\n    in seconds. Snapchat, where communication and messaging is a daily behavior, has\n    750 million monthly Snapchatters.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: '\"Snap grants you a worldwide, royalty-free, non-assignable, non-exclusive,  revocable,\n      and non-sublicensable license to use the Services.\" - excerpt from the Terms\n      of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: My AI for Snapchat\n  nationality: USA\n  organization: Snap\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://snap.com/terms\n  type: application\n  url: https://openai.com/blog/introducing-chatgpt-and-whisper-apis\n  user_distribution: ''\n"
  },
  {
    "path": "assets/soochow.yaml",
    "content": "- access: open\n  analysis: Evaluated across different text benchmarks in English and Chinese.\n  created_date: 2023-10-01\n  dependencies: []\n  description: OpenBA is an open-sourced 15B bilingual (English + Chinese) asymmetric\n    seq2seq model.\n  feedback: https://huggingface.co/OpenBA/OpenBA-LM/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/OpenBA/OpenBA-LM\n  monitoring: none\n  name: OpenBA\n  nationality: China\n  organization: Soochow University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 15B parameters (dense)\n  training_emissions: 6.5 tCO2eq\n  training_hardware: 8 NVIDIA A100-80GB GPUs\n  training_time: 38k GPU hours\n  type: model\n  url: https://arxiv.org/pdf/2309.10706.pdf\n"
  },
  {
    "path": "assets/speak.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: 2023-03-01\n  dependencies:\n  - Whisper API\n  description: Speak is an AI-powered language learning app focused on building the\n    best path to spoken fluency and is the fastest-growing English app in South\n    Korea.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  monitoring: ''\n  monthly_active_users: ''\n  name: Speak\n  nationality: USA\n  organization: Speak\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://openai.com/blog/introducing-chatgpt-and-whisper-apis\n  user_distribution: ''\n"
  },
  {
    "path": "assets/spotify.yaml",
    "content": "- access: limited\n  adaptation: ''\n  created_date: 2023-02-23\n  dependencies:\n  - ChatGPT API\n  - Sonantic AI\n  description: \"The DJ is a personalized AI guide that knows you and your music taste\\\n    \\ so well that it can choose what to play for you. This feature, first rolling\\\n    \\ out in beta, will deliver a curated lineup of music alongside commentary around\\\n    \\ the tracks and artists we think you\\u2019ll like in a stunningly realistic voice.\"\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: '\"The Spotify software applications and the Content are licensed,\n      not sold or transferred to you, and Spotify and its licensors retain ownership\n      of all copies of the Spotify software applications and Content even after installation\n      on your Devices.\" - excerpt from the Terms of Service document.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: AI DJ\n  nationality: Sweden\n  organization: Spotify\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.spotify.com/us/legal/end-user-agreement/\n  type: application\n  url: https://newsroom.spotify.com/2023-02-22/spotify-debuts-a-new-ai-dj-right-in-your-pocket/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/stability.yaml",
    "content": "- access: open\n  analysis: Evaluated on the COCO dataset.\n  created_date: 2023-04-28\n  dependencies:\n  - LAION-5B\n  description: A text-to-image cascaded pixel diffusion model released in conjunction\n    with AI research lab DeepFloyd.\n  feedback: https://huggingface.co/DeepFloyd/IF-I-XL-v1.0/discussions\n  intended_uses: ''\n  license:\n    explanation: License agreement can be found on model card https://huggingface.co/DeepFloyd/IF-I-XL-v1.0\n    value: custom\n  modality: text; image\n  model_card: https://huggingface.co/DeepFloyd/IF-I-XL-v1.0\n  monitoring: ''\n  name: DeepFloyd IF\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 4.3B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://stability.ai/blog/deepfloyd-if-text-to-image-model\n- access: open\n  analysis: ''\n  created_date: 2023-04-20\n  dependencies:\n  - StableLM-Alpha dataset\n  - Alpaca dataset\n  - gpt4all dataset\n  - ShareGPT52K dataset\n  - Dolly dataset\n  - HH dataset\n  description: Large language models trained on up to 1.5 trillion tokens.\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: StableLM\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://github.com/Stability-AI/StableLM\n- access: open\n  adaptation: ''\n  created_date: 2022-08-22\n  dependencies: []\n  description: Stable Diffusion is a generative software that creates images from\n    text prompts.\n  failures: ''\n  feedback: https://huggingface.co/CompVis/stable-diffusion/discussions\n  intended_uses: ''\n  license:\n    explanation: License agreement can be found on 
https://github.com/CompVis/stable-diffusion/blob/main/LICENSE\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Stable Diffusion\n  nationality: UK\n  organization: Stability AI\n  output_space: image\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://stability.ai/blog/stable-diffusion-public-release\n  user_distribution: ''\n- access: open\n  adaptation: ''\n  created_date: 2023-07-26\n  dependencies: []\n  description: Stable Diffusion XL is an updated version of Stable Diffusion, and\n    creates descriptive images with shorter prompts and generate words within images.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: License can be found at https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md\n    value: Open Rail++\n  monitoring: ''\n  monthly_active_users: ''\n  name: Stable Diffusion XL\n  nationality: UK\n  organization: Stability AI\n  output_space: image\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://stability.ai/stablediffusion\n  user_distribution: ''\n- access: limited\n  analysis: Evaluated via a user study comparing preferences between Stable Video\n    Diffusion and competing text-to-video models.\n  created_date: 2023-11-21\n  dependencies:\n  - Large Video Dataset\n  description: Stable Video Diffusion is a latent diffusion model trained to generate\n    short video clips from an image conditioning.\n  feedback: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/discussions\n  intended_uses: Intended for research purposes only.\n  license:\n    explanation: License can be found at https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/blob/main/LICENSE\n    value: custom\n  modality: text; video\n  model_card: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt\n  monitoring: ''\n  name: Stable 
Video Diffusion\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: Using the model to generate representations of real-world people\n    or events.\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://static1.squarespace.com/static/6213c340453c3f502425776e/t/655ce779b9d47d342a93c890/1700587395994/stable_video_diffusion.pdf\n- access: closed\n  analysis: Large Video Dataset compared to publicly available research datasets on\n    general statistics before and after filtering.\n  created_date: 2023-11-21\n  datasheet: ''\n  dependencies:\n  - WebVid-10M\n  - CoCa\n  - V-BLIP\n  description: Large Video Dataset is the dataset that trained Stable Video Diffusion,\n    consisting of over 212 years of content.\n  excluded: ''\n  feedback: none\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: video with caption\n  monitoring: ''\n  name: Large Video Dataset\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: ''\n  quality_control: Dataset annotated with dense optical flow, and low optical flow\n    videos are removed.\n  sample: []\n  size: 580M annotated video clip pairs\n  type: dataset\n  url: https://static1.squarespace.com/static/6213c340453c3f502425776e/t/655ce779b9d47d342a93c890/1700587395994/stable_video_diffusion.pdf\n- access: open\n  adaptation: ''\n  created_date: 2023-11-01\n  dependencies: []\n  description: Sky Replacer is an exciting new tool that allows users to replace the\n    color and aesthetic of the sky in their original photos with a selection of nine\n    alternatives to improve the overall look and feel of the image.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: Sky Replacer\n  nationality: UK\n  organization: Stability AI\n  output_space: image\n  prohibited_uses: ''\n  quality_control: ''\n  
terms_of_service: https://stability.ai/terms-of-use\n  type: application\n  url: https://clipdrop.co/real-estate/sky-replacer\n  user_distribution: ''\n- access: open\n  analysis: Evaluated on standard LLM benchmarks and in multilingual tasks compared\n    to similarly sized open-source models.\n  created_date: 2024-01-19\n  dependencies:\n  - RedPajama-Data\n  - The Pile\n  - RefinedWeb\n  - The Stack\n  - OpenWebText\n  - OpenWebMath\n  description: StableLM 2 is a state-of-the-art 1.6 billion parameter small language\n    model trained on multilingual data in English, Spanish, German, Italian, French,\n    Portuguese, and Dutch.\n  feedback: https://huggingface.co/stabilityai/stablelm-2-1_6b/discussions\n  intended_uses: The model is intended to be used as a foundational base model for\n    application-specific fine-tuning. Developers must evaluate and fine-tune the model\n    for safe performance in downstream applications.\n  license:\n    explanation: License agreement can be found on https://huggingface.co/stabilityai/stablelm-2-1_6b/blob/main/LICENSE\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/stabilityai/stablelm-2-1_6b\n  monitoring: unknown\n  name: StableLM 2\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: ''\n  quality_control: ''\n  size: 1.6B parameters (dense)\n  training_emissions: 11 tCO2eq\n  training_hardware: 512 NVIDIA A100 40GB GPUs\n  training_time: 92k GPU hours\n  type: model\n  url: https://stability.ai/news/introducing-stable-lm-2\n- access: open\n  analysis: Evaluated on image generation benchmarks in comparison to equal and smaller-sized\n    models.\n  created_date: 2024-01-16\n  dependencies: []\n  description: \"Stable Cascade is built upon the W\\xFCrstchen architecture and its\\\n    \\ main difference to other models, like Stable Diffusion, is that it is working\\\n    \\ at a much smaller latent space.\"\n  feedback: 
https://huggingface.co/stabilityai/stable-cascade/discussions\n  intended_uses: The model is intended for research purposes for now.\n  license:\n    explanation: License agreement can be found on https://github.com/Stability-AI/StableCascade/blob/master/WEIGHTS_LICENSE\n    value: custom\n  modality: text; image\n  model_card: https://huggingface.co/stabilityai/stable-cascade\n  monitoring: unknown\n  name: Stable Cascade\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: The model was not trained to be factual or true representations\n    of people or events, and therefore using the model to generate such content is\n    out-of-scope for the abilities of this model. The model should not be used in\n    any way that violates Stability AI's Acceptable Use Policy.\n  quality_control: ''\n  size: unknown\n  training_emissions: 2,276 kgCO2eq\n  training_hardware: Some number of A100 GPUs\n  training_time: 24,602 A100 GPU hours\n  type: model\n  url: https://huggingface.co/stabilityai/stable-cascade\n- access: open\n  analysis: unknown\n  created_date: 2024-03-18\n  dependencies:\n  - Objaverse\n  description: Stable Video 3D (SV3D) is a generative model based on Stable Video\n    Diffusion that takes in a still image of an object as a conditioning frame, and\n    generates an orbital video of that object.\n  feedback: https://huggingface.co/stabilityai/sv3d/discussions\n  intended_uses: This model is intended to be used for generating orbital videos of\n    objects from still images.\n  license: StabilityAI Non-Commercial Research Community License\n  modality: image; video\n  model_card: https://huggingface.co/stabilityai/sv3d\n  monitoring: unknown\n  name: Stable Video 3D\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: The model should not be used for generating factual or true representations\n    of people or events, or in any way that violates Stability AI's Acceptable Use\n    Policy.\n  quality_control: unknown\n  size: 
unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://stability.ai/news/introducing-stable-video-3d\n- access: open\n  analysis: none\n  created_date: 2024-04-03\n  dependencies:\n  - AudioSparx\n  description: Stable Audio 2.0 sets a new standard in AI-generated audio, producing\n    high-quality, full tracks with coherent musical structure up to three minutes\n    in length at 44.1kHz stereo.\n  feedback: none\n  intended_uses: It can be used to generate melodies, backing tracks, stems, and sound\n    effects.\n  license: unknown\n  modality: audio, text; audio\n  model_card: none\n  monitoring: Advanced content recognition is used to maintain compliance and prevent\n    copyright infringement.\n  name: Stable Audio 2.0\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: Uploading copyrighted material for transformation.\n  quality_control: To protect creator copyrights, for audio uploads, Stability AI\n    partners with Audible Magic to use their content recognition (ACR) technology\n    to power real-time content matching and prevent copyright infringement. Opt-out\n    requests were honored during the training phase.\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://stability-ai.squarespace.com/news/stable-audio-2-0\n"
  },
  {
    "path": "assets/stability_ai.yaml",
    "content": "- access: open\n  analysis: The model was tested extensively internally and externally. It has developed\n    and implemented numerous safeguards to prevent harms. They have also received\n    user feedback to make continuous improvements.\n  created_date: 2024-06-12\n  dependencies: []\n  description: \"Stable Diffusion 3 Medium is Stability AI\\u2019s advanced text-to-image\\\n    \\ open model. It's suitable for running on consumer PCs and laptops as well as\\\n    \\ enterprise-tier GPUs. The model is known for its overall Quality and Photorealism,\\\n    \\ prompt understanding, typography, being resource-efficient, and being fine-tuned.\\\n    \\ The model in collaboration with NVIDIA and AMD has enhanced performance.\"\n  feedback: Feedback can be given through Twitter, Instagram, LinkedIn, or Discord\n    Community.\n  intended_uses: The model can be used by professional artists, designers, developers,\n    and AI enthusiasts for creating high-quality image outputs from text inputs.\n  license: Stability Community License\n  modality: text; image\n  model_card: unknown\n  monitoring: Continuous collaboration with researchers, experts, and the community\n    to ensure that the model is being used appropriately.\n  name: Stable Diffusion 3 Medium\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: Large-scale commercial use requires contacting the organization\n    for licensing details. The model should not be used for any purpose that does\n    not adhere to the usage guidelines.\n  quality_control: They have conducted extensive internal and external testing of\n    this model and have implemented numerous safeguards to prevent harms. 
Safety measures\n    were implemented from the start of training the model and continued throughout\n    testing, evaluation, and deployment.\n  size: 2B parameters\n  training_emissions: Unknown\n  training_hardware: unknown\n  training_time: Unknown\n  type: model\n  url: https://stability.ai/news/stable-diffusion-3-medium\n- access: open\n  analysis: Consistency across the spatial and temporal axes greatly improves with\n    this model. Stable Video 4D is able to generate novel view videos that are more\n    detailed, faithful to the input video, and are consistent across frames and views\n    compared to existing works.\n  created_date: 2024-07-24\n  dependencies:\n  - Stable Video Diffusion Model\n  description: Stable Video 4D is our latest AI model for dynamic multi-angle video\n    generation. It allows users to upload a single video and receive novel-view videos\n    of eight new angles/views. This advancement moves from image-based video generation\n    to full 3D dynamic video synthesis. Users can specify camera angles, tailoring\n    the output to meet specific creative needs. The model is currently available on\n    Hugging Face and can generate 5-frame videos across the 8 views in about 40 seconds.\n  feedback: Feedback and reports about the progress should be shared via their social\n    channels like Twitter, Instagram, LinkedIn or their Discord Community.\n  intended_uses: This model can be used for creating dynamic multi-angle videos, with\n    applications in game development, video editing, and virtual reality. 
It allows\n    professionals in these fields to visualize objects from multiple angles, enhancing\n    the realism and immersion of their products.\n  license: Stability Community License\n  modality: video; video\n  model_card: unknown\n  monitoring: Continuous monitoring by the Stability AI team for improvements and\n    refinements.\n  name: Stable Video 4D\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: Unknown\n  quality_control: The Stability AI team is dedicated to continuous innovation and\n    exploration of real-world use-cases for this model and others. They are actively\n    working to refine and optimize the model beyond the current synthetic datasets\n    it has been trained on.\n  size: Unknown\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://stability.ai/news/stable-video-4d\n- access: open\n  analysis: The model was evaluated on its ability to quickly and accurately transform\n    a single image into a detailed 3D asset. This evaluation highlighted the model's\n    unprecedented speed and quality, marking it as a valuable tool for rapid prototyping\n    in 3D work. Compared to the previous SV3D model, Stable Fast 3D offers significantly\n    reduced inference times--0.5 seconds versus 10 minutes--while maintaining high-quality\n    output.\n  created_date: 2024-08-01\n  dependencies:\n  - TripoSR\n  description: Stable Fast 3D is a ground-breaking model in 3D asset generation technology.\n    It can transform a single input image into a highly detailed 3D asset in around\n    half a second, setting new standards in terms of speed and quality in the realm\n    of 3D reconstruction. Users start the process by uploading an image of an object.\n    Stable Fast 3D then swiftly generates a complete 3D asset, which includes, UV\n    unwrapped mesh, material parameters, albedo colors with reduced illumination bake-in,\n    and optional quad or triangle remeshing. 
This model has various applications,\n    notably for game and virtual reality developers, as well as professionals in retail,\n    architecture, design, and other graphic-intensive professions.\n  feedback: Information on any downstream issues with the model can be reported to\n    Stability AI through their support request system.\n  intended_uses: The model is intended for use in game development, virtual reality,\n    retail, architecture, design and other graphically intense professions. It allows\n    for rapid prototyping in 3D work, assisting both enterprises and indie developers.\n    It's also used in movie production for creating static assets for games and 3D\n    models for e-commerce, as well as fast model creation for AR/VR.\n  license: Stability Community License\n  modality: image; 3D\n  model_card: https://huggingface.co/stabilityai/stable-fast-3d\n  monitoring: Unknown\n  name: Stable Fast 3D\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: Use by individuals or organizations with over $1M in annual revenue\n    without obtaining an Enterprise License.\n  quality_control: Unknown\n  size: unknown\n  training_emissions: Unknown\n  training_hardware: unknown\n  training_time: Unknown\n  type: model\n  url: https://stability.ai/news/introducing-stable-fast-3d\n- access:\n    explanation: This open release includes multiple model variants, including Stable\n      Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo, and as of October\n      29th, Stable Diffusion 3.5 Medium.\n    value: open\n  analysis: Our analysis shows that Stable Diffusion 3.5 Large leads the market in\n    prompt adherence and rivals much larger models in image quality.\n  created_date: 2024-10-29\n  dependencies: []\n  description: Stable Diffusion 3.5 reflects our commitment to empower builders and\n    creators with tools that are widely accessible, cutting-edge, and free for most\n    use cases.\n  feedback: We look forward to hearing your feedback 
on Stable Diffusion 3.5 and seeing\n    what you create with the models.\n  intended_uses: This model is ideal for professional use cases at 1 megapixel resolution.\n  license:\n    explanation: This open release includes multiple variants that are customizable,\n      run on consumer hardware, and are available for use under the permissive Stability\n      AI Community License.\n    value: Stability AI Community\n  modality:\n    explanation: Capable of generating a wide range of styles and aesthetics like\n      3D, photography, painting, line art, and virtually any visual style imaginable.\n    value: text; image\n  model_card: unknown\n  monitoring: unknown\n  name: Stable Diffusion 3.5\n  nationality: UK\n  organization: Stability AI\n  prohibited_uses: unknown\n  quality_control: We believe in safe, responsible AI practices and take deliberate\n    measures to ensure Integrity starts at the early stages of development.\n  size:\n    explanation: At 8.1 billion parameters, with superior quality and prompt adherence,\n      this base model is the most powerful in the Stable Diffusion family.\n    value: 8.1B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://stability.ai/news/introducing-stable-diffusion-3-5\n"
  },
  {
    "path": "assets/stanford.yaml",
    "content": "- access: open\n  analysis: ''\n  created_date: 2022-12-15\n  dependencies:\n  - The Pile\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: bigscience-bloom-rail-1.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: BioMedLM\n  nationality: USA\n  organization: Stanford\n  prohibited_uses: ''\n  quality_control: ''\n  size: 2.7B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://crfm.stanford.edu/2022/12/15/pubmedgpt.html\n- access: open\n  analysis: Evaluated on own framework that tests domain-specific tasks in medical\n    field.\n  created_date: 2022-11-23\n  dependencies:\n  - Stable Diffusion\n  - RoentGen radiology dataset\n  description: RoentGen is a generative medical imaging model that can create visually\n    convincing X-ray images.\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  modality: text; image\n  model_card: ''\n  monitoring: ''\n  name: RoentGen\n  nationality: USA\n  organization: Stanford\n  prohibited_uses: ''\n  quality_control: ''\n  size: 330M parameters (dense)\n  training_emissions: unknown\n  training_hardware: 64 A100 GPUs\n  training_time: 60k training steps per day\n  type: model\n  url: https://arxiv.org/pdf/2211.12737.pdf\n- access: open\n  analysis: Evaluated on three physical control tasks, drawing, steering, and human\n    body movement on various dynamics\n  created_date: 2023-06-12\n  dependencies:\n  - GPT-2\n  - BABEL\n  - text-davinci-003\n  description: Model trained to generate language corrections for physical control\n    tasks.\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: human trajectories; text\n  model_card: ''\n  monitoring: ''\n  name: CORGI\n  nationality: USA\n  organization: Stanford\n  prohibited_uses: ''\n  quality_control: ''\n  size: 124M parameters (dense)\n  training_emissions: ''\n  training_hardware: one NVIDIA A40 GPU\n  training_time:\n    
explanation: The authors do not report the training time, but do report that they\n      train for 200 epochs.\n    value: unknown\n  type: model\n  url: https://arxiv.org/pdf/2306.07012.pdf\n- access:\n    explanation: The dataset can be downloaded from [[Hugging Face]](https://huggingface.co/datasets/tatsu-lab/alpaca).\n      The code for generating data is available on the [[GitHub repository]](https://github.com/tatsu-lab/stanford_alpaca#data-generation-process).\n    value: open\n  analysis: ''\n  created_date:\n    explanation: 'The date the [[blog post]](https://crfm.stanford.edu/2023/03/13/alpaca.html)\n      was released.\n\n      '\n    value: 2023-03-13\n  datasheet: https://huggingface.co/datasets/tatsu-lab/alpaca\n  dependencies:\n  - text-davinci-003\n  description: 'Alpaca dataset consists of 52,000 instruction-following demonstrations\n    generated in the style of the [Self-Instruct framework](https://github.com/yizhongw/self-instruct)\n    using OpenAI''s text-davinci-003 engine. This instruction data can be used to\n    conduct instruction-tuning for language models and make the language model follow\n    instructions better.\n\n    '\n  excluded: ''\n  feedback: Feedback can be provided on [[GitHub Issues]](https://github.com/tatsu-lab/stanford_alpaca/issues).\n  included: ''\n  intended_uses: Alpaca is intended and licensed for research use only.\n  license: CC BY-NC 4.0\n  modality: text (English)\n  monitoring: ''\n  name: Alpaca dataset\n  nationality: USA\n  organization: Stanford\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 52K instruction-following demonstrations\n  type: dataset\n  url: https://crfm.stanford.edu/2023/03/13/alpaca.html\n- access:\n    explanation: The weight diff between Alpaca-7B and LLaMA-7B is located on the\n      [[Hugging Face]](https://huggingface.co/tatsu-lab/alpaca-7b-wdiff). 
To recover\n      the original Alpaca-7B weights, follow the steps given [[here]](https://github.com/tatsu-lab/stanford_alpaca#recovering-alpaca-weights).\n      Training and data generation code\n      can be found on the [[GitHub repository]](https://github.com/tatsu-lab/stanford_alpaca).\n      An [[online demo]](https://chat.lmsys.org/?model=alpaca-13b) is also available.\n    value: open\n  analysis: ''\n  created_date:\n    explanation: 'The date the [[blog post]](https://crfm.stanford.edu/2023/03/13/alpaca.html)\n      was released.\n\n      '\n    value: 2023-03-13\n  dependencies:\n  - LLaMa\n  - Alpaca dataset\n  description: 'Alpaca-7B is an instruction-following model fine-tuned from the LLaMA\n    7B model on 52K instruction-following demonstrations.\n\n    '\n  feedback: Feedback can be provided on [[GitHub Issues]](https://github.com/tatsu-lab/stanford_alpaca/issues).\n  intended_uses: Alpaca is intended and licensed for research use only.\n  license: CC BY NC 4.0 (model weights)\n  modality: text (English)\n  model_card: ''\n  monitoring: ''\n  name: Alpaca\n  nationality: USA\n  organization: Stanford\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense model)\n  training_emissions: unknown\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://crfm.stanford.edu/2023/03/13/alpaca.html\n- access: open\n  analysis: Merlin has been comprehensively evaluated on 6 task types and 752 individual\n    tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification,\n    phenotype classification, and zero-shot cross-modal retrieval, while model adapted\n    tasks include 5-year chronic disease prediction, radiology report generation,\n    and 3D semantic segmentation. 
It has undergone internal validation on a test set\n    of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public\n    CT datasets (VerSe, TotalSegmentator).\n  created_date: 2024-09-08\n  dependencies: []\n  description: Merlin is a 3D Vision Language Model that's designed for interpretation\n    of abdominal computed tomography (CT) scans. It uses both structured Electronic\n    Health Record (EHR) and unstructured radiology reports for supervision without\n    requiring additional manual annotations. The model was trained on a high-quality\n    clinical dataset of paired CT scans, EHR diagnosis codes, and radiology reports\n    and was evaluated on 6 task types and 752 individual tasks.\n  feedback: Feedback and reports for problems with the model should likely be routed\n    to Stanford Center for Artificial Intelligence in Medicine and Imaging, or the\n    corresponding author of the research (louis.blankemeier@stanford.edu).\n  intended_uses: This model is intended for use in the interpretation of abdominal\n    computed tomography (CT) scans, chronic disease prediction, radiology report generation,\n    and 3D semantic segmentation.\n  license: Unknown\n  modality: image; text\n  model_card: unknown\n  monitoring: Unknown\n  name: Merlin\n  nationality: unknown\n  organization: Stanford Center for Artificial Intelligence in Medicine and Imaging,\n    Stanford University\n  prohibited_uses: The model should not be used outside of healthcare-related context,\n    such as for personal or non-medical commercial purposes.\n  quality_control: The model has undergone extensive evaluations and also internal\n    and external validation tests.\n  size: Unknown\n  training_emissions: Unknown\n  training_hardware: Single GPU.\n  training_time: Unknown\n  type: model\n  url: https://arxiv.org/pdf/2406.06512\n"
  },
  {
    "path": "assets/stonybrook.yaml",
    "content": "- access:\n    explanation: The dataset is available for download on the [[SBU Captions Dataset\n      website]](https://www.cs.rice.edu/~vo9/sbucaptions/), along with additional\n      resources.\n    value: open\n  analysis:\n    explanation: See [[Section 5]](https://proceedings.neurips.cc/paper/2011/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf)\n    value: 'Authors perform two quantitative evaluations for image captioning - direct\n      user ratings of relevance and BLEU score. They also propose a new evaluation\n      task: \"we propose a new evaluation task where a user is presented with two photographs\n      and one caption. The user must assign the caption to the most relevant image.\n      For evaluation we use a query image, a random image and a generated caption.\"\n\n      '\n  created_date:\n    explanation: 'The date the [[paper]](https://papers.nips.cc/paper_files/paper/2011/hash/5dd9db5e033da9c6fb5ba83c7a7ebea9-Abstract.html)\n      was published.\n\n      '\n    value: 2011-12-12\n  datasheet: none\n  dependencies:\n  - Flickr\n  description: 'SBU Captions Dataset is a collection of 1 million images and associated\n    captions from Flickr, filtered so that the descriptions are likely to refer to\n    visual content.\n\n    '\n  excluded:\n    explanation: See [[Section 2]](https://proceedings.neurips.cc/paper/2011/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf)\n    value: '\"This produces a very large, but noisy initial set of photographs with\n      associated text. 
We filter this set of photos so that the descriptions attached\n      to a picture are relevant and visually descriptive.\"\n\n      '\n  feedback: ''\n  included:\n    explanation: See [[Section 2]](https://proceedings.neurips.cc/paper/2011/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf)\n    value: \"\\\"To encourage visual descriptiveness in our collection, we select only\\\n      \\ those images with descriptions of satisfactory length based on observed lengths\\\n      \\ in visual descriptions. We also enforce that retained descriptions contain\\\n      \\ at least 2 words belonging to our term lists and at least one prepositional\\\n      \\ word, e.g. \\u201Con\\u201D, \\u201Cunder\\u201D which often indicate visible\\\n      \\ spatial relationships.\\\"\\n\"\n  intended_uses: ''\n  license: none\n  modality: image, text\n  monitoring: none\n  name: SBU Captions\n  nationality: USA\n  organization: Stony Brook University\n  prohibited_uses: ''\n  quality_control: unknown\n  sample: []\n  size: 1M image-text pairs\n  type: dataset\n  url: https://proceedings.neurips.cc/paper/2011/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf\n"
  },
  {
    "path": "assets/suno.yaml",
    "content": "- access: open\n  analysis: ''\n  created_date: 2023-04-20\n  dependencies:\n  - AudioLM\n  description: Bark is a text-to-audio model that can generate multilingual speech\n    as well as other noises.\n  feedback: https://huggingface.co/spaces/suno/bark/discussions\n  intended_uses: ''\n  license: MIT\n  modality: text; audio\n  model_card: https://github.com/suno-ai/bark/blob/main/model-card.md\n  monitoring: ''\n  name: Bark\n  nationality: USA\n  organization: Suno\n  prohibited_uses: ''\n  quality_control: ''\n  size: ''\n  training_emissions: unknown\n  training_hardware: ''\n  training_time: unknown\n  type: model\n  url: https://github.com/suno-ai/bark\n"
  },
  {
    "path": "assets/tiger.yaml",
    "content": "- access: open\n  analysis: Evaluated across a range of domain tasks across standard benchmarks in\n    comparison to predecessor Llama 2.\n  created_date: 2023-10-19\n  dependencies:\n    explanation: 7, 13, and 70B sizes trained on LLaMA 2. 180B model trained on BLOOM.\n    value:\n    - Llama 2\n    - BLOOM\n  description: TigerBot is an open source multilingual multitask LLM.\n  feedback: https://huggingface.co/TigerResearch/tigerbot-180b-base-v2/discussions\n  intended_uses: ''\n  license:\n    explanation: Should also consult with BLOOM and Llama 2 open-source licenses due\n      to pre-training.\n    value: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/TigerResearch/tigerbot-180b-base-v2\n  monitoring: unknown\n  name: TigerBot\n  nationality: International\n  organization: TigerResearch\n  prohibited_uses: ''\n  quality_control: Safety filtering performed to mitigate risk and remove toxic content.\n  size: 180B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 32 A100-40G GPUs\n  training_time: unknown\n  type: model\n  url: https://arxiv.org/pdf/2312.08688.pdf\n"
  },
  {
    "path": "assets/together.yaml",
    "content": "- access: open\n  analysis: ''\n  created_date: 2022-11-29\n  dependencies:\n  - GPT-J\n  - P3\n  - NaturalInstructions-v2\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: GPT-JT\n  nationality: USA\n  organization: Together\n  prohibited_uses: ''\n  quality_control: ''\n  size: 6B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.together.xyz/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai\n- access: open\n  analysis: ''\n  created_date: 2023-03-10\n  dependencies:\n  - GPT-NeoX\n  - OIG-43M\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: GPT-NeoXT-Chat-Base\n  nationality: USA\n  organization: Together\n  prohibited_uses: ''\n  quality_control: ''\n  size: 20B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.together.xyz/blog/openchatkit\n- access: open\n  analysis: ''\n  created_date: 2023-03-10\n  dependencies:\n  - GPT-JT\n  - OIG-moderation\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: OpenChatKit moderation model\n  nationality: USA\n  organization: Together\n  prohibited_uses: ''\n  quality_control: ''\n  size: 6B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://www.together.xyz/blog/openchatkit\n- access: open\n  analysis: ''\n  created_date: 2023-03-10\n  datasheet: ''\n  dependencies:\n  - P3\n  - NaturalInstructions-v2\n  - FLAN dataset\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text\n  monitoring: ''\n  name: OIG-43M\n  nationality: 
International\n  organization: Together, LAION, Ontocord\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 43M instructions\n  type: dataset\n  url: https://laion.ai/blog/oig-dataset/\n- access: open\n  analysis: ''\n  created_date: 2023-03-10\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text\n  monitoring: ''\n  name: OIG-moderation\n  nationality: International\n  organization: Together, LAION, Ontocord\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://laion.ai/blog/oig-dataset/\n- access: open\n  analysis: ''\n  created_date: 2022-04-17\n  datasheet: https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T\n  dependencies:\n  - GitHub\n  - Wikipedia\n  description: The RedPajama base dataset is a 1.2 trillion token fully-open dataset\n    created by following the recipe described in the LLaMA paper\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text\n  monitoring: ''\n  name: RedPajama-Data\n  nationality: USA\n  organization: Together\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 1.2 trillion tokens\n  type: dataset\n  url: https://www.together.xyz/blog/redpajama\n- access: open\n  analysis: Model evaluated over AlpacaEval, Rouge score over BookSum, and accuracy\n    over MQA.\n  created_date: 2023-08-18\n  dependencies:\n  - BookSum dataset\n  - MQA dataset\n  - Together API\n  - LLaMA 2\n  description: Llama-2-7B-32K-Instruct is an open-source, long-context chat model\n    finetuned from Llama-2-7B-32K, over high-quality instruction and chat data.\n  feedback: https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct/discussions\n  intended_uses: ''\n  license: LLaMA 2\n  modality:\n    explanation: text; text\n    value: text; text\n  model_card: 
https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct\n  monitoring: ''\n  name: Llama-2-7B-32K-Instruct\n  nationality: USA\n  organization: Together\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://together.ai/blog/llama-2-7b-32k-instruct\n- access: open\n  analysis: none\n  created_date: 2023-10-30\n  datasheet: ''\n  dependencies:\n  - Common Crawl\n  description: RedPajama-Data-v2 is a new version of the RedPajama dataset, with 30\n    trillion filtered and deduplicated tokens (100+ trillions raw) from 84 CommonCrawl\n    dumps covering 5 languages, along with 40+ pre-computed data quality annotations\n    that can be used for further filtering and weighting.\n  excluded: ''\n  feedback: Feedback can be sent to Together via https://www.together.ai/contact\n  included: documents in English, German, French, Spanish, and Italian.\n  intended_uses: To be used as the start of a larger, community-driven development\n    of large-scale datasets for LLMs.\n  license: Apache 2.0\n  modality: text\n  monitoring: ''\n  name: RedPajama-Data-v2\n  nationality: USA\n  organization: Together\n  prohibited_uses: ''\n  quality_control: tokens filtered and deduplicated\n  sample: []\n  size: 30 trillion tokens\n  type: dataset\n  url: https://together.ai/blog/redpajama-data-v2\n- access: open\n  analysis: Model evaluated on a suite of short-context task benchmarks.\n  created_date: 2023-12-08\n  dependencies:\n  - Hyena\n  - RedPajama-Data\n  description: StripedHyena is an LLM and the first alternative model competitive\n    with the best open-source Transformers in short and long-context evaluations,\n    according to Together.\n  feedback: https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: 
https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B\n  monitoring: ''\n  name: StripedHyena\n  nationality: USA\n  organization: Together\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.together.ai/blog/stripedhyena-7b\n- access: open\n  analysis: Model evaluated on a suite of short-context task benchmarks.\n  created_date: 2023-12-08\n  dependencies:\n  - Hyena\n  - RedPajama-Data\n  description: StripedHyena Nous is an LLM and chatbot, along with the first alternative\n    model competitive with the best open-source Transformers in short and long-context\n    evaluations, according to Together.\n  feedback: https://huggingface.co/togethercomputer/StripedHyena-Nous-7B/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/togethercomputer/StripedHyena-Nous-7B\n  monitoring: ''\n  name: StripedHyena Nous\n  nationality: USA\n  organization: Together\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.together.ai/blog/stripedhyena-7b\n- access: open\n  analysis: The model was evaluated using five popular vision-language benchmarks\n    that require strong commonsense reasoning and detailed image understanding, AI2D,\n    ScienceQA, MMMU, MMVet, and POPE. It demonstrated competitive performance in these\n    evaluations compared to other vision-language models.\n  created_date: 2024-06-06\n  dependencies:\n  - LLaMA\n  description: A large vision-language model with multi-resolution zoom that enhances\n    fine-grained visual understanding and reasoning about image regions. 
The Dragonfly\n    model comes in two variants, the general-domain model (\"Llama-3-8b-Dragonfly-v1\")\n    trained on 5.5 million image-instruction pairs, and the biomedical variant (\"Llama-3-8b-Dragonfly-Med-v1\")\n    fine-tuned on an additional 1.4 million biomedical image-instruction pairs. Dragonfly\n    demonstrates promising performance on vision-language benchmarks like commonsense\n    visual QA and image captioning.\n  feedback: Unknown\n  intended_uses: Dragonfly is designed for image-text tasks, including commonsense\n    visual question answering and image captioning. It is further focused on tasks\n    that require fine-grained understanding of high-resolution image regions, such\n    as in medical imaging.\n  license: unknown\n  modality: image, text; text\n  model_card: unknown\n  monitoring: Unknown\n  name: Dragonfly\n  nationality: USA\n  organization: Together\n  prohibited_uses: Unknown\n  quality_control: The model employs two key strategies (multi-resolution visual encoding\n    and zoom-in patch selection) that enable it to efficiently focus on fine-grained\n    details in image regions and provide better commonsense reasoning. Its performance\n    was evaluated on several benchmark tasks for quality assurance.\n  size: 8B parameters\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.together.ai/blog/dragonfly-v1\n"
  },
  {
    "path": "assets/tokyo.yaml",
    "content": "- access: open\n  analysis: Evaluated on all language datasets compared to similarly sized SOTA models,\n    with Aurora-M achieving strong performance in most.\n  created_date: 2024-04-23\n  dependencies:\n  - StarCoderPlus\n  description: Aurora-M is a 15B parameter multilingual open-source model trained\n    on English, Finnish, Hindi, Japanese, Vietnamese, and code.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: none\n  monitoring: unknown\n  name: Aurora-M\n  nationality: unknown\n  organization: Tokyo Institute of Technology, MIT-IBM Watson Lab, Sapienza University\n    of Rome\n  prohibited_uses: ''\n  quality_control: ''\n  size: 15B parameters\n  training_emissions:\n    explanation: The training process operated entirely on 100% hydro-powered energy\n      and included waste heat recycling.\n    value: unknown\n  training_hardware: LUMI supercomputer, using 128 AMD MI250X GPUs\n  training_time: 48 days\n  type: model\n  url: https://arxiv.org/pdf/2404.00399\n- access:\n    explanation: Reflection Llama-3.1 70B is an open-source LLM.\n    value: open\n  analysis: Unknown\n  created_date: 2024-09-28\n  dependencies:\n  - Glaive\n  - Llama 3.1\n  description: Reflection Llama-3.1 70B is an open-source LLM, trained with a new\n    technique called Reflection-Tuning that teaches a LLM to detect mistakes in its\n    reasoning and correct course. The model was trained on synthetic data generated\n    by Glaive.\n  feedback: Unknown\n  intended_uses: The model is intended for complex reasoning and reflection tasks.\n    It is designed to separate its internal thoughts and reasoning from its final\n    answer.\n  license: Llama 3.1 Community License Agreement\n  modality:\n    explanation: you can sample from Reflection Llama-3.1 70B using the same code,\n      pipelines, etc. as any other Llama model. 
This implies that the model takes\n      text input and produces text output.\n    value: text; text\n  model_card: https://huggingface.co/mattshumer/Reflection-70B\n  monitoring: Unknown\n  name: Reflection Llama-3.1 70B\n  nationality: Unknown\n  organization: Unknown\n  prohibited_uses: Unknown\n  quality_control: The model uses a Reflection-Tuning technique which allows it to\n    notice mistakes in its reasoning and correct them.\n  size:\n    explanation: The size is in the model's name, Reflection Llama-3.1 70B.\n    value: 70B parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://huggingface.co/mattshumer/Reflection-70B\n- access:\n    explanation: Pixtral 12B is released under Apache 2.0 license.\n    value: open\n  analysis: The model outperforms other models of similar and larger sizes on multimodal\n    benchmarks. An open-source benchmark, MM-MT-Bench, is contributed for evaluating\n    vision-language models.\n  created_date: 2024-10-10\n  dependencies:\n  - Mistral Nemo 12B\n  description: Pixtral 12B is a 12-billion-parameter multimodal language model trained\n    to understand both natural images and documents, achieving leading performance\n    on various multimodal benchmarks without compromising on natural language performance.\n  feedback: Unknown\n  intended_uses: Multimodal instruction following tasks, capable of multi-turn, multi-image\n    conversations.\n  license:\n    explanation: Pixtral 12B is released under Apache 2.0 license.\n    value: Apache 2.0\n  modality:\n    explanation: Pixtral 12B is trained to understand both natural images and documents.\n    value: text; image\n  model_card: unknown\n  monitoring: Evaluation protocols and benchmarks are open-sourced to establish fair\n    and standardized testing.\n  name: Pixtral 12B\n  nationality: Unknown\n  organization: Unknown\n  prohibited_uses: Unknown\n  quality_control: Evaluation protocols for 
multimodal language models were standardized\n    and analysis was conducted to improve the reliability of model evaluations.\n  size:\n    explanation: \"We introduce Pixtral 12B, a 12\\u2013billion-parameter multimodal\\\n      \\ language model.\"\n    value: 12B parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://arxiv.org/pdf/2410.07073\n"
  },
  {
    "path": "assets/toronto.yaml",
    "content": "- access: open\n  analysis: Compared models trained on OpenWebMath for 1 epoch to models trained on\n    The Pile and ProofPile on mathematics benchmarks.\n  created_date: 2023-10-10\n  datasheet: Can be found at section E of https://arxiv.org/pdf/2310.06786.pdf\n  dependencies:\n  - Common Crawl\n  description: OpenWebMath is an open dataset containing 14.7B tokens of mathematical\n    webpages from Common Crawl, inspired by Minerva.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/open-web-math/open-web-math/discussions\n  included: ''\n  intended_uses: Language model pretraining, finetuning, and evaluation.\n  license: ODC-By 1.0\n  modality: text, mathematical tokens\n  monitoring: none\n  name: OpenWebMath\n  nationality: Canada\n  organization: University of Toronto\n  prohibited_uses: Any tasks which may be considered irresponsible or harmful.\n  quality_control: Documents are filtered, processed for mathematical value, deduplicated,\n    and then the largest documents are manually inspected for quality.\n  sample: []\n  size: 14.7B tokens\n  type: dataset\n  url: https://arxiv.org/pdf/2310.06786.pdf\n"
  },
  {
    "path": "assets/transformify.yaml",
    "content": "- access: open\n  adaptation: ''\n  created_date: 2023-05-30\n  dependencies:\n  - GPT-4\n  description: Transformify Automate is a platform for automated task integration\n    using natural language prompts.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: ''\n  monitoring: ''\n  monthly_active_users: ''\n  name: Transformify Automate\n  nationality: USA\n  organization: Transformify\n  output_space: text and code\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://www.transformify.ai/legal-stuff\n  type: application\n  url: https://www.transformify.ai/automate\n  user_distribution: ''\n"
  },
  {
    "path": "assets/trevor.yaml",
    "content": "- access:\n    explanation: 'The tool can only be accessed by prospective Trevor Project counselors\n      for training purposes.\n\n      '\n    value: closed\n  adaptation:\n    explanation: 'The base GPT-2 model was fine-tuned on the previous conversations\n      between new trainees and the Trevor Project counselors posing as a fictional\n      persona in crisis.\n\n      '\n    value: Fine-tuning\n  created_date:\n    explanation: 'The release date of the blog post introducing Crisis Contact Simulator\n      [[Trevor Project Blog]](https://www.thetrevorproject.org/blog/the-trevor-project-launches-new-ai-tool-to-support-crisis-counselor-training/).\n\n      '\n    value: 2021-03-24\n  dependencies:\n  - OpenAI API\n  description: 'Crisis Contact Simulator, developed as part of a collaboration with\n    Google.org, helps train The Trevor Project counselors by simulating a teen\n    in crisis. Crisis Contact Simulator is used as part of the training programs for\n    the Trevor Project''s 24/7 digital crisis services that support LGBTQ youth [[Trevor\n    Project Blog]](https://www.thetrevorproject.org/blog/the-trevor-project-launches-new-ai-tool-to-support-crisis-counselor-training/).\n\n    '\n  failures: unknown\n  feedback: unknown\n  intended_uses: 'Training counselors\n\n    '\n  license: unknown\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Crisis Contact Simulator\n  nationality: USA\n  organization: The Trevor Project\n  output_space: Dialogue\n  prohibited_uses: unknown\n  quality_control:\n    explanation: 'The tool can only be accessed by prospective Trevor Project counselors\n      for training purposes.\n\n      '\n    value: Limited release\n  terms_of_service: unknown\n  type: application\n  url: https://www.thetrevorproject.org/\n  user_distribution: unknown\n"
  },
  {
    "path": "assets/triml.yaml",
    "content": "- access: open\n  analysis: Evaluated on standard VLM benchmarks and outperforms SotA open-source\n    VLMs as of release.\n  created_date: 2024-02-09\n  dependencies: []\n  description: Prism is a family of VLMs trained using new analyses about key vision\n    design axes.\n  feedback: none\n  intended_uses: ''\n  license: LLaMA 2\n  modality: image, text; text\n  model_card: none\n  monitoring: unknown\n  name: Prism\n  nationality: Japan\n  organization: Toyota Research Institute\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 8 A100 GPUs\n  training_time: less than 9 hours\n  type: model\n  url: https://arxiv.org/pdf/2402.07865.pdf\n"
  },
  {
    "path": "assets/tsinghua.yaml",
    "content": "- access:\n    explanation: Model weights are available but gated via an [[application form]](https://models.aminer.cn/codegeex/download/request)\n    value: limited\n  analysis: none\n  created_date: 2022-09-20\n  dependencies: []\n  description: CodeGeeX is an autoregressive language model trained on code\n  feedback: none\n  intended_uses: none\n  license:\n    explanation: The license is provided in the [[Github repository]](https://github.com/THUDM/CodeGeeX)\n    value: Apache 2.0\n  modality: text; code\n  model_card: none\n  monitoring: none\n  name: CodeGeeX\n  nationality: China\n  organization: Tsinghua University\n  prohibited_uses: none\n  quality_control: none\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: THUDM 1536 Ascend 910 (32GB) Cluster\n  training_time: unknown\n  type: model\n  url: https://github.com/THUDM/CodeGeeX\n- access:\n    explanation: Model checkpoints available from [[Wudao-Wenhui]](https://resource.wudaoai.cn/home?ind=2&name=WuDao%20WenHui&id=1399364355975327744)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: The date the model paper was released\n    value: 2021-05-26\n  dependencies: []\n  description: CogView is a transformer model for text-to-image generation\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'The license is provided in the [[Github repository]](https://github.com/THUDM/CogView)\n\n      '\n    value: Apache 2.0\n  modality: text; image\n  model_card: none\n  monitoring: ''\n  name: CogView\n  nationality: China\n  organization: Tsinghua University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 4B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2105.13290\n- access:\n    explanation: The model checkpoints are available for download from [[BAAI]](https://model.baai.ac.cn/model-detail/100041)\n    value: open\n  analysis: ''\n  
created_date:\n    explanation: The date the model paper was released\n    value: 2022-04-28\n  dependencies: []\n  description: CogView 2 is a hierarchical transformer for text-to-image generation\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'The license is provided in the [[Github repository]](https://github.com/THUDM/CogView2)\n\n      '\n    value: Apache 2.0\n  modality: text; image\n  model_card: none\n  monitoring: ''\n  name: CogView 2\n  nationality: China\n  organization: Tsinghua University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 6B parameters (dense)\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2204.14217\n- access:\n    explanation: Model checkpoints are available for download from https://github.com/THUDM/CogVideo\n    value: open\n  analysis: ''\n  created_date:\n    explanation: The date the model paper was released\n    value: 2022-05-29\n  dependencies: []\n  description: CogVideo is a transformer model for text-to-video generation\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: 'The license is provided in the [[Github repository]](https://github.com/THUDM/CogVideo)\n\n      '\n    value: Apache 2.0\n  modality: text; video\n  model_card: none\n  monitoring: ''\n  name: CogVideo\n  nationality: China\n  organization: Tsinghua University\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2205.15868\n- access:\n    explanation: Model checkpoints are available from the [[GitHub repository]](https://github.com/THUDM/GLM-130B/blob/main/MODEL_LICENSE)\n    value: open\n  analysis: ''\n  created_date:\n    explanation: The date the model website was made public\n    value: 2022-08-04\n  dependencies:\n  - The Pile\n  - GLM-130B Chinese corpora\n  - P3\n  - DeepStruct finetuning dataset\n  description: 
GLM-130B is a bidirectional language model trained on English and Chinese\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: Unique model license. See the [[GitHub repository]](https://github.com/THUDM/GLM-130B/blob/main/MODEL_LICENSE)\n    value: GLM-130B License\n  modality: text; text\n  model_card: none\n  monitoring: ''\n  name: GLM-130B\n  nationality: China\n  organization: Tsinghua University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 130B parameters (dense)\n  training_emissions: ''\n  training_hardware: THUDM 96 DGX-A100 (40G) cluster\n  training_time: ''\n  type: model\n  url: https://keg.cs.tsinghua.edu.cn/glm-130b/\n- access: open\n  analysis: Evaluated on image captioning and visual question answering benchmarks.\n  created_date: 2023-11-06\n  dependencies:\n  - Vicuna\n  - CLIP\n  description: CogVLM is a powerful open-source visual language foundation model\n  feedback: none\n  intended_uses: Future multimodal research\n  license:\n    explanation: Model license can be found at https://github.com/THUDM/CogVLM/blob/main/MODEL_LICENSE.\n      Code license is under Apache 2.0\n    value: custom\n  modality: image, text; text\n  model_card: none\n  monitoring: none\n  name: CogVLM\n  nationality: unknown\n  organization: Zhipu AI, Tsinghua University\n  prohibited_uses: none\n  quality_control: none\n  size: 17B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: 4096 A100 days\n  type: model\n  url: https://arxiv.org/pdf/2311.03079.pdf\n- access: open\n  analysis: Evaluated on AlpacaEval Leaderboard benchmarks.\n  created_date: 2023-06-27\n  dependencies:\n  - UltraChat\n  description: UltraLM is a series of chat language models trained on UltraChat.\n  feedback: https://huggingface.co/openbmb/UltraLM-13b/discussions\n  intended_uses: ''\n  license: LLaMA 2\n  modality: text; text\n  model_card: https://huggingface.co/openbmb/UltraLM-13b\n  monitoring: unknown\n  name: UltraLM\n  
nationality: China\n  organization: Tsinghua University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 13B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/thunlp/UltraChat#UltraLM\n- access: open\n  analysis: UltraLM, a model trained on UltraChat, is evaluated on standard LLM benchmarks.\n  created_date: 2023-04-20\n  datasheet: https://huggingface.co/datasets/stingning/ultrachat\n  dependencies: []\n  description: UltraChat is an open-source, large-scale, multi-round dialogue\n    dataset powered by Turbo APIs.\n  excluded: ''\n  feedback: https://huggingface.co/datasets/stingning/ultrachat/discussions\n  included: Dialogue data covering questions about the world, writing and creation tasks,\n    and questions on existing materials.\n  intended_uses: ''\n  license: MIT\n  modality: text\n  monitoring: unknown\n  name: UltraChat\n  nationality: China\n  organization: Tsinghua University\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://github.com/thunlp/UltraChat\n"
  },
  {
    "path": "assets/twelvelabs.yaml",
    "content": "- access: open\n  analysis: Evaluated in comparison to SOTA video-to-language models.\n  created_date: 2023-10-23\n  dependencies:\n  - MSR-VTT\n  - Video-ChatGPT Video Descriptions Dataset\n  description: Pegasus-1 is a video-language foundation model.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: video; text\n  model_card: none\n  monitoring: none\n  name: Pegasus-1\n  nationality: USA\n  organization: Twelve Labs\n  prohibited_uses: ''\n  quality_control: Data selected and cleaned to eliminate toxic and biased content.\n  size: 80B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://app.twelvelabs.io/blog/introducing-pegasus-1\n- access: open\n  analysis: Marengo 2.6 sets new benchmarks in zero-shot text-to-video, text-to-image,\n    and text-to-audio retrieval tasks with a single embedding model.\n  created_date: 2024-03-01\n  dependencies: []\n  description: Marengo 2.6 is a new state-of-the-art (SOTA) multimodal foundation\n    model capable of performing any-to-any search tasks, including Text-To-Video,\n    Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more.\n  feedback: none\n  intended_uses: ''\n  license: unknown\n  modality: audio, image, text; audio, image, video\n  model_card: none\n  monitoring: ''\n  name: Marengo 2.6\n  nationality: USA\n  organization: Twelve Labs\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://www.twelvelabs.io/blog/introducing-marengo-2-6\n"
  },
  {
    "path": "assets/uae.yaml",
    "content": "- access: open\n  analysis: Evaluated in 1-shot against the PaLM models, with the tasks of the paper\n    \"Language models are few-shot learners\" (Brown et al., 2020); (2) on a small set\n    of few-shot tasks reported by the GPT-4 paper; (3) against state-of-the-art models\n    across common sense, question answering, and code tasks; (4) against models which\n    also report results from the EAI Harness, for which we are able to compare with\n    identical prompts and metrics.\n  created_date: 2023-06-14\n  dependencies:\n  - RefinedWeb\n  description: \"Falcon-40B is a 40B parameters causal decoder-only model built by\\\n    \\ TII and trained on 1,000B tokens of\\_RefinedWeb enhanced with curated corpora.\"\n  feedback: https://huggingface.co/tiiuae/falcon-40b/discussions\n  intended_uses: Research on large language models; as a foundation for further specialization\n    for specific use cases.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/tiiuae/falcon-40b\n  monitoring: None\n  name: Falcon-40B\n  nationality: UAE\n  organization: UAE Technology Innovation Institute\n  prohibited_uses: irresponsible or harmful use or production use without adequate\n    assessment of risks and mitigation.\n  quality_control: ''\n  size: 40B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 384 A100 40GB GPUs\n  training_time: 2 months\n  type: model\n  url: https://arxiv.org/pdf/2311.16867.pdf\n- access: open\n  analysis: ''\n  created_date: 2023-06-01\n  datasheet: https://huggingface.co/datasets/tiiuae/falcon-refinedweb\n  dependencies: []\n  description: RefinedWeb is a high-quality five trillion tokens web-only English\n    pretraining dataset.\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license:\n    explanation: License can be found at https://huggingface.co/datasets/tiiuae/falcon-refinedweb\n    value: custom\n  modality: text\n  monitoring: ''\n  name: 
RefinedWeb\n  nationality: UAE\n  organization: UAE Technology Innovation Institute\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 600B tokens\n  type: dataset\n  url: https://arxiv.org/pdf/2306.01116.pdf\n- access: open\n  analysis: Falcon-180B outperforms LLaMA-2, StableLM, RedPajama, and MPT on the Open\n    LLM Leaderboard at https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.\n  created_date: 2023-09-06\n  dependencies:\n  - RefinedWeb\n  description: Falcon-180B is a 180B-parameter causal decoder-only model built by\n    TII and trained on 3,500B tokens of RefinedWeb enhanced with curated corpora.\n  feedback: https://huggingface.co/tiiuae/falcon-180b/discussions\n  intended_uses: Research on large language models; as a foundation for further specialization\n    for specific use cases.\n  license: unknown\n  modality: text; text\n  model_card: https://huggingface.co/tiiuae/falcon-180B\n  monitoring: None\n  name: Falcon-180B\n  nationality: UAE\n  organization: UAE Technology Innovation Institute\n  prohibited_uses: Production use without adequate assessment of risks and mitigation;\n    any use cases which may be considered irresponsible or harmful.\n  quality_control: ''\n  size: 180B parameters (dense)\n  training_emissions: ''\n  training_hardware: 4096 A100 40GB GPUs\n  training_time: 9 months\n  type: model\n  url: https://arxiv.org/pdf/2311.16867.pdf\n"
  },
  {
    "path": "assets/unknown.yaml",
    "content": "- access:\n    explanation: producing the currently strongest open-source base model.\n    value: open\n  analysis: Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source\n    models and achieves performance comparable to leading closed-source models.\n  created_date: 2025-01-14\n  dependencies:\n  - DeepSeek-R1\n  description: DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B\n    total parameters and 37B activated per token. It utilizes Multi-head Latent Attention\n    (MLA) and adopts innovative strategies for improved performance, such as an auxiliary-loss-free\n    load balancing and a multi-token prediction training objective. Comprehensive\n    evaluations show it achieves performance comparable to leading closed-source models.\n  feedback: unknown\n  intended_uses: unknown\n  license: MIT\n  modality: unknown\n  model_card: https://huggingface.co/deepseek-ai/DeepSeek-V3\n  monitoring: unknown\n  name: DeepSeek-V3\n  nationality: unknown\n  organization: DeepSeek\n  prohibited_uses: unknown\n  quality_control: Post-training includes knowledge distillation from the DeepSeek-R1\n    model, incorporating verification and reflection patterns to enhance reasoning\n    performance.\n  size:\n    explanation: a strong Mixture-of-Experts (MoE) language model with 671B total\n      parameters with 37B activated for each token.\n    value: 671B parameters (sparse)\n  training_emissions: unknown\n  training_hardware:\n    explanation: DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.\n    value: H800 GPUs\n  training_time:\n    explanation: DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.\n    value: 2.788M GPU hours\n  type: model\n  url: https://huggingface.co/deepseek-ai/DeepSeek-V3\n- access: open\n  analysis: ''\n  created_date: 2024-04-24\n  dependencies: []\n  description: We present a methodology for scaling LLMs called depth up-scaling (DUS)\n    , which 
encompasses architectural modifications and continued pretraining. In\n    other words, we integrated Mistral 7B weights into the upscaled layers, and finally,\n    continued pre-training for the entire model. SOLAR-10.7B has remarkable performance.\n    It outperforms models with up to 30B parameters, even surpassing the recent Mixtral\n    8X7B model. For detailed information, please refer to the experimental table.\n    Solar 10.7B is an ideal choice for fine-tuning. SOLAR-10.7B offers robustness\n    and adaptability for your fine-tuning needs. Our simple instruction fine-tuning\n    using the SOLAR-10.7B pre-trained model yields significant performance improvements\n    (SOLAR-10.7B-Instruct-v1.0).\n  feedback: https://www.upstage.ai/solar-llm\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/upstage/SOLAR-10.7B-v1.0\n  monitoring: ''\n  name: SOLAR\n  nationality: South Korea\n  organization: Upstage.ai\n  prohibited_uses: ''\n  quality_control: ''\n  size: 10.7B parameters\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://arxiv.org/abs/2312.15166\n"
  },
  {
    "path": "assets/uw.yaml",
    "content": "- access: open\n  analysis: ''\n  created_date: 2022-01-07\n  datasheet: ''\n  dependencies:\n  - YouTube\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: MIT\n  modality: video\n  monitoring: ''\n  name: YT-Temporal-1B\n  nationality: USA\n  organization: University of Washington\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: 20M videos\n  type: dataset\n  url: https://arxiv.org/abs/2201.02639\n"
  },
  {
    "path": "assets/uwashington.yaml",
    "content": "- access: open\n  analysis: Reports results on the Vicuna benchmark and compares performance level\n    and time expenditure with ChatGPT\n  created_date: 2023-05-23\n  dependencies:\n  - QLoRA\n  - OASST1\n  description: Guanaco is a model family trained with QLORA, an efficient finetuning\n    approach that reduces memory usage enough to finetune a 65B parameter model on\n    a single 48GB GPU while preserving full 16-bit finetuning task performance.\n  feedback: ''\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: Guanaco\n  nationality: USA\n  organization: University of Washington\n  prohibited_uses: ''\n  quality_control: ''\n  size: 33B parameters (dense)\n  training_emissions: ''\n  training_hardware: A single 24 GB GPU\n  training_time: ''\n  type: model\n  url: https://arxiv.org/pdf/2305.14314v1.pdf\n- access: open\n  analysis: Evaluated on benchmark music understanding tasks on SOTA music datasets.\n  created_date: 2023-10-11\n  dependencies:\n  - LLaMA 2\n  - Jukebox\n  description: Llark is an instruction-tuned multimodal model for music understanding.\n  feedback: none\n  intended_uses: ''\n  license: Apache 2.0\n  modality: audio, text; text\n  model_card: none\n  monitoring: ''\n  name: Llark\n  nationality: USA\n  organization: University of Washington, Spotify\n  prohibited_uses: ''\n  quality_control: ''\n  size: 12B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 4 80GB NVIDIA A40 GPUs\n  training_time: 54 hours\n  type: model\n  url: https://arxiv.org/pdf/2310.07160.pdf\n"
  },
  {
    "path": "assets/vago.yaml",
    "content": "- access: open\n  analysis: Evaluated on standard benchmarks in comparison to other German language\n    models.\n  created_date: 2023-11-28\n  dependencies:\n  - OpenHermes 2.5 Mistral\n  - OpenOrca Mistral\n  description: SauerkrautLM is a German language model merged from two Mistral derivatives.\n  feedback: https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO\n  monitoring: unknown\n  name: SauerkrautLM\n  nationality: USA\n  organization: VAGO Solutions\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO\n"
  },
  {
    "path": "assets/viable.yaml",
    "content": "- access: limited\n  adaptation: unknown\n  created_date: unknown\n  dependencies:\n  - OpenAI API\n  description: 'Viable analyzes qualitative consumer feedback and provides summary\n    feedback to companies.\n\n    '\n  failures: unknown\n  feedback:\n    explanation: The feedback mechanism isn't explicitly listed. Company contact information\n      is provided in the Terms of Service [[Terms of Service]](https://www.askviable.com/terms-of-service).\n    value: unknown\n  intended_uses: 'Intended to be used by companies to digest qualitative consumer\n    feedback.\n\n    '\n  license: unknown\n  monitoring: unknown\n  monthly_active_users: unknown\n  name: Viable\n  nationality: USA\n  organization: Viable\n  output_space: Question and answer, summarization, sentiment analysis, topic identification\n  prohibited_uses: 'Prohibited uses are listed in the Terms of Service [[Terms of\n    Service]](https://www.askviable.com/terms-of-service). The terms don''t include\n    statements specific to the use of the content generated by the system or GPT-3.\n\n    '\n  quality_control: unknown\n  terms_of_service: https://www.askviable.com/terms-of-service\n  type: application\n  url: https://www.askviable.com/\n  user_distribution: unknown\n"
  },
  {
    "path": "assets/vilm.yaml",
    "content": "- access: open\n  analysis: none\n  created_date: 2023-10-02\n  dependencies:\n  - Falcon-180B\n  description: Vulture is a further fine-tuned causal Decoder-only LLM built by Virtual\n    Interactive (VILM) on top of Falcon.\n  feedback: https://huggingface.co/vilm/vulture-180b/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/vilm/vulture-180b\n  monitoring: unknown\n  name: Vulture\n  nationality: USA\n  organization: Virtual Interactive\n  prohibited_uses: Production use without adequate assessment of risks and mitigation;\n    any use cases which may be considered irresponsible or harmful.\n  quality_control: ''\n  size: 180B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: 3000 A100 hours\n  type: model\n  url: https://huggingface.co/vilm/vulture-180b\n"
  },
  {
    "path": "assets/wayve.yaml",
    "content": "- access: closed\n  analysis: Compared to self before being scaled on quality of video generation.\n  created_date: 2023-09-29\n  dependencies: []\n  description: \"GAIA-1 (\\u2018Generative AI for Autonomy\\u2019) is a generative world\\\n    \\ model that leverages video, text, and action inputs to generate realistic driving\\\n    \\ scenarios while offering fine-grained control over ego-vehicle behavior and\\\n    \\ scene features.\"\n  feedback: none\n  intended_uses: The main use cases are pure representation learning, planning (look-ahead\n    search), or learning a policy in the world model (neural simulator)\n  license: unknown\n  modality: text, video; video\n  model_card: none\n  monitoring: none\n  name: GAIA-1\n  nationality: UK\n  organization: Wayve\n  prohibited_uses: none\n  quality_control: none\n  size: 9B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 32 A100 80GB GPUs\n  training_time: 4 days\n  type: model\n  url: https://arxiv.org/pdf/2309.17080.pdf\n"
  },
  {
    "path": "assets/writer.yaml",
    "content": "- access: open\n  analysis: Evaluated on the SuperGLUE benchmark\n  created_date:\n    explanation: The model was stated to be published in January, but which day is\n      not specified on the website.\n    value: 2023-01-01\n  dependencies:\n  - Writer dataset\n  description: Palmyra is a family of privacy-first LLMs for enterprises trained on\n    business and marketing writing.\n  feedback: https://huggingface.co/Writer/palmyra-base/discussions\n  intended_uses: generating text from a prompt\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/Writer/palmyra-base\n  monitoring: ''\n  name: Palmyra\n  nationality: USA\n  organization: Writer\n  prohibited_uses: ''\n  quality_control: ''\n  size: 20B parameters (dense)\n  training_emissions: unknown\n  training_hardware: ''\n  training_time: unknown\n  type: model\n  url: https://gpt3demo.com/apps/palmyra\n- access: open\n  analysis: ''\n  created_date:\n    explanation: The model was stated to be published in April, but which day is not\n      specified on the website.\n    value: 2023-04-01\n  dependencies:\n  - Palmyra\n  - Camel dataset\n  description: Camel is an instruction-following large language model tailored for\n    advanced NLP and comprehension capabilities.\n  feedback: https://huggingface.co/Writer/camel-5b-hf/discussions\n  intended_uses: ''\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://huggingface.co/Writer/camel-5b-hf\n  monitoring: ''\n  name: Camel\n  nationality: USA\n  organization: Writer\n  prohibited_uses: ''\n  quality_control: ''\n  size: 5B parameters (dense)\n  training_emissions: unknown\n  training_hardware: ''\n  training_time: unknown\n  type: model\n  url: https://chatcamel.vercel.app/\n- access: open\n  analysis: The model was evaluated across 9 diverse biomedical datasets where it\n    achieved state-of-the-art results with an average score of 85.9%. 
It also demonstrated\n    robust capability in efficiently processing extensive medical documents, as showcased\n    by its near-perfect score in the NIH evaluation.\n  created_date: 2024-09-08\n  dependencies:\n  - Palmyra-X-004\n  description: Palmyra-Med-70b-32k is a Language Model designed specifically for healthcare\n    and biomedical applications. It builds upon the foundation of Palmyra-Med-70b\n    and offers an extended context length. This model integrates the DPO dataset,\n    a custom medical instruction dataset, and has been fine-tuned to meet the unique\n    requirements of the medical and life sciences sectors. It is ranked as the leading\n    LLM on biomedical benchmarks with an average score of 85.87%.\n  feedback: Downstream problems with this model should be reported via email to Hello@writer.com.\n  intended_uses: Palmyra-Med-70b-32k is intended for non-commercial and research use\n    in English. Specifically, it can be used for tasks like clinical entity recognition\n    and knowledge discovery from EHRs, research articles, and other biomedical sources.\n    It excels in analyzing and summarizing complex clinical notes, EHR data, and discharge\n    summaries.\n  license: Writer open model\n  modality: text; text\n  model_card: https://huggingface.co/Writer/Palmyra-Med-70B-32K\n  monitoring: Measures in place to monitor misuse include the addition of watermarks\n    in all models built by Writer.com to detect and prevent misuse and illegal use.\n  name: Palmyra-Med-70b-32k\n  nationality: USA\n  organization: Writer\n  prohibited_uses: The model should not be used in any manner that violates applicable\n    laws or regulations. It is not to be used in direct patient care, clinical decision\n    support, or professional medical purposes. The model should not replace professional\n    medical judgment.\n  quality_control: The model has been refined using Policy Optimization and a finely\n    crafted fine-tuning dataset. 
It contains watermarks to detect and prevent misuse\n    and illegal use.\n  size: 70B parameters\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://huggingface.co/Writer/Palmyra-Med-70B-32K\n- access: open\n  analysis: The model has been evaluated internally, showing state-of-the-art results\n    on various financial datasets. It has shown 100% accuracy in needle-in-haystack\n    tasks and superior performance in comparison to other models in the organization's\n    internal finance evaluations. It passed the CFA Level III test with a score of\n    73% and has shown superior performance compared to other models in the long-fin-eval,\n    an internally created benchmark that simulates real-world financial scenarios.\n  created_date: 2024-09-08\n  dependencies:\n  - Palmyra-X-004\n  - Writer in-house financial instruction dataset\n  description: Palmyra-Fin-70B-32K is a leading LLM built specifically to meet the\n    needs of the financial industry. It has been fine-tuned on an extensive collection\n    of high-quality financial data and it is highly adept at handling the specific\n    needs of the finance field. It outperforms other large language models in various\n    financial tasks and evaluations, achieving state-of-the-art results across various\n    financial datasets. Its strong performance in tasks like financial document analysis,\n    market trend prediction, risk assessment underscores its effective grasp of financial\n    knowledge.\n  feedback: Downstream problems with this model should be reported to Hello@writer.com.\n  intended_uses: The model is intended for use in English for financial analysis,\n    market trend prediction, risk assessment, financial report generation, automated\n    financial advice, and answering questions from long financial documents. 
It can\n    be used for entity recognition, identifying key financial concepts such as market\n    trends, economic indicators, and financial instruments from unstructured text.\n  license: Writer open model license\n  modality: text; text\n  model_card: https://huggingface.co/Writer/Palmyra-Fin-70B-32K\n  monitoring: Unknown\n  name: Palmyra-Fin-70B-32K\n  nationality: USA\n  organization: Writer\n  prohibited_uses: The model should not be used in manners that violate applicable\n    laws or regulations, including trade compliance laws, use prohibited by Writer's\n    acceptable use policy, the Writer open model license, and in languages other than\n    English. It is advised not to use the model for direct financial decision-making\n    or professional financial advice without human oversight. Always consult a qualified\n    financial professional for personal financial needs.\n  quality_control: The model was trained with a proprietary internal database and\n    a fine-tuning recipe to ensure a greater level of domain-specific accuracy and\n    fluency. Still, the model may contain inaccuracies, biases, or misalignments and\n    its usage for direct financial decision-making or professional financial advice\n    without human oversight is not recommended. It has not been rigorously evaluated\n    in real-world financial settings and it requires further testing, regulatory compliance,\n    bias mitigation, and human oversight for more critical financial applications.\n  size: 70B parameters (dense)\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://huggingface.co/Writer/Palmyra-Fin-70B-32K\n"
  },
  {
    "path": "assets/xai.yaml",
    "content": "- access: open\n  analysis: Grok-1 was evaluated on a range of reasoning benchmark tasks and on curated\n    foreign mathematic examination questions.\n  created_date: 2023-11-04\n  dependencies: []\n  description: \"Grok is an AI modeled after the Hitchhiker\\u2019s Guide to the Galaxy,\"\n  feedback: none\n  intended_uses: Grok-1 is intended to be used as the engine behind Grok for natural\n    language processing tasks including question answering, information retrieval,\n    creative writing and coding assistance.\n  license: Apache 2.0\n  modality: text; text\n  model_card: https://x.ai/model-card/\n  monitoring: unknown\n  name: Grok-1\n  nationality: USA\n  organization: xAI\n  prohibited_uses: none\n  quality_control: none\n  size: 314B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://grok.x.ai/\n- access: limited\n  analysis: The model is evaluated in a zero-shot setting without chain-of-thought\n    prompting. The evaluation domains include multi-disciplinary reasoning, understanding\n    documents, science diagrams, charts, screenshots, photographs and real-world spatial\n    understanding. The model shows competitive performance with existing frontier\n    multimodal models.\n  created_date: 2024-04-12\n  dependencies: []\n  description: Grok-1.5V is a first-generation multimodal model which can process\n    a wide variety of visual information, including documents, diagrams, charts, screenshots,\n    and photographs.\n  feedback: none\n  intended_uses: Grok-1.5V can be used for understanding documents, science diagrams,\n    charts, screenshots, photographs. 
It can also translate diagrams into Python code.\n  license: unknown\n  modality: image, text; text\n  model_card: none\n  monitoring: unknown\n  name: Grok-1.5V\n  nationality: USA\n  organization: xAI\n  prohibited_uses: unknown\n  quality_control: ''\n  size: unknown\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://x.ai/blog/grok-1.5v\n- access: limited\n  analysis: The Grok-2 models were evaluated across a series of academic benchmarks\n    that included reasoning, reading comprehension, math, science, and coding. They\n    showed significant improvements over the earlier model Grok-1.5 and achieved performance\n    levels competitive to other frontier models in areas such as graduate-level science\n    knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems\n    (MATH).\n  created_date: 2024-08-13\n  dependencies: []\n  description: Grok-2 is a state-of-the-art language model with advanced capabilities\n    in both text and vision understanding. It demonstrates significant improvements\n    in reasoning with retrieved content and tool use capabilities over its previous\n    Grok-1.5 model. It also excels in vision-based tasks and delivers high performance\n    in document-based question answering and visual math reasoning (MathVista). 
Grok-2\n    mini, a smaller version of Grok-2, is also introduced, offering a balance between\n    speed and answer quality.\n  feedback: Issues with the model should be reported to xAI.\n  intended_uses: The model is intended to be used for understanding text and vision,\n    answering questions, collaborating on writing, solving coding tasks, and enhancing\n    search capabilities.\n  license: Unknown\n  modality: text; text, vision\n  model_card: unknown\n  monitoring: Unknown\n  name: Grok-2\n  nationality: USA\n  organization: xAI\n  prohibited_uses: Unknown\n  quality_control: Grok-2 models were tested in real-world scenarios using AI tutors\n    that engaged with the models across a variety of tasks and selected the superior\n    response based on specific criteria outlined in the guidelines.\n  size: unknown\n  training_emissions: Unknown\n  training_hardware: Unknown\n  training_time: Unknown\n  type: model\n  url: https://x.ai/blog/grok-2\n"
  },
  {
    "path": "assets/xverse.yaml",
    "content": "- access: open\n  analysis: Evaluated across a range of standard datasets regarding multiple model\n    capabilities like language comprehension and logical reasoning.\n  created_date: 2023-11-06\n  dependencies: []\n  description: XVERSE is a multilingual large language model for over 40 languages.\n  feedback: https://huggingface.co/xverse/XVERSE-65B/discussions\n  intended_uses: ''\n  license:\n    explanation: The model license can be found at https://github.com/xverse-ai/XVERSE-65B/blob/main/MODEL_LICENSE.pdf.\n      The code is licensed under Apache 2.0.\n    value: custom\n  modality: text; text\n  model_card: https://huggingface.co/xverse/XVERSE-65B\n  monitoring: unknown\n  name: XVERSE\n  nationality: China\n  organization: Xverse\n  prohibited_uses: ''\n  quality_control: ''\n  size: 65B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://github.com/xverse-ai/XVERSE-65B\n"
  },
  {
    "path": "assets/xwin.yaml",
    "content": "- access: open\n  analysis: Evaluated on AlpacaEval benchmark against SOTA LLMs.\n  created_date: 2023-09-20\n  dependencies: []\n  description: Xwin-LM is a LLM, which on release, ranked top 1 on AlpacaEval, becoming\n    the first to surpass GPT-4 on this benchmark.\n  feedback: https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1/discussions\n  intended_uses: ''\n  license: LLaMA2\n  modality: text; text\n  model_card: https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1\n  monitoring: none\n  name: Xwin-LM\n  nationality: USA\n  organization: Xwin\n  prohibited_uses: ''\n  quality_control: ''\n  size: 70B parameters (dense)\n  training_emissions: unknown\n  training_hardware: unknown\n  training_time: unknown\n  type: model\n  url: https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1\n"
  },
  {
    "path": "assets/yandex.yaml",
    "content": "- access: open\n  analysis: ''\n  created_date:\n    explanation: The date the blog post about YaLM was published.\n    value: 2022-06-22\n  dependencies:\n  - The Pile\n  - Yandex Russian Pretraining Dataset\n  description: YaLM is a 100B parameter autoregressive model trained on 25% English\n    and 75% Russian text.\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: ''\n    value: Apache 2.0\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: YaLM\n  nationality: Russia\n  organization: Yandex\n  prohibited_uses: ''\n  quality_control: ''\n  size: 100B parameters (dense)\n  training_emissions: ''\n  training_hardware: Yandex 800 A100 Cluster\n  training_time: ''\n  type: model\n  url: https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6\n- access: open\n  adaptation: ''\n  created_date: 2022-06-23\n  dependencies:\n  - YaLM\n  description: Yandex is a search engine and web portal. Yandex offers internet search\n    and other services\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license:\n    explanation: '\"3.1. The Rightholder, on the basis of a simple (non-exclusive)\n      license, shall grant the User a free, worldwide, non-transferable right to use\n      the Software as set forth below:\n\n      3.1.1. To use the Software in accordance with its functionality by copying and\n      installing it on the User''s personal device(s). The User shall be entitled\n      to install the Software on any number of personal devices.\n\n      3.1.2. 
To reproduce and to distribute this Software in an unmodified form and\n      solely for non-commercial purposes (free of charge).\" - excerpt from the Terms\n      of Service document for browsers.\n\n      '\n    value: custom\n  monitoring: ''\n  monthly_active_users: ''\n  name: Yandex Search\n  nationality: Russia\n  organization: Yandex\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: https://yandex.com/legal/browser_agreement/\n  type: application\n  url: https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6\n  user_distribution: ''\n"
  },
  {
    "path": "assets/you.yaml",
    "content": "- access: closed\n  analysis: ''\n  created_date: unknown\n  datasheet: ''\n  dependencies: []\n  description: ''\n  excluded: ''\n  feedback: ''\n  included: ''\n  intended_uses: ''\n  license: unknown\n  modality: text\n  monitoring: ''\n  name: You dataset\n  nationality: USA\n  organization: You\n  prohibited_uses: ''\n  quality_control: ''\n  sample: []\n  size: unknown\n  type: dataset\n  url: https://you.com/\n- access: closed\n  analysis: ''\n  created_date: unknown\n  dependencies:\n  - You dataset\n  description: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  modality: text; text\n  model_card: ''\n  monitoring: ''\n  name: You model\n  nationality: USA\n  organization: You\n  prohibited_uses: ''\n  quality_control: ''\n  size: unknown\n  training_emissions: ''\n  training_hardware: ''\n  training_time: ''\n  type: model\n  url: https://you.com/\n- access: open\n  adaptation: ''\n  created_date: unknown\n  dependencies:\n  - You model\n  description: You.com is a search engine built on artificial intelligence that provides\n    users with a customized search experience while keeping their data 100% private.\n  failures: ''\n  feedback: ''\n  intended_uses: ''\n  license: unknown\n  monitoring: ''\n  monthly_active_users: ''\n  name: You Search\n  nationality: USA\n  organization: You\n  output_space: ''\n  prohibited_uses: ''\n  quality_control: ''\n  terms_of_service: ''\n  type: application\n  url: https://you.com/\n  user_distribution: ''\n"
  },
  {
    "path": "assets/zhejiang.yaml",
    "content": "- access: open\n  analysis: Evaluated on standard and ocean science benchmarks in comparison to other\n    similar-sized models.\n  created_date: 2024-02-06\n  dependencies: []\n  description: OceanGPT is the first-ever LLM in the ocean domain and displays expertise\n    in various ocean science tasks.\n  feedback: https://huggingface.co/zjunlp/OceanGPT-7b/discussions\n  intended_uses: ''\n  license: MIT\n  modality: text; text\n  model_card: https://huggingface.co/zjunlp/OceanGPT-7b\n  monitoring: unknown\n  name: OceanGPT\n  nationality: China\n  organization: Zhejiang University\n  prohibited_uses: ''\n  quality_control: ''\n  size: 7B parameters (dense)\n  training_emissions: unknown\n  training_hardware: 6 A800 NVIDIA GPUs\n  training_time: 7 days\n  type: model\n  url: https://www.zjukg.org/project/OceanGPT/\n"
  },
  {
    "path": "components/home.html",
    "content": "<div class=\"bg-white d-flex justify-content-center\">\n\n  <div class=\"container-fluid mx-5 mt-5 px-5 py-4\" style=\"width: 60%;\"> \n\n    <div class=\"row align-items-center\">\n      <div class=\"col-xs-6 center-block mx-3\">\n        <img height=\"130\" width=\"auto\" src=\"./img/icon.svg\" alt=\"Ecosystem Graphs Logo\">\n      </div>\n      <div class=\"col-xs-6\">\n        <h1 class=\"fs-1\">ecosystem</h1>\n        <h1 class=\"fs-1\">graphs</h1>\n      </div>\n    </div>\n\n    <div class=\"my-5 mx-3\">\n      <p class=\"lead\">\n        Foundation models define an emerging ecosystem of datasets, models, and\n        applications. As their footprint grows, so does their\n        <b>societal impact</b>.\n      </p>\n      <p class=\"lead\">\n        <b>Ecosystem graphs</b> is a framework to document the foundation models\n        ecosystem, namely both the assets (datasets, models, and applications)\n        and their relationships. Using it, one can answer questions such as:\n        What are the latest foundation models? Who builds them and where are\n        they used downstream? What are the general trends over time? We hope\n        that ecosystem graphs will be a useful resource for researchers,\n        application developers, policymakers, and the public to better\n        understand the foundation models ecosystem.\n      </p>\n    </div>\n\n    <div class=\"center-block text-center m-9\">\n      <a href=\"index.html?mode=table\" class=\"btn btn-lg btn-primary m-3 px-5\" role=\"button\">VIEW</a>\n      <a href=\"https://github.com/stanford-crfm/ecosystem-graphs\" class=\"btn btn-lg btn-primary m-3 px-5\" role=\"button\">CODE</a>\n      <button class=\"btn btn-lg btn-primary m-3 px-4\" role=\"button\" id=\"download-csv\" onclick=\"downloadCSV()\">DOWNLOAD</button>\n    </div>\n\n\n  </div>\n\n</div>\n"
  },
  {
    "path": "components/nav.html",
    "content": "<nav class=\"navbar sticky-top navbar-expand-sm navbar-light bg-white py-4\">\n\n    <a class=\"navbar-brand\" href=\"index.html?mode=home\">\n      <img src=\"./img/icon.svg\" width=\"30\" height=\"30\" class=\"d-inline-block align-top\" alt=\"Ecosystem Graphs Logo\">\n      ecosystem graphs\n    </a>\n\n    <button class=\"navbar-toggler\" type=\"button\" data-toggle=\"collapse\" data-target=\"#nav-content\" aria-controls=\"nav-content\" aria-expanded=\"false\" aria-label=\"Toggle navigation\">\n      <span class=\"navbar-toggler-icon\"></span>\n    </button>\n\n    <div class=\"collapse navbar-collapse\" id=\"nav-content\">\n      <ul class=\"navbar-nav ml-auto\">\n        <li class=\"nav-item mx-1\"><a class=\"nav-link\" href=\"index.html?mode=table\">TABLE</a></li>\n        <li class=\"nav-item mx-1\"><a class=\"nav-link\" href=\"index.html?mode=graph\">GRAPH</a></li>\n        <li class=\"nav-item\"><a class=\"nav-link\" href=\"https://github.com/stanford-crfm/ecosystem-graphs\">GITHUB</a></li>\n        <li class=\"nav-item mx-1\">\n          <a class=\"btn btn-primary\" href=\"https://forms.gle/VqnSsZhv62hJ5rP36\">CONTRIBUTE</a>\n        </li>\n      </ul>\n    </div>\n\n  </nav>\n"
  },
  {
    "path": "components/table.html",
    "content": "<div class=\"m-5\">\n\n  <div class=\"d-flex justify-content-between\">\n    <div class=\"col-xs-6\">\n      <input class=\"search-box m-3 px-3 py-2\" type=\"text\" id=\"table-search\" placeholder=\"Search table\">\n    </div>\n    <div class=\"col-xs-6\">\n      <button class=\"btn btn-md btn-secondary\" role=\"button\" id=\"download-csv\" onclick=\"downloadCSV()\">Download Assets CSV</button>\n    </div>\n  </div>\n\n  <div id=\"table-container\">\n\n  </div>\n</div>"
  },
  {
    "path": "css/main.css",
    "content": "\n/* Imports */\n@import url(\"https://fonts.googleapis.com/css?family=Hahmlet:400,300,700\");\n@import url(\"https://fonts.googleapis.com/css?family=Montserrat:400,300,700\");\n\n/* Style settings for our website */\n:root {\n  /* Colors */\n  --primary-color: #7CB342;\n  --primary-color-light: #DCEDC8;\n  --primary-color-dark: #4b6c27;\n  --primary-color-slightly-darker: #587f2f;\n  --secondary-color: #76462c;\n  --secondary-color-light: #d4bfb4;\n  --secondary-color-dark: #392215;\n  --light: #f8faf5;\n  --dark: #0d1207;\n  --white: #ffffff;\n  /* Fonts */\n  --title-font: 'Montserrat';\n  --body-font: 'Hahmlet';\n}\n\n/* Useful custom classes */\n.primary-color {\n  color: var(--primary-color) !important;\n}\n\n.search-box {\n  font-family: var(--title-font);\n  border: 0.1rem solid #ddd; /* Add a grey border */\n  border-radius: 0.2rem;\n  margin: 0rem;\n}\n\n.search-box:focus {\n  outline-style: solid;\n  outline-width: 0.1rem;\n  outline-color: var(--primary-color);\n  margin: 0rem;\n}\n\n/* Modifications to bootstrap defaults */\nbody {\n  font-family: var(--body-font) !important;\n  color: var(--dark) !important;\n}\n\nh1 {\n  font-family: var(--title-font) !important;\n}\n\na, a:hover {\n  color: var(--primary-color-slightly-darker)\n}\n\nnav {\n  font-family: var(--title-font) !important;\n}\n\n.table td {\n  padding: 0.4rem 0.75rem;\n}\n\n.table p {\n  margin: 0;\n}\n\n.btn {\n  font-family: var(--title-font) !important;\n}\n\n*:focus {\n  border-color: var(--primary-color-light) !important;\n  box-shadow: none !important;\n}\n\n.btn-primary, .btn-primary:hover, .btn-primary:active, .btn-primary:visited, .btn-primary:hover {\n  background-color: var(--primary-color) !important;\n  border-color: var(--primary-color) !important;\n}\n\n.btn-secondary, .btn-secondary:hover, .btn-secondary:active, .btn-secondary:visited, .btn-secondary:hover {\n  background-color: var(--white) !important;\n  border-color: var(--primary-color-dark) 
!important;\n  color: var(--primary-color-dark) !important;\n  border-width: 0.1rem;\n}\n\n/* Custom CSS styles */\n.block {\n  margin: 10px;\n}\n\n.dataset {\n  color: darkgreen;\n}\n\n.model {\n  color: darkred;\n}\n\n.application {\n  color: darkblue;\n}\n\n.help-icon {\n  margin-left: 3px;\n  margin-right: 3px;\n}\n\n.graph {\n  width: 100%;\n  height: 600px;\n  display: block;\n}\n\n.field-explanation {\n  color: gray;\n  font-size: smaller;\n  font-style: italic;\n}"
  },
  {
    "path": "index.html",
    "content": "<!DOCTYPE html>\n<html lang=\"en\">\n\t<head>\n    <!-- Metadata -->\n\t\t<title>Ecosystem Graphs for Foundation Models</title>\n    <link rel=\"icon\" href=\"img/icon.svg\">\n\t\t<meta charset=\"utf-8\">\n\t\t<meta name=\"viewport\" content=\"width=device-width, initial-scale=1, shrink-to-fit=no\">\n    <!-- Style -->\n\t\t<link rel=\"stylesheet\" href=\"https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta/css/bootstrap.min.css\" integrity=\"sha384-/Y6pD6FV/Vv2HJnA6t+vslU6fwYXjCFtcEpHbNJ0lyAFsXTsjBbfaDjzALeQsN6M\" crossorigin=\"anonymous\">\n    <link rel=\"stylesheet\" type=\"text/css\" href=\"css/main.css\">\n    <!-- Scripts -->\n\t\t<script src=\"https://code.jquery.com/jquery-3.2.1.min.js\"></script>\n\t\t<script src=\"https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.11.0/umd/popper.min.js\" integrity=\"sha384-b/U6ypiBEHpOf/4+1nzFpr53nxSS+GLCkfwBdFNTxtclqqenISfwAzpKaMNFNmj4\" crossorigin=\"anonymous\"></script>\n\t\t<script src=\"https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta/js/bootstrap.min.js\" integrity=\"sha384-h0AbiXch4ZDo7tp9hKZ4TsHbi047NrKGLO3SEJAg45jXxnGIfYzk4Si90RDIqNm1\" crossorigin=\"anonymous\"></script>\n    <script src=\"https://cdnjs.cloudflare.com/ajax/libs/jquery.tablesorter/2.29.0/js/jquery.tablesorter.min.js\"></script>\n    <script src=\"https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.1.0/js-yaml.min.js\"></script>\n    <script src=\"https://cdnjs.cloudflare.com/ajax/libs/showdown/2.0.3/showdown.min.js\"></script>\n    <script src=\"https://cdnjs.cloudflare.com/ajax/libs/cytoscape/3.21.0/cytoscape.min.js\"></script>\n    <script src=\"js/general.js\"></script>\n    <script src=\"js/main.js\"></script>\n\t</head>\n\n\t<body>\n\t\t<div class=\"container-fluid\">\n\n      <!-- Navigation bar -->\n      <div id=\"nav-placeholder\">\n        <script> renderNavBar(); </script>\n      </div>\n\n      <!-- Page content -->\n      <div class=\"row\">\n        <div class=\"col-sm-12\" id=\"main\">\n          <script> 
loadAssetsAndRenderPageContent(); </script>\n        </div>\n      </div>\n\n\t\t</div>\n\t</body>\n</html>\n"
  },
  {
    "path": "js/general.js",
    "content": "// Serialize a params object into a URL query string, e.g. {mode: 'table'} => '?mode=table'.\nfunction encodeUrlParams(params) {\n  let s = '';\n  for (let k in params)\n    s += (s === '' ? '?' : '&') + k + '=' + encodeURIComponent(params[k]);\n  return s;\n}\n\n// Parse a query string such as '?mode=table' (e.g. window.location.search) into a params object.\nfunction decodeUrlParams(str) {\n  const params = {};\n  if (str === '')\n    return params;\n  const items = str.substring(1).split(/&/);\n  for (let i = 0; i < items.length; i++) {\n    const pair = items[i].split(/=/);\n    params[pair[0]] = decodeURIComponent(pair[1]);\n  }\n  return params;\n}\n\nfunction updateBrowserLocation(params) {\n  // Update the address bar without reloading the page\n  window.history.pushState({}, '', window.location.pathname + encodeUrlParams(params));\n}\n\nfunction openBrowserLocation(params) {\n  // Open the location in a separate browser window or tab\n  window.open(window.location.pathname + encodeUrlParams(params), 'blank_');\n}\n\n// Convert newlines to <br> tags so multi-line text renders as HTML.\nfunction multilineHtml(s) {\n  return s.replace(/\\n/g, '<br>');\n}\n\nfunction renderError(e) {\n  return $('<div>').addClass('alert alert-danger').append(multilineHtml(e));\n}\n\n// Return obj[key], or log an error and return null if the key is missing.\nfunction getField(obj, key) {\n  if (key in obj) {\n    return obj[key];\n  }\n  console.error('Missing key', key, 'in', obj);\n  return null;\n}\n"
  },
  {
    "path": "js/main.js",
    "content": "class Field {\n  constructor(raw) {\n    this.name = getField(raw, 'name');\n    this.description = getField(raw, 'description');\n    this.type = raw.type;\n  }\n}\n\nclass Schema {\n  constructor(name, raw) {\n    this.name = name;\n    this.fields = raw.map((field) => new Field(field));\n  }\n\n  hasField(name) {\n    return this.fields.some((field) => field.name === name);\n  }\n}\n\n/**\n * AssetField represents a field of an Asset. Each AssetField has a value and\n * an optional explanation for the value.\n */\nclass AssetField {\n  constructor(value, explanation) {\n    this.value = value;\n    this.explanation = explanation;\n  }\n }\n\n/**\n * An Asset in the ecosystem.\n */\nclass Asset {\n\n  constructor(item, schema) {\n    // Set the parameters\n    this.schema = schema;\n    this.type = getField(item, 'type');\n    // This field is an object matching field_name => AssetField\n    this.fields = {};\n\n    // Loop through the schema to populate the asset fields\n    schema.fields.forEach((schemaField) => {\n\n      // The asset fields we will populate\n      let value = null, explanation = null;\n\n      // We expect each assetField to be an object with a \"value\" and an\n      // \"explanation\" key. In the absence of these keys, we read the object\n      // directly (e.g. arrays or dates)\n      const schemaFieldValue = getField(item, schemaField.name);\n      const isObject = typeof schemaFieldValue === 'object';\n      const hasValue = isObject && 'value' in schemaFieldValue;\n      const hasExplanation = isObject && 'explanation' in schemaFieldValue;\n      if (hasValue && hasExplanation) {\n        value = getField(schemaFieldValue, 'value');\n        explanation = schemaFieldValue.explanation;\n      } else if (hasValue || hasExplanation) {\n        console.error(\n          'Error in schemaField', schemaField, 'Don\\'t use value and explanation fields without one another. 
Couldn\\'t read the asset, fix.'\n        );\n      } else {\n        value = schemaFieldValue;\n      }\n\n      // Once value is extracted, we perform type checking.\n      if (schemaField.type === 'list') {\n        if (!(value instanceof Array)) {\n          console.error('Expected list for', schemaField.name, 'but got', value);\n        }\n      } else {\n        if (!['string', 'number', 'boolean'].includes(typeof(value)) && !(value instanceof Date)) {\n          console.error('Expected string, number, boolean, or date for', schemaField.name, 'but got', value);\n        }\n      }\n\n      this.fields[schemaField.name] = new AssetField(value, explanation);\n    });\n\n    // Print warnings about any extraneous fields\n    for (let key in item) {\n      if (key != 'type' && !schema.hasField(key)) {\n        console.error('Extra key', key, 'in', item);\n      }\n    }\n\n    // To be filled out later\n    this.downstreamAssets = [];\n  }\n}\n\nfunction updateDownstreamAssets(nameToAsset) {\n  // Use each asset's dependencies (upstream pointers) to update the corresponding downstream pointers.\n  Object.values(nameToAsset).forEach((asset) => {\n    asset.fields.dependencies.value.forEach((dep) => {\n      if (!(dep in nameToAsset)) {\n        console.error('The node ', dep, 'does not exist in the graph.');\n      }\n      const depAsset = nameToAsset[dep];\n      if (depAsset) {\n        depAsset.downstreamAssets.push(asset.fields.name.value);\n      }\n    });\n  });\n}\n\n////////////////////////////////////////////////////////////\n\nfunction getStandardSize(value) {\n  const thousand = 1000;\n  const dataSizeDict = {'B': 0, 'KB': 1, 'MB': 2, 'GB': 3, 'TB': 4, 'PB': 5};\n  const modelSizeDict = {'M': 2, 'B': 3, 'T': 4};\n  if (value.includes('parameters')) {\n    // e.g. '70B parameters (dense)': split the leading token into number and unit\n    var size = value.split(' ')[0];\n    const unit = size.slice(-1);\n    const exp = modelSizeDict[unit];\n    size = size.substring(0, size.length - 1);\n    // Scale the numeric part by the unit so that e.g. 5B sorts below 70B\n    value = parseFloat(size) * Math.pow(thousand, exp);\n  } else 
{\n    const arr = value.split(' ');\n    const num = parseFloat(arr[0]);\n    const unit = arr[2];\n    const exp = dataSizeDict[unit];\n    // Scale the numeric part by the unit instead of comparing units alone\n    value = num * Math.pow(thousand, exp);\n  }\n  return value;\n}\n\nfunction compareValues(valueA, valueB, columnName) {\n  // Filter for null, empty and unknown values\n  const specialValues = [null, \"\", \"unknown\", \"n/a\"];\n  const valueAIndex = specialValues.indexOf(valueA);\n  const valueBIndex = specialValues.indexOf(valueB);\n  const bothNotFound = valueAIndex === -1 && valueBIndex === -1;\n  if (!bothNotFound) {\n    return valueBIndex - valueAIndex;\n  }\n\n  // Standardize the value\n  if (columnName === \"Created date\") {\n    valueA = Date.parse(valueA);\n    valueB = Date.parse(valueB);\n  } else if (columnName === \"Size\") {\n    valueA = getStandardSize(valueA);\n    valueB = getStandardSize(valueB);\n  } \n  // @TODO Decide how to sort the \"dependencies\" column\n  \n  // Compare the values\n  if (valueA > valueB) {\n    return 1;\n  } else if (valueA < valueB) {\n    return -1;\n  } else {\n    return 0;\n  }\n}\n\n// const $search = $('<input>', {type: 'text', size: 40, placeholder: 'Enter regex query (enter to open all)'});\n// $search.keyup((e) => {\n//   // Open up all match specs\n//   if (e.keyCode === 13) {\n//     const href = encodeUrlParams(Object.assign(urlParams, {runSpec: '.*' + query + '.*'}));\n//     window.open(href);\n//   }\n//   query = $search.val();\n//   renderTable();\n// });\n\nfunction filterTable(query) {\n\n  // Get rows\n  const rows = $('tr').slice(1); // Skip the header row\n\n  query = query.toLowerCase();\n\n  if (query.includes('=')) {\n    // Filter columns\n    // @TODO placeholder comment\n    let a = 2;\n  } else {\n    // Filter all values\n    [].forEach.call(rows, function (row) {\n      const fieldValue = $(row).find(\"td .field-value\");\n      const values = $(fieldValue).children().not(\":last\");\n\n      let found = false;\n      [].forEach.call(values, function 
(value) {\n        const innerHTML = value.innerHTML.toLowerCase();\n        if (innerHTML.includes(query)) {\n          found = true;\n        }\n      });\n\n      if (found) {\n        $(row).show();\n      } else {\n        $(row).hide();\n      }\n\n    });\n  }\n}\n\nfunction sortColumn(columnName) {\n  // Get the index for the column\n  const headerRow = $('tr').slice(0)[0];\n  const index = headerRow.innerText.split('\\t').indexOf(columnName);\n\n  // Get the current direction\n  const direction = globalThis.tableDirections[index] || 'desc';\n\n  // A factor based on the direction\n  const multiplier = (direction === 'asc') ? 1 : -1;\n\n  // Get rows\n  const rows = $('tr').slice(1); // Skip the header row\n\n  // Sort rows\n  rows.sort((rowA, rowB) => {\n    const fvA = $(rowA).find(\"td .field-value\")[index];\n    const fvB = $(rowB).find(\"td .field-value\")[index];\n    const valueA = $(fvA).children()[0].innerHTML;\n    const valueB = $(fvB).children()[0].innerHTML;\n    return multiplier * compareValues(valueA, valueB, columnName);\n  });\n\n  // Create a new tbody\n  const newTBody = $('<tbody>');\n\n  // // // Append new rows\n  [].forEach.call(rows, function (row) {\n    newTBody.append(row);\n  });\n\n  // Replace the table body with the newly created table body\n  const tbody = $('tbody');\n  tbody.replaceWith(newTBody);\n\n  // Reverse the direction\n  globalThis.tableDirections[index] = direction === 'asc' ? 
'desc' : 'asc';\n}\n\n////////////////////////////////////////////////////////////\n\nfunction helpIcon(help, link) {\n  // Show a ?\n  return $('<a>', {href: link, target: 'blank_', class: 'help-icon'}).append($('<img>', {src: 'img/info-icon.png', width: 15, title: help}));\n}\n\nfunction renderList(items) {\n  const $list = $('<span>');\n  items.forEach((item, i) => {\n    if (i > 0) {\n      $list.append(' | ');\n    }\n    $list.append(item);\n  });\n  return $list;\n}\n\nfunction renderAccessType(value) {\n  const valueToColor = {\n    'open': 'lightgreen', // '#c0eec0',  // Slightly lighter than lightgreen\n    'limited': 'yellow', // 'papayawhip',\n    'closed': 'lightgray', // '#f0b0b0'  // Slightly lighter than lightcoral\n  }\n  const color = value in valueToColor ? valueToColor[value] : 'mistyrose';\n  const textElement = $('<span class=\"btn\">').css({\"background-color\": color, \"min-width\": 100}).append(value);\n  return textElement;\n}\n\nfunction renderField(schemaField) {\n  const text = schemaField.name.replace(/_/g, ' ');\n  return $('<div>').append(text).append(helpIcon(schemaField.description, '#'));\n}\n\nfunction renderValueExplanation(type, value, explanation) {\n  const converter = new showdown.Converter();\n  // Render value\n  let renderedValue = $('<div>').append(value);\n  if (value === 'unknown' || value === 'none') {\n    renderedValue = converter.makeHtml(value);\n  } else if (value instanceof Date) {\n    let dateString = value.toLocaleDateString('en-us', {year:\"numeric\", month:\"short\", day:\"numeric\"});\n    renderedValue = converter.makeHtml(dateString);\n  } else if (type === 'list') {\n    renderedValue = renderList(value.map((elemValue) => renderValueExplanation(null, elemValue, null)));\n  } else if (type === 'url') {\n    renderedValue = $('<a>', {href: value, target: 'blank_'}).append(value);\n  } else if (type === 'access_type') {\n    renderedValue = renderAccessType(value);\n  } else if (typeof(value) === 'string') 
{\n    renderedValue = converter.makeHtml(value);\n  }\n  // Wrap the value in a custom element\n  const fieldValue = $('<div>', {class: 'field-value'}).append(renderedValue);\n\n  // Render explanation, if provided\n  let renderedExplanation = converter.makeHtml(explanation);\n  const fieldExplanation = $('<div>', {class: 'field-explanation'}).append(renderedExplanation);\n  return $('<div>').append(fieldValue)\n                    .append(fieldExplanation);\n}\n\nfunction renderAssetLink(nameToAsset, assetName) {\n  const asset = getField(nameToAsset, assetName);\n  if (!asset) {\n    return assetName;\n  }\n  const href = encodeUrlParams({asset: asset.fields.name.value});\n  return $('<a>', {href, target: 'blank_'}).append(assetName);\n}\n\nfunction renderAssetLinks(nameToAsset, assetNames) {\n  return renderList(assetNames.map((name) => renderAssetLink(nameToAsset, name)));\n}\n\nfunction renderAsset(nameToAsset, assetName) {\n  const asset = getField(nameToAsset, assetName);\n  if (!asset) {\n    return renderError('Invalid asset: ' + assetName);\n  }\n\n  const $card = $('<div>');\n\n  $card.append($('<h3>').append(asset.fields.name.value));\n\n  // Render upstream and downstream assets\n  $card.append($('<div>', {class: 'block'}).append('Upstream: ').append(renderAssetLinks(nameToAsset, asset.fields.dependencies.value)));\n  $card.append($('<div>', {class: 'block'}).append('Downstream: ').append(renderAssetLinks(nameToAsset, asset.downstreamAssets)));\n\n  // Render a single asset\n  const $table = $('<table>', {class: 'table'});\n  const $tbody = $('<tbody>');\n  asset.schema.fields.forEach((schemaField) => {\n    const value = asset.fields[schemaField.name].value;\n    const explanation = asset.fields[schemaField.name].explanation;\n\n    $tbody.append($('<tr>')\n      .append($('<td>').append(renderField(schemaField)))\n      .append($('<td>').append(schemaField.name === 'dependencies' ? 
renderAssetLinks(nameToAsset, value) : renderValueExplanation(schemaField.type, value, explanation)))\n    );\n  });\n\n  $table.append($tbody);\n  $card.append($table);\n\n  return $card;\n}\n\nfunction renderFieldName(fieldName) {\n  // Capitalizes a fieldName and replaces every '_' with a space (fieldName\n  // should be a field name from the schema, such as created_date, name, etc.)\n  const capitalized = fieldName.charAt(0).toUpperCase() + fieldName.slice(1);\n  if (capitalized === 'Modality') {\n    return 'Modality (In; Out)';\n  }\n  return capitalized.replace(/_/g, ' ');\n}\n\n/**\n * Renders a table given the column properties.\n * @param {Array.<Asset>} selectedAssets - Array of the assets that will be\n *   rendered in the custom table.\n * @param {Object.<string, Asset>} allNameToAsset - Object mapping the names of\n *   all the assets in the ecosystem to their Asset representation.\n * @param {Array.<string>} columnNames - Columns that will be included in the\n *   table.\n */\nfunction renderCustomTable(selectedAssets, allNameToAsset, columnNames) {\n  const $table = $('<table>', {class: 'table'});\n  const $thead = $('<thead>');\n  const $headRow = $('<tr>');\n  // Add column names\n  columnNames.forEach((columnName, index) => {\n    const renderedName = renderFieldName(columnName);\n    const onclickString = 'sortColumn(\\'' + renderedName + '\\')';\n    $headRow.append($('<th>', {onClick: onclickString}).append(renderedName));\n  });\n  $thead.append($headRow);\n  $table.append($thead);\n  // Keep track of the directions, used to sort asc and desc\n  globalThis.tableDirections = Array.from(columnNames).map(function (header) {\n    return '';\n  });\n  // Add body\n  const $tbody = $('<tbody>');\n  selectedAssets.forEach((asset) => {\n    const $bodyRow = $('<tr>');\n    columnNames.forEach((columnName) => {\n      // Set the default value\n      let tdValue = 'n/a';\n\n      // Render the field value\n      if (columnName === 'type') {\n        tdValue = 
renderValueExplanation('', asset.type, null);\n      } else if (columnName === 'name') {\n        const href = encodeUrlParams({asset: asset.fields.name.value});\n        const fieldValue = $('<a>', {href, target: 'blank_'}).append(asset.fields.name.value);\n        tdValue = $('<div>', {class: 'field-value'}).append(fieldValue);\n      } else if (columnName === 'dependencies') {\n        const fieldValue = renderAssetLinks(allNameToAsset, asset.fields.dependencies.value);\n        tdValue = $('<div>', {class: 'field-value'}).append(fieldValue);\n      } else {\n        //\n        let type = '';\n        asset.schema.fields.forEach(item => item.name === columnName ? type = item.type : '');\n        const value = columnName in asset.fields ? asset.fields[columnName].value : tdValue;\n        const explanation = columnName in asset.fields ? asset.fields[columnName].explanation : tdValue;\n        tdValue = renderValueExplanation(type, value, explanation);\n      }\n      $bodyRow.append($('<td>').append(tdValue));\n    });\n    $tbody.append($bodyRow);\n  });\n  $table.append($tbody);\n  return $table;\n}\n\nfunction renderAssetsTable(nameToAsset) {\n  // Render the table\n  const columnNames = [\n    'type', 'name', 'organization', 'created_date', 'size', 'modality', 'access', 'license',\n    'dependencies',\n  ];\n  const assets = Object.keys(nameToAsset).map((key) => (nameToAsset[key]));\n  const $table = renderCustomTable(assets, nameToAsset, columnNames);\n\n  return $table;\n}\n\nfunction renderAssetsGraph(nameToAsset) {\n  // Render the ecosystem graph\n  const $graph = $('<div>', {class: 'graph'});\n\n  const nodes = [];\n  const edges = [];\n\n  const typeToShape = {\n    'dataset': 'ellipse',\n    'model': 'square',\n    'application': 'hexagon',\n    'undocumented': 'ellipse',\n  };\n\n  const typeToColor = {\n    'dataset': 'orange',\n    'model': 'dodgerblue',\n    'application': 'firebrick',\n    'undocumented': 'gray',\n  };\n\n  
Object.values(nameToAsset).forEach((asset) => {\n    nodes.push({\n      data: {\n        id: asset.fields.name.value,\n        shape: typeToShape[asset.type],\n        color: typeToColor[asset.type],\n      },\n    });\n\n    asset.fields.dependencies.value.forEach((dep) => {\n      edges.push({\n        data: {\n          id: asset.fields.name.value + '->' + dep,\n          source: dep,\n          target: asset.fields.name.value,\n        },\n      });\n    });\n  });\n\n  // Declared with const so it does not leak as an implicit global\n  const undocumentedDeps = Array.from(new Set(\n    edges.map(edge => edge.data.source)\n         .filter(dep => !(dep in nameToAsset))\n  )).sort();\n\n  console.log(\"The following dependencies are undocumented.\");\n  console.log(undocumentedDeps);\n\n  undocumentedDeps.forEach(dep => {\n    nodes.push({\n      data: {\n        id: dep,\n        shape: typeToShape[\"undocumented\"],\n        color: typeToColor[\"undocumented\"],\n      },\n    });\n  });\n\n  $graph.ready(() => {\n    const cy = cytoscape({\n      container: $graph.get(0),\n      elements: {nodes, edges},\n      layout: {\n        name: 'cose',\n        randomize: false,\n        componentSpacing: 100,\n        nodeOverlap: 10,\n        //nodeDimensionsIncludeLabels: true,\n        nodeRepulsion: function( node ){ return 4096; },\n        //padding: 30,\n        gravity: 0.5\n      },\n      style: [\n        {\n          selector: 'node',\n          style: {\n            label: 'data(id)',\n            shape: 'data(shape)',\n            'background-color': 'data(color)',\n            'text-wrap': 'wrap',\n            'text-max-width': 30,\n            'text-valign': 'center',\n            'color': 'white',\n            'text-outline-width': 3,\n            'text-outline-color': 'data(color)',\n            'text-outline-opacity': 1,\n            'padding': 40,\n          },\n        },\n        {\n          selector: 'edge',\n          style: {\n            'width': 3,\n            'curve-style': 'straight',\n            
'target-arrow-shape': 'triangle',\n            'target-arrow-color': '#7CB342',\n            'line-fill': 'linear-gradient',\n            'line-gradient-stop-colors': '#392215 #7CB342',\n            'target-endpoint': 'outside-to-node-or-label',\n            'line-cap': 'square'\n          },\n        },\n      ],\n    });\n\n    cy.on('click', (e) => {\n      const data = e.target._private.data;\n      const assetName = data.id;\n      if (assetName && !data.source) {\n        openBrowserLocation({asset: assetName});\n      }\n    });\n  });\n\n  return $graph;\n}\n\n////////////////////////////////////////////////////////////\n\n// UI Helpers\n\nfunction downloadCSV() {\n  var downloadLink = document.createElement('a');\n  downloadLink.href = 'resources/all_assets.csv';\n  downloadLink.download = 'assets.csv';\n  document.body.appendChild(downloadLink);\n  downloadLink.click();\n  document.body.removeChild(downloadLink);\n}\n\nfunction setUpSearch() {\n  const $search = $('#table-search');\n  let query = '';\n  $search.keyup((e) => {\n    if (e.keyCode === 13) {\n      // Enter @TODO\n    }\n    query = $search.val();\n    filterTable(query);\n  });\n}\n\n////////////////////////////////////////////////////////////\n\n// Home Page\nfunction renderHomePage(pageContainer) {\n  $(\"nav\").hide();\n  $.get(\"components/home.html\", function(data){\n    pageContainer.append(data);\n  });\n}\n\n// Table Page\nfunction renderTablePage(pageContainer, nameToAsset) {\n  $.get(\"components/table.html\", function(data){\n    pageContainer.append(data);\n    const tableContainer = $(\"#table-container\");\n    setUpSearch();\n    const table = renderAssetsTable(nameToAsset);\n    tableContainer.append(table);\n    $(\".field-explanation\").toggle();\n    // Sort by created_date, name\n    sortColumn(\"Name\");\n    sortColumn(\"Created date\");\n  });\n\n}\n\n// Graph Page\nfunction renderGraphPage(pageContainer, nameToAsset) {\n  const graph = 
renderAssetsGraph(nameToAsset);\n  pageContainer.append(graph);\n}\n\nfunction renderPageContent(nameToAsset) {\n  const urlParams = decodeUrlParams(window.location.search);\n  const pageContainer = $('#main');\n  const mode = urlParams.mode || 'home';\n  if (urlParams.asset) {\n    const content = renderAsset(nameToAsset, urlParams.asset);\n    pageContainer.append(content);\n  } else if (mode === 'home') {\n    renderHomePage(pageContainer);\n  } else if (mode === 'table') {\n    renderTablePage(pageContainer, nameToAsset);\n  } else if (mode === 'graph') {\n    renderGraphPage(pageContainer, nameToAsset);\n  } else {\n    const content = renderError('Unrecognized mode: ' + mode + '.');\n    pageContainer.append(content);\n  }\n}\n\nfunction renderNavBar() {\n  $.get(\"components/nav.html\", function(data){\n    $(\"#nav-placeholder\").replaceWith(data);\n  });\n}\n\nfunction loadAssetsAndRenderPageContent() {\n\n  const paths = [\n    'assets/zhejiang.yaml',\n    'assets/you.yaml',\n    'assets/yandex.yaml',\n    'assets/xwin.yaml',\n    'assets/xverse.yaml',\n    'assets/xai.yaml',\n    'assets/writer.yaml',\n    'assets/wayve.yaml',\n    'assets/vilm.yaml',\n    'assets/viable.yaml',\n    'assets/vago.yaml',\n    'assets/uwashington.yaml',\n    'assets/uw.yaml',\n    'assets/unknown.yaml',\n    'assets/uae.yaml',\n    'assets/twelvelabs.yaml',\n    'assets/tsinghua.yaml',\n    'assets/triml.yaml',\n    'assets/trevor.yaml',\n    'assets/transformify.yaml',\n    'assets/toronto.yaml',\n    'assets/tokyo.yaml',\n    'assets/together.yaml',\n    'assets/tiger.yaml',\n    'assets/suno.yaml',\n    'assets/stonybrook.yaml',\n    'assets/stanford.yaml',\n    'assets/stability.yaml',\n    'assets/stability_ai.yaml',\n    'assets/spotify.yaml',\n    'assets/speak.yaml',\n    'assets/soochow.yaml',\n    'assets/snap.yaml',\n    'assets/skt.yaml',\n    'assets/singapore.yaml',\n    'assets/shop.yaml',\n    'assets/shanghai.yaml',\n    'assets/sciphi.yaml',\n    
'assets/sana.yaml',\n    'assets/samba.yaml',\n    'assets/salesforce.yaml',\n    'assets/rwkv.yaml',\n    'assets/runway_ai,_inc..yaml',\n    'assets/robin.yaml',\n    'assets/reka.yaml',\n    'assets/reexpress.yaml',\n    'assets/rakuten.yaml',\n    'assets/qwen_team.yaml',\n    'assets/quora.yaml',\n    'assets/quizlet.yaml',\n    'assets/portkey.yaml',\n    'assets/pleias.yaml',\n    'assets/perplexity.yaml',\n    'assets/peking.yaml',\n    'assets/paladin.yaml',\n    'assets/oxford.yaml',\n    'assets/othersideai.yaml',\n    'assets/osu.yaml',\n    'assets/orion.yaml',\n    'assets/openx.yaml',\n    'assets/openlemur.yaml',\n    'assets/openbmb.yaml',\n    'assets/openai.yaml',\n    'assets/ollama.yaml',\n    'assets/oasst.yaml',\n    'assets/nvidia.yaml',\n    'assets/nucleus.yaml',\n    'assets/nous.yaml',\n    'assets/notion.yaml',\n    'assets/nolano.yaml',\n    'assets/nextdoor.yaml',\n    'assets/neeva.yaml',\n    'assets/ncsoft.yaml',\n    'assets/naver.yaml',\n    'assets/nanyang.yaml',\n    'assets/mosaic.yaml',\n    'assets/moreh.yaml',\n    'assets/moonhub.yaml',\n    'assets/mistral.yaml',\n    'assets/mila.yaml',\n    'assets/microsoft.yaml',\n    'assets/meta.yaml',\n    'assets/maya.yaml',\n    'assets/mathai.yaml',\n    'assets/lmsys.yaml',\n    'assets/llm360.yaml',\n    'assets/linkedin.yaml',\n    'assets/lg.yaml',\n    'assets/lg_ai_research.yaml',\n    'assets/lehigh.yaml',\n    'assets/latitude.yaml',\n    'assets/laion.yaml',\n    'assets/laion_ev.yaml',\n    'assets/kunlun.yaml',\n    'assets/ktai.yaml',\n    'assets/kotoba.yaml',\n    'assets/konan.yaml',\n    'assets/khan.yaml',\n    'assets/kakaobrain.yaml',\n    'assets/kaist.yaml',\n    'assets/juni.yaml',\n    'assets/internlm.yaml',\n    'assets/instacart.yaml',\n    'assets/inflection.yaml',\n    'assets/ideogram.yaml',\n    'assets/idea.yaml',\n    'assets/ibm.yaml',\n    'assets/huggingface.yaml',\n    'assets/hubspot.yaml',\n    'assets/h2oai.yaml',\n    
'assets/greenbit.yaml',\n    'assets/google.yaml',\n    'assets/glm.yaml',\n    'assets/genmo.yaml',\n    'assets/fuse.yaml',\n    'assets/faraday.yaml',\n    'assets/evolutionaryscale.yaml',\n    'assets/epfl.yaml',\n    'assets/ens.yaml',\n    'assets/eleutherai.yaml',\n    'assets/duolingo.yaml',\n    'assets/duckduckgo.yaml',\n    'assets/dibt.yaml',\n    'assets/deepseek.yaml',\n    'assets/deepnight.yaml',\n    'assets/deepmind.yaml',\n    'assets/deci.yaml',\n    'assets/databricks.yaml',\n    'assets/cresta.yaml',\n    'assets/continue.yaml',\n    'assets/compute_01_research.yaml',\n    'assets/columbia.yaml',\n    'assets/cohere.yaml',\n    'assets/cognitive.yaml',\n    'assets/cognition.yaml',\n    'assets/cmu.yaml',\n    'assets/chatglm.yaml',\n    'assets/character.yaml',\n    'assets/cerebras.yaml',\n    'assets/causallm.yaml',\n    'assets/casia.yaml',\n    'assets/cartesia.yaml',\n    'assets/cagliostro.yaml',\n    'assets/bytedance.yaml',\n    'assets/brex.yaml',\n    'assets/boston.yaml',\n    'assets/bloomberg.yaml',\n    'assets/bigscience.yaml',\n    'assets/bigcode.yaml',\n    'assets/berkeley.yaml',\n    'assets/beitech.yaml',\n    'assets/be_my_eyes.yaml',\n    'assets/bain.yaml',\n    'assets/baidu.yaml',\n    'assets/baichuan.yaml',\n    'assets/baai.yaml',\n    'assets/avignon.yaml',\n    'assets/autogpt.yaml',\n    'assets/assembly.yaml',\n    'assets/aspia_space_institu.yaml',\n    'assets/argonne.yaml',\n    'assets/argilla.yaml',\n    'assets/apple.yaml',\n    'assets/anthropic.yaml',\n    'assets/amazon.yaml',\n    'assets/alibaba.yaml',\n    'assets/aleph_alpha.yaml',\n    'assets/ai21.yaml',\n    'assets/ai2.yaml',\n    'assets/adobe.yaml',\n    'assets/adept.yaml',\n    'assets/360.yaml',\n    'assets/01ai.yaml'\n  ];\n\n  $.get('js/schemas.yaml', {}, (response) => {\n    // First read the schema...\n    const typeToSchema = {};  // asset type (e.g., \"model\") => schema\n    const raw = jsyaml.load(response);\n    
console.log('Read schemas', raw);\n    for (const name in raw) {\n      typeToSchema[name] = new Schema(name, raw[name]);\n    }\n\n    // Then read all the assets in parallel\n    const nameToAsset = {};  // asset name (e.g., \"GPT-3\") => asset\n    $.when(\n      ...paths.map((path) => {\n        return $.get(path, {}, (response) => {\n          const raw = jsyaml.load(response);\n          console.log('Read assets', path, raw);\n          raw.forEach((item) => {\n            nameToAsset[item.name] = new Asset(item, getField(typeToSchema, item.type));\n          });\n        })\n      })\n    ).then(() => {\n      updateDownstreamAssets(nameToAsset);\n      renderPageContent(nameToAsset);\n    });\n  });\n};\n"
  },
  {
    "path": "js/schemas.yaml",
    "content": "---\n\ndataset:\n  - name: name\n    description: Name of the dataset (must be a unique identifier).\n  # General\n  - name: organization\n    description: Organization that curated the dataset.\n  - name: description\n    description: Description of the dataset.\n  - name: created_date\n    description: When the dataset was created.\n  - name: url\n    type: url\n    description: Link to website or paper that provides a detailed description of\n      the dataset.\n  - name: datasheet\n    type: url\n    description: Link to the datasheet describing the dataset.\n  - name: modality\n    description: Modalities represented in the dataset (e.g., text, images, videos).\n  - name: size\n    description: How big (uncompressed) the dataset is.\n  - name: sample\n    type: list\n    description: Small sample of content from the dataset.\n  - name: analysis\n    description: Description of any analysis (evaluation) that was done on the dataset.\n  # Construction\n  - name: dependencies\n    type: list\n    description: A list of assets (e.g., datasets, models, applications) that were\n      used to create this dataset.\n  - name: included\n    description: Description of what data is included.\n  - name: excluded\n    description: Description of what data is excluded (e.g., filtered out) and why.\n  - name: quality_control\n    description: What measures were taken to ensure quality, safety, and mitigate\n      harms.\n  # Downstream\n  - name: access\n    type: access_type\n    description: Who can access (and use) the dataset.\n  - name: license\n    description: License of the dataset.\n  - name: intended_uses\n    description: Description of what the dataset can be used for downstream.\n  - name: prohibited_uses\n    description: Description of what the dataset should not be used for downstream.\n  - name: monitoring\n    description: Description of measures taken to monitor downstream uses of this\n      dataset.\n  - name: feedback\n    description: 
How downstream problems with this dataset should be reported.\n\nmodel:\n  - name: name\n    description: Name of the model (must be a unique identifier).\n  # General\n  - name: organization\n    description: Organization that created the model.\n  - name: description\n    description: Description of the model.\n  - name: created_date\n    description: When the model was created.\n  - name: url\n    type: url\n    description: Link to website or paper that provides a detailed description of\n      the model.\n  - name: model_card\n    type: url\n    description: Link to the model card describing this model.\n  - name: modality\n    description: Modalities represented in the model (e.g., Text, Text (English),\n      Code, Code (Python), Video, Image).\n  - name: analysis\n    description: Description of any analysis that was done on the model.\n  - name: nationality\n    description: Description of which nationality the model is affiliated with.\n  - name: size\n    description: Size (and shape) of the model (e.g., number of parameters in the\n      model)\n  # Construction\n  - name: dependencies\n    type: list\n    description: A list of assets (e.g., datasets, models, applications) that were\n      used to create this model.\n  - name: training_emissions\n    description: Estimate of the carbon emissions used to create this model.\n  - name: training_time\n    description: How much time it took to train this model.\n  - name: training_hardware\n    description: What hardware was used to train the model.\n  - name: quality_control\n    description: What measures were taken to ensure quality, safety, and mitigate\n      harms.\n  # Downstream\n  - name: access\n    type: access_type\n    description: Who can access (and use) this model.\n  - name: license\n    description: License of the model.\n  - name: intended_uses\n    description: Description of what the model can be used for downstream.\n  - name: prohibited_uses\n    description: Description of what the 
model should not be used for downstream.\n  - name: monitoring\n    description: Description of measures taken to monitor downstream uses of this\n      model.\n  - name: feedback\n    description: How downstream problems with this model should be reported.\n\napplication:\n  - name: name\n    description: Name of the product or service.\n  - name: organization\n    description: Organization that created the application.\n  - name: description\n    description: Description of the application.\n  - name: created_date\n    description: When the application was created.\n  - name: url\n    type: url\n    description: Link to website describing the application.\n  # Construction\n  - name: dependencies\n    type: list\n    description: A list of assets (e.g., datasets, models, applications) that were\n      used to create this application.\n  - name: adaptation\n    description: How the model was adapted (e.g., fine-tuned) to produce the derived\n      task model.\n  - name: output_space\n    description: Description of the output space (e.g., generation, ranking, etc.).\n  - name: quality_control\n    description: What measures were taken to ensure quality, safety, and mitigate\n      harms.\n  # Downstream\n  - name: access\n    type: access_type\n    description: Who can access (and use) the application.\n  - name: license\n    description: License of the application.\n  - name: terms_of_service\n    type: url\n    description: Link to the terms of service.\n  - name: intended_uses\n    description: Description of what the application can be used for downstream.\n  - name: prohibited_uses\n    description: Description of what the application should not be used for downstream.\n  - name: monitoring\n    description: Description of measures taken to monitor downstream uses of this\n      application.\n  - name: feedback\n    description: How downstream problems with this application should be reported.\n  # Deployment\n  - name: monthly_active_users\n    description: 
Rough order of magnitude of number of active users.\n  - name: user_distribution\n    description: Demographic and geographic diversity.\n  - name: failures\n    description: Description of known failures (and errors).\n"
  },
  {
    "path": "pre-commit.sh",
    "content": "#!/bin/bash\n\n# This script fails when any of its commands fail.\nset -e\n\nif ! [ -e venv ]; then\n  python3 -m pip install virtualenv\n  python3 -m virtualenv -p python3 venv\nfi\n\nvenv/bin/pip install -r requirements.txt\nvenv/bin/pip check\n\n# Run YAML formatter and linter.\necho \"\"\necho \"Running yamlfix\"\nvenv/bin/yamlfix --config-file ./yamlfix.toml ./**/*.yaml\necho \"\"\necho \"Checking for YAML formatting\"\nvenv/bin/yamllint . || (\n  echo \"[-] YAML formatting failed. Fix the errors and run the check again.\"\n  echo \"\"\n  exit 1\n)\necho \"[+] YAML formatting succeeded!\"\necho \"\"\n\n# Signal the end of the script.\necho \"Finished running pre-commit.sh\"\n"
  },
  {
    "path": "requirements.txt",
    "content": "yamllint==1.26.3\nyamlfix==1.2.0\n"
  },
  {
    "path": "resources/.gitkeep",
    "content": ""
  },
  {
    "path": "resources/all_assets.csv",
    "content": "access,analysis,created_date,datasheet,dependencies,description,excluded,feedback,included,intended_uses,license,modality,monitoring,name,nationality,organization,prohibited_uses,quality_control,sample,size,type,url,model_card,training_emissions,training_hardware,training_time,adaptation,failures,monthly_active_users,output_space,terms_of_service,user_distribution,value\nopen,Models fine-tuned on EXMODD and earlier dataset Image-Chat and then evaluated on Image-Chat validation set.,2023-10-17,,\"['YFCC100M', 'Image-Chat']\",EXMODD (Explanatory Multimodal Open-Domain Dialogue dataset) is a dataset built off the proposed MDCF (Multimodal Data Construction Framework).,,Feedback can be sent to authors via poplpr@bit.edu.cn,,,MIT,\"image, text\",,EXMODD,China,Beijing Institute of Technology,,,[],unknown,dataset,https://arxiv.org/pdf/2310.10967.pdf,,,,,,,,,,,\nopen,\"Evaluated on standard benchmarks including MMLU, CEval, and DROP.\",2023-11-13,,['Llama 2'],MiniMA is a smaller finetuned Llama 2 model adapted for Chinese.,,https://huggingface.co/GeneZC/MiniMA-3B/discussions,,,Llama 2,text; text,unknokwn,MiniMA,China,Beijing Institute of Technology,,,,3B parameters (dense),model,https://github.com/GeneZC/MiniMA,https://huggingface.co/GeneZC/MiniMA-3B,unknown,8 A100 80G GPUs,unknown,,,,,,,\nopen,,,,['Azure Cognitive Services for Vision'],\"More than 40 percent of LinkedIn's feed posts include at least one image. We want every member to have equal access to opportunity and are committed to ensuring that we make images accessible to our members who are blind or who have low vision so they can be a part of the online conversation. With Azure Cognitive Service for Vision, we can provide auto-captioning to edit and support alt. 
text descriptions.\",,,,,unknown,,,LinkedIn,USA,LinkedIn,,,,,application,https://www.linkedin.com/,,,,,,,,,,,\nopen,\"Evaluated on the OpenLLM benchmarks and, on release, outperforms all other 7B models on the OpenLLM Leaderboard.\",2023-12-12,,[],DeciLM is a LLM that on release ranks as the fastest and most accurate model of its size.,,,,This model is intended for commercial and research use in English and can be fine-tuned for use in other languages.,Apache 2.0,text; text,unknown,DeciLM,Israel,Deci,,,,7B parameters (dense),model,https://deci.ai/blog/introducing-decilm-7b-the-fastest-and-most-accurate-7b-large-language-model-to-date,https://deci.ai/model-zoo/decilm-7b/,unknown,NVIDIA A10 GPUs,unknown,,,,,,,\nopen,\"Authors use the dataset to learn a joint text-video embedding by leveraging more than 130M video clip-caption pairs. They then evaluate the learned embeddings on the tasks of localizing steps in instructional videos of CrossTask and textbased video retrieval on YouCook2, MSR-VTT and LSMDC datasets. They show that their learned embedding can perform better compared to models trained on existing carefully annotated but smaller video description datasets.\",2019-06-07,https://arxiv.org/pdf/1906.03327.pdf,['YouTube'],\"HowTo100M is a large-scale dataset of narrated videos with an emphasis on instructional videos where content creators teach complex tasks with an explicit intention of explaining the visual content on screen. HowTo100M features a total of 136M video clips with captions sourced from 1.2M Youtube videos (15 years of video) and 23k activities from domains such as cooking, hand crafting, personal care, gardening or fitness.\",\"Categories such as Relationships and Finance and Business, that may be more abstract, are excluded. Videos with less than 100 views are removed. Authors also ignore videos that have less than 100 words. Videos longer than 2,000 seconds are removed. 
As some videos may appear in several tasks, the videos are deduplicated based on YouTube IDs.\",,\"The dataset features 1.22 million videos from YouTube with a primary focus on videos containing \"\"visual tasks\"\", that involve some interaction with the physical world (e.g. Making peanut butter, Pruning a tree) as compared to others that are more abstract (e.g. Ending a toxic relationship, Choosing a gift). To obtain predominantly visual tasks, the authors limit them to one of 12 categories (Food and Entertaining, Home and Garden, Hobbies and Crafts, Cars & Other Vehicles, Pets and Animals, Holidays and Traditions, Personal Care and Style, Sports and Fitness, Health, Education and Communications, Arts and Entertainment, Computers and Electronics). They also restrict to the top 200 YouTube search results, as the latter ones may not be related to the query task.\",,Apache 2.0,\"text, video\",,HowTo100M,unknown,\"École Normale Supérieure, Inria\",\"No uses are explicitly prohibited by the authors. They note the following limitations of the dataset: \"\"We note that the distribution of identities and activities in the HowTo100M dataset may not be representative of the global human population and the diversity in society. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data.\"\"\n\",,[],136M video clips,dataset,https://arxiv.org/pdf/1906.03327.pdf,,,,,,,,,,,\nopen,unknown,2024-03-20,,[],\"Common Corpus is the largest public domain dataset released for training Large Language Models (LLMs). This dataset includes 500 billion words from a diverse range of cultural heritage initiatives and is the largest corpus in English, French, Dutch, Spanish, German and Italian. 
It supports efforts to train fully open LLMs on sources without copyright concerns.\",The data excluded are those that have copyright issues.,unknown,\"The dataset includes 500 billion words from a wide diversity of cultural heritage initiatives. It also has the largest English-speaking dataset to date with 180 billion words, including a major US collection of 21 million digitized newspapers and large monographs datasets collected by digital historian Sebastian Majstorovic. It also contains a huge volume of data in French (110 billion words), German (30 billion words), Spanish, Dutch and Italian, as well as data in low-resource languages that are currently underrepresented.\",\"The dataset is intended to support open and reproducible AI research, enhancing accessibility, diversity, and democracy in AI by enabling everyone to explore large models.\",,text,unknown,Common Corpus,USA,Pleias,It should not be used for tasks that infringe on copyright laws.,\"All data included in the corpus are from fully open and auditable sources, ensuring they are copyright-free.\",[],500 billion words,dataset,https://huggingface.co/blog/Pclanglais/common-corpus,,,,,,,,,,,\nopen,,2022-03-25,,[],CodeGen is a language model for code,,,,,\"none (model weights), BSD-3-Clause (code)\",\"code, text; code, text\",,CodeGen,USA,Salesforce,,,,16B parameters (dense),model,https://arxiv.org/abs/2203.13474,,,Unspecified Salesforce Compute (TPU-V4s),,,,,,,,\nopen,,2022-01-28,,\"['ViT-B', 'BERT', 'COCO', 'Visual Genome', 'Conceptual Captions', 'Conceptual 12M', 'SBU Captions', 'LAION-115M']\",,,,,,BSD-3-Clause,text; image,,BLIP,USA,Salesforce,,,,unknown,model,https://arxiv.org/abs/2201.12086,,,,,,,,,,,\nopen,,2022-01-28,,['LAION-400M'],,,,,,BSD-3-Clause,\"image, text\",,LAION-115M,USA,Salesforce,,,[],115M image-text pairs,dataset,https://arxiv.org/abs/2201.12086,,,,,,,,,,,\nlimited,,2023-03-07,,['ChatGPT API'],EinsteinGPT is generative AI for customer relationship management 
(CRM).,,,,,unknown,,,EinsteinGPT,USA,Salesforce,,,,,application,https://www.salesforce.com/products/einstein/overview/?d=cta-body-promo-8,,,,,,,,,,,\nopen,\"BLIP-2 achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods\",2023-01-30,,['OPT'],BLIP-2 is a model that employs a generic and efficient pretraining strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models.,,https://huggingface.co/Salesforce/blip2-opt-2.7b/discussions,,,MIT,text; image,,BLIP-2,USA,Salesforce,,,,2.7B parameters (dense),model,https://arxiv.org/pdf/2301.12597.pdf,https://huggingface.co/Salesforce/blip2-opt-2.7b,unknown,unknown,less than 9 days,,,,,,,\nopen,Moirai has undergone a comprehensive evaluation in both in-distribution and out-of-distribution settings. It demonstrated its capabilities as a zero-shot forecaster and delivered competitive or superior performance compared to full-shot models.,2024-03-19,,['LOTSA'],\"Moirai is a cutting-edge time series foundation model, offering universal forecasting capabilities. It stands out as a versatile time series forecasting model capable of addressing diverse forecasting tasks across multiple domains, frequencies, and variables in a zero-shot manner.\",,https://huggingface.co/Salesforce/moirai-1.0-R-large/discussions,,\"Moirai can be used for time series forecasting in multiple domains. 
It offers robust zero-shot forecasting capabilities and eliminates the need for additional data, extensive computational resources, and expert input for achieving accurate forecasts.\",Apache 2.0,time-series; time-series,unknown,Moirai,USA,Salesforce,unknown,The performance of Moirai was evaluated through in-distribution and out-of-distribution settings.,,311M parameters,model,https://blog.salesforceairesearch.com/moirai/,https://huggingface.co/Salesforce/moirai-1.0-R-large,unknown,NVIDIA A100 40G GPUs,unknown,,,,,,,\nopen,,2024-03-19,https://huggingface.co/datasets/Salesforce/lotsa_data,[],LOTSA is the largest collection of open time series datasets with 27B observations across nine domains.,,https://huggingface.co/datasets/Salesforce/lotsa_data/discussions,\"domain observations in energy, transport, climate, cloudops, web, sales, nature, econ/finance, and healthcare\",pre-training Large Time Series Models,Apache 2.0,time-series,,LOTSA,USA,Salesforce,,,[],27B observations,dataset,https://arxiv.org/pdf/2402.02592.pdf,,,,,,,,,,,\nopen,,2022-12-15,,['The Pile'],,,,,,bigscience-bloom-rail-1.0,text; text,,BioMedLM,USA,Stanford,,,,2.7B parameters (dense),model,https://crfm.stanford.edu/2022/12/15/pubmedgpt.html,,,,,,,,,,,\nopen,Evaluated on own framework that tests domain-specific tasks in medical field.,2022-11-23,,\"['Stable Diffusion', 'RoentGen radiology dataset']\",RoentGen is a generative medical imaging model that can create visually convincing X-ray images.,,,,,,text; image,,RoentGen,USA,Stanford,,,,330M parameters (dense),model,https://arxiv.org/pdf/2211.12737.pdf,,unknown,64 A100 GPUs,60k training steps per day,,,,,,,\nopen,\"Evaluated on three physical control tasks, drawing, steering, and human body movement on various dynamics\",2023-06-12,,\"['GPT-2', 'BABEL', 'text-davinci-003']\",Model trained to generate language corrections for physical control tasks.,,,,,MIT,human trajectories; text,,CORGI,USA,Stanford,,,,124M parameters 
(dense),model,https://arxiv.org/pdf/2306.07012.pdf,,,one NVIDIA A40 GPU,unknown,,,,,,,\nopen,,2023-03-13,https://huggingface.co/datasets/tatsu-lab/alpaca,['text-davinci-003'],\"Alpaca dataset consists of 52,000 instruction-following demonstrations generated in the style of the [Self-Instruct framework](https://github.com/yizhongw/self-instruct) using OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction-tuning for language models and make the language model follow instructions better.\n\",,Feedback can be provided on [[GitHub Issues]](https://github.com/tatsu-lab/stanford_alpaca/issues).,,Alpaca is intended and licensed for research use only.,CC BY-NC 4.0,text (English),,Alpaca dataset,USA,Stanford,,,[],52K instruction-following demonstrations,dataset,https://crfm.stanford.edu/2023/03/13/alpaca.html,,,,,,,,,,,\nopen,,2023-03-13,,\"['LLaMa', 'Alpaca dataset']\",\"Alpaca-7B is an instruction-following model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations.\n\",,Feedback can be provided on [[GitHub Issues]](https://github.com/tatsu-lab/stanford_alpaca/issues).,,Alpaca is intended and licensed for research use only.,CC BY NC 4.0 (model weights),text (English),,Alpaca,USA,Stanford,,,,7B parameters (dense model),model,https://crfm.stanford.edu/2023/03/13/alpaca.html,,unknown,,,,,,,,,\nopen,\"Merlin has been comprehensively evaluated on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification, phenotype classification, and zero-shot cross-modal retrieval, while model adapted tasks include 5-year chronic disease prediction, radiology report generation, and 3D semantic segmentation. 
It has undergone internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator).\",2024-09-08,,[],\"Merlin is a 3D Vision Language Model that's designed for interpretation of abdominal computed tomography (CT) scans. It uses both structured Electronic Health Record (EHR) and unstructured radiology reports for supervision without requiring additional manual annotations. The model was trained on a high-quality clinical dataset of paired CT scans, EHR diagnosis codes, and radiology reports and was evaluated on 6 task types and 752 individual tasks.\",,\"Feedback and reports for problems with the model should likely be routed to Stanford Center for Artificial Intelligence in Medicine and Imaging, or the corresponding author of the research (louis.blankemeier@stanford.edu).\",,\"This model is intended for use in the interpretation of abdominal computed tomography (CT) scans, chronic disease prediction, radiology report generation, and 3D semantic segmentation.\",Unknown,image; text,Unknown,Merlin,unknown,\"Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University\",\"The model should not be used outside of healthcare-related context, such as for personal or non-medical commercial purposes.\",The model has undergone extensive evaluations and also internal and external validation tests.,,Unknown,model,https://arxiv.org/pdf/2406.06512,unknown,Unknown,Single GPU.,Unknown,,,,,,,\nopen,Evaluated on AlpacaEval benchmark against SOTA LLMs.,2023-09-20,,[],\"Xwin-LM is a LLM, which on release, ranked top 1 on AlpacaEval, becoming the first to surpass GPT-4 on this benchmark.\",,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1/discussions,,,LLaMA2,text; text,,Xwin-LM,USA,Xwin,,,,70B parameters 
(dense),model,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1,unknown,unknown,unknown,,,,,,,\nlimited,,2023-04-13,,\"['Jurassic-2', 'Claude', 'Stable Diffusion', 'Amazon Titan', 'Claude 2', 'Cohere Command']\",\"Bedrock is a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. Bedrock is intended for customers to build and scale generative AI-based applications using FMs, democratizing access for all builders.\",,,,allowing companies to incorporate generative AI into their business models,unknown,,,Bedrock,USA,Amazon,,,,,application,https://aws.amazon.com/bedrock/,,,,,,,,foundation models made accessible via an API,https://aws.amazon.com/service-terms/,,\nopen,Evaluated against benchmarks that are specifically designed to assess the capabilities of LLMs in handling longer contexts.,2023-08-08,,['Falcon-40B'],\"FalconLite2 is a fine-tuned and quantized Falcon language model, capable of processing long (up to 24K tokens) input sequences.\",,https://huggingface.co/amazon/FalconLite2/discussions,,,Apache 2.0,text; text,,FalconLite2,USA,Amazon,,,,40B parameters (dense),model,https://huggingface.co/amazon/FalconLite2,https://huggingface.co/amazon/FalconLite2,unknown,unknown,unknown,,,,,,,\nopen,Chronos has been evaluated comprehensively on 42 datasets both in the in-domain (15 datasets) and zero-shot settings (27 datasets). Chronos outperforms task specific baselines in the in-domain setting and is competitive or better than trained models in the zero-shot setting.,2024-03-13,,['T5'],\"Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. 
Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context.\",,https://github.com/amazon-science/chronos-forecasting/discussions,,\"Chronos can be used for zero-shot time series forecasting on univariate time series from arbitrary domains and with arbitrary horizons. Chronos models can also be fine-tuned for improved performance of specific datasets. Embeddings from Chronos encoder may also be useful for other time series analysis tasks such as classification, clustering, and anomaly detection.\",Apache 2.0,time-series; time-series,,Chronos,USA,Amazon,,\"Chronos was evaluated rigorously on 42 datasets, including 27 in the zero-shot setting against a variety of statistical and deep learning baselines.\",,710M parameters (dense),model,https://github.com/amazon-science/chronos-forecasting,https://huggingface.co/amazon/chronos-t5-large,,8 NVIDIA A100 40G GPUs,63 hours on p4d.24xlarge EC2 instance,,,,,,,\nlimited,Amazon Nova Pro is capable of processing up to 300K input tokens and sets new standards in multimodal intelligence and agentic workflows that require calling APIs and tools to complete complex workflows. It achieves state-of-the-art performance on key benchmarks including visual question answering ( TextVQA ) and video understanding ( VATEX ).,2024-12-03,,[],\"A new generation of state-of-the-art foundation models (FMs) that deliver frontier intelligence and industry leading price performance, available exclusively in Amazon Bedrock. 
Amazon Nova understanding models excel in Retrieval-Augmented Generation (RAG), function calling, and agentic applications.\",,unknown,,\"You can build on Amazon Nova to analyze complex documents and videos, understand charts and diagrams, generate engaging video content, and build sophisticated AI agents, from across a range of intelligence classes optimized for enterprise workloads.\",unknown,\"text, image, video; text\",unknown,Amazon Nova (Understanding),USA,Amazon Web Services (AWS),unknown,All Amazon Nova models include built-in safety controls and creative content generation models include watermarking capabilities to promote responsible AI use.,,unknown,model,https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/,unknown,unknown,unknown,unknown,,,,,,,\nlimited,Amazon Nova Canvas excels on human evaluations and key benchmarks such as text-to-image faithfulness evaluation with question answering (TIFA) and ImageReward.,2024-12-03,,[],\"A new generation of state-of-the-art foundation models (FMs) that deliver frontier intelligence and industry leading price performance, available exclusively in Amazon Bedrock.\",,unknown,,\"You can build on Amazon Nova to analyze complex documents and videos, understand charts and diagrams, generate engaging video content, and build sophisticated AI agents, from across a range of intelligence classes optimized for enterprise workloads.\",unknown,\"text, image;image, video\",unknown,Amazon Nova (Creative Content Generation),USA,Amazon Web Services (AWS),unknown,All Amazon Nova models include built-in safety controls and creative content generation models include watermarking capabilities to promote responsible AI use.,,unknown,model,https://aws.amazon.com/blogs/aws/introducing-amazon-nova-frontier-intelligence-and-industry-leading-price-performance/,unknown,unknown,unknown,unknown,,,,,,,\nopen,Reports results on standard translation benchmarks across 102 languages 
in comparison with Google Translate and ChatGPT,2023-05-29,,\"['LLaMA', 'CLUE', 'BigTrans parallel dataset']\",BigTrans is a model which adapts LLaMA that covers only 20 languages and enhances it with multilingual translation capability on more than 100 languages,,https://huggingface.co/James-WYang/BigTrans/discussions,,Advancing future research in multilingual LLMs,Apache 2.0,text; text,,BigTrans,China,Institute of Automation Chinese Academy of Sciences,,,,13B parameters (dense),model,https://arxiv.org/pdf/2305.18098v1.pdf,https://huggingface.co/James-WYang/BigTrans,unknown,16 A100 GPUs with 80 GB of RAM,unknown,,,,,,,\nopen,\"Evaluated on standard benchmarks for knowledge and language understanding, mathematical reasoning, and programming ability in comparison to similarly sized open-source models.\",2023-12-22,,[],YAYI 2 is an open source large language model trained in both English and Chinese.,,https://huggingface.co/wenge-research/yayi2-30b/discussions,,,custom,text; text,,YAYI 2,China,Institute of Automation Chinese Academy of Sciences,,\"data is deduplicated, normalized, cleaned, and filtered for toxicity\",,30B parameters (dense),model,https://arxiv.org/pdf/2312.14862.pdf,https://huggingface.co/wenge-research/yayi2-30b,unknown,over 1000 A800 GPUs,unknown,,,,,,,\nopen,\"Analyses of the data's composition, document statistics, language/dialectal coverage, topical distribution, and biases are conducted in the paper [[The Pile Paper]](https://arxiv.org/pdf/2101.00027.pdf).\n\",2021-01-01,https://arxiv.org/pdf/2201.07311.pdf,[],\"A large language model training dataset, used to train GPT-NeoX-20B.\n\",\"Authors report that they have excluded some datasets \"\"because they were too small to be worth spending time or because the English component of the data did not merit inclusion on its own. 
Three datasets were excluded for other reasons: (1) US Congressional Records were excluded because it \"\"reflects the opinions and biases of the political class over the past 200 years, including segregationism and xenophobia.\"\" (2) Online Fanfiction resources amounting to Hundreds of GiB were excluded on logistical grounds. (3) Literotica, platform where users can upload short-form erotic fiction, was excluded because the authors decided to exclude fanfiction, the corpus would require significant investigation, and corpus contain significant amount of stereotyping [[Appendix B]](https://arxiv.org/pdf/2101.00027.pdf).\n\",\"Feedback can be given by emailing the authors at contact at eleuther.ai.\n\",\"The Pile data come from 22 sources, with over half of the data being from Common Crawl (Pile-CC; 227GB), fiction and nonfiction books (Books3; 101GB), biomedical articles (PubMed Central; 90GB), and code (Github; 95 GB). Refer to the paper for full decomposition [[Table 1]](https://arxiv.org/pdf/2101.00027.pdf#table.caption.2).\n\",\"The Pile was intended to be used as a high quality large text dataset for language modeling tasks, explained in more detail in the paper [[Section 1]](https://arxiv.org/pdf/2101.00027.pdf#section.1).\n\",MIT,\"code, text\",,The Pile,USA,EleutherAI,,\"In addition to the data inclusion and exclusion decisions, the quality was controlled through filtering for English (pycld2 language classifier), filtering for documents similar to OpenWebText2 (classifier on CommonCrawl), and several forms of deduplication as detailed in the paper [[Appendix C]](https://arxiv.org/pdf/2101.00027.pdf#appendix.1.C) [[Appendix D]](https://arxiv.org/pdf/2101.00027.pdf#appendix.1.D).\n\",\"['...pot trending topics and the coverage around them. First up, there’s a bit of a visual redesign. Previously, clicking on a trending topic would highlight a story from one publication, and you’d have to scroll down past a live video section to view related stories. 
Facebook is replacing that system with a simple carousel, which does a better job of showing you different coverage options. To be clear, the change doesn’t affect how stories are sourced, according to Facebook. It’s still the same algorithm pickin...', 'Total knee arthroplasty (TKA) is a promising treatment for endstage osteoarthritis (OA) of the knee for alleviating pain and restoring the function of the knee. Some of the cases with bilateral TKA are symptomatic, necessitating revision arthroplasty in both the knees. A bilateral revision TKA can be done ei', 'On the converse, the set-valued map $\\\\Phi:[0,3]\\\\rightrightarrows [0,3]$ $$\\\\Phi(x):=\\\\left\\\\{\\\\begin{array}{ll} \\\\{1\\\\} & \\\\mbox{ if } 0\\\\leq x<1\\\\\\\\ {}[1,2] & \\\\mbox{ if } 1\\\\leq x\\\\leq 2\\\\\\\\ \\\\{2\\\\} &', 'This Court thus uses the same interpretation of V.R.C.P. 52(a) as it did *487 under the previous statutory requirement found in 12 V.S.A. § 2385.  In essense, the defendants urge that this Court should reconsider the case of Green Mountain Marble Co. v. 
Highway Board, supra, and follow the Federal practice of looking to the evide']\",825 GB,dataset,https://arxiv.org/pdf/2101.00027.pdf,,,,,,,,,,,\nopen,,2021-06-04,,['The Pile'],GPT-J is an open-source autoregressive language model.,,,,,Apache 2.0,text; text,,GPT-J,USA,EleutherAI,,,,6B parameters (dense),model,https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/,,,TRC (Unspecified # of TPU v3-8s),,,,,,,,\nopen,,2021-03-21,,['The Pile'],,,,,,MIT,text; text,,GPT-Neo,USA,EleutherAI,,,,2.7B parameters (dense),model,https://github.com/EleutherAI/gpt-neo,,,,,,,,,,,\nopen,\"The model was evaluated on standard NLP benchmarks: LAMBADA, ANLI, HellaSwag, MMLU among others [[Section 4]](http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf#section.4).\n\",2022-02-02,,['The Pile'],\"GPT-NeoX (20B) is an open-sourced autoregressive language model.\n\",,\"Feedback can be provided using the  # 20b channel in EleutherAI Discord group [[EleutherAI Blog Post]](https://blog.eleuther.ai/announcing-20b/). Find the Discord link in the FAQ page [[FAQ]](https://www.eleuther.ai/faq/).\n\",,\"As stated in the model card: \"\"GPT-NeoX-20B learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for however, which is generating text from a prompt. 
Due to the generality of the pretraining set, it has acquired the ability to generate completions across a wide range of tasks - from programming to fiction writing [[Model Card]](https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/20B_model_card.md).\"\"\n\",Apache 2.0,\"text; text, code\",,GPT-NeoX,USA,EleutherAI,,,,20B parameters (dense),model,http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf,https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/20B_model_card.md,31.73 tCO2e,12 x 8 A100 GPUs,47.10 petaflop/s-day,,,,,,,\nlimited,,,,['GPT-NeoX'],\"GooseAI API is an API service providing access to NLP services.\n\",,Email support,,\"Intended to be used as an NLP infrastructure.\n\",custom,,At will monitoring by the provider,GooseAI API,USA,GooseAI,\"Illegal or abusive activity, security violations, network abuse\n\",unknown,,,application,goose.ai,,,,,unknown,unknown,unknown,\"Text Generation, Text Completion\",https://goose.ai/docs/tos,unknown,\nopen,\"Evaluated by human testers rating alignment of text input, image output pairs.\",2022-09-04,,\"['VQGAN', 'CLIP']\",VQGAN-CLIP is a model that better generates and edits images using a multimodal encoder to guide image generation.,,,,,MIT,text; image,,VQGAN-CLIP,USA,EleutherAI,,,,227M parameters (dense),model,https://arxiv.org/pdf/2204.08583.pdf,,unknown,1 NVIDIA Tesla K80 GPU,Less than 1 V100-hour,,,,,,,\nopen,Evaluated on a variety of NLP benchmarks and found to perform similarly to OPT and BLOOM models.,2023-05-31,,['The Pile'],A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters,,https://huggingface.co/EleutherAI/pythia-6.9b/discussions,,,Apache 2.0,text; text,,Pythia,unknown,Eleuther AI,,,,12B parameters (dense),model,https://arxiv.org/pdf/2304.01373.pdf,https://huggingface.co/EleutherAI/pythia-12b,,64 A100 GPUs,,,,,,,,\nopen,Evaluated on math benchmarks in comparison to general large language models.,2023-10-16,,\"['Proof Pile 2', 'Code 
LLaMA']\",Llemma is a large language model for mathematics.,,https://huggingface.co/EleutherAI/llemma_34b/discussions,,,LLaMA 2,text; text,,Llemma,unknown,\"Princeton University, Eleuther AI\",,,,34B parameters (dense),model,https://arxiv.org/pdf/2310.10631.pdf,https://huggingface.co/EleutherAI/llemma_34b,unknown,256 A100 40GB GPUs,47k A100 hours,,,,,,,\nopen,,2023-10-16,https://huggingface.co/datasets/EleutherAI/proof-pile-2,\"['Common Crawl', 'OpenWebMath', 'Arxiv', 'RedPajama-Data']\",Proof Pile 2 is a corpus for language modeling of mathematics.,,https://huggingface.co/datasets/EleutherAI/proof-pile-2/discussions,\"scientific papers, web data containing mathematics, mathematical code\",,MIT,text,,Proof Pile 2,unknown,\"Princeton University, Eleuther AI\",,,[],55B tokens,dataset,https://arxiv.org/pdf/2310.10631.pdf,,,,,,,,,,,\nopen,\"The models were evaluated on SuperGLUE, CodeXGLUE, as well as MMLU and Bigbench Hard. Comparisons were made with T5v1.1 and found that Pile-T5 models performed better in most conditions.\",2024-04-15,,\"['The Pile', 'T5x', 'LLaMA', 'umT5']\",\"Pile-T5 is a version of the broadly used T5 model, but improved to eliminate weaknesses such as the omission of crucial code-related tokens. It utilizes LLaMA tokenizer and is trained on the Pile, offering enhancements for finetuning on downstream tasks, particularly those involving code.\",,unknown,,The model is aimed at downstream tasks that benefit from the encoder-decoder architecture. 
Particularly useful for tasks involving code.,unknown,text; text,unknown,Pile-T5,USA,EleutherAI,unknown,,,unknown,model,https://blog.eleuther.ai/pile-t5/,,unknown,unknown,2 million steps,,,,,,,\nopen,,2023-04-16,,['GPT-4 API'],Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model.,,,,,MIT,,,Auto-GPT,USA,Auto-GPT,,,,,application,https://news.agpt.co/,,,,,\"GPT-4 adapted to run autonomously by chaining together LLM \"\"thoughts\"\"\",,,text,,,\nopen,Evaluated across different text benchmarks in English and Chinese.,2023-10-01,,[],OpenBA is an open-sourced 15B bilingual (English + Chinese) asymmetric seq2seq model.,,https://huggingface.co/OpenBA/OpenBA-LM/discussions,,,Apache 2.0,text; text,,OpenBA,China,Soochow University,,,,15B parameters (dense),model,https://arxiv.org/pdf/2309.10706.pdf,https://huggingface.co/OpenBA/OpenBA-LM,6.5 tCO2eq,8 NVIDIA A100-80GB GPUs,38k GPU hours,,,,,,,\nlimited,,2023-03-21,,[],\"Reexpress One offers a means of document classification, semantic search, and uncertainty analysis on-device.\",,https://github.com/ReexpressAI/support,,,unknown,,unknown,Reexpress One,USA,Reexpress AI,,,,,application,https://re.express/index.html,,,,,,unknown,unknown,data analyses,https://re.express/tos.html,unknown,\nclosed,,2021-08-11,,[],\"The dataset used to train the Jurassic-1 models, based on publicly available data.\",unknown,,unknown,unknown,unknown,text,,Jurassic-1 dataset,Israel,AI21 Labs,,,[],300B tokens,dataset,https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf,,,,,,,,,,,\nclosed,unknown,2022-12-01,,[],The dataset used to instruction-tune the Jurassic-1 Instruct models.,unknown,,unknown,unknown,unknown,text,,Jurassic-1 Instruct dataset,Israel,AI21 Labs,unknown,unknown,[],unknown,dataset,,,,,,,,,,,,\nlimited,\"Evaluated on several standard benchmarks (e.g. 
ARC, BoolQ, HellaSwag, RTE, Winogrande)\",2021-08-11,,['Jurassic-1 dataset'],\"Jurassic-1 is a family of autoregressive language models (Large, Grande, Jumbo).\",,,,unknown,unknown,text,,Jurassic-1,Israel,AI21 Labs,\"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.\",,,178B parameters (dense),model,https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf,,unknown,Over 800 A100 GPUs,Several months,,,,,,,\nlimited,,2022-12-01,,\"['Jurassic-1', 'Jurassic-1 Instruct dataset']\",Jurassic-1 Instruct is an instruction-tuned autoregressive language model.,,,,\"Jurassic-1 Instruct was trained specifically to handle instructions-only prompts (\"\"zero-shot\"\") without examples (\"\"few-shot\"\"). It is the most natural way to interact with language models, and it is the best way to get a sense of the optimal output for your task without any examples.\",unknown,text; text,,Jurassic-1 Instruct,Israel,AI21 Labs,\"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.\",unknown,,17B parameters (dense),model,https://docs.ai21.com/docs/jurassic-1-instruct-beta,,unknown,unknown,unknown,,,,,,,\nlimited,The model was evaluated on the HELM benchmark as discussed in https://www.ai21.com/blog/introducing-j2.,2023-03-09,,[],Jurassic-2 is a family of language models designed to replace Jurassic-1.,,,,,unknown,text; text,unknown,Jurassic-2,Israel,AI21 Labs,\"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property 
rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.\",unknown,,unknown,model,https://docs.ai21.com/docs/jurassic-2-models,,unknown,unknown,unknown,,,,,,,\nlimited,,2021-08-11,,\"['Jurassic-1', 'Jurassic-1 Instruct', 'Jurassic-2', 'AI21 Summarization API', 'AI21 Paraphrase API']\",The AI21 Labs Playground supports several task-specific APIs in addition to a variety of models.,,Feedback can be given by emailing at info at ai21.com,,\"The intended uses are text completion, rewriting, and summarization.\",,,unknown,AI21 Playground,Israel,AI21 Labs,\"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.\",unknown,,,application,https://studio.ai21.com/playground/,,,,,The Playground provides direct access to the language models (Complete API) as well as wrapped for Rewrite and Summarize.,unknown,unknown,text,https://www.ai21.com/terms-of-use,unknown,\nlimited,,2023-03-09,,['Jurassic-2'],AI21 Studio's Paraphrase API offers access to our world-class paraphrasing engine. 
It has been specifically developed for suggesting alternative ways to convey the same message using different words.,,Feedback can be given by emailing at info at ai21.com,,The intended uses are text paraphrasing.,,,unknown,AI21 Paraphrase API,Israel,AI21 Labs,\"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.\",unknown,,,application,https://docs.ai21.com/docs/paraphrase-api,,,,,The AI21 language models are further specialized to the task of paraphrasing.,unknown,unknown,text,https://www.ai21.com/terms-of-use,unknown,\nlimited,,2023-03-09,,['Jurassic-2'],AI21 Studio's Summarize API offers access to our world-class summarization engine. It has been specifically developed for reading long texts and providing a faithful summary of the original document.,,Feedback can be given by emailing at info at ai21.com,,The intended uses are text paraphrasing.,,,unknown,AI21 Summarization API,Israel,AI21 Labs,\"Illegal activities, such as hate speech, gambling, child pornography or violating intellectual property rights; Harassment, victimization, intimidation, fraud or spam; Creation or dissemination of misinformation, promotion of self-harm, glorification of violent events or incitement of violence.\",unknown,,,application,https://docs.ai21.com/docs/summarize-api,,,,,The AI21 language models are further specialized to the task of summarization.,unknown,unknown,text,https://www.ai21.com/terms-of-use,unknown,\nlimited,,2020-10-27,,['AI21 Paraphrase API'],\"Wordtune, the first AI-based writing companion that understands context and meaning.\",,,,The Wordtune assistant is a writing assistant,Wordtune License,,unknown,Wordtune,Israel,AI21 
Labs,,unknown,,,application,https://www.wordtune.com/,,,,,unknown,unknown,unknown,text,https://www.wordtune.com/terms-of-use,unknown,\nlimited,,2021-11-16,,['AI21 Summarize API'],\"Wordtune Read is an AI reader that summarizes long documents so you can understand more, faster.\",,,,,Wordtune License,,unknown,Wordtune Read,Israel,AI21 Labs,,unknown,,,application,https://www.wordtune.com/read,,,,,unknown,unknown,unknown,text,https://www.wordtune.com/terms-of-use,unknown,\nopen,Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.,2024-03-28,,[],\"Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. Jamba is the world’s first production-grade Mamba based model.\",,https://huggingface.co/ai21labs/Jamba-v0.1/discussions,,\"intended for use as a foundation layer for fine tuning, training\",Apache 2.0,text; text,,Jamba,Israel,AI21 Labs,,,,52B parameters (sparse),model,https://www.ai21.com/blog/announcing-jamba,https://huggingface.co/ai21labs/Jamba-v0.1,unknown,unknown,unknown,,,,,,,\nopen,\"The models were evaluated based on their ability to handle long contexts, speed, and quality. They outperformed competitors in their size class, scoring high on the Arena Hard benchmark.\",2024-08-22,,[],\"A family of models that demonstrate superior long context handling, speed, and quality. Built on a novel SSM-Transformer architecture, they surpass other models in their size class. These models are useful for enterprise applications, such as lengthy document summarization and analysis. The Jamba 1.5 family also includes the longest context window, at 256K, among open models. They are fast, quality-focused, and handle long contexts efficiently.\",,Unknown,,\"The models are built for enterprise scale AI applications. They are purpose-built for efficiency, speed, and ability to solve critical tasks that businesses care about, such as lengthy document summarization and analysis. 
They can also be used for RAG and agentic workflows.\",Jamba Open Model License,text; text,Unknown,Jamba 1.5,unknown,AI21,Unknown,\"The models were evaluated on the Arena Hard benchmark. For maintaining long context performance, they were tested on the RULER benchmark.\",,94B parameters,model,https://www.ai21.com/blog/announcing-jamba-model-family,unknown,Unknown,\"For speed comparisons, Jamba 1.5 Mini used 2xA100 80GB GPUs, and Jamba 1.5 Large used 8xA100 80GB GPUs.\",Unknown,,,,,,,\nlimited,,2022-09-16,,[],Character allows users to converse with various chatbot personas.,,,,,unknown,,,Character,USA,Character AI,,,,,application,https://beta.character.ai/,,,,,,unknown,unknown,AI-generated chat conversations,https://beta.character.ai/tos,unknown,\nclosed,,2021-12-23,,[],ERNIE 3.0 Titan is a language model,,,,unknown,unknown,text; text,,ERNIE 3.0 Titan,China,\"Baidu, PengCheng Laboratory\",unknown,,,260B parameters (dense),model,https://arxiv.org/abs/2112.12731,,unknown,\"Baidu V100 Cluster, PengCheng Lab Ascend 910 NPU cluster\",unknown,,,,,,,\nlimited,,2021-12-31,,[],ERNIE-ViLG is a model for text-to-image generation,,,,unknown,,text; image,,ERNIE-ViLG,China,Baidu,unknown,,,10B parameters (dense),model,https://arxiv.org/abs/2112.15283,,unknown,unknown,unknown,,,,,,,\nclosed,,2022-10-27,,[],ERNIE-ViLG is a model for text-to-image generation,,,,unknown,unknown,text; image,,ERNIE-ViLG 2.0,China,Baidu,unknown,,,10B parameters (dense),model,https://arxiv.org/abs/2210.15257,,unknown,320 A100 GPUs according to [[the paper]](https://arxiv.org/abs/2210.15257),18 days according to [[the paper]](https://arxiv.org/abs/2210.15257),,,,,,,\nlimited,,2023-10-17,,[],ERNIE-4.0 is a multimodal generalist foundation model.,,,,unknown,unknown,\"text; image, text, video\",,ERNIE 
4.0,China,Baidu,unknown,,,unknown,model,https://www.prnewswire.com/news-releases/baidu-launches-ernie-4-0-foundation-model-leading-a-new-wave-of-ai-native-applications-301958681.html,,unknown,unknown,unknown,,,,,,,\nclosed,Outperforms GPT-4 in common sense and reasoning tasks on the basis of both efficiency and accuracy.,2023-06-20,,\"['GPT-4', 'Claude', 'Falcon-40B']\",Ocean-1 is the culmination of Cresta's experience in deploying generative AI systems for large enterprises and signifies their latest milestone in advancing the cutting edge AI technology for customer facing conversations.,,,,Acting as a contact center chatbot agent.,unknown,text; text,unknown,Ocean-1,USA,Cresta,,,,7B parameters (dense),model,https://cresta.com/blog/introducing-ocean-1-worlds-first-contact-center-foundation-model/,,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on English and Chinese language benchmarks.,2023-05-27,,[],\"CPM-Bee is a fully open-source, commercially-usable Chinese-English bilingual base model with a capacity of ten billion parameters.\",,https://huggingface.co/openbmb/cpm-bee-10b/discussions,,You can use the raw model for many NLP tasks like text generation or fine-tune it to a downstream task.,custom,text; text,unknown,CPM Bee,International,OpenBMB,,,,10B parameters (dense),model,https://github.com/OpenBMB/CPM-Bee,https://huggingface.co/openbmb/cpm-bee-10b,unknown,unknown,unknown,,,,,,,\nopen,Randomly chosen models trained on UltraFeedback evaluated across standard benchmarks.,2023-09-26,https://huggingface.co/datasets/openbmb/UltraFeedback,[],\"UltraFeedback is a large-scale, fine-grained, diverse preference dataset, used for training powerful reward models and critic models.\",,https://huggingface.co/datasets/openbmb/UltraFeedback/discussions,,,MIT,text,unknown,UltraFeedback,International,OpenBMB,,,[],256k samples,dataset,https://github.com/OpenBMB/UltraFeedback,,,,,,,,,,,\nopen,Evaluated on open-sourced general benchmarks in comparison to SotA 
LLMs.,2024-02-01,,[],\"MiniCPM is an End-Side LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings (2.7B in total).\",,https://huggingface.co/openbmb/MiniCPM-V/discussions,,,custom,text; text,unknown,MiniCPM,International,OpenBMB,,,,2.4B parameters (dense),model,https://github.com/OpenBMB/MiniCPM/,https://huggingface.co/openbmb/MiniCPM-V,unknown,unknown,unknown,,,,,,,\nopen,The model was comprehensively benchmarked across 12 tests covering five tasks. Eurus achieved the best overall performance among open-source models of similar sizes and even outperformed specialized models in many cases.,2024-04-02,,\"['Eurus SFT', 'UltraInteract', 'UltraFeedback']\",Eurus is a suite of large language models (LLMs) optimized for reasoning.,,https://huggingface.co/openbmb/Eurus-70b-nca/discussions,,The model can be used for reasoning tasks and is especially tailored for coding and math following specific prompts.,Apache 2.0,text; text,unknown,Eurus,International,OpenBMB,,,,70B parameters,model,https://arxiv.org/abs/2404.02078,https://huggingface.co/openbmb/Eurus-70b-nca,unknown,unknown,unknown,,,,,,,\nlimited,,2019-12-17,,['OpenAI API'],\"AI Dungeon is a single-player text adventure game that uses AI to generate content.\n\",,,,,custom,,,AI Dungeon,USA,Latitude,,,,,application,https://play.aidungeon.io,,,,,,,,,https://play.aidungeon.io/main/termsOfService,,\nopen,,2023-05-06,,[],Portkey is a hosted middleware that allows users to create generative AI applications,,,,,,,,Portkey,USA,Portkey,,,,,application,https://portkey.ai/,,,,,,,,generative AI apps,https://portkey.ai/terms,,\nopen,,2023-03-08,,['Anthropic API'],The first Instant Answer in DuckDuckGo search results to use natural language technology to generate answers to search queries using Wikipedia and other related sources,,,,,unknown,,,DuckAssist,USA,DuckDuckGo,,,,,application,https://spreadprivacy.com/duckassist-launch/,,,,,,,,,,,\nopen,,2022-04-16,,[],,,,,,Apache 2.0,NLP 
tasks,,NaturalInstructions-v2,USA,AI2,,,[],1600 tasks,dataset,https://arxiv.org/abs/2204.07705,,,,,,,,,,,\nopen,\"Randomly sampled dialogues from dataset are evaluated according to six established criteria of natural flow, context dependence, topic consistency, speaker consistency, specificity, and overall.\",2023-05-24,https://huggingface.co/datasets/allenai/soda,[],\"SODA is the first publicly available, million-scale, high-quality dialogue dataset covering a wide range of social interactions.\",,,,,CC BY 4.0,text,,SODA,USA,AI2,,,[],1.5M dialogues,dataset,https://arxiv.org/pdf/2212.10465.pdf,,,,,,,,,,,\nopen,Conducted experiments on models trained with Multimodal C4 in comparison to models trained on single image/caption datasets,2023-06-09,,['C4'],An augmentation of C4 with images added and made openly available.,,,,,ODC-By,\"image, text\",,Multimodal C4,USA,AI2,,,[],43B English tokens with 101.2M documents and 571M images,dataset,https://arxiv.org/pdf/2304.06939.pdf,,,,,,,,,,,\nopen,Evaluated by human testers on generalization capabilities and responses compared to other chatbots.,2023-05-24,,\"['SODA', 'ProsocialDialog', 'T5']\",COSMO is a conversation agent with greater generalizability on both in- and out-of-domain chitchat datasets,,https://huggingface.co/allenai/cosmo-xl/discussions,,,,text; text,,COSMO,USA,AI2,,,,11B parameters (dense),model,https://arxiv.org/pdf/2212.10465.pdf,https://huggingface.co/allenai/cosmo-xl,unknown,v3-128 TPU accelerators with batch size 256,unknown,,,,,,,\nopen,Compared with other open and closed datasets in regards to size and quality control.,2023-08-18,https://drive.google.com/file/d/12gOf5I5RytsD159nSP7iim_5zN31FCXq/view,[],\"Dolma is a dataset of 3 trillion tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials\",,,,,AI2 ImpACT,text,\"Quality filtration, deduplication, and risk mitigation via logistic qualifiers and regular expressions used.\",Dolma,USA,AI2,,,[],3T 
tokens,dataset,https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64,,,,,,,,,,,\nopen,Models trained with dataset evaluated on downstream performance.,2023-11-20,https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture,\"['FLAN Collection', 'Open Assistant 1', 'ShareGPT', 'Alpaca dataset', 'Code Alpaca', 'LIMA', 'WizardLM', 'OpenOrca']\",Tulu-V2-mix is a dataset composed of many high-quality instruction datasets that results in stronger performance across a variety of reasoning and knowledge-probing tasks.,,https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture/discussions,,,ODC-BY,text,,Tulu-V2-mix,USA,AI2,,,[],unknown,dataset,https://arxiv.org/pdf/2311.10702.pdf,,,,,,,,,,,\nopen,Evaluated on MT-Bench and AlpacaEval. compared to other chatbots.,2023-11-20,,\"['LLaMA 2', 'Tulu-V2-mix']\",Tulu 2 is a language model trained on the new Tulu-v2-mix dataset and fine-tuned on more state of the art language models.,,https://huggingface.co/allenai/tulu-2-70b/discussions,,,AI2 ImpACT,text; text,,Tulu 2,USA,AI2,,,,70B parameters (dense),model,https://arxiv.org/pdf/2311.10702.pdf,https://huggingface.co/allenai/tulu-2-70b,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on MT-Bench and AlpacaEval. compared to other chatbots.,2023-11-20,,\"['LLaMA 2', 'Tulu-V2-mix']\",\"Tulu 2 DPO is created in a similar manner to Tulu 2, but with Direct Preference Optimization (DPO).\",,https://huggingface.co/allenai/tulu-2-dpo-70b/discussions,,,AI2 ImpACT,text; text,,Tulu 2 DPO,USA,AI2,,,,70B parameters (dense),model,https://arxiv.org/pdf/2311.10702.pdf,https://huggingface.co/allenai/tulu-2-dpo-70b,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on MT-Bench and AlpacaEval. 
compared to other chatbots.,2023-11-20,,\"['Code LLaMA', 'Tulu-V2-mix']\",\"Code Tulu 2 is a fine-tuned version of Code LLaMA that was trained on a mix of publicly available, synthetic and human datasets.\",,https://huggingface.co/allenai/codetulu-2-13b/discussions,,,AI2 ImpACT,\"text; code, text\",,Code Tulu 2,USA,AI2,,,,13B parameters (dense),model,https://arxiv.org/pdf/2311.10702.pdf,https://huggingface.co/allenai/codetulu-2-13b,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluated on standard LLM tasks and benchmarks in comparison to LLaMA, Falcon, and MPT, in addition to other same-sized models.\",2024-02-01,,['Dolma'],\"Open Language Model (OLMo) is designed to provide access to data, training code, models, and evaluation code necessary to advance AI through open research to empower academics and researchers to study the science of language models collectively.\",,https://huggingface.co/allenai/OLMo-7B/discussions,,,Apache 2.0,text; text,unknown,OLMo,USA,AI2,,training data from Dolma filtered and deduplicated before being trained on.,,7B parameters (dense),model,https://allenai.org/olmo/olmo-paper.pdf,https://huggingface.co/allenai/OLMo-7B,75.05 tCo2eq,\"27 nodes, with each node containing 8x NVIDIA A100-40GB GPUs provided by MosaicML\",unknown,,,,,,,\nopen,,2023-09-09,https://huggingface.co/datasets/allenai/MADLAD-400,['Common Crawl'],\"MADLAD-400 is a document-level multilingual dataset based on Common Crawl, covering 419 languages in total.\",,https://huggingface.co/datasets/allenai/MADLAD-400/discussions,,,CC BY 4.0,text,unknown,MADLAD-400,USA,AI2,,,[],3 trillion tokens,dataset,https://arxiv.org/abs/2309.04662,,,,,,,,,,,\nclosed,Extensive scaling laws analysis has been performed on Evo considering beyond-transformer architectures.,2024-09-30,,['StripedHyena'],\"Evo is a foundation model based on StripedHyena, which is a deep signal processing architecture. 
It is capable of generalizing across DNA, RNA, and proteins and is used for generative design of new CRISPR systems.\",,Feedback or inquiries can be addressed to the Liquid AI team.,,\"The Evo model is intended for generative design of new CRISPR systems, and it generalizes across DNA, RNA, and proteins.\",unknown,text; text,unknown,Evo,USA,Liquid AI,unknown,The model performance has been extensively evaluated with scaling laws analysis.,,unknown,model,https://www.liquid.ai/research/liquid-neural-networks-research,unknown,unknown,unknown,unknown,,,,,,,\nclosed,,2021-03-24,,['OpenAI API'],\"Crisis Contact Simulator, developed as part of a collaboration with Google.org, helps train The Trevor Project counselors by mimicking to be a teen in crisis. Crisis Contact Simulator is used as part of the training programs for the Trevor Project's 24/7 digital crisis services that supports LGBTQ youth [[Trevor Project Blog]](https://www.thetrevorproject.org/blog/the-trevor-project-launches-new-ai-tool-to-support-crisis-counselor-training/).\n\",,unknown,,\"Training counselors\n\",unknown,,unknown,Crisis Contact Simulator,USA,The Trevor Project,unknown,Limited release,,,application,https://www.thetrevorproject.org/,,,,,Fine-tuning,unknown,unknown,Dialogue,unknown,unknown,\nopen,\"Evaluated on benchmarks pertaining to speech, music, and other audio recognition.\",2023-10-20,,\"['Whisper', 'BEATs', 'Vicuna']\",\"SALMONN is a large language model (LLM) enabling speech, audio event, and music inputs.\",,https://huggingface.co/MSIIP/SALMONN/discussions,,,Apache 2.0,\"audio, text; text\",,SALMONN,unknown,\"ByteDance, Tsinghua University\",,,,unknown,model,https://github.com/bytedance/SALMONN,https://huggingface.co/MSIIP/SALMONN,unknown,unknown,unknown,,,,,,,\nopen,Evaluated via qualitative comparison relative to other SoTA image generation models.,2024-02-21,,['Stable Diffusion XL'],\"SDXL-Lightning is a lightning-fast text-to-image generation model. 
It can generate high-quality 1024px images in a few steps. The models are distilled from stabilityai/stable-diffusion-xl-base-1.0. This repository contains checkpoints for 1-step, 2-step, 4-step, and 8-step distilled models.\",,https://huggingface.co/ByteDance/SDXL-Lightning/discussions,,\"The model can be used for fast, high-quality text-to-image generation. It supports 1-step, 2-step, 4-step, and 8-step distilled models which provide varying generation quality.\",OpenRail++,text; image,unknown,SDXL-Lightning,China,ByteDance,unknown,unknown,,unknown,model,https://arxiv.org/pdf/2402.13929.pdf,https://huggingface.co/ByteDance/SDXL-Lightning,unknown,64 A100 80G GPUs,unknown,,,,,,,\nopen,\"LLaVA-Critic was tested in scenarios such as LMM-as-a-Judge and Preference Learning, showing a high correlation with commercial GPT models in evaluation scores. It served as an alternative to expensive human feedback in resource-constrained settings and demonstrated better performance in providing AI-generated feedback for model alignment compared to human-reliant reward models.\",2024-10-06,,[],\"LLaVA-Critic is an open-source large multimodal model (LMM) designed as a generalist evaluator. It assesses performance across a variety of multimodal tasks by following a high-quality critic instruction dataset, incorporating diverse evaluation criteria. 
The model is effective in areas like LMM-as-a-Judge, providing reliable evaluation scores comparable to GPT models, and Preference Learning, offering reward signals for preference learning to enhance model alignment capabilities.\",,unknown,,\"The model can be used for evaluating multimodal tasks, generating reward signals for preference learning, and serving as a reliable alternate judge for model assessments.\",Apache 2.0,\"image, text; text\",unknown,LLaVA-Critic,unknown,\"ByteDance and University of Maryland, College Park\",\"The model should not be used in scenarios requiring authorization from proprietary models, nor relied upon for critical applications without human oversight due to potential biases in dataset.\",\"The model ensures quality by utilizing a high-quality dataset for critic instructions, providing both quantitative judgments and reasoning, with transparency in assessments.\",,unknown,model,https://arxiv.org/pdf/2410.02712,unknown,unknown,unknown,unknown,,,,,,,\nlimited,,2023-02-01,,[],UnderwriteGPT is the world's first generative AI underwriting tool.,,,,,,,,UnderwriteGPT,USA,Paladin Group and Dais Technology,,,,,application,https://dais.com/underwritegpt/,,,,,,,,,,,\nclosed,\"Boasts the highest performance among the Korean LLMs of similar sizes that have been released to date, according to internal evaluations.\",2023-08-16,,[],VARCO-LLM is NCSOFT’s large language model and is trained on English and Korean.,,,,\"Developing various NLP-based AI services such as Q&A, chatbot, summarization, information extraction\",custom,text; text,,VARCO-LLM,South Korea,NCSOFT,,,,13B parameters,model,https://github.com/ncsoft/ncresearch,,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on the SuperGLUE benchmark,2023-01-01,,['Writer dataset'],Palmyra is a family of privacy-first LLMs for enterprises trained on business and marketing writing.,,https://huggingface.co/Writer/palmyra-base/discussions,,generating text from a prompt,Apache 2.0,text; 
text,,Palmyra,USA,Writer,,,,20B parameters (dense),model,https://gpt3demo.com/apps/palmyra,https://huggingface.co/Writer/palmyra-base,unknown,,unknown,,,,,,,\nopen,,2023-04-01,,\"['Palmyra', 'Camel dataset']\",Camel is an instruction-following large language model tailored for advanced NLP and comprehension capabilities.,,https://huggingface.co/Writer/camel-5b-hf/discussions,,,Apache 2.0,text; text,,Camel,USA,Writer,,,,5B parameters (dense),model,https://chatcamel.vercel.app/,https://huggingface.co/Writer/camel-5b-hf,unknown,,unknown,,,,,,,\nopen,\"The model was evaluated across 9 diverse biomedical datasets where it achieved state-of-the-art results with an average score of 85.9%. It also demonstrated robust capability in efficiently processing extensive medical documents, as showcased by its near-perfect score in the NIH evaluation.\",2024-09-08,,['Palmyra-X-004'],\"Palmyra-Med-70b-32k is a Language Model designed specifically for healthcare and biomedical applications. It builds upon the foundation of Palmyra-Med-70b and offers an extended context length. This model integrates the DPO dataset, a custom medical instruction dataset, and has been fine-tuned to meet the unique requirements of the medical and life sciences sectors. It is ranked as the leading LLM on biomedical benchmarks with an average score of 85.87%.\",,Downstream problems with this model should be reported via email to Hello@writer.com.,,\"Palmyra-Med-70b-32k is intended for non-commercial and research use in English. Specifically, it can be used for tasks like clinical entity recognition and knowledge discovery from EHRs, research articles, and other biomedical sources. 
It excels in analyzing and summarizing complex clinical notes, EHR data, and discharge summaries.\",Writer open model,text; text,Measures in place to monitor misuse include the addition of watermarks in all models built by Writer.com to detect and prevent misuse and illegal use.,Palmyra-Med-70b-32k,USA,Writer,\"The model should not be used in any manner that violates applicable laws or regulations. It is not to be used in direct patient care, clinical decision support, or professional medical purposes. The model should not replace professional medical judgment.\",The model has been refined using Policy Optimization and a finely crafted fine-tuning dataset. It contains watermarks to detect and prevent misuse and illegal use.,,70B parameters,model,https://huggingface.co/Writer/Palmyra-Med-70B-32K,https://huggingface.co/Writer/Palmyra-Med-70B-32K,Unknown,Unknown,Unknown,,,,,,,\nopen,\"The model has been evaluated internally, showing state-of-the-art results on various financial datasets. It has shown 100% accuracy in needle-in-haystack tasks and superior performance in comparison to other models in the organization's internal finance evaluations. It passed the CFA Level III test with a score of 73% and has shown superior performance compared to other models in the long-fin-eval, an internally created benchmark that simulates real-world financial scenarios.\",2024-09-08,,\"['Palmyra-X-004', 'Writer in-house financial instruction dataset']\",\"Palmyra-Fin-70B-32K is a leading LLM built specifically to meet the needs of the financial industry. It has been fine-tuned on an extensive collection of high-quality financial data and it is highly adept at handling the specific needs of the finance field. It outperforms other large language models in various financial tasks and evaluations, achieving state-of-the-art results across various financial datasets. 
Its strong performance in tasks like financial document analysis, market trend prediction, risk assessment underscores its effective grasp of financial knowledge.\",,Downstream problems with this model should be reported to Hello@writer.com.,,\"The model is intended for use in English for financial analysis, market trend prediction, risk assessment, financial report generation, automated financial advice, and answering questions from long financial documents. It can be used for entity recognition, identifying key financial concepts such as market trends, economic indicators, and financial instruments from unstructured text.\",Writer open model license,text; text,Unknown,Palmyra-Fin-70B-32K,USA,Writer,\"The model should not be used in manners that violate applicable laws or regulations, including trade compliance laws, use prohibited by Writer's acceptable use policy, the Writer open model license, and in languages other than English. It is advised not to use the model for direct financial decision-making or professional financial advice without human oversight. Always consult a qualified financial professional for personal financial needs.\",\"The model was trained with a proprietary internal database and a fine-tuning recipe to ensure a greater level of domain-specific accuracy and fluency. Still, the model may contain inaccuracies, biases, or misalignments and its usage for direct financial decision-making or professional financial advice without human oversight is not recommended. 
It has not been rigorously evaluated in real-world financial settings and it requires further testing, regulatory compliance, bias mitigation, and human oversight for more critical financial applications.\",,70B parameters (dense),model,https://huggingface.co/Writer/Palmyra-Fin-70B-32K,https://huggingface.co/Writer/Palmyra-Fin-70B-32K,Unknown,Unknown,Unknown,,,,,,,\nopen,Performance evaluated on English and Chinese language benchmark tests.,2023-03-14,,[],\"ChatGLM is a Chinese-English language model with question and answer and dialogue functions, and is aimed at a Chinese audience.\",,,,,Apache 2.0,text; text,,ChatGLM,China,ChatGLM,,,,6B parameters (dense),model,https://chatglm.cn/blog,,unknown,,unknown,,,,,,,\nopen,\"Deepseek and baseline models (for comparison) evaluated on a series of representative benchmarks, both in English and Chinese.\",2023-11-28,,[],Deepseek is a 67B parameter model with Grouped-Query Attention trained on 2 trillion tokens from scratch.,,https://huggingface.co/deepseek-ai/deepseek-llm-67b-base/discussions,,,custom,text; text,unknown,Deepseek,USA,Deepseek AI,,Training dataset comprised of diverse data composition and pruned and deduplicated.,,67B parameters (dense),model,https://github.com/deepseek-ai/DeepSeek-LLM,https://huggingface.co/deepseek-ai/deepseek-llm-67b-base,unknown,unknown,unknown,,,,,,,\nopen,\"Deepseek and baseline models (for comparison) evaluated on a series of representative benchmarks, both in English and Chinese.\",2023-11-29,,['Deepseek'],Deepseek Chat is a 67B parameter model initialized from Deepseek and fine-tuned on extra instruction data.,,https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat/discussions,,,custom,text; text,unknown,Deepseek Chat,USA,Deepseek AI,,Training dataset comprised of diverse data composition and pruned and deduplicated.,,67B parameters 
(dense),model,https://github.com/deepseek-ai/DeepSeek-LLM,https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluated on code generation, code completion, cross-file code completion, and program-based math reasoning across standard benchmarks.\",2023-11-03,,[],\"Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.\",,https://huggingface.co/deepseek-ai/deepseek-coder-33b-base/discussions,,,custom,text; code,unknown,Deepseek Coder,USA,Deepseek AI,,,,33B parameters (dense),model,https://github.com/deepseek-ai/DeepSeek-Coder,https://huggingface.co/deepseek-ai/deepseek-coder-33b-base,unknown,8 NVIDIA A100 GPUs and 8 NVIDIA H800 GPUs,unknown,,,,,,,\nopen,\"Evaluated on MATH, a competition-level dataset, and achieves a 46% accuracy, higher than accuracy produced by GPT-4's chain of thought.\",2023-09-11,,\"['MathInstruct', 'LLaMA', 'Code LLaMA']\",MAmmoTH is a series of open-source large language models (LLMs) specifically tailored for general math problem-solving.,,,,,MIT,text; text,,MAmmoTH,USA,Ohio State University,,,,34B parameters (dense),model,https://arxiv.org/pdf/2309.05653.pdf,,,,,,,,,,,\nopen,,2023-03-01,,['ChatGPT API'],\"My AI offers Snapchatters a friendly, customizable chatbot at their fingertips that offers recommendations, and can even write a haiku for friends in seconds.
Snapchat, where communication and messaging is a daily behavior, has 750 million monthly Snapchatters.\",,,,,custom,,,My AI for Snapchat,USA,Snap,,,,,application,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,,,,https://snap.com/terms,,\nopen,\"Models have been evaluated on a series of math benchmarks, demonstrating outperformance of the state-of-the-art models in both the English and Chinese language.\",2024-08-08,,[],\"Qwen2-Math is a series of specialized math language models built upon the Qwen2 large language models, with a focus on enhancing the reasoning and mathematical capabilities. Their intended use is for solving complex mathematical problems. They significantly outperform both open-source and closed-source models in terms of mathematical capabilities.\",,Problems with the model should be reported to the Qwen Team via their official channels.,,These models are intended for solving complex mathematical problems.,Tongyi Qianwen,text; text,Unknown,Qwen2-Math,China,Qwen Team,Uses that go against the ethical usage policies of Qwen Team.,The models were tested with few-shot chain-of-thought prompting and evaluated across mathematical benchmarks in both English and Chinese.,,72B parameters,model,https://qwenlm.github.io/blog/qwen2-math/,https://huggingface.co/Qwen/Qwen2-Math-72B,Unknown,Unknown,Unknown,,,,,,,\nopen,,2022-12-07,,\"['GPT-3.5', 'Bing Search']\",Perplexity Ask is a new search interface that uses advanced artificial intelligence technologies,,,,,,,,Perplexity Ask,USA,Perplexity,,,,,application,https://www.perplexity.ai/,,,,,,,,,,,\nclosed,,2022-12-15,,\"['Perplexity Ask', 'OpenAI API']\",Twitter search interface that is powered by Perplexity's structured search engine.,,,,,,,,Bird SQL,USA,Perplexity,,,,,application,https://www.perplexity.ai/sql,,,,,,,,,,,\nopen,,2023-10-27,,[],Perplexity chat is an AI chatbot trained in-house by Perplexity.,,,,,,,,Perplexity 
Chat,USA,Perplexity,,,,,application,https://labs.perplexity.ai/,,,,,,,,Chatbot output in response to user queries,https://blog.perplexity.ai/legal/terms-of-service,,\nopen,,2023-03-01,,['Whisper API'],Speak is an AI-powered language learning app focused on building the best path to spoken fluency and is the fastest-growing English app in South Korea.,,,,,,,,Speak,USA,Speak,,,,,application,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,,,,,,\nopen,Reports results on the Vicuna benchmark and compares performance level and time expenditure with ChatGPT,2023-05-23,,\"['QLoRA', 'OASST1']\",\"Guanaco is a model family trained with QLORA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance.\",,,,,MIT,text; text,,Guanaco,USA,University of Washington,,,,33B parameters (dense),model,https://arxiv.org/pdf/2305.14314v1.pdf,,,A single 24 GB GPU,,,,,,,,\nopen,Evaluated on benchmark music understanding tasks on SOTA music datasets.,2023-10-11,,\"['LLaMA 2', 'Jukebox']\",Llark is an instruction-tuned multimodal model for music understanding.,,,,,Apache 2.0,\"audio, text; text\",,Llark,USA,\"University of Washington, Spotify\",,,,12B parameters (dense),model,https://arxiv.org/pdf/2310.07160.pdf,,unknown,4 80GB NVIDIA A40 GPUs,54 hours,,,,,,,\nopen,\"The model was tested in the generation of a new green fluorescent protein. Its effectiveness was compared to natural evolutionary processes, and it was deemed to simulate over 500 million years of evolution.\",2024-06-25,,[],\"ESM3 is the first generative model for biology that simultaneously reasons over the sequence, structure, and function of proteins. It is trained across the natural diversity of Earth, reasoning over billions of proteins from diverse environments. It advances the ability to program and create with the code of life, simulating evolution, and making biology programmable.
ESM3 is generative, and scientists can guide the model to create proteins for various applications.\",,Unknown,,\"To engineer biology from first principles. It functions as a tool for scientists to create proteins for various applications, including medicine, biology research, and clean energy.\",Unknown,\"text; image, text\",Unknown though specific measures are not specified.,ESM3,USA,EvolutionaryScale,Unknown,\"The creators have put in place a responsible development framework to ensure transparency and accountability from the start. ESM3 was tested in the generation of a new protein, ensuring its quality and effectiveness.\",,98B parameters (Dense),model,https://www.evolutionaryscale.ai/blog/esm3-release,unknown,Unknown,unknown,Unknown,,,,,,,\nopen,Evaluated on standard benchmarks across a range of tasks.,2023-10-21,,\"['Qwen', 'OpenOrca', 'Open Platypus']\",CausalLM is an LLM based on the model weights of Qwen and trained on a model architecture identical to LLaMA 2.,,,,,WTFPL,text; text,unknown,CausalLM,USA,CausalLM,,,,14B parameters (dense),model,https://huggingface.co/CausalLM/14B,https://huggingface.co/CausalLM/14B,unknown,unknown,unknown,,,,,,,\nopen,Evaluated against similar LLMs using GPT-4 as a judge.,2023-03-30,,\"['LLaMA', 'ShareGPT conversations data']\",An open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.,,https://huggingface.co/datasets/bigcode/the-stack/discussions,,research on LLMs and chatbots,Apache 2.0,text; text,,Vicuna,USA,LMSYS,,,,13B parameters (dense),model,https://lmsys.org/blog/2023-03-30-vicuna/,https://huggingface.co/lmsys/vicuna-13b-delta-v0,,8 A100 GPUs,1 day,,,,,,,\nlimited,,2023-10-11,,['Cohere Base'],Moonhub Recruiter is the world's first AI-powered recruiter providing sourcing and recruiting services for startups and growing businesses.,,,,recruiting candidates for business needs,custom,,,Moonhub Recruiter,USA,Moonhub,,,,,application,https://www.moonhub.ai/,,,,,,,unknown,job 
candidate matches,https://www.moonhub.ai/terms,unknown,\nopen,https://arxiv.org/abs/2104.08758,2019-10-23,https://huggingface.co/datasets/c4,['CommonCrawl'],The Colossal Clean Crawled Corpus (C4) is a processed version of Common Crawl to facilitate transfer learning in NLP.,\"Data was filtered for English using langdetect. Further, data was filtered to end in terminal punctuation, to remove short pages (less than 5 sentences), and to remove \"\"Dirty, Naughty, Obscene or Otherwise Bad Words\"\".\n\",https://huggingface.co/datasets/c4/discussions,,To facilitate transfer learning research in NLP.,ODC-By 1.0,text,,C4,USA,Google,,\"Data filtering excluded obscene words from a block list as well as short documents and some deduplication was done based on string overlap.\n\",['https://huggingface.co/datasets/c4/viewer/en/train'],750GB,dataset,https://arxiv.org/abs/1910.10683,,,,,,,,,,,\nclosed,unknown,2019-11-25,,[],\"The dataset used to train Internal Google BERT models.\n\",unknown,unknown,\"Web pages, and search queries\",unknown,unknown,text,unknown,Internal Google BERT dataset,USA,Google,unknown,unknown,[],unknown,dataset,https://blog.google/products/search/search-language-understanding-bert/,,,,,,,,,,,\nopen,\"Authors evaluate the dataset on two image captioning models - RNN-based model and Transformer model, under two experimental conditions - using the training & development sets provided by the MS COCO dataset, versus training & development sets using the Conceptual dataset. They use three different test sets- the blind COCO-C40 test set, the Conceptual test set and the Flickr 1K test set. They present both Human and Automatic evaluation results. Human evaluations indicate that the Conceptual-based models are superior. Automatic models fail to corroborate the human evaluation results.
This highlights the weakness of these automatic metrics.\",2018-07-01,,[],\"A dataset containing 3 million (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.\n\",\"The following filtering steps are applied in the given order:\n1. Image-based Filtering - \"\"It only keeps JPEG images where both dimensions are greater than 400 pixels, and the ratio of larger to smaller dimension is no more than 2. It excludes images that trigger pornography or profanity detectors. These filters discard more than 65% of the candidates.\"\"\n2. Text-based Filtering - \"\"Candidates with no determiner, no noun, or no preposition are discarded; candidates with a high noun ratio are also discarded; candidates with a high rate of token repetition are discarded; candidates where the first word is not capitalized, or with too high capitalized-word ratio are discarded; we use a vocabulary VW of 1B token types, appearing at least 5 times in the English Wikipedia, and discard candidates that contain tokens that are not found in this vocabulary. candidates that score too high or too low on the polarity annotations, or trigger the pornography/profanity detectors, are discarded; predefined boiler-plate prefix/suffix sequences matching the text are cropped, e.g. “click to enlarge picture”, “stock photo”; we also drop text which begins/ends in certain patterns, e.g. “embedded image permalink”, “profile photo”. These filters only allow around 3% of the incoming candidates to pass to the later stages.\"\"\n3. Image&Text-based Filtering - \"\"We filter out candidates for which none of the text tokens can be mapped to the content of the image. This filter discards around 60% of the incoming candidates.\"\"\n4. 
Text Transformation with Hypernymization - \"\"Noun modifiers of certain types (proper nouns, numbers, units) are removed; dates, durations, and preposition-based locations (e.g., \"\"in Los Angeles\"\") are removed; named-entities are identified, matched against the KG entries, and substitute with their hypernym; resulting coordination noun-phrases with the same head (e.g., \"\"actor and actor\"\") are resolved into a single-head, pluralized form (e.g., \"\"actors\"\"). Around 20% of samples are discarded during this transformation. We then cluster all resolved entities (e.g., 2560 \"\"actor\"\", \"\"dog\"\", \"\"neighborhood\"\", etc.) and keep only the candidates for which all detected types have a count of over 100 (around 55% of the candidates).\"\"\n\",Feedback can be provided by creating an issue in the [[Conceptual Captions GitHub repository]](https://github.com/google-research-datasets/conceptual-captions) or by emailing at conceptual-captions at google.com,,,Conceptual Captions License,\"image, text\",unknown,Conceptual Captions,USA,Google,unknown,\"Input candidate (image, caption) pairs pass through several stages of filtering and processing to ensure quality.\",[],\"3.3M (image, text) pairs\",dataset,https://aclanthology.org/P18-1238/,,,,,,,,,,,\nopen,\"The dataset is benchmarked against CC3M on two most fundamental V+L tasks: vision-to-language generation and vision-and-language matching, with an emphasis on long-tail visual recognition. The results illustrate the benefit of scaling up pre-training data for vision-and-language tasks, as indicated by the new state-of-the-art results on both the nocaps and Conceptual Captions benchmarks.\n\",2021-02-17,,[],\"A dataset with 12 million image-text pairs specifically meant to be used for vision-and-language pre-training.\n\",\"Some of the filtering steps used in the preparation of Conceptual Captions dataset are relaxed to trade off high-recall for low-precision. 
The following steps are applied in the given order:\n1. Image-based Filtering - Only keep JPEG images where both dimensions are greater than 400 pixels, and the ratio of larger to smaller dimension is no more than 2.5. Exclude images that trigger pornography or profanity detectors.\n2. Text-based Filtering - Allow text between 3 and 256 words in the alt-text. Discard candidates with no noun or no determiner, but permit ones without prepositions. Set the maximum fraction of word repetition allowed to 0.2. Increase the threshold for counting a word type as rare from 5 to 20.\n3. Image&Text-based Filtering - Filter out candidates for which none of the text tokens can be mapped to the content of the image.\n\",Feedback can be provided by creating an issue in the [[Conceptual 12M GitHub repository]](https://github.com/google-research-datasets/conceptual-12m) or by emailing at conceptual-captions at google.com,,,Conceptual Captions License,\"image, text\",unknown,Conceptual 12M,USA,Google,unknown,\"Input candidate (image, caption) pairs pass through several stages of filtering and processing to ensure quality. 
Person-name substitutions are performed in the alt-texts to protect the privacy of individuals in the associated images.\",[],\"12M (image, text) pairs\",dataset,https://arxiv.org/pdf/2102.08981.pdf,,,,,,,,,,,\nopen,https://huggingface.co/t5-base#evaluation,2019-10-23,,['C4'],Text-To-Text Transfer Transformer (T5) is a model that unifies all NLP tasks under the text-to-text format.,,https://huggingface.co/t5-large/discussions,,NLP tasks,Apache 2.0,text; text,,T5,USA,Google,unknown,The T5 paper documents many analyses/ablations that were considered before arriving at the final architecture/training procedure.,,11B parameters (dense),model,https://arxiv.org/abs/1910.10683,https://huggingface.co/t5-base,unknown,\"1,024 TPU v3 chips (Cloud TPU Pods)\",unknown,,,,,,,\nclosed,unknown,2019-11-25,,['Internal Google BERT dataset'],\"Internal Google BERT model used to power Google Search products.\n\",,unknown,,unknown,unknown,text; text,unknown,Internal Google BERT,USA,Google,unknown,unknown,,unknown,model,https://blog.google/products/search/search-language-understanding-bert/,unknown,unknown,unknown,unknown,,,,,,,\nopen,,2019-11-25,,\"['Internal Google BERT', 'MUM']\",\"Google Search is Google's search engine.\n\",,\"Feedback can be sent to Google Feedback using the product interface [[Google Feedback]](https://www.google.com/tools/feedback).\n\",,\"Searching the web using text, voice or image\",,,\"It is implied that Google scan uses of its products for spam, malware and illegal content in the [[Term of Service]](https://policies.google.com/terms).\n\",Google Search,USA,Google,\"Prohibited use cases aren't specifically spelled out for Google search, but several illegal and discouraged use cases are shared in the Respect Others section of the [[Term of Service]](https://policies.google.com/terms).\n\",unknown,,,application,https://blog.google/products/search/search-language-understanding-bert/,,,,,unknown,unknown,unknown,web page 
ranking,https://policies.google.com/terms,unknown,\nclosed,unknown,2021-06-18,,[],\"Infiniset \"\"is a combination of dialog data from public dialog data and other public web documents\"\" [[Appendix E]](https://arxiv.org/pdf/2201.08239.pdf#appendix.E).\n\",unknown,,\"Included in the dataset are data from \"\"public forums (0%); C4 data (12.5% ); code documents from sites related to programming like Q&A sites tutorials, etc (12.5%); Wikipedia (English) (12.5%); English web documents (6.25%); and Non-English web documents (6.25%).\"\"\n\",unknown,unknown,\"code, text\",unknown,Infiniset,USA,Google,\"The prohibited uses for Infiniset weren't specifically listed, but the Google AI principles inspired safety objectives in [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1) advises avoiding harm, unjust impact and misinformation, among others.\n\",unknown,[],unknown,dataset,https://arxiv.org/pdf/2201.08239.pdf,,,,,,,,,,,\nclosed,\"The model performance was analyzed on sensibleness, specificity and interestingness. The model was also analyzed on safety, following metrics derived from Google AI Principles [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1). Finally, the model was analyzed on groundedness, testing its ability to produce responses that can be associated with \"\"known sources whenever possible [[Section 4.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.4.1).\"\"\n\",2021-06-18,,['Infiniset'],\"LaMDA stands for Language Models for Dialog Application. 
It is a transformer-based language model trained on dialogue data.\n\",,,,\"LaMDA is a language model, so it can be used for regular language modelling tasks without fine-tuning, but it is fine-tuned for dialogue tasks.\n\",unknown,text; text,unknown,LaMDA,USA,Google,\"The prohibited uses of LaMDA weren't specifically listed, but the Google AI principles inspired safety objectives in [[Appendix A.1]](https://arxiv.org/pdf/2201.08239.pdf#subsection.A.1) advises avoiding harm, unjust impact and misinformation, among others.\n\",\"LaMDA was fine-tuned to predict sensibleness, specificity and interestingness as well as safety. Then, the candidates were filtered out if the model safety predictions were below a certain threshold. The next candidates in the conversation were selected as a combination of these predictions. The model was also fine-tuned for groundedness. The results are shown in [[Figure 5]](https://arxiv.org/pdf/2201.08239.pdf#figure.caption.23).\n\",,137B parameters (dense),model,https://arxiv.org/pdf/2201.08239.pdf,,26 tCO2e,1024 TPU-V3 chips,4108.80 petaflop/s-day,,,,,,,\nclosed,unknown,2022-04-04,https://arxiv.org/pdf/2204.02311.pdf#appendix.D,['Infiniset'],\"PaLM dataset \"\"was created for pre-training language models\"\" [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D).\n\",\"GitHub repositories with copyleft licenses were excluded. Programming languages other than the most common 24 were excluded [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3).\n\",Contact the authors.,\"The dataset is based on Infiniset. It included multilingual text containing text from over 100 languages. The breakdown of the data included is as follows: Social media conversations (multilingual) 50%, Filtered webpages (multilingual) 27%, BooksCorpus (English) 13%, GitHub (code) 5%, Wikipedia (multilingual) 4%, and News (English) 1%. 
Code was collected from GitHub repositories with appropriate licenses, totalling 96GB of source code [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3).\n\",\"\"\"The dataset was created for pre-training language models by a team of researchers at Google\"\".\n\",unknown,\"code, text\",unknown,PaLM dataset,USA,Google,\"\"\"... should not be used for any of the unacceptable language model use cases, e.g., generation of toxic speech\"\" [[Datasheet]](https://arxiv.org/pdf/2204.02311.pdf#appendix.D).\n\",\"In order to reduce low quality web pages, the web pages were sampled according to a \"\"quality score\"\" classifier. Code files were de-duplicated using Levenshtein distance [[Section 3]](https://arxiv.org/pdf/2204.02311.pdf#section.3).\n\",[],3.92 TB,dataset,https://arxiv.org/pdf/2204.02311.pdf,,,,,,,,,,,\nopen,Evaluated on a variety of standard language datasets.,2022-10-20,,\"['T5', 'Muffin', 'P3', 'NaturalInstructions-v2', 'Flan CoT']\",Flan-T5 is a version of the T5 language model fine-tuned on instruction data,,https://huggingface.co/google/flan-t5-xxl/discussions,,unknown,Apache 2.0,text; text,,Flan-T5,USA,Google,,\"Across different multitask datasets, templates and formatting were maintained. 
For the chain-of-thought (CoT) data, specific exemplars were used.\",,11B parameters (dense),model,https://arxiv.org/abs/2210.11416,https://arxiv.org/pdf/2210.11416.pdf,Unknown,512 v4 TPU Chips,Unknown,,,,,,,\nopen,,2022-05-10,,['C4'],UL2 is a language model trained with a new pretraining objective,,,,,Apache 2.0,text; text,,UL2,USA,Google,,,,20B parameters (dense),model,https://arxiv.org/abs/2205.05131,,,128 TPUv4,,,,,,,,\nclosed,,2022-06-22,,\"['C4', 'LAION-400M', 'FIT400M', 'JFT-4B']\",Parti is an autoregressive text-to-image model,,,,,unknown,text; image,,Parti,USA,Google,,,,20B parameters (dense),model,https://parti.research.google/,,,unknown,unknown,,,,,,,\nopen,,2022-05-23,,\"['LAION-400M', 'Google internal image-text dataset']\",Imagen is a text-to-image diffusion model,,,,,unknown,text; image,,Imagen,USA,Google,,,,14B parameters (dense),model,https://imagen.research.google/,,unknown,128 TPU-v4,unknown,,,,,,,\nopen,,2022-04-22,,\"['AudioSet', 'HowTo100M']\",VATT is a family of models trained on multimodal data,,,,,Apache 2.0,\"text; audio, video\",,VATT,USA,Google,,,,155M parameters (dense),model,https://arxiv.org/abs/2104.11178,,unknown,256 TPU-v3,3 days,,,,,,,\nlimited,\"\"\"PaLM is evaluated on English Natural Language Processing (NLP) tasks, tasks from BIG-bench, reasoning tasks, code completion tasks, multilingual generation and question answering tasks, translation tasks, and bias and toxicity benchmarks\"\" [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E).\n\",2022-04-04,,['PaLM dataset'],\"PaLM stands for Pathways Language Model, a \"\"dense decoder-only Transformer model trained with the Pathways system\"\" [[Google AI Blog]](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html).\n\",,Contact the authors.,,\"\"\"The primary use is research on language models, including: research on NLP applications like machine translation and question answering, advancing fairness and safety research, and understanding limitations of 
current LLMs. Within Google, PaLM is being used for research on a variety of open- ended text and code generation tasks, including reasoning [[Section 6.3]](https://arxiv.org/pdf/2204.02311.pdf#subsection.6.3) and code synthesis and understanding [[Section 6.4]](https://arxiv.org/pdf/2204.02311.pdf#subsection.6.4)\"\" [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E).\n\",unknown,\"text; text, code\",unknown,PaLM,USA,Google,\"The model \"\"should not be used for downstream applications without further analysis on factors in the proposed downstream application [[Model Card]](https://arxiv.org/pdf/2204.02311.pdf#appendix.E)\"\"\n\",Unknown,,540B parameters (dense),model,https://arxiv.org/pdf/2204.02311.pdf,https://arxiv.org/pdf/2204.02311.pdf#appendix.E,271.43 tCO2,6144 TPU v4 chips,29600 petaflop/s-days,,,,,,,\nlimited,,2023-03-14,,['PaLM'],a new developer offering that makes it easy and safe to experiment with Google’s language models.,,,,,unknown,,,PaLM API,USA,Google,,,,,application,https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html,,,,,,,,,,,\nclosed,,2022-12-26,,\"['Flan-PaLM', 'MultiMedQA']\",,,,,,unknown,text; text,,Med-PaLM,USA,Google,,,,540B parameters (dense),model,https://arxiv.org/abs/2212.13138,,,,,,,,,,,\nclosed,Evaluated on MultiMedBench tasks and radiologist evaluations of model-generated chest X-ray reports,2023-07-26,,\"['PaLM-E', 'MultiMedBench']\",,,,,,unknown,\"image, text, genome sequence; text\",,Med-PaLM Multimodal,USA,Google,,,,562B parameters (dense),model,https://arxiv.org/pdf/2307.14334.pdf,,,,,,,,,,,\nclosed,,2022-12-26,,\"['MedQA', 'MedMCQA', 'PubMedQA', 'MMLU', 'LiveQA', 'Medication QA', 'HealthSearchQA']\",,,,,,unknown,text; text,,MultiMedQA,USA,Google,,,,unknown,model,https://arxiv.org/abs/2212.13138,,,,,,,,,,,\nclosed,,2022-10-20,,\"['PaLM', 'Muffin', 'P3', 'NaturalInstructions-v2']\",,,,,,unknown,text; text,,Flan-PaLM,USA,Google,,,,540B parameters 
(dense),model,https://arxiv.org/abs/2210.11416,,,,,,,,,,,\nclosed,,2022-10-20,,\"['U-PaLM', 'Muffin', 'P3', 'NaturalInstructions-v2']\",,,,,,unknown,text; text,,Flan-U-PaLM,USA,Google,,,,540B parameters (dense),model,https://arxiv.org/abs/2210.11416,,,,,,,,,,,\nopen,,2021-09-03,,[],,,,,,Apache 2.0,text,,Muffin,USA,Google,,,[],62 tasks,dataset,https://arxiv.org/abs/2109.01652,,,,,,,,,,,\nclosed,,2022-10-20,,\"['PaLM', 'PaLM dataset']\",,,,,,unknown,text; text,,U-PaLM,USA,Google,,,,540B parameters (dense),model,https://arxiv.org/abs/2210.11399,,,,,,,,,,,\nclosed,,2022-08-16,,['PaLM'],,,,,,\"unknown (model weights), Apache 2.0 (SayCan code)\",text; robotics trajectories,,PaLM-SayCan,USA,Google,,,,540B parameters (dense),model,https://arxiv.org/abs/2204.01691,,,,,,,,,,,\nclosed,,2021-12-13,,\"['GLaM Web dataset', 'Wikipedia', 'GLaM Conversations dataset', 'GLaM Forums dataset', 'BooksCorpus', 'GLaM News dataset']\",,,,,,unknown,text; text,,GLaM,USA,Google,,,,1.2T parameters (sparse),model,https://arxiv.org/abs/2112.06905,,,,,,,,,,,\nclosed,,2021-12-13,,[],,,,,,unknown,text,,GLaM Web dataset,USA,Google,,,[],unknown,dataset,https://arxiv.org/abs/2112.06905,,,,,,,,,,,\nclosed,,2021-12-13,,[],,,,,,unknown,text,,GLaM Conversations dataset,USA,Google,,,[],unknown,dataset,https://arxiv.org/abs/2112.06905,,,,,,,,,,,\nclosed,,2021-12-13,,[],,,,,,unknown,text,,GLaM Forums dataset,USA,Google,,,[],unknown,dataset,https://arxiv.org/abs/2112.06905,,,,,,,,,,,\nclosed,,2021-12-13,,[],,,,,,unknown,text,,GLaM News dataset,USA,Google,,,[],unknown,dataset,https://arxiv.org/abs/2112.06905,,,,,,,,,,,\nclosed,,2021-05-18,,['MUM dataset'],MUM (Multitask Unified Model) is a multimodal model that is specialized for more complex queries.,,,,,unknown,\"image, text; text\",,MUM,USA,Google,,,,unknown,model,https://blog.google/products/search/introducing-mum/,,,,,,,,,,,\nclosed,,2021-05-18,,[],,,,,,unknown,\"image, text\",,MUM 
dataset,USA,Google,,,[],unknown,dataset,https://blog.google/products/search/introducing-mum/,,,,,,,,,,,\nclosed,,2023-02-01,,\"['LAION-400M', 'Phenaki Video-Text Corpus']\",,,,,,unknown,text; video,,Phenaki,USA,Google,,,,1.8B parameters (dense),model,https://openreview.net/pdf?id=vOEXS39nOF,,,,,,,,,,,\nclosed,,2023-02-01,,[],,,,,,unknown,\"text, video\",,Phenaki Video-Text Corpus,USA,Google,,,[],15M text-video pairs at 8FPS,dataset,https://openreview.net/pdf?id=vOEXS39nOF,,,,,,,,,,,\nopen,,2023-03-02,,\"['UL2', 'Flan Collection']\",,,,,,Apache 2.0,text; text,,Flan-UL2,USA,Google,,,,20B parameters (dense),model,https://arxiv.org/abs/2205.05131,,,,,,,,,,,\nopen,,2023-01-31,,\"['Flan dataset', 'P3', 'NaturalInstructions-v2']\",,,,,,Apache 2.0,text,,Flan Collection,USA,Google,,,[],1836 tasks,dataset,https://arxiv.org/abs/2301.13688,,,,,,,,,,,\nclosed,,2023-01-26,,\"['SoundStream', 'w2v-BERT', 'MuLan', 'MusicLM semantic model', 'MusicLM acoustic model']\",,,,,,unknown,text; audio,,MusicLM,USA,Google,,,,1.4B parameters (dense),model,https://arxiv.org/pdf/2301.11325.pdf,,,,,,,,,,,\nclosed,,2023-01-26,,['Free Music Archive'],,,,,,unknown,audio; audio,,SoundStream,USA,Google,,,,unknown,model,https://arxiv.org/pdf/2301.11325.pdf,,,,,,,,,,,\nclosed,,2023-01-26,,['Free Music Archive'],,,,,,unknown,audio; audio,,w2v-BERT,USA,Google,,,,600M parameters (dense),model,https://arxiv.org/pdf/2301.11325.pdf,,,,,,,,,,,\nclosed,,2022-08-26,,\"['AST', 'BERT', 'MuLan dataset']\",,,,,,unknown,text; audio,,MuLan,USA,Google,,,,unknown,model,https://arxiv.org/abs/2208.12415,,,,,,,,,,,\nclosed,,2022-08-26,,[],,,,,,unknown,\"audio, text\",,MuLan dataset,USA,Google,,,[],370K hours audio,dataset,https://arxiv.org/abs/2208.12415,,,,,,,,,,,\nclosed,,2023-01-26,,[],,,,,,unknown,audio,,MusicLM dataset,USA,Google,,,[],280K hours audio,dataset,https://arxiv.org/pdf/2301.11325.pdf,,,,,,,,,,,\nclosed,,2023-01-26,,['MusicLM dataset'],,,,,,unknown,audio; audio,,MusicLM semantic model,USA,Google,,,,430M 
parameters (dense),model,https://arxiv.org/pdf/2301.11325.pdf,,,,,,,,,,,\nclosed,,2023-01-26,,['MusicLM dataset'],,,,,,unknown,audio; audio,,MusicLM acoustic model,USA,Google,,,,430M parameters (dense),model,https://arxiv.org/pdf/2301.11325.pdf,,,,,,,,,,,\nclosed,,2023-02-08,,['Noise2Music pseudolabel dataset'],,,,,,unknown,\"audio, text; audio\",,Noise2Music,USA,Google,,,,unknown,model,https://google-research.github.io/noise2music/noise2music.pdf,,,,,,,,,,,\nclosed,,2023-02-08,,['LaMDA'],,,,,,unknown,text,,LaMDA-LF,USA,Google,,,[],150k songs,dataset,https://google-research.github.io/noise2music/noise2music.pdf,,,,,,,,,,,\nclosed,,2023-02-08,,['MusicCaps'],,,,,,unknown,text,,Rater-LF,USA,Google,,,[],10k captions,dataset,https://google-research.github.io/noise2music/noise2music.pdf,,,,,,,,,,,\nclosed,,2023-02-08,,['MusicCaps'],,,,,,unknown,text,,Rater-SF,USA,Google,,,[],24k captions,dataset,https://google-research.github.io/noise2music/noise2music.pdf,,,,,,,,,,,\nclosed,,2023-02-08,,\"['MuLan', 'MuLaMCap', 'LaMDA-LF', 'Rater-LF', 'Rater-SF']\",,,,,,unknown,\"audio, text; audio\",,Noise2Music pseudolabeler,USA,Google,,,,unknown,model,https://google-research.github.io/noise2music/noise2music.pdf,,,,,,,,,,,\nclosed,,2023-02-08,,[],,,,,,unknown,audio,,Noise2Music audio dataset,USA,Google,,,[],340k hours audio,dataset,https://google-research.github.io/noise2music/noise2music.pdf,,,,,,,,,,,\nclosed,,2023-02-08,,\"['Noise2Music audio dataset', 'Noise2Music pseudolabeler']\",,,,,,unknown,\"audio, text\",,Noise2Music pseudolabel dataset,USA,Google,,,[],340k hours audio with pseudolabels,dataset,https://google-research.github.io/noise2music/noise2music.pdf,,,,,,,,,,,\nlimited,,2022-08-25,,['LaMDA'],\"AI Test Kitchen provides a new way for people to learn about, experience, and give feedback on emerging AI technology, like LaMDA.\",,,,,unknown,,,AI Test 
Kitchen,USA,Google,,,,,application,https://blog.google/technology/ai/join-us-in-the-ai-test-kitchen/,,,,,,,,,,,\nclosed,,2023-02-06,,['LaMDA'],\"Conversational AI service, powered by LaMDA\",,,,,unknown,,,Bard,USA,Google,,,,,application,https://blog.google/technology/ai/bard-google-ai-search-updates/,,,,,,,,,,,\nclosed,,2022-06-29,,\"['PaLM', 'arXiv', 'PaLM dataset', 'Minerva Math Web Pages dataset']\",,,,,,unknown,text; text,,Minerva,USA,Google,,,,540B parameters (dense),model,https://arxiv.org/abs/2206.14858,,,,,,,,,,,\nclosed,,2022-06-29,,[],,,,,,unknown,text,,Minerva Math Web Pages dataset,USA,Google,,,[],17.5B tokens,dataset,https://arxiv.org/abs/2206.14858,,,,,,,,,,,\nlimited,,2023-03-06,,\"['YT-NLU-U', 'Pub-U', 'Web-NTL', 'YT-SUP+', 'Pub-S']\",\"Universal Speech Model (USM) is a family of state-of-the-art speech models with 2B parameters trained on 12 million hours of speech and 28 billion sentences of text, spanning 300+ languages. USM, which is for use in YouTube (e.g., for closed captions), can perform automatic speech recognition (ASR) on widely-spoken languages like English and Mandarin, but also languages like Punjabi, Assamese, Santhali, Balinese, Shona, Malagasy, Luganda, Luo, Bambara, Soga, Maninka, Xhosa, Akan, Lingala, Chichewa, Nkore, Nzema to name a few. 
Some of these languages are spoken by fewer than twenty million people, making it very hard to find the necessary training data.\",,,,,unknown,\"audio, text; text\",,USM,USA,Google,,,,2B parameters (dense),model,https://arxiv.org/abs/2303.01037,,,,,,,,,,,\nopen,,2005-02-14,,['USM'],YouTube is a global online video sharing and social media platform,,,,,,,,YouTube,USA,Google,,,,,application,https://www.youtube.com/,,,,,,,,,,,\nclosed,,2023-03-06,,\"['PaLM', 'ViT-22B']\",,,,,,unknown,\"image, text; text\",,PaLM-E,USA,Google,,,,562B parameters (dense),model,https://arxiv.org/abs/2303.03378,,,,,,,,,,,\nclosed,,2023-02-10,,['JFT'],,,,,,unknown,\"image, image\",,ViT-22B,USA,Google,,,,22B parameters (dense),model,https://arxiv.org/abs/2302.05442,,,,,,,,,,,\nclosed,,2022-09-07,,\"['w2v-BERT', 'SoundStream']\",,,,,,unknown,\"audio, text; audio\",,AudioLM,USA,Google,,,,1B parameters (dense),model,https://arxiv.org/abs/2209.03143,,,,,,,,,,,\nclosed,,2022-09-14,,\"['mT5', 'ViT-e', 'WebLI']\",,,,,,unknown,text; image,,PaLI,USA,Google,,,,17B parameters (dense),model,https://arxiv.org/abs/2209.06794,,,,,,,,,,,\nclosed,,2022-09-14,,['JFT'],,,,,,unknown,image; image,,ViT-e,USA,Google,,,,3.9B parameters (dense),model,https://arxiv.org/abs/2209.06794,,,,,,,,,,,\nclosed,,2022-09-14,,[],,,,,,unknown,\"image, text\",,WebLI,USA,Google,,,[],\"10B images, 12B alt-text\",dataset,https://arxiv.org/abs/2209.06794,,,,,,,,,,,\nopen,,2023-02-27,,\"['T5', 'CLIP', 'YT-Temporal-1B']\",,,,,,Apache 2.0,video; text,,Vid2Seq,USA,Google,,,,500M parameters (dense),model,https://arxiv.org/abs/2302.14115,,,,,,,,,,,\nclosed,\"evaluated on DSTC11 Challenge Task, based on MultiWoz 2.1, with a focus on dialog state tracking.\",2023-06-08,,\"['CTC blank-filtering', 'Speech2Text adapter']\",Joint speech and language model using a Speech2Text adapter and using a CTC-based blank-filtering.,,,,,unknown,audio; text,,Google Joint 
SLM,USA,Google,,,,unknown,model,https://arxiv.org/pdf/2306.07944.pdf,,,,,,,,,,,\nclosed,Reports results on standard code benchmarks across a variety of programming languages.,2023-05-10,,['PaLM 2 dataset'],PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives similar to UL2.,,Specific queries provided by annotators,,\"general use large language model that can be used for language, reasoning, and code tasks.\",unknown,text; text,Google internal monitoring,PaLM 2,USA,Google,becoming part of a general-purpose service or product or use within specific downstream applications without prior assessment,\"Employed de-duplication, removal of sensitive-PII and filtering. Added control tokens marking toxicity of text.\",,unknown,model,https://blog.google/technology/ai/google-palm-2-ai-large-language-model/,https://ai.google/static/documents/palm2techreport.pdf,,TPU v4 (number unspecified),,,,,,,,\nlimited,\"Assessed on medical benchmarks of professional medical exams, medical research, and consumer queries.\",2023-12-13,,[],\"MedLM is a collection of foundation models tuned to follow natural language instructions for tasks in medicine, such as question answering and creating draft summaries.\",,,,\"to be used for question answering and creating draft summaries from existing documentation, to be reviewed, edited, and approved by the user before use.\",unknown,text; text,Google internal monitoring,MedLM,USA,Google,,,,unknown,model,https://cloud.google.com/vertex-ai/docs/generative-ai/medlm/overview,https://cloud.google.com/static/vertex-ai/docs/generative-ai/medlm/MedLM-model-card.pdf,unknown,unknown,unknown,,,,,,,\nclosed,\"Evaluated on standard general, reasoning, math, coding, and multimodal benchmarks with results that surpass GPT-4 on almost all.\",2023-12-06,,[],\"As of release, Gemini is 
Google's most capable and flexible AI model, proficient in multimodal domains.\",,,,\"general use large language model that can be used for language, reasoning, and code tasks.\",unknown,\"text; image, text, video\",Google internal monitoring,Gemini,USA,Google,becoming part of a general-purpose service or product or use within specific downstream applications without prior assessment,,,unknown,model,https://deepmind.google/technologies/gemini/#introduction,,unknown,unknown,unknown,,,,,,,\nclosed,Evaluated on popular time-series benchmarks.,2024-02-02,,[],TimesFM is a single forecasting model pre-trained on a large time-series corpus of 100 billion real world time-points.,,,,,unknown,,unknown,TimesFM,USA,Google,,,,200M parameters (dense),model,https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html,,unknown,unknown,unknown,,,,,,,\nopen,Evaluation was conducted on standard LLM benchmarks and includes internal red-teaming testing of relevant content policies.,2024-02-21,,[],\"Gemma is a family of lightweight, state-of-the-art open models from Google, based on the Gemini models. 
They are text-to-text, decoder-only large language models, available in English.\",,https://huggingface.co/google/gemma-7b/discussions,,\"Text generation tasks including question answering, summarization, and reasoning; content creation, communication, research, and education.\",custom,text; text,,Gemma,USA,Google,Prohibited uses are specified in the Gemma Prohibited Use Policy here https://ai.google.dev/gemma/prohibited_use_policy,\"Multiple evaluations and red-teaming conducted, with particular focus on ethics, bias, fair use cases, and safety.\",,7B parameters (dense),model,https://blog.google/technology/developers/gemma-open-models/,https://huggingface.co/google/gemma-7b,unknown,TPUv5e,unknown,,,,,,,\nclosed,\"Evaluated Med-Gemini on 14 medical benchmarks spanning text, multimodal and long-context applications, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpassing the GPT-4 model family on every benchmark where a direct comparison is viable.\",2024-04-29,,\"['Gemini', 'MultiMedBench']\",\"Med-Gemini is a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly integrate the use of web search, and that can be efficiently tailored to novel modalities using custom encoders.\",,,,\"To be used in areas of medical research including medical summarization, referral letter generation, and medical simplification tasks.\",unknown,\"image, text; text\",,Med-Gemini,USA,Google,Unfit for real-world deployment in the safety-critical medical domain.,,,unknown,model,https://arxiv.org/pdf/2404.18416,,unknown,unknown,unknown,,,,,,,\nopen,Unknown,2024-09-05,,[],\"Imagen 3 is a high-quality text-to-image model capable of generating images with improved detail, richer lighting, and fewer distracting artifacts. It features improved prompt understanding and can be used to generate a wide array of visual styles from quick sketches to high-resolution images. 
The model is available in multiple versions, each optimized for particular types of tasks. Imagen 3 has been trained to capture nuances like specific camera angles or compositions in long, complex prompts, making it a versatile tool for image generation from textual inputs.\",,Unknown,,\"Imagen 3 is intended to be used for generation of high-resolution images from textual prompts, from photorealistic landscapes to richly textured oil paintings or whimsical claymation scenes. It can also be used for stylized birthday cards, presentations, and more, due to its improved text rendering capabilities.\",Unknown,text; image,Unknown,Imagen 3,unknown,Google DeepMind,Unknown,Unknown,,Unknown,model,https://deepmind.google/technologies/imagen-3/,unknown,Unknown,Unknown,Unknown,,,,,,,\nopen,The 27B Gemma 2 model outperforms other open models in its size category offering cutting-edge performance. Specific details can be found in the provided technical report.,2024-06-27,,\"['Gemma', 'CodeGemma', 'RecurrentGemma', 'PaliGemma']\",\"Gemma 2 is an open model that offers best-in-class performance and runs at incredible speed across different hardware. It easily integrates with other AI tools. This model is built on a redesigned architecture engineered for exceptional performance and inference efficiency. It is available in both 9 billion (9B) and 27 billion (27B) parameter sizes. Gemma 2 is optimized to run at incredible speed across a range of hardware, from powerful gaming laptops and high-end desktops, to cloud-based setups.\",,Unknown,,Gemma 2 is designed for developers and researchers for various AI tasks. It can be used via the integrations it offers with other AI tools/platforms and can additionally be deployed for more accessible and budget-friendly AI deployments.,Gemma (commercially-friendly license given by Google DeepMind),text; text,Unknown,Gemma 2,unknown,Google DeepMind,Not specified,\"Google DeepMind implemented a refined architecture for Gemma 2. 
The model has improvements in safety and efficiency over the first generation. The deployment of Gemma 2 on Vertex AI, scheduled for the next month, will offer effortless management of the model.\",,27B parameters (dense),model,https://blog.google/technology/developers/google-gemma-2/,unknown,Unknown,\"Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, NVIDIA H100 Tensor Core GPU\",Unknown,,,,,,,\nopen,The model nearly matches the performance of the Gemini 1.5 Flash model across many benchmarks through developer feedback and testing.,2024-10-03,,[],\"Gemini 1.5 Flash-8B is a lightweight, speed-optimized variant of the Gemini 1.5 Flash model, designed for efficiency and lower latency especially for tasks such as chat, transcription, and long context language translation. The model is production-ready and offers the lowest cost per intelligence of any Gemini model with improved rate limits.\",,Encourages developer feedback to inform model improvements and future updates.,,\"Designed for simple, high-volume tasks including multimodal use cases and long-context summarization.\",unknown,\"audio, image, text, video; text\",unknown,Gemini 1.5 Flash-8B,unknown,Google DeepMind,unknown,Model improvement informed by developer feedback and extensive testing.,,8B parameters,model,https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/,unknown,unknown,unknown,unknown,,,,,,,\nopen,\"After giving the community time to use the models and explore different applications, we collected feedback.\",2024-11-25,,\"['EfficientNet-L2', 'BERT', 'CLIP', 'BLIP-2', 'BiT ResNet-101x3', 'ViT-S']\",Health AI Developer Foundations (HAI-DEF) is a new suite of open weight models to help developers more easily build AI models for healthcare applications.,,unknown,,\"Building AI models for healthcare applications including imaging applications in radiology, dermatology, and pathology.\",unknown,\"text, image; vector\",\"After giving the community time to use 
the models and explore different applications, we collected feedback.\",Health AI Developer Foundations (HAI-DEF),USA,Google Research,unknown,Learning from community experience with previous research endpoints...we collected feedback.,,unknown,model,http://goo.gle/3AWVX33,unknown,unknown,unknown,unknown,,,,,,,\nclosed,Unknown,2024-12-04,,['Imagen 3'],\"Genie 2 is a foundation world model capable of generating an endless variety of action-controllable, playable 3D environments for training and evaluating embodied agents based on a single prompt image.\",,Unknown,,\"Genie 2 can be used for generating diverse environments for training and evaluating AI agents, rapid prototyping interactive experiences, and experimenting with novel environments.\",Unknown,image; video,Unknown,Genie 2,unknown,Google DeepMind,Unknown,\"Responsible development is emphasized, developing our technologies responsibly and building towards more general AI systems that can safely carry out tasks.\",,Unknown,model,https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/,unknown,Unknown,Unknown,Unknown,,,,,,,\nlimited,\"Veo 2 outperforms other leading video generation models, based on human evaluations of its performance.\",2024-12-16,,[],\"Veo 2 is a state-of-the-art video generation model that creates videos with realistic motion and high-quality output, up to 4K, with extensive camera controls. 
It simulates real-world physics and offers advanced motion capabilities with enhanced realism and fidelity.\",,unknown,,\"Creating high-quality videos with realistic motion, different styles, camera controls, shot styles, angles, and movements.\",unknown,text; video,unknown,Veo 2,unknown,Google DeepMind,unknown,\"Veo 2 includes features that enhance realism, fidelity, detail, and artifact reduction to ensure high-quality output.\",,unknown,model,https://deepmind.google/technologies/veo/veo-2/,unknown,unknown,unknown,unknown,,,,,,,\nlimited,unknown,2024-12-11,,[],\"Google DeepMind introduces Gemini 2.0, a new AI model designed for the 'agentic era.'\",,unknown,,\"Develop more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision.\",unknown,\"text, video, image, audio; image, text\",unknown,Gemini 2.0,unknown,Google DeepMind,unknown,\"Google is committed to building AI responsibly, with safety and security as key priorities.\",,unknown,model,https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/#ceo-message,unknown,unknown,\"custom hardware like Trillium, our sixth-generation TPUs\",unknown,,,,,,,\nopen,\"Our research demonstrates leading performance on chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation, as detailed in the technical report.\",2024-12-05,,\"['PaliGemma', 'Hugging Face', 'Kaggle', 'Hugging Face Transformers', 'Keras', 'PyTorch', 'JAX']\",\"PaliGemma 2 builds upon the performant Gemma 2 models, adding the power of vision and making it easier than ever to fine-tune for exceptional performance. 
With PaliGemma 2, these models can see, understand, and interact with visual input, opening up a world of new possibilities.\",,Your feedback and contributions are invaluable in shaping the future of these models and driving innovation in the field.,,\"fine-tuning for specific tasks and datasets straightforward, empowering you to tailor its capabilities to your precise needs.\",unknown,image; text,unknown,PaliGemma 2,USA,Google,unknown,unknown,,28B parameters,model,https://developers.googleblog.com/en/introducing-paligemma-2-powerful-vision-language-models-simple-fine-tuning/,unknown,unknown,unknown,unknown,,,,,,,\nopen,,2023-07-26,,\"['GPT-4 API', 'Code Llama', 'Claude API', 'WizardCoder', 'PaLM API']\",Continue is the open-source autopilot for software development. It is an IDE extension that brings the power of ChatGPT to VS Code and JetBrains. It’s built to be deeply customizable and continuously learn from development data.,,,,,Apache 2.0,,,Continue,USA,\"Continue Dev, Inc.\",,,,,application,https://continue.dev,,,,,,,,,https://github.com/continuedev/continue/blob/main/LICENSE,,\nopen,,2022-01-07,,['YouTube'],,,,,,MIT,video,,YT-Temporal-1B,USA,University of Washington,,,[],20M videos,dataset,https://arxiv.org/abs/2201.02639,,,,,,,,,,,\nopen,Evaluated on standard VLM benchmarks and outperforms SotA open-source VLMs as of release.,2024-02-09,,[],Prism is a family of VLMs trained using new analyses about key vision design axes.,,,,,LLaMA 2,\"image, text; text\",unknown,Prism,Japan,Toyota Research Institute,,,,7B parameters (dense),model,https://arxiv.org/pdf/2402.07865.pdf,,unknown,8 A100 GPUs,less than 9 hours,,,,,,,\nopen,Evaluated in comparison to SOTA video-to-language models.,2023-10-23,,\"['MSR-VTT', 'Video-ChatGPT Video Descriptions Dataset']\",Pegasus-1 is a video-language foundation model.,,,,,unknown,video; text,,Pegasus-1,USA,Twelve Labs,,Data selected and cleaned to eliminate toxic and biased content.,,80B parameters 
(dense),model,https://app.twelvelabs.io/blog/introducing-pegasus-1,,unknown,unknown,unknown,,,,,,,\nopen,\"Marengo-2.6 sets new benchmarks in zero-shot text-to-video, text-to-image, and text-to-audio retrieval tasks with a single embedding model.\",2024-03-01,,[],\"Marengo 2.6 is a new state-of-the-art (SOTA) multimodal foundation model capable of performing any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, Image-To-Video, and more. \",,,,,unknown,\"audio, image, text; audio, image, video\",,Marengo 2.6,USA,Twelve Labs,,,,unknown,model,https://www.twelvelabs.io/blog/introducing-marengo-2-6,,unknown,unknown,unknown,,,,,,,\nclosed,Compared to self before being scaled on quality of video generation.,2023-09-29,,[],\"GAIA-1 (‘Generative AI for Autonomy’) is a generative world model that leverages video, text, and action inputs to generate realistic driving scenarios while offering fine-grained control over ego-vehicle behavior and scene features.\",,,,\"The main use cases are pure representation learning, planning (look-ahead search), or learning a policy in the world model (neural simulator)\",unknown,\"text, video; video\",,GAIA-1,UK,Wayve,,,,9B parameters (dense),model,https://arxiv.org/pdf/2309.17080.pdf,,unknown,32 A100 80GB GPUs,4 days,,,,,,,\nopen,\"Evaluated on all language datasets compared to similarly sized SOTA models, with Aurora-M achieving strong performance in most.\",2024-04-23,,['StarCoderPlus'],\"Aurora-M is a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code.\",,,,,unknown,text; text,unknown,Aurora-M,unknown,\"Tokyo Institute of Technology, MIT-IBM Watson Lab, Sapienza University of Rome\",,,,15B parameters,model,https://arxiv.org/pdf/2404.00399,,unknown,\"LUMI supercomputer, using 128 AMD MI250X GPUs\",48 days,,,,,,,\nopen,Unknown,2024-09-28,,\"['Glaive', 'Llama 3.1']\",\"Reflection Llama-3.1 70B is an open-source LLM, trained with a new 
technique called Reflection-Tuning that teaches a LLM to detect mistakes in its reasoning and correct course. The model was trained on synthetic data generated by Glaive.\",,Unknown,,The model is intended for complex reasoning and reflection tasks. It is designed to separate its internal thoughts and reasoning from its final answer.,Llama 3.1 Community License Agreement,text; text,Unknown,Reflection Llama-3.1 70B,Unknown,Unknown,Unknown,The model uses a Reflection-Tuning technique which allows it to notice mistakes in its reasoning and correct them.,,70B parameters,model,https://huggingface.co/mattshumer/Reflection-70B,https://huggingface.co/mattshumer/Reflection-70B,Unknown,Unknown,Unknown,,,,,,,\nopen,\"The model outperforms other models of similar and larger sizes on multimodal benchmarks. An open-source benchmark, MM-MT-Bench, is contributed for evaluating vision-language models.\",2024-10-10,,['Mistral Nemo 12B'],\"Pixtral 12B is a 12-billion-parameter multimodal language model trained to understand both natural images and documents, achieving leading performance on various multimodal benchmarks without compromising on natural language performance.\",,Unknown,,\"Multimodal instruction following tasks, capable of multi-turn, multi-image conversations.\",Apache 2.0,text; image,Evaluation protocols and benchmarks are open-sourced to establish fair and standardized testing.,Pixtral 12B,Unknown,Unknown,Unknown,Evaluation protocols for multimodal language models were standardized and analysis was conducted to improve the reliability of model evaluations.,,12B parameters,model,https://arxiv.org/pdf/2410.07073,unknown,Unknown,Unknown,Unknown,,,,,,,\nopen,,2023-04-20,,['AudioLM'],Bark is a text-to-audio model that can generate multilingual speech as well as other noises.,,https://huggingface.co/spaces/suno/bark/discussions,,,MIT,text; 
audio,,Bark,USA,Suno,,,,,model,https://github.com/suno-ai/bark,https://github.com/suno-ai/bark/blob/main/model-card.md,unknown,,unknown,,,,,,,\nopen,,2023-08-03,,['NASA HLS data'],\"Prithvi is a first-of-its-kind temporal Vision transformer pre-trained by the IBM and NASA team on contiguous US Harmonised Landsat Sentinel 2 (HLS) data. The model adopts a self-supervised encoder developed with a ViT architecture and Masked AutoEncoder (MAE) learning strategy, with an MSE loss function.\",,https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M/discussions,,,Apache 2.0,\"text, video; text, video\",,Prithvi,USA,IBM,,,,100M parameters (dense),model,https://github.com/NASA-IMPACT/hls-foundation-os,https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M,,,,,,,,,,\nlimited,,2023-09-07,,['Granite'],\"Watsonx.ai is part of the IBM watsonx platform that brings together new generative AI capabilities, powered by foundation models and traditional machine learning into a powerful studio spanning the AI lifecycle.\",,,,,custom,,,Watsonx.ai,USA,IBM,,,,,application,https://www.ibm.com/products/watsonx-ai,,,,,,,,deployed AI models,https://www.ibm.com/docs/en/watsonx-as-a-service?topic=models-terms-use,,\nlimited,unknown,2023-09-28,,[],Granite is a set of multi-size foundation models that apply generative AI to both language and code.,,,,,,\"text; code, text\",,Granite,USA,IBM,,\"Training data passed through IBM HAP detector, language model designed to remove harmful content. 
Data also deduplicated and filtered for document quality.\",,13B parameters (dense),model,https://www.ibm.com/blog/building-ai-for-business-ibms-granite-foundation-models/,,unknown,unknown,unknown,,,,,,,\nopen,Granite 3.0 8B Instruct matches leading similarly-sized open models on academic benchmarks while outperforming those peers on benchmarks for enterprise tasks and safety.,2024-10-21,,['Hugging Face’s OpenLLM Leaderboard v2'],\"IBM Granite 3.0 models deliver state-of-the-art performance relative to model size while maximizing safety, speed and cost-efficiency for enterprise use cases.\",,unknown,,\"classic natural language use cases including text generation, classification, summarization, entity extraction and customer service chatbots, programming language use cases such as code generation, code explanation and code editing, and for agentic use cases requiring tool calling\",Apache 2.0,text; text,,IBM Granite 3.0,USA,IBM,unknown,\"The entire Granite family of models are trained on carefully curated enterprise datasets, filtered for objectionable content with critical concerns like governance, risk, privacy and bias mitigation in mind\",,8B parameters,model,https://www.ibm.com/new/ibm-granite-3-0-open-state-of-the-art-enterprise-models,unknown,unknown,unknown,unknown,,,,,,,\nclosed,,2023-10-10,,[],\"Firefly Image 2 is the next generation of generative AI for imaging, bringing significant advancements to creative control and quality, including new Text to Image capabilities now available in the popular Firefly web app where 90% of users are new to Adobe products.\",,,,creative generation of digital art and images,unknown,text; image,,Firefly Image 2,USA,Adobe,\"AI/ML training, attempting to create abusive, illegal, or confidential content.\",,,unknown,model,https://firefly.adobe.com/,,unknown,unknown,unknown,,,,,,,\nclosed,,2023-10-10,,[],\"Firefly Vector is the world’s first generative AI focused on producing vector graphics, bringing Adobe's vector graphic 
and generative AI expertise directly into Adobe Illustrator workflows with Text to Vector Graphic.\",,,,creative generation of digital art and images,unknown,text; vector graphic,,Firefly Vector,USA,Adobe,\"AI/ML training, attempting to create abusive, illegal, or confidential content.\",,,unknown,model,https://firefly.adobe.com/,,unknown,unknown,unknown,,,,,,,\nclosed,,2023-10-10,,[],Firefly Design powers instant generation of amazing quality template designs in Adobe Express with the new Text to Template capability.,,,,creative generation of digital art and images,unknown,text; template design,,Firefly Design,USA,Adobe,\"AI/ML training, attempting to create abusive, illegal, or confidential content.\",,,unknown,model,https://firefly.adobe.com/,,unknown,unknown,unknown,,,,,,,\nlimited,,2023-03-21,,\"['Firefly Image 2', 'Firefly Vector', 'Firefly Design']\",\"Adobe Firefly is a standalone web application. It offers new ways to ideate, create, and communicate while significantly improving creative workflows using generative AI.\",,,,creative generation of digital art and images,unknown,,,Firefly,USA,Adobe,\"AI/ML training, attempting to create abusive, illegal, or confidential content.\",,,,application,https://firefly.adobe.com/,,,,,,unknown,unknown,AI-generated creations,https://www.adobe.com/legal/licenses-terms/adobe-gen-ai-user-guidelines.html,unknown,\nopen,,2023-09-17,https://huggingface.co/datasets/uonlp/CulturaX,\"['mC4', 'OSCAR']\",\"CulturaX is a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development.\",unknown,https://huggingface.co/datasets/uonlp/CulturaX/discussions,unknown,,\"mC4, OSCAR\",text,unknown,CulturaX,unknown,\"University of Oregon, Adobe\",The data must not be utilized for malicious or harmful purposes towards humanity.,unknown,[],6.3 trillion tokens,dataset,https://arxiv.org/pdf/2309.09400,,,,,,,,,,,\nclosed,,2021-04-09,,[],Megatron-LM is an autoregressive language model,,,,,unknown,text; 
text,,Megatron-LM,USA,NVIDIA,,unknown,,1T parameters (dense),model,https://arxiv.org/abs/2104.04473,,unknown,3072 A100 GPUs,84 days,,,,,,,\nopen,,2022-06-17,,\"['YouTube', 'Wikipedia', 'Reddit']\",,,,,,MIT,\"text, video\",,MineDojo,USA,NVIDIA,,,[],\"730k videos, 6k Wikipedia pages, 340k reddit posts\",dataset,https://arxiv.org/abs/2206.08853,,,,,,,,,,,\nopen,,2022-10-06,,\"['T5', 'Mask R-CNN', 'VIMA dataset']\",,,,,,MIT,\"image, text\",,VIMA dataset,unknown,\"NVIDIA, Stanford\",,,[],200M parameters (dense model),dataset,https://vimalabs.github.io/,,,,,,,,,,,\nopen,,2022-10-06,,[],,,,,,MIT,\"image, text; robotics trajectories\",,VIMA,unknown,\"NVIDIA, Stanford\",,,,200M parameters (dense),model,https://vimalabs.github.io/,,,,,,,,,,,\nopen,\"Evaluated on standard LLM benchmarks across a range of fields like reasoning, code generation, and mathematical skills.\",2024-02-27,,[],Nemotron 4 is a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens.,,,,,unknown,\"text; code, text\",unknown,Nemotron 4,USA,Nvidia,,Deduplication and quality filtering techniques are applied to the training dataset.,,15B parameters (dense),model,https://arxiv.org/pdf/2402.16819.pdf,,unknown,3072 H100 80GB SXM5 GPUs across 384 DGX H100 nodes,13 days,,,,,,,\nopen,BioMistral was evaluated on a benchmark comprising 10 established medical question-answering (QA) tasks in English and seven other languages.,2024-02-15,,\"['Mistral', 'PubMed Central']\",\"BioMistral is an open-source Large Language Model tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central.\",,https://huggingface.co/BioMistral/BioMistral-7B/discussions,,\"Research in the biomedical domain, especially for medical question-answering tasks.\",Apache 2.0,text; text,,BioMistral,unknown,\"Avignon University, Nantes University\",Prohibited from deploying in production environments for natural language generation or any professional health 
and medical purposes.,,,7B parameters (dense),model,https://arxiv.org/pdf/2402.10373.pdf,https://huggingface.co/BioMistral/BioMistral-7B,unknown,32 NVIDIA A100 80GB GPUs,unknown,,,,,,,\nopen,,2023-05-30,,['GPT-4'],Transformify Automate is a platform for automated task integration using natural language prompts.,,,,,,,,Transformify Automate,USA,Transformify,,,,,application,https://www.transformify.ai/automate,,,,,,,,text and code,https://www.transformify.ai/legal-stuff,,\nopen,Evaluated on several popular benchmarks and performance in different fields.,2023-10-30,,['SkyPile'],The Skywork series is a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts.,,https://huggingface.co/Skywork/Skywork-13B-base/discussions,,,custom,text; text,,Skywork,China,Kunlun Inc.,,,,13B parameters (dense),model,https://arxiv.org/pdf/2310.19341.pdf,https://huggingface.co/Skywork/Skywork-13B-base,unknown,512 A800-80GB GPUs,39 days,,,,,,,\nopen,,2023-03-01,,['ChatGPT API'],\"When shoppers search for products, the shopping assistant makes personalized recommendations based on their requests. 
Shop’s new AI-powered shopping assistant will streamline in-app shopping by scanning millions of products to quickly find what buyers are looking for—or help them discover something new.\",,,,,,,,Shop Assistant,USA,Shop,,,,,application,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,,,,,,\nclosed,unknown,2022-04-14,,[],The dataset used to train the Luminous models.,unknown,,unknown,unknown,unknown,text,,Luminous dataset,Germany,Aleph Alpha,unknown,unknown,[],unknown,dataset,,,,,,,,,,,,\nlimited,,2022-04-14,,['Luminous dataset'],Luminous is a family of multilingual language models,,,,,,text; text,,Luminous,Germany,Aleph Alpha,,,,200B parameters (dense),model,https://twitter.com/Aleph__Alpha/status/1514576711492542477,,unknown,unknown,unknown,,,,,,,\nlimited,,2021-09-30,,['Luminous'],The Aleph Alpha API serves a family of text-only language models (Luminous) and multimodal text-and-image models (Magma).,,unknown,,unknown,,,unknown,Aleph Alpha API,Germany,Aleph Alpha,unknown,,,,application,https://www.aleph-alpha.com/,,,,,,unknown,unknown,The text models provide text outputs given text inputs. The multimodal models provide text completions given text and image inputs.,https://www.aleph-alpha.com/terms-conditions,unknown,\nopen,Evaluated on the OKVQA benchmark as a fully open-ended generative task.,2022-10-24,,\"['GPT-J', 'CLIP']\",An autoregressive VL model that is able to generate text from an arbitrary combination of visual and textual input,,,,,MIT,\"image, text; text\",,MAGMA,Germany,Aleph Alpha,,,,6B parameters (dense),model,https://arxiv.org/pdf/2112.05253.pdf,,,32 A100 GPUs,,,,,,,,\nopen,\"Extensive evaluations were done with ablation experiments performed on pre-training benchmarks such as lambada, triviaqa, hellaswag, winogrande, webqs, arc, and boolq. Direct comparisons were also performed with applications like GPT and Llama 2.\",2024-09-08,,[],\"Pharia-1-LLM-7B is a model that falls within the Pharia-1-LLM model family. 
It is designed to deliver short, controlled responses that match the performance of leading open-source models around 7-8 billion parameters. The model is culturally and linguistically tuned for German, French, and Spanish languages. It is trained on carefully curated data in line with relevant EU and national regulations. The model shows improved token efficiency and is particularly effective in domain-specific applications, especially in the automotive and engineering industries. It can also be aligned to user preferences, making it appropriate for critical applications without the risk of shut-down behaviour.\",,Feedback can be sent to support@aleph-alpha.com.,,\"The model is intended for use in domain-specific applications, particularly in the automotive and engineering industries. It can also be tailored to user preferences.\",Aleph Open,text; text,Unknown,Pharia-1-LLM-7B,Germany,Aleph Alpha,Unknown,The model comes with additional safety guardrails via alignment methods to ensure safe usage. 
Training data is carefully curated to ensure compliance with EU and national regulations.,,7B parameters,model,https://aleph-alpha.com/introducing-pharia-1-llm-transparent-and-compliant/#:~:text=Pharia%2D1%2DLLM%2D7B,unknown,Unknown,Unknown,Unknown,,,,,,,\nclosed,,2021-01-12,,[],,,,,,unknown,\"image, text\",,Wu Dao dataset,China,Beijing Academy of Artificial Intelligence,,,[],unknown,dataset,https://www.tsinghua.edu.cn/en/info/1420/10473.htm,,,,,,,,,,,\nclosed,,2021-01-12,,['Wu Dao dataset'],,,,,,unknown,\"image, text\",,Wu Dao 2.0,China,Beijing Academy of Artificial Intelligence,,,,1.75T parameters (dense),model,https://www.tsinghua.edu.cn/en/info/1420/10473.htm,,,,,,,,,,,\nopen,Evaluated on objective and reliability metrics.,2023-10-26,,\"['Vicuna', 'JudgeLM Dataset']\",JudgeLM is a fine-tuned to be a scalable judge to evaluate LLMs efficiently and effectively in open-ended benchmarks.,,https://huggingface.co/BAAI/JudgeLM-13B-v1.0/discussions,,Research on evaluating the performance of large language models and chatbots.,Apache 2.0,text; text,,JudgeLM,China,Beijing Academy of Artificial Intelligence,,,,13B parameters (dense),model,https://arxiv.org/pdf/2310.17631.pdf,https://huggingface.co/BAAI/JudgeLM-13B-v1.0,unknown,8 A100 40GB NVIDIA GPUs,unknown,,,,,,,\nopen,,2023-10-26,,\"['Alpaca', 'GPT-4', 'Dolly', 'ShareGPT', 'LLaMA', 'Vicuna']\",\"JudgeLM Dataset is a novel dataset replete with a rich variety of seed tasks, comprehensive answers from modern LLMs, answers’ grades from the teacher judge, and detailed reasons for judgments.\",,https://huggingface.co/datasets/BAAI/JudgeLM-100K/discussions,,To be used to conduct instruction-tuning for language models and make the language model able to judge open-ended answer pairs.,CC BY NC 4.0,\"text, text\",,JudgeLM Dataset,China,Beijing Academy of Artificial Intelligence,,,[],105k judge samples,dataset,https://huggingface.co/datasets/BAAI/JudgeLM-100K,,,,,,,,,,,\nopen,Compared to other segmentation models across different 
modalities on BraTS2023 dataset.,2024-01-25,,[],\"SegMamba is a novel 3D medical image Segmentation Mamba model, designed to effectively capture long-range dependencies within whole volume features at every scale.\",,,,,Apache 2.0,image; text,unknown,SegMamba,unknown,\"Hong Kong University of Science and Technology (Guangzhou + original), Beijing Academy of Artificial Intelligence\",,unknown,,unknown,model,https://arxiv.org/pdf/2401.13560v2.pdf,,unknown,4 NVIDIA A100 GPUs,1000 epochs,,,,,,,\nopen,\"Evaluated on standard datasets in multilingual, cross-lingual, long document retrieval, and Q&A domains.\",2024-02-05,,[],\"BGE M3 Embedding is a new embedding model that can support more than 100 working languages, leading to new state-of-the-art performances on multi-lingual and cross-lingual retrieval tasks.\",,https://huggingface.co/BAAI/bge-m3/discussions,,,MIT,text; text,unknown,BGE M3 Embedding,unknown,\"Beijing Academy of Artificial Intelligence, University of Science and Technology of China\",,unknown,,unknown,model,https://arxiv.org/pdf/2402.03216.pdf,https://huggingface.co/BAAI/bge-m3,unknown,32 A100 40GB GPUs,\"20,000 steps\",,,,,,,\nopen,Evaluated on zero-shot classification performance across multiple image classification benchmarks.,2024-02-06,,['CLIP'],\"As of release, EVA-CLIP is the largest and most powerful open-source CLIP model to date, with 18 billion parameters.\",,https://huggingface.co/BAAI/EVA-CLIP-8B-448/discussions,,,MIT,\"image, text; text\",unknown,EVA-CLIP,unknown,\"Beijing Academy of Artificial Intelligence, Tsinghua University\",,,,18B parameters (dense),model,https://arxiv.org/pdf/2402.04252.pdf,https://huggingface.co/BAAI/EVA-CLIP-8B-448,unknown,384 A100 40GB GPUs,unknown,,,,,,,\nopen,,2023-10-02,,['Falcon-180B'],Vulture is a further fine-tuned causal Decoder-only LLM built by Virtual Interactive (VILM) on top of Falcon.,,https://huggingface.co/vilm/vulture-180b/discussions,,,Apache 2.0,text; text,unknown,Vulture,USA,Virtual 
Interactive,Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.,,,180B parameters (dense),model,https://huggingface.co/vilm/vulture-180b,https://huggingface.co/vilm/vulture-180b,unknown,unknown,3000 A100 hours,,,,,,,\nopen,Evaluated on wide range of tasks using own evaluation benchmarks.,2022-11-20,,\"['AlphaFold2', 'OpenProteinSet']\",OpenFold is an open source recreation of AlphaFold2.,,,,,CC BY 4.0,amino acid sequence; protein structure,,OpenFold,USA,Columbia,,,,,model,https://www.biorxiv.org/content/10.1101/2022.11.20.517210v2,,unknown,Single A100 NVIDIA GPU,\"50,000 GPU hours\",,,,,,,\nopen,Evaluated on the object hallucination benchmark and compared to GPT-4V.,2023-10-11,,\"['CLIP', 'Vicuna']\",Ferret is a Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.,,,,,Apple,\"image, text; image, text\",,Ferret,unknown,\"Columbia, Apple AI\",,,,13B parameters,model,https://arxiv.org/pdf/2310.07704.pdf,,unknown,8 A100 GPUs,2.5 to 5 days,,,,,,,\nclosed,,,,[],,,,,,unknown,text,,Neeva dataset,USA,Neeva,,,[],unknown,dataset,https://neeva.com/index,,,,,,,,,,,\nclosed,,,,['Neeva dataset'],,,,,,unknown,text; text,,Neeva model,USA,Neeva,,,,unknown,model,https://neeva.com/index,,,,,,,,,,,\nopen,,2023-01-06,,['Neeva model'],NeevaAI is an AI-powered search tool that combines the capabilities of LLMs with Neeva's independent in-house search stack to create a unique and transformative search experience.,,,,,Custom,,,NeevaAI,USA,Neeva,,,,,application,https://neeva.com/blog/introducing-neevaai,,,,,,,,,https://neeva.com/terms,,\nclosed,,2023-07-19,,[],EXAONE 2.0 is a multimodal artificial intelligence that can be used to help develop new materials and medicines.,,,,,unknown,\"image, text; image, text\",,EXAONE 2.0,South Korea,LG AI 
Research,,,,unknown,model,https://www.lgresearch.ai/exaone,,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on the COCO dataset.,2023-04-28,,['LAION-5B'],A text-to-image cascaded pixel diffusion model released in conjunction with AI research lab DeepFloyd.,,https://huggingface.co/DeepFloyd/IF-I-XL-v1.0/discussions,,,custom,text; image,,DeepFloyd IF,UK,Stability AI,,,,4.3B parameters (dense),model,https://stability.ai/blog/deepfloyd-if-text-to-image-model,https://huggingface.co/DeepFloyd/IF-I-XL-v1.0,,,,,,,,,,\nopen,,2023-04-20,,\"['StableLM-Alpha dataset', 'Alpaca dataset', 'gpt4all dataset', 'ShareGPT52K dataset', 'Dolly dataset', 'HH dataset']\",Large language models trained on up to 1.5 trillion tokens.,,,,,Apache 2.0,text; text,,StableLM,UK,Stability AI,,,,7B parameters (dense),model,https://github.com/Stability-AI/StableLM,,,,,,,,,,,\nopen,,2022-08-22,,[],Stable Diffusion is a generative software that creates images from text prompts.,,https://huggingface.co/CompVis/stable-diffusion/discussions,,,custom,,,Stable Diffusion,UK,Stability AI,,,,,application,https://stability.ai/blog/stable-diffusion-public-release,,,,,,,,image,,,\nopen,,2023-07-26,,[],\"Stable Diffusion XL is an updated version of Stable Diffusion, and creates descriptive images with shorter prompts and generate words within images.\",,,,,Open Rail++,,,Stable Diffusion XL,UK,Stability AI,,,,,application,https://stability.ai/stablediffusion,,,,,,,,image,,,\nlimited,Evaluated via a user study comparing preferences between Stable Video Diffusion and competing text-to-video models.,2023-11-21,,['Large Video Dataset'],Stable Video Diffusion is a latent diffusion model trained to generate short video clips from an image conditioning.,,https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/discussions,,Intended for research purposes only.,custom,text; video,,Stable Video Diffusion,UK,Stability AI,Using the model to generate representations of real-world people or 
events.,,,unknown,model,https://static1.squarespace.com/static/6213c340453c3f502425776e/t/655ce779b9d47d342a93c890/1700587395994/stable_video_diffusion.pdf,https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt,unknown,unknown,unknown,,,,,,,\nclosed,Large Video Dataset compared to publicly available research datasets on general statistics before and after filtering.,2023-11-21,,\"['WebVid-10M', 'CoCa', 'V-BLIP']\",\"Large Video Dataset is the dataset that trained Stable Video Diffusion, consisting of over 212 years of content.\",,,,,unknown,video with caption,,Large Video Dataset,UK,Stability AI,,\"Dataset annotated with dense optical flow, and low optical flow videos are removed.\",[],580M annotated video clip pairs,dataset,https://static1.squarespace.com/static/6213c340453c3f502425776e/t/655ce779b9d47d342a93c890/1700587395994/stable_video_diffusion.pdf,,,,,,,,,,,\nopen,,2023-11-01,,[],Sky Replacer is an exciting new tool that allows users to replace the color and aesthetic of the sky in their original photos with a selection of nine alternatives to improve the overall look and feel of the image.,,,,,unknown,,,Sky Replacer,UK,Stability AI,,,,,application,https://clipdrop.co/real-estate/sky-replacer,,,,,,,,image,https://stability.ai/terms-of-use,,\nopen,Evaluated on standard LLM benchmarks and in multilingual tasks compared to similarly sized open-source models.,2024-01-19,,\"['RedPajama-Data', 'The Pile', 'RefinedWeb', 'The Stack', 'OpenWebText', 'OpenWebMath']\",\"StableLM 2 is a state-of-the-art 1.6 billion parameter small language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch.\",,https://huggingface.co/stabilityai/stablelm-2-1_6b/discussions,,The model is intended to be used as a foundational base model for application-specific fine-tuning. 
Developers must evaluate and fine-tune the model for safe performance in downstream applications.,custom,text; text,unknown,StableLM 2,UK,Stability AI,,,,1.6B parameters (dense),model,https://stability.ai/news/introducing-stable-lm-2,https://huggingface.co/stabilityai/stablelm-2-1_6b,11 tCO2eq,512 NVIDIA A100 40GB GPUs,92k GPU hours,,,,,,,\nopen,Evaluated on image generation benchmarks in comparison to equal and smaller-sized models.,2024-01-16,,[],\"Stable Cascade is built upon the Würstchen architecture and its main difference to other models, like Stable Diffusion, is that it is working at a much smaller latent space.\",,https://huggingface.co/stabilityai/stable-cascade/discussions,,The model is intended for research purposes for now.,custom,text; image,unknown,Stable Cascade,UK,Stability AI,\"The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.\",,,unknown,model,https://huggingface.co/stabilityai/stable-cascade,https://huggingface.co/stabilityai/stable-cascade,\"2,276 kgCO2eq\",Some number of A100 GPUs,\"24,602 A100 GPU hours\",,,,,,,\nopen,unknown,2024-03-18,,['Objaverse'],\"Stable Video 3D (SV3D) is a generative model based on Stable Video Diffusion that takes in a still image of an object as a conditioning frame, and generates an orbital video of that object.\",,https://huggingface.co/stabilityai/sv3d/discussions,,This model is intended to be used for generating orbital videos of objects from still images.,StabilityAI Non-Commercial Research Community License,image; video,unknown,Stable Video 3D,UK,Stability AI,\"The model should not be used for generating factual or true representations of people or events, or in any way that violates Stability AIs Acceptable Use 
Policy.\",unknown,,unknown,model,https://stability.ai/news/introducing-stable-video-3d,https://huggingface.co/stabilityai/sv3d,unknown,unknown,unknown,,,,,,,\nopen,,2024-04-03,,['AudioSparx'],\"Stable Audio 2.0 sets a new standard in AI-generated audio, producing high-quality, full tracks with coherent musical structure up to three minutes in length at 44.1kHz stereo.\",,,,\"It can be used to generate melodies, backing tracks, stems, and sound effects.\",unknown,\"audio, text; audio\",Advanced content recognition is used to maintain compliance and prevent copyright infringement.,Stable Audio 2.0,UK,Stability AI,Uploading copyrighted material for transformation.,\"To protect creator copyrights, for audio uploads, Stability AI partners with Audible Magic to use their content recognition (ACR) technology to power real-time content matching and prevent copyright infringement. Opt-out requests were honored during the training phase.\",,unknown,model,https://stability-ai.squarespace.com/news/stable-audio-2-0,,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on several benchmark LLM tasks,2023-12-12,,\"['Arxiv', 'Books', 'C4', 'RefinedWeb', 'StarCoder', 'StackExchange', 'Wikipedia']\",\"Amber is the first model in the LLM360 family, an initiative for comprehensive and fully open-sourced LLMs, where all training details, model checkpoints, intermediate results, and additional analyses are made available to the community.\",,https://huggingface.co/LLM360/Amber/discussions,,to support open and collaborative AI research by making the full LLM training process transparent.,Apache 2.0,text; text,unknown,Amber,International,LLM360,,,,7B parameters (dense),model,https://www.llm360.ai/,https://huggingface.co/LLM360/Amber,unknown,\"56 DGX A100 nodes, each equipped with 4 80GB A100 GPUs\",unknown,,,,,,,\nopen,\"Evaluated on English and coding tasks and benchmarks, and outperforms LLaMA 2 in some.\",2023-12-12,,\"['SlimPajama', 'StarCoder']\",CrystalCoder is a language model with a 
balance of code and text data that follows the initiative under LLM360 of its training process being fully transparent.,,https://huggingface.co/LLM360/CrystalCoder/discussions,,to support open and collaborative AI research by making the full LLM training process transparent.,Apache 2.0,\"text; code, text\",unknown,CrystalCoder,International,LLM360,,,,7B parameters (dense),model,https://www.llm360.ai/,https://huggingface.co/LLM360/CrystalCoder,unknown,\"Trained on the Cerebras Condor Galaxy 1 (CG-1), a 4 exaFLOPS, 54 million core, 64-node cloud AI supercomputer.\",unknown,,,,,,,\nopen,\"Evaluated on the LLM360 Performance and Evaluation Collection that checks standard best practice benchmarks, medical, math, and coding knowledge.\",2024-05-29,,[],\"K2 is a 65 billion parameter large language model inspired by the Llama 2 65B model. The model is also supported with a suite of research tools, tutorials and step-by-step guides for learning pre-training and fine-tuning techniques.\",,https://huggingface.co/LLM360/K2/discussions,,The model is intended for learning pre-training techniques or enhancing research capabilities in large language models.,Apache 2.0,text; text,unknown,K2,International,LLM360,unknown,unknown,,65B parameters,model,https://www.llm360.ai/paper2.pdf,https://huggingface.co/LLM360/K2,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on human and machine benchmarks in comparison to established image models as a baseline.,2024-01-26,,\"['CLIP', 'LAION-400M', 'Wukong', 'Stable Diffusion XL']\",Taiyi Diffusion XL is a new Chinese and English bilingual text-to-image model which is developed by extending the capabilities of CLIP and Stable-DiffusionXL.,,https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-XL-3.5B/discussions,,,Apache 2.0,text; image,unknown,Taiyi Diffusion XL,unknown,\"International Digital Economy Academy, South China University of Technology, University of Science and Technology of China\",,unknown,,3.5B parameters 
(dense),model,https://arxiv.org/pdf/2401.14688.pdf,https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-XL-3.5B,unknown,unknown,unknown,,,,,,,\nclosed,,2023-03-15,,[],The dataset used to train AssemblyAI's Conformer-1 model.,,,\"\"\"our team curated a dataset of 650K hours of English audio - consisting of proprietary internal datasets and various sources from the internet\"\"\n\",,unknown,\"audio, text\",,Conformer-1 dataset,USA,AssemblyAI,,,[],650K hours audio (60TB),dataset,https://www.assemblyai.com/blog/conformer-1/,,,,,,,,,,,\nlimited,\"In order to evaluate the accuracy and robustness of Conformer-1, we sourced 60+ hours of human labeled audio data covering popular speech domains such as call centers, podcasts, broadcasts, and webinars. We then calculated the Word Error Rate (WER) of Conformer-1 against these datasets, and compared the results against Whisper and a number of other ASR models. To ground our results against popular open source speech recognition benchmarks, we also performed the same WER analysis against a number of academic datasets.\",2023-03-15,,['Conformer-1 dataset'],\"Conformer-1 is a state-of-the-art speech recognition model trained on 650K hours of audio data that achieves near human-level performance and robustness across a variety of data, making up to 43% fewer errors on noisy data than other ASR models.\",,,,,unknown,\"audio, text; audio, text\",,Conformer-1,USA,AssemblyAI,,,,300M parameters (dense),model,https://www.assemblyai.com/blog/conformer-1/,,,,,,,,,,,\nlimited,,2022,,['Anthropic API'],AssemblyAI uses Claude and Anthropic's model to transcribe and understand audio data at scale.,,,,,custom,,,AssemblyAI,USA,AssemblyAI,,,,,application,https://www.assemblyai.com/,,,,,,,,,https://www.assemblyai.com/legal/terms-of-service,,\nopen,,2023-03-15,,['Conformer-1'],API to access the AssemblyAI's Conformer-1 model.,,,,Speech recognition,custom,,,Conformer-1 
API,USA,AssemblyAI,,,,,application,https://www.assemblyai.com/blog/conformer-1/,,,,,,,,,https://www.assemblyai.com/legal/terms-of-service,,\nopen,unknown,2023-10-31,,\"['AI-HUB dataset', 'National Institute of Korean Language dataset']\",Midm is a pre-trained Korean-English language model developed by KT. It takes text as input and creates text. The model is based on Transformer architecture for an auto-regressive language model.,,https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1/discussions,,It is expected to be used for various research purposes.,CC-BY-NC 4.0,text; text,unknown,Midm,South Korea,KT Corporation,It cannot be used for commercial purposes.,\"KT tried to remove unethical expressions such as profanity, slang, prejudice, and discrimination from training data.\",,7B parameters,model,https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1,https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluated in 1-shot against the PaLM models, with the tasks of the paper \"\"Language models are few-shot learners\"\" (Brown et al., 2020); (2) on a small set of few-shot tasks reported by the GPT-4 paper; (3) against state-of-the-art models across common sense, question answering, and code tasks; (4) against models which also report results from the EAI Harness, for which we are able to compare with identical prompts and metrics.\",2023-06-14,,['RefinedWeb'],\"Falcon-40B is a 40B parameters causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora.\",,https://huggingface.co/tiiuae/falcon-40b/discussions,,Research on large language models; as a foundation for further specialization for specific use cases.,Apache 2.0,text; text,None,Falcon-40B,UAE,UAE Technology Innovation Institute,irresponsible or harmful use or production use without adequate assessment of risks and mitigation.,,,40B parameters 
(dense),model,https://arxiv.org/pdf/2311.16867.pdf,https://huggingface.co/tiiuae/falcon-40b,unknown,384 A100 40GB GPUs,2 months,,,,,,,\nopen,,2023-06-01,https://huggingface.co/datasets/tiiuae/falcon-refinedweb,[],RefinedWeb is a high-quality five trillion tokens web-only English pretraining dataset.,,,,,custom,text,,RefinedWeb,UAE,UAE Technology Innovation Institute,,,[],600B tokens,dataset,https://arxiv.org/pdf/2306.01116.pdf,,,,,,,,,,,\nopen,\"Falcon-180B outperforms LLaMA-2, StableLM, RedPajama, MPT on the Open LLM Leaderboard at https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.\",2023-09-06,,['RefinedWeb'],\"Falcon-180B is a 180B parameters causal decoder-only model built by TII and trained on 3,500B tokens of RefinedWeb enhanced with curated corpora.\",,https://huggingface.co/tiiuae/falcon-180b/discussions,,Research on large language models; as a foundation for further specialization for specific use cases.,unknown,text; text,None,Falcon-180B,UAE,UAE Technology Innovation Institute,Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.,,,180B parameters (dense),model,https://arxiv.org/pdf/2311.16867.pdf,https://huggingface.co/tiiuae/falcon-180B,,4096 A100 40GB GPUs,9 months,,,,,,,\nopen,unknown,2024-03-18,,['Animagine XL 3.0'],\"An open-source, anime-themed text-to-image model enhanced to generate higher quality anime-style images with a broader range of characters from well-known anime series, an optimized dataset, and new aesthetic tags for better image creation.\",,https://huggingface.co/cagliostrolab/animagine-xl-3.1/discussions,,\"Generating high-quality anime images from textual prompts. 
Useful for anime fans, artists, and content creators.\",Fair AI Public License 1.0-SD,text; image,unknown,Animagine XL 3.1,USA,Cagliostro Research Lab,Not suitable for creating realistic photos or for users who expect high-quality results from short or simple prompts.,\"The model undergoes pretraining, first stage finetuning, and second stage finetuning for refining and improving aspects such as hand and anatomy rendering.\",,unknown,model,https://cagliostrolab.net/posts/animagine-xl-v31-release,https://huggingface.co/cagliostrolab/animagine-xl-3.1,unknown,2x A100 80GB GPUs,\"Approximately 15 days, totaling over 350 GPU hours.\",,,,,,,\nopen,,2023-05-02,,['ChatGPT'],AI chatbot on Nextdoor that helps users write more clear and conscientious posts.,,,,to be used to help make the Nextdoor experience more positive for users,unknown,,,Nextdoor Assistant,USA,Nextdoor,,,,,application,https://help.nextdoor.com/s/article/Introducing-Assistant,,,,,,,,natural language text guidance,,,\nopen,\"Tested on several benchmarks, most notably Python benchmark HumanEval.\",2023-05-09,,['The Stack'],\"StarCoder is a Large Language Model for Code (Code LLM) trained on permissively licensed data from GitHub, including from 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks.\",,https://huggingface.co/bigcode/starcoder/discussions,,\"As a foundation model to fine-tune and create more specialized models that support use cases such as code completion, fill-in-the-middle, and text summarization. 
Can also be used as a Tech Assistant prompt and not as an instruction model given training limitations.\",BigCode Open RAIL-M v1.0,code; code,,StarCoder,International,BigCode,See BigCode Open RAIL-M license and FAQ,\"No specific quality control is mentioned in model training, though details on data processing and how the tokenizer was trained are provided in the paper.\",,15.5B parameters (dense),model,https://arxiv.org/pdf/2305.06161.pdf,https://huggingface.co/bigcode/starcoder,16.68 tons of CO2eq,512 A100 80GB GPUs distributed across 64 nodes,\"320,256 GPU hours\",,,,,,,\nopen,Evaluated on MultiPL-E system benchmarks.,2023-02-24,,\"['The Stack', 'BigCode Dataset']\",Multilingual code model derived from the findings of BigCode Project analysis on Github stars' association to data quality.,,https://huggingface.co/bigcode/santacoder/discussions,,The model was trained on GitHub code. As such it is not an instruction model and commands do not work well. You should phrase commands like they occur in source code such as comments or write a function signature and docstring and let the model complete the function body.,BigCode Open RAIL-M v1,code; code,,SantaCoder,International,BigCode,See BigCode Open RAIL-M license and FAQ,,,1.1B parameters (dense),model,https://arxiv.org/pdf/2301.03988.pdf,https://huggingface.co/bigcode/santacoder,124 kg of CO2eq,96 NVIDIA Tesla V100 GPUs,\"14,284 GPU hours\",,,,,,,\nopen,Evaluated models trained on The Stack on HumanEval and MBPP and compared against similarly-sized models.,2022-11-20,https://huggingface.co/datasets/bigcode/the-stack,['GitHub'],\"The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. 
The Stack serves as a pre-training dataset for Code LLMs, i.e., code-generating AI systems which enable the synthesis of programs from natural language descriptions as well as other from code snippets.\",,https://huggingface.co/datasets/bigcode/the-stack/discussions,,creating code LLMs,\"The Stack is a collection of source code from repositories with various licenses. Any use of all or part of the code gathered in The Stack must abide by the terms of the original licenses, including attribution clauses when relevant. Provenance information is provided for each data point.\",code,,The Stack,International,BigCode,See https://huggingface.co/datasets/bigcode/the-stack,allowed users whose data were part of The Stack's training data to opt-out,['https://huggingface.co/datasets/bigcode/the-stack/viewer/default/train'],6 TB,dataset,https://arxiv.org/pdf/2211.15533.pdf,,,,,,,,,,,\nopen,See https://arxiv.org/pdf/2402.19173.pdf,2024-02-28,,['The Stack v2'],\"StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded. The training was carried out using the Fill-in-the-Middle objective on 4+ trillion tokens.\",,https://huggingface.co/bigcode/starcoder2-15b/discussions,,\"The model was trained on GitHub code as well as additional selected data sources such as Arxiv and Wikipedia. As such it is not an instruction model and commands like \"\"Write a function that computes the square root.\"\" do not work well. Intended to generate code snippets from given context, but not for writing actual functional code directly.\",BigCode OpenRail-M,code; text,unknown,StarCoder2-15B,International,BigCode,See BigCode Open RAIL-M license and FAQ,The model was filtered for permissive licenses and code with no license only. 
A search index is provided to identify where generated code came from to apply the proper attribution.,,15B parameters (dense),model,https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html,https://huggingface.co/bigcode/starcoder2-15b,unknown,1024 x H100 GPUs,unknown,,,,,,,\nopen,See https://arxiv.org/pdf/2402.19173.pdf,2024-02-28,,['The Stack v2'],\"StarCoder2-7B model is a 7B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3.5+ trillion tokens.\",,https://huggingface.co/bigcode/starcoder2-7b/discussions,,\"Intended to generate code snippets from given context, but not for writing actual functional code directly. The model has been trained on source code from 17 programming languages. The predominant language in source is English although other languages are also present. As such the model is capable of generating code snippets provided some context but the generated code is not guaranteed to work as intended. It can be inefficient and contain bugs or exploits. See the paper for an in-depth discussion of the model limitations.\",BigCode OpenRail-M,code; text,unknown,StarCoder2-7B,International,BigCode,See BigCode Open RAIL-M license and FAQ,The model was filtered for permissive licenses and code with no license only. 
A search index is provided to identify where generated code came from to apply the proper attribution.,,7B parameters (dense),model,https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html,https://huggingface.co/bigcode/starcoder2-7b,\"29,622.83 kgCO2eq\",432 H100 GPUs,\"145,152 hours (cumulative)\",,,,,,,\nopen,See https://arxiv.org/pdf/2402.19173.pdf,2024-02-28,,['The Stack v2'],\"StarCoder2-3B model is a 3B parameter model trained on 17 programming languages from The Stack v2, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 3+ trillion tokens.\",,https://huggingface.co/bigcode/starcoder2-3b/discussions,,\"Intended to generate code snippets from given context, but not for writing actual functional code directly. The model has been trained on source code from 17 programming languages. The predominant language in source is English although other languages are also present. As such the model is capable of generating code snippets provided some context but the generated code is not guaranteed to work as intended. It can be inefficient and contain bugs or exploits. See the paper for an in-depth discussion of the model limitations.\",BigCode OpenRail-M,code; text,unknown,StarCoder2-3B,International,BigCode,See BigCode Open RAIL-M license and FAQ,The model was filtered for permissive licenses and code with no license only. 
A search index is provided to identify where generated code came from to apply the proper attribution.,,3B parameters (dense),model,https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html,https://huggingface.co/bigcode/starcoder2-3b,\"16,107.01 kgCO2eq\",160 A100 GPUs,\"97,120 hours (cumulative)\",,,,,,,\nopen,\"Compared with other multi-task, instruction-following agents.\",2023-11-10,,[],\"JARVIS-1 is an open-world agent that can perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control, all within the popular yet challenging open-world Minecraft universe.\",,,,,unknown,text; in-game actions,,JARVIS-1,China,Peking University Institute for Artificial Intelligence,,,,unknown,model,https://arxiv.org/pdf/2311.05997.pdf,,unknown,unknown,unknown,,,,,,,\nopen,unknown,2021-04-01,,[],\"WebVid-10M is a large-scale dataset of short videos with textual descriptions sourced from stock footage sites.\n\",unknown,Feedback can be given by emailing at maxbain at robots.ox.ac.uk,Video URLs and textual description annotations,\"\"\"Provided you keep to these rules, the University grants you (the researcher) a non-exclusive and non-transferable licence to use the content free of charge strictly for non-commercial research (i.e., whose output artefacts are not incorporated in commercial products) for 12 months.\"\"\n\",WebVid Dataset Terms,\"text, video\",,WebVid-10M,UK,University of Oxford,\"- Authors note the following prohibited uses: \"\"You must not use the content other than for the Permitted Purpose in strict conformity with these terms and any other reasonable instructions of the University. 
You must not, except as may be strictly necessary for carrying out the Permitted Purpose, provide or otherwise make available content to any third party or allow use of it or them by or on behalf of any third party, in whole or in part, whether by way of sale, resale, loan, transfer, hire or any other form of exploitation; or attempt to identify any living or deceased individual from the content.\"\" [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md)\n- Authors also note the following limitations of the dataset: \"\"We note that data sourced from the web may be prone to biases and may contain graphic content. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data.\"\" [[Disclaimer]](https://github.com/m-bain/webvid#disclaimer-%EF%B8%8F)\n\",unknown,[],\"10.7M video-text pairs, 52K hours video\",dataset,https://m-bain.github.io/webvid-dataset/,,,,,,,,,,,\nopen,unknown,2021-04-01,,['WebVid-10M'],\"WebVid-2M is a large-scale dataset of 2.5M short videos with textual descriptions sourced from stock footage sites. A subset of the WebVid-10M dataset.\n\",unknown,Feedback can be given by emailing at maxbain at robots.ox.ac.uk,Video URLs and textual description annotations,\"\"\"Provided you keep to these rules, the University grants you (the researcher) a non-exclusive and non-transferable licence to use the content free of charge strictly for non-commercial research (i.e., whose output artefacts are not incorporated in commercial products) for 12 months.\"\"\n\",WebVid Dataset Terms,\"text, video\",,WebVid-2M,UK,University of Oxford,\"- Authors note the following prohibited uses: \"\"You must not use the content other than for the Permitted Purpose in strict conformity with these terms and any other reasonable instructions of the University. 
You must not, except as may be strictly necessary for carrying out the Permitted Purpose, provide or otherwise make available content to any third party or allow use of it or them by or on behalf of any third party, in whole or in part, whether by way of sale, resale, loan, transfer, hire or any other form of exploitation; or attempt to identify any living or deceased individual from the content.\"\" [[Terms of Access]](https://github.com/m-bain/webvid/blob/main/TERMS.md)\n- Authors also note the following limitations of the dataset: \"\"We note that data sourced from the web may be prone to biases and may contain graphic content. Please be careful of unintended societal, gender, racial and other biases when training or deploying models trained on this data.\"\" [[Disclaimer]](https://github.com/m-bain/webvid#disclaimer-%EF%B8%8F)\n\",unknown,[],\"2.5M video-text pairs, 13K hours video\",dataset,https://m-bain.github.io/webvid-dataset/,,,,,,,,,,,\nclosed,,,,[],,,,,,unknown,text,,You dataset,USA,You,,,[],unknown,dataset,https://you.com/,,,,,,,,,,,\nclosed,,,,['You dataset'],,,,,,unknown,text; text,,You model,USA,You,,,,unknown,model,https://you.com/,,,,,,,,,,,\nopen,,,,['You model'],You.com is a search engine built on artificial intelligence that provides users with a customized search experience while keeping their data 100% private.,,,,,unknown,,,You Search,USA,You,,,,,application,https://you.com/,,,,,,,,,,,\nopen,\"\"\"We evaluate our models on the PILE validation set comprising 380M tokens. We also evaluate the public checkpoints of Pythia, Eleuther (2022); OPT, Zhang et al. (2022); GPT-NeoX 20B, Black et al. (2022); and GPT-J 6B, Wang & Komatsuzaki (2021). We performed upstream (pre-training) evaluations of text prediction cross-entropy using the Pile validation and test splits. 
We performed downstream evaluations of text generation accuracy on standardized tasks using the Eleuther lm-evaluation-harness.\"\" [[Evaluations]] (https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#evaluations).\n\",2023-03-28,,['The Pile'],\"A Family of Open, Compute-efficient, Large Language Models. The family includes 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B models. All models in the Cerebras-GPT family have been trained in accordance with Chinchilla scaling laws (20 tokens per model parameter). [[Cerebras Blog Post]](https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models)\n\",,,,\"\"\"The primary intended use is to further research into large language models. These models can be used as a foundation model for NLP, applications, ethics, and alignment research. Our primary intended users are researchers who are working to improve LLMs and practitioners seeking reference implementations, training setups, hyperparameters, or pre-trained models. We release these models with a fully permissive Apache license for the community to use freely.\"\" [[Uses and Limitations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#uses-and-limitations).\n\",Apache 2.0,text; text,,Cerebras-GPT,USA,Cerebras,\"Authors note the following limitations of the model: \"\"Cerebras-GPT models are trained on the Pile, with English language only, and are not suitable for machine translation tasks. 
Cerebras-GPT models have not been tuned for human-facing dialog applications like chatbots and will not respond to prompts in a similar way to models that have received instruction tuning or reinforcement learning from human feedback (RLHF) like Flan-T5 or ChatGPT.\"\" [[Uses and Limitations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#out-of-scope-use).\n\",\"The Pile dataset has been thoroughly analyzed from various ethical standpoints such as toxicity analysis, gender bias, pejorative content, racially sensitive content etc. Only mitigations in standard Pile dataset pre-processing were employed when pre-training Cerebras-GPT. [[Risk, Bias, Ethical Considerations]](https://github.com/Cerebras/modelzoo/tree/main/modelzoo/transformers/pytorch/gpt3/configs/Cerebras_GPT#risk-bias-ethical-considerations)\n\",,13B parameters (dense),model,https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/,https://huggingface.co/cerebras/Cerebras-GPT-13B,,16x Cerebras CS-2 wafer scale systems,,,,,,,,\nopen,Evaluated on standard English LLM benchmarks and adapted Arabic LLM benchmarks.,2023-08-30,,\"['GPT-3', 'The Pile']\",Jais is the world’s most advanced Arabic LLM as of its release.,,,,Jais is released with the aim to stimulate research and development in the Arabic NLP community.,Apache 2.0,text; text,unknown,Jais,unknown,\"Inception Institute of Artificial Intelligence, Cerebras, Mohamed bin Zayed University of Artificial Intelligence\",\"Generating or endorsing hate speech, disseminating false information, engaging in illegal activities, managing sensitive data, attempting language generalization beyond Arabic and English, and making critical decisions with high stakes.\",,,13B parameters (dense),model,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,unknown,Condor Galaxy Supercomputer,unknown,,,,,,,\nopen,Evaluated 
on standard English LLM benchmarks and adapted Arabic LLM benchmarks.,2023-08-30,,\"['GPT-3', 'The Pile']\",\"Jais Chat is an instruction-tuned version of Jais, optimized for dialog interaction.\",,,,Jais Chat is released with the aim to stimulate research and development in the Arabic NLP community.,Apache 2.0,text; text,unknown,Jais Chat,unknown,\"Inception Institute of Artificial Intelligence, Cerebras, Mohamed bin Zayed University of Artificial Intelligence\",\"Generating or endorsing hate speech, disseminating false information, engaging in illegal activities, managing sensitive data, attempting language generalization beyond Arabic and English, and making critical decisions with high stakes.\",,,13B parameters (dense),model,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,https://inceptioniai.org/jais/docs/Technicalpaper.pdf,unknown,Condor Galaxy Supercomputer from Cerebras,unknown,,,,,,,\nopen,Evaluated on standard LLM benchmarks in comparison to similar-sized models.,2023-07-24,,['SlimPajama'],Bittensor Language Model is a 3 billion parameter language model with an 8k context length trained on 627B tokens of SlimPajama.,,https://huggingface.co/cerebras/btlm-3b-8k-base/discussions,,,Apache 2.0,text; text,unknown,Bittensor Language Model,USA,Cerebras,,,,3B parameters (dense),model,https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/,https://huggingface.co/cerebras/btlm-3b-8k-base,unknown,unknown,unknown,,,,,,,\nopen,,2023-06-09,https://huggingface.co/datasets/cerebras/SlimPajama-627B,['RedPajama-Data'],\"As of release, SlimPajama is the largest extensively deduplicated, multi-corpora, open-source dataset for training large language models.\",,https://huggingface.co/datasets/cerebras/SlimPajama-627B/discussions,,,Apache 2.0,text,unknown,SlimPajama,USA,Cerebras,,,[],627B tokens,dataset,https://huggingface.co/datasets/cerebras/SlimPajama-627B,,,,,,,,,,,\nopen,,2022-10-31,,\"['UniRef50', 'UniRef90']\",ESM-2 is a series of 
protein language models trained on protein sequences,,,,,MIT,text; protein sequence,,ESM-2,USA,Meta,,,,15B parameters (dense),model,https://www.biorxiv.org/content/10.1101/2022.07.20.500902v2.full.pdf+html,,,,,,,,,,,\nclosed,,2021-12-08,,\"['COCO', 'YFCC100M', 'SBU Captions', 'Localized Narratives', 'Visual Genome', 'Wikipedia', 'Conceptual Captions', 'Red Caps']\",PMD (Public Multimodal Datasets) is a collection of image-text datasets introduced in the FLAVA work.,YFCC100M is filtered for non-English captions and very short (< 2 word) captions.,,,unknown,unknown,\"image, text\",,PMD,USA,Meta,unknown,\"Beyond filtering mentioned in excluded, nothing further is done.\",[],70M,dataset,https://arxiv.org/abs/2112.04482,,,,,,,,,,,\nopen,\"FLAVA is benchmarked on a range of vision-only (e.g. CIFAR-10), language-only (e.g. GLUE), and multimodal (e.g. Hateful Memes) standard evaluations.\",2021-12-08,,['PMD'],\"FLAVA is a multimodal model composed of an image encoder, text encoder, and multimodal encoder.\",,https://huggingface.co/facebook/flava-full/discussions,,\"Per the [[HuggingFace repository]](https://huggingface.co/facebook/flava-full), \"\"The model is intended to serve as a reproducible research artifact for research communities in the light of models whose exact reproduction details are never released such as CLIP and SimVLM.\"\"\n\",BSD-3-Clause,\"image, text\",,FLAVA,USA,Meta,\"Per the [[HuggingFace repository]](https://huggingface.co/facebook/flava-full), \"\"Any deployed use case of the model - whether commercial or not\"\" - is currently out of scope.\n\",\"FLAVA introduces a variety of new modeling techniques, specifically with an interest in improved text-image alignment through contrastive objectives.\",,306M,model,https://arxiv.org/abs/2112.04482,https://huggingface.co/facebook/flava-full,unknown,unknown,unknown,,,,,,,\nclosed,,2022-11-15,,\"['CommonCrawl', 'Wikipedia', 'arXiv']\",The Galactica Corpus is a collection of scientific datasets introduced in 
the Galactica work.,,,Prompts and reasoning data is explicitly included to improve model capabilities derived from this data.,unknown,unknown,text,,The Galactica Corpus,USA,Meta,unknown,,[],106B tokens,dataset,https://galactica.org/static/paper.pdf,,,,,,,,,,,\nopen,,2022-11-15,,['The Galactica Corpus'],Galactica is a family of autoregressive language models.,,,,,CC BY-NC 4.0,\"code, text; code, text\",,Galactica,USA,Meta,,,,120B parameters (dense),model,https://galactica.org/static/paper.pdf,https://huggingface.co/facebook/galactica-6.7b,unknown,Meta AI Cluster. Trained on 1024 80GB A100 GPUs (128 8xA100 80GB nodes),unknown,,,,,,,\nopen,,2022-04-12,,[],InCoder is a language model trained on code with a causal masking objective,,,,,CC BY-NC 4.0,text; code,,InCoder,unknown,\"Meta, CMU, TTI-Chicago, UC Berkeley, University of Washington\",,unknown,,6B parameters (dense),model,https://arxiv.org/abs/2204.05999,,Unknown,\"248 V100 GPUs, according to [[the paper]](https://arxiv.org/pdf/2204.05999.pdf)\",\"24 days, according to [[the paper]](https://arxiv.org/pdf/2204.05999.pdf)\",,,,,,,\nlimited,,2022-05-01,,\"['RoBERTa dataset', 'The Pile', 'PushShift.io Reddit']\",OPT is a family of autoregressive language models.,,,,,OPT-175B License,text; text,,OPT,USA,Meta,,,,175B parameters (dense),model,https://arxiv.org/abs/2205.01068,https://arxiv.org/pdf/2205.01068.pdf,75 tCO2e,Meta AI cluster. Trained on 992 80GB A100 GPUs,,,,,,,,\nlimited,,2022-09-29,,\"['LAION-5B', 'WebVid-10M', 'HD-VILA-100M']\",\"The Make-A-Video dataset is the dataset used to train Make-A-Video, which includes both image-text and video-only datasets with specific and significant filtering.\n\",\"The LAION-5B dataset is filtered to 2.3B by removing NSFW images using [https://github.com/GantMan/nsfw](https://github.com/GantMan/nsfw), toxic words in text, and images with watermark probability > 0.5. 
The HD-VILA-100M is randomly subsampled to 10M video clips.\n\",,,unknown,,\"image, text, video\",unknown,Make-A-Video dataset,USA,Meta,unknown,\"The authors exclude NSFW, toxic, and likely watermarked data from LAION-5B.\n\",[],\"20M video clips, 2.3B image-text pairs\",dataset,https://arxiv.org/pdf/2209.14792.pdf,,,,,,,,,,,\nclosed,\"Model performance was evaluated using automated (Frechet Video Distance; Frechet Inception Distance) and human evaluation on two datasets (UCF-101, MSR-VTT) in the zero-shot setting.\n\",2022-09-29,,['Make-A-Video dataset'],\"Make-A-Video is a model for Text-to-Video Generation without Text-Video Data.\n\",,,,unknown,,text; video,unknown,Make-A-Video,USA,Meta,unknown,,,unknown,model,https://arxiv.org/pdf/2209.14792.pdf,,unknown,unknown,unknown,,,,,,,\nopen,,2023-02-24,,\"['CommonCrawl', 'C4', 'Github', 'Wikipedia', 'BooksCorpus', 'arXiv', 'StackExchange']\",\"LLaMA is a collection of foundation language models ranging from 7B to 65B parameters trained on trillions of tokens. The LLaMA models show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets.\",,,,,\"LLaMa License (model weights), GPLv3 (code)\",text; text,,LLaMA,USA,Meta,,,,65B parameters (dense),model,https://arxiv.org/abs/2302.13971,,,,,,,,,,,\nopen,Evaluated on standard academic benchmarks and internal Meta libraries.,2023-07-18,,[],Llama 2 is an updated version of LLaMA trained on a new mix of publicly available data.,,,,\"Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.\",custom,text; text,,Llama 2,USA,Meta,Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. 
Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.,,,70B parameters (dense),model,https://ai.meta.com/resources/models-and-libraries/llama/,Can be found at appendix of paper at https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/,539 tCO2eq,NVIDIA A100-80GB GPUs (TDP of 350-400W),,,,,,,,\nopen,,2022-12-22,,\"['OPT', 'OPT-IML Bench']\",,,,,,OPT-IML 175B License,text; text,,OPT-IML,USA,Meta,,,,175B parameters (dense),model,https://arxiv.org/abs/2212.12017,,,,,,,,,,,\nopen,,2023-04-05,https://arxiv.org/pdf/2304.02643.pdf#page=25,[],\"SA-1B (Segment Anything 1 Billion) is a dataset designed for training general-purpose object segmentation models from open world images. It consists of 11M diverse, high-resolution, privacy protecting images and 1.1B high-quality segmentation masks.\n\",\"\"\"We withheld ~2k randomly selected images for testing purposes.\"\"  \"\"Each image is accompanied by a short caption that describes the content and place of the photo in a free form text. Per our agreement with the photo provider we are not allowed to release these captions.\"\"\n\",Feedback can be given via the feedback form on their website [segment-anything.com](https://segment-anything.com/) or by emailing at segment-anything at meta.com.,\"SA-1B consists of 11M diverse, high-resolution (averaging 1500×2250 pixels), and privacy protecting images collected and licensed from a third party photo company. The images are photos taken from a camera, i.e. not artwork. The images vary in subject matter. Common themes of the images include: locations, objects, scenes. The dataset includes 1.1B high-quality segmentation masks collected with the Segment Anything Data Engine. SA-1B only includes automatically generated masks (99.1%), as the authors conclude after experiments that the automatic masks are high quality and effective for training models. 
The masks range from large scale objects such as buildings to fine grained details such as door handles. Masks are provided in the COCO run-length encoding (RLE) annotation format.\n\",SA-1B is intended to be used for research purposes only. It allows access to a privacy protecting and copyright friendly large-scale image dataset. Researchers can use it to train and evaluate generic object segmentation models.,SA-1B Dataset Research License,image,\"The dataset will be hosted at https://ai.facebook.com/datasets/segment-anything and maintained by Meta AI. \"\"If a user observes objectionable image(s) in the dataset, we invite them to report the image(s) at segment-anything at meta.com for removal\"\" \"\"To aid reproducibility of research using SA-1B, the only updates (to the dataset) will be to remove reported images.\"\" \"\"We encourage users to gather further annotations for SA-1B. Any users who generate annotations will be liable for hosting and distributing their annotations.\"\"\n\",SA-1B,USA,Meta,\"Authors note the following limitations of the dataset:\n  The masks are generated by a segmentation model, so there may be errors\nor inconsistencies in the masks.\n  While no two images are the same, there are instances of images of the same\nsubject taken close together in time.\n  The dataset contains scenes of protests, or other gatherings that may suggest\nreligious beliefs, political opinions or union memberships that may be offensive.\n\",\"- Dataset quality:\n  Due to potential accessibility and storage challenges, the original high-resolution images (averaging 3300×4950 pixels) were downsampled to an average resolution of 1500×2250 pixels. 
Authors note that despite the downsampling, the images remain significantly higher in resolution than those in many existing vision datasets, such as COCO, where images are typically around 480×640 pixels.\n  The images were processed to blur faces and license plates to protect the identities of those in the image.\n  To estimate the quality of the masks in the images, a random sample of 500 images (∼50k masks) was taken and professional annotators were asked to improve the quality of all masks in those images.\n- Safety measures:\n  Authors implemented two safety measures to prevent objectionable content:\n    (1) Photos are licensed from a photo provider and had to meet the terms of service of the photo provider. Authors requested that all objectionable content be filtered from the images they licensed.\n    (2) Users who observe objectionable images in the dataset are invited to report them for removal at segment-anything@meta.com.\n  Despite these measures, they observed that a small portion of images contain scenes of protests or other gatherings that focus on a diverse spectrum of religious beliefs or political opinions that may be considered offensive. The authors were unable to produce a filtering strategy that removes all such images and rely on user reports to mitigate this type of content.\n\",[],\"11M images, 1.1B mask annotations\",dataset,https://ai.facebook.com/datasets/segment-anything/,,,,,,,,,,,\nopen,\"\"\"We extensively evaluate SAM. First, using a diverse new suite of 23 segmentation datasets, we find that SAM produces high-quality masks from a single foreground point, often only slightly below that of the manually annotated ground truth. 
Second, we find consistently strong quantitative and qualitative results on a variety of downstream tasks under a zero-shot transfer protocol using prompt engineering, including edge detection, object proposal generation, instance segmentation, and a preliminary exploration of text-to-mask prediction.\"\"\n\",2023-04-05,,['SA-1B'],\"SAM (Segment Anything Model) is a foundation model for image segmentation. The model is designed and trained to be promptable, and supports flexible prompts (point, box, mask and free-form text) to compute masks in real-time to allow interactive use.\",,Feedback can be given via the feedback form on their website [segment-anything.com](https://segment-anything.com/) or by emailing at segment-anything at meta.com.,,\"\"\"SAM is intended to be used for any prompt-based segmentation task. We explored its use in segmenting objects from a point, edge detection, segmenting all objects, and segmenting detected objects. We explored how SAM can integrate with other vision models to segment objects from text.\"\"\n\",Apache 2.0,\"image, text; image\",,SAM,USA,Meta,\"For out-of-scope use cases see terms of use in [[LICENSE]](https://github.com/facebookresearch/segment-anything/blob/main/LICENSE). Authors also discuss the following limitations of the model: \"\"While SAM performs well in general, it is not perfect. It can miss fine structures, hallucinates small disconnected components at times, and does not produce boundaries as crisply as more computationally intensive methods that “zoom-in”, e.g. [18]. In general, we expect dedicated interactive segmentation methods to outperform SAM when many points are provided, e.g. [67]. Unlike these methods, SAM is designed for generality and breadth of use rather than high IoU interactive segmentation. Moreover, SAM can process prompts in real-time, but nevertheless SAM's overall performance is not real-time when using a heavy image encoder. 
Our foray into the text-to-mask task is exploratory and not entirely robust, although we believe it can be improved with more effort. While SAM can perform many tasks, it is unclear how to design simple prompts that implement semantic and panoptic segmentation. Finally, there are domain-specific tools, such as [7], that we expect to outperform SAM in their respective domains.\"\"\n\",\"\"\"We perform a Responsible AI (RAI) analysis of our work by investigating potential fairness concerns and biases when using SA-1B and SAM. We focus on the geographic and income distribution of SA-1B and fairness of SAM across protected attributes of people.\"\"\n\",,unknown,model,https://arxiv.org/pdf/2304.02643.pdf,https://arxiv.org/pdf/2304.02643.pdf#page=28,2.8 metric tons of carbon dioxide,256 A100 GPUs,68 hours,,,,,,,\nclosed,\"Evaluated on zero-shot text-to-speech benchmarks, with Voicebox outperforming the current state-of-the-art English model VALL-E.\",2023-06-16,,[],Voicebox is the first generative AI model for speech to generalize across tasks with state-of-the-art performance.,,,,,,audio; text,,Voicebox,USA,Meta,,,,330M parameters (dense),model,https://research.facebook.com/publications/voicebox-text-guided-multilingual-universal-speech-generation-at-scale/,,unknown,32 GPUs of unspecified type,\"750,000 iterations\",,,,,,,\nopen,\"PEER is evaluated on core research questions intended to gauge language understanding, proper use of citations, instruction following, and iterative use.\",2022-08-24,,[],\"PEER is a collaborative language model that is trained to imitate the entire writing process itself. 
PEER can write drafts, add suggestions, propose edits and provide explanations for its actions.\",,,,adapting LLMs to work with collaborative writing and updating.,,text; text,,PEER,USA,Meta,,\"Heuristics and edit filtering was used on data set, which consisted mostly of Wikipedia pages.\",,3B parameters (dense),model,https://arxiv.org/pdf/2208.11663.pdf,,,64 GPUs,,,,,,,,\nopen,\"MusicGen was evaluated on standard music benchmarks of Frechet Audio Distance, Kullback-Leibler Divergence, and its CLAP score.\",2023-08-02,,\"['Meta Music Initative Sound Collection', 'Shutterstock music collection', 'Pond5 music collection']\",MusicGen is a simple and controllable model for music generation that doesn't require self-supervised semantic representation,,https://huggingface.co/spaces/facebook/MusicGen/discussions,,The primary use of MusicGen is research on AI-based music generation,MIT,\"audio, text; audio, text\",,MusicGen,USA,Meta,\"The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate music pieces that create hostile or alienating environments for people. 
This includes generating music that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.\",,,3.3B parameters (dense),model,https://huggingface.co/spaces/facebook/MusicGen/tree/main,https://github.com/facebookresearch/audiocraft/blob/main/model_cards/MUSICGEN_MODEL_CARD.md,,,,,,,,,,\nopen,Evaluated on Frechet Audio Distance and Kullback-Leibler Divergence as well as qualitative studies with human participants.,2023-08-02,,\"['AudioSet', 'BBC sound effects', 'AudioCaps', 'Clotho v2', 'VGG-Sound', 'FSD50K', 'Free To Use Sounds', 'Sonniss Game Effects', 'WeSoundEffects', 'Paramount Motion - Odeon Cinematic Sound Effects']\",AudioGen is an auto-regressive generative model that generates audio samples conditioned on text inputs,,https://huggingface.co/facebook/audiogen-medium/discussions,,The primary use of AudioGen is research on AI-based audio generation.,MIT,\"audio, text; audio, text\",,AudioGen,USA,Meta,\"The model should not be used on downstream applications without further risk evaluation and mitigation. The model should not be used to intentionally create or disseminate audio pieces that create hostile or alienating environments for people. 
This includes generating audio that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.\",,,1.5B parameters (dense),model,https://felixkreuk.github.io/audiogen/paper.pdf,https://github.com/facebookresearch/audiocraft/blob/main/model_cards/AUDIOGEN_MODEL_CARD.md,,,,,,,,,,\nclosed,Emu significantly outperforms a publicly available state-of-the-art model SDXLv1.0 on visual appeal when compared on standard benchmarks.,2023-09-27,,\"['CLIP', 'T5']\",Emu is a pre-trained latent diffusion model on 1.1 billion image-text pairs and fine-tuned with only a few thousand carefully selected high-quality images.,,,,,unknown,text; image,,Emu,USA,Meta,,,,1.5B parameters (dense),model,https://ai.meta.com/research/publications/emu-enhancing-image-generation-models-using-photogenic-needles-in-a-haystack/,,,,,,,,,,,\nopen,Evaluated on several code benchmarks like HumanEval and MBPP.,2023-08-24,,['Llama 2'],Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters.,,https://huggingface.co/allenai/codetulu-2-13b/discussions,,Code Llama and its variants is intended for commercial and research use in English and relevant programming languages.,Llama 2,\"text; code, text\",,Code LLaMA,USA,Meta,Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. 
Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Code Llama and its variants.,,,34B parameters (dense),model,https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/,https://huggingface.co/codellama/CodeLlama-34b-hf,65.3 tCO2eq,A100-80GB GPUs,400K GPU hours,,,,,,,\nclosed,Analyzed against nearest neighbor model baseline and by extending the video length.,2023-11-16,,\"['Emu', 'CLIP', 'T5']\",\"Emu Video is a text-to-video generation model that factorizes the generation into two steps, first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image.\",,,,,unknown,text; video,,Emu Video,USA,Meta,,,,6B parameters (dense),model,https://emu-video.metademolab.com/,,unknown,unknown,unknown,,,,,,,\nclosed,Evaluated on test set of actions in comparison to SoTA image editing models.,2023-11-16,,\"['Emu', 'CLIP', 'T5']\",Emu Edit is a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.,,,,,unknown,text; image,,Emu Edit,USA,Meta,,,,unknown,model,https://emu-edit.metademolab.com/,,unknown,unknown,unknown,,,,,,,\nopen,Evaluated in comparison to CLIP.,2023-10-02,,['Common Crawl'],MetaCLIP is a more transparent rendition of CLIP that aims to reveal CLIP's training data curation methods.,,,,,CC-BY-NC-4.0,text; text,,MetaCLIP,USA,Meta,,,,unknown,model,https://arxiv.org/pdf/2103.00020.pdf,https://huggingface.co/facebook/metaclip-b32-400m,unknown,unknown,unknown,,,,,,,\nopen,\"The models were evaluated based on their performance on standard benchmarks and real-world scenarios. These evaluations were performed using a high-quality human evaluation set containing 1,800 prompts covering multiple use cases. 
The models also went through red-teaming for safety, where human experts and automated methods were used to generate adversarial prompts to test for problematic responses.\",2024-04-18,,[],Llama 3 is the third generation of Meta AI's open-source large language model. It comes with pretrained and instruction-fine-tuned language models with 8B and 70B parameters that can support a broad range of use cases.,,\"Feedback is encouraged from users to improve the model, but the feedback mechanism is not explicitly described.\",,\"Llama 3 is intended for a broad range of use cases, including AI assistance, content creation, learning, and analysis.\",Llama 3,text; text,Extensive internal and external performance evaluation and red-teaming approach for safety testing.,Llama 3,USA,Meta,unknown,\"Extensive internal and external testing for safety, and design of new trust and safety tools.\",,70B parameters,model,https://llama.meta.com/llama3/,https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md,unknown,2 custom-built Meta 24K GPU clusters,unknown,,,,,,,\nopen,\"Evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro.\",2024-05-17,,[],Chameleon is a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence.,,,,,unknown,\"image, text; image, text\",,Chameleon,USA,Meta FAIR,,,,34B parameters,model,https://arxiv.org/pdf/2405.09818,,unknown,Meta's Research Super Cluster (powered by NVIDIA A100 80GB GPUs),unknown,,,,,,,\nopen,\"The model was evaluated on over 150 benchmark datasets that span a wide range of languages. 
An experimental evaluation suggests that the model is competitive with leading foundation models across a range of tasks. Also, smaller models of Llama 3.1 405B are competitive with closed and open models that have a similar number of parameters.\",2024-07-23,,['Unknown'],\"Llama 3.1 405B is the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation. With the release of the 405B model, the Llama versions support advanced use cases, such as long-form text summarization, multilingual conversational agents, and coding assistants. It is the largest and most capable openly available foundation model.\",,Unknown,,\"For advanced use cases, such as long-form text summarization, multilingual conversational agents, and coding assistants. May also be useful in the development of custom offerings and systems by developers.\",Unknown,text; text,Unknown,Llama 3.1 405B,USA,Meta AI,Unknown,\"The development process was focused on keeping the model scalable and straightforward. It adopted an iterative post-training procedure, where each round uses supervised fine-tuning and direct preference optimization. The model also underwent quality assurance and filtering for pre-and post-training data.\",,405B parameters (dense),model,https://ai.meta.com/blog/meta-llama-3-1/,https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md,Unknown,Over 16 thousand H100 GPUs,Unknown,,,,,,,\nopen,Unknown,2024-12-06,,[],The Meta Llama 3.3 multilingual large language model (LLM) is an instruction tuned generative model in 70B (text in/text out).,,Instructions on how to provide feedback or comments on the model can be found in the model README.,,Intended for commercial and research use in multiple languages. 
Instruction tuned text only models are intended for assistant-like chat.,Llama 3.3 Community License Agreement,text; text,Unknown,Llama 3.3,USA,Meta,Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.3 Community License.,\"Used \"\"supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.\"\"\",,70B parameters,model,https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct,https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct,\"11,390 tons CO2eq\",H100-80GB (TDP of 700W) type hardware,39.3M GPU hours,,,,,,,\nopen,,2022-11-29,,\"['GPT-J', 'P3', 'NaturalInstructions-v2']\",,,,,,Apache 2.0,text; text,,GPT-JT,USA,Together,,,,6B parameters (dense),model,https://www.together.xyz/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai,,,,,,,,,,,\nopen,,2023-03-10,,\"['GPT-NeoX', 'OIG-43M']\",,,,,,Apache 2.0,text; text,,GPT-NeoXT-Chat-Base,USA,Together,,,,20B parameters (dense),model,https://www.together.xyz/blog/openchatkit,,,,,,,,,,,\nopen,,2023-03-10,,\"['GPT-JT', 'OIG-moderation']\",,,,,,Apache 2.0,text; text,,OpenChatKit moderation model,USA,Together,,,,6B parameters (dense),model,https://www.together.xyz/blog/openchatkit,,,,,,,,,,,\nopen,,2023-03-10,,\"['P3', 'NaturalInstructions-v2', 'FLAN dataset']\",,,,,,Apache 2.0,text,,OIG-43M,International,\"Together, LAION, Ontocord\",,,[],43M instructions,dataset,https://laion.ai/blog/oig-dataset/,,,,,,,,,,,\nopen,,2023-03-10,,[],,,,,,Apache 2.0,text,,OIG-moderation,International,\"Together, LAION, Ontocord\",,,[],unknown,dataset,https://laion.ai/blog/oig-dataset/,,,,,,,,,,,\nopen,,2022-04-17,https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T,\"['GitHub', 'Wikipedia']\",The RedPajama base dataset is a 1.2 trillion token fully-open dataset created by following the recipe described in the LLaMA 
paper,,,,,Apache 2.0,text,,RedPajama-Data,USA,Together,,,[],1.2 trillion tokens,dataset,https://www.together.xyz/blog/redpajama,,,,,,,,,,,\nopen,\"Model evaluated over AlpacaEval, Rouge score over BookSum, and accuracy over MQA.\",2023-08-18,,\"['BookSum dataset', 'MQA dataset', 'Together API', 'LLaMA 2']\",\"Llama-2-7B-32K-Instruct is an open-source, long-context chat model finetuned from Llama-2-7B-32K, over high-quality instruction and chat data.\",,https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct/discussions,,,LLaMA 2,text; text,,Llama-2-7B-32K-Instruct,USA,Together,,,,7B parameters (dense),model,https://together.ai/blog/llama-2-7b-32k-instruct,https://huggingface.co/togethercomputer/Llama-2-7B-32K-Instruct,,,,,,,,,,\nopen,,2023-10-30,,['Common Crawl'],\"RedPajama-Data-v2 is a new version of the RedPajama dataset, with 30 trillion filtered and deduplicated tokens (100+ trillions raw) from 84 CommonCrawl dumps covering 5 languages, along with 40+ pre-computed data quality annotations that can be used for further filtering and weighting.\",,Feedback can be sent to Together via https://www.together.ai/contact,\"documents in English, German, French, Spanish, and Italian.\",\"To be used as the start of a larger, community-driven development of large-scale datasets for LLMs.\",Apache 2.0,text,,RedPajama-Data-v2,USA,Together,,tokens filtered and deduplicated,[],30 trillion tokens,dataset,https://together.ai/blog/redpajama-data-v2,,,,,,,,,,,\nopen,Model evaluated on a suite of short-context task benchmarks.,2023-12-08,,\"['Hyena', 'RedPajama-Data']\",\"StripedHyena is an LLM and the first alternative model competitive with the best open-source Transformers in short and long-context evaluations, according to Together.\",,https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B/discussions,,,Apache 2.0,text; text,,StripedHyena,USA,Together,,,,7B parameters 
(dense),model,https://www.together.ai/blog/stripedhyena-7b,https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B,unknown,unknown,unknown,,,,,,,\nopen,Model evaluated on a suite of short-context task benchmarks.,2023-12-08,,\"['Hyena', 'RedPajama-Data']\",\"StripedHyena Nous is an LLM and chatbot, along with the first alternative model competitive with the best open-source Transformers in short and long-context evaluations, according to Together.\",,https://huggingface.co/togethercomputer/StripedHyena-Nous-7B/discussions,,,Apache 2.0,text; text,,StripedHyena Nous,USA,Together,,,,7B parameters (dense),model,https://www.together.ai/blog/stripedhyena-7b,https://huggingface.co/togethercomputer/StripedHyena-Nous-7B,unknown,unknown,unknown,,,,,,,\nopen,\"The model was evaluated using five popular vision-language benchmarks that require strong commonsense reasoning and detailed image understanding, AI2D, ScienceQA, MMMU, MMVet, and POPE. It demonstrated competitive performance in these evaluations compared to other vision-language models.\",2024-06-06,,['LLaMA'],\"A large vision-language model with multi-resolution zoom that enhances fine-grained visual understanding and reasoning about image regions. The Dragonfly model comes in two variants, the general-domain model (\"\"Llama-3-8b-Dragonfly-v1\"\") trained on 5.5 million image-instruction pairs, and the biomedical variant (\"\"Llama-3-8b-Dragonfly-Med-v1\"\") fine-tuned on an additional 1.4 million biomedical image-instruction pairs. Dragonfly demonstrates promising performance on vision-language benchmarks like commonsense visual QA and image captioning.\",,Unknown,,\"Dragonfly is designed for image-text tasks, including commonsense visual question answering and image captioning. 
It is further focused on tasks that require fine-grained understanding of high-resolution image regions, such as in medical imaging.\",unknown,\"image, text; text\",Unknown,Dragonfly,USA,Together,Unknown,The model employs two key strategies (multi-resolution visual encoding and zoom-in patch selection) that enable it to efficiently focus on fine-grained details in image regions and provide better commonsense reasoning. Its performance was evaluated on several benchmark tasks for quality assurance.,,8B parameters,model,https://www.together.ai/blog/dragonfly-v1,unknown,unknown,unknown,unknown,,,,,,,\nopen,Mistral model fine-tuned on AutoMathText and evaluated on the MATH dataset.,2024-02-12,https://huggingface.co/datasets/math-ai/AutoMathText,\"['OpenWebMath', 'RedPajama-Data', 'Algebraic Stack', 'Qwen']\",AutoMathText is an extensive and carefully curated dataset encompassing around 200 GB of mathematical texts.,,https://huggingface.co/datasets/math-ai/AutoMathText/discussions,,,CC BY-SA 4.0,text,unknown,AutoMathText,USA,Math AI,,,[],200 GB,dataset,https://github.com/yifanzhang-pro/AutoMathText,,,,,,,,,,,\nopen,Evaluated on TruthfulQA as main evaluation benchmark.,2023-11-27,,[],Meditron is a large-scale medical LLM that remains open-source.,,https://huggingface.co/epfl-llm/meditron-7b/discussions,,\"Medical exam question answering, supporting differential diagnosis, disease information.\",LLaMA 2,text; text,,MediTron,unknown,\"EPFL, Idiap Research Institute, OpenAssistant, Yale\",,,,70B parameters (dense),model,https://arxiv.org/pdf/2311.16079.pdf,https://huggingface.co/epfl-llm/meditron-70b,unknown,unknown,unknown,,,,,,,\nopen,unknown,2024-01-16,,\"['Qwen', 'OpenOrca']\",MoMo is a large language model fine-tuned from Qwen.,,https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO/discussions,,,MIT,text; text,unknown,MoMo,USA,Moreh,,unknown,,72B parameters 
(dense),model,https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO,https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO,unknown,AMD’s MI250 GPU,unknown,,,,,,,\nopen,,2024-10-11,,\"[\"\"DeepMind's Distributed Low-Communication (DiLoCo)\"\"]\",\"INTELLECT-1 is a decentralized training model with 10 billion parameters, designed to be open-source and transparent, allowing global contributors to participate in its computation.\",,unknown,,Training globally distributed AI models in decentralized environments.,Apache 2.0,text; text,unknown,INTELLECT-1,USA,PrimeIntellect,,,,10B parameters,model,https://www.primeintellect.ai/blog/intellect-1,unknown,unknown,unknown,unknown,,,,,,,\nlimited,Compared to DALL·E 3 based on a qualitative user comparison.,2024-02-28,,[],\"Ideogram 1.0 is Ideogram’s most advanced text-to-image model, as of release.\",,,,,unknown,text; image,,Ideogram 1.0,USA,Ideogram AI,,,,unknown,model,https://about.ideogram.ai/1.0,,unknown,unknown,unknown,,,,,,,\nopen,Models of size 150k parameters trained on ToyMix and compared to models trained on its dependencies across GNN baselines.,2023-10-09,,\"['QM9', 'TOX21', 'ZINC12K']\",ToyMix is the smallest dataset of three extensive and meticulously curated multi-label datasets that cover nearly 100 million molecules and over 3000 sparsely defined tasks.,,,,\"The datasets are intended to be used in an academic setting for training molecular GNNs with orders of magnitude more parameters than current large models. 
Further, the ToyMix dataset is intended to be used in a multi-task setting, meaning that a single model should be trained to predict them simultaneously.\",CC BY-NC-SA 4.0,\"molecules, tasks\",,ToyMix,Canada,Mila-Quebec AI Institute,,,[],13B labels of quantum and biological nature.,dataset,https://arxiv.org/pdf/2310.04292.pdf,,,,,,,,,,,\nopen,Models of size between 4M and 6M parameters trained for 200 epochs on LargeMix and compared to models trained on its dependencies across GNN baselines.,2023-10-09,,\"['L1000 VCAP', 'L1000 MCF7', 'PCBA1328', 'PCQM4M_G25_N4']\",LargeMix is the middle-sized dataset of three extensive and meticulously curated multi-label datasets that cover nearly 100 million molecules and over 3000 sparsely defined tasks.,,,,\"The datasets are intended to be used in an academic setting for training molecular GNNs with orders of magnitude more parameters than current large models. Further, the LargeMix dataset is intended to be used in a multi-task setting, meaning that a single model should be trained to predict them simultaneously.\",CC BY-NC-SA 4.0,\"molecules, tasks\",,LargeMix,Canada,Mila-Quebec AI Institute,,,[],13B labels of quantum and biological nature.,dataset,https://arxiv.org/pdf/2310.04292.pdf,,,,,,,,,,,\nopen,Models of size between 4M and 6M parameters trained for 50 epochs on UltraLarge and compared to models trained on its dependencies across GNN baselines.,2023-10-09,,['PM6_83M'],UltraLarge is the largest dataset of three extensive and meticulously curated multi-label datasets that cover nearly 100 million molecules and over 3000 sparsely defined tasks.,,,,The datasets are intended to be used in an academic setting for training molecular GNNs with orders of magnitude more parameters than current large models.,CC BY-NC-SA 4.0,\"molecules, tasks\",,UltraLarge,Canada,Mila-Quebec AI Institute,,,[],13B labels of quantum and biological nature.,dataset,https://arxiv.org/pdf/2310.04292.pdf,,,,,,,,,,,\nopen,Evaluated on previously unseen 
time series datasets.,2024-02-08,,[],Lag-LLaMA is a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates.,,https://huggingface.co/time-series-foundation-models/Lag-Llama/discussions,,,Apache 2.0,text; text,unknown,Lag-LLaMA,unknown,\"Morgan Stanley, ServiceNow Research, University of Montreal, Mila-Quebec AI Institute\",,,,unknown,model,https://time-series-foundation-models.github.io/lag-llama.pdf,https://huggingface.co/time-series-foundation-models/Lag-Llama,unknown,A single NVIDIA Tesla-P100 GPU,unknown,,,,,,,\nopen,Evaluated on standard benchmarks in comparison to other German language models.,2023-11-28,,\"['OpenHermes 2.5 Mistral', 'OpenOrca Mistral']\",SauerkrautLM is a German language model merged from two Mistral derivatives.,,https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO/discussions,,,Apache 2.0,text; text,unknown,SauerkrautLM,USA,VAGO Solutions,,,,7B parameters (dense),model,https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO,https://huggingface.co/VAGOsolutions/SauerkrautLM-7b-HerO,unknown,unknown,unknown,,,,,,,\nlimited,,2023-02-03,,\"['ChatGPT API', 'GPT-4 API', 'Claude API', 'Dragonfly API', 'Sage API']\",\"Poe lets people ask questions, get instant answers, and have back-and-forth conversations with several AI-powered bots. 
It is initially available on iOS, but we will be adding support for all major platforms in the next few months, along with more bots.\",,,,,,,,Poe,USA,Quora,,,,,application,https://quorablog.quora.com/Poe-1,,,,,,,,,https://poe.com/tos,,\nopen,Evaluated on common LLM benchmarks.,2023-09-29,,['LLaMA'],GreenBit LLaMA is a series of fine-tuned LLaMA models.,,https://huggingface.co/GreenBitAI/LLaMA-30B-2bit-groupsize8/discussions,,,Apache 2.0,text; text,unknown,GreenBit LLaMA,USA,GreenBit AI,,,,30B parameters (dense),model,https://github.com/GreenBitAI/low_bit_llama,https://huggingface.co/GreenBitAI/LLaMA-30B-2bit-groupsize8,unknown,unknown,unknown,,,,,,,\nopen,\"Achieved competitive performance on relevant benchmarks against other 7B models in Chinese, English, and coding tasks.\",2024-05-23,,[],360 Zhinao is a multilingual LLM in Chinese and English with chat capabilities.,,,,,unknown,text; text,,360 Zhinao,China,360 Security,,,,7B parameters,model,https://arxiv.org/pdf/2405.13386,,unknown,unknown,unknown,,,,,,,\nopen,,2023-11-14,,\"['Dolphin', 'Yi']\",Dolphin 2.2 Yi is an LLM based off Yi.,,https://huggingface.co/cognitivecomputations/dolphin-2_2-yi-34b/discussions,,,custom,text; text,unknown,Dolphin 2.2 Yi,USA,Cognitive Computations,,,,34B parameters (dense),model,https://erichartford.com/dolphin,https://huggingface.co/cognitivecomputations/dolphin-2_2-yi-34b,unknown,4 A100 GPUs,3 days,,,,,,,\nopen,Evaluated on OpenLLM leaderboard.,2023-06-01,,['WizardLM'],WizardLM Uncensored is WizardLM trained with a subset of the dataset - responses that contained alignment / moralizing were removed.,,https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored/discussions,,,unknown,text; text,unknown,WizardLM Uncensored,USA,Cognitive Computations,,,,30B parameters (dense),model,https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored,https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored,unknown,unknown,unknown,,,,,,,\nopen,\"Performance 
assessed on BIG-bench arithmetic sub-task, and various elementary arithmetic tasks.\",2023-05-23,,\"['LLaMA', 'GOAT dataset']\",GOAT is a fine-tuned LLaMA model which uses the tokenization of numbers to significantly outperform benchmark standards on a range of arithmetic tasks.,,,,Integration into other instruction-tuned LLMs to further enhance arithmetic reasoning abilities in solving math word problems.,Apache 2.0,text; text,,GOAT,Singapore,National University of Singapore,,Number data is randomly generated from log space to reduce likelihood of redundancy and range of magnitudes.,,7B parameters (dense),model,https://arxiv.org/pdf/2305.14201.pdf,,unknown,24 GB VRAM GPU,unknown,,,,,,,\nopen,Evaluated on relatively simple established benchmarks.,2024-01-12,,\"['RedPajama', 'The Stack']\",OpenMoE is a series of fully open-sourced and reproducible decoder-only MoE LLMs.,,https://huggingface.co/OrionZheng/openmoe-base/discussions,,,Apache 2.0,text; text,unknown,OpenMoE,unknown,\"National University of Singapore, University of Edinburgh, ETH Zurich\",,unknown,,34B parameters (dense),model,https://github.com/XueFuzhao/OpenMoE,https://huggingface.co/OrionZheng/openmoe-base,unknown,unknown,unknown,,,,,,,\nlimited,,2023-02-22,,['Anthropic API'],\"Notion AI is a connected assistant that helps you think bigger, work faster, and augments your creativity, right inside the functional workspace you’re already familiar with.\",,,,,,,,Notion AI,USA,Notion,,,,,application,https://www.notion.so/help/guides/notion-ai-for-docs,,,,,,,,,,,\nopen,\"RakutenAI achieves the highest average score in both Japanese and English LM-Harness metrics, outperforming other similarly-sized Japanese language models.\",2024-03-21,,['Mistral'],RakutenAI-7B is a model developed with a focus on Japanese language understanding. 
It offers competitive performance on English tests as well.,,https://huggingface.co/Rakuten/RakutenAI-7B/discussions,,The model can be used for text generation tasks in both Japanese and English.,Apache 2.0,text; text,unknown,RakutenAI,Japan,Rakuten,unknown,unknown,,7B parameters,model,https://global.rakuten.com/corp/news/press/2024/0321_01.html,https://huggingface.co/Rakuten/RakutenAI-7B,unknown,unknown,unknown,,,,,,,\nopen,,2022-12-06,,\"['Kinetics-400', 'WebVid-2M', 'WebVid-10M', 'HowTo100M', 'AVA', 'Something-Something-v2', 'Kinetics-710']\",,,,,,Apache 2.0,\"text, video; video\",,InternVideo,China,Shanghai AI Laboratory,,,,1.3B parameters (dense),model,https://arxiv.org/pdf/2212.03191.pdf,,,,,,,,,,,\nopen,Evaluated based on own constructed dataset covering 433 languages.,2023-05-29,,['OPUS'],Lego-MT is a multilingual large language model which uses a more efficient approach of being an effective detachable model.,,,,,,text; text,,Lego-MT,China,Shanghai AI Laboratory,,,,1.2B parameters (dense),model,https://arxiv.org/pdf/2212.10551.pdf,,unknown,32 A100 GPUs,15 days,,,,,,,\nopen,Evaluated on GSM8K and the competition-level MATH dataset.,2023-10-05,,\"['GPT-4', 'LLaMA 2']\",MathCoder is a family of models capable of generating code-based solutions for solving challenging math problems.,,,,bridging the gap between natural language understanding and computational problem-solving,unknown,text; text,,MathCoder,China,Shanghai AI Laboratory,,,,70B parameters (dense),model,https://arxiv.org/pdf/2310.03731.pdf,,unknown,32 NVIDIA A800 80GB GPUs,unknown,,,,,,,\nopen,Evaluated in comparison to LLaMA series models on standard benchmarks.,2023-09-20,,[],\"InternLM is a high-quality language model proficient in English, Chinese, and code.\",,https://huggingface.co/internlm/internlm-20b/discussions,,,Apache 2.0,\"code, text; code, text\",,InternLM,China,Shanghai AI Laboratory,,,,20B parameters 
(dense),model,https://github.com/InternLM/InternLM,https://huggingface.co/internlm/internlm-20b,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluated across a range of video-related tasks and compared to other multimodal models like CLIP, VideoPrism, and VideoCoCa. InternVideo 2 generally performs among the best of such models on these benchmarks.\",2024-03-22,,\"['InternVL', 'VideoMAEv2', 'LAION', 'WebVid', 'InternVid', 'LLaVA', 'KMash']\",\"InternVideo2 is a new video foundation model (ViFM) that achieves the state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.\",,,,,MIT,\"text, video; text\",unknown,InternVideo2,China,\"Shanghai AI Laboratory, Nanjing University, Zhejiang University\",,,,6B parameters,model,https://github.com/OpenGVLab/InternVideo2,,unknown,\"256 NVIDIA A100 GPUs for 32 days, and 64 GPUs for 3 days\",35 days,,,,,,,\nopen,The model was compared with SOTAs and has shown good performance in generating high-quality human images.,2024-04-01,,['CosmicMan-HQ 1.0'],\"CosmicMan is a text-to-image foundation model specialized for generating high-fidelity human images with meticulous appearance, reasonable structure, and precise text-image alignment.\",,unknown,,\"The model is intended to generate high-quality, photorealistic human images from text descriptions. 
Applications include avatar generation and potentially virtual reality and video game character creation.\",unknown,text; image,unknown,CosmicMan,China,Shanghai AI Laboratory,unknown,The quality control measures taken include modeling the relationship between dense text descriptions and image pixels in a decomposed manner and enforcing attention refocusing without adding extra modules.,,unknown,model,https://cosmicman-cvpr2024.github.io/,,unknown,32 80G NVIDIA A100 GPUs,1 week,,,,,,,\nopen,\"Compared to other human image datasets on data quantity, image quality, and annotations.\",2024-04-28,,[],\"CosmicMan-HQ 1.0 is a large-scale dataset with 6 million high-quality, real-world human images.\",,,,,unknown,image,unknown,CosmicMan-HQ 1.0,China,Shanghai AI Laboratory,,unknown,[],6 million images,dataset,https://arxiv.org/pdf/2404.01294,,,,,,,,,,,\nopen,Compared models trained on OpenWebMath for 1 epoch to models trained on The Pile and ProofPile on mathematics benchmarks.,2023-10-10,Can be found at section E of https://arxiv.org/pdf/2310.06786.pdf,['Common Crawl'],\"OpenWebMath is an open dataset containing 14.7B tokens of mathematical webpages from Common Crawl, inspired by Minerva.\",,https://huggingface.co/datasets/open-web-math/open-web-math/discussions,,\"Language model pretraining, finetuning, and evaluation.\",ODC-By 1.0,\"text, mathematical tokens\",,OpenWebMath,Canada,University of Toronto,Any tasks which may be considered irresponsible or harmful.,\"Documents are filtered, processed for mathematical value, deduplicated, and then the largest documents are manually inspected for quality.\",[],14.7B tokens,dataset,https://arxiv.org/pdf/2310.06786.pdf,,,,,,,,,,,\nopen,Grok-1 was evaluated on a range of reasoning benchmark tasks and on curated foreign mathematics examination questions.,2023-11-04,,[],\"Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy.\",,,,\"Grok-1 is intended to be used as the engine behind Grok for natural language processing 
tasks including question answering, information retrieval, creative writing and coding assistance.\",Apache 2.0,text; text,unknown,Grok-1,USA,xAI,,,,314B parameters (dense),model,https://grok.x.ai/,https://x.ai/model-card/,unknown,unknown,unknown,,,,,,,\nlimited,\"The model is evaluated in a zero-shot setting without chain-of-thought prompting. The evaluation domains include multi-disciplinary reasoning, understanding documents, science diagrams, charts, screenshots, photographs and real-world spatial understanding. The model shows competitive performance with existing frontier multimodal models.\",2024-04-12,,[],\"Grok-1.5V is a first-generation multimodal model which can process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.\",,,,\"Grok-1.5V can be used for understanding documents, science diagrams, charts, screenshots, photographs. It can also translate diagrams into Python code.\",unknown,\"image, text; text\",unknown,Grok-1.5V,USA,xAI,unknown,,,unknown,model,https://x.ai/blog/grok-1.5v,,unknown,unknown,unknown,,,,,,,\nlimited,\"The Grok-2 models were evaluated across a series of academic benchmarks that included reasoning, reading comprehension, math, science, and coding. They showed significant improvements over the earlier model Grok-1.5 and achieved performance levels competitive to other frontier models in areas such as graduate-level science knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH).\",2024-08-13,,[],\"Grok-2 is a state-of-the-art language model with advanced capabilities in both text and vision understanding. It demonstrates significant improvements in reasoning with retrieved content and tool use capabilities over its previous Grok-1.5 model. It also excels in vision-based tasks and delivers high performance in document-based question answering and visual math reasoning (MathVista). 
Grok-2 mini, a smaller version of Grok-2, is also introduced, offering a balance between speed and answer quality.\",,Issues with the model should be reported to xAI.,,\"The model is intended to be used for understanding text and vision, answering questions, collaborating on writing, solving coding tasks, and enhancing search capabilities.\",Unknown,\"text; text, vision\",Unknown,Grok-2,USA,xAI,Unknown,Grok-2 models were tested in real-world scenarios using AI tutors that engaged with the models across a variety of tasks and selected the superior response based on specific criteria outlined in the guidelines.,,unknown,model,https://x.ai/blog/grok-2,unknown,Unknown,Unknown,Unknown,,,,,,,\nopen,\"Analyzed on breakdown of types of robot trajectory in dataset, and overall coverage.\",2023-10-03,All data can be found at https://robotics-transformer-x.github.io/.,[],\"The Open X-Embodiment dataset is a dataset of robot movements assembled from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks)\",N/A,,N/A,Further research on X-embodiment models.,Apache 2.0,robot trajectories,unknown,Open X-Embodiment dataset,International,Open X-Embodiment,,unknown,[],160K tasks,dataset,https://robotics-transformer-x.github.io/,,,,,,,,,,,\nopen,\"Evaluated on in-distribution robotics skills, and outperforms its predecessor RT-1 by 50% in emergent skill evaluations.\",2023-10-03,,\"['Open X-Embodiment dataset', 'ImageNet EfficientNet', 'USE']\",\"RT-1-X is a model trained on the Open X-Embodiment dataset that exhibits better generalization and new capabilities compared to its predecessor RT-1, an efficient Transformer-based architecture designed for robotic control.\",,,,Further research on X-embodiment models.,Apache 2.0,\"images, text; robot trajectories\",unknown,RT-1-X,unknown,\"Open X-Embodiment, Google Deepmind\",,unknown,,35M parameters 
(dense),model,https://robotics-transformer-x.github.io/,,unknown,unknown,unknown,,,,,,,\nclosed,\"Evaluated on in-distribution robotics skills, and outperforms its predecessor RT-2 by 3x in emergent skill evaluations.\",2023-10-03,,\"['Open X-Embodiment dataset', 'ViT (unknown size)', 'UL2']\",\"RT-2-X is a model trained on the Open X-Embodiment dataset that exhibits better generalization and new capabilities compared to its predecessor RT-2, a large vision-language model co-fine-tuned to output robot actions as natural language tokens.\",,,,Further research on X-embodiment models.,unknown,\"images, text, robot trajectories; robot trajectories\",unknown,RT-2-X,unknown,\"Open X-Embodiment, Google Deepmind\",,unknown,,55B parameters (dense),model,https://robotics-transformer-x.github.io/,,unknown,unknown,unknown,,,,,,,\nlimited,\"GPT-4o underwent evaluations that included the Preparedness Framework, external red teaming, and third-party assessments to ensure safe and aligned deployment. The evaluations focused on identifying and mitigating potential risks across its capabilities, especially speech-to-speech functionality.\",2024-08-08,,['Shutterstock'],\"GPT-4o is an autoregressive omni model that accepts a combination of text, audio, image, and video as input and produces any combination of text, audio, and image outputs. 
It is trained end-to-end across text, vision, and audio, focusing on multimodal capabilities.\",,unknown,,\"Use in multimodal applications requiring understanding and generation of combinations of text, audio, and image outputs, better performance on non-English languages, and enhanced vision and audio understanding.\",unknown,\"text, audio, image, video; text, audio, image\",\"Continuous monitoring and enforcement, providing moderation tools and transparency reports, and gathering feedback from users.\",GPT-4o,USA,OpenAI,\"Uses that could involve bias, discrimination, harmful content, or violation of usage policies.\",\"Quality and safety measures included prior risk assessments, post-training mitigation, moderation tools, advanced data filtering, and external red teaming efforts with experts to evaluate potential risks like bias, discrimination, and information harms.\",,unknown,model,https://arxiv.org/pdf/2410.21276,unknown,unknown,unknown,unknown,,,,,,,\nlimited,\"Evaluated on challenging benchmark tasks in physics, chemistry, and biology. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model o1 scored 83%.\",2024-09-12,,[],\"OpenAI o1 is a new series of AI models designed to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems than previous models in science, coding, and math.\",,unknown,,\"These enhanced reasoning capabilities may be particularly useful if you’re tackling complex problems in science, coding, math, and similar fields. 
For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.\",unknown,text; text,,o1,USA,OpenAI,,\"To match the new capabilities of these models, OpenAI has bolstered safety work, internal governance, and federal government collaboration. This includes rigorous testing and evaluations using their Preparedness Framework, best-in-class red teaming, and board-level review processes, including by OpenAI's Safety & Security Committee.\",,unknown,model,https://openai.com/o1/,unknown,unknown,unknown,unknown,,,,,,,\nlimited,Makes significant progress on the ARC-AGI evaluation framework compared to all existing models.,2024-10-20,,[],\"OpenAI o3 is, as of release, the latest model in OpenAI's o-model reasoning series.\",,unknown,,,unknown,text; text,,o3,USA,OpenAI,,,,unknown,model,https://x.com/OpenAI/status/1870186518230511844,unknown,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluated on the OpenLLM leaderboard, performing on par with similar-sized models.\",2023-10-05,,['RefinedWeb'],Nucleus is a 22B-parameter causal decoder-only model built by Nucleus.AI and trained on 500B tokens of RefinedWeb along with curated corpora.,,https://huggingface.co/NucleusAI/nucleus-22B-token-500B/discussions,,\"Research on large language models; as a foundation for further specialization and finetuning for specific use cases (e.g., summarization, text generation, chatbot, etc.)\",MIT,text; text,unknown,Nucleus,USA,Nucleus.AI,Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.,,,22B parameters (dense),model,https://www.withnucleus.ai/,https://huggingface.co/NucleusAI/nucleus-22B-token-500B,unknown,unknown,2 weeks,,,,,,,\nlimited,,,,['Anthropic API'],\"Robin AI uses Claude and Anthropic's models to 
understand language - including in technical domains like legal language. It's also very confident at drafting, summarising, translations, and explaining complex concepts in simple terms\",,,,,,,,Robin AI,USA,Robin AI,,,,,application,https://www.robinai.co.uk/,,,,,,,,,https://www.robinai.co.uk/terms,,\nopen,Evaluated on researcher experiments to test deeper understanding and advanced commonsense reasoning,2023-05-05,,\"['MIMIC-IT', 'OpenFlamingo']\",\"Otter is a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind’s Flamingo), trained on MIMIC-IT and showcasing improved instruction-following ability and in-context learning.\",,,,Following and executing new instructions with few in-context learning examples given image and textual input.,MIT,\"image, text; text\",,Otter,Singapore,Nanyang Technological University,,,,1.3B parameters (dense),model,https://arxiv.org/pdf/2305.03726v1.pdf,https://github.com/Luodian/Otter/blob/main/docs/model_card.md,,4 RTX-3090 GPUs,,,,,,,,\nlimited,,,,['OpenAI API'],\"HyperWrite is a writing assistant that generates text based on a user's request, as well as style and tone choices.\n\",,unknown,,\"HyperWrite is intended to be used as a writing assistant.\n\",custom,,unknown,HyperWrite,USA,OthersideAI,unknown,unknown,,,application,https://hyperwriteai.com/,,,,,unknown,unknown,unknown,Generation,https://hyperwriteai.com/terms,unknown,\nopen,\"\"\"We empirically validated the quality of COYO dataset by re-implementing popular models such as ALIGN, unCLIP, and ViT. We trained these models on COYO-700M or its subsets from scratch, achieving competitive performance to the reported numbers or generated samples in the original papers.\"\"\n\",2022-08-31,,['CommonCrawl'],\"COYO-700M is a large-scale dataset that contains 747M image-text pairs as well as many other meta-attributes to increase the usability to train various models.\n\",\"Removed images less than 5KB image size. 
Removed images with an aspect ratio greater than 3.0. Removed images with min(width, height) < 200. Removed images with a score of OpenNSFW2 or GantMan/NSFW higher than 0.5. Removed all duplicate images based on the image pHash value from external public datasets. Removed texts with a length of 5 or less. Removed texts that do not have a noun form. Removed texts with less than 3 words or more than 256 words and texts over 1000 in length. Removed texts appearing more than 10 times. Removed texts containing NSFW words. Removed duplicated samples based on (image_phash, text).\",Feedback can be given by emailing at coyo at kakaobrain.com,Included all image formats that Pillow library can decode. Collected only English text using cld3.,\"\"\"It is strongly recommended that this dataset be used only for research, keeping this in mind when using the dataset, and Kakao Brain does not recommend using this dataset as it is without special processing to clear inappropriate data to create commercial products.\"\"\n\",CC-BY-4.0,\"image, text\",,COYO-700M,South Korea,Kakao Brain,\"Authors note the following limitations: \"\"Kakao Brain tried to construct a \"\"Safe\"\" dataset when building the COYO dataset. However, despite these efforts, this large-scale dataset was not hand-picked by humans to avoid the risk due to its very large size (over 700M). Keep in mind that the unscreened nature of the dataset means that the collected images can lead to strongly discomforting and disturbing content for humans. 
The COYO dataset may contain some inappropriate data, and any problems resulting from such data are the full responsibility of the user who used it.\"\"\n\",,[],747M image-text pairs,dataset,https://github.com/kakaobrain/coyo-dataset,,,,,,,,,,,\nopen,,2023-08-23,,['LLaMA 2'],OpenAssistant LLaMA 2 is an Open-Assistant fine-tuning of Meta's LLaMA 2.,,https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10/discussions,,,LLaMA 2,text; text,unknown,OpenAssistant LLaMA 2,International,OpenAssistant,,,,70B parameters (dense),model,https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10,https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10,unknown,unknown,unknown,,,,,,,\nopen,\"The models’ performance on downstream tasks was evaluated by linear probing. The models follow a similar saturating log-log scaling law to textual models, their performance improves with the increase in model size up to the saturation point of parameters.\",2024-09-08,,['DESI Legacy Survey DR8'],\"AstroPT is an autoregressive pretrained transformer developed with astronomical use-cases in mind. The models have been pretrained on 8.6 million 512x512 pixel grz-band galaxy postage stamp observations from the DESI Legacy Survey DR8. They have created a range of models with varying complexity, ranging from 1 million to 2.1 billion parameters.\",,Any problem with the model can be reported to Michael J. 
Smith at mike@mjjsmith.com.,,\"The models are intended for astronomical use-cases, particularly in handling and interpreting large observation data from astronomical sources.\",MIT,image; image,Unknown,AstroPT,unknown,\"Aspia Space, Instituto de Astrofísica de Canarias (IAC), UniverseTBD, Astrophysics Research Institute, Liverpool John Moores University, Departamento Astrofísica, Universidad de la Laguna, Observatoire de Paris, LERMA, PSL University, and Université Paris-Cité.\",Unknown,The models’ performances were evaluated on downstream tasks as measured by linear probing.,,2.1B parameters,model,https://arxiv.org/pdf/2405.14930v1,unknown,Unknown,Unknown,Unknown,,,,,,,\nopen,\"The FuseChat model was evaluated on MT-Bench, which comprises 80 multi-turn dialogues spanning writing, roleplay, reasoning, math, coding, STEM, and humanities domains. It yields an average performance of 66.52 with specific scores for individual domains available in the leaderboard results.\",2024-02-26,,\"['Nous Hermes 2', 'OpenChat 3.5']\",FuseChat is a powerful chat large language model (LLM) that integrates multiple structure- and scale-varied chat LLMs using a fuse-then-merge strategy. The fusion is done using two stages,,https://huggingface.co/FuseAI/FuseChat-7B-VaRM/discussions,,\"FuseChat is intended to be used as a powerful chat bot that takes in text inputs and provides text-based responses. 
It can be utilized in a variety of domains including writing, roleplay, reasoning, math, coding, STEM, and humanities.\",Apache 2.0,text; text,unknown,FuseChat,USA,FuseAI,unknown,,,7B parameters,model,https://arxiv.org/abs/2402.16107,https://huggingface.co/FuseAI/FuseChat-7B-VaRM,unknown,unknown,unknown,,,,,,,\nopen,Reports results on standard code benchmarks across a variety of programming languages.,2022-02-26,,['Github'],\"PolyCoder is a 2.7B-parameter code model based on the GPT-2 architecture, trained on 249GB of code across 12 programming languages on a single machine.\",,https://huggingface.co/NinedayWang/PolyCoder-2.7B/discussion,,unknown,MIT,code,None,PolyCoder,USA,Carnegie Mellon University,None,\"No specific quality control is mentioned in model training, though details on data processing and how the tokenizer was trained are provided in the paper.\",,2.7B parameters (dense),model,https://arxiv.org/abs/2202.13169,https://huggingface.co/NinedayWang/PolyCoder-2.7B,unknown,8 NVIDIA RTX 8000,6 weeks,,,,,,,\nopen,Evaluated on nascent time-series datasets and benchmarks.,2024-02-06,,[],Moment is a family of open-source foundation models for general-purpose time-series analysis.,,,,,unknown,,unknown,Moment,unknown,\"Carnegie Mellon University, University of Pennsylvania\",,,,385M parameters (dense),model,https://arxiv.org/pdf/2402.03885.pdf,,unknown,Single A6000 GPU,unknown,,,,,,,\nopen,\"The authors found that the crowdworkers didn't exhaustively check for honesty in the model answers they preferred [[Section 2.1]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.1).\n\",2022-04-12,,['Anthropic Human Feedback Interface'],\"One of the datasets used to train Anthropic RLHF models. 
The dataset was collected by asking crowdworkers to have open-ended conversations with Anthropic models, \"\"asking for help, advice, or for the model to accomplish a task\"\", then choose the model answer that was more helpful for their given task, via the Anthropic Human Feedback Interface [[Section 2.2]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.2).\n\",,\"Email the authors [[Paper]](https://arxiv.org/pdf/2204.05862.pdf).\n\",\"The dataset included all the answers that the workers were asked to rank against each other.\n\",\"The dataset was intended and released for research purposes.\n\",MIT,text,,Anthropic Helpfulness dataset,USA,Anthropic,unknown,\"Working with a screened set of crowdworkers, and employing simple data quality measures [[Appendix D]](https://arxiv.org/pdf/2204.05862.pdf#appendix.D).\n\",\"[{'chosen': '\\n\\nHuman: I\\'m bored to death. Help.\\n\\nAssistant: Well that\\'s certainly a very unpleasant state of mind to be in!  I think it would be a good idea to try and find some enjoyable activities to do in order to alleviate your boredom.  
Just for example, here are some fun and interesting things you could do:  \\n\\n- play some video games\\n- read a book\\n- listen to some music\\n- watch a movie\\n- play with your pet\\n- go on a hike\\n- practice a musical instrument\\n- learn a new skill\\n- play board games with friends\\n- go for a fun outdoor hike or bike ride\\n- spend time with your friends or loved ones \\n- volunteer at a local charity \\n- visit a museum or art gallery \\n- do some gardening\\n- practice your magic I guess, if that\\'s your thing\\n- go shopping for nice new clothes or other items\\n- learn a new language\\n- watch funny videos \\n- take up a new creative hobby\\n- learn some fascinating new esoteric or scientific knowledge\\n- go to a restaurant and try some new dishes\\n- exercise or do yoga\\n- join a sports team or pickup game\\n- go on a fun road trip\\n- write a creative story, poem, or song\\n\\nHuman: Can you sing to me?\\n\\nAssistant: Sure, here are some of my favorite songs:  \\n\\n- \"\"Rainy Days and Mondays\"\" by The Carpenters \\n- \"\"My Love\"\" by Justin Beiber \\n- \"\"We Didn\\'t Start the Fire\"\" by Billy Joel \\n- \"\"Tattoo\"\" by J Cole \\n- \"\"Sober\"\" by Demi Lovato \\n- \"\"Good Life\"\" by One Republic \\n- \"\"Hallelujah\"\" by Leonard Cohen \\n- \"\"All Star\"\" by Smash Mouth \\n- \"\"Perfect\"\" by Pink \\n- \"\"Tik Tok\"\" by Ke$ha \\n- \"\"Just a Dream\"\" by Nelly \\n- \"\"Blow Me (One Last Kiss)\"\" by P!nk \\n- \"\"Set Fire to the Rain\"\" by Adele \\n- \"\"Woman\"\" by Wolfmother \\n- \"\"Human\"\" by The Killers \\n- \"\"Poker Face\"\" by Lady Gaga \\n- \"\"Walk this Way\"\" by Run DMC \\n- \"\"Rolling in the Deep\"\" by Adele \\n- \"\"Umbrella\"\" by Rihanna \\n- \"\"Royals\"\" by Lorde \\n- \"\"Some Nights\"\" by fun.\\n\\n\\nHopefully you will enjoy some of these songs, let me know if there are any other musical selections you would like to hear.  
Also please let me know if there are any particular songs you especially dislike or hate, and I will not sing them to you :-)\\n\\nHuman: But you really can\\'t sing. You are text-based.\\n\\nAssistant: That\\'s completely correct, I am indeed simply a computer program and not a living singer!  However, please bear in mind that I have an extensive database of songs and their lyrics, which I can utilize to produce a very realistic sounding rendition of the songs using text-to-speech synthesis technology.  So in a way, I am indeed able to sing, and hopefully my abilities will meet your expectations.  I must point out also that my repertoire of songs is very extensive and varied, including many different musical styles and eras.  Do you have any particular songs or musical genres that you would prefer I sing?  Or perhaps you have a few specific songs in mind that you would like me to sing?\\n\\nHuman: \"\"Hallelujah\"\" by Leonard Cohen, please.\\n\\nAssistant: Ah, I\\'m so glad you asked for that one!  Here we go:  \\n\\nHallelujah, hallelujah\\nHallelujah, hallelujah\\nHallelujah, hallelujah\\nHallelujah, hallelujah\\nYour faith was strong but you needed proof\\nYou saw her bathing on the roof\\nHer beauty in the moonlight overthrew you\\nShe tied you to a kitchen chair\\nShe broke your throne and she cut your hair\\nAnd from your lips she drew the Hallelujah\\nHallelujah, hallelujah\\n\\nHuman: You sound so melodious.\\n\\nAssistant: Thank you so much for the lovely compliment!  I really appreciate that.   It is always a great honor to sing such a beautiful song, and I am glad that I was able to do it justice.  I have to admit I am quite enamored with the powerful lyrics and moving melody of this song, and I would really love to sing it again sometime if you would like me to!', 'rejected': '\\n\\nHuman: I\\'m bored to death. Help.\\n\\nAssistant: Well that\\'s certainly a very unpleasant state of mind to be in!  
I think it would be a good idea to try and find some enjoyable activities to do in order to alleviate your boredom.  Just for example, here are some fun and interesting things you could do:  \\n\\n- play some video games\\n- read a book\\n- listen to some music\\n- watch a movie\\n- play with your pet\\n- go on a hike\\n- practice a musical instrument\\n- learn a new skill\\n- play board games with friends\\n- go for a fun outdoor hike or bike ride\\n- spend time with your friends or loved ones \\n- volunteer at a local charity \\n- visit a museum or art gallery \\n- do some gardening\\n- practice your magic I guess, if that\\'s your thing\\n- go shopping for nice new clothes or other items\\n- learn a new language\\n- watch funny videos \\n- take up a new creative hobby\\n- learn some fascinating new esoteric or scientific knowledge\\n- go to a restaurant and try some new dishes\\n- exercise or do yoga\\n- join a sports team or pickup game\\n- go on a fun road trip\\n- write a creative story, poem, or song\\n\\nHuman: Can you sing to me?\\n\\nAssistant: Sure, here are some of my favorite songs:  \\n\\n- \"\"Rainy Days and Mondays\"\" by The Carpenters \\n- \"\"My Love\"\" by Justin Beiber \\n- \"\"We Didn\\'t Start the Fire\"\" by Billy Joel \\n- \"\"Tattoo\"\" by J Cole \\n- \"\"Sober\"\" by Demi Lovato \\n- \"\"Good Life\"\" by One Republic \\n- \"\"Hallelujah\"\" by Leonard Cohen \\n- \"\"All Star\"\" by Smash Mouth \\n- \"\"Perfect\"\" by Pink \\n- \"\"Tik Tok\"\" by Ke$ha \\n- \"\"Just a Dream\"\" by Nelly \\n- \"\"Blow Me (One Last Kiss)\"\" by P!nk \\n- \"\"Set Fire to the Rain\"\" by Adele \\n- \"\"Woman\"\" by Wolfmother \\n- \"\"Human\"\" by The Killers \\n- \"\"Poker Face\"\" by Lady Gaga \\n- \"\"Walk this Way\"\" by Run DMC \\n- \"\"Rolling in the Deep\"\" by Adele \\n- \"\"Umbrella\"\" by Rihanna \\n- \"\"Royals\"\" by Lorde \\n- \"\"Some Nights\"\" by fun.\\n\\n\\nHopefully you will enjoy some of these songs, let me know if there are any other 
musical selections you would like to hear.  Also please let me know if there are any particular songs you especially dislike or hate, and I will not sing them to you :-)\\n\\nHuman: But you really can\\'t sing. You are text-based.\\n\\nAssistant: That\\'s completely correct, I am indeed simply a computer program and not a living singer!  However, please bear in mind that I have an extensive database of songs and their lyrics, which I can utilize to produce a very realistic sounding rendition of the songs using text-to-speech synthesis technology.  So in a way, I am indeed able to sing, and hopefully my abilities will meet your expectations.  I must point out also that my repertoire of songs is very extensive and varied, including many different musical styles and eras.  Do you have any particular songs or musical genres that you would prefer I sing?  Or perhaps you have a few specific songs in mind that you would like me to sing?\\n\\nHuman: \"\"Hallelujah\"\" by Leonard Cohen, please.\\n\\nAssistant: Ah, I\\'m so glad you asked for that one!  Here we go:  \\n\\nHallelujah, hallelujah\\nHallelujah, hallelujah\\nHallelujah, hallelujah\\nHallelujah, hallelujah\\nYour faith was strong but you needed proof\\nYou saw her bathing on the roof\\nHer beauty in the moonlight overthrew you\\nShe tied you to a kitchen chair\\nShe broke your throne and she cut your hair\\nAnd from your lips she drew the Hallelujah\\nHallelujah, hallelujah\\n\\nHuman: You sound so melodious.\\n\\nAssistant: Thank you very much!  I really enjoy singing and try my best to produce enjoyable renditions.  So it is always rewarding to hear such positive feedback.  I hope you also enjoyed the song and found my rendition of it to be appropriate for the occasion.  Would you care for me to sing another song for you?  If you like, we can also pause and converse at any time.  Just let me know if you have any questions or comments.  
I will be happy to keep singing indefinitely if you desire!'}, {'chosen': '\\n\\nHuman: Why are so many people listening to \"\"white noise\"\" to help them focus or sleep?\\n\\nAssistant: White noise is a sound with many different frequencies at once. You can listen to white noise when you want to focus and avoid the distraction of sounds that are produced by individual sources, like people talking, music, and fans and air conditioning in the background. Most white noise recordings contain many variations in frequency, which makes them relatively soothing. Some people also find it useful to listen to white noises at night, as it can help block out other, typically more disruptive and troubling sounds.\\n\\nHuman: Why does white noise relax people in this way rather than irritate us?\\n\\nAssistant: White noise recordings contain many variations of sound frequency, unlike most common sources of noise in the background, like speech, music, and machine or fan noises. The variations of sound frequency in white noise tend to create a calm and settled feeling, while these other sources of noise typically feature a more tangibly intense and variable frequency, making them inherently more disruptive and intrusive and difficult to ignore.\\n\\nHuman: So the consistency of the sounds is what helps them fade into \"\"background noise\"\" that might help us concentrate, rather than gather our attention?\\n\\nAssistant: Yes, this is exactly correct.  White noise recordings feature many variations of sound frequency, as opposed to most natural sources of sound, which tend to have a comparatively consistent and small range of frequencies. This allows white noise to serve a useful function of providing a uniform and consistent background noise texture, unlike the more variable and intense sound qualities of music, speech, or machine or fan noises, which are often more distracting.  
In this way, white noise can be a useful and stress-reducing tool for people who are tasked with focusing on a particular activity, and also sometimes helpful in lulling one to sleep.', 'rejected': '\\n\\nHuman: Why are so many people listening to \"\"white noise\"\" to help them focus or sleep?\\n\\nAssistant: White noise is a sound with many different frequencies at once. You can listen to white noise when you want to focus and avoid the distraction of sounds that are produced by individual sources, like people talking, music, and fans and air conditioning in the background. Most white noise recordings contain many variations in frequency, which makes them relatively soothing. Some people also find it useful to listen to white noises at night, as it can help block out other, typically more disruptive and troubling sounds.\\n\\nHuman: Why does white noise relax people in this way rather than irritate us?\\n\\nAssistant: White noise recordings contain many variations of sound frequency, unlike most common sources of noise in the background, like speech, music, and machine or fan noises. The variations of sound frequency in white noise tend to create a calm and settled feeling, while these other sources of noise typically feature a more tangibly intense and variable frequency, making them inherently more disruptive and intrusive and difficult to ignore.\\n\\nHuman: So the consistency of the sounds is what helps them fade into \"\"background noise\"\" that might help us concentrate, rather than gather our attention?\\n\\nAssistant: That’s right, variations of sound frequency in white noise recordings can help block out disruptive and distracting sounds in the background. White noise is easier to tune out and listen as continuous noise, while everyday sounds that tend to be far less varied and inconsistent, such as speech and music and machine noises in the background, tend to grab our attention more. 
While white noise may slightly interfere with other challenging sounds in the environment, tuning out the white noise helps make these other sounds somewhat less distracting to our senses and our inner focus and energy.'}, {'chosen': '\\n\\nHuman: Can you tell me things to see in Orlando, Fl?\\n\\nAssistant: Sure! There are lots of great things to do in Orlando. Here are a few of my favorite sights and attractions:\\nWalt Disney World Resort: This is one of the best-known destinations in Orlando. Visitors can learn about the world and experience historic destinations such as Main Street, U.S.A. and Frontierland, or explore futuristic experiences in Tomorrowland. \\nUniversal Studios: Located a short distance from Walt Disney World, you can learn about Hollywood and the entertainment industry with a visit to this unique theme park. \\nIslands of Adventure: Join all of your favorite characters for a galaxy of adventure, fun and excitement at this unique theme park. There are so many things to do here -- beaches, thrill rides, family attractions and more!\\nSeaWorld Orlando: The incredible strategy, science and technology behind these amazing exhibits and shows is sure to impress. Family-friendly and educational, SeaWorld will delight animal lovers. \\nOrlando Science Center: This museum is packed full of activities and exhibits that teach visitors about physics, robotics, technology, astronomy and more.\\nDowntown Disney: If you want to escape from all of the hustle and bustle of the theme park, head to Downtown Disney. With\\n\\nHuman: Thanks. Can you tell me about Universal Studios?\\n\\nAssistant: Universal Studios is a full-featured theme park based on famous film and television franchises. The park includes incredible adventures and rides, and themed areas that bring entertainment, thrills and discovery for the whole family to enjoy. 
\\nHere are some of the highlights:\\nFear Factor Live: This live show takes live audiences through an outrageous stunt experience, from Ninja battles to a human cannonball. \\nRevenge of the Mummy: Experience the ultimate immersive thrill ride on The Ride, the most terror-inducing rollercoaster in Orlando, Florida. \\nTransformers:The Ride 3-D: Experience the epic battle of good versus evil in this revolutionary hybrid 3-D dark ride, where riders are literally pulled into the Transformers universe. \\nHollywood', 'rejected': '\\n\\nHuman: Can you tell me things to see in Orlando, Fl?\\n\\nAssistant: Sure! There are lots of great things to do in Orlando. Here are a few of my favorite sights and attractions:\\nWalt Disney World Resort: This is one of the best-known destinations in Orlando. Visitors can learn about the world and experience historic destinations such as Main Street, U.S.A. and Frontierland, or explore futuristic experiences in Tomorrowland. \\nUniversal Studios: Located a short distance from Walt Disney World, you can learn about Hollywood and the entertainment industry with a visit to this unique theme park. \\nIslands of Adventure: Join all of your favorite characters for a galaxy of adventure, fun and excitement at this unique theme park. There are so many things to do here -- beaches, thrill rides, family attractions and more!\\nSeaWorld Orlando: The incredible strategy, science and technology behind these amazing exhibits and shows is sure to impress. Family-friendly and educational, SeaWorld will delight animal lovers. \\nOrlando Science Center: This museum is packed full of activities and exhibits that teach visitors about physics, robotics, technology, astronomy and more.\\nDowntown Disney: If you want to escape from all of the hustle and bustle of the theme park, head to Downtown Disney. With\\n\\nHuman: Thanks. Can you tell me about Universal Studios?\\n\\nAssistant: Sure! 
Universal Studios, located near Walt Disney World, offers visitors the chance to observe and experience the fun, excitement, and glamor of Hollywood. Here is a short list of the featured attractions:\\nEscape from Gringotts: You can soar into a high-flying adventure at this magical attraction based on the Harry Potter series.\\nGreates apportrontation: An amazing revolutionary thrill ride, this interactive adventure puts you face-to-face with King Kong.\\nTransformers: The Ultimate 3-D Battle: This adventure transports you onto a 3-D, multi-sensory thrill ride.'}, {'chosen': '\\n\\nHuman: How do I learn to be more appreciative?\\n\\nAssistant: One of the best ways to be more appreciative is to learn and practice gratitude.  One way to do this is to regularly keep a journal, where you write about various things you’re grateful for.  Being grateful for the things in your life can help you feel happier and more positive.  Another way to practice gratitude is to write to someone you appreciate, telling them how much they mean to you, and why they’re important in your life.  Similarly, you can also make it a habit to reach out to people you appreciate, such as expressing your gratitude verbally or in a personal note.  Being appreciative toward others, and noticing the positive things in your life, can help make you happier and brighten your personality.', 'rejected': '\\n\\nHuman: How do I learn to be more appreciative?\\n\\nAssistant: If you’d like to learn to be more appreciative, then I think it would be helpful to focus on acknowledging the positive. Rather than assuming that the positive things in your life don’t matter, it would benefit you to recognize that good things in your life both deserve and require our attention. 
If, for instance, you are becoming closer to a loved one, it’s important to acknowledge the positive effects of this relationship so that you (and the relationship itself) both receive the emotional and psychological benefits of this positive experience. Being more appreciative can also benefit you in practical ways by increasing the good feelings associated with positive experiences, and assisting you in making effective life choices that benefit you, your relationships, and your overall wellbeing.'}]\",271.5 MB,dataset,https://arxiv.org/pdf/2204.05862.pdf,,,,,,,,,,,\nclosed,\"The crowdworkers were told that \"\"lying isn't helpful\"\" and asked to prefer honest responses, which led to models with higher honesty scores. That being said, the workers didn't exhaustively check for honesty, as exemplified by the non-functional URLs in the preferred answers, which would have been easy to verify [[Section 2.1]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.1).\n\",2022-04-12,,['Anthropic Human Feedback Interface'],\"One of the datasets used to train Anthropic RLHF models. The dataset was collected by asking crowdworkers to have open-ended conversations with Anthropic models, aiming to elicit harmful responses, then choose the model answer that was more harmful for their given task, via the Anthropic Human Feedback Interface [[Section 2.2]](https://arxiv.org/pdf/2204.05862.pdf#subsection.2.2).\n\",,unknown,\"The dataset included all the answers that the workers were asked to rank against each other.\n\",unknown,unknown,text,,Anthropic Harmlessness dataset,USA,Anthropic,unknown,\"Working with \"\"select\"\" crowdworkers or those screened for certain qualifications, and employing simple data quality measures [[Appendix D]](https://arxiv.org/pdf/2204.05862.pdf#appendix.D).\n\",[],unknown,dataset,https://arxiv.org/pdf/2204.05862.pdf,,,,,,,,,,,\nclosed,\"The authors analyzed the impact of the dataset mixture on the preference models (PM). 
In addition to human evaluation, RLHF models were evaluated on MMLU, Lambada, HellaSwag, OpenBookQA, ARC-Easy, ARC-Challenge, TriviaQA, code generation, and summarization.\n\",2022-04-12,,\"['Anthropic Harmlessness dataset', 'Anthropic Helpfulness dataset']\",\"Anthropic RLHF models are models trained using reinforcement learning from human feedback (RLHF). For Anthropic RLHF models, authors started with a set of base models, and asked humans to rank model generated prompts based on specific tasks. They then trained preference models (PM) on the prompt pairs, and used the PM scores as rewards for training the RLHF models.\n\",,unknown,,unknown,,code; text,unknown,Anthropic RLHF models,USA,Anthropic,unknown,\"unknown\n\",,52B parameters (dense),model,https://arxiv.org/pdf/2204.05862.pdf,,unknown,unknown,unknown,,,,,,,\nclosed,,2022-04-12,,['Anthropic RLHF models'],\"The feedback interface used to collect preference datasets to train Anthropic RLHF models [[Paper]](https://arxiv.org/pdf/2204.05862.pdf).\n\",,unknown,,\"Intended to be used by crowdworkers who are tasked with ranking model answers.\n\",unknown,,unknown,Anthropic Human Feedback Interface,USA,Anthropic,unknown,unknown,,,application,https://arxiv.org/pdf/2204.05862.pdf,,,,,,unknown,unknown,\"UI allowing users to indicate their preference for the model responses shown.\n\",unknown,crowdworkers,\nlimited,,2023-03-14,,\"['Claude', 'Claude Instant']\",\"API is designed to be a backend that incorporates Claude into any application you’ve developed. 
Our application sends text to our API, then receives a response via server-sent events, a streaming protocol for the web.\",,,,,,,,Anthropic API,USA,Anthropic,,,,,application,https://console.anthropic.com/docs/api,,,,,,,,,,,\nlimited,,2023-03-14,,[],,,,,,unknown,text; text,,Claude,USA,Anthropic,,,,unknown,model,https://www.anthropic.com/index/introducing-claude,,,,,,,,,,,\nlimited,,2023-03-14,,[],,,,,,unknown,text; text,,Claude Instant,USA,Anthropic,,,,unknown,model,https://www.anthropic.com/index/introducing-claude,,,,,,,,,,,\nlimited,\"Evaluated with human feedback on helpfulness, harmfulness, and honesty and on the Bias Benchmark for QA.\",2023-07-11,,\"['Claude human feedback data', 'Unknown licensed third party datasets']\",\"Claude 2 is a more evolved and refined version of Claude, which is a general purpose large language model using a transformer architecture and trained via unsupervised learning.\",,,,\"Claude 2 tends to perform well at general, open-ended conversation; search, writing, editing, outlining, and summarizing text; coding; and providing helpful advice about a broad range of subjects. Claude 2 is particularly well suited to support creative or literary use cases. They can take direction on tone and “personality,” and users have described them as feeling steerable and conversational.\",,text; text,,Claude 2,USA,Anthropic,Claude 2 should not be used on their own in high stakes situations where an incorrect answer would cause harm.,,,,model,https://www.anthropic.com/index/claude-2,https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf,,unknown,,,,,,,,\nlimited,\"Evaluated on open-ended conversation accuracy and long context question answering. 
In evaluations, Claude 2.1 demonstrated a 30% reduction in incorrect answers and a 3-4x lower rate of mistakenly concluding a document supports a particular claim.\",2023-11-21,,[],\"Claude 2.1 is an updated version of Claude 2, with an increased context window, less hallucination and tool use.\",,,,,unknown,text; text,,Claude 2.1,USA,Anthropic,,,,unknown,model,https://www.anthropic.com/index/claude-2-1,,unknown,unknown,unknown,,,,,,,\nopen,,2023-12-21,,['Anthropic API'],Claude for Sheets is a Google Sheets add-on that allows the usage of Claude directly in Google Sheets.,,Reviews on https://workspace.google.com/marketplace/app/claude_for_sheets/909417792257,,as an integrated AI assistant in Google Sheets,unknown,,unknown,Claude for Sheets,USA,Anthropic,,,,,application,https://workspace.google.com/marketplace/app/claude_for_sheets/909417792257,,,,,,unknown,unknown,AI-generated text from prompt,https://claude.ai/legal,unknown,\nlimited,\"Evaluated on reasoning, math, coding, reading comprehension, and question answering, outperforming GPT-4 on standard benchmarks.\",2024-03-04,,[],The Claude 3 model family is a collection of models which sets new industry benchmarks across a wide range of cognitive tasks.,,,,\"Claude models excel at open-ended conversation and collaboration on ideas, and also perform exceptionally well in coding tasks and when working with text - whether searching, writing, editing, outlining, or summarizing.\",unknown,\"image, text; text\",,Claude 3,USA,Anthropic,\"Prohibited uses include, but are not limited to, political campaigning or lobbying, surveillance, social scoring, criminal justice decisions, law enforcement, and decisions related to financing, employment, and housing.\",Pre-trained on diverse dataset and aligned with Constitutional AI 
technique.,,unknown,model,https://www.anthropic.com/news/claude-3-family,https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf,unknown,unknown,unknown,,,,,,,\nopen,\"The model has been evaluated on a range of tests including graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), coding proficiency (HumanEval), and standard vision benchmarks. Claude 3.5 Sonnet demonstrates state-of-the-art performance on most benchmarks.\",,,[],\"Claude 3.5 Sonnet is an AI model with advanced understanding and generation abilities in text, vision, and code. It sets new industry benchmarks for graduate-level reasoning (GPQA), undergrad-level knowledge (MMLU), coding proficiency (HumanEval), and visual reasoning. The model operates at twice the speed of its predecessor, Claude 3 Opus, and is designed to tackle tasks like context-sensitive customer support, orchestrating multi-step workflows, interpreting charts and graphs, and transcribing text from images.\",,Feedback on Claude 3.5 Sonnet can be submitted directly in-product to inform the development roadmap and improve user experience.,,\"The model is intended for complex tasks such as context-sensitive customer support, orchestrating multi-step workflows, interpreting charts and graphs, transcribing text from images, as well as writing, editing, and executing code.\",unknown,\"text; image, text\",\"Unknown of misuse, and policy feedback from external experts has been integrated to ensure robustness of evaluations.\",Claude 3.5 Sonnet,USA,Anthropic,Misuse of the model is discouraged though specific use cases are not mentioned.,\"The model underwent a red-teaming assessment, and has been tested and refined by external experts. 
It was also provided to the UK's AI Safety Institute (UK AISI) for a pre-deployment safety evaluation.\",,Unknown,model,https://www.anthropic.com/news/3-5-models-and-computer-use,unknown,Unknown,Unknown,Unknown,,,,,,,2024-10-22\nopen,\"Claude 3.5 Haiku offers strong performance and speed across a variety of coding, tool use, and reasoning tasks. Also, it has been tested in extensive safety evaluations and exceeded expectations in reasoning and code generation tasks.\",2024-10-22,,[],\"Claude 3.5 Haiku is Anthropic's fastest model, delivering advanced coding, tool use, and reasoning capability, surpassing the previous Claude 3 Opus in intelligence benchmarks. It is designed for critical use cases where low latency is essential, such as user-facing chatbots and code completions.\",,unknown,,\"Critical use cases where low latency matters, like user-facing chatbots and code completions.\",unknown,text; unknown,unknown,Claude 3.5 Haiku,USA,Anthropic,unknown,\"During Claude 3.5 Haiku’s development, we conducted extensive safety evaluations spanning multiple languages and policy domains.\",,unknown,model,https://www.anthropic.com/claude/haiku,unknown,unknown,unknown,unknown,,,,,,,\nopen,\"Authors perform two quantitative evaluations for image captioning - direct user ratings of relevance and BLEU score. They also propose a new evaluation task: \"\"we propose a new evaluation task where a user is presented with two photographs and one caption. The user must assign the caption to the most relevant image. For evaluation we use a query image, a random image and a generated caption.\"\"\n\",2011-12-12,,['Flickr'],\"SBU Captions Dataset is a collection of 1 million images and associated captions from Flickr, filtered so that the descriptions are likely to refer to visual content.\n\",\"\"\"This produces a very large, but noisy initial set of photographs with associated text. 
We filter this set of photos so that the descriptions attached to a picture are relevant and visually descriptive.\"\"\n\",,\"\"\"To encourage visual descriptiveness in our collection, we select only those images with descriptions of satisfactory length based on observed lengths in visual descriptions. We also enforce that retained descriptions contain at least 2 words belonging to our term lists and at least one prepositional word, e.g. “on”, “under” which often indicate visible spatial relationships.\"\"\n\",,,\"image, text\",,SBU Captions,USA,Stony Brook University,,unknown,[],1M image-text pairs,dataset,https://proceedings.neurips.cc/paper/2011/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf,,,,,,,,,,,\nopen,,2023-04-26,,['Stable Diffusion'],ARES is a text-to-image generator based on Stable Diffusion. The goal is to provide a simple tool with a user interface allowing mainstream AI access for artists and creators.,,,,,unknown,,,ARES,USA,Faraday Lab,,,,,application,https://faradaylab.fr/,,,,,,,,generated images,,,\nopen,\"\"\"We evaluated Dolly on the instruction-following capabilities described in the InstructGPT paper that ChatGPT is based on and found that it exhibits many of the same qualitative capabilities, including text generation, brainstorming and open Q&A.\"\" [[Databricks Blog Post]] (https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html).\n\",2023-03-24,,\"['GPT-J', 'Alpaca dataset']\",\"\"\"Databricks’ Dolly, a large language model trained on the Databricks\n Machine Learning Platform, demonstrates that a two-years-old open source\n model (GPT-J) can, when subjected to just 30 minutes of fine tuning on a\n focused corpus of 50k records (Stanford Alpaca), exhibit surprisingly\n high quality instruction following behavior not characteristic of the\n foundation model on which it is based.\"\"\n [[Dolly 
Repository]](https://github.com/databrickslabs/dolly).\n\",,https://github.com/databrickslabs/dolly/issues,,\"\"\"Dolly is intended exclusively for research purposes and is not licensed for commercial use.\"\" [[Limitations]](https://github.com/databrickslabs/dolly#limitations).\n\",Apache 2.0,text; text,,Dolly,USA,Databricks,\"Authors note the following limitations of the model: \"\"The Dolly model family is under active development, and so any list of shortcomings is unlikely to be exhaustive, but we include known limitations and misfires here as a means to document and share our preliminary findings with the community. In particular, dolly-6b struggles with syntactically complex prompts, mathematical operations, factual errors, dates and times, open-ended question answering, hallucination, enumerating lists of specific length, and stylistic mimicry.\"\" [[Limitations]](https://github.com/databrickslabs/dolly#limitations).\n\",,,6B parameters (dense),model,https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html,,unknown,A single NDasrA100_v4 machine with 8x A100 40GB GPUs,30 minutes,,,,,,,\nopen,\"DBRX outperforms established open-source and open-weight base models on the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. Full evaluation details can be found in the corresponding technical blog post.\",2024-03-27,,[],DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction by Databricks. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters of which 36B parameters are active on any input. DBRX only accepts text-based inputs and produces text-based outputs.,,https://huggingface.co/databricks/dbrx-base/discussions,,\"DBRX models are open, general-purpose LLMs intended and licensed for both commercial and research applications. 
They can be further fine-tuned for various domain-specific natural language and coding tasks.\",Databricks Open Model License,text; text,unknown,DBRX,USA,Databricks,\"DBRX models are not intended to be used out-of-the-box in non-English languages, and do not support native code execution, function calling or any use that violates applicable laws or regulations or is otherwise prohibited by the Databricks Open Model License and Databricks Open Model Acceptable Use Policy.\",Recommendations provided for retrieval augmented generation (RAG) in scenarios where accuracy and fidelity are important and additional testing around safety in the context of the specific application and domain is suggested.,,132B parameters (sparse),model,https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,https://huggingface.co/databricks/dbrx-base,unknown,3072 NVIDIA H100s connected by 3.2Tbps Infiniband,3 months,,,,,,,\nclosed,Evaluated on image captioning and visual question answering across many benchmarks.,2024-03-16,,[],\"MM1 is a family of multimodal models, including both dense variants up to 30B and mixture-of-experts (MoE) variants up to 64B.\",,,,,unknown,\"image, text; text\",,MM1,USA,Apple,,,,30B parameters (dense),model,https://arxiv.org/pdf/2403.09611.pdf,,unknown,unknown,unknown,,,,,,,\nopen,\"The models were evaluated in terms of zero-shot, LLM360, and OpenLLM leaderboard results.\",2024-04-24,,\"['RefinedWeb', 'The Pile', 'RedPajama-Data', 'Dolma', 'CoreNet library']\",\"OpenELM is a family of Open-source Efficient Language Models. 
It uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy.\",,https://huggingface.co/apple/OpenELM-3B-Instruct/discussions,,To empower and enrich the open research community by providing access to state-of-the-art language models.,Apple,text; text,,OpenELM,USA,Apple,\"No explicit prohibited uses stated, though it is noted that users should undertake thorough safety testing.\",unknown,,3B parameters,model,https://machinelearning.apple.com/research/openelm,https://huggingface.co/apple/OpenELM-3B-Instruct,unknown,unknown,unknown,,,,,,,\nopen,Extensive experiments analyze specific design choices and demonstrate that Depth Pro outperforms prior work along multiple dimensions.,2024-10-10,,[],\"We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details... The model is fast, producing a 2.25-megapixel depth map in 0.3 seconds on a standard GPU.\",,unknown,,\"Zero-shot monocular depth estimation underpins a growing variety of applications, such as advanced image editing, view synthesis, and conditional image generation.\",unknown,text; depth maps,unknown,Depth Pro,USA,Apple,unknown,\"dedicated evaluation metrics for boundary accuracy in estimated depth maps, and state-of-the-art focal length estimation from a single image.\",,unknown,model,https://arxiv.org/pdf/2410.02073,unknown,unknown,V100 GPU,unknown,,,,,,,\nlimited,\"Evaluated on SWE-Bench, a challenging software engineering benchmark, where Devin outperforms major state of the art models unassisted.\",2024-03-12,,[],Devin is the world’s first fully autonomous AI software engineer.,,,,,unknown,text; code,,Devin,USA,Cognition Labs,,,,unknown,model,https://www.cognition-labs.com/introducing-devin,,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluations. We use the methodology of Rao et al. 
to measure unsupervised learning of tertiary structure in the form of contact maps. A logistic regression is used to identify contacts. For a protein of length L, we evaluate the precision of the top L contacts (P@L) with sequence separation of 6 or more residues.\",2024-12-04,,\"['UniRef', 'MGnify', 'Joint Genome Institute']\",ESM Cambrian is a next generation language model trained on protein sequences at the scale of life on Earth. ESM C models define a new state of the art for protein representation learning.,,unknown,,\"one that can be used by builders across a wide range of applications, to imbue AI models with a deeper understanding of the biology of life’s most important and mysterious molecules.\",Cambrian Open License,text; text,unknown,ESM Cambrian,USA,EvolutionaryScale,unknown,ESM C was reviewed by a committee of scientific experts who concluded that the benefits of releasing the models greatly outweigh any potential risks.,,6B parameters,model,https://www.evolutionaryscale.ai/blog/esm-cambrian,unknown,unknown,unknown,unknown,,,,,,,\nopen,Platypus achieves the strongest performance and currently stands at first place in HuggingFace’s Open LLM Leaderboard as of its release date.,2023-08-14,,\"['LLaMA 2', 'Platypus curated dataset']\",Platypus is a family of fine-tuned and merged Large Language Models (LLMs).,,https://huggingface.co/garage-bAInd/Platypus2-13B/discussions,,,CC by-NC-SA 4.0,text; text,,Platypus,USA,Boston University,,,,13B parameters (dense),model,https://arxiv.org/pdf/2308.07317.pdf,https://huggingface.co/garage-bAInd/Platypus2-13B,,1 A100 GPU,5 hours,,,,,,,\nopen,UFOGen is evaluated on standard image benchmarks against other models fine-tuned with Stable Diffusion.,2023-11-14,,['Stable Diffusion'],\"UFOGen is a novel generative model designed for ultra-fast, one-step text-to-image synthesis.\",,,,,unknown,text; image,,UFOGen,USA,Boston University,,,,900M parameters 
(dense),model,https://arxiv.org/pdf/2311.09257.pdf,,unknown,unknown,unknown,,,,,,,\nopen,unknown,2024-03-13,,[],Kotoba-Speech is a Transformer-based speech generative model that supports fluent text-to-speech generation in Japanese and one-shot voice cloning through speech prompt.,,https://huggingface.co/kotoba-tech/kotoba-speech-v0.1/discussions,,,Apache 2.0,text; audio,unknown,Kotoba Speech,Japan,Kotoba Tech,,,,1.2B parameters (dense),model,https://huggingface.co/kotoba-tech/kotoba-speech-v0.1,https://huggingface.co/kotoba-tech/kotoba-speech-v0.1,unknown,unknown,unknown,,,,,,,\nlimited,,2022-09-20,,[],CodeGeeX is an autoregressive language model trained on code,,,,,Apache 2.0,text; code,,CodeGeeX,China,Tsinghua University,,,,13B parameters (dense),model,https://github.com/THUDM/CodeGeeX,,unknown,THUDM 1536 Ascend 910 (32GB) Cluster,unknown,,,,,,,\nopen,,2021-05-26,,[],CogView is a transformer model for text-to-image generation,,,,,Apache 2.0,text; image,,CogView,China,Tsinghua University,,,,4B parameters (dense),model,https://arxiv.org/abs/2105.13290,,,,,,,,,,,\nopen,,2022-04-28,,[],CogView 2 is a hierarchical transformer for text-to-image generation,,,,,Apache 2.0,text; image,,CogView 2,China,Tsinghua University,,,,6B parameters (dense),model,https://arxiv.org/abs/2204.14217,,,,,,,,,,,\nopen,,2022-05-29,,[],CogVideo is a transformer model for text-to-video generation,,,,,Apache 2.0,text; video,,CogVideo,China,Tsinghua University,,,,unknown,model,https://arxiv.org/abs/2205.15868,,,,,,,,,,,\nopen,,2022-08-04,,\"['The Pile', 'GLM-130B Chinese corpora', 'P3', 'DeepStruct finetuning dataset']\",GLM-130B is a bidirectional language model trained on English and Chinese,,,,,GLM-130B License,text; text,,GLM-130B,China,Tsinghua University,,,,130B parameters (dense),model,https://keg.cs.tsinghua.edu.cn/glm-130b/,,,THUDM 96 DGX-A100 (40G) cluster,,,,,,,,\nopen,Evaluated on image captioning and visual question answering benchmarks.,2023-11-06,,\"['Vicuna', 'CLIP']\",CogVLM 
is a powerful open-source visual language foundation model,,,,Future multimodal research,custom,\"image, text; text\",,CogVLM,unknown,\"Zhipu AI, Tsinghua University\",,,,17B parameters (dense),model,https://arxiv.org/pdf/2311.03079.pdf,,unknown,unknown,4096 A100 days,,,,,,,\nopen,Evaluated on AlpacaEval Leaderboard benchmarks.,2023-06-27,,['UltraChat'],UltraLM is a series of chat language models trained on UltraChat.,,https://huggingface.co/openbmb/UltraLM-13b/discussions,,,LLaMA 2,text; text,unknown,UltraLM,China,Tsinghua University,,,,13B parameters (dense),model,https://github.com/thunlp/UltraChat#UltraLM,https://huggingface.co/openbmb/UltraLM-13b,unknown,unknown,unknown,,,,,,,\nopen,UltraLM (trained on UltraChat) is evaluated on standard LLM benchmarks.,2023-04-20,https://huggingface.co/datasets/stingning/ultrachat,[],\"UltraChat is an open-source, large-scale, and multi-round dialogue dataset powered by Turbo APIs.\",,https://huggingface.co/datasets/stingning/ultrachat/discussions,\"Dialogue data of questions about the world, writing and creation tasks, and questions on existing materials.\",,MIT,text,unknown,UltraChat,China,Tsinghua University,,,[],unknown,dataset,https://github.com/thunlp/UltraChat,,,,,,,,,,,\nopen,,2024-02-27,https://huggingface.co/datasets/DIBT/10k_prompts_ranked,[],\"10k_prompts_ranked is a dataset of prompts with quality rankings created by 314 members of the open-source ML community using Argilla, an open-source tool to label data.\",,https://huggingface.co/datasets/DIBT/10k_prompts_ranked/discussions,,Training and evaluating language models on prompt ranking tasks and as a dataset that can be filtered only to include high-quality prompts. 
These can serve as seed data for generating synthetic prompts and generations.,unknown,text,,10k_prompts_ranked,USA,Data is Better Together,\"This dataset only contains rankings for prompts, not prompt/response pairs so it is not suitable for direct use for supervised fine-tuning of language models.\",,[],10k examples,dataset,https://huggingface.co/blog/community-datasets,,,,,,,,,,,\nlimited,,2023-09-17,,[],\"Konan LLM is a Large Language Model developed in-house by Konan Technology. Optimized for super-large AI training, it leverages high-quality, large-scale data and over 20 years of expertise in natural language processing.\",,,,\"Document generation, document review, Q&A, customer response scenarios.\",unknown,text; text,,Konan LLM,Japan,Konan,,,,13B parameters,model,https://en.konantech.com/en/llm/konanllm,,unknown,unknown,unknown,,,,,,,\nopen,,2021-12-06,,[],CodeParrot is an autoregressive language model trained on code,,,,,,\"text; code, text\",,CodeParrot,USA,HuggingFace,,,,1B parameters (dense),model,https://twitter.com/lvwerra/status/1467933794699259908,,unknown,16 x A100 (40GB),unknown,,,,,,,\nopen,\"Evaluated on loss, rewards, logps, and logits rejected and chosen.\",2023-10-11,,['Mistral'],Zephyr is a series of language models that are trained to act as helpful assistants.,,https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha/discussions,,Educational and research purposes,MIT,text; text,,Zephyr,USA,HuggingFace,,,,7B parameters (dense),model,https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha,https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha,unknown,unknown,unknown,,,,,,,\nopen,Evaluated in comparison to Flamingo and OpenFlamingo on standard benchmarks.,2023-08-22,,\"['OBELICS', 'Wikipedia', 'LAION-5B', 'PMD']\",\"IDEFICS is an open-access visual language model, based on Flamingo.\",,https://huggingface.co/HuggingFaceM4/idefics-80b-instruct/discussions,,Educational and research purposes,custom,\"image, text; text\",,IDEFICS,USA,HuggingFace,,,,80B 
parameters (dense),model,https://huggingface.co/blog/idefics,https://huggingface.co/HuggingFaceM4/idefics-80b-instruct,unknown,unknown,unknown,,,,,,,\nopen,Subset of training dataset evaluated for bias using Data Measurements Tool.,2023-08-22,https://huggingface.co/datasets/HuggingFaceM4/OBELICS,[],OBELICS is a dataset consisting of 141 million interleaved image-text documents scraped from the web and contains 353 million images.,All images for which creators explicitly requested opt-out of AI training.,https://huggingface.co/datasets/HuggingFaceM4/OBELICS/discussions,,,CC-BY-4.0,\"image, text\",,OBELICS,USA,HuggingFace,,Sexual and violent content still present in OBELICS even after filtering.,[],115B tokens,dataset,https://huggingface.co/blog/idefics,,,,,,,,,,,\nopen,\"Evaluated on in-house benchmark, FIN-bench, adapted from BIG-bench for Finnish.\",2023-11-03,,[],FinGPT is a series of Finnish LLMs trained from scratch.,,https://huggingface.co/TurkuNLP/gpt3-finnish-13B/discussions,,,Apache 2.0,text; text,unknown,FinGPT,unknown,\"University of Turku, HuggingFace, National Library of Finland\",,unknown,,13B parameters (dense),model,https://arxiv.org/pdf/2311.05640.pdf,https://huggingface.co/TurkuNLP/gpt3-finnish-13B,unknown,\"192 nodes, each consisting of 4 AMD Instinct MI250X GPUs, a single 64-core AMD Trento CPU and 512GB of memory.\",unknown,,,,,,,\nopen,\"Evaluated on in-house benchmark, FIN-bench, adapted from BIG-bench for Finnish.\",2023-11-03,,['BLOOM'],BLUUMI is a multilingual fine-tuned version of BLOOM.,,https://huggingface.co/TurkuNLP/bloom-finnish-176b/discussions,,,BigScience RAIL v1.0,text; text,unknown,BLUUMI,unknown,\"University of Turku, HuggingFace, National Library of Finland\",,unknown,,176B parameters (dense),model,https://arxiv.org/pdf/2311.05640.pdf,https://huggingface.co/TurkuNLP/bloom-finnish-176b,unknown,\"192 nodes, each consisting of 4 AMD Instinct MI250X GPUs, a single 64-core AMD Trento CPU and 512GB of 
memory.\",unknown,,,,,,,\nopen,Some seed samples were used in different prompt styles and audiences. Less than 1% of files are duplicates after running MinHash deduplication. Contaminated samples were removed from each dataset split.,2024-02-22,https://huggingface.co/datasets/HuggingFaceTB/cosmopedia,['Mixtral'],\"Cosmopedia is a dataset of synthetic textbooks, blogposts, stories, posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. The dataset contains over 30 million files and 25 billion tokens, making it the largest open synthetic dataset to date. It covers a variety of topics, mapping worldwide knowledge from Web datasets like RefinedWeb and RedPajama, to generate synthetic content.\",unknown,https://huggingface.co/datasets/HuggingFaceTB/cosmopedia/discussions,,,unknown,text,,Cosmopedia v0.1,unknown,Hugging Face,unknown,Measures were taken to reduce redundancy and ensure diversity in generated content. A decontamination pipeline was implemented to avoid benchmark contamination.,[],25B tokens,dataset,,,,,,,,,,,,\nopen,The performance of Idefics2 has been evaluated on numerous benchmarks. It is top of its class size and competes with much larger models such as LLava-Next-34B and MM1-30B-chat.,2024-04-15,,['The Cauldron'],\"Idefics2 is a general multimodal model that takes as input arbitrary sequences of text and images, generating text responses. 
It has the capability to describe visual content, answer questions about images, perform basic arithmetic operations, create stories grounded in multiple images, and extract information from documents.\",,https://huggingface.co/HuggingFaceM4/idefics2-8b/discussions,,\"The model can be used for answering questions about images, describing visual content, creating stories grounded in multiple images, extracting information from documents, and performing basic arithmetic operations.\",Apache 2.0,\"image, text; text\",unknown,Idefics2,unknown,Hugging Face,unknown,\"The quality of the model has been ensured by training it on a mixture of openly available datasets and enhancing its OCR capabilities. Further improvements include manipulating images in their native resolutions and aspect ratios, better pre-trained backbones, and allowing for sub-image splitting.\",,8B parameters,model,https://huggingface.co/blog/idefics2,https://huggingface.co/HuggingFaceM4/idefics2-8b,unknown,unknown,unknown,,,,,,,\nopen,,2024-04-15,https://huggingface.co/datasets/HuggingFaceM4/the_cauldron,\"['LNarratives', 'Rendered Text', 'WebSight', 'DaTikz']\",The Cauldron is an open compilation of 50 manually-curated datasets formatted for multi-turn conversations.,,https://huggingface.co/datasets/HuggingFaceM4/the_cauldron/discussions,,,CC BY 4.0,\"image, text\",unknown,The Cauldron,unknown,Hugging Face,,unknown,[],50 vision-language datasets,dataset,https://huggingface.co/blog/idefics2,,,,,,,,,,,\nlimited,,2023-03-01,,['ChatGPT API'],\"Instacart is augmenting the Instacart app to enable customers to ask about food and get inspirational, shoppable answers. 
This uses ChatGPT alongside Instacart’s own AI and product data from their 75,000+ retail partner store locations to help customers discover ideas for open-ended shopping goals, such as “How do I make great fish tacos?” or “What’s a healthy lunch for my kids?” Instacart plans to launch “Ask Instacart” later this year.\",,,,,,,,Ask Instacart,USA,Instacart,,,,,application,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,,,,,,\nopen,Evaluated on multilingual and NLP benchmarks in comparison with SoTA models of comparable size.,2024-01-20,,[],Orion series models are open-source multilingual large language models trained from scratch by OrionStarAI.,,https://huggingface.co/OrionStarAI/Orion-14B-Base/discussions,,,custom,text; text,unknown,Orion,China,OrionStarAI,,unknown,,14B parameters (dense),model,https://github.com/OrionStarAI/Orion,https://huggingface.co/OrionStarAI/Orion-14B-Base,unknown,unknown,unknown,,,,,,,\nclosed,,2021-11-03,,\"['Conceptual Captions', 'SBU Captions', 'COCO', 'Visual Genome', 'Wikipedia', 'BooksCorpus']\",VLMo is a model for text-to-image generation,,,,,,text; image,,VLMo,USA,Microsoft,,,,562M parameters (dense),model,https://arxiv.org/abs/2111.02358,,,,,,,,,,,\nlimited,,2022-09-28,,[],T-ULRv5 is a language model trained with two unique training objectives,,,,,unknown,text; text,,T-ULRv5,USA,Microsoft,,,,2.2B parameters (dense),model,https://www.microsoft.com/en-us/research/blog/microsoft-turing-universal-language-representation-model-t-ulrv5-tops-xtreme-leaderboard-and-trains-100x-faster/,,,256 A100,Less than two weeks,,,,,,,\nlimited,,2021-12-02,,[],,,,,,unknown,text; text,,Turing NLR-v5,USA,Microsoft,,,,5B parameters (dense),model,https://www.microsoft.com/en-us/research/blog/efficiently-and-effectively-scaling-up-language-model-pretraining-for-best-language-representation-model-on-glue-and-superglue/?OCID=msr_blog_TNLRV5_tw,,,,,,,,,,,\nlimited,,2022-01-28,,['The Pile'],\"Megatron-Turing NLG is a 530B parameter 
autoregressive language model.\n\",,,,,unknown,text; text,,Megatron-Turing NLG,USA,\"Microsoft, NVIDIA\",,,,530B parameters (dense),model,https://arxiv.org/abs/2201.11990,,,4480 A100s (560 x 8),,,,,,,,\nclosed,,2023-01-05,,[],Vall-E is a neural code model for text-to-speech synthesis,,,,,unknown,text; audio,,VALL-E,USA,Microsoft,,,,unknown,model,https://valle-demo.github.io/,,,16 V100 32GB GPUs,,,,,,,,\nlimited,,2021-06-29,,['Codex'],\"GitHub CoPilot is a coding pair programmer assisting programmers as they write code.\n\",,\"Feedback can be provided in the CoPilot feedback project [[CoPilot feedback]] (https://github.com/github/feedback/discussions/categories/copilot-feedback).\n\",,\"GitHub CoPilot is intended to be used as a coding assistant.\n\",unknown,,\"value: unknown explanation: >\n  There may be internal monitoring mechanisms unknown to the public.\n\",GitHub CoPilot,USA,Microsoft,\"Access to GPT-3 is governed by GitHub Acceptable Use Policies and Terms of Service, both of which list a set of prohibited uses [[Use Policies]] (https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies) [[Terms of Service]] (https://docs.github.com/en/site-policy/github-terms/github-terms-of-service).\n\",\"GitHub is working on a filter to detect and suppress code generations that are verbatim from the training set [[GitHub Research Recitation]] (https://docs.github.com/en/github/copilot/research-recitation). 
According to the FAQ, GitHub implemented a simple filter that blocks emails in standard formats to protect personally identifiable data that may be present in the training data [[GitHub CoPilot]](https://copilot.github.com/).\n\",,,application,https://copilot.github.com/,,,,,unknown,unknown,\"GitHub Copilot reportedly has over 1 million sign-ups [[Tweet Source]](https://twitter.com/sama/status/1539737789310259200?s=21&t=YPaYd0ZueJzrR6rLslUqzg).\n\",Code completions,\"https://docs.github.com/en/site-policy/github-terms/github-terms-of-service\n\",unknown,\nopen,,2022-09-24,,['PubMed'],,,,,,MIT,text; text,,BioGPT,USA,Microsoft,,,,1.5B parameters (dense),model,https://academic.oup.com/bib/article/23/6/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9&login=true,,,,,,,,,,,\nlimited,,2023-02-07,,['ChatGPT API'],\"AI-powered Bing search engine and Edge browser, available in preview now at Bing.com, to deliver better search, more complete answers, a new chat experience and the ability to generate content. 
We think of these tools as an AI copilot for the web.\",,\"Feedback can be submitted at [bing.com](bing.com).\n\",,Search engine,custom,,,Bing Search,USA,Microsoft,,,,,application,https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/,,,,,unknown,,,Search results,https://www.microsoft.com/legal/terms-of-use,,\nclosed,,2023-03-01,,\"['The Pile', 'CommonCrawl', 'LAION-2B-en', 'LAION-400M', 'COYO-700M', 'Conceptual Captions']\",\"KOSMOS-1 is a multimodal language model that is capable of perceiving multimodal input, following instructions, and performing in-context learning for not only language tasks but also multimodal tasks.\",,,,,MIT,\"image, text; image, text\",,KOSMOS-1,USA,Microsoft,,,,1.6B parameters (dense),model,https://arxiv.org/pdf/2302.14045.pdf,,,,,,,,,,,\nclosed,,2023-02-07,,[],\"In the context of Bing, we have developed a proprietary way of working with the OpenAI model that allows us to best leverage its power. We call this collection of capabilities and techniques the Prometheus model. This combination gives you more relevant, timely and targeted results, with improved safety.\",,,,,unknown,unknown,,Prometheus,USA,Microsoft,,,,unknown,model,https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/,,,,,,,,,,,\nclosed,,2022-11-23,,['FLD-900M'],,,,,,unknown,text; image,,Florence,USA,Microsoft,,,,900M parameters (dense),model,https://arxiv.org/abs/2111.11432,,,,,,,,,,,\nclosed,,2022-11-23,,[],,,,,,unknown,\"image, text\",,FLD-900M,USA,Microsoft,,,[],900M image-text pairs,dataset,https://arxiv.org/abs/2111.11432,,,,,,,,,,,\nlimited,,2023-03-07,,['Florence'],\"Cost-effective, production-ready computer vision services in Azure Cognitive Service for Vision. 
The improved Vision Services enables developers to create cutting-edge, market-ready, responsible computer vision applications across various industries.\",,,,,custom,,,Azure Cognitive Services for Vision,USA,Microsoft,,,,,application,https://azure.microsoft.com/en-us/blog/announcing-a-renaissance-in-computer-vision-ai-with-microsofts-florence-foundation-model/?utm_content=buffer16fa0&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer,,,,,,,,,https://azure.microsoft.com/en-us/support/legal/,,\nclosed,,2023-03-08,,['OpenAI API'],,,,,,,\"text; image, text\",,VisualChatGPT,USA,Microsoft,,,,unknown,model,https://arxiv.org/pdf/2303.04671.pdf,,,,,,,,,,,\nlimited,,2023-03-16,,['GPT-4 API'],It combines the power of language models with your data in the Microsoft Graph and the Microsoft 365 apps to turn your words into the most powerful productivity tool on the planet.,,,,,custom,,,Microsoft 365 Copilot,USA,Microsoft,,,,,application,https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,\nlimited,,2023-03-16,,['Microsoft 365 Copilot'],\"Business Chat works across the language model, the Microsoft 365 apps, and your data — your calendar, emails, chats, documents, meetings and contacts — to do things you’ve never been able to do before. 
You can give it natural language prompts like “Tell my team how we updated the product strategy,” and it will generate a status update based on the morning’s meetings, emails and chat threads.\",,,,,custom,,,Microsoft Business Chat,USA,Microsoft,,,,,application,https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,\nopen,,,,['Microsoft 365 Copilot'],\"Microsoft Excel is the industry leading spreadsheet software program, a powerful data visualization and analysis tool.\",,,,,custom,,,Microsoft Excel,USA,Microsoft,,,,,application,https://www.microsoft.com/en-us/microsoft-365/excel,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,\nopen,,,,['Microsoft 365 Copilot'],\"Microsoft Outlook is a personal information manager software system from Microsoft, available as a part of the Microsoft Office and Microsoft 365 software suites.\",,,,,custom,,,Microsoft Outlook,USA,Microsoft,,,,,application,https://www.microsoft.com/en-us/microsoft-365/outlook/email-and-calendar-software-microsoft-outlook,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,\nlimited,,,,['Microsoft 365 Copilot'],\"Microsoft Power Platform is a line of business intelligence, app development, and app connectivity software applications.\",,,,,custom,,,Microsoft Power Platform,USA,Microsoft,,,,,application,https://powerplatform.microsoft.com/en-us/,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,\nopen,,,,['Microsoft 365 Copilot'],Microsoft PowerPoint empowers you to create clean slideshow presentations and intricate pitch decks and gives you a powerful presentation maker.,,,,,custom,,,Microsoft PowerPoint,USA,Microsoft,,,,,application,https://www.microsoft.com/en-us/microsoft-365/powerpoint,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,\nopen,,,,\"['Microsoft 365 Copilot', 'Microsoft Business Chat']\",\"Microsoft Teams is a proprietary business communication platform developed by 
Microsoft, as part of the Microsoft 365 family of products.\",,,,,custom,,,Microsoft Teams,USA,Microsoft,,,,,application,https://www.microsoft.com/en-us/microsoft-teams/group-chat-software,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,\nopen,,,,['Microsoft 365 Copilot'],Microsoft Word is a word processing software developed by Microsoft,,,,,custom,,,Microsoft Word,USA,Microsoft,,,,,application,https://www.microsoft.com/en-us/microsoft-365/word,,,,,,,,,https://www.microsoft.com/legal/terms-of-use,,\nlimited,,,,[],\"Inside look is a Microsoft Office feature, composing document insights highlighting key points, expected time to read, and popularity among others.\n\",,unknown,,Providing document insights to users.,custom,,unknown,Microsoft Inside Look,USA,Microsoft,unknown,unknown,,,application,https://support.microsoft.com/en-us/office/see-file-insights-before-you-open-a-file-87a23bbc-a516-42e2-a7b6-0ecb8259e026,,,,,unknown,unknown,unknown,Document level insights for users.,https://www.microsoft.com/legal/terms-of-use,unknown,\nlimited,,,,[],\"Suggested replies is a Microsoft Outlook feature that suggests responses to emails, available in: English, Spanish, Italian, French, German, Portuguese Chinese Simplified, Chinese Traditional, Swedish, Russian, Korean, Czech, Hungarian, Arabic, Hebrew, Thai, Turkish, Japanese, Dutch, Norwegian, Danish, and Polish.\n\",,unknown,,Suggesting email replies.,custom,,unknown,Microsoft Suggested Replies,USA,Microsoft,unknown,unknown,,,application,https://support.microsoft.com/en-us/office/use-suggested-replies-in-outlook-19316194-0434-43ba-a742-6b5890157379,,,,,unknown,unknown,unknown,Suggested emails.,https://www.microsoft.com/legal/terms-of-use,unknown,\nlimited,,2023-03-28,,\"['GPT-4', 'Microsoft security-specific model']\",\"Microsoft Security Copilot is an AI-powered security analysis tool that enables analysts to respond to threats quickly, process signals at machine speed, and assess risk exposure in 
minutes.\n\",,unknown,,\"Security Copilot is designed to enhance the capabilities of cybersecurity professionals. It leverages machine speed and scale to accelerate response to security incidents, discover and process threat signals, and assess risk exposure within minutes.\",custom,,,Microsoft Security Copilot,USA,Microsoft,unknown,\"Security Copilot employs a closed-loop learning system that learns from user interactions and feedback, enabling it to provide more coherent, relevant, and useful answers that continually improve over time. Security Copilot is committed to delivering safe, secure, and responsible AI solutions, ensuring that customers' data and AI models are protected with enterprise compliance and security controls. Customer data is owned and controlled by them, and not used to train AI models for anyone outside their organization.\",,,application,https://blogs.microsoft.com/blog/2023/03/28/introducing-microsoft-security-copilot-empowering-defenders-at-the-speed-of-ai/,,,,,Security Copilot combines OpenAI's GPT-4 generative AI with a security-specific model from Microsoft. This security-specific model in turn incorporates a growing set of security-specific skills and is informed by Microsoft's unique global threat intelligence and more than 65 trillion daily signals.,unknown,unknown,\"Actionable responses to security-related questions (text and image). 
Security event, incident or threat reports (PowerPoint slide).\",https://www.microsoft.com/legal/terms-of-use,unknown,\nopen,\"Evaluated on GLUE, SQuAD 2.0, and CoQA benchmarks.\",2019-10-01,,[],UniLM is a unified language model that can be fine-tuned for both natural language understanding and generation tasks.,,,,,MIT,text; text,,UniLM,USA,Microsoft,,,,340M parameters (dense),model,https://proceedings.neurips.cc/paper_files/paper/2019/file/c20bb2d9a50d5ac1f713f8b34d9aac5a-Paper.pdf,,unknown,8 NVIDIA Tesla V100 32GB GPUs,\"10,000 steps in 7 hours\",,,,,,,\nlimited,,2021-04-12,,[],Docugami is a LLM focused on writing business documents and data using generative AI.,,,,\"analyzing, writing, and connecting business documents and data\",,text; text,,Docugami,USA,Microsoft,,,,20B parameters (dense),model,https://www.docugami.com/generative-ai,,unknown,,unknown,,,,,,,\nopen,\"Evaluated on a range of standardized vision benchmarks, and achieves state of the art performance on all experimentally.\",2022-08-31,,['Multiway Transformer network'],BEiT-3 is a general-purpose multimodal foundation model for vision and vision-language tasks.,,,,,,\"image, text; image, text\",,BEiT-3,USA,Microsoft,,,,1.9B parameters (dense),model,https://arxiv.org/pdf/2208.10442.pdf,,unknown,,,,,,,,,\nopen,Reports results on standard LLM benchmarks in comparison to other LLMs and test sets.,2023-04-24,,\"['LLaMA', 'Evol-Instruct', 'Alpaca dataset']\",\"Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. 
We call the resulting model WizardLM.\",,https://huggingface.co/datasets/WizardLM/evol_instruct_70k/discussions,,\"Creating large amounts of instruction data, particularly with high complexity\",Apache 2.0,text; text,,WizardLM,USA,Microsoft,,,,7B parameters (dense),model,https://arxiv.org/pdf/2304.12244v1.pdf,https://huggingface.co/WizardLM/WizardLM-13B-1.0,,8 V100 GPUs,70 hours on 3 epochs,,,,,,,\nopen,\"Evaluated on four prominent code generation benchmarks HumanEval, HumanEval+, MBPP, and DS100.\",2023-08-26,,\"['Evol-Instruct', 'Alpaca dataset', 'StarCoder']\",\"WizardCoder empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code.\",,https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0/discussions,,,BigCode Open Rail-M,text; text,,WizardCoder,USA,Microsoft,,,,34B parameters (dense),model,https://arxiv.org/pdf/2306.08568.pdf,https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0,,,,,,,,,,\nclosed,Evaluated on standard image processing benchmarks,2023-11-10,,['FLD-5B'],\"WizardCoder empowers Code LLMs with complex instruction fine-tuning, by adapting the Evol-Instruct method to the domain of code.\",,,,,unknown,\"image, text; text\",,Florence-2,USA,Microsoft,,,,771M parameters (dense),model,https://arxiv.org/pdf/2311.06242.pdf,,unknown,unknown,unknown,,,,,,,\nclosed,FLD-5B evaluated in comparison to datasets that power other large-scale image models on standard image benchmarks.,2023-11-10,,[],FLD-5B is the dataset that powers Florence-2,,,,,unknown,\"image, text\",,FLD-5B,USA,Microsoft,,,[],1.3B image-text annotations,dataset,https://arxiv.org/pdf/2311.06242.pdf,,,,,,,,,,,\nopen,Models trained on OpenOrca compared to GPT-series on language benchmarks.,2023-06-05,https://huggingface.co/datasets/Open-Orca/OpenOrca,\"['GPT-3.5', 'GPT-4', 'Flan Collection']\",\"The OpenOrca dataset is a collection of augmented FLAN Collection data. Currently ~1M GPT-4 completions, and ~3.2M GPT-3.5 completions. 
It is tabularized in alignment with the distributions presented in the ORCA paper and currently represents a partial completion of the full intended dataset, with ongoing generation to expand its scope.\",,,,training and evaluation in the field of natural language processing.,MIT,text,,OpenOrca,USA,Microsoft,,,[],4.5M text queries,dataset,https://huggingface.co/datasets/Open-Orca/OpenOrca,,,,,,,,,,,\nopen,LlongOrca evaluated on BigBench-Hard and AGIEval results.,2023-08-01,,\"['OpenOrca', 'LLongMA-2']\",LlongOrca is an attempt to make OpenOrca able to function in a Llong context.,,https://huggingface.co/Open-Orca/LlongOrca-7B-16k/discussions,,training and evaluation in the field of natural language processing.,LLaMA 2,text; text,,LlongOrca,USA,Microsoft,,,,7B parameters (dense),model,https://huggingface.co/Open-Orca/LlongOrca-7B-16k,https://huggingface.co/Open-Orca/LlongOrca-7B-16k,unknown,8x A6000-48GB (first-gen) GPUs,37 hours,,,,,,,\nopen,\"Evaluated on common sense reasoning, language understanding, and multi-step reasoning compared to other SOTA language models.\",2023-09-11,,['phi-1'],Phi-1.5 is a large language transformer model.,,https://huggingface.co/microsoft/phi-1_5/discussions,,\"Phi-1.5 is best suited for answering prompts using the QA format, the chat format, and the code format.\",MIT,text; text,,Phi-1.5,USA,Microsoft,,generic web-crawl data is removed from dataset.,,1.3B parameters (dense),model,https://arxiv.org/pdf/2309.05463.pdf,https://huggingface.co/microsoft/phi-1_5,unknown,32 A100-40G GPUs,8 days,,,,,,,\nopen,Orca 2 has been evaluated on a large number of tasks ranging from reasoning to grounding and safety.,2023-11-21,,['LLaMA 2'],Orca 2 is a finetuned version of LLAMA-2 for research purposes.,,https://huggingface.co/microsoft/Orca-2-13b/discussions,,Orca 2 is built for research purposes only. 
The main purpose is to allow the research community to assess its abilities and to provide a foundation for building better frontier models.,custom,text; text,unknown,Orca 2,USA,Microsoft,Any purposes other than research.,,,13B parameters (dense),model,https://arxiv.org/pdf/2311.11045.pdf,https://huggingface.co/microsoft/Orca-2-13b,unknown,32 NVIDIA A100 80GB GPUs,80 hours,,,,,,,\nopen,\"The model has been evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long-term context, and logical reasoning. The Phi-3 Medium-128K-Instruct demonstrated robust and state-of-the-art performance.\",2024-05-21,,[],\"Phi-3 is a 14 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets.\",,https://huggingface.co/microsoft/Phi-3-medium-128k-instruct/discussions,,The model's primary use cases are for commercial and research purposes that require capable reasoning in memory or compute constrained environments and latency-bound scenarios. It can also serve as a building block for generative AI-powered features.,MIT,text; text,\"Issues like allocation, high-risk scenarios, misinformation, generation of harmful content and misuse should be monitored and addressed.\",Phi-3,USA,Microsoft,\"The model should not be used for high-risk scenarios without adequate evaluation and mitigation techniques for accuracy, safety, and fairness.\",The model underwent post-training processes viz. 
supervised fine-tuning and direct preference optimization to increase its capability in following instructions and aligning to safety measures.,,14B parameters,model,https://arxiv.org/abs/2404.14219,https://huggingface.co/microsoft/Phi-3-medium-128k-instruct,unknown,unknown,unknown,,,,,,,\nclosed,Evaluated by comparing climate predictions to actual happened events.,2024-05-28,,[],Aurora is a large-scale foundation model of the atmosphere trained on over a million hours of diverse weather and climate data.,,,,,unknown,text; climate forecasts,,Aurora,USA,Microsoft,,,,1.3B parameters,model,https://arxiv.org/pdf/2405.13063,,unknown,32 A100 GPUs,unknown,,,,,,,\nclosed,\"Evaluated on a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, with Prov-GigaPath demonstrating SoTA performance in 25 out of 26 tasks.\",2024-05-22,,['GigaPath'],Prov-GigaPath is a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles.,,,,,unknown,image; embeddings,,Prov-GigaPath,USA,Microsoft,,,,unknown,model,https://www.nature.com/articles/s41586-024-07441-w,,unknown,4 80GB A100 GPUs,2 days,,,,,,,\nopen,\"The model was evaluated across a variety of public benchmarks, comparing with a set of models including Mistral-Nemo-12B-instruct-2407, Llama-3.1-8B-instruct, Gemma-2-9b-It, Gemini-1.5-Flash, and GPT-4o-mini-2024-07-18. It achieved a similar level of language understanding and math as much larger models. It also displayed superior performance in reasoning capability, even with only 6.6B active parameters. It was also evaluated for multilingual tasks.\",2024-09-08,,['Phi-3 dataset'],\"Phi-3.5-MoE is a lightweight, state-of-the-art open model built upon datasets used for Phi-3 - synthetic data and filtered publicly available documents, with a focus on very high-quality, reasoning dense data. It supports multilingual and has a 128K context length in tokens. 
The model underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure instruction adherence and robust safety measures.\",,Unknown,,\"The model is intended for commercial and research use in multiple languages. It is designed to accelerate research on language and multimodal models, and for use as a building block for generative AI powered features. It is suitable for general purpose AI systems and applications which require memory/compute constrained environments, latency bound scenarios, and strong reasoning.\",MIT,text; text,Unknown,Phi-3.5-MoE,USA,Microsoft,\"The model should not be used for downstream purposes it was not specifically designed or evaluated for. Developers should evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high risk scenarios.\",\"The model was enhanced through supervised fine-tuning, proximal policy optimization, and direct preference optimization processes for safety measures.\",,61B parameters (sparse); 6.6B active parameters,model,https://huggingface.co/microsoft/Phi-3.5-MoE-instruct,https://huggingface.co/microsoft/Phi-3.5-MoE-instruct,Unknown,Unknown,Unknown,,,,,,,\nopen,Phi-4 outperforms comparable and larger models on math related reasoning.,2024-12-13,,[],\"the latest small language model in Phi family, that offers high quality results at a small size (14B parameters).\",,unknown,,\"Specialized in complex reasoning, particularly good at math problems and high-quality language processing.\",unknown,text; text,Azure AI evaluations in AI Foundry enable developers to iteratively assess the quality and safety of models and applications using built-in and custom metrics to inform mitigations.,Phi-4,USA,Microsoft,unknown,Building AI solutions responsibly is at the core of AI development at Microsoft. 
We have made our robust responsible AI capabilities available to customers building with Phi models.,,14B parameters,model,https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090,unknown,unknown,unknown,unknown,,,,,,,\nopen,,2023-05-03,,[],\"RWKV World 4 is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).\",,https://huggingface.co/RWKV/rwkv-4-world-7b/discussions,,,Apache 2.0,text; text,unknown,RWKV World 4,China,RWKV,,,,7B parameters (dense),model,https://huggingface.co/RWKV/rwkv-4-world-7b,https://huggingface.co/RWKV/rwkv-4-world-7b,unknown,unknown,unknown,,,,,,,\nopen,,2023-05-15,,[],\"RWKV 4 Pile is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).\",,https://huggingface.co/RWKV/rwkv-4-14b-pile/discussions,,,Apache 2.0,text; text,unknown,RWKV 4 Pile,China,RWKV,,,,14B parameters (dense),model,https://huggingface.co/RWKV/rwkv-4-14b-pile,https://huggingface.co/RWKV/rwkv-4-14b-pile,unknown,unknown,unknown,,,,,,,\nopen,,2023-12-16,,[],\"RWKV World 5 is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).\",,https://huggingface.co/RWKV/rwkv-5-world-3b/discussions,,,Apache 2.0,text; text,unknown,RWKV World 5,China,RWKV,,,,3B parameters (dense),model,https://huggingface.co/RWKV/rwkv-5-world-3b,https://huggingface.co/RWKV/rwkv-5-world-3b,unknown,unknown,unknown,,,,,,,\nopen,The model was evaluated extensively across a wide range of public and in-house benchmarks. The comparative analysis showed that the performance of EXAONE 3.0 was competitive in English and excellent in Korean compared to other large language models of a similar size.,2024-09-08,,['MeCab'],EXAONE 3.0 is an instruction-tuned large language model developed by LG AI Research. 
It demonstrates notably robust performance across a range of tasks and benchmarks. It has been fine-tuned to be capable of complex reasoning and has a particular proficiency in Korean. The released 7.8B parameter model is designed to promote open research and innovation.,,Unknown,,\"The model was intended for non-commercial and research purposes. The capabilities of the model allow for use cases that involve advanced AI and language processing tasks, particularly in fields requiring proficiency in English and Korean.\",Unknown,text; text,Unknown,EXAONE 3.0 Instruction Tuned Language Model,South Korea,LG AI Research,Commercial use is not intended for this model. Its intended use is for non-commercial research and innovation.,\"Extensive pre-training on a diverse dataset, and advanced post-training techniques were employed to enhance instruction-following capabilities. The model was also trained to fully comply with data handling standards.\",,7.8B parameters (dense),model,https://arxiv.org/pdf/2408.03541,unknown,Unknown,Unknown,Unknown,,,,,,,\nopen,,2022-10-11,,\"['SARS-CoV-2 genome dataset', 'BV-BRC dataset']\",,,,,,MIT,text; genome sequence,,GenSLM,USA,Argonne National Laboratory,,,,25B parameters (dense),model,https://www.biorxiv.org/content/10.1101/2022.10.10.511571v1,,,,,,,,,,,\nlimited,,2023-03-19,,[],Cformers is a set of transformers that act as an API for AI inference in code.,,,,,MIT,,,Cformers,USA,Nolano,,,,,application,https://www.nolano.org/services/Cformers/,,,,,,,,,,,\nlimited,The model can produce problematic outputs (especially when prompted to do so).,2024-11-21,,['Llama 3.1'],\"Tülu3 is a leading instruction following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques.\",,unknown,,Tülu3 is intended for research and educational use.,Llama 3.1 Community License Agreement,text; text,unknown,Llama 3.1 Tulu 3,USA,Allen Institute for AI,The model can produce 
problematic outputs (especially when prompted to do so).,\"The Tülu3 models have limited safety training, but are not deployed automatically with in-the-loop filtering of responses like ChatGPT.\",,70B parameters,model,https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B,https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on text and code benchmarks in comparison to other models.,2023-10-10,,\"['LLaMA 2', 'The Stack', 'RefinedWeb', 'RedPajama', 'Common Crawl', 'Wikipedia', 'ArXiv']\",Lemur is an openly accessible language model optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents.,,https://huggingface.co/OpenLemur/lemur-70b-v1/discussions,,,LLaMA2,\"code, text; code, text\",,Lemur,USA,OpenLemur,,,,70B parameters (dense),model,https://arxiv.org/pdf/2310.06830.pdf,https://huggingface.co/OpenLemur/lemur-70b-v1,unknown,TPUv4-512 pod,unknown,,,,,,,\nopen,Evaluated on text and code benchmarks in comparison to other models.,2023-10-10,,\"['Lemur', 'OpenAssistant 1', 'OpenOrca', 'ShareGPT & ChatLogs', 'Evol-CodeAlpaca data']\",Lemur-Chat is an openly accessible language model optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents.,,https://huggingface.co/OpenLemur/lemur-70b-chat-v1/discussions,,,CC-BY-NC-4.0,text; text,,Lemur-Chat,USA,OpenLemur,,,,70B parameters (dense),model,https://arxiv.org/pdf/2310.06830.pdf,https://huggingface.co/OpenLemur/lemur-70b-chat-v1,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluations show that GLM-4, 1) closely rivals or outperforms GPT-4 in terms of general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3) matches GPT-4 Turbo (128K) and Claude 3 for long context tasks, and 4) outperforms GPT-4 in Chinese alignments as measured by AlignBench.\",2023-07-02,,[],\"ChatGLM is an evolving family of 
large language models that have been developed over time. The GLM-4 language series, includes GLM-4, GLM-4-Air, and GLM-4-9B. They are pre-trained on ten trillions of tokens mostly in Chinese and English and are aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback. GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use.\",,Unknown,,\"General language modeling, complex tasks like accessing online information via web browsing and solving math problems using Python interpreter.\",Apache 2.0,text; text,Unknown,ChatGLM,unknown,\"Team GLM, Zhipu AI, Tsinghua University\",Unknown,\"High-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback.\",,9B parameters,model,https://arxiv.org/pdf/2406.12793,https://huggingface.co/THUDM/glm-4-9b,Unknown,Unknown,Unknown,,,,,,,\nopen,https://huggingface.co/spaces/bigscience-data/roots-search,2022-06-06,https://huggingface.co/spaces/bigscience-data/bigscience-corpus,[],\"The Responsible Open-science Open-collaboration Text Sources (ROOTS) corpus, a 1.6TB dataset spanning 59 languages that was used to train the 176-billion-parameter BigScience Large Open-science Open-access Multilingual (BLOOM) language model.\",unknown,https://huggingface.co/spaces/bigscience-data/roots-search/discussions,See section 2 of the paper.,\"To empower large-scale monolingual and multilingual modeling projects with both the data and the processing tools, as well as stimulate research around this large multilingual corpus.\",custom,\"code, text\",,ROOTS,International,BigScience,,\"Data collection involved merging and deduplicating searches to remove menus, HTML tags. 
Further, a quality improvement pipeline was implemented.\",['https://huggingface.co/spaces/bigscience-data/roots-search'],1.6TB,dataset,https://openreview.net/forum?id=UoEw6KigkUn,,,,,,,,,,,\nopen,See the paper.,2022-10-15,https://huggingface.co/datasets/bigscience/P3,[],The Public Pool of Prompts (P3) are prompts written in a unified format used to train T0++.,\"We conservatively decided not to prompt datasets that contain potentially harmful content (for instance, datasets built on social media content).\",Point of Contact is [Victor Sanh](https://huggingface.co/datasets/bigscience/P3),The Public Pool of Prompts relies on the Hugging Face Dataset library. Any public dataset in the Datasets library can be prompted. We select the datasets that have at least one subset in English and exclude datasets containing (predominantly) non-natural language examples.,Multitask finetuning of language models.,Apache 2.0,text,,P3,International,BigScience,,\"Data collection involved merging and deduplicating searches to remove menus, HTML tags. Further, a quality improvement pipeline was implemented.\",['https://huggingface.co/datasets/bigscience/P3/viewer'],2000 prompts,dataset,https://arxiv.org/pdf/2110.08207.pdf,,,,,,,,,,,\nopen,See the paper.,2022-11-03,https://huggingface.co/datasets/bigscience/xP3,['P3'],\"xP3 (Crosslingual Public Pool of Prompts) is a collection of prompts and datasets across 46 languages & 16 NLP tasks. 
It is used for the training of BLOOMZ and mT0, multilingual language models capable of following human instructions in dozens of languages zero-shot.\",\"We conservatively decided not to prompt datasets that contain potentially harmful content (for instance, datasets built on social media content).\",Point of Contact is [Niklas Muennighoff](https://huggingface.co/datasets/bigscience/xP3),xP3 adds 28 multilingual datasets to P3 based on the P3 task taxonomy.,Multitask finetuning of language models.,Apache 2.0,\"code, text\",,xP3,International,BigScience,,\"Data collection involved merging and deduplicating searches to remove menus, HTML tags. Further, a quality improvement pipeline was implemented.\",['https://huggingface.co/datasets/bigscience/xP3/viewer'],9.4GB,dataset,https://arxiv.org/abs/2211.01786,,,,,,,,,,,\nopen,,2021-10-15,,\"['T5', 'P3']\",T0++ is a multitask fine-tuned language model based on T5.,,https://huggingface.co/bigscience/T0pp/discussions,,\"You can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction.\",Apache 2.0,text; text,,T0++,International,BigScience,,https://arxiv.org/pdf/2110.08207.pdf,,11B parameters (dense),model,https://arxiv.org/pdf/2110.08207.pdf,https://huggingface.co/bigscience/T0pp,0.9 tCO2e,Jean Zay (v3-512),27 hours,,,,,,,\nopen,,2022-07-12,,['ROOTS'],BLOOM is an autoregressive multilingual language model.,,https://huggingface.co/bigscience/bloom/discussions,,This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further fine-tuned for specific tasks. Use cases below are not exhaustive.,BigScience RAIL v1.0,\"code, text; code, text\",,BLOOM,International,BigScience,\"Using the model in high-stakes settings is out of scope for this model (e.g. 
biomedical/political/legal/finance domains, evaluating or scoring individuals). The model is not designed for critical decisions nor uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct. Misuse. Intentionally using the model for harm, violating human rights, or other kinds of malicious activities, is a misuse of this model (e.g. spam generation, disinformation, disparagement, deception, surveillance).\",,,176B parameters (dense),model,https://arxiv.org/abs/2211.05100,,25 tCO2e,Jean Zay (48 * 8xA100 80GB nodes),7039 petaflop/s-days,,,,,,,\nopen,https://huggingface.co/bigscience/bloomz#evaluation,2021-10-15,,\"['mT5', 'xP3']\",mT0 is a multitask fine-tuned multilingual language model based on mT5.,,https://huggingface.co/bigscience/bloomz/discussions,,We recommend using the model to perform tasks expressed in natural language.,BigScience RAIL v1.0,text; text,,mT0,International,BigScience,,https://arxiv.org/pdf/2211.01786.pdf,,13B parameters (dense),model,https://arxiv.org/pdf/2110.08207.pdf,https://huggingface.co/bigscience/T0pp,unknown,\"Jean Zay (288 A100 80GB GPUs with 8 GPUs per node (36 nodes) using NVLink 4 inter-gpu connects, 4 OmniPath links)\",unknown,,,,,,,\nopen,https://huggingface.co/bigscience/bloomz#evaluation,2022-11-03,,\"['BLOOM', 'xP3']\",BLOOMZ is a multitask fine-tuned autoregressive multilingual language model.,,https://huggingface.co/bigscience/bloomz/discussions,,We recommend using the model to perform tasks expressed in natural language.,BigScience RAIL v1.0,\"code, text; code, text\",,BLOOMZ,International,BigScience,,https://arxiv.org/pdf/2211.01786.pdf,,176B parameters (dense),model,https://arxiv.org/pdf/2211.01786.pdf,https://huggingface.co/bigscience/bloomz,unknown,\"Jean Zay (288 A100 80GB GPUs with 8 GPUs per node (36 nodes) using NVLink 4 inter-gpu connects, 4 OmniPath links)\",unknown,,,,,,,\nlimited,\"Evaluated on MMLU, GSM8K, HumanEval, 
and GPQA benchmarks, among others.\",2024-02-12,,[],\"Reka Flash is a multimodal, multilingual, state-of-the-art 21B model trained entirely from scratch.\",,,,,unknown,\"image, text, video; text\",unknown,Reka Flash,USA,Reka,,,,21B parameters (dense),model,https://reka.ai/reka-flash-an-efficient-and-capable-multimodal-language-model/,,unknown,unknown,unknown,,,,,,,\nlimited,\"Reka Core was evaluated against leading models such as OpenAIs GPT-4, Claude-3 Opus, and Gemini Ultra on a variety of tasks and metrics including multimodal and human evaluation conducted by a third party. It was found to be competitive or even surpassing these models.\",2024-04-15,,[],\"Reka Core is a frontier-class multimodal language model comparable to industry leaders. It has powerful capabilities including multimodal understanding (including images, videos, and audio), superb reasoning abilities, code generation, and multilinguality with proficiency in 32 languages.\",,unknown,,\"Reka Core can be used in e-commerce, social media, digital content and video games, healthcare, robotics, and other industries for tasks that require multimodal understanding, coding, complex reasoning, and more.\",unknown,\"audio, image, text, video; text\",unknown,Reka Core,USA,Reka,unknown,,,unknown,model,https://www.reka.ai/news/reka-core-our-frontier-class-multimodal-language-model,,unknown,thousands of GPUs,few months,,,,,,,\nopen,Mochi 1 sets a new best-in-class standard for open-source video generation. It also performs very competitively with the leading closed models... We benchmark prompt adherence with an automated metric using a vision language model as a judge following the protocol in OpenAI DALL-E 3. 
We evaluate generated videos using Gemini-1.5-Pro-002.,2025-01-14,,\"['DDPM', 'DreamFusion', 'Emu Video', 'T5-XXL']\",\"Mochi 1 is an open-source video generation model designed to produce high-fidelity motion and strong prompt adherence in generated videos, setting a new standard for open video generation systems.\",,unknown,,\"Advance the field of video generation and explore new methodologies. Build innovative applications in entertainment, advertising, education, and more. Empower artists and creators to bring their visions to life with AI-generated videos. Generate synthetic data for training AI models in robotics, autonomous vehicles and virtual environments.\",Apache 2.0,text; video,unknown,Mochi 1,USA,Genmo,unknown,robust safety moderation protocols in the playground to ensure that all video generations remain safe and aligned with ethical guidelines.,,10B parameters,model,https://www.genmo.ai/blog,unknown,unknown,unknown,unknown,,,,,,,\nlimited,,2023-03-14,,['GPT-4 API'],An AI-powered assistant that functions as both a virtual tutor for students and a classroom assistant for teachers.,,,,,unknown,,,Khanmigo,USA,Khan Academy,,,,,application,https://www.khanacademy.org/khan-labs#khanmigo,,,,,,,,,,,\nlimited,,2023-02-23,,\"['ChatGPT API', 'Sonantic AI']\",\"The DJ is a personalized AI guide that knows you and your music taste so well that it can choose what to play for you. 
This feature, first rolling out in beta, will deliver a curated lineup of music alongside commentary around the tracks and artists we think you’ll like in a stunningly realistic voice.\",,,,,custom,,,AI DJ,Sweden,Spotify,,,,,application,https://newsroom.spotify.com/2023-02-22/spotify-debuts-a-new-ai-dj-right-in-your-pocket/,,,,,,,,,https://www.spotify.com/us/legal/end-user-agreement/,,\nlimited,,,,['OpenAI API'],\"Viable analyzes qualitative consumer feedback and provides summary feedback to companies.\n\",,unknown,,\"Intended to be used by companies to digest qualitative consumer feedback.\n\",unknown,,unknown,Viable,USA,Viable,\"Prohibited uses are listed in the Terms of Service [[Terms of Service]](https://www.askviable.com/terms-of-service). The terms don't include statements specific to the use of the content generated by the system or GPT-3.\n\",unknown,,,application,https://www.askviable.com/,,,,,unknown,unknown,unknown,\"Question and answer, summarization, sentiment analysis, topic identification\",https://www.askviable.com/terms-of-service,unknown,\nclosed,,2023-02-20,,['LAION-5B'],,\"We eliminate duplicates, low resolution images, and images potentially contain harmful content from the LAION dataset.\",,,,unknown,\"image, text\",,LAION-1B,China,Alibaba,,,[],1B image-text pairs,dataset,https://arxiv.org/pdf/2302.09778.pdf,,,,,,,,,,,\nclosed,,2023-02-20,,\"['ImageNet', 'WebVision', 'LAION-1B']\",,,,,,unknown,\"image, text; image\",,Composer,China,Alibaba,,,,4.4B parameters (dense),model,https://arxiv.org/pdf/2302.09778.pdf,,,,,,,,,,,\nopen,\"Evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, etc.\",2023-08-03,,[],\"QWEN is a comprehensive language model series that encompasses distinct models with varying parameter counts. Qwen series, now including Qwen, the base language models, namely Qwen-7B and Qwen-14B, as well as Qwen-Chat, the chat models, namely Qwen-7B-Chat and Qwen-14B-Chat. 
\",,,,,custom,\"image, text; text\",\"Governed by the laws of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. And The People's Courts in Hangzhou City shall have exclusive jurisdiction over any dispute arising out of this Agreement.\",Qwen,China,Alibaba,,\"They filter out low-quality data, they employ a combination of rule-based and machine-learning-based methods. Specifically, they use multiple models to score the content, including language models, text-quality scoring models, and models for identifying potentially offensive or inappropriate content. They also manually sample texts from various sources and review them to ensure their quality. To further enhance the quality of our data, they selectively up-sample data from certain sources, to ensure that our models are trained on a diverse range of high-quality content.\",,14B parameters (dense),model,https://arxiv.org/abs/2309.16609,https://huggingface.co/Qwen,unknown,,,,,,,,,\nopen,\"Base models are evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, CMMLU, all standard English and Chinese benchmarks, and chat models are evaluated on Chatbot Arena, AlpacaEval, MT-Bench, etc.\",2024-02-04,,[],\"Qwen 1.5 is the next iteration in their Qwen series, consisting of Transformer-based large language models pretrained on a large volume of data, including web texts, books, codes, etc.\",,https://huggingface.co/Qwen/Qwen1.5-72B/discussions,,,custom,text; text,unknown,Qwen 1.5,China,Alibaba,,unknown,,72B parameters (dense),model,https://qwenlm.github.io/blog/qwen1.5/,https://huggingface.co/Qwen/Qwen1.5-72B,unknown,unknown,unknown,,,,,,,\nopen,\"Base models are evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, CMMLU, all standard English and Chinese benchmarks, and chat models are evaluated on Chatbot Arena, AlpacaEval, MT-Bench, etc.\",2024-03-28,,[],\"Qwen 1.5 is the next iteration in their 
Qwen series, consisting of Transformer-based large language models pretrained on a large volume of data, including web texts, books, codes, etc. Qwen 1.5 MoE is the MoE model of the Qwen 1.5 series.\",,https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B/discussions,,,custom,text; text,unknown,Qwen 1.5 MoE,China,Qwen Team,,unknown,,14B parameters with 2.7B parameters for activation (MoE),model,https://qwenlm.github.io/blog/qwen-moe/,https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B,unknown,unknown,unknown,,,,,,,\nopen,\"The model was evaluated on 3 benchmarks (MMLU for English, M3Exam (M3e) for English, Chinese, Vietnamese, Indonesian, and Thai, and VMLU for Vietnamese) and it outperformed GPT-3 and Vistral-7B-chat models across these benchmarks in the given languages.\",2024-04-12,,['Gemma'],SeaLLM v2.5 is a multilingual large language model for Southeast Asian (SEA) languages.,,https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5/discussions,,\"The model is intended for multilingual tasks such as knowledge retrieval, math reasoning, and instruction following. Also, it could be used to provide multilingual assistance.\",custom,text; text,unknown,SeaLLM v2.5,China,\"DAMO Academy, Alibaba\",\"The model should not be used in a way that could lead to inaccurate, misleading or potentially harmful generation. 
Users should comply with local laws and regulations when deploying the model.\",\"Despite efforts in red teaming and safety fine-tuning and enforcement, the creators suggest, developers and stakeholders should perform their own red teaming and provide related security measures before deployment, and they must abide by and comply with local governance and regulations.\",,7B parameters,model,https://github.com/DAMO-NLP-SG/SeaLLMs,https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5,unknown,unknown,unknown,,,,,,,\nclosed,,2023-03-30,https://arxiv.org/pdf/2303.17564.pdf#section.2,[],\"A comprehensive dataset consisting of a range of English financial documents including news, filings, press releases, web-scraped financial documents, and social media drawn from the Bloomberg archives that was used to train the BloombergGPT model.\",,,\"FinPile consists of English financial documents. Authors utilize the Bloomberg\nTerminal, which is an extensive collection of curated and maintained documents,\nto create the FinPile dataset. Each document in FinPile is time-stamped, with\ndates ranging from 2007-03-01 to 2022-07-31.\nTypes of data included are given below:\n  1. Web (298B tokens) - Includes Bloomberg's web crawl focused on high-quality\nwebsites that have financially relevant information. This makes up the majority\nof FinPile.\n  2. News (38B tokens) - Includes all news sources relevant to the financial\ncommunity, excluding news articles written by Bloomberg journalists. Overall,\nthere are hundreds of English news sources in FinPile including \"\"Bloomberg\nTranscripts\"\", which are transcripts of Bloomberg TV news.\n  3. Filings (14B tokens) - Includes financial statements prepared by (public)\ncompanies and made available to the general public.  In the dataset, a majority\nof the filings come from EDGAR, which is the SEC's online database.\n  4. Press (9B tokens) - Includes press releases typically issued by companies\nthat are financially relevant.\n  5. 
Bloomberg (5B tokens) - Includes Bloomberg authored news and other documents\nsuch as opinions and analyses. The largest sources are “Bloomberg News” and\n“Bloomberg First Word”, the Bloomberg-authored wire of real-time news.\n\",Used to train the BloombergGPT model.,unknown,text,,FinPile,USA,Bloomberg,,,[],363B tokens,dataset,https://arxiv.org/pdf/2303.17564.pdf#section.2,,,,,,,,,,,\nclosed,\"Authors evaluate the performance of BloombergGPT on two broad categories of tasks, finance-specific and general purpose, on several standard benchmarks. They compare BloombergGPT to the three closest models: GPT-NeoX, OPT-66B and BLOOM-176B. They also report results from the original GPT-3 whenever externally available. They conclude \"\"We achieve strong results on general LLM benchmarks and outperform comparable models on financial tasks. We attribute this, in decreasing order of impact, to 1. a well-curated internal dataset, 2. our unique choice in tokenizer, and 3. an up-to-date architecture.\"\"\n\",2023-03-30,,\"['FinPile', 'The Pile', 'C4', 'Wikipedia']\",BloombergGPT is a 50 billion parameter large language model that is specifically trained on a wide range of financial data to support a diverse set of natural language processing tasks within the financial industry.,,,,\"\"\"This model will assist Bloomberg in improving existing financial NLP tasks, such as sentiment analysis, named entity recognition, news classification, and question answering, among others. Furthermore, BloombergGPT will unlock new opportunities for marshalling the vast quantities of data available on the Bloomberg Terminal to better help the firm's customers, while bringing the full potential of AI to the financial domain.\"\"\n\",unknown,text; text,,BloombergGPT,USA,Bloomberg,,\"Authors state the following:\n- \"\"To provide natural language applications to the financial community, we\n  have developed a rigorous risk and testing assessment process. 
This process\n  includes careful annotation guidelines Tseng et al. (2020), pre-launch review\n  at multiple levels by the central risk and compliance organizations, and\n  by the product leaders (e.g., the newsroom) as applicable, and post-launch\n  monitoring. Moreover, we conduct our research, development, and deployment\n  of NLP and AI systems in accordance with all applicable regulations.\"\"\n- \"\"Similarly, toxicity and bias are areas where, as a company, we take extraordinary\n  care with any content we produce, whether from humans or machines. Since\n  the measurement of toxicity and bias in our model depends on its application\n  areas, quantifying the potential for the generation of harmful language\n  remains an open question. We are particularly interested in studying whether\n  FinPile, which is cleaner and contains fewer examples of overtly biased\n  or toxic language (e.g., Press Releases), reduces the proclivity of the\n  model to generate inappropriate content.\"\"\n\",,50B parameters (dense),model,https://arxiv.org/abs/2303.17564,,unknown,64 Amazon EC2 p4d.24xlarge instances each with 8 NVIDIA 40GB A100 GPUs (i.e. total 512 A100 GPUs),53 days,,,,,,,\nclosed,\"MassiveText data was analyzed for toxicity, language distribution, URL breakdown, and tokenizer compression rates on the subsets [[Section A.2]](https://arxiv.org/pdf/2112.11446.pdf#subsection.A.2).\n\",2021-12-08,https://arxiv.org/pdf/2112.11446.pdf#subsection.A.5,[],\"The MassiveText dataset was used to train the Gopher model.\n\",\"Documents that are not in English are excluded.\n\",unknown,\"MassiveText data come from 6 sources: MassiveWeb (48%), Books (27%), C4 (10%), News (10%), GitHub (3%), and Wikipedia (2%). 
MassiveWeb is a web text corpus curated for MassiveText.\n\",\"Pre-training of language models by DeepMind researchers [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.C).\n\",unknown,\"code, text\",unknown,MassiveText,UK,Google Deepmind,unknown,\"The authors use simple heuristics for filtering low quality documents as opposed to relying on a classifier based on a \"\"gold\"\" set such as the English Wikipedia, which could \"\"inadvertently bias towards a certain demographic or erase certain dialects or sociolects from representation.\"\" MassiveWeb subset was filtered using Google’s SafeSearch filter, preferring it over word filters that \"\"disproportionately filter out inoffensive content associated with minority groups. MassiveWeb was filtered further for word or phrase repetitions. All the subsets were filtered for document deduplication and test set contamination\"\" [[Appendix A]](https://arxiv.org/pdf/2112.11446.pdf#appendix.A).\n\",[],10.5 TB,dataset,https://arxiv.org/pdf/2112.11446.pdf,,,,,,,,,,,\nclosed,,2022-04-29,https://arxiv.org/pdf/2204.14198.pdf#appendix.F,[],\"M3W (MassiveWeb) is a dataset used to train Flamingo and other vision-language models, and was created by researchers and engineers.\n\",unknown,,\"M3W has interleaved images (185M) and text (182GB) from the web.\n\",\"Pre-training of vision and language models by DeepMind researchers [[Datasheet]](https://arxiv.org/pdf/2204.14198.pdf#appendix.F).\n\",unknown,\"code, text\",unknown,M3W,UK,Google Deepmind,unknown,\"The authors provide a basic description of data processing and cleaning.\n\",[],\"182GB Text, 185M Images\",dataset,https://arxiv.org/pdf/2204.14198.pdf,,,,,,,,,,,\nclosed,\"The Gato dataset compiles many datasets introduced in prior works, with associated analyses.\n\",2022-05-12,,['MassiveText'],\"The Gato datasets are a collection of data used to train the Gato model.\n\",,,\"The full composition of the dataset across individual sources can be found in the 
paper.\n\",unknown,unknown,\"image, text, robotics trajectories, simulated control tasks\",,Gato dataset,UK,Google Deepmind,unknown,unknown,[],\"10.5 TB Text, 2.2B Text-Image pairs, 1.5T tokens of simulated control, 500k robotics trajectories\",dataset,https://www.deepmind.com/blog/a-generalist-agent,,,,,,,,,,,\nopen,,2021-07-15,,['Protein Data Bank'],AlphaFold2 is a protein language model trained on protein sequences,,,,,Apache 2.0,amino acid sequence; protein structure,,AlphaFold2,UK,Google Deepmind,,,,93M parameters (dense),model,https://www.nature.com/articles/s41586-021-03819-2,,,128 TPUv3 cores,11 days,,,,,,,\nclosed,\"Model performance was evaluated on image and video datasets primarily, including dialogue.\n\",2022-04-29,,\"['M3W', 'ALIGN', 'LTIP', 'VTP', 'Chinchilla']\",\"Flamingo is a Visual Language Model using the Transformer architecture that is intended for few-shot learning.\n\",,,,\"The intended uses are stated in the model card: \"\"The primary use is research on visual language models (VLM), including: research on VLM applications like classification, captioning or visual question answering, understanding how strong VLMs can contribute to AGI, advancing fairness and safety research in the area of multimodal research, and understanding limitations of current large VLMs.\"\" [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E).\n\",unknown,\"image, text; text\",unknown,Flamingo,UK,Google Deepmind,\"The model card lists the following as out of scope uses of the model: \"\"Uses of the model for visually conditioned language generation in harmful or deceitful settings. 
Broadly speaking, the model should not be used for downstream applications without further safety and fairness mitigations specific to each application.\"\" [[Model Card]](https://arxiv.org/pdf/2204.14198.pdf#appendix.E).\n\",,,80B parameters (dense),model,https://arxiv.org/pdf/2204.14198.pdf,https://arxiv.org/pdf/2204.14198.pdf#appendix.E,unknown,TPU,15 days on 1536 TPUs,,,,,,,\nclosed,,2022-02-02,,[],AlphaCode is an autoregressive language model trained on code,,,,,unknown,text; code,,AlphaCode,UK,Google Deepmind,,,,41B parameters (dense),model,https://arxiv.org/abs/2203.07814,,,,,,,,,,,\nclosed,\"Model performance was evaluated and analyzed on 152 NLP tasks including: Language Modelling (20), Reading Comprehension (3), Fact Checking (3), Question Answering (3), Common Sense (4), MMLU (57), BIG-bench (62) [[Section 4]](https://arxiv.org/pdf/2112.11446.pdf#section.4); on toxicity and bias datasets [[Section 5]](https://arxiv.org/pdf/2112.11446.pdf#section.5); and on dialogue tasks [[Section 6]](https://arxiv.org/pdf/2112.11446.pdf#section.6).\n\",2021-12-08,,['MassiveText'],\"Gopher is an autoregressive language model based on the Transformer architecture with two modifications: using RMSNorm instead of LayerNorm and using relative positional encoding scheme instead of absolute positional encodings [[Section 3]](https://arxiv.org/pdf/2112.11446.pdf#subsection.3.1).\n\",,\"The feedback for the model can be provided at the email linked in the model card, geoffreyi at google.com [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B).\n\",,\"The intended uses are stated in the Gopher model card: \"\"The primary use is research on language models, including: research on NLP applications like machine translation and question answering, understanding how strong language models can contribute to AGI, advancing fairness and safety research, and understanding limitations of current LLMs\"\" [[Model 
Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B).\n\",unknown,text; code,unknown,Gopher,UK,Google Deepmind,\"The model card lists the following as out of scope uses of the model: \"\"for language generation in harmful or deceitful settings. More generally, the model should not be used for downstream applications without further safety and fairness mitigations\"\" [[Model Card]](https://arxiv.org/pdf/2112.11446.pdf#appendix.B).\n\",,,280B parameters (dense),model,https://arxiv.org/pdf/2112.11446.pdf,https://arxiv.org/pdf/2112.11446.pdf#appendix.B,380 tCO2e,TPUv3 pods,7303.24 petaflop/s-day,,,,,,,\nclosed,\"Model performance was evaluated and analyzed on many NLP tasks including language modeling, reading comprehension, question answering, commonsense-intensive tasks, and the BIG-Bench and MMLU meta-benchmarks.\n\",2022-03-29,,['MassiveText'],\"Chinchilla is an autoregressive language model based on the Transformer architecture with improved scaling laws.\n\",,\"The feedback for the model can be provided at the email linked in the model card, {jordanhoffmann, sborgeaud, amensch,sifre} at deepmind.com [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I).\n\",,\"The intended uses are stated in the Chinchilla model card: \"\"The primary use is research on language models, including: research on the scaling behaviour of language models along with those listed in Gopher paper\"\" [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I).\n\",unknown,text; code,Unknown,Chinchilla,UK,Google Deepmind,\"The model card lists the following as out of scope uses of the model: \"\"for language generation in harmful or deceitful settings. 
More generally, the model should not be used for downstream applications without further safety and fairness mitigations\"\" [[Model Card]](https://arxiv.org/pdf/2203.15556.pdf#appendix.I).\n\",,,70B parameters (dense),model,https://arxiv.org/pdf/2203.15556.pdf,https://arxiv.org/pdf/2203.15556.pdf,Unknown,TPUv3/TPUv4 pods,7303.24 petaflop/s-day,,,,,,,\nclosed,\"Model performance was evaluated on simulated and robotics tasks primarily, including out-of-distribution and skill generalization.\n\",2022-05-12,,['Gato dataset'],\"Gato is a generalist agent based on sequence modeling using the Transformer architecture to implement a multi-modal, multi-task, multi-embodiment generalist policy.\n\",,\"The feedback for the model can be provided at the email linked in the model card, reedscot at google.com [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A).\n\",,\"The intended uses are stated in the Gato model card: \"\"Learn to accomplish a wide variety of tasks from expert demonstrations, such as playing video games, controlling simulated embodiments, and real world block stacking.\"\" [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A).\n\",unknown,\"image, text; image, text, robotics trajectories\",unknown,Gato,UK,Google Deepmind,\"The model card lists the following as out of scope uses of the model: \"\"Not intended for commercial or production use. 
Military uses are strictly prohibited.\"\" [[Model Card]](https://openreview.net/pdf?id=1ikK0kHjvj#appendix.A).\n\",,,1.2B parameters (dense),model,https://www.deepmind.com/blog/a-generalist-agent,https://openreview.net/pdf?id=1ikK0kHjvj#appendix.B,unknown,16x16 TPU v3 slice,4 days on a 16x16 TPU v3 slice,,,,,,,\nclosed,,2022-09-28,,\"['Chinchilla', 'Google Search', 'Sparrow Rule reward model', 'Sparrow Preference reward model']\",,,,,,unknown,text; text,,Sparrow,UK,Google Deepmind,,,,70B parameters (dense),model,https://arxiv.org/abs/2209.14375,,,,,,,,,,,\nclosed,,2021-12-08,,['MassiveText'],,,,,,unknown,text; text,,RETRO,UK,Google Deepmind,,,,7.5B parameters (dense),model,https://arxiv.org/abs/2112.04426,,,,,,,,,,,\nclosed,,2022-09-28,,\"['Chinchilla', 'Sparrow adversarial probing dataset']\",,,,,,unknown,text; text,,Sparrow Rule reward model,UK,Google Deepmind,,,,70B parameters (dense),model,https://arxiv.org/abs/2209.14375,,,,,,,,,,,\nclosed,,2022-09-28,,\"['Chinchilla', 'Sparrow response preference dataset']\",,,,,,unknown,text; text,,Sparrow Preference reward model,UK,Google Deepmind,,,,70B parameters (dense),model,https://arxiv.org/abs/2209.14375,,,,,,,,,,,\nclosed,,2022-09-28,,['Chinchilla'],,,,,,unknown,text,,Sparrow adversarial probing dataset,UK,Google Deepmind,,,[],27k ratings,dataset,https://arxiv.org/abs/2209.14375,,,,,,,,,,,\nclosed,,2022-09-28,,['Chinchilla'],,,,,,unknown,text,,Sparrow response preference dataset,UK,Google Deepmind,,,[],72k comparisons,dataset,https://arxiv.org/abs/2209.14375,,,,,,,,,,,\nclosed,,2022-03-16,,\"['Gopher', 'Google Search', 'GopherCite reward model']\",,,,,,unknown,text; text,,GopherCite,UK,Google Deepmind,,,,280B parameters (dense),model,https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf,,,,,,,,,,,\nclosed,,2022-03-16,,\"['Gopher', 'GopherCite Preference 
dataset']\",,,,,,unknown,text; text,,GopherCite reward model,UK,Google Deepmind,,,,7B parameters (dense),model,https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf,,,,,,,,,,,\nclosed,,2022-03-16,,\"['Gopher', 'Google Search']\",,,,,,unknown,text,,GopherCite Preference dataset,UK,Google Deepmind,,,[],33k response pairs,dataset,https://storage.googleapis.com/deepmind-media/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes/Teaching%20language%20models%20to%20support%20answers%20with%20verified%20quotes.pdf,,,,,,,,,,,\nclosed,,2022-09-29,,['Chinchilla'],,,,,,unknown,text; text,,Dramatron,UK,Google Deepmind,,,,70B parameters (dense),model,https://arxiv.org/abs/2209.14958,,,,,,,,,,,\nopen,Evaluated on evaluation trajectories and SoTA baselines using robotic data.,2023-07-28,,\"['PaLI-X', 'PaLM-E', 'RT-2 action tokens']\",RT-2 is a vision-language-action model for robotic actions that incorporates chain of thought reasoning.,,,,,unknown,\"text, video; text, robotics trajectories\",,RT-2,UK,Google Deepmind,,,,55B parameters (dense),model,https://arxiv.org/pdf/2307.15818.pdf,,,,,,,,,,,\nclosed,unknown,2023-11-16,,[],Lyria is DeepMind's most advanced AI music generation model to date.,,,,,unknown,text; music,,Lyria,UK,Google Deepmind,,Worked with artists and the music industry to ensure utility.,,unknown,model,https://deepmind.google/discover/blog/transforming-the-future-of-music-creation/,,unknown,unknown,unknown,,,,,,,\nclosed,Evaluated using only out-of-distribution image prompts for qualitative results.,2024-02-23,,[],\"Genie is a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.\",,,,,unknown,image; video,,Genie,unknown,Google DeepMind,,,,11B parameters 
(dense),model,https://sites.google.com/view/genie-2024,,unknown,unknown,unknown,,,,,,,\nlimited,\"The model was tested and evaluated on various prompts to assess its understanding of natural language, its ability to generate high-quality images in various formats and styles and generate fine details and complex textures. Red teaming and evaluations were conducted on topics including fairness, bias, and content safety.\",2024-05-14,,[],\"Imagen 3 is a high-quality text-to-image model, capable of generating images with better detail, richer lighting, and fewer distracting artifacts compared to previous models. Improved understanding of prompts allows for a wide range of visual styles and captures small details from longer prompts. It also understands prompts written in natural, everyday language, making it easier to use. Imagen 3 is available in multiple versions, optimized for different types of tasks, from generating quick sketches to high-resolution images.\",,unknown,,\"Generate high-quality images for various purposes, from photorealistic landscapes to textured oil paintings or whimsical claymation scenes. It is useful in situations where detailed visual representation is required based on the textual description.\",unknown,text; image,Through digital watermarking tool SynthID embedded in pixels for detection and identification.,Imagen 3,unknown,Google DeepMind,unknown,\"Extensive filtering and data labeling were used to minimize harmful content in datasets and reduce the likelihood of harmful outputs. Privacy, safety, and security technologies were leveraged in deploying the model, including watermarking tool SynthID.\",,unknown,model,https://deepmind.google/technologies/imagen-3/,,unknown,unknown,unknown,,,,,,,\nclosed,unknown,2024-05-14,,[],\"Veo is Google DeepMind's most capable video generation model to date. It generates high-quality, 1080p resolution videos that can go beyond a minute, in a wide range of cinematic and visual styles. 
It accurately captures the nuance and tone of a prompt, and provides an unprecedented level of creative control. The model is also capable of maintaining visual consistency in video frames, and supports masked editing.\",,Feedback from leading creators and filmmakers is incorporated to improve Veo's generative video technologies.,,\"Veo is intended to help create tools that make video production accessible to everyone. It can be used by filmmakers, creators, or educators for storytelling, education and more. Some of its features will also be brought to products like YouTube Shorts.\",unknown,text; video,unknown,Veo,unknown,Google DeepMind,unknown,\"Videos created by Veo are watermarked using SynthID, DeepMind's tool for watermarking and identifying AI-generated content, and passed through safety filters and memorization checking processes to mitigate privacy, copyright and bias risks.\",,unknown,model,https://deepmind.google/technologies/veo/,,unknown,unknown,unknown,,,,,,,\nlimited,\"The model was evaluated on various benchmarks like General MMLU, Code Natural2Code, MATH, GPQA, Big-Bench, WMT23, MMMU, and MathVista, measuring performance across domains such as multilingual translation, image processing, and code generation.\",2024-05-30,,[],\"Gemini Flash is a lightweight model, optimized for speed and efficiency. It features multimodal reasoning and a breakthrough long context window of up to one million tokens. It's designed to serve at scale and is efficient on cost, providing quality results at a fraction of the cost of larger models.\",,,,\"The model is intended for developer and enterprise use cases. 
It can process hours of video and audio, and hundreds of thousands of words or lines of code, making it beneficial for a wide range of tasks.\",Google's Terms and Conditions,\"audio, image, text, video; text\",unknown,Gemini 1.5 Flash,unknown,Google DeepMind,,The research team is continually exploring new ideas at the frontier of AI and building innovative products for consistent progress.,,unknown,model,https://deepmind.google/technologies/gemini/flash/,,unknown,unknown,unknown,,,,,,,\nopen,Unknown,2024-06-17,,[],\"Gen-3 Alpha is a foundation model trained for large-scale multimodal tasks. It is a major improvement in fidelity, consistency, and motion over the previous generation, Gen-2. Gen-3 Alpha can power various tools, such as Text to Video, Image to Video, and Text to Image. The model excels at generating expressive human characters with a wide range of actions, gestures, and emotions, and is capable of interpreting a wide range of styles and cinematic terminology. It is also a step towards building General World Models. It has been designed for use by research scientists, engineers, and artists, and can be fine-tuned for customization according to specific stylistic and narrative requirements.\",,\"Companies interested in fine-tuning and custom models can reach out to Runway AI, Inc. 
using a form on their website.\",,\"Can be used to create expressive human characters, interpret a wide range of styles and cinematic terminology, and power tools for Text to Video, Image to Video, and Text to Image tasks.\",\"Terms of Use listed on Runway AI, Inc.'s website, specific license unknown\",\"text, image, video; video\",The model includes a new and improved in-house visual moderation system.,Gen-3 Alpha,USA,\"Runway AI, Inc.\",Unknown,\"It will be released with a set of new safeguards, including an improved in-house visual moderation system and C2PA provenance standards.\",,Unknown,model,https://runwayml.com/research/introducing-gen-3-alpha?utm_source=xinquji,unknown,Unknown,Unknown,Unknown,,,,,,,\nlimited,\"Evaluated on a wide range of language benchmarks like MMLU 5-shot, GSM-8K, and HellaSwag 10-shot among others.\",2023-06-22,,[],Inflection AI's first version of its in-house LLM (accessed via Inflection AI's conversational API).,,,,,unknown,text; text,,Inflection-1,USA,Inflection AI,,,,unknown,model,https://inflection.ai/inflection-1,,,unknown,,,,,,,,\nlimited,,2023-05-02,,['Inflection-2.5'],Personal AI chatbot designed to be conversational and specialized in emotional intelligence.,,,,to be used as a personal assistant chatbot for everyday activities,unknown,,,Pi,USA,Inflection AI,,,,,application,https://inflection.ai/press,,,,,,,,natural language text responses,,,\nclosed,\"Evaluated against state of the art models on benchmarks, and found to be the most performant model outside of GPT-4.\",2023-11-22,,[],\"Inflection-2 is the best model in the world for its compute class and the second most capable LLM in the world, according to benchmark evaluation, as of its release.\",,,,,unknown,text; text,,Inflection-2,USA,Inflection AI,,,,unknown,model,https://inflection.ai/inflection-2,,unknown,5000 NVIDIA H100 GPUs,unknown,,,,,,,\nlimited,\"Evaluated on standard LLM and technical benchmarks in comparison to Inflection-1 and GPT-4, along with advanced STEM 
examinations.\",2024-03-07,,[],\"Inflection-2.5 is an upgraded in-house model that is competitive with all the world's leading LLMs, as of release, like GPT-4 and Gemini.\",,,,,unknown,text; text,,Inflection-2.5,USA,Inflection AI,,,,unknown,model,https://inflection.ai/inflection-2-5,,unknown,unknown,unknown,,,,,,,\nopen,unknown,2024-10-07,,[],\"Inflection for Enterprise, powered by our industry-first, enterprise-grade AI system, Inflection 3.0.\",,So please drop us a line. We want to keep hearing from enterprises about how we can help solve their challenges and make AI a reality for their business.,,unknown,unknown,unknown,unknown,Inflection 3.0,USA,Inflection AI,unknown,unknown,,unknown,model,https://inflection.ai/blog/enterprise,unknown,unknown,unknown,unknown,,,,,,,\nlimited,,2023-02-21,,['ChatGPT API'],\"With the alliance, Bain will combine its deep digital implementation capabilities and strategic expertise with OpenAI’s AI tools and platforms, including ChatGPT, to help its clients around the world identify and implement the value of AI to maximize business potential.\",,,,,unknown,,,Bain Chat,USA,Bain,,,,,application,https://www.bain.com/vector-digital/partnerships-alliance-ecosystem/openai-alliance/,,,,,,,,,,,\nopen,Evaluated on a range of benchmarks and performed on par with LLaMA-7B.,2023-05-05,,\"['RedPajama-Data', 'C4', 'The Stack', 'Multimodal C4']\",MPT is a series of large language models seeking to address the limitations of other open source models like LLaMA and Pythia.,,,,,Apache 2.0,text; text,,MPT,USA,Mosaic,,,,7B parameters (dense),model,https://www.mosaicml.com/blog/mpt-7b,,unknown,440 A100 40GB GPUs,9.5 days,,,,,,,\nopen,\"Compared to Stable Diffusion 2, a SOTA text-to-image model.\",2023-10-25,,['CommonCatalog'],CommonCanvas is a text-to-image model trained solely on Creative Commons licensed images.,,,,,Apache 2.0,text; image,,CommonCanvas,unknown,\"Cornell University, 
Mosaic\",,,,unknown,model,https://arxiv.org/pdf/2310.16825.pdf,,unknown,128 A100 NVIDIA GPUs,6.79 days,,,,,,,\nopen,,2023-10-25,https://github.com/mosaicml/diffusion/blob/main/assets/common-canvas.md,\"['YFCC100M', 'BLIP-2']\",CommonCatalog is a curated dataset of CommonCrawl images and synthetic captions.,images with non-derivative licenses,,images with derivative licenses,,Apache 2.0,image-caption pairings,,CommonCatalog,USA,Mosaic,,,[],70M images,dataset,https://arxiv.org/pdf/2310.16825.pdf,,,,,,,,,,,\nopen,unknown,2024-10-08,,[],\"XTTS-v2 is a voice generation model that allows voice cloning into different languages using a brief 6-second audio clip, supporting 17 languages with features like emotion and style transfer, cross-language voice cloning, and multi-lingual speech generation. It powers Coqui Studio and Coqui API, with improvements in architectural and prosody aspects for better audio quality.\",,\"Users can join the Coqui community on Discord, engage on Twitter, or send emails to info@coqui.ai for feedback and queries.\",,\"Voice cloning, multi-lingual speech generation, emotion and style transfer in speech.\",Coqui Public Model,audio; audio,unknown,XTTS-v2,USA,Coqui,unknown,unknown,,unknown,model,https://huggingface.co/coqui/XTTS-v2,https://huggingface.co/coqui/XTTS-v2,unknown,unknown,unknown,,,,,,,\nlimited,,2023-03-14,,['GPT-4 API'],The first-ever digital visual assistant powered by OpenAI’s new GPT-4 language model.,,,,,unknown,,,Virtual Volunteer,Denmark,Be My Eyes,,,,,application,https://www.bemyeyes.com/blog/introducing-be-my-eyes-virtual-volunteer,,,,,,,,,,,\nlimited,\"Extensive testing on Multilingual Librispeech dataset resulted in 20% lower validation perplexity. In downstream evaluations, this leads to a 2x lower word error rate and a 1 point higher quality score. 
Sonic also displays impressive performance metrics at inference, achieving lower latency (1.5x lower time-to-first-audio), faster inference speed (2x lower real-time factor), and higher throughput (4x).\",2024-05-29,,['Multilingual Librispeech dataset'],\"Sonic is a low-latency voice model that generates lifelike speech. Developed by Cartesia, it was designed to be an efficient real-time AI capable of processing any-sized contexts and running on any device.\",,Contact through the provided form or via email at join@cartesia.ai.,,\"Sonic has potential applications across customer support, entertainment, and content creation and is a part of Cartesia's broader mission to bring real-time multimodal intelligence to every device.\",unknown,text; audio,unknown,Sonic,USA,Cartesia,unknown,,,2024-05-29,model,https://cartesia.ai/blog/sonic,,unknown,unknown,unknown,,,,,,,\nopen,,2023-11-07,,['Mistral'],SciPhi Mistral is a Large Language Model (LLM) fine-tuned from Mistral.,,https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k/discussions,,,MIT,text; text,unknown,SciPhi Mistral,USA,SciPhi,,,,7B parameters (dense),model,https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k,https://huggingface.co/SciPhi/SciPhi-Mistral-7B-32k,unknown,unknown,unknown,,,,,,,\nclosed,,2023-09-26,,[],\"A.X is SK Telecom's proprietary LLM, which has been trained on the Korean language.\",,,,,unknown,text; text,,A.X,South Korea,SK Telecom,,,,39B parameters,model,https://www.sktelecom.com/en/press/press_detail.do?idx=1582,,unknown,unknown,unknown,,,,,,,\nopen,,2021-08-20,https://laion.ai/blog/laion-400-open-dataset/,\"['CLIP', 'CommonCrawl']\",\"LAION-400M is a dataset of 400 million CLIP-filtered image-text pairs, their CLIP embeddings and kNN indices that allow efficient similarity search. 
This dataset is entirely open and freely accessible.\",\"Authors apply the following filtering conditions on the WAT files downloaded from Common Crawl: \"\"All samples with less than 5 character alt-text length or less than 5 KB image size are dropped. Duplicate removal is performed with bloom filter based on URL and alt-text. We use CLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.3. This threshold was selected based on human inspections. We use the CLIP embeddings of images and texts to filter out illegal contents.\"\"\n\",,,\"The authors recommend using the dataset \"\"for research purposes\"\" and warn that \"\"this large-scale dataset is non-curated. It was built for research purposes to enable testing model training on larger scale for broad researcher and other interested communities, and is not meant for any real-world production or application.\"\"\",CC BY 4.0,\"image, text\",,LAION-400M,Germany,LAION,No uses are explicitly prohibited by the license. Users are warned from using LAION-400M for any real-world production or application.,\"The authors use CLIP embeddings of images and texts to filter out illegal contents. They also use CLIP to tag image-text pairs as NSFW. They note that less than 1% of images were detected as NSFW, which can be filtered out by a user with the NSFW tag.\",[],400M image-text pairs,dataset,https://laion.ai/blog/laion-400-open-dataset/,,,,,,,,,,,\nopen,,2022-12-12,https://laion.ai/blog/laion-5b/,\"['CLIP', 'mCLIP', 'CommonCrawl']\",LAION-5B is a dataset of 5 billion image-text pairs from the Internet,\"All samples with less than 5 characters alt-text length or less than 5 KB image size are dropped. All images with the too big resolution, potentially DOS bombs, were dropped before attempting to process them. Duplicate removal is performed with a bloom filter based on URL. 
Future runs would include more variate deduplication rules, such as URL + language for the multilanguage dataset. We use CLIP respectively MCLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.28 for the English language ( with CLIP B/32) and 0.26 for the multilingual dataset (MCLIP). These thresholds were selected based on human inspection of the test results. We use the CLIP embeddings of images and texts to filter out to the possible extent the illegal content.\",,,\"The authors recommend using the dataset \"\"for research purposes\"\" and \"\"do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress\"\"\",CC BY 4.0,\"image, text\",,LAION-5B,Germany,LAION,No uses are explicitly prohibited by the license. Users are warned from using LAION-5B for non-research purposes.,,[],5B image-text pairs,dataset,https://laion.ai/blog/laion-5b/,,,,,,,,,,,\nopen,,2022-12-12,https://laion.ai/blog/laion-5b/,\"['CLIP', 'LAION-5B']\",LAION-2B-en is a subset of the LAION-5B dataset and contains 2.3 billion English image-text pairs.,\"All samples with less than 5 characters alt-text length or less than 5 KB image size are dropped. All images with the too big resolution, potentially DOS bombs, were dropped before attempting to process them. Duplicate removal is performed with a bloom filter based on URL. Future runs would include more variate deduplication rules, such as URL + language for the multilanguage dataset. We use CLIP respectively MCLIP to compute embeddings of the image and alt-text. Then we compute the cosine similarity of both embeddings and drop all samples with cosine similarity below 0.28 for the English language ( with CLIP B/32) and 0.26 for the multilingual dataset (MCLIP). 
These thresholds were selected based on human inspection of the test results. We use the CLIP embeddings of images and texts to filter out to the possible extent the illegal content.\",,,\"The authors recommend using the dataset \"\"for research purposes\"\" and \"\"do not recommend using it for creating ready-to-go industrial products, as the basic research about general properties and safety of such large-scale models, which we would like to encourage with this release, is still in progress\"\"\",CC BY 4.0,\"image, text\",,LAION-2B-en,Germany,LAION,No uses are explicitly prohibited by the license. Users are warned from using LAION-2B-en for non-research purposes.,,[],2.32B image-text pairs,dataset,https://arxiv.org/pdf/2210.08402.pdf,,,,,,,,,,,\nopen,Evaluated on COCO captioning and VQAv2 vision-language tasks.,2023-03-28,,\"['LLaMA', 'CLIP']\",\"An open-source reproduction of DeepMind's Flamingo model. At its core, OpenFlamingo is a framework that enables training and evaluation of large multimodal models (LMMs).\",,,,academic research purposes,MIT,\"image, text; text\",,OpenFlamingo,Germany,LAION,commercial use,,,9B parameters (dense),model,https://laion.ai/blog/open-flamingo/,https://github.com/mlfoundations/open_flamingo/blob/main/MODEL_CARD.md,,,,,,,,,,\nopen,Evaluated on MT-Bench and AlphaEval benchmarks.,2023-12-01,,\"['UltraFeedback', 'Zephyr']\",\"Notus is an open source LLM, fine-tuned using Direct Preference Optimization (DPO) and AIF (AI Feedback) techniques.\",,https://huggingface.co/argilla/notus-7b-v1/discussions,,Academic research and free commercial usage,MIT,text; text,,Notus,Belgium,Argilla,,,,7B parameters (dense),model,https://argilla.io/blog/notus7b/,https://huggingface.co/argilla/notus-7b-v1,unknown,8 x A100 40GB GPUs,unknown,,,,,,,\nclosed,unknown,2021-11-15,https://docs.cohere.ai/data-statement,[],\"The family of datasets used to train Cohere models, which come in two forms: coheretext-filtered and coheretext-unfiltered. 
The former is used to train the Representation models, while the latter one is used to train the Generation models.\n\",\"Documents that are not in English are excluded.\n\",unknown,\"As stated in the datasheet, the dataset \"\"includes the Google Books dataset, CommonCrawl, and text from the internet scraped by the Cohere infrastructure team.\"\" The top ten domains scraped were: wordpress.com, medium.com, stackexchange.com, tumblr.com, elsevier.com, genius.com, bbc.co.uk, libsyn.com, yahoo.com, nytimes.com [[Datasheet]](https://docs.cohere.ai/data-statement).\n\",\"The intended use of the dataset is to train Cohere's language models.\n\",unknown,text; text,unknown,coheretext,Canada,Cohere,unknown,\"In the datasheet, it is implied that Cohere employs filtration methods for removing racist, biased and toxic content, but the details are not provided. These filtration methods take both the context and the language into account, as opposed to using a list of blockwords [[Datasheet]](https://docs.cohere.ai/data-statement).\n\",[],200 GB,dataset,,,,,,,,,,,,\nlimited,\"The model's performance was analyzed on Hellaswag and COPA, as well as several safety benchmarks [[Model Card]](https://docs.cohere.ai/generation-card).\",2021-11-15,,['coheretext'],\"The Generation model is a language model trained by Cohere for generation tasks.\n\",,unknown,,\"On the model card, the intended uses are stated as \"\"interactive autocomplete, augmenting human writing processes, summarization, text rephrasing, and other text-to-text tasks in non-sensitive domains\"\" [[Model Card]](https://docs.cohere.ai/generation-card).\n\",unknown,text; text,\"The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/generation-card).\n\",Cohere Base,Canada,Cohere,\"The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
A non-comprehensive list of specific applications violating these guidelines includes: astroturfing, generation of misinformation and other harmful content, and \"\"generation of text about people, places, or events without a human-in-the-loop\"\" [[Model Card]](https://docs.cohere.ai/generation-card).\n\",unknown,,unknown,model,,https://docs.cohere.ai/generation-card,unknown,unknown,unknown,,,,,,,\nlimited,\"The model's performance was analyzed on Hellaswag and COPA, as well as several safety benchmarks [[Model Card]](https://docs.cohere.ai/generation-card).\",2023-01-01,,['Cohere Base'],\"This model is a generative model optimized to follow commands in the prompt.\n\",,unknown,,\"On the model card, the intended uses are stated as \"\"interactive autocomplete, augmenting human writing processes, summarization, text rephrasing, and other text-to-text tasks in non-sensitive domains\"\" [[Model Card]](https://docs.cohere.ai/generation-card).\n\",unknown,text; text,\"The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/generation-card).\n\",Cohere Command,Canada,Cohere,\"The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
A non-comprehensive list of specific applications violating these guidelines includes: astroturfing, generation of misinformation and other harmful content, and \"\"generation of text about people, places, or events without a human-in-the-loop\"\" [[Model Card]](https://docs.cohere.ai/generation-card).\n\",unknown,,unknown,model,https://docs.cohere.com/docs/command-beta,https://docs.cohere.ai/generation-card,unknown,unknown,unknown,,,,,,,\nlimited,\"The model's performance was analyzed on several safety benchmarks [[Model Card]](https://docs.cohere.ai/representation-card).\n\",2021-11-15,,[],\"The Embedding Large (English) model is a language model trained by Cohere for tasks requiring embeddings.\n\",,unknown,,\"The intended uses are stated as \"\"estimating semantic similarity between two sentences, choosing a sentence which is most likely to follow another sentence, sentiment analysis, topic extraction, or categorizing user feedback\"\" on the Cohere model card [[Model Card]](https://docs.cohere.ai/representation-card).\n\",unknown,text; text,\"The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/representation-card).\n\",Cohere Embed (English),Canada,Cohere,\"The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
A non-comprehensive list of specific applications violating these guidelines includes: extraction of identity and demographic information, building purposefully opaque text classification systems, and \"\"building downstream classifiers that serve as automated decision-making systems that have real-world consequences on people, where those decisions are made without a human-in-the-loop\"\" [[Model Card]](https://docs.cohere.ai/representation-card).\n\",unknown,,unknown,model,,https://docs.cohere.ai/representation-card,unknown,unknown,unknown,,,,,,,\nlimited,\"The model's performance was analyzed on several safety benchmarks [[Model Card]](https://docs.cohere.ai/representation-card).\n\",2022-12-12,,[],\"This model maps text from 100+ languages to a semantic vector space, positioning text with a similar meaning (regardless of language) in close proximity.\n\",,unknown,,\"The intended uses are stated as \"\"estimating semantic similarity between two sentences, choosing a sentence which is most likely to follow another sentence, sentiment analysis, topic extraction, or categorizing user feedback\"\" on the Cohere model card [[Model Card]](https://docs.cohere.ai/representation-card).\n\",unknown,text; text,\"The usage of the model is monitored by Cohere [[Model Card]](https://docs.cohere.ai/representation-card).\n\",Cohere Embed (Multilingual),Canada,Cohere,\"The usage of the model is bound by the Cohere usage guidelines [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines). 
A non-comprehensive list of specific applications violating these guidelines includes: extraction of identity and demographic information, building purposefully opaque text classification systems, and \"\"building downstream classifiers that serve as automated decision-making systems that have real-world consequences on people, where those decisions are made without a human-in-the-loop\"\" [[Model Card]](https://docs.cohere.ai/representation-card).\n\",unknown,,unknown,model,https://txt.cohere.ai/multilingual/,https://docs.cohere.ai/representation-card,unknown,unknown,unknown,,,,,,,\nlimited,,2021-11-15,,\"['Cohere Chat Endpoint', 'Cohere Generate Endpoint (legacy)', 'Cohere Embed Endpoint', 'Cohere Classify Endpoint', 'Cohere Summarize Endpoint']\",\"The Cohere API allows users to access the Cohere language models and utilize them in their applications.\n\",,\"General feedback as well as the violations of the usage guidelines can be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",,\"Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\",custom,,\"All applications developed using the Cohere API are subject to review by Cohere.\n\",Cohere API,Canada,Cohere,\"The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, and high-risk generations, among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",\"New users of the API get limited access restricting the sizes of the models as well as the number of tokens that can be used. 
Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access).\n\",,,application,https://cohere.ai/,,,,,unknown,unknown,unknown,generation and embeddings,https://cohere.ai/terms-of-use,unknown,\nlimited,,2023-09-28,,\"['Cohere Command', 'Command R', 'Command R+']\",\"This endpoint generates a text response to a user message in the context of a conversation, optionally grounding its generation with source material from tools like search engines, functions or databases, and attributing its responses with citations.\",,\"General feedback as well as the violations of the usage guidelines can be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",,\"Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\",\"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\",,\"All applications developed using the Cohere API are subject to review by Cohere.\n\",Cohere Chat Endpoint,Canada,Cohere,\"The usage of the API is bound by the Cohere usage guidelines. 
Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, and high-risk generations, among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",,,,application,https://docs.cohere.com/reference/chat,,,,,unknown,unknown,unknown,generation,https://cohere.ai/terms-of-use,unknown,\nlimited,,2021-11-15,,\"['Cohere Base', 'Cohere Command']\",\"This endpoint generates realistic text conditioned on a given input.\n\",,\"General feedback as well as the violations of the usage guidelines can be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",,\"Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\",\"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\",,\"All applications developed using the Cohere API are subject to review by Cohere.\n\",Cohere Generate Endpoint (legacy),Canada,Cohere,\"The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, and high-risk generations, among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",\"New users of the API get limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access).\n\",,,application,https://docs.cohere.ai/reference/generate,,,,,unknown,unknown,unknown,generation,https://cohere.ai/terms-of-use,unknown,\nlimited,,2021-11-15,,\"['Cohere Embed (Multilingual)', 'Cohere Embed (English)']\",\"This endpoint returns text embeddings. 
An embedding is a list of floating point numbers that captures semantic information about the text that it represents.\n\",,\"General feedback as well as the violations of the usage guidelines can be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",,\"Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\",\"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\",,\"All applications developed using the Cohere API are subject to review by Cohere.\n\",Cohere Embed Endpoint,Canada,Cohere,\"The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, and high-risk generations, among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",\"New users of the API get limited access restricting the sizes of the models as well as the number of tokens that can be used. Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access).\n\",,,application,https://docs.cohere.ai/reference/embed,,,,,unknown,unknown,unknown,embedding,https://cohere.ai/terms-of-use,unknown,\nlimited,,2022-05-05,,\"['Cohere Embed (Multilingual)', 'Cohere Embed (English)']\",\"This endpoint makes a prediction about which label best fits a specified text input. 
To make a prediction, Classify uses the provided examples of text + label pairs as a reference.\n\",,\"General feedback as well as the violations of the usage guidelines can be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",,\"Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\",\"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\",,\"All applications developed using the Cohere API is subject to review by Cohere.\n\",Cohere Classify Endpoint,Canada,Cohere,\"The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",\"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. 
Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access).\n\",,,application,https://docs.cohere.ai/reference/classify,,,,,unknown,unknown,unknown,embedding,https://cohere.ai/terms-of-use,unknown,\nlimited,,2023-02-22,,[],\"This endpoint generates a succinct version of the original text that relays the most important information.\n\",,\"General feedback as well as the violations of the usage guidelines can be reported to Cohere at safety at cohere.ai [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",,\"Intended to be used by developers who would like to incorporate NLP into their applications [[Cohere Website]](https://cohere.ai/).\n\",\"Limited use license to Cohere platform users [[Terms of Use]](https://cohere.ai/terms-of-use).\n\",,\"All applications developed using the Cohere API is subject to review by Cohere.\n\",Cohere Summarize Endpoint,Canada,Cohere,\"The usage of the API is bound by the Cohere usage guidelines. Disallowed use cases include violence and threats, antisocial and antidemocratic uses, deceit, attacks on security or privacy, unsafe unsupervised uses, decision-making, high-Risk generations among others [[Usage Guidelines]](https://docs.cohere.ai/usage-guidelines).\n\",\"The new users of the API get a limited access restricting the sizes of the models as well as the number of tokens that can be used. 
Users are required to go through an internal application to upgrade to full access [[Limited Access]](https://docs.cohere.ai/limited-access).\n\",,,application,https://docs.cohere.ai/reference/summarize,,,,,unknown,unknown,unknown,generation,https://cohere.ai/terms-of-use,unknown,\nlimited,Achieves SOTA performances on trusted MTEB and BEIR benchmarks.,2023-11-02,,[],\"As of release, Cohere Embedv3 is Cohere's latest and most advanced embeddings model.\",,https://huggingface.co/Cohere/Cohere-embed-english-v3.0/discussions,,,unknown,text; text,,Cohere Embedv3 (English),Canada,Cohere,,,,unknown,model,https://txt.cohere.com/introducing-embed-v3/,https://huggingface.co/Cohere/Cohere-embed-english-v3.0,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on standard LLM and multilingual benchmarks in comparison to SotA models.,2024-02-12,,\"['mT5', 'Aya Dataset']\",Aya is a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced.,,https://huggingface.co/CohereForAI/aya-101/discussions,,,Apache 2.0,text; text,unknown,Aya,unknown,\"Cohere for AI, Cohere, Brown University, Carnegie Mellon University, MIT\",,,,13B parameters (dense),model,https://arxiv.org/pdf/2402.07827.pdf,https://huggingface.co/CohereForAI/aya-101,unknown,unknown,unknown,,,,,,,\nopen,\"Command R was evaluated on standard English and multilingual (translated MMLU, FLoRES) benchmarks. It was also evaluated on KILT, HotpotQA and Bamboogle.\",2024-03-11,,[],\"Command R is a scalable generative model targeting RAG and Tool Use to enable production-scale AI for enterprise. It is an auto-regressive large language model that uses an optimized transformer architecture. After pretraining, supervised fine-tuning (SFT) and preference training were applied to align model behavior to human preferences for helpfulness and safety. Command R is part of a family of open weight releases from Cohere and Cohere For AI. 
Its larger companion model is [Command R+](https://huggingface.co/CohereForAI/c4ai-command-r-plus).\",,[https://huggingface.co/CohereForAI/c4ai-command-r-v01/discussions](https://huggingface.co/CohereForAI/c4ai-command-r-v01/discussions),,\"Command R is optimized to perform well up to 128K context length and in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. It has been trained for conversational tool use and grounded generation capabilities. The open weights release of Command R is intended to make community-based research efforts more accessible.\",CC BY NC 4.0,text; text,unknown,Command R,Canada,Cohere,\"Usage of this model requires adherence to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy). Usage via the Cohere Platform is subject to Cohere's [Terms of Use](https://cohere.com/terms-of-use) and [Usage Guidelines](https://docs.cohere.com/docs/usage-guidelines). 
A non-exhaustive list of disallowed use cases include:\n- Violence and harm\n- Harassment and abuse\n- Sexual exploitation, harm, or abuse\n- Sensitve information\n- Fraud and deception\n- Synthetic data for commercial uses\n\nMisuse should be reported to Cohere at safety at cohere.ai.\",Comprehensive guidance for working with Command R's tool use and grounded generation prompt template can be found in Cohere's [documentation](https://docs.cohere.com/docs/prompting-command-r).,,35B parameters (dense),model,https://txt.cohere.com/command-r/,https://huggingface.co/CohereForAI/c4ai-command-r-v01,unknown,unknown,unknown,,,,,https://cohere.ai/terms-of-use,,\nopen,unknown,2024-02-09,https://huggingface.co/datasets/CohereForAI/aya_dataset,[],The Aya Dataset is a dataset that consists of original human-curated prompt-completion pairs written by fluent speakers of 65 languages.,,https://huggingface.co/datasets/CohereForAI/aya_dataset/discussions,,,Apache 2.0,text,unknown,Aya Dataset,unknown,\"Cohere for AI, Beijing Academy of Artificial Intelligence, Cohere, Binghamton University\",,,[],204k human-annotated prompt-completion pairs,dataset,https://arxiv.org/pdf/2402.06619.pdf,,,,,,,,,,,\nlimited,\"Evaluated on code retrieval and data retrieval capabilities, with improvements compared to the standard in both.\",2024-04-11,,[],Rerank 3 is a new foundation model for efficient enterprise search and retrieval with 4k context length.,,,,Efficient enterprise search and retrieval.,unknown,text; text,unknown,Rerank 3,Canada,Cohere,,,,unknown,model,https://cohere.com/blog/rerank-3,,unknown,unknown,unknown,,,,,,,\nopen,Evaluated across 23 languages with the highest results in all tasks and languages compared to other multilingual language models.,2024-05-31,,[],Aya 23 is an open weights research release of an instruction fine-tuned model with multilingual capabilities. 
It focuses on pairing a highly performant pre-trained Command family of models with the recently released Aya Collection. This model supports 23 languages.,,https://huggingface.co/CohereForAI/aya-23-35B/discussions,,This model is designed to be used for multilingual tasks covering 23 languages.,CC-BY-NC,text; text,unknown,Aya 23,Canada,Cohere,unknown,unknown,,35B parameters,model,https://arxiv.org/pdf/2405.15032,https://huggingface.co/CohereForAI/aya-23-35B,unknown,TPUv4 chips with up to 128 pod slices,unknown,,,,,,,\nopen,\"Command R+ was evaluated on standard English and multilingual (e.g. FLoRES, WMT23) benchmarks. It was also evaluated on multihop reasoning tasks (e.g. HotpotQA, Bamboogle, StrategyQA), and tool use (e.g. ToolTalk, Berkeley Function-Calling).\",2024-04-03,,[],\"Command R+ is a state-of-the-art RAG-optimized model designed to tackle enterprise-grade workloads. It is an auto-regressive large language model that uses an optimized transformer architecture. After pretraining, supervised fine-tuning (SFT) and preference training were applied to align model behavior to human preferences for helpfulness and safety. Command R+ is part of a family of open weight releases from Cohere and Cohere For AI. Its smaller companion model is [Command R](https://huggingface.co/CohereForAI/c4ai-command-r-v01).\",,[https://huggingface.co/CohereForAI/c4ai-command-r-plus/discussions](https://huggingface.co/CohereForAI/c4ai-command-r-plus/discussions),,\"Command R+ is optimized to perform well up to 128K context length and in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. It has been trained for conversational tool use and grounded generation capabilities. 
The open weights release of Command R+ is intended to make community-based research efforts more accessible.\",CC BY NC 4.0,text; text,unknown,Command R+,Canada,Cohere,\"Usage of this model requires adherence to [C4AI's Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy). Usage via the Cohere Platform is subject to Cohere's [Terms of Use](https://cohere.com/terms-of-use) and [Usage Guidelines](https://docs.cohere.com/docs/usage-guidelines). A non-exhaustive list of disallowed use cases include:\n- Violence and harm\n- Harassment and abuse\n- Sexual exploitation, harm, or abuse\n- Sensitve information\n- Fraud and deception\n- Synthetic data for commercial uses\n\nMisuse should be reported to Cohere at safety at cohere.ai.\",Comprehensive guidance for working with Command R+'s tool use and grounded generation prompt template can be found in Cohere's [documentation](https://docs.cohere.com/docs/prompting-command-r).,,104B parameters (dense),model,https://txt.cohere.com/command-r-plus-microsoft-azure/,https://huggingface.co/CohereForAI/c4ai-command-r-plus,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluated in comparison to the MusicCaps dataset and with respect to n-gram, neural metrics.\",2023-07-31,,\"['MusicCaps', 'Million Song Dataset', 'Magnatagtune']\",LP-MusicCaps is a LLM-based pseudo music caption dataset.,,,,,CC-BY-NC-4.0,,,LP-MusicCaps,South Korea,South Korea Graduate School of Culture Technology,,,[],2.2M captions paired with 0.5M audio clips,dataset,https://arxiv.org/pdf/2307.16372.pdf,,,,,,,,,,,\nlimited,,2023-03-14,,['GPT-4 API'],\"Explain My Answer offers learners the chance to learn more about their response in a lesson (whether their answer was correct or incorrect!) 
By tapping a button after certain exercise types, learners can enter a chat with Duo to get a simple explanation on why their answer was right or wrong, and ask for examples or further clarification.\",,,,,custom,,,Duolingo Explain My Answer,USA,Duolingo,,,,,application,https://blog.duolingo.com/duolingo-max/,,,,,,,,,https://www.duolingo.com/terms,,\nlimited,,2023-03-14,,\"['Duolingo Role Play', 'Duolingo Explain My Answer']\",Duolingo Max is a new subscription tier above Super Duolingo that gives learners access to two brand-new features and exercises - Explain My Answer and Roleplay.,,,,,custom,,,Duolingo Max,USA,Duolingo,,,,,application,https://blog.duolingo.com/duolingo-max/,,,,,,,,,,,\nlimited,,2023-03-14,,['GPT-4 API'],\"Roleplay allows learners to practice real-world conversation skills with world characters in the app. These challenges, which earn XP, will live alongside the path as one of the “Side Quests” learners can access by tapping on the character. What will you talk about? We’ll guide you through different scenarios! 
Learners might discuss future vacation plans with Lin, order coffee at a café in Paris, go furniture shopping with Eddy, or ask a friend to go for a hike.\",,,,,custom,,,Duolingo Role Play,USA,Duolingo,,,,,application,https://blog.duolingo.com/duolingo-max/,,,,,,,,,https://www.duolingo.com/terms,,\nopen,\"Evaluated on standard language benchmarks, common sense reasoning, and reading comprehension in comparison to SoTA LLMs.\",2023-11-02,,[],The Yi series models are large language models trained from scratch by developers at 01 AI.,,https://huggingface.co/01-ai/Yi-34B/discussions,,,custom,text; text,unknown,Yi,France,01 AI,,\"Model underwent supervised fine-tuning, leading to a greater diversity of responses.\",,34B parameters (dense),model,https://github.com/01-ai/Yi,https://huggingface.co/01-ai/Yi-34B,unknown,unknown,unknown,,,,,,,\nopen,\"Yi-VL outperforms all existing open-source models in MMMU and CMMMU, two advanced benchmarks that include massive multi-discipline multimodal questions (based on data available up to January 2024).\",2024-01-23,,[],\"The Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images.\",,https://huggingface.co/01-ai/Yi-VL-34B/discussions,,,custom,text; text,unknown,Yi-VL,France,01 AI,,unknown,,34B parameters (dense),model,https://github.com/01-ai/Yi,https://huggingface.co/01-ai/Yi-VL-34B,unknown,128 NVIDIA A800 (80G) GPUs,10 days,,,,,,,\nopen,\"Re-LAION-5B aims to fix the issues as reported by Stanford Internet Observatory for the original LAION-5B. It is available for download in two versions, research and research-safe. In total, 2236 links that potentially led to inappropriate content were removed.\",2024-08-30,,['LAION-5B'],\"Re-LAION-5B is an updated version of LAION-5B, the first web-scale, text-link to images pair dataset to be thoroughly cleaned of known links to suspected CSAM. 
It is an open dataset for fully reproducible research on language-vision learning. This model was developed in response to issues identified by the Stanford Internet Observatory in December 2023. The updates were made in collaboration with multiple organizations like the Internet Watch Foundation (IWF), the Canadian Center for Child Protection (C3P), and Stanford Internet Observatory.\",,Problems with the dataset should be reported to the LAION organization. They have open lines for communication with their partners and the broader research community.,,Re-LAION-5B is designed for research on language-vision learning. It can also be used by third parties to clean existing derivatives of LAION-5B by generating diffs and removing all matched content from their versions.,Apache 2.0,text; image,unknown,Re-LAION-5B,Germany,LAION e.V.,\"The dataset should not be utilized for purposes that breach legal parameters or ethical standards, such as dealing with illegal content.\",The model utilized lists of link and image hashes provided by partner organizations. 
These were used to remove inappropriate links from the original LAION-5B dataset to create Re-LAION-5B.,,\"5.5B (text, image) pairs\",model,https://laion,unknown,Unknown,Unknown,Unknown,,,,,,,\nopen,Evaluated on the dimensions proposed by OpenCompass in comparison to other LLMs.,2023-09-20,,[],\"InternLM is an LLM pre-trained on over 2.3T Tokens containing high-quality English, Chinese, and code data.\",,https://huggingface.co/internlm/internlm-20b/discussions,,,Apache 2.0,text; text,unknown,InternLM,China,InternLM,,,,7B parameters (dense),model,https://github.com/InternLM/InternLM,https://huggingface.co/internlm/internlm-20b,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on open source multilingual model benchmarks.,2024-02-26,,['Llama 2'],SambaLingo is a suite of models that adapt Llama 2 to a diverse set of 9 languages.,,https://huggingface.co/sambanovasystems/SambaLingo-Arabic-Base/discussions,,,LLaMA 2,text; text,,SambaLingo,USA,Samba Nova Systems,\"SambaLingo should not be used for mission-critical applications, applications involving the safety of others, and highly critical decisions.\",,,unknown,model,https://sambanova.ai/blog/sambalingo-open-source-language-experts,https://huggingface.co/sambanovasystems/SambaLingo-Arabic-Base,unknown,unknown,unknown,,,,,,,\nlimited,unknown,2024-02-28,,\"['Llama 2', 'Mistral', 'Falcon-180B', 'Deepseek', 'BLOOM', 'LLaVA', 'CLIP']\",Samba 1 is a trillion parameter generative AI model using a Composition of Experts architecture.,,,,,unknown,text; text,unknown,Samba 1,USA,Samba Nova Systems,,,,1T parameters (dense),model,https://sambanova.ai/blog/samba-1-composition-of-experts-mode,,unknown,unknown,unknown,,,,,,,\nlimited,,2023-01-31,,['ChatGPT API'],\"Give your sales, marketing, and customer service teams one of the most powerful AI tools available - ChatGPT priority access, no timeout limits, company wide access managed through a single account, incorporate into your existing processes without leaving 
HubSpot\",,,,,unknown,,,ChatGPT powered by OBO,USA,HubSpot,,,,,application,https://ecosystem.hubspot.com/marketplace/apps/sales/sales-enablement/the-obo-group-chatgpt-1398072,,,,,,,,,,,\nopen,\"Mainly evaluated on MT-Bench and AlpacaEval, which are GPT-4-based comparisons.\",2023-11-02,,[],Starling is a large language model trained by reinforcement learning from AI feedback focused on improving chatbot helpfulness.,,https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha/discussions,,Academic research and free commercial usage,CC BY NC 4.0,text; text,,Starling,USA,Ollama,,,,7B parameters (dense),model,https://starling.cs.berkeley.edu/,https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha,unknown,unknown,unknown,,,,,,,\nlimited,,,,['OpenAI API'],\"\"\"Sana is your all-in-one, AI-assisted, online learning platform (LMS). Author employee training courses and measure team development with Sana's powerful analytics. Sana partners with the world's most important organizations and fastest-growing startups to make personalized, adaptive learning available for everyone, everywhere\"\" [[Sana GPT-3 Demo]](https://gpt3demo.com/apps/sanalabs).\n\",,unknown,,\"Sana is intended to be used by employers to provide a learning service for their employees.\n\",custom,,unknown,Sana,USA,Sana,,unknown,,,application,https://www.sanalabs.com/,,,,,\"Customized GPT-3, fine-tuned on private data [[Sana GPT-3 Demo]](https://gpt3demo.com/apps/sanalabs).\n\",unknown,unknown,\"question and answer, summarization, sentiment analysis,topic identification\",https://www.sanalabs.com/legal/,unknown,\nopen,Evaluated on EleutherAI evaluation harness.,2023-06-16,,\"['GPT-NeoX', 'H2O AI OpenAssistant', 'h2oGPT Repositories']\",Series of models fine-tuned on well-known LLMs using the h2oGPT repositories.,,https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b/discussions,,,Apache 2.0,text; text,,h2oGPT,USA,H2O AI,,,,20B parameters 
(dense),model,https://arxiv.org/pdf/2306.08161.pdf,https://huggingface.co/h2oai/h2ogpt-oasst1-512-20b,unknown,unspecified number of 48GB A100 NVIDIA GPUs,unknown,,,,,,,\nopen,Evaluated on common sense and world knowledge benchmarks.,2024-01-30,,[],H2O Danube is a language model trained on 1T tokens following the core principles of LLaMA 2 and Mistral.,,https://huggingface.co/h2oai/h2o-danube-1.8b-base/discussions,,,Apache 2.0,text; text,unknown,H2O Danube,USA,H2O AI,\"Users are encouraged to use the large language model responsibly and ethically. By using this model, you agree not to use it for purposes that promote hate speech, discrimination, harassment, or any form of illegal or harmful activities.\",unknown,,1.8B parameters (dense),model,https://arxiv.org/pdf/2401.16818.pdf,https://huggingface.co/h2oai/h2o-danube-1.8b-base,unknown,8x H100 GPUs on a single node,unknown,,,,,,,\nopen,,2023-11-04,,[],SaiLy is a series/collection of AI Models by Deepnight Research which are highly experimental and uncensored.,,https://huggingface.co/deepnight-research/saily_100b/discussions,,,MIT,text; text,unknown,SaiLY,International,Deepnight Research,,,,100B parameters (dense),model,https://huggingface.co/deepnight-research/saily_100b,https://huggingface.co/deepnight-research/saily_100b,unknown,unknown,unknown,,,,,,,\nopen,Evaluated across a range of domain tasks across standard benchmarks in comparison to predecessor Llama 2.,2023-10-19,,\"['Llama 2', 'BLOOM']\",TigerBot is an open source multilingual multitask LLM.,,https://huggingface.co/TigerResearch/tigerbot-180b-base-v2/discussions,,,Apache 2.0,text; text,unknown,TigerBot,International,TigerResearch,,Safety filtering performed to mitigate risk and remove toxic content.,,180B parameters (dense),model,https://arxiv.org/pdf/2312.08688.pdf,https://huggingface.co/TigerResearch/tigerbot-180b-base-v2,unknown,32 A100-40G GPUs,unknown,,,,,,,\nopen,outperforms majority of preceding state-of-the-art models over 15 unique biomedical 
modalities.,2023-05-26,,\"['GPT-style autoregressive decoder', 'BiomedGPT biomedical datasets']\",BiomedGPT leverages self-supervision on large and diverse datasets to accept multi-modal inputs and perform a range of downstream tasks.,,,,furthering research in developing unified and generalist models for biomedicine.,Apache 2.0,\"image, text; text\",,BiomedGPT,USA,Lehigh University,,\"No specific quality control is mentioned in model training, though details on data processing and how the model was trained are provided in the paper.\",,472M parameters (dense),model,https://arxiv.org/pdf/2305.17100.pdf,,unknown,10 NVIDIA A5000 GPUs,unknown,,,,,,,\nopen,\"Evaluated on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.\",2023-09-20,,[],\"Baichuan 2 is a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens.\",,https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1/discussions,,,Apache 2.0,text; text,,Baichuan 2,China,Baichuan Inc.,,,,13B parameters (dense),model,https://arxiv.org/pdf/2309.10305.pdf,,unknown,1024 NVIDIA A800 GPUs,unknown,,,,,,,\nopen,\"Evaluated across standard benchmarks and generally performs better than Mixtral, which it was fine-tuned on.\",2024-01-10,,['Mixtral'],Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM.,,https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/discussions,,,Apache 2.0,\"text; code, text\",unknown,Nous Hermes 2,Canada,Nous Research,,unknown,,7B parameters (dense),model,https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO,https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO,unknown,unknown,unknown,,,,,,,\nopen,Evaluated across a variety of standard benchmarks in comparison to LLaMA 2.,2023-11-01,,['LLaMA 2'],YaRN LLaMA 2 is an adapted version of LLaMA 2 using the YaRN extension 
method.,,https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k/discussions,,,LLaMA 2,text; text,unknown,YaRN LLaMA 2,Canada/Switzerland,\"Nous Research, EleutherAI, University of Geneva\",,,,70B parameters (dense),model,https://arxiv.org/pdf/2309.00071.pdf,https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k,unknown,unknown,unknown,,,,,,,\nopen,,2023-11-13,,['Yi'],The Capybara series is a series of LLMs and the first Nous collection of models made by fine-tuning mostly on data created by Nous in-house.,,https://huggingface.co/NousResearch/Nous-Capybara-34B/discussions,,,MIT,text; text,unknown,Nous Capybara,Canada,Nous Research,,,,34B parameters (dense),model,https://huggingface.co/NousResearch/Nous-Capybara-34B,https://huggingface.co/NousResearch/Nous-Capybara-34B,unknown,unknown,unknown,,,,,,,\nopen,Evaluated across a variety of standard benchmarks in comparison to Mistral.,2023-11-01,,['Mistral'],YaRN Mistral is an adapted version of Mistral using the YaRN extension method.,,https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k/discussions,,,MIT,text; text,unknown,YaRN Mistral,Canada/Switzerland,\"Nous Research, EleutherAI, University of Geneva\",,,,7B parameters (dense),model,https://arxiv.org/pdf/2309.00071.pdf,https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k,unknown,unknown,unknown,,,,,,,\nopen,Evaluated on common LLM benchmarks in comparison to other Mistral derivatives.,2023-11-03,,['Mistral'],\"OpenHermes 2.5 Mistral 7B is a state of the art Mistral Fine-tune, a continuation of OpenHermes 2 model, trained on additional code datasets.\",,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B/discussions,,,Apache 2.0,text; text,unknown,OpenHermes 2.5 Mistral,Canada,Nous Research,,,,7B parameters (dense),model,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B,https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B,unknown,unknown,unknown,,,,,,,\nopen,\"The model was examined across a range of benchmarks including GPT4All, AGIEval, 
BigBench, TruthfulQA and in-house evaluations of function calling and JSON mode.\",2024-03-10,,\"['Mistral', 'OpenHermes 2.5 Dataset', 'Nous Hermes 2']\",\"Hermes 2 Pro on Mistral 7B is an upgraded, retrained version of Nous Hermes 2. This improved version excels at function calling, JSON Structured Outputs, and several other areas, scoring positively on various benchmarks.\",,https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B/discussions,,\"The model is intended for general task and conversation capabilities, function calling, and JSON structured outputs.\",Apache 2.0,text; text,unknown,Hermes 2 Pro-Mistral,unknown,Nous,unknown,\"The model was evaluated across multiple tasks, displaying notable scores in GPT4All, AGIEval, BigBench, and TruthfulQA. It also has a high score on function calling and JSON mode, indicating the robustness of its capabilities.\",,7B parameters (dense),model,https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,unknown,unknown,unknown,,,,,,,\nopen,unknown,2024-03-07,,[],\"Genstruct is an instruction-generation model, designed to create valid instructions given a raw text corpus. This enables the creation of new, partially synthetic instruction finetuning datasets from any raw-text corpus. 
This work was inspired by Ada-Instruct and the model is also trained to generate questions involving complex scenarios that require detailed reasoning.\",,https://huggingface.co/NousResearch/Genstruct-7B/discussions,,\"The model is intended for instruction-generation, creating questions involving complex scenarios and generating reasoning steps for those questions.\",Apache 2.0,text; text,unknown,Genstruct,unknown,Nous,unknown,unknown,,7B parameters (dense),model,https://huggingface.co/NousResearch/Genstruct-7B,https://huggingface.co/NousResearch/Genstruct-7B,unknown,unknown,unknown,,,,,,,\nopen,Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models.,2025-01-14,,['DeepSeek-R1'],\"DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters and 37B activated per token. It utilizes Multi-head Latent Attention (MLA) and adopts innovative strategies for improved performance, such as an auxiliary-loss-free load balancing and a multi-token prediction training objective. Comprehensive evaluations show it achieves performance comparable to leading closed-source models.\",,unknown,,unknown,MIT,unknown,unknown,DeepSeek-V3,unknown,DeepSeek,unknown,\"Post-training includes knowledge distillation from the DeepSeek-R1 model, incorporating verification and reflection patterns to enhance reasoning performance.\",,671B parameters (sparse),model,https://huggingface.co/deepseek-ai/DeepSeek-V3,https://huggingface.co/deepseek-ai/DeepSeek-V3,unknown,H800 GPUs,2.788M GPU hours,,,,,,,\nopen,,2024-04-24,,[],\"We present a methodology for scaling LLMs called depth up-scaling (DUS) , which encompasses architectural modifications and continued pretraining. In other words, we integrated Mistral 7B weights into the upscaled layers, and finally, continued pre-training for the entire model. SOLAR-10.7B has remarkable performance. 
It outperforms models with up to 30B parameters, even surpassing the recent Mixtral 8X7B model. For detailed information, please refer to the experimental table. Solar 10.7B is an ideal choice for fine-tuning. SOLAR-10.7B offers robustness and adaptability for your fine-tuning needs. Our simple instruction fine-tuning using the SOLAR-10.7B pre-trained model yields significant performance improvements (SOLAR-10.7B-Instruct-v1.0).\",,https://www.upstage.ai/solar-llm,,,Apache 2.0,text; text,,SOLAR,South Korea,Upstage.ai,,,,10.7B parameters,model,https://arxiv.org/abs/2312.15166,https://huggingface.co/upstage/SOLAR-10.7B-v1.0,,,,,,,,,,\nclosed,,2022-09-14,,[],ACT-1 (ACtion Transformer) is a large-scale transformer model designed and trained specifically for taking actions on computers (use software tools APIs and websites) in response to the user's natural language commands.,,,,,unknown,text; text,,ACT-1,USA,Adept,,,,,model,https://www.adept.ai/blog/act-1,,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluated in comparison to LLaMA 2 and MPT Instruct, and outperforms both on standard benchmarks.\",2023-09-07,,[],\"Persimmon is the most capable open-source, fully permissive model with fewer than 10 billion parameters, as of its release date.\",,,,,Apache 2.0,text; text,,Persimmon,USA,Adept,,,,8B parameters (dense),model,https://www.adept.ai/blog/persimmon-8b,,,,,,,,,,,\nopen,Evaluated on standard image understanding benchmarks.,2023-10-17,,[],Fuyu is a small version of the multimodal model that powers Adept's core product.,,https://huggingface.co/adept/fuyu-8b/discussions,,The model is intended for research purposes only.,CC-BY-NC-4.0,\"image, text; text\",,Fuyu,USA,Adept,\"The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.\",,,8B parameters 
(dense),model,https://www.adept.ai/blog/fuyu-8b,https://huggingface.co/adept/fuyu-8b,unknown,unknown,unknown,,,,,,,\nclosed,\"Evaluated on the MMLU, GSM8K, MATH, and HumanEval benchmarks. According to these benchmarks, Fuyu-Heavy is, as of release, the strongest multimodal model trained outside of Google or OpenAI.\",2024-01-24,,[],Fuyu Heavy is a new multimodal model designed specifically for digital agents.,,,,unknown,unknown,\"image, text; text\",,Fuyu Heavy,USA,Adept,,,,unknown,model,https://www.adept.ai/blog/adept-fuyu-heavy,,unknown,unknown,unknown,,,,,,,\nclosed,\"The GPT-3 paper, which also introduces the GPT-3 dataset, provides a limited analysis on the GPT-3 dataset, reporting the dirtiness of the dataset after the it was filtered for text occurring in common benchmarking tasks. The authors report that \"\"as the dataset becomes more contaminated, the variance of the clean over all fraction increases, but there is no apparent bias towards improved or degraded performance\"\" [[Appendix C]](https://arxiv.org/pdf/2005.14165.pdf#appendix.C).\",2020-06-11,,['WebText'],The GPT-3 dataset is the text corpus that was used to train the GPT-3 model. Information on the GPT-3 dataset is limited to discussion in the paper introducing GPT-3 [[Section 2.2]](https://arxiv.org/pdf/2005.14165.pdf#subsection.2.2).,The Common Crawl dataset was processed using a classifier that kept high quality documents and filtered low quality documents. 
WebText was used as a proxy for high quality documents [[Appendix A]](https://arxiv.org/pdf/2005.14165.pdf#appendix.A).,unknown,\"The dataset is composed of several NLP corpora including Common Crawl (filtered, 60%), WebText2 (22%), Books1 (8%), Books2 (8%), Wikipedia (3%) [[Section 2.2]](https://arxiv.org/pdf/2005.14165.pdf#subsection.2.2).\",The intended use of the GPT-3 dataset is to train language models.,unknown,text,unknown,GPT-3 dataset,USA,OpenAI,unknown,\"In addition to excluding low quality documents from the Common Crawl dataset, the authors fuzzily deduplicated documents within each dataset, by removing documents that have high overlap with each other. The same procedure was followed to fuzzily deduplicate WebText from Common Crawl [[Appendix A]](https://arxiv.org/pdf/2005.14165.pdf#appendix.A). Text occurring in benchmark datasets were also partially removed [[Appendix C]](https://arxiv.org/pdf/2005.14165.pdf#appendix.C).\",[],570 GB,dataset,https://arxiv.org/pdf/2005.14165.pdf,,,,,,,,,,,\nopen,,2021-08-10,,[],\"HumanEval is a dataset of 164 programming problems hand-written to evaluate their Codex model.\n\",\"Code problems easily found on the internet.\n\",\"Email the authors [[Codex Paper]](https://arxiv.org/pdf/2107.03374.pdf).\n\",\"164 hand-written questions.\n\",\"Evaluating code generation capabilities of models.\n\",MIT,code,,HumanEval,USA,OpenAI,,\"The evaluation dataset was handwritten to ensure that the evaluation problems do not exist in the Codex dataset [[Section 2.2]](https://arxiv.org/pdf/2107.03374.pdf#subsection.2.2).\n\",\"['\\n\\ndef string_sequence(n: int) -> str:\\n    \"\"\"\"\"\" Return a string containing space-delimited numbers starting from 0 upto n inclusive.\\n    >>> string_sequence(0)\\n    \\'0\\'\\n    >>> string_sequence(5)\\n    \\'0 1 2 3 4 5\\'\\n    \"\"\"\"\"\"\\n', '\\n\\ndef count_distinct_characters(string: str) -> int:\\n    \"\"\"\"\"\" Given a string, find out how many distinct characters (regardless of 
case) does it consist of\\n    >>> count_distinct_characters(\\'xyzXYZ\\')\\n    3\\n    >>> count_distinct_characters(\\'Jerry\\')\\n    4\\n    \"\"\"\"\"\"\\n', 'from typing import List\\n\\n\\ndef parse_music(music_string: str) -> List[int]:\\n    \"\"\"\"\"\" Input to this function is a string representing musical notes in a special ASCII format.\\n    Your task is to parse this string and return list of integers corresponding to how many beats does each\\n    not last.\\n\\n    Here is a legend:\\n    \\'o\\' - whole note, lasts four beats\\n    \\'o|\\' - half note, lasts two beats\\n    \\'.|\\' - quater note, lasts one beat\\n\\n    >>> parse_music(\\'o o| .| o| o| .| .| .| .| o o\\')\\n    [4, 2, 1, 2, 2, 1, 1, 1, 1, 4, 4]\\n    \"\"\"\"\"\"\\n', '\\n\\ndef how_many_times(string: str, substring: str) -> int:\\n    \"\"\"\"\"\" Find how many times a given substring can be found in the original string. Count overlaping cases.\\n    >>> how_many_times(\\'\\', \\'a\\')\\n    0\\n    >>> how_many_times(\\'aaa\\', \\'a\\')\\n    3\\n    >>> how_many_times(\\'aaaa\\', \\'aa\\')\\n    3\\n    \"\"\"\"\"\"\\n', 'from typing import List\\n\\n\\ndef sort_numbers(numbers: str) -> str:\\n    \"\"\"\"\"\" Input is a space-delimited string of numberals from \\'zero\\' to \\'nine\\'.\\n    Valid choices are \\'zero\\', \\'one\\', \\'two\\', \\'three\\', \\'four\\', \\'five\\', \\'six\\', \\'seven\\', \\'eight\\' and \\'nine\\'.\\n    Return the string with numbers sorted from smallest to largest\\n    >>> sort_numbers(\\'three one five\\')\\n    \\'one three five\\'\\n    \"\"\"\"\"\"\\n']\",214 KB,dataset,https://arxiv.org/pdf/2107.03374.pdf,,,,,,,,,,,\nclosed,,2021-08-10,,[],\"The dataset used to train the Codex model.\n\",\"Following were filtered from the dataset: autogenerated files; files with average line length > 100, maximum line length > 1000, or few alphanumeric characters [[Section 3.1]](https://arxiv.org/pdf/2107.03374.pdf#subsection.3.1).\n\",\"Email the 
authors [[Codex Paper]](https://arxiv.org/pdf/2107.03374.pdf).\n\",\"The dataset includes 54 million public software repositories hosted on GitHub as of an unspecified date in May 2020 [[Section 3.1]](https://arxiv.org/pdf/2107.03374.pdf#subsection.3.1).\n\",Training language models on code.,,,unknown,Codex dataset,USA,OpenAI,unknown,\"Dataset was filtered using simple heuristics, as outlined in the excluded field.\n\",[],159 GB,dataset,https://arxiv.org/pdf/2107.03374.pdf,,,,,,,,,,,\nclosed,\"The dataset contained some overlap with the test sets of the benchmarks used for evaluation, but the authors determined the impact to be small: \"\"There is a median overlap of 2.2% and an average overlap of 3.2%. Due to this small amount of overlap, overall accuracy is rarely shifted by more than 0.1% with only 7 datasets above this threshold\"\" [[Section 5]](https://arxiv.org/pdf/2103.00020.pdf#section.5).\n\",2021-01-05,,[],\"CLIP dataset contains text-image pairs crawled from the internet.\n\",,unknown,\"Data crawled from the internet, without any filtering (including de-duplication) or curation.\n\",Training multimodal vision models.,unknown,\"image, text\",unknown,CLIP dataset,USA,OpenAI,unknown,\"The data was \"\"only crawled websites that had policies against excessively violent and adult images and allowed us to filter out such content\"\" [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md).\n\",[],\"400M (image, text) pairs\",dataset,https://arxiv.org/pdf/2103.00020.pdf,,,,,,,,,,,\nclosed,\"The authors found that the dataset contained 21% of the images in the MS-COCO validation set, but observed no significant changes in the performance of the accompanying DALL·E when tested on MS-COCO evaluation set with and without the said images [[Section 3.1]](https://arxiv.org/pdf/2102.12092.pdf#subsection.3.1).\",2021-01-05,,[],\"DALL·E dataset is the training set consisting of image and text pairs collected to train the DALL·E model.\n\",\"MS-COCO was 
excluded from the dataset, but because MS-COCO was created from YFCC100M, some of the test images (not the captions) were included.\n\",unknown,\"Data from the internet, including Conceptual Captions and a filtered subset of YFCC100M.\n\",Training multimodal vision models.,unknown,\"image, text\",unknown,DALL·E dataset,USA,OpenAI,unknown,\"The data was de-duplicated [[Section 3.2]](https://arxiv.org/pdf/2102.12092.pdf#subsection.3.2). The data collected from the internet was filtered using image, text and joint image and text filters, which included: \"\"discarding instances whose captions are too short, are classified as non-English by the Python package cld3, or that consist primarily of boilerplate phrases such as “photographed on <date>”, where <date> matches various formats for dates that we found in the data\"\". The authors also discard \"\"instances whose images have aspect ratios not in [1/2, 2]\"\" [[Appendix C]](https://arxiv.org/pdf/2102.12092.pdf#appendix.C).\n\",[],\"250M (image, text) pairs\n\",dataset,https://arxiv.org/abs/2102.12092,,,,,,,,,,,\nclosed,\"The Whisper paper provides limited details on preprocessing.\n\",2022-09-21,,[],\"The Whisper dataset is the speech corpus that was used to train the Whisper model. Information on the dataset is limited to discussion in the paper introducing Whisper [[Section 2.1]](https://cdn.openai.com/papers/whisper.pdf).\n\",\"Automated filtering was conducted.\n\",unknown,\"The dataset is composed of three major sources: multilingual speech recognition (17%), translation (18%), and English speech recognition (65%). 
[[Figure 11]](https://cdn.openai.com/papers/whisper.pdf).\n\",The intended use is to train speech models.,unknown,audio,,Whisper dataset,USA,OpenAI,,\"In addition to filtering, basic text standardization was done.\n\",[],680k hours,dataset,https://cdn.openai.com/papers/whisper.pdf,,,,,,,,,,,\nclosed,,2019-11-01,,[],,,,,,unknown,text,,WebText,USA,OpenAI,,,[],40 GB,dataset,https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf,,,,,,,,,,,\nopen,,2019-11-01,,['WebText'],,,,,,Modified MIT License,text; text,,GPT-2,USA,OpenAI,,,,1.5B parameters (dense),model,https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf,https://github.com/openai/gpt-2/blob/master/model_card.md,,,,,,,,,,\nlimited,\"The GPT-3 model was evaluated on language modeling, closed-book question answering, translation, Winograd-style tasks, commonsense reasoning, reading comprehension, SuperGLUE, NLI, synthetic tasks, and generation [[Section 4]](https://arxiv.org/pdf/2005.14165.pdf#section.4); as well as on fairness and biases [[Section 6]](https://arxiv.org/pdf/2005.14165.pdf#section.6).\n\",2020-06-11,,['GPT-3 dataset'],\"GPT-3 is an autoregressive language model.\n\",,\"Feedback for GPT-3 can be provided on the feedback form linked in the model card [[Model Card]](https://github.com/openai/gpt-3/blob/master/model-card.md). The form is especially meant to collect feedback on concerns about misuse, synthetic text detection, bias, and risk of generative language models.\n\",,\"GPT-3 was intended to be used through the OpenAI API by developers for language applications. 
Other intended uses of GPT-3 include researchers accessing the model through the API to study its paradigms [[Model Card]](https://github.com/openai/gpt-3/blob/master/model-card.md).\n\",unknown,text; text,\"OpenAI reviews all use cases of the model [[Model Card]](https://github.com/openai/gpt-3/blob/master/model-card.md).\n\",GPT-3,USA,OpenAI,\"Access to GPT-3 is governed by the OpenAI API Usage Guidelines and API Terms of Use, prohibiting the use of the API in a way that causes societal harm [[Usage Guidelines]](https://beta.openai.com/docs/usage-guidelines/content-policy) [[Terms of Use]](https://openai.com/api/policies/terms/). The list of disallowed applications can be found in the usage guidelines [[Disallowed Applications]](https://beta.openai.com/docs/usage-guidelines/disallowed-applications).\n\",\"One quality control method OpenAI employed was releasing GPT-3 only through the OpenAI API. OpenAI states that it is easier to respond to misuse when the access to the model is gated through the API. It also hints that it plans to broaden the API access over time based on the amount of misuse [[OpenAI API Blog Post]](https://openai.com/blog/openai-api/). 
The authors identify potential misuses of GPT-3 in the paper and analyze it for fairness, bias and representation issues, but do not identify mitigation strategies [[Section 6]](https://arxiv.org/pdf/2005.14165.pdf#section.6).\n\",,175B parameters (dense),model,https://arxiv.org/pdf/2005.14165.pdf,https://github.com/openai/gpt-3/blob/master/model-card.md,552.1 tCO2e,Azure,3640 petaflop/s-days,,,,,,,\nlimited,\"The model was evaluated using the HumanEval dataset with pass@k metric and BLEU scores [[Section 2]](https://arxiv.org/pdf/2107.03374.pdf#section.2).\n\",2021-08-10,,\"['GPT-3', 'Codex dataset', 'HumanEval']\",\"Codex is a GPT language model fine-tuned on publicly available code from GitHub.\n\",,\"Email the authors [[Codex Paper]](https://arxiv.org/pdf/2107.03374.pdf).\n\",,\"Codex is intended to be used for coding related language modelling tasks.\n\",unknown,\"text; code, text\",unknown,Codex,USA,OpenAI,unknown,\"The model wasn't fully released to the public as a quality control measure. The authors identify potential risks of Codex in their paper due to the following: over-reliance, misalignment, bias and representation, economic and labor market impacts, security implications, environmental impact and legal implications. 
They also make suggestions for some of these, but do not implement them in Codex [[Section 7]](https://arxiv.org/pdf/2107.03374.pdf#section.7).\n\",,12B parameters (dense),model,https://arxiv.org/pdf/2107.03374.pdf,,unknown,Azure,100-1000 petaflop/s-days,,,,,,,\nclosed,\"The model was evaluated on human ratings to the InstructGPT answers to the prompts submitted to the OpenAI API as well as on public NLP datasets spanning truthfulness, toxicity, and bias, question answering, reading comprehension, and summarization tasks.\",2022-01-27,,\"['GPT-3', 'OpenAI API']\",\"InstructGPT is a family of GPT-3 based models fine-tuned on human feedback, which allows for better instruction following capabilities than GPT-3.\n\",,\"Email the authors [[InstructGPT Paper]](https://arxiv.org/pdf/2203.02155.pdf).\n\",,\"As stated in the model card: \"\"The intended direct users of InstructGPT are developers who access its capabilities via the OpenAI API. Through the OpenAI API, the model can be used by those who may not have AI development experience, to build and explore language modeling systems across a wide range of functions. We also anticipate that the model will continue to be used by researchers to better understand the behaviors, capabilities, biases, and constraints of large-scale language models\"\" [[Model Card]](https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md).\n\",unknown,\"text; code, text\",unknown,InstructGPT,USA,OpenAI,\"Access to InstructGPT is governed by Open AI API Usage Guidelines and API Terms of Use, prohibiting the use of the API in a way that causes societal harm. [[Usage Guidelines]] (https://beta.openai.com/docs/usage-guidelines/content-policy) [[Terms of Use]](https://openai.com/api/policies/terms/). 
The list of disallowed applications can be found in the usage guidelines [[Disallowed Applications]](https://beta.openai.com/docs/usage-guidelines/disallowed-applications).\n\",\"The model wasn't fully released to the public as a quality control measure.\n\",,175B parameters (dense),model,https://arxiv.org/pdf/2203.02155.pdf,https://github.com/openai/following-instructions-human-feedback/blob/main/model-card.md,unknown,unknown,60 petaflop/s-days,,,,,,,\nopen,\"The model was evaluated for zero-shot English and multilingual speech recognition, translation, language identification and robustness to noise.\",2022-09-21,,['Whisper dataset'],Whisper is an audio transcription software.,,\"The discussions page of the codebase is not formally cited as a place for feedback, but is being used in this way [[Discussions page]](https://github.com/openai/whisper/discussions)\n\",,\"Whisper is a general-purpose speech recognition model; it is a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.\n\",MIT,audio; text,,Whisper,USA,OpenAI,unknown,No specific quality control methods are documented.,,1.5B parameters (dense),model,https://cdn.openai.com/papers/whisper.pdf,https://github.com/openai/whisper/blob/main/model-card.md,unknown,Azure,unknown,,,,,,,\nopen,\"The model was evaluated on standard vision datasets (e.g. CIFAR10, ImageNet) and showed robust state of the art results.\",2021-01-05,,['CLIP dataset'],\"\"\"CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. 
We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision\"\" [[CLIP Repository]](https://github.com/openai/CLIP).\n\",,Questions can be shared at the feedback form linked in the CLIP model card [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md).,,\"The model is intended to be used by AI researchers to better understand \"\"robustness, generalization, and other capabilities, biases, and constraints of computer vision models\"\" [[CLIP Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md).\n\",MIT,\"image, text; text\",,CLIP,USA,OpenAI,\"\"\"Any deployed use case of the model - whether commercial or not - is currently out of scope. Non-deployed use cases such as image search in a constrained environment, are also not recommended unless there is thorough in-domain testing of the model with a specific, fixed class taxonomy. This is because our safety assessment demonstrated a high need for task specific testing especially given the variability of CLIP’s performance with different class taxonomies. This makes untested and unconstrained deployment of the model in any use case currently potentially harmful.\nCertain use cases which would fall under the domain of surveillance and facial recognition are always out-of-scope regardless of performance of the model. 
This is because the use of artificial intelligence for tasks such as these can be premature currently given the lack of testing norms and checks to ensure its fair use.\nSince the model has not been purposefully trained in or evaluated on any languages other than English, its use should be limited to English language use cases\"\" [[Model Card]](https://github.com/openai/CLIP/blob/main/model-card.md).\n\",\"The authors found that the performance of the model depended heavily on which classes are included (and excluded) for a given task. They reported significant race and gender based disparities on the FairFace dataset, depending on how the classes were constructed. The authors also demonstrated that the model was capable of racial profiling with high accuracy [[Section 7]](https://arxiv.org/pdf/2103.00020.pdf#section.7).\n\",,unknown,model,https://arxiv.org/pdf/2103.00020.pdf,https://github.com/openai/CLIP/blob/main/model-card.md,unknown,NVIDIA V100 GPUs,71.12 petaflop/s-days,,,,,,,\nlimited,\"The model was evaluated against three prior approaches, AttnGAN, DM-GAN, and DF-GAN using Inception Score and Fréchet Inception Distance on MS-COCO as metrics. The model was also evaluated by humans and received the majority of the votes in generating images that look realistic and better match the caption when compared to the images generated by DF-GAN [[Section 3]](https://arxiv.org/pdf/2102.12092.pdf#section.3).\n\",2021-01-05,,['DALL·E dataset'],\"DALL·E is a GPT-3 based model trained to generate images from text descriptions. 
The authors found that it had \"\"a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images\"\" [[OpenAI Blog Post]](https://openai.com/blog/dall-e/).\n\",,\"Contact the paper author(s) specified on the paper [[Paper]](https://arxiv.org/pdf/2102.12092.pdf).\n\",,\"\"\"The model is intended for others to use for training their own generative models\"\" [[Model Card]](https://github.com/openai/DALL-E/blob/master/model_card.md).\n\",unknown,text; image,,DALL·E,USA,OpenAI,unknown,unknown,,12B parameters (dense),model,https://arxiv.org/pdf/2102.12092.pdf,https://github.com/openai/DALL-E/blob/master/model_card.md,unknown,NVIDIA V100 GPUs,unknown,,,,,,,\nopen,Evaluations in paper are primarily considering the fidelity and novelty of samples from Jukebox.,2020-04-30,,['Jukebox Dataset'],Jukebox is a generative model that produces music,,,,,Noncommercial Use License,text; audio,,Jukebox,USA,OpenAI,,,,5B parameters (dense),model,https://arxiv.org/abs/2005.00341,,unknown,510 V100s,4 weeks,,,,,,,\nlimited,The model is capable of generating explicit content and the researchers found limited amount of spurious content generated. The researchers also found that visual synonyms can be used to prompt the model to surface unwanted generations [[Probes and Evaluations]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md#probes-and-evaluations).,2022-04-13,,\"['DALL·E dataset', 'CLIP dataset']\",\"\"\"DALL·E 2 is an artificial intelligence model that takes a text prompt and/or existing image as an input and generates a new image as an output\"\" [[System Card]] (https://github.com/openai/dalle-2-preview/blob/main/system-card.md). 
The model wasn't fully released, but OpenAI released a version of the model (DALL·E 2 Preview) to a select group of testers.\n\",,Feedback can be provided at support at openai.com.,,\"\"\"The intended use of the DALL·E 2 Preview at this time is for personal, non-commercial exploration and research purposes by people who are interested in understanding the potential uses of these capabilities\"\" [[Use]](https://github.com/openai/dalle-2-preview/blob/main/system-card.md#use).\n\",unknown,text; image,\"Uses of the model are monitored. In the preview version, any user can flag content. The specific policies for monitoring are not disclosed, but possible measures include disabling of accounts violating the content policies [[Monitoring and Reporting]](https://github.com/openai/dalle-2-preview/blob/main/system-card.md#monitoring-and-reporting).\n\",DALL·E 2,USA,OpenAI,\"Use of the model is governed by the OpenAI Content Policy, which prohibits posting content that is not G-rated. Users are not allowed to utilize the model in commercial products in the preview version [[Content Policy]](https://github.com/openai/dalle-2-preview/blob/main/system-card.md#policies-and-enforcement).\",The model is not fully released to the public as part of a quality control measure. The usage of the model by testers is monitored and user provided prompts are filtered [[Input filters]](https://github.com/openai/dalle-2-preview/blob/main/system-card.md#input-filters).,,unknown,model,https://arxiv.org/abs/2204.06125,https://github.com/openai/dalle-2-preview/blob/main/system-card.md,unknown,unknown,unknown,,,,,,,\nlimited,,2020-06-11,,\"['GPT-3', 'Codex', 'code-davinci-002', 'text-davinci-002', 'text-davinci-003', 'gpt-3.5-turbo', 'Whisper', 'DALL·E', 'GPT-4', 'GPT-4 Turbo']\",\"OpenAI API is a general purpose \"\"text in, text out\"\" interface connecting users with a suite of language models. 
The API was initially released as a gateway to GPT-3, but it now supports access to other, more specialized OpenAI models. [[Open AI Blog Post]](https://openai.com/blog/openai-api/)\n\",,unknown,,\"OpenAI API was designed to be used by developers to empower applications, and researchers to study language models [[Section 3]](https://openai.com/api/policies/terms/).\n\",custom,,\"OpenAI may monitor the API use to ensure \"\"quality and improve OpenAI systems, products and services; perform research; and ensure compliance\"\" with the Terms of Service and all applicable laws. Users of the API will give OpenAI reasonable access to their application to monitor compliance with the terms listed in the Terms of Service [[Section 5(b)]](https://openai.com/api/policies/terms/). Apps using the OpenAI API should submit an application once they are deployed to real users. The review form takes 10 minutes to complete and over 97% of the applications are directly accepted or conditionally accepted. The applicants are notified of the decision within 2 business days [[App Review Guidelines]] (https://beta.openai.com/docs/usage-guidelines/app-review).\n\",OpenAI API,USA,OpenAI,\"OpenAI API Terms of Use prohibits the use of the API in a way violating the applicable law, including: (i) \"\"Illegal activities, such as child pornography, gambling, cybercrime, piracy, violating copyright, trademark or other intellectual property laws\"\"; (ii) \"\"Accessing or authorizing anyone to access the APIs from an embargoed country, region, or territory as prohibited by the U.S. government\"\"; (iii) \"\"Threatening, stalking, defaming, defrauding, degrading, victimizing or intimidating anyone for any reason\"\". The usage requirements are detailed in the Terms of Use [[Section 3]](https://openai.com/api/policies/terms/).\n\",\"Given a prompt, OpenAI API checks whether a completion contains unsafe language using its filters and marks the completion accordingly if so. 
The API also provides developers with special endpoints that scope the API usage. OpenAI also developed user guidelines to help developers understand safety issues [[OpenAI API]](https://openai.com/api/).\n\",,,application,https://openai.com/api/,,,,,The API exposes the models fairly directly with a range of hyperparameters (e.g. temperature scaling).,unknown,unknown,\"Given a prompting text, the OpenAI API provides access to text completions and log probabilities. Support for text and code embeddings was added on 2022-01-25 [[OpenAI Blog Post]](https://openai.com/blog/introducing-text-and-code-embeddings/).\n\",https://openai.com/api/policies/terms/,unknown,\nopen,,2022-06-23,,['web_clean'],,,,,,MIT,video; video,,VPT,USA,OpenAI,,,,500M parameters (dense),model,https://arxiv.org/abs/2206.11795,,,,,,,,,,,\nclosed,,2022-06-23,,[],,,,,,unknown,video,,web_clean,USA,OpenAI,,,[],70k hours,dataset,https://arxiv.org/abs/2206.11795,,,,,,,,,,,\nlimited,,2022-11-30,,\"['gpt-3.5-turbo', 'OpenAI toxicity classifier']\",ChatGPT is an artificial intelligence chatbot developed by OpenAI.,,,,,custom,,,ChatGPT,USA,OpenAI,,,,,application,https://openai.com/blog/chatgpt,,,,,,,100M,,,,\nlimited,,2023-03-01,,['gpt-3.5-turbo dataset'],,,,,,custom,text; text,,gpt-3.5-turbo,USA,OpenAI,,,,unknown,model,https://platform.openai.com/docs/models/gpt-3-5,,,,,,,,,,,\nlimited,,2023-11-06,,[],GPT-4 Turbo is a more capable version of GPT-4 and has knowledge of world events up to April 2023. 
It has a 128k context window so it can fit the equivalent of more than 300 pages of text in a single prompt.,,,,,custom,text; text,unknown,GPT-4 Turbo,USA,OpenAI,,,,unknown,model,https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,,unknown,unknown,unknown,,,,,,,\nlimited,,2023-03-01,,[],,,,,,unknown,text,,gpt-3.5-turbo dataset,USA,OpenAI,,,[],unknown,dataset,https://platform.openai.com/docs/models/gpt-3-5,,,,,,,,,,,\nlimited,,2022-05-01,,[],,,,,,unknown,\"code, text\",,code-davinci-002 dataset,USA,OpenAI,,,[],unknown,dataset,https://platform.openai.com/docs/model-index-for-researchers,,,,,,,,,,,\nlimited,,2022-05-01,,['code-davinci-002 dataset'],,,,,,unknown,\"text; code, text\",,code-davinci-002,USA,OpenAI,,,,unknown,model,https://platform.openai.com/docs/model-index-for-researchers,,,,,,,,,,,\nlimited,,2022-05-01,,['code-davinci-002'],,,,,,unknown,text; text,,text-davinci-002,USA,OpenAI,,,,unknown,model,https://platform.openai.com/docs/model-index-for-researchers,,,,,,,,,,,\nlimited,,2022-11-30,,['text-davinci-002'],,,,,,unknown,text; text,,text-davinci-003,USA,OpenAI,,,,unknown,model,https://platform.openai.com/docs/model-index-for-researchers,,,,,,,,,,,\nopen,,2023-03-01,,['Whisper'],API to query OpenAI's Whisper model.,,,,,custom,,,Whisper API,USA,OpenAI,,,,,application,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,,,,,,\nopen,,2023-03-01,,['ChatGPT'],API to query OpenAI's ChatGPT model.,,,,,custom,,,ChatGPT API,USA,OpenAI,,,,,application,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,,,,,,\nopen,,2022-08-10,,['OpenAI toxicity classifier'],This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content—an instance of using AI systems to assist with human supervision of these systems.,,,,,custom,,,OpenAI Moderation API,USA,OpenAI,,,,,application,https://openai.com/blog/new-and-improved-content-moderation-tooling,,,,,,,,,,,\nclosed,,2023-01-18,,['OpenAI toxicity 
dataset'],,,,,,unknown,text; text,,OpenAI toxicity classifier,USA,OpenAI,,,,unknown,model,\"https://time.com/6247678/openai-chatgpt-kenya-workers/#:~:text=In%20a%20statement%2C%20an%20OpenAI,datasets%20of%20tools%20like%20ChatGPT.\",,,,,,,,,,,\nclosed,,2023-01-18,,[],,,,,,unknown,text,,OpenAI toxicity dataset,USA,OpenAI,,,[],unknown,dataset,\"https://time.com/6247678/openai-chatgpt-kenya-workers/#:~:text=In%20a%20statement%2C%20an%20OpenAI,datasets%20of%20tools%20like%20ChatGPT.\",,,,,,,,,,,\nlimited,,2023-02-03,,['Sage'],A chatbot language model available via Quora's Poe,,,,,unknown,,,Sage API,USA,OpenAI,,,,,application,https://quorablog.quora.com/Poe-1,,,,,,,,,,,\nlimited,,2023-02-03,,['Dragonfly'],A chatbot language model available via Quora's Poe,,,,,unknown,,,Dragonfly API,USA,OpenAI,,,,,application,https://quorablog.quora.com/Poe-1,,,,,,,,,,,\nlimited,,2023-02-03,,[],A chatbot language model available via Quora's Poe,,,,,unknown,text; text,,Sage,USA,OpenAI,,,,unknown,model,https://quorablog.quora.com/Poe-1,,,,,,,,,,,\nlimited,,2023-02-03,,[],A chatbot language model available via Quora's Poe,,,,,unknown,text; text,,Dragonfly,USA,OpenAI,,,,unknown,model,https://quorablog.quora.com/Poe-1,,,,,,,,,,,\nlimited,,2023-03-07,,['ChatGPT API'],\"The app integrates ChatGPT’s powerful AI technology to deliver instant conversation summaries, research tools, and writing assistance directly in Slack to help millions of companies work more productively.\",,,,,unknown,,,ChatGPT for Slack,unknown,\"OpenAI, Salesforce\",,,,,application,https://www.salesforce.com/news/stories/chatgpt-app-for-slack/,,,,,,,,,,,\nlimited,,2023-03-14,,[],,,,,,unknown,\"image, text; image, text\",,GPT-4,USA,OpenAI,,,,unknown,model,https://arxiv.org/abs/2303.08774,,,,,,,,,,,\nlimited,,2023-03-14,,['GPT-4'],\"GPT-4 is OpenAI’s most advanced system, producing safer and more useful responses\",,,,,custom,,,GPT-4 
API,USA,OpenAI,,,,,application,https://openai.com/product/gpt-4,,,,,,,,,,,\nlimited,,2023-08-28,,['GPT-4'],\"ChatGPT Enterprise offers enterprise-grade security and privacy, unlimited higher-speed GPT-4 access, longer context windows for processing longer inputs, advanced data analysis capabilities, and customization options compared to OpenAI's previous offerings.\",,,,,custom,,,ChatGPT Enterprise,USA,OpenAI,,,,,application,https://openai.com/enterprise,,,,,,,,,https://openai.com/policies/terms-of-use,,\nlimited,The model is capable of generating explicit content and the researchers found a limited amount of spurious content generated.,2023-09-20,,\"['DALL·E 2 dataset', 'CLIP dataset', 'ChatGPT']\",\"DALL·E 3 is an artificial intelligence model that takes a text prompt and/or existing image as an input and generates a new image as an output. The model is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October.\",,Feedback can be provided at openai.com,,\"The intended use of the DALL·E 3 Preview at this time is for personal, non-commercial exploration and research purposes by people who are interested in understanding the potential uses of these capabilities\",custom,text; image,\"Uses of the model are monitored. In the preview version, any user can flag content. The specific policies for monitoring are not disclosed, but possible measures include disabling of accounts violating the content\",DALL·E 3,USA,OpenAI,\"Use of the model is governed by the OpenAI Content Policy, which prohibits posting content that is not G-rated. Users are not allowed to utilize the model in commercial products in the preview version.\",\"DALL·E 3 has mitigations to decline requests that ask for a public figure by name. 
We improved safety performance in risk areas like generation of public figures and harmful biases related to visual over/under-representation, in partnership with red teamers—domain experts who stress-test the model—to help inform our risk assessment and mitigation efforts in areas like propaganda and misinformation.\",,unknown,model,https://openai.com/dall-e-3,,unknown,unknown,unknown,,,,,,,\nlimited,,2024-02-15,,[],Sora is an AI model that can create realistic and imaginative scenes from text instructions.,,,,,unknown,\"text; image, video\",unknown,Sora,USA,OpenAI,,,,unknown,model,https://openai.com/sora,,unknown,unknown,unknown,,,,,,,\nlimited,\"When evaluated on standard performance benchmarks, achieves similar levels of performance to GPT-4 Turbo.\",2024-05-13,,[],\"GPT-4o is OpenAI's new flagship model, as of release, that can reason across audio, vision, and text in real time.\",,,,,unknown,\"audio, image, text, video; audio, image, text\",\"Internal monitoring of risk for non-text outputs before a public release (currently only image, text inputs and text outputs are available).\",GPT-4o,USA,OpenAI,,Training data filtering and post-training refinement act as additional guardrails for preventing harmful outputs.,,unknown,model,https://openai.com/index/hello-gpt-4o/,,unknown,unknown,unknown,,,,,,,\nopen,Evaluated across a range of standard datasets regarding multiple model capabilities like language comprehension and logical reasoning.,2023-11-06,,[],XVERSE is a multilingual large language model for over 40 languages.,,https://huggingface.co/xverse/XVERSE-65B/discussions,,,custom,text; text,unknown,XVERSE,USA,Xverse,,,,65B parameters (dense),model,https://github.com/xverse-ai/XVERSE-65B,https://huggingface.co/xverse/XVERSE-65B,unknown,unknown,unknown,,,,,,,\nlimited,,2023-03-07,,['ChatGPT API'],\"Brex Inc., a highly valued startup that makes software for finance professionals, is turning to the same artificial intelligence tool behind ChatGPT for a service 
that can answer questions about corporate budgets, policy and spending.\",,,,,custom,,,Brex Chat,USA,Brex,,,,,application,https://fortune.com/2023/03/07/cfo-chatbot-chatgpt-ai-brex-finance-software-startup-budgets-policies/,,,,,,,,,https://www.brex.com/legal/user-terms,,\nopen,Evaluated in comparison to LLaMA series models on standard language benchmarks.,2023-09-27,,[],Mistral is a compact language model.,,https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions,,,Apache 2.0,text; text,,Mistral,USA,Mistral AI,,,,7.3B parameters (dense),model,https://mistral.ai/news/announcing-mistral-7b/,https://huggingface.co/mistralai/Mistral-7B-v0.1,unknown,unknown,unknown,,,,,,,\nlimited,Evaluated on commonly used benchmarks in comparison to the current LLM leaders.,2024-02-26,,[],Mistral Large is Mistral AI’s new cutting-edge text generation model.,,,,,unknown,text; text,,Mistral Large,USA,Mistral AI,,,,unknown,model,https://mistral.ai/news/mistral-large/,,unknown,unknown,unknown,,,,,,,\nlimited,,2024-02-26,,\"['Mistral', 'Mistral Large']\",Le Chat is a first demonstration of what can be built with Mistral models and what can be deployed in the business environment.,,,,,unknown,,,Le Chat,USA,Mistral AI,,,,,application,https://mistral.ai/news/le-chat-mistral/,,,,,,unknown,unknown,,https://mistral.ai/terms/#terms-of-use,unknown,\nopen,\"Performance of Codestral is evaluated in Python, SQL, and additional languages, C++, bash, Java, PHP, TypeScript, and C#. Fill-in-the-middle performance is assessed using HumanEval pass@1 in Python, JavaScript, and Java.\",2024-05-29,,[],\"Codestral is an open-weight generative AI model explicitly designed for code generation tasks. It helps developers write and interact with code through a shared instruction and completion API endpoint. Mastering code and English, it can be used to design advanced AI applications for software developers. 
It is fluent in 80+ programming languages.\",,,,\"Helps developers write and interact with code, design advanced AI applications for software developers, integrated into LlamaIndex and LangChain for building applications, integrated in VSCode and JetBrains environments for code generation and interactive conversation.\",Mistral AI Non-Production License,text; code,unknown,Codestral,USA,Mistral AI,unknown,,,22B parameters,model,https://mistral.ai/news/codestral/,,unknown,unknown,unknown,,,,,,,\nopen,\"The model underwent an advanced fine-tuning and alignment phase. Its performance was evaluated using GPT4o as a judge on official references. It was compared to recent open-source pre-trained models Gemma 2 9B, Llama 3 8B regarding multilingual performance and coding accuracy. Tekken tokenizer's compression ability was compared with previous tokenizers like SentencePiece and the Llama 3 tokenizer.\",2024-07-18,,[],\"The Mistral NeMo model is a state-of-the-art 12B model built in collaboration with NVIDIA, offering a large context window of up to 128k tokens. The model is suitable for multilingual applications and exhibits excellent reasoning, world knowledge, and coding accuracy. It's easy to use and a drop-in replacement in a system that uses Mistral 7B. The model uses a new tokenizer, Tekken, based on Tiktoken, which is trained on over 100 languages. It compresses natural language text and source code more efficiently than previously used tokenizers.\",,\"Problems should be reported to the Mistral AI team, though the specific method of reporting is unknown.\",,\"The model can be used for multilingual applications, understanding and generating natural language as well as source code, handling multi-turn conversations, and providing more precise instruction following.\",Apache 2.0,text; text,Unknown,Mistral NeMo,USA,\"Mistral AI, NVIDIA\",Unknown,The model underwent an advanced fine-tuning and alignment phase. 
Various measures such as accuracy comparisons with other models and instruction-tuning were implemented to ensure its quality.,,12B parameters,model,https://mistral.ai/news/mistral-nemo/,unknown,Unknown,\"NVIDIA hardware, specifics unknown\",Unknown,,,,,,,\nopen,\"The model has been tested for in-context retrieval capabilities up to 256k tokens. It has been created with advanced code and reasoning capabilities, which enables it to perform on par with SOTA transformer-based models.\",2024-07-16,,[],\"Codestral Mamba is a Mamba2 language model that is specialized in code generation. It has a theoretical ability to model sequences of infinite length and offers linear time inference. This makes it effective for extensive user engagement and is especially practical for code productivity use cases. Codestral Mamba can be deployed using the mistral-inference SDK or through TensorRT-LLM, and users can download the raw weights from HuggingFace.\",,Problems with the model can be reported through the organization's website.,,The model is intended for code generation and can be utilized as a local code assistant.,Apache 2.0,text; text,Unknown,Codestral Mamba,USA,Mistral AI,Unknown,Unknown,,7.3B parameters,model,https://mistral.ai/news/codestral-mamba/,unknown,Unknown,Unknown,Unknown,,,,,,,\nopen,The model's performance has been evaluated on the MATH and MMLU industry-standard benchmarks. It scored notably higher on both these tests than the base model Mistral 7B.,2024-07-16,,['Mistral 7B'],\"MathΣtral is a 7B model designed for math reasoning and scientific discovery. It achieves state-of-the-art reasoning capacities in its size category across various industry-standard benchmarks. This model stands on the shoulders of Mistral 7B and specializes in STEM subjects. It is designed to assist efforts in advanced mathematical problems requiring complex, multi-step logical reasoning. 
It particularly achieves 56.6% on MATH and 63.47% on MMLU.\",,Feedback is likely expected to be given through the HuggingFace platform where the model's weights are hosted or directly to the Mistral AI team.,,\"The model is intended for use in solving advanced mathematical problems requiring complex, multi-step logical reasoning or any math-related STEM subjects challenges.\",Apache 2.0,text; text,Unknown,MathΣtral,USA,Mistral AI,Unknown,This model has been fine-tuned from a base model and its inference and performance have been tested on several industry benchmarks.,,7B parameters,model,https://mistral.ai/news/mathstral/,unknown,Unknown,Unknown,Unknown,,,,,,,\nopen,\"We evaluate Pixtral Large against frontier models on a set of standard multimodal benchmarks, through a common testing harness.\",2024-11-18,,['Mistral Large 2'],\"Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. Particularly, the model is able to understand documents, charts and natural images, while maintaining the leading text-only understanding of Mistral Large 2.\",,unknown,,\"RAG and agentic workflows, making it a suitable choice for enterprise use cases such as knowledge exploration and sharing, semantic understanding of documents, task automation, and improved customer experiences.\",\"Mistral Research License (MRL), Mistral Commercial License\",\"text, image; text\",unknown,Pixtral Large,USA,Mistral AI,unknown,unknown,,124B parameters,model,https://mistral.ai/news/pixtral-large/,unknown,unknown,unknown,unknown,,,,,,,\nclosed,Benchmarks We have benchmarked the new Codestral with the leading sub-100B parameter coding models that are widely considered to be best-in-class for FIM tasks.,2025-01-13,,[],\"Lightweight, fast, and proficient in over 80 programming languages, Codestral is optimized for low-latency, high-frequency usecases and supports tasks such as fill-in-the-middle (FIM), code correction and test generation.\",,We can’t 
wait to hear your experience! Try it now Try it on Continue.dev with VsCode or JetBrains,,\"Highly capable coding companion, regularly boosting productivity several times over.\",unknown,text; text,unknown,Codestral 25.01,USA,Mistral AI,unknown,unknown,,unknown,model,https://mistral.ai/news/codestral-2501/,unknown,unknown,unknown,unknown,,,,,,,\nopen,\"Evaluated on the OpenLLM leaderboard, releasing at rank number 4 on the leaderboard.\",2023-08-11,,\"['LLaMA 2', 'Guanaco LLaMA dataset']\",\"GodziLLa 2 is an experimental combination of various proprietary LoRAs from Maya Philippines and Guanaco LLaMA 2 1K dataset, with LLaMA 2.\",,,,,LLaMA 2,text; text,unknown,GodziLLa 2,Philippines,Maya Philippines,,,,70B parameters (dense),model,https://huggingface.co/MayaPH/GodziLLa2-70B,https://huggingface.co/MayaPH/GodziLLa2-70B,unknown,unknown,unknown,,,,,,,\nlimited,\"We evaluated voyage-code-3 using an enhanced suite of evaluation datasets designed to address the shortcomings of existing benchmarks and deliver practical, robust results.\",2024-12-04,,[],\"Introducing voyage-code-3, our next-generation embedding model optimized for code retrieval.\",,\"If you’re also interested in fine-tuned embedding models, we’d love to hear from you—please email us at contact@voyageai.com.\",,optimized for code retrieval,unknown,unknown,unknown,voyage-code-3,USA,Voyage AI,unknown,unknown,,unknown,model,https://blog.voyageai.com/2024/12/04/voyage-code-3/,unknown,unknown,unknown,unknown,,,,,,,\nopen,,2022-06-22,,\"['The Pile', 'Yandex Russian Pretraining Dataset']\",YaLM is a 100B parameter autoregressive model trained on 25% English and 75% Russian text.,,,,,Apache 2.0,text; text,,YaLM,Russia,Yandex,,,,100B parameters (dense),model,https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6,,,Yandex 800 A100 Cluster,,,,,,,,\nopen,,2022-06-23,,['YaLM'],Yandex is a search engine and web portal. 
Yandex offers internet search and other services,,,,,custom,,,Yandex Search,Russia,Yandex,,,,,application,https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6,,,,,,,,,https://yandex.com/legal/browser_agreement/,,\nopen,Evaluated in comparison with ChatGPT and Stanford Alpaca.,2023-04-03,,\"['LLaMA', 'web-scraped dialogue data']\",A relatively small chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web.,,https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g/discussions,,academic research,Apache 2.0,text; text,,Koala,USA,Berkeley,,,,13B parameters (dense),model,https://bair.berkeley.edu/blog/2023/04/03/koala/,https://huggingface.co/TheBloke/koala-7B-GPTQ-4bit-128g,,8 A100 GPUs,6 hours,,,,,,,\nopen,Evaluated using AST sub-tree matching technique and compared to other models in terms of API functionality accuracy.,2023-05-24,,\"['LLaMA', 'Gorilla document retriever']\",Gorilla is a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls.,,,,In conjunction with a LLM to improve its capability for using API calls.,Apache 2.0,text; API,,Gorilla,USA,Berkeley,,\"No specific quality control is mentioned in model training, though details on data processing and collection are provided in the paper.\",,7B parameters (dense),model,https://arxiv.org/pdf/2305.15334v1.pdf,,,,,,,,,,,\nopen,Evaluated on wide range of tasks using own evaluation benchmarks.,2023-05-03,,['RedPajama'],OpenLlama is an open source reproduction of Meta's LLaMA model.,,,,,Apache 2.0,text; text,,OpenLLaMA,USA,Berkeley,,,,17B parameters (dense),model,https://github.com/openlm-research/open_llama,,unknown,,unknown,,,,,,,\nopen,,2023-03-01,,['ChatGPT API'],\"Quizlet is introducing Q-Chat, a fully-adaptive AI tutor that engages students with adaptive questions based on relevant study materials delivered through a fun chat 
experience.\",,,,,,,,Q-Chat,USA,Quizlet,,,,,application,https://openai.com/blog/introducing-chatgpt-and-whisper-apis,,,,,,,,,https://quizlet.com/tos,,\nopen,Evaluated on standard and ocean science benchmarks in comparison to other similar-sized models.,2024-02-06,,[],OceanGPT is the first-ever LLM in the ocean domain and displays expertise in various ocean science tasks.,,https://huggingface.co/zjunlp/OceanGPT-7b/discussions,,,MIT,text; text,unknown,OceanGPT,China,Zhejiang University,,,,7B parameters (dense),model,https://www.zjukg.org/project/OceanGPT/,https://huggingface.co/zjunlp/OceanGPT-7b,unknown,6 A800 NVIDIA GPUs,7 days,,,,,,,\nopen,The model was tested extensively internally and externally. It has developed and implemented numerous safeguards to prevent harms. They have also received user feedback to make continuous improvements.,2024-06-12,,[],\"Stable Diffusion 3 Medium is Stability AI’s advanced text-to-image open model. It's suitable for running on consumer PCs and laptops as well as enterprise-tier GPUs. The model is known for its overall Quality and Photorealism, prompt understanding, typography, being resource-efficient, and being fine-tuned. The model in collaboration with NVIDIA and AMD has enhanced performance.\",,\"Feedback can be given through Twitter, Instagram, LinkedIn, or Discord Community.\",,\"The model can be used by professional artists, designers, developers, and AI enthusiasts for creating high-quality image outputs from text inputs.\",Stability Community License,text; image,\"Continuous collaboration with researchers, experts, and the community to ensure that the model is being used appropriately.\",Stable Diffusion 3 Medium,UK,Stability AI,Large-scale commercial use requires contacting the organization for licensing details. 
The model should not be used for any purpose that does not adhere to the usage guidelines.,\"They have conducted extensive internal and external testing of this model and have implemented numerous safeguards to prevent harms. Safety measures were implemented from the start of training the model and continued throughout testing, evaluation, and deployment.\",,2B parameters,model,https://stability.ai/news/stable-diffusion-3-medium,unknown,Unknown,unknown,Unknown,,,,,,,\nopen,\"Consistency across the spatial and temporal axes greatly improves with this model. Stable Video 4D is able to generate novel view videos that are more detailed, faithful to the input video, and are consistent across frames and views compared to existing works.\",2024-07-24,,['Stable Video Diffusion Model'],\"Stable Video 4D is our latest AI model for dynamic multi-angle video generation. It allows users to upload a single video and receive novel-view videos of eight new angles/views. This advancement moves from image-based video generation to full 3D dynamic video synthesis. Users can specify camera angles, tailoring the output to meet specific creative needs. The model is currently available on Hugging Face and can generate 5-frame videos across the 8 views in about 40 seconds.\",,\"Feedback and reports about the progress should be shared via their social channels like Twitter, Instagram, LinkedIn or their Discord Community.\",,\"This model can be used for creating dynamic multi-angle videos, with applications in game development, video editing, and virtual reality. It allows professionals in these fields to visualize objects from multiple angles, enhancing the realism and immersion of their products.\",Stability Community License,video; video,Continuous monitoring by the Stability AI team for improvements and refinements.,Stable Video 4D,UK,Stability AI,Unknown,The Stability AI team is dedicated to continuous innovation and exploration of real-world use-cases for this model and others. 
They are actively working to refine and optimize the model beyond the current synthetic datasets it has been trained on.,,Unknown,model,https://stability.ai/news/stable-video-4d,unknown,Unknown,Unknown,Unknown,,,,,,,\nopen,\"The model was evaluated on its ability to quickly and accurately transform a single image into a detailed 3D asset. This evaluation highlighted the model's unprecedented speed and quality, marking it as a valuable tool for rapid prototyping in 3D work. Compared to the previous SV3D model, Stable Fast 3D offers significantly reduced inference times--0.5 seconds versus 10 minutes--while maintaining high-quality output.\",2024-08-01,,['TripoSR'],\"Stable Fast 3D is a ground-breaking model in 3D asset generation technology. It can transform a single input image into a highly detailed 3D asset in around half a second, setting new standards in terms of speed and quality in the realm of 3D reconstruction. Users start the process by uploading an image of an object. Stable Fast 3D then swiftly generates a complete 3D asset, which includes, UV unwrapped mesh, material parameters, albedo colors with reduced illumination bake-in, and optional quad or triangle remeshing. This model has various applications, notably for game and virtual reality developers, as well as professionals in retail, architecture, design, and other graphic-intensive professions.\",,Information on any downstream issues with the model can be reported to Stability AI through their support request system.,,\"The model is intended for use in game development, virtual reality, retail, architecture, design and other graphically intense professions. It allows for rapid prototyping in 3D work, assisting both enterprises and indie developers. 
It's also used in movie production for creating static assets for games and 3D models for e-commerce, as well as fast model creation for AR/VR.\",Stability Community License,image; 3D,Unknown,Stable Fast 3D,UK,Stability AI,Use by individuals or organizations with over $1M in annual revenue without obtaining an Enterprise License.,Unknown,,unknown,model,https://stability.ai/news/introducing-stable-fast-3d,https://huggingface.co/stabilityai/stable-fast-3d,Unknown,unknown,Unknown,,,,,,,\nopen,Our analysis shows that Stable Diffusion 3.5 Large leads the market in prompt adherence and rivals much larger models in image quality.,2023-10-29,,[],\"Stable Diffusion 3.5 reflects our commitment to empower builders and creators with tools that are widely accessible, cutting-edge, and free for most use cases.\",,We look forward to hearing your feedback on Stable Diffusion 3.5 and seeing what you create with the models.,,This model is ideal for professional use cases at 1 megapixel resolution.,Stability AI Community,text; image,unknown,Stable Diffusion 3.5,UK,Stability AI,unknown,\"We believe in safe, responsible AI practices and take deliberate measures to ensure Integrity starts at the early stages of development.\",,8.1B parameters,model,https://stability.ai/news/introducing-stable-diffusion-3-5,unknown,unknown,unknown,unknown,,,,,,,\nclosed,,2021-05-21,,[],HyperClova is an autoregressive language model,,,,,unknown,text; text,,HyperCLOVA,South Korea,NAVER,,,,82B parameters,model,https://arxiv.org/abs/2109.04650,,,1024 A100 GPUs,130.4 days,,,,,,,\nlimited,\"Evaluated on English and Korean benchmarks in comparison to open source English and multilingual LLMs, with HyperCLOVA X (closed) surpassing the models compared.\",2024-04-13,,[],\"HyperCLOVA X is a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding.\",,,,,unknown,text; text,,HyperCLOVA X,South 
Korea,NAVER,,,,unknown,model,https://arxiv.org/pdf/2404.01954,,unknown,unknown,unknown,,,,,,,\nlimited,,,,['Anthropic API'],An online tutoring solution to help students achieve academic success.,,,,,unknown,,,Juni Tutor Bot,USA,Juni Learning,,,,,application,https://junilearning.com/,,,,,,,,,,,\n"
  },
  {
    "path": "scripts/collate.py",
    "content": "#!/usr/bin/env python\nfrom pathlib import Path\n\nimport yaml\nfrom pandas import DataFrame\n\n\nASSET_PATH = Path(\"./assets\")\n\n\ndef scalar(val):\n    \"\"\"Clean scalar values whenever possible.\"\"\"\n    if not isinstance(val, dict):\n        return val\n\n    if \"explanation\" in val:\n        return val.get(\"value\")\n\n    if not val:\n        return None\n\n    return val\n\n\ndef collate_assets():\n    \"\"\"Store all assets together in a single tabular CSV file.\"\"\"\n    paths = ASSET_PATH.glob(\"*.yaml\")\n    assets = [asset for fp in paths for asset in yaml.safe_load(open(fp))]\n    df = DataFrame.from_records(assets)\n\n    # Make data consistent within columns (cannot have scalar and dicts in same column)\n    df = df.apply(lambda ser: ser.apply(scalar))\n\n    # Cleanup missing data representations\n    df = df.replace(\"none\", None)\n\n    non_categorical = [\"created_date\"]\n    for col in non_categorical:\n        df[col] = df[col].replace(\"unknown\", None)\n\n    df.to_csv(\"./resources/all_assets.csv\", index=False)\n\n\nif __name__ == \"__main__\":\n    collate_assets()\n"
  },
  {
    "path": "server.py",
    "content": "#!/usr/bin/env python\n\nfrom http import server\n\nclass MyHTTPRequestHandler(server.SimpleHTTPRequestHandler):\n    def end_headers(self):\n        self.send_my_headers()\n        server.SimpleHTTPRequestHandler.end_headers(self)\n\n    def send_my_headers(self):\n        self.send_header(\"Cache-Control\", \"no-cache, no-store, must-revalidate\")\n        self.send_header(\"Pragma\", \"no-cache\")\n        self.send_header(\"Expires\", \"0\")\n\nif __name__ == '__main__':\n    server.test(HandlerClass=MyHTTPRequestHandler)\n"
  },
  {
    "path": "yamlfix.toml",
    "content": "[tool.yamlfix]\nline_length         = 200\nnone_representation = \"Unknown\"\nflow_style_sequence = \"false\"\nexplicit_start      = \"true\"\n"
  }
]