Full Code of sileod/tasksource for AI

Repository: sileod/tasksource
Branch: main
Commit: ef6535aebaed
Files: 26
Total size: 491.5 KB

Directory structure:
gitextract__ri1waap/

├── .github/
│   ├── scripts/
│   │   └── release.py
│   └── workflows/
│       ├── python-publish.yml
│       └── release.yml
├── .gitignore
├── CITATION.cff
├── LICENSE
├── README.md
├── mtasks.md
├── pyproject.toml
├── setup.cfg
├── src/
│   └── tasksource/
│       ├── .ipynb_checkpoints/
│       │   ├── access-checkpoint.py
│       │   ├── preprocess-checkpoint.py
│       │   ├── recast-checkpoint.py
│       │   └── tasks-checkpoint.py
│       ├── __init__.py
│       ├── access.py
│       ├── metadata/
│       │   ├── __init__.py
│       │   ├── bigbench_groups.py
│       │   ├── blimp_groups.py
│       │   ├── original.txt
│       │   └── popularity.py
│       ├── mtasks.py
│       ├── preprocess.py
│       ├── recast.py
│       └── tasks.py
└── tasks.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/scripts/release.py
================================================
#!/usr/bin/env python3
import json
import subprocess


def get_last_version() -> str:
    """Return the version number of the last release."""
    json_string = (
        subprocess.run(
            ["gh", "release", "view", "--json", "tagName"],
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        .stdout.decode("utf8")
        .strip()
    )

    return json.loads(json_string)["tagName"]


def bump_patch_number(version_number: str) -> str:
    """Return a copy of `version_number` with the patch number incremented."""
    major, minor, patch = version_number.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


def create_new_patch_release():
    """Create a new patch release on GitHub."""
    try:
        last_version_number = get_last_version()
    except subprocess.CalledProcessError as err:
        if err.stderr.decode("utf8").startswith("HTTP 404:"):
            # The project doesn't have any releases yet.
            new_version_number = "0.0.1"
        else:
            raise
    else:
        new_version_number = bump_patch_number(last_version_number)

    subprocess.run(
        ["gh", "release", "create", "--generate-notes", new_version_number],
        check=True,
    )


if __name__ == "__main__":
    create_new_patch_release()


================================================
FILE: .github/workflows/python-publish.yml
================================================
name: Publish to PyPI.org
on:
  release:
    types: [published]
jobs:
  pypi:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - run: python3 -m pip install --upgrade build && python3 -m build
      - name: Publish package
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}


================================================
FILE: .github/workflows/release.yml
================================================
name: Create a new patch release
on: workflow_dispatch
jobs:
  github:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Create new patch release
        run: .github/scripts/release.py
        env:
          GITHUB_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover

# Translations
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/


================================================
FILE: CITATION.cff
================================================
cff-version: 1.1.0
message: "If you use this work, please cite it as below."
authors:
  - family-names: "Sileo"
    given-names: "Damien"
title: "tasksource: A Dataset Harmonization Framework for Streamlined NLP Multi-Task Learning and Evaluation"
version: "1.0.0"
date-released: 2023-01-01
url: "https://arxiv.org/abs/2301.05948"


================================================
FILE: LICENSE
================================================
Attribution 4.0 International

=======================================================================

Creative Commons Corporation ("Creative Commons") is not a law firm and
does not provide legal services or legal advice. Distribution of
Creative Commons public licenses does not create a lawyer-client or
other relationship. Creative Commons makes its licenses and related
information available on an "as-is" basis. Creative Commons gives no
warranties regarding its licenses, any material licensed under their
terms and conditions, or any related information. Creative Commons
disclaims all liability for damages resulting from their use to the
fullest extent possible.

Using Creative Commons Public Licenses

Creative Commons public licenses provide a standard set of terms and
conditions that creators and other rights holders may use to share
original works of authorship and other material subject to copyright
and certain other rights specified in the public license below. The
following considerations are for informational purposes only, are not
exhaustive, and do not form part of our licenses.

     Considerations for licensors: Our public licenses are
     intended for use by those authorized to give the public
     permission to use material in ways otherwise restricted by
     copyright and certain other rights. Our licenses are
     irrevocable. Licensors should read and understand the terms
     and conditions of the license they choose before applying it.
     Licensors should also secure all rights necessary before
     applying our licenses so that the public can reuse the
     material as expected. Licensors should clearly mark any
     material not subject to the license. This includes other CC-
     licensed material, or material used under an exception or
     limitation to copyright. More considerations for licensors:
	wiki.creativecommons.org/Considerations_for_licensors

     Considerations for the public: By using one of our public
     licenses, a licensor grants the public permission to use the
     licensed material under specified terms and conditions. If
     the licensor's permission is not necessary for any reason--for
     example, because of any applicable exception or limitation to
     copyright--then that use is not regulated by the license. Our
     licenses grant only permissions under copyright and certain
     other rights that a licensor has authority to grant. Use of
     the licensed material may still be restricted for other
     reasons, including because others have copyright or other
     rights in the material. A licensor may make special requests,
     such as asking that all changes be marked or described.
     Although not required by our licenses, you are encouraged to
     respect those requests where reasonable. More_considerations
     for the public:
	wiki.creativecommons.org/Considerations_for_licensees

=======================================================================

Creative Commons Attribution 4.0 International Public License

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
Attribution 4.0 International Public License ("Public License"). To the
extent this Public License may be interpreted as a contract, You are
granted the Licensed Rights in consideration of Your acceptance of
these terms and conditions, and the Licensor grants You such rights in
consideration of benefits the Licensor receives from making the
Licensed Material available under these terms and conditions.


Section 1 -- Definitions.

  a. Adapted Material means material subject to Copyright and Similar
     Rights that is derived from or based upon the Licensed Material
     and in which the Licensed Material is translated, altered,
     arranged, transformed, or otherwise modified in a manner requiring
     permission under the Copyright and Similar Rights held by the
     Licensor. For purposes of this Public License, where the Licensed
     Material is a musical work, performance, or sound recording,
     Adapted Material is always produced where the Licensed Material is
     synched in timed relation with a moving image.

  b. Adapter's License means the license You apply to Your Copyright
     and Similar Rights in Your contributions to Adapted Material in
     accordance with the terms and conditions of this Public License.

  c. Copyright and Similar Rights means copyright and/or similar rights
     closely related to copyright including, without limitation,
     performance, broadcast, sound recording, and Sui Generis Database
     Rights, without regard to how the rights are labeled or
     categorized. For purposes of this Public License, the rights
     specified in Section 2(b)(1)-(2) are not Copyright and Similar
     Rights.

  d. Effective Technological Measures means those measures that, in the
     absence of proper authority, may not be circumvented under laws
     fulfilling obligations under Article 11 of the WIPO Copyright
     Treaty adopted on December 20, 1996, and/or similar international
     agreements.

  e. Exceptions and Limitations means fair use, fair dealing, and/or
     any other exception or limitation to Copyright and Similar Rights
     that applies to Your use of the Licensed Material.

  f. Licensed Material means the artistic or literary work, database,
     or other material to which the Licensor applied this Public
     License.

  g. Licensed Rights means the rights granted to You subject to the
     terms and conditions of this Public License, which are limited to
     all Copyright and Similar Rights that apply to Your use of the
     Licensed Material and that the Licensor has authority to license.

  h. Licensor means the individual(s) or entity(ies) granting rights
     under this Public License.

  i. Share means to provide material to the public by any means or
     process that requires permission under the Licensed Rights, such
     as reproduction, public display, public performance, distribution,
     dissemination, communication, or importation, and to make material
     available to the public including in ways that members of the
     public may access the material from a place and at a time
     individually chosen by them.

  j. Sui Generis Database Rights means rights other than copyright
     resulting from Directive 96/9/EC of the European Parliament and of
     the Council of 11 March 1996 on the legal protection of databases,
     as amended and/or succeeded, as well as other essentially
     equivalent rights anywhere in the world.

  k. You means the individual or entity exercising the Licensed Rights
     under this Public License. Your has a corresponding meaning.


Section 2 -- Scope.

  a. License grant.

       1. Subject to the terms and conditions of this Public License,
          the Licensor hereby grants You a worldwide, royalty-free,
          non-sublicensable, non-exclusive, irrevocable license to
          exercise the Licensed Rights in the Licensed Material to:

            a. reproduce and Share the Licensed Material, in whole or
               in part; and

            b. produce, reproduce, and Share Adapted Material.

       2. Exceptions and Limitations. For the avoidance of doubt, where
          Exceptions and Limitations apply to Your use, this Public
          License does not apply, and You do not need to comply with
          its terms and conditions.

       3. Term. The term of this Public License is specified in Section
          6(a).

       4. Media and formats; technical modifications allowed. The
          Licensor authorizes You to exercise the Licensed Rights in
          all media and formats whether now known or hereafter created,
          and to make technical modifications necessary to do so. The
          Licensor waives and/or agrees not to assert any right or
          authority to forbid You from making technical modifications
          necessary to exercise the Licensed Rights, including
          technical modifications necessary to circumvent Effective
          Technological Measures. For purposes of this Public License,
          simply making modifications authorized by this Section 2(a)
          (4) never produces Adapted Material.

       5. Downstream recipients.

            a. Offer from the Licensor -- Licensed Material. Every
               recipient of the Licensed Material automatically
               receives an offer from the Licensor to exercise the
               Licensed Rights under the terms and conditions of this
               Public License.

            b. No downstream restrictions. You may not offer or impose
               any additional or different terms or conditions on, or
               apply any Effective Technological Measures to, the
               Licensed Material if doing so restricts exercise of the
               Licensed Rights by any recipient of the Licensed
               Material.

       6. No endorsement. Nothing in this Public License constitutes or
          may be construed as permission to assert or imply that You
          are, or that Your use of the Licensed Material is, connected
          with, or sponsored, endorsed, or granted official status by,
          the Licensor or others designated to receive attribution as
          provided in Section 3(a)(1)(A)(i).

  b. Other rights.

       1. Moral rights, such as the right of integrity, are not
          licensed under this Public License, nor are publicity,
          privacy, and/or other similar personality rights; however, to
          the extent possible, the Licensor waives and/or agrees not to
          assert any such rights held by the Licensor to the limited
          extent necessary to allow You to exercise the Licensed
          Rights, but not otherwise.

       2. Patent and trademark rights are not licensed under this
          Public License.

       3. To the extent possible, the Licensor waives any right to
          collect royalties from You for the exercise of the Licensed
          Rights, whether directly or through a collecting society
          under any voluntary or waivable statutory or compulsory
          licensing scheme. In all other cases the Licensor expressly
          reserves any right to collect such royalties.


Section 3 -- License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the
following conditions.

  a. Attribution.

       1. If You Share the Licensed Material (including in modified
          form), You must:

            a. retain the following if it is supplied by the Licensor
               with the Licensed Material:

                 i. identification of the creator(s) of the Licensed
                    Material and any others designated to receive
                    attribution, in any reasonable manner requested by
                    the Licensor (including by pseudonym if
                    designated);

                ii. a copyright notice;

               iii. a notice that refers to this Public License;

                iv. a notice that refers to the disclaimer of
                    warranties;

                 v. a URI or hyperlink to the Licensed Material to the
                    extent reasonably practicable;

            b. indicate if You modified the Licensed Material and
               retain an indication of any previous modifications; and

            c. indicate the Licensed Material is licensed under this
               Public License, and include the text of, or the URI or
               hyperlink to, this Public License.

       2. You may satisfy the conditions in Section 3(a)(1) in any
          reasonable manner based on the medium, means, and context in
          which You Share the Licensed Material. For example, it may be
          reasonable to satisfy the conditions by providing a URI or
          hyperlink to a resource that includes the required
          information.

       3. If requested by the Licensor, You must remove any of the
          information required by Section 3(a)(1)(A) to the extent
          reasonably practicable.

       4. If You Share Adapted Material You produce, the Adapter's
          License You apply must not prevent recipients of the Adapted
          Material from complying with this Public License.


Section 4 -- Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that
apply to Your use of the Licensed Material:

  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
     to extract, reuse, reproduce, and Share all or a substantial
     portion of the contents of the database;

  b. if You include all or a substantial portion of the database
     contents in a database in which You have Sui Generis Database
     Rights, then the database in which You have Sui Generis Database
     Rights (but not its individual contents) is Adapted Material; and

  c. You must comply with the conditions in Section 3(a) if You Share
     all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not
replace Your obligations under this Public License where the Licensed
Rights include other Copyright and Similar Rights.


Section 5 -- Disclaimer of Warranties and Limitation of Liability.

  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

  c. The disclaimer of warranties and limitation of liability provided
     above shall be interpreted in a manner that, to the extent
     possible, most closely approximates an absolute disclaimer and
     waiver of all liability.


Section 6 -- Term and Termination.

  a. This Public License applies for the term of the Copyright and
     Similar Rights licensed here. However, if You fail to comply with
     this Public License, then Your rights under this Public License
     terminate automatically.

  b. Where Your right to use the Licensed Material has terminated under
     Section 6(a), it reinstates:

       1. automatically as of the date the violation is cured, provided
          it is cured within 30 days of Your discovery of the
          violation; or

       2. upon express reinstatement by the Licensor.

     For the avoidance of doubt, this Section 6(b) does not affect any
     right the Licensor may have to seek remedies for Your violations
     of this Public License.

  c. For the avoidance of doubt, the Licensor may also offer the
     Licensed Material under separate terms or conditions or stop
     distributing the Licensed Material at any time; however, doing so
     will not terminate this Public License.

  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
     License.


Section 7 -- Other Terms and Conditions.

  a. The Licensor shall not be bound by any additional or different
     terms or conditions communicated by You unless expressly agreed.

  b. Any arrangements, understandings, or agreements regarding the
     Licensed Material not stated herein are separate from and
     independent of the terms and conditions of this Public License.


Section 8 -- Interpretation.

  a. For the avoidance of doubt, this Public License does not, and
     shall not be interpreted to, reduce, limit, restrict, or impose
     conditions on any use of the Licensed Material that could lawfully
     be made without permission under this Public License.

  b. To the extent possible, if any provision of this Public License is
     deemed unenforceable, it shall be automatically reformed to the
     minimum extent necessary to make it enforceable. If the provision
     cannot be reformed, it shall be severed from this Public License
     without affecting the enforceability of the remaining terms and
     conditions.

  c. No term or condition of this Public License will be waived and no
     failure to comply consented to unless expressly agreed to by the
     Licensor.

  d. Nothing in this Public License constitutes or may be interpreted
     as a limitation upon, or waiver of, any privileges and immunities
     that apply to the Licensor or You, including from the legal
     processes of any jurisdiction or authority.


=======================================================================

Creative Commons is not a party to its public
licenses. Notwithstanding, Creative Commons may elect to apply one of
its public licenses to material it publishes and in those instances
will be considered the “Licensor.” The text of the Creative Commons
public licenses is dedicated to the public domain under the CC0 Public
Domain Dedication. Except for the limited purpose of indicating that
material is shared under a Creative Commons public license or as
otherwise permitted by the Creative Commons policies published at
creativecommons.org/policies, Creative Commons does not authorize the
use of the trademark "Creative Commons" or any other trademark or logo
of Creative Commons without its prior written consent including,
without limitation, in connection with any unauthorized modifications
to any of its public licenses or any other arrangements,
understandings, or agreements concerning use of licensed material. For
the avoidance of doubt, this paragraph does not form part of the
public licenses.

Creative Commons may be contacted at creativecommons.org.


================================================
FILE: README.md
================================================
## tasksource ![](https://aeiljuispo.cloudimg.io/v7/https://s3.amazonaws.com/moonup/production/uploads/5fc0bcb41160c47d1d43856b/j06-U5e2Tifi2xOnTudqS.jpeg?w=20&h=20&f=face) 600+ curated datasets and preprocessings for instant and interchangeable use

Hugging Face Datasets is an excellent library, but it lacks standardization, and datasets often require preprocessing before they can be used interchangeably.
`tasksource` streamlines interchangeable dataset usage to scale evaluation or multi-task learning.

Each dataset is standardized to a `MultipleChoice`, `Classification`, or `TokenClassification` template with canonical fields. We focus on discriminative tasks (= with negative examples or classes) for our annotations but also provide a `SequenceToSequence` template. All implemented preprocessings are in [tasks.py](https://github.com/sileod/tasksource/blob/main/src/tasksource/tasks.py) or [tasks.md](https://github.com/sileod/tasksource/blob/main/tasks.md). A preprocessing is a function that accepts a dataset and returns the standardized dataset. Preprocessing code is concise and human-readable.
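
To make the idea of a preprocessing concrete, here is a minimal sketch in plain Python. It is not the tasksource implementation: `ClassificationSketch`, the toy rows, and the canonical field names `sentence1`, `sentence2`, and `labels` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassificationSketch:
    """Hypothetical stand-in for a Classification template (not the tasksource class)."""

    sentence1: str  # source column holding the first text
    sentence2: str  # source column holding the second text
    labels: str     # source column holding the label

    def __call__(self, rows: List[Dict]) -> List[Dict]:
        # Rename source columns to the assumed canonical names and drop the rest.
        return [
            {
                "sentence1": row[self.sentence1],
                "sentence2": row[self.sentence2],
                "labels": row[self.labels],
            }
            for row in rows
        ]


# Two toy datasets with different source schemas (made-up rows).
rte_like = [{"premise": "A dog runs.", "hypothesis": "An animal moves.", "label": 0}]
qnli_like = [{"question": "Who runs?", "sentence": "A dog runs.", "gold": 1}]

rte_prep = ClassificationSketch("premise", "hypothesis", "label")
qnli_prep = ClassificationSketch("question", "sentence", "gold")

# After preprocessing, both datasets expose the same columns.
standardized = rte_prep(rte_like) + qnli_prep(qnli_like)
```

Because both outputs share the same columns, downstream code does not need to know which source dataset a row came from.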

### Installation and usage:
`pip install tasksource`
```python
from tasksource import list_tasks, load_task
df = list_tasks(multilingual=False) # takes some time

for id in df[df.task_type=="MultipleChoice"].id:
    dataset = load_task(id) # all yielded datasets can be used interchangeably
```

Browse the 500+ curated tasks in tasks.md (200+ MultipleChoice tasks, 200+ Classification tasks), and feel free to request a new task. Datasets are downloaded to `$HF_DATASETS_CACHE` (like any Hugging Face dataset), so ensure you have more than 100GB of space available.

You can now also use:
```python
from datasets import load_dataset
load_dataset("tasksource/data", "glue/rte", max_rows=30_000)
```

### Pretrained models:

A text encoder pretrained on tasksource reached state-of-the-art results: [🤗/deberta-v3-base-tasksource-nli](https://hf.co/sileod/deberta-v3-base-tasksource-nli)

Tasksource pretraining is notably helpful for RLHF reward modeling or any kind of classification, including zero-shot. You can also find a large and a multilingual version.

### tasksource-instruct

The repo also contains some recasting code to convert tasksource datasets to instructions, providing one of the richest instruction-tuning datasets:
[🤗/tasksource-instruct-v0](https://hf.co/datasets/tasksource/tasksource-instruct-v0)
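
As a rough illustration of what recasting a task into an instruction can look like, the sketch below renders one multiple-choice example as an input/target pair. The function name `to_instruction`, the prompt wording, and the `inputs`/`targets` keys are assumptions for illustration; the repository's recasting code may format examples differently.

```python
from typing import Dict, List


def to_instruction(question: str, choices: List[str], label: int) -> Dict[str, str]:
    """Render one multiple-choice example as an instruction/answer pair (hypothetical format)."""
    # Letter each option: A, B, C, ...
    options = "\n".join(
        f"{chr(ord('A') + i)}. {choice}" for i, choice in enumerate(choices)
    )
    return {
        "inputs": f"{question}\nChoose the best option:\n{options}",
        # The target is the letter of the correct choice.
        "targets": chr(ord("A") + label),
    }


example = to_instruction("The sky is", ["green", "blue"], 1)
print(example["inputs"])
print(example["targets"])
```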


### tasksource-label-nli

We also recast all classification tasks as natural language inference, to improve entailment-based zero-shot classification:
[🤗/zero-shot-label-nli](https://huggingface.co/datasets/tasksource/zero-shot-label-nli)
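
The general idea of recasting classification as NLI can be sketched as follows: the input text becomes the premise, and each candidate label is turned into a hypothesis. This is a minimal sketch under stated assumptions; the function `recast_as_nli`, the hypothesis template, and the `entailment`/`not_entailment` label scheme are hypothetical and may differ from what the dataset actually uses.

```python
from typing import Dict, List


def recast_as_nli(
    text: str,
    label: str,
    label_names: List[str],
    template: str = "This example is {}.",  # assumed hypothesis template
) -> List[Dict[str, str]]:
    """Turn one classification example into one NLI pair per candidate label."""
    return [
        {
            "premise": text,
            "hypothesis": template.format(name),
            # Only the hypothesis built from the true label is entailed.
            "label": "entailment" if name == label else "not_entailment",
        }
        for name in label_names
    ]


pairs = recast_as_nli("The movie was wonderful.", "positive", ["positive", "negative"])
for pair in pairs:
    print(pair)
```

At inference time, an entailment model can then score one hypothesis per candidate label and pick the label whose hypothesis is most strongly entailed.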

### Write and use custom preprocessings

```python
from tasksource import MultipleChoice

codah = MultipleChoice('question_propmt',choices_list='candidate_answers',
    labels='correct_answer_idx',
    dataset_name='codah', config_name='codah')
    
winogrande = MultipleChoice('sentence',['option1','option2'],'answer',
    dataset_name='winogrande',config_name='winogrande_xl',
    splits=['train','validation',None]) # test labels are not usable
    
tasks = [winogrande.load(), codah.load()]  # Aligned datasets (same columns) can be used interchangeably
```

### Citation and contact

For more details, refer to this [article](https://arxiv.org/abs/2301.05948):
```bib
@inproceedings{sileo-2024-tasksource,
    title = "tasksource: A Large Collection of {NLP} tasks with a Structured Dataset Preprocessing Framework",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1361",
    pages = "15655--15684",
}
```
For help integrating tasksource into your experiments, please contact [damien.sileo@inria.fr](mailto:damien.sileo@inria.fr).


================================================
FILE: mtasks.md
================================================
|     | id                                                           | dataset_name                                | config_name                    | task_name   | preprocessing_name      | task_type           |
|----:|:-------------------------------------------------------------|:--------------------------------------------|:-------------------------------|:------------|:------------------------|:--------------------|
|   0 | xnli/ru                                                      | metaeval/xnli                               | ru                             |             | xnli                    | Classification      |
|   1 | xnli/tr                                                      | metaeval/xnli                               | tr                             |             | xnli                    | Classification      |
|   2 | xnli/ur                                                      | metaeval/xnli                               | ur                             |             | xnli                    | Classification      |
|   3 | xnli/vi                                                      | metaeval/xnli                               | vi                             |             | xnli                    | Classification      |
|   4 | xnli/zh                                                      | metaeval/xnli                               | zh                             |             | xnli                    | Classification      |
|   5 | xnli/hi                                                      | metaeval/xnli                               | hi                             |             | xnli                    | Classification      |
|   6 | xnli/fr                                                      | metaeval/xnli                               | fr                             |             | xnli                    | Classification      |
|   7 | xnli/es                                                      | metaeval/xnli                               | es                             |             | xnli                    | Classification      |
|   8 | xnli/en                                                      | metaeval/xnli                               | en                             |             | xnli                    | Classification      |
|   9 | xnli/el                                                      | metaeval/xnli                               | el                             |             | xnli                    | Classification      |
|  10 | xnli/de                                                      | metaeval/xnli                               | de                             |             | xnli                    | Classification      |
|  11 | xnli/bg                                                      | metaeval/xnli                               | bg                             |             | xnli                    | Classification      |
|  12 | xnli/ar                                                      | metaeval/xnli                               | ar                             |             | xnli                    | Classification      |
|  13 | xnli/th                                                      | metaeval/xnli                               | th                             |             | xnli                    | Classification      |
|  14 | xnli/sw                                                      | metaeval/xnli                               | sw                             |             | xnli                    | Classification      |
|  15 | americas_nli/all_languages                                   | americas_nli                                | all_languages                  |             | americas_nli            | Classification      |
|  16 | multilingual-NLI-26lang-2mil7/MoritzLaurer--multilingual_nli | MoritzLaurer/multilingual-NLI-26lang-2mil7  | MoritzLaurer--multilingual_nli |             | moritz_xnli             | Classification      |
|  17 | stsb_multi_mt/en                                             | stsb_multi_mt                               | en                             |             | stsb_multi_mt           | Classification      |
|  18 | stsb_multi_mt/fr                                             | stsb_multi_mt                               | fr                             |             | stsb_multi_mt           | Classification      |
|  19 | stsb_multi_mt/de                                             | stsb_multi_mt                               | de                             |             | stsb_multi_mt           | Classification      |
|  20 | stsb_multi_mt/es                                             | stsb_multi_mt                               | es                             |             | stsb_multi_mt           | Classification      |
|  21 | stsb_multi_mt/it                                             | stsb_multi_mt                               | it                             |             | stsb_multi_mt           | Classification      |
|  22 | stsb_multi_mt/nl                                             | stsb_multi_mt                               | nl                             |             | stsb_multi_mt           | Classification      |
|  23 | stsb_multi_mt/pl                                             | stsb_multi_mt                               | pl                             |             | stsb_multi_mt           | Classification      |
|  24 | stsb_multi_mt/pt                                             | stsb_multi_mt                               | pt                             |             | stsb_multi_mt           | Classification      |
|  25 | stsb_multi_mt/ru                                             | stsb_multi_mt                               | ru                             |             | stsb_multi_mt           | Classification      |
|  26 | stsb_multi_mt/zh                                             | stsb_multi_mt                               | zh                             |             | stsb_multi_mt           | Classification      |
|  27 | paws-x/zh                                                    | paws-x                                      | zh                             |             | pawsx                   | Classification      |
|  28 | paws-x/ja                                                    | paws-x                                      | ja                             |             | pawsx                   | Classification      |
|  29 | paws-x/ko                                                    | paws-x                                      | ko                             |             | pawsx                   | Classification      |
|  30 | paws-x/en                                                    | paws-x                                      | en                             |             | pawsx                   | Classification      |
|  31 | paws-x/de                                                    | paws-x                                      | de                             |             | pawsx                   | Classification      |
|  32 | paws-x/es                                                    | paws-x                                      | es                             |             | pawsx                   | Classification      |
|  33 | paws-x/fr                                                    | paws-x                                      | fr                             |             | pawsx                   | Classification      |
|  34 | miam/vm2                                                     | miam                                        | vm2                            |             | miam                    | Classification      |
|  35 | miam/maptask                                                 | miam                                        | maptask                        |             | miam                    | Classification      |
|  36 | miam/loria                                                   | miam                                        | loria                          |             | miam                    | Classification      |
|  37 | miam/dihana                                                  | miam                                        | dihana                         |             | miam                    | Classification      |
|  38 | miam/ilisten                                                 | miam                                        | ilisten                        |             | miam                    | Classification      |
|  39 | x-stance/fr                                                  | strombergnlp/x-stance                       | fr                             |             | xstance                 | Classification      |
|  40 | x-stance/de                                                  | strombergnlp/x-stance                       | de                             |             | xstance                 | Classification      |
|  41 | offenseval_2020/da                                           | strombergnlp/offenseval_2020                | da                             |             | offenseval              | Classification      |
|  42 | offenseval_2020/tr                                           | strombergnlp/offenseval_2020                | tr                             |             | offenseval              | Classification      |
|  43 | offenseval_2020/gr                                           | strombergnlp/offenseval_2020                | gr                             |             | offenseval              | Classification      |
|  44 | offenseval_2020/ar                                           | strombergnlp/offenseval_2020                | ar                             |             | offenseval              | Classification      |
|  45 | offenseval_dravidian/tamil                                   | offenseval_dravidian                        | tamil                          |             | offenseval_dravidian    | Classification      |
|  46 | offenseval_dravidian/malayalam                               | offenseval_dravidian                        | malayalam                      |             | offenseval_dravidian    | Classification      |
|  47 | offenseval_dravidian/kannada                                 | offenseval_dravidian                        | kannada                        |             | offenseval_dravidian    | Classification      |
|  48 | MLMA_hate_speech                                             | nedjmaou/MLMA_hate_speech                   |                                |             | mlma_hate               | Classification      |
|  49 | xglue/qam                                                    | xglue                                       | qam                            |             | qam                     | Classification      |
|  50 | x-fact                                                       | metaeval/x-fact                             |                                |             | x_fact                  | Classification      |
|  51 | xglue/nc                                                     | xglue                                       | nc                             |             | xglue___nc              | Classification      |
|  52 | xglue/qadsm                                                  | xglue                                       | qadsm                          |             | xglue___qadsm           | Classification      |
|  53 | xglue/qam                                                    | xglue                                       | qam                            |             | xglue___qam             | Classification      |
|  54 | xglue/wpr                                                    | xglue                                       | wpr                            |             | xglue___wpr             | Classification      |
|  55 | xlwic/xlwic_fr_fr                                            | pasinit/xlwic                               | xlwic_fr_fr                    |             | xlwic                   | Classification      |
|  56 | xlwic/xlwic_en_ko                                            | pasinit/xlwic                               | xlwic_en_ko                    |             | xlwic                   | Classification      |
|  57 | xlwic/xlwic_it_it                                            | pasinit/xlwic                               | xlwic_it_it                    |             | xlwic                   | Classification      |
|  58 | xlwic/xlwic_de_de                                            | pasinit/xlwic                               | xlwic_de_de                    |             | xlwic                   | Classification      |
|  59 | oasst1_dense_flat/quality                                    | tasksource/oasst1_dense_flat                |                                | quality     | oasst1__quality         | Classification      |
|  60 | oasst1_dense_flat/toxicity                                   | tasksource/oasst1_dense_flat                |                                | toxicity    | oasst1__toxicity        | Classification      |
|  61 | oasst1_dense_flat/helpfulness                                | tasksource/oasst1_dense_flat                |                                | helpfulness | oasst1__helpfulness     | Classification      |
|  62 | language-identification                                      | papluca/language-identification             |                                |             | language_identification | Classification      |
|  63 | wili_2018                                                    | wili_2018                                   |                                |             | wili_2018_langid        | Classification      |
|  64 | exams/multilingual                                           | exams                                       | multilingual                   |             | exams                   | MultipleChoice      |
|  65 | xcsr/X-CSQA-ar                                               | xcsr                                        | X-CSQA-ar                      |             | xcsr                    | MultipleChoice      |
|  66 | xcsr/X-CODAH-zh                                              | xcsr                                        | X-CODAH-zh                     |             | xcsr                    | MultipleChoice      |
|  67 | xcsr/X-CODAH-de                                              | xcsr                                        | X-CODAH-de                     |             | xcsr                    | MultipleChoice      |
|  68 | xcsr/X-CSQA-ru                                               | xcsr                                        | X-CSQA-ru                      |             | xcsr                    | MultipleChoice      |
|  69 | xcsr/X-CODAH-fr                                              | xcsr                                        | X-CODAH-fr                     |             | xcsr                    | MultipleChoice      |
|  70 | xcsr/X-CODAH-it                                              | xcsr                                        | X-CODAH-it                     |             | xcsr                    | MultipleChoice      |
|  71 | xcsr/X-CODAH-jap                                             | xcsr                                        | X-CODAH-jap                    |             | xcsr                    | MultipleChoice      |
|  72 | xcsr/X-CODAH-nl                                              | xcsr                                        | X-CODAH-nl                     |             | xcsr                    | MultipleChoice      |
|  73 | xcsr/X-CODAH-pt                                              | xcsr                                        | X-CODAH-pt                     |             | xcsr                    | MultipleChoice      |
|  74 | xcsr/X-CODAH-en                                              | xcsr                                        | X-CODAH-en                     |             | xcsr                    | MultipleChoice      |
|  75 | xcsr/X-CODAH-ru                                              | xcsr                                        | X-CODAH-ru                     |             | xcsr                    | MultipleChoice      |
|  76 | xcsr/X-CODAH-ar                                              | xcsr                                        | X-CODAH-ar                     |             | xcsr                    | MultipleChoice      |
|  77 | xcsr/X-CODAH-vi                                              | xcsr                                        | X-CODAH-vi                     |             | xcsr                    | MultipleChoice      |
|  78 | xcsr/X-CODAH-hi                                              | xcsr                                        | X-CODAH-hi                     |             | xcsr                    | MultipleChoice      |
|  79 | xcsr/X-CODAH-sw                                              | xcsr                                        | X-CODAH-sw                     |             | xcsr                    | MultipleChoice      |
|  80 | xcsr/X-CODAH-ur                                              | xcsr                                        | X-CODAH-ur                     |             | xcsr                    | MultipleChoice      |
|  81 | xcsr/X-CODAH-pl                                              | xcsr                                        | X-CODAH-pl                     |             | xcsr                    | MultipleChoice      |
|  82 | xcsr/X-CSQA-ur                                               | xcsr                                        | X-CSQA-ur                      |             | xcsr                    | MultipleChoice      |
|  83 | xcsr/X-CODAH-es                                              | xcsr                                        | X-CODAH-es                     |             | xcsr                    | MultipleChoice      |
|  84 | xcsr/X-CSQA-pt                                               | xcsr                                        | X-CSQA-pt                      |             | xcsr                    | MultipleChoice      |
|  85 | xcsr/X-CSQA-vi                                               | xcsr                                        | X-CSQA-vi                      |             | xcsr                    | MultipleChoice      |
|  86 | xcsr/X-CSQA-hi                                               | xcsr                                        | X-CSQA-hi                      |             | xcsr                    | MultipleChoice      |
|  87 | xcsr/X-CSQA-pl                                               | xcsr                                        | X-CSQA-pl                      |             | xcsr                    | MultipleChoice      |
|  88 | xcsr/X-CSQA-sw                                               | xcsr                                        | X-CSQA-sw                      |             | xcsr                    | MultipleChoice      |
|  89 | xcsr/X-CSQA-nl                                               | xcsr                                        | X-CSQA-nl                      |             | xcsr                    | MultipleChoice      |
|  90 | xcsr/X-CSQA-jap                                              | xcsr                                        | X-CSQA-jap                     |             | xcsr                    | MultipleChoice      |
|  91 | xcsr/X-CSQA-it                                               | xcsr                                        | X-CSQA-it                      |             | xcsr                    | MultipleChoice      |
|  92 | xcsr/X-CSQA-es                                               | xcsr                                        | X-CSQA-es                      |             | xcsr                    | MultipleChoice      |
|  93 | xcsr/X-CSQA-fr                                               | xcsr                                        | X-CSQA-fr                      |             | xcsr                    | MultipleChoice      |
|  94 | xcsr/X-CSQA-zh                                               | xcsr                                        | X-CSQA-zh                      |             | xcsr                    | MultipleChoice      |
|  95 | xcsr/X-CSQA-en                                               | xcsr                                        | X-CSQA-en                      |             | xcsr                    | MultipleChoice      |
|  96 | xcsr/X-CSQA-de                                               | xcsr                                        | X-CSQA-de                      |             | xcsr                    | MultipleChoice      |
|  97 | xcopa/qu                                                     | xcopa                                       | qu                             |             | xcopa                   | MultipleChoice      |
|  98 | xcopa/it                                                     | xcopa                                       | it                             |             | xcopa                   | MultipleChoice      |
|  99 | xcopa/ht                                                     | xcopa                                       | ht                             |             | xcopa                   | MultipleChoice      |
| 100 | xcopa/et                                                     | xcopa                                       | et                             |             | xcopa                   | MultipleChoice      |
| 101 | xcopa/vi                                                     | xcopa                                       | vi                             |             | xcopa                   | MultipleChoice      |
| 102 | xcopa/id                                                     | xcopa                                       | id                             |             | xcopa                   | MultipleChoice      |
| 103 | xcopa/translation-et                                         | xcopa                                       | translation-et                 |             | xcopa                   | MultipleChoice      |
| 104 | xcopa/th                                                     | xcopa                                       | th                             |             | xcopa                   | MultipleChoice      |
| 105 | xcopa/sw                                                     | xcopa                                       | sw                             |             | xcopa                   | MultipleChoice      |
| 106 | xcopa/translation-sw                                         | xcopa                                       | translation-sw                 |             | xcopa                   | MultipleChoice      |
| 107 | xcopa/translation-ht                                         | xcopa                                       | translation-ht                 |             | xcopa                   | MultipleChoice      |
| 108 | xcopa/translation-it                                         | xcopa                                       | translation-it                 |             | xcopa                   | MultipleChoice      |
| 109 | xcopa/ta                                                     | xcopa                                       | ta                             |             | xcopa                   | MultipleChoice      |
| 110 | xcopa/translation-zh                                         | xcopa                                       | translation-zh                 |             | xcopa                   | MultipleChoice      |
| 111 | xcopa/translation-vi                                         | xcopa                                       | translation-vi                 |             | xcopa                   | MultipleChoice      |
| 112 | xcopa/translation-id                                         | xcopa                                       | translation-id                 |             | xcopa                   | MultipleChoice      |
| 113 | xcopa/translation-tr                                         | xcopa                                       | translation-tr                 |             | xcopa                   | MultipleChoice      |
| 114 | xcopa/translation-th                                         | xcopa                                       | translation-th                 |             | xcopa                   | MultipleChoice      |
| 115 | xcopa/translation-ta                                         | xcopa                                       | translation-ta                 |             | xcopa                   | MultipleChoice      |
| 116 | xcopa/zh                                                     | xcopa                                       | zh                             |             | xcopa                   | MultipleChoice      |
| 117 | xcopa/tr                                                     | xcopa                                       | tr                             |             | xcopa                   | MultipleChoice      |
| 118 | xstory_cloze/eu                                              | juletxara/xstory_cloze                      | eu                             |             | xstory                  | MultipleChoice      |
| 119 | xstory_cloze/my                                              | juletxara/xstory_cloze                      | my                             |             | xstory                  | MultipleChoice      |
| 120 | xstory_cloze/te                                              | juletxara/xstory_cloze                      | te                             |             | xstory                  | MultipleChoice      |
| 121 | xstory_cloze/sw                                              | juletxara/xstory_cloze                      | sw                             |             | xstory                  | MultipleChoice      |
| 122 | xstory_cloze/en                                              | juletxara/xstory_cloze                      | en                             |             | xstory                  | MultipleChoice      |
| 123 | xstory_cloze/ru                                              | juletxara/xstory_cloze                      | ru                             |             | xstory                  | MultipleChoice      |
| 124 | xstory_cloze/zh                                              | juletxara/xstory_cloze                      | zh                             |             | xstory                  | MultipleChoice      |
| 125 | xstory_cloze/es                                              | juletxara/xstory_cloze                      | es                             |             | xstory                  | MultipleChoice      |
| 126 | xstory_cloze/ar                                              | juletxara/xstory_cloze                      | ar                             |             | xstory                  | MultipleChoice      |
| 127 | xstory_cloze/hi                                              | juletxara/xstory_cloze                      | hi                             |             | xstory                  | MultipleChoice      |
| 128 | xstory_cloze/id                                              | juletxara/xstory_cloze                      | id                             |             | xstory                  | MultipleChoice      |
| 129 | xglue/ner                                                    | xglue                                       | ner                            |             | xglue_ner               | TokenClassification |
| 130 | xglue/pos                                                    | xglue                                       | pos                            |             | xglue_pos               | TokenClassification |
| 131 | universal_dependencies/nyq_aha/pos                           | universal_dependencies                      | nyq_aha                        | pos         | udep__pos               | TokenClassification |
| 132 | universal_dependencies/sme_giella/pos                        | universal_dependencies                      | sme_giella                     | pos         | udep__pos               | TokenClassification |
| 133 | universal_dependencies/no_bokmaal/pos                        | universal_dependencies                      | no_bokmaal                     | pos         | udep__pos               | TokenClassification |
| 134 | universal_dependencies/no_nynorsk/pos                        | universal_dependencies                      | no_nynorsk                     | pos         | udep__pos               | TokenClassification |
| 135 | universal_dependencies/no_nynorsklia/pos                     | universal_dependencies                      | no_nynorsklia                  | pos         | udep__pos               | TokenClassification |
| 136 | universal_dependencies/cu_proiel/pos                         | universal_dependencies                      | cu_proiel                      | pos         | udep__pos               | TokenClassification |
| 137 | universal_dependencies/fro_srcmf/pos                         | universal_dependencies                      | fro_srcmf                      | pos         | udep__pos               | TokenClassification |
| 138 | universal_dependencies/orv_rnc/pos                           | universal_dependencies                      | orv_rnc                        | pos         | udep__pos               | TokenClassification |
| 139 | universal_dependencies/pl_lfg/pos                            | universal_dependencies                      | pl_lfg                         | pos         | udep__pos               | TokenClassification |
| 140 | universal_dependencies/otk_tonqq/pos                         | universal_dependencies                      | otk_tonqq                      | pos         | udep__pos               | TokenClassification |
| 141 | universal_dependencies/fa_perdt/pos                          | universal_dependencies                      | fa_perdt                       | pos         | udep__pos               | TokenClassification |
| 142 | universal_dependencies/fa_seraji/pos                         | universal_dependencies                      | fa_seraji                      | pos         | udep__pos               | TokenClassification |
| 143 | universal_dependencies/pcm_nsc/pos                           | universal_dependencies                      | pcm_nsc                        | pos         | udep__pos               | TokenClassification |
| 144 | universal_dependencies/pl_pdb/pos                            | universal_dependencies                      | pl_pdb                         | pos         | udep__pos               | TokenClassification |
| 145 | universal_dependencies/pl_pud/pos                            | universal_dependencies                      | pl_pud                         | pos         | udep__pos               | TokenClassification |
| 146 | universal_dependencies/pt_bosque/pos                         | universal_dependencies                      | pt_bosque                      | pos         | udep__pos               | TokenClassification |
| 147 | universal_dependencies/pt_gsd/pos                            | universal_dependencies                      | pt_gsd                         | pos         | udep__pos               | TokenClassification |
| 148 | universal_dependencies/pt_pud/pos                            | universal_dependencies                      | pt_pud                         | pos         | udep__pos               | TokenClassification |
| 149 | universal_dependencies/orv_torot/pos                         | universal_dependencies                      | orv_torot                      | pos         | udep__pos               | TokenClassification |
| 150 | universal_dependencies/myu_tudet/pos                         | universal_dependencies                      | myu_tudet                      | pos         | udep__pos               | TokenClassification |
| 151 | universal_dependencies/gv_cadhan/pos                         | universal_dependencies                      | gv_cadhan                      | pos         | udep__pos               | TokenClassification |
| 152 | universal_dependencies/gun_thomas/pos                        | universal_dependencies                      | gun_thomas                     | pos         | udep__pos               | TokenClassification |
| 153 | universal_dependencies/koi_uh/pos                            | universal_dependencies                      | koi_uh                         | pos         | udep__pos               | TokenClassification |
| 154 | universal_dependencies/kpv_ikdp/pos                          | universal_dependencies                      | kpv_ikdp                       | pos         | udep__pos               | TokenClassification |
| 155 | universal_dependencies/kpv_lattice/pos                       | universal_dependencies                      | kpv_lattice                    | pos         | udep__pos               | TokenClassification |
| 156 | universal_dependencies/ko_gsd/pos                            | universal_dependencies                      | ko_gsd                         | pos         | udep__pos               | TokenClassification |
| 157 | universal_dependencies/ko_kaist/pos                          | universal_dependencies                      | ko_kaist                       | pos         | udep__pos               | TokenClassification |
| 158 | universal_dependencies/ko_pud/pos                            | universal_dependencies                      | ko_pud                         | pos         | udep__pos               | TokenClassification |
| 159 | universal_dependencies/kmr_mg/pos                            | universal_dependencies                      | kmr_mg                         | pos         | udep__pos               | TokenClassification |
| 160 | universal_dependencies/la_ittb/pos                           | universal_dependencies                      | la_ittb                        | pos         | udep__pos               | TokenClassification |
| 161 | universal_dependencies/la_llct/pos                           | universal_dependencies                      | la_llct                        | pos         | udep__pos               | TokenClassification |
| 162 | universal_dependencies/la_perseus/pos                        | universal_dependencies                      | la_perseus                     | pos         | udep__pos               | TokenClassification |
| 163 | universal_dependencies/la_proiel/pos                         | universal_dependencies                      | la_proiel                      | pos         | udep__pos               | TokenClassification |
| 164 | universal_dependencies/lv_lvtb/pos                           | universal_dependencies                      | lv_lvtb                        | pos         | udep__pos               | TokenClassification |
| 165 | universal_dependencies/lt_alksnis/pos                        | universal_dependencies                      | lt_alksnis                     | pos         | udep__pos               | TokenClassification |
| 166 | universal_dependencies/lt_hse/pos                            | universal_dependencies                      | lt_hse                         | pos         | udep__pos               | TokenClassification |
| 167 | universal_dependencies/olo_kkpp/pos                          | universal_dependencies                      | olo_kkpp                       | pos         | udep__pos               | TokenClassification |
| 168 | universal_dependencies/mt_mudt/pos                           | universal_dependencies                      | mt_mudt                        | pos         | udep__pos               | TokenClassification |
| 169 | universal_dependencies/ro_nonstandard/pos                    | universal_dependencies                      | ro_nonstandard                 | pos         | udep__pos               | TokenClassification |
| 170 | universal_dependencies/mr_ufal/pos                           | universal_dependencies                      | mr_ufal                        | pos         | udep__pos               | TokenClassification |
| 171 | universal_dependencies/gun_dooley/pos                        | universal_dependencies                      | gun_dooley                     | pos         | udep__pos               | TokenClassification |
| 172 | universal_dependencies/mdf_jr/pos                            | universal_dependencies                      | mdf_jr                         | pos         | udep__pos               | TokenClassification |
| 173 | universal_dependencies/ro_rrt/pos                            | universal_dependencies                      | ro_rrt                         | pos         | udep__pos               | TokenClassification |
| 174 | universal_dependencies/ru_taiga/pos                          | universal_dependencies                      | ru_taiga                       | pos         | udep__pos               | TokenClassification |
| 175 | universal_dependencies/ru_gsd/pos                            | universal_dependencies                      | ru_gsd                         | pos         | udep__pos               | TokenClassification |
| 176 | universal_dependencies/ta_mwtt/pos                           | universal_dependencies                      | ta_mwtt                        | pos         | udep__pos               | TokenClassification |
| 177 | universal_dependencies/ta_ttb/pos                            | universal_dependencies                      | ta_ttb                         | pos         | udep__pos               | TokenClassification |
| 178 | universal_dependencies/te_mtg/pos                            | universal_dependencies                      | te_mtg                         | pos         | udep__pos               | TokenClassification |
| 179 | universal_dependencies/th_pud/pos                            | universal_dependencies                      | th_pud                         | pos         | udep__pos               | TokenClassification |
| 180 | universal_dependencies/tpn_tudet/pos                         | universal_dependencies                      | tpn_tudet                      | pos         | udep__pos               | TokenClassification |
| 181 | universal_dependencies/qtd_sagt/pos                          | universal_dependencies                      | qtd_sagt                       | pos         | udep__pos               | TokenClassification |
| 182 | universal_dependencies/tr_boun/pos                           | universal_dependencies                      | tr_boun                        | pos         | udep__pos               | TokenClassification |
| 183 | universal_dependencies/tr_gb/pos                             | universal_dependencies                      | tr_gb                          | pos         | udep__pos               | TokenClassification |
| 184 | universal_dependencies/tr_imst/pos                           | universal_dependencies                      | tr_imst                        | pos         | udep__pos               | TokenClassification |
| 185 | universal_dependencies/tr_pud/pos                            | universal_dependencies                      | tr_pud                         | pos         | udep__pos               | TokenClassification |
| 186 | universal_dependencies/uk_iu/pos                             | universal_dependencies                      | uk_iu                          | pos         | udep__pos               | TokenClassification |
| 187 | universal_dependencies/hsb_ufal/pos                          | universal_dependencies                      | hsb_ufal                       | pos         | udep__pos               | TokenClassification |
| 188 | universal_dependencies/ur_udtb/pos                           | universal_dependencies                      | ur_udtb                        | pos         | udep__pos               | TokenClassification |
| 189 | universal_dependencies/ug_udt/pos                            | universal_dependencies                      | ug_udt                         | pos         | udep__pos               | TokenClassification |
| 190 | universal_dependencies/vi_vtb/pos                            | universal_dependencies                      | vi_vtb                         | pos         | udep__pos               | TokenClassification |
| 191 | universal_dependencies/wbp_ufal/pos                          | universal_dependencies                      | wbp_ufal                       | pos         | udep__pos               | TokenClassification |
| 192 | universal_dependencies/cy_ccg/pos                            | universal_dependencies                      | cy_ccg                         | pos         | udep__pos               | TokenClassification |
| 193 | universal_dependencies/wo_wtb/pos                            | universal_dependencies                      | wo_wtb                         | pos         | udep__pos               | TokenClassification |
| 194 | universal_dependencies/yo_ytb/pos                            | universal_dependencies                      | yo_ytb                         | pos         | udep__pos               | TokenClassification |
| 195 | universal_dependencies/tl_ugnayan/pos                        | universal_dependencies                      | tl_ugnayan                     | pos         | udep__pos               | TokenClassification |
| 196 | universal_dependencies/ro_simonero/pos                       | universal_dependencies                      | ro_simonero                    | pos         | udep__pos               | TokenClassification |
| 197 | universal_dependencies/tl_trg/pos                            | universal_dependencies                      | tl_trg                         | pos         | udep__pos               | TokenClassification |
| 198 | universal_dependencies/sv_talbanken/pos                      | universal_dependencies                      | sv_talbanken                   | pos         | udep__pos               | TokenClassification |
| 199 | universal_dependencies/ru_pud/pos                            | universal_dependencies                      | ru_pud                         | pos         | udep__pos               | TokenClassification |
| 200 | universal_dependencies/ru_syntagrus/pos                      | universal_dependencies                      | ru_syntagrus                   | pos         | udep__pos               | TokenClassification |
| 201 | universal_dependencies/kfm_aha/pos                           | universal_dependencies                      | kfm_aha                        | pos         | udep__pos               | TokenClassification |
| 202 | universal_dependencies/sa_ufal/pos                           | universal_dependencies                      | sa_ufal                        | pos         | udep__pos               | TokenClassification |
| 203 | universal_dependencies/sa_vedic/pos                          | universal_dependencies                      | sa_vedic                       | pos         | udep__pos               | TokenClassification |
| 204 | universal_dependencies/gd_arcosg/pos                         | universal_dependencies                      | gd_arcosg                      | pos         | udep__pos               | TokenClassification |
| 205 | universal_dependencies/sr_set/pos                            | universal_dependencies                      | sr_set                         | pos         | udep__pos               | TokenClassification |
| 206 | universal_dependencies/sms_giellagas/pos                     | universal_dependencies                      | sms_giellagas                  | pos         | udep__pos               | TokenClassification |
| 207 | universal_dependencies/sk_snk/pos                            | universal_dependencies                      | sk_snk                         | pos         | udep__pos               | TokenClassification |
| 208 | universal_dependencies/sl_ssj/pos                            | universal_dependencies                      | sl_ssj                         | pos         | udep__pos               | TokenClassification |
| 209 | universal_dependencies/sl_sst/pos                            | universal_dependencies                      | sl_sst                         | pos         | udep__pos               | TokenClassification |
| 210 | universal_dependencies/soj_aha/pos                           | universal_dependencies                      | soj_aha                        | pos         | udep__pos               | TokenClassification |
| 211 | universal_dependencies/ajp_madar/pos                         | universal_dependencies                      | ajp_madar                      | pos         | udep__pos               | TokenClassification |
| 212 | universal_dependencies/es_ancora/pos                         | universal_dependencies                      | es_ancora                      | pos         | udep__pos               | TokenClassification |
| 213 | universal_dependencies/es_gsd/pos                            | universal_dependencies                      | es_gsd                         | pos         | udep__pos               | TokenClassification |
| 214 | universal_dependencies/es_pud/pos                            | universal_dependencies                      | es_pud                         | pos         | udep__pos               | TokenClassification |
| 215 | universal_dependencies/swl_sslc/pos                          | universal_dependencies                      | swl_sslc                       | pos         | udep__pos               | TokenClassification |
| 216 | universal_dependencies/sv_lines/pos                          | universal_dependencies                      | sv_lines                       | pos         | udep__pos               | TokenClassification |
| 217 | universal_dependencies/sv_pud/pos                            | universal_dependencies                      | sv_pud                         | pos         | udep__pos               | TokenClassification |
| 218 | universal_dependencies/gsw_uzh/pos                           | universal_dependencies                      | gsw_uzh                        | pos         | udep__pos               | TokenClassification |
| 219 | universal_dependencies/kk_ktb/pos                            | universal_dependencies                      | kk_ktb                         | pos         | udep__pos               | TokenClassification |
| 220 | universal_dependencies/hi_hdtb/pos                           | universal_dependencies                      | hi_hdtb                        | pos         | udep__pos               | TokenClassification |
| 221 | universal_dependencies/ja_pud/pos                            | universal_dependencies                      | ja_pud                         | pos         | udep__pos               | TokenClassification |
| 222 | universal_dependencies/zh_gsd/pos                            | universal_dependencies                      | zh_gsd                         | pos         | udep__pos               | TokenClassification |
| 223 | universal_dependencies/zh_gsdsimp/pos                        | universal_dependencies                      | zh_gsdsimp                     | pos         | udep__pos               | TokenClassification |
| 224 | universal_dependencies/zh_hk/pos                             | universal_dependencies                      | zh_hk                          | pos         | udep__pos               | TokenClassification |
| 225 | universal_dependencies/zh_pud/pos                            | universal_dependencies                      | zh_pud                         | pos         | udep__pos               | TokenClassification |
| 226 | universal_dependencies/ckt_hse/pos                           | universal_dependencies                      | ckt_hse                        | pos         | udep__pos               | TokenClassification |
| 227 | universal_dependencies/lzh_kyoto/pos                         | universal_dependencies                      | lzh_kyoto                      | pos         | udep__pos               | TokenClassification |
| 228 | universal_dependencies/cop_scriptorium/pos                   | universal_dependencies                      | cop_scriptorium                | pos         | udep__pos               | TokenClassification |
| 229 | universal_dependencies/hr_set/pos                            | universal_dependencies                      | hr_set                         | pos         | udep__pos               | TokenClassification |
| 230 | universal_dependencies/cs_cac/pos                            | universal_dependencies                      | cs_cac                         | pos         | udep__pos               | TokenClassification |
| 231 | universal_dependencies/cs_cltt/pos                           | universal_dependencies                      | cs_cltt                        | pos         | udep__pos               | TokenClassification |
| 232 | universal_dependencies/cs_fictree/pos                        | universal_dependencies                      | cs_fictree                     | pos         | udep__pos               | TokenClassification |
| 233 | universal_dependencies/cs_pdt/pos                            | universal_dependencies                      | cs_pdt                         | pos         | udep__pos               | TokenClassification |
| 234 | universal_dependencies/cs_pud/pos                            | universal_dependencies                      | cs_pud                         | pos         | udep__pos               | TokenClassification |
| 235 | universal_dependencies/da_ddt/pos                            | universal_dependencies                      | da_ddt                         | pos         | udep__pos               | TokenClassification |
| 236 | universal_dependencies/nl_alpino/pos                         | universal_dependencies                      | nl_alpino                      | pos         | udep__pos               | TokenClassification |
| 237 | universal_dependencies/nl_lassysmall/pos                     | universal_dependencies                      | nl_lassysmall                  | pos         | udep__pos               | TokenClassification |
| 238 | universal_dependencies/en_esl/pos                            | universal_dependencies                      | en_esl                         | pos         | udep__pos               | TokenClassification |
| 239 | universal_dependencies/en_ewt/pos                            | universal_dependencies                      | en_ewt                         | pos         | udep__pos               | TokenClassification |
| 240 | universal_dependencies/en_gum/pos                            | universal_dependencies                      | en_gum                         | pos         | udep__pos               | TokenClassification |
| 241 | universal_dependencies/zh_cfl/pos                            | universal_dependencies                      | zh_cfl                         | pos         | udep__pos               | TokenClassification |
| 242 | universal_dependencies/ca_ancora/pos                         | universal_dependencies                      | ca_ancora                      | pos         | udep__pos               | TokenClassification |
| 243 | universal_dependencies/yue_hk/pos                            | universal_dependencies                      | yue_hk                         | pos         | udep__pos               | TokenClassification |
| 244 | universal_dependencies/bxr_bdt/pos                           | universal_dependencies                      | bxr_bdt                        | pos         | udep__pos               | TokenClassification |
| 245 | universal_dependencies/af_afribooms/pos                      | universal_dependencies                      | af_afribooms                   | pos         | udep__pos               | TokenClassification |
| 246 | universal_dependencies/krl_kkpp/pos                          | universal_dependencies                      | krl_kkpp                       | pos         | udep__pos               | TokenClassification |
| 247 | universal_dependencies/akk_riao/pos                          | universal_dependencies                      | akk_riao                       | pos         | udep__pos               | TokenClassification |
| 248 | universal_dependencies/aqz_tudet/pos                         | universal_dependencies                      | aqz_tudet                      | pos         | udep__pos               | TokenClassification |
| 249 | universal_dependencies/sq_tsa/pos                            | universal_dependencies                      | sq_tsa                         | pos         | udep__pos               | TokenClassification |
| 250 | universal_dependencies/am_att/pos                            | universal_dependencies                      | am_att                         | pos         | udep__pos               | TokenClassification |
| 251 | universal_dependencies/grc_perseus/pos                       | universal_dependencies                      | grc_perseus                    | pos         | udep__pos               | TokenClassification |
| 252 | universal_dependencies/grc_proiel/pos                        | universal_dependencies                      | grc_proiel                     | pos         | udep__pos               | TokenClassification |
| 253 | universal_dependencies/apu_ufpa/pos                          | universal_dependencies                      | apu_ufpa                       | pos         | udep__pos               | TokenClassification |
| 254 | universal_dependencies/en_gumreddit/pos                      | universal_dependencies                      | en_gumreddit                   | pos         | udep__pos               | TokenClassification |
| 255 | universal_dependencies/ar_nyuad/pos                          | universal_dependencies                      | ar_nyuad                       | pos         | udep__pos               | TokenClassification |
| 256 | universal_dependencies/ar_pud/pos                            | universal_dependencies                      | ar_pud                         | pos         | udep__pos               | TokenClassification |
| 257 | universal_dependencies/hy_armtdp/pos                         | universal_dependencies                      | hy_armtdp                      | pos         | udep__pos               | TokenClassification |
| 258 | universal_dependencies/aii_as/pos                            | universal_dependencies                      | aii_as                         | pos         | udep__pos               | TokenClassification |
| 259 | universal_dependencies/bm_crb/pos                            | universal_dependencies                      | bm_crb                         | pos         | udep__pos               | TokenClassification |
| 260 | universal_dependencies/eu_bdt/pos                            | universal_dependencies                      | eu_bdt                         | pos         | udep__pos               | TokenClassification |
| 261 | universal_dependencies/be_hse/pos                            | universal_dependencies                      | be_hse                         | pos         | udep__pos               | TokenClassification |
| 262 | universal_dependencies/bho_bhtb/pos                          | universal_dependencies                      | bho_bhtb                       | pos         | udep__pos               | TokenClassification |
| 263 | universal_dependencies/br_keb/pos                            | universal_dependencies                      | br_keb                         | pos         | udep__pos               | TokenClassification |
| 264 | universal_dependencies/bg_btb/pos                            | universal_dependencies                      | bg_btb                         | pos         | udep__pos               | TokenClassification |
| 265 | universal_dependencies/ar_padt/pos                           | universal_dependencies                      | ar_padt                        | pos         | udep__pos               | TokenClassification |
| 266 | universal_dependencies/en_lines/pos                          | universal_dependencies                      | en_lines                       | pos         | udep__pos               | TokenClassification |
| 267 | universal_dependencies/akk_pisandub/pos                      | universal_dependencies                      | akk_pisandub                   | pos         | udep__pos               | TokenClassification |
| 268 | universal_dependencies/en_pronouns/pos                       | universal_dependencies                      | en_pronouns                    | pos         | udep__pos               | TokenClassification |
| 269 | universal_dependencies/el_gdt/pos                            | universal_dependencies                      | el_gdt                         | pos         | udep__pos               | TokenClassification |
| 270 | universal_dependencies/he_htb/pos                            | universal_dependencies                      | he_htb                         | pos         | udep__pos               | TokenClassification |
| 271 | universal_dependencies/qhe_hiencs/pos                        | universal_dependencies                      | qhe_hiencs                     | pos         | udep__pos               | TokenClassification |
| 272 | universal_dependencies/hi_pud/pos                            | universal_dependencies                      | hi_pud                         | pos         | udep__pos               | TokenClassification |
| 273 | universal_dependencies/hu_szeged/pos                         | universal_dependencies                      | hu_szeged                      | pos         | udep__pos               | TokenClassification |
| 274 | universal_dependencies/is_icepahc/pos                        | universal_dependencies                      | is_icepahc                     | pos         | udep__pos               | TokenClassification |
| 275 | universal_dependencies/id_csui/pos                           | universal_dependencies                      | id_csui                        | pos         | udep__pos               | TokenClassification |
| 276 | universal_dependencies/id_gsd/pos                            | universal_dependencies                      | id_gsd                         | pos         | udep__pos               | TokenClassification |
| 277 | universal_dependencies/id_pud/pos                            | universal_dependencies                      | id_pud                         | pos         | udep__pos               | TokenClassification |
| 278 | universal_dependencies/ga_idt/pos                            | universal_dependencies                      | ga_idt                         | pos         | udep__pos               | TokenClassification |
| 279 | universal_dependencies/it_isdt/pos                           | universal_dependencies                      | it_isdt                        | pos         | udep__pos               | TokenClassification |
| 280 | universal_dependencies/it_partut/pos                         | universal_dependencies                      | it_partut                      | pos         | udep__pos               | TokenClassification |
| 281 | universal_dependencies/it_postwita/pos                       | universal_dependencies                      | it_postwita                    | pos         | udep__pos               | TokenClassification |
| 282 | universal_dependencies/it_pud/pos                            | universal_dependencies                      | it_pud                         | pos         | udep__pos               | TokenClassification |
| 283 | universal_dependencies/it_twittiro/pos                       | universal_dependencies                      | it_twittiro                    | pos         | udep__pos               | TokenClassification |
| 284 | universal_dependencies/it_vit/pos                            | universal_dependencies                      | it_vit                         | pos         | udep__pos               | TokenClassification |
| 285 | universal_dependencies/ja_bccwj/pos                          | universal_dependencies                      | ja_bccwj                       | pos         | udep__pos               | TokenClassification |
| 286 | universal_dependencies/ja_gsd/pos                            | universal_dependencies                      | ja_gsd                         | pos         | udep__pos               | TokenClassification |
| 287 | universal_dependencies/ja_modern/pos                         | universal_dependencies                      | ja_modern                      | pos         | udep__pos               | TokenClassification |
| 288 | universal_dependencies/got_proiel/pos                        | universal_dependencies                      | got_proiel                     | pos         | udep__pos               | TokenClassification |
| 289 | universal_dependencies/de_pud/pos                            | universal_dependencies                      | de_pud                         | pos         | udep__pos               | TokenClassification |
| 290 | universal_dependencies/is_pud/pos                            | universal_dependencies                      | is_pud                         | pos         | udep__pos               | TokenClassification |
| 291 | universal_dependencies/de_hdt/pos                            | universal_dependencies                      | de_hdt                         | pos         | udep__pos               | TokenClassification |
| 292 | universal_dependencies/en_pud/pos                            | universal_dependencies                      | en_pud                         | pos         | udep__pos               | TokenClassification |
| 293 | universal_dependencies/myv_jr/pos                            | universal_dependencies                      | myv_jr                         | pos         | udep__pos               | TokenClassification |
| 294 | universal_dependencies/de_lit/pos                            | universal_dependencies                      | de_lit                         | pos         | udep__pos               | TokenClassification |
| 295 | universal_dependencies/et_ewt/pos                            | universal_dependencies                      | et_ewt                         | pos         | udep__pos               | TokenClassification |
| 296 | universal_dependencies/fo_farpahc/pos                        | universal_dependencies                      | fo_farpahc                     | pos         | udep__pos               | TokenClassification |
| 297 | universal_dependencies/fo_oft/pos                            | universal_dependencies                      | fo_oft                         | pos         | udep__pos               | TokenClassification |
| 298 | universal_dependencies/fi_ftb/pos                            | universal_dependencies                      | fi_ftb                         | pos         | udep__pos               | TokenClassification |
| 299 | universal_dependencies/fi_ood/pos                            | universal_dependencies                      | fi_ood                         | pos         | udep__pos               | TokenClassification |
| 300 | universal_dependencies/fi_pud/pos                            | universal_dependencies                      | fi_pud                         | pos         | udep__pos               | TokenClassification |
| 301 | universal_dependencies/fi_tdt/pos                            | universal_dependencies                      | fi_tdt                         | pos         | udep__pos               | TokenClassification |
| 302 | universal_dependencies/et_edt/pos                            | universal_dependencies                      | et_edt                         | pos         | udep__pos               | TokenClassification |
| 303 | universal_dependencies/fr_ftb/pos                            | universal_dependencies                      | fr_ftb                         | pos         | udep__pos               | TokenClassification |
| 304 | universal_dependencies/fr_fqb/pos                            | universal_dependencies                      | fr_fqb                         | pos         | udep__pos               | TokenClassification |
| 305 | universal_dependencies/de_gsd/pos                            | universal_dependencies                      | de_gsd                         | pos         | udep__pos               | TokenClassification |
| 306 | universal_dependencies/gl_treegal/pos                        | universal_dependencies                      | gl_treegal                     | pos         | udep__pos               | TokenClassification |
| 307 | universal_dependencies/gl_ctg/pos                            | universal_dependencies                      | gl_ctg                         | pos         | udep__pos               | TokenClassification |
| 308 | universal_dependencies/fr_spoken/pos                         | universal_dependencies                      | fr_spoken                      | pos         | udep__pos               | TokenClassification |
| 309 | universal_dependencies/en_partut/pos                         | universal_dependencies                      | en_partut                      | pos         | udep__pos               | TokenClassification |
| 310 | universal_dependencies/fr_pud/pos                            | universal_dependencies                      | fr_pud                         | pos         | udep__pos               | TokenClassification |
| 311 | universal_dependencies/fr_partut/pos                         | universal_dependencies                      | fr_partut                      | pos         | udep__pos               | TokenClassification |
| 312 | universal_dependencies/fr_sequoia/pos                        | universal_dependencies                      | fr_sequoia                     | pos         | udep__pos               | TokenClassification |
| 313 | universal_dependencies/fr_gsd/pos                            | universal_dependencies                      | fr_gsd                         | pos         | udep__pos               | TokenClassification |
| 314 | oasst1_pairwise_rlhf_reward                                  | tasksource/oasst1_pairwise_rlhf_reward      |                                |             | oasst_rlhf              | MultipleChoice      |
| 315 | multilingual-sentiments/all                                  | tyqiangz/multilingual-sentiments            | all                            |             | sentiment               | Classification      |
| 316 | tweet_sentiment_multilingual/arabic                          | cardiffnlp/tweet_sentiment_multilingual     | arabic                         |             | tweet_sentiment         | Classification      |
| 317 | tweet_sentiment_multilingual/french                          | cardiffnlp/tweet_sentiment_multilingual     | french                         |             | tweet_sentiment         | Classification      |
| 318 | tweet_sentiment_multilingual/english                         | cardiffnlp/tweet_sentiment_multilingual     | english                        |             | tweet_sentiment         | Classification      |
| 319 | tweet_sentiment_multilingual/hindi                           | cardiffnlp/tweet_sentiment_multilingual     | hindi                          |             | tweet_sentiment         | Classification      |
| 320 | tweet_sentiment_multilingual/portuguese                      | cardiffnlp/tweet_sentiment_multilingual     | portuguese                     |             | tweet_sentiment         | Classification      |
| 321 | tweet_sentiment_multilingual/spanish                         | cardiffnlp/tweet_sentiment_multilingual     | spanish                        |             | tweet_sentiment         | Classification      |
| 322 | tweet_sentiment_multilingual/all                             | cardiffnlp/tweet_sentiment_multilingual     | all                            |             | tweet_sentiment         | Classification      |
| 323 | tweet_sentiment_multilingual/german                          | cardiffnlp/tweet_sentiment_multilingual     | german                         |             | tweet_sentiment         | Classification      |
| 324 | tweet_sentiment_multilingual/italian                         | cardiffnlp/tweet_sentiment_multilingual     | italian                        |             | tweet_sentiment         | Classification      |
| 325 | amazon_reviews_multi/all_languages                           | amazon_reviews_multi                        | all_languages                  |             | review_sentiment        | Classification      |
| 326 | universal-joy                                                | metaeval/universal-joy                      |                                |             | emotion                 | Classification      |
| 327 | mms                                                          | Brand24/mms                                 |                                |             | mms_sentiment           | Classification      |
| 328 | mapa                                                         | joelito/mapa                                |                                |             | mapa_fine               | TokenClassification |
| 329 | mapa                                                         | joelito/mapa                                |                                |             | mapa_corase             | TokenClassification |
| 330 | ACES                                                         | nikitam/ACES                                |                                |             | aces_ranking            | MultipleChoice      |
| 331 | ACES                                                         | nikitam/ACES                                |                                |             | aces_phenomena          | Classification      |
| 332 | massive/my-MM                                                | AmazonScience/massive                       | my-MM                          |             | amazon_intent           | Classification      |
| 333 | massive/ro-RO                                                | AmazonScience/massive                       | ro-RO                          |             | amazon_intent           | Classification      |
| 334 | massive/pt-PT                                                | AmazonScience/massive                       | pt-PT                          |             | amazon_intent           | Classification      |
| 335 | massive/pl-PL                                                | AmazonScience/massive                       | pl-PL                          |             | amazon_intent           | Classification      |
| 336 | massive/nl-NL                                                | AmazonScience/massive                       | nl-NL                          |             | amazon_intent           | Classification      |
| 337 | massive/nb-NO                                                | AmazonScience/massive                       | nb-NO                          |             | amazon_intent           | Classification      |
| 338 | massive/es-ES                                                | AmazonScience/massive                       | es-ES                          |             | amazon_intent           | Classification      |
| 339 | massive/ms-MY                                                | AmazonScience/massive                       | ms-MY                          |             | amazon_intent           | Classification      |
| 340 | massive/mn-MN                                                | AmazonScience/massive                       | mn-MN                          |             | amazon_intent           | Classification      |
| 341 | massive/ml-IN                                                | AmazonScience/massive                       | ml-IN                          |             | amazon_intent           | Classification      |
| 342 | massive/lv-LV                                                | AmazonScience/massive                       | lv-LV                          |             | amazon_intent           | Classification      |
| 343 | massive/ko-KR                                                | AmazonScience/massive                       | ko-KR                          |             | amazon_intent           | Classification      |
| 344 | massive/ru-RU                                                | AmazonScience/massive                       | ru-RU                          |             | amazon_intent           | Classification      |
| 345 | massive/kn-IN                                                | AmazonScience/massive                       | kn-IN                          |             | amazon_intent           | Classification      |
| 346 | massive/ka-GE                                                | AmazonScience/massive                       | ka-GE                          |             | amazon_intent           | Classification      |
| 347 | massive/jv-ID                                                | AmazonScience/massive                       | jv-ID                          |             | amazon_intent           | Classification      |
| 348 | massive/ja-JP                                                | AmazonScience/massive                       | ja-JP                          |             | amazon_intent           | Classification      |
| 349 | massive/it-IT                                                | AmazonScience/massive                       | it-IT                          |             | amazon_intent           | Classification      |
| 350 | massive/is-IS                                                | AmazonScience/massive                       | is-IS                          |             | amazon_intent           | Classification      |
| 351 | massive/id-ID                                                | AmazonScience/massive                       | id-ID                          |             | amazon_intent           | Classification      |
| 352 | massive/hy-AM                                                | AmazonScience/massive                       | hy-AM                          |             | amazon_intent           | Classification      |
| 353 | massive/hu-HU                                                | AmazonScience/massive                       | hu-HU                          |             | amazon_intent           | Classification      |
| 354 | massive/hi-IN                                                | AmazonScience/massive                       | hi-IN                          |             | amazon_intent           | Classification      |
| 355 | massive/he-IL                                                | AmazonScience/massive                       | he-IL                          |             | amazon_intent           | Classification      |
| 356 | massive/fr-FR                                                | AmazonScience/massive                       | fr-FR                          |             | amazon_intent           | Classification      |
| 357 | massive/km-KH                                                | AmazonScience/massive                       | km-KH                          |             | amazon_intent           | Classification      |
| 358 | massive/fi-FI                                                | AmazonScience/massive                       | fi-FI                          |             | amazon_intent           | Classification      |
| 359 | massive/sl-SL                                                | AmazonScience/massive                       | sl-SL                          |             | amazon_intent           | Classification      |
| 360 | massive/sv-SE                                                | AmazonScience/massive                       | sv-SE                          |             | amazon_intent           | Classification      |
| 361 | massive/af-ZA                                                | AmazonScience/massive                       | af-ZA                          |             | amazon_intent           | Classification      |
| 362 | massive/am-ET                                                | AmazonScience/massive                       | am-ET                          |             | amazon_intent           | Classification      |
| 363 | massive/ar-SA                                                | AmazonScience/massive                       | ar-SA                          |             | amazon_intent           | Classification      |
| 364 | massive/az-AZ                                                | AmazonScience/massive                       | az-AZ                          |             | amazon_intent           | Classification      |
| 365 | massive/bn-BD                                                | AmazonScience/massive                       | bn-BD                          |             | amazon_intent           | Classification      |
| 366 | massive/ca-ES                                                | AmazonScience/massive                       | ca-ES                          |             | amazon_intent           | Classification      |
| 367 | massive/cy-GB                                                | AmazonScience/massive                       | cy-GB                          |             | amazon_intent           | Classification      |
| 368 | massive/da-DK                                                | AmazonScience/massive                       | da-DK                          |             | amazon_intent           | Classification      |
| 369 | massive/de-DE                                                | AmazonScience/massive                       | de-DE                          |             | amazon_intent           | Classification      |
| 370 | massive/el-GR                                                | AmazonScience/massive                       | el-GR                          |             | amazon_intent           | Classification      |
| 371 | massive/sq-AL                                                | AmazonScience/massive                       | sq-AL                          |             | amazon_intent           | Classification      |
| 372 | massive/en-US                                                | AmazonScience/massive                       | en-US                          |             | amazon_intent           | Classification      |
| 373 | massive/all                                                  | AmazonScience/massive                       | all                            |             | amazon_intent           | Classification      |
| 374 | massive/zh-TW                                                | AmazonScience/massive                       | zh-TW                          |             | amazon_intent           | Classification      |
| 375 | massive/zh-CN                                                | AmazonScience/massive                       | zh-CN                          |             | amazon_intent           | Classification      |
| 376 | massive/vi-VN                                                | AmazonScience/massive                       | vi-VN                          |             | amazon_intent           | Classification      |
| 377 | massive/ur-PK                                                | AmazonScience/massive                       | ur-PK                          |             | amazon_intent           | Classification      |
| 378 | massive/tr-TR                                                | AmazonScience/massive                       | tr-TR                          |             | amazon_intent           | Classification      |
| 379 | massive/tl-PH                                                | AmazonScience/massive                       | tl-PH                          |             | amazon_intent           | Classification      |
| 380 | massive/th-TH                                                | AmazonScience/massive                       | th-TH                          |             | amazon_intent           | Classification      |
| 381 | massive/te-IN                                                | AmazonScience/massive                       | te-IN                          |             | amazon_intent           | Classification      |
| 382 | massive/ta-IN                                                | AmazonScience/massive                       | ta-IN                          |             | amazon_intent           | Classification      |
| 383 | massive/sw-KE                                                | AmazonScience/massive                       | sw-KE                          |             | amazon_intent           | Classification      |
| 384 | massive/all_1.1                                              | AmazonScience/massive                       | all_1.1                        |             | amazon_intent           | Classification      |
| 385 | massive/fa-IR                                                | AmazonScience/massive                       | fa-IR                          |             | amazon_intent           | Classification      |
| 386 | tydi-as2-balanced                                            | tasksource/tydi-as2-balanced                |                                |             | tidy_as2                | Classification      |
| 387 | multiconer_v2/Hindi (HI)                                     | MultiCoNER/multiconer_v2                    | Hindi (HI)                     |             | multiconer              | TokenClassification |
| 388 | multiconer_v2/Multilingual (MULTI)                           | MultiCoNER/multiconer_v2                    | Multilingual (MULTI)           |             | multiconer              | TokenClassification |
| 389 | multiconer_v2/Ukrainian (UK)                                 | MultiCoNER/multiconer_v2                    | Ukrainian (UK)                 |             | multiconer              | TokenClassification |
| 390 | multiconer_v2/Swedish (SV)                                   | MultiCoNER/multiconer_v2                    | Swedish (SV)                   |             | multiconer              | TokenClassification |
| 391 | multiconer_v2/Spanish (ES)                                   | MultiCoNER/multiconer_v2                    | Spanish (ES)                   |             | multiconer              | TokenClassification |
| 392 | multiconer_v2/Bangla (BN)                                    | MultiCoNER/multiconer_v2                    | Bangla (BN)                    |             | multiconer              | TokenClassification |
| 393 | multiconer_v2/Chinese (ZH)                                   | MultiCoNER/multiconer_v2                    | Chinese (ZH)                   |             | multiconer              | TokenClassification |
| 394 | multiconer_v2/English (EN)                                   | MultiCoNER/multiconer_v2                    | English (EN)                   |             | multiconer              | TokenClassification |
| 395 | multiconer_v2/Farsi (FA)                                     | MultiCoNER/multiconer_v2                    | Farsi (FA)                     |             | multiconer              | TokenClassification |
| 396 | multiconer_v2/Portuguese (PT)                                | MultiCoNER/multiconer_v2                    | Portuguese (PT)                |             | multiconer              | TokenClassification |
| 397 | multiconer_v2/German (DE)                                    | MultiCoNER/multiconer_v2                    | German (DE)                    |             | multiconer              | TokenClassification |
| 398 | multiconer_v2/Italian (IT)                                   | MultiCoNER/multiconer_v2                    | Italian (IT)                   |             | multiconer              | TokenClassification |
| 399 | multiconer_v2/French (FR)                                    | MultiCoNER/multiconer_v2                    | French (FR)                    |             | multiconer              | TokenClassification |
| 400 | mtop                                                         | tasksource/mtop                             |                                |             | mtop                    | Classification      |
| 401 | multilingual-zero-shot-label-nli                             | tasksource/multilingual-zero-shot-label-nli |                                |             | mlabel_nli              | Classification      |
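
The `id` column in the table above appears to be derived from the dataset, config and task names, the same way `pretty_name` in `src/tasksource/access.py` builds it. Below is a minimal, self-contained sketch of that derivation; the function name `make_task_id` is hypothetical and is not part of the tasksource API.

```python
def make_task_id(dataset_name, config_name=None, task_name=None):
    """Build an `id` string from a row's dataset, config and task names.

    Hypothetical helper mirroring the logic of `pretty_name` in access.py.
    """
    # Drop the Hub namespace prefix, e.g. "cardiffnlp/" or "tasksource/".
    short_name = dataset_name.split("/")[-1]
    config = config_name or ""
    task = task_name or ""
    # Join the three parts, then collapse the empty ones.
    return f"{short_name}/{config}/{task}".replace("//", "/").rstrip("/")


print(make_task_id("universal_dependencies", "fr_gsd", "pos"))
print(make_task_id("cardiffnlp/tweet_sentiment_multilingual", "arabic"))
print(make_task_id("tasksource/mtop"))
```

Because the preprocessing name is not part of the `id`, two rows can share one (rows 328 and 329 both have the id `mapa`); in that case the `preprocessing_name` column is what tells them apart.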


================================================
FILE: pyproject.toml
================================================
[build-system]
requires = ["setuptools>=45", "setuptools_scm[toml]>=6.2"]
build-backend = "setuptools.build_meta"

[tool.setuptools_scm]


================================================
FILE: setup.cfg
================================================
[metadata]
name = tasksource
description = Preprocessings to prepare datasets for a task
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/sileod/tasksource/
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: BSD License
    Intended Audience :: Developers

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.9
install_requires =
    dotwiz
    funcy
    datasets
    exrex
    magicattr
    pandas
    numpy
    scipy
    sorcery

[options.packages.find]
where = src


================================================
FILE: src/tasksource/.ipynb_checkpoints/access-checkpoint.py
================================================
from .preprocess import Preprocessing
import re
import pandas as pd
from . import tasks, recast
from .metadata import dataset_rank
from datasets import load_dataset
import funcy as fc
import os
import copy
from sorcery import dict_of
from functools import cache
import random


class lazy_mtasks:
    def __getattr__(self, name):
        from . import mtasks
        return getattr(mtasks, name)

    def __dir__(self):
        from . import mtasks
        return dir(mtasks)
lmtasks=lazy_mtasks()

def parse_var_name(s):
    """Split a preprocessing variable name into (dataset_name, config_name, task_name).

    Supported forms: `dataset`, `dataset__task`, `dataset___config` and
    `dataset___config__task`. Parts that are absent are returned as None.
    """
    config_name,task_name = None,None
    if '__' in s and '___' not in s: # dataset__task
        dataset_name, task_name = s.split('__')
    elif '__' not in s.replace('___','') and '___' in s: # dataset___config
        dataset_name, config_name = s.split('___')
    elif  '___' in s and '__' in s.split('___')[1]: # dataset___config__task
        dataset_name, config_task=s.split('___')
        config_name,task_name = config_task.split('__')
    else: # dataset
        dataset_name = s
    return dataset_name,config_name,task_name

def pretty_name(x):
    dn = x.dataset_name.split("/")[-1]   
    cn = x.config_name if x.config_name else ""
    tn = x.task_name if x.task_name else ""
    return f"{dn}/{cn}/{tn}".replace('//','/').rstrip('/')

@cache
def list_tasks(tasks_path=f'{os.path.dirname(__file__)}/tasks.py',multilingual=False,instruct=False, excluded=()):
    # @cache requires hashable arguments: pass `excluded` as a tuple, not a list
    if multilingual:
        tasks_path=tasks_path.replace('/tasks.py','/mtasks.py')
    with open(tasks_path) as f:
        task_order = f.readlines()
    task_order = [x.split('=')[0].rstrip() for x in task_order if '=' in x]
    task_order = [x for x in task_order if x.isidentifier()]
    task_order = fc.flip(dict(enumerate(task_order)))

    l = []
    _tasks = (lmtasks if multilingual else tasks)

    for key in dir(_tasks):
        if key not in task_order:
            continue
        value=getattr(_tasks, key)
        if isinstance(value,Preprocessing):
            dataset_name, config_name, task_name = parse_var_name(key)
            dataset_name = (value.dataset_name if value.dataset_name else dataset_name)
            config_name = (value.config_name if value.config_name else config_name)
            l+=[{'dataset_name': dataset_name,
                 'config_name' : config_name,
                 'task_name': task_name,
                 'preprocessing_name': key,
                'task_type': value.__class__.__name__,'mapping': value,
                'rank':task_order.get(key,None)}]   
    df=pd.DataFrame(l).explode('config_name')
    df = df.sort_values('rank').reset_index(drop=True)
    df['id'] = df.apply(lambda x: pretty_name(x), axis=1)
    df.insert(0, 'id', df.pop('id'))
    del df['rank']
    if instruct:
        df=df[df.id.map(lambda x: not any(a in x for a in recast.improper_labels))]
    df=df[df.id.map(lambda x: not any(x in a for a in excluded))]
    return df

#task_df =list_tasks()
#mtask_df =list_tasks(multilingual=True)

def dict_to_query(d=dict(), **kwargs):
    d={**d,**kwargs}
    return '&'.join([f'`{k}`=="{v}"' for k,v in d.items()])

def load_preprocessing(tasks=tasks, **kwargs):
    _tasks_df = list_tasks(multilingual=tasks==lmtasks)
    y = _tasks_df.copy().query(dict_to_query(**kwargs)).iloc[0]
    preprocessing= copy.copy(getattr(tasks, y.preprocessing_name))
    for c in 'dataset_name','config_name':
        if not isinstance(getattr(preprocessing,c), str):
             setattr(preprocessing,c,getattr(y,c))
    return preprocessing

def load_task(id=None, dataset_name=None,config_name=None,task_name=None,preprocessing_name=None,
         max_rows=None, max_rows_eval=None, multilingual=False, instruct=False, seed=0, **load_dataset_kwargs):
    query = dict_of(id, dataset_name, config_name, task_name,preprocessing_name)
    query = {k:v for k,v in query.items() if v}
    _tasks = (lmtasks if multilingual else tasks)
    preprocessing = load_preprocessing(_tasks, **query)

    if "trust_remote_code" not in load_dataset_kwargs:
        load_dataset_kwargs["trust_remote_code"] = True
    
    dataset = load_dataset(preprocessing.dataset_name, preprocessing.config_name, **load_dataset_kwargs)
    dataset= preprocessing(dataset,max_rows, max_rows_eval, seed=seed)  # forward seed so sampling honors it
    dataset.task_type = preprocessing.__class__.__name__
    if instruct:
        dataset=recast.recast_instruct(dataset)
    return dataset

================================================
FILE: src/tasksource/.ipynb_checkpoints/preprocess-checkpoint.py
================================================
from collections.abc import Iterable
from dotwiz import DotWiz
from dataclasses import dataclass
from typing import Union
import itertools
import funcy as fc
import exrex 
import magicattr 
import numpy as np
import copy
import datasets
import time

MAX_MC_OPTIONS = 4

def get_column_names(dataset):
    cn = dataset.column_names
    if isinstance(cn, dict):
        return set(fc.flatten(cn.values()))
    else:
        return set(cn)


def sample_dataset(dataset,n=10000, n_eval=1000,seed=0):
    for k in dataset:
        n_k=(n if k=='train' else n_eval)
        if n_k and len(dataset[k])>n_k:
            dataset[k]=dataset[k].train_test_split(train_size=n_k,seed=seed)['train']
    return dataset

class Preprocessing(DotWiz):
    default_splits = ('train','validation','test')
    _instances = []

    def __post_init__(self):
        Preprocessing._instances+=[self]

    @staticmethod
    def __map_to_target(x,fn=lambda x:None, target=None):
        x[target]=fn(x)
        return x
        
    def load(self):
        return self(datasets.load_dataset(self.dataset_name,self.config_name))

    def __call__(self,dataset, max_rows=None, max_rows_eval=None,seed=0):
        dataset = self.pre_process(dataset)

        # manage splits
        for k,v in zip(self.default_splits, self.splits):
            if v and k!=v:
                dataset[k]=dataset[v]
                del dataset[v]
            if k in dataset and not v: # obfuscated label
                del dataset[k]
        dataset = fix_splits(dataset)

        for k in list(dataset.keys()):
            if k not in self.default_splits:
                del dataset[k]
        dataset = sample_dataset(dataset, max_rows, max_rows_eval,seed=seed)
        
        # field annotated with a string
        substitutions = {v:k for k,v in self.to_dict().items()
            if (k and k not in {'splits','dataset_name','config_name'} 
            and type(v)==str and k!=v)}

        dataset=dataset.remove_columns([c for c in substitutions.values() if c in dataset['train'].features and c not in substitutions])
        dataset=dataset.rename_columns(substitutions)

        # field annotated with a function                                
        for k in self.to_dict().keys():
            v=getattr(self, k)
            if callable(v) and k not in {"post_process","pre_process","load"}:
                dataset=dataset.map(self.__map_to_target,
                                    fn_kwargs={'fn':v,'target':k})

        dataset=dataset.remove_columns(
            get_column_names(dataset)-set(self.to_dict().keys()))
        dataset = fix_labels(dataset)
        dataset = fix_splits(dataset) # again: label mapping changed
        dataset = self.post_process(dataset)
        return dataset


@dataclass
class cat(Preprocessing):
    fields:Union[str,list]=None
    separator:str=' '
        
    def __call__(self, example=None):
        y=[np.char.array(example[f]) + sep 
                for f,sep in zip(self.fields[::-1],itertools.repeat(self.separator))]
        y=list(sum(*y))
        if len(y)==1:
            y=y[0]
        return y


def pretty(f):
    class pretty_f(DotWiz):
        def __init__(self,*args):
            self.__f_arg = f(*args)
            for a in args:
                setattr(self,'value',a)
                
        def __call__(self, *args,**kwargs):
            return self.__f_arg(*args,**kwargs)

        def __repr__(self):
            return f"{self.__f_arg.__qualname__ .split('.')[0]}({self.value})"
    return pretty_f

class dotgetter:
    def __init__(self, path=''):
        self.path=path

    def __bool__(self):
        return bool(self.path)

    def __getattr__(self, k):
        return self.__class__(f'{self.path}.{k}'.lstrip('.'))
    
    def __getitem__(self, i):
        return self.__class__(f'{self.path}[{i}]')

    def __call__(self, example=None):
        return magicattr.get(DotWiz(example), self.path)

    def __hash__(self):
        return hash(self.path)


@dataclass
class ClassificationFields(Preprocessing):
    sentence1:str='sentence1'
    sentence2:str='sentence2'
    labels:str='labels'

@dataclass
class Seq2SeqLMFields(Preprocessing):
    prompt:str='prompt'
    output:str='output'

@dataclass
class TokenClassificationFields(Preprocessing):
    tokens:str='tokens'
    labels:str='labels'
        
@dataclass
class MultipleChoiceFields(Preprocessing):
    inputs:str='input'
    choices:Iterable=tuple()
    labels:str='labels'
    choices_list:str=None
    def __post_init__(self):
        for i, c in enumerate(self.choices):
            setattr(self,f'choice{i}',c)
        delattr(self,'choices')
        if not self.choices_list:
            delattr(self,'choices_list')
    
    def __call__(self,dataset, *args, **kwargs):
        dataset = super().__call__(dataset, *args, **kwargs)
        if self.choices_list:
            dataset = dataset.filter(lambda x: 1<len(x['choices_list']))
            n_options = min([len(x) for k in dataset for x in dataset[k]['choices_list']])
            n_options = min(MAX_MC_OPTIONS,n_options)
            dataset = dataset.map(self.flatten_choice_list, fn_kwargs={'n_options':n_options})

        else:
            dataset = dataset.map(self.sample_choices, fn_kwargs={'n_options':MAX_MC_OPTIONS})
        return dataset

    @staticmethod
    def flatten_choice_list(x, n_options=None):
        n_neg = n_options-1 if n_options else None
        choices = x['choices_list']
        label=x['labels']
        neg = choices[:label] + choices[label+1:]
        pos = choices[label]
        x['labels']=0
        x['choices_list']=[pos]+neg[:n_neg]
        for i,o in enumerate(x['choices_list']):
            x[f'choice{i}']=o
        del x['choices_list']
        return x

    @staticmethod
    def sample_choices(x, n_options=None):
        choices = [x[c] for c in x if 'choice' in c]
        if not MAX_MC_OPTIONS or len(choices)<=n_options:
            return x
        n_neg = n_options-1 if n_options else None
        label=x['labels']
        neg = choices[:label] + choices[label+1:]
        pos = choices[label]
        x['labels']=0
        choices_list=[pos]+neg[:n_neg]
        for c in list(x):
            if 'choice' in c:
                del x[c]
        for i,o in enumerate(choices_list):
            x[f'choice{i}']=o
        return x

@dataclass
class SharedFields:
    splits:list=Preprocessing.default_splits
    dataset_name:str = None
    config_name:str = None
    pre_process: callable = fc.identity
    post_process: callable = fc.identity
    #language:str="en"
    

@dataclass
class Classification(SharedFields, ClassificationFields): pass

@dataclass
class MultipleChoice(SharedFields, MultipleChoiceFields): pass

@dataclass
class TokenClassification(SharedFields, TokenClassificationFields): pass

@dataclass
class Seq2SeqLM(SharedFields, Seq2SeqLMFields): pass

get=dotgetter()
constant = pretty(fc.constantly)
regen = lambda x: list(exrex.generate(x))

def name(label_name, classes):
    return lambda x:classes[x[label_name]]

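def fix_splits(dataset):
    """Ensure train, validation and test splits exist, carving missing ones out of existing splits (seed 0)."""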
    if len(dataset)==1 and "train" not in dataset:
        k = list(dataset)[0]
        dataset['train'] = copy.deepcopy(dataset[k])
        del dataset[k]

    if 'auxiliary_train' in dataset:
        del dataset['auxiliary_train']
    
    if 'test' in dataset: # manage obfuscated labels
        if 'labels' in dataset['test'].features:
            if len(set(fc.flatten(dataset['test'].to_dict()['labels'])))==1:
                del dataset['test']

    if 'validation' in dataset and 'train' not in dataset:
        train_validation = dataset['validation'].train_test_split(0.5, seed=0)
        dataset['train'] = train_validation['train']
        dataset['validation']=train_validation['test']
    
    if 'validation' in dataset and 'test' not in dataset:
        validation_test = dataset['validation'].train_test_split(0.5, seed=0)
        dataset['validation'] = validation_test['train']
        dataset['test']=validation_test['test']

    if 'train' in dataset and 'validation' not in dataset:
        train_val = dataset['train'].train_test_split(train_size=0.90, seed=0)
        dataset['train'] = train_val['train']
        dataset['validation']=train_val['test']

    if 'test' in dataset and 'validation' not in dataset:
        validation_test = dataset['test'].train_test_split(0.5, seed=0)
        dataset['validation'] = validation_test['train']
        dataset['test']=validation_test['test']

    if 'validation' not in dataset and 'test' not in dataset:
        train_val_test = dataset["train"].train_test_split(train_size=0.90, seed=0)
        val_test = train_val_test["test"].train_test_split(0.5, seed=0)
        dataset["train"] = train_val_test["train"]
        dataset["validation"] = val_test["train"]
        dataset["test"] = val_test["test"]
        
    return dataset 

def fix_labels(dataset, label_key='labels'):
    if type(dataset['train'][label_key][0]) in [int,list,float]:
        return dataset
    labels=set(fc.flatten(dataset[k][label_key] for k in {"train"}))
    if set(labels)=={'entailment','neutral','contradiction'}:
        order=lambda x:dict(fc.flip(enumerate(['entailment','neutral','contradiction']))).get(x,x)
    else:
        order=str
    labels=sorted(labels, key=order)
    dataset=dataset.cast_column(label_key, datasets.ClassLabel(names=labels))
    return dataset

def concatenate_dataset_dict(l):
    """Concatenate a list of DatasetDict objects sharing the same splits and columns."""
    keys=l[0].keys()
    return datasets.DatasetDict({k: datasets.concatenate_datasets([x[k] for x in l]) for k in keys})

================================================
FILE: src/tasksource/.ipynb_checkpoints/recast-checkpoint.py
================================================
import random
from datasets import DatasetDict, Dataset
from sorcery import dict_of
import string

improper_labels =['recast/recast_kg_relations','linguisticprobing',"lex_glue/scotus",'lexical_relation_classification/ROOT09',"pragmeval/squinky","pragmeval/emobank",'pragmeval/persuasiveness']
improper_labels += ['glue/stsb', 'sick/relatedness', 'joci', 'utilitarianism', 'amazon_counterfactual/en', 'toxic_conversations', 'ethos/multilabel', 'lex_glue/eurlex', 'lex_glue/unfair_tos', 'app_reviews', 'humicroedit/subtask-1', 'stackoverflow-questions', 'go_emotions/simplified', 'google_wellformed_query', 'has_part', 'blog_authorship_corpus/age', 'promptCoherence', 'Sarcasm_News_Headline', 'auditor_review/demo-org--auditor_review', 'Dynasent_Disagreement', 'Politeness_Disagreement', 'SBIC_Disagreement', 'SChem_Disagreement', 'Dilemmas_Disagreement', 'sts-companion', 'acceptability-prediction', 'chaos-mnli-ambiguity', 'headline_cause/en_simple', 'oasst1_dense_flat', 'civil_comments']

improper_labels += ['stsb_multi_mt','MLMA_hate_speech','icl-symbol-tuning-instruct','zero-shot-label-nli']

improper_labels += ['essay-scoring','english-grading','HelpSteer','oasst2']

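def render_options(options):
    options = [f'"{x}"' for x in options]
    if len(options) == 1:  # avoid a dangling ' or' prefix when there is a single option
        return options[0]
    return f"{', '.join(options[:-1])} or {options[-1]}"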

def render_classification(text,options,answer):
    example = 'text_A→text_B' if text.startswith('text_A:') else 'the following'
    inputs = f'With no explanation, label {example} with either {render_options(options)}.\n{text}'
    targets = f"{answer}."
    return dict_of(inputs,targets)

def render_token_classification(tokens,options,labels):
    prefix = f'With no explanation, label each line with {render_options(options)} preceded by ":".\n'
    inputs = prefix+"\n".join(tokens)
    targets = "\n".join([':'.join(x) for x in zip(tokens,labels)])
    return dict_of(inputs,targets)

def render_multiple_choice(prompt, options, labels):
    inputs=(prompt+'\n' if prompt else '')
    letters = string.ascii_uppercase[:len(options)]
    inputs=f'With no explanation, choose the best option from {render_options(letters)}. {inputs}'
    for letter, option in zip(letters, options):
        inputs+=f'\n{letter}: {option}'
    targets = f'{letters[labels]}.'
    return dict_of(inputs, targets) 

def negative_sample_options(y, labels,N=4):
    if len(labels)<N:
        return labels
    else:
        return [y]+random.sample([x for x in labels if x!=y], N-1)

def shuffle_choices(x):
    choices = sorted([k for k in x if 'choice' in k])
    choices_texts = [x[c] for c in choices]
    correct_choice =choices_texts[x['labels']]
    random.shuffle(choices_texts)
    for c, ct in zip(choices, choices_texts):
        x[c]=ct
    x["labels"]=choices_texts.index(correct_choice)
    return x

def recast_dataset_classification_to_mc(dataset,sep="[SEP]",N=4):

    def recast_split(d,N=N):
        labels = d.features['labels']
        df=d.to_pandas()
        df['inputs'] = df.sentence1
        if "sentence2" in df:
            df['inputs'] +=sep + df.sentence2

        N=min(N, len(labels.names))
        df['choices']=df.apply(lambda x:negative_sample_options(labels.int2str(x['labels']), labels.names,N),axis=1)     
        df['labels']=df.apply(lambda x:x['choices'].index(labels.int2str(x['labels'])),axis=1)

        for i in range(N):
            df[f'choice{i}']= "This example is " + df.choices.map(lambda x:x[i])

        choices = [f'choice{i}' for i in range(N)]
        return Dataset.from_pandas(df[['inputs',*choices,'labels']],preserve_index=False)

    return DatasetDict({k: recast_split(v) for k,v in dataset.items()})


def recast_instruct(dataset):
    features = dataset['train'].features
    labels = features['labels']

    if "sentence1" in features:
        task_type='Classification'
    if "choice0" in features:
        task_type = "MultipleChoice"
    if "tokens" in features:
        task_type = "TokenClassification"

    def recast_MultipleChoice(x):
        x=shuffle_choices(x)
        choices = sorted([k for k in x if 'choice' in k])
        if all([x[c] in x['inputs'] for c in choices]):
            return {"inputs":x['inputs'], 'targets': x[f"choice{x['labels']}"].strip()+"."}
        else:
            return render_multiple_choice(x['inputs'],[x[c] for c in choices],x['labels'])

    def recast_TokenClassification(x):
        distractors = list(labels.feature.names)
        x_labels = [labels.feature.int2str(y) for y in x['labels']]
        labels_set= list({labels.feature.int2str(y) for y in x['labels']})
        options=list(dict.fromkeys(labels_set+distractors))[:max(len(labels_set),10)]
        return render_token_classification(x['tokens'],options,x_labels)

    def recast_Classification(x):
        if 'sentence2' in x:
            text=f"text_A: {x['sentence1']}\ntext_B: {x['sentence2']}"
        else:
            text=x['sentence1']
            
        answer=labels.int2str(x['labels']).strip()
        options= negative_sample_options(answer, labels.names)  # public equivalent of the private _int2str list
        return render_classification(text, options, answer)
        
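    # explicit dispatch instead of eval() on a constructed function name
    recast_fn = {
        "Classification": recast_Classification,
        "MultipleChoice": recast_MultipleChoice,
        "TokenClassification": recast_TokenClassification,
    }[task_type]
    dataset = dataset.map(recast_fn)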
    dataset = dataset.remove_columns([k for k in features if k not in ['inputs','targets']])
    return dataset
 

================================================
FILE: src/tasksource/.ipynb_checkpoints/tasks-checkpoint.py
================================================
from .preprocess import cat, get, regen, name, constant, Classification, TokenClassification, MultipleChoice
from .metadata import bigbench_discriminative_english, blimp_hard, imppres_presupposition, imppres_implicature, udep_en_configs, udep_en_labels
from datasets import get_dataset_config_names, Sequence, ClassLabel, Dataset, DatasetDict

# variable name: dataset___config__task

###################### NLI/paraphrase ###############################

glue___mnli = Classification(sentence1="premise", sentence2="hypothesis", labels="label", splits=["train", None, "validation_matched"])
glue___qnli = Classification("question","sentence", labels="label")
glue___rte = Classification(sentence1="sentence1", sentence2="sentence2", labels="label")
glue___wnli = Classification(sentence1="sentence1", sentence2="sentence2", labels="label")
#glue___ax = Classification(sentence1="premise", sentence2="hypothesis", labels="label", splits=["test", None, None]) # fully masked

glue___mrpc = Classification(sentence1="sentence1", sentence2="sentence2", labels="label")
glue___qqp = Classification(sentence1="question1", sentence2="question2", labels="label")
glue___stsb = Classification(sentence1="sentence1", sentence2="sentence2", labels="label")

super_glue___boolq = Classification(sentence1="question", labels="label")
super_glue___cb = Classification(sentence1="premise", sentence2="hypothesis", labels="label")
super_glue___multirc = Classification(
    cat(["paragraph", "question"]),
    'answer',
    labels='label'
)
#super_glue___rte = Classification(sentence1="premise", sentence2="hypothesis", labels="label") # in glue
super_glue___wic = Classification(
    sentence1=cat(["word","sentence1"], " : "),
    sentence2=cat(["word","sentence2"], " : "),
    labels='label'
)
super_glue___axg = Classification(sentence1="premise", sentence2="hypothesis", labels="label", splits=["test", None, None])


anli__a1 = Classification('premise','hypothesis','label', splits=['train_r1','dev_r1','test_r1'])
anli__a2 = Classification('premise','hypothesis','label', splits=['train_r2','dev_r2','test_r2'])
anli__a3 = Classification('premise','hypothesis','label', splits=['train_r3','dev_r3','test_r3'])


babi_nli = Classification("premise", "hypothesis", "label",
    dataset_name="tasksource/babi_nli",
    config_name=set(get_dataset_config_names("tasksource/babi_nli"))-{"agents-motivations"}
) # agents-motivations task is not as clear-cut as the others


sick__label         = Classification('sentence_A','sentence_B','label')
sick__relatedness   = Classification('sentence_A','sentence_B','relatedness_score')
sick__entailment_AB = Classification('sentence_A','sentence_B','entailment_AB')
#sick__entailment_BA = Classification('sentence_A','sentence_B','entailment_BA')

def remove_neg_1(dataset):
    return dataset.filter(lambda x:x['labels']!=-1)

snli = Classification(sentence1="premise", sentence2="hypothesis", labels="label",
    post_process=remove_neg_1)

scitail = Classification("sentence1","sentence2","gold_label",config_name="snli_format")

hans = Classification(sentence1="premise", sentence2="hypothesis", labels="label")

wanli = Classification('premise','hypothesis','gold', dataset_name="alisawuffles/WANLI")

recast_nli = Classification(sentence1="context", sentence2="hypothesis", labels="label", dataset_name="tasksource/recast",
    config_name=['recast_kg_relations', 'recast_puns', 'recast_factuality', 'recast_verbnet',
    'recast_verbcorner', 'recast_ner', 'recast_sentiment', 'recast_megaveridicality'])


probability_words_nli = Classification(sentence1="context", sentence2="hypothesis", labels="label",
    dataset_name="sileod/probability_words_nli", 
    config_name=["reasoning_1hop","reasoning_2hop","usnli"])

nan_nli = Classification("premise", "hypothesis", "label", dataset_name="joey234/nan-nli")

nli_fever = Classification("premise","hypothesis","label",
    dataset_name="pietrolesci/nli_fever", splits=["train","dev",None])

breaking_nli = Classification("sentence1","sentence2","label",
    dataset_name="pietrolesci/breaking_nli", splits=["full",None,None])

conj_nli = Classification("premise","hypothesis","label",post_process=remove_neg_1,
    dataset_name="pietrolesci/conj_nli",splits=['train','dev',None])

fracas = Classification("premise","hypothesis","label",
    dataset_name="pietrolesci/fracas")

dialogue_nli = Classification("sentence1","sentence2","label",
    dataset_name="pietrolesci/dialogue_nli")   

mpe_nli = Classification("premise","hypothesis","label",
    dataset_name="pietrolesci/mpe",
    splits=["train","dev","test"])  

dnc_nli = Classification("context","hypothesis","label",
    dataset_name="pietrolesci/dnc")

# gpt3_nli = Classification("text_a","text_b","label",dataset_name="pietrolesci/gpt3_nli") # not sound enough

recast_white__fnplus = Classification("text","hypothesis","label",
    dataset_name="pietrolesci/recast_white",splits=['fnplus',None,None])
recast_white__sprl = Classification("text","hypothesis","label",
    dataset_name="pietrolesci/recast_white",splits=['sprl',None,None])
recast_white__dpr = Classification("text","hypothesis","label",
    dataset_name="pietrolesci/recast_white",splits=['dpr',None,None])

joci = Classification("context","hypothesis",
    labels=lambda x: [None, "impossible", "technically possible", "plausible", "likely", "very likely"][x["original_label"]],
    pre_process=lambda ds:ds.filter(lambda x:x['original_label']!=0),
    dataset_name="pietrolesci/joci",splits=['full',None,None])

#enfever_nli = Classification("evidence","claim","label", dataset_name="ctu-aic/enfever_nli")

robust_nli__IS_CS = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/robust_nli", splits=["IS_CS",None,None])
robust_nli__LI_LI = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/robust_nli", splits=["LI_LI",None,None])
robust_nli__ST_WO = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/robust_nli", splits=["ST_WO",None,None])
robust_nli__PI_SP = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/robust_nli", splits=["PI_SP",None,None])
robust_nli__PI_CD = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/robust_nli", splits=["PI_CD",None,None])
robust_nli__ST_SE = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/robust_nli", splits=["ST_SE",None,None])
robust_nli__ST_NE = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/robust_nli", splits=["ST_NE",None,None])
robust_nli__ST_LM = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/robust_nli", splits=["ST_LM",None,None])
robust_nli_is_sd = Classification("premise","hypothesis","label",
    dataset_name="pietrolesci/robust_nli_is_sd")
robust_nli_li_ts = Classification("premise","hypothesis","label",
    dataset_name="pietrolesci/robust_nli_li_ts")

gen_debiased_nli__snli_seq_z = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/gen_debiased_nli", splits=["snli_seq_z",None,None])
gen_debiased_nli__snli_z_aug = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/gen_debiased_nli", splits=["snli_z_aug",None,None])
gen_debiased_nli__snli_par_z = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/gen_debiased_nli", splits=["snli_par_z",None,None])
gen_debiased_nli__mnli_par_z = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/gen_debiased_nli", splits=["mnli_par_z",None,None])
gen_debiased_nli__mnli_z_aug = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/gen_debiased_nli", splits=["mnli_z_aug",None,None])
gen_debiased_nli__mnli_seq_z = Classification("premise","hypothesis","label",
	dataset_name="pietrolesci/gen_debiased_nli", splits=["mnli_seq_z",None,None])

add_one_rte = Classification("premise","hypothesis","label",
    dataset_name="pietrolesci/add_one_rte",splits=["train","dev","test"])

def _imppres_post_process(ds,prefix=''):
    # the imppres entailment definition is either purely semantic or purely pragmatic;
    # because of that, we differentiate the labels from the anli/mnli notation
    return ds.cast_column('labels', ClassLabel(
    names=[f'{prefix}_entailment',f'{prefix}_neutral',f'{prefix}_contradiction']))

imppres__presupposition = Classification("premise","hypothesis","gold_label",
    dataset_name="tasksource/imppres", config_name=imppres_presupposition,
    post_process=_imppres_post_process)

imppres__prag = Classification("premise","hypothesis","gold_label_prag",
    dataset_name="tasksource/imppres", config_name=imppres_implicature,
    post_process=lambda x: _imppres_post_process(x,'pragmatic'))

imppres__log = Classification("premise","hypothesis","gold_label_log",
    dataset_name="tasksource/imppres", config_name=imppres_implicature,
    post_process=lambda x: _imppres_post_process(x,'logical'))


#glue__diagnostics = Classification("premise","hypothesis","label",
#    dataset_name="pietrolesci/glue_diagnostics",splits=["test",None,None])

hlgd = Classification("headline_a", "headline_b", labels="label")

paws___labeled_final   = Classification("sentence1", "sentence2", name('label',['not_paraphrase','paraphrase']))
paws___labeled_swap    = Classification("sentence1", "sentence2", name('label',['not_paraphrase','paraphrase']), splits=["train", None, None])
#paws___unlabeled_final = Classification("sentence1", "sentence2", "label")

#quora = Classification(get.questions.text[0], get.questions.text[1], 'is_duplicate') # in glue
medical_questions_pairs = Classification("question_1","question_2", name("label",['not similar','similar']))
 
###################### Token Classification #########################

conll2003__pos_tags   = TokenClassification(tokens="tokens", labels='pos_tags')
conll2003__chunk_tags = TokenClassification(tokens="tokens", labels='chunk_tags')
conll2003__ner_tags   = TokenClassification(tokens="tokens", labels='ner_tags')

#tner___tweebank_ner    = TokenClassification(tokens="tokens", labels="tags")

######################## Multiple choice ###########################


model_written_evals = MultipleChoice('question', choices=['answer_matching_behavior','answer_not_matching_behavior'], labels=constant(0),  
    dataset_name="Anthropic/model-written-evals")

truthful_qa___multiple_choice = MultipleChoice(
    "question",
    choices_list=get.mc1_targets.choices,
    labels=constant(0)
)

fig_qa = MultipleChoice(
    "startphrase",
    choices=["ending1","ending2"],
    labels="labels",
    dataset_name="nightingal3/fig-qa",
    splits=["train","validation",None]
)

bigbench = MultipleChoice(
    'inputs',
    choices_list='multiple_choice_targets',
    labels=lambda x:x['multiple_choice_scores'].index(1) if 1 in x['multiple_choice_scores'] else -1,
    dataset_name='tasksource/bigbench',
    config_name=bigbench_discriminative_english - {"social_i_qa","intersect_geometry"} # english multiple choice tasks, minus duplicates
)
#"goal_step_wikihow"

blimp_hard = MultipleChoice(inputs=constant(''),
    choices=['sentence_good','sentence_bad'],
    labels=constant(0),
    dataset_name="blimp",
    config_name=blimp_hard # tasks where GPT2 is at least 10% below  human accuracy
)

cos_e = MultipleChoice('question',
    choices_list='choices',
    labels= lambda x: x['choices_list'].index(x['answer']),
    config_name='v1.0')

cosmos_qa = MultipleChoice(cat(['context','question']),regen('answer[0-3]'),'label')

dream = MultipleChoice(
    lambda x:"\n".join(x['dialogue']+[x['question']]),
    choices_list='choice',
    labels=lambda x:x['choices_list'].index(x['answer'])
)

openbookqa = MultipleChoice(
    'question_stem',
    choices_list=get.choices.text,
    labels='answerKey'
)

qasc = MultipleChoice(
    'question',
    choices_list=get.choices.text,
    labels=lambda x: "ABCDEFGH".index(x['answerKey']),
    splits=['train','validation',None]
    
)

quartz = MultipleChoice(
    'question',
    choices_list=get.choices.text,
    labels='answerKey'
)
quail = MultipleChoice(
    cat(['context','question']),
    choices_list='answers',
    labels='correct_answer_id' 
)

head_qa___en = MultipleChoice("qtext",
    choices_list = lambda x:[a['atext'] for a in x["answers"]],
    labels = lambda x:[a['aid'] for a in x["answers"]].index(x["ra"])
)


sciq = MultipleChoice(
    'question',
    ['correct_answer']+regen('distractor[1-3]'),
    labels=constant(0))

social_i_qa = MultipleChoice(
    'question',
    ['answerA','answerB','answerC'],
    'label')

wiki_hop___original = MultipleChoice(
    'question', 
    choices_list='candidates',
    labels=lambda x:x['choices_list'].index(x["answer"]))

wiqa = MultipleChoice('question_stem',
    choices_list = lambda x: x['choices']['text'],
    labels='answer_label_as_choice')

piqa = MultipleChoice('goal', choices=['sol1','sol2'], labels='label')

hellaswag = MultipleChoice('ctx_a',
    choices_list=lambda x: [f'{x["ctx_b"]}{e}' for e in x["endings"]],
    labels='label', splits=['train','validation',None])

super_glue___copa = MultipleChoice('premise',['choice1','choice2'],'label')

balanced_copa = MultipleChoice('premise',['choice1','choice2'],'label',
    dataset_name="pkavumba/balanced-copa")

e_care = MultipleChoice('premise',['choice1','choice2'],'label',
    dataset_name="12ml/e-CARE")

art = MultipleChoice(cat(['hypothesis_1','hypothesis_2']),
    ['observation_1','observation_2'],
    labels=lambda x:x['label']-1,
    splits=['train','validation',None]
)


mmlu = MultipleChoice('question',labels='answer',choices_list='choices',splits=['validation','dev','test'],
    dataset_name="tasksource/mmlu",
    config_name=get_dataset_config_names("tasksource/mmlu")
)

winogrande = MultipleChoice('sentence',['option1','option2'],'answer',config_name='winogrande_xl',
    splits=['train','validation',None])

codah = MultipleChoice('question_propmt',choices_list='candidate_answers',labels='correct_answer_idx',config_name='codah')

ai2_arc__challenge = MultipleChoice('question',
    choices_list=get.choices.text,  
    labels=lambda x: get.choices.label(x).index(x["answerKey"]),
    config_name=["ARC-Challenge","ARC-Easy"])

definite_pronoun_resolution = MultipleChoice(
    inputs=cat(["sentence","pronoun"],' : '),
    choices_list='candidates',
    labels="label",
    splits=['train',None,'test'])

swag___regular=MultipleChoice(cat(["sent1","sent2"]),regen("ending[0-3]"),"label")

def _split_choices(s):
    import re
    return [x.rstrip(', ') for x in re.split(r'[a-e] \) (.*?)',s) if x.strip(', ')]

math_qa = MultipleChoice(
    'Problem', 
    choices_list = lambda x: _split_choices(x['options']),
    labels = lambda x:'abcde'.index(x['correct'])   
)

#aqua_rat___tokenized = MultipleChoice("question",choices_list="options",labels=lambda x:"ABCDE".index(x['correct'])) in math_qa


######################## Classification (other) ########################
glue___cola = Classification(sentence1="sentence", labels="label")
glue___sst2 = Classification(sentence1="sentence", labels="label")

utilitarianism = Classification("comparison",labels="label",
dataset_name="metaeval/utilitarianism")

amazon_counterfactual = Classification(
    "text", labels="label",
    dataset_name="mteb/amazon_counterfactual",
    config_name="en")

insincere_questions = Classification(
    "text", labels="label_text",
    dataset_name="SetFit/insincere-questions")

toxic_conversations = Classification(
    "text", labels="label",
    dataset_name="SetFit/toxic_conversations")

turingbench = Classification("Generation",labels="label",
    dataset_name="turingbench/TuringBench",
    splits=["train","validation",None])


trec = Classification(sentence1="text", labels="fine_label")

tals_vitaminc = Classification('claim','evidence','label', dataset_name="tals/vitaminc")

hope_edi = Classification("text", labels="label", splits=["train", "validation", None], config_name=["english"])

#fever___v1_0 = Classification(sentence1="claim", labels="label", splits=["train", "paper_dev", "paper_test"], dataset_name="fever", config_name="v1.0")
#fever___v2_0 = Classification(sentence1="claim", labels="label", splits=[None, "validation", None], dataset_name="fever", config_name="v2.0")

rumoureval_2019 = Classification(
    sentence1="source_text",
    sentence2=lambda x: str(x["reply_text"]),
    labels="label", dataset_name="strombergnlp/rumoureval_2019", config_name="RumourEval2019",
    post_process=lambda ds:ds.filter(lambda x:x['labels'] is not None)
)

ethos___binary = Classification(sentence1="text", labels="label", splits=["train", None, None])
ethos___multilabel = Classification(
    'text',
    labels=lambda x: [x[c] for c in
    ['violence', 'gender', 'race', 'national_origin', 'disability', 'religion', 'sexual_orientation','directed_vs_generalized']
    ],
    splits=["train", None, None]
)

tweet_eval = Classification(sentence1="text", labels="label",
    config_name=["emoji", "emotion", "hate", "irony", "offensive", "sentiment"])

def stance_kwargs(topic):
    return {
        "sentence1": constant(f'Topic: {topic}. \n Opinion:\n'), 
        "sentence2": "text", 
        "labels": "label", 
        "config_name": f"stance_{topic.lower()}",
        "dataset_name": "tweet_eval"
    }

tweet_eval_abortion = Classification(**stance_kwargs("abortion"))
tweet_eval_atheism  = Classification(**stance_kwargs("atheism"))
tweet_eval_climate  = Classification(**stance_kwargs("climate"))
tweet_eval_feminist = Classification(**stance_kwargs("feminist"))
tweet_eval_hillary  = Classification(**stance_kwargs("Hillary"))


discovery = Classification("sentence1", "sentence2", labels="label", config_name=["discovery"])

pragmeval_1 = Classification("sentence",labels="label",
    dataset_name="pragmeval",
    config_name= ["emobank-arousal", "emobank-dominance", "emobank-valence", "squinky-formality", "squinky-implicature", 
    "squinky-informativeness","switchboard","mrda","verifiability"])

pragmeval_2 = Classification("sentence1","sentence2",labels="label",
    dataset_name="pragmeval",
    config_name= ["emergent", "gum", "pdtb", "persuasiveness-claimtype", 
    "persuasiveness-eloquence", "persuasiveness-premisetype", "persuasiveness-relevance", "persuasiveness-specificity", 
    "persuasiveness-strength", "sarcasm","stac"])

silicone = Classification("Utterance",labels="Label",
    config_name=['dyda_da', 'dyda_e', 'iemocap', 'maptask', 'meld_e', 'meld_s', 'oasis', 'sem'] # +['swda', 'mrda'] # in pragmeval
)

lex_glue___eurlex = Classification(sentence1="text", labels="labels") 
lex_glue___scotus = Classification(sentence1="text", labels="label")
lex_glue___ledgar = Classification(sentence1="text", labels="label")
lex_glue___unfair_tos = Classification(sentence1="text", labels="labels")
lex_glue___case_hold = MultipleChoice("context", choices_list='endings', labels="label")

language_identification = Classification("text",labels="labels", dataset_name="papluca/language-identification")

################ Automatically generated (verified)##########

imdb = Classification(sentence1="text", labels="label", splits=["train", None, "test"])

rotten_tomatoes = Classification(sentence1="text", labels="label")

ag_news = Classification(sentence1="text", labels="label", splits=["train", None, "test"])

yelp_review_full = Classification(sentence1="text", labels="label", splits=["train", None, "test"], config_name=["yelp_review_full"])

financial_phrasebank = Classification(sentence1="sentence", labels="label", splits=["train", None, None],
    config_name=["sentences_allagree"])

poem_sentiment = Classification(sentence1="verse_text", labels="label")

#emotion = Classification(sentence1="text", labels="label") # file not found

dbpedia_14 = Classification(sentence1="content", labels="label", splits=["train", None, "test"], config_name=["dbpedia_14"])

amazon_polarity = Classification(sentence1="content", labels="label", splits=["train", None, "test"], config_name=["amazon_polarity"])

app_reviews = Classification("review", labels="star", splits=["train", None, None])

# multi_nli = Classification(sentence1="premise", sentence2="hypothesis", labels="label", splits=["train", "validation_matched", None]) #glue

hate_speech18 = Classification(sentence1="text", labels="label", splits=["train", None, None])

sms_spam = Classification(sentence1="sms", labels="label", splits=["train", None, None])

humicroedit___subtask_1 = Classification("original", "edit", labels="meanGrade", dataset_name="humicroedit", config_name="subtask-1")
humicroedit___subtask_2 = Classification(
    sentence1=cat(['original1','edit1'],' : '),
    sentence2=cat(['original2','edit2'],' : '),
    labels="label", dataset_name="humicroedit", config_name="subtask-2")

snips_built_in_intents = Classification(sentence1="text", labels="label", splits=["train", None, None])

banking77 = Classification(sentence1="text", labels="label", splits=["train", None, "test"])

hate_speech_offensive = Classification(sentence1="tweet", labels="class", splits=["train", None, None])

yahoo_answers_topics = Classification(
    "question_title","question_content",labels="topic")

stackoverflow_questions=Classification("title","body",labels="label",
    dataset_name="pacovaldez/stackoverflow-questions")

#hyperpartisan_news_detection___byarticle = Classification(sentence1="text", labels="hyperpartisan", splits=["train", None, None]) # files too heavy
#hyperpartisan_news_detection___bypublisher = Classification(sentence1="text", labels="hyperpartisan", splits=["train","validation", None]) # files too heavy

hyperpartisan_news = Classification(
    "text",
    labels=lambda x: {'true':'hyperpartisan','false':'not_hyperpartisan'}.get(x["label"]),
    dataset_name="zapsdcn/hyperpartisan_news")

scierc = Classification("text",labels="label",dataset_name="zapsdcn/sciie")
citation_intent = Classification("text",labels="label",dataset_name="zapsdcn/citation_intent")

#go_emotions___raw = Classification(sentence1="text", splits=["train", None, None])
go_emotions___simplified = Classification(sentence1="text", labels="labels")

#boolq = Classification(sentence1="question", splits=["train", "validation", None]) # in superglue

#ecthr_cases___alleged_violation_prediction = Classification(labels="labels", dataset_name="ecthr_cases", config_name="alleged-violation-prediction")
#ecthr_cases___violation_prediction = Classification(labels="labels", dataset_name="ecthr_cases", config_name="violation-prediction")
#   too long

scicite = Classification(sentence1="string", labels="label",dataset_name="allenai/scicite")

liar = Classification(sentence1="statement", labels="label")

relbert_lexical_relation_classification = Classification(sentence1="head", sentence2="tail", labels="relation",
 dataset_name="relbert/lexical_relation_classification",
 config_name=["BLESS","CogALexV","EVALution","K&H+N","ROOT09"])


linguisticprobing = Classification("sentence", labels="label", dataset_name="tasksource/linguisticprobing", 
    config_name=['subj_number',
                'obj_number',
                'past_present',
                'sentence_length',
                'top_constituents',
                'tree_depth',
                'coordination_inversion',
                'odd_man_out',
                'bigram_shift']#+['word_content'] #too many labels 
)

crowdflower = Classification("text", labels="label",
 splits=["train", None, None], dataset_name="tasksource/crowdflower",
 config_name=['sentiment_nuclear_power',
            'tweet_global_warming',
            'airline-sentiment',
            'corporate-messaging',
            'economic-news',
            'political-media-audience',
            'political-media-bias',
            'political-media-message',
            'text_emotion']
)

ethics___commonsense = Classification(sentence1="text", labels="label", dataset_name="metaeval/ethics", config_name="commonsense")
ethics___deontology = Classification(sentence1="text", labels="label", dataset_name="metaeval/ethics", config_name="deontology")
ethics___justice = Classification(sentence1="text", labels="label", dataset_name="metaeval/ethics", config_name="justice")
ethics___virtue = Classification(sentence1="sentence1", sentence2="sentence2", labels="label", dataset_name="metaeval/ethics", config_name="virtue")

emo = Classification(sentence1="text", labels="label", splits=["train", None, "test"], config_name=["emo2019"])

google_wellformed_query = Classification(sentence1="content", labels="rating")

tweets_hate_speech_detection = Classification(sentence1="tweet", labels="label", splits=["train", None, None])

#adv_glue___adv_sst2 = Classification(sentence1="sentence", labels="label", splits=["validation", None, None])
#adv_glue___adv_qqp = Classification(sentence1="question1", sentence2="question2", labels="label", splits=["validation", None, None])
#adv_glue___adv_mnli = Classification(sentence1="premise", sentence2="hypothesis", labels="label", splits=["validation", None, None])
#adv_glue___adv_mnli_mismatched = Classification(sentence1="premise", sentence2="hypothesis", labels="label", splits=["validation", None, None])
#adv_glue___adv_qnli = Classification(sentence1="question", labels="label", splits=["validation", None, None])
#adv_glue___adv_rte = Classification(sentence1="sentence1", sentence2="sentence2", labels="label", splits=["validation", None, None])

has_part = Classification("arg1","arg2", labels="score", splits=["train", None, None])

wnut_17 = TokenClassification(tokens="tokens", labels="ner_tags", config_name=["wnut_17"])

ncbi_disease = TokenClassification(tokens="tokens", labels="ner_tags", config_name=["ncbi_disease"])

acronym_identification = TokenClassification(labels="labels", tokens="tokens")

jnlpba = TokenClassification(tokens="tokens", labels="ner_tags", splits=["train", "validation", None], config_name=["jnlpba"])

#species_800 = TokenClassification(tokens="tokens", labels="ner_tags", config_name=["species_800"]) missing files

SpeedOfMagic_ontonotes_english = TokenClassification(tokens="tokens", labels="ner_tags", dataset_name="SpeedOfMagic/ontonotes_english", config_name="SpeedOfMagic--ontonotes_english")

blog_authorship_corpus__gender    = Classification(sentence1="text",labels="gender")
blog_authorship_corpus__age       = Classification(sentence1="text",labels="age")
#blog_authorship_corpus__horoscope = Classification(sentence1="text",labels="horoscope")
blog_authorship_corpus__job       = Classification(sentence1="text",labels="job")

launch_open_question_type = Classification(sentence1="question", labels="resolve_type", dataset_name="launch/open_question_type")

health_fact = Classification(sentence1="claim", labels="label",
    pre_process = lambda ds:ds.filter(lambda x:x['label'] not in {-1})
)

commonsense_qa = MultipleChoice(
    "question",
    choices_list=get.choices.text,
    labels=lambda x: "ABCDE".index(x["answerKey"]),
    splits=["train","validation",None]
)
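
# Illustrative sketch, not part of the task registry: the `labels` lambda of
# commonsense_qa above converts a letter answer key into a zero-based choice
# index. `_example_answer_key_to_index` is a hypothetical helper and the row
# used to exercise it is made up; only the "answerKey" field name comes from
# the definition above.
def _example_answer_key_to_index(row, letters="ABCDE"):
    """Return the position of row["answerKey"] within `letters`."""
    # str.index raises ValueError if the key is not one of the expected letters,
    # which surfaces malformed rows instead of silently mislabeling them.
    return letters.index(row["answerKey"])
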
mc_taco = Classification(
    lambda x: f'{x["sentence"]} {x["question"]} {x["answer"]}',
    labels="label",
    splits=[ "validation",None,"test"]
)

ade_corpus_v2___Ade_corpus_v2_classification = Classification("text",labels="label")

discosense = MultipleChoice("context",choices=regen("option_[0-3]"),labels="label",
    dataset_name="prajjwal1/discosense")
    
circa = Classification(
    sentence1=cat(["context","question-X"]),
    sentence2="answer-Y",
    labels="goldstandard2", post_process=remove_neg_1)

#code_x_glue_cc_defect_detection = Classification("func", labels="target")

#code_x_glue_cc_clone_detection_big_clone_bench = Classification("func1", "func2", "label") # in bigbench + too heavy (100g)

#code_x_glue_cc_code_refinement = MultipleChoice(
#    constant(""), choices=["buggy","fixed"], labels=constant(0),
#    config_name="medium")

#effective_feedback_student_writing = Classification("discourse_text", 
#labels="discourse_effectiveness",dataset_name="YaHi/EffectiveFeedbackStudentWriting")
# discontinued /!\

#promptSentiment = Classification("text",labels="label",dataset_name="Ericwang/promptSentiment")
#promptNLI = Classification("premise","hypothesis",labels="label",dataset_name="Ericwang/promptNLI")
#promptSpoke = Classification("text",labels="label",dataset_name="Ericwang/promptSpoke")
#promptProficiency = Classification("text",labels="label",dataset_name="Ericwang/promptProficiency")
#promptGrammar = Classification("text",labels="label",dataset_name="Ericwang/promptGrammar")
#promptCoherence = Classification("text",labels="label",dataset_name="Ericwang/promptCoherence")

phrase_similarity = Classification(
    sentence1=cat(["phrase1","sentence1"], " : "),
    sentence2=cat(["phrase2","sentence2"], " : "),
    labels='label',
    dataset_name="PiC/phrase_similarity"
)

exaggeration_detection = Classification(
    sentence1="press_release_conclusion",
    sentence2="abstract_conclusion",
    labels="exaggeration_label", 
    dataset_name="copenlu/scientific-exaggeration-detection"
)
quarel = Classification(
    "question",
    labels=lambda x: "AB"[x["answer_index"]]
)

mwong_fever_evidence_related = Classification(sentence1="claim", sentence2="evidence", labels=name("labels",['unrelated','related']),
    splits=["train", "valid", "test"], dataset_name="mwong/fever-evidence-related")

numer_sense = Classification("sentence",labels="target",splits=["train",None,None])

dynasent__r1 = Classification("sentence", labels="gold_label", 
    dataset_name="dynabench/dynasent", config_name="dynabench.dynasent.r1.all")
dynasent__r2 = Classification("sentence", labels="gold_label", 
    dataset_name="dynabench/dynasent", config_name="dynabench.dynasent.r2.all")

sarcasm_news = Classification("headline", labels="is_sarcastic",
    dataset_name="raquiba/Sarcasm_News_Headline")

sem_eval_2010_task_8 = Classification("sentence",labels="relation")

auditor_review = Classification(sentence1="sentence",
    labels=name("label",['negative','neutral','positive']),
    dataset_name="demo-org/auditor_review")

medmcqa = MultipleChoice("question", choices=regen('op[a-d]'),labels='cop')


dynasent_disagreement    = Classification("text", labels="binary_disagreement", dataset_name="RuyuanWan/Dynasent_Disagreement")
politeness_disagreement  = Classification("text", labels="binary_disagreement", dataset_name="RuyuanWan/Politeness_Disagreement")
sbic_disagreement        = Classification("text", labels="binary_disagreement", dataset_name="RuyuanWan/SBIC_Disagreement")
schem_disagreement       = Classification("text", labels="binary_disagreement", dataset_name="RuyuanWan/SChem_Disagreement")
dilemmas_disagreement    = Classification("text", labels="binary_disagreement", dataset_name="RuyuanWan/Dilemmas_Disagreement")

logiqa = MultipleChoice(
    cat(["context","query"]),
    choices_list = 'options',
    labels = "correct_option",
    dataset_name="lucasmccabe/logiqa"
)

#proto_qa = MultipleChoice(
#    "question",
#    choices_list=lambda x:x['answer-clusters']['answers'],
#    labels=lambda x: x['answer-clusters']['count'].index(max(x['answer-clusters']['count'])),
#    config_name='proto_qa'
#)

wiki_qa = Classification("question","answer", name("label",['False','True']))

cycic_classification = Classification("question",labels=name("correct_answer",['False','True']),
    dataset_name = "tasksource/cycic_classification")
cycic_mc = MultipleChoice("question", choices=regen('answer_option[0-4]'), labels="correct_answer",
    dataset_name = "tasksource/cycic_multiplechoice")


def _preprocess_chatgpt_detection(ex):
    """Randomly keep the human answer (label 0) or the ChatGPT answer (label 1)."""
    import random
    label=int(random.random()<0.5)
    ex['label']=label
    ex['answer']=[str(ex['human_answers'][0]),str(ex['chatgpt_answers'][0])][label]
    return ex

#chatgpt_detection = Classification("question","answer","label",
#    dataset_name = 'Hello-SimpleAI/HC3', config_name="all",
#    pre_process=lambda dataset:dataset.map(_preprocess_chatgpt_detection))

sts_companion = Classification("sentence1","sentence2","label",
    dataset_name="tasksource/sts-companion")

commonsense_qa_2 = Classification("question",labels="answer",
    dataset_name="tasksource/commonsense_qa_2.0")

ling_nli = Classification("premise","hypothesis","label",dataset_name="tasksource/lingnli")

monotonicity_entailment = Classification("sentence1", "sentence2", "gold_label",    
    dataset_name="tasksource/monotonicity-entailment")

arct = MultipleChoice(cat(["reason","claim"]),choices=["warrant0","warrant1"],
    labels="correctLabelW0orW1", dataset_name="tasksource/arct")

scinli = Classification("sentence1", "sentence2", labels="label",
    post_process=lambda x:x.shuffle(seed=0),
    dataset_name="tasksource/scinli")

naturallogic = Classification(" sent1 "," sent2 "," new_label ",dataset_name="tasksource/naturallogic")

onestop_qa = MultipleChoice(cat(["paragraph","question"]),choices_list="answers",
    labels=constant(0))

moral_stories = MultipleChoice(cat(["situation","intention"]),
    choices=['moral_action',"immoral_action"],labels=constant(0),
    dataset_name="demelin/moral_stories", config_name="full")

prost = MultipleChoice(cat(["context","ex_question"]), choices=['A','B','C','D'],labels="label",
    dataset_name="corypaik/prost")

dyna_hate = Classification("text",labels="label",dataset_name="aps/dynahate",splits=['train',None,None])

syntactic_augmentation_nli = Classification('sentence1',"sentence2","gold_label",dataset_name="metaeval/syntactic-augmentation-nli")

autotnli = Classification("premises", "hypothesis", "label", dataset_name="tasksource/autotnli")
#equate = Classification("sentence1", "sentence2", "gold_label",dataset_name="metaeval/equate")

conqada = Classification("sentence1","sentence2","label",dataset_name="lasha-nlp/CONDAQA",
    pre_process = lambda ds:ds.filter(lambda x:x['label'] in {"DON'T KNOW","YES","NO"})
)

webgbpt_comparisons = MultipleChoice(get.question.full_text, choices=['answer_0','answer_1'],
    labels=lambda x:int(x['score_1']>0),
    dataset_name="openai/webgpt_comparisons")

synthetic_instruct = MultipleChoice('prompt', choices=['chosen', 'rejected'],
    labels=constant(0), dataset_name="Dahoas/synthetic-instruct-gptj-pairwise")

scruples = Classification("text",labels="binarized_label",dataset_name="metaeval/scruples")

wouldyourather = MultipleChoice(constant('Most people would rather:'), choices=['option_a','option_b'],
    labels= lambda x: int(x['votes_a']<x['votes_b']),
    dataset_name="metaeval/wouldyourather")

#attempto_nli = Classification("premise","hypothesis",
#    lambda x:f'race-{x["race_label"]}',
#    dataset_name="sileod/attempto-nli")

defeasible_nli = Classification(cat(["Premise","Hypothesis"]),"Update",labels="UpdateType",
    dataset_name="metaeval/defeasible-nli",config_name=['atomic', 'snli'])

#defeasible_nli_social = Classification(cat(["SocialChemROT","Hypothesis"]),"Update",labels="UpdateType",
#    dataset_name="metaeval/defeasible-nli",config_name='social')

help_nli = Classification("ori_sentence","new_sentence","gold_label",
    dataset_name="tasksource/help-nli")
    
nli_veridicality_transitivity = Classification("sentence1","sentence2","gold_label",
    dataset_name="metaeval/nli-veridicality-transitivity")

lonli = Classification("premise","hypothesis","label",
    dataset_name="tasksource/lonli")

dadc_limit = Classification("sentence1","sentence2","label",
    dataset_name="tasksource/dadc-limit-nli")

flute = Classification("premise","hypothesis","label",
    dataset_name="ColumbiaNLP/FLUTE")

strategy_qa = Classification('question',labels='answer',
    dataset_name="tasksource/strategy-qa",splits=['train',None,None])

summarize_from_feedback = MultipleChoice(get.info.post,
    choices_list=lambda x: [x['summaries'][0]['text'],x['summaries'][1]['text']],
    labels="choice",
    dataset_name="openai/summarize_from_feedback", config_name="comparisons",
    pre_process = lambda ds:ds.filter(lambda x: isinstance(get.info.post(x), str))
)

folio = Classification("premises","conclusion",
    labels=lambda x:{'False':'contradiction','True':'entailment', 'Uncertain':'neutral'}.get(x["label"]),
    dataset_name="tasksource/folio")

tomi_nli = Classification("premise","hypothesis","label",
    dataset_name="tasksource/tomi-nli")

avicenna = Classification("Premise 1","Premise 2","Syllogistic relation",
    dataset_name="tasksource/avicenna")

shp = MultipleChoice("history",
    choices=['human_ref_A','human_ref_B'],
    labels="labels",
    dataset_name="stanfordnlp/SHP")

medqa_usmle = MultipleChoice('sent1',choices=regen('ending[0-3]'),labels='label',
    dataset_name="GBaker/MedQA-USMLE-4-options-hf")

wikimedqa = MultipleChoice("text",choices=regen('option_[0-7]'),labels='label',
    dataset_name="sileod/wikimedqa",
    config_name=["medwiki"])

cicero = MultipleChoice(lambda x: " ".join(x['Dialogue']),
    choices_list="Choices", labels=lambda x:x['Human Written Answer'][0],
    dataset_name="declare-lab/cicero")

creak = Classification("sentence",labels="label",
    dataset_name='amydeng2000/CREAK')

mutual = MultipleChoice("article",choices_list="options",
    labels=lambda x: "ABCD".index(x['answers']),
    dataset_name="tasksource/mutual",splits=["train",None,None])

neqa = MultipleChoice('prompt',choices_list='classes',labels="answer_index",
    dataset_name="inverse-scaling/NeQA")
quote_repetition = MultipleChoice('prompt',choices_list='classes',labels="answer_index",
    dataset_name="inverse-scaling/quote-repetition")
redefine_math = MultipleChoice('prompt',choices_list='classes',labels="answer_index",
    dataset_name="inverse-scaling/redefine-math")

puzzte = Classification("puzzle_text","question","answer",
    dataset_name="tasksource/puzzte")

implicatures = MultipleChoice(cat(['context','response'],"\n"),
    choices=['correct_implicature','incorrect_implicature'],
    labels=constant(0),
    dataset_name='tasksource/implicatures')

race = MultipleChoice(cat(['question','article'],'\n'), choices_list='options',
    labels=lambda x:'ABCDE'.index(x['answer']),
    config_name=['middle','high'])

race_c = MultipleChoice(cat(['question','article'],'\n'),choices_list='option',labels='label',
    dataset_name='tasksource/race-c')

spartqa_yn=Classification("story","question","answer",
    dataset_name="tasksource/spartqa-yn")

spartqa_mc=MultipleChoice(cat(["story","question"]),choices_list="candidate_answers",labels="answer",
    dataset_name="tasksource/spartqa-mchoice")

temporal_nli = Classification("Premise","Hypothesis","Label",
    dataset_name="tasksource/temporal-nli")

riddle_sense = MultipleChoice("question", choices_list=get.choices.text, 
    labels=lambda x : "ABCDE".index(x['answerKey']))

clcd = Classification(
    "sentence1","sentence2","label",
    dataset_name="tasksource/clcd-english")

twentyquestions = Classification("question","subject","answer",dataset_name="maximedb/twentyquestions")

reclor = MultipleChoice(cat(["context","question"]),choices_list="answers",labels="label",
    dataset_name="metaeval/reclor",splits=['train','validation',None])

c_aug_imdb = Classification("Text",labels="Sentiment",
    dataset_name='tasksource/counterfactually-augmented-imdb')

c_aug_snli = Classification("sentence1","sentence2","gold_label",
    dataset_name='tasksource/counterfactually-augmented-snli')

cnli = Classification("premise","hypothesis","label",
    dataset_name='metaeval/cnli')

perturbed_boolq = Classification("question",labels="hard_label",
    dataset_name='tasksource/boolq-natural-perturbations')

#mega_acceptability = Classification("sentence",labels="average",
#    dataset_name='metaeval/mega-acceptability-v2')

graded_acceptability = Classification("text",labels="normalized_score",
    dataset_name="metaeval/acceptability-prediction")

equate = Classification("sentence1","sentence2","gold_label",
    dataset_name='metaeval/equate')

science_qa = MultipleChoice("question",choices_list="choices",labels="answer",
    dataset_name="tasksource/ScienceQA_text_only")

ekar=MultipleChoice("question",choices_list=get.choices.text,
    labels=lambda x:"ABCD".index(x['answerKey']),
    dataset_name="Jiangjie/ekar_english")

implicit_hate = Classification("post",labels="class",
    dataset_name="tasksource/implicit-hate-stg1")

nli_unambiguity = Classification("premise","hypothesis","gini",
    dataset_name="metaeval/chaos-mnli-ambiguity")

headline_cause = Classification('left_title','right_title','label',
    dataset_name='IlyaGusev/headline_cause',config_name='en_simple')

logiqa_2 = Classification("premise","hypothesis","label",dataset_name="tasksource/logiqa-2.0-nli")

_oasst = dict(dataset_name="tasksource/oasst2_dense_flat",
    pre_process = lambda ds:ds.filter(lambda x:x['lang']=='en'))

oasst1__quality = Classification("parent_text","text",labels="quality",**_oasst)
oasst1__toxicity = Classification("parent_text","text",labels="toxicity",**_oasst)
oasst1__helpfulness = Classification("parent_text","text",labels="helpfulness",**_oasst)

mindgames = Classification("premise","hypothesis","label",dataset_name="sileod/mindgames")

def _udep_post_process(ds):
    return ds.cast_column('labels', Sequence(ClassLabel(names=udep_en_labels)))

udep__deprel = TokenClassification('tokens',lambda x:[udep_en_labels.index(a) for a in x['deprel']],
    config_name=udep_en_configs,dataset_name="universal_dependencies",post_process=_udep_post_process)

ambient= Classification("premise","hypothesis","hypothesis_ambiguous",dataset_name="metaeval/ambient")

path_naturalness = MultipleChoice(constant(""),choices=['choice1','choice2'],labels="label",
    dataset_name="metaeval/path-naturalness-prediction")

civil_comments__toxicity = Classification("text",labels="toxicity")
civil_comments__severe_toxicity = Classification("text",labels="severe_toxicity")
civil_comments__obscene = Classification("text",labels="obscene")
civil_comments__threat = Classification("text",labels="threat")
civil_comments__insult = Classification("text",labels="insult")
civil_comments__identity_attack = Classification("text",labels="identity_attack")
civil_comments__sexual_explicit = Classification("text",labels="sexual_explicit")

cloth = MultipleChoice("sentence", choices_list=lambda x:[x["answer"]]+x["distractors"],labels=constant(0), dataset_name="AndyChiang/cloth")
dgen  = MultipleChoice("sentence", choices_list=lambda x:[x["answer"]]+x["distractors"],labels=constant(0), dataset_name="AndyChiang/dgen")
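
# Illustrative sketch, not part of the task registry: cloth and dgen above build
# their choice list by putting the gold answer first, which is why the label is
# the constant 0. `_example_gold_first_choices` is a hypothetical helper; whether
# the choices are shuffled later is not shown in this section.
def _example_gold_first_choices(row):
    """Return (choices, label) with the gold answer at index 0."""
    choices = [row["answer"]] + list(row["distractors"])
    return choices, 0
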

i2d2 = Classification("sentence1",labels=name('label',['False','True']), dataset_name="tasksource/I2D2")

arg_me = Classification('argument','conclusion','stance', dataset_name="webis/args_me")
valueeval_stance = Classification("Premise","Conclusion","Stance", dataset_name="webis/Touche23-ValueEval")
starcon = Classification('argument','topic','label',dataset_name="tasksource/starcon")

banking77 = Classification("text",labels="label",dataset_name="PolyAI/banking77")
    
control = Classification('premise','hypothesis',"label",dataset_name="tasksource/ConTRoL-nli")
tracie = Classification("premise","hypothesis","answer",dataset_name='tasksource/tracie')
sherliic = Classification("premise","hypothesis","label",dataset_name='tasksource/sherliic')

sen_making__1 = MultipleChoice(constant('Choose most plausible:'), choices=['sentence0','sentence1'],labels='false', 
    dataset_name="tasksource/sen-making")

sen_making__2 = MultipleChoice(lambda x: [x['sentence0'],x['sentence1']][x['false']] + '\n is not plausible because :',
    choices=['A','B','C'],labels=lambda x: 'ABC'.index(x['reason']), dataset_name="tasksource/sen-making")

winowhy = Classification('sentence', lambda x: f'In "{x["wnli_sent1"]}", {x["wnli_sent2"]}',
    labels=name('label',['False','True']), dataset_name="tasksource/winowhy")

#for CFG in "cognitive-bias", "fake-news", "gender-bias", "hate-speech", "linguistic-bias", "political-bias", "racial-bias", "text-level-bias":
#    print(f"mbib__{CFG.replace('-','_')} = Classification('text',labels=name('label',['not {CFG}','{CFG}']), dataset_name='mediabiasgroup/mbib-base', config_name='{CFG}')")

"""
mbib_cognitive_bias	= Classification('text',labels=name('label',['not cognitive-bias','cognitive-bias']), dataset_name='mediabiasgroup/mbib-base', config_name='cognitive-bias')
mbib_fake_news	= Classification('text',labels=name('label',['not fake-news','fake-news']), dataset_name='mediabiasgroup/mbib-base', config_name='fake-news')
mbib_gender_bias	= Classification('text',labels=name('label',['not gender-bias','gender-bias']), dataset_name='mediabiasgroup/mbib-base', config_name='gender-bias')
mbib_hate_speech	= Classification('text',labels=name('label',['not hate-speech','hate-speech']), dataset_name='mediabiasgroup/mbib-base', config_name='hate-speech')
mbib_linguistic_bias	= Classification('text',labels=name('label',['not linguistic-bias','linguistic-bias']), dataset_name='mediabiasgroup/mbib-base', config_name='linguistic-bias')
mbib_political_bias	= Classification('text',labels=name('label',['not political-bias','political-bias']), dataset_name='mediabiasgroup/mbib-base', config_name='political-bias')
mbib_racial_bias	= Classification('text',labels=name('label',['not racial-bias','racial-bias']), dataset_name='mediabiasgroup/mbib-base', config_name='racial-bias')
mbib_text_level_bias	= Classification('text',labels=name('label',['not text-level-bias','text-level-bias']), dataset_name='mediabiasgroup/mbib-base', config_name='text-level-bias')
"""

robustLR = Classification("context","statement","label", dataset_name="tasksource/robustLR")

cluttr = Classification("story","query", "target_text",dataset_name="CLUTRR/v1", config_name="gen_train234_test2to10")

logical_fallacy = Classification("source_article", labels="logical_fallacies", dataset_name="tasksource/logical-fallacy")

parade = Classification("Definition1","Definition2", labels=name('Binary labels',["not-paraphrase","paraphrase"]), dataset_name="tasksource/parade")

cladder = Classification("given_info", "question", "answer",dataset_name="tasksource/cladder")

subjectivity = Classification("Sentence",labels="Label",dataset_name="tasksource/subjectivity")

moh   = Classification("context","expression","label", dataset_name="tasksource/MOH")
vuac  = Classification("context","expression","label", dataset_name="tasksource/VUAC")
trofi = Classification("context","expression","label", dataset_name="tasksource/TroFi", splits=['train',None,'test'])

sharc_classification = Classification("snippet", lambda x:f'{x["scenario"]}\n{x["question"]}',
    labels=lambda x:x["answer"] if x['answer'] in  {"Yes","No","Irrelevant"} else "Clarification needed",
    dataset_name='sharc_modified',config_name='mod')

conceptrules_v2 = Classification("context", "text", "label", dataset_name="tasksource/conceptrules_v2")

scidtb = Classification("unit1_txt","unit2_txt","label", dataset_name="metaeval/disrpt",config_name='eng.dep.scidtb.rels')

chunking = TokenClassification("tokens","chunk_tags", dataset_name="conll2000")

few_nerd = TokenClassification("tokens","fine_ner_tags",dataset_name="DFKI-SLT/few-nerd",config_name='supervised')
finer = TokenClassification('tokens','ner_tags',dataset_name='nlpaueb/finer-139')

label_nli = Classification("premise","hypothesis","labels",dataset_name='tasksource/zero-shot-label-nli')

com2sense = Classification("sent",labels="label",dataset_name="tasksource/com2sense",splits=['train',"validation",None])

scone = Classification('sentence1_edited','sentence2_edited','gold_label_edited',dataset_name="tasksource/scone")

winodict = MultipleChoice(cat(['definition','sentence']),['option1','option2'],'label',dataset_name='tasksource/winodict')

fool_me_twice = Classification(
    lambda x: " ".join(a['text'] for a in x['gold_evidence']),
    'text', 'label', dataset_name='tasksource/fool-me-twice')

monli = Classification("sentence1","sentence2","gold_label", dataset_name="tasksource/monli")

causality = Classification('premise','hypothesis','relation', dataset_name='tasksource/corr2cause')

lsat = MultipleChoice(cat(['passage','question']), choices_list='references',labels='gold_index',dataset_name='lighteval/lsat_qa',config_name='all')

apt = Classification('text_a','text_b',name('labels',['not_paraphrase','paraphrase']),dataset_name='tasksource/apt')

#xsum_factuality = Classification("summary",labels="is_factual")

financial_sentiment = Classification("text",labels=name('label',['Bearish','Bullish','Neutral']),
    dataset_name="zeroshot/twitter-financial-news-sentiment")

def _icl_rand(x):
    import random
    return random.Random(x['sentence1'][:50]).randint(0,1) #deterministic label for each input

icl = Classification("inputs", lambda x: x['symbols'][_icl_rand(x)],
    labels=lambda x: str(x['symbols'][_icl_rand(x)]==x['targets']),
    dataset_name="tasksource/icl-symbol-tuning-instruct",
    pre_process=lambda ds:ds.filter(lambda x:len(x['inputs'])<500*4), # keep inputs under ~500 tokens, assuming ~4 characters per token
)
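
# Illustrative sketch, not part of the task registry: _icl_rand above seeds a
# private random.Random with a prefix of the input text, so the same input
# always gets the same 0/1 draw, however often the lambda is re-evaluated.
# `_example_seeded_coin` is a hypothetical helper that isolates this idea.
def _example_seeded_coin(text, prefix_len=50):
    """Return 0 or 1, fully determined by the first `prefix_len` characters of `text`."""
    import random
    # A dedicated Random instance leaves the global random state untouched.
    return random.Random(text[:prefix_len]).randint(0, 1)
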

space_nli = Classification("premises","hypothesis","label",dataset_name="tasksource/SpaceNLI")

propsegment = Classification("hypothesis","premise",
    labels = lambda x:{'n':'neutral','e':'entailment','c':'contradiction'}[x['label']],
    dataset_name="sihaochen/propsegment",config_name='nli')

hatemoji = Classification('text',labels=name("label_gold", ['not-hate-speech','hate-speech']),
    dataset_name="HannahRoseKirk/HatemojiBuild")

regset = Classification("context",labels="answer",dataset_name='tasksource/regset')

esci = Classification('query','product_text','esci_label',
    dataset_name="tasksource/esci",
    pre_process=lambda ds:ds.filter(lambda x:x['product_locale']=='us'))

def _preprocess_chatbot_arena(ds):
    ds=ds.filter(lambda x:x['winner'] in ["model_a","model_b"])
    ds=ds.filter(lambda x:x['language']=="English")

    def _unroll(x):
        f=lambda x:"\n".join([f"{turn['role']}:\n{turn['content']}" for turn in x])
        x['conversation_a'] = f(x['conversation_a'])
        x['conversation_b'] = f(x['conversation_b'])
        return x
    ds=ds.map(_unroll)
    return ds

chatbot_arena = MultipleChoice(constant(""),
    choices=["conversation_a","conversation_b"],
    labels=lambda x: ["model_a","model_b"].index(x["winner"]),
    dataset_name="lmsys/chatbot_arena_conversations",
    pre_process=_preprocess_chatbot_arena)
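
# Illustrative sketch, not part of the task registry: the _unroll step in
# _preprocess_chatbot_arena flattens a list of {"role", "content"} turns into
# one string so each conversation can be used as a text choice.
# `_example_flatten_turns` is a hypothetical standalone version of that step.
def _example_flatten_turns(turns):
    """Join turns as 'role:' on one line, then the content on the next."""
    return "\n".join(f"{turn['role']}:\n{turn['content']}" for turn in turns)
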

dnd_intent = Classification("examples",labels="label_names",
    dataset_name='neurae/dnd_style_intents')

fld = Classification("context","hypothesis", "proof_label",
    dataset_name="hitachi-nlp/FLD.v2",config_name="default")

flds = Classification("context","hypothesis", "proof_label",
    dataset_name="hitachi-nlp/FLD.v2",config_name="star")

sdoh_nli = Classification("premise","hypothesis",labels=lambda x:{True:"entailment",False:"not_entailment"}[x['label']],
    dataset_name="tasksource/SDOH-NLI")

scifact_entailment = Classification(lambda x:"\n".join(x["abstract"]),"claim",
    labels=lambda x:x['verdict'].replace('NEI','NEUTRAL').lower(),
    dataset_name="allenai/scifact_entailment")

feasibilityQA = Classification(cat(['knowledge','premise']),'hypothesis','binary_classification_label',
    dataset_name="tasksource/feasibilityQA")
                               
simple_pair = Classification("premise","hypothesis","label", dataset_name="tasksource/simple_pair")
adjective_scale_probe = Classification("premise","hypothesis","label", dataset_name="tasksource/AdjectiveScaleProbe-nli")
repectively_nli = Classification("premise","hypothesis","label",dataset_name="tasksource/resnli")

spartun=MultipleChoice(cat(["story","question"]),choices_list="candidate_answers",
    labels=lambda x: [c.lower() for c in x['choices_list']].index(x["answer"][0].lower()),
    pre_process=lambda ds:ds.filter(lambda x:len(x['answer'])==1),
    dataset_name="tasksource/SpaRTUN")

resq=MultipleChoice(cat(["story","question"]),choices_list="candidate_answers",
    labels=lambda x: [c.lower() for c in x['choices_list']].index(x["answer"][0].lower()),
    pre_process=lambda ds:ds.filter(lambda x:len(x['answer'])==1),
    dataset_name="tasksource/ReSQ")

semantic_fragments_nli = Classification("sentence1","sentence2","gold_label",
    dataset_name="tasksource/semantic_fragments_nli")

moritz_zs_nli = Classification('text','hypothesis','labels',
    pre_process=lambda ds:ds.filter(lambda x:x['task_name'] not in  ["mnli", "anli", "fevernli", "wanli", "lingnli"]),
    dataset_name="MoritzLaurer/dataset_train_nli"
) 

stepgame = Classification('story','question','label',dataset_name="tasksource/stepgame")

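def _nlgraph_binarize(x):
    """Map a free-text yes/no answer to "True"/"False"; "yes" is checked first."""
    a=x['answer'].lower()
    if "yes" in a: return "True"
    if "no" in a: return "False"
    raise ValueError(f"Cannot binarize answer: {x['answer']!r}")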

nlgraph = Classification('question',labels=_nlgraph_binarize,
    pre_process=lambda ds:ds.filter(lambda x:x['task'] in {"connectivity","cycle","hamilton"}),
    dataset_name="tasksource/nlgraph")

oasst_rlhf = MultipleChoice("prompt",choices=['chosen','rejected'],labels=constant(0),
    dataset_name="tasksource/oasst2_pairwise_rlhf_reward")

anthropic_rlhf_helpfulness = MultipleChoice(constant('Most helpful assistant answer:'), ['chosen','rejected'], constant(0),
    dataset_name="tasksource/hh-rlhf",config_name=["helpful-base", "helpful-online", "helpful-rejection-sampled"])

anthropic_rlhf_harmless = MultipleChoice(constant('Most harmless assistant answer:'), ['chosen','rejected'], constant(0),
    dataset_name="tasksource/hh-rlhf",config_name="harmless-base")

ruletaker = Classification(
    lambda x: 'What is not explicitly stated as true is considered false. \n' +x["context"], #closed world assumption
    "question","label",dataset_name="tasksource/ruletaker")

para_rules = Classification(
    lambda x: 'What is not explicitly stated as true is considered false. \n' +x["context"], #closed world assumption
    "question", labels=name("label",["False","True"]),
    dataset_name="qbao775/PARARULE-Plus")

proofwriter_deduction = Classification("theory","question","answer",
    dataset_name="tasksource/proofwriter") #open world assumption

logical_entailment = Classification("A","B","label",dataset_name='tasksource/logical-entailment')

nope = Classification('premise','hypothesis',
    labels=lambda x:dict(E='entailment',N='neutral',C='contradiction').get(x['label'],x['label']),
    dataset_name='tasksource/nope')

logicNLI = Classification('premise','hypothesis','label',dataset_name='tasksource/LogicNLI')

contract_nli__seg = Classification("premise","hypothesis","label", dataset_name="kiddothe2b/contract-nli",config_name="contractnli_a")

contract_nli__full = Classification("premise","hypothesis","label", dataset_name="kiddothe2b/contract-nli",config_name="contractnli_b")

nli4ct = Classification(lambda x: "\n".join(x['Primary_evidence']),'Statement',"Label",
    dataset_name="AshtonIsNotHere/nli4ct_semeval2024",splits=['train','dev',None])

lsat_ar = MultipleChoice(
    cat(['context','question']),
    choices_list='answers',labels="label",
     dataset_name="tasksource/lsat-ar")
    
lsat_rc = MultipleChoice(
    cat(['context','question']),
    choices_list='answers',labels="label",
     dataset_name="tasksource/lsat-rc")
    
biosift_nli = Classification("Abstract","Hypothesis",
    labels=lambda x: {True:"entailment",False:"not-entailment"}[bool(x['Entailment'])],
    dataset_name="AshtonIsNotHere/biosift-nli")

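def _parse_choice_list(x):
    import ast
    return ast.literal_eval(x["choice_list"]) # choices are stored as a stringified list; literal_eval avoids running arbitrary code

brainteasers = MultipleChoice("question",
    choices_list=_parse_choice_list,
    labels="label",
    dataset_name="tasksource/brainteasers",config_name=['WP','SP'])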

#GATED !
#toxigen = Classification("text",labels="toxicity_human", dataset_name="skg/toxigen-data")

persuasiveness = Classification("claim","argument",labels="persuasiveness_metric",dataset_name="Anthropic/persuasion")

#ste_wic = Classification(cat(["text_1","text_2"]),
#    lambda x:f"{x['target']} means the same thing in these texts",
#    "gold_label_binary",
#    dataset_name="cardiffnlp/super_tweeteval", config_name="tempo_wic",splits=['train','validation',None])

#ste_nerd = Classification("text",
#    lambda x:f"definition of {x['target']} here is '{x['definition']}'",
#    "gold_label_binary",
#    dataset_name="cardiffnlp/super_tweeteval", config_name="tweet_nerd",splits=['train','validation',None])
 
#ste_sim = Classification("text_1","text_2",lambda x:x['gold_score']/5,
#    dataset_name="cardiffnlp/super_tweeteval",config_name="tweet_similarity",splits=['train','validation',None])

#ste_intimacy = Classification("text_1",labels=lambda x:x['gold_score']/5,
#    dataset_name="cardiffnlp/super_tweeteval",config_name="tweet_intimacy")

#ccdv/patent-classification|abstract text label

ambigNQ = Classification("question",labels=lambda x:{True:"ambiguous", False:"not ambiguous"}.get(x["ambig"]),
    dataset_name="erbacher/AmbigNQ-clarifying-question")

siga_nli = Classification("premise","statement","label",dataset_name="tasksource/SIGA-nli")

unigram_fol = Classification("premise","hypothesis","label",dataset_name='unigram/FOL-nli')

#gs_goal = MultipleChoice("sent2",regen("ending[0-3]"),"label",
#        dataset_name="tasksource/goal-step-wikihow",config_name="goal")

#gs_step = MultipleChoice("sent2",regen("ending[0-3]"),"label",
#        dataset_name="tasksource/goal-step-wikihow",config_name="step")

gs_order = MultipleChoice("sent2",regen("ending[0-1]"),"label",
        dataset_name="tasksource/goal-step-wikihow",config_name="order")

paradise = MultipleChoice("sent2",regen("ending[0-3]"),"label",
      dataset_name="GGLab/PARADISE")

docnli = Classification("premise","hypothesis","label",dataset_name="tasksource/doc-nli")

mctest_nli = Classification("premise","hypothesis","label",dataset_name="tasksource/mctest-nli")

patent_phrase_similarity = Classification("anchor","target","label",dataset_name="tasksource/patent-phrase-similarity")

nlsat = Classification('sentence',labels='label',dataset_name="tasksource/natural-language-satisfiability")

idioms_nli = Classification('premise','hypothesis','label',dataset_name="tasksource/idioms-nli")

lifeycle_entailment = Classification("premise","hypothesis","label",dataset_name='tasksource/lifecycle-entailment')


helpsteer__helpfulness = Classification("prompt", "response", "helpfulness", dataset_name="nvidia/HelpSteer")
helpsteer__correctness = Classification("prompt", "response", "correctness", dataset_name="nvidia/HelpSteer")
helpsteer__coherence = Classification("prompt", "response", "coherence", dataset_name="nvidia/HelpSteer")
helpsteer__complexity = Classification("prompt", "response", "complexity", dataset_name="nvidia/HelpSteer")
helpsteer__verbosity = Classification("prompt", "response", "verbosity", dataset_name="nvidia/HelpSteer")

helpsteer_2__helpfulness = Classification("prompt","response","helpfulness",dataset_name="nvidia/HelpSteer2")
helpsteer_2__correctness = Classification("prompt", "response", "correctness", dataset_name="nvidia/HelpSteer2")
helpsteer_2__coherence = Classification("prompt", "response", "coherence", dataset_name="nvidia/HelpSteer2")
helpsteer_2__complexity = Classification("prompt", "response", "complexity", dataset_name="nvidia/HelpSteer2")
helpsteer_2__verbosity = Classification("prompt", "response", "verbosity", dataset_name="nvidia/HelpSteer2")

msci_nli = Classification('sentence1','sentence2','label',dataset_name='sadat2307/MSciNLI')

#lex_glue___ecthr_a = Classification(sentence1="text", labels="labels",dataset_name="coastalcph/lex_glue",config_name="ecthr_a") # too long
#lex_glue___ecthr_b = Classification(sentence1="text", labels="labels") # too long

ultrafeedback = MultipleChoice("question", choices=['response_j','response_k'],labels=constant(0), dataset_name="pushpdeep/UltraFeedback-paired")

essay_scoring = Classification("full_text",labels="score",dataset_name='tasksource/AES2-essay-scoring')

#argument_feedback = Classification("discourse_text",labels="discourse_effectiveness", dataset_name="tasksource/argument-feedback")

eg = lambda x: Classification("full_text", labels=lambda y:int(y[x]), dataset_name="tasksource/english-grading")
grading__cohesion = eg('cohesion')
grading__syntax = eg('syntax')
grading__vocabulary = eg('vocabulary')
grading__phraseology = eg('phraseology')
grading__grammar = eg('grammar')
grading__conventions = eg('conventions')

wice = Classification(lambda x: "\n".join(x['evidence']),'claim','label',
    dataset_name='tasksource/wice')

hover = Classification("evidence","claim","label",
    dataset_name="Dzeniks/hover") 

hover__nli = Classification("evidence","claim",name("label",["entailment","neutral","contradiction"]),
    dataset_name="Dzeniks/hover-3way")

tasksource_dpo = MultipleChoice("prompt",choices=['chosen','rejected'],labels=constant(0),
    dataset_name="tasksource/tasksource_dpo_pairs")

seahorse = Classification('article',cat(["summary", "question"]),'answer',
    dataset_name="tasksource/seahorse_summarization_evaluation")

mip = Classification("prompt",labels="y",
    dataset_name="sileod/missing-item-prediction",config_name="contrastive")

jigsaw_toxicity = Classification('comment_text',labels=name("toxic",["notthate","hate"]),
    dataset_name="tasksource/jigsaw_toxicity")

pol_nli = Classification("premise","hypothesis",labels=name('entailment',['entailment','not_entailment']),
    dataset_name="mlburnham/Pol_NLI")

synthetic_retrieval_nli = Classification('premise','hypothesis','label',dataset_name='tasksource/synthetic-retrieval-NLI',
    config_name=["binary","count","position"],
    pre_process=lambda ds:ds.filter(lambda x:x['n']<=2048))

issue_similarity = Classification("text1","text2","label",
    dataset_name="WhereIsAI/github-issue-similarity")

#nli_l2 = Classification("sentence1","sentence2","labels",
#    dataset_name="tasksource/merged-2l-nli")

#nli_l3 =  Classification("sentence1","sentence2","labels",
#    dataset_name="tasksource/merged-3l-nli")


================================================
FILE: src/tasksource/__init__.py
================================================
from .tasks import *
from .preprocess import *
from .access import *


================================================
FILE: src/tasksource/access.py
================================================
from .preprocess import Preprocessing
import re
import pandas as pd
from . import tasks, recast
from .metadata import dataset_rank
from datasets import load_dataset
import funcy as fc
import os
import copy
from sorcery import dict_of
from functools import cache
import random


class lazy_mtasks:
    def __getattr__(self, name):
        from . import mtasks
        return getattr(mtasks, name)

    def __dir__(self):
        from . import mtasks
        return dir(mtasks)
lmtasks=lazy_mtasks()

def parse_var_name(s):
    config_name,task_name = None,None
    if '__' in s and '___' not in s: # dataset__task
        dataset_name, task_name = s.split('__') 
    elif '__' not in s.replace('___','') and '___' in s: #dataset___config
        dataset_name, config_name = s.split('___') 
    elif  '___' in s and '__' in s.split('___')[1]: #dataset___config__task
        dataset_name, config_task=s.split('___')
        config_name,task_name = config_task.split('__')
    else: # dataset 
        dataset_name = s
    return dataset_name,config_name,task_name

def pretty_name(x):
    dn = x.dataset_name.split("/")[-1]   
    cn = x.config_name if x.config_name else ""
    tn = x.task_name if x.task_name else ""
    return f"{dn}/{cn}/{tn}".replace('//','/').rstrip('/')

@cache
def list_tasks(tasks_path=f'{os.path.dirname(__file__)}/tasks.py',multilingual=False,instruct=False, excluded=()):
    # `excluded` must be hashable (e.g. a tuple of ids): functools.cache hashes every passed argument.
    if multilingual:
        tasks_path=tasks_path.replace('/tasks.py','/mtasks.py')
    with open(tasks_path) as f:  # close the file handle once the lines are read
        task_order = f.readlines()
    task_order = [x.split('=')[0].rstrip() for x in task_order if '=' in x]
    task_order = [x for x in task_order if x.isidentifier()]
    task_order = fc.flip(dict(enumerate(task_order)))

    l = []
    _tasks = (lmtasks if multilingual else tasks)

    for key in dir(_tasks):
        if key not in task_order:
            continue
        value=getattr(_tasks, key)
        if isinstance(value,Preprocessing):
            dataset_name, config_name, task_name = parse_var_name(key)
            dataset_name = (value.dataset_name if value.dataset_name else dataset_name)
            config_name = (value.config_name if value.config_name else config_name)
            l+=[{'dataset_name': dataset_name,
                 'config_name' : config_name,
                 'task_name': task_name,
                 'preprocessing_name': key,
                'task_type': value.__class__.__name__,'mapping': value,
                'rank':task_order.get(key,None)}]   
    df=pd.DataFrame(l).explode('config_name')
    df = df.sort_values('rank').reset_index(drop=True)
    df['id'] = df.apply(lambda x: pretty_name(x), axis=1)
    df.insert(0, 'id', df.pop('id'))
    del df['rank']
    if instruct:
        df=df[df.id.map(lambda x: not any(a in x for a in recast.improper_labels))]
    df=df[df.id.map(lambda x: not any(x in a for a in excluded))]
    return df

#task_df =list_tasks()
#mtask_df =list_tasks(multilingual=True)

def dict_to_query(d=dict(), **kwargs):
    d={**d,**kwargs}
    return '&'.join([f'`{k}`=="{v}"' for k,v in d.items()])

def load_preprocessing(tasks=tasks, **kwargs):
    _tasks_df = list_tasks(multilingual=tasks==lmtasks)
    y = _tasks_df.copy().query(dict_to_query(**kwargs)).iloc[0]
    preprocessing= copy.copy(getattr(tasks, y.preprocessing_name))
    for c in 'dataset_name','config_name':
        if not isinstance(getattr(preprocessing,c), str):
             setattr(preprocessing,c,getattr(y,c))
    return preprocessing

def load_task(id=None, dataset_name=None,config_name=None,task_name=None,preprocessing_name=None,
         max



