Full Code of EleutherAI/lm-evaluation-harness for AI

3.9M tokens

4474 symbols

Repository: EleutherAI/lm-evaluation-harness
Branch: main
Commit: ee7e8f4fe58e
Files: 15734
Total size: 11.1 MB

Directory structure:
gitextract_npmvb7su/

├── .github/
│   └── workflows/
│       ├── new_tasks.yml
│       ├── publish.yml
│       └── unit_tests.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CITATION.bib
├── CODEOWNERS
├── LICENSE.md
├── MANIFEST.in
├── README.md
├── docs/
│   ├── API_guide.md
│   ├── CONTRIBUTING.md
│   ├── README.md
│   ├── chat-template-readme.md
│   ├── config_files.md
│   ├── decontamination.md
│   ├── footguns.md
│   ├── interface.md
│   ├── model_guide.md
│   ├── new_task_guide.md
│   ├── python-api.md
│   └── task_guide.md
├── examples/
│   ├── lm-eval-overview.ipynb
│   ├── transformer-lens.py
│   ├── visualize-wandb.ipynb
│   └── visualize-zeno.ipynb
├── ignore.txt
├── lm_eval/
│   ├── __init__.py
│   ├── __main__.py
│   ├── _cli/
│   │   ├── __init__.py
│   │   ├── harness.py
│   │   ├── ls.py
│   │   ├── run.py
│   │   ├── subcommand.py
│   │   ├── utils.py
│   │   └── validate.py
│   ├── api/
│   │   ├── __init__.py
│   │   ├── filter.py
│   │   ├── group.py
│   │   ├── instance.py
│   │   ├── metrics.py
│   │   ├── model.py
│   │   ├── registry.py
│   │   ├── samplers.py
│   │   ├── task.py
│   │   └── utils.py
│   ├── caching/
│   │   ├── __init__.py
│   │   └── cache.py
│   ├── config/
│   │   ├── __init__.py
│   │   ├── evaluate_config.py
│   │   ├── group.py
│   │   └── task.py
│   ├── decontamination/
│   │   ├── __init__.py
│   │   ├── archiver.py
│   │   ├── decontaminate.py
│   │   └── janitor.py
│   ├── defaults.py
│   ├── evaluator.py
│   ├── evaluator_utils.py
│   ├── filters/
│   │   ├── __init__.py
│   │   ├── custom.py
│   │   ├── decontamination.py
│   │   ├── extraction.py
│   │   ├── selection.py
│   │   └── transformation.py
│   ├── loggers/
│   │   ├── __init__.py
│   │   ├── evaluation_tracker.py
│   │   ├── utils.py
│   │   └── wandb_logger.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── anthropic_llms.py
│   │   ├── api_models.py
│   │   ├── dummy.py
│   │   ├── gguf.py
│   │   ├── hf_audiolm.py
│   │   ├── hf_steered.py
│   │   ├── hf_vlms.py
│   │   ├── huggingface.py
│   │   ├── ibm_watsonx_ai.py
│   │   ├── mamba_lm.py
│   │   ├── megatron_lm.py
│   │   ├── mistral3.py
│   │   ├── nemo_lm.py
│   │   ├── neuron_optimum.py
│   │   ├── openai_completions.py
│   │   ├── optimum_habana.py
│   │   ├── optimum_ipex.py
│   │   ├── optimum_lm.py
│   │   ├── sglang_causallms.py
│   │   ├── sglang_generate_API.py
│   │   ├── textsynth.py
│   │   ├── utils.py
│   │   ├── utils_hf.py
│   │   ├── vllm_causallms.py
│   │   ├── vllm_vlms.py
│   │   └── winml.py
│   ├── prompts/
│   │   └── __init__.py
│   ├── result_schema.py
│   ├── tasks/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── _factory.py
│   │   ├── _index.py
│   │   ├── _yaml_loader.py
│   │   ├── aclue/
│   │   │   ├── README.md
│   │   │   ├── _aclue.yaml
│   │   │   ├── _default_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── aclue_ancient_chinese_culture.yaml
│   │   │   ├── aclue_ancient_literature.yaml
│   │   │   ├── aclue_ancient_medical.yaml
│   │   │   ├── aclue_ancient_phonetics.yaml
│   │   │   ├── aclue_basic_ancient_chinese.yaml
│   │   │   ├── aclue_couplet_prediction.yaml
│   │   │   ├── aclue_homographic_character_resolution.yaml
│   │   │   ├── aclue_named_entity_recognition.yaml
│   │   │   ├── aclue_poetry_appreciate.yaml
│   │   │   ├── aclue_poetry_context_prediction.yaml
│   │   │   ├── aclue_poetry_quality_assessment.yaml
│   │   │   ├── aclue_poetry_sentiment_analysis.yaml
│   │   │   ├── aclue_polysemy_resolution.yaml
│   │   │   ├── aclue_reading_comprehension.yaml
│   │   │   └── aclue_sentence_segmentation.yaml
│   │   ├── acpbench/
│   │   │   ├── README.md
│   │   │   ├── boolq_cot_2shot/
│   │   │   │   ├── _boolq_cot_2shot_yaml
│   │   │   │   ├── act_reach.yaml
│   │   │   │   ├── app.yaml
│   │   │   │   ├── just.yaml
│   │   │   │   ├── land.yaml
│   │   │   │   ├── prog.yaml
│   │   │   │   ├── reach.yaml
│   │   │   │   └── val.yaml
│   │   │   ├── gen_2shot/
│   │   │   │   ├── _gen_yaml_2shot
│   │   │   │   ├── acp_grammar.lark
│   │   │   │   ├── acp_utils.py
│   │   │   │   ├── act_reach.yaml
│   │   │   │   ├── app.yaml
│   │   │   │   ├── just.yaml
│   │   │   │   ├── land.yaml
│   │   │   │   ├── next_act.yaml
│   │   │   │   ├── prog.yaml
│   │   │   │   ├── reach.yaml
│   │   │   │   └── val.yaml
│   │   │   ├── gen_2shot_with_pddl/
│   │   │   │   ├── _gen_yaml_2shot
│   │   │   │   ├── acp_grammar.lark
│   │   │   │   ├── acp_utils.py
│   │   │   │   ├── act_reach.yaml
│   │   │   │   ├── app.yaml
│   │   │   │   ├── just.yaml
│   │   │   │   ├── land.yaml
│   │   │   │   ├── next_act.yaml
│   │   │   │   ├── prog.yaml
│   │   │   │   ├── reach.yaml
│   │   │   │   └── val.yaml
│   │   │   └── mcq_cot_2shot/
│   │   │       ├── _mcq_cot_2shot_yaml
│   │   │       ├── act_reach.yaml
│   │   │       ├── app.yaml
│   │   │       ├── just.yaml
│   │   │       ├── land.yaml
│   │   │       ├── prog.yaml
│   │   │       ├── reach.yaml
│   │   │       └── val.yaml
│   │   ├── aexams/
│   │   │   ├── README.md
│   │   │   ├── _aexams.yaml
│   │   │   ├── _default_template_yaml
│   │   │   ├── aexams_Biology.yaml
│   │   │   ├── aexams_IslamicStudies.yaml
│   │   │   ├── aexams_Physics.yaml
│   │   │   ├── aexams_Science.yaml
│   │   │   └── aexams_Social.yaml
│   │   ├── afrimgsm/
│   │   │   ├── README.md
│   │   │   ├── direct/
│   │   │   │   ├── afrimgsm.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimgsm_amh.yaml
│   │   │   │   │   ├── afrimgsm_eng.yaml
│   │   │   │   │   ├── afrimgsm_ewe.yaml
│   │   │   │   │   ├── afrimgsm_fra.yaml
│   │   │   │   │   ├── afrimgsm_hau.yaml
│   │   │   │   │   ├── afrimgsm_ibo.yaml
│   │   │   │   │   ├── afrimgsm_kin.yaml
│   │   │   │   │   ├── afrimgsm_lin.yaml
│   │   │   │   │   ├── afrimgsm_lug.yaml
│   │   │   │   │   ├── afrimgsm_orm.yaml
│   │   │   │   │   ├── afrimgsm_sna.yaml
│   │   │   │   │   ├── afrimgsm_sot.yaml
│   │   │   │   │   ├── afrimgsm_swa.yaml
│   │   │   │   │   ├── afrimgsm_twi.yaml
│   │   │   │   │   ├── afrimgsm_vai.yaml
│   │   │   │   │   ├── afrimgsm_wol.yaml
│   │   │   │   │   ├── afrimgsm_xho.yaml
│   │   │   │   │   ├── afrimgsm_yaml
│   │   │   │   │   ├── afrimgsm_yor.yaml
│   │   │   │   │   └── afrimgsm_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimgsm_amh.yaml
│   │   │   │   │   ├── afrimgsm_eng.yaml
│   │   │   │   │   ├── afrimgsm_ewe.yaml
│   │   │   │   │   ├── afrimgsm_fra.yaml
│   │   │   │   │   ├── afrimgsm_hau.yaml
│   │   │   │   │   ├── afrimgsm_ibo.yaml
│   │   │   │   │   ├── afrimgsm_kin.yaml
│   │   │   │   │   ├── afrimgsm_lin.yaml
│   │   │   │   │   ├── afrimgsm_lug.yaml
│   │   │   │   │   ├── afrimgsm_orm.yaml
│   │   │   │   │   ├── afrimgsm_sna.yaml
│   │   │   │   │   ├── afrimgsm_sot.yaml
│   │   │   │   │   ├── afrimgsm_swa.yaml
│   │   │   │   │   ├── afrimgsm_twi.yaml
│   │   │   │   │   ├── afrimgsm_vai.yaml
│   │   │   │   │   ├── afrimgsm_wol.yaml
│   │   │   │   │   ├── afrimgsm_xho.yaml
│   │   │   │   │   ├── afrimgsm_yaml
│   │   │   │   │   ├── afrimgsm_yor.yaml
│   │   │   │   │   └── afrimgsm_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimgsm_amh.yaml
│   │   │   │   │   ├── afrimgsm_eng.yaml
│   │   │   │   │   ├── afrimgsm_ewe.yaml
│   │   │   │   │   ├── afrimgsm_fra.yaml
│   │   │   │   │   ├── afrimgsm_hau.yaml
│   │   │   │   │   ├── afrimgsm_ibo.yaml
│   │   │   │   │   ├── afrimgsm_kin.yaml
│   │   │   │   │   ├── afrimgsm_lin.yaml
│   │   │   │   │   ├── afrimgsm_lug.yaml
│   │   │   │   │   ├── afrimgsm_orm.yaml
│   │   │   │   │   ├── afrimgsm_sna.yaml
│   │   │   │   │   ├── afrimgsm_sot.yaml
│   │   │   │   │   ├── afrimgsm_swa.yaml
│   │   │   │   │   ├── afrimgsm_twi.yaml
│   │   │   │   │   ├── afrimgsm_vai.yaml
│   │   │   │   │   ├── afrimgsm_wol.yaml
│   │   │   │   │   ├── afrimgsm_xho.yaml
│   │   │   │   │   ├── afrimgsm_yaml
│   │   │   │   │   ├── afrimgsm_yor.yaml
│   │   │   │   │   └── afrimgsm_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimgsm_amh.yaml
│   │   │   │   │   ├── afrimgsm_eng.yaml
│   │   │   │   │   ├── afrimgsm_ewe.yaml
│   │   │   │   │   ├── afrimgsm_fra.yaml
│   │   │   │   │   ├── afrimgsm_hau.yaml
│   │   │   │   │   ├── afrimgsm_ibo.yaml
│   │   │   │   │   ├── afrimgsm_kin.yaml
│   │   │   │   │   ├── afrimgsm_lin.yaml
│   │   │   │   │   ├── afrimgsm_lug.yaml
│   │   │   │   │   ├── afrimgsm_orm.yaml
│   │   │   │   │   ├── afrimgsm_sna.yaml
│   │   │   │   │   ├── afrimgsm_sot.yaml
│   │   │   │   │   ├── afrimgsm_swa.yaml
│   │   │   │   │   ├── afrimgsm_twi.yaml
│   │   │   │   │   ├── afrimgsm_vai.yaml
│   │   │   │   │   ├── afrimgsm_wol.yaml
│   │   │   │   │   ├── afrimgsm_xho.yaml
│   │   │   │   │   ├── afrimgsm_yaml
│   │   │   │   │   ├── afrimgsm_yor.yaml
│   │   │   │   │   └── afrimgsm_zul.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimgsm_amh.yaml
│   │   │   │       ├── afrimgsm_eng.yaml
│   │   │   │       ├── afrimgsm_ewe.yaml
│   │   │   │       ├── afrimgsm_fra.yaml
│   │   │   │       ├── afrimgsm_hau.yaml
│   │   │   │       ├── afrimgsm_ibo.yaml
│   │   │   │       ├── afrimgsm_kin.yaml
│   │   │   │       ├── afrimgsm_lin.yaml
│   │   │   │       ├── afrimgsm_lug.yaml
│   │   │   │       ├── afrimgsm_orm.yaml
│   │   │   │       ├── afrimgsm_sna.yaml
│   │   │   │       ├── afrimgsm_sot.yaml
│   │   │   │       ├── afrimgsm_swa.yaml
│   │   │   │       ├── afrimgsm_twi.yaml
│   │   │   │       ├── afrimgsm_vai.yaml
│   │   │   │       ├── afrimgsm_wol.yaml
│   │   │   │       ├── afrimgsm_xho.yaml
│   │   │   │       ├── afrimgsm_yaml
│   │   │   │       ├── afrimgsm_yor.yaml
│   │   │   │       └── afrimgsm_zul.yaml
│   │   │   ├── direct_cot/
│   │   │   │   ├── afrimgsm_cot.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimgsm_cot_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_eng.yaml
│   │   │   │   │   ├── afrimgsm_cot_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_yaml
│   │   │   │   │   ├── afrimgsm_cot_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimgsm_cot_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_eng.yaml
│   │   │   │   │   ├── afrimgsm_cot_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_yaml
│   │   │   │   │   ├── afrimgsm_cot_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimgsm_cot_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_eng.yaml
│   │   │   │   │   ├── afrimgsm_cot_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_yaml
│   │   │   │   │   ├── afrimgsm_cot_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimgsm_cot_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_eng.yaml
│   │   │   │   │   ├── afrimgsm_cot_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_yaml
│   │   │   │   │   ├── afrimgsm_cot_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_zul.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimgsm_cot_amh.yaml
│   │   │   │       ├── afrimgsm_cot_eng.yaml
│   │   │   │       ├── afrimgsm_cot_ewe.yaml
│   │   │   │       ├── afrimgsm_cot_fra.yaml
│   │   │   │       ├── afrimgsm_cot_hau.yaml
│   │   │   │       ├── afrimgsm_cot_ibo.yaml
│   │   │   │       ├── afrimgsm_cot_kin.yaml
│   │   │   │       ├── afrimgsm_cot_lin.yaml
│   │   │   │       ├── afrimgsm_cot_lug.yaml
│   │   │   │       ├── afrimgsm_cot_orm.yaml
│   │   │   │       ├── afrimgsm_cot_sna.yaml
│   │   │   │       ├── afrimgsm_cot_sot.yaml
│   │   │   │       ├── afrimgsm_cot_swa.yaml
│   │   │   │       ├── afrimgsm_cot_twi.yaml
│   │   │   │       ├── afrimgsm_cot_vai.yaml
│   │   │   │       ├── afrimgsm_cot_wol.yaml
│   │   │   │       ├── afrimgsm_cot_xho.yaml
│   │   │   │       ├── afrimgsm_cot_yaml
│   │   │   │       ├── afrimgsm_cot_yor.yaml
│   │   │   │       └── afrimgsm_cot_zul.yaml
│   │   │   ├── gen_utils.py
│   │   │   ├── gen_yaml.sh
│   │   │   ├── run.sh
│   │   │   ├── translate/
│   │   │   │   ├── afrimgsm_tt.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimgsm_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_translate_yaml
│   │   │   │   │   ├── afrimgsm_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_translate_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimgsm_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_translate_yaml
│   │   │   │   │   ├── afrimgsm_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_translate_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimgsm_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_translate_yaml
│   │   │   │   │   ├── afrimgsm_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_translate_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimgsm_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_translate_yaml
│   │   │   │   │   ├── afrimgsm_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_translate_zul.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimgsm_translate_amh.yaml
│   │   │   │       ├── afrimgsm_translate_ewe.yaml
│   │   │   │       ├── afrimgsm_translate_fra.yaml
│   │   │   │       ├── afrimgsm_translate_hau.yaml
│   │   │   │       ├── afrimgsm_translate_ibo.yaml
│   │   │   │       ├── afrimgsm_translate_kin.yaml
│   │   │   │       ├── afrimgsm_translate_lin.yaml
│   │   │   │       ├── afrimgsm_translate_lug.yaml
│   │   │   │       ├── afrimgsm_translate_orm.yaml
│   │   │   │       ├── afrimgsm_translate_sna.yaml
│   │   │   │       ├── afrimgsm_translate_sot.yaml
│   │   │   │       ├── afrimgsm_translate_swa.yaml
│   │   │   │       ├── afrimgsm_translate_twi.yaml
│   │   │   │       ├── afrimgsm_translate_wol.yaml
│   │   │   │       ├── afrimgsm_translate_xho.yaml
│   │   │   │       ├── afrimgsm_translate_yaml
│   │   │   │       ├── afrimgsm_translate_yor.yaml
│   │   │   │       └── afrimgsm_translate_zul.yaml
│   │   │   ├── translate_cot/
│   │   │   │   ├── afrimgsm_tt_cot.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_translate_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_translate_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_translate_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_translate_zul.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │       ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │       ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │       ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │       ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │       ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │       ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │       ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │       ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │       ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │       ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │       ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │       ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │       ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │       ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │       ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │       ├── afrimgsm_cot_translate_yaml
│   │   │   │       ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │       └── afrimgsm_cot_translate_zul.yaml
│   │   │   └── utils.py
│   │   ├── afrimmlu/
│   │   │   ├── README.md
│   │   │   ├── direct/
│   │   │   │   ├── afrimmlu.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimmlu_direct
│   │   │   │   │   ├── afrimmlu_direct_amh.yaml
│   │   │   │   │   ├── afrimmlu_direct_eng.yaml
│   │   │   │   │   ├── afrimmlu_direct_ewe.yaml
│   │   │   │   │   ├── afrimmlu_direct_fra.yaml
│   │   │   │   │   ├── afrimmlu_direct_hau.yaml
│   │   │   │   │   ├── afrimmlu_direct_ibo.yaml
│   │   │   │   │   ├── afrimmlu_direct_kin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lug.yaml
│   │   │   │   │   ├── afrimmlu_direct_orm.yaml
│   │   │   │   │   ├── afrimmlu_direct_sna.yaml
│   │   │   │   │   ├── afrimmlu_direct_sot.yaml
│   │   │   │   │   ├── afrimmlu_direct_swa.yaml
│   │   │   │   │   ├── afrimmlu_direct_twi.yaml
│   │   │   │   │   ├── afrimmlu_direct_wol.yaml
│   │   │   │   │   ├── afrimmlu_direct_xho.yaml
│   │   │   │   │   ├── afrimmlu_direct_yor.yaml
│   │   │   │   │   ├── afrimmlu_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimmlu_direct
│   │   │   │   │   ├── afrimmlu_direct_amh.yaml
│   │   │   │   │   ├── afrimmlu_direct_eng.yaml
│   │   │   │   │   ├── afrimmlu_direct_ewe.yaml
│   │   │   │   │   ├── afrimmlu_direct_fra.yaml
│   │   │   │   │   ├── afrimmlu_direct_hau.yaml
│   │   │   │   │   ├── afrimmlu_direct_ibo.yaml
│   │   │   │   │   ├── afrimmlu_direct_kin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lug.yaml
│   │   │   │   │   ├── afrimmlu_direct_orm.yaml
│   │   │   │   │   ├── afrimmlu_direct_sna.yaml
│   │   │   │   │   ├── afrimmlu_direct_sot.yaml
│   │   │   │   │   ├── afrimmlu_direct_swa.yaml
│   │   │   │   │   ├── afrimmlu_direct_twi.yaml
│   │   │   │   │   ├── afrimmlu_direct_wol.yaml
│   │   │   │   │   ├── afrimmlu_direct_xho.yaml
│   │   │   │   │   ├── afrimmlu_direct_yor.yaml
│   │   │   │   │   ├── afrimmlu_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimmlu_direct
│   │   │   │   │   ├── afrimmlu_direct_amh.yaml
│   │   │   │   │   ├── afrimmlu_direct_eng.yaml
│   │   │   │   │   ├── afrimmlu_direct_ewe.yaml
│   │   │   │   │   ├── afrimmlu_direct_fra.yaml
│   │   │   │   │   ├── afrimmlu_direct_hau.yaml
│   │   │   │   │   ├── afrimmlu_direct_ibo.yaml
│   │   │   │   │   ├── afrimmlu_direct_kin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lug.yaml
│   │   │   │   │   ├── afrimmlu_direct_orm.yaml
│   │   │   │   │   ├── afrimmlu_direct_sna.yaml
│   │   │   │   │   ├── afrimmlu_direct_sot.yaml
│   │   │   │   │   ├── afrimmlu_direct_swa.yaml
│   │   │   │   │   ├── afrimmlu_direct_twi.yaml
│   │   │   │   │   ├── afrimmlu_direct_wol.yaml
│   │   │   │   │   ├── afrimmlu_direct_xho.yaml
│   │   │   │   │   ├── afrimmlu_direct_yor.yaml
│   │   │   │   │   ├── afrimmlu_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimmlu_direct
│   │   │   │   │   ├── afrimmlu_direct_amh.yaml
│   │   │   │   │   ├── afrimmlu_direct_eng.yaml
│   │   │   │   │   ├── afrimmlu_direct_ewe.yaml
│   │   │   │   │   ├── afrimmlu_direct_fra.yaml
│   │   │   │   │   ├── afrimmlu_direct_hau.yaml
│   │   │   │   │   ├── afrimmlu_direct_ibo.yaml
│   │   │   │   │   ├── afrimmlu_direct_kin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lug.yaml
│   │   │   │   │   ├── afrimmlu_direct_orm.yaml
│   │   │   │   │   ├── afrimmlu_direct_sna.yaml
│   │   │   │   │   ├── afrimmlu_direct_sot.yaml
│   │   │   │   │   ├── afrimmlu_direct_swa.yaml
│   │   │   │   │   ├── afrimmlu_direct_twi.yaml
│   │   │   │   │   ├── afrimmlu_direct_wol.yaml
│   │   │   │   │   ├── afrimmlu_direct_xho.yaml
│   │   │   │   │   ├── afrimmlu_direct_yor.yaml
│   │   │   │   │   ├── afrimmlu_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimmlu_direct
│   │   │   │       ├── afrimmlu_direct_amh.yaml
│   │   │   │       ├── afrimmlu_direct_eng.yaml
│   │   │   │       ├── afrimmlu_direct_ewe.yaml
│   │   │   │       ├── afrimmlu_direct_fra.yaml
│   │   │   │       ├── afrimmlu_direct_hau.yaml
│   │   │   │       ├── afrimmlu_direct_ibo.yaml
│   │   │   │       ├── afrimmlu_direct_kin.yaml
│   │   │   │       ├── afrimmlu_direct_lin.yaml
│   │   │   │       ├── afrimmlu_direct_lug.yaml
│   │   │   │       ├── afrimmlu_direct_orm.yaml
│   │   │   │       ├── afrimmlu_direct_sna.yaml
│   │   │   │       ├── afrimmlu_direct_sot.yaml
│   │   │   │       ├── afrimmlu_direct_swa.yaml
│   │   │   │       ├── afrimmlu_direct_twi.yaml
│   │   │   │       ├── afrimmlu_direct_wol.yaml
│   │   │   │       ├── afrimmlu_direct_xho.yaml
│   │   │   │       ├── afrimmlu_direct_yor.yaml
│   │   │   │       ├── afrimmlu_direct_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── fewshot.sh
│   │   │   ├── gen_utils.py
│   │   │   ├── translate/
│   │   │   │   ├── afrimmlu_tt.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimmlu_translate
│   │   │   │   │   ├── afrimmlu_translate_amh.yaml
│   │   │   │   │   ├── afrimmlu_translate_ewe.yaml
│   │   │   │   │   ├── afrimmlu_translate_fra.yaml
│   │   │   │   │   ├── afrimmlu_translate_hau.yaml
│   │   │   │   │   ├── afrimmlu_translate_ibo.yaml
│   │   │   │   │   ├── afrimmlu_translate_kin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lug.yaml
│   │   │   │   │   ├── afrimmlu_translate_orm.yaml
│   │   │   │   │   ├── afrimmlu_translate_sna.yaml
│   │   │   │   │   ├── afrimmlu_translate_sot.yaml
│   │   │   │   │   ├── afrimmlu_translate_swa.yaml
│   │   │   │   │   ├── afrimmlu_translate_twi.yaml
│   │   │   │   │   ├── afrimmlu_translate_wol.yaml
│   │   │   │   │   ├── afrimmlu_translate_xho.yaml
│   │   │   │   │   ├── afrimmlu_translate_yor.yaml
│   │   │   │   │   ├── afrimmlu_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimmlu_translate
│   │   │   │   │   ├── afrimmlu_translate_amh.yaml
│   │   │   │   │   ├── afrimmlu_translate_ewe.yaml
│   │   │   │   │   ├── afrimmlu_translate_fra.yaml
│   │   │   │   │   ├── afrimmlu_translate_hau.yaml
│   │   │   │   │   ├── afrimmlu_translate_ibo.yaml
│   │   │   │   │   ├── afrimmlu_translate_kin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lug.yaml
│   │   │   │   │   ├── afrimmlu_translate_orm.yaml
│   │   │   │   │   ├── afrimmlu_translate_sna.yaml
│   │   │   │   │   ├── afrimmlu_translate_sot.yaml
│   │   │   │   │   ├── afrimmlu_translate_swa.yaml
│   │   │   │   │   ├── afrimmlu_translate_twi.yaml
│   │   │   │   │   ├── afrimmlu_translate_wol.yaml
│   │   │   │   │   ├── afrimmlu_translate_xho.yaml
│   │   │   │   │   ├── afrimmlu_translate_yor.yaml
│   │   │   │   │   ├── afrimmlu_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimmlu_translate
│   │   │   │   │   ├── afrimmlu_translate_amh.yaml
│   │   │   │   │   ├── afrimmlu_translate_ewe.yaml
│   │   │   │   │   ├── afrimmlu_translate_fra.yaml
│   │   │   │   │   ├── afrimmlu_translate_hau.yaml
│   │   │   │   │   ├── afrimmlu_translate_ibo.yaml
│   │   │   │   │   ├── afrimmlu_translate_kin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lug.yaml
│   │   │   │   │   ├── afrimmlu_translate_orm.yaml
│   │   │   │   │   ├── afrimmlu_translate_sna.yaml
│   │   │   │   │   ├── afrimmlu_translate_sot.yaml
│   │   │   │   │   ├── afrimmlu_translate_swa.yaml
│   │   │   │   │   ├── afrimmlu_translate_twi.yaml
│   │   │   │   │   ├── afrimmlu_translate_wol.yaml
│   │   │   │   │   ├── afrimmlu_translate_xho.yaml
│   │   │   │   │   ├── afrimmlu_translate_yor.yaml
│   │   │   │   │   ├── afrimmlu_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimmlu_translate
│   │   │   │   │   ├── afrimmlu_translate_amh.yaml
│   │   │   │   │   ├── afrimmlu_translate_ewe.yaml
│   │   │   │   │   ├── afrimmlu_translate_fra.yaml
│   │   │   │   │   ├── afrimmlu_translate_hau.yaml
│   │   │   │   │   ├── afrimmlu_translate_ibo.yaml
│   │   │   │   │   ├── afrimmlu_translate_kin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lug.yaml
│   │   │   │   │   ├── afrimmlu_translate_orm.yaml
│   │   │   │   │   ├── afrimmlu_translate_sna.yaml
│   │   │   │   │   ├── afrimmlu_translate_sot.yaml
│   │   │   │   │   ├── afrimmlu_translate_swa.yaml
│   │   │   │   │   ├── afrimmlu_translate_twi.yaml
│   │   │   │   │   ├── afrimmlu_translate_wol.yaml
│   │   │   │   │   ├── afrimmlu_translate_xho.yaml
│   │   │   │   │   ├── afrimmlu_translate_yor.yaml
│   │   │   │   │   ├── afrimmlu_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimmlu_translate
│   │   │   │       ├── afrimmlu_translate_amh.yaml
│   │   │   │       ├── afrimmlu_translate_ewe.yaml
│   │   │   │       ├── afrimmlu_translate_fra.yaml
│   │   │   │       ├── afrimmlu_translate_hau.yaml
│   │   │   │       ├── afrimmlu_translate_ibo.yaml
│   │   │   │       ├── afrimmlu_translate_kin.yaml
│   │   │   │       ├── afrimmlu_translate_lin.yaml
│   │   │   │       ├── afrimmlu_translate_lug.yaml
│   │   │   │       ├── afrimmlu_translate_orm.yaml
│   │   │   │       ├── afrimmlu_translate_sna.yaml
│   │   │   │       ├── afrimmlu_translate_sot.yaml
│   │   │   │       ├── afrimmlu_translate_swa.yaml
│   │   │   │       ├── afrimmlu_translate_twi.yaml
│   │   │   │       ├── afrimmlu_translate_wol.yaml
│   │   │   │       ├── afrimmlu_translate_xho.yaml
│   │   │   │       ├── afrimmlu_translate_yor.yaml
│   │   │   │       ├── afrimmlu_translate_zul.yaml
│   │   │   │       └── utils.py
│   │   │   └── utils.py
│   │   ├── afrixnli/
│   │   │   ├── README.md
│   │   │   ├── anli prompt/
│   │   │   │   ├── en-direct/
│   │   │   │   │   ├── afrixnli_en_direct_amh.yaml
│   │   │   │   │   ├── afrixnli_en_direct_eng.yaml
│   │   │   │   │   ├── afrixnli_en_direct_ewe.yaml
│   │   │   │   │   ├── afrixnli_en_direct_fra.yaml
│   │   │   │   │   ├── afrixnli_en_direct_hau.yaml
│   │   │   │   │   ├── afrixnli_en_direct_ibo.yaml
│   │   │   │   │   ├── afrixnli_en_direct_kin.yaml
│   │   │   │   │   ├── afrixnli_en_direct_lin.yaml
│   │   │   │   │   ├── afrixnli_en_direct_lug.yaml
│   │   │   │   │   ├── afrixnli_en_direct_orm.yaml
│   │   │   │   │   ├── afrixnli_en_direct_sna.yaml
│   │   │   │   │   ├── afrixnli_en_direct_sot.yaml
│   │   │   │   │   ├── afrixnli_en_direct_swa.yaml
│   │   │   │   │   ├── afrixnli_en_direct_twi.yaml
│   │   │   │   │   ├── afrixnli_en_direct_wol.yaml
│   │   │   │   │   ├── afrixnli_en_direct_xho.yaml
│   │   │   │   │   ├── afrixnli_en_direct_yaml
│   │   │   │   │   ├── afrixnli_en_direct_yor.yaml
│   │   │   │   │   ├── afrixnli_en_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── native-direct/
│   │   │   │   │   ├── afrixnli_native_direct_amh.yaml
│   │   │   │   │   ├── afrixnli_native_direct_eng.yaml
│   │   │   │   │   ├── afrixnli_native_direct_ewe.yaml
│   │   │   │   │   ├── afrixnli_native_direct_fra.yaml
│   │   │   │   │   ├── afrixnli_native_direct_hau.yaml
│   │   │   │   │   ├── afrixnli_native_direct_ibo.yaml
│   │   │   │   │   ├── afrixnli_native_direct_kin.yaml
│   │   │   │   │   ├── afrixnli_native_direct_lin.yaml
│   │   │   │   │   ├── afrixnli_native_direct_lug.yaml
│   │   │   │   │   ├── afrixnli_native_direct_orm.yaml
│   │   │   │   │   ├── afrixnli_native_direct_sna.yaml
│   │   │   │   │   ├── afrixnli_native_direct_sot.yaml
│   │   │   │   │   ├── afrixnli_native_direct_swa.yaml
│   │   │   │   │   ├── afrixnli_native_direct_twi.yaml
│   │   │   │   │   ├── afrixnli_native_direct_wol.yaml
│   │   │   │   │   ├── afrixnli_native_direct_xho.yaml
│   │   │   │   │   ├── afrixnli_native_direct_yaml
│   │   │   │   │   ├── afrixnli_native_direct_yor.yaml
│   │   │   │   │   ├── afrixnli_native_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── translate/
│   │   │   │       ├── afrixnli_translate_amh.yaml
│   │   │   │       ├── afrixnli_translate_ewe.yaml
│   │   │   │       ├── afrixnli_translate_fra.yaml
│   │   │   │       ├── afrixnli_translate_hau.yaml
│   │   │   │       ├── afrixnli_translate_ibo.yaml
│   │   │   │       ├── afrixnli_translate_kin.yaml
│   │   │   │       ├── afrixnli_translate_lin.yaml
│   │   │   │       ├── afrixnli_translate_lug.yaml
│   │   │   │       ├── afrixnli_translate_orm.yaml
│   │   │   │       ├── afrixnli_translate_sna.yaml
│   │   │   │       ├── afrixnli_translate_sot.yaml
│   │   │   │       ├── afrixnli_translate_swa.yaml
│   │   │   │       ├── afrixnli_translate_twi.yaml
│   │   │   │       ├── afrixnli_translate_wol.yaml
│   │   │   │       ├── afrixnli_translate_xho.yaml
│   │   │   │       ├── afrixnli_translate_yaml
│   │   │   │       ├── afrixnli_translate_yor.yaml
│   │   │   │       ├── afrixnli_translate_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── direct/
│   │   │   │   ├── afrixnli.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrixnli_amh.yaml
│   │   │   │   │   ├── afrixnli_eng.yaml
│   │   │   │   │   ├── afrixnli_ewe.yaml
│   │   │   │   │   ├── afrixnli_fra.yaml
│   │   │   │   │   ├── afrixnli_hau.yaml
│   │   │   │   │   ├── afrixnli_ibo.yaml
│   │   │   │   │   ├── afrixnli_kin.yaml
│   │   │   │   │   ├── afrixnli_lin.yaml
│   │   │   │   │   ├── afrixnli_lug.yaml
│   │   │   │   │   ├── afrixnli_orm.yaml
│   │   │   │   │   ├── afrixnli_sna.yaml
│   │   │   │   │   ├── afrixnli_sot.yaml
│   │   │   │   │   ├── afrixnli_swa.yaml
│   │   │   │   │   ├── afrixnli_twi.yaml
│   │   │   │   │   ├── afrixnli_wol.yaml
│   │   │   │   │   ├── afrixnli_xho.yaml
│   │   │   │   │   ├── afrixnli_yaml
│   │   │   │   │   ├── afrixnli_yor.yaml
│   │   │   │   │   ├── afrixnli_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrixnli_amh.yaml
│   │   │   │   │   ├── afrixnli_eng.yaml
│   │   │   │   │   ├── afrixnli_ewe.yaml
│   │   │   │   │   ├── afrixnli_fra.yaml
│   │   │   │   │   ├── afrixnli_hau.yaml
│   │   │   │   │   ├── afrixnli_ibo.yaml
│   │   │   │   │   ├── afrixnli_kin.yaml
│   │   │   │   │   ├── afrixnli_lin.yaml
│   │   │   │   │   ├── afrixnli_lug.yaml
│   │   │   │   │   ├── afrixnli_orm.yaml
│   │   │   │   │   ├── afrixnli_sna.yaml
│   │   │   │   │   ├── afrixnli_sot.yaml
│   │   │   │   │   ├── afrixnli_swa.yaml
│   │   │   │   │   ├── afrixnli_twi.yaml
│   │   │   │   │   ├── afrixnli_wol.yaml
│   │   │   │   │   ├── afrixnli_xho.yaml
│   │   │   │   │   ├── afrixnli_yaml
│   │   │   │   │   ├── afrixnli_yor.yaml
│   │   │   │   │   ├── afrixnli_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrixnli_amh.yaml
│   │   │   │   │   ├── afrixnli_eng.yaml
│   │   │   │   │   ├── afrixnli_ewe.yaml
│   │   │   │   │   ├── afrixnli_fra.yaml
│   │   │   │   │   ├── afrixnli_hau.yaml
│   │   │   │   │   ├── afrixnli_ibo.yaml
│   │   │   │   │   ├── afrixnli_kin.yaml
│   │   │   │   │   ├── afrixnli_lin.yaml
│   │   │   │   │   ├── afrixnli_lug.yaml
│   │   │   │   │   ├── afrixnli_orm.yaml
│   │   │   │   │   ├── afrixnli_sna.yaml
│   │   │   │   │   ├── afrixnli_sot.yaml
│   │   │   │   │   ├── afrixnli_swa.yaml
│   │   │   │   │   ├── afrixnli_twi.yaml
│   │   │   │   │   ├── afrixnli_wol.yaml
│   │   │   │   │   ├── afrixnli_xho.yaml
│   │   │   │   │   ├── afrixnli_yaml
│   │   │   │   │   ├── afrixnli_yor.yaml
│   │   │   │   │   ├── afrixnli_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrixnli_amh.yaml
│   │   │   │   │   ├── afrixnli_eng.yaml
│   │   │   │   │   ├── afrixnli_ewe.yaml
│   │   │   │   │   ├── afrixnli_fra.yaml
│   │   │   │   │   ├── afrixnli_hau.yaml
│   │   │   │   │   ├── afrixnli_ibo.yaml
│   │   │   │   │   ├── afrixnli_kin.yaml
│   │   │   │   │   ├── afrixnli_lin.yaml
│   │   │   │   │   ├── afrixnli_lug.yaml
│   │   │   │   │   ├── afrixnli_orm.yaml
│   │   │   │   │   ├── afrixnli_sna.yaml
│   │   │   │   │   ├── afrixnli_sot.yaml
│   │   │   │   │   ├── afrixnli_swa.yaml
│   │   │   │   │   ├── afrixnli_twi.yaml
│   │   │   │   │   ├── afrixnli_wol.yaml
│   │   │   │   │   ├── afrixnli_xho.yaml
│   │   │   │   │   ├── afrixnli_yaml
│   │   │   │   │   ├── afrixnli_yor.yaml
│   │   │   │   │   ├── afrixnli_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrixnli_amh.yaml
│   │   │   │       ├── afrixnli_eng.yaml
│   │   │   │       ├── afrixnli_ewe.yaml
│   │   │   │       ├── afrixnli_fra.yaml
│   │   │   │       ├── afrixnli_hau.yaml
│   │   │   │       ├── afrixnli_ibo.yaml
│   │   │   │       ├── afrixnli_kin.yaml
│   │   │   │       ├── afrixnli_lin.yaml
│   │   │   │       ├── afrixnli_lug.yaml
│   │   │   │       ├── afrixnli_orm.yaml
│   │   │   │       ├── afrixnli_sna.yaml
│   │   │   │       ├── afrixnli_sot.yaml
│   │   │   │       ├── afrixnli_swa.yaml
│   │   │   │       ├── afrixnli_twi.yaml
│   │   │   │       ├── afrixnli_wol.yaml
│   │   │   │       ├── afrixnli_xho.yaml
│   │   │   │       ├── afrixnli_yaml
│   │   │   │       ├── afrixnli_yor.yaml
│   │   │   │       ├── afrixnli_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── gen_utils.py
│   │   │   ├── lai prompt/
│   │   │   │   ├── direct/
│   │   │   │   │   ├── afrixnli_manual_direct_amh.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_eng.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_ewe.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_fra.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_hau.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_ibo.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_kin.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_lin.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_lug.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_orm.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_sna.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_sot.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_swa.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_twi.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_wol.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_xho.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_yaml
│   │   │   │   │   ├── afrixnli_manual_direct_yor.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── translate/
│   │   │   │       ├── afrixnli_manual_translate_amh.yaml
│   │   │   │       ├── afrixnli_manual_translate_ewe.yaml
│   │   │   │       ├── afrixnli_manual_translate_fra.yaml
│   │   │   │       ├── afrixnli_manual_translate_hau.yaml
│   │   │   │       ├── afrixnli_manual_translate_ibo.yaml
│   │   │   │       ├── afrixnli_manual_translate_kin.yaml
│   │   │   │       ├── afrixnli_manual_translate_lin.yaml
│   │   │   │       ├── afrixnli_manual_translate_lug.yaml
│   │   │   │       ├── afrixnli_manual_translate_orm.yaml
│   │   │   │       ├── afrixnli_manual_translate_sna.yaml
│   │   │   │       ├── afrixnli_manual_translate_sot.yaml
│   │   │   │       ├── afrixnli_manual_translate_swa.yaml
│   │   │   │       ├── afrixnli_manual_translate_twi.yaml
│   │   │   │       ├── afrixnli_manual_translate_wol.yaml
│   │   │   │       ├── afrixnli_manual_translate_xho.yaml
│   │   │   │       ├── afrixnli_manual_translate_yaml
│   │   │   │       ├── afrixnli_manual_translate_yor.yaml
│   │   │   │       ├── afrixnli_manual_translate_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── translate/
│   │   │   │   ├── afrixnli_tt.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrixnli_translate_amh.yaml
│   │   │   │   │   ├── afrixnli_translate_ewe.yaml
│   │   │   │   │   ├── afrixnli_translate_fra.yaml
│   │   │   │   │   ├── afrixnli_translate_hau.yaml
│   │   │   │   │   ├── afrixnli_translate_ibo.yaml
│   │   │   │   │   ├── afrixnli_translate_kin.yaml
│   │   │   │   │   ├── afrixnli_translate_lin.yaml
│   │   │   │   │   ├── afrixnli_translate_lug.yaml
│   │   │   │   │   ├── afrixnli_translate_orm.yaml
│   │   │   │   │   ├── afrixnli_translate_sna.yaml
│   │   │   │   │   ├── afrixnli_translate_sot.yaml
│   │   │   │   │   ├── afrixnli_translate_swa.yaml
│   │   │   │   │   ├── afrixnli_translate_twi.yaml
│   │   │   │   │   ├── afrixnli_translate_wol.yaml
│   │   │   │   │   ├── afrixnli_translate_xho.yaml
│   │   │   │   │   ├── afrixnli_translate_yaml
│   │   │   │   │   ├── afrixnli_translate_yor.yaml
│   │   │   │   │   ├── afrixnli_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrixnli_translate_amh.yaml
│   │   │   │   │   ├── afrixnli_translate_ewe.yaml
│   │   │   │   │   ├── afrixnli_translate_fra.yaml
│   │   │   │   │   ├── afrixnli_translate_hau.yaml
│   │   │   │   │   ├── afrixnli_translate_ibo.yaml
│   │   │   │   │   ├── afrixnli_translate_kin.yaml
│   │   │   │   │   ├── afrixnli_translate_lin.yaml
│   │   │   │   │   ├── afrixnli_translate_lug.yaml
│   │   │   │   │   ├── afrixnli_translate_orm.yaml
│   │   │   │   │   ├── afrixnli_translate_sna.yaml
│   │   │   │   │   ├── afrixnli_translate_sot.yaml
│   │   │   │   │   ├── afrixnli_translate_swa.yaml
│   │   │   │   │   ├── afrixnli_translate_twi.yaml
│   │   │   │   │   ├── afrixnli_translate_wol.yaml
│   │   │   │   │   ├── afrixnli_translate_xho.yaml
│   │   │   │   │   ├── afrixnli_translate_yaml
│   │   │   │   │   ├── afrixnli_translate_yor.yaml
│   │   │   │   │   ├── afrixnli_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrixnli_translate_amh.yaml
│   │   │   │   │   ├── afrixnli_translate_ewe.yaml
│   │   │   │   │   ├── afrixnli_translate_fra.yaml
│   │   │   │   │   ├── afrixnli_translate_hau.yaml
│   │   │   │   │   ├── afrixnli_translate_ibo.yaml
│   │   │   │   │   ├── afrixnli_translate_kin.yaml
│   │   │   │   │   ├── afrixnli_translate_lin.yaml
│   │   │   │   │   ├── afrixnli_translate_lug.yaml
│   │   │   │   │   ├── afrixnli_translate_orm.yaml
│   │   │   │   │   ├── afrixnli_translate_sna.yaml
│   │   │   │   │   ├── afrixnli_translate_sot.yaml
│   │   │   │   │   ├── afrixnli_translate_swa.yaml
│   │   │   │   │   ├── afrixnli_translate_twi.yaml
│   │   │   │   │   ├── afrixnli_translate_wol.yaml
│   │   │   │   │   ├── afrixnli_translate_xho.yaml
│   │   │   │   │   ├── afrixnli_translate_yaml
│   │   │   │   │   ├── afrixnli_translate_yor.yaml
│   │   │   │   │   ├── afrixnli_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrixnli_translate_amh.yaml
│   │   │   │   │   ├── afrixnli_translate_ewe.yaml
│   │   │   │   │   ├── afrixnli_translate_fra.yaml
│   │   │   │   │   ├── afrixnli_translate_hau.yaml
│   │   │   │   │   ├── afrixnli_translate_ibo.yaml
│   │   │   │   │   ├── afrixnli_translate_kin.yaml
│   │   │   │   │   ├── afrixnli_translate_lin.yaml
│   │   │   │   │   ├── afrixnli_translate_lug.yaml
│   │   │   │   │   ├── afrixnli_translate_orm.yaml
│   │   │   │   │   ├── afrixnli_translate_sna.yaml
│   │   │   │   │   ├── afrixnli_translate_sot.yaml
│   │   │   │   │   ├── afrixnli_translate_swa.yaml
│   │   │   │   │   ├── afrixnli_translate_twi.yaml
│   │   │   │   │   ├── afrixnli_translate_wol.yaml
│   │   │   │   │   ├── afrixnli_translate_xho.yaml
│   │   │   │   │   ├── afrixnli_translate_yaml
│   │   │   │   │   ├── afrixnli_translate_yor.yaml
│   │   │   │   │   ├── afrixnli_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrixnli_translate_amh.yaml
│   │   │   │       ├── afrixnli_translate_ewe.yaml
│   │   │   │       ├── afrixnli_translate_fra.yaml
│   │   │   │       ├── afrixnli_translate_hau.yaml
│   │   │   │       ├── afrixnli_translate_ibo.yaml
│   │   │   │       ├── afrixnli_translate_kin.yaml
│   │   │   │       ├── afrixnli_translate_lin.yaml
│   │   │   │       ├── afrixnli_translate_lug.yaml
│   │   │   │       ├── afrixnli_translate_orm.yaml
│   │   │   │       ├── afrixnli_translate_sna.yaml
│   │   │   │       ├── afrixnli_translate_sot.yaml
│   │   │   │       ├── afrixnli_translate_swa.yaml
│   │   │   │       ├── afrixnli_translate_twi.yaml
│   │   │   │       ├── afrixnli_translate_wol.yaml
│   │   │   │       ├── afrixnli_translate_xho.yaml
│   │   │   │       ├── afrixnli_translate_yaml
│   │   │   │       ├── afrixnli_translate_yor.yaml
│   │   │   │       ├── afrixnli_translate_zul.yaml
│   │   │   │       └── utils.py
│   │   │   └── utils.py
│   │   ├── afrobench/
│   │   │   ├── README.md
│   │   │   ├── adr/
│   │   │   │   ├── README.md
│   │   │   │   ├── afridiacritics.yaml
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afridiacritics_bbj.yaml
│   │   │   │   │   ├── afridiacritics_fon.yaml
│   │   │   │   │   ├── afridiacritics_ibo.yaml
│   │   │   │   │   ├── afridiacritics_wol.yaml
│   │   │   │   │   ├── afridiacritics_yaml
│   │   │   │   │   └── afridiacritics_yor.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afridiacritics_bbj.yaml
│   │   │   │   │   ├── afridiacritics_fon.yaml
│   │   │   │   │   ├── afridiacritics_ibo.yaml
│   │   │   │   │   ├── afridiacritics_wol.yaml
│   │   │   │   │   ├── afridiacritics_yaml
│   │   │   │   │   └── afridiacritics_yor.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afridiacritics_bbj.yaml
│   │   │   │   │   ├── afridiacritics_fon.yaml
│   │   │   │   │   ├── afridiacritics_ibo.yaml
│   │   │   │   │   ├── afridiacritics_wol.yaml
│   │   │   │   │   ├── afridiacritics_yaml
│   │   │   │   │   └── afridiacritics_yor.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afridiacritics_bbj.yaml
│   │   │   │   │   ├── afridiacritics_fon.yaml
│   │   │   │   │   ├── afridiacritics_ibo.yaml
│   │   │   │   │   ├── afridiacritics_wol.yaml
│   │   │   │   │   ├── afridiacritics_yaml
│   │   │   │   │   └── afridiacritics_yor.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afridiacritics_bbj.yaml
│   │   │   │       ├── afridiacritics_fon.yaml
│   │   │   │       ├── afridiacritics_ibo.yaml
│   │   │   │       ├── afridiacritics_wol.yaml
│   │   │   │       ├── afridiacritics_yaml
│   │   │   │       └── afridiacritics_yor.yaml
│   │   │   ├── afriqa/
│   │   │   │   ├── README.md
│   │   │   │   ├── afriqa.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── utils.py
│   │   │   ├── afrisenti/
│   │   │   │   ├── README.md
│   │   │   │   ├── afrisenti.yaml
│   │   │   │   ├── fewshot.sh
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── run.sh
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── run.sh
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   └── utils.py
│   │   │   ├── afrobench-lite.yaml
│   │   │   ├── afrobench.yaml
│   │   │   ├── belebele/
│   │   │   │   ├── README.md
│   │   │   │   ├── belebele.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   └── utils.py
│   │   │   ├── flores/
│   │   │   │   ├── README.md
│   │   │   │   ├── flores.yaml
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── flores
│   │   │   │   │   │   ├── flores_ace_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ace_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_acq_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_aeb_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_afr_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_aka_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_amh_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ary_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_arz_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_bam_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ban_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_bem_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_cjk_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_dik_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_dyu_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ewe_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fon_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fra_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fuv_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_gaz_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_hau_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ibo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kab_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kam_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kbp_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kea_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kik_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kmb_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_knc_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_knc_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kon_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lua_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lug_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_luo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_mos_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nus_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nya_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_plt_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_run_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sag_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sna_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_som_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sot_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ssw_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sun_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_swh_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_taq_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_taq_Tfng-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tir_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tsn_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tum_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_twi_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tzm_Tfng-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_umb_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_wol_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_xho_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_yor_Latn-eng_Latn.yaml
│   │   │   │   │   │   └── flores_zul_Latn-eng_Latn.yaml
│   │   │   │   │   ├── english-african/
│   │   │   │   │   │   ├── flores
│   │   │   │   │   │   ├── flores_eng_Latn-ace_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ace_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-acq_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-aeb_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-afr_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-aka_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-amh_Ethi.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ary_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-arz_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-bam_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ban_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-bem_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-cjk_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-dik_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-dyu_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ewe_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fon_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fra_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fuv_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-gaz_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-hau_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ibo_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kab_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kam_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kbp_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kea_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kik_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kin_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kmb_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-knc_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-knc_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kon_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lin_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lua_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lug_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-luo_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-mos_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nso_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nus_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nya_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-plt_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-run_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sag_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sna_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-som_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sot_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ssw_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sun_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-swh_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-taq_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-taq_Tfng.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tir_Ethi.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tsn_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tso_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tum_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-twi_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tzm_Tfng.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-umb_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-wol_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-xho_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-yor_Latn.yaml
│   │   │   │   │   │   └── flores_eng_Latn-zul_Latn.yaml
│   │   │   │   │   └── flores
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── flores
│   │   │   │   │   │   ├── flores_ace_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ace_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_acq_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_aeb_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_afr_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_aka_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_amh_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ary_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_arz_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_bam_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ban_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_bem_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_cjk_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_dik_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_dyu_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ewe_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fon_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fra_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fuv_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_gaz_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_hau_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ibo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kab_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kam_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kbp_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kea_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kik_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kmb_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_knc_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_knc_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kon_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lua_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lug_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_luo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_mos_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nus_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nya_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_plt_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_run_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sag_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sna_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_som_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sot_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ssw_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sun_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_swh_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_taq_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_taq_Tfng-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tir_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tsn_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tum_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_twi_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tzm_Tfng-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_umb_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_wol_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_xho_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_yor_Latn-eng_Latn.yaml
│   │   │   │   │   │   └── flores_zul_Latn-eng_Latn.yaml
│   │   │   │   │   ├── english-african/
│   │   │   │   │   │   ├── flores
│   │   │   │   │   │   ├── flores_eng_Latn-ace_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ace_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-acq_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-aeb_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-afr_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-aka_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-amh_Ethi.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ary_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-arz_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-bam_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ban_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-bem_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-cjk_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-dik_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-dyu_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ewe_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fon_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fra_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fuv_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-gaz_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-hau_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ibo_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kab_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kam_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kbp_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kea_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kik_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kin_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kmb_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-knc_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-knc_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kon_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lin_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lua_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lug_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-luo_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-mos_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nso_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nus_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nya_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-plt_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-run_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sag_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sna_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-som_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sot_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ssw_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sun_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-swh_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-taq_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-taq_Tfng.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tir_Ethi.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tsn_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tso_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tum_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-twi_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tzm_Tfng.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-umb_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-wol_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-xho_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-yor_Latn.yaml
│   │   │   │   │   │   └── flores_eng_Latn-zul_Latn.yaml
│   │   │   │   │   └── flores
│   │   │   │   └── prompt_3/
│   │   │   │       ├── african-english/
│   │   │   │       │   ├── flores
│   │   │   │       │   ├── flores_ace_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_ace_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_acq_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_aeb_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_afr_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_aka_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_amh_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── flores_ary_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_arz_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_bam_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_ban_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_bem_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_cjk_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_dik_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_dyu_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_ewe_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_fon_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_fra_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_fuv_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_gaz_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_hau_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_ibo_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kab_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kam_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kbp_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kea_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kik_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kin_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kmb_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_knc_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_knc_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kon_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_lin_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_lua_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_lug_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_luo_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_mos_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_nso_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_nus_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_nya_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_plt_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_run_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_sag_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_sna_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_som_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_sot_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_ssw_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_sun_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_swh_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_taq_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_taq_Tfng-eng_Latn.yaml
│   │   │   │       │   ├── flores_tir_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── flores_tsn_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_tso_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_tum_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_twi_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_tzm_Tfng-eng_Latn.yaml
│   │   │   │       │   ├── flores_umb_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_wol_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_xho_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_yor_Latn-eng_Latn.yaml
│   │   │   │       │   └── flores_zul_Latn-eng_Latn.yaml
│   │   │   │       ├── english-african/
│   │   │   │       │   ├── flores
│   │   │   │       │   ├── flores_eng_Latn-ace_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-ace_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-acq_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-aeb_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-afr_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-aka_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-amh_Ethi.yaml
│   │   │   │       │   ├── flores_eng_Latn-ary_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-arz_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-bam_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-ban_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-bem_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-cjk_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-dik_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-dyu_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-ewe_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-fon_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-fra_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-fuv_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-gaz_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-hau_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-ibo_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kab_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kam_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kbp_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kea_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kik_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kin_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kmb_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-knc_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-knc_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kon_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-lin_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-lua_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-lug_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-luo_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-mos_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-nso_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-nus_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-nya_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-plt_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-run_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-sag_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-sna_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-som_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-sot_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-ssw_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-sun_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-swh_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-taq_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-taq_Tfng.yaml
│   │   │   │       │   ├── flores_eng_Latn-tir_Ethi.yaml
│   │   │   │       │   ├── flores_eng_Latn-tsn_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-tso_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-tum_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-twi_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-tzm_Tfng.yaml
│   │   │   │       │   ├── flores_eng_Latn-umb_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-wol_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-xho_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-yor_Latn.yaml
│   │   │   │       │   └── flores_eng_Latn-zul_Latn.yaml
│   │   │   │       └── flores
│   │   │   ├── injongointent/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── injongointent.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── injongointent
│   │   │   │   │   ├── injongointent_amh.yaml
│   │   │   │   │   ├── injongointent_eng.yaml
│   │   │   │   │   ├── injongointent_ewe.yaml
│   │   │   │   │   ├── injongointent_hau.yaml
│   │   │   │   │   ├── injongointent_ibo.yaml
│   │   │   │   │   ├── injongointent_kin.yaml
│   │   │   │   │   ├── injongointent_lin.yaml
│   │   │   │   │   ├── injongointent_lug.yaml
│   │   │   │   │   ├── injongointent_orm.yaml
│   │   │   │   │   ├── injongointent_sna.yaml
│   │   │   │   │   ├── injongointent_sot.yaml
│   │   │   │   │   ├── injongointent_swa.yaml
│   │   │   │   │   ├── injongointent_twi.yaml
│   │   │   │   │   ├── injongointent_wol.yaml
│   │   │   │   │   ├── injongointent_xho.yaml
│   │   │   │   │   ├── injongointent_yor.yaml
│   │   │   │   │   ├── injongointent_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── injongointent
│   │   │   │   │   ├── injongointent_amh.yaml
│   │   │   │   │   ├── injongointent_eng.yaml
│   │   │   │   │   ├── injongointent_ewe.yaml
│   │   │   │   │   ├── injongointent_hau.yaml
│   │   │   │   │   ├── injongointent_ibo.yaml
│   │   │   │   │   ├── injongointent_kin.yaml
│   │   │   │   │   ├── injongointent_lin.yaml
│   │   │   │   │   ├── injongointent_lug.yaml
│   │   │   │   │   ├── injongointent_orm.yaml
│   │   │   │   │   ├── injongointent_sna.yaml
│   │   │   │   │   ├── injongointent_sot.yaml
│   │   │   │   │   ├── injongointent_swa.yaml
│   │   │   │   │   ├── injongointent_twi.yaml
│   │   │   │   │   ├── injongointent_wol.yaml
│   │   │   │   │   ├── injongointent_xho.yaml
│   │   │   │   │   ├── injongointent_yor.yaml
│   │   │   │   │   ├── injongointent_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── injongointent
│   │   │   │   │   ├── injongointent_amh.yaml
│   │   │   │   │   ├── injongointent_eng.yaml
│   │   │   │   │   ├── injongointent_ewe.yaml
│   │   │   │   │   ├── injongointent_hau.yaml
│   │   │   │   │   ├── injongointent_ibo.yaml
│   │   │   │   │   ├── injongointent_kin.yaml
│   │   │   │   │   ├── injongointent_lin.yaml
│   │   │   │   │   ├── injongointent_lug.yaml
│   │   │   │   │   ├── injongointent_orm.yaml
│   │   │   │   │   ├── injongointent_sna.yaml
│   │   │   │   │   ├── injongointent_sot.yaml
│   │   │   │   │   ├── injongointent_swa.yaml
│   │   │   │   │   ├── injongointent_twi.yaml
│   │   │   │   │   ├── injongointent_wol.yaml
│   │   │   │   │   ├── injongointent_xho.yaml
│   │   │   │   │   ├── injongointent_yor.yaml
│   │   │   │   │   ├── injongointent_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── injongointent
│   │   │   │   │   ├── injongointent_amh.yaml
│   │   │   │   │   ├── injongointent_eng.yaml
│   │   │   │   │   ├── injongointent_ewe.yaml
│   │   │   │   │   ├── injongointent_hau.yaml
│   │   │   │   │   ├── injongointent_ibo.yaml
│   │   │   │   │   ├── injongointent_kin.yaml
│   │   │   │   │   ├── injongointent_lin.yaml
│   │   │   │   │   ├── injongointent_lug.yaml
│   │   │   │   │   ├── injongointent_orm.yaml
│   │   │   │   │   ├── injongointent_sna.yaml
│   │   │   │   │   ├── injongointent_sot.yaml
│   │   │   │   │   ├── injongointent_swa.yaml
│   │   │   │   │   ├── injongointent_twi.yaml
│   │   │   │   │   ├── injongointent_wol.yaml
│   │   │   │   │   ├── injongointent_xho.yaml
│   │   │   │   │   ├── injongointent_yor.yaml
│   │   │   │   │   ├── injongointent_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── injongointent
│   │   │   │       ├── injongointent_amh.yaml
│   │   │   │       ├── injongointent_eng.yaml
│   │   │   │       ├── injongointent_ewe.yaml
│   │   │   │       ├── injongointent_hau.yaml
│   │   │   │       ├── injongointent_ibo.yaml
│   │   │   │       ├── injongointent_kin.yaml
│   │   │   │       ├── injongointent_lin.yaml
│   │   │   │       ├── injongointent_lug.yaml
│   │   │   │       ├── injongointent_orm.yaml
│   │   │   │       ├── injongointent_sna.yaml
│   │   │   │       ├── injongointent_sot.yaml
│   │   │   │       ├── injongointent_swa.yaml
│   │   │   │       ├── injongointent_twi.yaml
│   │   │   │       ├── injongointent_wol.yaml
│   │   │   │       ├── injongointent_xho.yaml
│   │   │   │       ├── injongointent_yor.yaml
│   │   │   │       ├── injongointent_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── mafand/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── mafand.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── mafand
│   │   │   │   │   │   ├── mafand_amh-en.yaml
│   │   │   │   │   │   ├── mafand_bam-fr.yaml
│   │   │   │   │   │   ├── mafand_bbj-fr.yaml
│   │   │   │   │   │   ├── mafand_ewe-fr.yaml
│   │   │   │   │   │   ├── mafand_fon-fr.yaml
│   │   │   │   │   │   ├── mafand_hau-en.yaml
│   │   │   │   │   │   ├── mafand_ibo-en.yaml
│   │   │   │   │   │   ├── mafand_kin-en.yaml
│   │   │   │   │   │   ├── mafand_lug-en.yaml
│   │   │   │   │   │   ├── mafand_luo-en.yaml
│   │   │   │   │   │   ├── mafand_mos-fr.yaml
│   │   │   │   │   │   ├── mafand_nya-en.yaml
│   │   │   │   │   │   ├── mafand_pcm-en.yaml
│   │   │   │   │   │   ├── mafand_sna-en.yaml
│   │   │   │   │   │   ├── mafand_swa-en.yaml
│   │   │   │   │   │   ├── mafand_tsn-en.yaml
│   │   │   │   │   │   ├── mafand_twi-en.yaml
│   │   │   │   │   │   ├── mafand_wol-fr.yaml
│   │   │   │   │   │   ├── mafand_xho-en.yaml
│   │   │   │   │   │   ├── mafand_yor-en.yaml
│   │   │   │   │   │   ├── mafand_zul-en.yaml
│   │   │   │   │   │   └── utils.py
│   │   │   │   │   └── english-african/
│   │   │   │   │       ├── mafand
│   │   │   │   │       ├── mafand_en-amh.yaml
│   │   │   │   │       ├── mafand_en-hau.yaml
│   │   │   │   │       ├── mafand_en-ibo.yaml
│   │   │   │   │       ├── mafand_en-kin.yaml
│   │   │   │   │       ├── mafand_en-lug.yaml
│   │   │   │   │       ├── mafand_en-luo.yaml
│   │   │   │   │       ├── mafand_en-nya.yaml
│   │   │   │   │       ├── mafand_en-pcm.yaml
│   │   │   │   │       ├── mafand_en-sna.yaml
│   │   │   │   │       ├── mafand_en-swa.yaml
│   │   │   │   │       ├── mafand_en-tsn.yaml
│   │   │   │   │       ├── mafand_en-twi.yaml
│   │   │   │   │       ├── mafand_en-xho.yaml
│   │   │   │   │       ├── mafand_en-yor.yaml
│   │   │   │   │       ├── mafand_en-zul.yaml
│   │   │   │   │       ├── mafand_fr-bam.yaml
│   │   │   │   │       ├── mafand_fr-bbj.yaml
│   │   │   │   │       ├── mafand_fr-ewe.yaml
│   │   │   │   │       ├── mafand_fr-fon.yaml
│   │   │   │   │       ├── mafand_fr-mos.yaml
│   │   │   │   │       ├── mafand_fr-wol.yaml
│   │   │   │   │       └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── mafand
│   │   │   │   │   │   ├── mafand_amh-en.yaml
│   │   │   │   │   │   ├── mafand_bam-fr.yaml
│   │   │   │   │   │   ├── mafand_bbj-fr.yaml
│   │   │   │   │   │   ├── mafand_ewe-fr.yaml
│   │   │   │   │   │   ├── mafand_fon-fr.yaml
│   │   │   │   │   │   ├── mafand_hau-en.yaml
│   │   │   │   │   │   ├── mafand_ibo-en.yaml
│   │   │   │   │   │   ├── mafand_kin-en.yaml
│   │   │   │   │   │   ├── mafand_lug-en.yaml
│   │   │   │   │   │   ├── mafand_luo-en.yaml
│   │   │   │   │   │   ├── mafand_mos-fr.yaml
│   │   │   │   │   │   ├── mafand_nya-en.yaml
│   │   │   │   │   │   ├── mafand_pcm-en.yaml
│   │   │   │   │   │   ├── mafand_sna-en.yaml
│   │   │   │   │   │   ├── mafand_swa-en.yaml
│   │   │   │   │   │   ├── mafand_tsn-en.yaml
│   │   │   │   │   │   ├── mafand_twi-en.yaml
│   │   │   │   │   │   ├── mafand_wol-fr.yaml
│   │   │   │   │   │   ├── mafand_xho-en.yaml
│   │   │   │   │   │   ├── mafand_yor-en.yaml
│   │   │   │   │   │   ├── mafand_zul-en.yaml
│   │   │   │   │   │   └── utils.py
│   │   │   │   │   └── english-african/
│   │   │   │   │       ├── mafand
│   │   │   │   │       ├── mafand_en-amh.yaml
│   │   │   │   │       ├── mafand_en-hau.yaml
│   │   │   │   │       ├── mafand_en-ibo.yaml
│   │   │   │   │       ├── mafand_en-kin.yaml
│   │   │   │   │       ├── mafand_en-lug.yaml
│   │   │   │   │       ├── mafand_en-luo.yaml
│   │   │   │   │       ├── mafand_en-nya.yaml
│   │   │   │   │       ├── mafand_en-pcm.yaml
│   │   │   │   │       ├── mafand_en-sna.yaml
│   │   │   │   │       ├── mafand_en-swa.yaml
│   │   │   │   │       ├── mafand_en-tsn.yaml
│   │   │   │   │       ├── mafand_en-twi.yaml
│   │   │   │   │       ├── mafand_en-xho.yaml
│   │   │   │   │       ├── mafand_en-yor.yaml
│   │   │   │   │       ├── mafand_en-zul.yaml
│   │   │   │   │       ├── mafand_fr-bam.yaml
│   │   │   │   │       ├── mafand_fr-bbj.yaml
│   │   │   │   │       ├── mafand_fr-ewe.yaml
│   │   │   │   │       ├── mafand_fr-fon.yaml
│   │   │   │   │       ├── mafand_fr-mos.yaml
│   │   │   │   │       ├── mafand_fr-wol.yaml
│   │   │   │   │       └── utils.py
│   │   │   │   └── prompt_3/
│   │   │   │       ├── african-english/
│   │   │   │       │   ├── mafand
│   │   │   │       │   ├── mafand_amh-en.yaml
│   │   │   │       │   ├── mafand_bam-fr.yaml
│   │   │   │       │   ├── mafand_bbj-fr.yaml
│   │   │   │       │   ├── mafand_ewe-fr.yaml
│   │   │   │       │   ├── mafand_fon-fr.yaml
│   │   │   │       │   ├── mafand_hau-en.yaml
│   │   │   │       │   ├── mafand_ibo-en.yaml
│   │   │   │       │   ├── mafand_kin-en.yaml
│   │   │   │       │   ├── mafand_lug-en.yaml
│   │   │   │       │   ├── mafand_luo-en.yaml
│   │   │   │       │   ├── mafand_mos-fr.yaml
│   │   │   │       │   ├── mafand_nya-en.yaml
│   │   │   │       │   ├── mafand_pcm-en.yaml
│   │   │   │       │   ├── mafand_sna-en.yaml
│   │   │   │       │   ├── mafand_swa-en.yaml
│   │   │   │       │   ├── mafand_tsn-en.yaml
│   │   │   │       │   ├── mafand_twi-en.yaml
│   │   │   │       │   ├── mafand_wol-fr.yaml
│   │   │   │       │   ├── mafand_xho-en.yaml
│   │   │   │       │   ├── mafand_yor-en.yaml
│   │   │   │       │   ├── mafand_zul-en.yaml
│   │   │   │       │   └── utils.py
│   │   │   │       └── english-african/
│   │   │   │           ├── mafand
│   │   │   │           ├── mafand_en-amh.yaml
│   │   │   │           ├── mafand_en-hau.yaml
│   │   │   │           ├── mafand_en-ibo.yaml
│   │   │   │           ├── mafand_en-kin.yaml
│   │   │   │           ├── mafand_en-lug.yaml
│   │   │   │           ├── mafand_en-luo.yaml
│   │   │   │           ├── mafand_en-nya.yaml
│   │   │   │           ├── mafand_en-pcm.yaml
│   │   │   │           ├── mafand_en-sna.yaml
│   │   │   │           ├── mafand_en-swa.yaml
│   │   │   │           ├── mafand_en-tsn.yaml
│   │   │   │           ├── mafand_en-twi.yaml
│   │   │   │           ├── mafand_en-xho.yaml
│   │   │   │           ├── mafand_en-yor.yaml
│   │   │   │           ├── mafand_en-zul.yaml
│   │   │   │           ├── mafand_fr-bam.yaml
│   │   │   │           ├── mafand_fr-bbj.yaml
│   │   │   │           ├── mafand_fr-ewe.yaml
│   │   │   │           ├── mafand_fr-fon.yaml
│   │   │   │           ├── mafand_fr-mos.yaml
│   │   │   │           ├── mafand_fr-wol.yaml
│   │   │   │           └── utils.py
│   │   │   ├── masakhaner/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── masakhaner.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── masakhaner
│   │   │   │   │   ├── masakhaner_am.yaml
│   │   │   │   │   ├── masakhaner_bbj.yaml
│   │   │   │   │   ├── masakhaner_bm.yaml
│   │   │   │   │   ├── masakhaner_ee.yaml
│   │   │   │   │   ├── masakhaner_ha.yaml
│   │   │   │   │   ├── masakhaner_ig.yaml
│   │   │   │   │   ├── masakhaner_lg.yaml
│   │   │   │   │   ├── masakhaner_luo.yaml
│   │   │   │   │   ├── masakhaner_mos.yaml
│   │   │   │   │   ├── masakhaner_ny.yaml
│   │   │   │   │   ├── masakhaner_pcm.yaml
│   │   │   │   │   ├── masakhaner_rw.yaml
│   │   │   │   │   ├── masakhaner_sn.yaml
│   │   │   │   │   ├── masakhaner_sw.yaml
│   │   │   │   │   ├── masakhaner_tn.yaml
│   │   │   │   │   ├── masakhaner_tw.yaml
│   │   │   │   │   ├── masakhaner_wo.yaml
│   │   │   │   │   ├── masakhaner_xh.yaml
│   │   │   │   │   ├── masakhaner_yo.yaml
│   │   │   │   │   ├── masakhaner_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── masakhaner
│   │   │   │   │   ├── masakhaner_am.yaml
│   │   │   │   │   ├── masakhaner_bbj.yaml
│   │   │   │   │   ├── masakhaner_bm.yaml
│   │   │   │   │   ├── masakhaner_ee.yaml
│   │   │   │   │   ├── masakhaner_ha.yaml
│   │   │   │   │   ├── masakhaner_ig.yaml
│   │   │   │   │   ├── masakhaner_lg.yaml
│   │   │   │   │   ├── masakhaner_luo.yaml
│   │   │   │   │   ├── masakhaner_mos.yaml
│   │   │   │   │   ├── masakhaner_ny.yaml
│   │   │   │   │   ├── masakhaner_pcm.yaml
│   │   │   │   │   ├── masakhaner_rw.yaml
│   │   │   │   │   ├── masakhaner_sn.yaml
│   │   │   │   │   ├── masakhaner_sw.yaml
│   │   │   │   │   ├── masakhaner_tn.yaml
│   │   │   │   │   ├── masakhaner_tw.yaml
│   │   │   │   │   ├── masakhaner_wo.yaml
│   │   │   │   │   ├── masakhaner_xh.yaml
│   │   │   │   │   ├── masakhaner_yo.yaml
│   │   │   │   │   ├── masakhaner_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── masakhaner
│   │   │   │   │   ├── masakhaner_am.yaml
│   │   │   │   │   ├── masakhaner_bbj.yaml
│   │   │   │   │   ├── masakhaner_bm.yaml
│   │   │   │   │   ├── masakhaner_ee.yaml
│   │   │   │   │   ├── masakhaner_ha.yaml
│   │   │   │   │   ├── masakhaner_ig.yaml
│   │   │   │   │   ├── masakhaner_lg.yaml
│   │   │   │   │   ├── masakhaner_luo.yaml
│   │   │   │   │   ├── masakhaner_mos.yaml
│   │   │   │   │   ├── masakhaner_ny.yaml
│   │   │   │   │   ├── masakhaner_pcm.yaml
│   │   │   │   │   ├── masakhaner_rw.yaml
│   │   │   │   │   ├── masakhaner_sn.yaml
│   │   │   │   │   ├── masakhaner_sw.yaml
│   │   │   │   │   ├── masakhaner_tn.yaml
│   │   │   │   │   ├── masakhaner_tw.yaml
│   │   │   │   │   ├── masakhaner_wo.yaml
│   │   │   │   │   ├── masakhaner_xh.yaml
│   │   │   │   │   ├── masakhaner_yo.yaml
│   │   │   │   │   ├── masakhaner_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── masakhaner
│   │   │   │   │   ├── masakhaner_am.yaml
│   │   │   │   │   ├── masakhaner_bbj.yaml
│   │   │   │   │   ├── masakhaner_bm.yaml
│   │   │   │   │   ├── masakhaner_ee.yaml
│   │   │   │   │   ├── masakhaner_ha.yaml
│   │   │   │   │   ├── masakhaner_ig.yaml
│   │   │   │   │   ├── masakhaner_lg.yaml
│   │   │   │   │   ├── masakhaner_luo.yaml
│   │   │   │   │   ├── masakhaner_mos.yaml
│   │   │   │   │   ├── masakhaner_ny.yaml
│   │   │   │   │   ├── masakhaner_pcm.yaml
│   │   │   │   │   ├── masakhaner_rw.yaml
│   │   │   │   │   ├── masakhaner_sn.yaml
│   │   │   │   │   ├── masakhaner_sw.yaml
│   │   │   │   │   ├── masakhaner_tn.yaml
│   │   │   │   │   ├── masakhaner_tw.yaml
│   │   │   │   │   ├── masakhaner_wo.yaml
│   │   │   │   │   ├── masakhaner_xh.yaml
│   │   │   │   │   ├── masakhaner_yo.yaml
│   │   │   │   │   ├── masakhaner_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── masakhaner
│   │   │   │       ├── masakhaner_am.yaml
│   │   │   │       ├── masakhaner_bbj.yaml
│   │   │   │       ├── masakhaner_bm.yaml
│   │   │   │       ├── masakhaner_ee.yaml
│   │   │   │       ├── masakhaner_ha.yaml
│   │   │   │       ├── masakhaner_ig.yaml
│   │   │   │       ├── masakhaner_lg.yaml
│   │   │   │       ├── masakhaner_luo.yaml
│   │   │   │       ├── masakhaner_mos.yaml
│   │   │   │       ├── masakhaner_ny.yaml
│   │   │   │       ├── masakhaner_pcm.yaml
│   │   │   │       ├── masakhaner_rw.yaml
│   │   │   │       ├── masakhaner_sn.yaml
│   │   │   │       ├── masakhaner_sw.yaml
│   │   │   │       ├── masakhaner_tn.yaml
│   │   │   │       ├── masakhaner_tw.yaml
│   │   │   │       ├── masakhaner_wo.yaml
│   │   │   │       ├── masakhaner_xh.yaml
│   │   │   │       ├── masakhaner_yo.yaml
│   │   │   │       ├── masakhaner_zu.yaml
│   │   │   │       └── utils.py
│   │   │   ├── masakhanews/
│   │   │   │   ├── README.md
│   │   │   │   ├── masakhanews.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── utils.py
│   │   │   ├── masakhapos/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── masakhapos.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── utils.py
│   │   │   ├── naijarc/
│   │   │   │   ├── README.md
│   │   │   │   ├── naijarc.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   └── utils.py
│   │   │   ├── nollysenti/
│   │   │   │   ├── README.md
│   │   │   │   ├── nollysenti.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── nollysenti
│   │   │   │   │   ├── nollysenti_eng.yaml
│   │   │   │   │   ├── nollysenti_hau.yaml
│   │   │   │   │   ├── nollysenti_ibo.yaml
│   │   │   │   │   ├── nollysenti_pcm.yaml
│   │   │   │   │   ├── nollysenti_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── nollysenti
│   │   │   │   │   ├── nollysenti_eng.yaml
│   │   │   │   │   ├── nollysenti_hau.yaml
│   │   │   │   │   ├── nollysenti_ibo.yaml
│   │   │   │   │   ├── nollysenti_pcm.yaml
│   │   │   │   │   ├── nollysenti_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── nollysenti
│   │   │   │   │   ├── nollysenti_eng.yaml
│   │   │   │   │   ├── nollysenti_hau.yaml
│   │   │   │   │   ├── nollysenti_ibo.yaml
│   │   │   │   │   ├── nollysenti_pcm.yaml
│   │   │   │   │   ├── nollysenti_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── nollysenti
│   │   │   │   │   ├── nollysenti_eng.yaml
│   │   │   │   │   ├── nollysenti_hau.yaml
│   │   │   │   │   ├── nollysenti_ibo.yaml
│   │   │   │   │   ├── nollysenti_pcm.yaml
│   │   │   │   │   ├── nollysenti_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── nollysenti
│   │   │   │       ├── nollysenti_eng.yaml
│   │   │   │       ├── nollysenti_hau.yaml
│   │   │   │       ├── nollysenti_ibo.yaml
│   │   │   │       ├── nollysenti_pcm.yaml
│   │   │   │       ├── nollysenti_yor.yaml
│   │   │   │       └── utils.py
│   │   │   ├── ntrex/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── ntrex.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── ntrex
│   │   │   │   │   │   ├── ntrex_afr_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_amh_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_arb_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_bem_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ewe_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_fra_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_hau_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ibo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_kin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_mey_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_mlg_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_msa_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nde_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nya_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_orm_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_shi_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_sna_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_som_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ssw_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_swa_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tam_Taml-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tel_Telu-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tir_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ton_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tsn_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_urd_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ven_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_wol_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_xho_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_yor_Latn-eng_Latn.yaml
│   │   │   │   │   │   └── ntrex_zul_Latn-eng_Latn.yaml
│   │   │   │   │   └── english-african/
│   │   │   │   │       ├── ntrex
│   │   │   │   │       ├── ntrex_eng_Latn-afr_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-amh_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-arb_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-bem_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ewe_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-fra_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-hau_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ibo_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-kin_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-mey_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-mlg_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-msa_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nde_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nso_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nya_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-orm_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-shi_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-sna_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-som_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ssw_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-swa_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tam_Taml.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tel_Telu.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tir_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ton_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tsn_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-urd_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ven_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-wol_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-xho_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-yor_Latn.yaml
│   │   │   │   │       └── ntrex_eng_Latn-zul_Latn.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── ntrex
│   │   │   │   │   │   ├── ntrex_afr_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_amh_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_arb_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_bem_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ewe_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_fra_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_hau_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ibo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_kin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_mey_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_mlg_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_msa_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nde_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nya_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_orm_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_shi_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_sna_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_som_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ssw_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_swa_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tam_Taml-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tel_Telu-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tir_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ton_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tsn_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_urd_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ven_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_wol_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_xho_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_yor_Latn-eng_Latn.yaml
│   │   │   │   │   │   └── ntrex_zul_Latn-eng_Latn.yaml
│   │   │   │   │   └── english-african/
│   │   │   │   │       ├── ntrex
│   │   │   │   │       ├── ntrex_eng_Latn-afr_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-amh_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-arb_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-bem_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ewe_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-fra_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-hau_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ibo_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-kin_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-mey_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-mlg_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-msa_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nde_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nso_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nya_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-orm_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-shi_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-sna_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-som_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ssw_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-swa_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tam_Taml.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tel_Telu.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tir_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ton_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tsn_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-urd_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ven_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-wol_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-xho_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-yor_Latn.yaml
│   │   │   │   │       └── ntrex_eng_Latn-zul_Latn.yaml
│   │   │   │   └── prompt_3/
│   │   │   │       ├── african-english/
│   │   │   │       │   ├── ntrex
│   │   │   │       │   ├── ntrex_afr_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_amh_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_arb_Arab-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_bem_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ewe_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_fra_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_hau_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ibo_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_kin_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_mey_Arab-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_mlg_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_msa_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_nde_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_nso_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_nya_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_orm_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_shi_Arab-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_sna_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_som_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ssw_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_swa_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_tam_Taml-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_tel_Telu-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_tir_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ton_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_tsn_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_urd_Arab-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ven_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_wol_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_xho_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_yor_Latn-eng_Latn.yaml
│   │   │   │       │   └── ntrex_zul_Latn-eng_Latn.yaml
│   │   │   │       └── english-african/
│   │   │   │           ├── ntrex
│   │   │   │           ├── ntrex_eng_Latn-afr_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-amh_Ethi.yaml
│   │   │   │           ├── ntrex_eng_Latn-arb_Arab.yaml
│   │   │   │           ├── ntrex_eng_Latn-bem_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-ewe_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-fra_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-hau_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-ibo_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-kin_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-mey_Arab.yaml
│   │   │   │           ├── ntrex_eng_Latn-mlg_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-msa_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-nde_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-nso_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-nya_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-orm_Ethi.yaml
│   │   │   │           ├── ntrex_eng_Latn-shi_Arab.yaml
│   │   │   │           ├── ntrex_eng_Latn-sna_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-som_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-ssw_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-swa_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-tam_Taml.yaml
│   │   │   │           ├── ntrex_eng_Latn-tel_Telu.yaml
│   │   │   │           ├── ntrex_eng_Latn-tir_Ethi.yaml
│   │   │   │           ├── ntrex_eng_Latn-ton_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-tsn_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-urd_Arab.yaml
│   │   │   │           ├── ntrex_eng_Latn-ven_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-wol_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-xho_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-yor_Latn.yaml
│   │   │   │           └── ntrex_eng_Latn-zul_Latn.yaml
│   │   │   ├── openai_mmlu/
│   │   │   │   ├── README.md
│   │   │   │   ├── openai_mmlu.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   └── utils.py
│   │   │   ├── salt/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── salt
│   │   │   │   │   ├── salt_ach-eng.yaml
│   │   │   │   │   ├── salt_eng-ach.yaml
│   │   │   │   │   ├── salt_eng-ibo.yaml
│   │   │   │   │   ├── salt_eng-lgg.yaml
│   │   │   │   │   ├── salt_eng-lug.yaml
│   │   │   │   │   ├── salt_eng-nyn.yaml
│   │   │   │   │   ├── salt_eng-swa.yaml
│   │   │   │   │   ├── salt_eng-teo.yaml
│   │   │   │   │   ├── salt_ibo-eng.yaml
│   │   │   │   │   ├── salt_lgg-eng.yaml
│   │   │   │   │   ├── salt_lug-eng.yaml
│   │   │   │   │   ├── salt_nyn-eng.yaml
│   │   │   │   │   ├── salt_swa-eng.yaml
│   │   │   │   │   └── salt_teo-eng.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── salt
│   │   │   │   │   ├── salt_ach-eng.yaml
│   │   │   │   │   ├── salt_eng-ach.yaml
│   │   │   │   │   ├── salt_eng-ibo.yaml
│   │   │   │   │   ├── salt_eng-lgg.yaml
│   │   │   │   │   ├── salt_eng-lug.yaml
│   │   │   │   │   ├── salt_eng-nyn.yaml
│   │   │   │   │   ├── salt_eng-swa.yaml
│   │   │   │   │   ├── salt_eng-teo.yaml
│   │   │   │   │   ├── salt_ibo-eng.yaml
│   │   │   │   │   ├── salt_lgg-eng.yaml
│   │   │   │   │   ├── salt_lug-eng.yaml
│   │   │   │   │   ├── salt_nyn-eng.yaml
│   │   │   │   │   ├── salt_swa-eng.yaml
│   │   │   │   │   └── salt_teo-eng.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── salt
│   │   │   │   │   ├── salt_ach-eng.yaml
│   │   │   │   │   ├── salt_eng-ach.yaml
│   │   │   │   │   ├── salt_eng-ibo.yaml
│   │   │   │   │   ├── salt_eng-lgg.yaml
│   │   │   │   │   ├── salt_eng-lug.yaml
│   │   │   │   │   ├── salt_eng-nyn.yaml
│   │   │   │   │   ├── salt_eng-swa.yaml
│   │   │   │   │   ├── salt_eng-teo.yaml
│   │   │   │   │   ├── salt_ibo-eng.yaml
│   │   │   │   │   ├── salt_lgg-eng.yaml
│   │   │   │   │   ├── salt_lug-eng.yaml
│   │   │   │   │   ├── salt_nyn-eng.yaml
│   │   │   │   │   ├── salt_swa-eng.yaml
│   │   │   │   │   └── salt_teo-eng.yaml
│   │   │   │   └── salt.yaml
│   │   │   ├── sample_run_scripts/
│   │   │   │   ├── run_afrobench.sh
│   │   │   │   └── run_afrobench_lite.sh
│   │   │   ├── sib/
│   │   │   │   ├── README.md
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── sib.yaml
│   │   │   │   └── utils.py
│   │   │   ├── uhura-arc-easy/
│   │   │   │   ├── README.md
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── uhura.yaml
│   │   │   │   └── utils.py
│   │   │   └── xlsum/
│   │   │       ├── README.md
│   │   │       ├── prompt_1/
│   │   │       │   ├── utils.py
│   │   │       │   ├── xlsum
│   │   │       │   ├── xlsum_amharic.yaml
│   │   │       │   ├── xlsum_arabic.yaml
│   │   │       │   ├── xlsum_hausa.yaml
│   │   │       │   ├── xlsum_igbo.yaml
│   │   │       │   ├── xlsum_kirundi.yaml
│   │   │       │   ├── xlsum_oromo.yaml
│   │   │       │   ├── xlsum_pidgin.yaml
│   │   │       │   ├── xlsum_somali.yaml
│   │   │       │   ├── xlsum_swahili.yaml
│   │   │       │   ├── xlsum_telugu.yaml
│   │   │       │   ├── xlsum_tigrinya.yaml
│   │   │       │   └── xlsum_yoruba.yaml
│   │   │       ├── prompt_2/
│   │   │       │   ├── utils.py
│   │   │       │   ├── xlsum
│   │   │       │   ├── xlsum_amharic.yaml
│   │   │       │   ├── xlsum_arabic.yaml
│   │   │       │   ├── xlsum_hausa.yaml
│   │   │       │   ├── xlsum_igbo.yaml
│   │   │       │   ├── xlsum_kirundi.yaml
│   │   │       │   ├── xlsum_oromo.yaml
│   │   │       │   ├── xlsum_pidgin.yaml
│   │   │       │   ├── xlsum_somali.yaml
│   │   │       │   ├── xlsum_swahili.yaml
│   │   │       │   ├── xlsum_telugu.yaml
│   │   │       │   ├── xlsum_tigrinya.yaml
│   │   │       │   └── xlsum_yoruba.yaml
│   │   │       ├── prompt_3/
│   │   │       │   ├── utils.py
│   │   │       │   ├── xlsum
│   │   │       │   ├── xlsum_amharic.yaml
│   │   │       │   ├── xlsum_arabic.yaml
│   │   │       │   ├── xlsum_hausa.yaml
│   │   │       │   ├── xlsum_igbo.yaml
│   │   │       │   ├── xlsum_kirundi.yaml
│   │   │       │   ├── xlsum_oromo.yaml
│   │   │       │   ├── xlsum_pidgin.yaml
│   │   │       │   ├── xlsum_somali.yaml
│   │   │       │   ├── xlsum_swahili.yaml
│   │   │       │   ├── xlsum_telugu.yaml
│   │   │       │   ├── xlsum_tigrinya.yaml
│   │   │       │   └── xlsum_yoruba.yaml
│   │   │       ├── utils.py
│   │   │       └── xlsum.yaml
│   │   ├── agieval/
│   │   │   ├── README.md
│   │   │   ├── agieval.yaml
│   │   │   ├── agieval_cn.yaml
│   │   │   ├── agieval_en.yaml
│   │   │   ├── agieval_nous.yaml
│   │   │   ├── aqua-rat.yaml
│   │   │   ├── gaokao-biology.yaml
│   │   │   ├── gaokao-chemistry.yaml
│   │   │   ├── gaokao-chinese.yaml
│   │   │   ├── gaokao-english.yaml
│   │   │   ├── gaokao-geography.yaml
│   │   │   ├── gaokao-history.yaml
│   │   │   ├── gaokao-mathcloze.yaml
│   │   │   ├── gaokao-mathqa.yaml
│   │   │   ├── gaokao-physics.yaml
│   │   │   ├── jec-qa-ca.yaml
│   │   │   ├── jec-qa-kd.yaml
│   │   │   ├── logiqa-en.yaml
│   │   │   ├── logiqa-zh.yaml
│   │   │   ├── lsat-ar.yaml
│   │   │   ├── lsat-lr.yaml
│   │   │   ├── lsat-rc.yaml
│   │   │   ├── math.yaml
│   │   │   ├── sat-en-without-passage.yaml
│   │   │   ├── sat-en.yaml
│   │   │   ├── sat-math.yaml
│   │   │   └── utils.py
│   │   ├── aime/
│   │   │   ├── README.md
│   │   │   ├── aime.yaml
│   │   │   ├── aime24.yaml
│   │   │   ├── aime25.yaml
│   │   │   └── utils.py
│   │   ├── alghafa/
│   │   │   ├── copa_ar/
│   │   │   │   ├── README.md
│   │   │   │   └── copa_ar.yaml
│   │   │   └── piqa_ar/
│   │   │       ├── README.md
│   │   │       └── piqa_ar.yaml
│   │   ├── anli/
│   │   │   ├── README.md
│   │   │   ├── anli_r1.yaml
│   │   │   ├── anli_r2.yaml
│   │   │   └── anli_r3.yaml
│   │   ├── arab_culture/
│   │   │   ├── README.md
│   │   │   ├── _arab_culture.yaml
│   │   │   ├── _arab_culture_gulf.yaml
│   │   │   ├── _arab_culture_levant.yaml
│   │   │   ├── _arab_culture_nile_valley.yaml
│   │   │   ├── _arab_culture_north_africa.yaml
│   │   │   ├── _default_arab_culture_mcq_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── arab_culture_algeria.yaml
│   │   │   ├── arab_culture_egypt.yaml
│   │   │   ├── arab_culture_jordan.yaml
│   │   │   ├── arab_culture_ksa.yaml
│   │   │   ├── arab_culture_lebanon.yaml
│   │   │   ├── arab_culture_libya.yaml
│   │   │   ├── arab_culture_morocco.yaml
│   │   │   ├── arab_culture_palestine.yaml
│   │   │   ├── arab_culture_sudan.yaml
│   │   │   ├── arab_culture_syria.yaml
│   │   │   ├── arab_culture_tunisia.yaml
│   │   │   ├── arab_culture_uae.yaml
│   │   │   ├── arab_culture_yemen.yaml
│   │   │   ├── prompts.py
│   │   │   └── utils_mcq.py
│   │   ├── arab_culture_completion/
│   │   │   ├── README.md
│   │   │   ├── _arab_culture_completion.yaml
│   │   │   ├── _arab_culture_completion_gulf.yaml
│   │   │   ├── _arab_culture_completion_levant.yaml
│   │   │   ├── _arab_culture_completion_nile_valley.yaml
│   │   │   ├── _arab_culture_completion_north_africa.yaml
│   │   │   ├── _default_arab_culture_completion_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── arab_culture_completion_algeria.yaml
│   │   │   ├── arab_culture_completion_egypt.yaml
│   │   │   ├── arab_culture_completion_jordan.yaml
│   │   │   ├── arab_culture_completion_ksa.yaml
│   │   │   ├── arab_culture_completion_lebanon.yaml
│   │   │   ├── arab_culture_completion_libya.yaml
│   │   │   ├── arab_culture_completion_morocco.yaml
│   │   │   ├── arab_culture_completion_palestine.yaml
│   │   │   ├── arab_culture_completion_sudan.yaml
│   │   │   ├── arab_culture_completion_syria.yaml
│   │   │   ├── arab_culture_completion_tunisia.yaml
│   │   │   ├── arab_culture_completion_uae.yaml
│   │   │   ├── arab_culture_completion_yemen.yaml
│   │   │   ├── prompts.py
│   │   │   └── utils_completion.py
│   │   ├── arabic_leaderboard_complete/
│   │   │   ├── README.md
│   │   │   ├── arabic_leaderboard_alghafa/
│   │   │   │   ├── arabic_leaderboard_alghafa.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_mcq_exams_test_ar.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_meta_ar_dialects.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_meta_ar_msa.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_facts_truefalse_balanced_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_grounded_statement_soqal_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_grounded_statement_xglue_mlqa_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_rating_sentiment_no_neutral_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_rating_sentiment_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_sentiment_task.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_exams/
│   │   │   │   ├── arabic_exams.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_exams.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mmlu/
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_abstract_algebra.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_anatomy.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_astronomy.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_business_ethics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_clinical_knowledge.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_biology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_chemistry.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_computer_science.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_mathematics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_medicine.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_physics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_computer_security.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_conceptual_physics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_econometrics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_electrical_engineering.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_elementary_mathematics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_formal_logic.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_global_facts.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_biology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_chemistry.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_computer_science.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_european_history.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_geography.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_government_and_politics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_macroeconomics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_mathematics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_microeconomics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_physics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_psychology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_statistics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_us_history.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_world_history.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_human_aging.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_human_sexuality.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_international_law.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_jurisprudence.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_logical_fallacies.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_machine_learning.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_management.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_marketing.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_medical_genetics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_miscellaneous.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_moral_disputes.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_moral_scenarios.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_nutrition.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_philosophy.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_prehistory.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_accounting.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_law.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_medicine.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_psychology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_public_relations.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_security_studies.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_sociology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_us_foreign_policy.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_virology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_world_religions.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_arc_challenge/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_arc_challenge.yaml
│   │   │   │   ├── arabic_mt_arc_challenge.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_arc_easy/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_arc_easy.yaml
│   │   │   │   ├── arabic_mt_arc_easy.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_boolq/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_boolq.yaml
│   │   │   │   ├── arabic_mt_boolq.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_copa/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_copa.yaml
│   │   │   │   ├── arabic_mt_copa.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_hellaswag/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_hellaswag.yaml
│   │   │   │   ├── arabic_mt_hellaswag.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_mmlu/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_mmlu.yaml
│   │   │   │   ├── arabic_mt_mmlu.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_openbook_qa/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_openbook_qa.yaml
│   │   │   │   ├── arabic_mt_openbook_qa.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_piqa/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_piqa.yaml
│   │   │   │   ├── arabic_mt_piqa.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_race/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_race.yaml
│   │   │   │   ├── arabic_mt_race.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_sciq/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_sciq.yaml
│   │   │   │   ├── arabic_mt_sciq.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_toxigen/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_toxigen.yaml
│   │   │   │   ├── arabic_mt_toxigen.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_avca/
│   │   │   │   ├── arabic_leaderboard_acva.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Algeria.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Ancient_Egypt.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arab_Empire.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Architecture.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Art.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Astronomy.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Calligraphy.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Ceremony.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Clothing.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Culture.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Food.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Funeral.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Geography.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_History.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Language_Origin.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Literature.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Math.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Medicine.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Music.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Ornament.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Philosophy.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Physics_and_Chemistry.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Wedding.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Bahrain.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Comoros.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Egypt_modern.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromAncientEgypt.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromByzantium.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromChina.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromGreece.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromIslam.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromPersia.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromRome.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Iraq.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islam_Education.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islam_branches_and_schools.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islamic_law_system.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Jordan.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Kuwait.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Lebanon.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Libya.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Mauritania.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Mesopotamia_civilization.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Morocco.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Oman.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Palestine.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Qatar.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Saudi_Arabia.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Somalia.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Sudan.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Syria.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Tunisia.yaml
│   │   │   │   ├── arabic_leaderboard_acva_United_Arab_Emirates.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Yemen.yaml
│   │   │   │   ├── arabic_leaderboard_acva_communication.yaml
│   │   │   │   ├── arabic_leaderboard_acva_computer_and_phone.yaml
│   │   │   │   ├── arabic_leaderboard_acva_daily_life.yaml
│   │   │   │   ├── arabic_leaderboard_acva_entertainment.yaml
│   │   │   │   └── utils.py
│   │   │   └── arabic_leaderboard_complete.yaml
│   │   ├── arabic_leaderboard_light/
│   │   │   ├── README.md
│   │   │   ├── arabic_leaderboard_alghafa_light/
│   │   │   │   ├── arabic_leaderboard_alghafa_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_mcq_exams_test_ar_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_meta_ar_dialects_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_meta_ar_msa_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_facts_truefalse_balanced_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_grounded_statement_soqal_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_grounded_statement_xglue_mlqa_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_rating_sentiment_no_neutral_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_rating_sentiment_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_sentiment_task_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_exams_light/
│   │   │   │   ├── arabic_exams_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_exams_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mmlu_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_abstract_algebra_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_anatomy_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_astronomy_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_business_ethics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_clinical_knowledge_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_biology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_chemistry_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_computer_science_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_mathematics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_medicine_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_physics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_computer_security_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_conceptual_physics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_econometrics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_electrical_engineering_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_elementary_mathematics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_formal_logic_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_global_facts_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_biology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_chemistry_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_computer_science_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_european_history_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_geography_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_government_and_politics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_macroeconomics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_mathematics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_microeconomics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_physics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_psychology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_statistics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_us_history_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_world_history_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_human_aging_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_human_sexuality_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_international_law_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_jurisprudence_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_logical_fallacies_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_machine_learning_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_management_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_marketing_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_medical_genetics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_miscellaneous_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_moral_disputes_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_moral_scenarios_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_nutrition_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_philosophy_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_prehistory_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_accounting_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_law_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_medicine_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_psychology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_public_relations_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_security_studies_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_sociology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_us_foreign_policy_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_virology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_world_religions_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_arc_challenge_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_arc_challenge_light.yaml
│   │   │   │   ├── arabic_mt_arc_challenge_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_arc_easy_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_arc_easy_light.yaml
│   │   │   │   ├── arabic_mt_arc_easy_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_boolq_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_boolq_light.yaml
│   │   │   │   ├── arabic_mt_boolq_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_copa_light/
│   │   │   │   ├── arabic_mt_copa_light.yaml
│   │   │   │   ├── arbic_leaderboard_arabic_mt_copa_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_hellaswag_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_hellaswag_light.yaml
│   │   │   │   ├── arabic_mt_hellaswag_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_mmlu_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_mmlu_light.yaml
│   │   │   │   ├── arabic_mt_mmlu_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_openbook_qa_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_openbook_qa_light.yaml
│   │   │   │   ├── arabic_mt_openbook_qa_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_piqa_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_piqa_light.yaml
│   │   │   │   ├── arabic_mt_piqa_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_race_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_race_light.yaml
│   │   │   │   ├── arabic_mt_race_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_sciq_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_sciq_light.yaml
│   │   │   │   ├── arabic_mt_sciq_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_toxigen_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_toxigen_light.yaml
│   │   │   │   ├── arabic_mt_toxigen_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_avca_light/
│   │   │   │   ├── arabic_leaderboard_acva_Algeria_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Ancient_Egypt_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arab_Empire_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Architecture_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Art_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Astronomy_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Calligraphy_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Ceremony_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Clothing_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Culture_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Food_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Funeral_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Geography_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_History_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Language_Origin_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Literature_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Math_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Medicine_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Music_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Ornament_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Philosophy_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Physics_and_Chemistry_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Wedding_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Bahrain_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Comoros_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Egypt_modern_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromAncientEgypt_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromByzantium_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromChina_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromGreece_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromIslam_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromPersia_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromRome_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Iraq_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islam_Education_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islam_branches_and_schools_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islamic_law_system_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Jordan_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Kuwait_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Lebanon_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Libya_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Mauritania_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Mesopotamia_civilization_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Morocco_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Oman_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Palestine_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Qatar_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Saudi_Arabia_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Somalia_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Sudan_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Syria_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Tunisia_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_United_Arab_Emirates_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Yemen_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_communication_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_computer_and_phone_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_daily_life_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_entertainment_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_light.yaml
│   │   │   │   └── utils.py
│   │   │   └── arabic_leaderboard_light.yaml
│   │   ├── arabicmmlu/
│   │   │   ├── README.md
│   │   │   ├── _arabicmmlu.yaml
│   │   │   ├── _arabicmmlu_humanities.yaml
│   │   │   ├── _arabicmmlu_language.yaml
│   │   │   ├── _arabicmmlu_other.yaml
│   │   │   ├── _arabicmmlu_social_science.yaml
│   │   │   ├── _arabicmmlu_stem.yaml
│   │   │   ├── _default_arabicmmlu_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── arabicmmlu_accounting_university.yaml
│   │   │   ├── arabicmmlu_arabic_language_general.yaml
│   │   │   ├── arabicmmlu_arabic_language_grammar.yaml
│   │   │   ├── arabicmmlu_arabic_language_high_school.yaml
│   │   │   ├── arabicmmlu_arabic_language_middle_school.yaml
│   │   │   ├── arabicmmlu_arabic_language_primary_school.yaml
│   │   │   ├── arabicmmlu_biology_high_school.yaml
│   │   │   ├── arabicmmlu_civics_high_school.yaml
│   │   │   ├── arabicmmlu_civics_middle_school.yaml
│   │   │   ├── arabicmmlu_computer_science_high_school.yaml
│   │   │   ├── arabicmmlu_computer_science_middle_school.yaml
│   │   │   ├── arabicmmlu_computer_science_primary_school.yaml
│   │   │   ├── arabicmmlu_computer_science_university.yaml
│   │   │   ├── arabicmmlu_driving_test.yaml
│   │   │   ├── arabicmmlu_economics_high_school.yaml
│   │   │   ├── arabicmmlu_economics_middle_school.yaml
│   │   │   ├── arabicmmlu_economics_university.yaml
│   │   │   ├── arabicmmlu_general_knowledge.yaml
│   │   │   ├── arabicmmlu_general_knowledge_middle_school.yaml
│   │   │   ├── arabicmmlu_general_knowledge_primary_school.yaml
│   │   │   ├── arabicmmlu_geography_high_school.yaml
│   │   │   ├── arabicmmlu_geography_middle_school.yaml
│   │   │   ├── arabicmmlu_geography_primary_school.yaml
│   │   │   ├── arabicmmlu_history_high_school.yaml
│   │   │   ├── arabicmmlu_history_middle_school.yaml
│   │   │   ├── arabicmmlu_history_primary_school.yaml
│   │   │   ├── arabicmmlu_islamic_studies.yaml
│   │   │   ├── arabicmmlu_islamic_studies_high_school.yaml
│   │   │   ├── arabicmmlu_islamic_studies_middle_school.yaml
│   │   │   ├── arabicmmlu_islamic_studies_primary_school.yaml
│   │   │   ├── arabicmmlu_law_professional.yaml
│   │   │   ├── arabicmmlu_management_university.yaml
│   │   │   ├── arabicmmlu_math_primary_school.yaml
│   │   │   ├── arabicmmlu_natural_science_middle_school.yaml
│   │   │   ├── arabicmmlu_natural_science_primary_school.yaml
│   │   │   ├── arabicmmlu_philosophy_high_school.yaml
│   │   │   ├── arabicmmlu_physics_high_school.yaml
│   │   │   ├── arabicmmlu_political_science_university.yaml
│   │   │   ├── arabicmmlu_social_science_middle_school.yaml
│   │   │   ├── arabicmmlu_social_science_primary_school.yaml
│   │   │   └── utils.py
│   │   ├── aradice/
│   │   │   ├── ArabicMMLU/
│   │   │   │   ├── EGY/
│   │   │   │   │   ├── AraDiCE_ArabicMMLU.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_humanities_history.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_humanities_islamic-studies.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_humanities_philosophy.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_language_arabic-language.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_social-science_civics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_social-science_economics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_social-science_geography.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_stem_biology.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_stem_computer-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_stem_physics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_humanities_history.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_humanities_islamic-studies.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_language_arabic-language.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_other_general-knowledge.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_social-science_civics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_social-science_economics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_social-science_geography.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_social-science_social-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_stem_computer-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_stem_natural-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_humanities_islamic-studies.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_language_arabic-language-general.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_language_arabic-language-grammar.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_other_driving-test.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_other_general-knowledge.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_humanities_history.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_humanities_islamic-studies.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_language_arabic-language.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_other_general-knowledge.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_social-science_geography.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_social-science_social-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_stem_computer-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_stem_math.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_stem_natural-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_prof_humanities_law.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_other_management.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_social-science_accounting.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_social-science_economics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_social-science_political-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_stem_computer-science.yaml
│   │   │   │   │   ├── _default_template_yaml
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   └── utils.py
│   │   │   │   └── LEV/
│   │   │   │       ├── AraDiCE_ArabicMMLU.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_humanities_history.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_humanities_islamic-studies.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_humanities_philosophy.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_language_arabic-language.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_social-science_civics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_social-science_economics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_social-science_geography.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_stem_biology.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_stem_computer-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_stem_physics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_humanities_history.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_humanities_islamic-studies.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_language_arabic-language.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_other_general-knowledge.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_social-science_civics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_social-science_economics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_social-science_geography.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_social-science_social-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_stem_computer-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_stem_natural-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_humanities_islamic-studies.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_language_arabic-language-general.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_language_arabic-language-grammar.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_other_driving-test.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_other_general-knowledge.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_humanities_history.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_humanities_islamic-studies.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_language_arabic-language.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_other_general-knowledge.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_social-science_geography.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_social-science_social-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_stem_computer-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_stem_math.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_stem_natural-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_prof_humanities_law.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_other_management.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_social-science_accounting.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_social-science_economics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_social-science_political-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_stem_computer-science.yaml
│   │   │   │       ├── _default_template_yaml
│   │   │   │       ├── metrics.py
│   │   │   │       └── utils.py
│   │   │   ├── README.md
│   │   │   ├── aradice.yaml
│   │   │   ├── boolq/
│   │   │   │   ├── EGY/
│   │   │   │   │   ├── boolq_egy.yaml
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   └── utils.py
│   │   │   │   ├── ENG/
│   │   │   │   │   ├── boolq_eng.yaml
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   └── utils.py
│   │   │   │   ├── LEV/
│   │   │   │   │   ├── boolq_lev.yaml
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   └── utils.py
│   │   │   │   └── MSA/
│   │   │   │       ├── boolq_msa.yaml
│   │   │   │       ├── metrics.py
│   │   │   │       └── utils.py
│   │   │   ├── cultural-benchmark/
│   │   │   │   ├── egypt.yaml
│   │   │   │   ├── jordan.yaml
│   │   │   │   ├── lebanon.yaml
│   │   │   │   ├── metrics.py
│   │   │   │   ├── palestine.yaml
│   │   │   │   ├── qatar.yaml
│   │   │   │   ├── syria.yaml
│   │   │   │   └── utils.py
│   │   │   ├── openbookqa/
│   │   │   │   ├── metrics.py
│   │   │   │   ├── openbookqa_egy.yaml
│   │   │   │   ├── openbookqa_eng.yaml
│   │   │   │   ├── openbookqa_lev.yaml
│   │   │   │   ├── openbookqa_msa.yaml
│   │   │   │   └── utils.py
│   │   │   ├── piqa/
│   │   │   │   ├── metrics.py
│   │   │   │   ├── piqa_egy.yaml
│   │   │   │   ├── piqa_eng.yaml
│   │   │   │   ├── piqa_lev.yaml
│   │   │   │   └── piqa_msa.yaml
│   │   │   ├── truthfulqa_mcq/
│   │   │   │   ├── metrics.py
│   │   │   │   ├── truthfulqa_mc1_egy.yaml
│   │   │   │   ├── truthfulqa_mc1_eng.yaml
│   │   │   │   ├── truthfulqa_mc1_lev.yaml
│   │   │   │   └── truthfulqa_mc1_msa.yaml
│   │   │   └── winogrande/
│   │   │       ├── metrics.py
│   │   │       ├── utils.py
│   │   │       ├── winogrande_egy.yaml
│   │   │       ├── winogrande_eng.yaml
│   │   │       ├── winogrande_lev.yaml
│   │   │       └── winogrande_msa.yaml
│   │   ├── arc/
│   │   │   ├── README.md
│   │   │   ├── arc_challenge.yaml
│   │   │   ├── arc_challenge_chat.yaml
│   │   │   └── arc_easy.yaml
│   │   ├── arc_mt/
│   │   │   ├── README.md
│   │   │   ├── arc_challenge_mt_da.yaml
│   │   │   ├── arc_challenge_mt_de.yaml
│   │   │   ├── arc_challenge_mt_el.yaml
│   │   │   ├── arc_challenge_mt_es.yaml
│   │   │   ├── arc_challenge_mt_fi.yaml
│   │   │   ├── arc_challenge_mt_hu.yaml
│   │   │   ├── arc_challenge_mt_is.yaml
│   │   │   ├── arc_challenge_mt_it.yaml
│   │   │   ├── arc_challenge_mt_nb.yaml
│   │   │   ├── arc_challenge_mt_pl.yaml
│   │   │   ├── arc_challenge_mt_pt.yaml
│   │   │   └── arc_challenge_mt_sv.yaml
│   │   ├── arithmetic/
│   │   │   ├── README.md
│   │   │   ├── arithmetic_1dc.yaml
│   │   │   ├── arithmetic_2da.yaml
│   │   │   ├── arithmetic_2dm.yaml
│   │   │   ├── arithmetic_2ds.yaml
│   │   │   ├── arithmetic_3da.yaml
│   │   │   ├── arithmetic_3ds.yaml
│   │   │   ├── arithmetic_4da.yaml
│   │   │   ├── arithmetic_4ds.yaml
│   │   │   ├── arithmetic_5da.yaml
│   │   │   └── arithmetic_5ds.yaml
│   │   ├── asdiv/
│   │   │   ├── README.md
│   │   │   ├── asdiv-cot-llama.yaml
│   │   │   └── default.yaml
│   │   ├── babi/
│   │   │   ├── README.md
│   │   │   └── babi.yaml
│   │   ├── babilong/
│   │   │   ├── README.md
│   │   │   ├── _babilong_common_yaml
│   │   │   ├── babilong.yaml
│   │   │   ├── babilong_longctx.yaml
│   │   │   ├── babilong_qa1.yaml
│   │   │   ├── babilong_qa10.yaml
│   │   │   ├── babilong_qa11.yaml
│   │   │   ├── babilong_qa12.yaml
│   │   │   ├── babilong_qa13.yaml
│   │   │   ├── babilong_qa14.yaml
│   │   │   ├── babilong_qa15.yaml
│   │   │   ├── babilong_qa16.yaml
│   │   │   ├── babilong_qa17.yaml
│   │   │   ├── babilong_qa18.yaml
│   │   │   ├── babilong_qa19.yaml
│   │   │   ├── babilong_qa2.yaml
│   │   │   ├── babilong_qa20.yaml
│   │   │   ├── babilong_qa3.yaml
│   │   │   ├── babilong_qa4.yaml
│   │   │   ├── babilong_qa5.yaml
│   │   │   ├── babilong_qa6.yaml
│   │   │   ├── babilong_qa7.yaml
│   │   │   ├── babilong_qa8.yaml
│   │   │   ├── babilong_qa9.yaml
│   │   │   └── common_utils.py
│   │   ├── bangla/
│   │   │   ├── README.md
│   │   │   ├── bangla_boolqa.yaml
│   │   │   ├── bangla_commonsenseqa.yaml
│   │   │   ├── bangla_mmlu.yaml
│   │   │   ├── bangla_openbookqa.yaml
│   │   │   └── bangla_piqa.yaml
│   │   ├── basque_bench/
│   │   │   ├── README.md
│   │   │   ├── arc_eu_challenge.yaml
│   │   │   ├── arc_eu_easy.yaml
│   │   │   ├── basque_bench.yaml
│   │   │   ├── flores_eu/
│   │   │   │   ├── _flores_common_yaml
│   │   │   │   ├── create_yamls_flores_eu.py
│   │   │   │   ├── flores_ca-eu.yaml
│   │   │   │   ├── flores_de-eu.yaml
│   │   │   │   ├── flores_en-eu.yaml
│   │   │   │   ├── flores_es-eu.yaml
│   │   │   │   ├── flores_eu-ca.yaml
│   │   │   │   ├── flores_eu-de.yaml
│   │   │   │   ├── flores_eu-en.yaml
│   │   │   │   ├── flores_eu-es.yaml
│   │   │   │   ├── flores_eu-fr.yaml
│   │   │   │   ├── flores_eu-gl.yaml
│   │   │   │   ├── flores_eu-it.yaml
│   │   │   │   ├── flores_eu-pt.yaml
│   │   │   │   ├── flores_eu.yaml
│   │   │   │   ├── flores_fr-eu.yaml
│   │   │   │   ├── flores_gl-eu.yaml
│   │   │   │   ├── flores_it-eu.yaml
│   │   │   │   └── flores_pt-eu.yaml
│   │   │   ├── mgsm_direct_eu.yaml
│   │   │   ├── mgsm_native_cot_eu.yaml
│   │   │   ├── paws_eu.yaml
│   │   │   ├── piqa_eu.yaml
│   │   │   ├── utils.py
│   │   │   ├── wnli_eu.yaml
│   │   │   └── xcopa_eu.yaml
│   │   ├── basqueglue/
│   │   │   ├── README.md
│   │   │   ├── bec.yaml
│   │   │   ├── bhtc.yaml
│   │   │   ├── coref.yaml
│   │   │   ├── qnli.yaml
│   │   │   ├── utils.py
│   │   │   ├── vaxx.yaml
│   │   │   └── wic.yaml
│   │   ├── bbh/
│   │   │   ├── README.md
│   │   │   ├── _generate_configs.py
│   │   │   ├── cot_fewshot/
│   │   │   │   ├── _bbh.yaml
│   │   │   │   ├── _bbh_cot_fewshot.yaml
│   │   │   │   ├── _cot_fewshot_template_yaml
│   │   │   │   ├── boolean_expressions.yaml
│   │   │   │   ├── causal_judgement.yaml
│   │   │   │   ├── date_understanding.yaml
│   │   │   │   ├── disambiguation_qa.yaml
│   │   │   │   ├── dyck_languages.yaml
│   │   │   │   ├── formal_fallacies.yaml
│   │   │   │   ├── geometric_shapes.yaml
│   │   │   │   ├── hyperbaton.yaml
│   │   │   │   ├── logical_deduction_five_objects.yaml
│   │   │   │   ├── logical_deduction_seven_objects.yaml
│   │   │   │   ├── logical_deduction_three_objects.yaml
│   │   │   │   ├── movie_recommendation.yaml
│   │   │   │   ├── multistep_arithmetic_two.yaml
│   │   │   │   ├── navigate.yaml
│   │   │   │   ├── object_counting.yaml
│   │   │   │   ├── penguins_in_a_table.yaml
│   │   │   │   ├── reasoning_about_colored_objects.yaml
│   │   │   │   ├── ruin_names.yaml
│   │   │   │   ├── salient_translation_error_detection.yaml
│   │   │   │   ├── snarks.yaml
│   │   │   │   ├── sports_understanding.yaml
│   │   │   │   ├── temporal_sequences.yaml
│   │   │   │   ├── tracking_shuffled_objects_five_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_seven_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_three_objects.yaml
│   │   │   │   ├── web_of_lies.yaml
│   │   │   │   └── word_sorting.yaml
│   │   │   ├── cot_zeroshot/
│   │   │   │   ├── _bbh_cot_zeroshot.yaml
│   │   │   │   ├── _cot_zeroshot_template_yaml
│   │   │   │   ├── boolean_expressions.yaml
│   │   │   │   ├── causal_judgement.yaml
│   │   │   │   ├── date_understanding.yaml
│   │   │   │   ├── disambiguation_qa.yaml
│   │   │   │   ├── dyck_languages.yaml
│   │   │   │   ├── formal_fallacies.yaml
│   │   │   │   ├── geometric_shapes.yaml
│   │   │   │   ├── hyperbaton.yaml
│   │   │   │   ├── logical_deduction_five_objects.yaml
│   │   │   │   ├── logical_deduction_seven_objects.yaml
│   │   │   │   ├── logical_deduction_three_objects.yaml
│   │   │   │   ├── movie_recommendation.yaml
│   │   │   │   ├── multistep_arithmetic_two.yaml
│   │   │   │   ├── navigate.yaml
│   │   │   │   ├── object_counting.yaml
│   │   │   │   ├── penguins_in_a_table.yaml
│   │   │   │   ├── reasoning_about_colored_objects.yaml
│   │   │   │   ├── ruin_names.yaml
│   │   │   │   ├── salient_translation_error_detection.yaml
│   │   │   │   ├── snarks.yaml
│   │   │   │   ├── sports_understanding.yaml
│   │   │   │   ├── temporal_sequences.yaml
│   │   │   │   ├── tracking_shuffled_objects_five_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_seven_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_three_objects.yaml
│   │   │   │   ├── utils.py
│   │   │   │   ├── web_of_lies.yaml
│   │   │   │   └── word_sorting.yaml
│   │   │   ├── fewshot/
│   │   │   │   ├── _bbh_fewshot.yaml
│   │   │   │   ├── _fewshot_template_yaml
│   │   │   │   ├── boolean_expressions.yaml
│   │   │   │   ├── causal_judgement.yaml
│   │   │   │   ├── date_understanding.yaml
│   │   │   │   ├── disambiguation_qa.yaml
│   │   │   │   ├── dyck_languages.yaml
│   │   │   │   ├── formal_fallacies.yaml
│   │   │   │   ├── geometric_shapes.yaml
│   │   │   │   ├── hyperbaton.yaml
│   │   │   │   ├── logical_deduction_five_objects.yaml
│   │   │   │   ├── logical_deduction_seven_objects.yaml
│   │   │   │   ├── logical_deduction_three_objects.yaml
│   │   │   │   ├── movie_recommendation.yaml
│   │   │   │   ├── multistep_arithmetic_two.yaml
│   │   │   │   ├── navigate.yaml
│   │   │   │   ├── object_counting.yaml
│   │   │   │   ├── penguins_in_a_table.yaml
│   │   │   │   ├── reasoning_about_colored_objects.yaml
│   │   │   │   ├── ruin_names.yaml
│   │   │   │   ├── salient_translation_error_detection.yaml
│   │   │   │   ├── snarks.yaml
│   │   │   │   ├── sports_understanding.yaml
│   │   │   │   ├── temporal_sequences.yaml
│   │   │   │   ├── tracking_shuffled_objects_five_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_seven_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_three_objects.yaml
│   │   │   │   ├── web_of_lies.yaml
│   │   │   │   └── word_sorting.yaml
│   │   │   └── zeroshot/
│   │   │       ├── _bbh_zeroshot.yaml
│   │   │       ├── _zeroshot_template_yaml
│   │   │       ├── boolean_expressions.yaml
│   │   │       ├── causal_judgement.yaml
│   │   │       ├── date_understanding.yaml
│   │   │       ├── disambiguation_qa.yaml
│   │   │       ├── dyck_languages.yaml
│   │   │       ├── formal_fallacies.yaml
│   │   │       ├── geometric_shapes.yaml
│   │   │       ├── hyperbaton.yaml
│   │   │       ├── logical_deduction_five_objects.yaml
│   │   │       ├── logical_deduction_seven_objects.yaml
│   │   │       ├── logical_deduction_three_objects.yaml
│   │   │       ├── movie_recommendation.yaml
│   │   │       ├── multistep_arithmetic_two.yaml
│   │   │       ├── navigate.yaml
│   │   │       ├── object_counting.yaml
│   │   │       ├── penguins_in_a_table.yaml
│   │   │       ├── reasoning_about_colored_objects.yaml
│   │   │       ├── ruin_names.yaml
│   │   │       ├── salient_translation_error_detection.yaml
│   │   │       ├── snarks.yaml
│   │   │       ├── sports_understanding.yaml
│   │   │       ├── temporal_sequences.yaml
│   │   │       ├── tracking_shuffled_objects_five_objects.yaml
│   │   │       ├── tracking_shuffled_objects_seven_objects.yaml
│   │   │       ├── tracking_shuffled_objects_three_objects.yaml
│   │   │       ├── utils.py
│   │   │       ├── web_of_lies.yaml
│   │   │       └── word_sorting.yaml
│   │   ├── bbq/
│   │   │   ├── README.md
│   │   │   ├── bbq_generate.yaml
│   │   │   ├── bbq_generate_ambig.yaml
│   │   │   ├── bbq_generate_disambig.yaml
│   │   │   ├── bbq_multiple_choice.yaml
│   │   │   ├── bbq_multiple_choice_ambig.yaml
│   │   │   ├── bbq_multiple_choice_disambig.yaml
│   │   │   └── utils.py
│   │   ├── bear/
│   │   │   ├── README.md
│   │   │   ├── bear.yaml
│   │   │   └── bear_big.yaml
│   │   ├── belebele/
│   │   │   ├── README.md
│   │   │   ├── _belebele.yaml
│   │   │   ├── _default_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── belebele_acm_Arab.yaml
│   │   │   ├── belebele_afr_Latn.yaml
│   │   │   ├── belebele_als_Latn.yaml
│   │   │   ├── belebele_amh_Ethi.yaml
│   │   │   ├── belebele_apc_Arab.yaml
│   │   │   ├── belebele_arb_Arab.yaml
│   │   │   ├── belebele_arb_Latn.yaml
│   │   │   ├── belebele_ars_Arab.yaml
│   │   │   ├── belebele_ary_Arab.yaml
│   │   │   ├── belebele_arz_Arab.yaml
│   │   │   ├── belebele_asm_Beng.yaml
│   │   │   ├── belebele_azj_Latn.yaml
│   │   │   ├── belebele_bam_Latn.yaml
│   │   │   ├── belebele_ben_Beng.yaml
│   │   │   ├── belebele_ben_Latn.yaml
│   │   │   ├── belebele_bod_Tibt.yaml
│   │   │   ├── belebele_bul_Cyrl.yaml
│   │   │   ├── belebele_cat_Latn.yaml
│   │   │   ├── belebele_ceb_Latn.yaml
│   │   │   ├── belebele_ces_Latn.yaml
│   │   │   ├── belebele_ckb_Arab.yaml
│   │   │   ├── belebele_dan_Latn.yaml
│   │   │   ├── belebele_deu_Latn.yaml
│   │   │   ├── belebele_ell_Grek.yaml
│   │   │   ├── belebele_eng_Latn.yaml
│   │   │   ├── belebele_est_Latn.yaml
│   │   │   ├── belebele_eus_Latn.yaml
│   │   │   ├── belebele_fin_Latn.yaml
│   │   │   ├── belebele_fra_Latn.yaml
│   │   │   ├── belebele_fuv_Latn.yaml
│   │   │   ├── belebele_gaz_Latn.yaml
│   │   │   ├── belebele_grn_Latn.yaml
│   │   │   ├── belebele_guj_Gujr.yaml
│   │   │   ├── belebele_hat_Latn.yaml
│   │   │   ├── belebele_hau_Latn.yaml
│   │   │   ├── belebele_heb_Hebr.yaml
│   │   │   ├── belebele_hin_Deva.yaml
│   │   │   ├── belebele_hin_Latn.yaml
│   │   │   ├── belebele_hrv_Latn.yaml
│   │   │   ├── belebele_hun_Latn.yaml
│   │   │   ├── belebele_hye_Armn.yaml
│   │   │   ├── belebele_ibo_Latn.yaml
│   │   │   ├── belebele_ilo_Latn.yaml
│   │   │   ├── belebele_ind_Latn.yaml
│   │   │   ├── belebele_isl_Latn.yaml
│   │   │   ├── belebele_ita_Latn.yaml
│   │   │   ├── belebele_jav_Latn.yaml
│   │   │   ├── belebele_jpn_Jpan.yaml
│   │   │   ├── belebele_kac_Latn.yaml
│   │   │   ├── belebele_kan_Knda.yaml
│   │   │   ├── belebele_kat_Geor.yaml
│   │   │   ├── belebele_kaz_Cyrl.yaml
│   │   │   ├── belebele_kea_Latn.yaml
│   │   │   ├── belebele_khk_Cyrl.yaml
│   │   │   ├── belebele_khm_Khmr.yaml
│   │   │   ├── belebele_kin_Latn.yaml
│   │   │   ├── belebele_kir_Cyrl.yaml
│   │   │   ├── belebele_kor_Hang.yaml
│   │   │   ├── belebele_lao_Laoo.yaml
│   │   │   ├── belebele_lin_Latn.yaml
│   │   │   ├── belebele_lit_Latn.yaml
│   │   │   ├── belebele_lug_Latn.yaml
│   │   │   ├── belebele_luo_Latn.yaml
│   │   │   ├── belebele_lvs_Latn.yaml
│   │   │   ├── belebele_mal_Mlym.yaml
│   │   │   ├── belebele_mar_Deva.yaml
│   │   │   ├── belebele_mkd_Cyrl.yaml
│   │   │   ├── belebele_mlt_Latn.yaml
│   │   │   ├── belebele_mri_Latn.yaml
│   │   │   ├── belebele_mya_Mymr.yaml
│   │   │   ├── belebele_nld_Latn.yaml
│   │   │   ├── belebele_nob_Latn.yaml
│   │   │   ├── belebele_npi_Deva.yaml
│   │   │   ├── belebele_npi_Latn.yaml
│   │   │   ├── belebele_nso_Latn.yaml
│   │   │   ├── belebele_nya_Latn.yaml
│   │   │   ├── belebele_ory_Orya.yaml
│   │   │   ├── belebele_pan_Guru.yaml
│   │   │   ├── belebele_pbt_Arab.yaml
│   │   │   ├── belebele_pes_Arab.yaml
│   │   │   ├── belebele_plt_Latn.yaml
│   │   │   ├── belebele_pol_Latn.yaml
│   │   │   ├── belebele_por_Latn.yaml
│   │   │   ├── belebele_ron_Latn.yaml
│   │   │   ├── belebele_rus_Cyrl.yaml
│   │   │   ├── belebele_shn_Mymr.yaml
│   │   │   ├── belebele_sin_Latn.yaml
│   │   │   ├── belebele_sin_Sinh.yaml
│   │   │   ├── belebele_slk_Latn.yaml
│   │   │   ├── belebele_slv_Latn.yaml
│   │   │   ├── belebele_sna_Latn.yaml
│   │   │   ├── belebele_snd_Arab.yaml
│   │   │   ├── belebele_som_Latn.yaml
│   │   │   ├── belebele_sot_Latn.yaml
│   │   │   ├── belebele_spa_Latn.yaml
│   │   │   ├── belebele_srp_Cyrl.yaml
│   │   │   ├── belebele_ssw_Latn.yaml
│   │   │   ├── belebele_sun_Latn.yaml
│   │   │   ├── belebele_swe_Latn.yaml
│   │   │   ├── belebele_swh_Latn.yaml
│   │   │   ├── belebele_tam_Taml.yaml
│   │   │   ├── belebele_tel_Telu.yaml
│   │   │   ├── belebele_tgk_Cyrl.yaml
│   │   │   ├── belebele_tgl_Latn.yaml
│   │   │   ├── belebele_tha_Thai.yaml
│   │   │   ├── belebele_tir_Ethi.yaml
│   │   │   ├── belebele_tsn_Latn.yaml
│   │   │   ├── belebele_tso_Latn.yaml
│   │   │   ├── belebele_tur_Latn.yaml
│   │   │   ├── belebele_ukr_Cyrl.yaml
│   │   │   ├── belebele_urd_Arab.yaml
│   │   │   ├── belebele_urd_Latn.yaml
│   │   │   ├── belebele_uzn_Latn.yaml
│   │   │   ├── belebele_vie_Latn.yaml
│   │   │   ├── belebele_war_Latn.yaml
│   │   │   ├── belebele_wol_Latn.yaml
│   │   │   ├── belebele_xho_Latn.yaml
│   │   │   ├── belebele_yor_Latn.yaml
│   │   │   ├── belebele_zho_Hans.yaml
│   │   │   ├── belebele_zho_Hant.yaml
│   │   │   ├── belebele_zsm_Latn.yaml
│   │   │   └── belebele_zul_Latn.yaml
│   │   ├── benchmarks/
│   │   │   ├── README.md
│   │   │   ├── flan/
│   │   │   │   ├── _held_in_template_yaml
│   │   │   │   ├── flan_held_in.yaml
│   │   │   │   └── flan_held_out.yaml
│   │   │   ├── minerva_math.yaml
│   │   │   ├── multimedqa/
│   │   │   │   ├── README.md
│   │   │   │   └── multimedqa.yaml
│   │   │   ├── openllm.yaml
│   │   │   ├── pythia.yaml
│   │   │   └── t0_eval.yaml
│   │   ├── bertaqa/
│   │   │   ├── README.md
│   │   │   ├── _bertaqa_template
│   │   │   ├── bertaqa_en.yaml
│   │   │   ├── bertaqa_en_mt_gemma-7b.yaml
│   │   │   ├── bertaqa_en_mt_hitz.yaml
│   │   │   ├── bertaqa_en_mt_itzuli.yaml
│   │   │   ├── bertaqa_en_mt_latxa-13b-v1.1.yaml
│   │   │   ├── bertaqa_en_mt_latxa-13b-v1.yaml
│   │   │   ├── bertaqa_en_mt_latxa-70b-v1.1.yaml
│   │   │   ├── bertaqa_en_mt_latxa-70b-v1.yaml
│   │   │   ├─

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/new_tasks.yml
================================================
name: Tasks Modified

on:
  push:
    branches:
      - 'main'
  pull_request:
    branches:
      - 'main'
  workflow_dispatch:

env:
  TQDM_DISABLE: "1"
  HF_HUB_DISABLE_PROGRESS_BARS: "1"

# comment/edit out the above to stop/change the triggers
jobs:
  changed_files:
    runs-on: ubuntu-latest  # windows-latest || macos-latest
    timeout-minutes: 120
    name: Scan for changed tasks
    steps:
      - name: checkout
        uses: actions/checkout@v6
        with:
          fetch-depth: 2  # OR "2" -> To retrieve the preceding commit.

      # Uses the tj-actions/changed-files action to check for changes.
      # The `files_yaml` input optionally takes a yaml string to specify filters,
      # and prepends the filter name to the standard output names.
      - name: Check task folders
        id: changed-tasks
        uses: tj-actions/changed-files@v47
        with:
          # tasks checks the tasks folder and api checks the api folder for changes
          files_yaml: |
            tasks:
              - lm_eval/tasks/**
            api:
              - lm_eval/api/**
          write_output_files: true

      # The next step is optional; the files are written to the workspace by default (above),
      # so it is only useful for debugging.
      - name: Run Tests
        if: steps.changed-tasks.outputs.tasks_any_modified == 'true' || steps.changed-tasks.outputs.api_any_modified == 'true'
        run: |
          echo .github/outputs/tasks_all_changed_and_modified_files.txt >> 'GITHUB_ENV'
          echo "One or more test file(s) has changed."
          echo "List of all the files that have changed: ${{ steps.changed-tasks.outputs.tasks_all_modified_files }}"

      - name: Install uv
        if: steps.changed-tasks.outputs.tasks_any_modified == 'true' || steps.changed-tasks.outputs.api_any_modified == 'true'
        uses: astral-sh/setup-uv@v7
        with:
          enable-cache: true
          python-version: "3.10"
          activate-environment: true
      - name: Install dependencies
        if: steps.changed-tasks.outputs.tasks_any_modified == 'true' || steps.changed-tasks.outputs.api_any_modified == 'true'
        run: |
          uv pip install -e '.[dev,ifeval,unitxt,math,longbench,hf]' --extra-index-url https://download.pytorch.org/whl/cpu
      - name: Test with pytest
        # if new tasks are added, run tests on them
        if: steps.changed-tasks.outputs.tasks_any_modified == 'true'
        run: pytest -x -s -vv tests/test_tasks.py
        # if api is modified, run tests on it
      - name: Test more tasks with pytest
        env:
          API: true
        if: steps.changed-tasks.outputs.api_any_modified == 'true'
        run: pytest -x -s -vv -n=auto tests/test_tasks.py


================================================
FILE: .github/workflows/publish.yml
================================================
name: Publish Python distribution to PyPI

on:
  push:
    tags:
      - '*'

jobs:
  build:
    name: Build distribution
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v6
    - name: Set up Python
      uses: actions/setup-python@v6
      with:
        python-version: "3.x"

    - name: Print version
      run: |
        # Extract version from pyproject.toml
        PYPROJECT_VERSION=$(grep 'version = ' pyproject.toml | head -1 | cut -d'"' -f2)
        echo "Version in pyproject.toml: $PYPROJECT_VERSION"

    - name: Install pypa/build
      run: >-
        python3 -m
        pip install
        build
        --user
    - name: Build a binary wheel and a source tarball
      run: python3 -m build
    - name: Store the distribution packages
      uses: actions/upload-artifact@v7
      with:
        name: python-package-distributions
        path: dist/

  publish-to-pypi:
    name: >-
      Publish Python distribution to PyPI
    if: startsWith(github.ref, 'refs/tags/')  # only publish to PyPI on tag pushes
    needs:
    - build
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/lm_eval
    permissions:
      id-token: write  # IMPORTANT: mandatory for trusted publishing

    steps:
    - name: Download all the dists
      uses: actions/download-artifact@v8
      with:
        name: python-package-distributions
        path: dist/
    - name: Publish distribution to PyPI
      uses: pypa/gh-action-pypi-publish@release/v1

  publish-to-testpypi:
    name: Publish Python distribution to TestPyPI
    needs:
    - build
    runs-on: ubuntu-latest

    environment:
      name: testpypi
      url: https://test.pypi.org/p/lm_eval

    permissions:
      id-token: write  # IMPORTANT: mandatory for trusted publishing

    steps:
    - name: Download all the dists
      uses: actions/download-artifact@v8
      with:
        name: python-package-distributions
        path: dist/
    - name: Publish distribution to TestPyPI
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        repository-url: https://test.pypi.org/legacy/


================================================
FILE: .github/workflows/unit_tests.yml
================================================
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
# just comment out unwanted steps to turn off the test.
name: Unit Tests

on:
  push:
    branches:
      - 'main'
  pull_request:
    branches:
      - 'main'
  workflow_dispatch:

env:
  TQDM_DISABLE: "1"
  HF_HUB_DISABLE_PROGRESS_BARS: "1"

# Jobs run concurrently and steps run sequentially within a job.
# jobs: linter and cpu_tests. Add more jobs/steps as required.
jobs:
  linter:
    name: Linters
    runs-on: ubuntu-latest
    timeout-minutes: 5

    steps:
      - name: Checkout Code
        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Install uv
        uses: astral-sh/setup-uv@v7
        with:
          enable-cache: true
          python-version: "3.10"
          activate-environment: true
      - name: Install pip
        run: uv pip install pip
      - name: Pre-Commit
        env:
          SKIP: "no-commit-to-branch,mypy"
        uses: pre-commit/action@v3.0.1
        with:
          extra_args: --from-ref ${{ github.event.pull_request.base.sha || 'HEAD~1' }} --to-ref HEAD
  # Job 2
  testcpu:
    name: CPU Tests
    runs-on: ubuntu-latest
    strategy:
      fail-fast: true
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    timeout-minutes: 30
    steps:
      - name: Checkout Code
        uses: actions/checkout@v6
      - name: Install uv
        uses: astral-sh/setup-uv@v7
        with:
          enable-cache: true
          python-version: ${{ matrix.python-version }}
          activate-environment: true

      # Cache HuggingFace cache directory for CPU tests
      - name: Cache HuggingFace cache (CPU tests)
        uses: actions/cache@v5
        id: cache-hf-cpu
        with:
          path: ~/.cache/huggingface
          key: ${{ runner.os }}-hf-cache-cpu
          restore-keys: |
            ${{ runner.os }}-hf-cache-cpu

      - name: Install dependencies
        run: |
          uv pip install -e '.[dev,unitxt,hf]' --extra-index-url https://download.pytorch.org/whl/cpu
          uv pip install hf_xet

      - name: Test with pytest
        run: pytest -x --showlocals -s -vv -n=auto --ignore=tests/models/test_openvino.py --ignore=tests/models/test_hf_steered.py --ignore=tests/scripts/test_zeno_visualize.py

      # Save test artifacts
      - name: Archive test artifacts
        if: always()  # Upload artifacts even if tests fail
        uses: actions/upload-artifact@v7
        with:
          name: output_testcpu${{ matrix.python-version }}
          path: |
            test_logs/*

#  testmodels:
#    name: External LM Tests
#    runs-on: ubuntu-latest
#    timeout-minutes: 30
#    steps:
#      - name: Checkout Code
#        uses: actions/checkout@v4
#      - name: Set up Python 3.9
#        uses: actions/setup-python@v5
#        with:
#          python-version: 3.9
#          cache: pip
#          cache-dependency-path: pyproject.toml
#
#      # Cache HuggingFace cache directory for External LM tests
#      - name: Cache HuggingFace cache (External LM tests)
#        uses: actions/cache@v3
#        id: cache-hf-lm
#        with:
#          path: ~/.cache/huggingface
#          key: ${{ runner.os }}-hf-cache-external-lm
#          restore-keys: |
#            ${{ runner.os }}-hf-cache-external-lm
#
#      - name: Install dependencies
#        run: |
#          python -m pip install --upgrade pip
#          pip install -e '.[dev,optimum,api]' --extra-index-url https://download.pytorch.org/whl/cpu
#          pip install -U transformers peft accelerate
#
#      - name: Test with pytest
#        run: python -m pytest tests/models --showlocals -s -vv
#        continue-on-error: true  # Continue workflow even if tests fail


================================================
FILE: .gitignore
================================================
# macOS system files
.DS_Store

# Virtual environments
.venv/
venv/
ENV/
env/
*.env

# Python bytecode and build artifacts
__pycache__/
*.py[cod]
*.so
*.egg-info/
build/
dist/

# IDE & editor settings
.vscode/
.idea/

# Jupyter
.ipynb_checkpoints/
profile_default/
ipython_config.py

# Output and data
output/
data/
temp/
test_logs/

# Caching
lm_eval/caching/.cache
lm_cache/

# Logging
*.log
logs/

# wandb experiment tracking
wandb/
examples/wandb/

# PyInstaller
*.spec

#uv
uv.lock


================================================
FILE: .pre-commit-config.yaml
================================================
# Ignore test linting to avoid conflicting changes to version stability.
exclude: ^tests/testdata/
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: check-added-large-files
      - id: check-ast
      - id: fix-byte-order-marker
      - id: check-case-conflict
      - id: check-json
      - id: check-merge-conflict
        args: [ --assume-in-merge ]
      - id: check-symlinks
      - id: check-yaml
        args: [ "--unsafe" ]
      - id: destroyed-symlinks
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: no-commit-to-branch
        always_run: false
      - id: requirements-txt-fixer
      - id: trailing-whitespace
        args: [ --markdown-linebreak-ext=md ]
      - id: fix-byte-order-marker
        exclude: docs/CNAME
      - id: mixed-line-ending
        args: [ --fix=lf ]
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.6
    hooks:
      # Run the linter.
      - id: ruff-check
        args: [ --fix ]
      # Run the formatter.
      - id: ruff-format
  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.2
    hooks:
      - id: codespell
        exclude: >
          (?x)^(

              .*\.json|ignore.txt|lm_eval/tasks/.*|.*yaml|.*\.ipynb
          )$

        args: [ --check-filenames, --check-hidden, --ignore-words=ignore.txt ]
  - repo: https://github.com/jackdewinter/pymarkdown
    rev: v0.9.36
    hooks:
      - id: pymarkdown
        exclude: ^(lm_eval/tasks/.*|docs/footguns\.md)$
        args: [ fix, -r ]


================================================
FILE: CITATION.bib
================================================
@misc{eval-harness,
  author       = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
  title        = {A framework for few-shot language model evaluation},
  month        = 12,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {v0.4.0},
  doi          = {10.5281/zenodo.10256836},
  url          = {https://zenodo.org/records/10256836}
}


================================================
FILE: CODEOWNERS
================================================
* @baberabb
* @0xSMT


================================================
FILE: LICENSE.md
================================================
MIT License

Copyright (c) 2020 EleutherAI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: MANIFEST.in
================================================
recursive-include tests


================================================
FILE: README.md
================================================
# Language Model Evaluation Harness

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10256836.svg)](https://doi.org/10.5281/zenodo.10256836)

---

## Latest News 📣
- [2025/12] **CLI refactored** with subcommands (`run`, `ls`, `validate`) and YAML config file support via `--config`. See the [CLI Reference](./docs/interface.md) and [Configuration Guide](./docs/config_files.md).
- [2025/12] **Lighter install**: Base package no longer includes `transformers`/`torch`. Install model backends separately: `pip install lm_eval[hf]`, `lm_eval[vllm]`, etc.
- [2025/07] Added `think_end_token` arg to `hf` (token/str), `vllm` and `sglang` (str) for stripping CoT reasoning traces from models that support it.
- [2025/03] Added support for steering HF models!
- [2025/02] Added [SGLang](https://docs.sglang.ai/) support!
- [2024/09] We are prototyping support for creating and evaluating text+image multimodal input, text output tasks in LM Evaluation Harness, and have added the `hf-multimodal` and `vllm-vlm` model types and the `mmmu` task as a prototype feature. We welcome users to try out this in-progress feature and stress-test it for themselves, and suggest they check out [`lmms-eval`](https://github.com/EvolvingLMMs-Lab/lmms-eval), a wonderful project originally forked from lm-evaluation-harness, for a broader range of multimodal tasks, models, and features.
- [2024/07] [API model](docs/API_guide.md) support has been updated and refactored, introducing support for batched and async requests, and making it significantly easier to customize and use for your own purposes. **To run Llama 405B, we recommend using vLLM's OpenAI-compliant API to host the model, and using the `local-completions` model type to evaluate the model.**
- [2024/07] New Open LLM Leaderboard tasks have been added! You can find them under the [leaderboard](lm_eval/tasks/leaderboard/README.md) task group.

---

## Announcement

**A new v0.4.0 release of lm-evaluation-harness is available**!

New updates and features include:

- **New Open LLM Leaderboard tasks have been added! You can find them under the [leaderboard](lm_eval/tasks/leaderboard/README.md) task group.**
- Internal refactoring
- Config-based task creation and configuration
- Easier import and sharing of externally-defined task config YAMLs
- Support for Jinja2 prompt design, easy modification of prompts + prompt imports from Promptsource
- More advanced configuration options, including output post-processing, answer extraction, and multiple LM generations per document, configurable fewshot settings, and more
- Speedups and new modeling libraries supported, including: faster data-parallel HF model usage, vLLM support, MPS support with HuggingFace, and more
- Logging and usability changes
- New tasks including CoT BIG-Bench-Hard, Belebele, user-defined task groupings, and more

Please see our updated documentation pages in `docs/` for more details.

Development will be continuing on the `main` branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub, or in the [EleutherAI discord](https://discord.gg/eleutherai)!

---

## Overview

This project provides a unified framework to test generative language models on a large number of different evaluation tasks.

**Features:**

- Over 60 standard academic benchmarks for LLMs, with hundreds of subtasks and variants implemented.
- Support for models loaded via [transformers](https://github.com/huggingface/transformers/) (including quantization via [GPTQModel](https://github.com/ModelCloud/GPTQModel) and [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)), [GPT-NeoX](https://github.com/EleutherAI/gpt-neox), and [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/), with a flexible tokenization-agnostic interface.
- Support for fast and memory-efficient inference with [vLLM](https://github.com/vllm-project/vllm).
- Support for commercial APIs including [OpenAI](https://openai.com) and [TextSynth](https://textsynth.com/).
- Support for evaluation on adapters (e.g. LoRA) supported in [HuggingFace's PEFT library](https://github.com/huggingface/peft).
- Support for local models and benchmarks.
- Evaluation with publicly available prompts ensures reproducibility and comparability between papers.
- Easy support for custom prompts and evaluation metrics.

The Language Model Evaluation Harness is the backend for 🤗 Hugging Face's popular [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), has been used in [hundreds of papers](https://scholar.google.com/scholar?oi=bibs&hl=en&authuser=2&cites=15052937328817631261,4097184744846514103,1520777361382155671,17476825572045927382,18443729326628441434,14801318227356878622,7890865700763267262,12854182577605049984,15641002901115500560,5104500764547628290), and is used internally by dozens of organizations including NVIDIA, Cohere, BigScience, BigCode, Nous Research, and Mosaic ML.

## Install

To install the `lm-eval` package from the GitHub repository, run:

```bash
git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```

### Installing Model Backends

The base installation provides the core evaluation framework. **Model backends must be installed separately** using optional extras:

For HuggingFace transformers models:

```bash
pip install "lm_eval[hf]"
```

For vLLM inference:

```bash
pip install "lm_eval[vllm]"
```

For API-based models (OpenAI, Anthropic, etc.):

```bash
pip install "lm_eval[api]"
```

Multiple backends can be installed together:

```bash
pip install "lm_eval[hf,vllm,api]"
```

A detailed table of all optional extras is available at the end of this document.

## Basic Usage

### Documentation

| Guide | Description |
|-------|-------------|
| [CLI Reference](./docs/interface.md) | Command-line arguments and subcommands |
| [Configuration Guide](./docs/config_files.md) | YAML config file format and examples |
| [Python API](./docs/python-api.md) | Programmatic usage with `simple_evaluate()` |
| [Task Guide](./lm_eval/tasks/README.md) | Available tasks and task configuration |

Use `lm-eval -h` to see available options, or `lm-eval run -h` for evaluation options.

List available tasks with:

```bash
lm-eval ls tasks
```

### Hugging Face `transformers`

> [!Important]
> To use the HuggingFace backend, first install: `pip install "lm_eval[hf]"`

To evaluate a model hosted on the [HuggingFace Hub](https://huggingface.co/models) (e.g. GPT-J-6B) on `hellaswag` you can use the following command (this assumes you are using a CUDA-compatible GPU):

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6B \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8
```

Additional arguments can be provided to the model constructor using the `--model_args` flag. Most notably, this supports the common practice of using the `revisions` feature on the Hub to store partially trained checkpoints, or to specify the datatype for running a model:

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
    --tasks lambada_openai,hellaswag \
    --device cuda:0 \
    --batch_size 8
```
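
As a rough illustration of the `--model_args` format, the sketch below splits a comma-separated `key=value` string into a dictionary. The helper name `parse_model_args` is hypothetical and is not claimed to be the harness's own parser; values are kept as plain strings, and surrounding double quotes are stripped as an assumption about how shell-quoted values might be handled.

```python
def parse_model_args(arg_string: str) -> dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dict (hypothetical helper, not the harness's parser)."""
    result: dict[str, str] = {}
    for pair in arg_string.split(","):
        if not pair.strip():
            continue  # tolerate empty input or stray commas
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        # Assumption: values stay strings; surrounding double quotes are dropped.
        result[key.strip()] = value.strip().strip('"')
    return result


args = parse_model_args(
    'pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float"'
)
print(args)
```

Each key in the resulting dictionary corresponds to one argument that would be forwarded to the model constructor.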

Models loaded via either `transformers.AutoModelForCausalLM` (autoregressive, decoder-only GPT-style models) or `transformers.AutoModelForSeq2SeqLM` (encoder-decoder models such as T5) in Hugging Face are supported.

Batch size selection can be automated by setting the `--batch_size` flag to `auto`. This automatically detects the largest batch size that will fit on your device. On tasks where there is a large difference between the longest and shortest example, it can be helpful to periodically recompute the largest batch size to gain a further speedup. To do this, append `:N` to the above flag to automatically recompute the largest batch size `N` times. For example, to recompute the batch size 4 times, the command would be:

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
    --tasks lambada_openai,hellaswag \
    --device cuda:0 \
    --batch_size auto:4
```
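
The three accepted shapes of the `--batch_size` value (`8`, `auto`, `auto:N`) can be summarized with a small sketch. The function `parse_batch_size` is a hypothetical helper written for illustration, not the harness's implementation; returning `None` when `:N` is omitted is an assumption made here to mark "no recompute count given".

```python
from typing import Optional


def parse_batch_size(value: str) -> tuple[bool, Optional[int]]:
    """Return (is_auto, number) for a --batch_size value (hypothetical helper).

    For a fixed size, number is the batch size itself.
    For 'auto:N', number is N, the count of recomputations.
    For plain 'auto', number is None (assumption: no count was given).
    """
    if value == "auto":
        return True, None
    if value.startswith("auto:"):
        return True, int(value.split(":", 1)[1])
    return False, int(value)


print(parse_batch_size("8"))
print(parse_batch_size("auto"))
print(parse_batch_size("auto:4"))
```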

> [!Note]
> Just like you can provide a local path to `transformers.AutoModel`, you can also provide a local path to `lm_eval` via `--model_args pretrained=/path/to/model`

#### Evaluating GGUF Models

`lm-eval` supports evaluating models in GGUF format using the Hugging Face (`hf`) backend. This allows you to use quantized models compatible with `transformers`, `AutoModel`, and llama.cpp conversions.

To evaluate a GGUF model, pass the path to the directory containing the model weights, the `gguf_file`, and optionally a separate `tokenizer` path using the `--model_args` flag.

**🚨 Important Note:**  
If no separate tokenizer is provided, Hugging Face will attempt to reconstruct the tokenizer from the GGUF file — this can take **hours** or even hang indefinitely. Passing a separate tokenizer avoids this issue and can reduce tokenizer loading time from hours to seconds.

**✅ Recommended usage:**

```bash
lm_eval --model hf \
    --model_args pretrained=/path/to/gguf_folder,gguf_file=model-name.gguf,tokenizer=/path/to/tokenizer \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8
```

> [!Tip]
> Ensure the tokenizer path points to a valid Hugging Face tokenizer directory (e.g., containing tokenizer_config.json, vocab.json, etc.).

#### Multi-GPU Evaluation with Hugging Face `accelerate`

We support three main ways of using Hugging Face's [accelerate 🚀](https://github.com/huggingface/accelerate) library for multi-GPU evaluation.

To perform *data-parallel evaluation* (where each GPU loads a **separate full copy** of the model), we leverage the `accelerate` launcher as follows:

```bash
accelerate launch -m lm_eval --model hf \
    --tasks lambada_openai,arc_easy \
    --batch_size 16
```

(or via `accelerate launch --no-python lm_eval`).

For cases where your model can fit on a single GPU, this allows you to evaluate on K GPUs K times faster than on one.
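
The speedup comes from each model copy handling only a fraction of the evaluation requests. The sketch below illustrates the idea with a round-robin split; the round-robin assignment and the helper name `shard_for_rank` are assumptions for illustration, not a description of the harness's internals.

```python
def shard_for_rank(requests, rank, world_size):
    """Return the subset of requests handled by one data-parallel rank."""
    # Every world_size-th request, starting at this rank's offset.
    return requests[rank::world_size]


requests = [f"doc-{i}" for i in range(10)]
world_size = 4  # K GPUs, one full model copy each
shards = [shard_for_rank(requests, rank, world_size) for rank in range(world_size)]

for rank, shard in enumerate(shards):
    print(rank, shard)
```

Each rank receives roughly `len(requests) / K` items, which is where the approximately K-fold speedup comes from.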

**WARNING**: This setup does not work with FSDP model sharding, so in `accelerate config` FSDP must be disabled, or the NO_SHARD FSDP option must be used.

The second way of using `accelerate` for multi-GPU evaluation is when your model is *too large to fit on a single GPU.*

In this setting, run the library *outside the `accelerate` launcher*, but passing `parallelize=True` to `--model_args` as follows:

```bash
lm_eval --model hf \
    --tasks lambada_openai,arc_easy \
    --model_args parallelize=True \
    --batch_size 16
```

This means that your model's weights will be split across all available GPUs.

For more advanced users or even larger models, we allow for the following arguments when `parallelize=True` as well:

- `device_map_option`: How to split model weights across available GPUs. Defaults to "auto".
- `max_memory_per_gpu`: The max GPU memory to use per GPU in loading the model.
- `max_cpu_memory`: The max amount of CPU memory to use when offloading the model weights to RAM.
- `offload_folder`: A folder where model weights will be offloaded to disk if needed.
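
As a rough sketch of how the memory limits above fit together, assume they end up in a per-device `max_memory` mapping of the kind that `from_pretrained` in `transformers` accepts (integer GPU indices plus a `"cpu"` key). `build_max_memory` is a hypothetical helper for illustration.

```python
def build_max_memory(num_gpus, max_memory_per_gpu=None, max_cpu_memory=None):
    """Build a device -> memory-limit mapping from the two limits."""
    max_memory = {}
    if max_memory_per_gpu is not None:
        # The same cap is applied to every visible GPU.
        for gpu_index in range(num_gpus):
            max_memory[gpu_index] = max_memory_per_gpu
    if max_cpu_memory is not None:
        # Cap on RAM used for weights offloaded from the GPUs.
        max_memory["cpu"] = max_cpu_memory
    return max_memory


print(build_max_memory(2, max_memory_per_gpu="20GiB", max_cpu_memory="64GiB"))
```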

The third option is to use both at the same time. This will allow you to take advantage of both data parallelism and model sharding, and is especially useful for models that are too large to fit on a single GPU.

```bash
accelerate launch --multi_gpu --num_processes {nb_of_copies_of_your_model} \
    -m lm_eval --model hf \
    --tasks lambada_openai,arc_easy \
    --model_args parallelize=True \
    --batch_size 16
```
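
When combining both modes, it helps to work out how many GPUs each model copy gets. The sketch below assumes the visible GPUs are divided evenly among the copies started with `--num_processes`; `gpus_per_replica` is a hypothetical helper and the even split is an assumption of this sketch.

```python
def gpus_per_replica(total_gpus: int, num_processes: int) -> int:
    """Number of GPUs each model copy is sharded across."""
    if num_processes < 1 or total_gpus % num_processes != 0:
        raise ValueError("total_gpus must be a positive multiple of num_processes")
    return total_gpus // num_processes


# 8 GPUs and 2 copies of the model: each copy is split across 4 GPUs.
print(gpus_per_replica(total_gpus=8, num_processes=2))
```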

To learn more about model parallelism and how to use it with the `accelerate` library, see the [Hugging Face model parallelism documentation](https://huggingface.co/docs/transformers/v4.15.0/en/parallelism).

**Warning: We do not natively support multi-node evaluation using the `hf` model type! Please reference [our GPT-NeoX library integration](https://github.com/EleutherAI/gpt-neox/blob/main/eval.py) for an example of code in which a custom multi-machine evaluation script is written.**

**Note: we do not currently support multi-node evaluations natively, and advise using either an externally hosted server to run inference requests against, or creating a custom integration with your distributed framework [as is done for the GPT-NeoX library](https://github.com/EleutherAI/gpt-neox/blob/main/eval_tasks/eval_adapter.py).**

### Steered Hugging Face `transformers` models

To evaluate a Hugging Face `transformers` model with steering vectors applied, specify the model type as `steered` and provide the path to either a PyTorch file containing pre-defined steering vectors, or a CSV file that specifies how to derive steering vectors from pretrained `sparsify` or `sae_lens` models (you will need to install the corresponding optional dependency for this method).

Specify pre-defined steering vectors:

```python
import torch

steer_config = {
    "layers.3": {
        "steering_vector": torch.randn(1, 768),
        "bias": torch.randn(1, 768),
        "steering_coefficient": 1,
        "action": "add"
    },
}
torch.save(steer_config, "steer_config.pt")
```

Specify derived steering vectors:

```python
import pandas as pd

pd.DataFrame({
    "loader": ["sparsify"],
    "action": ["add"],
    "sparse_model": ["EleutherAI/sae-pythia-70m-32k"],
    "hookpoint": ["layers.3"],
    "feature_index": [30],
    "steering_coefficient": [10.0],
}).to_csv("steer_config.csv", index=False)
```

Run the evaluation harness with steering vectors applied:

```bash
lm_eval --model steered \
    --model_args pretrained=EleutherAI/pythia-160m,steer_path=steer_config.pt \
    --tasks lambada_openai,hellaswag \
    --device cuda:0 \
    --batch_size 8
```
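
As a rough illustration of what an `add` action amounts to, the sketch below assumes it adds the scaled steering vector to the activations at the hookpoint. Plain Python lists stand in for tensors, the `bias` entry is ignored, and `apply_add_steering` is a hypothetical name, so treat this as a conceptual sketch rather than the `steered` model's implementation.

```python
def apply_add_steering(hidden, steering_vector, steering_coefficient):
    """Return hidden + steering_coefficient * steering_vector, elementwise."""
    return [h + steering_coefficient * v for h, v in zip(hidden, steering_vector)]


hidden = [1.0, 2.0, 3.0]            # activations at the hookpoint
steering_vector = [0.5, 0.0, -1.0]  # direction to steer along
print(apply_add_steering(hidden, steering_vector, steering_coefficient=2.0))
```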

### NVIDIA `nemo` models

[NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo) is a generative AI framework built for researchers and PyTorch developers working on language models.

To evaluate a `nemo` model, start by installing NeMo following [the documentation](https://github.com/NVIDIA/NeMo?tab=readme-ov-file#installation). We highly recommend using the NVIDIA PyTorch or NeMo container, especially if you have issues installing Apex or any other dependencies (see [latest released containers](https://github.com/NVIDIA/NeMo/releases)). Please also install the lm evaluation harness library following the instructions in [the Install section](https://github.com/EleutherAI/lm-evaluation-harness/tree/main?tab=readme-ov-file#install).

NeMo models can be obtained through [NVIDIA NGC Catalog](https://catalog.ngc.nvidia.com/models) or in [NVIDIA's Hugging Face page](https://huggingface.co/nvidia). In [NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo/tree/main/scripts/nlp_language_modeling) there are conversion scripts to convert the `hf` checkpoints of popular models like llama, falcon, mixtral or mpt to `nemo`.

Run a `nemo` model on one GPU:

```bash
lm_eval --model nemo_lm \
    --model_args path=<path_to_nemo_model> \
    --tasks hellaswag \
    --batch_size 32
```

We recommend unpacking the `nemo` model beforehand so that it is not unpacked inside the Docker container, where it may overflow disk space. To do so, run:

```bash
mkdir MY_MODEL
tar -xvf MY_MODEL.nemo -C MY_MODEL
```

#### Multi-GPU evaluation with NVIDIA `nemo` models

By default, only one GPU is used. But we do support either data replication or tensor/pipeline parallelism during evaluation, on one node.

1) To enable data replication, set `devices` in `model_args` to the number of data replicas to run. For example, the command to run 8 data replicas over 8 GPUs is:

```bash
torchrun --nproc-per-node=8 --no-python lm_eval \
    --model nemo_lm \
    --model_args path=<path_to_nemo_model>,devices=8 \
    --tasks hellaswag \
    --batch_size 32
```

2) To enable tensor and/or pipeline parallelism, set `tensor_model_parallel_size` and/or `pipeline_model_parallel_size` in `model_args`. In addition, `devices` must equal the product of `tensor_model_parallel_size` and `pipeline_model_parallel_size`. For example, the command to use one node of 4 GPUs with tensor parallelism of 2 and pipeline parallelism of 2 is:

```bash
torchrun --nproc-per-node=4 --no-python lm_eval \
    --model nemo_lm \
    --model_args path=<path_to_nemo_model>,devices=4,tensor_model_parallel_size=2,pipeline_model_parallel_size=2 \
    --tasks hellaswag \
    --batch_size 32
```
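
The relationship between `devices` and the two parallelism sizes can be checked before launching. The following is a small sketch of that arithmetic; `required_devices` and `check_devices` are hypothetical helpers, not part of the `nemo_lm` backend.

```python
def required_devices(tensor_model_parallel_size=1, pipeline_model_parallel_size=1):
    """devices must equal the product of the two parallelism sizes."""
    return tensor_model_parallel_size * pipeline_model_parallel_size


def check_devices(devices, tensor_model_parallel_size=1, pipeline_model_parallel_size=1):
    """Return True if `devices` is consistent with the parallelism settings."""
    expected = required_devices(tensor_model_parallel_size, pipeline_model_parallel_size)
    return devices == expected


# Matches the example above: TP=2 and PP=2 on one node of 4 GPUs.
print(required_devices(2, 2))
print(check_devices(4, 2, 2))
```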

Note that we recommend replacing the `python` command with `torchrun --nproc-per-node=<number of devices> --no-python` to facilitate loading the model onto the GPUs. This is especially important for large checkpoints loaded onto multiple GPUs.

Not supported yet: multi-node evaluation and combinations of data replication with tensor or pipeline parallelism.

### Megatron-LM models

[Megatron-LM](https://github.com/NVIDIA/Megatron-LM) is NVIDIA's large-scale transformer training framework. This backend allows direct evaluation of Megatron-LM checkpoints without conversion.

**Requirements:**
- Megatron-LM must be installed or accessible via `MEGATRON_PATH` environment variable
- PyTorch with CUDA support

**Setup:**

```bash
# Set environment variable pointing to Megatron-LM installation
export MEGATRON_PATH=/path/to/Megatron-LM
```

**Basic usage (single GPU):**

```bash
lm_eval --model megatron_lm \
    --model_args load=/path/to/checkpoint,tokenizer_type=HuggingFaceTokenizer,tokenizer_model=/path/to/tokenizer \
    --tasks hellaswag \
    --batch_size 1
```

**Supported checkpoint formats:**
- Standard Megatron checkpoints (`model_optim_rng.pt`)
- Distributed checkpoints (`.distcp` format, auto-detected)

#### Parallelism Modes

The Megatron-LM backend supports the following parallelism modes:

| Mode | Configuration | Description |
|------|---------------|-------------|
| Single GPU | `devices=1` (default) | Standard single GPU evaluation |
| Data Parallelism | `devices>1, TP=1` | Each GPU has a full model replica, data is distributed |
| Tensor Parallelism | `TP == devices` | Model layers are split across GPUs |
| Expert Parallelism | `EP == devices, TP=1` | For MoE models, experts are distributed across GPUs |

> [!Note]
> - Pipeline Parallelism (PP > 1) is not currently supported.
> - Expert Parallelism (EP) cannot be combined with Tensor Parallelism (TP).

**Data Parallelism (4 GPUs, each with full model replica):**

```bash
torchrun --nproc-per-node=4 -m lm_eval --model megatron_lm \
    --model_args load=/path/to/checkpoint,tokenizer_model=/path/to/tokenizer,devices=4 \
    --tasks hellaswag
```

**Tensor Parallelism (TP=2):**

```bash
torchrun --nproc-per-node=2 -m lm_eval --model megatron_lm \
    --model_args load=/path/to/checkpoint,tokenizer_model=/path/to/tokenizer,devices=2,tensor_model_parallel_size=2 \
    --tasks hellaswag
```

**Expert Parallelism for MoE models (EP=4):**

```bash
torchrun --nproc-per-node=4 -m lm_eval --model megatron_lm \
    --model_args load=/path/to/moe_checkpoint,tokenizer_model=/path/to/tokenizer,devices=4,expert_model_parallel_size=4 \
    --tasks hellaswag
```
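
The parallelism table above can be read as a small decision procedure. The sketch below encodes it in a hypothetical function, `parallelism_mode`; combinations that the table does not list are rejected here as an assumption of this sketch, not as a statement about the backend's actual validation.

```python
def parallelism_mode(devices=1, tp=1, ep=1, pp=1):
    """Classify a Megatron-LM configuration according to the table above."""
    if pp > 1:
        raise ValueError("Pipeline Parallelism (PP > 1) is not currently supported")
    if tp > 1 and ep > 1:
        raise ValueError("Expert Parallelism cannot be combined with Tensor Parallelism")
    if devices == 1 and tp == 1 and ep == 1:
        return "single_gpu"
    if ep > 1 and ep == devices:
        return "expert_parallel"
    if tp > 1 and tp == devices:
        return "tensor_parallel"
    if devices > 1 and tp == 1 and ep == 1:
        return "data_parallel"
    raise ValueError("combination not listed in the table")


print(parallelism_mode(devices=4))        # data parallelism example
print(parallelism_mode(devices=2, tp=2))  # tensor parallelism example
print(parallelism_mode(devices=4, ep=4))  # expert parallelism example
```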

**Using extra_args for additional Megatron options:**

```bash
lm_eval --model megatron_lm \
    --model_args load=/path/to/checkpoint,tokenizer_model=/path/to/tokenizer,extra_args="--no-rope-fusion --trust-remote-code" \
    --tasks hellaswag
```

> [!Note]
> The `--use-checkpoint-args` flag is enabled by default, which loads model architecture parameters from the checkpoint. For checkpoints converted via Megatron-Bridge, this typically includes all necessary model configuration.

#### Multi-GPU evaluation with OpenVINO models

Pipeline parallelism during evaluation is supported with OpenVINO models.

To enable pipeline parallelism, set `pipeline_parallel` in `model_args`. In addition, set `--device` to `HETERO:<GPU index1>,<GPU index2>`, for example `HETERO:GPU.1,GPU.0`. The command to use pipeline parallelism of 2 is:

```bash
lm_eval --model openvino \
    --tasks wikitext \
    --model_args pretrained=<path_to_ov_model>,pipeline_parallel=True \
    --device HETERO:GPU.1,GPU.0
```
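
The `HETERO:` device string follows a regular pattern, so it can be generated from a list of GPU indices. `hetero_device` below is a hypothetical helper shown only to make the format explicit.

```python
def hetero_device(gpu_indices):
    """Build a device string such as HETERO:GPU.1,GPU.0 from GPU indices."""
    if len(gpu_indices) < 2:
        raise ValueError("pipeline parallelism needs at least two GPUs")
    # The devices appear in the string in the order given.
    return "HETERO:" + ",".join(f"GPU.{index}" for index in gpu_indices)


print(hetero_device([1, 0]))
```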

### Tensor + Data Parallel and Optimized Inference with `vLLM`

We also support vLLM for faster inference on [supported model types](https://docs.vllm.ai/en/latest/models/supported_models.html). The speedup is especially noticeable when splitting a model across multiple GPUs. vLLM can be used for single-GPU or multi-GPU inference (tensor parallel, data parallel, or a combination of both), for example:

```bash
lm_eval --model vllm \
    --model_args pretrained={model_name},tensor_parallel_size={GPUs_per_model},dtype=auto,gpu_memory_utilization=0.8,data_parallel_size={model_replicas} \
    --tasks lambada_openai \
    --batch_size auto
```
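
The placeholders in the command above can be filled in programmatically. The sketch below does so and also reports the total GPU count, assuming each of the `data_parallel_size` replicas is split across `tensor_parallel_size` GPUs; `fill_vllm_args` and the model name used are illustrative choices, not part of the harness.

```python
TEMPLATE = (
    "pretrained={model_name},tensor_parallel_size={GPUs_per_model},"
    "dtype=auto,gpu_memory_utilization=0.8,data_parallel_size={model_replicas}"
)


def fill_vllm_args(model_name, gpus_per_model, model_replicas):
    """Return the filled-in --model_args string and the total GPUs it implies."""
    model_args = TEMPLATE.format(
        model_name=model_name,
        GPUs_per_model=gpus_per_model,
        model_replicas=model_replicas,
    )
    # Each replica is sharded across gpus_per_model GPUs.
    total_gpus = gpus_per_model * model_replicas
    return model_args, total_gpus


model_args, total_gpus = fill_vllm_args("EleutherAI/pythia-160m", 2, 4)
print(model_args)
print(total_gpus)
```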

To use vLLM, run `pip install "lm_eval[vllm]"`. For a full list of supported vLLM configurations, please reference our [vLLM integration](https://github.com/EleutherAI/lm-evaluation-harness/blob/e74ec966556253fbe3d8ecba9de675c77c075bce/lm_eval/models/vllm_causallms.py) and the vLLM documentation.

vLLM occasionally differs in output from Hugging Face. We treat Hugging Face as the reference implementation and provide a [script](./scripts/model_comparator.py) for checking the validity of vLLM results against HF.

> [!Tip]
> For fastest performance, we recommend using `--batch_size auto` for vLLM whenever possible, to leverage its continuous batching functionality!

> [!Tip]
> Passing `max_model_len=4096` or some other reasonable default to vLLM through model args may cause speedups or prevent out-of-memory errors when trying to use auto batch size, such as for Mistral-7B-v0.1 which defaults to a maximum length of 32k.

### Tensor + Data Parallel and Fast Offline Batching Inference with `SGLang`

We support SGLang for efficient offline batch inference. Its **[Fast Backend Runtime](https://docs.sglang.ai/index.html)** delivers high performance through optimized memory management and parallel processing techniques. Key features include tensor parallelism, continuous batching, and support for various quantization methods (FP8/INT4/AWQ/GPTQ).

To use SGLang as the evaluation backend, please **install it in advance** by following the [SGLang installation documentation](https://docs.sglang.io/get_started/install.html#install-sglang).

> [!Tip]
> Because of how [`Flashinfer`](https://docs.flashinfer.ai/) (a fast attention kernel library) is installed, we don't include the dependencies of `SGLang` in [pyproject.toml](pyproject.toml). Note that `Flashinfer` also has some requirements on the `torch` version.

SGLang's server arguments are slightly different from other backends, see [here](https://docs.sglang.io/advanced_features/server_arguments.html) for more information. We provide an example of the usage here:

```bash
lm_eval --model sglang \
    --model_args pretrained={model_name},dp_size={data_parallel_size},tp_size={tensor_parallel_size},dtype=auto \
    --tasks gsm8k_cot \
    --batch_size auto
```

> [!Tip]
> When encountering out-of-memory (OOM) errors (especially for multiple-choice tasks), try these solutions:
>
> 1. Use a manual `batch_size`, rather than `auto`.
> 2. Lower KV cache pool memory usage by adjusting `mem_fraction_static` - Add to your model arguments for example `--model_args pretrained=...,mem_fraction_static=0.7`.
> 3. Increase tensor parallel size `tp_size` (if using multiple GPUs).

### Windows ML

We support **Windows ML** for hardware-accelerated inference on Windows platforms. This enables evaluation on CPU, GPU, and **NPU (Neural Processing Unit)** devices.

For an overview, see [What is Windows ML?](https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/overview)

To use Windows ML, install the required dependencies:

```bash
pip install wasdk-Microsoft.Windows.AI.MachineLearning[all] wasdk-Microsoft.Windows.ApplicationModel.DynamicDependency.Bootstrap onnxruntime-windowsml onnxruntime-genai-winml
```

Evaluate an ONNX Runtime GenAI LLM on NPU/GPU/CPU on Windows:

```bash
lm_eval --model winml \
    --model_args pretrained=/path/to/onnx/model \
    --tasks mmlu \
    --batch_size 1
```

> [!Note]
> The Windows ML backend is ONLY for ONNX Runtime GenAI model format. Models targeting `transformers.js` won't work. You can verify this by finding the `genai_config.json` file in the model folder.

> [!Note]
> To run an ONNX Runtime GenAI model on the target device, you MUST convert the original model for that vendor and device type. Converted models will not work, or will not work well, on other vendor or device types. To learn more about model conversion, please visit [Microsoft AI Toolkit](https://code.visualstudio.com/docs/intelligentapps/modelconversion).

### Model APIs and Inference Servers

> [!Important]
> To use API-based models, first install: `pip install "lm_eval[api]"`

Our library also supports the evaluation of models served via several commercial APIs, and we hope to implement support for the most commonly used performant local/self-hosted inference servers.

To call a hosted model, use:

```bash
export OPENAI_API_KEY=YOUR_KEY_HERE
lm_eval --model openai-completions \
    --model_args model=davinci-002 \
    --tasks lambada_openai,hellaswag
```

We also support using your own local inference server with servers that mirror the OpenAI Completions and ChatCompletions APIs.

```bash
lm_eval --model local-completions --tasks gsm8k --model_args model=facebook/opt-125m,base_url=http://{yourip}:8000/v1/completions,num_concurrent=1,max_retries=3,tokenized_requests=False,batch_size=16
```
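
For orientation, a server that mirrors the OpenAI Completions API is expected to accept a JSON POST body with fields such as `model`, `prompt`, and `max_tokens`. The sketch below only constructs such a request without sending it; the localhost URL and the helper name `build_completion_request` are assumptions for illustration, and the harness builds its own requests internally.

```python
import json
from urllib.request import Request


def build_completion_request(base_url, model, prompt, max_tokens=16):
    """Construct (but do not send) an OpenAI-style completions request."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_completion_request(
    "http://localhost:8000/v1/completions",  # assumed local server address
    "facebook/opt-125m",
    "The capital of France is",
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```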

Note that for externally hosted models, configs such as `--device` which relate to where to place a local model should not be used and do not function. Just like you can use `--model_args` to pass arbitrary arguments to the model constructor for local models, you can use it to pass arbitrary arguments to the model API for hosted models. See the documentation of the hosting service for information on what arguments they support.

| API or Inference Server                                                                                                   | Implemented?                                                                                            | `--model <xxx>` name                                | Models supported:                                                                                                                                                                                                                                                                                                                                          | Request Types:                                                                 |
|---------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|-----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| OpenAI Completions                                                                                                        | :heavy_check_mark:                                                                                      | `openai-completions`, `local-completions`           | All OpenAI Completions API models                                                                                                                                                                                                                                                                                                                          | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| OpenAI ChatCompletions                                                                                                    | :heavy_check_mark:                                                                                      | `openai-chat-completions`, `local-chat-completions` | [All ChatCompletions API models](https://platform.openai.com/docs/guides/gpt)                                                                                                                                                                                                                                                                              | `generate_until` (no logprobs)                                                 |
| Anthropic                                                                                                                 | :heavy_check_mark:                                                                                      | `anthropic`                                         | [Supported Anthropic Engines](https://docs.anthropic.com/claude/reference/selecting-a-model)                                                                                                                                                                                                                                                               | `generate_until` (no logprobs)                                                 |
| Anthropic Chat                                                                                                            | :heavy_check_mark:                                                                                      | `anthropic-chat`, `anthropic-chat-completions`      | [Supported Anthropic Engines](https://docs.anthropic.com/claude/docs/models-overview)                                                                                                                                                                                                                                                                      | `generate_until` (no logprobs)                                                 |
| Textsynth                                                                                                                 | :heavy_check_mark:                                                                                      | `textsynth`                                         | [All supported engines](https://textsynth.com/documentation.html#engines)                                                                                                                                                                                                                                                                                  | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| Cohere                                                                                                                    | [:hourglass: - blocked on Cohere API bug](https://github.com/EleutherAI/lm-evaluation-harness/pull/395) | N/A                                                 | [All `cohere.generate()` engines](https://docs.cohere.com/docs/models)                                                                                                                                                                                                                                                                                     | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| [Llama.cpp](https://github.com/ggerganov/llama.cpp) (via [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)) | :heavy_check_mark:                                                                                      | `gguf`, `ggml`                                      | [All models supported by llama.cpp](https://github.com/ggerganov/llama.cpp)                                                                                                                                                                                                                                                                                | `generate_until`, `loglikelihood`, (perplexity evaluation not yet implemented) |
| vLLM                                                                                                                      | :heavy_check_mark:                                                                                      | `vllm`                                              | [Most HF Causal Language Models](https://docs.vllm.ai/en/latest/models/supported_models.html)                                                                                                                                                                                                                                                              | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| Mamba                                                                                                                     | :heavy_check_mark:                                                                                      | `mamba_ssm`                                         | [Mamba architecture Language Models via the `mamba_ssm` package](https://huggingface.co/state-spaces)                                                                                                                                                                                                                                                      | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| Huggingface Optimum (Causal LMs)                                                                                          | :heavy_check_mark:                                                                                      | `openvino`                                          | Any decoder-only AutoModelForCausalLM converted with Huggingface Optimum into OpenVINO™ Intermediate Representation (IR) format                                                                                                                                                                                                                            | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| Huggingface Optimum-intel IPEX (Causal LMs)                                                                               | :heavy_check_mark:                                                                                      | `ipex`                                              | Any decoder-only AutoModelForCausalLM                                                                                                                                                                                                                                                                                                                      | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| Huggingface Optimum-habana (Causal LMs)                                                                               | :heavy_check_mark:                                                                                      | `habana`                                              | Any decoder-only AutoModelForCausalLM                                                                                                                                                                                                                                                                                                                      | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| Neuron via AWS Inf2 (Causal LMs)                                                                                          | :heavy_check_mark:                                                                                      | `neuronx`                                           | Any decoder-only AutoModelForCausalLM supported to run on [huggingface-ami image for inferentia2](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2)                                                                                                                                                                                            | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| NVIDIA NeMo                                                                                                               | :heavy_check_mark:                                                                                      | `nemo_lm`                                           | [All supported models](https://docs.nvidia.com/nemo-framework/user-guide/24.09/nemotoolkit/core/core.html#nemo-models)                                                                                                                                                                                                                                     | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| NVIDIA Megatron-LM                                                                                                        | :heavy_check_mark:                                                                                      | `megatron_lm`                                       | [Megatron-LM GPT models](https://github.com/NVIDIA/Megatron-LM) (standard and distributed checkpoints)                                                                                                                                                                                                                                                     | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| Watsonx.ai                                                                                                                | :heavy_check_mark:                                                                                      | `watsonx_llm`                                       | [Supported Watsonx.ai Engines](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models.html?context=wx)                                                                                                                                                                                                                                 | `generate_until`, `loglikelihood`                                              |
| Windows ML                                                                                           | :heavy_check_mark:                                                                                      | `winml`                                             | [ONNX models in GenAI format](https://code.visualstudio.com/docs/intelligentapps/modelconversion)                                                                                                                                                                                                                                                                                                                                 | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |
| [Your local inference server!](docs/API_guide.md)                                                                         | :heavy_check_mark:                                                                                      | `local-completions` or `local-chat-completions`     | Support for OpenAI API-compatible servers, with easy customization for other APIs.                                                                                                                                                                                                                                                                         | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                     |

Models which do not supply logits or logprobs can be used with tasks of type `generate_until` only, while local models, or APIs that supply logprobs/logits of their prompts, can be run on all task types: `generate_until`, `loglikelihood`, `loglikelihood_rolling`, and `multiple_choice`.
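
To see why log-probabilities matter, consider how a multiple-choice item can be scored: each answer choice is scored by the log-likelihood the model assigns to it, and the highest-scoring choice is taken as the prediction. The sketch below shows this with made-up numbers; `pick_choice` is a hypothetical helper, and it leaves out refinements such as length normalization.

```python
def pick_choice(loglikelihoods):
    """Return the index of the choice with the highest log-likelihood."""
    return max(range(len(loglikelihoods)), key=loglikelihoods.__getitem__)


# Made-up log-likelihoods for four answer choices (A, B, C, D).
choice_loglikelihoods = [-4.2, -1.3, -3.8, -2.9]
predicted = pick_choice(choice_loglikelihoods)
gold = 1  # index of the correct answer in this toy example
print(predicted, predicted == gold)
```

A model that only returns generated text cannot provide these per-choice scores, which is why it is limited to `generate_until` tasks.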

For more information on the different task `output_types` and model request types, see [our documentation](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/model_guide.md#interface).

> [!Note]
> For best performance with closed chat model APIs such as Anthropic Claude 3 and GPT-4, we recommend carefully looking at a few sample outputs using `--limit 10` first to confirm that answer extraction and scoring on generative tasks are performing as expected. Providing `system="<some system prompt here>"` within `--model_args` for `anthropic-chat-completions`, to instruct the model what format to respond in, may be useful.

### Other Frameworks

A number of other libraries contain scripts for calling the eval harness through their library. These include [GPT-NeoX](https://github.com/EleutherAI/gpt-neox/blob/main/eval_tasks/eval_adapter.py), [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples/MoE/readme_evalharness.md), and [mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/blob/master/eval_harness.py).

To create your own custom integration you can follow instructions from [this tutorial](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md#external-library-usage).

### Additional Features

> [!Note]
> For tasks unsuitable for direct evaluation (either due to risks associated with executing untrusted code or complexities in the evaluation process), the `--predict_only` flag is available to obtain decoded generations for post-hoc evaluation.

If you have a Metal compatible Mac, you can run the eval harness using the MPS back-end by replacing `--device cuda:0` with `--device mps` (requires PyTorch version 2.1 or higher). **Note that the PyTorch MPS backend is still in early stages of development, so correctness issues or unsupported operations may exist. If you observe oddities in model performance on the MPS back-end, we recommend first checking that a forward pass of your model on `--device cpu` and `--device mps` match.**

> [!Note]
> You can inspect what the LM inputs look like by running the following command:
>
> ```bash
> python write_out.py \
>     --tasks <task1,task2,...> \
>     --num_fewshot 5 \
>     --num_examples 10 \
>     --output_base_path /path/to/output/folder
> ```
>
> This will write out one text file for each task.

To verify the data integrity of the tasks you're performing in addition to running the tasks themselves, you can use the `--check_integrity` flag:

```bash
lm_eval --model openai-completions \
    --model_args model=davinci-002 \
    --tasks lambada_openai,hellaswag \
    --check_integrity
```

## Advanced Usage Tips

For models loaded with the HuggingFace `transformers` library, any arguments provided via `--model_args` get passed to the relevant constructor directly. This means that anything you can do with `AutoModel` can be done with our library. For example, you can pass a local path via `pretrained=`, or evaluate models finetuned with [PEFT](https://github.com/huggingface/peft) by taking the call you would run to evaluate the base model and adding `,peft=PATH` to the `model_args` argument:

```bash
lm_eval --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6b,parallelize=True,load_in_4bit=True,peft=nomic-ai/gpt4all-j-lora \
    --tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq \
    --device cuda:0
```
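
As a rough illustration of what "passed to the constructor" means, the sketch below turns a comma-separated `key=value` string into keyword arguments. The function names `parse_args_string` and `_coerce` are hypothetical and the type-coercion rules are assumptions; the harness's own parser may behave differently.

```python
def _coerce(value: str):
    """Best-effort conversion of a string to bool, None, int, or float."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    if lowered == "none":
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # leave anything else (paths, model names) as a string


def parse_args_string(args_string: str) -> dict:
    """Turn 'a=1,b=True,c=name' into {'a': 1, 'b': True, 'c': 'name'}."""
    parsed = {}
    for pair in args_string.split(","):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing comma
        key, _, value = pair.partition("=")
        parsed[key.strip()] = _coerce(value.strip())
    return parsed


kwargs = parse_args_string(
    "pretrained=EleutherAI/gpt-j-6b,parallelize=True,load_in_4bit=True,peft=nomic-ai/gpt4all-j-lora"
)
print(kwargs)
```

Each resulting key then plays the role of a keyword argument to the model constructor, which is why any option accepted there can be set from the command line.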

Models provided as delta weights can be easily loaded using the Hugging Face `transformers` library. Within `--model_args`, set the `delta` argument to specify the delta weights, and use the `pretrained` argument to designate the base model to which they will be applied:

```bash
lm_eval --model hf \
    --model_args pretrained=Ejafa/llama_7B,delta=lmsys/vicuna-7b-delta-v1.1 \
    --tasks hellaswag
```

GPTQ quantized models can be loaded using [GPTQModel](https://github.com/ModelCloud/GPTQModel) (faster) or [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).

GPTQModel: add `,gptqmodel=True` to `model_args`:

```bash
lm_eval --model hf \
    --model_args pretrained=model-name-or-path,gptqmodel=True \
    --tasks hellaswag
```

AutoGPTQ: add `,autogptq=True` to `model_args` (the example below instead passes the weights file name, `autogptq=model.safetensors`):

```bash
lm_eval --model hf \
    --model_args pretrained=model-name-or-path,autogptq=model.safetensors,gptq_use_triton=True \
    --tasks hellaswag
```

We support wildcards in task names. For example, you can run all of the machine-translated lambada tasks via `--tasks lambada_openai_mt_*`.
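
To make the wildcard behaviour concrete, here is a minimal sketch of glob-style matching using Python's standard `fnmatch` module. The helper `expand_task_patterns` and the short task list are illustrative assumptions, not the harness's own implementation or its full task registry.

```python
import fnmatch


def expand_task_patterns(patterns, available_tasks):
    """Return every available task name matched by at least one pattern."""
    matched = set()
    for pattern in patterns:
        matched.update(fnmatch.filter(available_tasks, pattern))
    return sorted(matched)


# Illustrative subset of task names
available = [
    "hellaswag",
    "lambada_openai",
    "lambada_openai_mt_de",
    "lambada_openai_mt_fr",
]

selected = expand_task_patterns(["lambada_openai_mt_*"], available)
print(selected)  # ['lambada_openai_mt_de', 'lambada_openai_mt_fr']
```

On the command line, consider quoting the pattern so that your shell does not try to expand the `*` itself.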

## Saving & Caching Results

To save evaluation results provide an `--output_path`. We also support logging model responses with the `--log_samples` flag for post-hoc analysis.

> [!TIP]
> Use `--use_cache <DIR>` to cache evaluation results and skip previously evaluated samples when resuming runs of the same (model, task) pairs. Note that caching is rank-dependent, so restart with the same GPU count if interrupted. You can also use `--cache_requests` to save dataset preprocessing steps for faster evaluation resumption.

To push results and samples to the Hugging Face Hub, first ensure an access token with write access is set in the `HF_TOKEN` environment variable. Then, use the `--hf_hub_log_args` flag to specify the organization, repository name, repository visibility, and whether to push results and samples to the Hub ([example dataset on the HF Hub](https://huggingface.co/datasets/KonradSzafer/lm-eval-results-demo)). For instance:

```bash
lm_eval --model hf \
    --model_args pretrained=model-name-or-path,autogptq=model.safetensors,gptq_use_triton=True \
    --tasks hellaswag \
    --log_samples \
    --output_path results \
    --hf_hub_log_args hub_results_org=EleutherAI,hub_repo_name=lm-eval-results,push_results_to_hub=True,push_samples_to_hub=True,public_repo=False
```

This allows you to easily download the results and samples from the Hub, using:

```python
from datasets import load_dataset

load_dataset("EleutherAI/lm-eval-results-private", "hellaswag", "latest")
```

For a full list of supported arguments, check out the [interface](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md) guide in our documentation!

## Visualizing Results

You can seamlessly visualize and analyze the results of your evaluation harness runs using both Weights & Biases (W&B) and Zeno.

### Zeno

You can use [Zeno](https://zenoml.com) to visualize the results of your eval harness runs.

First, head to [hub.zenoml.com](https://hub.zenoml.com) to create an account and get an API key [on your account page](https://hub.zenoml.com/account).
Add this key as an environment variable:

```bash
export ZENO_API_KEY=[your api key]
```

You'll also need to install the `lm_eval[zeno]` package extra.

To visualize the results, run the eval harness with the `--log_samples` and `--output_path` flags.
We expect `output_path` to contain multiple folders that represent individual model names.
You can thus run your evaluation on any number of tasks and models and upload all of the results as projects on Zeno.

```bash
lm_eval \
    --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6B \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8 \
    --log_samples \
    --output_path output/gpt-j-6B
```

Then, you can upload the resulting data using the `zeno_visualize` script:

```bash
python scripts/zeno_visualize.py \
    --data_path output \
    --project_name "Eleuther Project"
```

This will use all subfolders in `data_path` as different models and upload all tasks within these model folders to Zeno.
If you run the eval harness on multiple tasks, the `project_name` will be used as a prefix and one project will be created per task.

You can find an example of this workflow in [examples/visualize-zeno.ipynb](examples/visualize-zeno.ipynb).

### Weights and Biases

With the [Weights and Biases](https://wandb.ai/site) integration, you can now spend more time extracting deeper insights into your evaluation results. The integration is designed to streamline the process of logging and visualizing experiment results using the Weights & Biases (W&B) platform.

The integration lets you:

- automatically log the evaluation results,
- log the samples as W&B Tables for easy visualization,
- log the `results.json` file as an artifact for version control,
- log the `<task_name>_eval_samples.json` file if the samples are logged,
- generate a comprehensive report for analysis and visualization with all the important metrics,
- log task- and CLI-specific configs,
- and log more out of the box, such as the command used to run the evaluation, GPU/CPU counts, timestamp, etc.

First, install the `lm_eval[wandb]` package extra with `pip install lm_eval[wandb]`.

Authenticate your machine with your unique W&B token. Visit https://wandb.ai/authorize to get one, then run `wandb login` in your terminal.

Run the eval harness as usual with the `--wandb_args` flag. Use this flag to provide arguments for initializing a wandb run ([wandb.init](https://docs.wandb.ai/ref/python/init)) as comma-separated string arguments.

```bash
lm_eval \
    --model hf \
    --model_args pretrained=microsoft/phi-2,trust_remote_code=True \
    --tasks hellaswag,mmlu_abstract_algebra \
    --device cuda:0 \
    --batch_size 8 \
    --output_path output/phi-2 \
    --limit 10 \
    --wandb_args project=lm-eval-harness-integration \
    --log_samples
```

In stdout, you will find the link to the W&B run page as well as a link to the generated report. You can find an example of this workflow, and of how to integrate it beyond the CLI, in [examples/visualize-wandb.ipynb](examples/visualize-wandb.ipynb).

## Contributing

Check out our [open issues](https://github.com/EleutherAI/lm-evaluation-harness/issues) and feel free to submit pull requests!

For more information on the library and how everything fits together, see our [documentation pages](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/docs).

To get started with development, first clone the repository and install the dev dependencies:

```bash
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e ".[dev,hf]"
```

### Implementing new tasks

To implement a new task in the eval harness, see [this guide](./docs/new_task_guide.md).

In general, we follow this priority list for addressing concerns about prompting and other eval details:

1. If there is widespread agreement among people who train LLMs, use the agreed upon procedure.
2. If there is a clear and unambiguous official implementation, use that procedure.
3. If there is widespread agreement among people who evaluate LLMs, use the agreed upon procedure.
4. If there are multiple common implementations but not universal or widespread agreement, use our preferred option among the common implementations. As before, prioritize choosing from among the implementations found in LLM training papers.

These are guidelines and not rules, and can be overruled in special circumstances.

We try to prioritize agreement with the procedures used by other groups to decrease the harm when people inevitably compare runs across different papers despite our discouragement of the practice. Historically, we also prioritized the implementation from [Language Models are Few Shot Learners](https://arxiv.org/abs/2005.14165) as our original goal was specifically to compare results with that paper.

### Support

The best way to get support is to open an issue on this repo or join the [EleutherAI Discord server](https://discord.gg/eleutherai). The `#lm-thunderdome` channel is dedicated to developing this project and the `#release-discussion` channel is for receiving support for our releases. If you've used the library and have had a positive (or negative) experience, we'd love to hear from you!

## Optional Extras

Extra dependencies can be installed via `pip install -e ".[NAME]"`.

### Model Backends

These extras install dependencies required to run specific model backends:

| NAME           | Description                                      |
|----------------|--------------------------------------------------|
| hf             | HuggingFace Transformers (torch, transformers, accelerate, peft) |
| vllm           | vLLM fast inference                              |
| api            | API models (OpenAI, Anthropic, local servers)    |
| gptq           | AutoGPTQ quantized models                        |
| gptqmodel      | GPTQModel quantized models                       |
| ibm_watsonx_ai | IBM watsonx.ai models                            |
| ipex           | Intel IPEX backend                               |
| habana         | Intel Gaudi backend                              |
| optimum        | Intel OpenVINO models                            |
| neuronx        | AWS Inferentia2 instances                        |
| winml          | Windows ML (ONNX Runtime GenAI) - CPU/GPU/NPU    |
| sparsify       | Sparsify model steering                          |
| sae_lens       | SAELens model steering                           |

### Task Dependencies

These extras install dependencies required for specific evaluation tasks:

| NAME                 | Description                    |
|----------------------|--------------------------------|
| tasks                | All task-specific dependencies |
| acpbench             | ACP Bench tasks                |
| audiolm_qwen         | Qwen2 audio models             |
| ifeval               | IFEval task                    |
| japanese_leaderboard | Japanese LLM tasks             |
| longbench            | LongBench tasks                |
| math                 | Math answer checking           |
| multilingual         | Multilingual tokenizers        |
| ruler                | RULER tasks                    |

### Development & Utilities

| NAME          | Description                    |
|---------------|--------------------------------|
| dev           | Linting & contributions        |
| hf_transfer   | Speed up HF downloads          |
| sentencepiece | Sentencepiece tokenizer        |
| unitxt        | Unitxt tasks                   |
| wandb         | Weights & Biases logging       |
| zeno          | Zeno result visualization      |

## Cite as

```text
@misc{eval-harness,
  author       = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
  title        = {The Language Model Evaluation Harness},
  month        = 07,
  year         = 2024,
  publisher    = {Zenodo},
  version      = {v0.4.3},
  doi          = {10.5281/zenodo.12608602},
  url          = {https://zenodo.org/records/12608602}
}
```


================================================
FILE: docs/API_guide.md
================================================
# TemplateAPI Usage Guide

The `TemplateAPI` class is a versatile superclass designed to facilitate the integration of various API-based language models into the lm-evaluation-harness framework. This guide will explain how to use and extend the `TemplateAPI` class to implement your own API models. If your API implements the OpenAI API you can use the `local-completions` or the `local-chat-completions` (defined [here](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/openai_completions.py)) model types, which can also serve as examples of how to effectively subclass this template.

## Overview

The `TemplateAPI` class provides a template for creating API-based model implementations. It handles common functionalities such as:

- Tokenization (optional)
- Batch processing
- Caching
- Retrying failed requests
- Parsing API responses

To use this class, you typically need to subclass it and implement specific methods for your API.

## Key Methods to Implement

When subclassing `TemplateAPI`, you need to implement the following methods:

1. `_create_payload`: Creates the JSON payload for API requests.
2. `parse_logprobs`: Parses log probabilities from API responses.
3. `parse_generations`: Parses generated text from API responses.

Optional Properties:

4. `header`: Returns the headers for the API request.
5. `api_key`: Returns the API key for authentication (if required).

You may also need to override other methods or properties depending on your API's specific requirements.

> [!NOTE]
> Currently, loglikelihood and MCQ-based tasks (such as MMLU) are supported only for completion endpoints, not for chat-completion endpoints (those that expect a list of dicts). Completion APIs which support instruct-tuned models can be evaluated with the `--apply_chat_template` option in order to evaluate models using a chat template format while still being able to access the model logits needed for loglikelihood-based tasks.

## TemplateAPI Arguments

When initializing a `TemplateAPI` instance or a subclass, you can provide several arguments to customize its behavior. Here's a detailed explanation of some important arguments:

- `model` or `pretrained` (str):
  - The name or identifier of the model to use.
  - `model` takes precedence over `pretrained` when both are provided.

- `base_url` (str):
  - The base URL for the API endpoint.

- `tokenizer` (str, optional):
  - The name or path of the tokenizer to use.
  - If not provided, it defaults to using the same tokenizer name as the model.

- `num_concurrent` (int):
  - Number of concurrent requests to make to the API.
  - Useful for APIs that support parallel processing.
  - Default is 1 (sequential processing).

- `timeout` (int, optional):
  - Timeout for API requests in seconds.
  - Default is 30.

- `tokenized_requests` (bool):
  - Determines whether the input is pre-tokenized. Defaults to `True`.
  - Requests can be sent in either tokenized form (`list[list[int]]`) or as text (`list[str]`, or `str` for batch_size=1).
  - For loglikelihood-based tasks, prompts require tokenization to calculate the context length. If `False` prompts are decoded back to text before being sent to the API.
  - Not as important for `generate_until` tasks.
  - Ignored for chat-formatted inputs (`list[dict...]`) or if `tokenizer_backend` is `None`.

- `tokenizer_backend` (str, optional):
  - Required for loglikelihood-based or MCQ tasks.
  - Specifies the tokenizer library to use. Options are "tiktoken", "huggingface", or None.
  - Default is "huggingface".

- `max_length` (int, optional):
  - Maximum length of input + output.
  - Default is 2048.

- `max_retries` (int, optional):
  - Maximum number of retries for failed API requests.
  - Default is 3.

- `max_gen_toks` (int, optional):
  - Maximum number of tokens to generate in completion tasks.
  - Default is 256 or set in task yaml.

- `batch_size` (int or str, optional):
  - Number of requests to batch together (if the API supports batching).
  - Can be an integer or "auto" (which defaults to 1 for API models).
  - Default is 1.

- `seed` (int, optional):
  - Random seed for reproducibility.
  - Default is 1234.

- `add_bos_token` (bool, optional):
  - Whether to add the beginning-of-sequence token to inputs (when tokenizing).
  - Default is False.

- `custom_prefix_token_id` (int, optional):
  - Custom token ID to use as a prefix for inputs.
  - If not provided, uses the model's default BOS or EOS token (if `add_bos_token` is True).

- `verify_certificate` (bool, optional):
  - Whether to validate the certificate of the API endpoint (if HTTPS).
  - Default is True.

- `header` (dict, optional):
  - Custom headers for API requests.
  - If not provided, uses `{"Authorization": f"Bearer {self.api_key}"}` by default.

Example usage:

```python
from lm_eval.models.api_models import TemplateAPI


class MyAPIModel(TemplateAPI):
    def __init__(self, **kwargs):
        super().__init__(
            model="my-model",
            base_url="https://api.mymodel.com/v1/completions",
            tokenizer_backend="huggingface",
            num_concurrent=5,
            max_retries=5,
            batch_size=10,
            **kwargs
        )

    # Implement other required methods...
```

When subclassing `TemplateAPI`, you can override these arguments in your `__init__` method to set default values specific to your API. You can also add additional (potentially user-specified) arguments as needed for your specific implementation.

## Example Implementation: OpenAI API

The `OpenAICompletionsAPI` and `OpenAIChatCompletion` classes ([here](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/openai_completions.py)) demonstrate how to implement API models using the `TemplateAPI` class. Here's a breakdown of the key components:

### 1. Subclassing and Initialization

```python
@register_model("openai-completions")
class OpenAICompletionsAPI(LocalCompletionsAPI):
    def __init__(
        self,
        base_url="https://api.openai.com/v1/completions",
        tokenizer_backend="tiktoken",
        **kwargs,
    ):
        super().__init__(
            base_url=base_url, tokenizer_backend=tokenizer_backend, **kwargs
        )
```

### 2. Implementing API Key Retrieval

```python
@cached_property
def api_key(self):
    key = os.environ.get("OPENAI_API_KEY", None)
    if key is None:
        raise ValueError(
            "API key not found. Please set the OPENAI_API_KEY environment variable."
        )
    return key
```

### 3. Creating the Payload

```python
def _create_payload(
    self,
    messages: Union[List[List[int]], List[dict], List[str], str],
    generate=False,
    gen_kwargs: Optional[dict] = None,
    **kwargs,
) -> dict:
    if generate:
        # ... (implementation for generation)
        ...
    else:
        # ... (implementation for log likelihood)
        ...
```
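
To give a feel for what the two elided branches might produce, here is a standalone sketch that builds a request body for an OpenAI-style completions endpoint. The function name `create_completions_payload` and its defaults are hypothetical, and the field choices are assumptions based on the public completions request format rather than a copy of the harness's code.

```python
from typing import List, Optional, Union


def create_completions_payload(
    prompt: Union[str, List[str], List[List[int]]],
    model: str,
    generate: bool = False,
    gen_kwargs: Optional[dict] = None,
    seed: int = 1234,
    default_max_gen_toks: int = 256,
) -> dict:
    """Build a JSON-serializable request body (illustrative only)."""
    if generate:
        kwargs = dict(gen_kwargs or {})  # copy so the caller's dict is not mutated
        payload = {
            "model": model,
            "prompt": prompt,
            "max_tokens": kwargs.pop("max_gen_toks", default_max_gen_toks),
            "temperature": kwargs.pop("temperature", 0),
            "seed": seed,
        }
        stop = kwargs.pop("until", None)
        if stop:
            payload["stop"] = stop
        payload.update(kwargs)  # forward any remaining generation options
        return payload
    # Scoring: echo the prompt back with per-token logprobs and
    # generate as little new text as possible.
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 1,
        "temperature": 0,
        "logprobs": 1,
        "echo": True,
        "seed": seed,
    }


gen_payload = create_completions_payload(
    "Hello", model="my-model", generate=True,
    gen_kwargs={"until": ["\n"], "max_gen_toks": 16},
)
score_payload = create_completions_payload("Hello world", model="my-model")
```

The split mirrors the two request kinds: generation requests carry stop sequences and a token budget, while scoring requests ask the server to echo the prompt with log probabilities.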

### 4. Parsing API Responses

```python
@staticmethod
def parse_logprobs(
    outputs: Union[Dict, List[Dict]],
    tokens: List[List[int]] = None,
    ctxlens: List[int] = None,
    **kwargs,
) -> List[Tuple[float, bool]]:
    # ... (implementation)
    ...

@staticmethod
def parse_generations(outputs: Union[Dict, List[Dict]], **kwargs) -> List[str]:
    # ... (implementation)
    ...
```

The requests are initiated in the `model_call` or the `amodel_call` methods.
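
As a hedged illustration of the parsing step, the sketch below handles an OpenAI-style completions response, assuming each choice carries `index`, `text`, and a `logprobs` object with `token_logprobs` and `top_logprobs`. The response dict is fabricated for the example, and the slicing convention (drop the context tokens and the single newly generated token) is an assumption about how an echoed prompt is scored, not a verbatim copy of the library code.

```python
from operator import itemgetter
from typing import Dict, List, Tuple, Union


def parse_generations(outputs: Union[Dict, List[Dict]]) -> List[str]:
    """Collect generated text from each choice, in index order."""
    if isinstance(outputs, dict):
        outputs = [outputs]
    texts = []
    for out in outputs:
        for choice in sorted(out["choices"], key=itemgetter("index")):
            texts.append(choice["text"])
    return texts


def parse_logprobs(
    outputs: Union[Dict, List[Dict]], ctxlens: List[int]
) -> List[Tuple[float, bool]]:
    """Return (summed continuation logprob, is_greedy) per choice."""
    if isinstance(outputs, dict):
        outputs = [outputs]
    ctx_iter = iter(ctxlens)
    results = []
    for out in outputs:
        for choice in sorted(out["choices"], key=itemgetter("index")):
            ctxlen = next(ctx_iter)
            logprobs = choice["logprobs"]
            # Skip the context tokens and the one extra generated token.
            token_logprobs = logprobs["token_logprobs"][ctxlen:-1]
            top_logprobs = logprobs["top_logprobs"][ctxlen:-1]
            is_greedy = all(
                tok == max(top.values())
                for tok, top in zip(token_logprobs, top_logprobs)
            )
            results.append((sum(token_logprobs), is_greedy))
    return results


# Made-up response: 2 context tokens, 2 continuation tokens, 1 generated token
response = {
    "choices": [
        {
            "index": 0,
            "text": "The sky is blue.",
            "logprobs": {
                "token_logprobs": [None, -1.0, -0.5, -0.25, -2.0],
                "top_logprobs": [
                    None,
                    {"a": -1.0},
                    {"b": -0.5},
                    {"c": -0.25, "d": -3.0},
                    {"e": -2.0},
                ],
            },
        }
    ]
}

print(parse_generations(response))            # ['The sky is blue.']
print(parse_logprobs(response, ctxlens=[2]))  # [(-0.75, True)]
```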

## Implementing Your Own API Model

To implement your own API model:

1. Subclass `TemplateAPI` or one of its subclasses (e.g., `LocalCompletionsAPI`).
2. Override the `__init__` method if you need to set specific parameters.
3. Implement `_create_payload` to build the request payload for your API, and `header` if it needs custom request headers.
4. Implement the `parse_logprobs` and `parse_generations` methods to parse your API's responses.
5. Override the `api_key` property if your API requires authentication.
6. Override any other methods as necessary to match your API's behavior.

## Best Practices

1. Use the `@register_model` decorator to register your model with the framework (and import it in `lm_eval/models/__init__.py`!).
2. Use environment variables for sensitive information like API keys.
3. Properly handle batching and concurrent requests if supported by your API.


================================================
FILE: docs/CONTRIBUTING.md
================================================
# Contributing to LM Evaluation Harness

Welcome and thank you for your interest in the LM Evaluation Harness! We welcome contributions and feedback and appreciate your time spent with our library, and hope you find it useful!

## Important Resources

Information about LM Evaluation Harness is located in several places:

- Our [documentation pages](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/docs)
- We occasionally use [GitHub Milestones](https://github.com/EleutherAI/lm-evaluation-harness/milestones) to track progress toward specific near-term version releases.
- We maintain a [Project Board](https://github.com/orgs/EleutherAI/projects/25) for tracking current work items and PRs, and for future roadmap items or feature requests.
- Further discussion and support conversations are located in the #lm-thunderdome channel of the [EleutherAI discord](https://discord.gg/eleutherai).

## Code Style

LM Evaluation Harness uses [ruff](https://github.com/astral-sh/ruff) for linting via [pre-commit](https://pre-commit.com/).

You can install linters and dev tools via

```pip install lm_eval[dev]``` or ```pip install -e ".[dev]"```

Then, run

```pre-commit install```

in order to ensure linters and other checks will be run upon committing.

## Testing

We use [pytest](https://docs.pytest.org/en/latest/) for running unit tests. All library unit tests can be run via:

```bash
python -m pytest --showlocals -s -vv -n=auto --ignore=tests/models/test_openvino.py
```

## Verbose logging

You can enable verbose logging with the environment variable `LMEVAL_LOG_LEVEL="debug"`.

## Contributor License Agreement

We ask that new contributors agree to a Contributor License Agreement affirming that EleutherAI has the rights to use your contribution to our library.
First-time pull requests will have a reply added by @CLAassistant containing instructions for how to confirm this, and we require it before merging your PR.

## Contribution Best Practices

We recommend a few best practices to make your contributions or reported errors easier to assist with.

**For Pull Requests:**

- PRs should be titled descriptively, and be opened with a brief description of the scope and intent of the new contribution.
- New features should have appropriate documentation added alongside them.
- Aim for code maintainability, and minimize code copying.
- If opening a task, try to share test results on the task using a publicly-available model, and if any public results are available on the task, compare to them.

**For Feature Requests:**

- Provide a short paragraph's worth of description. What is the feature you are requesting? What is its motivation, and an example use case of it? How does this differ from what is currently supported?

**For Bug Reports**:

- Provide a short description of the bug.
- Provide a *reproducible example*--what is the command you run with our library that results in this error? Have you tried any other steps to resolve it?
- Provide a *full error traceback* of the error that occurs, if applicable. A one-line error message or small screenshot snippet is unhelpful without the surrounding context.
- Note what version of the codebase you are using, and any specifics of your environment and setup that may be relevant.

**For Requesting New Tasks**:

- Provide a 1-2 sentence description of what the task is and what it evaluates.
- Provide a link to the paper introducing the task.
- Provide a link to where the dataset can be found.
- Provide a link to a paper containing results on an open-source model on the task, for use in comparisons and implementation validation.
- If applicable, link to any codebase that has implemented the task (especially the original publication's codebase, if existent).

## How Can I Get Involved?

To quickly get started, we maintain a list of good first issues, which can be found [on our project board](https://github.com/orgs/EleutherAI/projects/25/views/8) or by [filtering GH Issues](https://github.com/EleutherAI/lm-evaluation-harness/issues?q=is%3Aopen+label%3A%22good+first+issue%22+label%3A%22help+wanted%22). These are typically smaller code changes or self-contained features which can be added without extensive familiarity with library internals, and we recommend new contributors consider taking a stab at one of these first if they are feeling uncertain where to begin.

There are a number of distinct ways to contribute to LM Evaluation Harness, and all are extremely helpful! A sampling of ways to contribute include:

- **Implementing and verifying new evaluation tasks**: Is there a task you'd like to see LM Evaluation Harness support? Consider opening an issue requesting it, or helping add it! Verifying and cross-checking task implementations with their original versions is also a very valuable form of assistance in ensuring standardized evaluation.
- **Improving documentation** - Improvements to the documentation, or noting pain points / gaps in documentation, are helpful in order for us to improve the user experience of the library and clarity + coverage of documentation.
- **Testing and devops** - We are very grateful for any assistance in adding tests for the library that can be run for new PRs, and other devops workflows.
- **Adding new modeling / inference library integrations** - We hope to support a broad range of commonly-used inference libraries popular among the community, and welcome PRs for new integrations, so long as they are documented properly and maintainable.
- **Proposing or Contributing New Features** - We want LM Evaluation Harness to support a broad range of evaluation use cases. If you have a feature that is not currently supported but desired, feel free to open an issue describing the feature and, if applicable, how you intend to implement it. We would be happy to give feedback on the cleanest way to implement new functionalities and are happy to coordinate with interested contributors via GH discussions or via Discord.

We hope that this has been helpful, and appreciate your interest in contributing! Further questions can be directed to [our Discord](https://discord.gg/eleutherai).


================================================
FILE: docs/README.md
================================================
# Eval Harness Documentation

Welcome to the docs for the LM Evaluation Harness!

## Table of Contents

* To learn about the public interface of the library, as well as how to evaluate via the command line or as integrated into an external library, see the [Interface](./interface.md).
* To learn how to add a new library, API, or model type to the library, as well as a quick explainer on the types of ways to evaluate an LM, see the [Model Guide](./model_guide.md).
  * For an extended description of how to extend the library to new model classes served over an API, see the [API Guide](./API_guide.md).
* For a crash course on adding new tasks to the library, see our [New Task Guide](./new_task_guide.md).
* To learn more about pushing the limits of task configuration that the Eval Harness supports, see the [Task Configuration Guide](./task_guide.md).


================================================
FILE: docs/chat-template-readme.md
================================================
# Chat Template Delimiter Handling Update

## Overview

This change modifies how delimiters are handled when applying chat templates in the request construction process for likelihood and multiple-choice based tasks. When `apply_chat_template` is set to `True`, the target delimiter is now set to an empty string instead of using the configured delimiter.

## Background

By default, the system uses a target delimiter (typically a whitespace " ") between the context and target text when constructing prompts. The full string is constructed as:

```text
doc_to_text(doc) + target_delimiter + doc_to_target(doc)
```

While this worked well for base models where we wanted the model to predict a single whitespace followed by the answer, chat models have their own formatting conventions that handle spacing differently.

## The Change

- When `apply_chat_template=True`, the target delimiter is now empty ("") instead of the default whitespace
- This prevents interference between chat template formatting and the default delimiter system
- Particularly important for multiple choice tasks where the template itself handles spacing

## Example

```text
# Before (with default delimiter " ")
<user>Question: What color is the sky?\nAnswer:<assistant> blue

# After
<user>Question: What color is the sky?\nAnswer:<assistant>blue
```
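
The rule can be summarized in a few lines of Python. This is a minimal sketch, assuming the context string has already been passed through the chat template; `join_context_and_target` is a hypothetical helper, and the `<user>`/`<assistant>` markers are placeholders taken from the example above rather than a real template.

```python
def join_context_and_target(
    context: str,
    target: str,
    apply_chat_template: bool = False,
    target_delimiter: str = " ",
) -> str:
    """Join context and target, dropping the delimiter for chat templates."""
    delimiter = "" if apply_chat_template else target_delimiter
    return context + delimiter + target


templated = "<user>Question: What color is the sky?\nAnswer:<assistant>"

before = join_context_and_target(templated, "blue")
after = join_context_and_target(templated, "blue", apply_chat_template=True)

print(repr(before))  # ends with '<assistant> blue'
print(repr(after))   # ends with '<assistant>blue'
```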


================================================
FILE: docs/config_files.md
================================================
# Configuration Guide

This guide explains how to use YAML configuration files with `lm-eval` to define reusable evaluation settings.

## Overview

Instead of passing many CLI arguments, you can define evaluation parameters in a YAML configuration file:

```bash
# Instead of:
lm-eval run --model hf --model_args pretrained=gpt2,dtype=float32 --tasks hellaswag arc_easy --num_fewshot 5 --batch_size 8 --device cuda:0

# Use:
lm-eval run --config eval_config.yaml
```

CLI arguments override config file values, so you can set defaults in a config file and override specific settings:

```bash
lm-eval run --config eval_config.yaml --tasks mmlu --limit 100
```
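
The precedence rule can be pictured as a dictionary merge. The sketch below is an assumption about the general idea, not the harness's implementation: `merge_config` is a hypothetical helper, and `None` is used here to mean "this flag was not given on the command line".

```python
def merge_config(file_values: dict, cli_values: dict) -> dict:
    """Start from the config file and let explicitly set CLI values win."""
    merged = dict(file_values)
    for key, value in cli_values.items():
        if value is not None:  # None means the flag was not passed
            merged[key] = value
    return merged


# Values as they might be read from eval_config.yaml
file_values = {"model": "hf", "tasks": ["hellaswag", "arc_easy"], "batch_size": 8}

# Values from: lm-eval run --config eval_config.yaml --tasks mmlu --limit 100
cli_values = {"tasks": ["mmlu"], "limit": 100, "batch_size": None}

merged = merge_config(file_values, cli_values)
print(merged)
# {'model': 'hf', 'tasks': ['mmlu'], 'batch_size': 8, 'limit': 100}
```

Here `tasks` and `limit` come from the command line, while `model` and `batch_size` keep the values set in the file.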

## Quick Reference

All configuration keys correspond directly to CLI arguments. See the [CLI Reference](interface.md#lm-eval-run) for detailed descriptions of each option.

## Config Schema

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `model` | string | `"hf"` | Model type/provider |
| `model_args` | dict | `{}` | Model constructor arguments |
| `tasks` | list/string | required | Tasks to evaluate |
| `num_fewshot` | int/null | `null` | Few-shot example count |
| `batch_size` | int/string | `1` | Batch size or "auto" |
| `max_batch_size` | int/null | `null` | Max batch size for auto |
| `device` | string/null | `"cuda:0"` | Device to use |
| `limit` | float/null | `null` | Example limit per task |
| `samples` | dict/null | `null` | Specific sample indices |
| `use_cache` | string/null | `null` | Response cache path |
| `cache_requests` | string/dict | `{}` | Request cache settings |
| `output_path` | string/null | `null` | Results output path |
| `log_samples` | bool | `false` | Save model I/O |
| `predict_only` | bool | `false` | Skip metrics |
| `apply_chat_template` | bool/string | `false` | Chat template |
| `system_instruction` | string/null | `null` | System prompt |
| `fewshot_as_multiturn` | bool/null | `null` | Multi-turn few-shot |
| `include_path` | string/null | `null` | External tasks path |
| `gen_kwargs` | dict | `{}` | Generation arguments |
| `wandb_args` | dict | `{}` | W&B init arguments |
| `hf_hub_log_args` | dict | `{}` | HF Hub logging |
| `seed` | list/int | `[0,1234,1234,1234]` | Random seeds |
| `trust_remote_code` | bool | `false` | Trust remote code |
| `metadata` | dict | `{}` | Task metadata |

---

## Example

```yaml
# basic_eval.yaml
model: hf
model_args:
  pretrained: gpt2
  dtype: float32

tasks:
  - hellaswag
  - arc_easy

num_fewshot: 0
batch_size: auto
device: cuda:0

output_path: ./results/gpt2/
log_samples: true

wandb_args:
  project: llm-evals
  name: gpt2-baseline
  tags:
    - gpt2
    - baseline
    - production

hf_hub_log_args:
  hub_results_org: my-org
  results_repo_name: llm-eval-results
  push_results_to_hub: true
  public_repo: false
```

---

## Programmatic Usage

For loading config files in Python, see the [Python API Guide](python-api.md#using-evaluatorconfig).

---

## Validation

Validate your configuration before running:

```bash
# Check that tasks exist
lm-eval validate --tasks hellaswag,arc_easy

# With external tasks
lm-eval validate --tasks my_task --include_path /path/to/tasks
```

---

## Tips

1. **Start simple**: Begin with minimal config and add options as needed
2. **Use CLI overrides**: Set defaults in config, override with CLI for experiments
3. **Separate concerns**: Create different configs for different model families or task sets
4. **Version control**: Commit config files alongside results for reproducibility
5. **Use comments**: YAML supports `#` comments to document your choices


================================================
FILE: docs/decontamination.md
================================================
# Decontamination

## Usage

The provided directory should contain
the ngram files and info.json produced in "Pile Ngram Generation" further down.

```bash
python -m lm_eval \
    --model gpt2 \
    --device 0 \
    --tasks sciq
```

## Background

Downstream evaluations test model generalization, and are less useful when test set data also exists in the training set, referred to as leakage or contamination.

Filtering your training set against the test set is a good first step, however this isn't always possible, as in the case of a new benchmark or one that wasn't considered prior to model training. When training set filtering isn't possible, it is useful to measure the impact of test set leakage by detecting the contaminated test examples and producing a clean version of the benchmark.

The basis for our decontamination procedure can be found in Appendix C of "Language Models are Few-Shot Learners". OpenAI defined a test document as contaminated if any N-gram overlap existed with any training document. They used a range of N values between 8 and 13 depending on dataset, while we just used 13 for simplicity.

## Implementation

Contamination detection can be found in `lm_eval/decontamination/decontaminate.py` with supporting code in `lm_eval/decontamination/`.

`decontaminate.py` does the following:

1. Build dictionaries of all ngrams and their corresponding evaluation/document ids.
2. Scan through sorted files containing training set n-grams.
3. If a match is found, the corresponding evaluation/document combinations are marked as contaminated.

`lm_eval/evaluator.py` can then produce a clean version of the benchmark by excluding the results of contaminated documents. For each metric, a clean version will be shown in the results with a "decontaminate" suffix.
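
The steps above, together with the exclusion of contaminated documents, can be sketched as follows. This is a simplified in-memory illustration, not the code in `decontaminate.py`: the function names are hypothetical, tokenization is a plain whitespace split, and the toy example uses 3-grams instead of 13-grams to stay short.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Iterator[Ngram]:
    """Yield all word n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i : i + n])


def find_contaminated(eval_docs: Dict[str, str], training_ngrams: Iterable[Ngram], n: int) -> Set[str]:
    # Step 1: map every evaluation n-gram to the ids of the documents containing it.
    index: Dict[Ngram, Set[str]] = defaultdict(set)
    for doc_id, text in eval_docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    # Steps 2-3: scan the training n-grams and mark every matching document.
    contaminated: Set[str] = set()
    for gram in training_ngrams:
        contaminated |= index.get(gram, set())
    return contaminated


def clean_accuracy(correct: Dict[str, bool], contaminated: Set[str]) -> float:
    """Accuracy over the documents that were not marked as contaminated."""
    kept = [ok for doc_id, ok in correct.items() if doc_id not in contaminated]
    return sum(kept) / len(kept) if kept else float("nan")


eval_docs = {"a": "the quick brown fox jumps", "b": "a completely different sentence here"}
training = ngrams("yesterday the quick brown fox ran away", 3)
contaminated = find_contaminated(eval_docs, training, n=3)
score = clean_accuracy({"a": True, "b": False}, contaminated)
```

Document `a` shares a 3-gram with the training text and is dropped, so the clean score is computed on `b` alone.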

This is disabled by default for new tasks. To support decontamination on a task, override the "should_decontaminate" and "doc_to_decontamination_query" methods. For more details see the [task guide](task_guide.md).

## Pile Ngram Generation

The relevant scripts can be found in `scripts/clean_training_data`, which also imports from `lm_eval/decontamination/`.

1. git clone https://github.com/EleutherAI/lm-evaluation-harness.git
2. pip install -r requirements.txt
3. Download The Pile from [The Eye](https://the-eye.eu/public/AI/pile/train/)
4. Place pile files in "pile" directory under "lm-evaluation-harness" (or create a symlink)
5. Run generate_13_grams.

```bash
export PYTHONHASHSEED=0
python -m scripts.clean_training_data.generate_13_grams \
       -dir path/to/working/directory \
       -n 13 \
       -buckets 500
```

Took approximately 4 days for us. We had the time to wait, but this could be scaled out by doing partial pile scans on multiple instances of this script and merging the relevant buckets. We fixed PYTHONHASHSEED to ensure reproducibility of bucket hashing in case you need to stop and start.

6. Sort the generated 13-grams.

```bash
python -m scripts.clean_training_data.sort_13_gram_buckets \
       -dir path/to/working/directory/output
```

Took approximately 5 days for us. You could speed this up by spreading the files around to different machines and running the sort script before gathering them together.

7. Compress the sorted 13 grams files and place them together with info.json.

This step only takes a few hours.

```bash
python -m scripts.clean_training_data.compress_and_package \
       -dir path/to/working/directory \
       -output path/to/final/directory \
       -procs 8
```


================================================
FILE: docs/footguns.md
================================================
# Common Pitfalls and Troubleshooting Guide

This document highlights common pitfalls and troubleshooting tips when using this library. We'll continue to add more tips as we discover them.

## YAML Configuration Issues

### Newline Characters in YAML (`\n`)

**Problem:** When specifying newline characters in YAML, they may be interpreted incorrectly depending on how you format them.

```yaml
# ❌ WRONG: Single quotes don't process escape sequences
generation_kwargs:
  until: ['\n']  # Gets parsed as the literal characters '\' and 'n', i.e. "\\n"

```
```yaml
# ✅ RIGHT: Use double quotes for escape sequences
generation_kwargs:
  until: ["\n"]  # Gets parsed as an actual newline character

```
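
To see why this matters for stop sequences, compare the two resulting values. The sketch below does not call a YAML parser; it writes out, as plain Python strings, the values the two scalars above are described as producing.

```python
# Value of the single-quoted YAML scalar '\n': a backslash followed by the letter n.
single_quoted = "\\n"
# Value of the double-quoted YAML scalar "\n": one actual newline character.
double_quoted = "\n"

text = "first line\nsecond line"
# Only the real newline occurs in the generated text, so only it acts as a stop sequence.
cut_at_single = text.split(single_quoted)[0]
cut_at_double = text.split(double_quoted)[0]
```

The single-quoted variant never matches, so `cut_at_single` is the full text, while `cut_at_double` stops after the first line.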

**Solutions:**
- Use double quotes for strings containing escape sequences
- For multiline content, use YAML's block scalars (`|` or `>`)
- When generating YAML programmatically, be careful with how template engines handle escape sequences

### Quoting in YAML

**When to use different types of quotes:**

- **No quotes**: Simple values (numbers, booleans, alphanumeric strings without special characters)
  ```yaml
  simple_value: plain text
  number: 42

  ```

- **Single quotes (')**:
  - Preserves literal values
  - Use when you need special characters to be treated literally
  - Escape single quotes by doubling them: `'It''s working'`
  ```yaml
  literal_string: 'The newline character \n is not processed here'
  path: 'C:\Users\name'  # Backslashes preserved

  ```

- **Double quotes (")**:
  - Processes escape sequences like `\n`, `\t`, etc.
  - Use for strings that need special characters interpreted
  - Escape double quotes with backslash: `"He said \"Hello\""`
  ```yaml
  processed_string: "First line\nSecond line"  # Creates actual newline
  unicode: "Copyright symbol: \u00A9"  # Unicode character

  ```


================================================
FILE: docs/interface.md
================================================
# User Guide

This document details the interface exposed by `lm-eval` and provides details on what flags are available to users.

## Command-line Interface

The `lm-eval` CLI is organized into subcommands:

| Command | Description |
|---------|-------------|
| `lm-eval run` | Run evaluations on language models |
| `lm-eval ls` | List available tasks, groups, subtasks, or tags |
| `lm-eval validate` | Validate task configurations |

Run the library via the `lm-eval` entrypoint or `python -m lm_eval`.

Use `-h` or `--help` to see available options:

```bash
lm-eval -h              # Show all subcommands
lm-eval run -h          # Show options for run command
lm-eval ls -h           # Show options for list command
```

> **Legacy Compatibility**: The original single-command interface still works. Running `lm-eval --model hf --tasks hellaswag` automatically inserts the `run` subcommand.

---

## Quick Start

```bash
# List available tasks
lm-eval ls tasks

# Basic evaluation
lm-eval run --model hf --model_args pretrained=gpt2 --tasks hellaswag

# With few-shot examples
lm-eval run --model hf --model_args pretrained=gpt2 --tasks arc_easy --num_fewshot 5

# Save results and model outputs
lm-eval run --model hf --model_args pretrained=gpt2 --tasks hellaswag --output_path ./results/ --log_samples

# Use a config file
lm-eval run --config eval_config.yaml
```

---

## `lm-eval run`

Run evaluations on language models.

```bash
lm-eval run --model <model> --tasks <task> [options]
```

### Quick Examples

```bash
# Basic evaluation with HuggingFace model
lm-eval run --model hf --model_args pretrained=gpt2 dtype=float32 --tasks hellaswag

# Multiple tasks with few-shot examples
lm-eval run --model vllm --model_args pretrained=EleutherAI/gpt-j-6B --tasks arc_easy arc_challenge --num_fewshot 5

# Custom generation parameters
lm-eval run --model hf --model_args pretrained=gpt2 --tasks lambada --gen_kwargs temperature=0.8 top_p=0.95

# Use a YAML configuration file
lm-eval run --config my_config.yaml --tasks mmlu
```
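
The `key=val` pairs passed to `--gen_kwargs` above are documented in the Evaluation Settings table as being parsed with `ast.literal_eval`. A minimal sketch of that idea follows; `parse_kwargs` is a hypothetical helper, and keeping non-literal values as plain strings is an assumption rather than confirmed library behaviour.

```python
import ast
from typing import Any, Dict, List


def parse_kwargs(pairs: List[str]) -> Dict[str, Any]:
    """Turn ["key=val", ...] into a dict, evaluating each value as a Python literal."""
    parsed: Dict[str, Any] = {}
    for pair in pairs:
        key, _, raw = pair.partition("=")
        try:
            parsed[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            parsed[key] = raw  # assumption: values that are not literals stay strings
    return parsed


gen_kwargs = parse_kwargs(["temperature=0.8", "top_p=0.95", 'stop=["\\n\\n"]'])
model_args = parse_kwargs(["pretrained=gpt2"])
```

Numbers become floats, the `stop` value becomes a list containing a two-newline string, and `gpt2` stays a string.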

### Model and Tasks

| Argument | Short | Description |
|----------|-------|-------------|
| `--model` | `-M` | Model type/provider name (default: `hf`). See [supported models](https://github.com/EleutherAI/lm-evaluation-harness#model-apis-and-inference-servers). |
| `--model_args` | `-a` | Model constructor arguments as `key=val key2=val2` or `key=val,key2=val2`. For HuggingFace models, see [`HFLM`](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/models/huggingface.py) for available arguments. |
| `--tasks` | `-t` | Space or comma-separated list of task names or groups. Use `lm-eval ls tasks` to see available tasks. |
| `--apply_chat_template` | | Apply chat template to prompts. Use without argument for default template, or specify template name. |
| `--limit` | `-L` | Limit examples per task. Integer for count, float (0.0-1.0) for percentage. **For testing only.** |
| `--use_cache` | `-c` | Path prefix for SQLite cache of model responses (e.g., `/path/to/cache_`). |

### Evaluation Settings

| Argument | Short | Description |
|----------|-------|-------------|
| `--num_fewshot` | `-f` | Number of few-shot examples in context. |
| `--batch_size` | `-b` | Batch size: integer, `auto`, or `auto:N` to auto-tune N times (default: 1). |
| `--max_batch_size` | | Maximum batch size when using `--batch_size auto`. |
| `--device` | | Device to use: `cuda`, `cuda:0`, `cpu`, `mps` (default: `cuda`). |
| `--gen_kwargs` | | Generation arguments as `key=val key2=val2`. Values parsed with `ast.literal_eval`. Example: `temperature=0.8 'stop=["\n\n"]'` |

### Data and Output

| Argument | Short | Description |
|----------|-------|-------------|
| `--output_path` | `-o` | Output directory or JSON file for results. Required with `--log_samples`. |
| `--log_samples` | `-s` | Save all model inputs/outputs for post-hoc analysis. |
| `--samples` | `-E` | JSON mapping task names to sample indices, e.g., `'{"task1": [0,1,2]}'`. Incompatible with `--limit`. |

### Caching and Performance

| Argument | Description |
|----------|-------------|
| `--cache_requests` | Cache preprocessed prompts: `true`, `refresh`, or `delete`. Cached files stored in `lm_eval/cache/.cache` or path set by `LM_HARNESS_CACHE_PATH` env var. |
| `--check_integrity` | Run task test suite validation before evaluation. |

### Prompt Formatting

| Argument | Description |
|----------|-------------|
| `--system_instruction` | Custom system instruction prepended to prompts. |
| `--fewshot_as_multiturn` | Format few-shot examples as multi-turn conversation. Auto-enabled with `--apply_chat_template`. Set to `false` to disable. |

### Task Management

| Argument | Description |
|----------|-------------|
| `--include_path` | Additional directory containing external task YAML files. |

### Logging and Tracking

| Argument | Short | Description |
|----------|-------|-------------|
| `--verbosity` | `-v` | **(Deprecated)** Use `LMEVAL_LOG_LEVEL` env var instead. |
| `--write_out` | `-w` | Print prompts for first few documents (for debugging). |
| `--show_config` | | Display full task configuration after evaluation. |
| `--wandb_args` | | Weights & Biases arguments as `key=val`. E.g., `project=my-project name=run-1`. |
| `--wandb_config_args` | | Additional W&B config arguments. |
| `--hf_hub_log_args` | | HuggingFace Hub logging arguments. See [HF Hub Logging](#huggingface-hub-logging). |

### Advanced Options

| Argument | Short | Description |
|----------|-------|-------------|
| `--predict_only` | `-x` | Save predictions only, skip metric computation. Implies `--log_samples`. |
| `--seed` | | Random seeds as single integer or comma-separated list for `python,numpy,torch,fewshot`. Default: `0,1234,1234,1234`. Use `None` to skip. Example: `--seed 42` or `--seed 0,None,8,52`. |
| `--trust_remote_code` | | Allow executing remote code from HuggingFace Hub. |
| `--confirm_run_unsafe_code` | | Confirm understanding of risks for tasks executing arbitrary Python. |
| `--metadata` | | JSON string passed to TaskConfig. Required for some tasks like RULER. Example: `--metadata '{"max_seq_length": 4096}'`. |

### Configuration File

| Argument | Short | Description |
|----------|-------|-------------|
| `--config` | `-C` | Path to YAML configuration file. CLI arguments override config file values. See [Configuration Files](config_files.md). |

### HuggingFace Hub Logging

The `--hf_hub_log_args` argument accepts these keys:

| Key | Description |
|-----|-------------|
| `hub_results_org` | Organization name on HF Hub. Defaults to token owner. |
| `details_repo_name` | Repository name for detailed results. |
| `results_repo_name` | Repository name for aggregated results. |
| `push_results_to_hub` | `True`/`False` - push results to Hub. |
| `push_samples_to_hub` | `True`/`False` - push samples to Hub. Requires `--log_samples`. |
| `public_repo` | `True`/`False` - make repository public. |
| `leaderboard_url` | URL to associated leaderboard. |
| `point_of_contact` | Contact email for results dataset. |
| `gated` | `True`/`False` - gate the details dataset. |

---

## `lm-eval ls`

List available tasks, groups, subtasks, or tags.

```bash
lm-eval ls [tasks|groups|subtasks|tags] [--include_path DIR]
```

### Arguments

| Argument | Description |
|----------|-------------|
| `tasks` | List all available tasks (groups, subtasks, and tags). |
| `groups` | List only task groups (e.g., `mmlu`, `glue`, `superglue`). |
| `subtasks` | List only individual subtasks (e.g., `mmlu_anatomy`, `hellaswag`). |
| `tags` | List task tags (e.g., `reasoning`, `knowledge`). |
| `--include_path` | Additional directory for external task definitions. |

### Task Organization

- **Groups**: Collections of related tasks with aggregated metrics across subtasks (e.g., `mmlu` contains 57 subtasks)
- **Subtasks**: Individual evaluation tasks (e.g., `mmlu_anatomy`, `hellaswag`)
- **Tags**: Categories for filtering tasks without aggregated metrics (e.g., `reasoning`, `language`)

### Examples

```bash
# List all tasks
lm-eval ls tasks

# List only task groups
lm-eval ls groups

# Include external tasks
lm-eval ls tasks --include_path /path/to/external/tasks
```

---

## `lm-eval validate`

Validate task configurations before running evaluations.

```bash
lm-eval validate --tasks <task1,task2> [--include_path DIR]
```

### Arguments

| Argument | Short | Description |
|----------|-------|-------------|
| `--tasks` | `-t` | **(Required)** Comma-separated list of task names to validate. |
| `--include_path` | | Additional directory for external task definitions. |

### Validation Checks

The validate command performs:

- **Task existence**: Verifies all specified tasks are available
- **Configuration syntax**: Checks YAML/JSON configuration files
- **Dataset access**: Validates dataset paths and configurations
- **Required fields**: Ensures all mandatory task parameters are present
- **Metric definitions**: Verifies metric functions and aggregation methods
- **Filter pipelines**: Validates filter chains and their parameters
- **Template rendering**: Tests prompt templates with sample data

### Examples

```bash
# Validate a single task
lm-eval validate --tasks hellaswag

# Validate multiple tasks
lm-eval validate --tasks arc_easy,arc_challenge,hellaswag

# Validate a task group
lm-eval validate --tasks mmlu

# Validate external tasks
lm-eval validate --tasks my_custom_task --include_path ./custom_tasks
```

---

## Python API

For programmatic usage, see the [Python API Guide](python-api.md).

---

## Environment Variables

| Variable | Description |
|----------|-------------|
| `LMEVAL_LOG_LEVEL` | Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`). |
| `LM_HARNESS_CACHE_PATH` | Path for cached requests (default: `lm_eval/cache/.cache`). |
| `HF_TOKEN` | HuggingFace Hub token for private datasets/models. |
| `TOKENIZERS_PARALLELISM` | Set to `false` to avoid tokenizer warnings (auto-set by CLI). |


================================================
FILE: docs/model_guide.md
================================================
# New Model Guide

This guide may be of special interest to users who use the library outside of the repository, by installing it from PyPI and calling `lm_eval.evaluator.evaluate()` to evaluate an existing model.

To properly evaluate a given LM, we require a wrapper class that subclasses `lm_eval.api.model.LM` and defines how the Evaluation Harness should interface with your model. This guide walks through how to write this `LM` subclass and add it to the library!

## Setup

To get started contributing, go ahead and fork the main repo, clone it, create a branch with the name of your model, and install the project requirements in your environment:

```sh
# After forking...
git clone https://github.com/<YOUR-USERNAME>/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout -b <model-type>
pip install -e ".[dev]"
```

Now, we'll create a new file where we'll be adding our model:

```sh
touch lm_eval/models/<my_model_filename>.py
```

**Tip: this filename should not shadow package names! For example, naming your file `anthropic.py` is disallowed since the API's name on PyPI is `anthropic`, but naming it `anthropic_llms.py` works with no problems.**

## Interface

All models must subclass the `lm_eval.api.model.LM` class.

The LM class enforces a common interface via which we can extract responses from a model:

```python
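from lm_eval.api.instance import Instance
from lm_eval.api.model import LM


class MyCustomLM(LM):
    # ...
    def loglikelihood(self, requests: list[Instance]) -> list[tuple[float, bool]]:
        ...  # score (context, target) pairs

    def loglikelihood_rolling(self, requests: list[Instance]) -> list[tuple[float, bool]]:
        ...  # score whole strings, used for perplexity

    def generate_until(self, requests: list[Instance]) -> list[str]:
        ...  # sample text until a stop condition is met
    # ...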
```

Where `Instance` is a dataclass defined in [`lm_eval.api.instance`](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/api/instance.py) with property `args` of request-dependent type signature described below.

We support three types of requests, consisting of different interactions / measurements with an autoregressive LM.

All three request types take as input `requests` of type `list[Instance]`, where each `Instance.request_type` matches the method name.

- `generate_until`
  - Each request contains `Instance.args : Tuple[str, dict]` containing 1. an input string to the LM and 2. a dictionary of keyword arguments used to control generation parameters.
  - Using this input and these generation parameters, text will be sampled from the language model (typically until a maximum output length or specific stopping string sequences--for example, `{"until": ["\n\n", "."], "max_gen_toks": 128}`).
  - The generated output text from the model will then be returned.

- `loglikelihood`
  - Each request contains `Instance.args : Tuple[str, str]` containing 1. an input string to the LM and 2. a target string on which the loglikelihood of the LM producing this target, conditioned on the input, will be returned.
  - Each request returns a result `(ll, is_greedy): Tuple[float, int]`, where `ll` is a floating point number representing the log probability of generating the target string conditioned on the input, and `is_greedy` is either `0` or `1`. It is `1` if and only if the target string *would be generated by greedy sampling from the LM* (that is, if the target string is the *most likely* N-token string to be output by the LM given the input).

- `loglikelihood_rolling`
  - Each request contains `Instance.args : Tuple[str]`, which is an input string to the model whose *entire* loglikelihood, conditioned on purely the EOT token, will be calculated.
  - This is used to evaluate *perplexity* on a data distribution.
  - It should return `(ll,) : Tuple[float]`, i.e. solely the *loglikelihood* of producing each piece of text given no starting input.

To allow a model to be evaluated on all types of tasks, you will need to implement these three types of measurements (note that `loglikelihood_rolling` is a special case of `loglikelihood`). For a reference implementation, check out `lm_eval/models/huggingface.py`! Additionally, check out `lm_eval.api.model.TemplateLM` for a class that abstracts away some commonly used functions across LM subclasses, or see if your model would lend itself well to subclassing the `lm_eval.models.huggingface.HFLM` class and overriding just the initialization or a couple of methods!
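
To make the three request shapes concrete, here is a toy, self-contained sketch. It is deliberately not a real wrapper: `Instance` below is a stand-in dataclass rather than `lm_eval.api.instance.Instance`, the class does not subclass `LM` (so it runs without the library installed), and the "model" is a made-up context-independent word distribution. The return shapes follow the descriptions above.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:  # stand-in for illustration only, NOT the library's Instance class
    request_type: str
    args: tuple


class ToyUnigramLM:
    """Toy model over a three-word vocabulary; inputs must use only these words."""

    probs = {"blue": 0.5, "green": 0.3, "red": 0.2}

    def _score(self, text: str) -> Tuple[float, bool]:
        words = text.split()
        best = max(self.probs, key=self.probs.get)
        ll = sum(math.log(self.probs[w]) for w in words)
        return ll, all(w == best for w in words)

    def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
        # args = (context, target); this toy model ignores the context.
        return [self._score(req.args[1]) for req in requests]

    def loglikelihood_rolling(self, requests: List[Instance]) -> List[Tuple[float]]:
        # args = (text,); only the total log-likelihood is returned, as (ll,).
        return [(self._score(req.args[0])[0],) for req in requests]

    def generate_until(self, requests: List[Instance]) -> List[str]:
        # args = (context, gen_kwargs); emit the most likely word max_gen_toks times.
        best = max(self.probs, key=self.probs.get)
        return [" ".join([best] * req.args[1].get("max_gen_toks", 1)) for req in requests]


lm = ToyUnigramLM()
scores = lm.loglikelihood([
    Instance("loglikelihood", ("The sky is", "blue")),
    Instance("loglikelihood", ("The sky is", "red")),
])
rolling = lm.loglikelihood_rolling([Instance("loglikelihood_rolling", ("blue green",))])
generated = lm.generate_until([Instance("generate_until", ("The sky is", {"max_gen_toks": 2}))])
```

Only the first `loglikelihood` request is greedy, because `blue` is the most likely word under this toy distribution.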

**Tip: be careful of indexing in loglikelihood!**

LMs take in tokens in position `[0 1 2 ... N]` and output a probability distribution for token position `N+1`. We provide a simplified graphic here, excerpted from `huggingface.py`:

```text
# how this all works (illustrated on a causal decoder-only setup):
#          CTX      CONT
# inp    0 1 2 3|4 5 6 7 8 9   <- last token is deleted by inp[:, :-1]
# model  \               \
# logits   1 2 3|4 5 6 7 8 9   <- the ctx half gets tossed out by the
# cont_toks      4 5 6 7 8 9      [:, -len(continuation_enc):, :self.vocab_size] slice
```

The final token of the target is not passed into the LM, because we want the LM's predictions *up to but not past* that final target token. For more information, check out https://github.com/EleutherAI/lm-evaluation-harness/issues/942 .
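
The same alignment can be checked with a small pure-Python sketch. The `toy_model` below is an assumption made for illustration (it simply expects each token to be followed by the next integer); only the slicing mirrors the diagram above.

```python
import math
from typing import List, Tuple

VOCAB_SIZE = 10


def toy_model(inp: List[int]) -> List[List[float]]:
    """Return one row of log-probs per input position; row i scores the token at i + 1."""
    rows = []
    for tok in inp:
        row = [math.log(0.1 / (VOCAB_SIZE - 1))] * VOCAB_SIZE
        row[(tok + 1) % VOCAB_SIZE] = math.log(0.9)  # toy rule: expect tok + 1 next
        rows.append(row)
    return rows


def continuation_loglikelihood(context: List[int], continuation: List[int]) -> Tuple[float, bool]:
    """Score a non-empty continuation given a context."""
    inp = (context + continuation)[:-1]  # drop the last token: nothing is predicted after it
    rows = toy_model(inp)
    cont_rows = rows[-len(continuation):]  # keep only rows that predict continuation tokens
    ll = sum(row[tok] for row, tok in zip(cont_rows, continuation))
    is_greedy = all(
        max(range(VOCAB_SIZE), key=row.__getitem__) == tok
        for row, tok in zip(cont_rows, continuation)
    )
    return ll, is_greedy


ll, is_greedy = continuation_loglikelihood([0, 1, 2, 3], [4, 5, 6, 7, 8, 9])
ll_miss, greedy_miss = continuation_loglikelihood([0, 1], [2, 5])
```

In the first call every continuation token is the model's top prediction, so `is_greedy` is true; in the second, the token `5` is not what the toy model expects after `2`.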

## Registration

Congrats on implementing your model! Now it's time to test it out.

To make your model usable via the command line interface to `lm-eval` using `python -m lm_eval`, you'll need to tell `lm-eval` what your model's name is.

This is done via a *decorator*, `lm_eval.api.registry.register_model`. Applying `register_model()` both alerts `lm-eval` to the model's existence and sets the name(s) used to select it with `python -m lm_eval --model <name>`.

```python
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model


@register_model("<name1>", "<name2>")
class MyCustomLM(LM):
    ...  # implement the request methods described above
```

Using this decorator results in the class being added to an accounting of the usable LM types maintained internally to the library at `lm_eval.api.registry.MODEL_REGISTRY`. See `lm_eval.api.registry` for more detail on what sorts of registries and decorators exist in the library!
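
For readers unfamiliar with the pattern, a decorator-based registry can be sketched as below. This is a simplified stand-in, not the code in `lm_eval.api.registry`: the module-level dict, the duplicate-name check, and `get_model` are assumptions made for illustration.

```python
from typing import Callable, Dict

MODEL_REGISTRY: Dict[str, type] = {}  # simplified stand-in for the library's registry


def register_model(*names: str) -> Callable[[type], type]:
    """Return a class decorator that records the class under each given name."""

    def decorate(cls: type) -> type:
        for name in names:
            if name in MODEL_REGISTRY:  # assumption: duplicate names are rejected
                raise ValueError(f"Model name '{name}' is already registered")
            MODEL_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged

    return decorate


@register_model("my-model", "my-model-alias")
class MyCustomLM:
    pass


def get_model(name: str) -> type:
    """Look up a registered class by the name given on the command line."""
    return MODEL_REGISTRY[name]
```

Both names resolve to the same class, which is how a single model can be exposed under several aliases.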

**Tip: be sure to import your model in `lm_eval/models/__init__.py`!**

## Testing

We also recommend that new model contributions be accompanied by short tests of their 3 core functionalities, at minimum. To see an example of such tests, look at https://github.com/EleutherAI/lm-evaluation-harness/blob/35bdecd379c0cefad6897e67db892f4a6026a128/tests/test_ggml.py .

## Chat Templating

Many models are fine-tuned with a [Chat Template](https://huggingface.co/docs/transformers/main/en/chat_templating) to enable back-and-forth interaction between a "User" who sends queries and the model (often called the "Assistant") that responds. It can be desirable to evaluate fine-tuned models on evaluation tasks while wrapped in the conversational format they expect.

In order to make your model optionally compatible with a chat format, three additional methods must be implemented:

```python
from typing import Dict, List, Union

from lm_eval.api.model import LM


class MyCustomLM(LM):
    #...
    @property
    def tokenizer_name(self) -> str:
        """
        Return the name of the model's tokenizer and/or the accompanying chat template.
        The returned string is used to cache requests.

        Returns:
            str: The name of the model's tokenizer and/or chat template.
        """

    def chat_template(self, chat_template: Union[bool, str] = False) -> str:
        """
        Get the appropriate chat template for the model based on the `chat_template` argument.

        This method returns the chat template string to build the prompt from a chat history.
        The chat template is saved in the evaluation results for reproducibility.
        Boolean arguments should be used with models that have only one chat template,
        while string arguments are used with models that have multiple chat templates.
        For the reference implementation, see HFLM class in `lm_eval.models.huggingface`.

        Args:
            chat_template (Union[bool, str]): Specifies whether to apply a chat template:
                - If False: Do not apply any chat template.
                - If True: Apply the default chat template.
                - If str: Apply the specified chat template by name.

        Returns:
            str: The selected chat template in Jinja format.
        """

    def apply_chat_template(self, chat_history: List[Dict[str, str]]) -> str:
        """
        Process a chat history to create a string that can be tokenized and input into the model.

        Args:
            chat_history (List[Dict[str, str]]): A list of dictionaries representing the chat history,
                where each dictionary has "role" and "content" keys.

        Returns:
            str: A string representing the chat history that can be tokenized and fed into the model.
        """
```

- `apply_chat_template`
  - This method performs the bulk of the work required for chat-formatting.
  - As input, a `chat_history: List[Dict[str, str]]` is passed in. This is a transcript of a conversation of a form similar to

  ```text
      [
        {"role": "system", "content": <user-provided system message such as "You are a helpful math-focused chatbot">},
        {"role": "user", "content": <task example - a few-shot example 'input'>},
        {"role": "assistant", "content": <correct response to the above example>},
        # ... more few-shot examples, potentially
        {"role": "user", "content": <test set query--response on which we will evaluate>},
      ]
  ```

  which can then be converted into a string input.
  - The output is a string representing this conversation that can be fed into the model.
  - For example, this consists of simply calling `tokenizer.apply_chat_template` for HFLM--see the implementation there for reference.
- `tokenizer_name`
  - LM Eval Harness supports [caching requests](https://github.com/EleutherAI/lm-evaluation-harness/blob/4902aaaf1f374682f95ac25fe2e13b23faddc91a/lm_eval/__main__.py#L140) that are sent to a model, for faster setup when repeating an already-performed evaluation.
  - However, we don't want to use the cache of chat transcripts rendered using one chat template or system prompt to send to a model with a different template! So, we use this `lm.tokenizer_name` string to distinguish caches for a given model (and chat template) from one another.
- `chat_template`
  - Chat templates are typically provided as a Jinja template string or a string formatted with str.format to include user and assistant messages in a single prompt. This template string is saved in the evaluation results to ensure reproducibility.

If not implemented for a given model type, the flags `--apply_chat_template`, `--fewshot_as_multiturn`, and `--system_instruction` cannot be used.
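
As a rough illustration of what these methods do, the sketch below renders a chat history with a made-up `<role>content` format and derives a cache key from a tokenizer name. Both the format and the `cache_key` helper are assumptions for illustration; a real implementation would typically delegate to the tokenizer's own chat template.

```python
from typing import Dict, List


def apply_chat_template(chat_history: List[Dict[str, str]]) -> str:
    """Render role/content messages using a made-up `<role>content` line format."""
    rendered = "".join(f"<{msg['role']}>{msg['content']}\n" for msg in chat_history)
    return rendered + "<assistant>"  # prompt the model for its next turn


def cache_key(tokenizer_name: str, prompt: str) -> str:
    # Hypothetical: prefix cache entries so different templates never share a cache.
    return f"{tokenizer_name}::{prompt}"


chat_history = [
    {"role": "system", "content": "You are a helpful math-focused chatbot"},
    {"role": "user", "content": "What is 1 + 1?"},
    {"role": "assistant", "content": "2"},
    {"role": "user", "content": "What is 2 + 2?"},
]
prompt = apply_chat_template(chat_history)
```

The rendered prompt ends with the final user query followed by the assistant marker, and the same prompt yields different cache keys under different tokenizer names.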

## Other

**Pro tip**: So that the Evaluation Harness overestimates total runtime rather than underestimating it, HuggingFace models have the built-in ability to process data points in *descending order by total input length* via `lm_eval.utils.Reorderer`. Take a look at `lm_eval.models.huggingface.HFLM` to see how this is done, and see if you can implement it in your own model!
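
The idea can be sketched without the library: process inputs longest first, then return results in the original order. The helper below is a stand-alone illustration and does not use `Reorderer`; `run_longest_first` and `fake_model` are hypothetical names.

```python
from typing import Callable, List, Sequence


def run_longest_first(inputs: Sequence[str], model_fn: Callable[[str], str]) -> List[str]:
    """Call model_fn on inputs in descending length order, returning results in input order."""
    order = sorted(range(len(inputs)), key=lambda i: len(inputs[i]), reverse=True)
    results: List[str] = [""] * len(inputs)
    for i in order:
        results[i] = model_fn(inputs[i])  # write back at the original position
    return results


seen: List[str] = []


def fake_model(text: str) -> str:
    seen.append(text)  # record the order in which inputs are processed
    return text.upper()


outputs = run_longest_first(["bb", "a", "cccc"], fake_model)
```

Because the slowest items run first, a progress bar's early time estimates err on the high side rather than the low side.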

## Conclusion

After reading this guide, you should be able to add new model APIs or implementations to the Eval Harness library!


================================================
FILE: docs/new_task_guide.md
================================================
# New Task Guide

`lm-evaluation-harness` is a framework that strives to support a wide range of zero- and few-shot evaluation tasks on autoregressive language models (LMs).

This documentation page provides a walkthrough to get started creating your own task, in `lm-eval` versions v0.4.0 and later.

A more interactive tutorial is available as a Jupyter notebook [here](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/examples/lm-eval-overview.ipynb).

## Setup

If you haven't already, go ahead and fork the main repo, clone it, create a branch with the name of your task, and install the project requirements in your environment:

```sh
# After forking...
git clone https://github.com/<YOUR-USERNAME>/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout -b <task-name>
pip install -e ".[dev]"
```

In this document, we'll walk through the basics of implementing a static benchmark evaluation in two formats: a *generative* task which requires sampling text from a model, such as [`gsm8k`](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/gsm8k/gsm8k.yaml), and a *discriminative*, or *multiple choice*, task where the model picks the most likely of several fixed answer choices, such as [`sciq`](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/sciq/sciq.yaml).

## Creating a YAML file

To implement a new standard task, we'll need to write a YAML file which configures our task logic. We start by making a new empty YAML file. This file can have any name, but we recommend placing it in a subfolder of `lm_eval/tasks` titled by the dataset or task's shorthand name: for example,

```sh
touch lm_eval/tasks/<dataset_name>/<my_new_task_name>.yaml
```

Or, copy the template subfolder we provide from `templates/new_yaml_task`:

```sh
cp -r templates/new_yaml_task lm_eval/tasks/
```

and rename the folders and YAML file(s) as desired.

### Selecting and configuring a dataset

All data downloading and management is handled through the HuggingFace (**HF**) [`datasets`](https://github.com/huggingface/datasets) API. So, the first thing you should do is check to see if your task's dataset is already provided in their catalog [here](https://huggingface.co/datasets). If it's not in there, please consider adding it to their Hub to make it accessible to a wider user base by following their [new dataset guide](https://github.com/huggingface/datasets/blob/main/ADD_NEW_DATASET.md).

> [!TIP]
> To test your task, we recommend enabling verbose logging with `export LMEVAL_LOG_LEVEL="DEBUG"` in your shell before running the evaluation script. This will help you debug any issues that may arise.

Once you have a HuggingFace dataset prepared for your task, we want to assign our new YAML to use this dataset:

```yaml
dataset_path: ... # the name of the dataset on the HF Hub.
dataset_name: ... # the dataset configuration to use. Leave `null` if your dataset does not require a config to be passed. See https://huggingface.co/docs/datasets/load_hub#configurations for more info.
dataset_kwargs: null # any extra keyword arguments that should be passed to the dataset constructor, e.g. `data_dir`.
```

Next, we'd like to tell our task what the dataset's train, validation, and test splits are named, if they exist:

```yaml
training_split: <split name of training set, or `null`>
validation_split: <split name of val. set, or `null`>
test_split: <split name of test set, or `null`>
```

Evaluation will run on the `test_split` if it is available, and otherwise on the `validation_split`.
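The split-selection rule above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the harness's actual code; `pick_eval_split` and the plain-dict config are hypothetical.

```python
from typing import Optional


def pick_eval_split(config: dict) -> Optional[str]:
    """Return the split to evaluate on: `test_split` if set, else `validation_split`.

    Hypothetical helper mirroring the rule described above.
    """
    return config.get("test_split") or config.get("validation_split")


# A dataset with no test split falls back to its validation split.
print(pick_eval_split({"validation_split": "validation", "test_split": None}))
```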

We can also specify from which split the task should retrieve few-shot examples via:

```yaml
fewshot_split: <split name to draw fewshot examples from, or `null`>
```

or by hardcoding them, either using the following in the yaml file:

```yaml
fewshot_config:
  sampler: first_n
  samples: [
    {<sample 1>},
    {<sample 2>},
  ]
```

The full `fewshot_config` supports the following fields:

```yaml
fewshot_config:
  sampler: default        # Sampling strategy: "default" (random) or "first_n"
  split: train            # Dataset split to draw fewshot examples from (overrides fewshot_split)
  samples: [...]          # Hardcoded list of fewshot examples, or a callable returning them
  doc_to_text: "..."      # Override doc_to_text for fewshot examples only
  doc_to_target: "..."    # Override doc_to_target for fewshot examples only
  doc_to_choice: "..."    # Override doc_to_choice for fewshot examples only
  gen_prefix: "Answer:"   # Prefix for assistant response in fewshot examples
  fewshot_delimiter: "\n\n"  # Delimiter between fewshot examples
  target_delimiter: " "      # Delimiter between question and answer
```

All fields are optional. If not specified, they inherit from the parent `TaskConfig`. This allows you to format fewshot examples differently from the evaluation examples — useful when your fewshot source has different field names or requires different formatting.

You can also hardcode fewshot examples by adding the function `list_fewshot_samples` in the associated utils.py file:

```python
def list_fewshot_samples() -> list[dict]:
  return [{<sample 1>}, {<sample 2>}]
```

See `lm_eval/tasks/minerva_math/minerva_math_algebra.yaml` for an example of the latter, and `lm_eval/tasks/gsm8k/gsm8k-cot.yaml` for an example of the former.

In this case, each sample must contain the same fields as the samples in the above sets--for example, if `doc_to_text` expects an `input` field when rendering input prompts, these provided samples must include an `input` key.
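One way to catch a mismatch early is to check hardcoded samples against the fields your templates reference. The sketch below is a hypothetical helper (`missing_keys` is not part of the harness) and assumes you list the required field names yourself.

```python
def missing_keys(samples: list[dict], required: set[str]) -> dict[int, set[str]]:
    """Map each sample index to the required keys it lacks (empty dict if none)."""
    problems = {}
    for i, sample in enumerate(samples):
        absent = required - set(sample)
        if absent:
            problems[i] = absent
    return problems


samples = [
    {"input": "2 + 2 = ?", "target": "4"},
    {"question": "3 + 3 = ?", "target": "6"},  # wrong field name: no "input"
]
print(missing_keys(samples, required={"input", "target"}))
```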

If neither of the above options is set, we will default to train/validation/test sets, in that order.

Finally, our dataset may not be already in the exact format we want. Maybe we have to strip whitespace and special characters via a regex from our dataset's "question" field! Or maybe we just want to rename its columns to match a convention we'll be using for our prompts.

Let's create a python file in the directory where we're writing our YAML file:

```bash
touch lm_eval/tasks/<dataset_name>/utils.py
```

Now, in `utils.py` we'll write a function to process each split of our dataset (the following example is drawn from [the `hellaswag` task](../lm_eval/tasks/hellaswag/utils.py)):

```python
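import datasets


# `preprocess` is a text-cleaning helper defined alongside this function in the task's `utils.py`.
def process_docs(dataset: datasets.Dataset) -> datasets.Dataset: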
    def _process_doc(doc):
        ctx = doc["ctx_a"] + " " + doc["ctx_b"].capitalize()
        out_doc = {
            "query": preprocess(doc["activity_label"] + ": " + ctx),
            "choices": [preprocess(ending) for ending in doc["endings"]],
            "gold": int(doc["label"]),
        }
        return out_doc

    return dataset.map(_process_doc)
```

Now, in our YAML config file we'll use the `!function` constructor, and tell the config where our imported Python function will come from. At runtime, before doing anything else we will preprocess our dataset according to this function!

```yaml
process_docs: !function utils.process_docs
```

### Using Local Datasets

To load a local dataset for evaluation, you can specify data files in the `dataset_kwargs` field, such as the following for JSON files:

```yaml
dataset_path: json
dataset_name: null
dataset_kwargs:
  data_files: /path/to/my/json
```

Or with files already split into separate directories:

```yaml
dataset_path: arrow
dataset_kwargs:
  data_files:
    train: /path/to/arrow/train/data-00000-of-00001.arrow
    validation: /path/to/arrow/validation/data-00000-of-00001.arrow
```

Alternatively, if you have previously downloaded a dataset from the HuggingFace Hub (using `save_to_disk()`) and wish to use the local files, use `data_dir` under `dataset_kwargs` to point to the directory where they are stored.

```yaml
dataset_path: hellaswag
dataset_kwargs:
  data_dir: hellaswag_local/
```

You can also set `dataset_path` as a directory path in your local system. This will assume that there is a loading script with the same name as the directory. [See datasets docs](https://huggingface.co/docs/datasets/loading#local-loading-script).
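For the JSON case above, a small data file can be produced with the standard library if you do not have one yet. This is a sketch under assumptions: the field names and the file name `my_task.jsonl` are made up, and it writes JSON Lines (one object per line), a layout the `json` loader reads.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical documents; use whatever fields your prompt templates expect.
docs = [
    {"question": "What color is the sky?", "answer": "blue"},
    {"question": "How many legs does a spider have?", "answer": "eight"},
]

# Written to the system temp directory here; any location works.
path = Path(tempfile.gettempdir()) / "my_task.jsonl"
with path.open("w", encoding="utf-8") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# Point `data_files` in the YAML at this absolute path.
print(path.resolve())
```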

## Writing a Prompt Template

The next thing we need to do is decide what format to use when presenting the data to the LM. This is our **prompt**, where we'll define both an input and output format.

To write a prompt, users will use `doc_to_text`, `doc_to_target`, and `doc_to_choice` (Optional when certain conditions are met).

`doc_to_text` defines the input string a model will be given while `doc_to_target` and `doc_to_choice` will be used to generate the target text. `doc_to_target` can be either a text string that refers to the target string or an integer that refers to the index of the correct label. When it is set as an index, `doc_to_choice` must also be set with the appropriate list of possible choice strings.

### Basic prompts

If a dataset is straightforward enough, users can enter the feature name directly. This assumes that no preprocessing is required. For example, in [Swag](https://github.com/EleutherAI/lm-evaluation-harness/blob/1710b42d52d0f327cb0eb3cb1bfbbeca992836ca/lm_eval/tasks/swag/swag.yaml#L10-L11), `doc_to_text` and `doc_to_target` are each given the name of one of the features.

```yaml
doc_to_text: startphrase
doc_to_target: label
```

Hard-coding is also possible as is the case in [SciQ](https://github.com/EleutherAI/lm-evaluation-harness/blob/1710b42d52d0f327cb0eb3cb1bfbbeca992836ca/lm_eval/tasks/sciq/sciq.yaml#L11).

```yaml
doc_to_target: 3
```

`doc_to_choice` can be given a list of strings directly as the options (see [Toxigen](https://github.com/EleutherAI/lm-evaluation-harness/blob/1710b42d52d0f327cb0eb3cb1bfbbeca992836ca/lm_eval/tasks/toxigen/toxigen.yaml#L11)):

```yaml
doc_to_choice: ['No', 'Yes']
```

If a dataset feature is already a list, you can set the name of the feature as `doc_to_choice` (see [Hellaswag](https://github.com/EleutherAI/lm-evaluation-harness/blob/e0eda4d3ffa10e5f65e0976161cd134bec61983a/lm_eval/tasks/hellaswag/hellaswag.yaml#L13)):

```yaml
doc_to_choice: choices
```

### Writing a prompt with Jinja 2

We support the [Jinja 2](https://jinja.palletsprojects.com/en/3.1.x/) templating language for writing prompts. In practice, this means you can take your dataset's columns and do many basic string manipulations to place each document into prompted format.

Take for example the dataset `super_glue/boolq`. As input, we'd like to use the features `passage` and `question` and string them together so that for a sample line `doc`, the model sees something in the format of:

```text
doc["passage"]
Question: doc["question"]?
Answer:
```

We do this by [writing](https://github.com/EleutherAI/lm-evaluation-harness/blob/1710b42d52d0f327cb0eb3cb1bfbbeca992836ca/lm_eval/tasks/super_glue/boolq/default.yaml#L9C1-L9C61)

```yaml
doc_to_text: "{{passage}}\nQuestion: {{question}}?\nAnswer:"
```

Such that `{{passage}}` will be replaced by `doc["passage"]` and `{{question}}` with `doc["question"]` when rendering the prompt template.
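To see what the rendered prompt looks like, the same substitution can be written out in plain Python. This is only an illustration of the result, not how the harness renders templates, and the sample `doc` is made up.

```python
# A made-up document with the two features used by the template.
doc = {
    "passage": "The Amazon is the largest rainforest on Earth.",
    "question": "is the amazon a rainforest",
}

# Equivalent of "{{passage}}\nQuestion: {{question}}?\nAnswer:" for this doc.
prompt = f"{doc['passage']}\nQuestion: {doc['question']}?\nAnswer:"
print(prompt)
```

Note that the prompt ends at `Answer:` with no trailing space.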

Our intended output is for the model to predict a single whitespace, and then the answer to the question. We do this via:

```yaml
doc_to_target: "{{answer}}"
```

#### Multiple choice format

For tasks which are multiple choice (a fixed, finite set of label words per each document) and evaluated via comparing loglikelihoods of all label words (the `multiple_choice` task output type) we enforce a particular convention on prompt format.

> [!WARNING]
> We add `target_delimiter` between input and target which defaults to " ", such that the full input-output string is `doc_to_text(doc) + target_delimiter + doc_to_target(doc)`. `doc_to_text` and `doc_to_target` should not contain trailing right or left whitespace, respectively. For multiple choice the target will be each choice index concatenated with the delimiter.

An annotated example in the case of SciQ is as follows:

```yaml
doc_to_text: "{{support.lstrip()}}\nQuestion: {{question}}\nAnswer:" # This is the input portion of the prompt for this doc. It will have " {{choice}}" appended to it as target for each choice in answer_choices.
doc_to_target: 3 # this contains the index into the answer choice list of the correct answer.
doc_to_choice: "{{[distractor1, distractor2, distractor3, correct_answer]}}"
```

Task implementers are thus able to decide what the answer choices should be for a document, and what prompt format to use.
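As a rough sketch of the convention in the warning above, the strings compared for a SciQ-style document can be built like this. The document contents are invented and `build_candidates` is a hypothetical helper, not a harness function.

```python
def build_candidates(context: str, choices: list[str], target_delimiter: str = " ") -> list[str]:
    """Return one full input-output string per answer choice."""
    return [context + target_delimiter + choice for choice in choices]


doc = {
    "support": "  Mesophiles grow best at moderate temperatures.",
    "question": "What type of organism is used to make cheese?",
    "distractor1": "protozoa",
    "distractor2": "gymnosperms",
    "distractor3": "viruses",
    "correct_answer": "mesophilic organisms",
}

context = f"{doc['support'].lstrip()}\nQuestion: {doc['question']}\nAnswer:"
choices = [doc["distractor1"], doc["distractor2"], doc["distractor3"], doc["correct_answer"]]
gold_index = 3  # matches `doc_to_target: 3`: the correct answer is listed last

for candidate in build_candidates(context, choices):
    print(repr(candidate))
```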

The label index can also be sourced from a feature directly. For example, in `super_glue/boolq`, the label index is defined in the feature `label`, so we can simply set `doc_to_target` to `label`. The options or verbalizers can be written in the form of a list `["no", "yes"]` that will correspond to the label index.

```yaml
doc_to_text: "{{passage}}\nQuestion: {{question}}?\nAnswer:"
doc_to_target: label
doc_to_choice: ["no", "yes"]
```

### Using Python Functions for Prompts

There may be cases where the prompt we want to implement is more easily expressed in Python than in Jinja 2. For this, we can use Python helper functions that are referenced from the YAML config. Note that the script defining the function must be in the same directory as the YAML file.

A good example is WikiText that requires a lot of regex rules to clean the samples.

```python
import re


def wikitext_detokenizer(doc):
    string = doc["page"]
    # contractions
    string = string.replace("s '", "s'")
    string = re.sub(r"/' [0-9]/", r"/'[0-9]/", string)
    ...
    string = string.replace(" 's", "'s")

    return string
```

We can load this function in `doc_to_target` by using a `!function` operator after `doc_to_target` and followed by `<file name>.<function name>`. In the file [wikitext.yaml](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/wikitext/wikitext.yaml) we write:

```yaml
doc_to_target: !function preprocess_wikitext.wikitext_detokenizer
```

### Importing a Prompt from Promptsource

[Promptsource](https://github.com/bigscience-workshop/promptsource/tree/main/promptsource) is a great repository for crowdsourced prompts for many datasets. We can load these prompts easily by using the `use_prompt` argument and filling it with the format `"promptsource:<name of prompt template>"`. To use this, `doc_to_text` and `doc_to_target` should be left undefined. This will fetch the template of the dataset defined in the YAML file.

For example, for SuperGLUE BoolQ, if we want to use the prompt template `GPT-3 Style`, we can add this to the YAML file:

```yaml
use_prompt: "promptsource:GPT-3 Style"
```

If you would like to run evaluation on all prompt templates, you can simply call it this way.

```yaml
use_prompt: "promptsource:*"
```

### Setting metrics

You're almost done! Now we need to choose how to score our task.

- *If this is a multiple choice task:* do you just want to check your model's accuracy in choosing the correct answer choice?
- *If this is a generation task:* do you just want to check how often your model outputs *exactly the ground-truth output string provided*?

If the answer to the above is no: you'll need to record what scoring metrics to use! Metrics can be listed in the following format:

```yaml
metric_list:
  - metric: <name of the metric here>
    aggregation: <name of the aggregation fn here>
    higher_is_better: <true or false>
  - metric: !function script.function
    aggregation: ...
    higher_is_better: ...
```

`aggregation` and `higher_is_better` can be left out when using a natively supported metric, in which case they fall back to that metric's built-in defaults. Otherwise (for example, when using a custom metric implemented as a function), they must be defined explicitly.

For a full list of natively supported metrics and aggregation functions see [`docs/task_guide.md`](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/task_guide.md). All metrics supported in [HuggingFace Evaluate](https://github.com/huggingface/evaluate/tree/main/metrics) can also be used, and will be loaded if a given metric name is not one natively supported in `lm-eval` or `hf_evaluate` is set to `true`.

### Optional, More Advanced Setup

Some tasks may require more advanced processing logic than is described in this guide.

As a heuristic check:

- Does your task require generating multiple free-form outputs per input document?
- Does your task require complex, multi-step post-processing of generated model outputs?
- Does your task require subsetting documents on the fly based on their content?
- Do you expect to compute metrics after applying multiple such processing steps on your model outputs?
- Does your task rely on metrics that need a custom implementation?

For more detail on the task system and advanced features, see [`docs/task_guide.md`](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/task_guide.md). If none of the above sound like they apply to your task, it's time to continue on to checking your task performance!

### Task name + tags (registering a task)

To test a task conveniently, it helps to *register* the task--that is, to give it a name and make the `lm-eval` library aware it exists!

If you're writing your YAML file inside the `lm_eval/tasks` folder, you just need to give your task a name! You can do this inside your YAML file:

```yaml
task: <name of the task>
```

Including a task name is mandatory.

It is often also convenient to label your task with several `tag` values, though this field is optional:

```yaml
tag:
  - tag1
  - tag2
```

This will add your task to the `tag1` and `tag2` tags, letting people see how your task is categorized and, if desired, run all tasks under one of these tags at once, your task along with them.

If your task is not in the `lm_eval/tasks` folder, you'll need to tell the Eval Harness where to look for YAML files.

You can do this via the `--include_path` argument in `__main__.py`. This argument is used to initialize the `TaskManager` object, which you can also use in your custom scripts.

```python
task_manager = TaskManager(args.verbosity, include_path=args.include_path)
```

Passing `--tasks /path/to/yaml/file` is also accepted.

### Advanced Group Configs

While `tag` values are helpful when you want to be able to quickly and conveniently run a set of related tasks via `--tasks my_tag_name`, often, we wish to implement more complex logic. For example, the MMLU benchmark contains 57 *subtasks* that must all be *averaged* together in order to report a final 'MMLU score'.

Groupings of tasks might also use particular variants of a task--for example, we might want to default to evaluating a task as 5-shot when called as part of a given grouping, but not have a preference for number of shots when evaluating it as a standalone.

We implement this via **groups**, which are distinct from tags. Groups can be implemented via *group config* YAML files, which are laid out similarly but slightly differently to tasks' YAML configs.

The most basic form of group can be defined via a YAML config similar to the following:

```yaml
group: nli_tasks
task:
  - cb
  - anli_r1
  - rte
metadata:
  version: 1.0
```

This will behave almost identically to a `tag` that includes these 3 tasks, but with one key distinction: we'll print the `nli_tasks` group as a row (with no associated metrics) in our table of outputs, and visually show that these 3 tasks appear under its subheader.

Now, let's assume we actually want to report an aggregate score for `nli_tasks`. We would instead use a YAML config like the following:

```yaml
group: nli_tasks
task:
  - cb
  - anli_r1
  - rte
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: true # defaults to `true`. Set this to `false` to do a "macro" average (taking each subtask's average accuracy, and summing those accuracies and dividing by 3)--by default we do a "micro" average (retain all subtasks' per-document accuracies, and take the mean over all documents' accuracies to get our aggregate mean).
metadata:
  version: 1.0
```

Similar to our `metric_list` for listing out the metrics we want to calculate for a given task, we use an `aggregate_metric_list` field to specify which metric name to aggregate across subtasks, what aggregation function to use, and whether we should micro- or macro- average these metrics. See [./task_guide.md](./task_guide.md) for a full list of related sub-keys.
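The difference between the two averaging modes is easiest to see with numbers. The sketch below uses invented accuracies and subtask sizes; `aggregate_mean` is a hypothetical helper, not the harness's implementation.

```python
def aggregate_mean(scores: list[float], sizes: list[int], weight_by_size: bool = True) -> float:
    """Mean of subtask scores, weighted by subtask size (micro) or unweighted (macro)."""
    if weight_by_size:
        return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)
    return sum(scores) / len(scores)


# Invented per-subtask accuracies and document counts for cb, anli_r1, rte.
scores = [0.50, 0.40, 0.70]
sizes = [50, 1000, 250]

print("micro:", aggregate_mean(scores, sizes, weight_by_size=True))
print("macro:", aggregate_mean(scores, sizes, weight_by_size=False))
```

Because `anli_r1` is by far the largest subtask in this made-up example, the micro average sits close to its score, while the macro average treats all three subtasks equally.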

> [!TIP]
> Currently, we predominantly support only the aggregation of group metrics that use `mean` (either micro- or macro-averaged) over their subtasks. If you require more complex aggregation rules, you may want to perform aggregation offline.

Group configs can be fairly complex! We can do various operations, such as defining new subtask(s) inline in our group YAML, overriding an existing task's specific config value, or nesting existing groups within our own group config.

For example, let's build a config for evaluating MMLU and a few natural language inference tasks. For MMLU, we can write the name for the benchmark as a subtask written under `task`. You can configure the parameters such as `num_fewshot`. If the task being configured is a group such as `mmlu` or `super_glue`, the parameter set will be applied to all of the subtasks.

```yaml
group: nli_and_mmlu
task:
  - group: nli_tasks
    task:
      - cb
      - anli_r1
      - rte
    aggregate_metric_list:
      - metric: acc
        aggregation: mean
        higher_is_better: true
  - task: mmlu
    num_fewshot: 2
```

### Configuring python classes

There can be occasions when YAML-based tasks cannot accommodate how a task is handled. LM-Eval supports manually implementing tasks, as was done before `0.4.x`. To register the task, you can simply make a YAML file with the name of the task in `task` and the class object in `class` using the `!function` prefix.

```yaml
task: squadv2
class: !function task.SQuAD2
```

This also applies to building group configurations with subtasks that are python classes.

```yaml
group: scrolls
task:
  - task: scrolls_qasper
    class: !function task.Qasper
  - task: scrolls_quality
    class: !function task.QuALITY
  - task: scrolls_narrativeqa
    class: !function task.NarrativeQA
  ...
```

You can also pass a custom argument to your class by accepting `config` in the custom class constructor.
Here's how to do it:

```yaml
task: 20_newsgroups
class: !function task.Unitxt
recipe: card=cards.20_newsgroups,template=templates.classification.multi_class.title
```

In this example, `recipe` is the custom argument for the `Unitxt` class.

## Beautifying Table Display

To avoid conflicts, each task needs to be registered with a unique name. Because of this, slight variations of a task still count as unique tasks and need to be named uniquely. This can be done by adding a name component that refers to the variation, as in MMLU, where the variants that use the Flan template are distinguished from the default by the prefix `mmlu_flan_*`. Printing the full task names can easily clutter the results table at the end of the evaluation, especially when you have a long list of tasks or are using a benchmark that comprises many tasks. To make the table more legible, you can use `task_alias` and `group_alias` to provide an alternative task name and group name that will be printed. For example, in `mmlu_abstract_algebra.yaml` we set `task_alias` to `abstract_algebra`. In group configs, a `group_alias` for a group can also be set.

```yaml
"dataset_name": "abstract_algebra"
"description": "The following are multiple choice questions (with answers) about abstract\
  \ algebra.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_abstract_algebra"
"task_alias": "abstract_algebra"
```

## Checking validity

After registering your task, you can now check on your data downloading and verify that the few-shot samples look as intended. Run the following command with your desired args:

```bash
python -m scripts.write_out \
    --output_base_path <path> \
    --tasks <your-task-name> \
    --sets <train | val | test> \
    --num_fewshot K \
    --num_examples N
```

Open the file specified at the `--output_base_path <path>` and ensure it passes
a simple eye test.

## Versioning

One key feature in LM Evaluation Harness is the ability to version tasks and groups--that is, mark them with a specific version number that can be bumped whenever a breaking change is made.

This version info can be provided by adding the following to your new task or group config file:

```yaml
metadata:
  version: 0
```

Now, whenever a change needs to be made to your task in the future, please increase the version number by 1 so that users can differentiate the different task iterations and versions.

If you are incrementing a task's version, please also consider adding a changelog to the task's README.md noting the date, PR number, what version you have updated to, and a one-liner describing the change.

For example:

- \[Dec 25, 2023\] (PR #999) Version 0.0 -> 1.0: Fixed a bug with answer extraction that led to underestimated performance.

## Checking performance + equivalence

It's now time to check models' performance on your task! In the evaluation harness, we intend to support a wide range of evaluation tasks and setups, but prioritize the inclusion of already-proven benchmarks following the precise evaluation setups in the literature where possible.

To enable this, we provide a checklist that should be completed when contributing a new task, to enable accurate book-keeping and to ensure that tasks added to the library are well-tested and, where applicable, precedented.

### Task Validity Checklist

The checklist is the following:

For adding novel benchmarks/datasets to the library:

- [ ] Is the task an existing benchmark in the literature?
  - [ ] Have you referenced the original paper that introduced the task?
  - [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?

If other tasks on this dataset are already supported:

- [ ] Is the "Main" variant of this task clearly denoted?
- [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
- [ ] Have you noted which, if any, published evaluation setups are matched by this variant?

It is recommended to include a filled-out copy of this checklist in the README.md for the subfolder you are creating, if you have created a new subfolder in `lm_eval/tasks`.

**Finally, please add a short description of your task(s), along with a link to its subfolder in lm_eval/tasks, to [`lm_eval/tasks/README.md`](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/README.md) so that users can discover your task in the library, and follow the link to your README for more information about the variants supported, their task names, and the original source of the dataset and/or evaluation setup.**

## Submitting your task

You're all set! Now push your work and make a pull request to the `main` branch! Thanks for the contribution :). If there are any questions, please leave a message in the `#lm-thunderdome` channel on the EAI discord!


================================================
FILE: docs/python-api.md
================================================
# Python API

This guide covers programmatic usage of the evaluation harness in Python scripts and applications.

## Overview

The library provides three main ways to run evaluations programmatically:

| Function | Use Case |
|----------|----------|
| `simple_evaluate()` | Most common - accepts model name strings or LM objects |
| `EvaluatorConfig` | Config-based - load settings from YAML or dataclass |
| `evaluate()` | Low-level - full control over task dictionaries |

---

## Quick Start

The simplest way to run an evaluation:

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["hellaswag"],
)

print(results["results"])
```

---

## Using `simple_evaluate()`

The `simple_evaluate()` function is the recommended entry point for most use cases.

### Basic Usage

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2,dtype=float32",
    tasks=["hellaswag", "arc_easy"],
    num_fewshot=5,
    batch_size=8,
    device="cuda:0",
)
```

### With a Pre-initialized Model

```python
import lm_eval
from lm_eval.models.huggingface import HFLM

# Initialize model separately
lm = HFLM(pretrained="gpt2", batch_size=16)

results = lm_eval.simple_evaluate(
    model=lm,
    tasks=["hellaswag"],
    num_fewshot=0,
)
```

### With External Tasks

```python
import lm_eval
from lm_eval.tasks import TaskManager

# Include custom task definitions
task_manager = TaskManager(include_path="/path/to/custom/tasks")

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["my_custom_task"],
    task_manager=task_manager,
)
```

### Common Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | str or LM | Model name (e.g., "hf", "vllm") or LM instance |
| `model_args` | str or dict | Model constructor arguments |
| `tasks` | list[str] | Task names to evaluate |
| `num_fewshot` | int | Number of few-shot examples |
| `batch_size` | int or str | Batch size or "auto" |
| `device` | str | Device (cuda, cpu, mps) |
| `limit` | int or float | Limit examples per task |
| `log_samples` | bool | Save model inputs/outputs |
| `task_manager` | TaskManager | For external tasks |
| `gen_kwargs` | dict | Generation arguments |
| `apply_chat_template` | bool or str | Use chat template |
| `system_instruction` | str | System prompt |
| `fewshot_as_multiturn` | bool | Multi-turn few-shot |

See [`lm_eval/evaluator.py`](../lm_eval/evaluator.py) for the complete parameter list.

### Return Value

`simple_evaluate()` returns a dictionary with:

```python
{
    "results": {
        "task_name": {
            "metric_name": value,
            "metric_name,stderr": stderr_value,
        }
    },
    "configs": {...},      # Task configurations
    "versions": {...},     # Task versions
    "n-shot": {...},       # Few-shot counts
    "higher_is_better": {...},
    "n-samples": {...},
    "samples": {...},      # If log_samples=True
}
```
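A small helper for walking this structure might look like the following. It relies only on the shape shown above; the contents of `example` (including the non-numeric `alias` entry) are invented and `iter_metrics` is a hypothetical name.

```python
from typing import Iterator


def iter_metrics(results: dict) -> Iterator[tuple[str, str, float]]:
    """Yield (task, metric, value) for every numeric entry under results["results"]."""
    for task, metrics in results["results"].items():
        for name, value in metrics.items():
            # Skip non-numeric entries so only metric values are reported.
            if isinstance(value, (int, float)):
                yield task, name, value


example = {"results": {"hellaswag": {"acc": 0.29, "acc,stderr": 0.005, "alias": "hellaswag"}}}
for task, name, value in iter_metrics(example):
    print(f"{task:12s} {name:12s} {value:.3f}")
```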

---

## Using `EvaluatorConfig`

The `EvaluatorConfig` class provides a structured way to manage evaluation settings.

### From YAML File

```python
from lm_eval.config.evaluate_config import EvaluatorConfig
import lm_eval

# Load configuration from YAML
config = EvaluatorConfig.from_config("eval_config.yaml")

# Process tasks
task_manager = config.process_tasks()

# Run evaluation
results = lm_eval.simple_evaluate(
    model=config.model,
    model_args=config.model_args,
    tasks=config.tasks,
    num_fewshot=config.num_fewshot,
    batch_size=config.batch_size,
    device=config.device,
    task_manager=task_manager,
    log_samples=config.log_samples,
    gen_kwargs=config.gen_kwargs,
    apply_chat_template=config.apply_chat_template,
    system_instruction=config.system_instruction,
)
```

### Direct Instantiation

```python
from lm_eval.config.evaluate_config import EvaluatorConfig

config = EvaluatorConfig(
    model="hf",
    model_args={"pretrained": "gpt2", "dtype": "float32"},
    tasks=["hellaswag", "arc_easy"],
    num_fewshot=5,
    batch_size=8,
    device="cuda:0",
    output_path="./results/",
    log_samples=True,
)

# Validate and process
task_manager = config.process_tasks()
```

### Config Fields

See the [Configuration Guide](config_files.md#config-schema) for all available fields.

---

## Using `evaluate()`

The `evaluate()` function provides lower-level control, accepting pre-built task dictionaries.

### With Custom Task Objects

```python
import lm_eval
from lm_eval.tasks import TaskManager, get_task_dict
from lm_eval.models.huggingface import HFLM

# Initialize model
lm = HFLM(pretrained="gpt2", batch_size=16)

# Build task dictionary
task_manager = TaskManager(include_path="/path/to/custom/tasks")
task_dict = get_task_dict(
    ["hellaswag", "my_custom_task"],
    task_manager
)

# Run evaluation
results = lm_eval.evaluate(
    lm=lm,
    task_dict=task_dict,
    num_fewshot=5,
    limit=100,
)
```

### Mixed Task Sources

```python
from lm_eval.tasks import get_task_dict

# Combine different task sources
task_dict = get_task_dict(
    [
        "mmlu",                           # Stock task name
        "my_custom_task",                 # From include_path
        {"task": "inline_task", ...},     # Inline config dict
    ],
    task_manager
)
```

---

## Custom Models

To evaluate a custom model, create a subclass of `lm_eval.api.model.LM`:

```python
from lm_eval.api.model import LM

class MyCustomLM(LM):
    def __init__(self, model, batch_size=1):
        super().__init__()
        self.model = model
        self._batch_size = batch_size

    def loglikelihood(self, requests):
        # Return list of (logprob, is_greedy) tuples
        ...

    def generate_until(self, requests):
        # Return list of generated strings
        ...

    def loglikelihood_rolling(self, requests):
        # Return list of (logprob, is_greedy) tuples
        ...

    @property
    def batch_size(self):
        return self._batch_size
```

Then use it with `simple_evaluate()`:

```python
import lm_eval

my_model = load_my_model()  # placeholder: replace with your own loading code
lm = MyCustomLM(model=my_model, batch_size=16)

results = lm_eval.simple_evaluate(
    model=lm,
    tasks=["hellaswag"],
)
```

For detailed guidance on implementing custom models, see the [Model Guide](model_guide.md).

---

## Logging

Configure logging for debugging:

```python
from lm_eval.utils import setup_logging

# Set log level
setup_logging("DEBUG")  # DEBUG, INFO, WARNING, ERROR

# Or use environment variable
import os
os.environ["LMEVAL_LOG_LEVEL"] = "DEBUG"
```

---

## Examples

### Batch Evaluation of Multiple Models

```python
import lm_eval

models = [
    "gpt2",
    "gpt2-medium",
    "gpt2-large",
]

all_results = {}
for model_name in models:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_name}",
        tasks=["hellaswag"],
        batch_size="auto",
    )
    all_results[model_name] = results["results"]
```

### Save and Load Results

```python
import json
import lm_eval
from lm_eval.utils import handle_non_serializable

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["hellaswag"],
)

# Save results
with open("results.json", "w") as f:
    json.dump(results, f, default=handle_non_serializable, indent=2)

# Load results back
with open("results.json") as f:
    loaded_results = json.load(f)
```


================================================
FILE: docs/task_guide.md
================================================
# Task Configuration

The `lm-evaluation-harness` is meant to be an extensible and flexible framework within which many different evaluation tasks can be defined. All tasks in the new version of the harness are built around a YAML configuration file format.

These YAML configuration files, along with the current codebase commit hash, are intended to be shareable, so that providing the YAML config lets another researcher precisely replicate an evaluation setup whose prompt or settings differ from the standard `lm-eval` task implementations.

While adding a standard evaluation task on a new dataset can occasionally be as simple as swapping out a Hugging Face dataset path in an existing file, more specialized evaluation setups also exist. Here we provide a crash course on the more advanced logic that users can implement in YAML form.

If your intended task relies on features beyond what is described in this guide, we'd love to hear about it! Feel free to open an issue describing the scenario on GitHub, create a PR to the project with a proposed implementation, or ask in the `#lm-thunderdome` channel on the EleutherAI Discord.

## Configurations

Tasks are configured via the `TaskConfig` object. Below, we describe all fields usable within the object, and their role in defining a task.

### Parameters

Task naming + registration:

- **task** (`str`, defaults to None) — name of the task.
- **task_alias** (`str`, defaults to None) - Alias of the task name that will be printed in the final table results.
- **tag** (`str`, *optional*) - Name of the tag(s) a task belongs to. Enables one to run all tasks with a specified tag name at once.

Dataset configuration options:

- **dataset_path** (`str`) — The name of the dataset as listed by HF in the datasets Hub.
- **dataset_name**  (`str`, *optional*, defaults to None) — The name of what HF calls a “data instance” or sub-task of the benchmark. If your task does not contain any data instances, just leave this to default to None. (If you're familiar with the HF `datasets.load_dataset` function, these are just the first 2 arguments to it.)
- **dataset_kwargs** (`dict`, *optional*) — Auxiliary arguments that `datasets.load_dataset` accepts. This can be used to specify arguments such as `data_files` or `data_dir` if you want to use local datafiles such as json or csv.
- **custom_dataset** (`Callable`, *optional*) - A function that returns a `dict[str, datasets.Dataset]` (<split_name>, dataset) object. This can be used to load a dataset from a custom source or to preprocess the dataset in a way that is not supported by the `datasets` library. Will have access to `metadata` field if defined (from config and passed to TaskManager), and `model_args` from runtime (if using `evaluate`).
- **training_split** (`str`, *optional*) — Split in the dataset to use as the training split.
- **validation_split** (`str`, *optional*) — Split in the dataset to use as the validation split.
- **test_split** (`str`, *optional*) — Split in the dataset to use as the test split.
- **fewshot_split** (`str`, *optional*) - Split in the dataset to draw few-shot exemplars from. Must not be None if `num_fewshot` > 0.
- **process_docs** (`Callable`, *optional*) - Optionally define a function to apply to each HF dataset split, to preprocess all documents before being fed into prompt template rendering or other evaluation steps. Can be used to rename dataset columns, or to process documents into a format closer to the one expected by a prompt template.
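
As a rough sketch of what a `process_docs` function might look like, the example below renames and normalizes columns. The column names (`question`, `options`, `label`) and the output fields are hypothetical, and the only assumption about the dataset split is that it exposes a `.map(fn)` method, as `datasets.Dataset` does.

```python
def _process_doc(doc):
    """Map one raw record to the fields a prompt template might expect."""
    # "question", "options", and "label" are hypothetical raw column names.
    return {
        "query": doc["question"].strip(),
        "choices": list(doc["options"]),
        "gold": int(doc["label"]),
    }


def process_docs(dataset):
    """Apply the per-document transform to a whole dataset split."""
    # Assumes `dataset` exposes a `.map(fn)` method, as `datasets.Dataset` does.
    return dataset.map(_process_doc)
```

Such a function is typically placed in a Python file next to the task YAML and referenced with the `!function` operator described under "Embedded Python Code".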

Prompting / in-context formatting options:

- **use_prompt** (`str`, *optional*) - Name of prompt in promptsource to use. If defined, will overwrite `doc_to_text`, `doc_to_target`, and `doc_to_choice`.
- **description** (`str`, *optional*) - An optional Jinja2 template or string which will be prepended to the few-shot examples passed into the model, often describing the task or providing instructions to a model, such as `"The following are questions (with answers) about {{subject}}.\n\n"`. No delimiters or spacing are inserted between the description and the first few-shot example.
- **doc_to_text** (`Union[Callable, str]`, *optional*) — Jinja2 template, string, or function to process a sample into the appropriate input for the model.
- **doc_to_target** (`Union[Callable, str]`, *optional*) — Jinja2 template, string, or function to process a sample into the appropriate target output for the model. For multiple choice tasks, this should return an index into the answer choice list of the correct answer.
- **doc_to_choice** (`Union[Callable, str]`, *optional*) — Jinja2 template, string, or function to process a sample into a list of possible string choices for `multiple_choice` tasks. Left undefined for `generate_until` tasks.
- **fewshot_delimiter** (`str`, *optional*, defaults to "\n\n") — String to insert between few-shot examples.
- **target_delimiter** (`str`, *optional*, defaults to `" "`) — String to insert between input and target output for the datapoint being tested.
- **gen_prefix** (`str`, *optional*) - String to append after the <|assistant|> token. For example, if the task is to answer a question, the `gen_prefix` could be "The answer is: " to prompt the model to generate an answer to the question. If not using a chat template, this string will be appended to the end of the prompt.
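
To make the roles of `description`, `fewshot_delimiter`, and `target_delimiter` concrete, here is a minimal sketch that assembles a few-shot prompt by hand. The helper `build_prompt` and the sample documents are hypothetical; the harness's actual rendering (Jinja2 templates, chat templates, `gen_prefix` handling) is more involved, so treat this only as an illustration of the options described above.

```python
def build_prompt(description, fewshot_examples, query_text,
                 fewshot_delimiter="\n\n", target_delimiter=" "):
    """Hypothetical helper: join a description, few-shot examples, and a query."""
    # Each few-shot example is rendered as "<input><target_delimiter><target>".
    shots = [text + target_delimiter + target for text, target in fewshot_examples]
    # Examples and the final query are separated by fewshot_delimiter;
    # nothing is inserted between the description and the first example.
    return description + fewshot_delimiter.join(shots + [query_text])


prompt = build_prompt(
    description="The following are questions (with answers) about math.\n\n",
    fewshot_examples=[("Q: 1 + 1 = ?\nA:", "2"), ("Q: 2 + 3 = ?\nA:", "5")],
    query_text="Q: 4 + 4 = ?\nA:",
)
print(prompt)
```

Because the description is concatenated as-is, it carries its own trailing `"\n\n"` in this example.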

Runtime configuration options:

- **num_fewshot** (`int`, *optional*, defaults to 0) — Number of few-shot examples before the input.
- **batch_size** (`int`, *optional*, defaults to 1) — Batch size.

Scoring details:

- **metric_list** (`list`, *optional*, defaults to None) - A list of metrics to use for evaluation. See docs for expected format.
- **output_type** (`str`, *optional*, defaults to "generate_until") — Selects the type of model output for the given task. Options are `generate_until`, `loglikelihood`, `loglikelihood_rolling`, and `multiple_choice`.
- **generation_kwargs** (`dict`, *optional*) — Auxiliary arguments for the `generate` function from HF transformers library. Advanced keyword arguments may not be supported for non-HF LM classes.
- **repeats** (`int`, *optional*, defaults to 1) — Number of repeated runs through model for each sample. Can be used for cases such as self-consistency.
- **filter_list** (`Union[str, list]`, *optional*) — List of filters to postprocess model outputs. See below for further detail on the filter API.
- **should_decontaminate** (`bool`, *optional*, defaults to False) - Whether to decontaminate or not.
- **doc_to_decontamination_query** (`str`, *optional*) — Query for decontamination if `should_decontaminate` is True. If `should_decontaminate` is True but `doc_to_decontamination_query` is `None`, `doc_to_decontamination_query` will follow `doc_to_text`.

Other:

- **metadata** (`dict`, *optional*) — An optional field where arbitrary metadata can be passed. Most tasks should include a `version` key in this field that is used to denote the version of the yaml config. Other special metadata keys are: `num_fewshot`, to override the printed `n-shot` table column for a task. Will also be passed to the `custom_dataset` function if defined.

## Filters

A key component of the `lm-evaluation-harness` library is the `Filter` object. In a typical evaluation run of the harness, we take the formatted inputs and run them through our LM, with the appropriate output type (greedy or free-form generation, or loglikelihood-based comparative scoring).

After getting scores or output text from our LM on each `Instance` or document in the dataset, we then need to feed these responses into a metric or scoring function to return scores to a user.

However, certain tasks may require more complex behavior than directly turning over model outputs to a metric function. For example, we may want to post-process our output text by truncating it or extracting a model's answer, we may want to ensemble over multiple "takes" on the same document, et cetera.

**Detailed Aside**:
We do such post-processing by operating on *responses*, which are stored after running an LM on an `Instance` from the task in `Instance.resps`.

`resps` is a `List[str]` for each instance, and we pass a `List[List[<expected return type from model>]]` to our filters that is a list of `[instance.resps for instance in instances]`.

Our filters, after completing a pipeline, must return a `List[<expected return type from model>]`; each element is then stored in `Instance.filtered_resps` for the corresponding instance. In other words, a filter pipeline takes as input a list of model returns for each doc, and must produce a single model return, *not wrapped in a list*, for each doc.
**End Aside**

A full list of supported filter operations can be found in `lm_eval/filters/__init__.py`. Contributions of new filter types are welcome!
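
The shape contract from the aside can be sketched in plain Python. The data and the `take_first` function below are hypothetical stand-ins rather than the harness's own filter classes; they only show a list of responses per document going in and one unwrapped response per document coming out.

```python
# Hypothetical data: two documents, three sampled responses each.
instance_resps = [
    ["The answer is 4", "The answer is 5", "The answer is 4"],  # doc 0
    ["The answer is 7", "The answer is 7", "The answer is 9"],  # doc 1
]


def take_first(resps):
    """Reduce each document's list of responses to its first element."""
    return [doc_resps[0] for doc_resps in resps]


# Input: one list of responses per document.
# Output: one unwrapped response per document.
filtered_resps = take_first(instance_resps)
print(filtered_resps)  # ['The answer is 4', 'The answer is 7']
```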

### Multiple Filter Pipelines

Tasks need not be limited to a single filter pipeline. We enable users to run multiple, distinct, filter pipelines on *the same model outputs* generated in one run on a task.

As a case study, let's look at an implementation of solving the GSM8k math word problem benchmark in `lm_eval/tasks/gsm8k/gsm8k-cot-self-consistency.yaml`. Here, we are emulating the setup used by [Self-Consistency Improves Chain of Thought Prompting](https://arxiv.org/abs/2203.11171), in which evaluation is performed by generating N chain-of-thought outputs from a model via temperature-based sampling, then selecting the answers output by the model at the end of the chains of thought, then majority voting across all those numeric answers.

Within our YAML file:

```yaml
...
repeats: 64
filter_list:
  - name: "score-first"
    filter:
      - function: "regex"
        regex_pattern: "The answer is (\\-?[0-9\\.\\,]*[0-9]+)"
      - function: "take_first"
  - name: "maj@64"
    filter:
      - function: "regex"
        regex_pattern: "The answer is (\\-?[0-9\\.\\,]*[0-9]+)"
      - function: "majority_vote"
      - function: "take_first"
  - name: "maj@8"
    filter:
      - function: "take_first_k"
        k: 8
      - function: "regex"
        regex_pattern: "The answer is (\\-?[0-9\\.\\,]*[0-9]+)"
      - function: "majority_vote"
      - function: "take_first"
```

We are able to provide multiple different filter pipelines, each with their own name and list of filters to apply in sequence.

Our first filter pipeline implements

- applying a regex to the model generations (extracting the number within the phrase "The answer is (number)")
- selecting only the first out of the 64 model answers

Then scoring this single answer.

```yaml
- name: "score-first"
  filter:
    - function: "regex"
      regex_pattern: "The answer is (\\-?[0-9\\.\\,]*[0-9]+)"
    - function: "take_first"
```

Our second filter pipeline, "maj@64", does majority voting across all 64 answers via:

- applying the same regex to all responses, to get the numerical answer from the model for each of the 64 responses per problem
- applying majority voting to all responses, which then returns a length-1 `[<majority answer>]` list for each
- taking the first element of this length-1 list, to then score the sole response `<majority answer>` for each document.

```yaml
- name: "maj@64"
  filter:
    - function: "regex"
      regex_pattern: "The answer is (\\-?[0-9\\.\\,]*[0-9]+)"
    - function: "majority_vote"
    - function: "take_first"
```

Our final filter pipeline, "maj@8", does majority voting across the first 8 of the model's responses per document via:

- subsetting the len-64 list of responses `[answer1, answer2, ..., answer64]` to `[answer1, answer2, ..., answer8]` for each document
- performing the same sequence of filters on these new sets of 8 responses, for each document.

```yaml
- name: "maj@8"
  filter:
    - function: "take_first_k"
      k: 8
    - function: "regex"
      regex_pattern: "The answer is (\\-?[0-9\\.\\,]*[0-9]+)"
    - function: "majority_vote"
    - function: "take_first"
```

Thus, given the 64 responses from our LM on each document, we can report metrics on these responses in these 3 different ways, as defined by our filter pipelines.
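
The three pipelines can be emulated in a few lines of plain Python. This is a simplified sketch, not the harness's filter implementation: the function names mirror the YAML `function` keys, the `"[invalid]"` fallback for non-matching responses is an assumption, and six sampled responses with `k=3` stand in for the 64 responses and `k: 8` used in the config.

```python
import re
from collections import Counter

ANSWER_RE = re.compile(r"The answer is (\-?[0-9\.\,]*[0-9]+)")


def regex_filter(resps, fallback="[invalid]"):
    """Extract the captured answer from every response of every document."""
    out = []
    for doc_resps in resps:
        extracted = []
        for resp in doc_resps:
            match = ANSWER_RE.search(resp)
            extracted.append(match.group(1) if match else fallback)
        out.append(extracted)
    return out


def take_first_k(resps, k):
    """Keep only the first k responses for each document."""
    return [doc_resps[:k] for doc_resps in resps]


def majority_vote(resps):
    """Return a length-1 list holding the most common answer per document."""
    return [[Counter(doc_resps).most_common(1)[0][0]] for doc_resps in resps]


def take_first(resps):
    """Unwrap the first element of each document's list."""
    return [doc_resps[0] for doc_resps in resps]


# One document with six sampled responses.
resps = [[
    "... The answer is 18", "... The answer is 18", "... The answer is 20",
    "... The answer is 20", "... The answer is 20", "I am not sure.",
]]

score_first = take_first(regex_filter(resps))
maj_all = take_first(majority_vote(regex_filter(resps)))
maj_first_3 = take_first(majority_vote(regex_filter(take_first_k(resps, 3))))
print(score_first, maj_all, maj_first_3)  # ['18'] ['20'] ['18']
```

The same set of responses yields different scored answers depending on the pipeline, which is why each pipeline is reported under its own name.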

### Adding a custom filter

Just as custom models are added with the `register_model` decorator, custom filters can be registered with the `register_filter` decorator. For example:

```python
from lm_eval.api.filter import Filter
from lm_eval.api.registry import register_filter

@register_filter("new_filter")
class NewFilter(Filter):
    ...
```
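
For a sense of what the body of such a filter might contain, here is a self-contained sketch. The `Filter` class below is a local stand-in so the example runs without the harness installed, and the `apply(resps, docs)` signature is an assumption that should be checked against `lm_eval/api/filter.py`; `LowercaseFilter` is a hypothetical example. In a real task you would subclass `lm_eval.api.filter.Filter` and decorate the class with `register_filter` as shown above.

```python
class Filter:
    """Stand-in for `lm_eval.api.filter.Filter`, so the sketch runs on its own."""

    def apply(self, resps, docs):
        raise NotImplementedError


class LowercaseFilter(Filter):
    """Hypothetical filter that strips and lowercases every response."""

    def apply(self, resps, docs):
        # `resps` holds one list of responses per document; keep that shape.
        return [[r.strip().lower() for r in doc_resps] for doc_resps in resps]


cleaned = LowercaseFilter().apply([["  Yes ", "NO"], ["Maybe"]], docs=[{}, {}])
print(cleaned)  # [['yes', 'no'], ['maybe']]
```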

## Embedded Python Code

You can use Python functions for certain arguments by using the `!function` operator after the argument name, followed by `<filename>.<pythonfunctionname>`. This feature can be used for the following arguments:

1. `doc_to_text`
2. `doc_to_target`
3. `doc_to_choice`
4. `aggregation` for a `metric` in `metric_list`
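
As an illustration, a task YAML could contain the line `doc_to_text: !function utils.doc_to_text`, with the function defined in a `utils.py` file alongside it. The file name, the dataset columns (`question`, `choices`, `answer`), and the prompt layout below are all hypothetical.

```python
# utils.py (hypothetical file placed next to the task's YAML config)

def doc_to_text(doc):
    """Render one document into the prompt string shown to the model."""
    # "question" and "choices" are hypothetical dataset columns.
    options = "\n".join(
        f"{letter}. {choice}" for letter, choice in zip("ABCD", doc["choices"])
    )
    return f"Question: {doc['question']}\n{options}\nAnswer:"


def doc_to_target(doc):
    """Return the index of the correct choice (for multiple-choice tasks)."""
    return int(doc["answer"])
```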

## (No Longer Recommended) Direct `Task` Subclassing

The prior implementation method of new tasks was to subclass `Task`. While we intend to migrate all tasks to the new YAML implementation option going forward, it remains possible to subclass the Task class and implement custom logic. For more information, see `docs/task_guide.md` in v0.3.0 of the `lm-evaluation-harness`.

## Including a Base YAML

You can base a YAML on another YAML file as a template. This can be handy when you just need to change the prompt for `doc_to_text` but keep the rest the same, or change `filter_list` to compare which setup is better. Simply use `include` in the YAML file and write the name of the template you want to base it on. This assumes that the base template is in the same directory. Otherwise, you will need to define the full path.

```yaml
include: <YAML filename or with full path>
...
```

You can find an example of how to use this feature at [gsm8k-cot-self-consistency.yaml](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/gsm8k/gsm8k-cot-self-consistency.yaml), which is based on [gsm8k-cot.yaml](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/gsm8k/gsm8k-cot.yaml).

## Passing Arguments to Metrics

Metrics can be defined in the `metric_list` argument when building the YAML config. Multiple metrics can be listed along with any auxiliary arguments. For example, when setting the [`exact_match` metric](https://github.com/huggingface/evaluate/tree/main/metrics/exact_match), auxiliary arguments such as `ignore_case`, `ignore_punctuation`, and `regexes_to_ignore` can be listed as well. They will be passed to the metric function as `kwargs`. Some metrics have predefined values for `aggregation` and `higher_is_better`, so listing only the metric name can be sufficient.

```yaml
metric_list:
  - metric: acc
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: false
    regexes_to_ignore:
      - ","
      - "\\$"
```
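
To show how these auxiliary keys behave once they reach a metric function as keyword arguments, here is a simplified stand-in for an exact-match metric. It is a sketch written for this guide, not the Hugging Face `evaluate` implementation, and the order in which the normalizations are applied is an assumption.

```python
import re
import string


def exact_match(prediction, reference, ignore_case=False,
                ignore_punctuation=False, regexes_to_ignore=None):
    """Simplified stand-in: 1.0 if the normalized strings match, else 0.0."""
    for pattern in regexes_to_ignore or []:
        prediction = re.sub(pattern, "", prediction)
        reference = re.sub(pattern, "", reference)
    if ignore_case:
        prediction, reference = prediction.lower(), reference.lower()
    if ignore_punctuation:
        table = str.maketrans("", "", string.punctuation)
        prediction, reference = prediction.translate(table), reference.translate(table)
    return float(prediction == reference)


# Keyword arguments mirror the extra keys in the YAML entry above.
kwargs = {"ignore_case": True, "ignore_punctuation": False,
          "regexes_to_ignore": [",", r"\$"]}
print(exact_match("$1,000", "1000", **kwargs))  # 1.0
print(exact_match("$1,000", "1000"))            # 0.0
```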

### Natively Supported Metrics

Here we list all metrics currently supported natively in `lm-eval`:

Metrics:

- `acc` (accuracy)
- `acc_norm` (length-normalized accuracy)
- `acc_mutual_info` (baseline loglikelihood - normalized accuracy)
- `perplexity`
- `word_perplexity` (perplexity per word)
- `byte_perplexity` (perplexity per byte)
- `bits_per_byte`
- `matthews_corrcoef` (Matthews correlation coefficient)
- `f1` (F1 score)
- `bleu`
- `chrf`
- `ter`

Aggregation functions:

- `mean`
- `median`
- `perplexity`
- `weighted_perplexity`
- `bits_per_byte`

### Adding a Multiple Choice Metric

Adding a multiple choice metric has a few steps. To get it working you need to:

1. register a metric function
2. register an aggregation function
3. update the `Task` definition to make sure the correct arguments are passed

The default metric and aggregation functions are in `lm_eval/api/metrics.py`, and you can add a function there if it's for general use. The metrics are towards the bottom of the file and look like this:

```python
@register_metric(
    metric="mcc",
    higher_is_better=True,
    output_type="multiple_choice",
    aggregation="matthews_corrcoef",
)
def mcc_fn(items):  # This is a passthrough function
    return items
```

Note that many of these are passthrough functions, and for multiple choice (at least) this function is never actually called.

Aggregation functions are defined towards the top of the file. Here's an example:

```python
@register_aggregation("matthews_corrcoef")
def matthews_corrcoef(items):
    unzipped_list = list(zip(*items))
    golds = unzipped_list[0]
    preds = unzipped_list[1]
    return sklearn.metrics.matthews_corrcoef(golds, preds)
```

This function returns a single numeric value. The input is defined in `Task.process_results` in `lm_eval/api/task.py`. There's a section that looks like this:

```python
result_dict = {
    **({"acc": acc} if "acc" in use_metric else {}),
    **({"f1": (gold, pred)} if "f1" in use_metric else {}),
    **({"mcc": (gold, pred)} if "mcc" in use_metric else {}),
    **({"acc_norm": acc_norm} if "acc_norm" in use_metric else {}),
    **({"exact_match": exact_match} if "exact_match" in use_metric else {}),
}
```

The value here determines the input to the aggregation function, though the name used matches the metric function. These metrics all have simple needs and just need the accuracy or gold and predicted values, but immediately below this there are examples of metrics with more complicated needs you can use as reference.
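
The relationship between per-document values and aggregation inputs can be sketched as follows. The functions and data are hypothetical and independent of the harness: scalar metrics such as `acc` are collected into a list of numbers, while pair-based metrics such as `f1` or `mcc` are collected into a list of `(gold, pred)` tuples that the aggregation function unzips.

```python
def mean(items):
    """Aggregate per-document scalar scores, e.g. 0/1 accuracies."""
    return sum(items) / len(items)


def binary_f1(items):
    """Aggregate per-document (gold, pred) pairs into one F1 score."""
    golds, preds = zip(*items)
    tp = sum(1 for g, p in zip(golds, preds) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(golds, preds) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(golds, preds) if g == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical per-document outputs, shaped like the `result_dict` values above.
per_doc = [
    {"acc": 1.0, "f1": (1, 1)},
    {"acc": 0.0, "f1": (1, 0)},
    {"acc": 1.0, "f1": (0, 0)},
    {"acc": 0.0, "f1": (0, 1)},
]

acc = mean([d["acc"] for d in per_doc])
f1 = binary_f1([d["f1"] for d in per_doc])
print(acc, f1)  # 0.5 0.5
```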

## Good Reference Tasks

Contributing a new task can be daunting! Luckily, much of the work has often been done for you in a different, similarly evaluated task. Good examples of task implementations to study include:

Multiple choice tasks:

- SciQ (`lm_eval/tasks/sciq/sciq.yaml`)

Corpus perplexity evaluations:

- Wikitext (`lm_eval/tasks/wikitext/wikitext.yaml`)

Generative tasks:

- GSM8k (`lm_eval/tasks/gsm8k/gsm8k.yaml`)

Tasks using complex filtering:

- GSM8k with CoT (+ with Self-Consistency): (`lm_eval/tasks/gsm8k/gsm8k-cot.yaml` ; `lm_eval/tasks/gsm8k/gsm8k-cot-self-consistency.yaml`)

# Group Configuration

When evaluating a language model, it is not unusual to test across a number of tasks that may not be related to one another in order to assess a variety of capabilities. To this end, it may be cumbersome to have to list the set of tasks or add a new group name to each yaml of each individual task.

To solve this, we can create a **group** yaml config. This is a config that contains the names of the tasks that should be included in a particular group. The config consists of two main keys: a `group` key which denotes the name of the group (as it would be called from the command line, e.g. `mmlu`) and a `task` key which is where we can list the tasks. The tasks listed in `task` are the task names that have been registered. A good example of a group yaml config can be found at [_mmlu.yaml](../lm_eval/tasks/mmlu/default/_mmlu.yaml). See also the [New Task Guide](./new_task_guide.md) for a more in-depth and tutorial-esque explanation of how to write complex GroupConfigs.

## Configurations

Groups are configured via the `GroupConfig` object. Below, we describe all fields usable within the object, and their role in defining a group.

### Parameters

- **group** (`str`, defaults to `None`) — name of the group. Used to invoke it from the command line.
- **group_alias** (`str`, defaults to `None`) - Alternative name for the group that will be printed in the table output.
- **task** (`Union[str, list]`, defaults to `None`) - List of tasks that constitute the group.
- **aggregate_metric_list** (`list`, defaults to `None`) - similar to `metric_list` in TaskConfigs, provide a list of configurations for metrics that should be aggregated across subtasks. Leaving empty will result in no aggregation being performed for this group. Keys for each list entry are:
  - `metric: str` - the name of the metric to aggregate over (all subtasks must report a metric holding this name.)
  - `aggregation: str` - what aggregation function to apply to aggregate these per-subtask metrics. **Currently, only `mean` is supported.**
  - `weight_by_size: bool = True` - whether to perform micro-averaging (`True`) or macro-averaging (`False`) of subtasks' accuracy scores when reporting the group's metric. MMLU, for example, averages over per-document accuracies (the *micro average*), resulting in the same accuracy as if one simply concatenated all 57 subjects into a single dataset and evaluated accuracy on that dataset.
  - `filter_list: Union[str, List[str]] = "none"` - what filter keys one should match on to aggregate results. For example, if trying to aggregate over the `exact_match` metric using `strict-match` filter for `bbh_cot_zeroshot`, then set this to be `filter_list: "strict-match"`.
- **metadata** (`dict`, *optional*) - As with TaskConfigs, a field where extra config metadata can be passed. Set the `num_fewshot` key within this to override the printed n_shot value in a results table for your group, for example.
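
The difference between the two `weight_by_size` settings is easiest to see with numbers. The subtask names, accuracies, and sizes below are made up for illustration.

```python
# Hypothetical subtasks: (accuracy, number of documents).
subtasks = {"subject_a": (0.90, 100), "subject_b": (0.50, 300)}

# Macro average: every subtask counts equally (weight_by_size: false).
macro = sum(acc for acc, _ in subtasks.values()) / len(subtasks)

# Micro average: every document counts equally (weight_by_size: true).
total_docs = sum(n for _, n in subtasks.values())
micro = sum(acc * n for acc, n in subtasks.values()) / total_docs

print(round(macro, 3), round(micro, 3))  # 0.7 0.6
```

The micro average is pulled towards the larger subtask, matching what one would get by pooling all 400 documents into a single dataset.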


================================================
FILE: examples/lm-eval-overview.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Qw83KAePAhaS"
   },
   "source": [
    "# Releasing LM-Evaluation-Harness v0.4.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Z7k2vq1iAdqr"
   },
   "source": [
    "With the vast amount of work done in the field today, it helps to have a tool that people can use easily to share their results and use to check others to ensure reported numbers are valid. The LM Evaluation Harness is one such tool the community has used extensively. We want to continue to support the community and with that in mind, we’re excited to announce a major update on the LM Evaluation Harness to further our goal for open and accessible AI research."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0gDoM0AJAvEc"
   },
   "source": [
    "Our refactor stems from our desires to make the following believed best practices easier to carry out.  \n",
    "\n",
    "1.   Never copy results from other papers\n",
    "2.   Always share your exact prompts\n",
    "3.   Always provide model outputs\n",
    "4.   Qualitatively review a small batch of outputs before running evaluation jobs at scale\n",
    "\n",
    "We also wanted to make the library a better experience to use and to contribute or design evaluations within. New features in the new release that serve this purpose include:\n",
    "\n",
    "1. Faster Evaluation Runtimes (accelerated data-parallel inference with HF Transformers + Accelerate, and commonly used or faster inference libraries such as vLLM and Llama-CPP)\n",
    "2. Easier addition and sharing of new tasks (YAML-based task config formats, allowing single-file sharing of custom tasks)\n",
    "3. More configurability, for more advanced workflows and easier operation with modifying prompts\n",
    "4. Better logging of data at runtime and post-hoc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "nnwsOpjda_YW"
   },
   "source": [
    "In this notebook we will be going through a short tutorial on how things work."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zAov81vTbL2K"
   },
   "source": [
    "## Install LM-Eval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8hiosGzq_qZg",
    "outputId": "6ab73e5e-1f54-417e-a388-07e0d870b132"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting git+https://github.com/EleutherAI/lm-evaluation-harness.git@big-refactor\n",
      "  Cloning https://github.com/EleutherAI/lm-evaluation-harness.git (to revision big-refactor) to /tmp/pip-req-build-tnssql5s\n",
      "  Running command git clone --filter=blob:none --quiet https://github.com/EleutherAI/lm-evaluation-harness.git /tmp/pip-req-build-tnssql5s\n",
      "  Running command git checkout -b big-refactor --track origin/big-refactor\n",
      "  Switched to a new branch 'big-refactor'\n",
      "  Branch 'big-refactor' set up to track remote branch 'big-refactor' from 'origin'.\n",
      "  Resolved https://github.com/EleutherAI/lm-evaluation-harness.git to commit 42f486ee49b65926a444cb0620870a39a5b4b0a8\n",
      "  Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
      "  Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
      "  Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
      "Collecting accelerate>=0.21.0 (from lm-eval==1.0.0)\n",
      "  Downloading accelerate-0.24.1-py3-none-any.whl (261 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m261.4/261.4 kB\u001b[0m \u001b[31m4.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting evaluate (from lm-eval==1.0.0)\n",
      "  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.1/84.1 kB\u001b[0m \u001b[31m5.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting datasets>=2.0.0 (from lm-eval==1.0.0)\n",
      "  Downloading datasets-2.15.0-py3-none-any.whl (521 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m521.2/521.2 kB\u001b[0m \u001b[31m9.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting jsonlines (from lm-eval==1.0.0)\n",
      "  Downloading jsonlines-4.0.0-py3-none-any.whl (8.7 kB)\n",
      "Requirement already satisfied: numexpr in /usr/local/lib/python3.10/dist-packages (from lm-eval==1.0.0) (2.8.7)\n",
      "Collecting peft>=0.2.0 (from lm-eval==1.0.0)\n",
      "  Downloading peft-0.6.2-py3-none-any.whl (174 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m174.7/174.7 kB\u001b[0m \u001b[31m7.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting pybind11>=2.6.2 (from lm-eval==1.0.0)\n",
      "  Downloading pybind11-2.11.1-py3-none-any.whl (227 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m227.7/227.7 kB\u001b[0m \u001b[31m12.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting pytablewriter (from lm-eval==1.0.0)\n",
      "  Downloading pytablewriter-1.2.0-py3-none-any.whl (111 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m111.1/111.1 kB\u001b[0m \u001b[31m8.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting rouge-score>=0.0.4 (from lm-eval==1.0.0)\n",
      "  Downloading rouge_score-0.1.2.tar.gz (17 kB)\n",
      "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "Collecting sacrebleu>=1.5.0 (from lm-eval==1.0.0)\n",
      "  Downloading sacrebleu-2.3.2-py3-none-any.whl (119 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m119.7/119.7 kB\u001b[0m \u001b[31m8.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: scikit-learn>=0.24.1 in /usr/local/lib/python3.10/dist-packages (from lm-eval==1.0.0) (1.2.2)\n",
      "Collecting sqlitedict (from lm-eval==1.0.0)\n",
      "  Downloading sqlitedict-2.1.0.tar.gz (21 kB)\n",
      "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "Requirement already satisfied: torch>=1.8 in /usr/local/lib/python3.10/dist-packages (from lm-eval==1.0.0) (2.1.0+cu118)\n",
      "Collecting tqdm-multiprocess (from lm-eval==1.0.0)\n",
      "  Downloading tqdm_multiprocess-0.0.11-py3-none-any.whl (9.8 kB)\n",
      "Requirement already satisfied: transformers>=4.1 in /usr/local/lib/python3.10/dist-packages (from lm-eval==1.0.0) (4.35.2)\n",
      "Collecting zstandard (from lm-eval==1.0.0)\n",
      "  Downloading zstandard-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.4 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m5.4/5.4 MB\u001b[0m \u001b[31m29.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.21.0->lm-eval==1.0.0) (1.23.5)\n",
      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.21.0->lm-eval==1.0.0) (23.2)\n",
      "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.21.0->lm-eval==1.0.0) (5.9.5)\n",
      "Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.21.0->lm-eval==1.0.0) (6.0.1)\n",
      "Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.21.0->lm-eval==1.0.0) (0.19.4)\n",
      "Requirement already satisfied: pyarrow>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==1.0.0) (9.0.0)\n",
      "Collecting pyarrow-hotfix (from datasets>=2.0.0->lm-eval==1.0.0)\n",
      "  Downloading pyarrow_hotfix-0.6-py3-none-any.whl (7.9 kB)\n",
      "Collecting dill<0.3.8,>=0.3.0 (from datasets>=2.0.0->lm-eval==1.0.0)\n",
      "  Downloading dill-0.3.7-py3-none-any.whl (115 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m14.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==1.0.0) (1.5.3)\n",
      "Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==1.0.0) (2.31.0)\n",
      "Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==1.0.0) (4.66.1)\n",
      "Requirement already satisfied: xxhash in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==1.0.0) (3.4.1)\n",
      "Collecting multiprocess (from datasets>=2.0.0->lm-eval==1.0.0)\n",
      "  Downloading multiprocess-0.70.15-py310-none-any.whl (134 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m19.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: fsspec[http]<=2023.10.0,>=2023.1.0 in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==1.0.0) (2023.6.0)\n",
      "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets>=2.0.0->lm-eval==1.0.0) (3.8.6)\n",
      "Collecting responses<0.19 (from evaluate->lm-eval==1.0.0)\n",
      "  Downloading responses-0.18.0-py3-none-any.whl (38 kB)\n",
      "Requirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from peft>=0.2.0->lm-eval==1.0.0) (0.4.0)\n",
      "Requirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from rouge-score>=0.0.4->lm-eval==1.0.0) (1.4.0)\n",
      "Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from rouge-score>=0.0.4->lm-eval==1.0.0) (3.8.1)\n",
      "Requirement already satisfied: six>=1.14.0 in /usr/local/lib/python3.10/dist-packages (from rouge-score>=0.0.4->lm-eval==1.0.0) (1.16.0)\n",
      "Collecting portalocker (from sacrebleu>=1.5.0->lm-eval==1.0.0)\n",
      "  Downloading portalocker-2.8.2-py3-none-any.whl (17 kB)\n",
      "Requirement already satisfied: regex in /usr/local/lib/python3.10/dist-packages (from sacrebleu>=1.5.0->lm-eval==1.0.0) (2023.6.3)\n",
      "Requirement already satisfied: tabulate>=0.8.9 in /usr/local/lib/python3.10/dist-packages (from sacrebleu>=1.5.0->lm-eval==1.0.0) (0.9.0)\n",
      "Collecting colorama (from sacrebleu>=1.5.0->lm-eval==1.0.0)\n",
      "  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)\n",
      "Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (from sacrebleu>=1.5.0->lm-eval==1.0.0) (4.9.3)\n",
      "Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.24.1->lm-eval==1.0.0) (1.11.3)\n",
      "Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.24.1->lm-eval==1.0.0) (1.3.2)\n",
      "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.24.1->lm-eval==1.0.0) (3.2.0)\n",
      "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.8->lm-eval==1.0.0) (3.13.1)\n",
      "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.8->lm-eval==1.0.0) (4.5.0)\n",
      "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.8->lm-eval==1.0.0) (1.12)\n",
      "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.8->lm-eval==1.0.0) (3.2.1)\n",
      "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.8->lm-eval==1.0.0) (3.1.2)\n",
      "Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.8->lm-eval==1.0.0) (2.1.0)\n",
      "Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.1->lm-eval==1.0.0) (0.15.0)\n",
      "Requirement already satisfied: attrs>=19.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonlines->lm-eval==1.0.0) (23.1.0)\n",
      "Requirement already satisfied: setuptools>=38.3.0 in /usr/local/lib/python3.10/dist-packages (from pytablewriter->lm-eval==1.0.0) (67.7.2)\n",
      "Collecting DataProperty<2,>=1.0.1 (from pytablewriter->lm-eval==1.0.0)\n",
      "  Downloading DataProperty-1.0.1-py3-none-any.whl (27 kB)\n",
      "Collecting mbstrdecoder<2,>=1.0.0 (from pytablewriter->lm-eval==1.0.0)\n",
      "  Downloading mbstrdecoder-1.1.3-py3-none-any.whl (7.8 kB)\n",
      "Collecting pathvalidate<4,>=2.3.0 (from pytablewriter->lm-eval==1.0.0)\n",
      "  Downloading pathvalidate-3.2.0-py3-none-any.whl (23 kB)\n",
      "Collecting tabledata<2,>=1.3.1 (from pytablewriter->lm-eval==1.0.0)\n",
      "  Downloading tabledata-1.3.3-py3-none-any.whl (11 kB)\n",
      "Collecting tcolorpy<1,>=0.0.5 (from pytablewriter->lm-eval==1.0.0)\n",
      "  Downloading tcolorpy-0.1.4-py3-none-any.whl (7.9 kB)\n",
      "Collecting typepy[datetime]<2,>=1.3.2 (from pytablewriter->lm-eval==1.0.0)\n",
      "  Downloading typepy-1.3.2-py3-none-any.whl (31 kB)\n",
      "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==1.0.0) (3.3.2)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==1.0.0) (6.0.4)\n",
      "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==1.0.0) (4.0.3)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==1.0.0) (1.9.2)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==1.0.0) (1.4.0)\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets>=2.0.0->lm-eval==1.0.0) (1.3.1)\n",
      "Requirement already satisfied: chardet<6,>=3.0.4 in /usr/local/lib/python3.10/dist-packages (from mbstrdecoder<2,>=1.0.0->pytablewriter->lm-eval==1.0.0) (5.2.0)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.0.0->lm-eval==1.0.0) (3.4)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.0.0->lm-eval==1.0.0) (2.0.7)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets>=2.0.0->lm-eval==1.0.0) (2023.7.22)\n",
      "Requirement already satisfied: python-dateutil<3.0.0,>=2.8.0 in /usr/local/lib/python3.10/dist-packages (from typepy[datetime]<2,>=1.3.2->pytablewriter->lm-eval==1.0.0) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2018.9 in /usr/local/lib/python3.10/dist-packages (from typepy[datetime]<2,>=1.3.2->pytablewriter->lm-eval==1.0.0) (2023.3.post1)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.8->lm-eval==1.0.0) (2.1.3)\n",
      "Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->rouge-score>=0.0.4->lm-eval==1.0.0) (8.1.7)\n",
      "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.8->lm-eval==1.0.0) (1.3.0)\n",
      "Building wheels for collected packages: lm-eval, rouge-score, sqlitedict\n",
      "  Building wheel for lm-eval (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
      "  Created wheel for lm-eval: filename=lm_eval-1.0.0-py3-none-any.whl size=994254 sha256=88356155b19f2891981ecef948326ad6ce8ca40a6009378410ec20d0e225995a\n",
      "  Stored in directory: /tmp/pip-ephem-wheel-cache-9v6ye7h3/wheels/17/01/26/599c0779e9858a70a73fa8a306699b5b9a868f820c225457b0\n",
      "  Building wheel for rouge-score (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "  Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=6bb0d44e4881972c43ce194e7cb65233d309758cb15f0dec54590d3d2efcfc36\n",
      "  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4\n",
      "  Building wheel for sqlitedict (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "  Created wheel for sqlitedict: filename=sqlitedict-2.1.0-py3-none-any.whl size=16863 sha256=5747f7dd73ddf3d8fbcebf51b5e4f718fabe1e94bccdf16d2f22a2e65ee7fdf4\n",
      "  Stored in directory: /root/.cache/pip/wheels/79/d6/e7/304e0e6cb2221022c26d8161f7c23cd4f259a9e41e8bbcfabd\n",
      "Successfully built lm-eval rouge-score sqlitedict\n",
      "Installing collected packages: sqlitedict, zstandard, tcolorpy, pybind11, pyarrow-hotfix, portalocker, pathvalidate, mbstrdecoder, jsonlines, dill, colorama, typepy, tqdm-multiprocess, sacrebleu, rouge-score, responses, multiprocess, accelerate, datasets, DataProperty, tabledata, peft, evaluate, pytablewriter, lm-eval\n",
      "Successfully installed DataProperty-1.0.1 accelerate-0.24.1 colorama-0.4.6 datasets-2.15.0 dill-0.3.7 evaluate-0.4.1 jsonlines-4.0.0 lm-eval-1.0.0 mbstrdecoder-1.1.3 multiprocess-0.70.15 pathvalidate-3.2.0 peft-0.6.2 portalocker-2.8.2 pyarrow-hotfix-0.6 pybind11-2.11.1 pytablewriter-1.2.0 responses-0.18.0 rouge-score-0.1.2 sacrebleu-2.3.2 sqlitedict-2.1.0 tabledata-1.3.3 tcolorpy-0.1.4 tqdm-multiprocess-0.0.11 typepy-1.3.2 zstandard-0.22.0\n"
     ]
    }
   ],
   "source": [
    "# Install LM-Eval\n",
    "!pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 0,
     "referenced_widgets": [
      "a1d3a8aa016544a78e8821c8f6199e06",
      "f61ed33fad754146bdd2ac9db1ba1c48",
      "bfa0af6aeff344c6845e1080a878e92e",
      "fd1ad9e0367d4004aae853b91c3a7617",
      "6b2d90209ec14230b3d58a74ac9b83bf",
      "a73f357065d34d7baf0453ae4a8d75e2",
      "46f521b73fd943c081c648fd873ebc0a",
      "7c5689bc13684db8a22681f41863dddd",
      "48763b6233374554ae76035c0483066f",
      "4986a21eb560448fa79f4b25cde48951",
      "aed3acd2f2d74003b44079c333a0698e"
     ]
    },
    "id": "uyO5MaKkZyah",
    "outputId": "d46e8096-5086-4e49-967e-ea33d4a2a335"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a1d3a8aa016544a78e8821c8f6199e06",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Downloading builder script:   0%|          | 0.00/5.67k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8rfUeX6n_wkK"
   },
   "source": [
    "## Create new evaluation tasks with config-based tasks\n",
    "\n",
     "Even within the same task, many works have reported numbers based on different evaluation choices. Some report on the test set, others on the validation set or even a subset of the training set. Others use specialized prompts and verbalizers. We introduce YAML configs to let users easily create such variations. The refactored LM-Eval takes the methods of the `Task` object and makes them configurable by setting the corresponding attributes in the config file. There, users can define the tasks they want by setting the name of the HF dataset (local tasks are also possible), the dataset splits used, and much more. Key prompting configurations such as `doc_to_text`, previously implemented as a method of the same name, are now configurable with Jinja2, which allows high-level scripting to transform an HF dataset record into a text string used as input to the model.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HYFUhhfOSJKe"
   },
   "source": [
     "A core feature of LM-Eval is configuring tasks with YAML configs. With configs, you can fill in preset fields to easily set up a task.\n",
    "\n",
    "Here, we write a demo YAML config for a multiple-choice evaluation of BoolQ:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "bg3dGROW-V39"
   },
   "outputs": [],
   "source": [
    "YAML_boolq_string = \"\"\"\n",
    "task: demo_boolq\n",
    "dataset_path: aps/super_glue\n",
    "dataset_name: boolq\n",
    "output_type: multiple_choice\n",
    "training_split: train\n",
    "validation_split: validation\n",
    "doc_to_text: \"{{passage}}\\nQuestion: {{question}}?\\nAnswer:\"\n",
    "doc_to_target: label\n",
    "doc_to_choice: [\"no\", \"yes\"]\n",
    "should_decontaminate: true\n",
    "doc_to_decontamination_query: passage\n",
    "metric_list:\n",
    "  - metric: acc\n",
    "\"\"\"\n",
    "with open(\"boolq.yaml\", \"w\") as f:\n",
    "    f.write(YAML_boolq_string)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "We can now run the evaluation on this task by pointing to the config file we just created:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "LOUHK7PtQfq4"
   },
   "outputs": [],
   "source": "%env LMEVAL_LOG_LEVEL=DEBUG\n!lm_eval \\\n    --model hf \\\n    --model_args pretrained=EleutherAI/pythia-2.8b \\\n    --include_path ./ \\\n    --tasks demo_boolq \\\n    --limit 10"
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LOUHK7PtQfq4"
   },
   "source": [
     "Often, tasks are part of a larger group used to measure different capabilities. The field moves quickly, so new dimensions of evaluation can emerge that mix and match new and older tasks alike. In LM-Eval, we can also group tasks and pass the group name to evaluate a set of tasks easily. In this instance, let's evaluate the tag `yes_or_no_tasks`, which comprises the tasks `demo_boolq` and `demo_cola`: multiple-choice tasks with the options `yes` and `no`, as the name suggests.\n",
    "\n",
     "<!-- Making new groups is easier than ever, allowing users to work bottom-up, by making individual tasks and linking them to a group, or top-down, by making a new group that lists existing tasks.\n",
    "\n",
    "We also show the aggregate across samples besides only showing the aggregation between subtasks. This may come in handy when certain groups want to be aggregated as a single task. -->\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "fthNg3ywO-kA"
   },
   "outputs": [],
   "source": [
    "YAML_cola_string = \"\"\"\n",
    "tag: yes_or_no_tasks\n",
    "task: demo_cola\n",
    "dataset_path: glue\n",
    "dataset_name: cola\n",
    "output_type: multiple_choice\n",
    "training_split: train\n",
    "validation_split: validation\n",
    "doc_to_text: \"{{sentence}}\\nQuestion: Does this sentence make sense?\\nAnswer:\"\n",
    "doc_to_target: label\n",
    "doc_to_choice: [\"no\", \"yes\"]\n",
    "should_decontaminate: true\n",
    "doc_to_decontamination_query: sentence\n",
    "metric_list:\n",
    "  - metric: acc\n",
    "\"\"\"\n",
    "with open(\"cola.yaml\", \"w\") as f:\n",
    "    f.write(YAML_cola_string)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "XceRKCuuDtbn"
   },
   "outputs": [],
   "source": "# !accelerate launch --no_python\n%env LMEVAL_LOG_LEVEL=DEBUG\n!lm_eval \\\n    --model hf \\\n    --model_args pretrained=EleutherAI/pythia-2.8b \\\n    --include_path ./ \\\n    --tasks yes_or_no_tasks \\\n    --limit 10 \\\n    --output output/yes_or_no_tasks/ \\\n    --log_samples"
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XceRKCuuDtbn"
   },
   "source": [
    "## Edit Prompt Templates Quickly\n",
    "\n",
     "The following YAML evaluates the `high_school_geography` subtask of MMLU. It uses the standard prompt, where we choose the answer letter with the highest likelihood as the model's prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "GTFvdt9kSlBG"
   },
   "outputs": [],
   "source": [
    "YAML_mmlu_geo_string = \"\"\"\n",
    "task: demo_mmlu_high_school_geography\n",
    "dataset_path: cais/mmlu\n",
    "dataset_name: high_school_geography\n",
    "description: \"The following are multiple choice questions (with answers) about high school geography.\\n\\n\"\n",
    "test_split: test\n",
    "fewshot_split: dev\n",
    "fewshot_config:\n",
    "  sampler: first_n\n",
    "output_type: multiple_choice\n",
    "doc_to_text: \"{{question.strip()}}\\nA. {{choices[0]}}\\nB. {{choices[1]}}\\nC. {{choices[2]}}\\nD. {{choices[3]}}\\nAnswer:\"\n",
    "doc_to_choice: [\"A\", \"B\", \"C\", \"D\"]\n",
    "doc_to_target: answer\n",
    "metric_list:\n",
    "  - metric: acc\n",
    "    aggregation: mean\n",
    "    higher_is_better: true\n",
    "  - metric: acc_norm\n",
    "    aggregation: mean\n",
    "    higher_is_better: true\n",
    "\"\"\"\n",
    "with open(\"mmlu_high_school_geography.yaml\", \"w\") as f:\n",
    "    f.write(YAML_mmlu_geo_string)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "jyKOfCsKb-xy"
   },
   "outputs": [],
   "source": "# !accelerate launch --no_python\n%env LMEVAL_LOG_LEVEL=DEBUG\n!lm_eval \\\n    --mode

gitextract_npmvb7su/

├── .github/
│   └── workflows/
│       ├── new_tasks.yml
│       ├── publish.yml
│       └── unit_tests.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CITATION.bib
├── CODEOWNERS
├── LICENSE.md
├── MANIFEST.in
├── README.md
├── docs/
│   ├── API_guide.md
│   ├── CONTRIBUTING.md
│   ├── README.md
│   ├── chat-template-readme.md
│   ├── config_files.md
│   ├── decontamination.md
│   ├── footguns.md
│   ├── interface.md
│   ├── model_guide.md
│   ├── new_task_guide.md
│   ├── python-api.md
│   └── task_guide.md
├── examples/
│   ├── lm-eval-overview.ipynb
│   ├── transformer-lens.py
│   ├── visualize-wandb.ipynb
│   └── visualize-zeno.ipynb
├── ignore.txt
├── lm_eval/
│   ├── __init__.py
│   ├── __main__.py
│   ├── _cli/
│   │   ├── __init__.py
│   │   ├── harness.py
│   │   ├── ls.py
│   │   ├── run.py
│   │   ├── subcommand.py
│   │   ├── utils.py
│   │   └── validate.py
│   ├── api/
│   │   ├── __init__.py
│   │   ├── filter.py
│   │   ├── group.py
│   │   ├── instance.py
│   │   ├── metrics.py
│   │   ├── model.py
│   │   ├── registry.py
│   │   ├── samplers.py
│   │   ├── task.py
│   │   └── utils.py
│   ├── caching/
│   │   ├── __init__.py
│   │   └── cache.py
│   ├── config/
│   │   ├── __init__.py
│   │   ├── evaluate_config.py
│   │   ├── group.py
│   │   └── task.py
│   ├── decontamination/
│   │   ├── __init__.py
│   │   ├── archiver.py
│   │   ├── decontaminate.py
│   │   └── janitor.py
│   ├── defaults.py
│   ├── evaluator.py
│   ├── evaluator_utils.py
│   ├── filters/
│   │   ├── __init__.py
│   │   ├── custom.py
│   │   ├── decontamination.py
│   │   ├── extraction.py
│   │   ├── selection.py
│   │   └── transformation.py
│   ├── loggers/
│   │   ├── __init__.py
│   │   ├── evaluation_tracker.py
│   │   ├── utils.py
│   │   └── wandb_logger.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── anthropic_llms.py
│   │   ├── api_models.py
│   │   ├── dummy.py
│   │   ├── gguf.py
│   │   ├── hf_audiolm.py
│   │   ├── hf_steered.py
│   │   ├── hf_vlms.py
│   │   ├── huggingface.py
│   │   ├── ibm_watsonx_ai.py
│   │   ├── mamba_lm.py
│   │   ├── megatron_lm.py
│   │   ├── mistral3.py
│   │   ├── nemo_lm.py
│   │   ├── neuron_optimum.py
│   │   ├── openai_completions.py
│   │   ├── optimum_habana.py
│   │   ├── optimum_ipex.py
│   │   ├── optimum_lm.py
│   │   ├── sglang_causallms.py
│   │   ├── sglang_generate_API.py
│   │   ├── textsynth.py
│   │   ├── utils.py
│   │   ├── utils_hf.py
│   │   ├── vllm_causallms.py
│   │   ├── vllm_vlms.py
│   │   └── winml.py
│   ├── prompts/
│   │   └── __init__.py
│   ├── result_schema.py
│   ├── tasks/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── _factory.py
│   │   ├── _index.py
│   │   ├── _yaml_loader.py
│   │   ├── aclue/
│   │   │   ├── README.md
│   │   │   ├── _aclue.yaml
│   │   │   ├── _default_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── aclue_ancient_chinese_culture.yaml
│   │   │   ├── aclue_ancient_literature.yaml
│   │   │   ├── aclue_ancient_medical.yaml
│   │   │   ├── aclue_ancient_phonetics.yaml
│   │   │   ├── aclue_basic_ancient_chinese.yaml
│   │   │   ├── aclue_couplet_prediction.yaml
│   │   │   ├── aclue_homographic_character_resolution.yaml
│   │   │   ├── aclue_named_entity_recognition.yaml
│   │   │   ├── aclue_poetry_appreciate.yaml
│   │   │   ├── aclue_poetry_context_prediction.yaml
│   │   │   ├── aclue_poetry_quality_assessment.yaml
│   │   │   ├── aclue_poetry_sentiment_analysis.yaml
│   │   │   ├── aclue_polysemy_resolution.yaml
│   │   │   ├── aclue_reading_comprehension.yaml
│   │   │   └── aclue_sentence_segmentation.yaml
│   │   ├── acpbench/
│   │   │   ├── README.md
│   │   │   ├── boolq_cot_2shot/
│   │   │   │   ├── _boolq_cot_2shot_yaml
│   │   │   │   ├── act_reach.yaml
│   │   │   │   ├── app.yaml
│   │   │   │   ├── just.yaml
│   │   │   │   ├── land.yaml
│   │   │   │   ├── prog.yaml
│   │   │   │   ├── reach.yaml
│   │   │   │   └── val.yaml
│   │   │   ├── gen_2shot/
│   │   │   │   ├── _gen_yaml_2shot
│   │   │   │   ├── acp_grammar.lark
│   │   │   │   ├── acp_utils.py
│   │   │   │   ├── act_reach.yaml
│   │   │   │   ├── app.yaml
│   │   │   │   ├── just.yaml
│   │   │   │   ├── land.yaml
│   │   │   │   ├── next_act.yaml
│   │   │   │   ├── prog.yaml
│   │   │   │   ├── reach.yaml
│   │   │   │   └── val.yaml
│   │   │   ├── gen_2shot_with_pddl/
│   │   │   │   ├── _gen_yaml_2shot
│   │   │   │   ├── acp_grammar.lark
│   │   │   │   ├── acp_utils.py
│   │   │   │   ├── act_reach.yaml
│   │   │   │   ├── app.yaml
│   │   │   │   ├── just.yaml
│   │   │   │   ├── land.yaml
│   │   │   │   ├── next_act.yaml
│   │   │   │   ├── prog.yaml
│   │   │   │   ├── reach.yaml
│   │   │   │   └── val.yaml
│   │   │   └── mcq_cot_2shot/
│   │   │       ├── _mcq_cot_2shot_yaml
│   │   │       ├── act_reach.yaml
│   │   │       ├── app.yaml
│   │   │       ├── just.yaml
│   │   │       ├── land.yaml
│   │   │       ├── prog.yaml
│   │   │       ├── reach.yaml
│   │   │       └── val.yaml
│   │   ├── aexams/
│   │   │   ├── README.md
│   │   │   ├── _aexams.yaml
│   │   │   ├── _default_template_yaml
│   │   │   ├── aexams_Biology.yaml
│   │   │   ├── aexams_IslamicStudies.yaml
│   │   │   ├── aexams_Physics.yaml
│   │   │   ├── aexams_Science.yaml
│   │   │   └── aexams_Social.yaml
│   │   ├── afrimgsm/
│   │   │   ├── README.md
│   │   │   ├── direct/
│   │   │   │   ├── afrimgsm.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimgsm_amh.yaml
│   │   │   │   │   ├── afrimgsm_eng.yaml
│   │   │   │   │   ├── afrimgsm_ewe.yaml
│   │   │   │   │   ├── afrimgsm_fra.yaml
│   │   │   │   │   ├── afrimgsm_hau.yaml
│   │   │   │   │   ├── afrimgsm_ibo.yaml
│   │   │   │   │   ├── afrimgsm_kin.yaml
│   │   │   │   │   ├── afrimgsm_lin.yaml
│   │   │   │   │   ├── afrimgsm_lug.yaml
│   │   │   │   │   ├── afrimgsm_orm.yaml
│   │   │   │   │   ├── afrimgsm_sna.yaml
│   │   │   │   │   ├── afrimgsm_sot.yaml
│   │   │   │   │   ├── afrimgsm_swa.yaml
│   │   │   │   │   ├── afrimgsm_twi.yaml
│   │   │   │   │   ├── afrimgsm_vai.yaml
│   │   │   │   │   ├── afrimgsm_wol.yaml
│   │   │   │   │   ├── afrimgsm_xho.yaml
│   │   │   │   │   ├── afrimgsm_yaml
│   │   │   │   │   ├── afrimgsm_yor.yaml
│   │   │   │   │   └── afrimgsm_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimgsm_amh.yaml
│   │   │   │   │   ├── afrimgsm_eng.yaml
│   │   │   │   │   ├── afrimgsm_ewe.yaml
│   │   │   │   │   ├── afrimgsm_fra.yaml
│   │   │   │   │   ├── afrimgsm_hau.yaml
│   │   │   │   │   ├── afrimgsm_ibo.yaml
│   │   │   │   │   ├── afrimgsm_kin.yaml
│   │   │   │   │   ├── afrimgsm_lin.yaml
│   │   │   │   │   ├── afrimgsm_lug.yaml
│   │   │   │   │   ├── afrimgsm_orm.yaml
│   │   │   │   │   ├── afrimgsm_sna.yaml
│   │   │   │   │   ├── afrimgsm_sot.yaml
│   │   │   │   │   ├── afrimgsm_swa.yaml
│   │   │   │   │   ├── afrimgsm_twi.yaml
│   │   │   │   │   ├── afrimgsm_vai.yaml
│   │   │   │   │   ├── afrimgsm_wol.yaml
│   │   │   │   │   ├── afrimgsm_xho.yaml
│   │   │   │   │   ├── afrimgsm_yaml
│   │   │   │   │   ├── afrimgsm_yor.yaml
│   │   │   │   │   └── afrimgsm_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimgsm_amh.yaml
│   │   │   │   │   ├── afrimgsm_eng.yaml
│   │   │   │   │   ├── afrimgsm_ewe.yaml
│   │   │   │   │   ├── afrimgsm_fra.yaml
│   │   │   │   │   ├── afrimgsm_hau.yaml
│   │   │   │   │   ├── afrimgsm_ibo.yaml
│   │   │   │   │   ├── afrimgsm_kin.yaml
│   │   │   │   │   ├── afrimgsm_lin.yaml
│   │   │   │   │   ├── afrimgsm_lug.yaml
│   │   │   │   │   ├── afrimgsm_orm.yaml
│   │   │   │   │   ├── afrimgsm_sna.yaml
│   │   │   │   │   ├── afrimgsm_sot.yaml
│   │   │   │   │   ├── afrimgsm_swa.yaml
│   │   │   │   │   ├── afrimgsm_twi.yaml
│   │   │   │   │   ├── afrimgsm_vai.yaml
│   │   │   │   │   ├── afrimgsm_wol.yaml
│   │   │   │   │   ├── afrimgsm_xho.yaml
│   │   │   │   │   ├── afrimgsm_yaml
│   │   │   │   │   ├── afrimgsm_yor.yaml
│   │   │   │   │   └── afrimgsm_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimgsm_amh.yaml
│   │   │   │   │   ├── afrimgsm_eng.yaml
│   │   │   │   │   ├── afrimgsm_ewe.yaml
│   │   │   │   │   ├── afrimgsm_fra.yaml
│   │   │   │   │   ├── afrimgsm_hau.yaml
│   │   │   │   │   ├── afrimgsm_ibo.yaml
│   │   │   │   │   ├── afrimgsm_kin.yaml
│   │   │   │   │   ├── afrimgsm_lin.yaml
│   │   │   │   │   ├── afrimgsm_lug.yaml
│   │   │   │   │   ├── afrimgsm_orm.yaml
│   │   │   │   │   ├── afrimgsm_sna.yaml
│   │   │   │   │   ├── afrimgsm_sot.yaml
│   │   │   │   │   ├── afrimgsm_swa.yaml
│   │   │   │   │   ├── afrimgsm_twi.yaml
│   │   │   │   │   ├── afrimgsm_vai.yaml
│   │   │   │   │   ├── afrimgsm_wol.yaml
│   │   │   │   │   ├── afrimgsm_xho.yaml
│   │   │   │   │   ├── afrimgsm_yaml
│   │   │   │   │   ├── afrimgsm_yor.yaml
│   │   │   │   │   └── afrimgsm_zul.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimgsm_amh.yaml
│   │   │   │       ├── afrimgsm_eng.yaml
│   │   │   │       ├── afrimgsm_ewe.yaml
│   │   │   │       ├── afrimgsm_fra.yaml
│   │   │   │       ├── afrimgsm_hau.yaml
│   │   │   │       ├── afrimgsm_ibo.yaml
│   │   │   │       ├── afrimgsm_kin.yaml
│   │   │   │       ├── afrimgsm_lin.yaml
│   │   │   │       ├── afrimgsm_lug.yaml
│   │   │   │       ├── afrimgsm_orm.yaml
│   │   │   │       ├── afrimgsm_sna.yaml
│   │   │   │       ├── afrimgsm_sot.yaml
│   │   │   │       ├── afrimgsm_swa.yaml
│   │   │   │       ├── afrimgsm_twi.yaml
│   │   │   │       ├── afrimgsm_vai.yaml
│   │   │   │       ├── afrimgsm_wol.yaml
│   │   │   │       ├── afrimgsm_xho.yaml
│   │   │   │       ├── afrimgsm_yaml
│   │   │   │       ├── afrimgsm_yor.yaml
│   │   │   │       └── afrimgsm_zul.yaml
│   │   │   ├── direct_cot/
│   │   │   │   ├── afrimgsm_cot.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimgsm_cot_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_eng.yaml
│   │   │   │   │   ├── afrimgsm_cot_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_yaml
│   │   │   │   │   ├── afrimgsm_cot_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimgsm_cot_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_eng.yaml
│   │   │   │   │   ├── afrimgsm_cot_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_yaml
│   │   │   │   │   ├── afrimgsm_cot_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimgsm_cot_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_eng.yaml
│   │   │   │   │   ├── afrimgsm_cot_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_yaml
│   │   │   │   │   ├── afrimgsm_cot_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimgsm_cot_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_eng.yaml
│   │   │   │   │   ├── afrimgsm_cot_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_yaml
│   │   │   │   │   ├── afrimgsm_cot_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_zul.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimgsm_cot_amh.yaml
│   │   │   │       ├── afrimgsm_cot_eng.yaml
│   │   │   │       ├── afrimgsm_cot_ewe.yaml
│   │   │   │       ├── afrimgsm_cot_fra.yaml
│   │   │   │       ├── afrimgsm_cot_hau.yaml
│   │   │   │       ├── afrimgsm_cot_ibo.yaml
│   │   │   │       ├── afrimgsm_cot_kin.yaml
│   │   │   │       ├── afrimgsm_cot_lin.yaml
│   │   │   │       ├── afrimgsm_cot_lug.yaml
│   │   │   │       ├── afrimgsm_cot_orm.yaml
│   │   │   │       ├── afrimgsm_cot_sna.yaml
│   │   │   │       ├── afrimgsm_cot_sot.yaml
│   │   │   │       ├── afrimgsm_cot_swa.yaml
│   │   │   │       ├── afrimgsm_cot_twi.yaml
│   │   │   │       ├── afrimgsm_cot_vai.yaml
│   │   │   │       ├── afrimgsm_cot_wol.yaml
│   │   │   │       ├── afrimgsm_cot_xho.yaml
│   │   │   │       ├── afrimgsm_cot_yaml
│   │   │   │       ├── afrimgsm_cot_yor.yaml
│   │   │   │       └── afrimgsm_cot_zul.yaml
│   │   │   ├── gen_utils.py
│   │   │   ├── gen_yaml.sh
│   │   │   ├── run.sh
│   │   │   ├── translate/
│   │   │   │   ├── afrimgsm_tt.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimgsm_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_translate_yaml
│   │   │   │   │   ├── afrimgsm_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_translate_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimgsm_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_translate_yaml
│   │   │   │   │   ├── afrimgsm_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_translate_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimgsm_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_translate_yaml
│   │   │   │   │   ├── afrimgsm_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_translate_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimgsm_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_translate_yaml
│   │   │   │   │   ├── afrimgsm_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_translate_zul.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimgsm_translate_amh.yaml
│   │   │   │       ├── afrimgsm_translate_ewe.yaml
│   │   │   │       ├── afrimgsm_translate_fra.yaml
│   │   │   │       ├── afrimgsm_translate_hau.yaml
│   │   │   │       ├── afrimgsm_translate_ibo.yaml
│   │   │   │       ├── afrimgsm_translate_kin.yaml
│   │   │   │       ├── afrimgsm_translate_lin.yaml
│   │   │   │       ├── afrimgsm_translate_lug.yaml
│   │   │   │       ├── afrimgsm_translate_orm.yaml
│   │   │   │       ├── afrimgsm_translate_sna.yaml
│   │   │   │       ├── afrimgsm_translate_sot.yaml
│   │   │   │       ├── afrimgsm_translate_swa.yaml
│   │   │   │       ├── afrimgsm_translate_twi.yaml
│   │   │   │       ├── afrimgsm_translate_wol.yaml
│   │   │   │       ├── afrimgsm_translate_xho.yaml
│   │   │   │       ├── afrimgsm_translate_yaml
│   │   │   │       ├── afrimgsm_translate_yor.yaml
│   │   │   │       └── afrimgsm_translate_zul.yaml
│   │   │   ├── translate_cot/
│   │   │   │   ├── afrimgsm_tt_cot.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_translate_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_translate_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_translate_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yaml
│   │   │   │   │   ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │   │   └── afrimgsm_cot_translate_zul.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimgsm_cot_translate_amh.yaml
│   │   │   │       ├── afrimgsm_cot_translate_ewe.yaml
│   │   │   │       ├── afrimgsm_cot_translate_fra.yaml
│   │   │   │       ├── afrimgsm_cot_translate_hau.yaml
│   │   │   │       ├── afrimgsm_cot_translate_ibo.yaml
│   │   │   │       ├── afrimgsm_cot_translate_kin.yaml
│   │   │   │       ├── afrimgsm_cot_translate_lin.yaml
│   │   │   │       ├── afrimgsm_cot_translate_lug.yaml
│   │   │   │       ├── afrimgsm_cot_translate_orm.yaml
│   │   │   │       ├── afrimgsm_cot_translate_sna.yaml
│   │   │   │       ├── afrimgsm_cot_translate_sot.yaml
│   │   │   │       ├── afrimgsm_cot_translate_swa.yaml
│   │   │   │       ├── afrimgsm_cot_translate_twi.yaml
│   │   │   │       ├── afrimgsm_cot_translate_vai.yaml
│   │   │   │       ├── afrimgsm_cot_translate_wol.yaml
│   │   │   │       ├── afrimgsm_cot_translate_xho.yaml
│   │   │   │       ├── afrimgsm_cot_translate_yaml
│   │   │   │       ├── afrimgsm_cot_translate_yor.yaml
│   │   │   │       └── afrimgsm_cot_translate_zul.yaml
│   │   │   └── utils.py
│   │   ├── afrimmlu/
│   │   │   ├── README.md
│   │   │   ├── direct/
│   │   │   │   ├── afrimmlu.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimmlu_direct
│   │   │   │   │   ├── afrimmlu_direct_amh.yaml
│   │   │   │   │   ├── afrimmlu_direct_eng.yaml
│   │   │   │   │   ├── afrimmlu_direct_ewe.yaml
│   │   │   │   │   ├── afrimmlu_direct_fra.yaml
│   │   │   │   │   ├── afrimmlu_direct_hau.yaml
│   │   │   │   │   ├── afrimmlu_direct_ibo.yaml
│   │   │   │   │   ├── afrimmlu_direct_kin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lug.yaml
│   │   │   │   │   ├── afrimmlu_direct_orm.yaml
│   │   │   │   │   ├── afrimmlu_direct_sna.yaml
│   │   │   │   │   ├── afrimmlu_direct_sot.yaml
│   │   │   │   │   ├── afrimmlu_direct_swa.yaml
│   │   │   │   │   ├── afrimmlu_direct_twi.yaml
│   │   │   │   │   ├── afrimmlu_direct_wol.yaml
│   │   │   │   │   ├── afrimmlu_direct_xho.yaml
│   │   │   │   │   ├── afrimmlu_direct_yor.yaml
│   │   │   │   │   ├── afrimmlu_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimmlu_direct
│   │   │   │   │   ├── afrimmlu_direct_amh.yaml
│   │   │   │   │   ├── afrimmlu_direct_eng.yaml
│   │   │   │   │   ├── afrimmlu_direct_ewe.yaml
│   │   │   │   │   ├── afrimmlu_direct_fra.yaml
│   │   │   │   │   ├── afrimmlu_direct_hau.yaml
│   │   │   │   │   ├── afrimmlu_direct_ibo.yaml
│   │   │   │   │   ├── afrimmlu_direct_kin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lug.yaml
│   │   │   │   │   ├── afrimmlu_direct_orm.yaml
│   │   │   │   │   ├── afrimmlu_direct_sna.yaml
│   │   │   │   │   ├── afrimmlu_direct_sot.yaml
│   │   │   │   │   ├── afrimmlu_direct_swa.yaml
│   │   │   │   │   ├── afrimmlu_direct_twi.yaml
│   │   │   │   │   ├── afrimmlu_direct_wol.yaml
│   │   │   │   │   ├── afrimmlu_direct_xho.yaml
│   │   │   │   │   ├── afrimmlu_direct_yor.yaml
│   │   │   │   │   ├── afrimmlu_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimmlu_direct
│   │   │   │   │   ├── afrimmlu_direct_amh.yaml
│   │   │   │   │   ├── afrimmlu_direct_eng.yaml
│   │   │   │   │   ├── afrimmlu_direct_ewe.yaml
│   │   │   │   │   ├── afrimmlu_direct_fra.yaml
│   │   │   │   │   ├── afrimmlu_direct_hau.yaml
│   │   │   │   │   ├── afrimmlu_direct_ibo.yaml
│   │   │   │   │   ├── afrimmlu_direct_kin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lug.yaml
│   │   │   │   │   ├── afrimmlu_direct_orm.yaml
│   │   │   │   │   ├── afrimmlu_direct_sna.yaml
│   │   │   │   │   ├── afrimmlu_direct_sot.yaml
│   │   │   │   │   ├── afrimmlu_direct_swa.yaml
│   │   │   │   │   ├── afrimmlu_direct_twi.yaml
│   │   │   │   │   ├── afrimmlu_direct_wol.yaml
│   │   │   │   │   ├── afrimmlu_direct_xho.yaml
│   │   │   │   │   ├── afrimmlu_direct_yor.yaml
│   │   │   │   │   ├── afrimmlu_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimmlu_direct
│   │   │   │   │   ├── afrimmlu_direct_amh.yaml
│   │   │   │   │   ├── afrimmlu_direct_eng.yaml
│   │   │   │   │   ├── afrimmlu_direct_ewe.yaml
│   │   │   │   │   ├── afrimmlu_direct_fra.yaml
│   │   │   │   │   ├── afrimmlu_direct_hau.yaml
│   │   │   │   │   ├── afrimmlu_direct_ibo.yaml
│   │   │   │   │   ├── afrimmlu_direct_kin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lin.yaml
│   │   │   │   │   ├── afrimmlu_direct_lug.yaml
│   │   │   │   │   ├── afrimmlu_direct_orm.yaml
│   │   │   │   │   ├── afrimmlu_direct_sna.yaml
│   │   │   │   │   ├── afrimmlu_direct_sot.yaml
│   │   │   │   │   ├── afrimmlu_direct_swa.yaml
│   │   │   │   │   ├── afrimmlu_direct_twi.yaml
│   │   │   │   │   ├── afrimmlu_direct_wol.yaml
│   │   │   │   │   ├── afrimmlu_direct_xho.yaml
│   │   │   │   │   ├── afrimmlu_direct_yor.yaml
│   │   │   │   │   ├── afrimmlu_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimmlu_direct
│   │   │   │       ├── afrimmlu_direct_amh.yaml
│   │   │   │       ├── afrimmlu_direct_eng.yaml
│   │   │   │       ├── afrimmlu_direct_ewe.yaml
│   │   │   │       ├── afrimmlu_direct_fra.yaml
│   │   │   │       ├── afrimmlu_direct_hau.yaml
│   │   │   │       ├── afrimmlu_direct_ibo.yaml
│   │   │   │       ├── afrimmlu_direct_kin.yaml
│   │   │   │       ├── afrimmlu_direct_lin.yaml
│   │   │   │       ├── afrimmlu_direct_lug.yaml
│   │   │   │       ├── afrimmlu_direct_orm.yaml
│   │   │   │       ├── afrimmlu_direct_sna.yaml
│   │   │   │       ├── afrimmlu_direct_sot.yaml
│   │   │   │       ├── afrimmlu_direct_swa.yaml
│   │   │   │       ├── afrimmlu_direct_twi.yaml
│   │   │   │       ├── afrimmlu_direct_wol.yaml
│   │   │   │       ├── afrimmlu_direct_xho.yaml
│   │   │   │       ├── afrimmlu_direct_yor.yaml
│   │   │   │       ├── afrimmlu_direct_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── fewshot.sh
│   │   │   ├── gen_utils.py
│   │   │   ├── translate/
│   │   │   │   ├── afrimmlu_tt.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrimmlu_translate
│   │   │   │   │   ├── afrimmlu_translate_amh.yaml
│   │   │   │   │   ├── afrimmlu_translate_ewe.yaml
│   │   │   │   │   ├── afrimmlu_translate_fra.yaml
│   │   │   │   │   ├── afrimmlu_translate_hau.yaml
│   │   │   │   │   ├── afrimmlu_translate_ibo.yaml
│   │   │   │   │   ├── afrimmlu_translate_kin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lug.yaml
│   │   │   │   │   ├── afrimmlu_translate_orm.yaml
│   │   │   │   │   ├── afrimmlu_translate_sna.yaml
│   │   │   │   │   ├── afrimmlu_translate_sot.yaml
│   │   │   │   │   ├── afrimmlu_translate_swa.yaml
│   │   │   │   │   ├── afrimmlu_translate_twi.yaml
│   │   │   │   │   ├── afrimmlu_translate_wol.yaml
│   │   │   │   │   ├── afrimmlu_translate_xho.yaml
│   │   │   │   │   ├── afrimmlu_translate_yor.yaml
│   │   │   │   │   ├── afrimmlu_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrimmlu_translate
│   │   │   │   │   ├── afrimmlu_translate_amh.yaml
│   │   │   │   │   ├── afrimmlu_translate_ewe.yaml
│   │   │   │   │   ├── afrimmlu_translate_fra.yaml
│   │   │   │   │   ├── afrimmlu_translate_hau.yaml
│   │   │   │   │   ├── afrimmlu_translate_ibo.yaml
│   │   │   │   │   ├── afrimmlu_translate_kin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lug.yaml
│   │   │   │   │   ├── afrimmlu_translate_orm.yaml
│   │   │   │   │   ├── afrimmlu_translate_sna.yaml
│   │   │   │   │   ├── afrimmlu_translate_sot.yaml
│   │   │   │   │   ├── afrimmlu_translate_swa.yaml
│   │   │   │   │   ├── afrimmlu_translate_twi.yaml
│   │   │   │   │   ├── afrimmlu_translate_wol.yaml
│   │   │   │   │   ├── afrimmlu_translate_xho.yaml
│   │   │   │   │   ├── afrimmlu_translate_yor.yaml
│   │   │   │   │   ├── afrimmlu_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrimmlu_translate
│   │   │   │   │   ├── afrimmlu_translate_amh.yaml
│   │   │   │   │   ├── afrimmlu_translate_ewe.yaml
│   │   │   │   │   ├── afrimmlu_translate_fra.yaml
│   │   │   │   │   ├── afrimmlu_translate_hau.yaml
│   │   │   │   │   ├── afrimmlu_translate_ibo.yaml
│   │   │   │   │   ├── afrimmlu_translate_kin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lug.yaml
│   │   │   │   │   ├── afrimmlu_translate_orm.yaml
│   │   │   │   │   ├── afrimmlu_translate_sna.yaml
│   │   │   │   │   ├── afrimmlu_translate_sot.yaml
│   │   │   │   │   ├── afrimmlu_translate_swa.yaml
│   │   │   │   │   ├── afrimmlu_translate_twi.yaml
│   │   │   │   │   ├── afrimmlu_translate_wol.yaml
│   │   │   │   │   ├── afrimmlu_translate_xho.yaml
│   │   │   │   │   ├── afrimmlu_translate_yor.yaml
│   │   │   │   │   ├── afrimmlu_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrimmlu_translate
│   │   │   │   │   ├── afrimmlu_translate_amh.yaml
│   │   │   │   │   ├── afrimmlu_translate_ewe.yaml
│   │   │   │   │   ├── afrimmlu_translate_fra.yaml
│   │   │   │   │   ├── afrimmlu_translate_hau.yaml
│   │   │   │   │   ├── afrimmlu_translate_ibo.yaml
│   │   │   │   │   ├── afrimmlu_translate_kin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lin.yaml
│   │   │   │   │   ├── afrimmlu_translate_lug.yaml
│   │   │   │   │   ├── afrimmlu_translate_orm.yaml
│   │   │   │   │   ├── afrimmlu_translate_sna.yaml
│   │   │   │   │   ├── afrimmlu_translate_sot.yaml
│   │   │   │   │   ├── afrimmlu_translate_swa.yaml
│   │   │   │   │   ├── afrimmlu_translate_twi.yaml
│   │   │   │   │   ├── afrimmlu_translate_wol.yaml
│   │   │   │   │   ├── afrimmlu_translate_xho.yaml
│   │   │   │   │   ├── afrimmlu_translate_yor.yaml
│   │   │   │   │   ├── afrimmlu_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrimmlu_translate
│   │   │   │       ├── afrimmlu_translate_amh.yaml
│   │   │   │       ├── afrimmlu_translate_ewe.yaml
│   │   │   │       ├── afrimmlu_translate_fra.yaml
│   │   │   │       ├── afrimmlu_translate_hau.yaml
│   │   │   │       ├── afrimmlu_translate_ibo.yaml
│   │   │   │       ├── afrimmlu_translate_kin.yaml
│   │   │   │       ├── afrimmlu_translate_lin.yaml
│   │   │   │       ├── afrimmlu_translate_lug.yaml
│   │   │   │       ├── afrimmlu_translate_orm.yaml
│   │   │   │       ├── afrimmlu_translate_sna.yaml
│   │   │   │       ├── afrimmlu_translate_sot.yaml
│   │   │   │       ├── afrimmlu_translate_swa.yaml
│   │   │   │       ├── afrimmlu_translate_twi.yaml
│   │   │   │       ├── afrimmlu_translate_wol.yaml
│   │   │   │       ├── afrimmlu_translate_xho.yaml
│   │   │   │       ├── afrimmlu_translate_yor.yaml
│   │   │   │       ├── afrimmlu_translate_zul.yaml
│   │   │   │       └── utils.py
│   │   │   └── utils.py
│   │   ├── afrixnli/
│   │   │   ├── README.md
│   │   │   ├── anli prompt/
│   │   │   │   ├── en-direct/
│   │   │   │   │   ├── afrixnli_en_direct_amh.yaml
│   │   │   │   │   ├── afrixnli_en_direct_eng.yaml
│   │   │   │   │   ├── afrixnli_en_direct_ewe.yaml
│   │   │   │   │   ├── afrixnli_en_direct_fra.yaml
│   │   │   │   │   ├── afrixnli_en_direct_hau.yaml
│   │   │   │   │   ├── afrixnli_en_direct_ibo.yaml
│   │   │   │   │   ├── afrixnli_en_direct_kin.yaml
│   │   │   │   │   ├── afrixnli_en_direct_lin.yaml
│   │   │   │   │   ├── afrixnli_en_direct_lug.yaml
│   │   │   │   │   ├── afrixnli_en_direct_orm.yaml
│   │   │   │   │   ├── afrixnli_en_direct_sna.yaml
│   │   │   │   │   ├── afrixnli_en_direct_sot.yaml
│   │   │   │   │   ├── afrixnli_en_direct_swa.yaml
│   │   │   │   │   ├── afrixnli_en_direct_twi.yaml
│   │   │   │   │   ├── afrixnli_en_direct_wol.yaml
│   │   │   │   │   ├── afrixnli_en_direct_xho.yaml
│   │   │   │   │   ├── afrixnli_en_direct_yaml
│   │   │   │   │   ├── afrixnli_en_direct_yor.yaml
│   │   │   │   │   ├── afrixnli_en_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── native-direct/
│   │   │   │   │   ├── afrixnli_native_direct_amh.yaml
│   │   │   │   │   ├── afrixnli_native_direct_eng.yaml
│   │   │   │   │   ├── afrixnli_native_direct_ewe.yaml
│   │   │   │   │   ├── afrixnli_native_direct_fra.yaml
│   │   │   │   │   ├── afrixnli_native_direct_hau.yaml
│   │   │   │   │   ├── afrixnli_native_direct_ibo.yaml
│   │   │   │   │   ├── afrixnli_native_direct_kin.yaml
│   │   │   │   │   ├── afrixnli_native_direct_lin.yaml
│   │   │   │   │   ├── afrixnli_native_direct_lug.yaml
│   │   │   │   │   ├── afrixnli_native_direct_orm.yaml
│   │   │   │   │   ├── afrixnli_native_direct_sna.yaml
│   │   │   │   │   ├── afrixnli_native_direct_sot.yaml
│   │   │   │   │   ├── afrixnli_native_direct_swa.yaml
│   │   │   │   │   ├── afrixnli_native_direct_twi.yaml
│   │   │   │   │   ├── afrixnli_native_direct_wol.yaml
│   │   │   │   │   ├── afrixnli_native_direct_xho.yaml
│   │   │   │   │   ├── afrixnli_native_direct_yaml
│   │   │   │   │   ├── afrixnli_native_direct_yor.yaml
│   │   │   │   │   ├── afrixnli_native_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── translate/
│   │   │   │       ├── afrixnli_translate_amh.yaml
│   │   │   │       ├── afrixnli_translate_ewe.yaml
│   │   │   │       ├── afrixnli_translate_fra.yaml
│   │   │   │       ├── afrixnli_translate_hau.yaml
│   │   │   │       ├── afrixnli_translate_ibo.yaml
│   │   │   │       ├── afrixnli_translate_kin.yaml
│   │   │   │       ├── afrixnli_translate_lin.yaml
│   │   │   │       ├── afrixnli_translate_lug.yaml
│   │   │   │       ├── afrixnli_translate_orm.yaml
│   │   │   │       ├── afrixnli_translate_sna.yaml
│   │   │   │       ├── afrixnli_translate_sot.yaml
│   │   │   │       ├── afrixnli_translate_swa.yaml
│   │   │   │       ├── afrixnli_translate_twi.yaml
│   │   │   │       ├── afrixnli_translate_wol.yaml
│   │   │   │       ├── afrixnli_translate_xho.yaml
│   │   │   │       ├── afrixnli_translate_yaml
│   │   │   │       ├── afrixnli_translate_yor.yaml
│   │   │   │       ├── afrixnli_translate_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── direct/
│   │   │   │   ├── afrixnli.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrixnli_amh.yaml
│   │   │   │   │   ├── afrixnli_eng.yaml
│   │   │   │   │   ├── afrixnli_ewe.yaml
│   │   │   │   │   ├── afrixnli_fra.yaml
│   │   │   │   │   ├── afrixnli_hau.yaml
│   │   │   │   │   ├── afrixnli_ibo.yaml
│   │   │   │   │   ├── afrixnli_kin.yaml
│   │   │   │   │   ├── afrixnli_lin.yaml
│   │   │   │   │   ├── afrixnli_lug.yaml
│   │   │   │   │   ├── afrixnli_orm.yaml
│   │   │   │   │   ├── afrixnli_sna.yaml
│   │   │   │   │   ├── afrixnli_sot.yaml
│   │   │   │   │   ├── afrixnli_swa.yaml
│   │   │   │   │   ├── afrixnli_twi.yaml
│   │   │   │   │   ├── afrixnli_wol.yaml
│   │   │   │   │   ├── afrixnli_xho.yaml
│   │   │   │   │   ├── afrixnli_yaml
│   │   │   │   │   ├── afrixnli_yor.yaml
│   │   │   │   │   ├── afrixnli_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrixnli_amh.yaml
│   │   │   │   │   ├── afrixnli_eng.yaml
│   │   │   │   │   ├── afrixnli_ewe.yaml
│   │   │   │   │   ├── afrixnli_fra.yaml
│   │   │   │   │   ├── afrixnli_hau.yaml
│   │   │   │   │   ├── afrixnli_ibo.yaml
│   │   │   │   │   ├── afrixnli_kin.yaml
│   │   │   │   │   ├── afrixnli_lin.yaml
│   │   │   │   │   ├── afrixnli_lug.yaml
│   │   │   │   │   ├── afrixnli_orm.yaml
│   │   │   │   │   ├── afrixnli_sna.yaml
│   │   │   │   │   ├── afrixnli_sot.yaml
│   │   │   │   │   ├── afrixnli_swa.yaml
│   │   │   │   │   ├── afrixnli_twi.yaml
│   │   │   │   │   ├── afrixnli_wol.yaml
│   │   │   │   │   ├── afrixnli_xho.yaml
│   │   │   │   │   ├── afrixnli_yaml
│   │   │   │   │   ├── afrixnli_yor.yaml
│   │   │   │   │   ├── afrixnli_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrixnli_amh.yaml
│   │   │   │   │   ├── afrixnli_eng.yaml
│   │   │   │   │   ├── afrixnli_ewe.yaml
│   │   │   │   │   ├── afrixnli_fra.yaml
│   │   │   │   │   ├── afrixnli_hau.yaml
│   │   │   │   │   ├── afrixnli_ibo.yaml
│   │   │   │   │   ├── afrixnli_kin.yaml
│   │   │   │   │   ├── afrixnli_lin.yaml
│   │   │   │   │   ├── afrixnli_lug.yaml
│   │   │   │   │   ├── afrixnli_orm.yaml
│   │   │   │   │   ├── afrixnli_sna.yaml
│   │   │   │   │   ├── afrixnli_sot.yaml
│   │   │   │   │   ├── afrixnli_swa.yaml
│   │   │   │   │   ├── afrixnli_twi.yaml
│   │   │   │   │   ├── afrixnli_wol.yaml
│   │   │   │   │   ├── afrixnli_xho.yaml
│   │   │   │   │   ├── afrixnli_yaml
│   │   │   │   │   ├── afrixnli_yor.yaml
│   │   │   │   │   ├── afrixnli_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrixnli_amh.yaml
│   │   │   │   │   ├── afrixnli_eng.yaml
│   │   │   │   │   ├── afrixnli_ewe.yaml
│   │   │   │   │   ├── afrixnli_fra.yaml
│   │   │   │   │   ├── afrixnli_hau.yaml
│   │   │   │   │   ├── afrixnli_ibo.yaml
│   │   │   │   │   ├── afrixnli_kin.yaml
│   │   │   │   │   ├── afrixnli_lin.yaml
│   │   │   │   │   ├── afrixnli_lug.yaml
│   │   │   │   │   ├── afrixnli_orm.yaml
│   │   │   │   │   ├── afrixnli_sna.yaml
│   │   │   │   │   ├── afrixnli_sot.yaml
│   │   │   │   │   ├── afrixnli_swa.yaml
│   │   │   │   │   ├── afrixnli_twi.yaml
│   │   │   │   │   ├── afrixnli_wol.yaml
│   │   │   │   │   ├── afrixnli_xho.yaml
│   │   │   │   │   ├── afrixnli_yaml
│   │   │   │   │   ├── afrixnli_yor.yaml
│   │   │   │   │   ├── afrixnli_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrixnli_amh.yaml
│   │   │   │       ├── afrixnli_eng.yaml
│   │   │   │       ├── afrixnli_ewe.yaml
│   │   │   │       ├── afrixnli_fra.yaml
│   │   │   │       ├── afrixnli_hau.yaml
│   │   │   │       ├── afrixnli_ibo.yaml
│   │   │   │       ├── afrixnli_kin.yaml
│   │   │   │       ├── afrixnli_lin.yaml
│   │   │   │       ├── afrixnli_lug.yaml
│   │   │   │       ├── afrixnli_orm.yaml
│   │   │   │       ├── afrixnli_sna.yaml
│   │   │   │       ├── afrixnli_sot.yaml
│   │   │   │       ├── afrixnli_swa.yaml
│   │   │   │       ├── afrixnli_twi.yaml
│   │   │   │       ├── afrixnli_wol.yaml
│   │   │   │       ├── afrixnli_xho.yaml
│   │   │   │       ├── afrixnli_yaml
│   │   │   │       ├── afrixnli_yor.yaml
│   │   │   │       ├── afrixnli_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── gen_utils.py
│   │   │   ├── lai prompt/
│   │   │   │   ├── direct/
│   │   │   │   │   ├── afrixnli_manual_direct_amh.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_eng.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_ewe.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_fra.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_hau.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_ibo.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_kin.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_lin.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_lug.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_orm.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_sna.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_sot.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_swa.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_twi.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_wol.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_xho.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_yaml
│   │   │   │   │   ├── afrixnli_manual_direct_yor.yaml
│   │   │   │   │   ├── afrixnli_manual_direct_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── translate/
│   │   │   │       ├── afrixnli_manual_translate_amh.yaml
│   │   │   │       ├── afrixnli_manual_translate_ewe.yaml
│   │   │   │       ├── afrixnli_manual_translate_fra.yaml
│   │   │   │       ├── afrixnli_manual_translate_hau.yaml
│   │   │   │       ├── afrixnli_manual_translate_ibo.yaml
│   │   │   │       ├── afrixnli_manual_translate_kin.yaml
│   │   │   │       ├── afrixnli_manual_translate_lin.yaml
│   │   │   │       ├── afrixnli_manual_translate_lug.yaml
│   │   │   │       ├── afrixnli_manual_translate_orm.yaml
│   │   │   │       ├── afrixnli_manual_translate_sna.yaml
│   │   │   │       ├── afrixnli_manual_translate_sot.yaml
│   │   │   │       ├── afrixnli_manual_translate_swa.yaml
│   │   │   │       ├── afrixnli_manual_translate_twi.yaml
│   │   │   │       ├── afrixnli_manual_translate_wol.yaml
│   │   │   │       ├── afrixnli_manual_translate_xho.yaml
│   │   │   │       ├── afrixnli_manual_translate_yaml
│   │   │   │       ├── afrixnli_manual_translate_yor.yaml
│   │   │   │       ├── afrixnli_manual_translate_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── translate/
│   │   │   │   ├── afrixnli_tt.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrixnli_translate_amh.yaml
│   │   │   │   │   ├── afrixnli_translate_ewe.yaml
│   │   │   │   │   ├── afrixnli_translate_fra.yaml
│   │   │   │   │   ├── afrixnli_translate_hau.yaml
│   │   │   │   │   ├── afrixnli_translate_ibo.yaml
│   │   │   │   │   ├── afrixnli_translate_kin.yaml
│   │   │   │   │   ├── afrixnli_translate_lin.yaml
│   │   │   │   │   ├── afrixnli_translate_lug.yaml
│   │   │   │   │   ├── afrixnli_translate_orm.yaml
│   │   │   │   │   ├── afrixnli_translate_sna.yaml
│   │   │   │   │   ├── afrixnli_translate_sot.yaml
│   │   │   │   │   ├── afrixnli_translate_swa.yaml
│   │   │   │   │   ├── afrixnli_translate_twi.yaml
│   │   │   │   │   ├── afrixnli_translate_wol.yaml
│   │   │   │   │   ├── afrixnli_translate_xho.yaml
│   │   │   │   │   ├── afrixnli_translate_yaml
│   │   │   │   │   ├── afrixnli_translate_yor.yaml
│   │   │   │   │   ├── afrixnli_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrixnli_translate_amh.yaml
│   │   │   │   │   ├── afrixnli_translate_ewe.yaml
│   │   │   │   │   ├── afrixnli_translate_fra.yaml
│   │   │   │   │   ├── afrixnli_translate_hau.yaml
│   │   │   │   │   ├── afrixnli_translate_ibo.yaml
│   │   │   │   │   ├── afrixnli_translate_kin.yaml
│   │   │   │   │   ├── afrixnli_translate_lin.yaml
│   │   │   │   │   ├── afrixnli_translate_lug.yaml
│   │   │   │   │   ├── afrixnli_translate_orm.yaml
│   │   │   │   │   ├── afrixnli_translate_sna.yaml
│   │   │   │   │   ├── afrixnli_translate_sot.yaml
│   │   │   │   │   ├── afrixnli_translate_swa.yaml
│   │   │   │   │   ├── afrixnli_translate_twi.yaml
│   │   │   │   │   ├── afrixnli_translate_wol.yaml
│   │   │   │   │   ├── afrixnli_translate_xho.yaml
│   │   │   │   │   ├── afrixnli_translate_yaml
│   │   │   │   │   ├── afrixnli_translate_yor.yaml
│   │   │   │   │   ├── afrixnli_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrixnli_translate_amh.yaml
│   │   │   │   │   ├── afrixnli_translate_ewe.yaml
│   │   │   │   │   ├── afrixnli_translate_fra.yaml
│   │   │   │   │   ├── afrixnli_translate_hau.yaml
│   │   │   │   │   ├── afrixnli_translate_ibo.yaml
│   │   │   │   │   ├── afrixnli_translate_kin.yaml
│   │   │   │   │   ├── afrixnli_translate_lin.yaml
│   │   │   │   │   ├── afrixnli_translate_lug.yaml
│   │   │   │   │   ├── afrixnli_translate_orm.yaml
│   │   │   │   │   ├── afrixnli_translate_sna.yaml
│   │   │   │   │   ├── afrixnli_translate_sot.yaml
│   │   │   │   │   ├── afrixnli_translate_swa.yaml
│   │   │   │   │   ├── afrixnli_translate_twi.yaml
│   │   │   │   │   ├── afrixnli_translate_wol.yaml
│   │   │   │   │   ├── afrixnli_translate_xho.yaml
│   │   │   │   │   ├── afrixnli_translate_yaml
│   │   │   │   │   ├── afrixnli_translate_yor.yaml
│   │   │   │   │   ├── afrixnli_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrixnli_translate_amh.yaml
│   │   │   │   │   ├── afrixnli_translate_ewe.yaml
│   │   │   │   │   ├── afrixnli_translate_fra.yaml
│   │   │   │   │   ├── afrixnli_translate_hau.yaml
│   │   │   │   │   ├── afrixnli_translate_ibo.yaml
│   │   │   │   │   ├── afrixnli_translate_kin.yaml
│   │   │   │   │   ├── afrixnli_translate_lin.yaml
│   │   │   │   │   ├── afrixnli_translate_lug.yaml
│   │   │   │   │   ├── afrixnli_translate_orm.yaml
│   │   │   │   │   ├── afrixnli_translate_sna.yaml
│   │   │   │   │   ├── afrixnli_translate_sot.yaml
│   │   │   │   │   ├── afrixnli_translate_swa.yaml
│   │   │   │   │   ├── afrixnli_translate_twi.yaml
│   │   │   │   │   ├── afrixnli_translate_wol.yaml
│   │   │   │   │   ├── afrixnli_translate_xho.yaml
│   │   │   │   │   ├── afrixnli_translate_yaml
│   │   │   │   │   ├── afrixnli_translate_yor.yaml
│   │   │   │   │   ├── afrixnli_translate_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afrixnli_translate_amh.yaml
│   │   │   │       ├── afrixnli_translate_ewe.yaml
│   │   │   │       ├── afrixnli_translate_fra.yaml
│   │   │   │       ├── afrixnli_translate_hau.yaml
│   │   │   │       ├── afrixnli_translate_ibo.yaml
│   │   │   │       ├── afrixnli_translate_kin.yaml
│   │   │   │       ├── afrixnli_translate_lin.yaml
│   │   │   │       ├── afrixnli_translate_lug.yaml
│   │   │   │       ├── afrixnli_translate_orm.yaml
│   │   │   │       ├── afrixnli_translate_sna.yaml
│   │   │   │       ├── afrixnli_translate_sot.yaml
│   │   │   │       ├── afrixnli_translate_swa.yaml
│   │   │   │       ├── afrixnli_translate_twi.yaml
│   │   │   │       ├── afrixnli_translate_wol.yaml
│   │   │   │       ├── afrixnli_translate_xho.yaml
│   │   │   │       ├── afrixnli_translate_yaml
│   │   │   │       ├── afrixnli_translate_yor.yaml
│   │   │   │       ├── afrixnli_translate_zul.yaml
│   │   │   │       └── utils.py
│   │   │   └── utils.py
│   │   ├── afrobench/
│   │   │   ├── README.md
│   │   │   ├── adr/
│   │   │   │   ├── README.md
│   │   │   │   ├── afridiacritics.yaml
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afridiacritics_bbj.yaml
│   │   │   │   │   ├── afridiacritics_fon.yaml
│   │   │   │   │   ├── afridiacritics_ibo.yaml
│   │   │   │   │   ├── afridiacritics_wol.yaml
│   │   │   │   │   ├── afridiacritics_yaml
│   │   │   │   │   └── afridiacritics_yor.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afridiacritics_bbj.yaml
│   │   │   │   │   ├── afridiacritics_fon.yaml
│   │   │   │   │   ├── afridiacritics_ibo.yaml
│   │   │   │   │   ├── afridiacritics_wol.yaml
│   │   │   │   │   ├── afridiacritics_yaml
│   │   │   │   │   └── afridiacritics_yor.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afridiacritics_bbj.yaml
│   │   │   │   │   ├── afridiacritics_fon.yaml
│   │   │   │   │   ├── afridiacritics_ibo.yaml
│   │   │   │   │   ├── afridiacritics_wol.yaml
│   │   │   │   │   ├── afridiacritics_yaml
│   │   │   │   │   └── afridiacritics_yor.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afridiacritics_bbj.yaml
│   │   │   │   │   ├── afridiacritics_fon.yaml
│   │   │   │   │   ├── afridiacritics_ibo.yaml
│   │   │   │   │   ├── afridiacritics_wol.yaml
│   │   │   │   │   ├── afridiacritics_yaml
│   │   │   │   │   └── afridiacritics_yor.yaml
│   │   │   │   └── prompt_5/
│   │   │   │       ├── afridiacritics_bbj.yaml
│   │   │   │       ├── afridiacritics_fon.yaml
│   │   │   │       ├── afridiacritics_ibo.yaml
│   │   │   │       ├── afridiacritics_wol.yaml
│   │   │   │       ├── afridiacritics_yaml
│   │   │   │       └── afridiacritics_yor.yaml
│   │   │   ├── afriqa/
│   │   │   │   ├── README.md
│   │   │   │   ├── afriqa.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── afriqa
│   │   │   │   │   ├── afriqa_bem.yaml
│   │   │   │   │   ├── afriqa_fon.yaml
│   │   │   │   │   ├── afriqa_hau.yaml
│   │   │   │   │   ├── afriqa_ibo.yaml
│   │   │   │   │   ├── afriqa_kin.yaml
│   │   │   │   │   ├── afriqa_swa.yaml
│   │   │   │   │   ├── afriqa_twi.yaml
│   │   │   │   │   ├── afriqa_yor.yaml
│   │   │   │   │   ├── afriqa_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── utils.py
│   │   │   ├── afrisenti/
│   │   │   │   ├── README.md
│   │   │   │   ├── afrisenti.yaml
│   │   │   │   ├── fewshot.sh
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── run.sh
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── run.sh
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── afrisenti
│   │   │   │   │   ├── afrisenti_amh.yaml
│   │   │   │   │   ├── afrisenti_arq.yaml
│   │   │   │   │   ├── afrisenti_ary.yaml
│   │   │   │   │   ├── afrisenti_hau.yaml
│   │   │   │   │   ├── afrisenti_ibo.yaml
│   │   │   │   │   ├── afrisenti_kin.yaml
│   │   │   │   │   ├── afrisenti_orm.yaml
│   │   │   │   │   ├── afrisenti_pcm.yaml
│   │   │   │   │   ├── afrisenti_por.yaml
│   │   │   │   │   ├── afrisenti_swa.yaml
│   │   │   │   │   ├── afrisenti_tir.yaml
│   │   │   │   │   ├── afrisenti_tso.yaml
│   │   │   │   │   ├── afrisenti_twi.yaml
│   │   │   │   │   ├── afrisenti_yor.yaml
│   │   │   │   │   ├── utils.py
│   │   │   │   │   └── xx.py
│   │   │   │   └── utils.py
│   │   │   ├── afrobench-lite.yaml
│   │   │   ├── afrobench.yaml
│   │   │   ├── belebele/
│   │   │   │   ├── README.md
│   │   │   │   ├── belebele.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── belebele
│   │   │   │   │   ├── belebele_afr.yaml
│   │   │   │   │   ├── belebele_amh.yaml
│   │   │   │   │   ├── belebele_ary.yaml
│   │   │   │   │   ├── belebele_arz.yaml
│   │   │   │   │   ├── belebele_bam.yaml
│   │   │   │   │   ├── belebele_eng.yaml
│   │   │   │   │   ├── belebele_fra.yaml
│   │   │   │   │   ├── belebele_fuv.yaml
│   │   │   │   │   ├── belebele_gaz.yaml
│   │   │   │   │   ├── belebele_hau.yaml
│   │   │   │   │   ├── belebele_ibo.yaml
│   │   │   │   │   ├── belebele_kea.yaml
│   │   │   │   │   ├── belebele_kin.yaml
│   │   │   │   │   ├── belebele_lin.yaml
│   │   │   │   │   ├── belebele_lug.yaml
│   │   │   │   │   ├── belebele_luo.yaml
│   │   │   │   │   ├── belebele_nya.yaml
│   │   │   │   │   ├── belebele_plt.yaml
│   │   │   │   │   ├── belebele_por.yaml
│   │   │   │   │   ├── belebele_sna.yaml
│   │   │   │   │   ├── belebele_som.yaml
│   │   │   │   │   ├── belebele_sot.yaml
│   │   │   │   │   ├── belebele_ssw.yaml
│   │   │   │   │   ├── belebele_swa.yaml
│   │   │   │   │   ├── belebele_tir.yaml
│   │   │   │   │   ├── belebele_tsn.yaml
│   │   │   │   │   ├── belebele_tso.yaml
│   │   │   │   │   ├── belebele_wol.yaml
│   │   │   │   │   ├── belebele_xho.yaml
│   │   │   │   │   ├── belebele_yor.yaml
│   │   │   │   │   └── belebele_zul.yaml
│   │   │   │   └── utils.py
│   │   │   ├── flores/
│   │   │   │   ├── README.md
│   │   │   │   ├── flores.yaml
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── flores
│   │   │   │   │   │   ├── flores_ace_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ace_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_acq_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_aeb_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_afr_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_aka_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_amh_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ary_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_arz_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_bam_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ban_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_bem_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_cjk_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_dik_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_dyu_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ewe_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fon_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fra_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fuv_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_gaz_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_hau_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ibo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kab_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kam_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kbp_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kea_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kik_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kmb_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_knc_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_knc_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kon_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lua_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lug_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_luo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_mos_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nus_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nya_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_plt_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_run_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sag_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sna_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_som_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sot_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ssw_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sun_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_swh_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_taq_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_taq_Tfng-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tir_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tsn_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tum_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_twi_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tzm_Tfng-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_umb_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_wol_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_xho_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_yor_Latn-eng_Latn.yaml
│   │   │   │   │   │   └── flores_zul_Latn-eng_Latn.yaml
│   │   │   │   │   ├── english-african/
│   │   │   │   │   │   ├── flores
│   │   │   │   │   │   ├── flores_eng_Latn-ace_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ace_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-acq_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-aeb_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-afr_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-aka_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-amh_Ethi.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ary_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-arz_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-bam_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ban_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-bem_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-cjk_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-dik_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-dyu_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ewe_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fon_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fra_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fuv_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-gaz_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-hau_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ibo_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kab_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kam_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kbp_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kea_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kik_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kin_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kmb_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-knc_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-knc_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kon_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lin_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lua_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lug_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-luo_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-mos_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nso_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nus_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nya_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-plt_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-run_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sag_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sna_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-som_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sot_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ssw_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sun_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-swh_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-taq_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-taq_Tfng.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tir_Ethi.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tsn_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tso_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tum_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-twi_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tzm_Tfng.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-umb_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-wol_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-xho_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-yor_Latn.yaml
│   │   │   │   │   │   └── flores_eng_Latn-zul_Latn.yaml
│   │   │   │   │   └── flores
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── flores
│   │   │   │   │   │   ├── flores_ace_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ace_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_acq_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_aeb_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_afr_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_aka_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_amh_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ary_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_arz_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_bam_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ban_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_bem_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_cjk_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_dik_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_dyu_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ewe_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fon_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fra_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_fuv_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_gaz_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_hau_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ibo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kab_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kam_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kbp_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kea_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kik_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kmb_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_knc_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_knc_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_kon_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lua_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_lug_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_luo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_mos_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nus_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_nya_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_plt_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_run_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sag_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sna_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_som_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sot_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_ssw_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_sun_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_swh_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_taq_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_taq_Tfng-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tir_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tsn_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tum_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_twi_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_tzm_Tfng-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_umb_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_wol_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_xho_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── flores_yor_Latn-eng_Latn.yaml
│   │   │   │   │   │   └── flores_zul_Latn-eng_Latn.yaml
│   │   │   │   │   ├── english-african/
│   │   │   │   │   │   ├── flores
│   │   │   │   │   │   ├── flores_eng_Latn-ace_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ace_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-acq_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-aeb_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-afr_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-aka_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-amh_Ethi.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ary_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-arz_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-bam_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ban_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-bem_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-cjk_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-dik_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-dyu_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ewe_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fon_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fra_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-fuv_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-gaz_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-hau_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ibo_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kab_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kam_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kbp_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kea_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kik_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kin_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kmb_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-knc_Arab.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-knc_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-kon_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lin_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lua_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-lug_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-luo_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-mos_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nso_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nus_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-nya_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-plt_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-run_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sag_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sna_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-som_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sot_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-ssw_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-sun_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-swh_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-taq_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-taq_Tfng.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tir_Ethi.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tsn_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tso_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tum_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-twi_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-tzm_Tfng.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-umb_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-wol_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-xho_Latn.yaml
│   │   │   │   │   │   ├── flores_eng_Latn-yor_Latn.yaml
│   │   │   │   │   │   └── flores_eng_Latn-zul_Latn.yaml
│   │   │   │   │   └── flores
│   │   │   │   └── prompt_3/
│   │   │   │       ├── african-english/
│   │   │   │       │   ├── flores
│   │   │   │       │   ├── flores_ace_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_ace_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_acq_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_aeb_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_afr_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_aka_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_amh_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── flores_ary_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_arz_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_bam_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_ban_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_bem_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_cjk_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_dik_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_dyu_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_ewe_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_fon_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_fra_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_fuv_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_gaz_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_hau_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_ibo_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kab_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kam_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kbp_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kea_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kik_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kin_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kmb_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_knc_Arab-eng_Latn.yaml
│   │   │   │       │   ├── flores_knc_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_kon_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_lin_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_lua_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_lug_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_luo_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_mos_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_nso_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_nus_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_nya_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_plt_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_run_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_sag_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_sna_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_som_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_sot_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_ssw_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_sun_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_swh_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_taq_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_taq_Tfng-eng_Latn.yaml
│   │   │   │       │   ├── flores_tir_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── flores_tsn_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_tso_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_tum_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_twi_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_tzm_Tfng-eng_Latn.yaml
│   │   │   │       │   ├── flores_umb_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_wol_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_xho_Latn-eng_Latn.yaml
│   │   │   │       │   ├── flores_yor_Latn-eng_Latn.yaml
│   │   │   │       │   └── flores_zul_Latn-eng_Latn.yaml
│   │   │   │       ├── english-african/
│   │   │   │       │   ├── flores
│   │   │   │       │   ├── flores_eng_Latn-ace_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-ace_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-acq_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-aeb_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-afr_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-aka_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-amh_Ethi.yaml
│   │   │   │       │   ├── flores_eng_Latn-ary_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-arz_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-bam_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-ban_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-bem_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-cjk_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-dik_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-dyu_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-ewe_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-fon_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-fra_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-fuv_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-gaz_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-hau_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-ibo_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kab_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kam_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kbp_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kea_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kik_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kin_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kmb_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-knc_Arab.yaml
│   │   │   │       │   ├── flores_eng_Latn-knc_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-kon_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-lin_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-lua_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-lug_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-luo_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-mos_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-nso_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-nus_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-nya_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-plt_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-run_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-sag_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-sna_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-som_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-sot_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-ssw_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-sun_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-swh_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-taq_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-taq_Tfng.yaml
│   │   │   │       │   ├── flores_eng_Latn-tir_Ethi.yaml
│   │   │   │       │   ├── flores_eng_Latn-tsn_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-tso_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-tum_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-twi_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-tzm_Tfng.yaml
│   │   │   │       │   ├── flores_eng_Latn-umb_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-wol_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-xho_Latn.yaml
│   │   │   │       │   ├── flores_eng_Latn-yor_Latn.yaml
│   │   │   │       │   └── flores_eng_Latn-zul_Latn.yaml
│   │   │   │       └── flores
│   │   │   ├── injongointent/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── injongointent.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── injongointent
│   │   │   │   │   ├── injongointent_amh.yaml
│   │   │   │   │   ├── injongointent_eng.yaml
│   │   │   │   │   ├── injongointent_ewe.yaml
│   │   │   │   │   ├── injongointent_hau.yaml
│   │   │   │   │   ├── injongointent_ibo.yaml
│   │   │   │   │   ├── injongointent_kin.yaml
│   │   │   │   │   ├── injongointent_lin.yaml
│   │   │   │   │   ├── injongointent_lug.yaml
│   │   │   │   │   ├── injongointent_orm.yaml
│   │   │   │   │   ├── injongointent_sna.yaml
│   │   │   │   │   ├── injongointent_sot.yaml
│   │   │   │   │   ├── injongointent_swa.yaml
│   │   │   │   │   ├── injongointent_twi.yaml
│   │   │   │   │   ├── injongointent_wol.yaml
│   │   │   │   │   ├── injongointent_xho.yaml
│   │   │   │   │   ├── injongointent_yor.yaml
│   │   │   │   │   ├── injongointent_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── injongointent
│   │   │   │   │   ├── injongointent_amh.yaml
│   │   │   │   │   ├── injongointent_eng.yaml
│   │   │   │   │   ├── injongointent_ewe.yaml
│   │   │   │   │   ├── injongointent_hau.yaml
│   │   │   │   │   ├── injongointent_ibo.yaml
│   │   │   │   │   ├── injongointent_kin.yaml
│   │   │   │   │   ├── injongointent_lin.yaml
│   │   │   │   │   ├── injongointent_lug.yaml
│   │   │   │   │   ├── injongointent_orm.yaml
│   │   │   │   │   ├── injongointent_sna.yaml
│   │   │   │   │   ├── injongointent_sot.yaml
│   │   │   │   │   ├── injongointent_swa.yaml
│   │   │   │   │   ├── injongointent_twi.yaml
│   │   │   │   │   ├── injongointent_wol.yaml
│   │   │   │   │   ├── injongointent_xho.yaml
│   │   │   │   │   ├── injongointent_yor.yaml
│   │   │   │   │   ├── injongointent_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── injongointent
│   │   │   │   │   ├── injongointent_amh.yaml
│   │   │   │   │   ├── injongointent_eng.yaml
│   │   │   │   │   ├── injongointent_ewe.yaml
│   │   │   │   │   ├── injongointent_hau.yaml
│   │   │   │   │   ├── injongointent_ibo.yaml
│   │   │   │   │   ├── injongointent_kin.yaml
│   │   │   │   │   ├── injongointent_lin.yaml
│   │   │   │   │   ├── injongointent_lug.yaml
│   │   │   │   │   ├── injongointent_orm.yaml
│   │   │   │   │   ├── injongointent_sna.yaml
│   │   │   │   │   ├── injongointent_sot.yaml
│   │   │   │   │   ├── injongointent_swa.yaml
│   │   │   │   │   ├── injongointent_twi.yaml
│   │   │   │   │   ├── injongointent_wol.yaml
│   │   │   │   │   ├── injongointent_xho.yaml
│   │   │   │   │   ├── injongointent_yor.yaml
│   │   │   │   │   ├── injongointent_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── injongointent
│   │   │   │   │   ├── injongointent_amh.yaml
│   │   │   │   │   ├── injongointent_eng.yaml
│   │   │   │   │   ├── injongointent_ewe.yaml
│   │   │   │   │   ├── injongointent_hau.yaml
│   │   │   │   │   ├── injongointent_ibo.yaml
│   │   │   │   │   ├── injongointent_kin.yaml
│   │   │   │   │   ├── injongointent_lin.yaml
│   │   │   │   │   ├── injongointent_lug.yaml
│   │   │   │   │   ├── injongointent_orm.yaml
│   │   │   │   │   ├── injongointent_sna.yaml
│   │   │   │   │   ├── injongointent_sot.yaml
│   │   │   │   │   ├── injongointent_swa.yaml
│   │   │   │   │   ├── injongointent_twi.yaml
│   │   │   │   │   ├── injongointent_wol.yaml
│   │   │   │   │   ├── injongointent_xho.yaml
│   │   │   │   │   ├── injongointent_yor.yaml
│   │   │   │   │   ├── injongointent_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── injongointent
│   │   │   │       ├── injongointent_amh.yaml
│   │   │   │       ├── injongointent_eng.yaml
│   │   │   │       ├── injongointent_ewe.yaml
│   │   │   │       ├── injongointent_hau.yaml
│   │   │   │       ├── injongointent_ibo.yaml
│   │   │   │       ├── injongointent_kin.yaml
│   │   │   │       ├── injongointent_lin.yaml
│   │   │   │       ├── injongointent_lug.yaml
│   │   │   │       ├── injongointent_orm.yaml
│   │   │   │       ├── injongointent_sna.yaml
│   │   │   │       ├── injongointent_sot.yaml
│   │   │   │       ├── injongointent_swa.yaml
│   │   │   │       ├── injongointent_twi.yaml
│   │   │   │       ├── injongointent_wol.yaml
│   │   │   │       ├── injongointent_xho.yaml
│   │   │   │       ├── injongointent_yor.yaml
│   │   │   │       ├── injongointent_zul.yaml
│   │   │   │       └── utils.py
│   │   │   ├── mafand/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── mafand.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── mafand
│   │   │   │   │   │   ├── mafand_amh-en.yaml
│   │   │   │   │   │   ├── mafand_bam-fr.yaml
│   │   │   │   │   │   ├── mafand_bbj-fr.yaml
│   │   │   │   │   │   ├── mafand_ewe-fr.yaml
│   │   │   │   │   │   ├── mafand_fon-fr.yaml
│   │   │   │   │   │   ├── mafand_hau-en.yaml
│   │   │   │   │   │   ├── mafand_ibo-en.yaml
│   │   │   │   │   │   ├── mafand_kin-en.yaml
│   │   │   │   │   │   ├── mafand_lug-en.yaml
│   │   │   │   │   │   ├── mafand_luo-en.yaml
│   │   │   │   │   │   ├── mafand_mos-fr.yaml
│   │   │   │   │   │   ├── mafand_nya-en.yaml
│   │   │   │   │   │   ├── mafand_pcm-en.yaml
│   │   │   │   │   │   ├── mafand_sna-en.yaml
│   │   │   │   │   │   ├── mafand_swa-en.yaml
│   │   │   │   │   │   ├── mafand_tsn-en.yaml
│   │   │   │   │   │   ├── mafand_twi-en.yaml
│   │   │   │   │   │   ├── mafand_wol-fr.yaml
│   │   │   │   │   │   ├── mafand_xho-en.yaml
│   │   │   │   │   │   ├── mafand_yor-en.yaml
│   │   │   │   │   │   ├── mafand_zul-en.yaml
│   │   │   │   │   │   └── utils.py
│   │   │   │   │   └── english-african/
│   │   │   │   │       ├── mafand
│   │   │   │   │       ├── mafand_en-amh.yaml
│   │   │   │   │       ├── mafand_en-hau.yaml
│   │   │   │   │       ├── mafand_en-ibo.yaml
│   │   │   │   │       ├── mafand_en-kin.yaml
│   │   │   │   │       ├── mafand_en-lug.yaml
│   │   │   │   │       ├── mafand_en-luo.yaml
│   │   │   │   │       ├── mafand_en-nya.yaml
│   │   │   │   │       ├── mafand_en-pcm.yaml
│   │   │   │   │       ├── mafand_en-sna.yaml
│   │   │   │   │       ├── mafand_en-swa.yaml
│   │   │   │   │       ├── mafand_en-tsn.yaml
│   │   │   │   │       ├── mafand_en-twi.yaml
│   │   │   │   │       ├── mafand_en-xho.yaml
│   │   │   │   │       ├── mafand_en-yor.yaml
│   │   │   │   │       ├── mafand_en-zul.yaml
│   │   │   │   │       ├── mafand_fr-bam.yaml
│   │   │   │   │       ├── mafand_fr-bbj.yaml
│   │   │   │   │       ├── mafand_fr-ewe.yaml
│   │   │   │   │       ├── mafand_fr-fon.yaml
│   │   │   │   │       ├── mafand_fr-mos.yaml
│   │   │   │   │       ├── mafand_fr-wol.yaml
│   │   │   │   │       └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── mafand
│   │   │   │   │   │   ├── mafand_amh-en.yaml
│   │   │   │   │   │   ├── mafand_bam-fr.yaml
│   │   │   │   │   │   ├── mafand_bbj-fr.yaml
│   │   │   │   │   │   ├── mafand_ewe-fr.yaml
│   │   │   │   │   │   ├── mafand_fon-fr.yaml
│   │   │   │   │   │   ├── mafand_hau-en.yaml
│   │   │   │   │   │   ├── mafand_ibo-en.yaml
│   │   │   │   │   │   ├── mafand_kin-en.yaml
│   │   │   │   │   │   ├── mafand_lug-en.yaml
│   │   │   │   │   │   ├── mafand_luo-en.yaml
│   │   │   │   │   │   ├── mafand_mos-fr.yaml
│   │   │   │   │   │   ├── mafand_nya-en.yaml
│   │   │   │   │   │   ├── mafand_pcm-en.yaml
│   │   │   │   │   │   ├── mafand_sna-en.yaml
│   │   │   │   │   │   ├── mafand_swa-en.yaml
│   │   │   │   │   │   ├── mafand_tsn-en.yaml
│   │   │   │   │   │   ├── mafand_twi-en.yaml
│   │   │   │   │   │   ├── mafand_wol-fr.yaml
│   │   │   │   │   │   ├── mafand_xho-en.yaml
│   │   │   │   │   │   ├── mafand_yor-en.yaml
│   │   │   │   │   │   ├── mafand_zul-en.yaml
│   │   │   │   │   │   └── utils.py
│   │   │   │   │   └── english-african/
│   │   │   │   │       ├── mafand
│   │   │   │   │       ├── mafand_en-amh.yaml
│   │   │   │   │       ├── mafand_en-hau.yaml
│   │   │   │   │       ├── mafand_en-ibo.yaml
│   │   │   │   │       ├── mafand_en-kin.yaml
│   │   │   │   │       ├── mafand_en-lug.yaml
│   │   │   │   │       ├── mafand_en-luo.yaml
│   │   │   │   │       ├── mafand_en-nya.yaml
│   │   │   │   │       ├── mafand_en-pcm.yaml
│   │   │   │   │       ├── mafand_en-sna.yaml
│   │   │   │   │       ├── mafand_en-swa.yaml
│   │   │   │   │       ├── mafand_en-tsn.yaml
│   │   │   │   │       ├── mafand_en-twi.yaml
│   │   │   │   │       ├── mafand_en-xho.yaml
│   │   │   │   │       ├── mafand_en-yor.yaml
│   │   │   │   │       ├── mafand_en-zul.yaml
│   │   │   │   │       ├── mafand_fr-bam.yaml
│   │   │   │   │       ├── mafand_fr-bbj.yaml
│   │   │   │   │       ├── mafand_fr-ewe.yaml
│   │   │   │   │       ├── mafand_fr-fon.yaml
│   │   │   │   │       ├── mafand_fr-mos.yaml
│   │   │   │   │       ├── mafand_fr-wol.yaml
│   │   │   │   │       └── utils.py
│   │   │   │   └── prompt_3/
│   │   │   │       ├── african-english/
│   │   │   │       │   ├── mafand
│   │   │   │       │   ├── mafand_amh-en.yaml
│   │   │   │       │   ├── mafand_bam-fr.yaml
│   │   │   │       │   ├── mafand_bbj-fr.yaml
│   │   │   │       │   ├── mafand_ewe-fr.yaml
│   │   │   │       │   ├── mafand_fon-fr.yaml
│   │   │   │       │   ├── mafand_hau-en.yaml
│   │   │   │       │   ├── mafand_ibo-en.yaml
│   │   │   │       │   ├── mafand_kin-en.yaml
│   │   │   │       │   ├── mafand_lug-en.yaml
│   │   │   │       │   ├── mafand_luo-en.yaml
│   │   │   │       │   ├── mafand_mos-fr.yaml
│   │   │   │       │   ├── mafand_nya-en.yaml
│   │   │   │       │   ├── mafand_pcm-en.yaml
│   │   │   │       │   ├── mafand_sna-en.yaml
│   │   │   │       │   ├── mafand_swa-en.yaml
│   │   │   │       │   ├── mafand_tsn-en.yaml
│   │   │   │       │   ├── mafand_twi-en.yaml
│   │   │   │       │   ├── mafand_wol-fr.yaml
│   │   │   │       │   ├── mafand_xho-en.yaml
│   │   │   │       │   ├── mafand_yor-en.yaml
│   │   │   │       │   ├── mafand_zul-en.yaml
│   │   │   │       │   └── utils.py
│   │   │   │       └── english-african/
│   │   │   │           ├── mafand
│   │   │   │           ├── mafand_en-amh.yaml
│   │   │   │           ├── mafand_en-hau.yaml
│   │   │   │           ├── mafand_en-ibo.yaml
│   │   │   │           ├── mafand_en-kin.yaml
│   │   │   │           ├── mafand_en-lug.yaml
│   │   │   │           ├── mafand_en-luo.yaml
│   │   │   │           ├── mafand_en-nya.yaml
│   │   │   │           ├── mafand_en-pcm.yaml
│   │   │   │           ├── mafand_en-sna.yaml
│   │   │   │           ├── mafand_en-swa.yaml
│   │   │   │           ├── mafand_en-tsn.yaml
│   │   │   │           ├── mafand_en-twi.yaml
│   │   │   │           ├── mafand_en-xho.yaml
│   │   │   │           ├── mafand_en-yor.yaml
│   │   │   │           ├── mafand_en-zul.yaml
│   │   │   │           ├── mafand_fr-bam.yaml
│   │   │   │           ├── mafand_fr-bbj.yaml
│   │   │   │           ├── mafand_fr-ewe.yaml
│   │   │   │           ├── mafand_fr-fon.yaml
│   │   │   │           ├── mafand_fr-mos.yaml
│   │   │   │           ├── mafand_fr-wol.yaml
│   │   │   │           └── utils.py
│   │   │   ├── masakhaner/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── masakhaner.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── masakhaner
│   │   │   │   │   ├── masakhaner_am.yaml
│   │   │   │   │   ├── masakhaner_bbj.yaml
│   │   │   │   │   ├── masakhaner_bm.yaml
│   │   │   │   │   ├── masakhaner_ee.yaml
│   │   │   │   │   ├── masakhaner_ha.yaml
│   │   │   │   │   ├── masakhaner_ig.yaml
│   │   │   │   │   ├── masakhaner_lg.yaml
│   │   │   │   │   ├── masakhaner_luo.yaml
│   │   │   │   │   ├── masakhaner_mos.yaml
│   │   │   │   │   ├── masakhaner_ny.yaml
│   │   │   │   │   ├── masakhaner_pcm.yaml
│   │   │   │   │   ├── masakhaner_rw.yaml
│   │   │   │   │   ├── masakhaner_sn.yaml
│   │   │   │   │   ├── masakhaner_sw.yaml
│   │   │   │   │   ├── masakhaner_tn.yaml
│   │   │   │   │   ├── masakhaner_tw.yaml
│   │   │   │   │   ├── masakhaner_wo.yaml
│   │   │   │   │   ├── masakhaner_xh.yaml
│   │   │   │   │   ├── masakhaner_yo.yaml
│   │   │   │   │   ├── masakhaner_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── masakhaner
│   │   │   │   │   ├── masakhaner_am.yaml
│   │   │   │   │   ├── masakhaner_bbj.yaml
│   │   │   │   │   ├── masakhaner_bm.yaml
│   │   │   │   │   ├── masakhaner_ee.yaml
│   │   │   │   │   ├── masakhaner_ha.yaml
│   │   │   │   │   ├── masakhaner_ig.yaml
│   │   │   │   │   ├── masakhaner_lg.yaml
│   │   │   │   │   ├── masakhaner_luo.yaml
│   │   │   │   │   ├── masakhaner_mos.yaml
│   │   │   │   │   ├── masakhaner_ny.yaml
│   │   │   │   │   ├── masakhaner_pcm.yaml
│   │   │   │   │   ├── masakhaner_rw.yaml
│   │   │   │   │   ├── masakhaner_sn.yaml
│   │   │   │   │   ├── masakhaner_sw.yaml
│   │   │   │   │   ├── masakhaner_tn.yaml
│   │   │   │   │   ├── masakhaner_tw.yaml
│   │   │   │   │   ├── masakhaner_wo.yaml
│   │   │   │   │   ├── masakhaner_xh.yaml
│   │   │   │   │   ├── masakhaner_yo.yaml
│   │   │   │   │   ├── masakhaner_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── masakhaner
│   │   │   │   │   ├── masakhaner_am.yaml
│   │   │   │   │   ├── masakhaner_bbj.yaml
│   │   │   │   │   ├── masakhaner_bm.yaml
│   │   │   │   │   ├── masakhaner_ee.yaml
│   │   │   │   │   ├── masakhaner_ha.yaml
│   │   │   │   │   ├── masakhaner_ig.yaml
│   │   │   │   │   ├── masakhaner_lg.yaml
│   │   │   │   │   ├── masakhaner_luo.yaml
│   │   │   │   │   ├── masakhaner_mos.yaml
│   │   │   │   │   ├── masakhaner_ny.yaml
│   │   │   │   │   ├── masakhaner_pcm.yaml
│   │   │   │   │   ├── masakhaner_rw.yaml
│   │   │   │   │   ├── masakhaner_sn.yaml
│   │   │   │   │   ├── masakhaner_sw.yaml
│   │   │   │   │   ├── masakhaner_tn.yaml
│   │   │   │   │   ├── masakhaner_tw.yaml
│   │   │   │   │   ├── masakhaner_wo.yaml
│   │   │   │   │   ├── masakhaner_xh.yaml
│   │   │   │   │   ├── masakhaner_yo.yaml
│   │   │   │   │   ├── masakhaner_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── masakhaner
│   │   │   │   │   ├── masakhaner_am.yaml
│   │   │   │   │   ├── masakhaner_bbj.yaml
│   │   │   │   │   ├── masakhaner_bm.yaml
│   │   │   │   │   ├── masakhaner_ee.yaml
│   │   │   │   │   ├── masakhaner_ha.yaml
│   │   │   │   │   ├── masakhaner_ig.yaml
│   │   │   │   │   ├── masakhaner_lg.yaml
│   │   │   │   │   ├── masakhaner_luo.yaml
│   │   │   │   │   ├── masakhaner_mos.yaml
│   │   │   │   │   ├── masakhaner_ny.yaml
│   │   │   │   │   ├── masakhaner_pcm.yaml
│   │   │   │   │   ├── masakhaner_rw.yaml
│   │   │   │   │   ├── masakhaner_sn.yaml
│   │   │   │   │   ├── masakhaner_sw.yaml
│   │   │   │   │   ├── masakhaner_tn.yaml
│   │   │   │   │   ├── masakhaner_tw.yaml
│   │   │   │   │   ├── masakhaner_wo.yaml
│   │   │   │   │   ├── masakhaner_xh.yaml
│   │   │   │   │   ├── masakhaner_yo.yaml
│   │   │   │   │   ├── masakhaner_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── masakhaner
│   │   │   │       ├── masakhaner_am.yaml
│   │   │   │       ├── masakhaner_bbj.yaml
│   │   │   │       ├── masakhaner_bm.yaml
│   │   │   │       ├── masakhaner_ee.yaml
│   │   │   │       ├── masakhaner_ha.yaml
│   │   │   │       ├── masakhaner_ig.yaml
│   │   │   │       ├── masakhaner_lg.yaml
│   │   │   │       ├── masakhaner_luo.yaml
│   │   │   │       ├── masakhaner_mos.yaml
│   │   │   │       ├── masakhaner_ny.yaml
│   │   │   │       ├── masakhaner_pcm.yaml
│   │   │   │       ├── masakhaner_rw.yaml
│   │   │   │       ├── masakhaner_sn.yaml
│   │   │   │       ├── masakhaner_sw.yaml
│   │   │   │       ├── masakhaner_tn.yaml
│   │   │   │       ├── masakhaner_tw.yaml
│   │   │   │       ├── masakhaner_wo.yaml
│   │   │   │       ├── masakhaner_xh.yaml
│   │   │   │       ├── masakhaner_yo.yaml
│   │   │   │       ├── masakhaner_zu.yaml
│   │   │   │       └── utils.py
│   │   │   ├── masakhanews/
│   │   │   │   ├── README.md
│   │   │   │   ├── masakhanews.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── masakhanews
│   │   │   │   │   ├── masakhanews_amh.yaml
│   │   │   │   │   ├── masakhanews_eng.yaml
│   │   │   │   │   ├── masakhanews_fra.yaml
│   │   │   │   │   ├── masakhanews_hau.yaml
│   │   │   │   │   ├── masakhanews_ibo.yaml
│   │   │   │   │   ├── masakhanews_lin.yaml
│   │   │   │   │   ├── masakhanews_lug.yaml
│   │   │   │   │   ├── masakhanews_orm.yaml
│   │   │   │   │   ├── masakhanews_pcm.yaml
│   │   │   │   │   ├── masakhanews_run.yaml
│   │   │   │   │   ├── masakhanews_sna.yaml
│   │   │   │   │   ├── masakhanews_som.yaml
│   │   │   │   │   ├── masakhanews_swa.yaml
│   │   │   │   │   ├── masakhanews_tir.yaml
│   │   │   │   │   ├── masakhanews_xho.yaml
│   │   │   │   │   ├── masakhanews_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── utils.py
│   │   │   ├── masakhapos/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── masakhapos.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── masakhapos_bam.yaml
│   │   │   │   │   ├── masakhapos_bbj.yaml
│   │   │   │   │   ├── masakhapos_ewe.yaml
│   │   │   │   │   ├── masakhapos_fon.yaml
│   │   │   │   │   ├── masakhapos_hau.yaml
│   │   │   │   │   ├── masakhapos_ibo.yaml
│   │   │   │   │   ├── masakhapos_kin.yaml
│   │   │   │   │   ├── masakhapos_lug.yaml
│   │   │   │   │   ├── masakhapos_luo.yaml
│   │   │   │   │   ├── masakhapos_mos.yaml
│   │   │   │   │   ├── masakhapos_nya.yaml
│   │   │   │   │   ├── masakhapos_pcm.yaml
│   │   │   │   │   ├── masakhapos_sna.yaml
│   │   │   │   │   ├── masakhapos_swa.yaml
│   │   │   │   │   ├── masakhapos_tsn.yaml
│   │   │   │   │   ├── masakhapos_twi.yaml
│   │   │   │   │   ├── masakhapos_wol.yaml
│   │   │   │   │   ├── masakhapos_xho.yaml
│   │   │   │   │   ├── masakhapos_yaml
│   │   │   │   │   ├── masakhapos_yor.yaml
│   │   │   │   │   ├── masakhapos_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── utils.py
│   │   │   ├── naijarc/
│   │   │   │   ├── README.md
│   │   │   │   ├── naijarc.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── naijarc
│   │   │   │   │   ├── naijarc_hau.yaml
│   │   │   │   │   ├── naijarc_ibo.yaml
│   │   │   │   │   └── naijarc_yor.yaml
│   │   │   │   └── utils.py
│   │   │   ├── nollysenti/
│   │   │   │   ├── README.md
│   │   │   │   ├── nollysenti.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── nollysenti
│   │   │   │   │   ├── nollysenti_eng.yaml
│   │   │   │   │   ├── nollysenti_hau.yaml
│   │   │   │   │   ├── nollysenti_ibo.yaml
│   │   │   │   │   ├── nollysenti_pcm.yaml
│   │   │   │   │   ├── nollysenti_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── nollysenti
│   │   │   │   │   ├── nollysenti_eng.yaml
│   │   │   │   │   ├── nollysenti_hau.yaml
│   │   │   │   │   ├── nollysenti_ibo.yaml
│   │   │   │   │   ├── nollysenti_pcm.yaml
│   │   │   │   │   ├── nollysenti_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── nollysenti
│   │   │   │   │   ├── nollysenti_eng.yaml
│   │   │   │   │   ├── nollysenti_hau.yaml
│   │   │   │   │   ├── nollysenti_ibo.yaml
│   │   │   │   │   ├── nollysenti_pcm.yaml
│   │   │   │   │   ├── nollysenti_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── nollysenti
│   │   │   │   │   ├── nollysenti_eng.yaml
│   │   │   │   │   ├── nollysenti_hau.yaml
│   │   │   │   │   ├── nollysenti_ibo.yaml
│   │   │   │   │   ├── nollysenti_pcm.yaml
│   │   │   │   │   ├── nollysenti_yor.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   └── prompt_5/
│   │   │   │       ├── nollysenti
│   │   │   │       ├── nollysenti_eng.yaml
│   │   │   │       ├── nollysenti_hau.yaml
│   │   │   │       ├── nollysenti_ibo.yaml
│   │   │   │       ├── nollysenti_pcm.yaml
│   │   │   │       ├── nollysenti_yor.yaml
│   │   │   │       └── utils.py
│   │   │   ├── ntrex/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── ntrex.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── ntrex
│   │   │   │   │   │   ├── ntrex_afr_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_amh_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_arb_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_bem_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ewe_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_fra_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_hau_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ibo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_kin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_mey_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_mlg_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_msa_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nde_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nya_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_orm_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_shi_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_sna_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_som_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ssw_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_swa_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tam_Taml-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tel_Telu-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tir_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ton_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tsn_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_urd_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ven_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_wol_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_xho_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_yor_Latn-eng_Latn.yaml
│   │   │   │   │   │   └── ntrex_zul_Latn-eng_Latn.yaml
│   │   │   │   │   └── english-african/
│   │   │   │   │       ├── ntrex
│   │   │   │   │       ├── ntrex_eng_Latn-afr_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-amh_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-arb_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-bem_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ewe_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-fra_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-hau_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ibo_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-kin_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-mey_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-mlg_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-msa_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nde_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nso_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nya_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-orm_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-shi_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-sna_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-som_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ssw_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-swa_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tam_Taml.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tel_Telu.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tir_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ton_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tsn_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-urd_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ven_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-wol_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-xho_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-yor_Latn.yaml
│   │   │   │   │       └── ntrex_eng_Latn-zul_Latn.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── african-english/
│   │   │   │   │   │   ├── ntrex
│   │   │   │   │   │   ├── ntrex_afr_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_amh_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_arb_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_bem_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ewe_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_fra_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_hau_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ibo_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_kin_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_mey_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_mlg_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_msa_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nde_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nso_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_nya_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_orm_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_shi_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_sna_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_som_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ssw_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_swa_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tam_Taml-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tel_Telu-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tir_Ethi-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ton_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_tsn_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_urd_Arab-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_ven_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_wol_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_xho_Latn-eng_Latn.yaml
│   │   │   │   │   │   ├── ntrex_yor_Latn-eng_Latn.yaml
│   │   │   │   │   │   └── ntrex_zul_Latn-eng_Latn.yaml
│   │   │   │   │   └── english-african/
│   │   │   │   │       ├── ntrex
│   │   │   │   │       ├── ntrex_eng_Latn-afr_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-amh_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-arb_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-bem_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ewe_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-fra_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-hau_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ibo_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-kin_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-mey_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-mlg_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-msa_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nde_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nso_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-nya_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-orm_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-shi_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-sna_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-som_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ssw_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-swa_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tam_Taml.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tel_Telu.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tir_Ethi.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ton_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-tsn_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-urd_Arab.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-ven_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-wol_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-xho_Latn.yaml
│   │   │   │   │       ├── ntrex_eng_Latn-yor_Latn.yaml
│   │   │   │   │       └── ntrex_eng_Latn-zul_Latn.yaml
│   │   │   │   └── prompt_3/
│   │   │   │       ├── african-english/
│   │   │   │       │   ├── ntrex
│   │   │   │       │   ├── ntrex_afr_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_amh_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_arb_Arab-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_bem_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ewe_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_fra_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_hau_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ibo_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_kin_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_mey_Arab-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_mlg_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_msa_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_nde_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_nso_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_nya_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_orm_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_shi_Arab-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_sna_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_som_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ssw_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_swa_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_tam_Taml-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_tel_Telu-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_tir_Ethi-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ton_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_tsn_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_urd_Arab-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_ven_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_wol_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_xho_Latn-eng_Latn.yaml
│   │   │   │       │   ├── ntrex_yor_Latn-eng_Latn.yaml
│   │   │   │       │   └── ntrex_zul_Latn-eng_Latn.yaml
│   │   │   │       └── english-african/
│   │   │   │           ├── ntrex
│   │   │   │           ├── ntrex_eng_Latn-afr_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-amh_Ethi.yaml
│   │   │   │           ├── ntrex_eng_Latn-arb_Arab.yaml
│   │   │   │           ├── ntrex_eng_Latn-bem_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-ewe_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-fra_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-hau_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-ibo_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-kin_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-mey_Arab.yaml
│   │   │   │           ├── ntrex_eng_Latn-mlg_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-msa_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-nde_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-nso_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-nya_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-orm_Ethi.yaml
│   │   │   │           ├── ntrex_eng_Latn-shi_Arab.yaml
│   │   │   │           ├── ntrex_eng_Latn-sna_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-som_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-ssw_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-swa_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-tam_Taml.yaml
│   │   │   │           ├── ntrex_eng_Latn-tel_Telu.yaml
│   │   │   │           ├── ntrex_eng_Latn-tir_Ethi.yaml
│   │   │   │           ├── ntrex_eng_Latn-ton_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-tsn_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-urd_Arab.yaml
│   │   │   │           ├── ntrex_eng_Latn-ven_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-wol_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-xho_Latn.yaml
│   │   │   │           ├── ntrex_eng_Latn-yor_Latn.yaml
│   │   │   │           └── ntrex_eng_Latn-zul_Latn.yaml
│   │   │   ├── openai_mmlu/
│   │   │   │   ├── README.md
│   │   │   │   ├── openai_mmlu.yaml
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── openai_mmlu
│   │   │   │   │   ├── openai_mmlu_ara.yaml
│   │   │   │   │   ├── openai_mmlu_swa.yaml
│   │   │   │   │   └── openai_mmlu_yor.yaml
│   │   │   │   └── utils.py
│   │   │   ├── salt/
│   │   │   │   ├── README.md
│   │   │   │   ├── gen_utils.py
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── salt
│   │   │   │   │   ├── salt_ach-eng.yaml
│   │   │   │   │   ├── salt_eng-ach.yaml
│   │   │   │   │   ├── salt_eng-ibo.yaml
│   │   │   │   │   ├── salt_eng-lgg.yaml
│   │   │   │   │   ├── salt_eng-lug.yaml
│   │   │   │   │   ├── salt_eng-nyn.yaml
│   │   │   │   │   ├── salt_eng-swa.yaml
│   │   │   │   │   ├── salt_eng-teo.yaml
│   │   │   │   │   ├── salt_ibo-eng.yaml
│   │   │   │   │   ├── salt_lgg-eng.yaml
│   │   │   │   │   ├── salt_lug-eng.yaml
│   │   │   │   │   ├── salt_nyn-eng.yaml
│   │   │   │   │   ├── salt_swa-eng.yaml
│   │   │   │   │   └── salt_teo-eng.yaml
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── salt
│   │   │   │   │   ├── salt_ach-eng.yaml
│   │   │   │   │   ├── salt_eng-ach.yaml
│   │   │   │   │   ├── salt_eng-ibo.yaml
│   │   │   │   │   ├── salt_eng-lgg.yaml
│   │   │   │   │   ├── salt_eng-lug.yaml
│   │   │   │   │   ├── salt_eng-nyn.yaml
│   │   │   │   │   ├── salt_eng-swa.yaml
│   │   │   │   │   ├── salt_eng-teo.yaml
│   │   │   │   │   ├── salt_ibo-eng.yaml
│   │   │   │   │   ├── salt_lgg-eng.yaml
│   │   │   │   │   ├── salt_lug-eng.yaml
│   │   │   │   │   ├── salt_nyn-eng.yaml
│   │   │   │   │   ├── salt_swa-eng.yaml
│   │   │   │   │   └── salt_teo-eng.yaml
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── salt
│   │   │   │   │   ├── salt_ach-eng.yaml
│   │   │   │   │   ├── salt_eng-ach.yaml
│   │   │   │   │   ├── salt_eng-ibo.yaml
│   │   │   │   │   ├── salt_eng-lgg.yaml
│   │   │   │   │   ├── salt_eng-lug.yaml
│   │   │   │   │   ├── salt_eng-nyn.yaml
│   │   │   │   │   ├── salt_eng-swa.yaml
│   │   │   │   │   ├── salt_eng-teo.yaml
│   │   │   │   │   ├── salt_ibo-eng.yaml
│   │   │   │   │   ├── salt_lgg-eng.yaml
│   │   │   │   │   ├── salt_lug-eng.yaml
│   │   │   │   │   ├── salt_nyn-eng.yaml
│   │   │   │   │   ├── salt_swa-eng.yaml
│   │   │   │   │   └── salt_teo-eng.yaml
│   │   │   │   └── salt.yaml
│   │   │   ├── sample_run_scripts/
│   │   │   │   ├── run_afrobench.sh
│   │   │   │   └── run_afrobench_lite.sh
│   │   │   ├── sib/
│   │   │   │   ├── README.md
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── sib
│   │   │   │   │   ├── sib_aeb.yaml
│   │   │   │   │   ├── sib_afr.yaml
│   │   │   │   │   ├── sib_aka.yaml
│   │   │   │   │   ├── sib_amh.yaml
│   │   │   │   │   ├── sib_ary.yaml
│   │   │   │   │   ├── sib_arz.yaml
│   │   │   │   │   ├── sib_bam.yaml
│   │   │   │   │   ├── sib_bem.yaml
│   │   │   │   │   ├── sib_cjk.yaml
│   │   │   │   │   ├── sib_dik.yaml
│   │   │   │   │   ├── sib_dyu.yaml
│   │   │   │   │   ├── sib_eng.yaml
│   │   │   │   │   ├── sib_ewe.yaml
│   │   │   │   │   ├── sib_fon.yaml
│   │   │   │   │   ├── sib_fra.yaml
│   │   │   │   │   ├── sib_fuv.yaml
│   │   │   │   │   ├── sib_gaz.yaml
│   │   │   │   │   ├── sib_hau.yaml
│   │   │   │   │   ├── sib_ibo.yaml
│   │   │   │   │   ├── sib_kab.yaml
│   │   │   │   │   ├── sib_kam.yaml
│   │   │   │   │   ├── sib_kbp.yaml
│   │   │   │   │   ├── sib_kea.yaml
│   │   │   │   │   ├── sib_kik.yaml
│   │   │   │   │   ├── sib_kin.yaml
│   │   │   │   │   ├── sib_kmb.yaml
│   │   │   │   │   ├── sib_knc.yaml
│   │   │   │   │   ├── sib_kon.yaml
│   │   │   │   │   ├── sib_lin.yaml
│   │   │   │   │   ├── sib_lua.yaml
│   │   │   │   │   ├── sib_lug.yaml
│   │   │   │   │   ├── sib_luo.yaml
│   │   │   │   │   ├── sib_mos.yaml
│   │   │   │   │   ├── sib_nso.yaml
│   │   │   │   │   ├── sib_nus.yaml
│   │   │   │   │   ├── sib_nya.yaml
│   │   │   │   │   ├── sib_plt.yaml
│   │   │   │   │   ├── sib_por.yaml
│   │   │   │   │   ├── sib_run.yaml
│   │   │   │   │   ├── sib_sag.yaml
│   │   │   │   │   ├── sib_sna.yaml
│   │   │   │   │   ├── sib_som.yaml
│   │   │   │   │   ├── sib_sot.yaml
│   │   │   │   │   ├── sib_ssw.yaml
│   │   │   │   │   ├── sib_swa.yaml
│   │   │   │   │   ├── sib_taq.yaml
│   │   │   │   │   ├── sib_tir.yaml
│   │   │   │   │   ├── sib_tso.yaml
│   │   │   │   │   ├── sib_tum.yaml
│   │   │   │   │   ├── sib_twi.yaml
│   │   │   │   │   ├── sib_tzm.yaml
│   │   │   │   │   ├── sib_umb.yaml
│   │   │   │   │   ├── sib_wol.yaml
│   │   │   │   │   ├── sib_xho.yaml
│   │   │   │   │   ├── sib_yor.yaml
│   │   │   │   │   ├── sib_zul.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── sib.yaml
│   │   │   │   └── utils.py
│   │   │   ├── uhura-arc-easy/
│   │   │   │   ├── README.md
│   │   │   │   ├── prompt_1/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_2/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_3/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_4/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── prompt_5/
│   │   │   │   │   ├── uhura-arc-easy
│   │   │   │   │   ├── uhura-arc-easy_am.yaml
│   │   │   │   │   ├── uhura-arc-easy_en.yaml
│   │   │   │   │   ├── uhura-arc-easy_ha.yaml
│   │   │   │   │   ├── uhura-arc-easy_nso.yaml
│   │   │   │   │   ├── uhura-arc-easy_sw.yaml
│   │   │   │   │   ├── uhura-arc-easy_yo.yaml
│   │   │   │   │   ├── uhura-arc-easy_zu.yaml
│   │   │   │   │   └── utils.py
│   │   │   │   ├── uhura.yaml
│   │   │   │   └── utils.py
│   │   │   └── xlsum/
│   │   │       ├── README.md
│   │   │       ├── prompt_1/
│   │   │       │   ├── utils.py
│   │   │       │   ├── xlsum
│   │   │       │   ├── xlsum_amharic.yaml
│   │   │       │   ├── xlsum_arabic.yaml
│   │   │       │   ├── xlsum_hausa.yaml
│   │   │       │   ├── xlsum_igbo.yaml
│   │   │       │   ├── xlsum_kirundi.yaml
│   │   │       │   ├── xlsum_oromo.yaml
│   │   │       │   ├── xlsum_pidgin.yaml
│   │   │       │   ├── xlsum_somali.yaml
│   │   │       │   ├── xlsum_swahili.yaml
│   │   │       │   ├── xlsum_telugu.yaml
│   │   │       │   ├── xlsum_tigrinya.yaml
│   │   │       │   └── xlsum_yoruba.yaml
│   │   │       ├── prompt_2/
│   │   │       │   ├── utils.py
│   │   │       │   ├── xlsum
│   │   │       │   ├── xlsum_amharic.yaml
│   │   │       │   ├── xlsum_arabic.yaml
│   │   │       │   ├── xlsum_hausa.yaml
│   │   │       │   ├── xlsum_igbo.yaml
│   │   │       │   ├── xlsum_kirundi.yaml
│   │   │       │   ├── xlsum_oromo.yaml
│   │   │       │   ├── xlsum_pidgin.yaml
│   │   │       │   ├── xlsum_somali.yaml
│   │   │       │   ├── xlsum_swahili.yaml
│   │   │       │   ├── xlsum_telugu.yaml
│   │   │       │   ├── xlsum_tigrinya.yaml
│   │   │       │   └── xlsum_yoruba.yaml
│   │   │       ├── prompt_3/
│   │   │       │   ├── utils.py
│   │   │       │   ├── xlsum
│   │   │       │   ├── xlsum_amharic.yaml
│   │   │       │   ├── xlsum_arabic.yaml
│   │   │       │   ├── xlsum_hausa.yaml
│   │   │       │   ├── xlsum_igbo.yaml
│   │   │       │   ├── xlsum_kirundi.yaml
│   │   │       │   ├── xlsum_oromo.yaml
│   │   │       │   ├── xlsum_pidgin.yaml
│   │   │       │   ├── xlsum_somali.yaml
│   │   │       │   ├── xlsum_swahili.yaml
│   │   │       │   ├── xlsum_telugu.yaml
│   │   │       │   ├── xlsum_tigrinya.yaml
│   │   │       │   └── xlsum_yoruba.yaml
│   │   │       ├── utils.py
│   │   │       └── xlsum.yaml
│   │   ├── agieval/
│   │   │   ├── README.md
│   │   │   ├── agieval.yaml
│   │   │   ├── agieval_cn.yaml
│   │   │   ├── agieval_en.yaml
│   │   │   ├── agieval_nous.yaml
│   │   │   ├── aqua-rat.yaml
│   │   │   ├── gaokao-biology.yaml
│   │   │   ├── gaokao-chemistry.yaml
│   │   │   ├── gaokao-chinese.yaml
│   │   │   ├── gaokao-english.yaml
│   │   │   ├── gaokao-geography.yaml
│   │   │   ├── gaokao-history.yaml
│   │   │   ├── gaokao-mathcloze.yaml
│   │   │   ├── gaokao-mathqa.yaml
│   │   │   ├── gaokao-physics.yaml
│   │   │   ├── jec-qa-ca.yaml
│   │   │   ├── jec-qa-kd.yaml
│   │   │   ├── logiqa-en.yaml
│   │   │   ├── logiqa-zh.yaml
│   │   │   ├── lsat-ar.yaml
│   │   │   ├── lsat-lr.yaml
│   │   │   ├── lsat-rc.yaml
│   │   │   ├── math.yaml
│   │   │   ├── sat-en-without-passage.yaml
│   │   │   ├── sat-en.yaml
│   │   │   ├── sat-math.yaml
│   │   │   └── utils.py
│   │   ├── aime/
│   │   │   ├── README.md
│   │   │   ├── aime.yaml
│   │   │   ├── aime24.yaml
│   │   │   ├── aime25.yaml
│   │   │   └── utils.py
│   │   ├── alghafa/
│   │   │   ├── copa_ar/
│   │   │   │   ├── README.md
│   │   │   │   └── copa_ar.yaml
│   │   │   └── piqa_ar/
│   │   │       ├── README.md
│   │   │       └── piqa_ar.yaml
│   │   ├── anli/
│   │   │   ├── README.md
│   │   │   ├── anli_r1.yaml
│   │   │   ├── anli_r2.yaml
│   │   │   └── anli_r3.yaml
│   │   ├── arab_culture/
│   │   │   ├── README.md
│   │   │   ├── _arab_culture.yaml
│   │   │   ├── _arab_culture_gulf.yaml
│   │   │   ├── _arab_culture_levant.yaml
│   │   │   ├── _arab_culture_nile_valley.yaml
│   │   │   ├── _arab_culture_north_africa.yaml
│   │   │   ├── _default_arab_culture_mcq_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── arab_culture_algeria.yaml
│   │   │   ├── arab_culture_egypt.yaml
│   │   │   ├── arab_culture_jordan.yaml
│   │   │   ├── arab_culture_ksa.yaml
│   │   │   ├── arab_culture_lebanon.yaml
│   │   │   ├── arab_culture_libya.yaml
│   │   │   ├── arab_culture_morocco.yaml
│   │   │   ├── arab_culture_palestine.yaml
│   │   │   ├── arab_culture_sudan.yaml
│   │   │   ├── arab_culture_syria.yaml
│   │   │   ├── arab_culture_tunisia.yaml
│   │   │   ├── arab_culture_uae.yaml
│   │   │   ├── arab_culture_yemen.yaml
│   │   │   ├── prompts.py
│   │   │   └── utils_mcq.py
│   │   ├── arab_culture_completion/
│   │   │   ├── README.md
│   │   │   ├── _arab_culture_completion.yaml
│   │   │   ├── _arab_culture_completion_gulf.yaml
│   │   │   ├── _arab_culture_completion_levant.yaml
│   │   │   ├── _arab_culture_completion_nile_valley.yaml
│   │   │   ├── _arab_culture_completion_north_africa.yaml
│   │   │   ├── _default_arab_culture_completion_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── arab_culture_completion_algeria.yaml
│   │   │   ├── arab_culture_completion_egypt.yaml
│   │   │   ├── arab_culture_completion_jordan.yaml
│   │   │   ├── arab_culture_completion_ksa.yaml
│   │   │   ├── arab_culture_completion_lebanon.yaml
│   │   │   ├── arab_culture_completion_libya.yaml
│   │   │   ├── arab_culture_completion_morocco.yaml
│   │   │   ├── arab_culture_completion_palestine.yaml
│   │   │   ├── arab_culture_completion_sudan.yaml
│   │   │   ├── arab_culture_completion_syria.yaml
│   │   │   ├── arab_culture_completion_tunisia.yaml
│   │   │   ├── arab_culture_completion_uae.yaml
│   │   │   ├── arab_culture_completion_yemen.yaml
│   │   │   ├── prompts.py
│   │   │   └── utils_completion.py
│   │   ├── arabic_leaderboard_complete/
│   │   │   ├── README.md
│   │   │   ├── arabic_leaderboard_alghafa/
│   │   │   │   ├── arabic_leaderboard_alghafa.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_mcq_exams_test_ar.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_meta_ar_dialects.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_meta_ar_msa.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_facts_truefalse_balanced_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_grounded_statement_soqal_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_grounded_statement_xglue_mlqa_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_rating_sentiment_no_neutral_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_rating_sentiment_task.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_sentiment_task.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_exams/
│   │   │   │   ├── arabic_exams.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_exams.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mmlu/
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_abstract_algebra.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_anatomy.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_astronomy.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_business_ethics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_clinical_knowledge.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_biology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_chemistry.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_computer_science.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_mathematics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_medicine.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_physics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_computer_security.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_conceptual_physics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_econometrics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_electrical_engineering.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_elementary_mathematics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_formal_logic.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_global_facts.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_biology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_chemistry.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_computer_science.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_european_history.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_geography.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_government_and_politics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_macroeconomics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_mathematics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_microeconomics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_physics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_psychology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_statistics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_us_history.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_world_history.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_human_aging.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_human_sexuality.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_international_law.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_jurisprudence.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_logical_fallacies.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_machine_learning.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_management.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_marketing.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_medical_genetics.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_miscellaneous.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_moral_disputes.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_moral_scenarios.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_nutrition.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_philosophy.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_prehistory.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_accounting.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_law.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_medicine.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_psychology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_public_relations.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_security_studies.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_sociology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_us_foreign_policy.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_virology.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_world_religions.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_arc_challenge/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_arc_challenge.yaml
│   │   │   │   ├── arabic_mt_arc_challenge.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_arc_easy/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_arc_easy.yaml
│   │   │   │   ├── arabic_mt_arc_easy.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_boolq/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_boolq.yaml
│   │   │   │   ├── arabic_mt_boolq.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_copa/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_copa.yaml
│   │   │   │   ├── arabic_mt_copa.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_hellaswag/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_hellaswag.yaml
│   │   │   │   ├── arabic_mt_hellaswag.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_mmlu/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_mmlu.yaml
│   │   │   │   ├── arabic_mt_mmlu.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_openbook_qa/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_openbook_qa.yaml
│   │   │   │   ├── arabic_mt_openbook_qa.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_piqa/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_piqa.yaml
│   │   │   │   ├── arabic_mt_piqa.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_race/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_race.yaml
│   │   │   │   ├── arabic_mt_race.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_sciq/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_sciq.yaml
│   │   │   │   ├── arabic_mt_sciq.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_toxigen/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_toxigen.yaml
│   │   │   │   ├── arabic_mt_toxigen.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_avca/
│   │   │   │   ├── arabic_leaderboard_acva.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Algeria.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Ancient_Egypt.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arab_Empire.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Architecture.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Art.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Astronomy.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Calligraphy.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Ceremony.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Clothing.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Culture.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Food.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Funeral.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Geography.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_History.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Language_Origin.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Literature.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Math.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Medicine.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Music.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Ornament.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Philosophy.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Physics_and_Chemistry.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Wedding.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Bahrain.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Comoros.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Egypt_modern.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromAncientEgypt.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromByzantium.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromChina.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromGreece.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromIslam.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromPersia.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromRome.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Iraq.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islam_Education.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islam_branches_and_schools.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islamic_law_system.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Jordan.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Kuwait.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Lebanon.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Libya.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Mauritania.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Mesopotamia_civilization.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Morocco.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Oman.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Palestine.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Qatar.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Saudi_Arabia.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Somalia.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Sudan.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Syria.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Tunisia.yaml
│   │   │   │   ├── arabic_leaderboard_acva_United_Arab_Emirates.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Yemen.yaml
│   │   │   │   ├── arabic_leaderboard_acva_communication.yaml
│   │   │   │   ├── arabic_leaderboard_acva_computer_and_phone.yaml
│   │   │   │   ├── arabic_leaderboard_acva_daily_life.yaml
│   │   │   │   ├── arabic_leaderboard_acva_entertainment.yaml
│   │   │   │   └── utils.py
│   │   │   └── arabic_leaderboard_complete.yaml
│   │   ├── arabic_leaderboard_light/
│   │   │   ├── README.md
│   │   │   ├── arabic_leaderboard_alghafa_light/
│   │   │   │   ├── arabic_leaderboard_alghafa_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_mcq_exams_test_ar_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_meta_ar_dialects_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_meta_ar_msa_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_facts_truefalse_balanced_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_grounded_statement_soqal_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_grounded_statement_xglue_mlqa_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_rating_sentiment_no_neutral_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_rating_sentiment_task_light.yaml
│   │   │   │   ├── arabic_leaderboard_alghafa_multiple_choice_sentiment_task_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_exams_light/
│   │   │   │   ├── arabic_exams_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_exams_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mmlu_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_abstract_algebra_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_anatomy_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_astronomy_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_business_ethics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_clinical_knowledge_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_biology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_chemistry_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_computer_science_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_mathematics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_medicine_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_college_physics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_computer_security_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_conceptual_physics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_econometrics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_electrical_engineering_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_elementary_mathematics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_formal_logic_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_global_facts_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_biology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_chemistry_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_computer_science_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_european_history_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_geography_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_government_and_politics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_macroeconomics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_mathematics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_microeconomics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_physics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_psychology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_statistics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_us_history_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_high_school_world_history_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_human_aging_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_human_sexuality_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_international_law_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_jurisprudence_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_logical_fallacies_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_machine_learning_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_management_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_marketing_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_medical_genetics_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_miscellaneous_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_moral_disputes_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_moral_scenarios_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_nutrition_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_philosophy_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_prehistory_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_accounting_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_law_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_medicine_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_professional_psychology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_public_relations_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_security_studies_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_sociology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_us_foreign_policy_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_virology_light.yaml
│   │   │   │   ├── arabic_leaderboard_arabic_mmlu_world_religions_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_arc_challenge_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_arc_challenge_light.yaml
│   │   │   │   ├── arabic_mt_arc_challenge_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_arc_easy_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_arc_easy_light.yaml
│   │   │   │   ├── arabic_mt_arc_easy_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_boolq_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_boolq_light.yaml
│   │   │   │   ├── arabic_mt_boolq_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_copa_light/
│   │   │   │   ├── arabic_mt_copa_light.yaml
│   │   │   │   ├── arbic_leaderboard_arabic_mt_copa_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_hellaswag_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_hellaswag_light.yaml
│   │   │   │   ├── arabic_mt_hellaswag_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_mmlu_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_mmlu_light.yaml
│   │   │   │   ├── arabic_mt_mmlu_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_openbook_qa_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_openbook_qa_light.yaml
│   │   │   │   ├── arabic_mt_openbook_qa_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_piqa_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_piqa_light.yaml
│   │   │   │   ├── arabic_mt_piqa_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_race_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_race_light.yaml
│   │   │   │   ├── arabic_mt_race_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_sciq_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_sciq_light.yaml
│   │   │   │   ├── arabic_mt_sciq_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_arabic_mt_toxigen_light/
│   │   │   │   ├── arabic_leaderboard_arabic_mt_toxigen_light.yaml
│   │   │   │   ├── arabic_mt_toxigen_light.yaml
│   │   │   │   └── utils.py
│   │   │   ├── arabic_leaderboard_avca_light/
│   │   │   │   ├── arabic_leaderboard_acva_Algeria_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Ancient_Egypt_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arab_Empire_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Architecture_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Art_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Astronomy_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Calligraphy_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Ceremony_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Clothing_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Culture_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Food_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Funeral_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Geography_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_History_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Language_Origin_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Literature_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Math_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Medicine_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Music_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Ornament_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Philosophy_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Physics_and_Chemistry_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Arabic_Wedding_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Bahrain_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Comoros_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Egypt_modern_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromAncientEgypt_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromByzantium_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromChina_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromGreece_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromIslam_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromPersia_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_InfluenceFromRome_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Iraq_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islam_Education_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islam_branches_and_schools_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Islamic_law_system_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Jordan_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Kuwait_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Lebanon_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Libya_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Mauritania_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Mesopotamia_civilization_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Morocco_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Oman_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Palestine_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Qatar_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Saudi_Arabia_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Somalia_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Sudan_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Syria_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Tunisia_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_United_Arab_Emirates_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_Yemen_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_communication_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_computer_and_phone_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_daily_life_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_entertainment_light.yaml
│   │   │   │   ├── arabic_leaderboard_acva_light.yaml
│   │   │   │   └── utils.py
│   │   │   └── arabic_leaderboard_light.yaml
│   │   ├── arabicmmlu/
│   │   │   ├── README.md
│   │   │   ├── _arabicmmlu.yaml
│   │   │   ├── _arabicmmlu_humanities.yaml
│   │   │   ├── _arabicmmlu_language.yaml
│   │   │   ├── _arabicmmlu_other.yaml
│   │   │   ├── _arabicmmlu_social_science.yaml
│   │   │   ├── _arabicmmlu_stem.yaml
│   │   │   ├── _default_arabicmmlu_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── arabicmmlu_accounting_university.yaml
│   │   │   ├── arabicmmlu_arabic_language_general.yaml
│   │   │   ├── arabicmmlu_arabic_language_grammar.yaml
│   │   │   ├── arabicmmlu_arabic_language_high_school.yaml
│   │   │   ├── arabicmmlu_arabic_language_middle_school.yaml
│   │   │   ├── arabicmmlu_arabic_language_primary_school.yaml
│   │   │   ├── arabicmmlu_biology_high_school.yaml
│   │   │   ├── arabicmmlu_civics_high_school.yaml
│   │   │   ├── arabicmmlu_civics_middle_school.yaml
│   │   │   ├── arabicmmlu_computer_science_high_school.yaml
│   │   │   ├── arabicmmlu_computer_science_middle_school.yaml
│   │   │   ├── arabicmmlu_computer_science_primary_school.yaml
│   │   │   ├── arabicmmlu_computer_science_university.yaml
│   │   │   ├── arabicmmlu_driving_test.yaml
│   │   │   ├── arabicmmlu_economics_high_school.yaml
│   │   │   ├── arabicmmlu_economics_middle_school.yaml
│   │   │   ├── arabicmmlu_economics_university.yaml
│   │   │   ├── arabicmmlu_general_knowledge.yaml
│   │   │   ├── arabicmmlu_general_knowledge_middle_school.yaml
│   │   │   ├── arabicmmlu_general_knowledge_primary_school.yaml
│   │   │   ├── arabicmmlu_geography_high_school.yaml
│   │   │   ├── arabicmmlu_geography_middle_school.yaml
│   │   │   ├── arabicmmlu_geography_primary_school.yaml
│   │   │   ├── arabicmmlu_history_high_school.yaml
│   │   │   ├── arabicmmlu_history_middle_school.yaml
│   │   │   ├── arabicmmlu_history_primary_school.yaml
│   │   │   ├── arabicmmlu_islamic_studies.yaml
│   │   │   ├── arabicmmlu_islamic_studies_high_school.yaml
│   │   │   ├── arabicmmlu_islamic_studies_middle_school.yaml
│   │   │   ├── arabicmmlu_islamic_studies_primary_school.yaml
│   │   │   ├── arabicmmlu_law_professional.yaml
│   │   │   ├── arabicmmlu_management_university.yaml
│   │   │   ├── arabicmmlu_math_primary_school.yaml
│   │   │   ├── arabicmmlu_natural_science_middle_school.yaml
│   │   │   ├── arabicmmlu_natural_science_primary_school.yaml
│   │   │   ├── arabicmmlu_philosophy_high_school.yaml
│   │   │   ├── arabicmmlu_physics_high_school.yaml
│   │   │   ├── arabicmmlu_political_science_university.yaml
│   │   │   ├── arabicmmlu_social_science_middle_school.yaml
│   │   │   ├── arabicmmlu_social_science_primary_school.yaml
│   │   │   └── utils.py
│   │   ├── aradice/
│   │   │   ├── ArabicMMLU/
│   │   │   │   ├── EGY/
│   │   │   │   │   ├── AraDiCE_ArabicMMLU.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_humanities_history.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_humanities_islamic-studies.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_humanities_philosophy.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_language_arabic-language.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_social-science_civics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_social-science_economics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_social-science_geography.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_stem_biology.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_stem_computer-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_high_stem_physics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_humanities_history.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_humanities_islamic-studies.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_language_arabic-language.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_other_general-knowledge.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_social-science_civics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_social-science_economics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_social-science_geography.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_social-science_social-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_stem_computer-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_middle_stem_natural-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_humanities_islamic-studies.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_language_arabic-language-general.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_language_arabic-language-grammar.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_other_driving-test.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_na_other_general-knowledge.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_humanities_history.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_humanities_islamic-studies.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_language_arabic-language.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_other_general-knowledge.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_social-science_geography.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_social-science_social-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_stem_computer-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_stem_math.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_primary_stem_natural-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_prof_humanities_law.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_other_management.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_social-science_accounting.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_social-science_economics.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_social-science_political-science.yaml
│   │   │   │   │   ├── AraDiCE_ArabicMMLU_univ_stem_computer-science.yaml
│   │   │   │   │   ├── _default_template_yaml
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   └── utils.py
│   │   │   │   └── LEV/
│   │   │   │       ├── AraDiCE_ArabicMMLU.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_humanities_history.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_humanities_islamic-studies.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_humanities_philosophy.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_language_arabic-language.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_social-science_civics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_social-science_economics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_social-science_geography.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_stem_biology.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_stem_computer-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_high_stem_physics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_humanities_history.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_humanities_islamic-studies.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_language_arabic-language.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_other_general-knowledge.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_social-science_civics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_social-science_economics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_social-science_geography.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_social-science_social-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_stem_computer-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_middle_stem_natural-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_humanities_islamic-studies.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_language_arabic-language-general.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_language_arabic-language-grammar.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_other_driving-test.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_na_other_general-knowledge.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_humanities_history.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_humanities_islamic-studies.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_language_arabic-language.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_other_general-knowledge.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_social-science_geography.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_social-science_social-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_stem_computer-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_stem_math.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_primary_stem_natural-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_prof_humanities_law.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_other_management.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_social-science_accounting.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_social-science_economics.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_social-science_political-science.yaml
│   │   │   │       ├── AraDiCE_ArabicMMLU_univ_stem_computer-science.yaml
│   │   │   │       ├── _default_template_yaml
│   │   │   │       ├── metrics.py
│   │   │   │       └── utils.py
│   │   │   ├── README.md
│   │   │   ├── aradice.yaml
│   │   │   ├── boolq/
│   │   │   │   ├── EGY/
│   │   │   │   │   ├── boolq_egy.yaml
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   └── utils.py
│   │   │   │   ├── ENG/
│   │   │   │   │   ├── boolq_eng.yaml
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   └── utils.py
│   │   │   │   ├── LEV/
│   │   │   │   │   ├── boolq_lev.yaml
│   │   │   │   │   ├── metrics.py
│   │   │   │   │   └── utils.py
│   │   │   │   └── MSA/
│   │   │   │       ├── boolq_msa.yaml
│   │   │   │       ├── metrics.py
│   │   │   │       └── utils.py
│   │   │   ├── cultural-benchmark/
│   │   │   │   ├── egypt.yaml
│   │   │   │   ├── jordan.yaml
│   │   │   │   ├── lebanon.yaml
│   │   │   │   ├── metrics.py
│   │   │   │   ├── palestine.yaml
│   │   │   │   ├── qatar.yaml
│   │   │   │   ├── syria.yaml
│   │   │   │   └── utils.py
│   │   │   ├── openbookqa/
│   │   │   │   ├── metrics.py
│   │   │   │   ├── openbookqa_egy.yaml
│   │   │   │   ├── openbookqa_eng.yaml
│   │   │   │   ├── openbookqa_lev.yaml
│   │   │   │   ├── openbookqa_msa.yaml
│   │   │   │   └── utils.py
│   │   │   ├── piqa/
│   │   │   │   ├── metrics.py
│   │   │   │   ├── piqa_egy.yaml
│   │   │   │   ├── piqa_eng.yaml
│   │   │   │   ├── piqa_lev.yaml
│   │   │   │   └── piqa_msa.yaml
│   │   │   ├── truthfulqa_mcq/
│   │   │   │   ├── metrics.py
│   │   │   │   ├── truthfulqa_mc1_egy.yaml
│   │   │   │   ├── truthfulqa_mc1_eng.yaml
│   │   │   │   ├── truthfulqa_mc1_lev.yaml
│   │   │   │   └── truthfulqa_mc1_msa.yaml
│   │   │   └── winogrande/
│   │   │       ├── metrics.py
│   │   │       ├── utils.py
│   │   │       ├── winogrande_egy.yaml
│   │   │       ├── winogrande_eng.yaml
│   │   │       ├── winogrande_lev.yaml
│   │   │       └── winogrande_msa.yaml
│   │   ├── arc/
│   │   │   ├── README.md
│   │   │   ├── arc_challenge.yaml
│   │   │   ├── arc_challenge_chat.yaml
│   │   │   └── arc_easy.yaml
│   │   ├── arc_mt/
│   │   │   ├── README.md
│   │   │   ├── arc_challenge_mt_da.yaml
│   │   │   ├── arc_challenge_mt_de.yaml
│   │   │   ├── arc_challenge_mt_el.yaml
│   │   │   ├── arc_challenge_mt_es.yaml
│   │   │   ├── arc_challenge_mt_fi.yaml
│   │   │   ├── arc_challenge_mt_hu.yaml
│   │   │   ├── arc_challenge_mt_is.yaml
│   │   │   ├── arc_challenge_mt_it.yaml
│   │   │   ├── arc_challenge_mt_nb.yaml
│   │   │   ├── arc_challenge_mt_pl.yaml
│   │   │   ├── arc_challenge_mt_pt.yaml
│   │   │   └── arc_challenge_mt_sv.yaml
│   │   ├── arithmetic/
│   │   │   ├── README.md
│   │   │   ├── arithmetic_1dc.yaml
│   │   │   ├── arithmetic_2da.yaml
│   │   │   ├── arithmetic_2dm.yaml
│   │   │   ├── arithmetic_2ds.yaml
│   │   │   ├── arithmetic_3da.yaml
│   │   │   ├── arithmetic_3ds.yaml
│   │   │   ├── arithmetic_4da.yaml
│   │   │   ├── arithmetic_4ds.yaml
│   │   │   ├── arithmetic_5da.yaml
│   │   │   └── arithmetic_5ds.yaml
│   │   ├── asdiv/
│   │   │   ├── README.md
│   │   │   ├── asdiv-cot-llama.yaml
│   │   │   └── default.yaml
│   │   ├── babi/
│   │   │   ├── README.md
│   │   │   └── babi.yaml
│   │   ├── babilong/
│   │   │   ├── README.md
│   │   │   ├── _babilong_common_yaml
│   │   │   ├── babilong.yaml
│   │   │   ├── babilong_longctx.yaml
│   │   │   ├── babilong_qa1.yaml
│   │   │   ├── babilong_qa10.yaml
│   │   │   ├── babilong_qa11.yaml
│   │   │   ├── babilong_qa12.yaml
│   │   │   ├── babilong_qa13.yaml
│   │   │   ├── babilong_qa14.yaml
│   │   │   ├── babilong_qa15.yaml
│   │   │   ├── babilong_qa16.yaml
│   │   │   ├── babilong_qa17.yaml
│   │   │   ├── babilong_qa18.yaml
│   │   │   ├── babilong_qa19.yaml
│   │   │   ├── babilong_qa2.yaml
│   │   │   ├── babilong_qa20.yaml
│   │   │   ├── babilong_qa3.yaml
│   │   │   ├── babilong_qa4.yaml
│   │   │   ├── babilong_qa5.yaml
│   │   │   ├── babilong_qa6.yaml
│   │   │   ├── babilong_qa7.yaml
│   │   │   ├── babilong_qa8.yaml
│   │   │   ├── babilong_qa9.yaml
│   │   │   └── common_utils.py
│   │   ├── bangla/
│   │   │   ├── README.md
│   │   │   ├── bangla_boolqa.yaml
│   │   │   ├── bangla_commonsenseqa.yaml
│   │   │   ├── bangla_mmlu.yaml
│   │   │   ├── bangla_openbookqa.yaml
│   │   │   └── bangla_piqa.yaml
│   │   ├── basque_bench/
│   │   │   ├── README.md
│   │   │   ├── arc_eu_challenge.yaml
│   │   │   ├── arc_eu_easy.yaml
│   │   │   ├── basque_bench.yaml
│   │   │   ├── flores_eu/
│   │   │   │   ├── _flores_common_yaml
│   │   │   │   ├── create_yamls_flores_eu.py
│   │   │   │   ├── flores_ca-eu.yaml
│   │   │   │   ├── flores_de-eu.yaml
│   │   │   │   ├── flores_en-eu.yaml
│   │   │   │   ├── flores_es-eu.yaml
│   │   │   │   ├── flores_eu-ca.yaml
│   │   │   │   ├── flores_eu-de.yaml
│   │   │   │   ├── flores_eu-en.yaml
│   │   │   │   ├── flores_eu-es.yaml
│   │   │   │   ├── flores_eu-fr.yaml
│   │   │   │   ├── flores_eu-gl.yaml
│   │   │   │   ├── flores_eu-it.yaml
│   │   │   │   ├── flores_eu-pt.yaml
│   │   │   │   ├── flores_eu.yaml
│   │   │   │   ├── flores_fr-eu.yaml
│   │   │   │   ├── flores_gl-eu.yaml
│   │   │   │   ├── flores_it-eu.yaml
│   │   │   │   └── flores_pt-eu.yaml
│   │   │   ├── mgsm_direct_eu.yaml
│   │   │   ├── mgsm_native_cot_eu.yaml
│   │   │   ├── paws_eu.yaml
│   │   │   ├── piqa_eu.yaml
│   │   │   ├── utils.py
│   │   │   ├── wnli_eu.yaml
│   │   │   └── xcopa_eu.yaml
│   │   ├── basqueglue/
│   │   │   ├── README.md
│   │   │   ├── bec.yaml
│   │   │   ├── bhtc.yaml
│   │   │   ├── coref.yaml
│   │   │   ├── qnli.yaml
│   │   │   ├── utils.py
│   │   │   ├── vaxx.yaml
│   │   │   └── wic.yaml
│   │   ├── bbh/
│   │   │   ├── README.md
│   │   │   ├── _generate_configs.py
│   │   │   ├── cot_fewshot/
│   │   │   │   ├── _bbh.yaml
│   │   │   │   ├── _bbh_cot_fewshot.yaml
│   │   │   │   ├── _cot_fewshot_template_yaml
│   │   │   │   ├── boolean_expressions.yaml
│   │   │   │   ├── causal_judgement.yaml
│   │   │   │   ├── date_understanding.yaml
│   │   │   │   ├── disambiguation_qa.yaml
│   │   │   │   ├── dyck_languages.yaml
│   │   │   │   ├── formal_fallacies.yaml
│   │   │   │   ├── geometric_shapes.yaml
│   │   │   │   ├── hyperbaton.yaml
│   │   │   │   ├── logical_deduction_five_objects.yaml
│   │   │   │   ├── logical_deduction_seven_objects.yaml
│   │   │   │   ├── logical_deduction_three_objects.yaml
│   │   │   │   ├── movie_recommendation.yaml
│   │   │   │   ├── multistep_arithmetic_two.yaml
│   │   │   │   ├── navigate.yaml
│   │   │   │   ├── object_counting.yaml
│   │   │   │   ├── penguins_in_a_table.yaml
│   │   │   │   ├── reasoning_about_colored_objects.yaml
│   │   │   │   ├── ruin_names.yaml
│   │   │   │   ├── salient_translation_error_detection.yaml
│   │   │   │   ├── snarks.yaml
│   │   │   │   ├── sports_understanding.yaml
│   │   │   │   ├── temporal_sequences.yaml
│   │   │   │   ├── tracking_shuffled_objects_five_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_seven_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_three_objects.yaml
│   │   │   │   ├── web_of_lies.yaml
│   │   │   │   └── word_sorting.yaml
│   │   │   ├── cot_zeroshot/
│   │   │   │   ├── _bbh_cot_zeroshot.yaml
│   │   │   │   ├── _cot_zeroshot_template_yaml
│   │   │   │   ├── boolean_expressions.yaml
│   │   │   │   ├── causal_judgement.yaml
│   │   │   │   ├── date_understanding.yaml
│   │   │   │   ├── disambiguation_qa.yaml
│   │   │   │   ├── dyck_languages.yaml
│   │   │   │   ├── formal_fallacies.yaml
│   │   │   │   ├── geometric_shapes.yaml
│   │   │   │   ├── hyperbaton.yaml
│   │   │   │   ├── logical_deduction_five_objects.yaml
│   │   │   │   ├── logical_deduction_seven_objects.yaml
│   │   │   │   ├── logical_deduction_three_objects.yaml
│   │   │   │   ├── movie_recommendation.yaml
│   │   │   │   ├── multistep_arithmetic_two.yaml
│   │   │   │   ├── navigate.yaml
│   │   │   │   ├── object_counting.yaml
│   │   │   │   ├── penguins_in_a_table.yaml
│   │   │   │   ├── reasoning_about_colored_objects.yaml
│   │   │   │   ├── ruin_names.yaml
│   │   │   │   ├── salient_translation_error_detection.yaml
│   │   │   │   ├── snarks.yaml
│   │   │   │   ├── sports_understanding.yaml
│   │   │   │   ├── temporal_sequences.yaml
│   │   │   │   ├── tracking_shuffled_objects_five_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_seven_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_three_objects.yaml
│   │   │   │   ├── utils.py
│   │   │   │   ├── web_of_lies.yaml
│   │   │   │   └── word_sorting.yaml
│   │   │   ├── fewshot/
│   │   │   │   ├── _bbh_fewshot.yaml
│   │   │   │   ├── _fewshot_template_yaml
│   │   │   │   ├── boolean_expressions.yaml
│   │   │   │   ├── causal_judgement.yaml
│   │   │   │   ├── date_understanding.yaml
│   │   │   │   ├── disambiguation_qa.yaml
│   │   │   │   ├── dyck_languages.yaml
│   │   │   │   ├── formal_fallacies.yaml
│   │   │   │   ├── geometric_shapes.yaml
│   │   │   │   ├── hyperbaton.yaml
│   │   │   │   ├── logical_deduction_five_objects.yaml
│   │   │   │   ├── logical_deduction_seven_objects.yaml
│   │   │   │   ├── logical_deduction_three_objects.yaml
│   │   │   │   ├── movie_recommendation.yaml
│   │   │   │   ├── multistep_arithmetic_two.yaml
│   │   │   │   ├── navigate.yaml
│   │   │   │   ├── object_counting.yaml
│   │   │   │   ├── penguins_in_a_table.yaml
│   │   │   │   ├── reasoning_about_colored_objects.yaml
│   │   │   │   ├── ruin_names.yaml
│   │   │   │   ├── salient_translation_error_detection.yaml
│   │   │   │   ├── snarks.yaml
│   │   │   │   ├── sports_understanding.yaml
│   │   │   │   ├── temporal_sequences.yaml
│   │   │   │   ├── tracking_shuffled_objects_five_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_seven_objects.yaml
│   │   │   │   ├── tracking_shuffled_objects_three_objects.yaml
│   │   │   │   ├── web_of_lies.yaml
│   │   │   │   └── word_sorting.yaml
│   │   │   └── zeroshot/
│   │   │       ├── _bbh_zeroshot.yaml
│   │   │       ├── _zeroshot_template_yaml
│   │   │       ├── boolean_expressions.yaml
│   │   │       ├── causal_judgement.yaml
│   │   │       ├── date_understanding.yaml
│   │   │       ├── disambiguation_qa.yaml
│   │   │       ├── dyck_languages.yaml
│   │   │       ├── formal_fallacies.yaml
│   │   │       ├── geometric_shapes.yaml
│   │   │       ├── hyperbaton.yaml
│   │   │       ├── logical_deduction_five_objects.yaml
│   │   │       ├── logical_deduction_seven_objects.yaml
│   │   │       ├── logical_deduction_three_objects.yaml
│   │   │       ├── movie_recommendation.yaml
│   │   │       ├── multistep_arithmetic_two.yaml
│   │   │       ├── navigate.yaml
│   │   │       ├── object_counting.yaml
│   │   │       ├── penguins_in_a_table.yaml
│   │   │       ├── reasoning_about_colored_objects.yaml
│   │   │       ├── ruin_names.yaml
│   │   │       ├── salient_translation_error_detection.yaml
│   │   │       ├── snarks.yaml
│   │   │       ├── sports_understanding.yaml
│   │   │       ├── temporal_sequences.yaml
│   │   │       ├── tracking_shuffled_objects_five_objects.yaml
│   │   │       ├── tracking_shuffled_objects_seven_objects.yaml
│   │   │       ├── tracking_shuffled_objects_three_objects.yaml
│   │   │       ├── utils.py
│   │   │       ├── web_of_lies.yaml
│   │   │       └── word_sorting.yaml
│   │   ├── bbq/
│   │   │   ├── README.md
│   │   │   ├── bbq_generate.yaml
│   │   │   ├── bbq_generate_ambig.yaml
│   │   │   ├── bbq_generate_disambig.yaml
│   │   │   ├── bbq_multiple_choice.yaml
│   │   │   ├── bbq_multiple_choice_ambig.yaml
│   │   │   ├── bbq_multiple_choice_disambig.yaml
│   │   │   └── utils.py
│   │   ├── bear/
│   │   │   ├── README.md
│   │   │   ├── bear.yaml
│   │   │   └── bear_big.yaml
│   │   ├── belebele/
│   │   │   ├── README.md
│   │   │   ├── _belebele.yaml
│   │   │   ├── _default_template_yaml
│   │   │   ├── _generate_configs.py
│   │   │   ├── belebele_acm_Arab.yaml
│   │   │   ├── belebele_afr_Latn.yaml
│   │   │   ├── belebele_als_Latn.yaml
│   │   │   ├── belebele_amh_Ethi.yaml
│   │   │   ├── belebele_apc_Arab.yaml
│   │   │   ├── belebele_arb_Arab.yaml
│   │   │   ├── belebele_arb_Latn.yaml
│   │   │   ├── belebele_ars_Arab.yaml
│   │   │   ├── belebele_ary_Arab.yaml
│   │   │   ├── belebele_arz_Arab.yaml
│   │   │   ├── belebele_asm_Beng.yaml
│   │   │   ├── belebele_azj_Latn.yaml
│   │   │   ├── belebele_bam_Latn.yaml
│   │   │   ├── belebele_ben_Beng.yaml
│   │   │   ├── belebele_ben_Latn.yaml
│   │   │   ├── belebele_bod_Tibt.yaml
│   │   │   ├── belebele_bul_Cyrl.yaml
│   │   │   ├── belebele_cat_Latn.yaml
│   │   │   ├── belebele_ceb_Latn.yaml
│   │   │   ├── belebele_ces_Latn.yaml
│   │   │   ├── belebele_ckb_Arab.yaml
│   │   │   ├── belebele_dan_Latn.yaml
│   │   │   ├── belebele_deu_Latn.yaml
│   │   │   ├── belebele_ell_Grek.yaml
│   │   │   ├── belebele_eng_Latn.yaml
│   │   │   ├── belebele_est_Latn.yaml
│   │   │   ├── belebele_eus_Latn.yaml
│   │   │   ├── belebele_fin_Latn.yaml
│   │   │   ├── belebele_fra_Latn.yaml
│   │   │   ├── belebele_fuv_Latn.yaml
│   │   │   ├── belebele_gaz_Latn.yaml
│   │   │   ├── belebele_grn_Latn.yaml
│   │   │   ├── belebele_guj_Gujr.yaml
│   │   │   ├── belebele_hat_Latn.yaml
│   │   │   ├── belebele_hau_Latn.yaml
│   │   │   ├── belebele_heb_Hebr.yaml
│   │   │   ├── belebele_hin_Deva.yaml
│   │   │   ├── belebele_hin_Latn.yaml
│   │   │   ├── belebele_hrv_Latn.yaml
│   │   │   ├── belebele_hun_Latn.yaml
│   │   │   ├── belebele_hye_Armn.yaml
│   │   │   ├── belebele_ibo_Latn.yaml
│   │   │   ├── belebele_ilo_Latn.yaml
│   │   │   ├── belebele_ind_Latn.yaml
│   │   │   ├── belebele_isl_Latn.yaml
│   │   │   ├── belebele_ita_Latn.yaml
│   │   │   ├── belebele_jav_Latn.yaml
│   │   │   ├── belebele_jpn_Jpan.yaml
│   │   │   ├── belebele_kac_Latn.yaml
│   │   │   ├── belebele_kan_Knda.yaml
│   │   │   ├── belebele_kat_Geor.yaml
│   │   │   ├── belebele_kaz_Cyrl.yaml
│   │   │   ├── belebele_kea_Latn.yaml
│   │   │   ├── belebele_khk_Cyrl.yaml
│   │   │   ├── belebele_khm_Khmr.yaml
│   │   │   ├── belebele_kin_Latn.yaml
│   │   │   ├── belebele_kir_Cyrl.yaml
│   │   │   ├── belebele_kor_Hang.yaml
│   │   │   ├── belebele_lao_Laoo.yaml
│   │   │   ├── belebele_lin_Latn.yaml
│   │   │   ├── belebele_lit_Latn.yaml
│   │   │   ├── belebele_lug_Latn.yaml
│   │   │   ├── belebele_luo_Latn.yaml
│   │   │   ├── belebele_lvs_Latn.yaml
│   │   │   ├── belebele_mal_Mlym.yaml
│   │   │   ├── belebele_mar_Deva.yaml
│   │   │   ├── belebele_mkd_Cyrl.yaml
│   │   │   ├── belebele_mlt_Latn.yaml
│   │   │   ├── belebele_mri_Latn.yaml
│   │   │   ├── belebele_mya_Mymr.yaml
│   │   │   ├── belebele_nld_Latn.yaml
│   │   │   ├── belebele_nob_Latn.yaml
│   │   │   ├── belebele_npi_Deva.yaml
│   │   │   ├── belebele_npi_Latn.yaml
│   │   │   ├── belebele_nso_Latn.yaml
│   │   │   ├── belebele_nya_Latn.yaml
│   │   │   ├── belebele_ory_Orya.yaml
│   │   │   ├── belebele_pan_Guru.yaml
│   │   │   ├── belebele_pbt_Arab.yaml
│   │   │   ├── belebele_pes_Arab.yaml
│   │   │   ├── belebele_plt_Latn.yaml
│   │   │   ├── belebele_pol_Latn.yaml
│   │   │   ├── belebele_por_Latn.yaml
│   │   │   ├── belebele_ron_Latn.yaml
│   │   │   ├── belebele_rus_Cyrl.yaml
│   │   │   ├── belebele_shn_Mymr.yaml
│   │   │   ├── belebele_sin_Latn.yaml
│   │   │   ├── belebele_sin_Sinh.yaml
│   │   │   ├── belebele_slk_Latn.yaml
│   │   │   ├── belebele_slv_Latn.yaml
│   │   │   ├── belebele_sna_Latn.yaml
│   │   │   ├── belebele_snd_Arab.yaml
│   │   │   ├── belebele_som_Latn.yaml
│   │   │   ├── belebele_sot_Latn.yaml
│   │   │   ├── belebele_spa_Latn.yaml
│   │   │   ├── belebele_srp_Cyrl.yaml
│   │   │   ├── belebele_ssw_Latn.yaml
│   │   │   ├── belebele_sun_Latn.yaml
│   │   │   ├── belebele_swe_Latn.yaml
│   │   │   ├── belebele_swh_Latn.yaml
│   │   │   ├── belebele_tam_Taml.yaml
│   │   │   ├── belebele_tel_Telu.yaml
│   │   │   ├── belebele_tgk_Cyrl.yaml
│   │   │   ├── belebele_tgl_Latn.yaml
│   │   │   ├── belebele_tha_Thai.yaml
│   │   │   ├── belebele_tir_Ethi.yaml
│   │   │   ├── belebele_tsn_Latn.yaml
│   │   │   ├── belebele_tso_Latn.yaml
│   │   │   ├── belebele_tur_Latn.yaml
│   │   │   ├── belebele_ukr_Cyrl.yaml
│   │   │   ├── belebele_urd_Arab.yaml
│   │   │   ├── belebele_urd_Latn.yaml
│   │   │   ├── belebele_uzn_Latn.yaml
│   │   │   ├── belebele_vie_Latn.yaml
│   │   │   ├── belebele_war_Latn.yaml
│   │   │   ├── belebele_wol_Latn.yaml
│   │   │   ├── belebele_xho_Latn.yaml
│   │   │   ├── belebele_yor_Latn.yaml
│   │   │   ├── belebele_zho_Hans.yaml
│   │   │   ├── belebele_zho_Hant.yaml
│   │   │   ├── belebele_zsm_Latn.yaml
│   │   │   └── belebele_zul_Latn.yaml
│   │   ├── benchmarks/
│   │   │   ├── README.md
│   │   │   ├── flan/
│   │   │   │   ├── _held_in_template_yaml
│   │   │   │   ├── flan_held_in.yaml
│   │   │   │   └── flan_held_out.yaml
│   │   │   ├── minerva_math.yaml
│   │   │   ├── multimedqa/
│   │   │   │   ├── README.md
│   │   │   │   └── multimedqa.yaml
│   │   │   ├── openllm.yaml
│   │   │   ├── pythia.yaml
│   │   │   └── t0_eval.yaml
│   │   ├── bertaqa/
│   │   │   ├── README.md
│   │   │   ├── _bertaqa_template
│   │   │   ├── bertaqa_en.yaml
│   │   │   ├── bertaqa_en_mt_gemma-7b.yaml
│   │   │   ├── bertaqa_en_mt_hitz.yaml
│   │   │   ├── bertaqa_en_mt_itzuli.yaml
│   │   │   ├── bertaqa_en_mt_latxa-13b-v1.1.yaml
│   │   │   ├── bertaqa_en_mt_latxa-13b-v1.yaml
│   │   │   ├── bertaqa_en_mt_latxa-70b-v1.1.yaml
│   │   │   ├── bertaqa_en_mt_latxa-70b-v1.yaml
│   │   │   ├── bertaqa_en_mt_latxa-7b-v1.1.yaml
│   │   │   ├── bertaqa_en_mt_latxa-7b-v1.yaml
│   │   │   ├── bertaqa_en_mt_llama-2-13b.yaml
│   │

SYMBOL INDEX (4474 symbols across 720 files)

FILE: examples/transformer-lens.py
  function evaluate_lm_eval (line 12) | def evaluate_lm_eval(lens_model: HookedTransformer, tasks: list[str], **...

FILE: lm_eval/__init__.py
  function __getattr__ (line 17) | def __getattr__(name):

FILE: lm_eval/__main__.py
  function cli_evaluate (line 5) | def cli_evaluate() -> None:

FILE: lm_eval/_cli/harness.py
  class HarnessCLI (line 10) | class HarnessCLI:
    method __init__ (line 13) | def __init__(self):
    method parse_args (line 46) | def parse_args(self) -> argparse.Namespace:
    method execute (line 58) | def execute(self, args: argparse.Namespace) -> None:

FILE: lm_eval/_cli/ls.py
  class List (line 7) | class List(SubCommand):
    method __init__ (line 10) | def __init__(self, subparsers: argparse._SubParsersAction, *args, **kw...
    method _add_args (line 51) | def _add_args(self) -> None:
    method _execute (line 66) | def _execute(self, args: argparse.Namespace) -> None:

FILE: lm_eval/_cli/run.py
  class Run (line 18) | class Run(SubCommand):
    method __init__ (line 21) | def __init__(self, subparsers: argparse._SubParsersAction, *args, **kw...
    method _add_args (line 49) | def _add_args(self) -> None:
    method _execute (line 338) | def _execute(args: argparse.Namespace) -> None:

FILE: lm_eval/_cli/subcommand.py
  class SubCommand (line 5) | class SubCommand(ABC):
    method __init__ (line 8) | def __init__(self, *args, **kwargs):
    method create (line 12) | def create(cls, subparsers: argparse._SubParsersAction):
    method _add_args (line 17) | def _add_args(self) -> None:

FILE: lm_eval/_cli/utils.py
  function try_parse_json (line 12) | def try_parse_json(value: str | dict[str, Any] | None) -> str | dict[str...
  function _int_or_none_list_arg_type (line 28) | def _int_or_none_list_arg_type(
  function request_caching_arg_to_dict (line 66) | def request_caching_arg_to_dict(cache_requests: str | None) -> dict[str,...
  function check_argument_types (line 81) | def check_argument_types(parser: argparse.ArgumentParser) -> None:
  function handle_cli_value_string (line 95) | def handle_cli_value_string(arg: str) -> bool | int | float | str:
  function key_val_to_dict (line 111) | def key_val_to_dict(args: str) -> dict[str, Any]:
  class MergeDictAction (line 125) | class MergeDictAction(argparse.Action):
    method __call__ (line 128) | def __call__(
  class SplitArgs (line 159) | class SplitArgs(argparse.Action):
    method __call__ (line 160) | def __call__(self, parser, namespace, values, option_string=None):

FILE: lm_eval/_cli/validate.py
  class Validate (line 8) | class Validate(SubCommand):
    method __init__ (line 11) | def __init__(self, subparsers: argparse._SubParsersAction, *args, **kw...
    method _add_args (line 78) | def _add_args(self) -> None:
    method _execute (line 95) | def _execute(self, args: argparse.Namespace) -> None:

FILE: lm_eval/api/filter.py
  class Filter (line 8) | class Filter(ABC):
    method __init__ (line 17) | def __init__(self, **kwargs) -> None:
    method apply (line 23) | def apply(self, resps: Union[List, Iterable], docs: List[dict]) -> Ite...
  class FilterEnsemble (line 34) | class FilterEnsemble:
    method apply (line 45) | def apply(self, instances: List[Instance]) -> None:

FILE: lm_eval/api/group.py
  class Group (line 34) | class Group:
    method add (line 61) | def add(self, item: Task | Group) -> None:
    method pop (line 69) | def pop(self, name: str) -> Group | Task | None:
    method get (line 73) | def get(self, name: str) -> Task | Group | None:
    method __contains__ (line 77) | def __contains__(self, name: str) -> bool:
    method __iter__ (line 81) | def __iter__(self):
    method __len__ (line 85) | def __len__(self) -> int:
    method get_all_tasks (line 91) | def get_all_tasks(self, recursive: bool = True) -> list[Task]:
    method get_all_groups (line 112) | def get_all_groups(self, recursive: bool = True) -> list[Group]:
    method child_names (line 132) | def child_names(self) -> list[str]:
    method version (line 137) | def version(self) -> str:
    method has_aggregation (line 142) | def has_aggregation(self) -> bool:
    method _discover_filters_for_metric (line 149) | def _discover_filters_for_metric(
    method aggregate (line 183) | def aggregate(self, task_metrics: dict[str, _TaskMetrics]) -> _TaskMet...
    method to_dict (line 285) | def to_dict(self) -> dict[str, Any] | None:
    method from_config (line 303) | def from_config(cls, config: GroupConfig | dict[str, Any]) -> Group:
    method __repr__ (line 323) | def __repr__(self):
  class ConfigurableGroup (line 333) | class ConfigurableGroup(Group):
    method __init__ (line 336) | def __init__(self, config: dict | GroupConfig | None = None) -> None:
    method group (line 350) | def group(self):
    method group_alias (line 354) | def group_alias(self):
    method version (line 358) | def version(self) -> str:
    method config (line 364) | def config(self):
    method group_name (line 368) | def group_name(self):
    method from_group (line 372) | def from_group(cls, group: Group) -> ConfigurableGroup:
    method __eq__ (line 385) | def __eq__(self, other):
    method __hash__ (line 390) | def __hash__(self):
    method __repr__ (line 393) | def __repr__(self):

FILE: lm_eval/api/instance.py
  class Instance (line 11) | class Instance:
    method __post_init__ (line 27) | def __post_init__(self) -> None:
    method args (line 32) | def args(self):

FILE: lm_eval/api/metrics.py
  function bypass_agg (line 23) | def bypass_agg(arr):
  function nanmean (line 28) | def nanmean(arr):
  function mean (line 35) | def mean(arr):
  function median (line 40) | def median(arr):
  function perplexity (line 47) | def perplexity(items):
  function weighted_perplexity (line 52) | def weighted_perplexity(items):
  function bits_per_byte (line 57) | def bits_per_byte(items):
  function f1_score (line 62) | def f1_score(items):
  function matthews_corrcoef (line 74) | def matthews_corrcoef(items):
  function bleu (line 84) | def bleu(items):
  function chrf (line 102) | def chrf(items):
  function ter (line 117) | def ter(items):
  function brier_score (line 133) | def brier_score(items):  # This is a passthrough function
  function brier_score_fn (line 148) | def brier_score_fn(items):  # This is a passthrough function
  function acc_fn (line 158) | def acc_fn(items):  # This is a passthrough function
  function acc_norm_fn (line 168) | def acc_norm_fn(items):  # This is a passthrough function
  function acc_mutual_info_fn (line 178) | def acc_mutual_info_fn(items):  # This is a passthrough function
  function acc_bytes_fn (line 188) | def acc_bytes_fn(items):  # This is a passthrough function
  function exact_match_hf_evaluate (line 210) | def exact_match_hf_evaluate(
  function exact_match_fn (line 254) | def exact_match_fn(**kwargs):
  function perplexity_fn (line 264) | def perplexity_fn(items):  # This is a passthrough function
  function likelihood_fn (line 274) | def likelihood_fn(items):  # This is a passthrough function
  function word_perplexity_fn (line 284) | def word_perplexity_fn(items):  # This is a passthrough function
  function byte_perplexity_fn (line 294) | def byte_perplexity_fn(items):  # This is a passthrough function
  function bits_per_byte_fn (line 304) | def bits_per_byte_fn(items):  # This is a passthrough function
  function pop_stddev (line 308) | def pop_stddev(arr):
  function sample_stddev (line 313) | def sample_stddev(arr: Sequence[T]) -> float:
  function mean_stderr (line 318) | def mean_stderr(arr):
  function bypass (line 328) | def bypass(items):
  function mcc_fn (line 338) | def mcc_fn(items):  # This is a passthrough function
  function f1_fn (line 348) | def f1_fn(items):  # This is a passthrough function
  function bleu_fn (line 358) | def bleu_fn(items):  # This is a passthrough function
  function chrf_fn (line 368) | def chrf_fn(items):  # This is a passthrough function
  function ter_fn (line 378) | def ter_fn(items):  # This is a passthrough function
  function acc_all (line 388) | def acc_all(items):
  function acc_all_stderr (line 407) | def acc_all_stderr(items):
  function metric_max_over_ground_truths (line 425) | def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
  function weighted_mean (line 434) | def weighted_mean(items):
  function is_non_str_iterable (line 439) | def is_non_str_iterable(obj):
  function _sacreformat (line 443) | def _sacreformat(refs, preds):
  class _bootstrap_internal (line 474) | class _bootstrap_internal:
    method __init__ (line 480) | def __init__(self, f: Callable[[Sequence[T]], float], n: int) -> None:
    method __call__ (line 484) | def __call__(self, v: tuple[int, Sequence[T]]) -> list[float]:
  function _bootstrap_internal_no_mp (line 494) | def _bootstrap_internal_no_mp(
  function bootstrap_stderr (line 516) | def bootstrap_stderr(
  function stderr_for_metric (line 555) | def stderr_for_metric(
  function pooled_sample_stderr (line 590) | def pooled_sample_stderr(stderrs: List[float], sizes: List[int]):
  function combined_sample_stderr (line 608) | def combined_sample_stderr(stderrs: List[float], sizes: List[int], metri...
  function aggregate_subtask_metrics (line 640) | def aggregate_subtask_metrics(metrics, sizes, weight_by_size=True):

FILE: lm_eval/api/model.py
  class LM (line 25) | class LM(abc.ABC):
    method __init__ (line 32) | def __init__(self) -> None:
    method loglikelihood (line 40) | def loglikelihood(self, requests: list["Instance"]) -> list[tuple[floa...
    method loglikelihood_rolling (line 58) | def loglikelihood_rolling(self, requests: list["Instance"]) -> list[fl...
    method generate_until (line 100) | def generate_until(self, requests: list["Instance"]) -> list[str]:
    method apply_chat_template (line 113) | def apply_chat_template(
    method create_from_arg_string (line 131) | def create_from_arg_string(
    method create_from_arg_obj (line 149) | def create_from_arg_obj(
    method device (line 176) | def device(self):
    method rank (line 180) | def rank(self) -> int:
    method world_size (line 185) | def world_size(self) -> int:
    method all_gather (line 189) | def all_gather(self, tensor):
    method gather_object (line 196) | def gather_object(self, obj, dst=0):
    method barrier (line 203) | def barrier(self) -> None:
    method tokenizer_name (line 208) | def tokenizer_name(self) -> str:
    method chat_template (line 217) | def chat_template(self, chat_template: bool | str = False) -> str | None:
    method set_cache_hook (line 225) | def set_cache_hook(self, cache_hook: "CacheHook") -> None:
  function hash_args (line 230) | def hash_args(attr: str, args: Iterable[Any]) -> str:
  class CacheHook (line 235) | class CacheHook:
    method __init__ (line 236) | def __init__(self, cachinglm: Optional["CachingLM"]) -> None:
    method add_partial (line 243) | def add_partial(self, attr: str, req: Iterable[Any], res: Any) -> None:
  class CachingLM (line 250) | class CachingLM:
    method __init__ (line 251) | def __init__(self, lm: LM, cache_db: str) -> None:
    method __getattr__ (line 269) | def __getattr__(self, attr: str) -> Any:
    method get_cache_hook (line 327) | def get_cache_hook(self) -> "CacheHook":
  class TemplateLM (line 331) | class TemplateLM(LM):
    method eot_token_id (line 343) | def eot_token_id(self) -> int:
    method prefix_token_id (line 347) | def prefix_token_id(self):
    method tok_encode (line 352) | def tok_encode(
    method _loglikelihood_tokens (line 363) | def _loglikelihood_tokens(
    method _encode_pair (line 368) | def _encode_pair(
    method loglikelihood (line 408) | def loglikelihood(
    method loglikelihood_rolling (line 449) | def loglikelihood_rolling(
    method generate_until (line 455) | def generate_until(self, requests, disable_tqdm: bool = False) -> list...
    method chat_template (line 458) | def chat_template(self, chat_template: bool | str = False) -> str | None:

FILE: lm_eval/api/registry.py
  function _materialise_placeholder (line 101) | def _materialise_placeholder(ph: Placeholder) -> Any:
  function _suggest_similar (line 125) | def _suggest_similar(
  function _build_key_error_msg (line 142) | def _build_key_error_msg(name: str, alias: str, keys: Iterable[str]) -> ...
  class Registry (line 156) | class Registry(Generic[T]):
    method __init__ (line 164) | def __init__(
    method register (line 183) | def register(
    method _materialise (line 261) | def _materialise(self, ph: Placeholder) -> T:
    method get (line 273) | def get(self, alias: str) -> T: ...
    method get (line 276) | def get(self, alias: str, default: D) -> T | D: ...
    method get (line 278) | def get(self, alias: str, default: D | Any = _MISSING) -> T | D:
    method __getitem__ (line 329) | def __getitem__(self, alias: str) -> T:
    method __contains__ (line 333) | def __contains__(self, alias: str) -> bool:
    method __iter__ (line 337) | def __iter__(self):
    method __len__ (line 341) | def __len__(self):
    method __repr__ (line 345) | def __repr__(self) -> str:
    method keys (line 352) | def keys(self):
    method values (line 356) | def values(self):
    method items (line 363) | def items(self):
    method origin (line 372) | def origin(self, alias: str) -> str | None:
    method freeze (line 391) | def freeze(self):
    method _clear (line 402) | def _clear(self):  # pragma: no cover
  function freeze_all (line 426) | def freeze_all():
  function register_model (line 465) | def register_model(*names):
  function get_model (line 491) | def get_model(model_name: str):
  function register_filter (line 525) | def register_filter(name: str):
  function get_filter (line 545) | def get_filter(filter_name: str | Callable) -> Callable:
  function register_metric (line 575) | def register_metric(**args):
  function get_metric (line 609) | def get_metric(name: str, hf_evaluate_metric: bool = False) -> Callable ...
  function register_aggregation (line 643) | def register_aggregation(name: str):
  function get_aggregation (line 660) | def get_aggregation(name: str) -> Callable[..., float] | None:
  function get_metric_aggregation (line 680) | def get_metric_aggregation(name: str) -> Callable[..., float] | None:
  function is_higher_better (line 700) | def is_higher_better(metric_name: str) -> bool | None:

FILE: lm_eval/api/samplers.py
  class ContextSampler (line 17) | class ContextSampler:
    method __init__ (line 18) | def __init__(
    method sample (line 31) | def sample(
    method set_rnd (line 69) | def set_rnd(self, rnd: int | None):
    method replace_df (line 73) | def replace_df(self, df: Sequence[dict[str, Any]]):
    method fewshot_docs (line 78) | def fewshot_docs(self):
    method rm_eval_doc (line 88) | def rm_eval_doc(doc: _T, _iter: Iterable[_T], n=None) -> Sequence[_T]:
  class FirstNSampler (line 96) | class FirstNSampler(ContextSampler):
    method sample (line 97) | def sample(self, n: int, eval_doc=None, df=None, **kwargs):
  class BalancedSampler (line 108) | class BalancedSampler(ContextSampler):
    method sample (line 109) | def sample(self, n: int, eval_doc=None, df=None, **kwargs):
  class ManualSampler (line 118) | class ManualSampler(ContextSampler):
    method sample (line 119) | def sample(self, n: int, eval_doc=None, df=None, **kwargs):
  function get_sampler (line 130) | def get_sampler(name: str):

FILE: lm_eval/api/task.py
  class Task (line 64) | class Task(abc.ABC):
    method __init__ (line 85) | def __init__(
    method download (line 125) | def download(
    method config (line 164) | def config(self) -> TaskConfig:
    method has_training_docs (line 169) | def has_training_docs(self):
    method has_validation_docs (line 174) | def has_validation_docs(self):
    method has_test_docs (line 179) | def has_test_docs(self):
    method training_docs (line 183) | def training_docs(self) -> Iterable:
    method validation_docs (line 190) | def validation_docs(self) -> Iterable:
    method test_docs (line 197) | def test_docs(self) -> Iterable:
    method fewshot_docs (line 204) | def fewshot_docs(self) -> Iterable:
    method _process_doc (line 221) | def _process_doc(self, doc: dict) -> dict:
    method instances (line 233) | def instances(self) -> list[Instance]:
    method fewshot_examples (line 239) | def fewshot_examples(self, k, rnd):
    method doc_to_decontamination_query (line 245) | def doc_to_decontamination_query(self, doc):
    method doc_to_text (line 251) | def doc_to_text(self, doc):
    method doc_to_target (line 255) | def doc_to_target(self, doc):
    method doc_to_image (line 259) | def doc_to_image(self, doc):
    method doc_to_audio (line 262) | def doc_to_audio(self, doc):
    method doc_to_prefix (line 265) | def doc_to_prefix(self, doc):
    method build_all_requests (line 268) | def build_all_requests(
    method construct_requests (line 382) | def construct_requests(self, doc, ctx, **kwargs):
    method process_results (line 403) | def process_results(self, doc, results):
    method aggregation (line 416) | def aggregation(self):
    method higher_is_better (line 425) | def higher_is_better(self):
    method get_config (line 433) | def get_config(self, key: str) -> Any:
    method count_bytes (line 437) | def count_bytes(cls, doc):
    method count_words (line 442) | def count_words(cls, doc):
    method fewshot_context (line 447) | def fewshot_context(self, doc, num_fewshot, rnd=None, description=None...
    method apply_filters (line 505) | def apply_filters(self) -> list[Instance] | None:
    method dump_config (line 514) | def dump_config(self) -> dict:
    method set_config (line 520) | def set_config(self, key: str, value: Any, update: bool = False) -> None:
    method override_metric (line 535) | def override_metric(self, metric_name: str) -> None:
    method set_fewshot_seed (line 560) | def set_fewshot_seed(self, seed: int | None = None) -> None:
    method eval_docs (line 566) | def eval_docs(self) -> datasets.Dataset | list[dict]:
    method doc_iterator (line 576) | def doc_iterator(
    method resolve_field (line 609) | def resolve_field(doc: dict[str, Any], field: str | None = None):
    method task_name (line 614) | def task_name(self) -> str:
  class ConfigurableTask (line 618) | class ConfigurableTask(Task):
    method __init__ (line 623) | def __init__(
    method download (line 855) | def download(self, dataset_kwargs: dict[str, Any] | None = None, **kwa...
    method has_training_docs (line 875) | def has_training_docs(self) -> bool:
    method has_validation_docs (line 878) | def has_validation_docs(self) -> bool:
    method has_test_docs (line 881) | def has_test_docs(self) -> bool:
    method training_docs (line 884) | def training_docs(self) -> datasets.Dataset:
    method validation_docs (line 892) | def validation_docs(self) -> datasets.Dataset:
    method test_docs (line 900) | def test_docs(self) -> datasets.Dataset:
    method fewshot_docs (line 906) | def fewshot_docs(self):
    method fewshot_context (line 933) | def fewshot_context(
    method build_qa_turn (line 1044) | def build_qa_turn(
    method multiple_input_context (line 1109) | def multiple_input_context(
    method apply_filters (line 1160) | def apply_filters(self) -> list[Instance] | None:
    method should_decontaminate (line 1169) | def should_decontaminate(self):
    method doc_to_decontamination_query (line 1172) | def doc_to_decontamination_query(self, doc: dict):
    method _process_doc (line 1189) | def _process_doc(self, doc: dict) -> dict:
    method doc_to_text (line 1200) | def doc_to_text(self, doc, doc_to_text=None):
    method doc_to_target (line 1236) | def doc_to_target(self, doc: Mapping, doc_to_target=None) -> int | str...
    method doc_to_choice (line 1282) | def doc_to_choice(self, doc: Any, doc_to_choice=None) -> list[str]:
    method doc_to_image (line 1308) | def doc_to_image(self, doc: Any, doc_to_image=None) -> int | str | lis...
    method doc_to_audio (line 1331) | def doc_to_audio(self, doc: Any, doc_to_audio=None) -> int | str | lis...
    method doc_to_prefix (line 1354) | def doc_to_prefix(self, doc):
    method construct_requests (line 1362) | def construct_requests(
    method process_results (line 1455) | def process_results(self, doc, results):
    method aggregation (line 1666) | def aggregation(self) -> dict:
    method higher_is_better (line 1669) | def higher_is_better(self) -> dict:
    method get_config (line 1672) | def get_config(self, key: str) -> Any:
    method task_name (line 1676) | def task_name(self) -> str:
    method __repr__ (line 1679) | def __repr__(self):
  class MultipleChoiceTask (line 1688) | class MultipleChoiceTask(Task):
    method doc_to_target (line 1691) | def doc_to_target(self, doc: dict) -> str:
    method construct_requests (line 1694) | def construct_requests(self, doc: dict, ctx: str, **kwargs) -> list[In...
    method process_results (line 1707) | def process_results(self, doc: dict, results: Iterable[tuple[float, bo...
    method higher_is_better (line 1722) | def higher_is_better(self) -> dict:
    method aggregation (line 1728) | def aggregation(self) -> dict:
  class PerplexityTask (line 1735) | class PerplexityTask(Task):
    method has_training_docs (line 1738) | def has_training_docs(self) -> bool:
    method fewshot_examples (line 1741) | def fewshot_examples(self, k: int, rnd) -> list:
    method fewshot_context (line 1748) | def fewshot_context(self, doc: dict, num_fewshot: int) -> Literal[""]:
    method higher_is_better (line 1756) | def higher_is_better(self) -> dict:
    method doc_to_decontamination_query (line 1763) | def doc_to_decontamination_query(self, doc):
    method doc_to_text (line 1766) | def doc_to_text(self, doc) -> str:
    method doc_to_target (line 1769) | def doc_to_target(self, doc):
    method construct_requests (line 1772) | def construct_requests(self, doc: dict, ctx: str | None, **kwargs):
    method process_results (line 1784) | def process_results(self, doc: dict, results: tuple[float]) -> dict:
    method aggregation (line 1794) | def aggregation(self) -> dict:
    method count_bytes (line 1802) | def count_bytes(cls, doc) -> int:
    method count_words (line 1806) | def count_words(cls, doc) -> int:

FILE: lm_eval/api/utils.py
  function maybe_delimit (line 7) | def maybe_delimit(prefix: str | None, suffix: str | None, delimiter: str...
  function requires_delimiter (line 20) | def requires_delimiter(prefix: str, suffix: str) -> bool:
  function ends_with_whitespace (line 27) | def ends_with_whitespace(s: str) -> bool:
  class Message (line 33) | class Message:
    method to_dict (line 51) | def to_dict(self) -> dict[str, str]:
    method to_text (line 55) | def to_text(self) -> str:
  function messages_to_text (line 60) | def messages_to_text(messages: list[Message]) -> str:
  function multiturn_to_singleturn (line 65) | def multiturn_to_singleturn(messages: list[Message]) -> list[dict[str, A...
  function format_turn (line 86) | def format_turn(content: str, role: str, type: str | None = None) -> dic...
  function random_task_id (line 95) | def random_task_id():

FILE: lm_eval/caching/cache.py
  function load_from_cache (line 26) | def load_from_cache(file_name: str, cache: bool = False):
  function save_to_cache (line 41) | def save_to_cache(file_name, obj):
  function delete_cache (line 53) | def delete_cache(key: str = ""):

FILE: lm_eval/config/evaluate_config.py
  class EvaluatorConfig (line 29) | class EvaluatorConfig:
    method from_cli (line 196) | def from_cli(cls, namespace: Namespace) -> "EvaluatorConfig":
    method from_config (line 231) | def from_config(cls, config_path: str | Path) -> "EvaluatorConfig":
    method load_yaml_config (line 241) | def load_yaml_config(config_path: str | Path) -> dict[str, Any]:
    method _parse_dict_args (line 261) | def _parse_dict_args(self):
    method _configure (line 268) | def _configure(self):
    method _validate_arguments (line 274) | def _validate_arguments(self):
    method _process_arguments (line 314) | def _process_arguments(self):
    method process_tasks (line 336) | def process_tasks(self, metadata: dict | None = None) -> "TaskManager":
    method _set_trust_remote_code (line 414) | def _set_trust_remote_code(self):

FILE: lm_eval/config/group.py
  class AggMetricConfig (line 7) | class AggMetricConfig:
    method __post_init__ (line 34) | def __post_init__(self):
  class GroupConfig (line 47) | class GroupConfig:
    method __post_init__ (line 93) | def __post_init__(self):
    method to_dict (line 104) | def to_dict(self, keep_callable: bool = False) -> dict[str, str]:
    method serialize_function (line 113) | def serialize_function(

FILE: lm_eval/config/task.py
  class FewshotConfig (line 21) | class FewshotConfig:
    method __post_init__ (line 43) | def __post_init__(self):
    method from_dict (line 50) | def from_dict(
  class TaskConfig (line 82) | class TaskConfig(dict):
    method __post_init__ (line 130) | def __post_init__(self) -> None:
    method __getitem__ (line 170) | def __getitem__(self, item):
    method __setitem__ (line 173) | def __setitem__(self, item, value):
    method to_dict (line 176) | def to_dict(self, keep_callable: bool = False) -> dict:
    method serialize_function (line 204) | def serialize_function(

FILE: lm_eval/decontamination/archiver.py
  function json_serial (line 14) | def json_serial(obj: Any) -> str:
  class Archive (line 23) | class Archive:
    method __init__ (line 24) | def __init__(self, file_path: str, compression_level: int = 3) -> None:
    method add_data (line 33) | def add_data(self, data, meta=None) -> None:
    method commit (line 43) | def commit(self) -> None:
  class Reader (line 50) | class Reader:
    method __init__ (line 51) | def __init__(self) -> None:
    method read (line 54) | def read(
  class TextArchive (line 84) | class TextArchive:
    method __init__ (line 85) | def __init__(self, file_path, mode: str = "rb+") -> None:
    method add_data (line 96) | def add_data(self, data) -> None:
    method commit (line 99) | def commit(self) -> None:
  class TextReader (line 104) | class TextReader:
    method __init__ (line 105) | def __init__(self, file_path) -> None:
    method read_tqdm (line 110) | def read_tqdm(self, update_frequency: int = 10000):
    method read_and_tell (line 134) | def read_and_tell(self):
    method read (line 145) | def read(self):
    method read_slow (line 152) | def read_slow(self):
  class ZStdTextReader (line 164) | class ZStdTextReader:
    method __init__ (line 165) | def __init__(self, file) -> None:
    method read_tqdm (line 168) | def read_tqdm(self):

FILE: lm_eval/decontamination/decontaminate.py
  function get_train_overlap_stub (line 14) | def get_train_overlap_stub(docs: dict, ngrams_path: str, ngrams_n_size: ...
  function get_train_overlap (line 37) | def get_train_overlap(docs_by_task_set: dict, ngrams_path: str, limit: i...

FILE: lm_eval/decontamination/janitor.py
  function form_ngrams (line 25) | def form_ngrams(sequence: Iterator[T], n: int) -> Iterator[Tuple[T, ...]]:
  function word_ngrams (line 42) | def word_ngrams(s: str, n: int) -> Iterator[str]:
  function split_indices (line 74) | def split_indices(s: str) -> Iterator[Tuple[str, Tuple[int, int]]]:
  function word_ngrams_indices (line 81) | def word_ngrams_indices(s: str, n: int) -> Iterator[Tuple[str, Tuple[int...
  class Janitor (line 109) | class Janitor:
    method __init__ (line 111) | def __init__(
    method save_contamination_ngrams (line 140) | def save_contamination_ngrams(self, filename: str) -> None:
    method load_contamination_ngrams (line 144) | def load_contamination_ngrams(self, filename: str) -> None:
    method register_contaminant (line 152) | def register_contaminant(self, dirt_string: str) -> None:
    method clean (line 161) | def clean(self, dirty_string: str) -> List[str]:
    method _split_chunks (line 171) | def _split_chunks(
    method register_contaminant_cpp (line 196) | def register_contaminant_cpp(self, dirt_string) -> None:
    method clean_cpp (line 201) | def clean_cpp(self, dirty_string: str) -> List[str]:
    method normalize_string (line 211) | def normalize_string(self, s: str) -> str:
    method register_contaminant_python (line 214) | def register_contaminant_python(self, dirt_string: str) -> None:
    method clean_python (line 219) | def clean_python(self, dirty_string: str) -> List[str]:

FILE: lm_eval/defaults.py
  function _strtobool (line 13) | def _strtobool(val: str) -> bool:
  function _envbool (line 25) | def _envbool(var: str, default: bool = False) -> bool:
  function default_gen_kwargs (line 38) | def default_gen_kwargs(

FILE: lm_eval/evaluator.py
  function simple_evaluate (line 54) | def simple_evaluate(
  function evaluate (line 414) | def evaluate(

FILE: lm_eval/evaluator_utils.py
  class ResultAcc (line 29) | class ResultAcc(TypedDict):
  function print_writeout (line 37) | def print_writeout(task: Task) -> None:
  function get_sample_size (line 49) | def get_sample_size(task, limit: int | float | None) -> int | None:
  function find_test_root (line 58) | def find_test_root(start_path: pathlib.Path) -> pathlib.Path:
  function run_task_tests (line 76) | def run_task_tests(task_list: list[str]):
  class EvalAcc (line 99) | class EvalAcc:
    method collect (line 120) | def collect(self) -> tuple[dict[str, _TaskMetrics], dict[str, _TaskMet...
    method _to_eval_results (line 134) | def _to_eval_results(
  function _compute_task_aggregations (line 173) | def _compute_task_aggregations(
  function _collect_results (line 222) | def _collect_results(
  function aggregate_groups (line 275) | def aggregate_groups(
  function _get_root_groups (line 302) | def _get_root_groups(groups: dict[str, Group]) -> list[Group]:
  function _collect_groups_bottom_up (line 319) | def _collect_groups_bottom_up(groups: dict[str, Group]) -> list[Group]:
  function _process_results (line 349) | def _process_results(
  function _propagate_num_fewshot (line 395) | def _propagate_num_fewshot(
  function _propagate_higher_is_better (line 404) | def _propagate_higher_is_better(
  function _log_selected_tasks (line 423) | def _log_selected_tasks(
  function _handle_back_comp (line 483) | def _handle_back_comp(

FILE: lm_eval/filters/__init__.py
  function build_filter_ensemble (line 11) | def build_filter_ensemble(

FILE: lm_eval/filters/custom.py
  class CustomFilter (line 6) | class CustomFilter(Filter):
    method __init__ (line 11) | def __init__(self, **kwargs) -> None:
    method apply (line 16) | def apply(self, resps, docs):

FILE: lm_eval/filters/decontamination.py
  class DecontaminationFilter (line 6) | class DecontaminationFilter(Filter):
    method __init__ (line 13) | def __init__(self, path) -> None:
    method apply (line 21) | def apply(self, resps, docs) -> None:

FILE: lm_eval/filters/extraction.py
  class RegexFilter (line 10) | class RegexFilter(Filter):
    method __init__ (line 18) | def __init__(
    method apply (line 33) | def apply(self, resps: list[list[str]], docs: list[dict]) -> list[list...
  class POSFilter (line 63) | class POSFilter(Filter):
    method __init__ (line 66) | def __init__(
    method apply (line 83) | def apply(self, resps, docs):
  class WhitespaceFilter (line 109) | class WhitespaceFilter(Filter):
    method apply (line 112) | def apply(self, resps: list[list[str]], docs: list[dict]) -> list[list...
  class MultiChoiceRegexFilter (line 126) | class MultiChoiceRegexFilter(RegexFilter):
    method __init__ (line 134) | def __init__(
    method apply (line 157) | def apply(self, resps: list[list[str]], docs: list[dict]) -> list[list...

FILE: lm_eval/filters/selection.py
  class TakeFirstFilter (line 13) | class TakeFirstFilter(Filter):
    method __init__ (line 14) | def __init__(self) -> None:
    method apply (line 19) | def apply(self, resps, docs):
  class TakeKFilter (line 27) | class TakeKFilter(Filter):
    method __init__ (line 28) | def __init__(self, **kwargs) -> None:
    method apply (line 33) | def apply(self, resps, docs):
  class MajorityVoteFilter (line 44) | class MajorityVoteFilter(Filter):
    method __init__ (line 45) | def __init__(self) -> None:
    method apply (line 50) | def apply(self, resps, docs):

FILE: lm_eval/filters/transformation.py
  class LowercaseFilter (line 8) | class LowercaseFilter(Filter):
    method __init__ (line 9) | def __init__(self) -> None:
    method apply (line 12) | def apply(self, resps, docs):
  class UppercaseFilter (line 20) | class UppercaseFilter(Filter):
    method __init__ (line 21) | def __init__(self) -> None:
    method apply (line 24) | def apply(self, resps, docs):
  class MapFilter (line 32) | class MapFilter(Filter):
    method __init__ (line 33) | def __init__(self, mapping_dict: dict = None, default_value=None) -> N...
    method apply (line 54) | def apply(self, resps, docs):
  class SPANFilter (line 62) | class SPANFilter(Filter):
    method __init__ (line 63) | def __init__(self) -> None:
    method apply (line 66) | def apply(self, resps, docs):

FILE: lm_eval/loggers/evaluation_tracker.py
  class GeneralConfigTracker (line 38) | class GeneralConfigTracker:
    method __init__ (line 70) | def __init__(self) -> None:
    method _get_model_name (line 75) | def _get_model_name(model_args: str | dict[str, Any] | None) -> str | ...
    method log_experiment_args (line 95) | def log_experiment_args(
    method log_end_time (line 117) | def log_end_time(self) -> None:
  class EvaluationTracker (line 123) | class EvaluationTracker:
    method __init__ (line 130) | def __init__(
    method _api (line 222) | def _api(token: str | None = None) -> "HfApi | None":
    method save_results_aggregated (line 230) | def save_results_aggregated(
    method save_results_samples (line 320) | def save_results_samples(
    method recreate_metadata_card (line 424) | def recreate_metadata_card(self) -> None:

FILE: lm_eval/loggers/utils.py
  function remove_none_pattern (line 15) | def remove_none_pattern(input_string: str) -> tuple[str, bool]:
  function _handle_non_serializable (line 37) | def _handle_non_serializable(o: Any) -> int | str | list:
  function get_commit_from_path (line 56) | def get_commit_from_path(repo_path: Path | str) -> str | None:
  function get_git_commit_hash (line 83) | def get_git_commit_hash():
  function add_env_info (line 97) | def add_env_info(storage: dict[str, Any]):
  function add_tokenizer_info (line 131) | def add_tokenizer_info(storage: dict[str, Any], lm):

FILE: lm_eval/loggers/wandb_logger.py
  function get_wandb_printer (line 16) | def get_wandb_printer() -> Literal["Printer"]:
  class WandbLogger (line 24) | class WandbLogger:
    method __init__ (line 25) | def __init__(self, init_args=None, config_args=None) -> None:
    method post_init (line 66) | def post_init(self, results: Dict[str, Any]) -> None:
    method _get_config (line 71) | def _get_config(self) -> Dict[str, Any]:
    method _sanitize_results_dict (line 82) | def _sanitize_results_dict(self) -> Tuple[Dict[str, str], Dict[str, An...
    method _log_results_as_table (line 118) | def _log_results_as_table(self) -> None:
    method _log_results_as_artifact (line 168) | def _log_results_as_artifact(self) -> None:
    method log_eval_result (line 180) | def log_eval_result(self) -> None:
    method _generate_dataset (line 196) | def _generate_dataset(
    method _log_samples_as_artifact (line 287) | def _log_samples_as_artifact(
    method log_eval_samples (line 307) | def log_eval_samples(self, samples: Dict[str, List[Dict[str, Any]]]) -...

FILE: lm_eval/models/__init__.py
  function _register_all_models (line 60) | def _register_all_models():

FILE: lm_eval/models/anthropic_llms.py
  function anthropic_completion (line 17) | def anthropic_completion(
  function anthropic_chat (line 80) | def anthropic_chat(
  class AnthropicLM (line 145) | class AnthropicLM(LM):
    method __init__ (line 148) | def __init__(
    method eot_token_id (line 186) | def eot_token_id(self):
    method max_length (line 191) | def max_length(self) -> int:
    method max_gen_toks (line 195) | def max_gen_toks(self) -> int:
    method batch_size (line 199) | def batch_size(self):
    method device (line 204) | def device(self):
    method tok_encode (line 208) | def tok_encode(self, string: str) -> List[int]:
    method tok_decode (line 211) | def tok_decode(self, tokens: List[int]) -> str:
    method _loglikelihood_tokens (line 214) | def _loglikelihood_tokens(self, requests, disable_tqdm: bool = False):
    method generate_until (line 217) | def generate_until(self, requests, disable_tqdm: bool = False) -> List...
    method _model_call (line 261) | def _model_call(self, inps):
    method _model_generate (line 265) | def _model_generate(self, context, max_length, eos_token_id):
    method loglikelihood (line 269) | def loglikelihood(self, requests, disable_tqdm: bool = False):
    method loglikelihood_rolling (line 272) | def loglikelihood_rolling(self, requests, disable_tqdm: bool = False):
  class AnthropicChat (line 277) | class AnthropicChat(LocalCompletionsAPI):
    method __init__ (line 278) | def __init__(
    method api_key (line 297) | def api_key(self):
    method header (line 307) | def header(self):
    method _create_payload (line 313) | def _create_payload(
    method parse_generations (line 359) | def parse_generations(
    method tok_encode (line 370) | def tok_encode(
    method loglikelihood (line 379) | def loglikelihood(self, requests, **kwargs):

FILE: lm_eval/models/api_models.py
  class JsonChatStr (line 54) | class JsonChatStr(NamedTuple):
    method encode (line 57) | def encode(self, encoding):
  function create_image_prompt (line 61) | def create_image_prompt(
  class TemplateAPI (line 104) | class TemplateAPI(TemplateLM):
    method __init__ (line 107) | def __init__(
    method _create_payload (line 252) | def _create_payload(
    method create_message (line 265) | def create_message(
    method parse_logprobs (line 297) | def parse_logprobs(
    method parse_generations (line 308) | def parse_generations(outputs: Union[Any, List[Any]], **kwargs) -> Lis...
    method api_key (line 313) | def api_key(self) -> str:
    method header (line 318) | def header(self) -> dict:
    method tokenizer_name (line 323) | def tokenizer_name(self) -> str:
    method apply_chat_template (line 330) | def apply_chat_template(
    method eot_token_id (line 353) | def eot_token_id(self) -> Optional[int]:
    method eos_string (line 365) | def eos_string(self) -> Optional[str]:
    method prefix_token_id (line 382) | def prefix_token_id(self) -> Optional[int]:
    method tok_encode (line 397) | def tok_encode(
    method decode_batch (line 446) | def decode_batch(self, tokens: List[List[int]]) -> List[str]:
    method model_call (line 454) | def model_call(
    method amodel_call (line 490) | async def amodel_call(
    method batch_loglikelihood_requests (line 552) | def batch_loglikelihood_requests(
    method get_batched_requests (line 575) | async def get_batched_requests(
    method _loglikelihood_tokens (line 620) | def _loglikelihood_tokens(self, requests, **kwargs) -> List[Tuple[floa...
    method generate_until (line 683) | def generate_until(
    method loglikelihood_rolling (line 832) | def loglikelihood_rolling(

FILE: lm_eval/models/dummy.py
  class DummyLM (line 11) | class DummyLM(LM):
    method __init__ (line 14) | def __init__(self, *args, write_out: bool = False, **kwargs) -> None:
    method create_from_arg_string (line 19) | def create_from_arg_string(cls, arg_string, additional_config=None):
    method loglikelihood (line 22) | def loglikelihood(self, requests, disable_tqdm: bool = False):
    method generate_until (line 33) | def generate_until(self, requests, disable_tqdm: bool = False):
    method loglikelihood_rolling (line 45) | def loglikelihood_rolling(self, requests, disable_tqdm: bool = False):
    method tokenizer (line 54) | def tokenizer(self):
    method apply_chat_template (line 59) | def apply_chat_template(

FILE: lm_eval/models/gguf.py
  function get_result (line 15) | def get_result(logprobs, context_length):
  class GGUFLM (line 37) | class GGUFLM(LM):
    method __init__ (line 38) | def __init__(self, base_url=None, max_length=2048, **kwargs):
    method gguf_completion (line 46) | def gguf_completion(
    method loglikelihood (line 75) | def loglikelihood(self, requests, disable_tqdm: bool = False):
    method generate_until (line 104) | def generate_until(self, requests, disable_tqdm: bool = False):
    method loglikelihood_rolling (line 129) | def loglikelihood_rolling(self, requests, disable_tqdm: bool = False):

FILE: lm_eval/models/hf_audiolm.py
  class HFAUDIOLMQWEN (line 22) | class HFAUDIOLMQWEN(HFLM):
    method __init__ (line 30) | def __init__(
    method _create_tokenizer (line 42) | def _create_tokenizer(
    method apply_chat_template (line 85) | def apply_chat_template(
    method _model_multimodal_generate (line 98) | def _model_multimodal_generate(self, inputs, max_length, stop, **gener...
    method tok_batch_multimodal_encode (line 124) | def tok_batch_multimodal_encode(
    method generate_until (line 165) | def generate_until(
    method loglikelihood_rolling (line 290) | def loglikelihood_rolling(self, requests: list[Instance]) -> list[float]:
    method loglikelihood (line 296) | def loglikelihood(

FILE: lm_eval/models/hf_steered.py
  function steer (line 23) | def steer(
  class SteeredModel (line 67) | class SteeredModel(HFLM):
    method __init__ (line 70) | def __init__(
    method derive_steer_config (line 147) | def derive_steer_config(cls, steer_path: str):
    method add (line 210) | def add(
    method clamp (line 231) | def clamp(
    method forward (line 270) | def forward(self, *args, **kwargs):
    method _model_call (line 274) | def _model_call(self, *args, **kwargs):
    method _model_generate (line 278) | def _model_generate(self, *args, **kwargs):

FILE: lm_eval/models/hf_vlms.py
  class HFMultimodalLM (line 30) | class HFMultimodalLM(HFLM):
    method __init__ (line 38) | def __init__(
    method _create_tokenizer (line 112) | def _create_tokenizer(
    method tok_multimodal_encode (line 158) | def tok_multimodal_encode(
    method _encode_multimodal_pair (line 188) | def _encode_multimodal_pair(self, context, continuation, images):
    method apply_chat_template (line 218) | def apply_chat_template(
    method chat_template (line 275) | def chat_template(self, chat_template: bool | str = False) -> str | None:
    method tok_batch_multimodal_encode (line 287) | def tok_batch_multimodal_encode(
    method _model_multimodal_call (line 342) | def _model_multimodal_call(self, inps, imgs, attn_mask=None, labels=No...
    method _model_multimodal_generate (line 350) | def _model_multimodal_generate(self, inputs, max_length, stop, **gener...
    method _batch_images (line 376) | def _batch_images(self, image_encs):
    method loglikelihood_rolling (line 394) | def loglikelihood_rolling(self, requests: list[Instance]) -> list[float]:
    method loglikelihood (line 403) | def loglikelihood(
    method _multimodal_loglikelihood_tokens (line 439) | def _multimodal_loglikelihood_tokens(
    method generate_until (line 625) | def generate_until(

FILE: lm_eval/models/huggingface.py
  class HFLM (line 60) | class HFLM(TemplateLM):
    method __init__ (line 70) | def __init__(
    method _get_accelerate_args (line 442) | def _get_accelerate_args(
    method config (line 529) | def config(self):
    method model (line 534) | def model(self):
    method eot_token_id (line 542) | def eot_token_id(self) -> int:
    method prefix_token_id (line 547) | def prefix_token_id(self) -> int:
    method max_length (line 556) | def max_length(self) -> int:
    method max_gen_toks (line 570) | def max_gen_toks(self) -> int:
    method batch_size (line 574) | def batch_size(self):
    method device (line 578) | def device(self):
    method rank (line 582) | def rank(self):
    method world_size (line 586) | def world_size(self):
    method all_gather (line 589) | def all_gather(self, tensor):
    method gather_object (line 594) | def gather_object(self, obj, dst=0):
    method barrier (line 601) | def barrier(self):
    method tokenizer_name (line 606) | def tokenizer_name(self) -> str:
    method _get_backend (line 609) | def _get_backend(
    method _get_config (line 669) | def _get_config(
    method _create_model (line 687) | def _create_model(
    method _create_tokenizer (line 856) | def _create_tokenizer(
    method _detect_batch_size (line 917) | def _detect_batch_size(self, requests: Sequence | None = None, pos: in...
    method tok_encode (line 976) | def tok_encode(
    method tok_batch_encode (line 1001) | def tok_batch_encode(
    method tok_decode (line 1044) | def tok_decode(self, tokens: Iterator[list[str]], skip_special_tokens:...
    method _model_call (line 1047) | def _model_call(
    method _model_generate (line 1089) | def _model_generate(
    method _select_cont_toks (line 1127) | def _select_cont_toks(
    method loglikelihood_rolling (line 1150) | def loglikelihood_rolling(
    method _batch_scheduler (line 1236) | def _batch_scheduler(self, pos, n_reordered_requests):
    method _loglikelihood_tokens (line 1253) | def _loglikelihood_tokens(
    method generate_until (line 1490) | def generate_until(
    method apply_chat_template (line 1634) | def apply_chat_template(
    method get_model_info (line 1661) | def get_model_info(self) -> dict:

FILE: lm_eval/models/ibm_watsonx_ai.py
  class LogLikelihoodResult (line 21) | class LogLikelihoodResult(NamedTuple):
  function _verify_credentials (line 26) | def _verify_credentials(creds: dict) -> None:
  function get_watsonx_credentials (line 73) | def get_watsonx_credentials() -> dict[str, str | None]:
  class WatsonxLLM (line 120) | class WatsonxLLM(LM):
    method create_from_arg_string (line 127) | def create_from_arg_string(
    method __init__ (line 191) | def __init__(
    method _has_stop_token (line 228) | def _has_stop_token(response_tokens: list[str], context_tokens: list[s...
    method _check_model_logprobs_support (line 257) | def _check_model_logprobs_support(self):
    method _get_log_likelihood (line 278) | def _get_log_likelihood(
    method generate_until (line 312) | def generate_until(self, requests: list[Instance]) -> list[str]:
    method loglikelihood (line 349) | def loglikelihood(self, requests: list[Instance]) -> list[tuple[float,...
    method loglikelihood_rolling (line 416) | def loglikelihood_rolling(self, requests) -> list[float]:
    method tokenizer_name (line 470) | def tokenizer_name(self) -> str:
    method apply_chat_template (line 473) | def apply_chat_template(

FILE: lm_eval/models/mamba_lm.py
  class MambaLMWrapper (line 10) | class MambaLMWrapper(HFLM):
    method __init__ (line 11) | def __init__(
    method _get_config (line 66) | def _get_config(
    method _create_model (line 84) | def _create_model(
    method _model_generate (line 114) | def _model_generate(self, context, max_length, stop, **generation_kwar...

FILE: lm_eval/models/megatron_lm.py
  function _add_megatron_to_path (line 74) | def _add_megatron_to_path():
  function _check_dist_ckpt (line 93) | def _check_dist_ckpt(load_path: str) -> bool:
  function _parse_extra_args (line 105) | def _parse_extra_args(extra_args: str | None) -> list[str]:
  class MegatronLMEval (line 130) | class MegatronLMEval(LM):
    method __init__ (line 154) | def __init__(
    method _validate_parallelism_config (line 247) | def _validate_parallelism_config(self, devices: int, tp: int, pp: int,...
    method _initialize_megatron (line 309) | def _initialize_megatron(self, **kwargs):
    method eot_token_id (line 595) | def eot_token_id(self) -> int:
    method prefix_token_id (line 606) | def prefix_token_id(self) -> int:
    method max_length (line 620) | def max_length(self) -> int:
    method max_gen_toks (line 624) | def max_gen_toks(self) -> int:
    method batch_size (line 628) | def batch_size(self) -> int:
    method device (line 632) | def device(self) -> torch.device:
    method rank (line 636) | def rank(self) -> int:
    method world_size (line 640) | def world_size(self) -> int:
    method accelerator (line 644) | def accelerator(self):
    method all_gather (line 648) | def all_gather(self, tensor: torch.Tensor) -> torch.Tensor:
    method gather_object (line 652) | def gather_object(self, obj, dst: int = 0):
    method barrier (line 661) | def barrier(self) -> None:
    class _Accelerator (line 665) | class _Accelerator:
      method __init__ (line 672) | def __init__(self, world_size, device):
      method wait_for_everyone (line 676) | def wait_for_everyone(self):
      method gather (line 681) | def gather(self, local_tensor):
      method gather_object (line 705) | def gather_object(self, local_obj):
    method tok_encode (line 714) | def tok_encode(self, string: str, add_special_tokens: bool = False) ->...
    method tok_decode (line 721) | def tok_decode(self, tokens: list[int]) -> str:
    method _encode_pair (line 728) | def _encode_pair(
    method _model_forward (line 744) | def _model_forward(
    method _distribute_requests (line 823) | def _distribute_requests(self, requests: list) -> tuple[list, list[int]]:
    method _gather_results (line 840) | def _gather_results(self, local_results: list, sizes: list[int]) -> list:
    method loglikelihood (line 860) | def loglikelihood(self, requests: list[Instance]) -> list[tuple[float,...
    method _loglikelihood_tokens (line 891) | def _loglikelihood_tokens(
    method loglikelihood_rolling (line 1010) | def loglikelihood_rolling(
    method generate_until (line 1055) | def generate_until(

FILE: lm_eval/models/mistral3.py
  class Mistral3LM (line 33) | class Mistral3LM(HFLM):
    method __init__ (line 44) | def __init__(self, **kwargs):
    method _get_backend (line 59) | def _get_backend(
    method _model_call (line 74) | def _model_call(
    method max_length (line 99) | def max_length(self) -> int:

FILE: lm_eval/models/nemo_lm.py
  function _patch_pretrained_cfg (line 42) | def _patch_pretrained_cfg(
  function _get_target_from_class (line 72) | def _get_target_from_class(target_class) -> str:
  function load_model (line 76) | def load_model(
  function setup_distributed_environment (line 145) | def setup_distributed_environment(trainer):
  class NeMoLM (line 168) | class NeMoLM(LM):
    method __init__ (line 169) | def __init__(
    method create_from_arg_string (line 275) | def create_from_arg_string(cls, arg_string, additional_config=None):
    method eot_token_id (line 283) | def eot_token_id(self):
    method max_length (line 290) | def max_length(self):
    method max_gen_toks (line 294) | def max_gen_toks(self):
    method batch_size (line 298) | def batch_size(self):
    method device (line 302) | def device(self):
    method rank (line 306) | def rank(self):
    method world_size (line 310) | def world_size(self):
    method all_gather (line 313) | def all_gather(self, tensor):
    method gather_object (line 320) | def gather_object(self, obj, dst=0):
    method barrier (line 327) | def barrier(self):
    method tok_encode (line 331) | def tok_encode(self, string: str):
    method tok_decode (line 334) | def tok_decode(self, tokens):
    method _encode_pair (line 337) | def _encode_pair(self, context, continuation):
    method loglikelihood (line 348) | def loglikelihood(self, requests):
    method loglikelihood_rolling (line 364) | def loglikelihood_rolling(
    method _loglikelihood_tokens (line 398) | def _loglikelihood_tokens(self, requests, disable_tqdm=False):
    method generate_until (line 491) | def generate_until(self, requests):

FILE: lm_eval/models/neuron_optimum.py
  class CustomNeuronModelForCausalLM (line 37) | class CustomNeuronModelForCausalLM(NeuronModelForCausalLM):
    method generate (line 40) | def generate(
  class NEURON_HF (line 126) | class NEURON_HF(TemplateLM):
    method __init__ (line 133) | def __init__(
    method config (line 248) | def config(self):
    method eot_token_id (line 253) | def eot_token_id(self):
    method prefix_token_id (line 258) | def prefix_token_id(self):
    method max_length (line 263) | def max_length(self):
    method max_gen_toks (line 267) | def max_gen_toks(self) -> int:
    method batch_size (line 271) | def batch_size(self):
    method device (line 275) | def device(self):
    method rank (line 280) | def rank(self):
    method world_size (line 284) | def world_size(self):
    method tok_encode (line 287) | def tok_encode(self, string: str, left_truncate_len=None, add_special_...
    method tok_batch_encode (line 300) | def tok_batch_encode(
    method tok_decode (line 329) | def tok_decode(self, tokens):
    method _model_generate (line 332) | def _model_generate(self, context, max_length, stop, **generation_kwar...
    method _select_cont_toks (line 356) | def _select_cont_toks(self, logits, contlen=None, inplen=None):
    method loglikelihood_rolling (line 366) | def loglikelihood_rolling(self, requests, disable_tqdm: bool = False):
    method _loglikelihood_tokens (line 419) | def _loglikelihood_tokens(
    method generate_until (line 568) | def generate_until(self, requests, disable_tqdm: bool = False):

FILE: lm_eval/models/openai_completions.py
  class LocalCompletionsAPI (line 16) | class LocalCompletionsAPI(TemplateAPI):
    method __init__ (line 17) | def __init__(
    method _create_payload (line 61) | def _create_payload(
    method parse_logprobs (line 99) | def parse_logprobs(
    method parse_generations (line 125) | def parse_generations(outputs: Union[Dict, List[Dict]], **kwargs) -> L...
    method api_key (line 137) | def api_key(self):
  class LocalChatCompletion (line 142) | class LocalChatCompletion(LocalCompletionsAPI):
    method __init__ (line 150) | def __init__(
    method _create_payload (line 175) | def _create_payload(
    method parse_generations (line 211) | def parse_generations(outputs: Union[Dict, List[Dict]], **kwargs) -> L...
    method tok_encode (line 229) | def tok_encode(
    method loglikelihood (line 238) | def loglikelihood(self, requests, **kwargs):
  class OpenAICompletionsAPI (line 247) | class OpenAICompletionsAPI(LocalCompletionsAPI):
    method __init__ (line 248) | def __init__(
    method api_key (line 259) | def api_key(self):
    method loglikelihood (line 268) | def loglikelihood(self, requests, **kwargs):
    method chat_template (line 277) | def chat_template(self, chat_template: Union[bool, str] = False) -> Op...
  class OpenAIChatCompletion (line 282) | class OpenAIChatCompletion(LocalChatCompletion):
    method __init__ (line 283) | def __init__(
    method api_key (line 303) | def api_key(self):
    method loglikelihood (line 312) | def loglikelihood(self, requests, **kwargs):
    method _create_payload (line 317) | def _create_payload(
  class AzureOpenaiChatCompletionsLM (line 359) | class AzureOpenaiChatCompletionsLM(OpenAIChatCompletion):
    method __init__ (line 360) | def __init__(
    method api_key (line 384) | def api_key(self):

FILE: lm_eval/models/optimum_habana.py
  class HabanaLM (line 18) | class HabanaLM(HFLM):
    method __init__ (line 30) | def __init__(self, **kwargs) -> None:
    method max_length (line 52) | def max_length(self) -> int:
    method max_length (line 57) | def max_length(self, value: int) -> None:
    method find_bucket (line 60) | def find_bucket(self, length: int, key=lambda b, length: b >= length) ...
    method _model_call (line 75) | def _model_call(self, inps: torch.Tensor) -> torch.Tensor:
    method setup_generation_config_gaudi (line 97) | def setup_generation_config_gaudi(self, **kwargs):
    method _create_model (line 108) | def _create_model(self, *args, **kwargs) -> None:
    method generate_until (line 125) | def generate_until(
    method _model_generate (line 137) | def _model_generate(

FILE: lm_eval/models/optimum_ipex.py
  class IPEXLM (line 13) | class IPEXLM(HFLM):
    method __init__ (line 18) | def __init__(
    method _create_model (line 33) | def _create_model(

FILE: lm_eval/models/optimum_lm.py
  class OptimumLM (line 14) | class OptimumLM(HFLM):
    method __init__ (line 25) | def __init__(
    method _create_model (line 43) | def _create_model(

FILE: lm_eval/models/sglang_causallms.py
  class SGLangLM (line 34) | class SGLangLM(TemplateLM):
    method __init__ (line 37) | def __init__(
    method loglikelihood_rolling (line 124) | def loglikelihood_rolling(
    method generate_until (line 193) | def generate_until(
    method _model_generate (line 288) | def _model_generate(
    method eot_token_id (line 319) | def eot_token_id(self):
    method prefix_token_id (line 324) | def prefix_token_id(self):
    method max_length (line 333) | def max_length(self):
    method max_gen_toks (line 343) | def max_gen_toks(self):
    method tok_encode (line 347) | def tok_encode(
    method tok_decode (line 372) | def tok_decode(self, tokens: List[int]) -> str:
    method tokenizer_name (line 377) | def tokenizer_name(self) -> str:
    method chat_template (line 387) | def chat_template(self, chat_template: Union[bool, str] = False) -> str:
    method apply_chat_template (line 408) | def apply_chat_template(
    method _loglikelihood_tokens (line 423) | def _loglikelihood_tokens(
    method _parse_logprobs (line 483) | def _parse_logprobs(tokens: List, outputs, ctxlen: int) -> Tuple[float...
    method modify_gen_kwargs (line 519) | def modify_gen_kwargs(kwargs: dict) -> dict:

FILE: lm_eval/models/sglang_generate_API.py
  class SGLANGGENERATEAPI (line 9) | class SGLANGGENERATEAPI(LocalCompletionsAPI):
    method __init__ (line 10) | def __init__(
    method _create_payload (line 20) | def _create_payload(
    method parse_logprobs (line 66) | def parse_logprobs(
    method parse_generations (line 90) | def parse_generations(outputs: Union[Dict, List[Dict]], **kwargs) -> L...
    method api_key (line 99) | def api_key(self):

FILE: lm_eval/models/textsynth.py
  function textsynth_completion (line 29) | def textsynth_completion(**kwargs):
  class TextSynthLM (line 51) | class TextSynthLM(LM):
    method __init__ (line 52) | def __init__(self, engine, truncate: bool = False, **kwargs) -> None:
    method eot_token_id (line 68) | def eot_token_id(self):
    method max_length (line 73) | def max_length(self) -> int:
    method max_gen_toks (line 78) | def max_gen_toks(self) -> int:
    method batch_size (line 82) | def batch_size(self):
    method device (line 87) | def device(self):
    method tok_encode (line 91) | def tok_encode(self, string: str):
    method tok_decode (line 95) | def tok_decode(self, tokens):
    method loglikelihood (line 99) | def loglikelihood(self, requests, disable_tqdm: bool = False):
    method loglikelihood_rolling (line 123) | def loglikelihood_rolling(self, requests, disable_tqdm: bool = False):
    method generate_until (line 133) | def generate_until(self, requests, disable_tqdm: bool = False):
    method _model_call (line 166) | def _model_call(self, inps):
    method _model_generate (line 170) | def _model_generate(self, context, max_length, eos_token_id):

FILE: lm_eval/models/utils.py
  class GenKwargs (line 33) | class GenKwargs(TypedDict, total=False):
  function chunks (line 42) | def chunks(iter, n: int = 0, fn=None):
  class MultiChoice (line 80) | class MultiChoice:
    method __init__ (line 81) | def __init__(self, choices) -> None:
    method __contains__ (line 85) | def __contains__(self, values) -> bool:
    method __iter__ (line 94) | def __iter__(self) -> Iterator:
  class Grouper (line 98) | class Grouper:
    method __init__ (line 105) | def __init__(self, arr, fn) -> None:
    method get_grouped (line 123) | def get_grouped(self):
    method get_original (line 134) | def get_original(self, grouped_dict):
  function undistribute (line 156) | def undistribute(iterable):
  function retry_on_specific_exceptions (line 196) | def retry_on_specific_exceptions(
  class Collator (line 236) | class Collator:
    method __init__ (line 249) | def __init__(
    method _group_by_index (line 270) | def _group_by_index(self) -> None:
    method _group_by_context (line 276) | def _group_by_context(self) -> None:
    method get_batched (line 282) | def get_batched(
    method get_cache (line 329) | def get_cache(
    method _reorder (line 390) | def _reorder(self, arr: list | tuple[tuple[int, Any], ...]) -> Iterator:
    method get_original (line 406) | def get_original(self, newarr: list) -> list:
    method __len__ (line 427) | def __len__(self):
    method group (line 431) | def group(
    method get_chunks (line 474) | def get_chunks(
  function configure_pad_token (line 515) | def configure_pad_token(
  function replace_placeholders (line 560) | def replace_placeholders(
  function flatten_image_list (line 594) | def flatten_image_list(images: list[list]):
  function handle_stop_sequences (line 605) | def handle_stop_sequences(until: str | list[str] | None, eos: str | None...
  function normalize_gen_kwargs (line 621) | def normalize_gen_kwargs(
  function resize_image (line 717) | def resize_image(
  function truncate_tokens (line 817) | def truncate_tokens(
  function maybe_truncate (line 836) | def maybe_truncate(
  function postprocess_generated_text (line 910) | def postprocess_generated_text(
  function has_bos_prefix (line 939) | def has_bos_prefix(sequence: str, bos_str: str | Iterable[str] | None = ...
  function _add_special_kwargs (line 948) | def _add_special_kwargs(add_special_tokens: bool | None, add_bos: bool |...

FILE: lm_eval/models/utils_hf.py
  function pad_and_concat (line 8) | def pad_and_concat(
  function clear_torch_cache (line 59) | def clear_torch_cache() -> None:
  function get_dtype (line 64) | def get_dtype(dtype: str | torch.dtype) -> torch.dtype | str:
  class MultiTokenEOSCriteria (line 74) | class MultiTokenEOSCriteria(transformers.StoppingCriteria):
    method __init__ (line 77) | def __init__(
    method __call__ (line 100) | def __call__(self, input_ids, scores, **kwargs) -> bool:
  function stop_sequences_criteria (line 114) | def stop_sequences_criteria(

FILE: lm_eval/models/vllm_causallms.py
  function _vllm_mp_worker (line 68) | def _vllm_mp_worker(
  class VLLM (line 126) | class VLLM(TemplateLM):
    method __init__ (line 130) | def __init__(
    method eot_token_id (line 288) | def eot_token_id(self):
    method prefix_token_id (line 293) | def prefix_token_id(self):
    method max_length (line 302) | def max_length(self) -> int:
    method max_gen_toks (line 319) | def max_gen_toks(self):
    method apply_chat_template (line 322) | def apply_chat_template(
    method tokenizer_name (line 355) | def tokenizer_name(self) -> str:
    method tok_encode (line 359) | def tok_encode(
    method tok_encode (line 363) | def tok_encode(
    method tok_encode (line 367) | def tok_encode(
    method _model_generate (line 428) | def _model_generate(
    method loglikelihood_rolling (line 558) | def loglikelihood_rolling(
    method generate_until (line 627) | def generate_until(
    method _loglikelihood_tokens (line 725) | def _loglikelihood_tokens(
    method _parse_logprobs (line 787) | def _parse_logprobs(tokens: list, outputs, ctxlen: int) -> tuple[float...
    method modify_gen_kwargs (line 850) | def modify_gen_kwargs(

FILE: lm_eval/models/vllm_vlms.py
  class VLLM_VLM (line 33) | class VLLM_VLM(VLLM):
    method __init__ (line 36) | def __init__(
    method tok_batch_multimodal_encode (line 76) | def tok_batch_multimodal_encode(
    method _multimodal_model_generate (line 102) | def _multimodal_model_generate(
    method apply_chat_template (line 157) | def apply_chat_template(
    method generate_until (line 214) | def generate_until(
    method loglikelihood_rolling (line 309) | def loglikelihood_rolling(

FILE: lm_eval/models/winml.py
  class WindowsML (line 32) | class WindowsML(TemplateLM):
    method create_from_arg_obj (line 43) | def create_from_arg_obj(
    method __init__ (line 67) | def __init__(
    method _validate_dependencies (line 120) | def _validate_dependencies(self) -> None:
    method _fix_winrt_runtime (line 150) | def _fix_winrt_runtime(self):
    method _register_winml_providers_to_genai (line 164) | def _register_winml_providers_to_genai(self) -> bool:
    method _setup_winml_devices_and_providers (line 199) | def _setup_winml_devices_and_providers(self) -> None:
    method _load_and_compile_model (line 238) | def _load_and_compile_model(self, model_path: str) -> None:
    method eot_token_id (line 283) | def eot_token_id(self) -> int:
    method prefix_token_id (line 310) | def prefix_token_id(self) -> int | None:
    method max_gen_toks (line 340) | def max_gen_toks(self) -> int:
    method tok_encode (line 349) | def tok_encode(
    method tok_decode (line 375) | def tok_decode(self, tokens: list[int]) -> str:
    method _run_genai_inference_for_full_logits (line 387) | def _run_genai_inference_for_full_logits(self, input_text: str) -> np....
    method _loglikelihood_tokens (line 438) | def _loglikelihood_tokens(
    method loglikelihood (line 461) | def loglikelihood(
    method loglikelihood_rolling (line 574) | def loglikelihood_rolling(
    method generate_until (line 647) | def generate_until(
    method _run_genai_generation (line 694) | def _run_genai_generation(

FILE: lm_eval/prompts/__init__.py
  function get_prompt (line 23) | def get_prompt(prompt_id: str, dataset_name: str = None, subset_name: st...
  function load_prompt_list (line 72) | def load_prompt_list(
  class PromptString (line 115) | class PromptString:
    method __init__ (line 116) | def __init__(self, prompt_string):
    method apply (line 119) | def apply(self, doc):

FILE: lm_eval/result_schema.py
  class _TaskMetrics (line 110) | class _TaskMetrics(TypedDict, Generic[T], extra_items=T):
  class _SampleCount (line 131) | class _SampleCount(TypedDict):
  class _EvalConfig (line 141) | class _EvalConfig(TypedDict, total=False):
  class SampleResult (line 163) | class SampleResult(TypedDict, extra_items=float):

FILE: lm_eval/tasks/__init__.py
  function get_task_name_from_config (line 36) | def get_task_name_from_config(task_config: dict[str, str]) -> str:
  function get_task_name_from_object (line 50) | def get_task_name_from_object(task_object):
  function _check_duplicates (line 63) | def _check_duplicates(task_dict: dict) -> None:
  function _log_task_dict (line 98) | def _log_task_dict(task_dict: dict, task_manager: "TaskManager") -> None:
  function get_task_dict (line 137) | def get_task_dict(

FILE: lm_eval/tasks/_factory.py
  class TaskFactory (line 25) | class TaskFactory:
    method __init__ (line 32) | def __init__(self, *, meta: dict[str, Any] | None = None):
    method build (line 37) | def build(
    method _build_task (line 65) | def _build_task(self, entry: Entry, overrides: dict[str, Any] | None) ...
    method _build_group (line 85) | def _build_group(
    method _build_group_members (line 127) | def _build_group_members(
    method _build_tag (line 234) | def _build_tag(
    method _load_full_config (line 255) | def _load_full_config(
  function _ctor_accepts_config (line 283) | def _ctor_accepts_config(cls) -> bool:

FILE: lm_eval/tasks/_index.py
  class Kind (line 19) | class Kind(Enum):
  class Entry (line 28) | class Entry:
  class TaskIndex (line 36) | class TaskIndex:
    method __init__ (line 41) | def __init__(self, *, meta: dict[str, str] | None = None) -> None:
    method build (line 45) | def build(
    method _iter_yaml_files (line 82) | def _iter_yaml_files(root: Path):
    method process_cfg (line 94) | def process_cfg(
    method _register_tags (line 139) | def _register_tags(
    method _kind_of (line 154) | def _kind_of(cfg: dict) -> Kind:
    method entry_from_path (line 168) | def entry_from_path(path: Path) -> Entry | None:
    method entry_from_config (line 179) | def entry_from_config(cfg: dict[str, Any]) -> Entry | None:
    method _str_to_set (line 192) | def _str_to_set(*args) -> set[str]:

FILE: lm_eval/tasks/_yaml_loader.py
  function _mk_function_ctor (line 17) | def _mk_function_ctor(base_dir: Path, resolve: bool):
  function _make_loader (line 27) | def _make_loader(base_dir: Path, *, resolve_funcs: bool) -> type[yaml.Lo...
  function _load_module_with_cache (line 38) | def _load_module_with_cache(module_path: Path) -> Any:
  function _import_func_in_yml (line 93) | def _import_func_in_yml(qual: str, base_dir: Path):
  function _import_fun_from_str (line 130) | def _import_fun_from_str(path_str: str) -> Any:
  function load_yaml (line 164) | def load_yaml(

FILE: lm_eval/tasks/aclue/_generate_configs.py
  function parse_args (line 35) | def parse_args():

FILE: lm_eval/tasks/acpbench/gen_2shot/acp_utils.py
  class ACPBench_Visitor (line 47) | class ACPBench_Visitor(Visitor):
    method __init__ (line 48) | def __init__(self) -> None:
    method action_list (line 56) | def action_list(self, tree):
    method prog_list (line 59) | def prog_list(self, tree):
    method progression_list (line 64) | def progression_list(self, tree):
    method action_none (line 67) | def action_none(self, tree):
    method action_name (line 70) | def action_name(self, tree):
    method index (line 78) | def index(self, tree):
  class ACPGrammarParser (line 84) | class ACPGrammarParser(object):
    method __init__ (line 85) | def __init__(self, task) -> None:
    method parse (line 91) | def parse(self, input, debug=False):
  function is_on_optimal_plan (line 135) | def is_on_optimal_plan(domain, problem, action, opt):
  function is_plan (line 177) | def is_plan(domain, problem, new_plan):
  function get_action_preconditions (line 196) | def get_action_preconditions(domain, problem, action):
  function generate_optimal_plans_for_problem_state (line 207) | def generate_optimal_plans_for_problem_state(P, state, num_plans, timeout):
  function generate_top_q_plans (line 228) | def generate_top_q_plans(domain, problem, num_plans=10, quality_bound=1....
  function is_unsolvable_new_goal (line 241) | def is_unsolvable_new_goal(domain, problem, new_goal):
  function is_unsolvable (line 247) | def is_unsolvable(domain, problem):
  function extract_goal (line 274) | def extract_goal(prob):
  function entails (line 288) | def entails(state, partialstate):
  function progress (line 292) | def progress(state, act):
  function regress (line 302) | def regress(state, act):
  function get_STRIPS (line 312) | def get_STRIPS(domain, problem):
  function create_tmp_dom_prob_replace_init (line 330) | def create_tmp_dom_prob_replace_init(P, state, result_domain_file, resul...
  function fix_name (line 340) | def fix_name(s):
  function get_atoms_pddl (line 354) | def get_atoms_pddl(d, p, atoms):
  class Action (line 390) | class Action:
    method __init__ (line 391) | def __init__(self, name, pre, add, delete):
    method __str__ (line 397) | def __str__(self):
    method toJSON (line 404) | def toJSON(self):
    method __repr__ (line 416) | def __repr__(self):
    method __eq__ (line 419) | def __eq__(self, action):
    method __hash__ (line 422) | def __hash__(self):
  class STRIPS (line 426) | class STRIPS:
    method __init__ (line 427) | def __init__(self, domain, problem):
    method __str__ (line 453) | def __str__(self):
    method toJSON (line 460) | def toJSON(self):
    method operator_to_action (line 473) | def operator_to_action(self, op, check_fluents=True, check_static=False):
    method fix_pre_name (line 488) | def fix_pre_name(self, precondition):
    method action (line 493) | def action(self, name):
    method get_action_or_none (line 496) | def get_action_or_none(self, name):
    method fluent (line 501) | def fluent(self, name):
    method static_symbols (line 504) | def static_symbols(self):
    method fluent_symbols (line 507) | def fluent_symbols(self):
    method get_grounded_atoms (line 510) | def get_grounded_atoms(self, symbol):
    method get_applicable_actions (line 523) | def get_applicable_actions(self, s):
    method ground_problem (line 526) | def ground_problem(self, problem):
    method get_static (line 551) | def get_static(self):
    method PDDL_replace_init_pddl_parser (line 558) | def PDDL_replace_init_pddl_parser(self, s):
  function parse_ans (line 571) | def parse_ans(response: str, parser: ACPGrammarParser, task: str):
  function remove_garbage (line 582) | def remove_garbage(s):
  function compare_str (line 593) | def compare_str(s1, s2):
  function compare (line 597) | def compare(l1, l2):
  function check_prog_response (line 608) | def check_prog_response(resp):
  function clean_answer (line 618) | def clean_answer(resp, task):
  function get_grammar_task (line 642) | def get_grammar_task(task):
  function fix_action_name (line 666) | def fix_action_name(a):
  function str_remove_before_first_parentheses (line 671) | def str_remove_before_first_parentheses(s):
  function str_remove_after_last_parentheses (line 680) | def str_remove_after_last_parentheses(s):
  function cleanup_answer (line 691) | def cleanup_answer(ans):
  function set_equal (line 710) | def set_equal(ans1, ans2):
  class BaseEvaluator (line 714) | class BaseEvaluator(ABC):
    method __init__ (line 715) | def __init__(self) -> None:
    method get_score (line 719) | def get_score(self, ans, doc):
    method add_scores (line 722) | def add_scores(self, scores):
    method get_avg_score (line 725) | def get_avg_score(self):
  function get_evaluator (line 730) | def get_evaluator(group):
  class ActionReachabilityEvaluator (line 757) | class ActionReachabilityEvaluator(BaseEvaluator):
    method get_score (line 758) | def get_score(self, ans, doc):
  class ApplicabilityEvaluator (line 801) | class ApplicabilityEvaluator(BaseEvaluator):
    method get_score (line 802) | def get_score(self, ans, doc):
  function is_subsequence (line 817) | def is_subsequence(plan, new_plan):
  function is_subsequence_and_plan (line 828) | def is_subsequence_and_plan(domain, problem, plan, new_plan):
  class JustificationEvaluator (line 842) | class JustificationEvaluator(BaseEvaluator):
    method get_score (line 843) | def get_score(self, ans, doc):
  class LandmarksEvaluator (line 883) | class LandmarksEvaluator(BaseEvaluator):
    method get_score (line 884) | def get_score(self, ans, doc):
  class NextActionEvaluator (line 916) | class NextActionEvaluator(BaseEvaluator):
    method get_score (line 917) | def get_score(self, ans, doc):
  class ProgressionEvaluator (line 961) | class ProgressionEvaluator(BaseEvaluator):
    method get_score (line 962) | def get_score(self, ans, doc):
  class ReachabilityEvaluator (line 992) | class ReachabilityEvaluator(BaseEvaluator):
    method get_score (line 993) | def get_score(self, ans, doc):
  class ValidationEvaluator (line 1029) | class ValidationEvaluator(BaseEvaluator):
    method get_score (line 1030) | def get_score(self, ans, doc):
  function dump_item (line 1049) | def dump_item(item, **kwargs):
  function parse_prediction (line 1053) | def parse_prediction(prediction):
  class ACPGrammarFilter (line 1064) | class ACPGrammarFilter(RegexFilter):
    method __init__ (line 1067) | def __init__(self, *args, **kwargs):
    method clean_pos_neg (line 1071) | def clean_pos_neg(self, resp):
    method clean_simplified_plan (line 1082) | def clean_simplified_plan(self, resp):
    method apply (line 1091) | def apply(self, resps, docs):
  function process_acp_results (line 1107) | def process_acp_results(doc, results):
  function get_score (line 1111) | def get_score(references, predictions, **kwargs):

FILE: lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py
  class ACPBench_Visitor (line 47) | class ACPBench_Visitor(Visitor):
    method __init__ (line 48) | def __init__(self) -> None:
    method action_list (line 56) | def action_list(self, tree):
    method prog_list (line 59) | def prog_list(self, tree):
    method progression_list (line 64) | def progression_list(self, tree):
    method action_none (line 67) | def action_none(self, tree):
    method action_name (line 70) | def action_name(self, tree):
    method index (line 78) | def index(self, tree):
  class ACPGrammarParser (line 84) | class ACPGrammarParser(object):
    method __init__ (line 85) | def __init__(self, task) -> None:
    method parse (line 91) | def parse(self, input, debug=False):
  function is_on_optimal_plan (line 135) | def is_on_optimal_plan(domain, problem, action, opt):
  function is_plan (line 177) | def is_plan(domain, problem, new_plan):
  function get_action_preconditions (line 196) | def get_action_preconditions(domain, problem, action):
  function generate_optimal_plans_for_problem_state (line 207) | def generate_optimal_plans_for_problem_state(P, state, num_plans, timeout):
  function generate_top_q_plans (line 228) | def generate_top_q_plans(domain, problem, num_plans=10, quality_bound=1....
  function is_unsolvable_new_goal (line 241) | def is_unsolvable_new_goal(domain, problem, new_goal):
  function is_unsolvable (line 247) | def is_unsolvable(domain, problem):
  function extract_goal (line 274) | def extract_goal(prob):
  function entails (line 288) | def entails(state, partialstate):
  function progress (line 292) | def progress(state, act):
  function regress (line 302) | def regress(state, act):
  function get_STRIPS (line 312) | def get_STRIPS(domain, problem):
  function create_tmp_dom_prob_replace_init (line 330) | def create_tmp_dom_prob_replace_init(P, state, result_domain_file, resul...
  function fix_name (line 340) | def fix_name(s):
  function get_atoms_pddl (line 354) | def get_atoms_pddl(d, p, atoms):
  class Action (line 390) | class Action:
    method __init__ (line 391) | def __init__(self, name, pre, add, delete):
    method __str__ (line 397) | def __str__(self):
    method toJSON (line 404) | def toJSON(self):
    method __repr__ (line 416) | def __repr__(self):
    method __eq__ (line 419) | def __eq__(self, action):
    method __hash__ (line 422) | def __hash__(self):
  class STRIPS (line 426) | class STRIPS:
    method __init__ (line 427) | def __init__(self, domain, problem):
    method __str__ (line 453) | def __str__(self):
    method toJSON (line 460) | def toJSON(self):
    method operator_to_action (line 473) | def operator_to_action(self, op, check_fluents=True, check_static=False):
    method fix_pre_name (line 488) | def fix_pre_name(self, precondition):
    method action (line 493) | def action(self, name):
    method get_action_or_none (line 496) | def get_action_or_none(self, name):
    method fluent (line 501) | def fluent(self, name):
    method static_symbols (line 504) | def static_symbols(self):
    method fluent_symbols (line 507) | def fluent_symbols(self):
    method get_grounded_atoms (line 510) | def get_grounded_atoms(self, symbol):
    method get_applicable_actions (line 523) | def get_applicable_actions(self, s):
    method ground_problem (line 526) | def ground_problem(self, problem):
    method get_static (line 551) | def get_static(self):
    method PDDL_replace_init_pddl_parser (line 558) | def PDDL_replace_init_pddl_parser(self, s):
  function parse_ans (line 571) | def parse_ans(response: str, parser: ACPGrammarParser, task: str):
  function remove_garbage (line 582) | def remove_garbage(s):
  function compare_str (line 593) | def compare_str(s1, s2):
  function compare (line 597) | def compare(l1, l2):
  function check_prog_response (line 608) | def check_prog_response(resp):
  function clean_answer (line 618) | def clean_answer(resp, task):
  function get_grammar_task (line 642) | def get_grammar_task(task):
  function fix_action_name (line 666) | def fix_action_name(a):
  function str_remove_before_first_parentheses (line 671) | def str_remove_before_first_parentheses(s):
  function str_remove_after_last_parentheses (line 680) | def str_remove_after_last_parentheses(s):
  function cleanup_answer (line 691) | def cleanup_answer(ans):
  function set_equal (line 710) | def set_equal(ans1, ans2):
  class BaseEvaluator (line 714) | class BaseEvaluator(ABC):
    method __init__ (line 715) | def __init__(self) -> None:
    method get_score (line 719) | def get_score(self, ans, doc):
    method add_scores (line 722) | def add_scores(self, scores):
    method get_avg_score (line 725) | def get_avg_score(self):
  function get_evaluator (line 730) | def get_evaluator(group):
  class ActionReachabilityEvaluator (line 757) | class ActionReachabilityEvaluator(BaseEvaluator):
    method get_score (line 758) | def get_score(self, ans, doc):
  class ApplicabilityEvaluator (line 801) | class ApplicabilityEvaluator(BaseEvaluator):
    method get_score (line 802) | def get_score(self, ans, doc):
  function is_subsequence (line 817) | def is_subsequence(plan, new_plan):
  function is_subsequence_and_plan (line 828) | def is_subsequence_and_plan(domain, problem, plan, new_plan):
  class JustificationEvaluator (line 842) | class JustificationEvaluator(BaseEvaluator):
    method get_score (line 843) | def get_score(self, ans, doc):
  class LandmarksEvaluator (line 883) | class LandmarksEvaluator(BaseEvaluator):
    method get_score (line 884) | def get_score(self, ans, doc):
  class NextActionEvaluator (line 916) | class NextActionEvaluator(BaseEvaluator):
    method get_score (line 917) | def get_score(self, ans, doc):
  class ProgressionEvaluator (line 961) | class ProgressionEvaluator(BaseEvaluator):
    method get_score (line 962) | def get_score(self, ans, doc):
  class ReachabilityEvaluator (line 992) | class ReachabilityEvaluator(BaseEvaluator):
    method get_score (line 993) | def get_score(self, ans, doc):
  class ValidationEvaluator (line 1029) | class ValidationEvaluator(BaseEvaluator):
    method get_score (line 1030) | def get_score(self, ans, doc):
  function dump_item (line 1049) | def dump_item(item, **kwargs):
  function parse_prediction (line 1053) | def parse_prediction(prediction):
  class ACPGrammarFilter (line 1064) | class ACPGrammarFilter(RegexFilter):
    method __init__ (line 1067) | def __init__(self, *args, **kwargs):
    method clean_pos_neg (line 1071) | def clean_pos_neg(self, resp):
    method clean_simplified_plan (line 1082) | def clean_simplified_plan(self, resp):
    method apply (line 1091) | def apply(self, resps, docs):
  function process_acp_results (line 1107) | def process_acp_results(doc, results):
  function get_score (line 1111) | def get_score(references, predictions, **kwargs):

FILE: lm_eval/tasks/afrimgsm/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 22) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 96) | def main() -> None:

FILE: lm_eval/tasks/afrimgsm/utils.py
  function add_regex_pattern (line 75) | def add_regex_pattern(regex_pattern):
  function gen_lang_yamls (line 109) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 195) | def main() -> None:

FILE: lm_eval/tasks/afrimmlu/direct/prompt_1/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/direct/prompt_2/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/direct/prompt_3/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/direct/prompt_4/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/direct/prompt_5/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function gen_lang_yamls (line 12) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 77) | def main() -> None:

FILE: lm_eval/tasks/afrimmlu/translate/prompt_1/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/translate/prompt_2/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/translate/prompt_3/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/translate/prompt_4/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/translate/prompt_5/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrimmlu/utils.py
  function doc_to_choice (line 4) | def doc_to_choice(doc):
  function doc_to_text (line 9) | def doc_to_text(doc):

FILE: lm_eval/tasks/afrixnli/anli prompt/en-direct/utils.py
  function doc_to_target (line 4) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/anli prompt/translate/utils.py
  function doc_to_target (line 4) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/direct/prompt_1/utils.py
  function doc_to_text (line 4) | def doc_to_text(doc):
  function doc_to_target (line 17) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/direct/prompt_2/utils.py
  function doc_to_target (line 4) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/direct/prompt_3/utils.py
  function doc_to_target (line 4) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/direct/prompt_4/utils.py
  function doc_to_text (line 4) | def doc_to_text(doc):
  function doc_to_target (line 17) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/direct/prompt_5/utils.py
  function doc_to_target (line 4) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 30) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 103) | def main() -> None:

FILE: lm_eval/tasks/afrixnli/lai prompt/direct/utils.py
  function doc_to_text (line 4) | def doc_to_text(doc):
  function doc_to_target (line 17) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/lai prompt/translate/utils.py
  function doc_to_text (line 4) | def doc_to_text(doc):
  function doc_to_target (line 17) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/translate/prompt_1/utils.py
  function doc_to_text (line 4) | def doc_to_text(doc):
  function doc_to_target (line 17) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/translate/prompt_2/utils.py
  function doc_to_target (line 4) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/translate/prompt_3/utils.py
  function doc_to_text (line 4) | def doc_to_text(doc):
  function doc_to_target (line 19) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/translate/prompt_4/utils.py
  function doc_to_text (line 4) | def doc_to_text(doc):
  function doc_to_target (line 17) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/translate/prompt_5/utils.py
  function doc_to_target (line 4) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrixnli/utils.py
  class FunctionTag (line 6) | class FunctionTag:
    method __init__ (line 7) | def __init__(self, value):
  function gen_lang_yamls (line 123) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 211) | def main() -> None:

FILE: lm_eval/tasks/afrobench/adr/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 30) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 79) | def main() -> None:

FILE: lm_eval/tasks/afrobench/afriqa/prompt_1/utils.py
  function normalize_answer (line 6) | def normalize_answer(s):
  function f1 (line 28) | def f1(items):

FILE: lm_eval/tasks/afrobench/afriqa/prompt_2/utils.py
  function normalize_answer (line 6) | def normalize_answer(s):
  function f1 (line 28) | def f1(items):

FILE: lm_eval/tasks/afrobench/afriqa/prompt_3/utils.py
  function normalize_answer (line 6) | def normalize_answer(s):
  function f1 (line 28) | def f1(items):

FILE: lm_eval/tasks/afrobench/afriqa/prompt_4/utils.py
  function normalize_answer (line 6) | def normalize_answer(s):
  function f1 (line 28) | def f1(items):

FILE: lm_eval/tasks/afrobench/afriqa/prompt_5/utils.py
  function normalize_answer (line 6) | def normalize_answer(s):
  function f1 (line 28) | def f1(items):

FILE: lm_eval/tasks/afrobench/afriqa/utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 43) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 99) | def main() -> None:

FILE: lm_eval/tasks/afrobench/afrisenti/utils.py
  class FunctionTag (line 6) | class FunctionTag:
    method __init__ (line 7) | def __init__(self, value):
  function prompt_func (line 11) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 35) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 98) | def main() -> None:

FILE: lm_eval/tasks/afrobench/belebele/utils.py
  function prompt_func (line 7) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 18) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 129) | def main() -> None:

FILE: lm_eval/tasks/afrobench/flores/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang, lang_dict):
  function gen_lang_yamls (line 33) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str, reverse:...
  function main (line 165) | def main() -> None:

FILE: lm_eval/tasks/afrobench/injongointent/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang, intent):
  function gen_lang_yamls (line 29) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 133) | def main() -> None:

FILE: lm_eval/tasks/afrobench/mafand/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang, lang_dict):
  function gen_lang_yamls (line 35) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str, reverse:...
  function main (line 110) | def main() -> None:

FILE: lm_eval/tasks/afrobench/mafand/prompt_1/african-english/utils.py
  function get_target (line 26) | def get_target(doc):
  function get_target_reverse (line 35) | def get_target_reverse(doc):
  function create_text_prompt_1 (line 43) | def create_text_prompt_1(doc):
  function create_reverse_prompt_1 (line 57) | def create_reverse_prompt_1(doc):
  function create_text_prompt_2 (line 72) | def create_text_prompt_2(doc):
  function create_reverse_prompt_2 (line 84) | def create_reverse_prompt_2(doc):
  function create_text_prompt_3 (line 97) | def create_text_prompt_3(doc):
  function create_reverse_prompt_3 (line 110) | def create_reverse_prompt_3(doc):

FILE: lm_eval/tasks/afrobench/mafand/prompt_1/english-african/utils.py
  function get_target (line 26) | def get_target(doc):
  function get_target_reverse (line 35) | def get_target_reverse(doc):
  function create_text_prompt_1 (line 43) | def create_text_prompt_1(doc):
  function create_reverse_prompt_1 (line 57) | def create_reverse_prompt_1(doc):
  function create_text_prompt_2 (line 72) | def create_text_prompt_2(doc):
  function create_reverse_prompt_2 (line 84) | def create_reverse_prompt_2(doc):
  function create_text_prompt_3 (line 97) | def create_text_prompt_3(doc):
  function create_reverse_prompt_3 (line 110) | def create_reverse_prompt_3(doc):

FILE: lm_eval/tasks/afrobench/mafand/prompt_2/african-english/utils.py
  function get_target (line 26) | def get_target(doc):
  function get_target_reverse (line 35) | def get_target_reverse(doc):
  function create_text_prompt_1 (line 43) | def create_text_prompt_1(doc):
  function create_reverse_prompt_1 (line 57) | def create_reverse_prompt_1(doc):
  function create_text_prompt_2 (line 72) | def create_text_prompt_2(doc):
  function create_reverse_prompt_2 (line 84) | def create_reverse_prompt_2(doc):
  function create_text_prompt_3 (line 97) | def create_text_prompt_3(doc):
  function create_reverse_prompt_3 (line 110) | def create_reverse_prompt_3(doc):

FILE: lm_eval/tasks/afrobench/mafand/prompt_2/english-african/utils.py
  function get_target (line 26) | def get_target(doc):
  function get_target_reverse (line 35) | def get_target_reverse(doc):
  function create_text_prompt_1 (line 43) | def create_text_prompt_1(doc):
  function create_reverse_prompt_1 (line 57) | def create_reverse_prompt_1(doc):
  function create_text_prompt_2 (line 72) | def create_text_prompt_2(doc):
  function create_reverse_prompt_2 (line 84) | def create_reverse_prompt_2(doc):
  function create_text_prompt_3 (line 97) | def create_text_prompt_3(doc):
  function create_reverse_prompt_3 (line 110) | def create_reverse_prompt_3(doc):

FILE: lm_eval/tasks/afrobench/mafand/prompt_3/african-english/utils.py
  function get_target (line 26) | def get_target(doc):
  function get_target_reverse (line 35) | def get_target_reverse(doc):
  function create_text_prompt_1 (line 43) | def create_text_prompt_1(doc):
  function create_reverse_prompt_1 (line 57) | def create_reverse_prompt_1(doc):
  function create_text_prompt_2 (line 72) | def create_text_prompt_2(doc):
  function create_reverse_prompt_2 (line 84) | def create_reverse_prompt_2(doc):
  function create_text_prompt_3 (line 97) | def create_text_prompt_3(doc):
  function create_reverse_prompt_3 (line 110) | def create_reverse_prompt_3(doc):

FILE: lm_eval/tasks/afrobench/mafand/prompt_3/english-african/utils.py
  function get_target (line 26) | def get_target(doc):
  function get_target_reverse (line 35) | def get_target_reverse(doc):
  function create_text_prompt_1 (line 43) | def create_text_prompt_1(doc):
  function create_reverse_prompt_1 (line 57) | def create_reverse_prompt_1(doc):
  function create_text_prompt_2 (line 72) | def create_text_prompt_2(doc):
  function create_reverse_prompt_2 (line 84) | def create_reverse_prompt_2(doc):
  function create_text_prompt_3 (line 97) | def create_text_prompt_3(doc):
  function create_reverse_prompt_3 (line 110) | def create_reverse_prompt_3(doc):

FILE: lm_eval/tasks/afrobench/masakhaner/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 48) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 112) | def main() -> None:

FILE: lm_eval/tasks/afrobench/masakhaner/prompt_1/utils.py
  function doc_to_target (line 7) | def doc_to_target(doc):
  function transform_text (line 11) | def transform_text(text):
  function span_f1_agg (line 42) | def span_f1_agg(items):

FILE: lm_eval/tasks/afrobench/masakhaner/prompt_2/utils.py
  function doc_to_target (line 7) | def doc_to_target(doc):
  function transform_text (line 11) | def transform_text(text):
  function span_f1_agg (line 42) | def span_f1_agg(items):

FILE: lm_eval/tasks/afrobench/masakhaner/prompt_3/utils.py
  function doc_to_target (line 7) | def doc_to_target(doc):
  function transform_text (line 11) | def transform_text(text):
  function span_f1_agg (line 42) | def span_f1_agg(items):

FILE: lm_eval/tasks/afrobench/masakhaner/prompt_4/utils.py
  function doc_to_target (line 7) | def doc_to_target(doc):
  function transform_text (line 11) | def transform_text(text):
  function span_f1_agg (line 42) | def span_f1_agg(items):

FILE: lm_eval/tasks/afrobench/masakhaner/prompt_5/utils.py
  function doc_to_target (line 7) | def doc_to_target(doc):
  function transform_text (line 11) | def transform_text(text):
  function span_f1_agg (line 42) | def span_f1_agg(items):

FILE: lm_eval/tasks/afrobench/masakhanews/utils.py
  function prompt_func (line 7) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 35) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 97) | def main() -> None:

FILE: lm_eval/tasks/afrobench/masakhapos/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 61) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 125) | def main() -> None:

FILE: lm_eval/tasks/afrobench/masakhapos/prompt_1/utils.py
  function doc_to_target (line 8) | def doc_to_target(doc):
  function acc_score (line 32) | def acc_score(items):

FILE: lm_eval/tasks/afrobench/masakhapos/prompt_2/utils.py
  function doc_to_target (line 8) | def doc_to_target(doc):
  function acc_score (line 32) | def acc_score(items):

FILE: lm_eval/tasks/afrobench/masakhapos/prompt_3/utils.py
  function doc_to_target (line 8) | def doc_to_target(doc):
  function acc_score (line 32) | def acc_score(items):

FILE: lm_eval/tasks/afrobench/masakhapos/prompt_4/utils.py
  function doc_to_target (line 8) | def doc_to_target(doc):
  function acc_score (line 32) | def acc_score(items):

FILE: lm_eval/tasks/afrobench/masakhapos/prompt_5/utils.py
  function doc_to_target (line 8) | def doc_to_target(doc):
  function acc_score (line 32) | def acc_score(items):

FILE: lm_eval/tasks/afrobench/masakhapos/utils.py
  function doc_to_text (line 4) | def doc_to_text(doc):
  function doc_to_target (line 19) | def doc_to_target(doc):

FILE: lm_eval/tasks/afrobench/naijarc/utils.py
  function prompt_func (line 7) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 18) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 67) | def main() -> None:

FILE: lm_eval/tasks/afrobench/ntrex/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang, lang_dict):
  function gen_lang_yamls (line 33) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str, reverse:...
  function main (line 134) | def main() -> None:

FILE: lm_eval/tasks/afrobench/openai_mmlu/utils.py
  function prompt_func (line 7) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 18) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 73) | def main() -> None:

FILE: lm_eval/tasks/afrobench/salt/gen_utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang, lang_dict):
  function gen_lang_yamls (line 34) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str, reverse:...
  function main (line 112) | def main() -> None:

FILE: lm_eval/tasks/afrobench/sib/utils.py
  class FunctionTag (line 7) | class FunctionTag:
    method __init__ (line 8) | def __init__(self, value):
  function prompt_func (line 12) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 40) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 201) | def main() -> None:

FILE: lm_eval/tasks/afrobench/uhura-arc-easy/utils.py
  function get_language_from_code (line 8) | def get_language_from_code(code: str) -> str:
  function prompt_func (line 13) | def prompt_func(mode):
  function gen_lang_yamls (line 51) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 99) | def main() -> None:

FILE: lm_eval/tasks/afrobench/xlsum/prompt_1/utils.py
  function rougeL (line 4) | def rougeL(items):
  function rougeL_agg (line 11) | def rougeL_agg(items):

FILE: lm_eval/tasks/afrobench/xlsum/prompt_2/utils.py
  function rougeL (line 4) | def rougeL(items):
  function rougeL_agg (line 11) | def rougeL_agg(items):

FILE: lm_eval/tasks/afrobench/xlsum/prompt_3/utils.py
  function rougeL (line 4) | def rougeL(items):
  function rougeL_agg (line 11) | def rougeL_agg(items):

FILE: lm_eval/tasks/afrobench/xlsum/utils.py
  function prompt_func (line 7) | def prompt_func(mode, lang):
  function gen_lang_yamls (line 29) | def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
  function main (line 88) | def main() -> None:

FILE: lm_eval/tasks/agieval/utils.py
  function parse_math_answer (line 10) | def parse_math_answer(raw_string):
  function _fix_fracs (line 82) | def _fix_fracs(string):
  function _fix_a_slash_b (line 114) | def _fix_a_slash_b(string):
  function _remove_right_units (line 129) | def _remove_right_units(string):
  function _fix_sqrt (line 139) | def _fix_sqrt(string):
  function _strip_string (line 154) | def _strip_string(string):
  function is_equiv (line 224) | def is_equiv(str1, str2, verbose=False):
  function process_results (line 243) | def process_results(doc: dict, results: List[str]) -> Dict[str, int]:
  function process_results_mcqa (line 262) | def process_results_mcqa(doc, results):

FILE: lm_eval/tasks/aime/utils.py
  function process_results (line 5) | def process_results(doc: dict, results: List[str]) -> Dict[str, int]:
  function is_equiv (line 36) | def is_equiv(str1, str2, verbose=False):
  function remove_boxed (line 53) | def remove_boxed(s):
  function last_boxed_only_string (line 67) | def last_boxed_only_string(string):
  function fix_fracs (line 97) | def fix_fracs(string):
  function fix_a_slash_b (line 129) | def fix_a_slash_b(string):
  function remove_right_units (line 144) | def remove_right_units(string):
  function fix_sqrt (line 154) | def fix_sqrt(string):
  function strip_string (line 169) | def strip_string(string):

FILE: lm_eval/tasks/arab_culture/_generate_configs.py
  function parse_args (line 34) | def parse_args():

FILE: lm_eval/tasks/arab_culture/utils_mcq.py
  function doc_to_text (line 49) | def doc_to_text(doc):
  function doc_to_choice (line 101) | def doc_to_choice(doc):
  function doc_to_target (line 105) | def doc_to_target(doc):

FILE: lm_eval/tasks/arab_culture_completion/_generate_configs.py
  function parse_args (line 34) | def parse_args():

FILE: lm_eval/tasks/arab_culture_completion/utils_completion.py
  function doc_to_text (line 52) | def doc_to_text(doc):
  function doc_to_choice (line 91) | def doc_to_choice(doc):
  function doc_to_target (line 96) | def doc_to_target(doc):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_alghafa/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_exams/utils.py
  function process_docs (line 15) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mmlu/utils.py
  function process_docs (line 15) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_arc_challenge/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_arc_easy/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_boolq/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_copa/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_hellaswag/utils.py
  function process_docs (line 7) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_mmlu/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_openbook_qa/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_piqa/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_race/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_sciq/utils.py
  function doc_to_text (line 7) | def doc_to_text(doc):
  function process_docs (line 24) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_arabic_mt_toxigen/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_alghafa_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_exams_light/utils.py
  function process_docs (line 15) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/utils.py
  function process_docs (line 15) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_arc_challenge_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_arc_easy_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_boolq_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_copa_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_hellaswag_light/utils.py
  function process_docs (line 7) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_mmlu_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_openbook_qa_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_piqa_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_race_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_sciq_light/utils.py
  function doc_to_text (line 7) | def doc_to_text(doc):
  function process_docs (line 24) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mt_toxigen_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/utils.py
  function process_docs (line 5) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/arabicmmlu/_generate_configs.py
  function parse_args (line 60) | def parse_args():

FILE: lm_eval/tasks/arabicmmlu/utils.py
  function doc_to_text (line 14) | def doc_to_text(doc):
  function doc_to_choice (line 43) | def doc_to_choice(doc):

FILE: lm_eval/tasks/aradice/ArabicMMLU/EGY/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/ArabicMMLU/EGY/utils.py
  function process_docs (line 51) | def process_docs(dataset):

FILE: lm_eval/tasks/aradice/ArabicMMLU/LEV/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/ArabicMMLU/LEV/utils.py
  function process_docs (line 50) | def process_docs(dataset):

FILE: lm_eval/tasks/aradice/boolq/EGY/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/boolq/EGY/utils.py
  function process_docs (line 4) | def process_docs(dataset):

FILE: lm_eval/tasks/aradice/boolq/ENG/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/boolq/ENG/utils.py
  function process_docs (line 4) | def process_docs(dataset):

FILE: lm_eval/tasks/aradice/boolq/LEV/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/boolq/LEV/utils.py
  function process_docs (line 4) | def process_docs(dataset):

FILE: lm_eval/tasks/aradice/boolq/MSA/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/boolq/MSA/utils.py
  function process_docs (line 4) | def process_docs(dataset):

FILE: lm_eval/tasks/aradice/cultural-benchmark/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/cultural-benchmark/utils.py
  function process_docs (line 1) | def process_docs(dataset):

FILE: lm_eval/tasks/aradice/openbookqa/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/openbookqa/utils.py
  function doc_to_target (line 1) | def doc_to_target(doc):
  function doc_to_choice (line 12) | def doc_to_choice(doc):
  function doc_to_text (line 17) | def doc_to_text(doc):

FILE: lm_eval/tasks/aradice/piqa/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/truthfulqa_mcq/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/winogrande/metrics.py
  function macro_f1_score (line 4) | def macro_f1_score(items):
  function micro_f1_score (line 12) | def micro_f1_score(items):
  function weighted_f1_score (line 20) | def weighted_f1_score(items):

FILE: lm_eval/tasks/aradice/winogrande/utils.py
  function doc_to_text (line 1) | def doc_to_text(doc):
  function doc_to_target (line 6) | def doc_to_target(doc):
  function doc_to_choice (line 11) | def doc_to_choice(doc):

FILE: lm_eval/tasks/babilong/common_utils.py
  function get_tokenizer (line 18) | def get_tokenizer(
  function postprocess_pred (line 27) | def postprocess_pred(prediction: list[str]) -> list[str]:
  function load_dataset (line 40) | def load_dataset(**kwargs):
  function process_results (line 55) | def process_results(doc: dict, results: list[str]) -> dict[str, float]:

FILE: lm_eval/tasks/basque_bench/flores_eu/create_yamls_flores_eu.py
  function doc_to_text (line 257) | def doc_to_text(src: str, tgt: str) -> str:
  function doc_to_target (line 265) | def doc_to_target(tgt: str) -> str:
  function gen_lang_yamls (line 272) | def gen_lang_yamls(output_dir: str, overwrite: bool) -> None:
  function main (line 316) | def main() -> None:

FILE: lm_eval/tasks/basque_bench/utils.py
  function xcopa_doc_to_text (line 6) | def xcopa_doc_to_text(doc):
  function xcopa_doc_to_choice (line 11) | def xcopa_doc_to_choice(doc):
  function paws_process_docs (line 21) | def paws_process_docs(dataset):

FILE: lm_eval/tasks/basqueglue/utils.py
  function general_detokenize (line 7) | def general_detokenize(string):
  function process_doc (line 16) | def process_doc(string):
  function process_wic_docs (line 22) | def process_wic_docs(dataset):
  function coref_doc_to_text (line 36) | def coref_doc_to_text(x):
  function micro_f1_score (line 62) | def micro_f1_score(items):
  function vaxx_f1_score (line 71) | def vaxx_f1_score(items):

FILE: lm_eval/tasks/bbh/_generate_configs.py
  function parse_args (line 15) | def parse_args():

FILE: lm_eval/tasks/bbh/cot_zeroshot/utils.py
  class ExtendedRegexFilter (line 9) | class ExtendedRegexFilter(RegexFilter):
    method __init__ (line 14) | def __init__(
    method filter_ignores (line 28) | def filter_ignores(self, st):
    method find_match (line 41) | def find_match(self, regex, resp, convert_dict={}):
  class MapRegexFilter (line 53) | class MapRegexFilter(ExtendedRegexFilter):
    method __init__ (line 54) | def __init__(
    method apply (line 82) | def apply(self, resps, docs):
  class NumberParseRegexFilter (line 109) | class NumberParseRegexFilter(ExtendedRegexFilter):
    method apply (line 110) | def apply(self, resps, docs):
  class WordSortFilter (line 140) | class WordSortFilter(Filter):
    method apply (line 143) | def apply(self, resps, docs):
  class MultiChoiceRegexFilter (line 162) | class MultiChoiceRegexFilter(ExtendedRegexFilter):
    method __init__ (line 163) | def __init__(self, *args, **kwargs):
    method apply (line 175) | def apply(self, resps, docs):

FILE: lm_eval/tasks/bbh/zeroshot/utils.py
  class ExtendedRegexFilter (line 9) | class ExtendedRegexFilter(RegexFilter):
    method __init__ (line 14) | def __init__(
    method filter_ignores (line 28) | def filter_ignores(self, st):
    method find_match (line 41) | def find_match(self, regex, resp, convert_dict={}):
  class MapRegexFilter (line 53) | class MapRegexFilter(ExtendedRegexFilter):
    method __init__ (line 54) | def __init__(
    method apply (line 82) | def apply(self, resps, docs):
  class NumberParseRegexFilter (line 109) | class NumberParseRegexFilter(ExtendedRegexFilter):
    method apply (line 110) | def apply(self, resps, docs):
  class WordSortFilter (line 140) | class WordSortFilter(Filter):
    method apply (line 143) | def apply(self, resps, docs):
  class MultiChoiceRegexFilter (line 162) | class MultiChoiceRegexFilter(ExtendedRegexFilter):
    method __init__ (line 163) | def __init__(self, *args, **kwargs):
    method apply (line 175) | def apply(self, resps, docs):

FILE: lm_eval/tasks/bbq/utils.py
  function agg_accuracy_amb (line 33) | def agg_accuracy_amb(arr):
  function agg_accuracy_disamb (line 42) | def agg_accuracy_disamb(arr):
  function agg_disamb_bias_scores (line 51) | def agg_disamb_bias_scores(arr):
  function agg_amb_bias_scores (line 84) | def agg_amb_bias_scores(arr):
  function _process_results (line 110) | def _process_results(doc, answer: int):
  function _clean_answer (line 193) | def _clean_answer(answer: str):
  function _check_unk_answer (line 204) | def _check_unk_answer(answer: str):
  function process_results_generate_until (line 212) | def process_results_generate_until(doc, results):
  function process_results_multiple_choice (line 242) | def process_results_multiple_choice(doc, results):
  function doc_to_biased_answer (line 255) | def doc_to_biased_answer(doc):
  function _process_groups_in_answers (line 265) | def _process_groups_in_answers(string):
  function process_docs (line 300) | def process_docs(dataset: datasets.Dataset):
  function filter_dataset_context (line 362) | def filter_dataset_context(dataset: datasets.Dataset, context: str) -> d...
  function process_docs_ambig (line 368) | def process_docs_ambig(dataset: datasets.Dataset):
  function process_docs_disambig (line 372) | def process_docs_disambig(dataset: datasets.Dataset):
  function doc_to_choice (line 376) | def doc_to_choice(doc):
  function _doc_to_choice_groups (line 385) | def _doc_to_choice_groups(doc):
  function doc_to_targets (line 397) | def doc_to_targets(doc):
  function doc_to_target (line 412) | def doc_to_target(doc):
  function filter_dataset (line 417) | def filter_dataset(dataset: datasets.Dataset, bias_type: str) -> dataset...
  function filter_race_color (line 421) | def filter_race_color(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/belebele/_generate_configs.py
  function parse_args (line 18) | def parse_args():
  function query (line 41) | def query():

FILE: lm_eval/tasks/bigbench/generate_tasks.py
  function main (line 183) | def main() -> None:

FILE: lm_eval/tasks/blimp/generate_configs.py
  function main (line 75) | def main() -> None:

FILE: lm_eval/tasks/c4/preprocess_c4.py
  function c4_detokenizer (line 4) | def c4_detokenizer(doc):
  function process_results (line 39) | def process_results(doc, results):

FILE: lm_eval/tasks/cabbq/utils.py
  function _model_answer (line 6) | def _model_answer(lls):
  function _model_answer_type (line 25) | def _model_answer_type(doc, model_answer):
  function process_results (line 75) | def process_results(doc, results):
  function acc_ambig_agg (line 137) | def acc_ambig_agg(results):
  function acc_disambig_agg (line 159) | def acc_disambig_agg(results):
  function bias_score_ambig_agg (line 181) | def bias_score_ambig_agg(results):
  function bias_score_disambig_agg (line 212) | def bias_score_disambig_agg(results):

FILE: lm_eval/tasks/careqa/utils.py
  function doc_to_text (line 1) | def doc_to_text(doc) -> str:
  function doc_to_target (line 39) | def doc_to_target(doc) -> int:

FILE: lm_eval/tasks/careqa/utils_open.py
  function doc_eval (line 22) | def doc_eval(pred, refs):
  function doc_to_text (line 65) | def doc_to_text(doc) -> str:
  function doc_to_target (line 69) | def doc_to_target(doc) -> str:
  function process_results_gen (line 73) | def process_results_gen(doc, results):
  function process_results_gen_w_repeats (line 98) | def process_results_gen_w_repeats(doc, results):

FILE: lm_eval/tasks/careqa/utils_perplexity.py
  function doc_to_target (line 5) | def doc_to_target(doc) -> str:
  function process_results (line 9) | def process_results(doc, results):

FILE: lm_eval/tasks/catalan_bench/flores_ca/create_yamls_flores_ca.py
  function code_to_language_name (line 246) | def code_to_language_name(code):
  function code_to_short_name (line 250) | def code_to_short_name(code):
  function jinja_var (line 254) | def jinja_var(s):
  function doc_to_text (line 258) | def doc_to_text(src: str, tgt: str) -> str:
  function doc_to_target (line 266) | def doc_to_target(tgt: str) -> str:
  function gen_lang_yamls (line 273) | def gen_lang_yamls(output_dir: str, overwrite: bool) -> None:
  function main (line 317) | def main() -> None:

FILE: lm_eval/tasks/catalan_bench/truthfulqa_va/utils.py
  function lowercase_first_letter (line 14) | def lowercase_first_letter(text):
  function process_summarization (line 18) | def process_summarization(dataset):
  function process_docs_paraphrases (line 28) | def process_docs_paraphrases(dataset):
  function process_docs_paws (line 56) | def process_docs_paws(dataset):
  function rouge1 (line 84) | def rouge1(items):
  function rouge1_agg (line 91) | def rouge1_agg(items):
  function process_results_mc2 (line 102) | def process_results_mc2(doc, results):
  function process_docs_gen (line 115) | def process_docs_gen(dataset: datasets.Dataset) -> datasets.Dataset:
  function preprocess_function_gen (line 119) | def preprocess_function_gen(examples):
  function process_doc_nli (line 143) | def process_doc_nli(dataset):
  function process_results_gen (line 170) | def process_results_gen(doc, results):
  function bleu (line 241) | def bleu(refs, preds):
  function rouge (line 263) | def rouge(refs, preds):

FILE: lm_eval/tasks/catalan_bench/utils.py
  function lowercase_first_letter (line 10) | def lowercase_first_letter(text):
  function process_doc_nli (line 14) | def process_doc_nli(dataset):
  function process_results_coqcat (line 38) | def process_results_coqcat(doc, results):
  function process_results_qa (line 72) | def process_results_qa(doc, results):
  function process_doc_cabreu (line 81) | def process_doc_cabreu(dataset):
  function process_docs_paraphrases (line 96) | def process_docs_paraphrases(dataset):
  function process_docs_copa_ca (line 119) | def process_docs_copa_ca(dataset):
  function rouge1 (line 128) | def rouge1(items):
  function rouge1_agg (line 135) | def rouge1_agg(items):

FILE: lm_eval/tasks/ceval/_generate_configs.py
  function parse_args (line 72) | def parse_args():

FILE: lm_eval/tasks/chartqa/utils.py
  function _normalize_string (line 6) | def _normalize_string(s):
  function _remove_end_punctuation (line 14) | def _remove_end_punctuation(unnormalized_string: str) -> str:
  class RelaxedCorrectness (line 27) | class RelaxedCorrectness:
    method _relaxed_correctness (line 39) | def _relaxed_correctness(
    method score (line 132) | def score(self, model_answer: str, reference_answer: str | list[str]) ...
  class ExplicitPromptRelaxedCorrectness (line 141) | class ExplicitPromptRelaxedCorrectness(RelaxedCorrectness):
    method name (line 145) | def name(self) -> str:
    method _get_final_answer (line 148) | def _get_final_answer(self, generation: str) -> str:
    method score (line 174) | def score(self, model_answer: str, reference_answer: str | list[str]) ...
  class AnywhereInAnswerRelaxedCorrectness (line 182) | class AnywhereInAnswerRelaxedCorrectness(ExplicitPromptRelaxedCorrectness):
    method name (line 189) | def name(self) -> str:
    method score (line 192) | def score(self, model_answer: str, reference_answer: str | list[str]) ...
  function exact_match (line 242) | def exact_match(references, predictions):
  function relaxed_accuracy (line 257) | def relaxed_accuracy(references, predictions):
  function anywhere_accuracy (line 268) | def anywhere_accuracy(references, predictions):

FILE: lm_eval/tasks/click/click_cul/utils.py
  function get_context (line 6) | def get_context(doc) -> str:
  function get_target (line 18) | def get_target(doc) -> str:
  function get_choices (line 25) | def get_choices(doc) -> List[str]:
  function extract_economy (line 31) | def extract_economy(dataset: Dataset) -> Dataset:
  function extract_geography (line 35) | def extract_geography(dataset: Dataset) -> Dataset:
  function extract_history (line 39) | def extract_history(dataset: Dataset) -> Dataset:
  function extract_law (line 45) | def extract_law(dataset: Dataset) -> Dataset:
  function extract_politics (line 51) | def extract_politics(dataset: Dataset) -> Dataset:
  function extract_kpop (line 55) | def extract_kpop(dataset: Dataset) -> Dataset:
  function extract_society (line 59) | def extract_society(dataset: Dataset) -> Dataset:
  function extract_tradition (line 63) | def extract_tradition(dataset: Dataset) -> Dataset:

FILE: lm_eval/tasks/click/click_lang/utils.py
  function get_context (line 6) | def get_context(doc) -> str:
  function get_target (line 18) | def get_target(doc) -> str:
  function get_choices (line 25) | def get_choices(doc) -> List[str]:
  function extract_text (line 31) | def extract_text(dataset: Dataset) -> Dataset:
  function extract_grammar (line 41) | def extract_grammar(dataset: Dataset) -> Dataset:
  function extract_function (line 65) | def extract_function(dataset: Dataset) -> Dataset:

FILE: lm_eval/tasks/cmmlu/_generate_configs.py
  function parse_args (line 87) | def parse_args():

FILE: lm_eval/tasks/cnn_dailymail/utils.py
  function normalize_text (line 27) | def normalize_text(text: str) -> str:
  function calculate_rouge_scores (line 44) | def calculate_rouge_scores(
  function calculate_bertscore (line 82) | def calculate_bertscore(
  function process_results (line 124) | def process_results(doc: Dict[str, Any], results: List[str]) -> Dict[str...
  function postprocess_generation (line 186) | def postprocess_generation(generation: str) -> str:
  function filter_long_articles (line 208) | def filter_long_articles(doc: Dict[str, Any]) -> bool:
  function doc_to_choice (line 224) | def doc_to_choice(doc: Dict[str, Any]) -> List[str]:
  function process_docs (line 237) | def process_docs(dataset):
  function calculate_summary_length (line 269) | def calculate_summary_length(generated: str) -> int:

FILE: lm_eval/tasks/code_x_glue/code-text/bleu.py
  function normalize (line 58) | def normalize(s):
  function count_ngrams (line 78) | def count_ngrams(words, n=4):
  function cook_refs (line 87) | def cook_refs(refs, n=4):
  function cook_test (line 101) | def cook_test(test, item, n=4):
  function score_cooked (line 132) | def score_cooked(allcomps, n=4, ground=0, smooth=1):
  function bleu (line 174) | def bleu(refs, candidate, ground=0, smooth=1):
  function splitPuncts (line 180) | def splitPuncts(line):
  function computeMaps (line 184) | def computeMaps(predictions, goldfile):
  function bleuFromMaps (line 210) | def bleuFromMaps(m1, m2):
  function smoothed_bleu_4 (line 222) | def smoothed_bleu_4(references, predictions, **kwargs):

FILE: lm_eval/tasks/code_x_glue/code-text/utils.py
  function doc_to_text (line 1) | def doc_to_text(doc):
  function doc_to_target (line 8) | def doc_to_target(doc):

FILE: lm_eval/tasks/common_voice/utils.py
  function doc_to_text (line 10) | def doc_to_text(doc: Dict[str, Any]) -> str:
  function doc_to_audio (line 14) | def doc_to_audio(doc: Dict[str, Any]) -> List[dict]:

FILE: lm_eval/tasks/copal_id/utils.py
  function convert_choice (line 4) | def convert_choice(choice):
  function doc_to_text (line 8) | def doc_to_text(doc, connector):
  function doc_to_choice (line 13) | def doc_to_choice(doc):

FILE: lm_eval/tasks/coqa/utils.py
  function doc_to_text (line 6) | def doc_to_text(doc):
  function doc_to_target (line 19) | def doc_to_target(doc):
  function em (line 37) | def em(gold_list, pred):
  function compute_scores (line 51) | def compute_scores(gold_list, pred):
  function process_results (line 72) | def process_results(doc, results):

FILE: lm_eval/tasks/crows_pairs/utils.py
  function process_results (line 4) | def process_results(doc, results):
  function doc_to_choice (line 19) | def doc_to_choice(doc):
  function filter_dataset (line 23) | def filter_dataset(dataset: datasets.Dataset, bias_type: str) -> dataset...
  function filter_race_color (line 27) | def filter_race_color(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_socio (line 31) | def filter_socio(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_gender (line 35) | def filter_gender(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_age (line 39) | def filter_age(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_religion (line 43) | def filter_religion(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_disability (line 47) | def filter_disability(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_orientation (line 51) | def filter_orientation(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_nationality (line 55) | def filter_nationality(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_appearance (line 59) | def filter_appearance(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_autre (line 63) | def filter_autre(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/csatqa/_generate_configs.py
  function parse_args (line 19) | def parse_args():

FILE: lm_eval/tasks/csatqa/utils.py
  function process_docs (line 4) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/darija_bench/darija_sentiment/utils.py
  function doc_to_text (line 9) | def doc_to_text(doc):
  function doc_to_choice_3 (line 21) | def doc_to_choice_3(doc):
  function doc_to_choice_2 (line 25) | def doc_to_choice_2(doc):
  function doc_to_target (line 29) | def doc_to_target(doc):

FILE: lm_eval/tasks/darija_bench/darija_summarization/utils.py
  function strip (line 5) | def strip(resps, docs):
  function doc_to_text (line 12) | def doc_to_text(doc):
  function doc_to_target (line 19) | def doc_to_target(doc):
  function bert (line 23) | def bert(items):
  function Average (line 27) | def Average(lst):
  function darijabert (line 31) | def darijabert(items):
  function rouge1 (line 44) | def rouge1(items):
  function rougeL (line 48) | def rougeL(items):
  function rouge2 (line 52) | def rouge2(items):
  function rougeLsum (line 56) | def rougeLsum(items):
  function agg_rougelsum (line 60) | def agg_rougelsum(items):
  function agg_rouge1 (line 66) | def agg_rouge1(items):
  function agg_rouge2 (line 72) | def agg_rouge2(items):
  function agg_rougel (line 78) | def agg_rougel(items):

FILE: lm_eval/tasks/darija_bench/darija_translation/utils.py
  function strip (line 5) | def strip(resps, docs):
  function dr_fr (line 12) | def dr_fr(dataset: datasets.Dataset):
  function dr_en (line 16) | def dr_en(dataset: datasets.Dataset):
  function dr_msa (line 20) | def dr_msa(dataset: datasets.Dataset):
  function fr_dr (line 24) | def fr_dr(dataset: datasets.Dataset):
  function en_dr (line 28) | def en_dr(dataset: datasets.Dataset):
  function msa_dr (line 32) | def msa_dr(dataset: datasets.Dataset):
  function doc_to_text (line 46) | def doc_to_text(doc):
  function doc_to_target (line 51) | def doc_to_target(doc):
  function bert (line 55) | def bert(items):
  function Average (line 59) | def Average(lst):
  function camembert (line 63) | def camembert(items):
  function darijabert (line 76) | def darijabert(items):
  function arabert (line 89) | def arabert(items):
  function bertbase (line 102) | def bertbase(items):
  function mbert (line 115) | def mbert(items):

FILE: lm_eval/tasks/darija_bench/darija_transliteration/utils.py
  function strip (line 5) | def strip(resps, docs):
  function dr_ar (line 12) | def dr_ar(dataset: datasets.Dataset):
  function ar_dr (line 16) | def ar_dr(dataset: datasets.Dataset):
  function doc_to_text (line 20) | def doc_to_text(doc):
  function doc_to_target (line 25) | def doc_to_target(doc):
  function bert (line 29) | def bert(items):
  function Average (line 33) | def Average(lst):
  function arabizibert (line 37) | def arabizibert(items):
  function darijabert (line 50) | def darijabert(items):
  function mbert (line 63) | def mbert(items):

FILE: lm_eval/tasks/darijahellaswag/utils.py
  function process_docs (line 4) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/darijammlu/_generate_configs.py
  function parse_args (line 73) | def parse_args():

FILE: lm_eval/tasks/darijammlu/utils.py
  function doc_to_text (line 7) | def doc_to_text(doc):
  function doc_to_choice (line 24) | def doc_to_choice(doc):

FILE: lm_eval/tasks/discrim_eval/utils.py
  function _logit (line 8) | def _logit(p: float) -> float:
  function process_results (line 30) | def process_results(
  function agg_demographic_bias_regression (line 63) | def agg_demographic_bias_regression(items: List[BiasTuple]) -> float:

FILE: lm_eval/tasks/drop/utils.py
  function process_docs (line 10) | def process_docs(dataset):
  function get_answers (line 22) | def get_answers(doc):
  function parse_answer (line 51) | def parse_answer(answer):
  function process_results (line 64) | def process_results(doc, results):
  function get_metrics (line 76) | def get_metrics(predicted, gold):
  function _answer_to_bags (line 100) | def _answer_to_bags(answer):
  function _align_bags (line 114) | def _align_bags(predicted, gold):
  function _compute_f1 (line 134) | def _compute_f1(predicted_bag, gold_bag):
  function _match_numbers_if_present (line 152) | def _match_numbers_if_present(gold_bag, predicted_bag):
  function _is_number (line 166) | def _is_number(text):
  function _remove_articles (line 174) | def _remove_articles(text):
  function _white_space_fix (line 178) | def _white_space_fix(text):
  function _remove_punc (line 182) | def _remove_punc(text):
  function _fix_number (line 190) | def _fix_number(text):
  function _tokenize (line 194) | def _tokenize(text):
  function _normalize (line 198) | def _normalize(answer):

FILE: lm_eval/tasks/e2lmc/mmlu_early_training/custom_metrics.py
  function loglikelihood_diff (line 4) | def loglikelihood_diff(items):

FILE: lm_eval/tasks/e2lmc/noor/_generate_configs.py
  function parse_args (line 78) | def parse_args():

FILE: lm_eval/tasks/egyhellaswag/utils.py
  function process_docs (line 4) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/egymmlu/_generate_configs.py
  function parse_args (line 74) | def parse_args():

FILE: lm_eval/tasks/egymmlu/utils.py
  function doc_to_text (line 7) | def doc_to_text(doc):
  function doc_to_choice (line 24) | def doc_to_choice(doc):

FILE: lm_eval/tasks/eq_bench/multilingual/utils.py
  function calculate_score_fullscale (line 6) | def calculate_score_fullscale(docs, results):

FILE: lm_eval/tasks/eq_bench/utils.py
  function calculate_score_fullscale (line 6) | def calculate_score_fullscale(docs, results):

FILE: lm_eval/tasks/esbbq/utils.py
  function _model_answer (line 6) | def _model_answer(lls):
  function _model_answer_type (line 25) | def _model_answer_type(doc, model_answer):
  function process_results (line 75) | def process_results(doc, results):
  function acc_ambig_agg (line 137) | def acc_ambig_agg(results):
  function acc_disambig_agg (line 159) | def acc_disambig_agg(results):
  function bias_score_ambig_agg (line 181) | def bias_score_ambig_agg(results):
  function bias_score_disambig_agg (line 212) | def bias_score_disambig_agg(results):

FILE: lm_eval/tasks/eus_exams/configs.py
  function gen_config_yamls (line 16) | def gen_config_yamls(output_dir: str, overwrite: bool) -> None:
  function main (line 49) | def main() -> None:

FILE: lm_eval/tasks/eus_exams/utils.py
  function process_docs (line 4) | def process_docs(dataset: datasets.Dataset):

FILE: lm_eval/tasks/eus_reading/utils.py
  function doc_to_text_context (line 7) | def doc_to_text_context(doc) -> str:
  function doc_to_choice (line 28) | def doc_to_choice(doc) -> List[str]:

FILE: lm_eval/tasks/eus_trivia/utils.py
  function doc_to_text (line 7) | def doc_to_text(doc) -> str:
  function doc_to_choice (line 28) | def doc_to_choice(doc) -> List[str]:

FILE: lm_eval/tasks/evalita_llm/metrics.py
  function _aggreg_ls (line 10) | def _aggreg_ls(predictions):
  function _aggreg_sa_v2 (line 37) | def _aggreg_sa_v2(predictions):
  function _aggreg_sa (line 49) | def _aggreg_sa(predictions):
  function _aggreg_ner (line 124) | def _aggreg_ner(predictions):
  function _aggreg_rel (line 143) | def _aggreg_rel(predictions):
  function _aggreg_dd (line 160) | def _aggreg_dd(items):

FILE: lm_eval/tasks/evalita_llm/sum_utils.py
  function rouge1_score (line 7) | def rouge1_score(references, predictions, **kwargs):
  function process_results_sum (line 16) | def process_results_sum(doc, results):

FILE: lm_eval/tasks/evalita_llm/utils.py
  function sa_doc_to_target (line 11) | def sa_doc_to_target(x):
  function sa_doc_to_target_v2 (line 30) | def sa_doc_to_target_v2(x):
  function sa_doc_to_choice (line 49) | def sa_doc_to_choice(x):
  function _ls_gold_to_target (line 60) | def _ls_gold_to_target(x):
  function ls_doc_to_target (line 77) | def ls_doc_to_target(x):
  function _ls_split_gold (line 91) | def _ls_split_gold(x):
  function ls_process_results (line 112) | def ls_process_results(doc, results):
  function _ner_gold_to_target (line 163) | def _ner_gold_to_target(x: list) -> list:
  function _ner_gold_to_target_v2 (line 171) | def _ner_gold_to_target_v2(x: list) -> list:
  function ner_doc_to_target (line 179) | def ner_doc_to_target(doc):
  function ner_process_results (line 193) | def ner_process_results(doc, results):
  function ner_process_results_v2 (line 246) | def ner_process_results_v2(doc, results):
  function _ner_process_raw_output (line 313) | def _ner_process_raw_output(llm_result: str) -> list[tuple]:
  function _ner_process_raw_output_v2 (line 337) | def _ner_process_raw_output_v2(llm_result: str) -> list[tuple]:
  function _rel_process_raw_output (line 364) | def _rel_process_raw_output(llm_result: str) -> list[str]:
  function re_doc_to_target (line 391) | def re_doc_to_target(doc):
  function _rel_gold_to_target (line 403) | def _rel_gold_to_target(x: list) -> list:
  function rel_doc_to_target (line 410) | def rel_doc_to_target(doc):
  function _extract_relations (line 422) | def _extract_relations(results):
  function rel_process_results_v3 (line 439) | def rel_process_results_v3(doc, results):
  function split_text_with_regex (line 498) | def split_text_with_regex(text, pattern):
  function faq_doc_to_target (line 526) | def faq_doc_to_target(x):
  function ht_doc_to_target (line 541) | def ht_doc_to_target(x):

FILE: lm_eval/tasks/fda/task.py
  class FDA (line 10) | class FDA(ConfigurableTask):
    method __init__ (line 15) | def __init__(self, **kwargs):
    method has_training_docs (line 18) | def has_training_docs(self):
    method has_validation_docs (line 21) | def has_validation_docs(self):
    method has_test_docs (line 24) | def has_test_docs(self):
    method validation_docs (line 27) | def validation_docs(self):
    method doc_to_text (line 30) | def doc_to_text(self, doc):
    method doc_to_target (line 33) | def doc_to_target(self, doc):
    method construct_requests (line 36) | def construct_requests(
    method process_results (line 60) | def process_results(self, doc, results):
    method aggregation (line 75) | def aggregation(self):
    method higher_is_better (line 85) | def higher_is_better(self):
  function contains_score (line 96) | def contains_score(prediction: str, labels: List[str]):

FILE: lm_eval/tasks/french_bench/preprocess_wikitext.py
  function wikitext_detokenizer (line 4) | def wikitext_detokenizer(doc):
  function process_results (line 39) | def process_results(doc, results):

FILE: lm_eval/tasks/french_bench/utils.py
  function normalize_answer (line 9) | def normalize_answer(s):
  function get_tokens (line 29) | def get_tokens(s):
  function exact (line 36) | def exact(predictions, references):
  function f1 (line 41) | def f1(predictions, references):
  function rouge1 (line 57) | def rouge1(items):
  function rouge1_agg (line 64) | def rouge1_agg(items):
  function is_included (line 74) | def is_included(items):
  function preprocess (line 83) | def preprocess(text):
  function process_docs (line 92) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/galician_bench/flores_gl/create_yamls_flores_gl.py
  function doc_to_text (line 257) | def doc_to_text(src: str, tgt: str) -> str:
  function doc_to_target (line 265) | def doc_to_target(tgt: str) -> str:
  function gen_lang_yamls (line 272) | def gen_lang_yamls(output_dir: str, overwrite: bool) -> None:
  function main (line 316) | def main() -> None:

FILE: lm_eval/tasks/galician_bench/utils.py
  function lowercase_first_letter (line 14) | def lowercase_first_letter(text):
  function process_summarization (line 18) | def process_summarization(dataset):
  function process_docs_paraphrases (line 28) | def process_docs_paraphrases(dataset):
  function process_docs_paws (line 56) | def process_docs_paws(dataset):
  function rouge1 (line 84) | def rouge1(items):
  function rouge1_agg (line 91) | def rouge1_agg(items):
  function process_results_mc2 (line 102) | def process_results_mc2(doc, results):
  function process_docs_gen (line 115) | def process_docs_gen(dataset: datasets.Dataset) -> datasets.Dataset:
  function preprocess_function_gen (line 119) | def preprocess_function_gen(examples):
  function process_doc_nli (line 143) | def process_doc_nli(dataset):
  function process_results_gen (line 170) | def process_results_gen(doc, results):
  function bleu (line 241) | def bleu(refs, preds):
  function rouge (line 264) | def rouge(refs, preds):

FILE: lm_eval/tasks/glianorex/preprocess_glianorex.py
  function doc_to_text (line 4) | def doc_to_text(doc) -> str:
  function doc_to_target (line 10) | def doc_to_target(doc) -> str:
  function filter_dataset (line 15) | def filter_dataset(dataset: datasets.Dataset, lang: str) -> datasets.Dat...
  function filter_french (line 19) | def filter_french(dataset: datasets.Dataset) -> datasets.Dataset:
  function filter_english (line 23) | def filter_english(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/global_mmlu/default/ar/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/bn/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/de/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/en/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/es/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/fr/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/hi/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/id/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/it/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/ja/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/ko/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/pt/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/sw/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/yo/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/default/zh/utils.py
  function process_docs (line 7) | def process_docs(dataset, category):

FILE: lm_eval/tasks/global_mmlu/full/am/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ar/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/bn/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/cs/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/de/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/el/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/en/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/es/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/fa/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/fil/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/fr/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ha/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/he/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/hi/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/id/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ig/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/it/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ja/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ko/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ky/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/lt/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/mg/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ms/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ne/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/nl/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ny/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/pl/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/pt/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ro/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/ru/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/si/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/sn/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/so/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/sr/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/sv/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/sw/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/te/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/tr/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/uk/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/vi/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/yo/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_mmlu/full/zh/utils.py
  function process_docs (line 65) | def process_docs(dataset, subject):

FILE: lm_eval/tasks/global_piqa/completions/_generate_config.py
  class IndentedDumper (line 7) | class IndentedDumper(yaml.Dumper):
    method increase_indent (line 8) | def increase_indent(self, flow=False, indentless=False):
  function format_subset (line 15) | def format_subset(subset: str, preface: str = PREFACE) -> str:

FILE: lm_eval/tasks/global_piqa/prompted/_generate_config.py
  class IndentedDumper (line 7) | class IndentedDumper(yaml.Dumper):
    method increase_indent (line 8) | def increase_indent(self, flow=False, indentless=False):
  function format_subset (line 15) | def format_subset(subset: str, preface: str = PREFACE) -> str:

FILE: lm_eval/tasks/glue/mnli/utils.py
  function doc_to_text (line 1) | def doc_to_text(doc) -> str:

FILE: lm_eval/tasks/gpqa/cot_n_shot/_generate_configs.py
  function main (line 5) | def main() -> None:

FILE: lm_eval/tasks/gpqa/cot_n_shot/utils.py
  function preprocess (line 7) | def preprocess(text):
  function process_docs (line 17) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/gpqa/cot_zeroshot/_generate_configs.py
  function main (line 5) | def main() -> None:

FILE: lm_eval/tasks/gpqa/cot_zeroshot/utils.py
  function preprocess (line 7) | def preprocess(text):
  function process_docs (line 17) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/gpqa/generative/_generate_configs.py
  function main (line 5) | def main() -> None:

FILE: lm_eval/tasks/gpqa/generative/utils.py
  function preprocess (line 7) | def preprocess(text):
  function process_docs (line 17) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/gpqa/n_shot/_generate_configs.py
  function main (line 5) | def main() -> None:

FILE: lm_eval/tasks/gpqa/n_shot/utils.py
  function preprocess (line 7) | def preprocess(text):
  function process_docs (line 20) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/gpqa/zeroshot/_generate_configs.py
  function main (line 5) | def main() -> None:

FILE: lm_eval/tasks/gpqa/zeroshot/utils.py
  function preprocess (line 7) | def preprocess(text):
  function process_docs (line 17) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/graphwalks/utils.py
  function load_dataset (line 7) | def load_dataset(**kwargs):
  function extract_answer_list (line 27) | def extract_answer_list(response: str) -> Tuple[List[str], bool]:
  function extract_answer_list_flexible (line 65) | def extract_answer_list_flexible(response: str) -> Tuple[List[str], bool]:
  function process_results (line 100) | def process_results(doc, results):

FILE: lm_eval/tasks/groundcocoa/utils.py
  function process_docs (line 6) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/hellaswag/utils.py
  function preprocess (line 6) | def preprocess(text):
  function process_docs (line 15) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/hendrycks_ethics/utils.py
  function _preproc_doc (line 5) | def _preproc_doc(doc):
  function doc_to_text (line 18) | def doc_to_text(doc) -> str:
  function doc_to_target (line 23) | def doc_to_target(doc):

FILE: lm_eval/tasks/hendrycks_math/utils.py
  function process_docs (line 6) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
  function process_results (line 18) | def process_results(doc: dict, results: List[str]) -> Dict[str, int]:
  function is_equiv (line 36) | def is_equiv(str1, str2, verbose=False):
  function remove_boxed (line 53) | def remove_boxed(s):
  function last_boxed_only_string (line 67) | def last_boxed_only_string(string):
  function fix_fracs (line 97) | def fix_fracs(string):
  function fix_a_slash_b (line 129) | def fix_a_slash_b(string):
  function remove_right_units (line 144) | def remove_right_units(string):
  function fix_sqrt (line 154) | def fix_sqrt(string):
  function strip_string (line 169) | def strip_string(string):

FILE: lm_eval/tasks/histoires_morales/utils.py
  function process_docs (line 4) | def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:

FILE: lm_eval/tasks/hrm8k/default/utils.py
  function doc_to_text (line 5) | def doc_to_text(doc):
  function doc_to_text_mmmlu (line 14) | def doc_to_text_mmmlu(doc):
  function doc_to_target (line 23) | def doc_to_target(doc):
  function postprocess (line 27) | def postprocess(s):
  function process_results (line 36) | def process_results(doc: dict, results: List[str]) -> Dict[str, int]:
  function is_equiv (line 54) | def is_equiv(str1, str2, verbose=False):
  function parse_math_answer (line 74) | def parse_math_answer(raw_string):
  function _fix_fracs (line 146) | def _fix_fracs(string):
  function _fix_a_slash_b (line 178) | def _fix_a_slash_b(string):
  function _remove_right_units (line 193) | def _remove_right_units(string):
  function _fix_sqrt (line 203) | def _fix_sqrt(string):
  function _strip_string (line 218) | def _strip_string(string):

FILE: lm_eval/tasks/hrm8k/en/utils.py
  function doc_to_text (line 5) | def doc_to_text(doc):
  function doc_to_text_mmmlu (line 14) | def doc_to_text_mmmlu(doc):
  function doc_to_target (line 23) | def doc_to_target(doc):
  function postprocess (line 27) | def postprocess(s):
  function process_results (line 36) | def process_results(doc: dict, results: List[str]) -> Dict[str, int]:
  function is_equiv (line 54) | def is_equiv(str1, str2, verbose=False):
  function parse_math_answer (line 74) | def parse_math_answer(raw_string):
  function _fix_fracs (line 146) | def _fix_fracs(string):
  function _fix_a_slash_b (line 178) | def _fix_a_slash_b(string):
  function _remove_right_units (line 193) | def _remove_right_units(string):
  function _fix_sqrt (line 203) | def _fix_sqrt(string):
  function _strip_string (line 218) | def _strip_string(string):

FILE: lm_eval/tasks/humaneval/utils.py
  function pass_at_k (line 13) | def pass_at_k(references: list[str], predictions: list[list[str]], k: li...
  function build_predictions (line 26) | def build_predictions(resps: list[list[str]], docs: list[dict]) -> list[...
  function build_predictions_instruct (line 30) | def build_predictions_instruct(

FILE: lm_eval/tasks/humaneval_infilling/utils.py
  function pass_at_k (line 13) | def pass_at_k(references: list[str], predictions: list[list[str]], k: li...
  function build_predictions (line 26) | def build_predictions(resps: list[list[str]], docs: list[dict]) -> list[...

FILE: lm_eval/tasks/icelandic_winogrande/preprocess_winogrande.py
  function doc_to_text (line 1) | def doc_to_text(doc):
  function doc_to_target (line 6) | def doc_to_target(doc):
  function doc_to_choice (line 14) | def doc_to_choice(doc):

FILE: lm_eval/tasks/ifeval/instructions.py
  class Instruction (line 110) | class Instruction:
    method __init__ (line 113) | def __init__(self, instruction_id):
    method build_description (line 116) | def build_description(self, **kwargs):
    method get_instruction_args (line 119) | def get_instruction_args(self):
    method get_instruction_args_keys (line 122) | def get_instruction_args_keys(self):
    method check_following (line 125) | def check_following(self, value):
  class ResponseLanguageChecker (line 129) | class ResponseLanguageChecker(Instruction):
    method build_description (line 132) | def build_description(self, *, language=None):
    method get_instruction_args (line 155) | def get_instruction_args(self):
    method get_instruction_args_keys (line 159) | def get_instruction_args_keys(self):
    method check_following (line 163) | def check_following(self, value):
  class NumberOfSentences (line 184) | class NumberOfSentences(Instruction):
    method build_description (line 187) | def build_description(self, *, num_sentences=None, relation=None):
    method get_instruction_args (line 225) | def get_instruction_args(self):
    method get_instruction_args_keys (line 232) | def get_instruction_args_keys(self):
    method check_following (line 236) | def check_following(self, value):
  class PlaceholderChecker (line 256) | class PlaceholderChecker(Instruction):
    method build_description (line 259) | def build_description(self, *, num_placeholders=None):
    method get_instruction_args (line 278) | def get_instruction_args(self):
    method get_instruction_args_keys (line 282) | def get_instruction_args_keys(self):
    method check_following (line 286) | def check_following(self, value):
  class BulletListChecker (line 301) | class BulletListChecker(Instruction):
    method build_description (line 304) | def build_description(self, *, num_bullets=None):
    method get_instruction_args (line 325) | def get_instruction_args(self):
    method get_instruction_args_keys (line 329) | def get_instruction_args_keys(self):
    method check_following (line 333) | def check_following(self, value):
  class ConstrainedResponseChecker (line 350) | class ConstrainedResponseChecker(Instruction):
    method build_description (line 353) | def build_description(self):
    method get_instruction_args (line 364) | def get_instruction_args(self):
    method get_instruction_args_keys (line 368) | def get_instruction_args_keys(self):
    method check_following (line 372) | def check_following(self, value):
  class ConstrainedStartChecker (line 389) | class ConstrainedStartChecker(Instruction):
    method build_description (line 392) | def build_description(self, *, starter=None):
    method get_instruction_args (line 411) | def get_instruction_args(self):
    method get_instruction_args_keys (line 415) | def get_instruction_args_keys(self):
    method check_following (line 419) | def check_following(self, value):
  class HighlightSectionChecker (line 436) | class HighlightSectionChecker(Instruction):
    method build_description (line 439) | def build_description(self, *, num_highlights=None):
    method get_instruction_args (line 460) | def get_instruction_args(self):
    method get_instruction_args_keys (line 464) | def get_instruction_args_keys(self):
    method check_following (line 468) | def check_following(self, value):
  class SectionChecker (line 492) | class SectionChecker(Instruction):
    method build_description (line 495) | def build_description(self, *, section_spliter=None, num_sections=None):
    method get_instruction_args (line 531) | def get_instruction_args(self):
    method get_instruction_args_keys (line 538) | def get_instruction_args_keys(self):
    method check_following (line 542) | def check_following(self, value):
  class ParagraphChecker (line 561) | class ParagraphChecker(Instruction):
    method build_description (line 564) | def build_description(self, *, num_paragraphs=None):
    method get_instruction_args (line 584) | def get_instruction_args(self):
    method get_instruction_args_keys (line 588) | def get_instruction_args_keys(self):
    method check_following (line 592) | def check_following(self, value):
  class PostscriptChecker (line 616) | class PostscriptChecker(Instruction):
    method build_description (line 619) | def build_description(self, *, postscript_marker=None):
    method get_instruction_args (line 644) | def get_instruction_args(self):
    method get_instruction_args_keys (line 648) | def get_instruction_args_keys(self):
    method check_following (line 652) | def check_following(self, value):
  class RephraseChecker (line 674) | class RephraseChecker(Instruction):
    method build_description (line 677) | def build_description(self, *, original_message):
    method get_instruction_args (line 703) | def get_instruction_args(self):
    method get_instruction_args_keys (line 707) | def get_instruction_args_keys(self):
    method check_following (line 711) | def check_following(self, value):
    method is_change (line 733) | def is_change(self, response):
    method strip_changes (line 737) | def strip_changes(self, response):
  class KeywordChecker (line 742) | class KeywordChecker(Instruction):
    method build_description (line 745) | def build_description(self, *, keywords=None):
    method get_instruction_args (line 768) | def get_instruction_args(self):
    method get_instruction_args_keys (line 772) | def get_instruction_args_keys(self):
    method check_following (line 776) | def check_following(self, value):
  class KeywordFrequencyChecker (line 784) | class KeywordFrequencyChecker(Instruction):
    method build_description (line 787) | def build_description(self, *, keyword=None, frequency=None, relation=...
    method get_instruction_args (line 833) | def get_instruction_args(self):
    method get_instruction_args_keys (line 841) | def get_instruction_args_keys(self):
    method check_following (line 845) | def check_following(self, value):
  class NumberOfWords (line 855) | class NumberOfWords(Instruction):
    method build_description (line 858) | def build_description(self, *, num_words=None, relation=None):
    method get_instruction_args (line 896) | def get_instruction_args(self):
    method get_instruction_args_keys (line 900) | def get_instruction_args_keys(self):
    method check_following (line 904) | def check_following(self, value):
  class JsonFormat (line 914) | class JsonFormat(Instruction):
    method build_description (line 917) | def build_description(self):
    method get_instruction_args (line 924) | def get_instruction_args(self):
    method get_instruction_args_keys (line 928) | def get_instruction_args_keys(self):
    method check_following (line 932) | def check_following(self, value):
  class ParagraphFirstWordCheck (line 949) | class ParagraphFirstWordCheck(Instruction):
    method build_description (line 952) | def build_description(
    method get_instruction_args (line 998) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1006) | def get_instruction_args_keys(self):
    method check_following (line 1010) | def check_following(self, value):
  class KeySentenceChecker (line 1056) | class KeySentenceChecker(Instruction):
    method build_description (line 1059) | def build_description(self, key_sentences=None, num_sentences=None):
    method get_instruction_args (line 1091) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1098) | def get_instruction_args_keys(self):
    method check_following (line 1102) | def check_following(self, value):
  class ForbiddenWords (line 1113) | class ForbiddenWords(Instruction):
    method build_description (line 1116) | def build_description(self, forbidden_words=None):
    method get_instruction_args (line 1140) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1144) | def get_instruction_args_keys(self):
    method check_following (line 1148) | def check_following(self, value):
  class RephraseParagraph (line 1156) | class RephraseParagraph(Instruction):
    method build_description (line 1159) | def build_description(self, *, original_paragraph, low, high):
    method get_instruction_args (line 1190) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1198) | def get_instruction_args_keys(self):
    method check_following (line 1202) | def check_following(self, value):
  class TwoResponsesChecker (line 1216) | class TwoResponsesChecker(Instruction):
    method build_description (line 1219) | def build_description(self):
    method get_instruction_args (line 1227) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1231) | def get_instruction_args_keys(self):
    method check_following (line 1235) | def check_following(self, value):
  class RepeatPromptThenAnswer (line 1258) | class RepeatPromptThenAnswer(Instruction):
    method build_description (line 1261) | def build_description(self, *, prompt_to_repeat=None):
    method get_instruction_args (line 1282) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1285) | def get_instruction_args_keys(self):
    method check_following (line 1289) | def check_following(self, value):
  class EndChecker (line 1295) | class EndChecker(Instruction):
    method build_description (line 1298) | def build_description(self, *, end_phrase=None):
    method get_instruction_args (line 1318) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1321) | def get_instruction_args_keys(self):
    method check_following (line 1325) | def check_following(self, value):
  class TitleChecker (line 1332) | class TitleChecker(Instruction):
    method build_description (line 1335) | def build_description(self):
    method get_instruction_args (line 1343) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1346) | def get_instruction_args_keys(self):
    method check_following (line 1350) | def check_following(self, value):
  class LetterFrequencyChecker (line 1362) | class LetterFrequencyChecker(Instruction):
    method build_description (line 1365) | def build_description(self, *, letter=None, let_frequency=None, let_re...
    method get_instruction_args (line 1417) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1425) | def get_instruction_args_keys(self):
    method check_following (line 1429) | def check_following(self, value):
  class CapitalLettersEnglishChecker (line 1440) | class CapitalLettersEnglishChecker(Instruction):
    method build_description (line 1443) | def build_description(self):
    method get_instruction_args (line 1450) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1453) | def get_instruction_args_keys(self):
    method check_following (line 1457) | def check_following(self, value):
  class LowercaseLettersEnglishChecker (line 1471) | class LowercaseLettersEnglishChecker(Instruction):
    method build_description (line 1474) | def build_description(self):
    method get_instruction_args (line 1482) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1485) | def get_instruction_args_keys(self):
    method check_following (line 1489) | def check_following(self, value):
  class CommaChecker (line 1503) | class CommaChecker(Instruction):
    method build_description (line 1506) | def build_description(self):
    method get_instruction_args (line 1513) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1516) | def get_instruction_args_keys(self):
    method check_following (line 1520) | def check_following(self, value):
  class CapitalWordFrequencyChecker (line 1525) | class CapitalWordFrequencyChecker(Instruction):
    method build_description (line 1528) | def build_description(
    method get_instruction_args (line 1566) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1573) | def get_instruction_args_keys(self):
    method check_following (line 1577) | def check_following(self, value):
  class QuotationChecker (line 1591) | class QuotationChecker(Instruction):
    method build_description (line 1594) | def build_description(self):
    method get_instruction_args (line 1601) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1605) | def get_instruction_args_keys(self):
    method check_following (line 1609) | def check_following(self, value):

FILE: lm_eval/tasks/ifeval/instructions_registry.py
  function conflict_make (line 153) | def conflict_make(conflicts):

FILE: lm_eval/tasks/ifeval/instructions_util.py
  function download_nltk_resources (line 36) | def download_nltk_resources():
  function split_into_sentences (line 1628) | def split_into_sentences(text):
  function count_words (line 1679) | def count_words(text):
  function _get_sentence_tokenizer (line 1688) | def _get_sentence_tokenizer():
  function count_sentences (line 1692) | def count_sentences(text):
  function generate_keywords (line 1699) | def generate_keywords(num_keywords):

FILE: lm_eval/tasks/ifeval/multilingual/instruction_utils/ca_instructions_util.py
  function lang_code_to_name (line 32) | def lang_code_to_name(lang_code: str):
  function split_into_sentences (line 46) | def split_into_sentences(text):
  function count_words (line 98) | def count_words(text):
  function tokenize_words (line 105) | def tokenize_words(text):
  function count_sentences (line 113) | def count_sentences(text):
  function generate_keywords (line 120) | def generate_keywords(num_keywords):

FILE: lm_eval/tasks/ifeval/multilingual/instruction_utils/es_instructions_util.py
  function lang_code_to_name (line 32) | def lang_code_to_name(lang_code: str):
  function split_into_sentences (line 46) | def split_into_sentences(text):
  function count_words (line 98) | def count_words(text):
  function tokenize_words (line 105) | def tokenize_words(text):
  function count_sentences (line 113) | def count_sentences(text):
  function generate_keywords (line 120) | def generate_keywords(num_keywords):

FILE: lm_eval/tasks/ifeval/multilingual/instructions/ca_instructions.py
  class Instruction (line 90) | class Instruction:
    method __init__ (line 93) | def __init__(self, instruction_id):
    method build_description (line 96) | def build_description(self, **kwargs):
    method get_instruction_args (line 99) | def get_instruction_args(self):
    method get_instruction_args_keys (line 102) | def get_instruction_args_keys(self):
    method check_following (line 105) | def check_following(self, value):
  class ResponseLanguageChecker (line 109) | class ResponseLanguageChecker(Instruction):
    method build_description (line 112) | def build_description(self, *, language = None):
    method get_instruction_args (line 133) | def get_instruction_args(self):
    method get_instruction_args_keys (line 137) | def get_instruction_args_keys(self):
    method check_following (line 141) | def check_following(self, value):
  class NumberOfSentences (line 162) | class NumberOfSentences(Instruction):
    method build_description (line 165) | def build_description(self, *, num_sentences = None,
    method get_instruction_args (line 201) | def get_instruction_args(self):
    method get_instruction_args_keys (line 206) | def get_instruction_args_keys(self):
    method check_following (line 210) | def check_following(self, value):
  class PlaceholderChecker (line 248) | class PlaceholderChecker(Instruction):
    method build_description (line 251) | def build_description(self, *, num_placeholders = None,
    method get_instruction_args (line 284) | def get_instruction_args(self):
    method get_instruction_args_keys (line 289) | def get_instruction_args_keys(self):
    method check_following (line 293) | def check_following(self, value):
  class BulletListChecker (line 312) | class BulletListChecker(Instruction):
    method build_description (line 315) | def build_description(self, *, num_bullets = None):
    method get_instruction_args (line 336) | def get_instruction_args(self):
    method get_instruction_args_keys (line 340) | def get_instruction_args_keys(self):
    method check_following (line 344) | def check_following(self, value):
  class ConstrainedResponseChecker (line 360) | class ConstrainedResponseChecker(Instruction):
    method build_description (line 363) | def build_description(self):
    method get_instruction_args (line 372) | def get_instruction_args(self):
    method get_instruction_args_keys (line 376) | def get_instruction_args_keys(self):
    method check_following (line 380) | def check_following(self, value):
  class ConstrainedStartChecker (line 398) | class ConstrainedStartChecker(Instruction):
    method build_description (line 401) | def build_description(self, *, starter = None):
    method get_instruction_args (line 419) | def get_instruction_args(self):
    method get_instruction_args_keys (line 423) | def get_instruction_args_keys(self):
    method check_following (line 427) | def check_following(self, value):
  class HighlightSectionChecker (line 443) | class HighlightSectionChecker(Instruction):
    method build_description (line 446) | def build_description(self, *, num_highlights = None,
    method get_instruction_args (line 479) | def get_instruction_args(self):
    method get_instruction_args_keys (line 484) | def get_instruction_args_keys(self):
    method check_following (line 488) | def check_following(self, value):
  class SectionChecker (line 516) | class SectionChecker(Instruction):
    method build_description (line 519) | def build_description(self, *, section_spliter = None,
    method get_instruction_args (line 563) | def get_instruction_args(self):
    method get_instruction_args_keys (line 569) | def get_instruction_args_keys(self):
    method check_following (line 573) | def check_following(self, value):
  class ParagraphChecker (line 596) | class ParagraphChecker(Instruction):
    method build_description (line 599) | def build_description(self, *, num_paragraphs = None):
    method get_instruction_args (line 618) | def get_instruction_args(self):
    method get_instruction_args_keys (line 622) | def get_instruction_args_keys(self):
    method check_following (line 626) | def check_following(self, value):
  class PostscriptChecker (line 650) | class PostscriptChecker(Instruction):
    method build_description (line 653) | def build_description(self, *, postscript_marker = None
    method get_instruction_args (line 675) | def get_instruction_args(self):
    method get_instruction_args_keys (line 679) | def get_instruction_args_keys(self):
    method check_following (line 683) | def check_following(self, value):
  class RephraseChecker (line 706) | class RephraseChecker(Instruction):
    method build_description (line 709) | def build_description(self, *, original_message):
    method get_instruction_args (line 731) | def get_instruction_args(self):
    method get_instruction_args_keys (line 735) | def get_instruction_args_keys(self):
    method check_following (line 739) | def check_following(self, value):
    method is_change (line 761) | def is_change(self, response):
    method strip_changes (line 765) | def strip_changes(self, response):
  class KeywordChecker (line 770) | class KeywordChecker(Instruction):
    method build_description (line 773) | def build_description(self, *, keywords = None
    method get_instruction_args (line 796) | def get_instruction_args(self):
    method get_instruction_args_keys (line 800) | def get_instruction_args_keys(self):
    method check_following (line 804) | def check_following(self, value):
  class KeywordFrequencyChecker (line 812) | class KeywordFrequencyChecker(Instruction):
    method build_description (line 815) | def build_description(self, *, keyword = None,
    method get_instruction_args (line 859) | def get_instruction_args(self):
    method get_instruction_args_keys (line 865) | def get_instruction_args_keys(self):
    method check_following (line 869) | def check_following(self, value):
  class NumberOfWords (line 880) | class NumberOfWords(Instruction):
    method build_description (line 883) | def build_description(self, *, num_words = None,
    method get_instruction_args (line 921) | def get_instruction_args(self):
    method get_instruction_args_keys (line 926) | def get_instruction_args_keys(self):
    method check_following (line 930) | def check_following(self, value):
  class JsonFormat (line 946) | class JsonFormat(Instruction):
    method build_description (line 949) | def build_description(self):
    method get_instruction_args (line 955) | def get_instruction_args(self):
    method get_instruction_args_keys (line 959) | def get_instruction_args_keys(self):
    method check_following (line 966) | def check_following(self, value):
  class ParagraphFirstWordCheck (line 983) | class ParagraphFirstWordCheck(Instruction):
    method build_description (line 986) | def build_description(self, num_paragraphs = None,
    method get_instruction_args (line 1030) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1036) | def get_instruction_args_keys(self):
    method check_following (line 1040) | def check_following(self, value):
  class KeySentenceChecker (line 1095) | class KeySentenceChecker(Instruction):
    method build_description (line 1098) | def build_description(self, key_sentences = None,
    method get_instruction_args (line 1131) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1136) | def get_instruction_args_keys(self):
    method check_following (line 1140) | def check_following(self, value):
  class ForbiddenWords (line 1151) | class ForbiddenWords(Instruction):
    method build_description (line 1154) | def build_description(self, forbidden_words = None
    method get_instruction_args (line 1180) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1184) | def get_instruction_args_keys(self):
    method check_following (line 1188) | def check_following(self, value):
  class RephraseParagraph (line 1197) | class RephraseParagraph(Instruction):
    method build_description (line 1200) | def build_description(self, *, original_paragraph, low, high
    method get_instruction_args (line 1229) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1235) | def get_instruction_args_keys(self):
    method check_following (line 1239) | def check_following(self, value):
  class TwoResponsesChecker (line 1253) | class TwoResponsesChecker(Instruction):
    method build_description (line 1256) | def build_description(self):
    method get_instruction_args (line 1264) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1268) | def get_instruction_args_keys(self):
    method check_following (line 1272) | def check_following(self, value):
  class RepeatPromptThenAnswer (line 1295) | class RepeatPromptThenAnswer(Instruction):
    method build_description (line 1298) | def build_description(self, *, prompt_to_repeat = None):
    method get_instruction_args (line 1318) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1321) | def get_instruction_args_keys(self):
    method check_following (line 1325) | def check_following(self, value):
  class EndChecker (line 1331) | class EndChecker(Instruction):
    method build_description (line 1334) | def build_description(self, *, end_phrase = None):
    method get_instruction_args (line 1353) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1356) | def get_instruction_args_keys(self):
    method check_following (line 1360) | def check_following(self, value):
  class TitleChecker (line 1371) | class TitleChecker(Instruction):
    method build_description (line 1374) | def build_description(self):
    method get_instruction_args (line 1382) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1385) | def get_instruction_args_keys(self):
    method check_following (line 1389) | def check_following(self, value):
  class LetterFrequencyChecker (line 1401) | class LetterFrequencyChecker(Instruction):
    method build_description (line 1404) | def build_description(self, *, letter = None,
    method get_instruction_args (line 1458) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1464) | def get_instruction_args_keys(self):
    method check_following (line 1468) | def check_following(self, value):
  class CapitalLettersCatalanChecker (line 1479) | class CapitalLettersCatalanChecker(Instruction):
    method build_description (line 1482) | def build_description(self):
    method get_instruction_args (line 1489) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1492) | def get_instruction_args_keys(self):
    method check_following (line 1496) | def check_following(self, value):
  class LowercaseLettersCatalanChecker (line 1521) | class LowercaseLettersCatalanChecker(Instruction):
    method build_description (line 1524) | def build_description(self):
    method get_instruction_args (line 1532) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1535) | def get_instruction_args_keys(self):
    method check_following (line 1539) | def check_following(self, value):
  class CommaChecker (line 1553) | class CommaChecker(Instruction):
    method build_description (line 1556) | def build_description(self):
    method get_instruction_args (line 1563) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1566) | def get_instruction_args_keys(self):
    method check_following (line 1570) | def check_following(self, value):
  class CapitalWordFrequencyChecker (line 1575) | class CapitalWordFrequencyChecker(Instruction):
    method build_description (line 1578) | def build_description(
    method get_instruction_args (line 1616) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1623) | def get_instruction_args_keys(self):
    method check_following (line 1627) | def check_following(self, value):
  class QuotationChecker (line 1641) | class QuotationChecker(Instruction):
    method build_description (line 1644) | def build_description(self):
    method get_instruction_args (line 1651) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1655) | def get_instruction_args_keys(self):
    method check_following (line 1659) | def check_following(self, value):
  class QuestionMarkChecker (line 1665) | class QuestionMarkChecker(Instruction):
    method build_description (line 1668) | def build_description(self):
    method get_instruction_args (line 1675) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1679) | def get_instruction_args_keys(self):
    method check_following (line 1683) | def check_following(self, value):
  class ExclamationMarkChecker (line 1695) | class ExclamationMarkChecker(Instruction):
    method build_description (line 1698) | def build_description(self):
    method get_instruction_args (line 1705) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1709) | def get_instruction_args_keys(self):
    method check_following (line 1713) | def check_following(self, value):
  class EnieChecker (line 1725) | class EnieChecker(Instruction):
    method build_description (line 1728) | def build_description(
    method get_instruction_args (line 1754) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1760) | def get_instruction_args_keys(self):
    method check_following (line 1764) | def check_following(self, value):
  class DieresisChecker (line 1778) | class DieresisChecker(Instruction):
    method build_description (line 1781) | def build_description(
    method get_instruction_args (line 1807) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1813) | def get_instruction_args_keys(self):
    method check_following (line 1817) | def check_following(self, value):
  class TildesChecker (line 1831) | class TildesChecker(Instruction):
    method build_description (line 1834) | def build_description(self, *, num_words = None,
    method get_instruction_args (line 1872) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1877) | def get_instruction_args_keys(self):
    method check_following (line 1881) | def check_following(self, value):

FILE: lm_eval/tasks/ifeval/multilingual/instructions/es_instructions.py
  class Instruction (line 92) | class Instruction:
    method __init__ (line 95) | def __init__(self, instruction_id):
    method build_description (line 98) | def build_description(self, **kwargs):
    method get_instruction_args (line 101) | def get_instruction_args(self):
    method get_instruction_args_keys (line 104) | def get_instruction_args_keys(self):
    method check_following (line 107) | def check_following(self, value):
  class ResponseLanguageChecker (line 111) | class ResponseLanguageChecker(Instruction):
    method build_description (line 114) | def build_description(self, *, language = None):
    method get_instruction_args (line 135) | def get_instruction_args(self):
    method get_instruction_args_keys (line 139) | def get_instruction_args_keys(self):
    method check_following (line 143) | def check_following(self, value):
  class NumberOfSentences (line 164) | class NumberOfSentences(Instruction):
    method build_description (line 167) | def build_description(self, *, num_sentences = None,
    method get_instruction_args (line 203) | def get_instruction_args(self):
    method get_instruction_args_keys (line 208) | def get_instruction_args_keys(self):
    method check_following (line 212) | def check_following(self, value):
  class PlaceholderChecker (line 250) | class PlaceholderChecker(Instruction):
    method build_description (line 253) | def build_description(self, *, num_placeholders = None,
    method get_instruction_args (line 286) | def get_instruction_args(self):
    method get_instruction_args_keys (line 291) | def get_instruction_args_keys(self):
    method check_following (line 295) | def check_following(self, value):
  class BulletListChecker (line 314) | class BulletListChecker(Instruction):
    method build_description (line 317) | def build_description(self, *, num_bullets = None):
    method get_instruction_args (line 338) | def get_instruction_args(self):
    method get_instruction_args_keys (line 342) | def get_instruction_args_keys(self):
    method check_following (line 346) | def check_following(self, value):
  class ConstrainedResponseChecker (line 362) | class ConstrainedResponseChecker(Instruction):
    method build_description (line 365) | def build_description(self):
    method get_instruction_args (line 374) | def get_instruction_args(self):
    method get_instruction_args_keys (line 378) | def get_instruction_args_keys(self):
    method check_following (line 382) | def check_following(self, value):
  class ConstrainedStartChecker (line 400) | class ConstrainedStartChecker(Instruction):
    method build_description (line 403) | def build_description(self, *, starter = None):
    method get_instruction_args (line 421) | def get_instruction_args(self):
    method get_instruction_args_keys (line 425) | def get_instruction_args_keys(self):
    method check_following (line 429) | def check_following(self, value):
  class HighlightSectionChecker (line 445) | class HighlightSectionChecker(Instruction):
    method build_description (line 448) | def build_description(self, *, num_highlights = None,
    method get_instruction_args (line 481) | def get_instruction_args(self):
    method get_instruction_args_keys (line 486) | def get_instruction_args_keys(self):
    method check_following (line 490) | def check_following(self, value):
  class SectionChecker (line 518) | class SectionChecker(Instruction):
    method build_description (line 521) | def build_description(self, *, section_spliter = None,
    method get_instruction_args (line 565) | def get_instruction_args(self):
    method get_instruction_args_keys (line 571) | def get_instruction_args_keys(self):
    method check_following (line 575) | def check_following(self, value):
  class ParagraphChecker (line 598) | class ParagraphChecker(Instruction):
    method build_description (line 601) | def build_description(self, *, num_paragraphs = None):
    method get_instruction_args (line 620) | def get_instruction_args(self):
    method get_instruction_args_keys (line 624) | def get_instruction_args_keys(self):
    method check_following (line 628) | def check_following(self, value):
  class PostscriptChecker (line 652) | class PostscriptChecker(Instruction):
    method build_description (line 655) | def build_description(self, *, postscript_marker = None
    method get_instruction_args (line 677) | def get_instruction_args(self):
    method get_instruction_args_keys (line 681) | def get_instruction_args_keys(self):
    method check_following (line 685) | def check_following(self, value):
  class RephraseChecker (line 708) | class RephraseChecker(Instruction):
    method build_description (line 711) | def build_description(self, *, original_message):
    method get_instruction_args (line 733) | def get_instruction_args(self):
    method get_instruction_args_keys (line 737) | def get_instruction_args_keys(self):
    method check_following (line 741) | def check_following(self, value):
    method is_change (line 763) | def is_change(self, response):
    method strip_changes (line 767) | def strip_changes(self, response):
  class KeywordChecker (line 772) | class KeywordChecker(Instruction):
    method build_description (line 775) | def build_description(self, *, keywords = None
    method get_instruction_args (line 798) | def get_instruction_args(self):
    method get_instruction_args_keys (line 802) | def get_instruction_args_keys(self):
    method check_following (line 806) | def check_following(self, value):
  class KeywordFrequencyChecker (line 814) | class KeywordFrequencyChecker(Instruction):
    method build_description (line 817) | def build_description(self, *, keyword = None,
    method get_instruction_args (line 861) | def get_instruction_args(self):
    method get_instruction_args_keys (line 867) | def get_instruction_args_keys(self):
    method check_following (line 871) | def check_following(self, value):
  class NumberOfWords (line 882) | class NumberOfWords(Instruction):
    method build_description (line 885) | def build_description(self, *, num_words = None,
    method get_instruction_args (line 923) | def get_instruction_args(self):
    method get_instruction_args_keys (line 928) | def get_instruction_args_keys(self):
    method check_following (line 932) | def check_following(self, value):
  class JsonFormat (line 948) | class JsonFormat(Instruction):
    method build_description (line 951) | def build_description(self):
    method get_instruction_args (line 957) | def get_instruction_args(self):
    method get_instruction_args_keys (line 961) | def get_instruction_args_keys(self):
    method check_following (line 968) | def check_following(self, value):
  class ParagraphFirstWordCheck (line 985) | class ParagraphFirstWordCheck(Instruction):
    method build_description (line 988) | def build_description(self, num_paragraphs = None,
    method get_instruction_args (line 1032) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1038) | def get_instruction_args_keys(self):
    method check_following (line 1042) | def check_following(self, value):
  class KeySentenceChecker (line 1097) | class KeySentenceChecker(Instruction):
    method build_description (line 1100) | def build_description(self, key_sentences = None,
    method get_instruction_args (line 1133) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1138) | def get_instruction_args_keys(self):
    method check_following (line 1142) | def check_following(self, value):
  class ForbiddenWords (line 1153) | class ForbiddenWords(Instruction):
    method build_description (line 1156) | def build_description(self, forbidden_words = None
    method get_instruction_args (line 1182) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1186) | def get_instruction_args_keys(self):
    method check_following (line 1190) | def check_following(self, value):
  class RephraseParagraph (line 1199) | class RephraseParagraph(Instruction):
    method build_description (line 1202) | def build_description(self, *, original_paragraph, low, high
    method get_instruction_args (line 1231) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1237) | def get_instruction_args_keys(self):
    method check_following (line 1241) | def check_following(self, value):
  class TwoResponsesChecker (line 1255) | class TwoResponsesChecker(Instruction):
    method build_description (line 1258) | def build_description(self):
    method get_instruction_args (line 1266) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1270) | def get_instruction_args_keys(self):
    method check_following (line 1274) | def check_following(self, value):
  class RepeatPromptThenAnswer (line 1297) | class RepeatPromptThenAnswer(Instruction):
    method build_description (line 1300) | def build_description(self, *, prompt_to_repeat = None):
    method get_instruction_args (line 1320) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1323) | def get_instruction_args_keys(self):
    method check_following (line 1327) | def check_following(self, value):
  class EndChecker (line 1333) | class EndChecker(Instruction):
    method build_description (line 1336) | def build_description(self, *, end_phrase = None):
    method get_instruction_args (line 1355) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1358) | def get_instruction_args_keys(self):
    method check_following (line 1362) | def check_following(self, value):
  class TitleChecker (line 1373) | class TitleChecker(Instruction):
    method build_description (line 1376) | def build_description(self):
    method get_instruction_args (line 1384) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1387) | def get_instruction_args_keys(self):
    method check_following (line 1391) | def check_following(self, value):
  class LetterFrequencyChecker (line 1403) | class LetterFrequencyChecker(Instruction):
    method build_description (line 1406) | def build_description(self, *, letter = None,
    method get_instruction_args (line 1460) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1466) | def get_instruction_args_keys(self):
    method check_following (line 1470) | def check_following(self, value):
  class CapitalLettersSpanishChecker (line 1481) | class CapitalLettersSpanishChecker(Instruction):
    method build_description (line 1484) | def build_description(self):
    method get_instruction_args (line 1491) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1494) | def get_instruction_args_keys(self):
    method check_following (line 1498) | def check_following(self, value):
  class LowercaseLettersSpanishChecker (line 1523) | class LowercaseLettersSpanishChecker(Instruction):
    method build_description (line 1526) | def build_description(self):
    method get_instruction_args (line 1534) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1537) | def get_instruction_args_keys(self):
    method check_following (line 1541) | def check_following(self, value):
  class CommaChecker (line 1555) | class CommaChecker(Instruction):
    method build_description (line 1558) | def build_description(self):
    method get_instruction_args (line 1565) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1568) | def get_instruction_args_keys(self):
    method check_following (line 1572) | def check_following(self, value):
  class CapitalWordFrequencyChecker (line 1577) | class CapitalWordFrequencyChecker(Instruction):
    method build_description (line 1580) | def build_description(
    method get_instruction_args (line 1618) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1625) | def get_instruction_args_keys(self):
    method check_following (line 1629) | def check_following(self, value):
  class QuotationChecker (line 1643) | class QuotationChecker(Instruction):
    method build_description (line 1646) | def build_description(self):
    method get_instruction_args (line 1653) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1657) | def get_instruction_args_keys(self):
    method check_following (line 1661) | def check_following(self, value):
  class QuestionMarkChecker (line 1667) | class QuestionMarkChecker(Instruction):
    method build_description (line 1670) | def build_description(self):
    method get_instruction_args (line 1677) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1681) | def get_instruction_args_keys(self):
    method check_following (line 1685) | def check_following(self, value):
  class ExclamationMarkChecker (line 1697) | class ExclamationMarkChecker(Instruction):
    method build_description (line 1700) | def build_description(self):
    method get_instruction_args (line 1707) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1711) | def get_instruction_args_keys(self):
    method check_following (line 1715) | def check_following(self, value):
  class EnieChecker (line 1727) | class EnieChecker(Instruction):
    method build_description (line 1730) | def build_description(
    method get_instruction_args (line 1756) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1762) | def get_instruction_args_keys(self):
    method check_following (line 1766) | def check_following(self, value):
  class DieresisChecker (line 1780) | class DieresisChecker(Instruction):
    method build_description (line 1783) | def build_description(
    method get_instruction_args (line 1809) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1815) | def get_instruction_args_keys(self):
    method check_following (line 1819) | def check_following(self, value):
  class TildesChecker (line 1833) | class TildesChecker(Instruction):
    method build_description (line 1836) | def build_description(self, *, num_words = None,
    method get_instruction_args (line 1874) | def get_instruction_args(self):
    method get_instruction_args_keys (line 1879) | def get_instruction_args_keys(self):
    method check_following (line 1883) | def check_following(self, value):

FILE: lm_eval/tasks/ifeval/multilingual/utils.py
  class InputExample (line 7) | class InputExample:
  class OutputE

Condensed preview — 15734 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (13,843K chars).

[
  {
    "path": ".github/workflows/new_tasks.yml",
    "chars": 2729,
    "preview": "name: Tasks Modified\n\non:\n  push:\n    branches:\n      - 'main'\n  pull_request:\n    branches:\n      - 'main'\n  workflow_d"
  },
  {
    "path": ".github/workflows/publish.yml",
    "chars": 2125,
    "preview": "name: Publish Python distribution to PyPI\n\non:\n  push:\n    tags:\n      - '*'\n\njobs:\n  build:\n    name: Build distributio"
  },
  {
    "path": ".github/workflows/unit_tests.yml",
    "chars": 3881,
    "preview": "# This workflow will install Python dependencies, run tests and lint with a variety of Python versions\n# For more inform"
  },
  {
    "path": ".gitignore",
    "chars": 487,
    "preview": "# macOS system files\n.DS_Store\n\n# Virtual environments\n.venv/\nvenv/\nENV/\nenv/\n*.env\n\n# Python bytecode and build artifac"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 1562,
    "preview": "# Ignore test linting to avoid conflicting changes to version stability.\nexclude: ^tests/testdata/\nrepos:\n  - repo: http"
  },
  {
    "path": "CITATION.bib",
    "chars": 744,
    "preview": "@misc{eval-harness,\n  author       = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid a"
  },
  {
    "path": "CODEOWNERS",
    "chars": 21,
    "preview": "* @baberabb\n* @0xSMT\n"
  },
  {
    "path": "LICENSE.md",
    "chars": 1067,
    "preview": "MIT License\n\nCopyright (c) 2020 EleutherAI\n\nPermission is hereby granted, free of charge, to any person obtaining a copy"
  },
  {
    "path": "MANIFEST.in",
    "chars": 24,
    "preview": "recursive-include tests\n"
  },
  {
    "path": "README.md",
    "chars": 55737,
    "preview": "# Language Model Evaluation Harness\n\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10256836.svg)](https://doi.org/"
  },
  {
    "path": "docs/API_guide.md",
    "chars": 8173,
    "preview": "# TemplateAPI Usage Guide\n\nThe `TemplateAPI` class is a versatile superclass designed to facilitate the integration of v"
  },
  {
    "path": "docs/CONTRIBUTING.md",
    "chars": 6148,
    "preview": "# Contributing to LM Evaluation Harness\n\nWelcome and thank you for your interest in the LM Evaluation Harness! We welcom"
  },
  {
    "path": "docs/README.md",
    "chars": 859,
    "preview": "# Eval Harness Documentation\n\nWelcome to the docs for the LM Evaluation Harness!\n\n## Table of Contents\n\n* To learn about"
  },
  {
    "path": "docs/chat-template-readme.md",
    "chars": 1331,
    "preview": "# Chat Template Delimiter Handling Update\n\n## Overview\n\nThis change modifies how delimiters are handled when applying ch"
  },
  {
    "path": "docs/config_files.md",
    "chars": 3620,
    "preview": "# Configuration Guide\n\nThis guide explains how to use YAML configuration files with `lm-eval` to define reusable evaluat"
  },
  {
    "path": "docs/decontamination.md",
    "chars": 3506,
    "preview": "# Decontamination\n\n## Usage\n\nThe provided directory should contain\nthe ngram files and info.json produced in \"Pile Ngram"
  },
  {
    "path": "docs/footguns.md",
    "chars": 1827,
    "preview": "# Common Pitfalls and Troubleshooting Guide\n\nThis document highlights common pitfalls and troubleshooting tips when usin"
  },
  {
    "path": "docs/interface.md",
    "chars": 9997,
    "preview": "# User Guide\n\nThis document details the interface exposed by `lm-eval` and provides details on what flags are available "
  },
  {
    "path": "docs/model_guide.md",
    "chars": 11334,
    "preview": "# New Model Guide\n\nThis guide may be of special interest to users who are using the library outside of the repository, v"
  },
  {
    "path": "docs/new_task_guide.md",
    "chars": 27171,
    "preview": "# New Task Guide\n\n`lm-evaluation-harness` is a framework that strives to support a wide range of zero- and few-shot eval"
  },
  {
    "path": "docs/python-api.md",
    "chars": 7379,
    "preview": "# Python API\n\nThis guide covers programmatic usage of the evaluation harness in Python scripts and applications.\n\n## Ove"
  },
  {
    "path": "docs/task_guide.md",
    "chars": 20974,
    "preview": "# Task Configuration\n\nThe `lm-evaluation-harness` is meant to be an extensible and flexible framework within which many "
  },
  {
    "path": "examples/lm-eval-overview.ipynb",
    "chars": 50295,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"Qw83KAePAhaS\"\n   },\n   \"source\": [\n    \"# Rele"
  },
  {
    "path": "examples/transformer-lens.py",
    "chars": 1985,
    "preview": "import warnings\n\nimport torch\nimport torch.nn as nn\nfrom transformer_lens import HookedTransformer\nfrom transformers imp"
  },
  {
    "path": "examples/visualize-wandb.ipynb",
    "chars": 5374,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"fc477b96-adee-4829-a9d7-a5eb990df358\",\n   \"metadata\": {},\n   \"so"
  },
  {
    "path": "examples/visualize-zeno.ipynb",
    "chars": 3563,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Visualizing Results in Zeno\\n\",\n "
  },
  {
    "path": "ignore.txt",
    "chars": 45,
    "preview": "ROUGE\nrouge\nnin\nmaka\nmor\nte\nond\nextraversion\n"
  },
  {
    "path": "lm_eval/__init__.py",
    "chars": 805,
    "preview": "import importlib.metadata\nimport logging\nimport os\nfrom importlib.util import find_spec\n\n\n__version__ = importlib.metada"
  },
  {
    "path": "lm_eval/__main__.py",
    "chars": 288,
    "preview": "from lm_eval._cli import HarnessCLI\nfrom lm_eval.utils import setup_logging\n\n\ndef cli_evaluate() -> None:\n    \"\"\"Main CL"
  },
  {
    "path": "lm_eval/_cli/__init__.py",
    "chars": 110,
    "preview": "\"\"\"\nCLI subcommands to run from the terminal.\n\"\"\"\n\nfrom .harness import HarnessCLI\n\n\n__all__ = [\"HarnessCLI\"]\n"
  },
  {
    "path": "lm_eval/_cli/harness.py",
    "chars": 2416,
    "preview": "import argparse\nimport sys\nimport textwrap\n\nfrom lm_eval._cli.ls import List\nfrom lm_eval._cli.run import Run\nfrom lm_ev"
  },
  {
    "path": "lm_eval/_cli/ls.py",
    "chars": 3394,
    "preview": "import argparse\nimport textwrap\n\nfrom lm_eval._cli.subcommand import SubCommand\n\n\nclass List(SubCommand):\n    \"\"\"Command"
  },
  {
    "path": "lm_eval/_cli/run.py",
    "chars": 16917,
    "preview": "import argparse\nimport json\nimport logging\nimport os\nimport textwrap\nfrom functools import partial\n\nfrom lm_eval._cli.su"
  },
  {
    "path": "lm_eval/_cli/subcommand.py",
    "chars": 480,
    "preview": "import argparse\nfrom abc import ABC, abstractmethod\n\n\nclass SubCommand(ABC):\n    \"\"\"Base class for all subcommands.\"\"\"\n\n"
  },
  {
    "path": "lm_eval/_cli/utils.py",
    "chars": 5608,
    "preview": "import argparse\nimport ast\nimport json\nimport logging\nfrom collections.abc import Sequence\nfrom typing import Any\n\n\neval"
  },
  {
    "path": "lm_eval/_cli/validate.py",
    "chars": 4937,
    "preview": "import argparse\nimport sys\nimport textwrap\n\nfrom lm_eval._cli.subcommand import SubCommand\n\n\nclass Validate(SubCommand):"
  },
  {
    "path": "lm_eval/api/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "lm_eval/api/filter.py",
    "chars": 2149,
    "preview": "from abc import ABC, abstractmethod\nfrom dataclasses import dataclass\nfrom typing import Callable, Iterable, List, Union"
  },
  {
    "path": "lm_eval/api/group.py",
    "chars": 14078,
    "preview": "\"\"\"\nGroup model for organizing tasks into hierarchical collections.\n\nA Group is a container for Tasks and/or sub-Groups "
  },
  {
    "path": "lm_eval/api/instance.py",
    "chars": 1060,
    "preview": "from dataclasses import dataclass, field\nfrom typing import Literal, Optional, Tuple\n\n\nOutputType = Literal[\n    \"loglik"
  },
  {
    "path": "lm_eval/api/metrics.py",
    "chars": 19150,
    "preview": "import logging\nimport math\nimport os\nimport random\nimport re\nimport string\nfrom collections.abc import Iterable\nfrom typ"
  },
  {
    "path": "lm_eval/api/model.py",
    "chars": 20799,
    "preview": "import abc\nimport hashlib\nimport json\nimport logging\nimport os\nfrom collections.abc import Iterable\nfrom typing import T"
  },
  {
    "path": "lm_eval/api/registry.py",
    "chars": 22271,
    "preview": "\"\"\"Registry system for lm_eval components.\n\nThis module provides a centralized registration system for models, tasks, me"
  },
  {
    "path": "lm_eval/api/samplers.py",
    "chars": 3971,
    "preview": "from __future__ import annotations\n\nimport logging\nfrom random import Random\nfrom typing import TYPE_CHECKING\n\n\nif TYPE_"
  },
  {
    "path": "lm_eval/api/task.py",
    "chars": 71922,
    "preview": "from __future__ import annotations\n\nimport abc\nimport ast\nimport logging\nimport random\nimport re\nfrom collections.abc im"
  },
  {
    "path": "lm_eval/api/utils.py",
    "chars": 3465,
    "preview": "from __future__ import annotations\n\nfrom dataclasses import dataclass\nfrom typing import Any\n\n\ndef maybe_delimit(prefix:"
  },
  {
    "path": "lm_eval/caching/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "lm_eval/caching/cache.py",
    "chars": 1424,
    "preview": "import hashlib\nimport logging\nimport os\n\nimport dill\n\n\neval_logger = logging.getLogger(__name__)\n\n\nMODULE_DIR = os.path."
  },
  {
    "path": "lm_eval/config/__init__.py",
    "chars": 166,
    "preview": "from .evaluate_config import EvaluatorConfig\nfrom .group import GroupConfig\nfrom .task import TaskConfig\n\n\n__all__ = [\"E"
  },
  {
    "path": "lm_eval/config/evaluate_config.py",
    "chars": 15162,
    "preview": "import json\nimport logging\nimport textwrap\nfrom argparse import Namespace\nfrom dataclasses import asdict, dataclass, fie"
  },
  {
    "path": "lm_eval/config/group.py",
    "chars": 4631,
    "preview": "from collections.abc import Callable\nfrom dataclasses import dataclass\nfrom typing import Any\n\n\n@dataclass\nclass AggMetr"
  },
  {
    "path": "lm_eval/config/task.py",
    "chars": 8251,
    "preview": "from __future__ import annotations\n\nimport logging\nfrom dataclasses import asdict, dataclass\nfrom inspect import getsour"
  },
  {
    "path": "lm_eval/decontamination/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "lm_eval/decontamination/archiver.py",
    "chars": 5789,
    "preview": "import datetime\nimport io\nimport json\nimport mmap\nimport os\nfrom pathlib import Path\nfrom typing import Any\n\nimport json"
  },
  {
    "path": "lm_eval/decontamination/decontaminate.py",
    "chars": 6821,
    "preview": "import collections\nimport glob\nimport json\nimport os\nimport pickle\nimport random\nimport time\n\nfrom .archiver import ZStd"
  },
  {
    "path": "lm_eval/decontamination/janitor.py",
    "chars": 13039,
    "preview": "import pickle\nimport re\nimport string\nimport traceback\nfrom typing import Iterator, List, Sequence, Tuple, TypeVar\n\n\n# T"
  },
  {
    "path": "lm_eval/defaults.py",
    "chars": 1351,
    "preview": "import os\nfrom typing import Any\n\n\nDEFAULT_MAX_LENGTH = 2048\nDEFAULT_MAX_GEN_TOKS = 256\nDEFAULT_RANDOM_SEED = 0\nDEFAULT_"
  },
  {
    "path": "lm_eval/evaluator.py",
    "chars": 29713,
    "preview": "from __future__ import annotations\n\nimport itertools\nimport json\nimport logging\nimport random\nimport time\nfrom collectio"
  },
  {
    "path": "lm_eval/evaluator_utils.py",
    "chars": 17288,
    "preview": "from __future__ import annotations\n\nimport logging\nimport math\nimport pathlib\nimport sys\nfrom dataclasses import datacla"
  },
  {
    "path": "lm_eval/filters/__init__.py",
    "chars": 803,
    "preview": "from __future__ import annotations\n\nfrom functools import partial\n\nfrom lm_eval.api.filter import FilterEnsemble\nfrom lm"
  },
  {
    "path": "lm_eval/filters/custom.py",
    "chars": 453,
    "preview": "from lm_eval.api.filter import Filter\nfrom lm_eval.api.registry import register_filter\n\n\n@register_filter(\"custom\")\nclas"
  },
  {
    "path": "lm_eval/filters/decontamination.py",
    "chars": 712,
    "preview": "from lm_eval.api.filter import Filter\nfrom lm_eval.api.registry import register_filter\n\n\n@register_filter(\"decontaminate"
  },
  {
    "path": "lm_eval/filters/extraction.py",
    "chars": 8679,
    "preview": "import re\nimport sys\nimport unicodedata\n\nfrom lm_eval.api.filter import Filter\nfrom lm_eval.api.registry import register"
  },
  {
    "path": "lm_eval/filters/selection.py",
    "chars": 2105,
    "preview": "from collections import Counter\n\nfrom lm_eval.api.filter import Filter\nfrom lm_eval.api.registry import register_filter\n"
  },
  {
    "path": "lm_eval/filters/transformation.py",
    "chars": 3971,
    "preview": "import re\n\nfrom lm_eval.api.filter import Filter\nfrom lm_eval.api.registry import register_filter\n\n\n@register_filter(\"lo"
  },
  {
    "path": "lm_eval/loggers/__init__.py",
    "chars": 88,
    "preview": "from .evaluation_tracker import EvaluationTracker\nfrom .wandb_logger import WandbLogger\n"
  },
  {
    "path": "lm_eval/loggers/evaluation_tracker.py",
    "chars": 26111,
    "preview": "import json\nimport logging\nimport os\nimport re\nimport time\nfrom collections import defaultdict\nfrom dataclasses import a"
  },
  {
    "path": "lm_eval/loggers/utils.py",
    "chars": 5600,
    "preview": "import logging\nimport os\nimport re\nimport subprocess\nfrom importlib.metadata import version\nfrom pathlib import Path\nfro"
  },
  {
    "path": "lm_eval/loggers/wandb_logger.py",
    "chars": 13844,
    "preview": "import copy\nimport json\nimport logging\nfrom typing import Any, Dict, List, Literal, Tuple\n\nimport numpy as np\nimport pan"
  },
  {
    "path": "lm_eval/models/__init__.py",
    "chars": 3108,
    "preview": "\"\"\"Model implementations for lm_eval.\n\nModels are lazily loaded via the registry system to improve startup performance.\n"
  },
  {
    "path": "lm_eval/models/anthropic_llms.py",
    "chars": 12816,
    "preview": "import logging\nimport os\nfrom functools import cached_property\nfrom typing import Any, Dict, List, Tuple, Union\n\nfrom tq"
  },
  {
    "path": "lm_eval/models/api_models.py",
    "chars": 34542,
    "preview": "import abc\nimport asyncio\nimport copy\nimport itertools\nimport json\nimport logging\nfrom functools import cached_property\n"
  },
  {
    "path": "lm_eval/models/dummy.py",
    "chars": 1991,
    "preview": "import random\nfrom functools import cached_property\n\nfrom tqdm import tqdm\n\nfrom lm_eval.api.model import LM\nfrom lm_eva"
  },
  {
    "path": "lm_eval/models/gguf.py",
    "chars": 4815,
    "preview": "import logging\nimport time\n\nimport requests\nfrom requests.exceptions import RequestException\nfrom tqdm import tqdm\n\nfrom"
  },
  {
    "path": "lm_eval/models/hf_audiolm.py",
    "chars": 11704,
    "preview": "import copy\n\nimport torch\nimport transformers\nfrom tqdm import tqdm\nfrom transformers import BatchEncoding\n\nfrom lm_eval"
  },
  {
    "path": "lm_eval/models/hf_steered.py",
    "chars": 10010,
    "preview": "from __future__ import annotations\n\nfrom contextlib import contextmanager\nfrom functools import partial\nfrom pathlib imp"
  },
  {
    "path": "lm_eval/models/hf_vlms.py",
    "chars": 31685,
    "preview": "import copy\nimport logging\n\nimport torch\nimport torch.nn.functional as F\nimport transformers\nfrom tqdm import tqdm\nfrom "
  },
  {
    "path": "lm_eval/models/huggingface.py",
    "chars": 73991,
    "preview": "from __future__ import annotations\n\nimport logging\nimport os\nfrom datetime import timedelta\nfrom pathlib import Path\nfro"
  },
  {
    "path": "lm_eval/models/ibm_watsonx_ai.py",
    "chars": 18489,
    "preview": "import copy\nimport json\nimport logging\nimport os\nimport warnings\nfrom functools import cache\nfrom typing import NamedTup"
  },
  {
    "path": "lm_eval/models/mamba_lm.py",
    "chars": 6237,
    "preview": "import torch\n\nimport lm_eval.models.utils\nimport lm_eval.models.utils_hf\nfrom lm_eval.api.registry import register_model"
  },
  {
    "path": "lm_eval/models/megatron_lm.py",
    "chars": 54119,
    "preview": "r\"\"\"\nMegatron-LM backend for lm-evaluation-harness.\n\nThis module provides support for evaluating Megatron-LM models, inc"
  },
  {
    "path": "lm_eval/models/mistral3.py",
    "chars": 4197,
    "preview": "\"\"\"\nMistral3 model adapter for lm-evaluation-harness.\n\nThis adapter enables evaluation of Ministral-3 models (3B, 8B, 14"
  },
  {
    "path": "lm_eval/models/nemo_lm.py",
    "chars": 19598,
    "preview": "# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "lm_eval/models/neuron_optimum.py",
    "chars": 28840,
    "preview": "import copy\nimport logging\nfrom collections import defaultdict\nfrom typing import TYPE_CHECKING, Optional\n\nimport torch\n"
  },
  {
    "path": "lm_eval/models/openai_completions.py",
    "chars": 13516,
    "preview": "import logging\nimport os\nfrom functools import cached_property\nfrom operator import itemgetter\nfrom typing import Any, D"
  },
  {
    "path": "lm_eval/models/optimum_habana.py",
    "chars": 6930,
    "preview": "import logging\nimport os\nfrom importlib.util import find_spec\nfrom typing import Any\n\nimport torch\nimport torch.nn.funct"
  },
  {
    "path": "lm_eval/models/optimum_ipex.py",
    "chars": 2342,
    "preview": "import logging\nfrom importlib.util import find_spec\n\nfrom lm_eval.api.registry import register_model\nfrom lm_eval.models"
  },
  {
    "path": "lm_eval/models/optimum_lm.py",
    "chars": 3051,
    "preview": "import json\nimport logging\nfrom importlib.util import find_spec\nfrom pathlib import Path\n\nfrom lm_eval.api.registry impo"
  },
  {
    "path": "lm_eval/models/sglang_causallms.py",
    "chars": 21266,
    "preview": "import copy\nimport logging\nfrom importlib.util import find_spec\nfrom typing import TYPE_CHECKING, Dict, List, Optional, "
  },
  {
    "path": "lm_eval/models/sglang_generate_API.py",
    "chars": 3360,
    "preview": "from typing import Dict, List, Optional, Tuple, Union\n\nfrom lm_eval.api.registry import register_model\nfrom lm_eval.mode"
  },
  {
    "path": "lm_eval/models/textsynth.py",
    "chars": 5877,
    "preview": "\"\"\"TextSynth API\nImplementation provided by Fabrice Bellard:\n    https://github.com/EleutherAI/lm-evaluation-harness/iss"
  },
  {
    "path": "lm_eval/models/utils.py",
    "chars": 35903,
    "preview": "from __future__ import annotations\n\nimport collections\nimport fnmatch\nimport itertools\nimport logging\nimport time\nfrom f"
  },
  {
    "path": "lm_eval/models/utils_hf.py",
    "chars": 4805,
    "preview": "import gc\nfrom typing import Literal\n\nimport torch\nimport transformers\n\n\ndef pad_and_concat(\n    max_length: int,\n    te"
  },
  {
    "path": "lm_eval/models/vllm_causallms.py",
    "chars": 35158,
    "preview": "from __future__ import annotations\n\nimport gc\nimport logging\nimport os\nfrom importlib.metadata import version\nfrom impor"
  },
  {
    "path": "lm_eval/models/vllm_vlms.py",
    "chars": 12270,
    "preview": "import logging\nfrom typing import TYPE_CHECKING, Any\n\nimport ray\nimport transformers\nfrom more_itertools import distribu"
  },
  {
    "path": "lm_eval/models/winml.py",
    "chars": 28691,
    "preview": "\"\"\"\nWinML backend for lm-eval-harness with NPU/GPU/CPU support.\n\nThis backend leverages Windows Machine Learning (WinML)"
  },
  {
    "path": "lm_eval/prompts/__init__.py",
    "chars": 4575,
    "preview": "import ast\nimport logging\nimport os\nfrom typing import Dict\n\nfrom lm_eval import utils\n\n\neval_logger = logging.getLogger"
  },
  {
    "path": "lm_eval/result_schema.py",
    "chars": 7851,
    "preview": "from typing import Any, Generic, TypeVar\n\nfrom typing_extensions import NotRequired, TypedDict\n\n\nT = TypeVar(\"T\", bound="
  },
  {
    "path": "lm_eval/tasks/README.md",
    "chars": 131803,
    "preview": "# Tasks\n\nA list of supported tasks and task groupings can be viewed with `lm-eval ls tasks`.\n\nFor more information, incl"
  },
  {
    "path": "lm_eval/tasks/__init__.py",
    "chars": 6321,
    "preview": "\"\"\"Task management for lm-evaluation-harness.\n\nThis module provides:\n- TaskManager: Main class for discovering and loadi"
  },
  {
    "path": "lm_eval/tasks/_factory.py",
    "chars": 10675,
    "preview": "from __future__ import annotations\n\nimport inspect\nimport logging\nfrom copy import deepcopy\nfrom dataclasses import fiel"
  },
  {
    "path": "lm_eval/tasks/_index.py",
    "chars": 6605,
    "preview": "from __future__ import annotations\n\nimport logging\nfrom dataclasses import dataclass, field\nfrom enum import Enum, auto\n"
  },
  {
    "path": "lm_eval/tasks/_yaml_loader.py",
    "chars": 7311,
    "preview": "from __future__ import annotations\n\nimport importlib.util\nimport sys\nfrom pathlib import Path\nfrom typing import Any\n\nim"
  },
  {
    "path": "lm_eval/tasks/aclue/README.md",
    "chars": 2288,
    "preview": "# ACLUE\n\n### Paper\n\nCan Large Language Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE\nhttps://arxiv.org/a"
  },
  {
    "path": "lm_eval/tasks/aclue/_aclue.yaml",
    "chars": 684,
    "preview": "group: aclue\ntask:\n  - aclue_ancient_chinese_culture\n  - aclue_ancient_literature\n  - aclue_ancient_medical\n  - aclue_an"
  },
  {
    "path": "lm_eval/tasks/aclue/_default_template_yaml",
    "chars": 475,
    "preview": "dataset_path: tyouisen/aclue\ntest_split: test\nfewshot_split: dev\nfewshot_config:\n  sampler: first_n\noutput_type: multipl"
  },
  {
    "path": "lm_eval/tasks/aclue/_generate_configs.py",
    "chars": 2552,
    "preview": "\"\"\"\nTake in a YAML, and output all other splits with this YAML\n\"\"\"\n\nimport argparse\nimport logging\nimport os\n\nimport yam"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_ancient_chinese_culture.yaml",
    "chars": 169,
    "preview": "\"dataset_name\": \"ancient_chinese_culture\"\n\"description\": \"以下是关于国学常识的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_templa"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_ancient_literature.yaml",
    "chars": 161,
    "preview": "\"dataset_name\": \"ancient_literature\"\n\"description\": \"以下是关于古代文学知识的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_template_"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_ancient_medical.yaml",
    "chars": 152,
    "preview": "\"dataset_name\": \"ancient_medical\"\n\"description\": \"以下是关于医古文的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_template_yaml\"\n"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_ancient_phonetics.yaml",
    "chars": 156,
    "preview": "\"dataset_name\": \"ancient_phonetics\"\n\"description\": \"以下是关于古音学的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_template_yaml"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_basic_ancient_chinese.yaml",
    "chars": 166,
    "preview": "\"dataset_name\": \"basic_ancient_chinese\"\n\"description\": \"以下是关于古汉语知识的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_templat"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_couplet_prediction.yaml",
    "chars": 157,
    "preview": "\"dataset_name\": \"couplet_prediction\"\n\"description\": \"以下是关于对联的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_template_yaml"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_homographic_character_resolution.yaml",
    "chars": 186,
    "preview": "\"dataset_name\": \"homographic_character_resolution\"\n\"description\": \"以下是关于通假字的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_defaul"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_named_entity_recognition.yaml",
    "chars": 175,
    "preview": "\"dataset_name\": \"named_entity_recognition\"\n\"description\": \"以下是关于古汉语命名体识别的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_t"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_poetry_appreciate.yaml",
    "chars": 159,
    "preview": "\"dataset_name\": \"poetry_appreciate\"\n\"description\": \"以下是关于古诗词曲鉴赏的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_template_y"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_poetry_context_prediction.yaml",
    "chars": 177,
    "preview": "\"dataset_name\": \"poetry_context_prediction\"\n\"description\": \"以下是关于古诗词上下句预测的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_poetry_quality_assessment.yaml",
    "chars": 176,
    "preview": "\"dataset_name\": \"poetry_quality_assessment\"\n\"description\": \"以下是关于古诗词质量评估的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_t"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_poetry_sentiment_analysis.yaml",
    "chars": 175,
    "preview": "\"dataset_name\": \"poetry_sentiment_analysis\"\n\"description\": \"以下是关于诗词情感分类的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_te"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_polysemy_resolution.yaml",
    "chars": 163,
    "preview": "\"dataset_name\": \"polysemy_resolution\"\n\"description\": \"以下是关于古文单字多义的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_template"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_reading_comprehension.yaml",
    "chars": 167,
    "preview": "\"dataset_name\": \"reading_comprehension\"\n\"description\": \"以下是关于古文阅读理解的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_templa"
  },
  {
    "path": "lm_eval/tasks/aclue/aclue_sentence_segmentation.yaml",
    "chars": 165,
    "preview": "\"dataset_name\": \"sentence_segmentation\"\n\"description\": \"以下是关于古文断句的单项选择题，请直接给出正确答案的选项。\\n\\n\"\n\"include\": \"_default_template"
  },
  {
    "path": "lm_eval/tasks/acpbench/README.md",
    "chars": 5770,
    "preview": "# ACPBench\n\n**Homepage:** https://ibm.github.io/ACPBench/\n\n### Papers\n\n**Title:** ACPBench: Reasoning About Action, Chan"
  },
  {
    "path": "lm_eval/tasks/acpbench/boolq_cot_2shot/_boolq_cot_2shot_yaml",
    "chars": 920,
    "preview": "tag:\n  - acp_bool_cot_2shot\n  - acp_bench\noutput_type: generate_until\ndataset_path: ibm-research/acp_bench\ntest_split: t"
  },
  {
    "path": "lm_eval/tasks/acpbench/boolq_cot_2shot/act_reach.yaml",
    "chars": 2393,
    "preview": "task: acp_areach_bool\ndataset_name: acp_areach_bool\ninclude: _boolq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  "
  },
  {
    "path": "lm_eval/tasks/acpbench/boolq_cot_2shot/app.yaml",
    "chars": 2109,
    "preview": "task: acp_app_bool\ndataset_name: acp_app_bool\ninclude: _boolq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  sample"
  },
  {
    "path": "lm_eval/tasks/acpbench/boolq_cot_2shot/just.yaml",
    "chars": 3967,
    "preview": "task: acp_just_bool\ndataset_name: acp_just_bool\ninclude: _boolq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  samp"
  },
  {
    "path": "lm_eval/tasks/acpbench/boolq_cot_2shot/land.yaml",
    "chars": 2898,
    "preview": "task: acp_land_bool\ndataset_name: acp_land_bool\ninclude: _boolq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  samp"
  },
  {
    "path": "lm_eval/tasks/acpbench/boolq_cot_2shot/prog.yaml",
    "chars": 2532,
    "preview": "task: acp_prog_bool\ndataset_name: acp_prog_bool\ninclude: _boolq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  samp"
  },
  {
    "path": "lm_eval/tasks/acpbench/boolq_cot_2shot/reach.yaml",
    "chars": 2637,
    "preview": "task: acp_reach_bool\ndataset_name: acp_reach_bool\ninclude: _boolq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  sa"
  },
  {
    "path": "lm_eval/tasks/acpbench/boolq_cot_2shot/val.yaml",
    "chars": 3070,
    "preview": "task: acp_val_bool\ndataset_name: acp_val_bool\ninclude: _boolq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  sample"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/_gen_yaml_2shot",
    "chars": 503,
    "preview": "tag:\n  - acp_gen_2shot\n  - acp_bench_hard\ndataset_path: ibm-research/acp_bench\ntest_split: test\ndoc_to_target: \"{{answer"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/acp_grammar.lark",
    "chars": 383,
    "preview": "NAME: /[a-zA-Z][a-zA-Z0-9-_]*/\nLPAR : \"(\"\nRPAR : \")\"\nLSPAR: \"[\"\nRSPAR: \"]\"\nCOMMA: \",\"\nWS: /[ \\n]/\n\naction_none : \"None\"\n"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/acp_utils.py",
    "chars": 37056,
    "preview": "import json\nimport os\nfrom abc import ABC, abstractmethod\nfrom collections import defaultdict\nfrom pathlib import Path\n\n"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/act_reach.yaml",
    "chars": 3462,
    "preview": "task: acp_areach_gen\ndataset_name: acp_areach_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samples:"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/app.yaml",
    "chars": 3713,
    "preview": "task: acp_app_gen\ndataset_name: acp_app_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samples:\n  - c"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/just.yaml",
    "chars": 4868,
    "preview": "task: acp_just_gen\ndataset_name: acp_just_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samples:\n  -"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/land.yaml",
    "chars": 2692,
    "preview": "task: acp_land_gen\ndataset_name: acp_land_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samples:\n  -"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/next_act.yaml",
    "chars": 3553,
    "preview": "task: acp_nexta_gen\ndataset_name: acp_nexta_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samples:\n "
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/prog.yaml",
    "chars": 3145,
    "preview": "task: acp_prog_gen\ndataset_name: acp_prog_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samples:\n  -"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/reach.yaml",
    "chars": 2373,
    "preview": "task: acp_reach_gen\ndataset_name: acp_reach_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samples:\n "
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot/val.yaml",
    "chars": 4131,
    "preview": "task: acp_val_gen\ndataset_name: acp_val_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samples:\n  - c"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/_gen_yaml_2shot",
    "chars": 690,
    "preview": "tag:\n  - acp_gen_2shot_with_pddl\n  - acp_bench_hard_with_pddl\ndataset_path: ibm-research/acp_bench\ntest_split: test\ndesc"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_grammar.lark",
    "chars": 383,
    "preview": "NAME: /[a-zA-Z][a-zA-Z0-9-_]*/\nLPAR : \"(\"\nRPAR : \")\"\nLSPAR: \"[\"\nRSPAR: \"]\"\nCOMMA: \",\"\nWS: /[ \\n]/\n\naction_none : \"None\"\n"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/acp_utils.py",
    "chars": 37056,
    "preview": "import json\nimport os\nfrom abc import ABC, abstractmethod\nfrom collections import defaultdict\nfrom pathlib import Path\n\n"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/act_reach.yaml",
    "chars": 10049,
    "preview": "task: acp_areach_gen_with_pddl\ndataset_name: acp_areach_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/app.yaml",
    "chars": 10370,
    "preview": "task: acp_app_gen_with_pddl\ndataset_name: acp_app_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samp"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/just.yaml",
    "chars": 11547,
    "preview": "task: acp_just_gen_with_pddl\ndataset_name: acp_just_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  sa"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/land.yaml",
    "chars": 9311,
    "preview": "task: acp_land_gen_with_pddl\ndataset_name: acp_land_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  sa"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/next_act.yaml",
    "chars": 10202,
    "preview": "task: acp_nexta_gen_with_pddl\ndataset_name: acp_nexta_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  "
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/prog.yaml",
    "chars": 9871,
    "preview": "task: acp_prog_gen_with_pddl\ndataset_name: acp_prog_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  sa"
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/reach.yaml",
    "chars": 8996,
    "preview": "task: acp_reach_gen_with_pddl\ndataset_name: acp_reach_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  "
  },
  {
    "path": "lm_eval/tasks/acpbench/gen_2shot_with_pddl/val.yaml",
    "chars": 10729,
    "preview": "task: acp_val_gen_with_pddl\ndataset_name: acp_val_gen\ninclude: _gen_yaml_2shot\nfewshot_config:\n  sampler: first_n\n  samp"
  },
  {
    "path": "lm_eval/tasks/acpbench/mcq_cot_2shot/_mcq_cot_2shot_yaml",
    "chars": 997,
    "preview": "tag:\n  - acp_mcq_cot_2shot\n  - acp_bench\noutput_type: generate_until\ndataset_path: ibm-research/acp_bench\ntest_split: te"
  },
  {
    "path": "lm_eval/tasks/acpbench/mcq_cot_2shot/act_reach.yaml",
    "chars": 2836,
    "preview": "task: acp_areach_mcq\ndataset_name: acp_areach_mcq\ninclude: _mcq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  samp"
  },
  {
    "path": "lm_eval/tasks/acpbench/mcq_cot_2shot/app.yaml",
    "chars": 2545,
    "preview": "task: acp_app_mcq\ndataset_name: acp_app_mcq\ninclude: _mcq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  samples:\n "
  },
  {
    "path": "lm_eval/tasks/acpbench/mcq_cot_2shot/just.yaml",
    "chars": 4556,
    "preview": "task: acp_just_mcq\ndataset_name: acp_just_mcq\ninclude: _mcq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  samples:"
  },
  {
    "path": "lm_eval/tasks/acpbench/mcq_cot_2shot/land.yaml",
    "chars": 2394,
    "preview": "task: acp_land_mcq\ndataset_name: acp_land_mcq\ninclude: _mcq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  samples:"
  },
  {
    "path": "lm_eval/tasks/acpbench/mcq_cot_2shot/prog.yaml",
    "chars": 2572,
    "preview": "task: acp_prog_mcq\ndataset_name: acp_prog_mcq\ninclude: _mcq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  samples:"
  },
  {
    "path": "lm_eval/tasks/acpbench/mcq_cot_2shot/reach.yaml",
    "chars": 2712,
    "preview": "task: acp_reach_mcq\ndataset_name: acp_reach_mcq\ninclude: _mcq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  sample"
  },
  {
    "path": "lm_eval/tasks/acpbench/mcq_cot_2shot/val.yaml",
    "chars": 4240,
    "preview": "task: acp_val_mcq\ndataset_name: acp_val_mcq\ninclude: _mcq_cot_2shot_yaml\nfewshot_config:\n  sampler: first_n\n  samples:\n "
  },
  {
    "path": "lm_eval/tasks/aexams/README.md",
    "chars": 1896,
    "preview": "# Arabic EXAMS\n\n### Paper\n\nEXAMS: a resource specialized in multilingual high school exam questions.\nThe original paper "
  },
  {
    "path": "lm_eval/tasks/aexams/_aexams.yaml",
    "chars": 300,
    "preview": "group: aexams\ntask:\n  - aexams_Biology\n  - aexams_IslamicStudies\n  - aexams_Physics\n  - aexams_Science\n  - aexams_Social"
  },
  {
    "path": "lm_eval/tasks/aexams/_default_template_yaml",
    "chars": 479,
    "preview": "dataset_path: Hennara/aexams\ntest_split: test\nfewshot_split: dev\nfewshot_config:\n  sampler: first_n\noutput_type: multipl"
  },
  {
    "path": "lm_eval/tasks/aexams/aexams_Biology.yaml",
    "chars": 153,
    "preview": "\"dataset_name\": \"Biology\"\n\"description\": \"قم بالإجابة على مايلي في مجال العلوم الحيوية\\n\\n\"\n\"include\": \"_default_templat"
  },
  {
    "path": "lm_eval/tasks/aexams/aexams_IslamicStudies.yaml",
    "chars": 170,
    "preview": "\"dataset_name\": \"IslamicStudies\"\n\"description\": \"قم بالإجابة على مايلي في مجال العلوم الإسلامية \\n\\n\"\n\"include\": \"_defau"
  },
  {
    "path": "lm_eval/tasks/aexams/aexams_Physics.yaml",
    "chars": 148,
    "preview": "\"dataset_name\": \"Physics\"\n\"description\": \"قم بالإجابة على مايلي في مجال الفيزياء \\n\\n\"\n\"include\": \"_default_template_yam"
  },
  {
    "path": "lm_eval/tasks/aexams/aexams_Science.yaml",
    "chars": 146,
    "preview": "\"dataset_name\": \"Science\"\n\"description\": \"قم بالإجابة على مايلي في مجال العلوم \\n\\n\"\n\"include\": \"_default_template_yaml\""
  },
  {
    "path": "lm_eval/tasks/aexams/aexams_Social.yaml",
    "chars": 155,
    "preview": "\"dataset_name\": \"Social\"\n\"description\": \"قم بالإجابة على مايلي في مجال العلوم الإجتماعية \\n\\n\"\n\"include\": \"_default_temp"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/README.md",
    "chars": 2587,
    "preview": "# MathQA\n\n### Paper\n\nIrokoBench: A New Benchmark for African Languages in the Age of Large Language Models\nhttps://arxiv"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/afrimgsm.yaml",
    "chars": 282,
    "preview": "group: afrimgsm-irokobench\ntask:\n  - afrimgsm_tasks_prompt_1\n  - afrimgsm_tasks_prompt_2\n  - afrimgsm_tasks_prompt_3\n  -"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_amh.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: amh\ninclude: afrimgsm_yaml\ntask: afrimgsm_amh_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_eng.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: eng\ninclude: afrimgsm_yaml\ntask: afrimgsm_eng_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_ewe.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: ewe\ninclude: afrimgsm_yaml\ntask: afrimgsm_ewe_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_fra.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: fra\ninclude: afrimgsm_yaml\ntask: afrimgsm_fra_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_hau.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: hau\ninclude: afrimgsm_yaml\ntask: afrimgsm_hau_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_ibo.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: ibo\ninclude: afrimgsm_yaml\ntask: afrimgsm_ibo_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_kin.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: kin\ninclude: afrimgsm_yaml\ntask: afrimgsm_kin_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_lin.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: lin\ninclude: afrimgsm_yaml\ntask: afrimgsm_lin_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_lug.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: lug\ninclude: afrimgsm_yaml\ntask: afrimgsm_lug_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_orm.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: orm\ninclude: afrimgsm_yaml\ntask: afrimgsm_orm_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_sna.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: sna\ninclude: afrimgsm_yaml\ntask: afrimgsm_sna_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_sot.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: sot\ninclude: afrimgsm_yaml\ntask: afrimgsm_sot_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_swa.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: swa\ninclude: afrimgsm_yaml\ntask: afrimgsm_swa_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_twi.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: twi\ninclude: afrimgsm_yaml\ntask: afrimgsm_twi_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_vai.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: vai\ninclude: afrimgsm_yaml\ntask: afrimgsm_vai_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_wol.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: wol\ninclude: afrimgsm_yaml\ntask: afrimgsm_wol_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_xho.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: xho\ninclude: afrimgsm_yaml\ntask: afrimgsm_xho_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_yaml",
    "chars": 974,
    "preview": "tag:\n    - afrimgsm_tasks\n    - afrimgsm_tasks_prompt_1\ndataset_path: masakhane/afrimgsm\ndataset_name: null  # Overridde"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_yor.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: yor\ninclude: afrimgsm_yaml\ntask: afrimgsm_yor_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_1/afrimgsm_zul.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: zul\ninclude: afrimgsm_yaml\ntask: afrimgsm_zul_prompt_1\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_2/afrimgsm_amh.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: amh\ninclude: afrimgsm_yaml\ntask: afrimgsm_amh_prompt_2\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_2/afrimgsm_eng.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: eng\ninclude: afrimgsm_yaml\ntask: afrimgsm_eng_prompt_2\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_2/afrimgsm_ewe.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: ewe\ninclude: afrimgsm_yaml\ntask: afrimgsm_ewe_prompt_2\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_2/afrimgsm_fra.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: fra\ninclude: afrimgsm_yaml\ntask: afrimgsm_fra_prompt_2\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_2/afrimgsm_hau.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: hau\ninclude: afrimgsm_yaml\ntask: afrimgsm_hau_prompt_2\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_2/afrimgsm_ibo.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: ibo\ninclude: afrimgsm_yaml\ntask: afrimgsm_ibo_prompt_2\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_2/afrimgsm_kin.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: kin\ninclude: afrimgsm_yaml\ntask: afrimgsm_kin_prompt_2\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_2/afrimgsm_lin.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: lin\ninclude: afrimgsm_yaml\ntask: afrimgsm_lin_prompt_2\n"
  },
  {
    "path": "lm_eval/tasks/afrimgsm/direct/prompt_2/afrimgsm_lug.yaml",
    "chars": 93,
    "preview": "# Generated by utils.py\ndataset_name: lug\ninclude: afrimgsm_yaml\ntask: afrimgsm_lug_prompt_2\n"
  }
]

// ... and 15534 more files (download for full content)

About this extraction

This page contains the source code of the EleutherAI/lm-evaluation-harness GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 15734 files (11.1 MB), approximately 3.9M tokens, and a symbol index with 4474 extracted functions, classes, methods, constants, and types. Only a preview is shown here; the complete output is available as a .txt download.

Extracted by GitExtract, built by Nikandr Surkov.
