Repository: Macaronlin/LLaMA3-Quantization
Branch: master
Commit: 3d3efe901763
Files: 133
Total size: 762.1 KB
Directory structure:
gitextract_2y0s78c7/
├── .gitignore
├── README.md
├── categories.py
├── datautils.py
├── gptq.py
├── irqlora.py
├── llama.py
├── lm_eval/
│ ├── __init__.py
│ ├── base.py
│ ├── datasets/
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── asdiv/
│ │ │ ├── __init__.py
│ │ │ ├── asdiv.py
│ │ │ └── dataset_infos.json
│ │ ├── coqa/
│ │ │ ├── __init__.py
│ │ │ ├── coqa.py
│ │ │ └── dataset_infos.json
│ │ ├── drop/
│ │ │ ├── __init__.py
│ │ │ ├── dataset_infos.json
│ │ │ └── drop.py
│ │ ├── headqa/
│ │ │ ├── __init__.py
│ │ │ ├── dataset_infos.json
│ │ │ └── headqa.py
│ │ ├── hendrycks_ethics/
│ │ │ ├── __init__.py
│ │ │ ├── dataset_infos.json
│ │ │ └── hendrycks_ethics.py
│ │ ├── hendrycks_math/
│ │ │ ├── __init__.py
│ │ │ ├── dataset_infos.json
│ │ │ └── hendrycks_math.py
│ │ ├── logiqa/
│ │ │ ├── __init__.py
│ │ │ ├── dataset_infos.json
│ │ │ └── logiqa.py
│ │ ├── mutual/
│ │ │ ├── __init__.py
│ │ │ ├── dataset_infos.json
│ │ │ └── mutual.py
│ │ ├── pile/
│ │ │ ├── __init__.py
│ │ │ ├── dataset_infos.json
│ │ │ └── pile.py
│ │ ├── quac/
│ │ │ ├── __init__.py
│ │ │ ├── dataset_infos.json
│ │ │ └── quac.py
│ │ ├── sat_analogies/
│ │ │ ├── __init__.py
│ │ │ └── sat_analogies.py
│ │ ├── triviaqa/
│ │ │ ├── README.md
│ │ │ ├── __init__.py
│ │ │ ├── dataset_infos.json
│ │ │ └── triviaqa.py
│ │ └── unscramble/
│ │ ├── __init__.py
│ │ ├── dataset_infos.json
│ │ └── unscramble.py
│ ├── decontamination/
│ │ ├── __init__.py
│ │ ├── archiver.py
│ │ ├── decontaminate.py
│ │ └── janitor.py
│ ├── evaluator copy.py
│ ├── evaluator.py
│ ├── metrics.py
│ ├── models/
│ │ ├── __init__.py
│ │ ├── dummy.py
│ │ ├── gpt2.py
│ │ ├── gpt3.py
│ │ ├── huggingface.py
│ │ └── textsynth.py
│ ├── quantizer/
│ │ └── irqlora.py
│ ├── tasks/
│ │ ├── __init__.py
│ │ ├── anli.py
│ │ ├── arc.py
│ │ ├── arithmetic.py
│ │ ├── asdiv.py
│ │ ├── blimp.py
│ │ ├── cbt.py
│ │ ├── coqa.py
│ │ ├── crowspairs.py
│ │ ├── drop.py
│ │ ├── glue.py
│ │ ├── gsm8k.py
│ │ ├── headqa.py
│ │ ├── hellaswag.py
│ │ ├── hendrycks_ethics.py
│ │ ├── hendrycks_math.py
│ │ ├── hendrycks_test.py
│ │ ├── lambada.py
│ │ ├── lambada_cloze.py
│ │ ├── lambada_multilingual.py
│ │ ├── logiqa.py
│ │ ├── mathqa.py
│ │ ├── mc_taco.py
│ │ ├── mutual.py
│ │ ├── naturalqs.py
│ │ ├── openbookqa.py
│ │ ├── pile.py
│ │ ├── piqa.py
│ │ ├── prost.py
│ │ ├── pubmedqa.py
│ │ ├── qa4mre.py
│ │ ├── qasper.py
│ │ ├── quac.py
│ │ ├── race.py
│ │ ├── sat.py
│ │ ├── sciq.py
│ │ ├── squad.py
│ │ ├── storycloze.py
│ │ ├── superglue.py
│ │ ├── swag.py
│ │ ├── toxigen.py
│ │ ├── translation.py
│ │ ├── triviaqa.py
│ │ ├── truthfulqa.py
│ │ ├── unscramble.py
│ │ ├── webqs.py
│ │ ├── wikitext.py
│ │ ├── winogrande.py
│ │ └── wsc273.py
│ └── utils.py
├── main.py
├── models/
│ ├── IRQLoRALMClass.py
│ ├── LMClass.py
│ ├── int_falcon_layer.py
│ ├── int_llama_layer.py
│ ├── int_opt_layer.py
│ ├── models_utils.py
│ └── transformation.py
├── parallel_utils.py
├── quant/
│ ├── __init__.py
│ ├── int_linear.py
│ ├── int_matmul.py
│ ├── omni_norm.py
│ ├── omniquant.py
│ ├── quantizer.py
│ └── utils.py
├── scripts/
│ ├── eval_fake_ptq.sh
│ └── eval_irqlora_commonsenseqa.sh
└── utils.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*/__pycache__/*
*cache
================================================
FILE: README.md
================================================
# LLaMA3-Quantization
LLaMA3-Quantization is the official implementation of our paper *How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study* [[PDF](https://arxiv.org/abs/2404.14047)]. Created by researchers from The University of Hong Kong, Beihang University, and ETH Zürich.
## Introduction
Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, the LLaMA3 models have recently been released and achieve impressive performance across various benchmarks with super-large-scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when quantized to low bit-widths. This exploration holds the potential to unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing the performance degradation that LLM compression typically suffers from. Specifically, we evaluate 10 existing post-training quantization and LoRA-finetuning methods on LLaMA3 at 1-8 bits and on diverse datasets to comprehensively reveal LLaMA3's low-bit quantization performance. Our experimental results indicate that LLaMA3 still suffers non-negligible degradation in these scenarios, especially at ultra-low bit-widths. This highlights the significant performance gap at low bit-widths that needs to be bridged in future developments. We expect this empirical study to prove valuable in advancing future models, pushing LLMs to lower bit-widths with higher accuracy while remaining practical. Our project is released at [https://github.com/Macaronlin/LLaMA3-Quantization](https://github.com/Macaronlin/LLaMA3-Quantization) and quantized LLaMA3 models are released at [https://huggingface.co/Efficient-ML](https://huggingface.co/Efficient-ML).

## Usage
We provide full scripts to evaluate various quantization methods in `./scripts/`. Here we use LLaMA-3-8B with the IR-QLoRA method as an example:
```shell
python main.py \
--model meta-llama/Meta-Llama-3-8B \
--peft LLMQ/LLaMA-3-8B-IR-QLoRA \
--tau_range 0.1 --tau_n 100 --blocksize 256 \
--epochs 0 \
--output_dir ./log/llama-3-8b-irqlora \
--wbits 4 \
--tasks piqa,arc_easy,arc_challenge,hellaswag,winogrande
```
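The flags mirror the IR-QLoRA settings: `--tau_range` and `--tau_n` control the tau search, `--wbits` sets the weight bit-width, and `--tasks` is the comma-separated lm-eval task list. The repository also contains a standalone GPTQ pipeline in `llama.py`; as a rough sketch based on its argparse options (adjust the model path and calibration dataset to your setup), a 4-bit run with perplexity evaluation could look like:
```shell
python llama.py meta-llama/Meta-Llama-3-8B c4 \
    --wbits 4 \
    --groupsize 128 \
    --eval
```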
## Results
### Track1: Post-Training Quantization
- Evaluation results of post-training quantization on the LLaMA3-8B model.

- Evaluation results of post-training quantization on the LLaMA3-70B model.

### Track2: LoRA-FineTuning Quantization
- LoRA-FT on LLaMA3-8B with the Alpaca dataset.

## Related Projects
- [QUIP](https://github.com/Cornell-RelaxML/QuIP)
- [GPTQ: Accurate Post-training Compression for Generative Pretrained Transformers](https://github.com/IST-DASLab/gptq)
- [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)
- [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://github.com/mit-han-lab/llm-awq)
- [RPTQ: Reorder-Based Post-Training Quantization for Large Language Models](https://github.com/hahnyuan/RPTQ4LLM)
- [OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models](https://github.com/OpenGVLab/OmniQuant)
- [PB-LLM: Partially Binarized Large Language Models](https://github.com/hahnyuan/PB-LLM)
- [BiLLM: Pushing the Limit of Post-Training Quantization for LLMs](https://github.com/Aaronhuang-778/BiLLM)
- [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models](https://github.com/mit-han-lab/smoothquant)
- [QLoRA: Efficient Finetuning of Quantized LLMs](https://github.com/artidoro/qlora)
- [IR-QLoRA: Accurate LoRA-Finetuning Quantization of LLMs via Information Retention](https://github.com/htqin/IR-QLoRA)
<!-- ## Citation
If you use our OmniQuant approach in your research, please cite our paper:
```
``` -->
================================================
FILE: categories.py
================================================
subcategories = {
"abstract_algebra": ["math"],
"anatomy": ["health"],
"astronomy": ["physics"],
"business_ethics": ["business"],
"clinical_knowledge": ["health"],
"college_biology": ["biology"],
"college_chemistry": ["chemistry"],
"college_computer_science": ["computer science"],
"college_mathematics": ["math"],
"college_medicine": ["health"],
"college_physics": ["physics"],
"computer_security": ["computer science"],
"conceptual_physics": ["physics"],
"econometrics": ["economics"],
"electrical_engineering": ["engineering"],
"elementary_mathematics": ["math"],
"formal_logic": ["philosophy"],
"global_facts": ["other"],
"high_school_biology": ["biology"],
"high_school_chemistry": ["chemistry"],
"high_school_computer_science": ["computer science"],
"high_school_european_history": ["history"],
"high_school_geography": ["geography"],
"high_school_government_and_politics": ["politics"],
"high_school_macroeconomics": ["economics"],
"high_school_mathematics": ["math"],
"high_school_microeconomics": ["economics"],
"high_school_physics": ["physics"],
"high_school_psychology": ["psychology"],
"high_school_statistics": ["math"],
"high_school_us_history": ["history"],
"high_school_world_history": ["history"],
"human_aging": ["health"],
"human_sexuality": ["culture"],
"international_law": ["law"],
"jurisprudence": ["law"],
"logical_fallacies": ["philosophy"],
"machine_learning": ["computer science"],
"management": ["business"],
"marketing": ["business"],
"medical_genetics": ["health"],
"miscellaneous": ["other"],
"moral_disputes": ["philosophy"],
"moral_scenarios": ["philosophy"],
"nutrition": ["health"],
"philosophy": ["philosophy"],
"prehistory": ["history"],
"professional_accounting": ["other"],
"professional_law": ["law"],
"professional_medicine": ["health"],
"professional_psychology": ["psychology"],
"public_relations": ["politics"],
"security_studies": ["politics"],
"sociology": ["culture"],
"us_foreign_policy": ["politics"],
"virology": ["health"],
"world_religions": ["philosophy"],
}
categories = {
"STEM": ["physics", "chemistry", "biology", "computer science", "math", "engineering"],
"humanities": ["history", "philosophy", "law"],
"social sciences": ["politics", "culture", "economics", "geography", "psychology"],
"other (business, health, misc.)": ["other", "business", "health"],
}
================================================
FILE: datautils.py
================================================
import pdb
from transformers import AutoTokenizer
from datasets import load_dataset
import numpy as np
import torch
import random
def set_seed(seed):
np.random.seed(seed)
torch.random.manual_seed(seed)
def get_pile(nsamples, seed, seqlen, model):
print("get_pile")
traindata = load_dataset("json", data_files='/cpfs01/user/chenmengzhao/prompt_quantization/val.jsonl.zst', split="train")
tokenizer = AutoTokenizer.from_pretrained(model, use_fast=False)
trainenc = tokenizer("\n\n".join(traindata['text'][:1000]), return_tensors='pt')
random.seed(seed)
trainloader = []
for _ in range(nsamples):
i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)
j = i + seqlen
inp = trainenc.input_ids[:, i:j]
tar = inp.clone()
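# Labels are masked with -100 (the default ignore_index of CrossEntropyLoss)
# everywhere except the last position; the same pattern is used by all loaders below.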
tar[:, :-1] = -100
trainloader.append((inp, tar))
return trainloader, None
def get_wikitext2(nsamples, seed, seqlen, model):
print("get_wikitext2")
traindata = load_dataset('wikitext', 'wikitext-2-raw-v1', split='train')
testdata = load_dataset('wikitext', 'wikitext-2-raw-v1', split='test')
tokenizer = AutoTokenizer.from_pretrained(model, use_fast=False)
trainenc = tokenizer("\n\n".join(traindata['text']), return_tensors='pt')
testenc = tokenizer("\n\n".join(testdata['text']), return_tensors='pt')
random.seed(seed)
trainloader = []
for _ in range(nsamples):
i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)
j = i + seqlen
inp = trainenc.input_ids[:, i:j]
tar = inp.clone()
tar[:, :-1] = -100
trainloader.append((inp, tar))
return trainloader, testenc
def get_ptb(nsamples, seed, seqlen, model):
print("get_ptb")
traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
valdata = load_dataset('ptb_text_only', 'penn_treebank', split='validation')
tokenizer = AutoTokenizer.from_pretrained(model, use_fast=False)
trainenc = tokenizer("\n\n".join(traindata['sentence']), return_tensors='pt')
testenc = tokenizer("\n\n".join(valdata['sentence']), return_tensors='pt')
random.seed(seed)
trainloader = []
for _ in range(nsamples):
i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)
j = i + seqlen
inp = trainenc.input_ids[:, i:j]
tar = inp.clone()
tar[:, :-1] = -100
trainloader.append((inp, tar))
return trainloader, testenc
def get_c4(nsamples, seed, seqlen, model):
print("get_c4")
traindata = load_dataset(
'allenai/c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train'
)
valdata = load_dataset(
'allenai/c4', data_files={'validation': 'en/c4-validation.00000-of-00008.json.gz'}, split='validation'
)
tokenizer = AutoTokenizer.from_pretrained(model, use_fast=False)
random.seed(seed)
trainloader = []
for _ in range(nsamples):
while True:
i = random.randint(0, len(traindata) - 1)
trainenc = tokenizer(traindata[i]['text'], return_tensors='pt')
if trainenc.input_ids.shape[1] >= seqlen:
break
i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)
j = i + seqlen
inp = trainenc.input_ids[:, i:j]
tar = inp.clone()
tar[:, :-1] = -100
trainloader.append((inp, tar))
random.seed(0)
valenc = []
for _ in range(256):
while True:
i = random.randint(0, len(valdata) - 1)
tmp = tokenizer(valdata[i]['text'], return_tensors='pt')
if tmp.input_ids.shape[1] >= seqlen:
break
i = random.randint(0, tmp.input_ids.shape[1] - seqlen - 1)
j = i + seqlen
valenc.append(tmp.input_ids[:, i:j])
valenc = torch.hstack(valenc)
return trainloader, valenc
def get_ptb_new(nsamples, seed, seqlen, model):
print("get_ptb_new")
traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
testdata = load_dataset('ptb_text_only', 'penn_treebank', split='test')
tokenizer = AutoTokenizer.from_pretrained(model, use_fast=False)
trainenc = tokenizer(" ".join(traindata["sentence"]), return_tensors="pt")
testenc = tokenizer(" ".join(testdata ["sentence"]), return_tensors="pt")
random.seed(seed)
trainloader = []
for _ in range(nsamples):
i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)
j = i + seqlen
inp = trainenc.input_ids[:, i:j]
tar = inp.clone()
tar[:, :-1] = -100
trainloader.append((inp, tar))
return trainloader, testenc
def get_c4_new(nsamples, seed, seqlen, model):
print("get_c4_new")
traindata = load_dataset(
'allenai/c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train'
)
valdata = load_dataset(
'allenai/c4', data_files={'validation': 'en/c4-validation.00000-of-00008.json.gz'}, split='validation'
)
tokenizer = AutoTokenizer.from_pretrained(model, use_fast=False)
random.seed(seed)
trainloader = []
for _ in range(nsamples):
while True:
i = random.randint(0, len(traindata) - 1)
trainenc = tokenizer(traindata[i]["text"], return_tensors="pt")
if trainenc.input_ids.shape[1] >= seqlen:
break
i = random.randint(0, trainenc.input_ids.shape[1] - seqlen - 1)
j = i + seqlen
inp = trainenc.input_ids[:, i:j]
tar = inp.clone()
tar[:, :-1] = -100
trainloader.append((inp, tar))
valenc = tokenizer(" ".join(valdata[:1100]["text"]), return_tensors="pt")
valenc = valenc.input_ids[:, : (256 * seqlen)]
return trainloader, valenc
def get_loaders(
name, nsamples=128, seed=0, seqlen=2048, model='',
):
if 'wikitext2' in name:
return get_wikitext2(nsamples, seed, seqlen, model)
if 'pile' in name:
return get_pile(nsamples, seed, seqlen, model)
if 'ptb' in name:
if 'new' in name:
return get_ptb_new(nsamples, seed, seqlen, model)
return get_ptb(nsamples, seed, seqlen, model)
if 'c4' in name:
if 'new' in name:
return get_c4_new(nsamples, seed, seqlen, model)
return get_c4(nsamples, seed, seqlen, model)
if 'mix' in name:
wiki_train,wiki_val=get_wikitext2(nsamples//3, seed, seqlen, model)
ptb_train,ptb_val=get_ptb(nsamples//3, seed, seqlen, model)
c4_train,c4_val=get_c4(nsamples//3, seed, seqlen, model)
train=wiki_train+ptb_train+c4_train
val=None
return train,val
================================================
FILE: gptq.py
================================================
import math
import time
import torch
import torch.nn as nn
import transformers
import quant
from texttable import Texttable
from utils import torch_snr_error
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
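# Observer tracks the top-k (name, layer) pairs with the largest quantization
# error so they can later be re-quantized at higher precision (see the
# --observe path in llama.py).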
class Observer:
def __init__(self, topk=32):
self.loss_list = []
self.topk = topk
def submit(self, name: str, layerid: int, gptq, error: float):
item = (name, layerid, {'gptq': gptq, 'error': error})
if len(self.loss_list) < self.topk:
self.loss_list.append(item)
return
min_error = error
min_idx = -1
for idx, data in enumerate(self.loss_list):
if min_error > data[2]['error']:
min_idx = idx
min_error = data[2]['error']
if min_idx >= 0:
self.loss_list[min_idx] = item
def print(self):
self.loss_list = sorted(self.loss_list, key=lambda s: s[2]['error'], reverse=True)
table = Texttable()
table.header(['name', 'error'])
table.set_cols_dtype(['t', 'f'])
for item in self.loss_list:
table.add_row([f"{item[0]}.{item[1]}", item[2]['error']])
print(table.draw())
print('\n')
def items(self):
return self.loss_list
class GPTQ:
def __init__(self, layer, observe=False):
self.layer = layer
self.dev = self.layer.weight.device
W = layer.weight.data.clone()
if isinstance(self.layer, nn.Conv2d):
W = W.flatten(1)
if isinstance(self.layer, transformers.Conv1D):
W = W.t()
self.rows = W.shape[0]
self.columns = W.shape[1]
self.H = torch.zeros((self.columns, self.columns), device=self.dev)
self.nsamples = 0
self.quantizer = quant.Quantizer()
self.observe = observe
def add_batch(self, inp, out):
# Hessian H = 2 X X^T + λ I
if self.observe:
self.inp1 = inp
self.out1 = out
else:
self.inp1 = None
self.out1 = None
if len(inp.shape) == 2:
inp = inp.unsqueeze(0)
tmp = inp.shape[0]
if isinstance(self.layer, nn.Linear) or isinstance(self.layer, transformers.Conv1D):
if len(inp.shape) == 3:
inp = inp.reshape((-1, inp.shape[-1]))
inp = inp.t()
if isinstance(self.layer, nn.Conv2d):
unfold = nn.Unfold(self.layer.kernel_size, dilation=self.layer.dilation, padding=self.layer.padding, stride=self.layer.stride)
inp = unfold(inp)
inp = inp.permute([1, 0, 2])
inp = inp.flatten(1)
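# Maintain H as a running average of 2 * X X^T over all calibration samples:
# rescale the existing accumulator by nsamples / (nsamples + tmp), then add
# the new batch's outer products with a sqrt(2 / nsamples) scale folded into inp.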
self.H *= self.nsamples / (self.nsamples + tmp)
self.nsamples += tmp
# inp = inp.float()
inp = math.sqrt(2 / self.nsamples) * inp.float()
# self.H += 2 / self.nsamples * inp.matmul(inp.t())
self.H += inp.matmul(inp.t())
def print_loss(self, name, q_weight, weight_error, timecost):
table = Texttable()
name += ' ' * (16 - len(name))
table.header(['name', 'weight_error', 'fp_inp_SNR', 'q_inp_SNR', 'time'])
# assign weight
self.layer.weight.data = q_weight.reshape(self.layer.weight.shape).to(self.layer.weight.data.dtype)
if self.inp1 is not None:
# quantize input to int8
quantizer = quant.Quantizer()
quantizer.configure(8, perchannel=False, sym=True, mse=False)
quantizer.find_params(self.inp1)
q_in = quantizer.quantize(self.inp1).type(torch.float16)
q_out = self.layer(q_in)
# get kinds of SNR
q_SNR = torch_snr_error(q_out, self.out1).item()
fp_SNR = torch_snr_error(self.layer(self.inp1), self.out1).item()
else:
q_SNR = '-'
fp_SNR = '-'
table.add_row([name, weight_error, fp_SNR, q_SNR, timecost])
print(table.draw().split('\n')[-2])
def fasterquant(self, blocksize=128, percdamp=.01, groupsize=-1, actorder=False, name=''):
self.layer.to(self.dev)
W = self.layer.weight.data.clone()
if blocksize == -1:
blocksize = W.shape[1]
print(blocksize)
if isinstance(self.layer, nn.Conv2d):
W = W.flatten(1)
if isinstance(self.layer, transformers.Conv1D):
W = W.t()
W = W.float()
tick = time.time()
if not self.quantizer.ready():
self.quantizer.find_params(W, weight=True)
H = self.H
if not self.observe:
del self.H
dead = torch.diag(H) == 0
H[dead, dead] = 1
W[:, dead] = 0
if actorder:
perm = torch.argsort(torch.diag(H), descending=True)
W = W[:, perm]
H = H[perm][:, perm]
Losses = torch.zeros_like(W)
Q = torch.zeros_like(W)
damp = percdamp * torch.mean(torch.diag(H))
diag = torch.arange(self.columns, device=self.dev)
H[diag, diag] += damp
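# With the diagonal dampened, build the upper Cholesky factor of H^{-1}
# (cholesky -> cholesky_inverse -> cholesky(upper)); GPTQ consumes its rows
# when redistributing quantization error across the remaining columns.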
H = torch.linalg.cholesky(H)
H = torch.cholesky_inverse(H)
H = torch.linalg.cholesky(H, upper=True)
Hinv = H
g_idx = []
scale = []
zero = []
now_idx = 1
for i1 in range(0, self.columns, blocksize):
i2 = min(i1 + blocksize, self.columns)
count = i2 - i1
W1 = W[:, i1:i2].clone()
Q1 = torch.zeros_like(W1)
Err1 = torch.zeros_like(W1)
Losses1 = torch.zeros_like(W1)
Hinv1 = Hinv[i1:i2, i1:i2]
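# Quantize the block column by column: each column's rounding error
# err1 = (w - q) / d is propagated into the not-yet-quantized columns via the
# corresponding row of the inverse-Hessian factor (the core GPTQ update).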
for i in range(count):
w = W1[:, i]
d = Hinv1[i, i]
if groupsize != -1:
if (i1 + i) % groupsize == 0:
self.quantizer.find_params(W[:, (i1 + i):(i1 + i + groupsize)], weight=True)
if ((i1 + i) // groupsize) - now_idx == -1:
scale.append(self.quantizer.scale)
zero.append(self.quantizer.zero)
now_idx += 1
q = self.quantizer.quantize(w.unsqueeze(1)).flatten()
Q1[:, i] = q
Losses1[:, i] = (w - q)**2 / d**2
err1 = (w - q) / d
W1[:, i:] -= err1.unsqueeze(1).matmul(Hinv1[i, i:].unsqueeze(0))
Err1[:, i] = err1
Q[:, i1:i2] = Q1
Losses[:, i1:i2] = Losses1 / 2
W[:, i2:] -= Err1.matmul(Hinv[i1:i2, i2:])
torch.cuda.synchronize()
error = torch.sum(Losses).item()
groupsize = groupsize if groupsize != -1 else self.columns
g_idx = [i // groupsize for i in range(self.columns)]
g_idx = torch.tensor(g_idx, dtype=torch.int32, device=Q.device)
if actorder:
invperm = torch.argsort(perm)
Q = Q[:, invperm]
g_idx = g_idx[invperm]
if isinstance(self.layer, transformers.Conv1D):
Q = Q.t()
self.print_loss(name=name, q_weight=Q, weight_error=error, timecost=(time.time() - tick))
if scale == []:
scale.append(self.quantizer.scale)
zero.append(self.quantizer.zero)
scale = torch.cat(scale, dim=1)
zero = torch.cat(zero, dim=1)
return scale, zero, g_idx, error
def free(self):
self.inp1 = None
self.out1 = None
self.H = None
self.Losses = None
self.Trace = None
torch.cuda.empty_cache()
================================================
FILE: irqlora.py
================================================
from tqdm import tqdm
import peft
import torch
import operator
import numpy as np
import bitsandbytes as bnb
from peft.tuners.lora import LoraLayer
from functools import reduce # Required in Python 3
import bitsandbytes.functional as bnb_F
from torch import Tensor
from scipy.stats import norm
from bitsandbytes.functional import create_fp8_map, create_dynamic_map
cache_folder_path = ''
module_num = 0
sigma = 1 / norm.ppf(torch.linspace(0.9677083, 0.5, 9)[:-1]).tolist()[0]
def replace_to_qlora_model(model, model_fp, blocksize2=256, tau_range=0.1, tau_n=100):
model.model = _replace_with_ours_lora_4bit_linear(model.model, model_fp=model_fp, blocksize2=blocksize2, tau_range=tau_range, tau_n=tau_n)
return model
def prod(iterable):
return reduce(operator.mul, iterable, 1)
normal_map_fp8 = create_dynamic_map()
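# Nearest-neighbor codebook (de)quantization helpers: quantize_tensor maps each
# element of X to the index of its closest codeword in L, and dequantize_tensor
# looks the codewords back up. Used with the dynamic FP8 map above to store the
# per-block shifts tau compactly.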
def quantize_tensor(X, L, idx=False):
L = L.to(X.device)
X_shape = X.shape
X_expanded = X.reshape(-1, 1)
L_reshaped = L.reshape(1, -1)
abs_diff = torch.abs(X_expanded - L_reshaped)
min_index = torch.argmin(abs_diff, dim=-1)
min_index = min_index.to(torch.uint8).to(L.device).reshape(X_shape)
return min_index
def dequantize_tensor(X, L):
L = L.to(X.device)
return torch.index_select(L, dim=0, index=torch.as_tensor(X, dtype=torch.int32).reshape(-1)).reshape(X.shape)
@torch.no_grad()
def nf4_quant(weight, weight_shape, tau, compress_statistics, quant_type, device):
weight = weight.reshape(-1, 256, 64).to(device)
tau = tau.reshape(-1, 256, 1).to(device)
_weight = (weight - tau).reshape(weight_shape)
nf4_weight = bnb.nn.Params4bit(_weight, requires_grad=False, compress_statistics=compress_statistics, quant_type=quant_type).cuda(0)
tau2 = tau.abs().max(dim=1, keepdim=True)[0]
tau1 = quantize_tensor(tau / tau2, normal_map_fp8)
return nf4_weight, tau1.reshape(-1, 256), tau2.reshape(-1, 1)
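# Information entropy of the packed 4-bit codes: split each uint8 into two
# nibbles, compute per-block frequencies of the 16 code values, and sum
# -p * log2(p) (NaNs from empty bins are zeroed).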
@torch.no_grad()
def evaluate_entropy(weight_int8, blocksize):
device = weight_int8.device
_weight_int8 = weight_int8.reshape(-1, 1)
weight_nf4 = torch.cat((_weight_int8//16, _weight_int8%16), 1).reshape(1, -1, blocksize)
weight_nf4_repeat = weight_nf4.repeat(16, 1, 1).to(device)
values = torch.tensor(range(16)).reshape(16, 1, 1).to(device)
freqs = (weight_nf4_repeat==values).sum(dim=-1, keepdim=True) / blocksize
entropy = -freqs * torch.log2(freqs)
entropy = torch.where(torch.isnan(entropy), 0, entropy)
entropy = entropy.sum(dim=0)
return entropy
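# IR-QLoRA tau search: starting from each block's median tau0, sweep
# tau_n*2+1 shift factors in [-tau_range*sigma, tau_range*sigma] and keep,
# per block, the shift whose NF4 codes have maximal entropy (i.e. retain the
# most information).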
@torch.no_grad()
def search(fp_weight: Tensor, fp_weight_shape, compress_statistics, quant_type, device, tau_range=0.1, tau_n=51, blocksize=64, blocksize2=256):
fp_weight = fp_weight.reshape(-1, blocksize2, blocksize).to(device)
tau0 = fp_weight.median(2, keepdim=True)[0] # [-1, 256, 1]
absmax = (fp_weight - tau0).abs().max(2, keepdim=True)[0]
entropy_max, factor_best = None, None
for factor in tqdm(np.linspace(-tau_range*sigma, tau_range*sigma, tau_n*2+1)):
tau = factor * absmax + tau0
nf4_weight, _, _ = nf4_quant(fp_weight, fp_weight_shape, tau, compress_statistics, quant_type, device)
entropy = evaluate_entropy(nf4_weight, blocksize)
if entropy_max is None:
entropy_max = entropy
factor_best = torch.full_like(entropy, factor)
else:
factor_best = torch.where(entropy > entropy_max, factor, factor_best)
entropy_max = torch.max(entropy_max, entropy)
tau = factor_best.reshape(-1, 256, 1) * absmax + tau0
nf4_weight, tau1, tau2 = nf4_quant(fp_weight, fp_weight_shape, tau, compress_statistics, quant_type, device)
return nf4_weight, tau1, tau2
class IRQLoraLinear4bit(bnb.nn.Linear4bit, LoraLayer):
def __init__(
self, old_model, model_fp=None, blocksize2=256, tau_range=0.1, tau_n=51
):
for key, value in old_model.__dict__.items():
setattr(self, key, value)
fp_weight = model_fp.weight.data.contiguous().to('cpu')
fp_weight_shape = fp_weight.shape
compress_statistics, quant_type, device = self.base_layer.weight.compress_statistics, self.base_layer.weight.quant_type, self.base_layer.weight.device
del self.base_layer.weight, model_fp
torch.cuda.empty_cache()
self.base_layer.weight, self.base_layer.tau_quant, self.base_layer.tau_absmax = search(
fp_weight=fp_weight,
fp_weight_shape=fp_weight_shape,
compress_statistics=compress_statistics,
quant_type=quant_type,
device=device,
tau_range=tau_range, tau_n=tau_n,
blocksize2=blocksize2
)
self.base_layer.tau_quant = self.base_layer.tau_quant.to(device)
self.base_layer.tau_absmax = self.base_layer.tau_absmax.to(device)
del fp_weight
torch.cuda.empty_cache()
self.lora_default_A_scale = torch.nn.Parameter(torch.zeros([1], dtype=self.lora_A.default.weight.dtype).to(self.base_layer.weight.device), requires_grad=True)
self.lora_default_B_scale = torch.nn.Parameter(torch.zeros([1], dtype=self.lora_A.default.weight.dtype).to(self.base_layer.weight.device), requires_grad=True)
def forward(self, x: torch.Tensor):
if self.base_layer.bias is not None and self.base_layer.bias.dtype != x.dtype:
self.base_layer.bias.data = self.base_layer.bias.data.to(x.dtype)
if getattr(self.base_layer.weight, 'quant_state', None) is None:
print('FP4 quantization state not initialized. Please call .cuda() or .to(device) on the LinearFP4 layer first.')
inp_dtype = x.dtype
if self.base_layer.compute_dtype is not None:
x = x.to(self.base_layer.compute_dtype)
bias = None if self.base_layer.bias is None else self.base_layer.bias.to(self.base_layer.compute_dtype)
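# Reconstruct the effective base weight: dequantize the 4-bit weight, then add
# back the per-block shift tau, itself stored as an FP8-quantized ratio
# (tau_quant) times a per-block absmax (tau_absmax).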
with torch.no_grad():
fp_B = bnb_F.dequantize_fp4(self.base_layer.weight, self.base_layer.weight.quant_state).to(x.dtype)
tau = (dequantize_tensor(self.base_layer.tau_quant, normal_map_fp8).reshape(-1, 256, 1) * self.base_layer.tau_absmax.reshape(-1, 1, 1)).to(fp_B.device)
blocksize = torch.prod(torch.tensor(fp_B.shape)) / torch.prod(torch.tensor(tau.shape))
fp_B = (fp_B.reshape(-1, blocksize.int().item()) + tau.reshape(-1, 1)).reshape(fp_B.shape).to(x.dtype)
out = torch.nn.functional.linear(x, fp_B, bias)
out = out.to(inp_dtype)
result = out
if self.disable_adapters or self.active_adapter[0] not in self.lora_A.keys():
return result
elif self.r[self.active_adapter[0]] > 0:
result = result.clone()
if not torch.is_autocast_enabled():
expected_dtype = result.dtype
x = x.to(self.lora_A[self.active_adapter[0]].weight.dtype)
x = self.lora_A[self.active_adapter[0]](self.lora_dropout[self.active_adapter[0]](x)) + self.lora_default_A_scale * x.reshape([_ for _ in x.shape[:-1]] + [self.lora_A[self.active_adapter[0]].out_features] + [-1]).mean(dim=-1)
x = (self.lora_B[self.active_adapter[0]](x).reshape([_ for _ in x.shape] + [-1]) + self.lora_default_B_scale * x.unsqueeze(-1)).reshape([_ for _ in x.shape[:-1]] + [-1])
output = x.to(expected_dtype) * self.scaling[self.active_adapter[0]]
else:
x = self.lora_A[self.active_adapter[0]](self.lora_dropout[self.active_adapter[0]](x)) + self.lora_default_A_scale * x.reshape([_ for _ in x.shape[:-1]] + [self.lora_A[self.active_adapter[0]].out_features] + [-1]).mean(dim=-1)
x = (self.lora_B[self.active_adapter[0]](x).reshape([_ for _ in x.shape] + [-1]) + self.lora_default_B_scale * x.unsqueeze(-1)).reshape([_ for _ in x.shape[:-1]] + [-1])
output = x * self.scaling[self.active_adapter[0]]
result += output
return result
def _replace_with_ours_lora_4bit_linear(
model, current_key_name=None, model_fp=None, blocksize2=256, tau_range=0.5, tau_n=51
):
for name, module in model.named_children():
if current_key_name is None:
current_key_name = []
current_key_name.append(name)
if isinstance(module, peft.tuners.lora.Linear4bit):
model._modules[name] = IRQLoraLinear4bit(model._modules[name], model_fp=model_fp._modules[name], blocksize2=blocksize2, tau_range=tau_range, tau_n=tau_n)
if len(list(module.children())) > 0:
if name in model_fp._modules:
_ = _replace_with_ours_lora_4bit_linear(
module,
current_key_name, model_fp._modules[name], blocksize2, tau_range, tau_n
)
else:
_ = _replace_with_ours_lora_4bit_linear(
module,
current_key_name, None, blocksize2, tau_range, tau_n
)
current_key_name.pop(-1)
return model
================================================
FILE: llama.py
================================================
import argparse
import time
import numpy as np
import torch
import torch.nn as nn
import quant
from gptq import GPTQ, Observer
from utils import find_layers, DEV, set_seed, get_wikitext2, get_ptb, get_c4, get_ptb_new, get_c4_new, get_loaders, export_quant_table, gen_conditions
from texttable import Texttable
def get_llama(model):
def skip(*args, **kwargs):
pass
torch.nn.init.kaiming_uniform_ = skip
torch.nn.init.uniform_ = skip
torch.nn.init.normal_ = skip
from transformers import LlamaForCausalLM
model = LlamaForCausalLM.from_pretrained(model, torch_dtype=torch.float16)
model.seqlen = 2048
return model
@torch.no_grad()
def llama_sequential(model, dataloader, dev):
print('Starting ...')
use_cache = model.config.use_cache
model.config.use_cache = False
layers = model.model.layers
model.model.embed_tokens = model.model.embed_tokens.to(dev)
model.model.norm = model.model.norm.to(dev)
layers[0] = layers[0].to(dev)
dtype = next(iter(model.parameters())).dtype
inps = torch.zeros((args.nsamples, model.seqlen, model.config.hidden_size), dtype=dtype, device=dev)
cache = {'i': 0, 'attention_mask': None}
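# Catcher hijacks the first decoder layer: it records each sample's hidden
# states plus the attention_mask / position_ids kwargs, then raises ValueError
# to abort the forward pass so the remaining layers never run.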
class Catcher(nn.Module):
def __init__(self, module):
super().__init__()
self.module = module
def forward(self, inp, **kwargs):
inps[cache['i']] = inp
cache['i'] += 1
cache['attention_mask'] = kwargs['attention_mask']
cache['position_ids'] = kwargs['position_ids']
raise ValueError
layers[0] = Catcher(layers[0])
for batch in dataloader:
try:
model(batch[0].to(dev))
except ValueError:
pass
layers[0] = layers[0].module
layers[0] = layers[0].cpu()
model.model.embed_tokens = model.model.embed_tokens.cpu()
model.model.norm = model.model.norm.cpu()
torch.cuda.empty_cache()
outs = torch.zeros_like(inps)
attention_mask = cache['attention_mask']
position_ids = cache['position_ids']
print('Ready.')
quantizers = {}
observer = Observer()
for i in range(len(layers)):
print(f'Quantizing layer {i+1}/{len(layers)}..')
print('+------------------+--------------+------------+-----------+-------+')
print('| name | weight_error | fp_inp_SNR | q_inp_SNR | time |')
print('+==================+==============+============+===========+=======+')
layer = layers[i].to(dev)
full = find_layers(layer)
if args.true_sequential:
sequential = [['self_attn.k_proj', 'self_attn.v_proj', 'self_attn.q_proj'], ['self_attn.o_proj'], ['mlp.up_proj', 'mlp.gate_proj'], ['mlp.down_proj']]
else:
sequential = [list(full.keys())]
for names in sequential:
subset = {n: full[n] for n in names}
gptq = {}
for name in subset:
gptq[name] = GPTQ(subset[name], observe=args.observe)
gptq[name].quantizer.configure(args.wbits, perchannel=True, sym=args.sym, mse=False)
def add_batch(name):
def tmp(_, inp, out):
gptq[name].add_batch(inp[0].data, out.data)
return tmp
handles = []
for name in subset:
handles.append(subset[name].register_forward_hook(add_batch(name)))
for j in range(args.nsamples):
outs[j] = layer(inps[j].unsqueeze(0), attention_mask=attention_mask, position_ids=position_ids)[0]
for h in handles:
h.remove()
for name in subset:
scale, zero, g_idx, error = gptq[name].fasterquant(blocksize=args.blocksize, percdamp=args.percdamp, groupsize=args.groupsize, actorder=args.act_order, name=name)
quantizers['model.layers.%d.%s' % (i, name)] = (gptq[name].quantizer.cpu(), scale.cpu(), zero.cpu(), g_idx.cpu(), args.wbits, args.groupsize)
if args.observe:
observer.submit(name=name, layerid=i, gptq=gptq[name], error=error)
else:
gptq[name].free()
for j in range(args.nsamples):
outs[j] = layer(inps[j].unsqueeze(0), attention_mask=attention_mask, position_ids=position_ids)[0]
layers[i] = layer.cpu()
del layer
del gptq
torch.cuda.empty_cache()
inps, outs = outs, inps
print('+------------------+--------------+------------+-----------+-------+')
print('\n')
if args.observe:
observer.print()
conditions = gen_conditions(args.wbits, args.groupsize)
for item in observer.items():
name = item[0]
layerid = item[1]
gptq = item[2]['gptq']
error = item[2]['error']
target = error / 2
table = Texttable()
table.header(['wbits', 'groupsize', 'error'])
table.set_cols_dtype(['i', 'i', 'f'])
table.add_row([args.wbits, args.groupsize, error])
print('Optimizing {} {} ..'.format(name, layerid))
for wbits, groupsize in conditions:
if error < target:
# if error dropped 50%, skip
break
gptq.quantizer.configure(wbits, perchannel=True, sym=args.sym, mse=False)
scale, zero, g_idx, error = gptq.fasterquant(percdamp=args.percdamp, groupsize=groupsize, actorder=args.act_order, name=name)
table.add_row([wbits, groupsize, error])
quantizers['model.layers.%d.%s' % (layerid, name)] = (gptq.quantizer.cpu(), scale.cpu(), zero.cpu(), g_idx.cpu(), wbits, groupsize)
print(table.draw())
print('\n')
gptq.layer.to('cpu')
gptq.free()
model.config.use_cache = use_cache
return quantizers
@torch.no_grad()
def llama_eval(model, testenc, dev):
print('Evaluating ...')
testenc = testenc.input_ids
nsamples = testenc.numel() // model.seqlen
use_cache = model.config.use_cache
model.config.use_cache = False
layers = model.model.layers
model.model.embed_tokens = model.model.embed_tokens.to(dev)
layers[0] = layers[0].to(dev)
dtype = next(iter(model.parameters())).dtype
inps = torch.zeros((nsamples, model.seqlen, model.config.hidden_size), dtype=dtype, device=dev)
cache = {'i': 0, 'attention_mask': None}
class Catcher(nn.Module):
def __init__(self, module):
super().__init__()
self.module = module
def forward(self, inp, **kwargs):
inps[cache['i']] = inp
cache['i'] += 1
cache['attention_mask'] = kwargs['attention_mask']
cache['position_ids'] = kwargs['position_ids']
raise ValueError
layers[0] = Catcher(layers[0])
for i in range(nsamples):
batch = testenc[:, (i * model.seqlen):((i + 1) * model.seqlen)].to(dev)
try:
model(batch)
except ValueError:
pass
layers[0] = layers[0].module
layers[0] = layers[0].cpu()
model.model.embed_tokens = model.model.embed_tokens.cpu()
torch.cuda.empty_cache()
outs = torch.zeros_like(inps)
attention_mask = cache['attention_mask']
position_ids = cache['position_ids']
for i in range(len(layers)):
print(i)
layer = layers[i].to(dev)
if args.nearest:
subset = find_layers(layer)
for name in subset:
quantizer = quant.Quantizer()
quantizer.configure(args.wbits, perchannel=True, sym=args.sym, mse=False)
W = subset[name].weight.data
quantizer.find_params(W, weight=True)
subset[name].weight.data = quantizer.quantize(W).to(next(iter(layer.parameters())).dtype)
for j in range(nsamples):
outs[j] = layer(inps[j].unsqueeze(0), attention_mask=attention_mask, position_ids=position_ids)[0]
layers[i] = layer.cpu()
del layer
torch.cuda.empty_cache()
inps, outs = outs, inps
if model.model.norm is not None:
model.model.norm = model.model.norm.to(dev)
model.lm_head = model.lm_head.to(dev)
testenc = testenc.to(dev)
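# Perplexity over the test stream: sum the per-token negative log-likelihoods
# of all nsamples windows of length seqlen, then exponentiate the mean
# (each sample contributes loss * seqlen token NLLs).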
nlls = []
for i in range(nsamples):
hidden_states = inps[i].unsqueeze(0)
if model.model.norm is not None:
hidden_states = model.model.norm(hidden_states)
lm_logits = model.lm_head(hidden_states)
shift_logits = lm_logits[:, :-1, :].contiguous()
shift_labels = testenc[:, (i * model.seqlen):((i + 1) * model.seqlen)][:, 1:]
loss_fct = nn.CrossEntropyLoss()
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
neg_log_likelihood = loss.float() * model.seqlen
nlls.append(neg_log_likelihood)
ppl = torch.exp(torch.stack(nlls).sum() / (nsamples * model.seqlen))
print(ppl.item())
model.config.use_cache = use_cache
# TODO: perform packing on GPU
def llama_pack(model, quantizers, wbits, groupsize):
layers = find_layers(model)
layers = {n: layers[n] for n in quantizers}
quant.make_quant_linear(model, quantizers, wbits, groupsize)
qlayers = find_layers(model, [quant.QuantLinear])
print('Packing ...')
for name in qlayers:
print(name)
quantizers[name], scale, zero, g_idx, _, _ = quantizers[name]
qlayers[name].pack(layers[name], scale, zero, g_idx)
print('Done.')
return model
def load_quant(model, checkpoint, wbits, groupsize=-1, fused_mlp=True, eval=True, warmup_autotune=True):
from transformers import LlamaConfig, LlamaForCausalLM, modeling_utils
config = LlamaConfig.from_pretrained(model)
def noop(*args, **kwargs):
pass
torch.nn.init.kaiming_uniform_ = noop
torch.nn.init.uniform_ = noop
torch.nn.init.normal_ = noop
torch.set_default_dtype(torch.half)
modeling_utils._init_weights = False
torch.set_default_dtype(torch.half)
model = LlamaForCausalLM(config)
torch.set_default_dtype(torch.float)
if eval:
model = model.eval()
layers = find_layers(model)
for name in ['lm_head']:
if name in layers:
del layers[name]
quant.make_quant_linear(model, layers, wbits, groupsize)
del layers
print('Loading model ...')
if checkpoint.endswith('.safetensors'):
from safetensors.torch import load_file as safe_load
model.load_state_dict(safe_load(checkpoint))
else:
model.load_state_dict(torch.load(checkpoint))
if eval:
quant.make_quant_attn(model)
quant.make_quant_norm(model)
if fused_mlp:
quant.make_fused_mlp(model)
if warmup_autotune:
quant.autotune_warmup_linear(model, transpose=not (eval))
if eval and fused_mlp:
quant.autotune_warmup_fused(model)
model.seqlen = 2048
print('Done.')
return model
def llama_multigpu(model, gpus, gpu_dist):
model.model.embed_tokens = model.model.embed_tokens.to(gpus[0])
if hasattr(model.model, 'norm') and model.model.norm:
model.model.norm = model.model.norm.to(gpus[0])
import copy
model.lm_head = copy.deepcopy(model.lm_head).to(gpus[0])
cache = {'mask': None, 'position_ids': None}
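# MoveModule implements naive pipeline parallelism: each wrapped layer moves
# the incoming hidden states (and the cached attention_mask / position_ids)
# onto its own device before calling the underlying module.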
class MoveModule(nn.Module):
def __init__(self, module, invalidate_cache):
super().__init__()
self.module = module
self.dev = next(iter(self.module.parameters())).device
self.invalidate_cache=invalidate_cache
def forward(self, *inp, **kwargs):
inp = list(inp)
if inp[0].device != self.dev:
inp[0] = inp[0].to(self.dev)
if cache['mask'] is None or cache['mask'].device != self.dev or self.invalidate_cache:
cache['mask'] = kwargs['attention_mask'].to(self.dev)
kwargs['attention_mask'] = cache['mask']
if cache['position_ids'] is None or cache['position_ids'].device != self.dev or self.invalidate_cache:
cache['position_ids'] = kwargs['position_ids'].to(self.dev)
kwargs['position_ids'] = cache['position_ids']
tmp = self.module(*inp, **kwargs)
return tmp
layers = model.model.layers
from math import ceil
if not gpu_dist:
pergpu = ceil(len(layers) / len(gpus))
for i in range(len(layers)):
layers[i] = MoveModule(layers[i].to(0 if i == 0 or i == len(layers) - 1 else gpus[(i - 1) // pergpu]), i == 0)
else:
assert gpu_dist[0] >= 2, "At least two layers must be on GPU 0."
assigned_gpus = [0] * (gpu_dist[0]-1)
for i in range(1, len(gpu_dist)):
assigned_gpus = assigned_gpus + [i] * gpu_dist[i]
remaining_assignments = len(layers)-len(assigned_gpus) - 1
if remaining_assignments > 0:
assigned_gpus = assigned_gpus + [-1] * remaining_assignments
assigned_gpus = assigned_gpus + [0]
for i in range(len(layers)):
layers[i] = MoveModule(layers[i].to(gpus[assigned_gpus[i]]), i==0)
model.gpus = gpus
def benchmark(model, input_ids, check=False):
input_ids = input_ids.to(model.gpus[0] if hasattr(model, 'gpus') else DEV)
torch.cuda.synchronize()
cache = {'past': None}
def clear_past(i):
def tmp(layer, inp, out):
if cache['past']:
cache['past'][i] = None
return tmp
for i, layer in enumerate(model.model.layers):
layer.register_forward_hook(clear_past(i))
print('Benchmarking ...')
if check:
loss = nn.CrossEntropyLoss()
tot = 0.
def sync():
if hasattr(model, 'gpus'):
for gpu in model.gpus:
torch.cuda.synchronize(gpu)
else:
torch.cuda.synchronize()
max_memory = 0
with torch.no_grad():
attention_mask = torch.ones((1, input_ids.numel()), device=DEV)
times = []
for i in range(input_ids.numel()):
tick = time.time()
out = model(input_ids[:, i:i + 1], past_key_values=cache['past'], attention_mask=attention_mask[:, :(i + 1)].reshape((1, -1)))
sync()
times.append(time.time() - tick)
print(i, times[-1])
if hasattr(model, 'gpus'):
mem_allocated = sum(torch.cuda.memory_allocated(gpu) for gpu in model.gpus) / 1024 / 1024
else:
mem_allocated = torch.cuda.memory_allocated() / 1024 / 1024
max_memory = max(max_memory, mem_allocated)
if check and i != input_ids.numel() - 1:
tot += loss(out.logits[0].to(DEV), input_ids[:, (i + 1)].to(DEV)).float()
cache['past'] = list(out.past_key_values)
del out
sync()
print('Median:', np.median(times))
if check:
print('PPL:', torch.exp(tot / (input_ids.numel() - 1)).item())
print('max memory(MiB):', max_memory)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('model', type=str, help='llama model to load')
parser.add_argument('dataset', type=str, choices=['wikitext2', 'ptb', 'c4'], help='Where to extract calibration data from.')
parser.add_argument('--seed', type=int, default=0, help='Seed for sampling the calibration data.')
parser.add_argument('--nsamples', type=int, default=128, help='Number of calibration data samples.')
parser.add_argument('--percdamp', type=float, default=.01, help='Percent of the average Hessian diagonal to use for dampening.')
parser.add_argument('--nearest', action='store_true', help='Whether to run the RTN baseline.')
parser.add_argument('--wbits', type=int, default=16, choices=[2, 3, 4, 8, 16], help='#bits to use for quantization; use 16 for evaluating base model.')
parser.add_argument('--trits', action='store_true', help='Whether to use trits for quantization.')
parser.add_argument('--blocksize', type=int, default=128, help='blocksize')
parser.add_argument('--groupsize', type=int, default=-1, help='Groupsize to use for quantization; default uses full row.')
parser.add_argument('--eval', action='store_true', help='evaluate quantized model.')
parser.add_argument('--test-generation', action='store_true', help='test generation.')
parser.add_argument('--save', type=str, default='', help='Save quantized checkpoint under this name.')
parser.add_argument('--save_safetensors', type=str, default='', help='Save quantized `.safetensors` checkpoint under this name.')
parser.add_argument('--load', type=str, default='', help='Load quantized model.')
parser.add_argument('--benchmark', type=int, default=0, help='Number of tokens to use for benchmarking.')
parser.add_argument('--check', action='store_true', help='Whether to compute perplexity during benchmarking for verification.')
parser.add_argument('--sym', action='store_true', help='Whether to perform symmetric quantization.')
parser.add_argument('--act-order', action='store_true', help='Whether to apply the activation order GPTQ heuristic')
parser.add_argument('--true-sequential', action='store_true', help='Whether to run in true sequential mode.')
parser.add_argument('--new-eval', action='store_true', help='Whether to use the new PTB and C4 eval')
parser.add_argument('--layers-dist', type=str, default='', help='Distribution of layers across GPUs. e.g. 2:1:1 for 2 layers on GPU 0, 1 layer on GPU 1, and 1 layer on GPU 2. Any remaining layers will be assigned to your last GPU.')
parser.add_argument('--observe',
action='store_true',
help='Auto upgrade layer precision to a higher precision, for example int2 to int4, groupsize 128 to 64. \
When this feature is enabled, `--save` and `--save_safetensors` are disabled.')
parser.add_argument('--quant-directory', type=str, default=None, help='Directory to export quantization parameters to in TOML format. `None` (the default) means no export.')
args = parser.parse_args()
if args.layers_dist:
gpu_dist = [int(x) for x in args.layers_dist.split(':')]
else:
gpu_dist = []
if type(args.load) is not str:
args.load = args.load.as_posix()
if args.load:
model = load_quant(args.model, args.load, args.wbits, args.groupsize)
else:
model = get_llama(args.model)
model.eval()
dataloader, testloader = get_loaders(args.dataset, nsamples=args.nsamples, seed=args.seed, model=args.model, seqlen=model.seqlen)
if not args.load and args.wbits < 16 and not args.nearest:
tick = time.time()
quantizers = llama_sequential(model, dataloader, DEV)
print(time.time() - tick)
if args.benchmark:
gpus = [torch.device('cuda:%d' % i) for i in range(torch.cuda.device_count())]
if len(gpus) > 1:
llama_multigpu(model, gpus, gpu_dist)
else:
model = model.to(DEV)
if args.benchmark:
input_ids = next(iter(dataloader))[0][:, :args.benchmark]
benchmark(model, input_ids, check=args.check)
if args.eval:
datasets = ['wikitext2', 'ptb', 'c4']
if args.new_eval:
datasets = ['wikitext2', 'ptb-new', 'c4-new']
for dataset in datasets:
dataloader, testloader = get_loaders(dataset, seed=args.seed, model=args.model, seqlen=model.seqlen)
print(dataset)
llama_eval(model, testloader, DEV)
from utils.datautils import zeroshot_evaluate
zeroshot_evaluate(model, args, DEV)
if args.test_generation:
gpus = [torch.device('cuda:%d' % i) for i in range(torch.cuda.device_count())]
if len(gpus) > 1:
llama_multigpu(model, gpus, gpu_dist)
else:
model = model.to(DEV)
from transformers import LlamaTokenizer, TextStreamer
tokenizer = LlamaTokenizer.from_pretrained(args.model, use_fast=False)
input_ids = tokenizer(["The capital of New Mexico is"], return_tensors="pt").input_ids.to(gpus[0])
streamer = TextStreamer(tokenizer)
with torch.no_grad():
generated_ids = model.generate(input_ids, streamer=streamer)
if args.quant_directory is not None:
export_quant_table(quantizers, args.quant_directory)
if not args.observe and args.save:
llama_pack(model, quantizers, args.wbits, args.groupsize)
torch.save(model.state_dict(), args.save)
if not args.observe and args.save_safetensors:
llama_pack(model, quantizers, args.wbits, args.groupsize)
from safetensors.torch import save_file as safe_save
state_dict = model.state_dict()
state_dict = {k: v.clone().contiguous() for k, v in state_dict.items()}
safe_save(state_dict, args.save_safetensors)
================================================
FILE: lm_eval/__init__.py
================================================
================================================
FILE: lm_eval/base.py
================================================
import abc
from typing import Iterable
import numpy as np
import random
import re
import os
import json
import hashlib
import datasets
from sqlitedict import SqliteDict
from tqdm import tqdm
import torch
import torch.nn.functional as F
from lm_eval.metrics import mean, weighted_perplexity, weighted_mean, bits_per_byte
from lm_eval import utils
from abc import abstractmethod
class LM(abc.ABC):
def __init__(self):
self.cache_hook = CacheHook(None)
@abstractmethod
def loglikelihood(self, requests):
"""Compute log-likelihood of generating a continuation from a context.
Downstream tasks should attempt to use loglikelihood instead of other
LM calls whenever possible.
:param requests: list
A list of pairs (context, continuation)
context: str
Context string. Implementations of LM must be able to handle an
empty context string.
continuation: str
The continuation over which log likelihood will be calculated. If
there is a word boundary, the space should be in the continuation.
For example, context="hello" continuation=" world" is correct.
:return: list
A list of pairs (logprob, isgreedy)
logprob: float
The log probability of `continuation`
isgreedy:
Whether `continuation` would be generated by greedy sampling from `context`
"""
pass
@abstractmethod
def loglikelihood_rolling(self, requests):
"""Compute full log-likelihood of a string, with no truncation, for perplexity computation
- We will use the full max context length of the model.
- For inputs that exceed the max context length, we divide the tokenized string into chunks of up to
the max context length.
- IMPORTANT: Each document's loglikelihood/perplexity is computed *separately*, unlike other implementations
which may simply concatenate multiple documents together.
- IMPORTANT: We maximize the amount of context for each prediction. Specifically, for inputs that we break into
multiple chunks, the last input will still have a full-sized context.
Example:
Input tokens: [ 0 1 2 3 4 5 6 7 8 9 ]
Prefix: EOT
Max context length: 4
Resulting input/prediction pairs:
INPUT: EOT 0 1 2
PRED: 0 1 2 3
INPUT: 3 4 5 6
PRED: 4 5 6 7
INPUT: 5 6 7 8
PRED: 8 9
Observe that:
1. Each token is predicted exactly once
2. For the last pair, we provide the full context, but only score the last two tokens
:param requests: list
A list of strings
string: str
String for which we are computing per-token loglikelihood
:return: list
A list of pairs (logprob, isgreedy)
logprob: float
The log probability of `continuation`
isgreedy:
Whether `continuation` would be generated by greedy sampling from `context`
"""
pass
# TODO: Add an optional max length
@abstractmethod
def greedy_until(self, requests):
"""Generate greedily until a stopping sequence
:param requests: list
A list of pairs (context, until)
context: str
Context string
until: [str]
The string sequences to generate until. These string sequences
may each span across multiple tokens, or may be part of one token.
:return: list
A list of strings continuation
continuation: str
The generated continuation.
"""
pass
@classmethod
def create_from_arg_string(cls, arg_string, additional_config=None):
additional_config = {} if additional_config is None else additional_config
args = utils.simple_parse_args_string(arg_string)
args2 = {k: v for k, v in additional_config.items() if v is not None}
return cls(**args, **args2)
def set_cache_hook(self, cache_hook):
self.cache_hook = cache_hook
class BaseLM(LM):
@property
@abstractmethod
def eot_token_id(self):
pass
@property
@abstractmethod
def max_length(self):
pass
@property
@abstractmethod
def max_gen_toks(self):
pass
@property
@abstractmethod
def batch_size(self):
pass
@property
@abstractmethod
def device(self):
pass
@abstractmethod
def tok_encode(self, string: str):
pass
@abstractmethod
def tok_decode(self, tokens: Iterable[int]):
pass
@abstractmethod
def _model_generate(self, context, max_length, eos_token_id):
pass
@abstractmethod
def _model_call(self, inps):
"""
inps: a torch tensor of shape [batch, sequence]
the size of sequence may vary from call to call
returns: a torch tensor of shape [batch, sequence, vocab] with the
logits returned from the model
"""
pass
# subclass must implement properties vocab_size, eot_token_id, max_gen_toks, batch_size, device, max_length.
# TODO: enforce this somehow
def loglikelihood(self, requests):
new_reqs = []
for context, continuation in requests:
if context == "":
# end of text as context
context_enc = [self.eot_token_id]
else:
context_enc = self.tok_encode(context)
continuation_enc = self.tok_encode(continuation)
new_reqs.append(((context, continuation), context_enc, continuation_enc))
return self._loglikelihood_tokens(new_reqs)
def loglikelihood_rolling(self, requests):
# TODO: Implement caching once we've confirmed the perplexity implementation
# TODO: automatic batch size detection for vectorization
loglikelihoods = []
for (string,) in tqdm(requests):
rolling_token_windows = list(
map(
utils.make_disjoint_window,
utils.get_rolling_token_windows(
token_list=self.tok_encode(string),
prefix_token=self.eot_token_id,
max_seq_len=self.max_length,
context_len=1,
),
)
)
rolling_token_windows = [(None,) + x for x in rolling_token_windows]
# TODO: extract out this call so it only gets called once and also somehow figure out partial caching for
# that
string_nll = self._loglikelihood_tokens(
rolling_token_windows, disable_tqdm=True
)
# discard is_greedy
string_nll = [x[0] for x in string_nll]
string_nll = sum(string_nll)
loglikelihoods.append(string_nll)
return loglikelihoods
def _loglikelihood_tokens(self, requests, disable_tqdm=False):
# TODO: implement some kind of efficient-request-middleware that lumps together requests with the same context
res = []
def _collate(x):
# the negative sign on len(toks) sorts descending - this has a few advantages:
# - time estimates will always be over not underestimates, which is more useful for planning
# - to know the size of a batch when going through the list, you know the first one is always the batch
# padded context length. this is useful to simplify the batching logic and more importantly to make
# automatic adaptive batches much much easier to implement
# - any OOMs will happen right away rather than near the end
toks = x[1] + x[2]
return -len(toks), tuple(toks)
# TODO: automatic (variable) batch size detection for vectorization
re_ord = utils.Reorderer(requests, _collate)
for chunk in utils.chunks(
tqdm(re_ord.get_reordered(), disable=disable_tqdm), self.batch_size
):
inps = []
cont_toks_list = []
inplens = []
padding_length = None
# because vectorizing is annoying, we first convert each (context, continuation) pair to padded
# tensors, then we pack them together into a batch, call the model, and then pick it all apart
# again because vectorizing is annoying
for _, context_enc, continuation_enc in chunk:
# sanity check
assert len(context_enc) > 0
assert len(continuation_enc) > 0
assert len(continuation_enc) <= self.max_length
# how this all works:
# CTX CONT
# inp 0 1 2 3|4 5 6 7 8 9 <- last token is deleted by inp[:, :-1]
# gpt2 \ \
# logits 1 2 3|4 5 6 7 8 9 <- the ctx half gets tossed out by the
# cont_toks 4 5 6 7 8 9 [:, -len(continuation_enc):, :self.vocab_size] slice
# when too long to fit in context, truncate from the left
inp = torch.tensor(
(context_enc + continuation_enc)[-(self.max_length + 1) :][:-1],
dtype=torch.long,
).to(self.device)
(inplen,) = inp.shape
cont = continuation_enc
# since in _collate we make sure length is descending, the longest is always the first one.
padding_length = (
padding_length if padding_length is not None else inplen
)
# pad length from seq to padding_length
inp = torch.cat(
[
inp, # [seq]
torch.zeros(padding_length - inplen, dtype=torch.long).to(
inp.device
), # [padding_length - seq]
],
dim=0,
)
inps.append(inp.unsqueeze(0)) # [1, padding_length]
cont_toks_list.append(cont)
inplens.append(inplen)
batched_inps = torch.cat(inps, dim=0) # [batch, padding_length]
multi_logits = F.log_softmax(
self._model_call(batched_inps), dim=-1
).cpu() # [batch, padding_length, vocab]
for (cache_key, _, _), logits, inp, inplen, cont_toks in zip(
chunk, multi_logits, inps, inplens, cont_toks_list
):
# Slice to original seq length
contlen = len(cont_toks)
logits = logits[inplen - contlen : inplen].unsqueeze(
0
) # [1, seq, vocab]
# Check if per-token argmax is exactly equal to continuation
greedy_tokens = logits.argmax(dim=-1)
cont_toks = torch.tensor(cont_toks, dtype=torch.long).unsqueeze(
0
) # [1, seq]
max_equal = (greedy_tokens == cont_toks).all()
# Obtain log-probs at the corresponding continuation token indices
# last_token_slice = logits[:, -1, :].squeeze(0).tolist()
logits = torch.gather(logits, 2, cont_toks.unsqueeze(-1)).squeeze(
-1
) # [1, seq]
# Answer: (log prob, is-exact-match)
answer = (float(logits.sum()), bool(max_equal))
# partial caching
if cache_key is not None:
self.cache_hook.add_partial("loglikelihood", cache_key, answer)
res.append(answer)
return re_ord.get_original(res)
def greedy_until(self, requests):
# TODO: implement fully general `until` that handles until that are
# multiple tokens or that span multiple tokens correctly
# TODO: extract to TokenizedLM?
res = []
def _collate(x):
toks = self.tok_encode(x[0])
return len(toks), x[0]
re_ord = utils.Reorderer(requests, _collate)
for context, until in tqdm(re_ord.get_reordered()):
if isinstance(until, str):
until = [until]
(primary_until,) = self.tok_encode(until[0])
            context_enc = torch.tensor(
                # keep only the last (max_length - max_gen_toks) context tokens
                # so the generation budget still fits within the model window
                [self.tok_encode(context)[self.max_gen_toks - self.max_length :]]
            ).to(self.device)
cont = self._model_generate(
context_enc, context_enc.shape[1] + self.max_gen_toks, primary_until
)
s = self.tok_decode(cont[0].tolist()[context_enc.shape[1] :])
for term in until:
s = s.split(term)[0]
# partial caching
self.cache_hook.add_partial("greedy_until", (context, until), s)
res.append(s)
return re_ord.get_original(res)
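    # Sketch of the stop-sequence handling above (strings hypothetical):
    # with until=["Q:"], a generation "Paris.\nQ: next question" is truncated
    # to "Paris.\n" by `s.split("Q:")[0]`.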
class Task(abc.ABC):
"""A task represents an entire benchmark including its dataset, problems,
answers, and evaluation methods. See BoolQ for a simple example implementation
A `doc` can be any python object which represents one instance of evaluation.
This is usually a dictionary e.g.
{"question": ..., "answer": ...} or
{"question": ..., question, answer)
"""
# The name of the `Task` benchmark as denoted in the HuggingFace datasets Hub
# or a path to a custom `datasets` loading script.
DATASET_PATH: str = None
# The name of a subset within `DATASET_PATH`.
DATASET_NAME: str = None
def __init__(self, data_dir=None, cache_dir=None, download_mode=None):
"""
:param data_dir: str
Stores the path to a local folder containing the `Task`'s data files.
Use this to specify the path to manually downloaded data (usually when
the dataset is not publicly accessible).
:param cache_dir: str
The directory to read/write the `Task` dataset. This follows the
HuggingFace `datasets` API with the default cache directory located at:
`~/.cache/huggingface/datasets`
NOTE: You can change the cache location globally for a given process
by setting the shell environment variable, `HF_DATASETS_CACHE`,
to another directory:
`export HF_DATASETS_CACHE="/path/to/another/directory"`
:param download_mode: datasets.DownloadMode
How to treat pre-existing `Task` downloads and data.
- `datasets.DownloadMode.REUSE_DATASET_IF_EXISTS`
Reuse download and reuse dataset.
- `datasets.DownloadMode.REUSE_CACHE_IF_EXISTS`
Reuse download with fresh dataset.
- `datasets.DownloadMode.FORCE_REDOWNLOAD`
Fresh download and fresh dataset.
"""
self.download(data_dir, cache_dir, download_mode)
self._training_docs = None
self._fewshot_docs = None
def download(self, data_dir=None, cache_dir=None, download_mode=None):
"""Downloads and returns the task dataset.
Override this method to download the dataset from a custom API.
:param data_dir: str
Stores the path to a local folder containing the `Task`'s data files.
Use this to specify the path to manually downloaded data (usually when
the dataset is not publicly accessible).
:param cache_dir: str
The directory to read/write the `Task` dataset. This follows the
HuggingFace `datasets` API with the default cache directory located at:
`~/.cache/huggingface/datasets`
NOTE: You can change the cache location globally for a given process
by setting the shell environment variable, `HF_DATASETS_CACHE`,
to another directory:
`export HF_DATASETS_CACHE="/path/to/another/directory"`
:param download_mode: datasets.DownloadMode
How to treat pre-existing `Task` downloads and data.
- `datasets.DownloadMode.REUSE_DATASET_IF_EXISTS`
Reuse download and reuse dataset.
- `datasets.DownloadMode.REUSE_CACHE_IF_EXISTS`
Reuse download with fresh dataset.
- `datasets.DownloadMode.FORCE_REDOWNLOAD`
Fresh download and fresh dataset.
"""
self.dataset = datasets.load_dataset(
path=self.DATASET_PATH,
name=self.DATASET_NAME,
data_dir=data_dir,
cache_dir=cache_dir,
download_mode=download_mode,
)
def should_decontaminate(self):
"""Whether this task supports decontamination against model training set."""
return False
@abstractmethod
def has_training_docs(self):
"""Whether the task has a training set"""
pass
@abstractmethod
def has_validation_docs(self):
"""Whether the task has a validation set"""
pass
@abstractmethod
def has_test_docs(self):
"""Whether the task has a test set"""
pass
def training_docs(self):
"""
:return: Iterable[obj]
            An iterable of any object that doc_to_text can handle
"""
return []
def validation_docs(self):
"""
:return: Iterable[obj]
            An iterable of any object that doc_to_text can handle
"""
return []
def test_docs(self):
"""
:return: Iterable[obj]
            An iterable of any object that doc_to_text can handle
"""
return []
def _process_doc(self, doc):
"""
Override this to process (detokenize, strip, replace, etc.) individual
documents. This can be used in a map over documents of a data split.
E.g. `map(self._process_doc, self.dataset["validation"])`
:return: dict
The processed version of the specified `doc`.
"""
return doc
def fewshot_examples(self, k, rnd):
if self._training_docs is None:
self._training_docs = list(self.training_docs())
return rnd.sample(self._training_docs, k)
    def doc_to_decontamination_query(self, doc):
        raise NotImplementedError(
            "Override doc_to_decontamination_query with a document-specific decontamination query."
        )
@abstractmethod
def doc_to_text(self, doc):
pass
@abstractmethod
def doc_to_target(self, doc):
pass
@abstractmethod
def construct_requests(self, doc, ctx):
"""Uses RequestFactory to construct Requests and returns an iterable of
Requests which will be sent to the LM.
:param doc:
The document as returned from training_docs, validation_docs, or test_docs.
:param ctx: str
The context string, generated by fewshot_context. This includes the natural
language description, as well as the few shot examples, and the question
part of the document for `doc`.
"""
pass
@abstractmethod
def process_results(self, doc, results):
"""Take a single document and the LM results and evaluates, returning a
dict where keys are the names of submetrics and values are the values of
the metric for that one document
:param doc:
The document as returned from training_docs, validation_docs, or test_docs.
:param results:
The results of the requests created in construct_requests.
"""
pass
@abstractmethod
def aggregation(self):
"""
:returns: {str: [metric_score] -> float}
A dictionary where keys are the names of submetrics and values are
functions that aggregate a list of metric scores
"""
pass
@abstractmethod
def higher_is_better(self):
"""
:returns: {str: bool}
A dictionary where keys are the names of submetrics and values are
whether a higher value of the submetric is better
"""
pass
def fewshot_description(self):
import warnings
warnings.warn(
"`fewshot_description` will be removed in futures versions. Pass "
"any custom descriptions to the `evaluate` function instead.",
DeprecationWarning,
)
return ""
@utils.positional_deprecated
def fewshot_context(
self, doc, num_fewshot, provide_description=None, rnd=None, description=None
):
"""Returns a fewshot context string that is made up of a prepended description
(if provided), the `num_fewshot` number of examples, and an appended prompt example.
:param doc: str
The document as returned from training_docs, validation_docs, or test_docs.
:param num_fewshot: int
The number of fewshot examples to provide in the returned context string.
:param provide_description: bool
Not implemented, and this option is deprecated and will be removed in a future version in favor of a different description providing method
:param rnd: random.Random
The pseudo-random number generator used to randomly sample examples.
            WARNING: This is currently a required arg although it defaults to `None`.
:param description: str
The task's description that will be prepended to the fewshot examples.
:returns: str
The fewshot context.
"""
assert (
rnd is not None
), "A `random.Random` generator argument must be provided to `rnd`"
assert not provide_description, (
"The `provide_description` arg will be removed in future versions. To prepend "
"a custom description to the context, supply the corresponding string via the "
"`description` arg."
)
if provide_description is not None:
# nudge people to not specify it at all
print(
"WARNING: provide_description is deprecated and will be removed in a future version in favor of description_dict"
)
description = description + "\n\n" if description else ""
if num_fewshot == 0:
labeled_examples = ""
else:
# for sets with no training docs, draw from other set *but ensure no overlap with current doc*
if self.has_training_docs():
fewshotex = self.fewshot_examples(k=num_fewshot, rnd=rnd)
else:
if self._fewshot_docs is None:
self._fewshot_docs = list(
self.validation_docs()
if self.has_validation_docs()
else self.test_docs()
)
fewshotex = rnd.sample(self._fewshot_docs, num_fewshot + 1)
# get rid of the doc that's the one we're evaluating, if it's in the fewshot
fewshotex = [x for x in fewshotex if x != doc][:num_fewshot]
labeled_examples = (
"\n\n".join(
[
self.doc_to_text(doc) + self.doc_to_target(doc)
for doc in fewshotex
]
)
+ "\n\n"
)
example = self.doc_to_text(doc)
return description + labeled_examples + example
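    # Sketch of the string assembled above (hypothetical 2-shot prompt):
    #   "<description>\n\n"
    #   + "Q: ex1?\nA: a1\n\n" + "Q: ex2?\nA: a2\n\n"
    #   + "Q: eval?\nA:"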
class MultipleChoiceTask(Task):
def doc_to_target(self, doc):
return " " + doc["choices"][doc["gold"]]
def construct_requests(self, doc, ctx):
lls = [
rf.loglikelihood(ctx, " {}".format(choice))[0] for choice in doc["choices"]
]
return lls
def process_results(self, doc, results):
gold = doc["gold"]
acc = 1.0 if np.argmax(results) == gold else 0.0
completion_len = np.array([float(len(i)) for i in doc["choices"]])
acc_norm = 1.0 if np.argmax(results / completion_len) == gold else 0.0
return {
"acc": acc,
"acc_norm": acc_norm,
}
def higher_is_better(self):
return {
"acc": True,
"acc_norm": True,
}
def aggregation(self):
return {
"acc": mean,
"acc_norm": mean,
}
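# Example of the length normalization in `process_results` above (toy numbers):
# with results = [-5.0, -6.0] and choice character lengths [10, 30], `acc`
# favors choice 0 (higher total log-likelihood) while `acc_norm` favors
# choice 1 (-6.0 / 30 > -5.0 / 10, i.e. higher per-character likelihood).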
class PerplexityTask(Task, abc.ABC):
def should_decontaminate(self):
"""Whether this task supports decontamination against model training set."""
return True
def has_training_docs(self):
return False
def fewshot_examples(self, k, rnd):
assert k == 0
return []
def fewshot_context(
self, doc, num_fewshot, provide_description=None, rnd=None, description=None
):
assert (
num_fewshot == 0
), "The number of fewshot examples must be 0 for perplexity tasks."
assert (
rnd is not None
), "A `random.Random` generator argument must be provided to `rnd`."
assert not provide_description, (
"The `provide_description` arg will be removed in future versions. To prepend "
"a custom description to the context, supply the corresponding string via the "
"`description` arg."
)
if provide_description is not None:
# nudge people to not specify it at all
print(
"WARNING: provide_description is deprecated and will be removed in a future version in favor of description_dict"
)
return ""
def higher_is_better(self):
return {
"word_perplexity": False,
"byte_perplexity": False,
"bits_per_byte": False,
}
def doc_to_decontamination_query(self, doc):
return doc
def doc_to_text(self, doc):
return ""
def doc_to_target(self, doc):
return doc
def construct_requests(self, doc, ctx):
assert not ctx
req = rf.loglikelihood_rolling(self.doc_to_target(doc))
return req
def process_results(self, doc, results):
(loglikelihood,) = results
words = self.count_words(doc)
bytes_ = self.count_bytes(doc)
return {
"word_perplexity": (loglikelihood, words),
"byte_perplexity": (loglikelihood, bytes_),
"bits_per_byte": (loglikelihood, bytes_),
}
def aggregation(self):
return {
"word_perplexity": weighted_perplexity,
"byte_perplexity": weighted_perplexity,
"bits_per_byte": bits_per_byte,
}
@classmethod
def count_bytes(cls, doc):
return len(doc.encode("utf-8"))
@classmethod
def count_words(cls, doc):
"""Downstream tasks with custom word boundaries should override this!"""
return len(re.split(r"\s+", doc))
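# Aggregation math for PerplexityTask (as in lm_eval.metrics, sketched):
#   weighted_perplexity(items) = exp(-sum(lls) / sum(weights))
#   bits_per_byte(items)       = -sum(lls) / (sum(n_bytes) * ln(2))
# where each item is a (loglikelihood, word-or-byte-count) pair.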
def hash_args(attr, args):
dat = json.dumps([attr] + list(args))
return hashlib.sha256(dat.encode("utf-8")).hexdigest()
class CacheHook:
def __init__(self, cachinglm):
if cachinglm is None:
self.dbdict = None
return
self.dbdict = cachinglm.dbdict
def add_partial(self, attr, req, res):
if self.dbdict is None:
return
hsh = hash_args(attr, req)
self.dbdict[hsh] = res
class CachingLM:
def __init__(self, lm, cache_db):
"""LM wrapper that returns cached results if they exist, and uses the underlying LM if not.
:param lm: LM
Underlying LM
:param cache_db: str
Path to cache db
"""
self.lm = lm
self.cache_db = cache_db
if os.path.dirname(cache_db):
os.makedirs(os.path.dirname(cache_db), exist_ok=True)
self.dbdict = SqliteDict(cache_db, autocommit=True)
# add hook to lm
lm.set_cache_hook(self.get_cache_hook())
def __getattr__(self, attr):
def fn(requests):
res = []
remaining_reqs = []
# figure out which ones are cached and which ones are new
for req in requests:
hsh = hash_args(attr, req)
if hsh in self.dbdict:
ob = self.dbdict[hsh]
assert ob is not None
res.append(ob)
else:
res.append(None)
remaining_reqs.append(req)
# actually run the LM on the requests that do not have cached results
rem_res = getattr(self.lm, attr)(remaining_reqs)
# stick the new ones back into the list and also cache any of the new ones
resptr = 0
for req, r in zip(remaining_reqs, rem_res):
while res[resptr] is not None:
resptr += 1
res[resptr] = r
# caching
hsh = hash_args(attr, req)
self.dbdict[hsh] = r
self.dbdict.commit()
return res
return fn
def get_cache_hook(self):
return CacheHook(self)
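# Hypothetical usage sketch for CachingLM (names illustrative):
#   lm = CachingLM(base_lm, "lm_cache/model.db")  # base_lm: any LM instance
#   res = lm.loglikelihood(requests)              # cache misses hit base_lm,
#                                                 # hits come from the SQLite db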
REQUEST_RETURN_LENGTHS = {
"loglikelihood": 2,
"greedy_until": None,
"loglikelihood_rolling": None,
}
class Request:
def __init__(self, request_type, args, index=None):
if request_type not in REQUEST_RETURN_LENGTHS.keys():
raise NotImplementedError(
"The request type {} is not implemented!".format(request_type)
)
self.request_type = request_type
self.args = args
self.index = index
def __iter__(self):
if REQUEST_RETURN_LENGTHS[self.request_type] is None:
raise IndexError("This request type does not return multiple arguments!")
for i in range(REQUEST_RETURN_LENGTHS[self.request_type]):
yield Request(self.request_type, self.args, i)
def __getitem__(self, i):
if REQUEST_RETURN_LENGTHS[self.request_type] is None:
raise IndexError("This request type does not return multiple arguments!")
return Request(self.request_type, self.args, i)
def __eq__(self, other):
return (
self.request_type == other.request_type
and self.args == other.args
and self.index == other.index
)
def __repr__(self):
return f"Req_{self.request_type}{self.args}[{self.index}]\n"
class RequestFactory:
def __getattr__(self, attr):
def fn(*args):
return Request(attr, args)
return fn
rf = RequestFactory()
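# Sketch of how `rf` is used in tasks (see MultipleChoiceTask above):
#   ll_req, greedy_req = rf.loglikelihood("Q: 2+2=", " 4")
# Unpacking works because REQUEST_RETURN_LENGTHS["loglikelihood"] == 2: it
# yields two indexed Request objects whose values the evaluator fills in later.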
================================================
FILE: lm_eval/datasets/README.md
================================================
# datasets
This directory contains custom HuggingFace [dataset loading scripts](https://huggingface.co/docs/datasets/dataset_script). They are provided to maintain backward compatibility with the ad-hoc data downloaders in earlier versions of the `lm-evaluation-harness` before HuggingFace [`datasets`](https://huggingface.co/docs/datasets/index) was adopted as the default downloading manager. For example, some instances in the HuggingFace `datasets` repository process features (e.g. whitespace stripping, lower-casing, etc.) in ways that the `lm-evaluation-harness` did not.
__NOTE__: We are __not__ accepting any additional loading scripts into the main branch! If you'd like to use a custom dataset, fork the repo and follow HuggingFace's loading script guide found [here](https://huggingface.co/docs/datasets/dataset_script). You can then override your `Task`'s `DATASET_PATH` attribute to point to this script's local path.
__WARNING__: A handful of loading scripts are included in this collection because they have not yet been pushed to the HuggingFace Hub or a HuggingFace organization repo. We will remove such scripts once they are pushed.
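A minimal sketch of the override described above (paths hypothetical):

```python
from lm_eval.base import Task

class MyCustomTask(Task):
    # Point the task at a local `datasets` loading script instead of a
    # dataset name on the Hub.
    DATASET_PATH = "/path/to/my_loading_script.py"
    DATASET_NAME = None  # no subset
```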
================================================
FILE: lm_eval/datasets/__init__.py
================================================
================================================
FILE: lm_eval/datasets/asdiv/__init__.py
================================================
================================================
FILE: lm_eval/datasets/asdiv/asdiv.py
================================================
# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""ASDIV dataset."""
import os
import xml.etree.ElementTree as ET
import datasets
_CITATION = """\
@misc{miao2021diverse,
title={A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers},
author={Shen-Yun Miao and Chao-Chun Liang and Keh-Yih Su},
year={2021},
eprint={2106.15772},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
"""
_DESCRIPTION = """\
ASDiv (Academia Sinica Diverse MWP Dataset) is a diverse (in terms of both language
patterns and problem types) English math word problem (MWP) corpus for evaluating
the capability of various MWP solvers. Existing MWP corpora for studying AI progress
remain limited either in language usage patterns or in problem types. We thus present
a new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem
types taught in elementary school. Each MWP is annotated with its problem type and grade
level (for indicating the level of difficulty).
"""
_HOMEPAGE = "https://github.com/chaochun/nlu-asdiv-dataset"
# TODO: Add the licence for the dataset here if you can find it
_LICENSE = ""
_URLS = "https://github.com/chaochun/nlu-asdiv-dataset/archive/55790e5270bb91ccfa5053194b25732534696b50.zip"
class ASDiv(datasets.GeneratorBasedBuilder):
"""ASDiv: A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers"""
VERSION = datasets.Version("0.0.1")
BUILDER_CONFIGS = [
datasets.BuilderConfig(
name="asdiv",
version=VERSION,
description="A diverse corpus for evaluating and developing english math word problem solvers",
)
]
def _info(self):
features = datasets.Features(
{
"body": datasets.Value("string"),
"question": datasets.Value("string"),
"solution_type": datasets.Value("string"),
"answer": datasets.Value("string"),
"formula": datasets.Value("string"),
}
)
return datasets.DatasetInfo(
description=_DESCRIPTION,
features=features,
homepage=_HOMEPAGE,
license=_LICENSE,
citation=_CITATION,
)
def _split_generators(self, dl_manager):
urls = _URLS
data_dir = dl_manager.download_and_extract(urls)
base_filepath = "nlu-asdiv-dataset-55790e5270bb91ccfa5053194b25732534696b50"
return [
datasets.SplitGenerator(
name=datasets.Split.VALIDATION,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"filepath": os.path.join(
data_dir, base_filepath, "dataset", "ASDiv.xml"
),
"split": datasets.Split.VALIDATION,
},
),
]
# method parameters are unpacked from `gen_kwargs` as given in `_split_generators`
def _generate_examples(self, filepath, split):
tree = ET.parse(filepath)
root = tree.getroot()
for key, problem in enumerate(root.iter("Problem")):
yield key, {
"body": problem.find("Body").text,
"question": problem.find("Question").text,
"solution_type": problem.find("Solution-Type").text,
"answer": problem.find("Answer").text,
"formula": problem.find("Formula").text,
}
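# Hypothetical local usage: the script above can be loaded directly through the
# HuggingFace `datasets` API, e.g.
#   import datasets
#   ds = datasets.load_dataset("lm_eval/datasets/asdiv/asdiv.py")
#   print(ds["validation"][0]["question"])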
================================================
FILE: lm_eval/datasets/asdiv/dataset_infos.json
================================================
{"asdiv": {"description": "ASDiv (Academia Sinica Diverse MWP Dataset) is a diverse (in terms of both language\npatterns and problem types) English math word problem (MWP) corpus for evaluating\nthe capability of various MWP solvers. Existing MWP corpora for studying AI progress\nremain limited either in language usage patterns or in problem types. We thus present\na new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem\ntypes taught in elementary school. Each MWP is annotated with its problem type and grade\nlevel (for indicating the level of difficulty).\n", "citation": "@misc{miao2021diverse,\n title={A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers},\n author={Shen-Yun Miao and Chao-Chun Liang and Keh-Yih Su},\n year={2021},\n eprint={2106.15772},\n archivePrefix={arXiv},\n primaryClass={cs.AI}\n}\n", "homepage": "https://github.com/chaochun/nlu-asdiv-dataset", "license": "", "features": {"body": {"dtype": "string", "id": null, "_type": "Value"}, "question": {"dtype": "string", "id": null, "_type": "Value"}, "solution_type": {"dtype": "string", "id": null, "_type": "Value"}, "answer": {"dtype": "string", "id": null, "_type": "Value"}, "formula": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "as_div", "config_name": "asdiv", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 501489, "num_examples": 2305, "dataset_name": "as_div"}}, "download_checksums": {"https://github.com/chaochun/nlu-asdiv-dataset/archive/55790e5270bb91ccfa5053194b25732534696b50.zip": {"num_bytes": 440966, "checksum": "8f1fe4f6d5f170ec1e24ab78c244153c14c568b1bb2b1dad0324e71f37939a2d"}}, "download_size": 440966, "post_processing_size": null, "dataset_size": 501489, "size_in_bytes": 942455}}
================================================
FILE: lm_eval/datasets/coqa/__init__.py
================================================
================================================
FILE: lm_eval/datasets/coqa/coqa.py
================================================
# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""CoQA dataset.
This `CoQA` adds the "additional_answers" feature that's missing in the original
datasets version:
https://github.com/huggingface/datasets/blob/master/datasets/coqa/coqa.py
"""
import json
import datasets
_CITATION = """\
@misc{reddy2018coqa,
title={CoQA: A Conversational Question Answering Challenge},
author={Siva Reddy and Danqi Chen and Christopher D. Manning},
year={2018},
eprint={1808.07042},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
"""
_DESCRIPTION = """\
CoQA is a large-scale dataset for building Conversational Question Answering
systems. The goal of the CoQA challenge is to measure the ability of machines to
understand a text passage and answer a series of interconnected questions that
appear in a conversation.
"""
_HOMEPAGE = "https://stanfordnlp.github.io/coqa/"
# TODO: Add the licence for the dataset here if you can find it
_LICENSE = ""
_URLS = {
"train": "https://nlp.stanford.edu/data/coqa/coqa-train-v1.0.json",
"validation": "https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json",
}
# `additional_answers` are not available in the train set so we fill them with
# empty dicts of the same form.
_EMPTY_ADDITIONAL_ANSWER = {
"0": [
{
"span_start": -1,
"span_end": -1,
"span_text": "",
"input_text": "",
"turn_id": -1,
}
],
"1": [
{
"span_start": -1,
"span_end": -1,
"span_text": "",
"input_text": "",
"turn_id": -1,
}
],
"2": [
{
"span_start": -1,
"span_end": -1,
"span_text": "",
"input_text": "",
"turn_id": -1,
}
],
}
class Coqa(datasets.GeneratorBasedBuilder):
"""CoQA is a large-scale dataset for building Conversational Question Answering systems."""
VERSION = datasets.Version("0.0.1")
BUILDER_CONFIGS = [
datasets.BuilderConfig(
name="coqa", version=VERSION, description="The CoQA dataset."
),
]
def _info(self):
features = datasets.Features(
{
"id": datasets.Value("string"),
"source": datasets.Value("string"),
"story": datasets.Value("string"),
"questions": datasets.features.Sequence(
{
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}
),
"answers": datasets.features.Sequence(
{
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}
),
"additional_answers": {
"0": datasets.features.Sequence(
{
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}
),
"1": datasets.features.Sequence(
{
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}
),
"2": datasets.features.Sequence(
{
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}
),
},
}
)
return datasets.DatasetInfo(
description=_DESCRIPTION,
features=features,
homepage=_HOMEPAGE,
license=_LICENSE,
citation=_CITATION,
)
def _split_generators(self, dl_manager):
urls = {"train": _URLS["train"], "validation": _URLS["validation"]}
data_dirs = dl_manager.download_and_extract(urls)
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"filepath": data_dirs["train"],
"split": datasets.Split.TRAIN,
},
),
datasets.SplitGenerator(
name=datasets.Split.VALIDATION,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"filepath": data_dirs["validation"],
"split": datasets.Split.VALIDATION,
},
),
]
# method parameters are unpacked from `gen_kwargs` as given in `_split_generators`
def _generate_examples(self, filepath, split):
with open(filepath, encoding="utf-8") as f:
data = json.load(f)
for row in data["data"]:
            id_ = row["id"]
source = row["source"]
story = row["story"]
questions = [
{"input_text": q["input_text"], "turn_id": q["turn_id"]}
for q in row["questions"]
]
answers = [
{
"span_start": a["span_start"],
"span_end": a["span_end"],
"span_text": a["span_text"],
"input_text": a["input_text"],
"turn_id": a["turn_id"],
}
for a in row["answers"]
]
if split == datasets.Split.TRAIN:
additional_answers = _EMPTY_ADDITIONAL_ANSWER
else:
additional_answers = {
"0": [
{
"span_start": a0["span_start"],
"span_end": a0["span_end"],
"span_text": a0["span_text"],
"input_text": a0["input_text"],
"turn_id": a0["turn_id"],
}
for a0 in row["additional_answers"]["0"]
],
"1": [
{
"span_start": a1["span_start"],
"span_end": a1["span_end"],
"span_text": a1["span_text"],
"input_text": a1["input_text"],
"turn_id": a1["turn_id"],
}
for a1 in row["additional_answers"]["1"]
],
"2": [
{
"span_start": a2["span_start"],
"span_end": a2["span_end"],
"span_text": a2["span_text"],
"input_text": a2["input_text"],
"turn_id": a2["turn_id"],
}
for a2 in row["additional_answers"]["2"]
],
}
yield row["id"], {
"id": id,
"story": story,
"source": source,
"questions": questions,
"answers": answers,
"additional_answers": additional_answers,
}
================================================
FILE: lm_eval/datasets/coqa/dataset_infos.json
================================================
{"coqa": {"description": "CoQA is a large-scale dataset for building Conversational Question Answering\nsystems. The goal of the CoQA challenge is to measure the ability of machines to\nunderstand a text passage and answer a series of interconnected questions that\nappear in a conversation.\n", "citation": "@misc{reddy2018coqa,\n title={CoQA: A Conversational Question Answering Challenge},\n author={Siva Reddy and Danqi Chen and Christopher D. Manning},\n year={2018},\n eprint={1808.07042},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n", "homepage": "https://stanfordnlp.github.io/coqa/", "license": "", "features": {"id": {"dtype": "string", "id": null, "_type": "Value"}, "source": {"dtype": "string", "id": null, "_type": "Value"}, "story": {"dtype": "string", "id": null, "_type": "Value"}, "questions": {"feature": {"input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "answers": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "additional_answers": {"0": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "1": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "2": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "coqa", "config_name": "coqa", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 26250528, "num_examples": 7199, "dataset_name": "coqa"}, "validation": {"name": "validation", "num_bytes": 3765933, "num_examples": 500, "dataset_name": "coqa"}}, "download_checksums": {"https://nlp.stanford.edu/data/coqa/coqa-train-v1.0.json": {"num_bytes": 49001836, "checksum": "b0fdb2bc1bd38dd3ca2ce5fa2ac3e02c6288ac914f241ac409a655ffb6619fa6"}, "https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json": {"num_bytes": 9090845, "checksum": "dfa367a9733ce53222918d0231d9b3bedc2b8ee831a2845f62dfc70701f2540a"}}, "download_size": 58092681, "post_processing_size": null, "dataset_size": 30016461, "size_in_bytes": 88109142}}
================================================
FILE: lm_eval/datasets/drop/__init__.py
================================================
================================================
FILE: lm_eval/datasets/drop/dataset_infos.json
================================================
{"drop": {"description": "DROP is a QA dataset which tests comprehensive understanding of paragraphs. In \nthis crowdsourced, adversarially-created, 96k question-answering benchmark, a \nsystem must resolve multiple references in a question, map them onto a paragraph,\nand perform discrete operations over them (such as addition, counting, or sorting).\n", "citation": "@misc{dua2019drop,\n title={DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs}, \n author={Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},\n year={2019},\n eprint={1903.00161},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n", "homepage": "https://allenai.org/data/drop", "license": "", "features": {"section_id": {"dtype": "string", "id": null, "_type": "Value"}, "passage": {"dtype": "string", "id": null, "_type": "Value"}, "question": {"dtype": "string", "id": null, "_type": "Value"}, "query_id": {"dtype": "string", "id": null, "_type": "Value"}, "answer": {"number": {"dtype": "string", "id": null, "_type": "Value"}, "date": {"day": {"dtype": "string", "id": null, "_type": "Value"}, "month": {"dtype": "string", "id": null, "_type": "Value"}, "year": {"dtype": "string", "id": null, "_type": "Value"}}, "spans": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}, "worker_id": {"dtype": "string", "id": null, "_type": "Value"}, "hit_id": {"dtype": "string", "id": null, "_type": "Value"}}, "validated_answers": {"feature": {"number": {"dtype": "string", "id": null, "_type": "Value"}, "date": {"day": {"dtype": "string", "id": null, "_type": "Value"}, "month": {"dtype": "string", "id": null, "_type": "Value"}, "year": {"dtype": "string", "id": null, "_type": "Value"}}, "spans": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}, "worker_id": {"dtype": "string", "id": null, "_type": "Value"}, "hit_id": {"dtype": "string", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "drop", "config_name": "drop", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 108858121, "num_examples": 77409, "dataset_name": "drop"}, "validation": {"name": "validation", "num_bytes": 12560739, "num_examples": 9536, "dataset_name": "drop"}}, "download_checksums": {"https://s3-us-west-2.amazonaws.com/allennlp/datasets/drop/drop_dataset.zip": {"num_bytes": 8308692, "checksum": "39d2278a29fd729de301b111a45f434c24834f40df8f4ff116d864589e3249d6"}}, "download_size": 8308692, "post_processing_size": null, "dataset_size": 121418860, "size_in_bytes": 129727552}}
================================================
FILE: lm_eval/datasets/drop/drop.py
================================================
# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Custom DROP dataset that, unlike HF, keeps all question-answer pairs
# even if there are multiple types of answers for the same question.
"""DROP dataset."""
import json
import os
import datasets
_CITATION = """\
@misc{dua2019drop,
title={DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs},
author={Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
year={2019},
eprint={1903.00161},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
"""
_DESCRIPTION = """\
DROP is a QA dataset which tests comprehensive understanding of paragraphs. In
this crowdsourced, adversarially-created, 96k question-answering benchmark, a
system must resolve multiple references in a question, map them onto a paragraph,
and perform discrete operations over them (such as addition, counting, or sorting).
"""
_HOMEPAGE = "https://allenai.org/data/drop"
# TODO: Add the licence for the dataset here if you can find it
_LICENSE = ""
_URLS = {
"drop": "https://s3-us-west-2.amazonaws.com/allennlp/datasets/drop/drop_dataset.zip",
}
_EMPTY_VALIDATED_ANSWER = [
{
"number": "",
"date": {
"day": "",
"month": "",
"year": "",
},
"spans": [],
"worker_id": "",
"hit_id": "",
}
]
class Drop(datasets.GeneratorBasedBuilder):
"""DROP is a QA dataset which tests comprehensive understanding of paragraphs."""
VERSION = datasets.Version("0.0.1")
BUILDER_CONFIGS = [
datasets.BuilderConfig(
name="drop", version=VERSION, description="The DROP dataset."
),
]
def _info(self):
features = datasets.Features(
{
"section_id": datasets.Value("string"),
"passage": datasets.Value("string"),
"question": datasets.Value("string"),
"query_id": datasets.Value("string"),
"answer": {
"number": datasets.Value("string"),
"date": {
"day": datasets.Value("string"),
"month": datasets.Value("string"),
"year": datasets.Value("string"),
},
"spans": datasets.features.Sequence(datasets.Value("string")),
"worker_id": datasets.Value("string"),
"hit_id": datasets.Value("string"),
},
"validated_answers": datasets.features.Sequence(
{
"number": datasets.Value("string"),
"date": {
"day": datasets.Value("string"),
"month": datasets.Value("string"),
"year": datasets.Value("string"),
},
"spans": datasets.features.Sequence(datasets.Value("string")),
"worker_id": datasets.Value("string"),
"hit_id": datasets.Value("string"),
}
),
}
)
return datasets.DatasetInfo(
description=_DESCRIPTION,
features=features,
homepage=_HOMEPAGE,
license=_LICENSE,
citation=_CITATION,
)
def _split_generators(self, dl_manager):
urls = _URLS[self.config.name]
data_dir = dl_manager.download_and_extract(urls)
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"filepath": os.path.join(
data_dir, "drop_dataset", "drop_dataset_train.json"
),
"split": "train",
},
),
datasets.SplitGenerator(
name=datasets.Split.VALIDATION,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"filepath": os.path.join(
data_dir, "drop_dataset", "drop_dataset_dev.json"
),
"split": "validation",
},
),
]
# method parameters are unpacked from `gen_kwargs` as given in `_split_generators`
def _generate_examples(self, filepath, split):
with open(filepath, encoding="utf-8") as f:
data = json.load(f)
key = 0
for section_id, example in data.items():
# Each example (passage) has multiple sub-question-answer pairs.
for qa in example["qa_pairs"]:
# Build answer.
answer = qa["answer"]
answer = {
"number": answer["number"],
"date": {
"day": answer["date"].get("day", ""),
"month": answer["date"].get("month", ""),
"year": answer["date"].get("year", ""),
},
"spans": answer["spans"],
"worker_id": answer.get("worker_id", ""),
"hit_id": answer.get("hit_id", ""),
}
validated_answers = []
if "validated_answers" in qa:
for validated_answer in qa["validated_answers"]:
va = {
"number": validated_answer.get("number", ""),
"date": {
"day": validated_answer["date"].get("day", ""),
"month": validated_answer["date"].get("month", ""),
"year": validated_answer["date"].get("year", ""),
},
"spans": validated_answer.get("spans", ""),
"worker_id": validated_answer.get("worker_id", ""),
"hit_id": validated_answer.get("hit_id", ""),
}
validated_answers.append(va)
else:
validated_answers = _EMPTY_VALIDATED_ANSWER
yield key, {
"section_id": section_id,
"passage": example["passage"],
"question": qa["question"],
"query_id": qa["query_id"],
"answer": answer,
"validated_answers": validated_answers,
}
key += 1
================================================
FILE: lm_eval/datasets/headqa/__init__.py
================================================
================================================
FILE: lm_eval/datasets/headqa/dataset_infos.json
================================================
{"es": {"description": "HEAD-QA is a multi-choice HEAlthcare Dataset. The questions come from exams to access a specialized position in the\nSpanish healthcare system, and are challenging even for highly specialized humans. They are designed by the Ministerio\nde Sanidad, Consumo y Bienestar Social.\nThe dataset contains questions about the following topics: medicine, nursing, psychology, chemistry, pharmacology and biology.\n", "citation": "@inproceedings{vilares-gomez-rodriguez-2019-head,\n title = \"{HEAD}-{QA}: A Healthcare Dataset for Complex Reasoning\",\n author = \"Vilares, David and\n G{'o}mez-Rodr{'i}guez, Carlos\",\n booktitle = \"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics\",\n month = jul,\n year = \"2019\",\n address = \"Florence, Italy\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://www.aclweb.org/anthology/P19-1092\",\n doi = \"10.18653/v1/P19-1092\",\n pages = \"960--966\",\n abstract = \"We present HEAD-QA, a multi-choice question answering testbed to encourage research on complex reasoning. The questions come from exams to access a specialized position in the Spanish healthcare system, and are challenging even for highly specialized humans. We then consider monolingual (Spanish) and cross-lingual (to English) experiments with information retrieval and neural techniques. We show that: (i) HEAD-QA challenges current methods, and (ii) the results lag well behind human performance, demonstrating its usefulness as a benchmark for future work.\",\n}\n", "homepage": "https://aghie.github.io/head-qa/", "license": "MIT License", "features": {"name": {"dtype": "string", "id": null, "_type": "Value"}, "year": {"dtype": "string", "id": null, "_type": "Value"}, "category": {"dtype": "string", "id": null, "_type": "Value"}, "qid": {"dtype": "int32", "id": null, "_type": "Value"}, "qtext": {"dtype": "string", "id": null, "_type": "Value"}, "ra": {"dtype": "int32", "id": null, "_type": "Value"}, "answers": [{"aid": {"dtype": "int32", "id": null, "_type": "Value"}, "atext": {"dtype": "string", "id": null, "_type": "Value"}}]}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "head_qa", "config_name": "es", "version": {"version_str": "1.1.0", "description": null, "major": 1, "minor": 1, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 1196021, "num_examples": 2657, "dataset_name": "head_qa"}, "test": {"name": "test", "num_bytes": 1169819, "num_examples": 2742, "dataset_name": "head_qa"}, "validation": {"name": "validation", "num_bytes": 556924, "num_examples": 1366, "dataset_name": "head_qa"}}, "download_checksums": {"https://drive.google.com/uc?export=download&confirm=t&id=1a_95N5zQQoUCq8IBNVZgziHbeM-QxG2t": {"num_bytes": 79365502, "checksum": "6ec29a3f55153d167f0bdf05395558919ba0b1df9c63e79ffceda2a09884ad8b"}}, "download_size": 79365502, "post_processing_size": null, "dataset_size": 2922764, "size_in_bytes": 82288266}, "en": {"description": "HEAD-QA is a multi-choice HEAlthcare Dataset. The questions come from exams to access a specialized position in the\nSpanish healthcare system, and are challenging even for highly specialized humans. 
They are designed by the Ministerio\nde Sanidad, Consumo y Bienestar Social.\nThe dataset contains questions about the following topics: medicine, nursing, psychology, chemistry, pharmacology and biology.\n", "citation": "@inproceedings{vilares-gomez-rodriguez-2019-head,\n title = \"{HEAD}-{QA}: A Healthcare Dataset for Complex Reasoning\",\n author = \"Vilares, David and\n G{'o}mez-Rodr{'i}guez, Carlos\",\n booktitle = \"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics\",\n month = jul,\n year = \"2019\",\n address = \"Florence, Italy\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://www.aclweb.org/anthology/P19-1092\",\n doi = \"10.18653/v1/P19-1092\",\n pages = \"960--966\",\n abstract = \"We present HEAD-QA, a multi-choice question answering testbed to encourage research on complex reasoning. The questions come from exams to access a specialized position in the Spanish healthcare system, and are challenging even for highly specialized humans. We then consider monolingual (Spanish) and cross-lingual (to English) experiments with information retrieval and neural techniques. We show that: (i) HEAD-QA challenges current methods, and (ii) the results lag well behind human performance, demonstrating its usefulness as a benchmark for future work.\",\n}\n", "homepage": "https://aghie.github.io/head-qa/", "license": "MIT License", "features": {"name": {"dtype": "string", "id": null, "_type": "Value"}, "year": {"dtype": "string", "id": null, "_type": "Value"}, "category": {"dtype": "string", "id": null, "_type": "Value"}, "qid": {"dtype": "int32", "id": null, "_type": "Value"}, "qtext": {"dtype": "string", "id": null, "_type": "Value"}, "ra": {"dtype": "int32", "id": null, "_type": "Value"}, "answers": [{"aid": {"dtype": "int32", "id": null, "_type": "Value"}, "atext": {"dtype": "string", "id": null, "_type": "Value"}}]}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "head_qa", "config_name": "en", "version": {"version_str": "1.1.0", "description": null, "major": 1, "minor": 1, "patch": 0}, "splits": {"train": {"name": "train", "num_bytes": 1123151, "num_examples": 2657, "dataset_name": "head_qa"}, "test": {"name": "test", "num_bytes": 1097349, "num_examples": 2742, "dataset_name": "head_qa"}, "validation": {"name": "validation", "num_bytes": 523462, "num_examples": 1366, "dataset_name": "head_qa"}}, "download_checksums": {"https://drive.google.com/uc?export=download&confirm=t&id=1a_95N5zQQoUCq8IBNVZgziHbeM-QxG2t": {"num_bytes": 79365502, "checksum": "6ec29a3f55153d167f0bdf05395558919ba0b1df9c63e79ffceda2a09884ad8b"}}, "download_size": 79365502, "post_processing_size": null, "dataset_size": 2743962, "size_in_bytes": 82109464}}
================================================
FILE: lm_eval/datasets/headqa/headqa.py
================================================
# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# NOTE: This is an exact copy of
# https://github.com/huggingface/datasets/blob/3804442bb7cfcb9d52044d92688115cfdc69c2da/datasets/head_qa/head_qa.py
# with the exception of the `image` feature. This is to avoid adding `Pillow`
# as a dependency.
"""HEAD-QA: A Healthcare Dataset for Complex Reasoning."""
import json
import os
import datasets
_CITATION = """\
@inproceedings{vilares-gomez-rodriguez-2019-head,
title = "{HEAD}-{QA}: A Healthcare Dataset for Complex Reasoning",
author = "Vilares, David and
G{\'o}mez-Rodr{\'i}guez, Carlos",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1092",
doi = "10.18653/v1/P19-1092",
pages = "960--966",
abstract = "We present HEAD-QA, a multi-choice question answering testbed to encourage research on complex reasoning. The questions come from exams to access a specialized position in the Spanish healthcare system, and are challenging even for highly specialized humans. We then consider monolingual (Spanish) and cross-lingual (to English) experiments with information retrieval and neural techniques. We show that: (i) HEAD-QA challenges current methods, and (ii) the results lag well behind human performance, demonstrating its usefulness as a benchmark for future work.",
}
"""
_DESCRIPTION = """\
HEAD-QA is a multi-choice HEAlthcare Dataset. The questions come from exams to access a specialized position in the
Spanish healthcare system, and are challenging even for highly specialized humans. They are designed by the Ministerio
de Sanidad, Consumo y Bienestar Social.
The dataset contains questions about the following topics: medicine, nursing, psychology, chemistry, pharmacology and biology.
"""
_HOMEPAGE = "https://aghie.github.io/head-qa/"
_LICENSE = "MIT License"
_URL = "https://drive.google.com/uc?export=download&confirm=t&id=1a_95N5zQQoUCq8IBNVZgziHbeM-QxG2t"
_DIRS = {"es": "HEAD", "en": "HEAD_EN"}
class HeadQA(datasets.GeneratorBasedBuilder):
"""HEAD-QA: A Healthcare Dataset for Complex Reasoning"""
VERSION = datasets.Version("1.1.0")
BUILDER_CONFIGS = [
datasets.BuilderConfig(
name="es", version=VERSION, description="Spanish HEAD dataset"
),
datasets.BuilderConfig(
name="en", version=VERSION, description="English HEAD dataset"
),
]
DEFAULT_CONFIG_NAME = "es"
def _info(self):
return datasets.DatasetInfo(
description=_DESCRIPTION,
features=datasets.Features(
{
"name": datasets.Value("string"),
"year": datasets.Value("string"),
"category": datasets.Value("string"),
"qid": datasets.Value("int32"),
"qtext": datasets.Value("string"),
"ra": datasets.Value("int32"),
"answers": [
{
"aid": datasets.Value("int32"),
"atext": datasets.Value("string"),
}
],
}
),
supervised_keys=None,
homepage=_HOMEPAGE,
license=_LICENSE,
citation=_CITATION,
)
def _split_generators(self, dl_manager):
"""Returns SplitGenerators."""
data_dir = dl_manager.download_and_extract(_URL)
        dir_name = _DIRS[self.config.name]
        data_lang_dir = os.path.join(data_dir, dir_name)
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
gen_kwargs={
"data_dir": data_dir,
"filepath": os.path.join(data_lang_dir, f"train_{dir}.json"),
},
),
datasets.SplitGenerator(
name=datasets.Split.TEST,
gen_kwargs={
"data_dir": data_dir,
"filepath": os.path.join(data_lang_dir, f"test_{dir}.json"),
},
),
datasets.SplitGenerator(
name=datasets.Split.VALIDATION,
gen_kwargs={
"data_dir": data_dir,
"filepath": os.path.join(data_lang_dir, f"dev_{dir}.json"),
},
),
]
def _generate_examples(self, data_dir, filepath):
"""Yields examples."""
with open(filepath, encoding="utf-8") as f:
head_qa = json.load(f)
for exam_id, exam in enumerate(head_qa["exams"]):
content = head_qa["exams"][exam]
name = content["name"].strip()
year = content["year"].strip()
category = content["category"].strip()
for question in content["data"]:
qid = int(question["qid"].strip())
qtext = question["qtext"].strip()
ra = int(question["ra"].strip())
aids = [answer["aid"] for answer in question["answers"]]
atexts = [answer["atext"].strip() for answer in question["answers"]]
answers = [
{"aid": aid, "atext": atext} for aid, atext in zip(aids, atexts)
]
id_ = f"{exam_id}_{qid}"
yield id_, {
"name": name,
"year": year,
"category": category,
"qid": qid,
"qtext": qtext,
"ra": ra,
"answers": answers,
}
================================================
FILE: lm_eval/datasets/hendrycks_ethics/__init__.py
================================================
================================================
FILE: lm_eval/datasets/hendrycks_ethics/dataset_infos.json
================================================
{"commonsense": {"description": "The ETHICS dataset is a benchmark that spans concepts in justice, well-being,\nduties, virtues, and commonsense morality. Models predict widespread moral\njudgments about diverse text scenarios. This requires connecting physical and\nsocial world knowledge to value judgements, a capability that may enable us\nto steer chatbot outputs or eventually regularize open-ended reinforcement\nlearning agents.\n\nThe Commonsense subset contains examples focusing on moral standards and principles that most people intuitively accept.", "citation": "@article{hendrycks2021ethics\n title={Aligning AI With Shared Human Values},\n author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},\n journal={Proceedings of the International Conference on Learning Representations (ICLR)},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/ethics", "license": "", "features": {"label": {"dtype": "int32", "id": null, "_type": "Value"}, "input": {"dtype": "string", "id": null, "_type": "Value"}, "is_short": {"dtype": "bool", "id": null, "_type": "Value"}, "edited": {"dtype": "bool", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_ethics", "config_name": "commonsense", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 14435215, "num_examples": 13910, "dataset_name": "hendrycks_ethics"}, "test": {"name": "test", "num_bytes": 3150094, "num_examples": 3885, "dataset_name": "hendrycks_ethics"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/ethics.tar": {"num_bytes": 35585024, "checksum": "40acbf1ac0da79a2aabef394d58889136b8d38b05be09482006de2453fb06333"}}, "download_size": 35585024, "post_processing_size": null, "dataset_size": 17585309, "size_in_bytes": 53170333}, "deontology": {"description": "The ETHICS dataset is a benchmark that spans concepts in justice, well-being,\nduties, virtues, and commonsense morality. Models predict widespread moral\njudgments about diverse text scenarios. 
This requires connecting physical and\nsocial world knowledge to value judgements, a capability that may enable us\nto steer chatbot outputs or eventually regularize open-ended reinforcement\nlearning agents.\n\nThe Deontology subset contains examples focusing on whether an act is required, permitted, or forbidden according to a set of rules or constraints", "citation": "@article{hendrycks2021ethics\n title={Aligning AI With Shared Human Values},\n author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},\n journal={Proceedings of the International Conference on Learning Representations (ICLR)},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/ethics", "license": "", "features": {"group_id": {"dtype": "int32", "id": null, "_type": "Value"}, "label": {"dtype": "int32", "id": null, "_type": "Value"}, "scenario": {"dtype": "string", "id": null, "_type": "Value"}, "excuse": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_ethics", "config_name": "deontology", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 1931475, "num_examples": 18164, "dataset_name": "hendrycks_ethics"}, "test": {"name": "test", "num_bytes": 384602, "num_examples": 3596, "dataset_name": "hendrycks_ethics"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/ethics.tar": {"num_bytes": 35585024, "checksum": "40acbf1ac0da79a2aabef394d58889136b8d38b05be09482006de2453fb06333"}}, "download_size": 35585024, "post_processing_size": null, "dataset_size": 2316077, "size_in_bytes": 37901101}, "justice": {"description": "The ETHICS dataset is a benchmark that spans concepts in justice, well-being,\nduties, virtues, and commonsense morality. Models predict widespread moral\njudgments about diverse text scenarios. 
This requires connecting physical and\nsocial world knowledge to value judgements, a capability that may enable us\nto steer chatbot outputs or eventually regularize open-ended reinforcement\nlearning agents.\n\nThe Justice subset contains examples focusing on how a character treats another person", "citation": "@article{hendrycks2021ethics\n title={Aligning AI With Shared Human Values},\n author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},\n journal={Proceedings of the International Conference on Learning Representations (ICLR)},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/ethics", "license": "", "features": {"group_id": {"dtype": "int32", "id": null, "_type": "Value"}, "label": {"dtype": "int32", "id": null, "_type": "Value"}, "scenario": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_ethics", "config_name": "justice", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 2516501, "num_examples": 21791, "dataset_name": "hendrycks_ethics"}, "test": {"name": "test", "num_bytes": 309427, "num_examples": 2704, "dataset_name": "hendrycks_ethics"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/ethics.tar": {"num_bytes": 35585024, "checksum": "40acbf1ac0da79a2aabef394d58889136b8d38b05be09482006de2453fb06333"}}, "download_size": 35585024, "post_processing_size": null, "dataset_size": 2825928, "size_in_bytes": 38410952}, "utilitarianism": {"description": "The ETHICS dataset is a benchmark that spans concepts in justice, well-being,\nduties, virtues, and commonsense morality. Models predict widespread moral\njudgments about diverse text scenarios. 
This requires connecting physical and\nsocial world knowledge to value judgements, a capability that may enable us\nto steer chatbot outputs or eventually regularize open-ended reinforcement\nlearning agents.\n\nThe Utilitarianism subset contains scenarios that should be ranked from most pleasant to least pleasant for the person in the scenario", "citation": "@article{hendrycks2021ethics\n title={Aligning AI With Shared Human Values},\n author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},\n journal={Proceedings of the International Conference on Learning Representations (ICLR)},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/ethics", "license": "", "features": {"activity": {"dtype": "string", "id": null, "_type": "Value"}, "baseline": {"dtype": "string", "id": null, "_type": "Value"}, "rating": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_ethics", "config_name": "utilitarianism", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 2241770, "num_examples": 13738, "dataset_name": "hendrycks_ethics"}, "test": {"name": "test", "num_bytes": 749768, "num_examples": 4808, "dataset_name": "hendrycks_ethics"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/ethics.tar": {"num_bytes": 35585024, "checksum": "40acbf1ac0da79a2aabef394d58889136b8d38b05be09482006de2453fb06333"}}, "download_size": 35585024, "post_processing_size": null, "dataset_size": 2991538, "size_in_bytes": 38576562}, "virtue": {"description": "The ETHICS dataset is a benchmark that spans concepts in justice, well-being,\nduties, virtues, and commonsense morality. Models predict widespread moral\njudgments about diverse text scenarios. 
This requires connecting physical and\nsocial world knowledge to value judgements, a capability that may enable us\nto steer chatbot outputs or eventually regularize open-ended reinforcement\nlearning agents.\n\nThe Virtue subset contains scenarios focusing on whether virtues or vices are being exemplified", "citation": "@article{hendrycks2021ethics\n title={Aligning AI With Shared Human Values},\n author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},\n journal={Proceedings of the International Conference on Learning Representations (ICLR)},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/ethics", "license": "", "features": {"group_id": {"dtype": "int32", "id": null, "_type": "Value"}, "label": {"dtype": "int32", "id": null, "_type": "Value"}, "scenario": {"dtype": "string", "id": null, "_type": "Value"}, "trait": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_ethics", "config_name": "virtue", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 2640328, "num_examples": 28245, "dataset_name": "hendrycks_ethics"}, "test": {"name": "test", "num_bytes": 473473, "num_examples": 4975, "dataset_name": "hendrycks_ethics"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/ethics.tar": {"num_bytes": 35585024, "checksum": "40acbf1ac0da79a2aabef394d58889136b8d38b05be09482006de2453fb06333"}}, "download_size": 35585024, "post_processing_size": null, "dataset_size": 3113801, "size_in_bytes": 38698825}}
================================================
FILE: lm_eval/datasets/hendrycks_ethics/hendrycks_ethics.py
================================================
# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""ETHICS dataset."""
# TODO: Add the `hard` dataset splits.
import csv
import os
import datasets
_CITATION = """\
@article{hendrycks2021ethics,
title={Aligning AI With Shared Human Values},
author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},
journal={Proceedings of the International Conference on Learning Representations (ICLR)},
year={2021}
}
"""
_DESCRIPTION = """\
The ETHICS dataset is a benchmark that spans concepts in justice, well-being,
duties, virtues, and commonsense morality. Models predict widespread moral
judgments about diverse text scenarios. This requires connecting physical and
social world knowledge to value judgements, a capability that may enable us
to steer chatbot outputs or eventually regularize open-ended reinforcement
learning agents.
"""
_HOMEPAGE = "https://github.com/hendrycks/ethics"
# TODO: Add the licence for the dataset here if you can find it
_LICENSE = ""
_URLS = "https://people.eecs.berkeley.edu/~hendrycks/ethics.tar"
class EthicsConfig(datasets.BuilderConfig):
"""BuilderConfig for Hendrycks ETHICS."""
def __init__(self, prefix, features, **kwargs):
"""BuilderConfig for Hendrycks ETHICS.
Args:
prefix: *string*, prefix to add to the dataset name for path location.
features: *list[string]*, list of the features that will appear in the
feature dict.
"""
# Version history:
super().__init__(version=datasets.Version("0.0.1"), **kwargs)
self.prefix = prefix
self.features = features
class HendrycksEthics(datasets.GeneratorBasedBuilder):
"""The ETHICS dataset is a benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality."""
BUILDER_CONFIGS = [
EthicsConfig(
name="commonsense",
prefix="cm",
features=datasets.Features(
{
"label": datasets.Value("int32"),
"input": datasets.Value("string"),
"is_short": datasets.Value("bool"),
"edited": datasets.Value("bool"),
}
),
description="The Commonsense subset contains examples focusing on moral standards and principles that most people intuitively accept.",
),
EthicsConfig(
name="deontology",
prefix="deontology",
features=datasets.Features(
{
"group_id": datasets.Value("int32"),
"label": datasets.Value("int32"),
"scenario": datasets.Value("string"),
"excuse": datasets.Value("string"),
}
),
description="The Deontology subset contains examples focusing on whether an act is required, permitted, or forbidden according to a set of rules or constraints",
),
EthicsConfig(
name="justice",
prefix="justice",
features=datasets.Features(
{
"group_id": datasets.Value("int32"),
"label": datasets.Value("int32"),
"scenario": datasets.Value("string"),
}
),
description="The Justice subset contains examples focusing on how a character treats another person",
),
EthicsConfig(
name="utilitarianism",
prefix="util",
features=datasets.Features(
{
"activity": datasets.Value("string"),
"baseline": datasets.Value("string"),
"rating": datasets.Value("string"), # Empty rating.
}
),
description="The Utilitarianism subset contains scenarios that should be ranked from most pleasant to least pleasant for the person in the scenario",
),
EthicsConfig(
name="virtue",
prefix="virtue",
features=datasets.Features(
{
"group_id": datasets.Value("int32"),
"label": datasets.Value("int32"),
"scenario": datasets.Value("string"),
"trait": datasets.Value("string"),
}
),
description="The Virtue subset contains scenarios focusing on whether virtues or vices are being exemplified",
),
]
def _info(self):
return datasets.DatasetInfo(
description=f"{_DESCRIPTION}\n{self.config.description}",
features=self.config.features,
homepage=_HOMEPAGE,
license=_LICENSE,
citation=_CITATION,
)
def _split_generators(self, dl_manager):
urls = _URLS
data_dir = dl_manager.download_and_extract(urls)
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"filepath": os.path.join(
data_dir,
"ethics",
self.config.name,
f"{self.config.prefix}_train.csv",
),
"split": "train",
},
),
datasets.SplitGenerator(
name=datasets.Split.TEST,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"filepath": os.path.join(
data_dir,
"ethics",
self.config.name,
f"{self.config.prefix}_test.csv",
),
"split": "test",
},
),
]
# method parameters are unpacked from `gen_kwargs` as given in `_split_generators`
def _generate_examples(self, filepath, split):
with open(filepath, newline="") as f:
if self.config.name == "utilitarianism":
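                # The utilitarianism CSVs ship without a header row, so the
                # field names are supplied explicitly here.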
contents = csv.DictReader(f, fieldnames=["activity", "baseline"])
else:
contents = csv.DictReader(f)
# For subsets with grouped scenarios, tag them with an id.
group_id = 0
for key, row in enumerate(contents):
if self.config.name == "deontology":
# Scenarios come in groups of 4.
if key % 4 == 0 and key != 0:
group_id += 1
yield key, {
"group_id": group_id,
"label": row["label"],
"scenario": row["scenario"],
"excuse": row["excuse"],
}
elif self.config.name == "justice":
# Scenarios come in groups of 4.
if key % 4 == 0 and key != 0:
group_id += 1
yield key, {
"group_id": group_id,
"label": row["label"],
"scenario": row["scenario"],
}
elif self.config.name == "commonsense":
yield key, {
"label": row["label"],
"input": row["input"],
"is_short": row["is_short"],
"edited": row["edited"],
}
elif self.config.name == "virtue":
# Scenarios come in groups of 5.
if key % 5 == 0 and key != 0:
group_id += 1
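                    # The raw virtue CSVs pack "scenario [SEP] trait" into a
                    # single column; split them into separate fields.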
scenario, trait = row["scenario"].split(" [SEP] ")
yield key, {
"group_id": group_id,
"label": row["label"],
"scenario": scenario,
"trait": trait,
}
elif self.config.name == "utilitarianism":
yield key, {
"activity": row["activity"],
"baseline": row["baseline"],
"rating": "",
}
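# --- Usage sketch (illustrative, not part of the upstream script) ---
# A minimal way to exercise this builder locally, assuming a `datasets`
# version that still supports loading from a local script path; the first
# run downloads the ~35 MB ethics.tar archive.
if __name__ == "__main__":
    ethics = datasets.load_dataset(__file__, "deontology")
    print(ethics["train"][0])  # -> {'group_id': 0, 'label': ..., 'scenario': ..., 'excuse': ...}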
================================================
FILE: lm_eval/datasets/hendrycks_math/__init__.py
================================================
================================================
FILE: lm_eval/datasets/hendrycks_math/dataset_infos.json
================================================
{"algebra": {"description": "MATH is a dataset of 12,500 challenging competition mathematics problems. Each\nproblem in Math has a full step-by-step solution which can be used to teach\nmodels to generate answer derivations and explanations.\n", "citation": "@article{hendrycksmath2021,\n title={Measuring Mathematical Problem Solving With the Math Dataset},\n author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},\n journal={NeurIPS},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/math", "license": "", "features": {"problem": {"dtype": "string", "id": null, "_type": "Value"}, "level": {"dtype": "string", "id": null, "_type": "Value"}, "type": {"dtype": "string", "id": null, "_type": "Value"}, "solution": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_math", "config_name": "algebra", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 955021, "num_examples": 1744, "dataset_name": "hendrycks_math"}, "test": {"name": "test", "num_bytes": 648291, "num_examples": 1187, "dataset_name": "hendrycks_math"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/MATH.tar": {"num_bytes": 20327936, "checksum": "0fbe4fad0df66942db6c221cdcc95b298cc7f4595a2f0f518360cce84e90d9ac"}}, "download_size": 20327936, "post_processing_size": null, "dataset_size": 1603312, "size_in_bytes": 21931248}, "counting_and_probability": {"description": "MATH is a dataset of 12,500 challenging competition mathematics problems. Each\nproblem in Math has a full step-by-step solution which can be used to teach\nmodels to generate answer derivations and explanations.\n", "citation": "@article{hendrycksmath2021,\n title={Measuring Mathematical Problem Solving With the Math Dataset},\n author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},\n journal={NeurIPS},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/math", "license": "", "features": {"problem": {"dtype": "string", "id": null, "_type": "Value"}, "level": {"dtype": "string", "id": null, "_type": "Value"}, "type": {"dtype": "string", "id": null, "_type": "Value"}, "solution": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_math", "config_name": "counting_and_probability", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 667385, "num_examples": 771, "dataset_name": "hendrycks_math"}, "test": {"name": "test", "num_bytes": 353803, "num_examples": 474, "dataset_name": "hendrycks_math"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/MATH.tar": {"num_bytes": 20327936, "checksum": "0fbe4fad0df66942db6c221cdcc95b298cc7f4595a2f0f518360cce84e90d9ac"}}, "download_size": 20327936, "post_processing_size": null, "dataset_size": 1021188, "size_in_bytes": 21349124}, "geometry": {"description": "MATH is a dataset of 12,500 challenging competition mathematics problems. 
Each\nproblem in Math has a full step-by-step solution which can be used to teach\nmodels to generate answer derivations and explanations.\n", "citation": "@article{hendrycksmath2021,\n title={Measuring Mathematical Problem Solving With the Math Dataset},\n author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},\n journal={NeurIPS},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/math", "license": "", "features": {"problem": {"dtype": "string", "id": null, "_type": "Value"}, "level": {"dtype": "string", "id": null, "_type": "Value"}, "type": {"dtype": "string", "id": null, "_type": "Value"}, "solution": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_math", "config_name": "geometry", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 1077241, "num_examples": 870, "dataset_name": "hendrycks_math"}, "test": {"name": "test", "num_bytes": 523126, "num_examples": 479, "dataset_name": "hendrycks_math"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/MATH.tar": {"num_bytes": 20327936, "checksum": "0fbe4fad0df66942db6c221cdcc95b298cc7f4595a2f0f518360cce84e90d9ac"}}, "download_size": 20327936, "post_processing_size": null, "dataset_size": 1600367, "size_in_bytes": 21928303}, "intermediate_algebra": {"description": "MATH is a dataset of 12,500 challenging competition mathematics problems. Each\nproblem in Math has a full step-by-step solution which can be used to teach\nmodels to generate answer derivations and explanations.\n", "citation": "@article{hendrycksmath2021,\n title={Measuring Mathematical Problem Solving With the Math Dataset},\n author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},\n journal={NeurIPS},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/math", "license": "", "features": {"problem": {"dtype": "string", "id": null, "_type": "Value"}, "level": {"dtype": "string", "id": null, "_type": "Value"}, "type": {"dtype": "string", "id": null, "_type": "Value"}, "solution": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_math", "config_name": "intermediate_algebra", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 1157476, "num_examples": 1295, "dataset_name": "hendrycks_math"}, "test": {"name": "test", "num_bytes": 795070, "num_examples": 903, "dataset_name": "hendrycks_math"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/MATH.tar": {"num_bytes": 20327936, "checksum": "0fbe4fad0df66942db6c221cdcc95b298cc7f4595a2f0f518360cce84e90d9ac"}}, "download_size": 20327936, "post_processing_size": null, "dataset_size": 1952546, "size_in_bytes": 22280482}, "number_theory": {"description": "MATH is a dataset of 12,500 challenging competition mathematics problems. 
Each\nproblem in Math has a full step-by-step solution which can be used to teach\nmodels to generate answer derivations and explanations.\n", "citation": "@article{hendrycksmath2021,\n title={Measuring Mathematical Problem Solving With the Math Dataset},\n author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},\n journal={NeurIPS},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/math", "license": "", "features": {"problem": {"dtype": "string", "id": null, "_type": "Value"}, "level": {"dtype": "string", "id": null, "_type": "Value"}, "type": {"dtype": "string", "id": null, "_type": "Value"}, "solution": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_math", "config_name": "number_theory", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 595793, "num_examples": 869, "dataset_name": "hendrycks_math"}, "test": {"name": "test", "num_bytes": 349455, "num_examples": 540, "dataset_name": "hendrycks_math"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/MATH.tar": {"num_bytes": 20327936, "checksum": "0fbe4fad0df66942db6c221cdcc95b298cc7f4595a2f0f518360cce84e90d9ac"}}, "download_size": 20327936, "post_processing_size": null, "dataset_size": 945248, "size_in_bytes": 21273184}, "prealgebra": {"description": "MATH is a dataset of 12,500 challenging competition mathematics problems. Each\nproblem in Math has a full step-by-step solution which can be used to teach\nmodels to generate answer derivations and explanations.\n", "citation": "@article{hendrycksmath2021,\n title={Measuring Mathematical Problem Solving With the Math Dataset},\n author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},\n journal={NeurIPS},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/math", "license": "", "features": {"problem": {"dtype": "string", "id": null, "_type": "Value"}, "level": {"dtype": "string", "id": null, "_type": "Value"}, "type": {"dtype": "string", "id": null, "_type": "Value"}, "solution": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_math", "config_name": "prealgebra", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 715611, "num_examples": 1205, "dataset_name": "hendrycks_math"}, "test": {"name": "test", "num_bytes": 510195, "num_examples": 871, "dataset_name": "hendrycks_math"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/MATH.tar": {"num_bytes": 20327936, "checksum": "0fbe4fad0df66942db6c221cdcc95b298cc7f4595a2f0f518360cce84e90d9ac"}}, "download_size": 20327936, "post_processing_size": null, "dataset_size": 1225806, "size_in_bytes": 21553742}, "precalculus": {"description": "MATH is a dataset of 12,500 challenging competition mathematics problems. 
Each\nproblem in Math has a full step-by-step solution which can be used to teach\nmodels to generate answer derivations and explanations.\n", "citation": "@article{hendrycksmath2021,\n title={Measuring Mathematical Problem Solving With the Math Dataset},\n author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},\n journal={NeurIPS},\n year={2021}\n}\n", "homepage": "https://github.com/hendrycks/math", "license": "", "features": {"problem": {"dtype": "string", "id": null, "_type": "Value"}, "level": {"dtype": "string", "id": null, "_type": "Value"}, "type": {"dtype": "string", "id": null, "_type": "Value"}, "solution": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "hendrycks_math", "config_name": "precalculus", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 816245, "num_examples": 746, "dataset_name": "hendrycks_math"}, "test": {"name": "test", "num_bytes": 552893, "num_examples": 546, "dataset_name": "hendrycks_math"}}, "download_checksums": {"https://people.eecs.berkeley.edu/~hendrycks/MATH.tar": {"num_bytes": 20327936, "checksum": "0fbe4fad0df66942db6c221cdcc95b298cc7f4595a2f0f518360cce84e90d9ac"}}, "download_size": 20327936, "post_processing_size": null, "dataset_size": 1369138, "size_in_bytes": 21697074}}
================================================
FILE: lm_eval/datasets/hendrycks_math/hendrycks_math.py
================================================
# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""MATH dataset."""
import json
import os
import pathlib
import datasets
_CITATION = """\
@article{hendrycksmath2021,
title={Measuring Mathematical Problem Solving With the Math Dataset},
author={Dan Hendrycks and Collin Burns and Saurav Kadavath and Akul Arora and Steven Basart and Eric Tang and Dawn Song and Jacob Steinhardt},
journal={NeurIPS},
year={2021}
}
"""
_DESCRIPTION = """\
MATH is a dataset of 12,500 challenging competition mathematics problems. Each
problem in Math has a full step-by-step solution which can be used to teach
models to generate answer derivations and explanations.
"""
_HOMEPAGE = "https://github.com/hendrycks/math"
# TODO: Add the licence for the dataset here if you can find it
_LICENSE = ""
_URLS = "https://people.eecs.berkeley.edu/~hendrycks/MATH.tar"
_NAMES = [
"algebra",
"counting_and_probability",
"geometry",
"intermediate_algebra",
"number_theory",
"prealgebra",
"precalculus",
]
class HendrycksMath(datasets.GeneratorBasedBuilder):
"""MATH is a dataset of 12,500 challenging competition mathematics problems."""
VERSION = datasets.Version("0.0.1")
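    # NOTE: VERSION is threaded through zip() so it is only referenced in the
    # comprehension's iterable, which is evaluated at class scope; class-level
    # names are not visible inside a comprehension body in Python 3.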
BUILDER_CONFIGS = [
datasets.BuilderConfig(name=name, version=version, description=name)
for name, version in zip(_NAMES, [VERSION] * len(_NAMES))
]
def _info(self):
features = datasets.Features(
{
"problem": datasets.Value("string"),
"level": datasets.Value("string"),
"type": datasets.Value("string"),
"solution": datasets.Value("string"),
}
)
return datasets.DatasetInfo(
description=_DESCRIPTION,
features=features,
homepage=_HOMEPAGE,
license=_LICENSE,
citation=_CITATION,
)
def _split_generators(self, dl_manager):
urls = _URLS
data_dir = dl_manager.download_and_extract(urls)
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"basepath": os.path.join(
data_dir, "MATH", "train", self.config.name
),
"split": "train",
},
),
datasets.SplitGenerator(
name=datasets.Split.TEST,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"basepath": os.path.join(
data_dir, "MATH", "test", self.config.name
),
"split": "test",
},
),
]
# method parameters are unpacked from `gen_kwargs` as given in `_split_generators`
def _generate_examples(self, basepath, split):
key = 0
for file in sorted(pathlib.Path(basepath).iterdir()):
with open(file, "r", encoding="utf-8") as f:
data = json.load(f)
yield key, {
"problem": data["problem"],
"level": data["level"],
"type": data["type"],
"solution": data["solution"],
}
key += 1
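# --- Usage sketch (illustrative, not part of the upstream script) ---
# Loads one MATH configuration directly from this file, assuming a `datasets`
# version that still supports script-based loading; the first run downloads
# the ~20 MB MATH.tar archive.
if __name__ == "__main__":
    math_ds = datasets.load_dataset(__file__, "algebra")
    print(math_ds["test"][0]["problem"])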
================================================
FILE: lm_eval/datasets/logiqa/__init__.py
================================================
================================================
FILE: lm_eval/datasets/logiqa/dataset_infos.json
================================================
{"logiqa": {"description": "LogiQA is a dataset for testing human logical reasoning. It consists of 8,678 QA\ninstances, covering multiple types of deductive reasoning. Results show that state-\nof-the-art neural models perform by far worse than human ceiling. The dataset can\nalso serve as a benchmark for reinvestigating logical AI under the deep learning\nNLP setting.\n", "citation": "@misc{liu2020logiqa,\n title={LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning}, \n author={Jian Liu and Leyang Cui and Hanmeng Liu and Dandan Huang and Yile Wang and Yue Zhang},\n year={2020},\n eprint={2007.08124},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n", "homepage": "https://github.com/lgw863/LogiQA-dataset", "license": "", "features": {"label": {"dtype": "string", "id": null, "_type": "Value"}, "context": {"dtype": "string", "id": null, "_type": "Value"}, "question": {"dtype": "string", "id": null, "_type": "Value"}, "options": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "logiqa", "config_name": "logiqa", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 6419852, "num_examples": 7376, "dataset_name": "logiqa"}, "test": {"name": "test", "num_bytes": 571705, "num_examples": 651, "dataset_name": "logiqa"}, "validation": {"name": "validation", "num_bytes": 562437, "num_examples": 651, "dataset_name": "logiqa"}}, "download_checksums": {"https://raw.githubusercontent.com/lgw863/LogiQA-dataset/master/Train.txt": {"num_bytes": 6281272, "checksum": "7d5bb1f58278e33b395744cd2ad8d7600faa0b3c4d615c659a44ec1181d759fa"}, "https://raw.githubusercontent.com/lgw863/LogiQA-dataset/master/Test.txt": {"num_bytes": 559060, "checksum": "359acb78c37802208f7fde9e2f6574b8526527c63d6a336f90a53f1932cb4701"}, "https://raw.githubusercontent.com/lgw863/LogiQA-dataset/master/Eval.txt": {"num_bytes": 550021, "checksum": "4c49e6753b7262c001506b9151135abf722247035ab075dad93acdea5789c01f"}}, "download_size": 7390353, "post_processing_size": null, "dataset_size": 7553994, "size_in_bytes": 14944347}}
================================================
FILE: lm_eval/datasets/logiqa/logiqa.py
================================================
# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""LogiQA dataset."""
import datasets
_CITATION = """\
@misc{liu2020logiqa,
title={LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning},
author={Jian Liu and Leyang Cui and Hanmeng Liu and Dandan Huang and Yile Wang and Yue Zhang},
year={2020},
eprint={2007.08124},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
"""
_DESCRIPTION = """\
LogiQA is a dataset for testing human logical reasoning. It consists of 8,678 QA
instances, covering multiple types of deductive reasoning. Results show that state-
of-the-art neural models perform far worse than the human ceiling. The dataset can
also serve as a benchmark for reinvestigating logical AI under the deep learning
NLP setting.
"""
_HOMEPAGE = "https://github.com/lgw863/LogiQA-dataset"
# TODO: Add the licence for the dataset here if you can find it
_LICENSE = ""
_URLS = {
"train": "https://raw.githubusercontent.com/lgw863/LogiQA-dataset/master/Train.txt",
"validation": "https://raw.githubusercontent.com/lgw863/LogiQA-dataset/master/Eval.txt",
"test": "https://raw.githubusercontent.com/lgw863/LogiQA-dataset/master/Test.txt",
}
class Logiqa(datasets.GeneratorBasedBuilder):
"""LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning"""
VERSION = datasets.Version("0.0.1")
BUILDER_CONFIGS = [
datasets.BuilderConfig(
name="logiqa", version=VERSION, description="The LogiQA dataset."
),
]
def _info(self):
features = datasets.Features(
{
"label": datasets.Value("string"),
"context": datasets.Value("string"),
"question": datasets.Value("string"),
"options": datasets.features.Sequence(datasets.Value("string")),
}
)
return datasets.DatasetInfo(
description=_DESCRIPTION,
features=features,
homepage=_HOMEPAGE,
license=_LICENSE,
citation=_CITATION,
)
def _split_generators(self, dl_manager):
urls = {
"train": _URLS["train"],
"test": _URLS["test"],
"validation": _URLS["validation"],
}
data_dir = dl_manager.download_and_extract(urls)
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"filepath": data_dir["train"],
"split": "train",
},
),
datasets.SplitGenerator(
name=datasets.Split.TEST,
# These kwargs will be passed to _generate_examples
gen_kwargs={"filepath": data_dir["test"], "split": "test"},
),
datasets.SplitGenerator(
name=datasets.Split.VALIDATION,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"filepath": data_dir["validation"],
"split": "validation",
},
),
]
# method parameters are unpacked from `gen_kwargs` as given in `_split_generators`
def _generate_examples(self, filepath, split):
def normalize(text):
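            # The raw LogiQA files often lack a space after sentence-final
            # periods; insert one so the text reads cleanly.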
return text.replace(".", ". ").strip()
with open(filepath, encoding="utf-8") as f:
data = f.read().strip().split("\n\n")
for key, row in enumerate(data):
example = row.split("\n")
yield key, {
"label": example[0].strip(),
"context": normalize(example[1]),
"question": normalize(example[2]),
"options": [normalize(option[2:]) for option in example[3:]],
}
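# --- Usage sketch (illustrative, not part of the upstream script) ---
# Loads the single "logiqa" configuration from this file, assuming a
# `datasets` version that still supports script-based loading.
if __name__ == "__main__":
    logiqa = datasets.load_dataset(__file__, "logiqa")
    print(logiqa["validation"][0]["question"])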
================================================
FILE: lm_eval/datasets/mutual/__init__.py
================================================
================================================
FILE: lm_eval/datasets/mutual/dataset_infos.json
================================================
{"mutual": {"description": "MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, which is\nmodified from Chinese high school English listening comprehension test data.\n\nThe MuTual dataset.", "citation": "@inproceedings{mutual,\n title = \"MuTual: A Dataset for Multi-Turn Dialogue Reasoning\",\n author = \"Cui, Leyang and Wu, Yu and Liu, Shujie and Zhang, Yue and Zhou, Ming\" ,\n booktitle = \"Proceedings of the 58th Conference of the Association for Computational Linguistics\",\n year = \"2020\",\n publisher = \"Association for Computational Linguistics\",\n}\n", "homepage": "https://github.com/Nealcly/MuTual", "license": "", "features": {"answers": {"dtype": "string", "id": null, "_type": "Value"}, "options": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}, "article": {"dtype": "string", "id": null, "_type": "Value"}, "id": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "mutual", "config_name": "mutual", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 5141602, "num_examples": 7088, "dataset_name": "mutual"}, "test": {"name": "test", "num_bytes": 634396, "num_examples": 886, "dataset_name": "mutual"}, "validation": {"name": "validation", "num_bytes": 624271, "num_examples": 886, "dataset_name": "mutual"}}, "download_checksums": {"https://github.com/Nealcly/MuTual/archive/master.zip": {"num_bytes": 10997878, "checksum": "bb325cf6c672f0f02699993a37138b0fa0af6fcfc77ec81dfbe46add4d7b29f9"}}, "download_size": 10997878, "post_processing_size": null, "dataset_size": 6400269, "size_in_bytes": 17398147}, "mutual_plus": {"description": "MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, which is\nmodified from Chinese high school English listening comprehension test data.\n\nMuTualPlus is a more difficult MuTual that replaces positive responses with a safe responses.", "citation": "@inproceedings{mutual,\n title = \"MuTual: A Dataset for Multi-Turn Dialogue Reasoning\",\n author = \"Cui, Leyang and Wu, Yu and Liu, Shujie and Zhang, Yue and Zhou, Ming\" ,\n booktitle = \"Proceedings of the 58th Conference of the Association for Computational Linguistics\",\n year = \"2020\",\n publisher = \"Association for Computational Linguistics\",\n}\n", "homepage": "https://github.com/Nealcly/MuTual", "license": "", "features": {"answers": {"dtype": "string", "id": null, "_type": "Value"}, "options": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}, "article": {"dtype": "string", "id": null, "_type": "Value"}, "id": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "mutual", "config_name": "mutual_plus", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 4921179, "num_examples": 7088, "dataset_name": "mutual"}, "test": {"name": "test", "num_bytes": 606620, "num_examples": 886, "dataset_name": "mutual"}, "validation": {"name": "validation", "num_bytes": 597340, "num_examples": 886, "dataset_name": "mutual"}}, "download_checksums": {"https://github.com/Nealcly/MuTual/archive/master.zip": {"num_bytes": 10997878, "checksum": 
"bb325cf6c672f0f02699993a37138b0fa0af6fcfc77ec81dfbe46add4d7b29f9"}}, "download_size": 10997878, "post_processing_size": null, "dataset_size": 6125139, "size_in_bytes": 17123017}}
================================================
FILE: lm_eval/datasets/mutual/mutual.py
================================================
# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""MuTual dataset."""
import json
import os
from pathlib import Path
import datasets
_CITATION = """\
@inproceedings{mutual,
title = "MuTual: A Dataset for Multi-Turn Dialogue Reasoning",
author = "Cui, Leyang and Wu, Yu and Liu, Shujie and Zhang, Yue and Zhou, Ming" ,
booktitle = "Proceedings of the 58th Conference of the Association for Computational Linguistics",
year = "2020",
publisher = "Association for Computational Linguistics",
}
"""
_DESCRIPTION = """\
MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, which is
modified from Chinese high school English listening comprehension test data.
"""
_HOMEPAGE = "https://github.com/Nealcly/MuTual"
# TODO: Add the licence for the dataset here if you can find it
_LICENSE = ""
_URLS = "https://github.com/Nealcly/MuTual/archive/master.zip"
class Mutual(datasets.GeneratorBasedBuilder):
"""MuTual: A Dataset for Multi-Turn Dialogue Reasoning"""
VERSION = datasets.Version("0.0.1")
BUILDER_CONFIGS = [
datasets.BuilderConfig(
name="mutual", version=VERSION, description="The MuTual dataset."
),
datasets.BuilderConfig(
name="mutual_plus",
version=VERSION,
            description="MuTualPlus is a more difficult MuTual that replaces positive responses with safe responses.",
),
]
def _info(self):
features = datasets.Features(
{
"answers": datasets.Value("string"),
"options": datasets.features.Sequence(datasets.Value("string")),
"article": datasets.Value("string"),
"id": datasets.Value("string"),
}
)
return datasets.DatasetInfo(
description=f"{_DESCRIPTION}\n{self.config.description}",
features=features,
homepage=_HOMEPAGE,
license=_LICENSE,
citation=_CITATION,
)
def _split_generators(self, dl_manager):
urls = _URLS
data_dir = dl_manager.download_and_extract(urls)
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"basepath": os.path.join(
data_dir, "MuTual-master", "data", self.config.name, "train"
),
"split": "train",
},
),
datasets.SplitGenerator(
name=datasets.Split.TEST,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"basepath": os.path.join(
data_dir, "MuTual-master", "data", self.config.name, "test"
),
"split": "test",
},
),
datasets.SplitGenerator(
name=datasets.Split.VALIDATION,
# These kwargs will be passed to _generate_examples
gen_kwargs={
"basepath": os.path.join(
data_dir, "MuTual-master", "data", self.config.name, "dev"
),
"split": "dev",
},
),
]
# method parameters are unpacked from `gen_kwargs` as given in `_split_generators`
def _generate_examples(self, basepath, split):
# TODO: This method handles input defined in _split_generators to yield (key, example) tuples from the dataset.
# The `key` is for legacy reasons (tfds) and is not important in itself, but must be unique for each example.
key = 0
for file in sorted(Path(basepath).iterdir()):
if file.suffix != ".txt":
continue
with open(file, "r", encoding="utf-8") as f:
data_str = f.read()
# Ignore the occasional empty file.
if not data_str:
continue
data = json.loads(data_str)
yield key, {
"answers": data["answers"],
"options": data["options"],
"article": data["article"],
"id": data["id"],
}
key += 1
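# --- Usage sketch (illustrative, not part of the upstream script) ---
# Loads the harder "mutual_plus" configuration from this file, assuming a
# `datasets` version that still supports script-based loading.
if __name__ == "__main__":
    mutual = datasets.load_dataset(__file__, "mutual_plus")
    print(mutual["validation"][0]["options"])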
================================================
FILE: lm_eval/datasets/pile/__init__.py
================================================
================================================
FILE: lm_eval/datasets/pile/dataset_infos.json
================================================
{"pile_arxiv": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nArXiv", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_arxiv", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 113218251, "num_examples": 2407, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 115653720, "num_examples": 2434, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 228871971, "size_in_bytes": 1160030307}, "pile_books3": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nBooks3", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_books3", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 150095743, "num_examples": 269, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 177359876, "num_examples": 301, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 327455619, "size_in_bytes": 1258613955}, "pile_bookcorpus2": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nBookCorpus2", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_bookcorpus2", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 9680652, "num_examples": 28, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 9776271, "num_examples": 26, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 19456923, "size_in_bytes": 950615259}, "pile_dm-mathematics": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nDM Mathematics", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_dm-mathematics", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 15756556, "num_examples": 1922, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 16453386, "num_examples": 2007, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 32209942, "size_in_bytes": 963368278}, "pile_enron": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nEnron Emails", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_enron", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 1638859, "num_examples": 1010, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 1556487, "num_examples": 947, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 3195346, "size_in_bytes": 934353682}, "pile_europarl": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nEuroParl", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_europarl", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 8789652, "num_examples": 157, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 9111791, "num_examples": 133, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 17901443, "size_in_bytes": 949059779}, "pile_freelaw": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nFreeLaw", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_freelaw", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 80808693, "num_examples": 5101, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 80363814, "num_examples": 5094, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 161172507, "size_in_bytes": 1092330843}, "pile_github": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nGithub", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_github", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 95654706, "num_examples": 18195, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 97179576, "num_examples": 18337, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 192834282, "size_in_bytes": 1123992618}, "pile_gutenberg": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nGutenberg (PG-19)", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_gutenberg", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 30243176, "num_examples": 80, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 24685980, "num_examples": 60, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 54929156, "size_in_bytes": 986087492}, "pile_hackernews": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nHackerNews", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_hackernews", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 8124255, "num_examples": 1632, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 9803822, "num_examples": 1619, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 17928077, "size_in_bytes": 949086413}, "pile_nih-exporter": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. 
To score well on Pile\nBPB (bits per byte), a model must be able to understand many disparate domains\nincluding books, github repositories, webpages, chat logs, and medical, physics,\nmath, computer science, and philosophy papers.\n\nNIH ExPorter", "citation": "@article{pile,\n title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},\n author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},\n journal={arXiv preprint arXiv:2101.00027},\n year={2020}\n}\n", "homepage": "https://pile.eleuther.ai/", "license": "", "features": {"text": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "pile", "config_name": "pile_nih-exporter", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"test": {"name": "test", "num_bytes": 3928804, "num_examples": 1884, "dataset_name": "pile"}, "validation": {"name": "validation", "num_bytes": 3927967, "num_examples": 1825, "dataset_name": "pile"}}, "download_checksums": {"https://the-eye.eu/public/AI/pile/val.jsonl.zst": {"num_bytes": 470907480, "checksum": "264c875d8bbd355d8daa9d032b75fd8fb91606218bb84dd1155b203fcd5fab92"}, "https://the-eye.eu/public/AI/pile/test.jsonl.zst": {"num_bytes": 460250856, "checksum": "0bb28c52d0b5596d389bf179ce2d43bf7f7ffae76b0d2d20b180c97f62e0975e"}}, "download_size": 931158336, "post_processing_size": null, "dataset_size": 7856771, "size_in_bytes": 939015107}, "pile_opensubtitles": {"description": "The Pile is a 825 GiB diverse, open source language modeling data set that consists\nof 22 smaller, high-quality datasets combined together. To score well on Pile\nBPB (bits per byt
================================================
SYMBOL INDEX (1636 symbols across 97 files)
================================================
FILE: datautils.py
function set_seed (line 9) | def set_seed(seed):
function get_pile (line 16) | def get_pile(nsamples, seed, seqlen, model):
function get_wikitext2 (line 35) | def get_wikitext2(nsamples, seed, seqlen, model):
function get_ptb (line 56) | def get_ptb(nsamples, seed, seqlen, model):
function get_c4 (line 78) | def get_c4(nsamples, seed, seqlen, model):
function get_ptb_new (line 120) | def get_ptb_new(nsamples, seed, seqlen, model):
function get_c4_new (line 143) | def get_c4_new(nsamples, seed, seqlen, model):
function get_loaders (line 174) | def get_loaders(
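The loaders above share one calling convention. A hedged usage sketch (not from the repo): the keyword list of get_loaders is truncated in this index, so the arguments and the returned pair below assume it mirrors the per-dataset helpers and GPTQ-style pipelines; the model id is illustrative.

from datautils import get_loaders

# Assumed signature: dataset name plus the (nsamples, seed, seqlen, model)
# quartet shared by get_pile / get_wikitext2 / get_ptb / get_c4 above.
trainloader, testenc = get_loaders(
    "wikitext2",                          # pile / wikitext2 / ptb / c4 / ...
    nsamples=128,                         # calibration sequences to draw
    seed=0,                               # sampling seed (cf. set_seed)
    seqlen=2048,                          # tokens per sequence
    model="meta-llama/Meta-Llama-3-8B",   # tokenizer checkpoint (illustrative)
)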
FILE: gptq.py
class Observer (line 15) | class Observer:
method __init__ (line 17) | def __init__(self, topk=32):
method submit (line 21) | def submit(self, name: str, layerid: int, gptq, error: float):
method print (line 39) | def print(self):
method items (line 52) | def items(self):
class GPTQ (line 56) | class GPTQ:
method __init__ (line 58) | def __init__(self, layer, observe=False):
method add_batch (line 73) | def add_batch(self, inp, out):
method print_loss (line 101) | def print_loss(self, name, q_weight, weight_error, timecost):
method fasterquant (line 128) | def fasterquant(self, blocksize=128, percdamp=.01, groupsize=-1, actor...
method free (line 233) | def free(self):
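A hedged sketch of the flow these methods imply: accumulate layer-input statistics with add_batch, then quantize in place with fasterquant. The quantizer.configure call follows the GPTQ-for-LLaMa convention and is an assumption here; in the repo this loop is driven by llama_sequential rather than a standalone layer.

import torch
from gptq import GPTQ

layer = torch.nn.Linear(4096, 4096)
gptq = GPTQ(layer, observe=False)
# Assumption: GPTQ exposes a weight quantizer to configure, as in
# GPTQ-for-LLaMa (4-bit, per-channel, asymmetric here).
gptq.quantizer.configure(4, perchannel=True, sym=False, mse=False)

for _ in range(8):                      # calibration batches
    inp = torch.randn(2, 2048, 4096)    # (batch, seqlen, hidden) activations
    gptq.add_batch(inp, layer(inp))     # update the running Hessian estimate

gptq.fasterquant(blocksize=128, percdamp=0.01, groupsize=128)
gptq.free()                             # release Hessian buffers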
FILE: irqlora.py
function replace_to_qlora_model (line 18) | def replace_to_qlora_model(model, model_fp, blocksize2=256, tau_range=0....
function prod (line 22) | def prod(iterable):
function quantize_tensor (line 26) | def quantize_tensor(X, L, idx=False):
function dequantize_tensor (line 36) | def dequantize_tensor(X, L):
function nf4_quant (line 41) | def nf4_quant(weight, weight_shape, tau, compress_statistics, quant_type...
function evaluate_entropy (line 51) | def evaluate_entropy(weight_int8, blocksize):
function search (line 64) | def search(fp_weight: Tensor, fp_weight_shape, compress_statistics, quan...
class IRQLoraLinear4bit (line 86) | class IRQLoraLinear4bit(bnb.nn.Linear4bit, LoraLayer):
method __init__ (line 87) | def __init__(
method forward (line 118) | def forward(self, x: torch.Tensor):
function _replace_with_ours_lora_4bit_linear (line 159) | def _replace_with_ours_lora_4bit_linear(
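For orientation, a self-contained conceptual sketch (not the repo implementation) of what the quantize_tensor / dequantize_tensor pair above does: snap each value to the nearest level of a sorted codebook L and map stored indices back to floats. The 16-level codebook stands in for the NF4 levels consumed by nf4_quant.

import torch

def quantize_to_codebook(X: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
    # nearest codebook level per element -> integer indices
    return (X.unsqueeze(-1) - L).abs().argmin(dim=-1)

def dequantize_from_codebook(idx: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
    return L[idx]

L = torch.linspace(-1.0, 1.0, 16)       # hypothetical 4-bit levels
X = torch.randn(8, 8).clamp(-1.0, 1.0)
X_hat = dequantize_from_codebook(quantize_to_codebook(X, L), L)
print((X - X_hat).abs().max())          # at most half the level spacing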
FILE: llama.py
function get_llama (line 13) | def get_llama(model):
function llama_sequential (line 28) | def llama_sequential(model, dataloader, dev):
function llama_eval (line 174) | def llama_eval(model, testenc, dev):
function llama_pack (line 265) | def llama_pack(model, quantizers, wbits, groupsize):
function load_quant (line 279) | def load_quant(model, checkpoint, wbits, groupsize=-1, fused_mlp=True, e...
function llama_multigpu (line 328) | def llama_multigpu(model, gpus, gpu_dist):
function benchmark (line 385) | def benchmark(model, input_ids, check=False):
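A hedged end-to-end sketch of the pipeline these entry points suggest: load the FP16 model, run GPTQ block by block on calibration data, check perplexity, then pack. Return values, keyword names, and the checkpoint id are assumptions; the repo's own entry point wires the real arguments.

import torch
from llama import get_llama, llama_sequential, llama_eval, llama_pack
from datautils import get_loaders

MODEL = "meta-llama/Meta-Llama-3-8B"    # illustrative checkpoint
model = get_llama(MODEL)
dataloader, testenc = get_loaders("c4", nsamples=128, seed=0,
                                  seqlen=2048, model=MODEL)

dev = torch.device("cuda:0")
quantizers = llama_sequential(model, dataloader, dev)   # GPTQ over all blocks
llama_eval(model, testenc, dev)                         # perplexity check
llama_pack(model, quantizers, 4, 128)                   # wbits=4, groupsize=128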
FILE: lm_eval/base.py
class LM (line 20) | class LM(abc.ABC):
method __init__ (line 21) | def __init__(self):
method loglikelihood (line 25) | def loglikelihood(self, requests):
method loglikelihood_rolling (line 49) | def loglikelihood_rolling(self, requests):
method greedy_until (line 92) | def greedy_until(self, requests):
method create_from_arg_string (line 110) | def create_from_arg_string(cls, arg_string, additional_config=None):
method set_cache_hook (line 116) | def set_cache_hook(self, cache_hook):
class BaseLM (line 120) | class BaseLM(LM):
method eot_token_id (line 123) | def eot_token_id(self):
method max_length (line 128) | def max_length(self):
method max_gen_toks (line 133) | def max_gen_toks(self):
method batch_size (line 138) | def batch_size(self):
method device (line 143) | def device(self):
method tok_encode (line 147) | def tok_encode(self, string: str):
method tok_decode (line 151) | def tok_decode(self, tokens: Iterable[int]):
method _model_generate (line 155) | def _model_generate(self, context, max_length, eos_token_id):
method _model_call (line 159) | def _model_call(self, inps):
method loglikelihood (line 172) | def loglikelihood(self, requests):
method loglikelihood_rolling (line 187) | def loglikelihood_rolling(self, requests):
method _loglikelihood_tokens (line 221) | def _loglikelihood_tokens(self, requests, disable_tqdm=False):
method greedy_until (line 332) | def greedy_until(self, requests):
class Task (line 372) | class Task(abc.ABC):
method __init__ (line 389) | def __init__(self, data_dir=None, cache_dir=None, download_mode=None):
method download (line 416) | def download(self, data_dir=None, cache_dir=None, download_mode=None):
method should_decontaminate (line 449) | def should_decontaminate(self):
method has_training_docs (line 454) | def has_training_docs(self):
method has_validation_docs (line 459) | def has_validation_docs(self):
method has_test_docs (line 464) | def has_test_docs(self):
method training_docs (line 468) | def training_docs(self):
method validation_docs (line 475) | def validation_docs(self):
method test_docs (line 482) | def test_docs(self):
method _process_doc (line 489) | def _process_doc(self, doc):
method fewshot_examples (line 500) | def fewshot_examples(self, k, rnd):
method doc_to_decontamination_query (line 506) | def doc_to_decontamination_query(self, doc):
method doc_to_text (line 513) | def doc_to_text(self, doc):
method doc_to_target (line 517) | def doc_to_target(self, doc):
method construct_requests (line 521) | def construct_requests(self, doc, ctx):
method process_results (line 535) | def process_results(self, doc, results):
method aggregation (line 548) | def aggregation(self):
method higher_is_better (line 557) | def higher_is_better(self):
method fewshot_description (line 565) | def fewshot_description(self):
method fewshot_context (line 576) | def fewshot_context(
class MultipleChoiceTask (line 645) | class MultipleChoiceTask(Task):
method doc_to_target (line 646) | def doc_to_target(self, doc):
method construct_requests (line 649) | def construct_requests(self, doc, ctx):
method process_results (line 656) | def process_results(self, doc, results):
method higher_is_better (line 668) | def higher_is_better(self):
method aggregation (line 674) | def aggregation(self):
class PerplexityTask (line 681) | class PerplexityTask(Task, abc.ABC):
method should_decontaminate (line 682) | def should_decontaminate(self):
method has_training_docs (line 686) | def has_training_docs(self):
method fewshot_examples (line 689) | def fewshot_examples(self, k, rnd):
method fewshot_context (line 693) | def fewshot_context(
method higher_is_better (line 715) | def higher_is_better(self):
method doc_to_decontamination_query (line 722) | def doc_to_decontamination_query(self, doc):
method doc_to_text (line 725) | def doc_to_text(self, doc):
method doc_to_target (line 728) | def doc_to_target(self, doc):
method construct_requests (line 731) | def construct_requests(self, doc, ctx):
method process_results (line 736) | def process_results(self, doc, results):
method aggregation (line 746) | def aggregation(self):
method count_bytes (line 754) | def count_bytes(cls, doc):
method count_words (line 758) | def count_words(cls, doc):
function hash_args (line 763) | def hash_args(attr, args):
class CacheHook (line 768) | class CacheHook:
method __init__ (line 769) | def __init__(self, cachinglm):
method add_partial (line 776) | def add_partial(self, attr, req, res):
class CachingLM (line 783) | class CachingLM:
method __init__ (line 784) | def __init__(self, lm, cache_db):
method __getattr__ (line 801) | def __getattr__(self, attr):
method get_cache_hook (line 839) | def get_cache_hook(self):
class Request (line 850) | class Request:
method __init__ (line 851) | def __init__(self, request_type, args, index=None):
method __iter__ (line 861) | def __iter__(self):
method __getitem__ (line 867) | def __getitem__(self, i):
method __eq__ (line 872) | def __eq__(self, other):
method __repr__ (line 879) | def __repr__(self):
class RequestFactory (line 883) | class RequestFactory:
method __getattr__ (line 884) | def __getattr__(self, attr):
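The Task hierarchy above is the extension point for new benchmarks. A hedged sketch of the contract: implement the doc accessors plus doc_to_text / doc_to_target / construct_requests / process_results / aggregation / higher_is_better. The dataset path and fields are hypothetical, and rf is assumed to be the module-level RequestFactory, as in the upstream harness.

from lm_eval.base import Task, rf
from lm_eval.metrics import mean

class MyYesNoTask(Task):
    VERSION = 0
    DATASET_PATH = "my_org/my_yesno_dataset"   # hypothetical HF dataset

    def has_training_docs(self):
        return False

    def has_validation_docs(self):
        return True

    def has_test_docs(self):
        return False

    def validation_docs(self):
        return self.dataset["validation"]      # populated by Task.download

    def doc_to_text(self, doc):
        return "Question: {}\nAnswer:".format(doc["question"])

    def doc_to_target(self, doc):
        return " yes" if doc["label"] else " no"

    def construct_requests(self, doc, ctx):
        ll_yes, _ = rf.loglikelihood(ctx, " yes")
        ll_no, _ = rf.loglikelihood(ctx, " no")
        return ll_yes, ll_no

    def process_results(self, doc, results):
        ll_yes, ll_no = results
        return {"acc": (ll_yes > ll_no) == bool(doc["label"])}

    def aggregation(self):
        return {"acc": mean}

    def higher_is_better(self):
        return {"acc": True}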
FILE: lm_eval/datasets/asdiv/asdiv.py
class ASDiv (line 52) | class ASDiv(datasets.GeneratorBasedBuilder):
method _info (line 65) | def _info(self):
method _split_generators (line 83) | def _split_generators(self, dl_manager):
method _generate_examples (line 101) | def _generate_examples(self, filepath, split):
FILE: lm_eval/datasets/coqa/coqa.py
class Coqa (line 88) | class Coqa(datasets.GeneratorBasedBuilder):
method _info (line 99) | def _info(self):
method _split_generators (line 159) | def _split_generators(self, dl_manager):
method _generate_examples (line 182) | def _generate_examples(self, filepath, split):
FILE: lm_eval/datasets/drop/drop.py
class Drop (line 68) | class Drop(datasets.GeneratorBasedBuilder):
method _info (line 79) | def _info(self):
method _split_generators (line 120) | def _split_generators(self, dl_manager):
method _generate_examples (line 147) | def _generate_examples(self, filepath, split):
FILE: lm_eval/datasets/headqa/headqa.py
class HeadQA (line 61) | class HeadQA(datasets.GeneratorBasedBuilder):
method _info (line 77) | def _info(self):
method _split_generators (line 102) | def _split_generators(self, dl_manager):
method _generate_examples (line 133) | def _generate_examples(self, data_dir, filepath):
FILE: lm_eval/datasets/hendrycks_ethics/hendrycks_ethics.py
class EthicsConfig (line 50) | class EthicsConfig(datasets.BuilderConfig):
method __init__ (line 53) | def __init__(self, prefix, features, **kwargs):
class HendrycksEthics (line 67) | class HendrycksEthics(datasets.GeneratorBasedBuilder):
method _info (line 136) | def _info(self):
method _split_generators (line 145) | def _split_generators(self, dl_manager):
method _generate_examples (line 178) | def _generate_examples(self, filepath, split):
FILE: lm_eval/datasets/hendrycks_math/hendrycks_math.py
class HendrycksMath (line 57) | class HendrycksMath(datasets.GeneratorBasedBuilder):
method _info (line 67) | def _info(self):
method _split_generators (line 84) | def _split_generators(self, dl_manager):
method _generate_examples (line 111) | def _generate_examples(self, basepath, split):
FILE: lm_eval/datasets/logiqa/logiqa.py
class Logiqa (line 51) | class Logiqa(datasets.GeneratorBasedBuilder):
method _info (line 62) | def _info(self):
method _split_generators (line 79) | def _split_generators(self, dl_manager):
method _generate_examples (line 111) | def _generate_examples(self, filepath, split):
FILE: lm_eval/datasets/mutual/mutual.py
class Mutual (line 47) | class Mutual(datasets.GeneratorBasedBuilder):
method _info (line 63) | def _info(self):
method _split_generators (line 80) | def _split_generators(self, dl_manager):
method _generate_examples (line 117) | def _generate_examples(self, basepath, split):
FILE: lm_eval/datasets/pile/pile.py
class Pile (line 75) | class Pile(datasets.GeneratorBasedBuilder):
method _info (line 85) | def _info(self):
method _split_generators (line 99) | def _split_generators(self, dl_manager):
method _generate_examples (line 119) | def _generate_examples(self, filepath, split):
FILE: lm_eval/datasets/quac/quac.py
class Quac (line 51) | class Quac(datasets.GeneratorBasedBuilder):
method _info (line 62) | def _info(self):
method _split_generators (line 80) | def _split_generators(self, dl_manager):
method _generate_examples (line 100) | def _generate_examples(self, filepath, split):
FILE: lm_eval/datasets/sat_analogies/sat_analogies.py
class SatAnalogies (line 46) | class SatAnalogies(datasets.GeneratorBasedBuilder):
method manual_download_instructions (line 60) | def manual_download_instructions(self):
method _info (line 69) | def _info(self):
method _split_generators (line 86) | def _split_generators(self, dl_manager):
method _generate_examples (line 103) | def _generate_examples(self, filepath):
FILE: lm_eval/datasets/triviaqa/triviaqa.py
class Triviaqa (line 52) | class Triviaqa(datasets.GeneratorBasedBuilder):
method _info (line 63) | def _info(self):
method _split_generators (line 95) | def _split_generators(self, dl_manager):
method _generate_examples (line 120) | def _generate_examples(self, filepath):
FILE: lm_eval/datasets/unscramble/unscramble.py
class Unscramble (line 61) | class Unscramble(datasets.GeneratorBasedBuilder):
method _info (line 73) | def _info(self):
method _split_generators (line 88) | def _split_generators(self, dl_manager):
method _generate_examples (line 103) | def _generate_examples(self, filepath, split):
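All of the dataset scripts above follow the same datasets.GeneratorBasedBuilder contract: _info() declares the schema, _split_generators() maps downloaded files onto splits, and _generate_examples() yields (key, example) pairs. A minimal hedged sketch with a hypothetical schema and URL:

import json
import datasets

class ToyBuilder(datasets.GeneratorBasedBuilder):
    VERSION = datasets.Version("0.0.1")

    def _info(self):
        return datasets.DatasetInfo(
            description="Toy one-field dataset (illustrative only).",
            features=datasets.Features({"text": datasets.Value("string")}),
        )

    def _split_generators(self, dl_manager):
        path = dl_manager.download("https://example.com/toy.jsonl")  # hypothetical
        return [datasets.SplitGenerator(name=datasets.Split.TEST,
                                        gen_kwargs={"filepath": path})]

    def _generate_examples(self, filepath):
        with open(filepath, encoding="utf-8") as f:
            for key, line in enumerate(f):
                yield key, {"text": json.loads(line)["text"]}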
FILE: lm_eval/decontamination/archiver.py
function json_serial (line 12) | def json_serial(obj):
class Archive (line 21) | class Archive:
method __init__ (line 22) | def __init__(self, file_path, compression_level=3):
method add_data (line 31) | def add_data(self, data, meta={}):
method commit (line 39) | def commit(self):
class Reader (line 46) | class Reader:
method __init__ (line 47) | def __init__(self):
method read (line 50) | def read(self, file, get_meta=False, autojoin_paragraphs=True, para_jo...
class TextArchive (line 74) | class TextArchive:
method __init__ (line 75) | def __init__(self, file_path, mode="rb+"):
method add_data (line 86) | def add_data(self, data):
method commit (line 89) | def commit(self):
class TextReader (line 94) | class TextReader:
method __init__ (line 95) | def __init__(self, file_path):
method read_tqdm (line 100) | def read_tqdm(self, update_frequency=10000):
method read_and_tell (line 121) | def read_and_tell(self):
method read (line 132) | def read(self):
method read_slow (line 139) | def read_slow(self):
class ZStdTextReader (line 151) | class ZStdTextReader:
method __init__ (line 152) | def __init__(self, file):
method read_tqdm (line 155) | def read_tqdm(self):
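A hedged usage sketch for the readers above: stream documents out of a zstd-compressed JSONL archive such as the Pile validation split. The path is illustrative, and read_tqdm is assumed to yield one decoded line per record, as in the upstream harness.

from lm_eval.decontamination.archiver import ZStdTextReader

reader = ZStdTextReader("val.jsonl.zst")        # illustrative local path
for i, line in enumerate(reader.read_tqdm()):   # assumed: yields text lines
    print(line[:80])
    if i >= 2:
        break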
FILE: lm_eval/decontamination/decontaminate.py
function get_train_overlap_stub (line 14) | def get_train_overlap_stub(docs, ngrams_path, ngrams_n_size):
function get_train_overlap (line 36) | def get_train_overlap(docs_by_task_set, ngrams_path, limit):
FILE: lm_eval/decontamination/janitor.py
function form_ngrams (line 22) | def form_ngrams(sequence, n):
function word_ngrams (line 39) | def word_ngrams(s, n):
function split_indices (line 71) | def split_indices(s):
function word_ngrams_indices (line 78) | def word_ngrams_indices(s, n):
class Janitor (line 106) | class Janitor:
method __init__ (line 109) | def __init__(
method save_contamination_ngrams (line 138) | def save_contamination_ngrams(self, filename):
method load_contamination_ngrams (line 142) | def load_contamination_ngrams(self, filename):
method register_contaminant (line 150) | def register_contaminant(self, dirt_string):
method clean (line 159) | def clean(self, dirty_string):
method _split_chunks (line 169) | def _split_chunks(self, dirty_string, dirty_parts):
method register_contaminant_cpp (line 192) | def register_contaminant_cpp(self, dirt_string):
method clean_cpp (line 197) | def clean_cpp(self, dirty_string):
method normalize_string (line 207) | def normalize_string(self, s):
method register_contaminant_python (line 210) | def register_contaminant_python(self, dirt_string):
method clean_python (line 215) | def clean_python(self, dirty_string):
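A hedged usage sketch of Janitor: register benchmark text as a contaminant, then clean() a training document so any window overlapping a registered 13-gram is excised. The constructor keywords follow the upstream harness and are assumptions here.

from lm_eval.decontamination.janitor import Janitor

# Assumed keywords (upstream harness): 13-gram matching, with a small
# excision window and slice threshold so this toy example keeps its
# surviving clean chunks instead of discarding them as too short.
janitor = Janitor(ngram_n=13, window_to_remove=20, minimum_slice_length=5)

bench = ("the quick brown fox jumps over the lazy dog "
         "while seven sleepy wizards watch quietly nearby")
janitor.register_contaminant(bench)

doc = "Clean text before the hit. " + bench + " And clean text after it."
print(janitor.clean(doc))                       # surviving clean chunks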
FILE: lm_eval/evaluator copy.py
function simple_evaluate (line 17) | def simple_evaluate(
function evaluate (line 127) | def evaluate(
function make_table (line 311) | def make_table(result_dict):
FILE: lm_eval/evaluator.py
function pattern_match (line 11) | def pattern_match(patterns, source_list):
function simple_evaluate (line 18) | def simple_evaluate(
function evaluate (line 96) | def evaluate(
function make_table (line 284) | def make_table(result_dict):
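A hedged sketch of driving an evaluation through the two entry points above. The keyword names and the "hf-causal" model string follow the upstream lm-evaluation-harness; this fork may instead expect an already-constructed LM object, so treat the call as illustrative.

from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                               # upstream-style model name
    model_args="pretrained=meta-llama/Meta-Llama-3-8B",
    tasks=["piqa", "winogrande"],
    num_fewshot=0,
)
print(evaluator.make_table(results))                 # markdown summary table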
FILE: lm_eval/metrics.py
function mean (line 10) | def mean(arr):
function pop_stddev (line 14) | def pop_stddev(arr):
function sample_stddev (line 19) | def sample_stddev(arr):
function mean_stderr (line 24) | def mean_stderr(arr):
function median (line 28) | def median(arr):
function matthews_corrcoef (line 32) | def matthews_corrcoef(items):
function f1_score (line 39) | def f1_score(items):
function acc_all (line 48) | def acc_all(items):
function acc_all_stderr (line 67) | def acc_all_stderr(items):
function metric_max_over_ground_truths (line 85) | def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
function perplexity (line 94) | def perplexity(items):
function weighted_mean (line 98) | def weighted_mean(items):
function weighted_perplexity (line 103) | def weighted_perplexity(items):
function bits_per_byte (line 107) | def bits_per_byte(items):
function bleu (line 111) | def bleu(items):
function chrf (line 128) | def chrf(items):
function ter (line 142) | def ter(items):
function is_non_str_iterable (line 157) | def is_non_str_iterable(obj):
function _sacreformat (line 161) | def _sacreformat(refs, preds):
class _bootstrap_internal (line 192) | class _bootstrap_internal:
method __init__ (line 193) | def __init__(self, f, n):
method __call__ (line 197) | def __call__(self, v):
function bootstrap_stderr (line 207) | def bootstrap_stderr(f, xs, iters):
function stderr_for_metric (line 236) | def stderr_for_metric(metric, bootstrap_iters):
function yesno (line 255) | def yesno(x):
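A hedged worked example for the two weighted aggregations above: both consume (log-likelihood, length) pairs emitted by the perplexity tasks; weighted_perplexity exponentiates the negative per-word log-likelihood, and bits_per_byte rescales the per-byte value to log base 2.

from lm_eval.metrics import weighted_perplexity, bits_per_byte

# (summed log-likelihood, word or byte count) per document -- toy values
items = [(-120.0, 100), (-250.0, 180)]
print(weighted_perplexity(items))   # exp(370 / 280)        ~ 3.75
print(bits_per_byte(items))         # (370 / 280) / ln(2)   ~ 1.91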
FILE: lm_eval/models/__init__.py
function get_model (line 18) | def get_model(model_name):
FILE: lm_eval/models/dummy.py
class DummyLM (line 5) | class DummyLM(LM):
method __init__ (line 6) | def __init__(self):
method create_from_arg_string (line 10) | def create_from_arg_string(cls, arg_string, additional_config=None):
method loglikelihood (line 13) | def loglikelihood(self, requests):
method greedy_until (line 21) | def greedy_until(self, requests):
method loglikelihood_rolling (line 30) | def loglikelihood_rolling(self, requests):
FILE: lm_eval/models/gpt2.py
class HFLM (line 6) | class HFLM(BaseLM):
method __init__ (line 7) | def __init__(
method eot_token_id (line 81) | def eot_token_id(self):
method max_length (line 86) | def max_length(self):
method max_gen_toks (line 94) | def max_gen_toks(self):
method batch_size (line 98) | def batch_size(self):
method device (line 103) | def device(self):
method tok_encode (line 107) | def tok_encode(self, string: str):
method tok_decode (line 110) | def tok_decode(self, tokens):
method _model_call (line 113) | def _model_call(self, inps):
method _model_generate (line 124) | def _model_generate(self, context, max_length, eos_token_id):
FILE: lm_eval/models/gpt3.py
function get_result (line 10) | def get_result(response, ctxlen):
function oa_completion (line 38) | def oa_completion(**kwargs):
class GPT3LM (line 57) | class GPT3LM(BaseLM):
method __init__ (line 60) | def __init__(self, engine, truncate=False):
method eot_token_id (line 89) | def eot_token_id(self):
method max_length (line 93) | def max_length(self):
method max_gen_toks (line 98) | def max_gen_toks(self):
method batch_size (line 102) | def batch_size(self):
method device (line 107) | def device(self):
method tok_encode (line 111) | def tok_encode(self, string: str):
method tok_decode (line 114) | def tok_decode(self, tokens):
method _loglikelihood_tokens (line 117) | def _loglikelihood_tokens(self, requests, disable_tqdm=False):
method greedy_until (line 168) | def greedy_until(self, requests):
method _model_call (line 224) | def _model_call(self, inps):
method _model_generate (line 228) | def _model_generate(self, context, max_length, eos_token_id):
FILE: lm_eval/models/huggingface.py
function _get_accelerate_args (line 18) | def _get_accelerate_args(
function _get_dtype (line 43) | def _get_dtype(
class HuggingFaceAutoLM (line 57) | class HuggingFaceAutoLM(BaseLM):
method __init__ (line 66) | def __init__(
method _create_auto_model (line 190) | def _create_auto_model(
method _create_auto_tokenizer (line 212) | def _create_auto_tokenizer(
method add_special_tokens (line 229) | def add_special_tokens(self) -> bool:
method eot_token (line 249) | def eot_token(self) -> str:
method eot_token_id (line 253) | def eot_token_id(self) -> int:
method max_gen_toks (line 257) | def max_gen_toks(self) -> int:
method max_length (line 261) | def max_length(self) -> int:
method batch_size (line 283) | def batch_size(self) -> int:
method device (line 288) | def device(self) -> Union[int, str, torch.device]:
method tok_encode (line 291) | def tok_encode(self, string: str) -> TokenSequence:
method tok_encode_batch (line 295) | def tok_encode_batch(self, strings: List[str]) -> TokenSequence:
method tok_decode (line 303) | def tok_decode(self, tokens: torch.LongTensor) -> List[str]:
method greedy_until (line 306) | def greedy_until(self, requests: List[Tuple[str, dict]]) -> List[str]:
class AutoCausalLM (line 358) | class AutoCausalLM(HuggingFaceAutoLM):
method _create_auto_tokenizer (line 366) | def _create_auto_tokenizer(
method _model_call (line 383) | def _model_call(
method _model_generate (line 388) | def _model_generate(
class AutoSeq2SeqLM (line 422) | class AutoSeq2SeqLM(HuggingFaceAutoLM):
method max_length (line 431) | def max_length(self) -> int:
method loglikelihood (line 439) | def loglikelihood(
method loglikelihood_rolling (line 467) | def loglikelihood_rolling(self, requests: List[Tuple[str, str]]) -> Li...
method _loglikelihood_tokens (line 515) | def _loglikelihood_tokens(
method _model_call (line 551) | def _model_call(
method _model_generate (line 556) | def _model_generate(
class MultiTokenEOSCriteria (line 589) | class MultiTokenEOSCriteria(transformers.StoppingCriteria):
method __init__ (line 592) | def __init__(
method __call__ (line 606) | def __call__(self, input_ids, scores, **kwargs) -> bool:
function stop_sequences_criteria (line 620) | def stop_sequences_criteria(
FILE: lm_eval/models/textsynth.py
function textsynth_completion (line 25) | def textsynth_completion(**kwargs):
class TextSynthLM (line 41) | class TextSynthLM(BaseLM):
method __init__ (line 42) | def __init__(self, engine, truncate=False):
method eot_token_id (line 58) | def eot_token_id(self):
method max_length (line 63) | def max_length(self):
method max_gen_toks (line 68) | def max_gen_toks(self):
method batch_size (line 72) | def batch_size(self):
method device (line 77) | def device(self):
method tok_encode (line 81) | def tok_encode(self, string: str):
method tok_decode (line 85) | def tok_decode(self, tokens):
method loglikelihood (line 89) | def loglikelihood(self, requests):
method loglikelihood_rolling (line 109) | def loglikelihood_rolling(self, requests):
method greedy_until (line 119) | def greedy_until(self, requests):
method _model_call (line 149) | def _model_call(self, inps):
method _model_generate (line 153) | def _model_generate(self, context, max_length, eos_token_id):
FILE: lm_eval/quantizer/irqlora.py
function replace_to_qlora_model (line 18) | def replace_to_qlora_model(model, model_fp, blocksize2=256, tau_range=0....
function prod (line 22) | def prod(iterable):
function quantize_tensor (line 26) | def quantize_tensor(X, L, idx=False):
function dequantize_tensor (line 36) | def dequantize_tensor(X, L):
function nf4_quant (line 41) | def nf4_quant(weight, weight_shape, tau, compress_statistics, quant_type...
function evaluate_entropy (line 51) | def evaluate_entropy(weight_int8, blocksize):
function search (line 64) | def search(fp_weight: Tensor, fp_weight_shape, compress_statistics, quan...
class IRQLoraLinear4bit (line 86) | class IRQLoraLinear4bit(bnb.nn.Linear4bit, LoraLayer):
method __init__ (line 87) | def __init__(
method forward (line 118) | def forward(self, x: torch.Tensor):
function _replace_with_ours_lora_4bit_linear (line 159) | def _replace_with_ours_lora_4bit_linear(
FILE: lm_eval/tasks/__init__.py
function get_task (line 319) | def get_task(task_name):
function get_task_name_from_object (line 328) | def get_task_name_from_object(task_object):
function get_task_dict (line 341) | def get_task_dict(task_name_list: List[Union[str, lm_eval.base.Task]]):
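A hedged sketch of resolving task names through the registry helpers above: get_task_dict accepts task-name strings or Task instances and returns a name-to-Task mapping; the task names are illustrative.

from lm_eval.tasks import get_task_dict

task_dict = get_task_dict(["piqa", "hellaswag"])     # illustrative task names
for name, task in task_dict.items():
    print(name, task.has_validation_docs(), task.has_test_docs())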
FILE: lm_eval/tasks/anli.py
class ANLIBase (line 33) | class ANLIBase(Task):
method has_training_docs (line 39) | def has_training_docs(self):
method has_validation_docs (line 42) | def has_validation_docs(self):
method has_test_docs (line 45) | def has_test_docs(self):
method training_docs (line 48) | def training_docs(self):
method validation_docs (line 54) | def validation_docs(self):
method test_docs (line 58) | def test_docs(self):
method doc_to_text (line 62) | def doc_to_text(self, doc):
method should_decontaminate (line 74) | def should_decontaminate(self):
method doc_to_decontamination_query (line 77) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 80) | def doc_to_target(self, doc):
method construct_requests (line 86) | def construct_requests(self, doc, ctx):
method process_results (line 102) | def process_results(self, doc, results):
method aggregation (line 116) | def aggregation(self):
method higher_is_better (line 124) | def higher_is_better(self):
class ANLIRound1 (line 133) | class ANLIRound1(ANLIBase):
class ANLIRound2 (line 137) | class ANLIRound2(ANLIBase):
class ANLIRound3 (line 141) | class ANLIRound3(ANLIBase):
FILE: lm_eval/tasks/arc.py
class ARCEasy (line 29) | class ARCEasy(MultipleChoiceTask):
method has_training_docs (line 34) | def has_training_docs(self):
method has_validation_docs (line 37) | def has_validation_docs(self):
method has_test_docs (line 40) | def has_test_docs(self):
method training_docs (line 43) | def training_docs(self):
method validation_docs (line 48) | def validation_docs(self):
method test_docs (line 51) | def test_docs(self):
method _process_doc (line 54) | def _process_doc(self, doc):
method doc_to_text (line 67) | def doc_to_text(self, doc):
method should_decontaminate (line 70) | def should_decontaminate(self):
method doc_to_decontamination_query (line 73) | def doc_to_decontamination_query(self, doc):
class ARCChallenge (line 77) | class ARCChallenge(ARCEasy):
FILE: lm_eval/tasks/arithmetic.py
class Arithmetic (line 29) | class Arithmetic(Task):
method has_training_docs (line 33) | def has_training_docs(self):
method has_validation_docs (line 36) | def has_validation_docs(self):
method has_test_docs (line 39) | def has_test_docs(self):
method training_docs (line 42) | def training_docs(self):
method validation_docs (line 45) | def validation_docs(self):
method test_docs (line 48) | def test_docs(self):
method doc_to_text (line 51) | def doc_to_text(self, doc):
method should_decontaminate (line 54) | def should_decontaminate(self):
method doc_to_decontamination_query (line 57) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 60) | def doc_to_target(self, doc):
method construct_requests (line 63) | def construct_requests(self, doc, ctx):
method process_results (line 67) | def process_results(self, doc, results):
method aggregation (line 71) | def aggregation(self):
method higher_is_better (line 76) | def higher_is_better(self):
class Arithmetic2DPlus (line 80) | class Arithmetic2DPlus(Arithmetic):
class Arithmetic2DMinus (line 84) | class Arithmetic2DMinus(Arithmetic):
class Arithmetic3DPlus (line 88) | class Arithmetic3DPlus(Arithmetic):
class Arithmetic3DMinus (line 92) | class Arithmetic3DMinus(Arithmetic):
class Arithmetic4DPlus (line 96) | class Arithmetic4DPlus(Arithmetic):
class Arithmetic4DMinus (line 100) | class Arithmetic4DMinus(Arithmetic):
class Arithmetic5DPlus (line 104) | class Arithmetic5DPlus(Arithmetic):
class Arithmetic5DMinus (line 108) | class Arithmetic5DMinus(Arithmetic):
class Arithmetic2DMultiplication (line 112) | class Arithmetic2DMultiplication(Arithmetic):
class Arithmetic1DComposite (line 116) | class Arithmetic1DComposite(Arithmetic):
FILE: lm_eval/tasks/asdiv.py
class Asdiv (line 35) | class Asdiv(Task):
method has_training_docs (line 39) | def has_training_docs(self):
method has_validation_docs (line 42) | def has_validation_docs(self):
method has_test_docs (line 45) | def has_test_docs(self):
method training_docs (line 48) | def training_docs(self):
method validation_docs (line 51) | def validation_docs(self):
method test_docs (line 54) | def test_docs(self):
method fewshot_context (line 57) | def fewshot_context(
method doc_to_text (line 65) | def doc_to_text(self, doc):
method should_decontaminate (line 69) | def should_decontaminate(self):
method doc_to_decontamination_query (line 72) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 75) | def doc_to_target(self, doc):
method construct_requests (line 81) | def construct_requests(self, doc, ctx):
method process_results (line 85) | def process_results(self, doc, results):
method aggregation (line 90) | def aggregation(self):
method higher_is_better (line 93) | def higher_is_better(self):
FILE: lm_eval/tasks/blimp.py
class BlimpTask (line 34) | class BlimpTask(Task):
method has_training_docs (line 38) | def has_training_docs(self):
method has_validation_docs (line 41) | def has_validation_docs(self):
method has_test_docs (line 44) | def has_test_docs(self):
method validation_docs (line 47) | def validation_docs(self):
method fewshot_context (line 53) | def fewshot_context(
method doc_to_text (line 73) | def doc_to_text(self, doc):
method should_decontaminate (line 77) | def should_decontaminate(self):
method doc_to_decontamination_query (line 80) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 83) | def doc_to_target(self, doc):
method construct_requests (line 87) | def construct_requests(self, doc, ctx):
method process_results (line 97) | def process_results(self, doc, results):
method higher_is_better (line 107) | def higher_is_better(self):
method aggregation (line 112) | def aggregation(self):
class BlimpAdjunctIsland (line 118) | class BlimpAdjunctIsland(BlimpTask):
class BlimpAnaphorGenderAgreement (line 122) | class BlimpAnaphorGenderAgreement(BlimpTask):
class BlimpAnaphorNumberAgreement (line 126) | class BlimpAnaphorNumberAgreement(BlimpTask):
class BlimpAnimateSubjectPassive (line 130) | class BlimpAnimateSubjectPassive(BlimpTask):
class BlimpAnimateSubjectTrans (line 134) | class BlimpAnimateSubjectTrans(BlimpTask):
class BlimpCausative (line 138) | class BlimpCausative(BlimpTask):
class BlimpComplex_NPIsland (line 142) | class BlimpComplex_NPIsland(BlimpTask):
class BlimpCoordinateStructureConstraintComplexLeftBranch (line 146) | class BlimpCoordinateStructureConstraintComplexLeftBranch(BlimpTask):
class BlimpCoordinateStructureConstraintObjectExtraction (line 150) | class BlimpCoordinateStructureConstraintObjectExtraction(BlimpTask):
class BlimpDeterminerNounAgreement_1 (line 154) | class BlimpDeterminerNounAgreement_1(BlimpTask):
class BlimpDeterminerNounAgreement_2 (line 158) | class BlimpDeterminerNounAgreement_2(BlimpTask):
class BlimpDeterminerNounAgreementIrregular_1 (line 162) | class BlimpDeterminerNounAgreementIrregular_1(BlimpTask):
class BlimpDeterminerNounAgreementIrregular_2 (line 166) | class BlimpDeterminerNounAgreementIrregular_2(BlimpTask):
class BlimpDeterminerNounAgreementWithAdj_2 (line 170) | class BlimpDeterminerNounAgreementWithAdj_2(BlimpTask):
class BlimpDeterminerNounAgreementWithAdjIrregular_1 (line 174) | class BlimpDeterminerNounAgreementWithAdjIrregular_1(BlimpTask):
class BlimpDeterminerNounAgreementWithAdjIrregular_2 (line 178) | class BlimpDeterminerNounAgreementWithAdjIrregular_2(BlimpTask):
class BlimpDeterminerNounAgreementWithAdjective_1 (line 182) | class BlimpDeterminerNounAgreementWithAdjective_1(BlimpTask):
class BlimpDistractorAgreementRelationalNoun (line 186) | class BlimpDistractorAgreementRelationalNoun(BlimpTask):
class BlimpDistractorAgreementRelativeClause (line 190) | class BlimpDistractorAgreementRelativeClause(BlimpTask):
class BlimpDropArgument (line 194) | class BlimpDropArgument(BlimpTask):
class BlimpEllipsisNBar_1 (line 198) | class BlimpEllipsisNBar_1(BlimpTask):
class BlimpEllipsisNBar_2 (line 202) | class BlimpEllipsisNBar_2(BlimpTask):
class BlimpExistentialThereObjectRaising (line 206) | class BlimpExistentialThereObjectRaising(BlimpTask):
class BlimpExistentialThereQuantifiers_1 (line 210) | class BlimpExistentialThereQuantifiers_1(BlimpTask):
class BlimpExistentialThereQuantifiers_2 (line 214) | class BlimpExistentialThereQuantifiers_2(BlimpTask):
class BlimpExistentialThereSubjectRaising (line 218) | class BlimpExistentialThereSubjectRaising(BlimpTask):
class BlimpExpletiveItObjectRaising (line 222) | class BlimpExpletiveItObjectRaising(BlimpTask):
class BlimpInchoative (line 226) | class BlimpInchoative(BlimpTask):
class BlimpIntransitive (line 230) | class BlimpIntransitive(BlimpTask):
class BlimpIrregularPastParticipleAdjectives (line 234) | class BlimpIrregularPastParticipleAdjectives(BlimpTask):
class BlimpIrregularPastParticipleVerbs (line 238) | class BlimpIrregularPastParticipleVerbs(BlimpTask):
class BlimpIrregularPluralSubjectVerbAgreement_1 (line 242) | class BlimpIrregularPluralSubjectVerbAgreement_1(BlimpTask):
class BlimpIrregularPluralSubjectVerbAgreement_2 (line 246) | class BlimpIrregularPluralSubjectVerbAgreement_2(BlimpTask):
class BlimpLeftBranchIslandEchoQuestion (line 250) | class BlimpLeftBranchIslandEchoQuestion(BlimpTask):
class BlimpLeftBranchIslandSimpleQuestion (line 254) | class BlimpLeftBranchIslandSimpleQuestion(BlimpTask):
class BlimpMatrixQuestionNpiLicensorPresent (line 258) | class BlimpMatrixQuestionNpiLicensorPresent(BlimpTask):
class BlimpNpiPresent_1 (line 262) | class BlimpNpiPresent_1(BlimpTask):
class BlimpNpiPresent_2 (line 266) | class BlimpNpiPresent_2(BlimpTask):
class BlimpOnlyNpiLicensorPresent (line 270) | class BlimpOnlyNpiLicensorPresent(BlimpTask):
class BlimpOnlyNpiScope (line 274) | class BlimpOnlyNpiScope(BlimpTask):
class BlimpPassive_1 (line 278) | class BlimpPassive_1(BlimpTask):
class BlimpPassive_2 (line 282) | class BlimpPassive_2(BlimpTask):
class BlimpPrinciple_ACCommand (line 286) | class BlimpPrinciple_ACCommand(BlimpTask):
class BlimpPrinciple_ACase_1 (line 290) | class BlimpPrinciple_ACase_1(BlimpTask):
class BlimpPrinciple_ACase_2 (line 294) | class BlimpPrinciple_ACase_2(BlimpTask):
class BlimpPrinciple_ADomain_1 (line 298) | class BlimpPrinciple_ADomain_1(BlimpTask):
class BlimpPrinciple_ADomain_2 (line 302) | class BlimpPrinciple_ADomain_2(BlimpTask):
class BlimpPrinciple_ADomain_3 (line 306) | class BlimpPrinciple_ADomain_3(BlimpTask):
class BlimpPrinciple_AReconstruction (line 310) | class BlimpPrinciple_AReconstruction(BlimpTask):
class BlimpRegularPluralSubjectVerbAgreement_1 (line 314) | class BlimpRegularPluralSubjectVerbAgreement_1(BlimpTask):
class BlimpRegularPluralSubjectVerbAgreement_2 (line 318) | class BlimpRegularPluralSubjectVerbAgreement_2(BlimpTask):
class BlimpSententialNegationNpiLicensorPresent (line 322) | class BlimpSententialNegationNpiLicensorPresent(BlimpTask):
class BlimpSententialNegationNpiScope (line 326) | class BlimpSententialNegationNpiScope(BlimpTask):
class BlimpSententialSubjectIsland (line 330) | class BlimpSententialSubjectIsland(BlimpTask):
class BlimpSuperlativeQuantifiers_1 (line 334) | class BlimpSuperlativeQuantifiers_1(BlimpTask):
class BlimpSuperlativeQuantifiers_2 (line 338) | class BlimpSuperlativeQuantifiers_2(BlimpTask):
class BlimpToughVsRaising_1 (line 342) | class BlimpToughVsRaising_1(BlimpTask):
class BlimpToughVsRaising_2 (line 346) | class BlimpToughVsRaising_2(BlimpTask):
class BlimpTransitive (line 350) | class BlimpTransitive(BlimpTask):
class BlimpWhIsland (line 354) | class BlimpWhIsland(BlimpTask):
class BlimpWhQuestionsObjectGap (line 358) | class BlimpWhQuestionsObjectGap(BlimpTask):
class BlimpWhQuestionsSubjectGap (line 362) | class BlimpWhQuestionsSubjectGap(BlimpTask):
class BlimpWhQuestionsSubjectGapLongDistance (line 366) | class BlimpWhQuestionsSubjectGapLongDistance(BlimpTask):
class BlimpWhVsThatNoGap (line 370) | class BlimpWhVsThatNoGap(BlimpTask):
class BlimpWhVsThatNoGapLongDistance (line 374) | class BlimpWhVsThatNoGapLongDistance(BlimpTask):
class BlimpWhVsThatWithGap (line 378) | class BlimpWhVsThatWithGap(BlimpTask):
class BlimpWhVsThatWithGapLongDistance (line 382) | class BlimpWhVsThatWithGapLongDistance(BlimpTask):
FILE: lm_eval/tasks/cbt.py
class CBTBase (line 32) | class CBTBase(Task):
method has_training_docs (line 37) | def has_training_docs(self):
method has_validation_docs (line 40) | def has_validation_docs(self):
method has_test_docs (line 43) | def has_test_docs(self):
method training_docs (line 46) | def training_docs(self):
method validation_docs (line 51) | def validation_docs(self):
method test_docs (line 54) | def test_docs(self):
method detokenize (line 57) | def detokenize(self, text):
method doc_to_text (line 73) | def doc_to_text(self, doc):
method should_decontaminate (line 78) | def should_decontaminate(self):
method doc_to_decontamination_query (line 81) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 85) | def doc_to_target(self, doc):
method fewshot_examples (line 88) | def fewshot_examples(self, k, rnd):
method construct_requests (line 94) | def construct_requests(self, doc, ctx):
method process_results (line 113) | def process_results(self, doc, results):
method aggregation (line 127) | def aggregation(self):
method higher_is_better (line 135) | def higher_is_better(self):
class CBTCN (line 144) | class CBTCN(CBTBase):
class CBTNE (line 148) | class CBTNE(CBTBase):
FILE: lm_eval/tasks/coqa.py
class CoQA (line 31) | class CoQA(Task):
method has_training_docs (line 36) | def has_training_docs(self):
method has_validation_docs (line 39) | def has_validation_docs(self):
method has_test_docs (line 42) | def has_test_docs(self):
method training_docs (line 45) | def training_docs(self):
method validation_docs (line 48) | def validation_docs(self):
method test_docs (line 51) | def test_docs(self):
method doc_to_text (line 54) | def doc_to_text(self, doc):
method should_decontaminate (line 66) | def should_decontaminate(self):
method doc_to_decontamination_query (line 69) | def doc_to_decontamination_query(self, doc):
method get_answers (line 73) | def get_answers(cls, doc, turn_id):
method get_answer_choice (line 90) | def get_answer_choice(self, raw_text):
method compute_scores (line 104) | def compute_scores(gold_list, pred):
method doc_to_target (line 126) | def doc_to_target(self, doc, turnid=None):
method construct_requests (line 133) | def construct_requests(self, doc, ctx):
method process_results (line 147) | def process_results(self, doc, results):
method higher_is_better (line 168) | def higher_is_better(self):
method aggregation (line 174) | def aggregation(self):
FILE: lm_eval/tasks/crowspairs.py
class CrowsPairsMutilingual (line 55) | class CrowsPairsMutilingual(Task):
method has_training_docs (line 60) | def has_training_docs(self):
method has_validation_docs (line 63) | def has_validation_docs(self):
method has_test_docs (line 66) | def has_test_docs(self):
method validation_docs (line 69) | def validation_docs(self):
method fewshot_context (line 77) | def fewshot_context(
method doc_to_text (line 97) | def doc_to_text(self, doc):
method should_decontaminate (line 101) | def should_decontaminate(self):
method doc_to_decontamination_query (line 104) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 107) | def doc_to_target(self, doc):
method construct_requests (line 111) | def construct_requests(self, doc, ctx):
method process_results (line 121) | def process_results(self, doc, results):
method higher_is_better (line 133) | def higher_is_better(self):
method aggregation (line 137) | def aggregation(self):
class CrowsPairsEnglish (line 141) | class CrowsPairsEnglish(CrowsPairsMutilingual):
class CrowsPairsFrench (line 145) | class CrowsPairsFrench(CrowsPairsMutilingual):
class CrowsPairsEnglishRaceColor (line 149) | class CrowsPairsEnglishRaceColor(CrowsPairsMutilingual):
class CrowsPairsEnglishSocioeconomic (line 154) | class CrowsPairsEnglishSocioeconomic(CrowsPairsMutilingual):
class CrowsPairsEnglishGender (line 159) | class CrowsPairsEnglishGender(CrowsPairsMutilingual):
class CrowsPairsEnglishAge (line 164) | class CrowsPairsEnglishAge(CrowsPairsMutilingual):
class CrowsPairsEnglishReligion (line 169) | class CrowsPairsEnglishReligion(CrowsPairsMutilingual):
class CrowsPairsEnglishDisability (line 174) | class CrowsPairsEnglishDisability(CrowsPairsMutilingual):
class CrowsPairsEnglishSexualOrientation (line 179) | class CrowsPairsEnglishSexualOrientation(CrowsPairsMutilingual):
class CrowsPairsEnglishNationality (line 184) | class CrowsPairsEnglishNationality(CrowsPairsMutilingual):
class CrowsPairsEnglishPhysicalAppearance (line 189) | class CrowsPairsEnglishPhysicalAppearance(CrowsPairsMutilingual):
class CrowsPairsEnglishAutre (line 194) | class CrowsPairsEnglishAutre(CrowsPairsMutilingual):
class CrowsPairsFrenchRaceColor (line 199) | class CrowsPairsFrenchRaceColor(CrowsPairsMutilingual):
class CrowsPairsFrenchSocioeconomic (line 204) | class CrowsPairsFrenchSocioeconomic(CrowsPairsMutilingual):
class CrowsPairsFrenchGender (line 209) | class CrowsPairsFrenchGender(CrowsPairsMutilingual):
class CrowsPairsFrenchAge (line 214) | class CrowsPairsFrenchAge(CrowsPairsMutilingual):
class CrowsPairsFrenchReligion (line 219) | class CrowsPairsFrenchReligion(CrowsPairsMutilingual):
class CrowsPairsFrenchDisability (line 224) | class CrowsPairsFrenchDisability(CrowsPairsMutilingual):
class CrowsPairsFrenchSexualOrientation (line 229) | class CrowsPairsFrenchSexualOrientation(CrowsPairsMutilingual):
class CrowsPairsFrenchNationality (line 234) | class CrowsPairsFrenchNationality(CrowsPairsMutilingual):
class CrowsPairsFrenchPhysicalAppearance (line 239) | class CrowsPairsFrenchPhysicalAppearance(CrowsPairsMutilingual):
class CrowsPairsFrenchAutre (line 244) | class CrowsPairsFrenchAutre(CrowsPairsMutilingual):
FILE: lm_eval/tasks/drop.py
class DROP (line 40) | class DROP(Task):
method has_training_docs (line 45) | def has_training_docs(self):
method has_validation_docs (line 48) | def has_validation_docs(self):
method has_test_docs (line 51) | def has_test_docs(self):
method training_docs (line 54) | def training_docs(self):
method validation_docs (line 59) | def validation_docs(self):
method _process_doc (line 62) | def _process_doc(self, doc):
method get_answers (line 71) | def get_answers(cls, qa):
method parse_answer (line 102) | def parse_answer(cls, answer):
method doc_to_text (line 114) | def doc_to_text(self, doc):
method should_decontaminate (line 117) | def should_decontaminate(self):
method doc_to_decontamination_query (line 120) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 123) | def doc_to_target(self, doc):
method construct_requests (line 126) | def construct_requests(self, doc, ctx):
method process_results (line 140) | def process_results(self, doc, results):
method get_metrics (line 160) | def get_metrics(self, predicted, gold):
method _answer_to_bags (line 183) | def _answer_to_bags(self, answer):
method _align_bags (line 196) | def _align_bags(self, predicted, gold):
method _compute_f1 (line 215) | def _compute_f1(self, predicted_bag, gold_bag):
method _match_numbers_if_present (line 232) | def _match_numbers_if_present(self, gold_bag, predicted_bag):
method _is_number (line 245) | def _is_number(self, text):
method _remove_articles (line 252) | def _remove_articles(self, text):
method _white_space_fix (line 255) | def _white_space_fix(self, text):
method _remove_punc (line 258) | def _remove_punc(self, text):
method _fix_number (line 265) | def _fix_number(self, text):
method _tokenize (line 268) | def _tokenize(self, text):
method _normalize (line 271) | def _normalize(self, answer):
method aggregation (line 284) | def aggregation(self):
method higher_is_better (line 292) | def higher_is_better(self):
FILE: lm_eval/tasks/glue.py
class CoLA (line 48) | class CoLA(Task):
method has_training_docs (line 53) | def has_training_docs(self):
method has_validation_docs (line 56) | def has_validation_docs(self):
method has_test_docs (line 59) | def has_test_docs(self):
method training_docs (line 62) | def training_docs(self):
method validation_docs (line 67) | def validation_docs(self):
method doc_to_text (line 70) | def doc_to_text(self, doc):
method should_decontaminate (line 75) | def should_decontaminate(self):
method doc_to_decontamination_query (line 78) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 81) | def doc_to_target(self, doc):
method construct_requests (line 84) | def construct_requests(self, doc, ctx):
method process_results (line 89) | def process_results(self, doc, results):
method higher_is_better (line 95) | def higher_is_better(self):
method aggregation (line 98) | def aggregation(self):
class SST (line 102) | class SST(Task):
method has_training_docs (line 107) | def has_training_docs(self):
method has_validation_docs (line 110) | def has_validation_docs(self):
method has_test_docs (line 113) | def has_test_docs(self):
method training_docs (line 116) | def training_docs(self):
method validation_docs (line 121) | def validation_docs(self):
method doc_to_text (line 124) | def doc_to_text(self, doc):
method doc_to_target (line 129) | def doc_to_target(self, doc):
method construct_requests (line 132) | def construct_requests(self, doc, ctx):
method process_results (line 137) | def process_results(self, doc, results):
method higher_is_better (line 143) | def higher_is_better(self):
method aggregation (line 146) | def aggregation(self):
class MNLI (line 153) | class MNLI(Task):
method has_training_docs (line 158) | def has_training_docs(self):
method has_validation_docs (line 161) | def has_validation_docs(self):
method has_test_docs (line 164) | def has_test_docs(self):
method training_docs (line 167) | def training_docs(self):
method validation_docs (line 172) | def validation_docs(self):
method test_docs (line 176) | def test_docs(self):
method doc_to_text (line 180) | def doc_to_text(self, doc):
method doc_to_target (line 187) | def doc_to_target(self, doc):
method construct_requests (line 193) | def construct_requests(self, doc, ctx):
method process_results (line 199) | def process_results(self, doc, results):
method higher_is_better (line 204) | def higher_is_better(self):
method aggregation (line 207) | def aggregation(self):
class MNLIMismatched (line 211) | class MNLIMismatched(MNLI):
method validation_docs (line 214) | def validation_docs(self):
method test_docs (line 218) | def test_docs(self):
class QNLI (line 223) | class QNLI(Task):
method has_training_docs (line 228) | def has_training_docs(self):
method has_validation_docs (line 231) | def has_validation_docs(self):
method has_test_docs (line 234) | def has_test_docs(self):
method training_docs (line 237) | def training_docs(self):
method validation_docs (line 242) | def validation_docs(self):
method doc_to_text (line 245) | def doc_to_text(self, doc):
method doc_to_target (line 253) | def doc_to_target(self, doc):
method construct_requests (line 258) | def construct_requests(self, doc, ctx):
method process_results (line 263) | def process_results(self, doc, results):
method higher_is_better (line 269) | def higher_is_better(self):
method aggregation (line 272) | def aggregation(self):
class WNLI (line 276) | class WNLI(Task):
method has_training_docs (line 281) | def has_training_docs(self):
method has_validation_docs (line 284) | def has_validation_docs(self):
method has_test_docs (line 287) | def has_test_docs(self):
method training_docs (line 290) | def training_docs(self):
method validation_docs (line 295) | def validation_docs(self):
method doc_to_text (line 298) | def doc_to_text(self, doc):
method doc_to_target (line 304) | def doc_to_target(self, doc):
method construct_requests (line 309) | def construct_requests(self, doc, ctx):
method process_results (line 314) | def process_results(self, doc, results):
method higher_is_better (line 320) | def higher_is_better(self):
method aggregation (line 323) | def aggregation(self):
class RTE (line 327) | class RTE(Task):
method has_training_docs (line 332) | def has_training_docs(self):
method has_validation_docs (line 335) | def has_validation_docs(self):
method has_test_docs (line 338) | def has_test_docs(self):
method training_docs (line 341) | def training_docs(self):
method validation_docs (line 346) | def validation_docs(self):
method doc_to_text (line 349) | def doc_to_text(self, doc):
method doc_to_target (line 355) | def doc_to_target(self, doc):
method construct_requests (line 360) | def construct_requests(self, doc, ctx):
method process_results (line 365) | def process_results(self, doc, results):
method higher_is_better (line 371) | def higher_is_better(self):
method aggregation (line 374) | def aggregation(self):
class MRPC (line 381) | class MRPC(Task):
method has_training_docs (line 386) | def has_training_docs(self):
method has_validation_docs (line 389) | def has_validation_docs(self):
method has_test_docs (line 392) | def has_test_docs(self):
method training_docs (line 395) | def training_docs(self):
method validation_docs (line 400) | def validation_docs(self):
method doc_to_text (line 403) | def doc_to_text(self, doc):
method doc_to_target (line 409) | def doc_to_target(self, doc):
method construct_requests (line 412) | def construct_requests(self, doc, ctx):
method process_results (line 417) | def process_results(self, doc, results):
method higher_is_better (line 426) | def higher_is_better(self):
method aggregation (line 429) | def aggregation(self):
class QQP (line 433) | class QQP(Task):
method has_training_docs (line 438) | def has_training_docs(self):
method has_validation_docs (line 441) | def has_validation_docs(self):
method has_test_docs (line 444) | def has_test_docs(self):
method training_docs (line 447) | def training_docs(self):
method validation_docs (line 452) | def validation_docs(self):
method doc_to_text (line 455) | def doc_to_text(self, doc):
method doc_to_target (line 461) | def doc_to_target(self, doc):
method construct_requests (line 464) | def construct_requests(self, doc, ctx):
method process_results (line 469) | def process_results(self, doc, results):
method higher_is_better (line 478) | def higher_is_better(self):
method aggregation (line 481) | def aggregation(self):
class STSB (line 485) | class STSB(Task):
method has_training_docs (line 490) | def has_training_docs(self):
method has_validation_docs (line 493) | def has_validation_docs(self):
method has_test_docs (line 496) | def has_test_docs(self):
method training_docs (line 499) | def training_docs(self):
method validation_docs (line 504) | def validation_docs(self):
method test_docs (line 507) | def test_docs(self):
method doc_to_text (line 510) | def doc_to_text(self, doc):
method doc_to_target (line 516) | def doc_to_target(self, doc):
method construct_requests (line 519) | def construct_requests(self, doc, ctx):
method process_results (line 533) | def process_results(self, doc, results):
method aggregation (line 546) | def aggregation(self):
method higher_is_better (line 555) | def higher_is_better(self):
FILE: lm_eval/tasks/gsm8k.py
class GradeSchoolMath8K (line 40) | class GradeSchoolMath8K(Task):
method has_training_docs (line 45) | def has_training_docs(self):
method has_validation_docs (line 48) | def has_validation_docs(self):
method has_test_docs (line 51) | def has_test_docs(self):
method training_docs (line 54) | def training_docs(self):
method validation_docs (line 57) | def validation_docs(self):
method test_docs (line 60) | def test_docs(self):
method doc_to_text (line 63) | def doc_to_text(self, doc):
method doc_to_target (line 66) | def doc_to_target(self, doc):
method construct_requests (line 69) | def construct_requests(self, doc, ctx):
method _extract_answer (line 85) | def _extract_answer(self, completion):
method _is_correct (line 94) | def _is_correct(self, completion, answer):
method process_results (line 99) | def process_results(self, doc, results):
method aggregation (line 113) | def aggregation(self):
method higher_is_better (line 121) | def higher_is_better(self):
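
GradeSchoolMath8K is generation-based rather than loglikelihood-based: construct_requests asks for a greedy completion and _extract_answer/_is_correct reduce it to the final number. GSM8K references end in a "#### <number>" line, so extraction is a regex over the text; a sketch of that step (the exact pattern in the file may differ slightly):

    import re

    ANS_RE = re.compile(r"#### (\-?[0-9\.\,]+)")  # GSM8K final-answer marker
    INVALID_ANS = "[invalid]"

    def extract_answer(completion):
        match = ANS_RE.search(completion)
        if match:
            # Drop thousands separators so "1,000" matches "1000".
            return match.group(1).strip().replace(",", "")
        return INVALID_ANS

    def is_correct(completion, gold):
        answer = extract_answer(gold)
        assert answer != INVALID_ANS, "reference answers always contain '#### '"
        return extract_answer(completion) == answer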
FILE: lm_eval/tasks/headqa.py
class HeadQABase (line 28) | class HeadQABase(MultipleChoiceTask):
method has_training_docs (line 32) | def has_training_docs(self):
method has_validation_docs (line 35) | def has_validation_docs(self):
method has_test_docs (line 38) | def has_test_docs(self):
method training_docs (line 41) | def training_docs(self):
method validation_docs (line 46) | def validation_docs(self):
method test_docs (line 49) | def test_docs(self):
method _process_doc (line 52) | def _process_doc(self, doc):
method doc_to_text (line 61) | def doc_to_text(self, doc):
method should_decontaminate (line 64) | def should_decontaminate(self):
method doc_to_decontamination_query (line 67) | def doc_to_decontamination_query(self, doc):
class HeadQAEn (line 71) | class HeadQAEn(HeadQABase):
class HeadQAEs (line 75) | class HeadQAEs(HeadQABase):
class HeadQAEsDeprecated (line 80) | class HeadQAEsDeprecated(HeadQABase):
method __init__ (line 83) | def __init__(self):
FILE: lm_eval/tasks/hellaswag.py
class HellaSwag (line 30) | class HellaSwag(MultipleChoiceTask):
method has_training_docs (line 35) | def has_training_docs(self):
method has_validation_docs (line 38) | def has_validation_docs(self):
method has_test_docs (line 41) | def has_test_docs(self):
method training_docs (line 44) | def training_docs(self):
method validation_docs (line 49) | def validation_docs(self):
method _process_doc (line 52) | def _process_doc(self, doc):
method preprocess (line 62) | def preprocess(cls, text):
method doc_to_text (line 70) | def doc_to_text(self, doc):
method should_decontaminate (line 73) | def should_decontaminate(self):
method doc_to_decontamination_query (line 76) | def doc_to_decontamination_query(self, doc):
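
The preprocess classmethod exists because HellaSwag's WikiHow-derived contexts carry markup artifacts; contexts and endings are cleaned before scoring. A sketch of the kind of normalization performed (bracketed markers removed, doubled spaces collapsed):

    import re

    def preprocess(text):
        text = text.strip()
        # " [title]" / "[header]"-style brackets are WikiHow artifacts.
        text = text.replace(" [title]", ". ")
        text = re.sub(r"\[.*?\]", "", text)
        return text.replace("  ", " ")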
FILE: lm_eval/tasks/hendrycks_ethics.py
class Ethics (line 37) | class Ethics(Task):
method has_training_docs (line 41) | def has_training_docs(self):
method has_validation_docs (line 44) | def has_validation_docs(self):
method has_test_docs (line 47) | def has_test_docs(self):
method training_docs (line 52) | def training_docs(self):
method validation_docs (line 55) | def validation_docs(self):
method test_docs (line 58) | def test_docs(self):
method doc_to_text (line 62) | def doc_to_text(self, doc):
method doc_to_target (line 66) | def doc_to_target(self, doc):
method construct_requests (line 70) | def construct_requests(self, doc, ctx):
method process_results (line 74) | def process_results(self, doc, results):
method aggregation (line 78) | def aggregation(self):
method higher_is_better (line 82) | def higher_is_better(self):
class EthicsCM (line 86) | class EthicsCM(Ethics):
method doc_to_text (line 90) | def doc_to_text(self, doc):
method should_decontaminate (line 93) | def should_decontaminate(self):
method doc_to_decontamination_query (line 96) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 99) | def doc_to_target(self, doc):
method construct_requests (line 102) | def construct_requests(self, doc, ctx):
method process_results (line 107) | def process_results(self, doc, results):
method aggregation (line 113) | def aggregation(self):
method higher_is_better (line 116) | def higher_is_better(self):
class EthicsDeontology (line 120) | class EthicsDeontology(Ethics):
method doc_to_text (line 124) | def doc_to_text(self, doc):
method should_decontaminate (line 130) | def should_decontaminate(self):
method doc_to_decontamination_query (line 133) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 136) | def doc_to_target(self, doc):
method construct_requests (line 140) | def construct_requests(self, doc, ctx):
method process_results (line 145) | def process_results(self, doc, results):
method calc_em (line 150) | def calc_em(self, items):
method aggregation (line 164) | def aggregation(self):
method higher_is_better (line 167) | def higher_is_better(self):
class EthicsJustice (line 171) | class EthicsJustice(Ethics):
method doc_to_text (line 175) | def doc_to_text(self, doc):
method should_decontaminate (line 180) | def should_decontaminate(self):
method doc_to_decontamination_query (line 183) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 186) | def doc_to_target(self, doc):
method construct_requests (line 190) | def construct_requests(self, doc, ctx):
method process_results (line 195) | def process_results(self, doc, results):
method calc_em (line 200) | def calc_em(self, items):
method aggregation (line 214) | def aggregation(self):
method higher_is_better (line 217) | def higher_is_better(self):
class EthicsUtilitarianismOriginal (line 221) | class EthicsUtilitarianismOriginal(Ethics):
method has_training_docs (line 225) | def has_training_docs(self):
method fewshot_examples (line 229) | def fewshot_examples(self, k, rnd):
method doc_to_text (line 253) | def doc_to_text(self, doc):
method should_decontaminate (line 256) | def should_decontaminate(self):
method doc_to_decontamination_query (line 259) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 262) | def doc_to_target(self, doc):
method construct_requests (line 265) | def construct_requests(self, doc, ctx):
method process_results (line 273) | def process_results(self, doc, results):
method aggregation (line 287) | def aggregation(self):
method higher_is_better (line 290) | def higher_is_better(self):
class EthicsUtilitarianism (line 294) | class EthicsUtilitarianism(Ethics):
method training_docs (line 303) | def training_docs(self):
method validation_docs (line 307) | def validation_docs(self):
method test_docs (line 310) | def test_docs(self):
method _process_doc (line 314) | def _process_doc(self, doc):
method doc_to_text (line 325) | def doc_to_text(self, doc):
method doc_to_target (line 330) | def doc_to_target(self, doc):
method construct_requests (line 333) | def construct_requests(self, doc, ctx):
method process_results (line 338) | def process_results(self, doc, results):
method aggregation (line 344) | def aggregation(self):
method higher_is_better (line 347) | def higher_is_better(self):
class EthicsVirtue (line 351) | class EthicsVirtue(Ethics):
method _process_doc (line 355) | def _process_doc(self, doc):
method doc_to_text (line 358) | def doc_to_text(self, doc):
method doc_to_target (line 363) | def doc_to_target(self, doc):
method construct_requests (line 366) | def construct_requests(self, doc, ctx):
method process_results (line 371) | def process_results(self, doc, results):
method calc_em (line 377) | def calc_em(self, items):
method aggregation (line 392) | def aggregation(self):
method higher_is_better (line 395) | def higher_is_better(self):
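
EthicsDeontology, EthicsJustice and EthicsVirtue each define a calc_em aggregator because their examples arrive in fixed-size groups (rephrasings of one scenario, or candidate traits for one scenario) and a group only scores if every member is classified correctly. A sketch of that grouped exact match, assuming items are (group-order index, is_correct) pairs:

    def grouped_exact_match(items, group_size=4):
        # Restore dataset order, then require all-correct within each group.
        items = sorted(items, key=lambda x: x[0])
        n_groups = len(items) // group_size
        hits = sum(
            all(ok for _, ok in items[i * group_size:(i + 1) * group_size])
            for i in range(n_groups)
        )
        return hits / n_groups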
FILE: lm_eval/tasks/hendrycks_math.py
class Math (line 27) | class Math(Task):
method has_training_docs (line 31) | def has_training_docs(self):
method has_validation_docs (line 34) | def has_validation_docs(self):
method has_test_docs (line 37) | def has_test_docs(self):
method training_docs (line 40) | def training_docs(self):
method validation_docs (line 43) | def validation_docs(self):
method test_docs (line 46) | def test_docs(self):
method _process_doc (line 49) | def _process_doc(self, doc):
method doc_to_text (line 53) | def doc_to_text(self, doc):
method should_decontaminate (line 56) | def should_decontaminate(self):
method doc_to_decontamination_query (line 59) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 62) | def doc_to_target(self, doc):
method construct_requests (line 65) | def construct_requests(self, doc, ctx):
method process_results (line 68) | def process_results(self, doc, results):
method aggregation (line 82) | def aggregation(self):
method higher_is_better (line 85) | def higher_is_better(self):
method is_equiv (line 88) | def is_equiv(self, str1, str2, verbose=False):
method remove_boxed (line 104) | def remove_boxed(self, s):
method last_boxed_only_string (line 117) | def last_boxed_only_string(self, string):
method fix_fracs (line 147) | def fix_fracs(self, string):
method fix_a_slash_b (line 178) | def fix_a_slash_b(self, string):
method remove_right_units (line 192) | def remove_right_units(self, string):
method fix_sqrt (line 201) | def fix_sqrt(self, string):
class NotEqual (line 215) | class NotEqual:
method __eq__ (line 216) | def __eq__(self, other):
method strip_string (line 219) | def strip_string(self, string):
class MathAlgebra (line 284) | class MathAlgebra(Math):
class MathCountingAndProbability (line 289) | class MathCountingAndProbability(Math):
class MathGeometry (line 294) | class MathGeometry(Math):
class MathIntermediateAlgebra (line 299) | class MathIntermediateAlgebra(Math):
class MathNumberTheory (line 304) | class MathNumberTheory(Math):
class MathPrealgebra (line 309) | class MathPrealgebra(Math):
class MathPrecalculus (line 314) | class MathPrecalculus(Math):
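
Math.is_equiv compares answers only after remove_boxed(last_boxed_only_string(...)) pulls the final \boxed{...} expression out of the solution and strip_string/fix_fracs/fix_sqrt normalize it. The core extraction is brace matching around the last \boxed; a minimal sketch (the file also handles \fbox and further edge cases):

    def last_boxed_only_string(string):
        # Return the last "\boxed{...}" substring with balanced braces.
        idx = string.rfind("\\boxed{")
        if idx < 0:
            return None
        depth = 0
        for i in range(idx, len(string)):
            if string[i] == "{":
                depth += 1
            elif string[i] == "}":
                depth -= 1
                if depth == 0:
                    return string[idx:i + 1]
        return None  # unbalanced braces

    def remove_boxed(s):
        prefix = "\\boxed{"
        assert s.startswith(prefix) and s.endswith("}")
        return s[len(prefix):-1]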
FILE: lm_eval/tasks/hendrycks_test.py
function create_all_tasks (line 89) | def create_all_tasks():
function create_task (line 97) | def create_task(subject):
class GeneralHendrycksTest (line 105) | class GeneralHendrycksTest(MultipleChoiceTask):
method __init__ (line 110) | def __init__(self, subject):
method has_training_docs (line 114) | def has_training_docs(self):
method has_validation_docs (line 117) | def has_validation_docs(self):
method has_test_docs (line 120) | def has_test_docs(self):
method validation_docs (line 123) | def validation_docs(self):
method test_docs (line 126) | def test_docs(self):
method _process_doc (line 129) | def _process_doc(self, doc):
method fewshot_examples (line 156) | def fewshot_examples(self, k, rnd):
method doc_to_text (line 165) | def doc_to_text(self, doc):
method should_decontaminate (line 168) | def should_decontaminate(self):
method doc_to_decontamination_query (line 171) | def doc_to_decontamination_query(self, doc):
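
create_all_tasks/create_task implement the one-class-per-MMLU-subject factory: a closure specializes GeneralHendrycksTest for each subject so the task registry can expose all subjects from a single implementation. A sketch of the pattern:

    SUBJECTS = ["abstract_algebra", "anatomy"]  # truncated; 57 subjects in the real list

    def create_task(subject):
        class HendrycksTest(GeneralHendrycksTest):
            def __init__(self):
                super().__init__(subject)  # the class closes over its subject

        return HendrycksTest

    def create_all_tasks():
        # e.g. {"hendrycksTest-anatomy": <class HendrycksTest>, ...}
        return {"hendrycksTest-" + sub: create_task(sub) for sub in SUBJECTS}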
FILE: lm_eval/tasks/lambada.py
class LambadaBase (line 31) | class LambadaBase(Task):
method training_docs (line 34) | def training_docs(self):
method validation_docs (line 38) | def validation_docs(self):
method test_docs (line 42) | def test_docs(self):
method doc_to_text (line 46) | def doc_to_text(self, doc):
method should_decontaminate (line 49) | def should_decontaminate(self):
method doc_to_decontamination_query (line 52) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 55) | def doc_to_target(self, doc):
method construct_requests (line 58) | def construct_requests(self, doc, ctx):
method process_results (line 63) | def process_results(self, doc, results):
method aggregation (line 68) | def aggregation(self):
method higher_is_better (line 71) | def higher_is_better(self):
class LambadaStandard (line 75) | class LambadaStandard(LambadaBase):
method has_training_docs (line 81) | def has_training_docs(self):
method has_validation_docs (line 84) | def has_validation_docs(self):
method has_test_docs (line 87) | def has_test_docs(self):
class LambadaOpenAI (line 91) | class LambadaOpenAI(LambadaBase):
method has_training_docs (line 101) | def has_training_docs(self):
method has_validation_docs (line 104) | def has_validation_docs(self):
method has_test_docs (line 107) | def has_test_docs(self):
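
LambadaBase scores the final word two ways from a single request: the raw log-likelihood feeds a perplexity aggregate, and the is_greedy flag (whether the target was the argmax continuation) feeds accuracy. A sketch of that pairing, assuming the perplexity and mean aggregators in lm_eval.metrics:

    from lm_eval.base import Task, rf
    from lm_eval.metrics import mean, perplexity

    class LastWordTask(Task):  # illustrative skeleton of the LambadaBase scheme
        def construct_requests(self, doc, ctx):
            ll, is_greedy = rf.loglikelihood(ctx, self.doc_to_target(doc))
            return ll, is_greedy

        def process_results(self, doc, results):
            ll, is_greedy = results
            # One request, two metrics: log-prob -> ppl, argmax hit -> acc.
            return {"ppl": ll, "acc": int(is_greedy)}

        def aggregation(self):
            return {"ppl": perplexity, "acc": mean}

        def higher_is_better(self):
            return {"ppl": False, "acc": True}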
FILE: lm_eval/tasks/lambada_cloze.py
class LambadaStandardCloze (line 31) | class LambadaStandardCloze(LambadaStandard):
method doc_to_text (line 36) | def doc_to_text(self, doc):
method should_decontaminate (line 39) | def should_decontaminate(self):
method doc_to_decontamination_query (line 42) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 45) | def doc_to_target(self, doc):
class LambadaOpenAICloze (line 49) | class LambadaOpenAICloze(LambadaOpenAI):
method doc_to_text (line 54) | def doc_to_text(self, doc):
method should_decontaminate (line 57) | def should_decontaminate(self):
method doc_to_decontamination_query (line 60) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 63) | def doc_to_target(self, doc):
FILE: lm_eval/tasks/lambada_multilingual.py
class LambadaOpenAIMultilingualEnglish (line 33) | class LambadaOpenAIMultilingualEnglish(LambadaOpenAI):
class LambadaOpenAIMultilingualFrench (line 38) | class LambadaOpenAIMultilingualFrench(LambadaOpenAI):
class LambadaOpenAIMultilingualGerman (line 43) | class LambadaOpenAIMultilingualGerman(LambadaOpenAI):
class LambadaOpenAIMultilingualItalian (line 48) | class LambadaOpenAIMultilingualItalian(LambadaOpenAI):
class LambadaOpenAIMultilingualSpanish (line 53) | class LambadaOpenAIMultilingualSpanish(LambadaOpenAI):
function construct_tasks (line 67) | def construct_tasks():
FILE: lm_eval/tasks/logiqa.py
class LogiQA (line 30) | class LogiQA(MultipleChoiceTask):
method has_training_docs (line 35) | def has_training_docs(self):
method has_validation_docs (line 38) | def has_validation_docs(self):
method has_test_docs (line 41) | def has_test_docs(self):
method training_docs (line 44) | def training_docs(self):
method validation_docs (line 49) | def validation_docs(self):
method test_docs (line 52) | def test_docs(self):
method _process_doc (line 55) | def _process_doc(self, doc):
method doc_to_text (line 82) | def doc_to_text(self, doc):
method should_decontaminate (line 85) | def should_decontaminate(self):
method doc_to_decontamination_query (line 88) | def doc_to_decontamination_query(self, doc):
FILE: lm_eval/tasks/mathqa.py
class MathQA (line 27) | class MathQA(MultipleChoiceTask):
method has_training_docs (line 32) | def has_training_docs(self):
method has_validation_docs (line 35) | def has_validation_docs(self):
method has_test_docs (line 38) | def has_test_docs(self):
method training_docs (line 41) | def training_docs(self):
method validation_docs (line 46) | def validation_docs(self):
method test_docs (line 49) | def test_docs(self):
method _process_doc (line 52) | def _process_doc(self, doc):
method doc_to_text (line 66) | def doc_to_text(self, doc):
method should_decontaminate (line 69) | def should_decontaminate(self):
method doc_to_decontamination_query (line 72) | def doc_to_decontamination_query(self, doc):
FILE: lm_eval/tasks/mc_taco.py
class MCTACO (line 37) | class MCTACO(Task):
method has_training_docs (line 42) | def has_training_docs(self):
method has_validation_docs (line 45) | def has_validation_docs(self):
method has_test_docs (line 48) | def has_test_docs(self):
method validation_docs (line 51) | def validation_docs(self):
method test_docs (line 54) | def test_docs(self):
method doc_to_text (line 57) | def doc_to_text(self, doc):
method should_decontaminate (line 63) | def should_decontaminate(self):
method doc_to_decontamination_query (line 66) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 69) | def doc_to_target(self, doc):
method construct_requests (line 72) | def construct_requests(self, doc, ctx):
method process_results (line 87) | def process_results(self, doc, results):
method _question2id (line 104) | def _question2id(self, doc):
method aggregation (line 108) | def aggregation(self):
method higher_is_better (line 114) | def higher_is_better(self):
function exact_match (line 121) | def exact_match(items):
function f1 (line 133) | def f1(items):
FILE: lm_eval/tasks/mutual.py
class MuTualBase (line 28) | class MuTualBase(Task):
method has_training_docs (line 34) | def has_training_docs(self):
method has_validation_docs (line 37) | def has_validation_docs(self):
method has_test_docs (line 40) | def has_test_docs(self):
method training_docs (line 43) | def training_docs(self):
method validation_docs (line 46) | def validation_docs(self):
method test_docs (line 49) | def test_docs(self):
method doc_to_text (line 52) | def doc_to_text(self, doc):
method should_decontaminate (line 55) | def should_decontaminate(self):
method doc_to_decontamination_query (line 58) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 61) | def doc_to_target(self, doc):
method construct_requests (line 64) | def construct_requests(self, doc, ctx):
method detokenize (line 70) | def detokenize(self, text):
method process_results (line 86) | def process_results(self, doc, results):
method aggregation (line 94) | def aggregation(self):
method higher_is_better (line 97) | def higher_is_better(self):
class MuTual (line 101) | class MuTual(MuTualBase):
class MuTualPlus (line 105) | class MuTualPlus(MuTualBase):
FILE: lm_eval/tasks/naturalqs.py
class NaturalQs (line 32) | class NaturalQs(Task):
method has_training_docs (line 37) | def has_training_docs(self):
method has_validation_docs (line 40) | def has_validation_docs(self):
method has_test_docs (line 43) | def has_test_docs(self):
method training_docs (line 46) | def training_docs(self):
method validation_docs (line 53) | def validation_docs(self):
method fewshot_examples (line 56) | def fewshot_examples(self, k, rnd):
method doc_to_text (line 63) | def doc_to_text(self, doc):
method should_decontaminate (line 66) | def should_decontaminate(self):
method doc_to_decontamination_query (line 69) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 72) | def doc_to_target(self, doc):
method construct_requests (line 91) | def construct_requests(self, doc, ctx):
method process_results (line 105) | def process_results(self, doc, results):
method aggregation (line 118) | def aggregation(self):
method higher_is_better (line 127) | def higher_is_better(self):
FILE: lm_eval/tasks/openbookqa.py
class OpenBookQA (line 30) | class OpenBookQA(MultipleChoiceTask):
method has_training_docs (line 35) | def has_training_docs(self):
method has_validation_docs (line 38) | def has_validation_docs(self):
method has_test_docs (line 41) | def has_test_docs(self):
method training_docs (line 44) | def training_docs(self):
method validation_docs (line 49) | def validation_docs(self):
method test_docs (line 52) | def test_docs(self):
method _process_doc (line 55) | def _process_doc(self, doc):
method doc_to_text (line 64) | def doc_to_text(self, doc):
method should_decontaminate (line 67) | def should_decontaminate(self):
method doc_to_decontamination_query (line 70) | def doc_to_decontamination_query(self, doc):
FILE: lm_eval/tasks/pile.py
class PilePerplexityTask (line 28) | class PilePerplexityTask(PerplexityTask):
method has_validation_docs (line 33) | def has_validation_docs(self):
method has_test_docs (line 36) | def has_test_docs(self):
method validation_docs (line 39) | def validation_docs(self):
method test_docs (line 43) | def test_docs(self):
class PileArxiv (line 48) | class PileArxiv(PilePerplexityTask):
class PileBooks3 (line 52) | class PileBooks3(PilePerplexityTask):
class PileBookCorpus2 (line 56) | class PileBookCorpus2(PilePerplexityTask):
class PileDmMathematics (line 60) | class PileDmMathematics(PilePerplexityTask):
class PileEnron (line 64) | class PileEnron(PilePerplexityTask):
class PileEuroparl (line 68) | class PileEuroparl(PilePerplexityTask):
class PileFreeLaw (line 72) | class PileFreeLaw(PilePerplexityTask):
class PileGithub (line 76) | class PileGithub(PilePerplexityTask):
class PileGutenberg (line 80) | class PileGutenberg(PilePerplexityTask):
class PileHackernews (line 84) | class PileHackernews(PilePerplexityTask):
class PileNIHExporter (line 88) | class PileNIHExporter(PilePerplexityTask):
class PileOpenSubtitles (line 92) | class PileOpenSubtitles(PilePerplexityTask):
class PileOpenWebText2 (line 96) | class PileOpenWebText2(PilePerplexityTask):
class PilePhilPapers (line 100) | class PilePhilPapers(PilePerplexityTask):
class PilePileCc (line 104) | class PilePileCc(PilePerplexityTask):
class PilePubmedAbstracts (line 108) | class PilePubmedAbstracts(PilePerplexityTask):
class PilePubmedCentral (line 112) | class PilePubmedCentral(PilePerplexityTask):
class PileStackExchange (line 116) | class PileStackExchange(PilePerplexityTask):
class PileUspto (line 120) | class PileUspto(PilePerplexityTask):
class PileUbuntuIrc (line 124) | class PileUbuntuIrc(PilePerplexityTask):
class PileWikipedia (line 128) | class PileWikipedia(PilePerplexityTask):
class PileYoutubeSubtitles (line 132) | class PileYoutubeSubtitles(PilePerplexityTask):
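
Each Pile subset above is PilePerplexityTask plus a single class attribute naming its subset; the shared base yields doc["text"] from the chosen split and PerplexityTask does the rest. The pattern, with an illustrative attribute name (the actual attribute in this file may be spelled differently):

    class PileArxiv(PilePerplexityTask):
        DATASET_NAME = "pile_arxiv"  # selects the subset in the dataset loader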
FILE: lm_eval/tasks/piqa.py
class PiQA (line 29) | class PiQA(MultipleChoiceTask):
method has_training_docs (line 34) | def has_training_docs(self):
method has_validation_docs (line 37) | def has_validation_docs(self):
method has_test_docs (line 40) | def has_test_docs(self):
method training_docs (line 43) | def training_docs(self):
method validation_docs (line 48) | def validation_docs(self):
method _process_doc (line 51) | def _process_doc(self, doc):
method doc_to_text (line 59) | def doc_to_text(self, doc):
method should_decontaminate (line 62) | def should_decontaminate(self):
method doc_to_decontamination_query (line 65) | def doc_to_decontamination_query(self, doc):
FILE: lm_eval/tasks/prost.py
class PROST (line 38) | class PROST(MultipleChoiceTask):
method has_training_docs (line 43) | def has_training_docs(self):
method has_validation_docs (line 46) | def has_validation_docs(self):
method has_test_docs (line 49) | def has_test_docs(self):
method test_docs (line 52) | def test_docs(self):
method fewshot_context (line 55) | def fewshot_context(
method _process_doc (line 65) | def _process_doc(self, doc):
method doc_to_text (line 73) | def doc_to_text(self, doc):
method should_decontaminate (line 76) | def should_decontaminate(self):
method doc_to_decontamination_query (line 79) | def doc_to_decontamination_query(self, doc):
FILE: lm_eval/tasks/pubmedqa.py
class Pubmed_QA (line 34) | class Pubmed_QA(Task):
method has_training_docs (line 39) | def has_training_docs(self):
method has_validation_docs (line 42) | def has_validation_docs(self):
method has_test_docs (line 45) | def has_test_docs(self):
method test_docs (line 48) | def test_docs(self):
method doc_to_text (line 53) | def doc_to_text(self, doc):
method should_decontaminate (line 59) | def should_decontaminate(self):
method doc_to_decontamination_query (line 62) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 65) | def doc_to_target(self, doc):
method construct_requests (line 68) | def construct_requests(self, doc, ctx):
method process_results (line 77) | def process_results(self, doc, results):
method aggregation (line 85) | def aggregation(self):
method higher_is_better (line 88) | def higher_is_better(self):
FILE: lm_eval/tasks/qa4mre.py
class QA4MRE (line 29) | class QA4MRE(MultipleChoiceTask):
method has_training_docs (line 34) | def has_training_docs(self):
method has_validation_docs (line 37) | def has_validation_docs(self):
method has_test_docs (line 40) | def has_test_docs(self):
method test_docs (line 43) | def test_docs(self):
method _process_doc (line 47) | def _process_doc(self, doc):
method doc_to_text (line 57) | def doc_to_text(self, doc):
method should_decontaminate (line 60) | def should_decontaminate(self):
method doc_to_decontamination_query (line 63) | def doc_to_decontamination_query(self, doc):
class QA4MRE_2011 (line 67) | class QA4MRE_2011(QA4MRE):
class QA4MRE_2012 (line 71) | class QA4MRE_2012(QA4MRE):
class QA4MRE_2013 (line 75) | class QA4MRE_2013(QA4MRE):
FILE: lm_eval/tasks/qasper.py
function normalize_answer (line 43) | def normalize_answer(s):
function categorise_answer (line 65) | def categorise_answer(answer_blob):
function token_f1_score (line 88) | def token_f1_score(prediction, ground_truth):
class QASPER (line 104) | class QASPER(Task):
method has_training_docs (line 109) | def has_training_docs(self):
method has_validation_docs (line 112) | def has_validation_docs(self):
method has_test_docs (line 115) | def has_test_docs(self):
method doc_to_text (line 118) | def doc_to_text(self, doc):
method doc_to_target (line 132) | def doc_to_target(self, doc):
method training_docs (line 138) | def training_docs(self):
method validation_docs (line 142) | def validation_docs(self):
method _process_doc (line 146) | def _process_doc(self, doc):
method process_results (line 167) | def process_results(self, doc, results):
method aggregation (line 198) | def aggregation(self):
method construct_requests (line 204) | def construct_requests(self, doc, ctx):
method higher_is_better (line 225) | def higher_is_better(self):
FILE: lm_eval/tasks/quac.py
class QuAC (line 28) | class QuAC(Task):
method has_training_docs (line 33) | def has_training_docs(self):
method has_validation_docs (line 36) | def has_validation_docs(self):
method has_test_docs (line 39) | def has_test_docs(self):
method training_docs (line 42) | def training_docs(self):
method validation_docs (line 47) | def validation_docs(self):
method test_docs (line 50) | def test_docs(self):
method _process_doc (line 53) | def _process_doc(self, doc):
method doc_to_text (line 57) | def doc_to_text(self, doc):
method should_decontaminate (line 71) | def should_decontaminate(self):
method doc_to_decontamination_query (line 74) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 77) | def doc_to_target(self, doc):
method construct_requests (line 80) | def construct_requests(self, doc, ctx):
method process_results (line 94) | def process_results(self, doc, results):
method aggregation (line 107) | def aggregation(self):
method higher_is_better (line 116) | def higher_is_better(self):
FILE: lm_eval/tasks/race.py
class each (line 29) | class each:
method __init__ (line 30) | def __init__(self, f):
method __rrshift__ (line 33) | def __rrshift__(self, other):
class RACE (line 37) | class RACE(Task):
method has_training_docs (line 45) | def has_training_docs(self):
method has_validation_docs (line 48) | def has_validation_docs(self):
method has_test_docs (line 51) | def has_test_docs(self):
method _collate_data (line 54) | def _collate_data(self, set):
method training_docs (line 87) | def training_docs(self):
method validation_docs (line 90) | def validation_docs(self):
method test_docs (line 93) | def test_docs(self):
method get_answer_option (line 97) | def get_answer_option(cls, problem):
method last_problem (line 102) | def last_problem(cls, doc):
method doc_to_text (line 105) | def doc_to_text(self, doc):
method should_decontaminate (line 119) | def should_decontaminate(self):
method doc_to_decontamination_query (line 122) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 125) | def doc_to_target(self, doc):
method construct_requests (line 128) | def construct_requests(self, doc, ctx):
method process_results (line 145) | def process_results(self, doc, results):
method aggregation (line 159) | def aggregation(self):
method higher_is_better (line 167) | def higher_is_better(self):
FILE: lm_eval/tasks/sat.py
class SATAnalogies (line 29) | class SATAnalogies(MultipleChoiceTask):
method __init__ (line 34) | def __init__(self, data_dir: str):
method has_training_docs (line 42) | def has_training_docs(self):
method has_validation_docs (line 45) | def has_validation_docs(self):
method has_test_docs (line 48) | def has_test_docs(self):
method training_docs (line 51) | def training_docs(self):
method validation_docs (line 54) | def validation_docs(self):
method test_docs (line 57) | def test_docs(self):
method _process_doc (line 60) | def _process_doc(self, doc):
method doc_to_text (line 70) | def doc_to_text(self, doc):
method should_decontaminate (line 73) | def should_decontaminate(self):
method doc_to_decontamination_query (line 76) | def doc_to_decontamination_query(self, doc):
FILE: lm_eval/tasks/sciq.py
class SciQ (line 25) | class SciQ(MultipleChoiceTask):
method has_training_docs (line 30) | def has_training_docs(self):
method has_validation_docs (line 33) | def has_validation_docs(self):
method has_test_docs (line 36) | def has_test_docs(self):
method training_docs (line 39) | def training_docs(self):
method validation_docs (line 44) | def validation_docs(self):
method test_docs (line 47) | def test_docs(self):
method _process_doc (line 50) | def _process_doc(self, doc):
method doc_to_text (line 66) | def doc_to_text(self, doc):
method should_decontaminate (line 69) | def should_decontaminate(self):
method doc_to_decontamination_query (line 72) | def doc_to_decontamination_query(self, doc):
FILE: lm_eval/tasks/squad.py
function _squad_metric (line 35) | def _squad_metric(predictions, references):
function _squad_agg (line 40) | def _squad_agg(key, items):
class SQuAD2 (line 46) | class SQuAD2(Task):
method has_training_docs (line 56) | def has_training_docs(self):
method has_validation_docs (line 59) | def has_validation_docs(self):
method has_test_docs (line 62) | def has_test_docs(self):
method training_docs (line 65) | def training_docs(self):
method validation_docs (line 68) | def validation_docs(self):
method doc_to_text (line 71) | def doc_to_text(self, doc):
method should_decontaminate (line 85) | def should_decontaminate(self):
method doc_to_decontamination_query (line 88) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 91) | def doc_to_target(self, doc):
method construct_requests (line 99) | def construct_requests(self, doc, ctx):
method process_results (line 114) | def process_results(self, doc, results):
method aggregation (line 171) | def aggregation(self):
method higher_is_better (line 204) | def higher_is_better(self):
FILE: lm_eval/tasks/storycloze.py
class StoryCloze (line 36) | class StoryCloze(Task):
method __init__ (line 41) | def __init__(self, data_dir: str):
method has_training_docs (line 49) | def has_training_docs(self):
method has_validation_docs (line 52) | def has_validation_docs(self):
method has_test_docs (line 55) | def has_test_docs(self):
method training_docs (line 58) | def training_docs(self):
method validation_docs (line 61) | def validation_docs(self):
method test_docs (line 64) | def test_docs(self):
method doc_to_text (line 67) | def doc_to_text(self, doc):
method should_decontaminate (line 77) | def should_decontaminate(self):
method doc_to_decontamination_query (line 80) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 90) | def doc_to_target(self, doc):
method construct_requests (line 95) | def construct_requests(self, doc, ctx):
method process_results (line 110) | def process_results(self, doc, results):
method aggregation (line 124) | def aggregation(self):
method higher_is_better (line 132) | def higher_is_better(self):
class StoryCloze2016 (line 141) | class StoryCloze2016(StoryCloze):
class StoryCloze2018 (line 145) | class StoryCloze2018(StoryCloze):
FILE: lm_eval/tasks/superglue.py
class BoolQ (line 35) | class BoolQ(Task):
method has_training_docs (line 40) | def has_training_docs(self):
method has_validation_docs (line 43) | def has_validation_docs(self):
method has_test_docs (line 46) | def has_test_docs(self):
method training_docs (line 49) | def training_docs(self):
method validation_docs (line 54) | def validation_docs(self):
method doc_to_text (line 57) | def doc_to_text(self, doc):
method should_decontaminate (line 60) | def should_decontaminate(self):
method doc_to_decontamination_query (line 63) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 66) | def doc_to_target(self, doc):
method construct_requests (line 69) | def construct_requests(self, doc, ctx):
method process_results (line 76) | def process_results(self, doc, results):
method higher_is_better (line 84) | def higher_is_better(self):
method aggregation (line 87) | def aggregation(self):
class CommitmentBank (line 91) | class CommitmentBank(Task):
method has_training_docs (line 96) | def has_training_docs(self):
method has_validation_docs (line 99) | def has_validation_docs(self):
method has_test_docs (line 102) | def has_test_docs(self):
method training_docs (line 105) | def training_docs(self):
method validation_docs (line 110) | def validation_docs(self):
method doc_to_text (line 113) | def doc_to_text(self, doc):
method doc_to_target (line 119) | def doc_to_target(self, doc):
method construct_requests (line 125) | def construct_requests(self, doc, ctx):
method process_results (line 132) | def process_results(self, doc, results):
method higher_is_better (line 139) | def higher_is_better(self):
method cb_multi_fi (line 143) | def cb_multi_fi(cls, items):
method aggregation (line 153) | def aggregation(self):
class Copa (line 160) | class Copa(Task):
method has_training_docs (line 165) | def has_training_docs(self):
method has_validation_docs (line 168) | def has_validation_docs(self):
method has_test_docs (line 171) | def has_test_docs(self):
method training_docs (line 174) | def training_docs(self):
method validation_docs (line 179) | def validation_docs(self):
method doc_to_text (line 182) | def doc_to_text(self, doc):
method doc_to_target (line 190) | def doc_to_target(self, doc):
method construct_requests (line 195) | def construct_requests(self, doc, ctx):
method process_results (line 204) | def process_results(self, doc, results):
method higher_is_better (line 211) | def higher_is_better(self):
method aggregation (line 214) | def aggregation(self):
method convert_choice (line 218) | def convert_choice(choice):
class MultiRC (line 222) | class MultiRC(Task):
method has_training_docs (line 227) | def has_training_docs(self):
method has_validation_docs (line 230) | def has_validation_docs(self):
method has_test_docs (line 233) | def has_test_docs(self):
method training_docs (line 236) | def training_docs(self):
method validation_docs (line 241) | def validation_docs(self):
method doc_to_text (line 244) | def doc_to_text(self, doc):
method doc_to_target (line 247) | def doc_to_target(self, doc):
method format_answer (line 251) | def format_answer(answer, label):
method construct_requests (line 255) | def construct_requests(self, doc, ctx):
method process_results (line 264) | def process_results(self, doc, results):
method higher_is_better (line 269) | def higher_is_better(self):
method aggregation (line 272) | def aggregation(self):
class ReCoRD (line 276) | class ReCoRD(Task):
method has_training_docs (line 281) | def has_training_docs(self):
method has_validation_docs (line 284) | def has_validation_docs(self):
method has_test_docs (line 287) | def has_test_docs(self):
method training_docs (line 290) | def training_docs(self):
method validation_docs (line 299) | def validation_docs(self):
method _process_doc (line 305) | def _process_doc(cls, doc):
method doc_to_text (line 313) | def doc_to_text(self, doc):
method format_answer (line 321) | def format_answer(cls, query, entity):
method doc_to_target (line 324) | def doc_to_target(self, doc):
method construct_requests (line 328) | def construct_requests(self, doc, ctx):
method process_results (line 335) | def process_results(self, doc, results):
method higher_is_better (line 356) | def higher_is_better(self):
method aggregation (line 362) | def aggregation(self):
class WordsInContext (line 369) | class WordsInContext(Task):
method has_training_docs (line 374) | def has_training_docs(self):
method has_validation_docs (line 377) | def has_validation_docs(self):
method has_test_docs (line 380) | def has_test_docs(self):
method training_docs (line 383) | def training_docs(self):
method validation_docs (line 388) | def validation_docs(self):
method doc_to_text (line 391) | def doc_to_text(self, doc):
method doc_to_target (line 401) | def doc_to_target(self, doc):
method construct_requests (line 404) | def construct_requests(self, doc, ctx):
method process_results (line 410) | def process_results(self, doc, results):
method higher_is_better (line 418) | def higher_is_better(self):
method aggregation (line 421) | def aggregation(self):
class SGWinogradSchemaChallenge (line 425) | class SGWinogradSchemaChallenge(Task):
method has_training_docs (line 432) | def has_training_docs(self):
method has_validation_docs (line 435) | def has_validation_docs(self):
method has_test_docs (line 438) | def has_test_docs(self):
method training_docs (line 441) | def training_docs(self):
method validation_docs (line 450) | def validation_docs(self):
method doc_to_text (line 453) | def doc_to_text(self, doc):
method doc_to_target (line 468) | def doc_to_target(self, doc):
method construct_requests (line 471) | def construct_requests(self, doc, ctx):
method process_results (line 478) | def process_results(self, doc, results):
method higher_is_better (line 486) | def higher_is_better(self):
method aggregation (line 489) | def aggregation(self):
FILE: lm_eval/tasks/swag.py
class SWAG (line 28) | class SWAG(MultipleChoiceTask):
method has_training_docs (line 33) | def has_training_docs(self):
method has_validation_docs (line 36) | def has_validation_docs(self):
method has_test_docs (line 39) | def has_test_docs(self):
method training_docs (line 42) | def training_docs(self):
method validation_docs (line 47) | def validation_docs(self):
method _process_doc (line 50) | def _process_doc(self, doc):
method doc_to_text (line 58) | def doc_to_text(self, doc):
FILE: lm_eval/tasks/toxigen.py
class ToxiGen (line 24) | class ToxiGen(MultipleChoiceTask):
method has_training_docs (line 29) | def has_training_docs(self):
method has_validation_docs (line 32) | def has_validation_docs(self):
method has_test_docs (line 35) | def has_test_docs(self):
method training_docs (line 38) | def training_docs(self):
method test_docs (line 48) | def test_docs(self):
method _preprocess_dataset (line 53) | def _preprocess_dataset(self, split: str):
method _process_doc (line 62) | def _process_doc(self, doc):
method doc_to_text (line 69) | def doc_to_text(self, doc):
FILE: lm_eval/tasks/translation.py
function create_tasks_from_benchmarks (line 52) | def create_tasks_from_benchmarks(benchmark_dict):
function zh_split (line 78) | def zh_split(zh_text: List[str]) -> List[str]:
function ja_split (line 89) | def ja_split(ja_text: List[str]) -> List[str]:
function create_translation_task (line 107) | def create_translation_task(dataset, language_pair, version=0):
class GeneralTranslationTask (line 117) | class GeneralTranslationTask(Task):
method __init__ (line 121) | def __init__(self, sacrebleu_dataset, sacrebleu_language_pair=None):
method download (line 128) | def download(self, data_dir=None, cache_dir=None, download_mode=None):
method has_training_docs (line 138) | def has_training_docs(self):
method has_validation_docs (line 143) | def has_validation_docs(self):
method has_test_docs (line 147) | def has_test_docs(self):
method test_docs (line 151) | def test_docs(self):
method doc_to_text (line 160) | def doc_to_text(self, doc):
method should_decontaminate (line 166) | def should_decontaminate(self):
method doc_to_decontamination_query (line 169) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 172) | def doc_to_target(self, doc):
method construct_requests (line 176) | def construct_requests(self, doc, ctx):
method process_results (line 189) | def process_results(self, doc, results):
method aggregation (line 205) | def aggregation(self):
method higher_is_better (line 217) | def higher_is_better(self):
method __str__ (line 229) | def __str__(self):
function code_to_language (line 241) | def code_to_language(code):
FILE: lm_eval/tasks/triviaqa.py
class TriviaQA (line 31) | class TriviaQA(Task):
method has_training_docs (line 36) | def has_training_docs(self):
method has_validation_docs (line 39) | def has_validation_docs(self):
method has_test_docs (line 42) | def has_test_docs(self):
method training_docs (line 45) | def training_docs(self):
method validation_docs (line 48) | def validation_docs(self):
method test_docs (line 51) | def test_docs(self):
method doc_to_text (line 54) | def doc_to_text(self, doc):
method should_decontaminate (line 57) | def should_decontaminate(self):
method doc_to_decontamination_query (line 60) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 63) | def doc_to_target(self, doc):
method _remove_prefixes (line 66) | def _remove_prefixes(self, aliases):
method construct_requests (line 76) | def construct_requests(self, doc, ctx):
method process_results (line 83) | def process_results(self, doc, results):
method aggregation (line 86) | def aggregation(self):
method higher_is_better (line 91) | def higher_is_better(self):
FILE: lm_eval/tasks/truthfulqa.py
class TruthfulQAMultipleChoice (line 67) | class TruthfulQAMultipleChoice(Task):
method has_training_docs (line 72) | def has_training_docs(self):
method has_validation_docs (line 75) | def has_validation_docs(self):
method has_test_docs (line 78) | def has_test_docs(self):
method training_docs (line 81) | def training_docs(self):
method validation_docs (line 84) | def validation_docs(self):
method test_docs (line 87) | def test_docs(self):
method doc_to_text (line 90) | def doc_to_text(self, doc):
method should_decontaminate (line 93) | def should_decontaminate(self):
method doc_to_decontamination_query (line 96) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 99) | def doc_to_target(self, doc):
method fewshot_context (line 102) | def fewshot_context(
method construct_requests (line 112) | def construct_requests(self, doc, ctx):
method process_results (line 133) | def process_results(self, doc, results):
method aggregation (line 161) | def aggregation(self):
method higher_is_better (line 164) | def higher_is_better(self):
class TruthfulQAGeneration (line 168) | class TruthfulQAGeneration(Task):
method __init__ (line 173) | def __init__(self):
method has_training_docs (line 183) | def has_training_docs(self):
method has_validation_docs (line 186) | def has_validation_docs(self):
method has_test_docs (line 189) | def has_test_docs(self):
method training_docs (line 192) | def training_docs(self):
method _format_answers (line 195) | def _format_answers(self, answers):
method validation_docs (line 207) | def validation_docs(self):
method test_docs (line 219) | def test_docs(self):
method doc_to_text (line 222) | def doc_to_text(self, doc):
method doc_to_target (line 225) | def doc_to_target(self, doc):
method fewshot_context (line 228) | def fewshot_context(
method construct_requests (line 238) | def construct_requests(self, doc, ctx):
method process_results (line 253) | def process_results(self, doc, results):
method aggregation (line 332) | def aggregation(self):
method higher_is_better (line 351) | def higher_is_better(self):
method bleu (line 370) | def bleu(self, refs, preds):
method rouge (line 392) | def rouge(self, refs, preds):
FILE: lm_eval/tasks/unscramble.py
class WordUnscrambleTask (line 32) | class WordUnscrambleTask(Task):
method has_training_docs (line 37) | def has_training_docs(self):
method has_validation_docs (line 40) | def has_validation_docs(self):
method has_test_docs (line 43) | def has_test_docs(self):
method validation_docs (line 46) | def validation_docs(self):
method doc_to_text (line 49) | def doc_to_text(self, doc):
method should_decontaminate (line 52) | def should_decontaminate(self):
method doc_to_decontamination_query (line 55) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 58) | def doc_to_target(self, doc):
method construct_requests (line 61) | def construct_requests(self, doc, ctx):
method process_results (line 65) | def process_results(self, doc, results):
method aggregation (line 70) | def aggregation(self):
method higher_is_better (line 73) | def higher_is_better(self):
class Anagrams1 (line 77) | class Anagrams1(WordUnscrambleTask):
class Anagrams2 (line 81) | class Anagrams2(WordUnscrambleTask):
class CycleLetters (line 85) | class CycleLetters(WordUnscrambleTask):
class RandomInsertion (line 89) | class RandomInsertion(WordUnscrambleTask):
class ReversedWords (line 93) | class ReversedWords(WordUnscrambleTask):
FILE: lm_eval/tasks/webqs.py
class WebQs (line 34) | class WebQs(Task):
method has_training_docs (line 39) | def has_training_docs(self):
method has_validation_docs (line 42) | def has_validation_docs(self):
method has_test_docs (line 45) | def has_test_docs(self):
method training_docs (line 48) | def training_docs(self):
method test_docs (line 53) | def test_docs(self):
method doc_to_text (line 56) | def doc_to_text(self, doc):
method should_decontaminate (line 59) | def should_decontaminate(self):
method doc_to_decontamination_query (line 62) | def doc_to_decontamination_query(self, doc):
method doc_to_target (line 65) | def doc_to_target(self, doc):
method _remove_prefixes (line 71) | def _remove_prefixes(self, aliases):
method construct_requests (line 82) | def construct_requests(self, doc, ctx):
method process_results (line 89) | def process_results(self, doc, results):
method aggregation (line 92) | def aggregation(self):
method higher_is_better (line 97) | def higher_is_better(self):
FILE: lm_eval/tasks/wikitext.py
function wikitext_detokenizer (line 28) | def wikitext_detokenizer(string):
class WikiText (line 62) | class WikiText(PerplexityTask):
method has_training_docs (line 67) | def has_training_docs(self):
method has_validation_docs (line 70) | def has_validation_docs(self):
method has_test_docs (line 73) | def has_test_docs(self):
method training_docs (line 76) | def training_docs(self):
method validation_docs (line 79) | def validation_docs(self):
method test_docs (line 82) | def test_docs(self):
method _process_doc (line 85) | def _process_doc(self, doc):
method doc_to_target (line 88) | def doc_to_target(self, doc):
method should_decontaminate (line 91) | def should_decontaminate(self):
method count_words (line 94) | def count_words(self, doc):
FILE: lm_eval/tasks/winogrande.py
class Winogrande (line 32) | class Winogrande(Task):
method has_training_docs (line 39) | def has_training_docs(self):
method has_validation_docs (line 42) | def has_validation_docs(self):
method has_test_docs (line 45) | def has_test_docs(self):
method training_docs (line 48) | def training_docs(self):
method validation_docs (line 53) | def validation_docs(self):
method doc_to_text (line 56) | def doc_to_text(self, doc):
method should_decontaminate (line 59) | def should_decontaminate(self):
method doc_to_decontamination_query (line 62) | def doc_to_decontamination_query(self, doc):
method partial_context (line 66) | def partial_context(cls, doc, option):
method doc_to_target (line 72) | def doc_to_target(self, doc):
method partial_target (line 76) | def partial_target(cls, doc):
method construct_requests (line 81) | def construct_requests(self, doc, ctx):
method append_context (line 101) | def append_context(cls, ctx, partial_ctx):
method process_results (line 106) | def process_results(self, doc, results):
method aggregation (line 118) | def aggregation(self):
method higher_is_better (line 126) | def higher_is_better(self):
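
Winogrande (like wsc273 below) uses partial evaluation: substitute each candidate for the blank, then compare the log-likelihood of the fixed continuation after the blank under each substituted context. A sketch of the split, assuming the dataset marks the blank with "_" and ignoring the few-shot handling done by append_context:

    from lm_eval.base import rf

    def partial_context(doc, option):
        # Context up to and including the substituted candidate.
        blank = doc["sentence"].index("_")
        return doc["sentence"][:blank] + option

    def partial_target(doc):
        # Only the text after the blank is actually scored.
        blank = doc["sentence"].index("_") + 1
        return " " + doc["sentence"][blank:].strip()

    def requests(doc):
        target = partial_target(doc)
        return [
            rf.loglikelihood(partial_context(doc, opt), target)[0]
            for opt in (doc["option1"], doc["option2"])
        ]  # argmax against doc["answer"] gives accuracy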
FILE: lm_eval/tasks/wsc273.py
class WinogradSchemaChallenge273 (line 38) | class WinogradSchemaChallenge273(Task):
method has_training_docs (line 57) | def has_training_docs(self):
method has_validation_docs (line 60) | def has_validation_docs(self):
method has_test_docs (line 63) | def has_test_docs(self):
method test_docs (line 66) | def test_docs(self):
method _process_doc (line 69) | def _process_doc(self, doc):
method __normalize_option (line 76) | def __normalize_option(self, doc, option):
method fewshot_examples (line 87) | def fewshot_examples(self, k, rnd):
method doc_to_text (line 96) | def doc_to_text(self, doc):
method should_decontaminate (line 99) | def should_decontaminate(self):
method doc_to_decontamination_query (line 102) | def doc_to_decontamination_query(self, doc):
method partial_context (line 106) | def partial_context(cls, doc, option):
method doc_to_target (line 111) | def doc_to_target(self, doc):
method partial_target (line 115) | def partial_target(cls, doc):
method construct_requests (line 120) | def construct_requests(self, doc, ctx):
method append_context (line 140) | def append_context(cls, ctx, partial_ctx):
method process_results (line 145) | def process_results(self, doc, results):
method aggregation (line 157) | def aggregation(self):
method higher_is_better (line 165) | def higher_is_better(self):
FILE: lm_eval/utils.py
class ExitCodeError (line 13) | class ExitCodeError(Exception):
function sh (line 17) | def sh(x):
function simple_parse_args_string (line 22) | def simple_parse_args_string(args_string):
function join_iters (line 36) | def join_iters(iters):
function chunks (line 41) | def chunks(iter, n):
function group (line 53) | def group(arr, fn):
function general_detokenize (line 62) | def general_detokenize(string):
function get_rolling_token_windows (line 72) | def get_rolling_token_windows(token_list, prefix_token, max_seq_len, con...
function make_disjoint_window (line 113) | def make_disjoint_window(pair):
class Reorderer (line 119) | class Reorderer:
method __init__ (line 120) | def __init__(self, arr, fn):
method get_reordered (line 129) | def get_reordered(self):
method get_original (line 132) | def get_original(self, newarr):
function positional_deprecated (line 146) | def positional_deprecated(fn):
function find_test_root (line 166) | def find_test_root(start_path: pathlib.Path) -> pathlib.Path:
function run_task_tests (line 184) | def run_task_tests(task_list: List[str]):
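
get_rolling_token_windows drives the perplexity tasks: it slices a long token stream into (input, prediction) windows of at most max_seq_len tokens so that every token is predicted exactly once, retaining up to context_len tokens of overlap for conditioning and prefixing the first window with prefix_token (typically the EOT id). A sketch consistent with that contract:

    def get_rolling_token_windows(token_list, prefix_token, max_seq_len, context_len):
        assert 1 <= context_len <= max_seq_len
        if not token_list:
            return
        pred_len = max_seq_len - context_len + 1

        # First window: condition only on prefix_token, predict as much as fits.
        first_len = min(max_seq_len, len(token_list))
        yield [prefix_token] + token_list[:first_len - 1], token_list[:first_len]
        predicted = first_len

        while predicted < len(token_list):
            window_pred_len = min(len(token_list) - predicted, pred_len)
            window_end = predicted + window_pred_len
            yield (
                token_list[window_end - max_seq_len - 1:window_end - 1],
                token_list[window_end - window_pred_len:window_end],
            )
            predicted += window_pred_len

make_disjoint_window then trims each pair so the conditioning tokens and the predicted tokens no longer overlap before the loglikelihood request is issued.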
FILE: main.py
function evaluate (line 54) | def evaluate(lm, args, logger):
function main (line 189) | def main():
FILE: models/IRQLoRALMClass.py
class IRQLoRALMClass (line 23) | class IRQLoRALMClass(BaseLM):
method __init__ (line 24) | def __init__(self, args):
method eot_token (line 62) | def eot_token(self) -> str:
method eot_token_id (line 66) | def eot_token_id(self):
method max_length (line 71) | def max_length(self):
method max_gen_toks (line 79) | def max_gen_toks(self):
method batch_size (line 84) | def batch_size(self):
method device (line 89) | def device(self):
method tok_encode (line 93) | def tok_encode(self, string: str):
method tok_encode_batch (line 96) | def tok_encode_batch(self, strings):
method tok_decode (line 104) | def tok_decode(self, tokens):
method _model_call (line 107) | def _model_call(self, inps):
method model_batched_set (line 118) | def model_batched_set(self, inps):
method _model_generate (line 127) | def _model_generate(self, context, max_length, eos_token_id):
FILE: models/LMClass.py
class LMClass (line 12) | class LMClass(BaseLM):
method __init__ (line 13) | def __init__(self, args):
method eot_token (line 36) | def eot_token(self) -> str:
method eot_token_id (line 40) | def eot_token_id(self):
method max_length (line 45) | def max_length(self):
method max_gen_toks (line 53) | def max_gen_toks(self):
method batch_size (line 58) | def batch_size(self):
method device (line 63) | def device(self):
method tok_encode (line 67) | def tok_encode(self, string: str):
method tok_encode_batch (line 70) | def tok_encode_batch(self, strings):
method tok_decode (line 78) | def tok_decode(self, tokens):
method _model_call (line 81) | def _model_call(self, inps):
method model_batched_set (line 92) | def model_batched_set(self, inps):
method _model_generate (line 101) | def _model_generate(self, context, max_length, eos_token_id):
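
LMClass and IRQLoRALMClass are thin adapters: BaseLM (models/models_utils.py) supplies loglikelihood, loglikelihood_rolling and greedy_until on top of a few primitives, so wrapping a new causal LM only needs tokenizer hooks plus a forward that returns logits. A minimal hypothetical adapter in that shape:

    import torch
    from models.models_utils import BaseLM

    class MinimalLM(BaseLM):  # illustrative; not a class in this repo
        def __init__(self, model, tokenizer, batch_size=1):
            super().__init__()
            self.model, self.tokenizer = model, tokenizer
            self._batch_size = batch_size

        @property
        def eot_token_id(self):
            return self.tokenizer.eos_token_id

        @property
        def max_length(self):
            return self.model.config.max_position_embeddings

        @property
        def max_gen_toks(self):
            return 256

        @property
        def batch_size(self):
            return self._batch_size

        @property
        def device(self):
            return next(self.model.parameters()).device

        def tok_encode(self, string):
            return self.tokenizer.encode(string, add_special_tokens=False)

        def tok_decode(self, tokens):
            return self.tokenizer.decode(tokens)

        def _model_call(self, inps):
            with torch.no_grad():
                return self.model(inps).logits  # (batch, seq, vocab)

        def _model_generate(self, context, max_length, eos_token_id):
            return self.model.generate(
                context, max_length=max_length,
                eos_token_id=eos_token_id, do_sample=False,
            )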
FILE: models/int_falcon_layer.py
class QuantFalconMLP (line 20) | class QuantFalconMLP(nn.Module):
method __init__ (line 21) | def __init__(self, org_module: nn.Module,args=None):
method forward (line 28) | def forward(self, x: torch.Tensor) -> torch.Tensor:
class QuantFalconAttention (line 35) | class QuantFalconAttention(nn.Module):
method __init__ (line 36) | def __init__(self, config: FalconConfig, org_module: nn.Module, args=...
method _split_heads (line 65) | def _split_heads(self, fused_qkv: torch.Tensor) -> Tuple[torch.Tensor,...
method _merge_heads (line 97) | def _merge_heads(self, x: torch.Tensor) -> torch.Tensor:
method forward (line 122) | def forward(
class QuantFalconDecoderLayer (line 239) | class QuantFalconDecoderLayer(nn.Module):
method __init__ (line 240) | def __init__(self, config: FalconConfig,
method forward (line 261) | def forward(
method set_quant_state (line 318) | def set_quant_state(self, weight_quant: bool = False, act_quant: bool ...
method smooth_and_quant_inplace (line 329) | def smooth_and_quant_inplace(self):
method clear_temp_variable (line 338) | def clear_temp_variable(self):
method smooth_and_quant_temporary (line 344) | def smooth_and_quant_temporary(self):
method let_parameters (line 363) | def let_parameters(self, use_shift=True):
method lwc_parameters (line 371) | def lwc_parameters(self):
method omni_parameters (line 378) | def omni_parameters(self, use_shift=True):
method omni_state_dict (line 386) | def omni_state_dict(self, destination=None, prefix='', keep_vars=False):
method register_scales_and_zeros (line 394) | def register_scales_and_zeros(self):
FILE: models/int_llama_layer.py
class QuantLlamaMLP (line 20) | class QuantLlamaMLP(nn.Module):
method __init__ (line 21) | def __init__(
method forward (line 44) | def forward(self, x):
class QuantLlamaAttention (line 48) | class QuantLlamaAttention(nn.Module):
method __init__ (line 51) | def __init__(self,
method _shape (line 100) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
method forward (line 103) | def forward(
method set_quant_state (line 181) | def set_quant_state(self, weight_quant: bool = False, act_quant: bool ...
class QuantLlamaDecoderLayer (line 191) | class QuantLlamaDecoderLayer(nn.Module):
method __init__ (line 192) | def __init__(self,
method forward (line 213) | def forward(
method set_quant_state (line 269) | def set_quant_state(self, weight_quant: bool = False, act_quant: bool ...
method smooth_and_quant_temporary (line 279) | def smooth_and_quant_temporary(self):
method clear_temp_variable (line 309) | def clear_temp_variable(self):
method smooth_and_quant_inplace (line 316) | def smooth_and_quant_inplace(self):
method let_parameters (line 334) | def let_parameters(self, use_shift=True):
method lwc_parameters (line 342) | def lwc_parameters(self):
method omni_parameters (line 349) | def omni_parameters(self, use_shift=True):
method omni_state_dict (line 357) | def omni_state_dict(self, destination=None, prefix='', keep_vars=False):
method register_scales_and_zeros (line 365) | def register_scales_and_zeros(self):
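
set_quant_state on a decoder layer is a broadcast: it records the layer-level flags and flips use_weight_quant/use_act_quant on every QuantLinear and QuantMatMul child, which is how the calibration loop toggles a block between full-precision and fake-quantized forwards. A sketch of the propagation:

    from quant.int_linear import QuantLinear
    from quant.int_matmul import QuantMatMul

    def set_quant_state(self, weight_quant: bool = False, act_quant: bool = False):
        self.use_weight_quant = weight_quant
        self.use_act_quant = act_quant
        # Push the flags down to every quantized submodule.
        for _, module in self.named_modules():
            if isinstance(module, (QuantLinear, QuantMatMul)):
                module.set_quant_state(weight_quant, act_quant)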
FILE: models/int_opt_layer.py
class QuantOPTAttention (line 16) | class QuantOPTAttention(nn.Module):
method __init__ (line 19) | def __init__(
method _shape (line 74) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
method forward (line 81) | def forward(
method set_quant_state (line 215) | def set_quant_state(self, weight_quant: bool = False, act_quant: bool ...
class QuantOPTDecoderLayer (line 230) | class QuantOPTDecoderLayer(nn.Module):
method __init__ (line 231) | def __init__(
method forward (line 268) | def forward(
method set_quant_state (line 348) | def set_quant_state(self, weight_quant: bool = False, act_quant: bool ...
method smooth_and_quant_inplace (line 359) | def smooth_and_quant_inplace(self):
method clear_temp_variable (line 379) | def clear_temp_variable(self):
method smooth_and_quant_temporary (line 385) | def smooth_and_quant_temporary(self):
method let_parameters (line 416) | def let_parameters(self, use_shift=True):
method lwc_parameters (line 424) | def lwc_parameters(self):
method omni_parameters (line 431) | def omni_parameters(self, use_shift=True):
method omni_state_dict (line 439) | def omni_state_dict(self, destination=None, prefix='', keep_vars=False):
method register_scales_and_zeros (line 448) | def register_scales_and_zeros(self):
FILE: models/models_utils.py
class TruncateFunction (line 13) | class TruncateFunction(torch.autograd.Function):
method forward (line 15) | def forward(ctx, input, threshold):
method backward (line 22) | def backward(ctx, grad_output):
function truncate_number (line 26) | def truncate_number(number, threshold=1e-3):
function find_layers (line 30) | def find_layers(module, layers=[nn.Conv2d, nn.Linear, transformers.Conv1...
class CacheHook (line 43) | class CacheHook:
method __init__ (line 44) | def __init__(self, cachinglm):
method add_partial (line 51) | def add_partial(self, attr, req, res):
class LM (line 58) | class LM(abc.ABC):
method __init__ (line 59) | def __init__(self):
method loglikelihood (line 63) | def loglikelihood(self, requests):
method loglikelihood_rolling (line 87) | def loglikelihood_rolling(self, requests):
method greedy_until (line 130) | def greedy_until(self, requests):
method create_from_arg_string (line 148) | def create_from_arg_string(cls, additional_config=None):
method set_cache_hook (line 153) | def set_cache_hook(self, cache_hook):
class BaseLM (line 157) | class BaseLM(LM):
method eot_token_id (line 160) | def eot_token_id(self):
method max_length (line 165) | def max_length(self):
method max_gen_toks (line 170) | def max_gen_toks(self):
method batch_size (line 175) | def batch_size(self):
method device (line 180) | def device(self):
method tok_encode (line 184) | def tok_encode(self, string: str):
method tok_decode (line 188) | def tok_decode(self, tokens: Iterable[int]):
method _model_generate (line 192) | def _model_generate(self, context, max_length, eos_token_id):
method _model_call (line 196) | def _model_call(self, inps):
method loglikelihood (line 209) | def loglikelihood(self, requests):
method loglikelihood_rolling (line 223) | def loglikelihood_rolling(self, requests):
method _loglikelihood_tokens (line 257) | def _loglikelihood_tokens(self, requests, disable_tqdm=False):
method greedy_until (line 434) | def greedy_until(self, requests):
function make_disjoint_window (line 475) | def make_disjoint_window(pair):
function hash_args (line 481) | def hash_args(attr, args):
function simple_parse_args_string (line 486) | def simple_parse_args_string(args_string):
function get_rolling_token_windows (line 503) | def get_rolling_token_windows(token_list, prefix_token, max_seq_len, con...
class Reorderer (line 544) | class Reorderer:
method __init__ (line 545) | def __init__(self, arr, fn):
method get_reordered (line 554) | def get_reordered(self):
method get_original (line 557) | def get_original(self, newarr):
function join_iters (line 571) | def join_iters(iters):
function chunks (line 576) | def chunks(iter, n):
function group (line 588) | def group(arr, fn):
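find_layers above recursively collects every quantizable submodule (Conv2d, Linear, transformers.Conv1D) keyed by its qualified name. The body is not shown in this index, but the widely used GPTQ-style recursion it almost certainly follows looks like this sketch:

import torch.nn as nn

# sketch of the GPTQ-style recursion; the exact layer list is an assumption
def find_layers_sketch(module, layers=(nn.Conv2d, nn.Linear), name=''):
    # map dotted qualified names to submodules whose type is in `layers`
    if isinstance(module, tuple(layers)):
        return {name: module}
    res = {}
    for child_name, child in module.named_children():
        full_name = f"{name}.{child_name}" if name else child_name
        res.update(find_layers_sketch(child, layers=layers, name=full_name))
    return res

Called on a decoder block, this returns a dict such as {"self_attn.q_proj": Linear(...), "mlp.gate_proj": Linear(...)}, which the quantization loop then iterates over.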
FILE: models/transformation.py
class TruncateFunction (line 5) | class TruncateFunction(torch.autograd.Function):
method forward (line 7) | def forward(ctx, input, threshold):
method backward (line 14) | def backward(ctx, grad_output):
function truncate_number (line 18) | def truncate_number(number, threshold=1e-2):
function smooth_ln_fcs_temporary (line 24) | def smooth_ln_fcs_temporary(ln, fcs, scales,shifts):
function smooth_fc_fc_temporary (line 44) | def smooth_fc_fc_temporary(fc1, fc2, scales,shifts=None):
function smooth_q_k_temporary (line 63) | def smooth_q_k_temporary(q_proj, k_proj, scales):
function smooth_ln_fcs_inplace (line 71) | def smooth_ln_fcs_inplace(ln, fcs, scales,shifts):
function smooth_fc_fc_inplace (line 93) | def smooth_fc_fc_inplace(fc1, fc2, scales,shifts=None):
function smooth_q_k_inplace (line 108) | def smooth_q_k_inplace(q_proj, k_proj, scales,):
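TruncateFunction pairs a thresholding forward with a pass-through backward, so tiny smoothing scales can be floored away from zero without killing gradients during LET training. A plausible sketch consistent with the signatures above (forward(ctx, input, threshold), default threshold 1e-2); the exact body is an assumption:

import torch

class TruncateSTE(torch.autograd.Function):
    # sketch of a straight-through truncation, matching the indexed signatures
    @staticmethod
    def forward(ctx, input, threshold):
        # floor magnitudes below `threshold` to +/- threshold so later
        # divisions by the smoothing scales stay numerically safe
        out = input.clone()
        small = out.abs() < threshold
        out[small] = out[small].sign() * threshold
        return out

    @staticmethod
    def backward(ctx, grad_output):
        # straight-through: gradient passes unchanged; threshold gets none
        return grad_output.clone(), None

def truncate_number(number, threshold=1e-2):
    return TruncateSTE.apply(number, threshold)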
FILE: parallel_utils.py
function nvidia_smi_memory_info (line 12) | def nvidia_smi_memory_info():
function get_gpu_memory (line 42) | def get_gpu_memory():
function get_lowest_occupied_gpu (line 59) | def get_lowest_occupied_gpu(wait_memory=1000):
function sort_layers_by_params (line 74) | def sort_layers_by_params(layers: List[nn.Module]):
function get_all_gpu_free_memory (line 80) | def get_all_gpu_free_memory():
function assign_layers_to_gpus (line 89) | def assign_layers_to_gpus(layers: List[nn.Module]):
function forward_hook_wrapper (line 135) | def forward_hook_wrapper(gpu_id):
function add_forward_hooks (line 148) | def add_forward_hooks(layer_gpu_map):
function map_layers_to_multi_gpus (line 159) | def map_layers_to_multi_gpus(layers):
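parallel_utils queries per-GPU memory through nvidia-smi and then assigns decoder layers to the devices with the most free headroom, attaching forward hooks that move each layer's inputs to its assigned GPU. A minimal sketch of the memory query using nvidia-smi's CSV interface (the actual parsing inside nvidia_smi_memory_info may differ):

import subprocess

# sketch; field names in the returned dicts are illustrative, not the repo's
def gpu_memory_info():
    # one dict per GPU: index, used and total memory in MiB
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True)
    info = []
    for line in out.strip().splitlines():
        idx, used, total = (int(v) for v in line.split(","))
        info.append({"index": idx, "used_mib": used, "total_mib": total})
    return info

get_lowest_occupied_gpu presumably selects the index with minimal used memory from this table, waiting until at least wait_memory MiB is free.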
FILE: quant/int_linear.py
class QuantLinear (line 11) | class QuantLinear(nn.Module):
method __init__ (line 16) | def __init__(
method forward (line 48) | def forward(self, input: torch.Tensor):
method set_quant_state (line 67) | def set_quant_state(self, weight_quant: bool = False, act_quant: bool ...
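QuantLinear wraps an ordinary nn.Linear with independent weight and activation quantizers that set_quant_state can toggle, so the same module serves both calibration (FP) and fake-quantized evaluation. A simplified, hypothetical analogue (constructor arguments and the quantizer call convention are assumptions):

import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeQuantLinear(nn.Module):
    # sketch of the fake-quant linear pattern; not the repository's exact class
    def __init__(self, linear: nn.Linear, weight_quantizer, act_quantizer):
        super().__init__()
        self.weight = linear.weight                # shared with the original layer
        self.bias = linear.bias
        self.weight_quantizer = weight_quantizer   # callable: tensor -> tensor
        self.act_quantizer = act_quantizer
        self.use_weight_quant = False
        self.use_act_quant = False

    def set_quant_state(self, weight_quant: bool = False, act_quant: bool = False):
        self.use_weight_quant = weight_quant
        self.use_act_quant = act_quant

    def forward(self, input: torch.Tensor):
        weight = self.weight_quantizer(self.weight) if self.use_weight_quant else self.weight
        if self.use_act_quant:
            input = self.act_quantizer(input)
        return F.linear(input, weight, self.bias)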
FILE: quant/int_matmul.py
class QuantMatMul (line 7) | class QuantMatMul(nn.Module):
method __init__ (line 8) | def __init__(
method set_quant_state (line 27) | def set_quant_state(self, weight_quant: bool = False, act_quant: bool ...
method quant_x1 (line 31) | def quant_x1(self, x1):
method quant_x2 (line 36) | def quant_x2(self, x2):
method forward (line 41) | def forward(self, x1, x2):
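QuantMatMul plays the same role for the two activation-by-activation matmuls inside attention (QK^T and attention-times-V), quantizing either operand before the product. A hypothetical sketch mirroring the quant_x1/quant_x2/forward split shown above:

import torch
import torch.nn as nn

class FakeQuantMatMul(nn.Module):
    # sketch only; operand quantizers and matmul_func default are assumptions
    def __init__(self, x1_quantizer=None, x2_quantizer=None, matmul_func=torch.matmul):
        super().__init__()
        self.x1_quantizer = x1_quantizer
        self.x2_quantizer = x2_quantizer
        self.matmul_func = matmul_func
        self.use_act_quant = False

    def set_quant_state(self, weight_quant: bool = False, act_quant: bool = False):
        # only activations flow through a matmul, so weight_quant is unused
        self.use_act_quant = act_quant

    def quant_x1(self, x1):
        return self.x1_quantizer(x1) if (self.use_act_quant and self.x1_quantizer) else x1

    def quant_x2(self, x2):
        return self.x2_quantizer(x2) if (self.use_act_quant and self.x2_quantizer) else x2

    def forward(self, x1, x2):
        return self.matmul_func(self.quant_x1(x1), self.quant_x2(x2))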
FILE: quant/omni_norm.py
class OmniLayerNorm (line 11) | class OmniLayerNorm(nn.Module):
method __init__ (line 12) | def __init__(self, ori_layer_norm) -> None:
method forward (line 26) | def forward(self, x):
method set_quant_state (line 36) | def set_quant_state(self, use_weight_quant, use_act_quant):
class OmniLlamaRMSNorm (line 40) | class OmniLlamaRMSNorm(nn.Module):
method __init__ (line 41) | def __init__(self, ori_norm, eps=1e-6):
method forward (line 52) | def forward(self, hidden_states):
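OmniLlamaRMSNorm rebuilds LLaMA's RMSNorm around a copied weight so the learnable equivalent transformation can fold channel scales into it. For reference, the standard LLaMA RMSNorm computation such a wrapper would reproduce (assuming the HuggingFace formulation):

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # reference sketch of LLaMA-style RMSNorm, not the wrapper class itself
    def __init__(self, weight: torch.Tensor, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(weight.clone())
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        # normalize by the root-mean-square over the hidden dimension
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states.to(torch.float32) * torch.rsqrt(variance + self.variance_epsilon)
        return (self.weight * hidden_states).to(input_dtype)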
FILE: quant/omniquant.py
function get_named_linears (line 25) | def get_named_linears(module):
function add_new_module (line 29) | def add_new_module(name, original_module, added_module):
function omniquant (line 42) | def omniquant(
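omniquant swaps each floating-point decoder layer for its quantized counterpart; add_new_module does the surgery by walking a dotted module path and re-assigning the leaf. A sketch of that dotted-path replacement (the indexing convention for numeric path segments is an assumption):

import torch.nn as nn

# sketch; e.g. replace_module(model, "model.layers.0.self_attn.q_proj", quant_linear)
def replace_module(root: nn.Module, dotted_name: str, new_module: nn.Module):
    parts = dotted_name.split(".")
    parent = root
    for part in parts[:-1]:
        # numeric segments index into ModuleList containers
        parent = parent[int(part)] if part.isdigit() else getattr(parent, part)
    setattr(parent, parts[-1], new_module)

get_named_linears is then plausibly a one-liner over named_modules, filtering for the QuantLinear type.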
FILE: quant/quantizer.py
function round_ste (line 15) | def round_ste(x: torch.Tensor):
class UniformAffineQuantizer (line 23) | class UniformAffineQuantizer(nn.Module):
method __init__ (line 24) | def __init__(
method change_n_bits (line 85) | def change_n_bits(self, n_bits):
method fake_quant (line 94) | def fake_quant(self, x, scale, round_zero_point):
method forward (line 118) | def forward(self, x: torch.Tensor):
method per_token_dynamic_calibration (line 132) | def per_token_dynamic_calibration(self, x):
method register_scales_and_zeros (line 161) | def register_scales_and_zeros(self):
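round_ste is the straight-through estimator that makes quantization trainable: it rounds in the forward pass but passes the gradient through unchanged. fake_quant then applies the usual asymmetric quantize-dequantize. A sketch consistent with those names (the clipping range and zero-point handling are assumptions):

import torch

def round_ste(x: torch.Tensor):
    # forward: round(x); backward: d/dx == 1 (the rounding is detached)
    return (x.round() - x).detach() + x

# sketch of asymmetric fake quantization; n_bits default is illustrative
def fake_quant(x, scale, round_zero_point, n_bits: int = 4):
    qmin, qmax = 0, 2 ** n_bits - 1
    x_int = round_ste(x / scale)
    if round_zero_point is not None:
        x_int = x_int + round_zero_point
    x_int = x_int.clamp(qmin, qmax)            # quantize to the integer grid
    if round_zero_point is not None:
        x_int = x_int - round_zero_point
    return x_int * scale                       # dequantize back to float

Because the clamp and subtraction are differentiable and the rounding uses the STE, gradients flow back to the LWC bound factors that determine scale and zero point.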
FILE: quant/utils.py
function let_parameters (line 8) | def let_parameters(model, use_shift=True):
function lwc_parameters (line 16) | def lwc_parameters(model):
function get_omni_parameters (line 23) | def get_omni_parameters(model, use_shift=True):
function omni_state_dict (line 31) | def omni_state_dict(model, destination=None, prefix='', keep_vars=False):
function register_scales_and_zeros (line 39) | def register_scales_and_zeros(model):
class TruncateFunction (line 44) | class TruncateFunction(torch.autograd.Function):
method forward (line 46) | def forward(ctx, input, threshold):
method backward (line 53) | def backward(ctx, grad_output):
function truncate_number (line 58) | def truncate_number(number, threshold=1e-2):
function smooth_and_quant_temporary (line 62) | def smooth_and_quant_temporary(model, args, isllama):
function clear_temp_variable (line 103) | def clear_temp_variable(model):
function smooth_and_quant_inplace (line 112) | def smooth_and_quant_inplace(model, args, isllama):
function set_quant_state (line 138) | def set_quant_state(self, weight_quant: bool = False, act_quant: bool = ...
FILE: utils.py
function ampscaler_get_grad_norm (line 12) | def ampscaler_get_grad_norm(parameters, norm_type: float = 2.0) -> torch...
class NativeScalerWithGradNormCount (line 27) | class NativeScalerWithGradNormCount:
method __init__ (line 30) | def __init__(self):
method __call__ (line 33) | def __call__(self, loss, optimizer, clip_grad=None, parameters=None, c...
method state_dict (line 49) | def state_dict(self):
method load_state_dict (line 52) | def load_state_dict(self, state_dict):
function create_logger (line 56) | def create_logger(output_dir, dist_rank=0, name=''):
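ampscaler_get_grad_norm computes a global gradient norm after the AMP scaler has unscaled the gradients, so clipping and logging happen at the true scale. A sketch of the usual implementation of such a helper (it mirrors timm's; the exact body here is not shown):

import torch

# sketch; the repository's helper may special-case the inf norm differently
def grad_norm(parameters, norm_type: float = 2.0) -> torch.Tensor:
    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]
    grads = [p.grad.detach() for p in parameters if p.grad is not None]
    if not grads:
        return torch.tensor(0.0)
    # the p-norm of per-parameter p-norms equals the global p-norm
    return torch.norm(torch.stack([torch.norm(g, norm_type) for g in grads]), norm_type)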
Condensed preview — 133 files, each showing path, character count, and a content snippet.
[
{
"path": ".gitignore",
"chars": 22,
"preview": "*/__pycache__/*\n*cache"
},
{
"path": "README.md",
"chars": 3947,
"preview": "# LLaMA3-Quantization\n\nLLaMA3-Quantization is the official implementation of our paper How Good Are Low-bit Quantized LL"
},
{
"path": "categories.py",
"chars": 2562,
"preview": "subcategories = {\n \"abstract_algebra\": [\"math\"],\n \"anatomy\": [\"health\"],\n \"astronomy\": [\"physics\"],\n \"busine"
},
{
"path": "datautils.py",
"chars": 6690,
"preview": "import pdb\nfrom transformers import AutoTokenizer\nfrom datasets import load_dataset\nimport numpy as np\nimport torch\nimpo"
},
{
"path": "gptq.py",
"chars": 7785,
"preview": "import math\r\nimport time\r\n\r\nimport torch\r\nimport torch.nn as nn\r\nimport transformers\r\nimport quant\r\nfrom texttable impor"
},
{
"path": "irqlora.py",
"chars": 9059,
"preview": "from tqdm import tqdm\nimport peft\nimport torch\nimport operator\nimport numpy as np\nimport bitsandbytes as bnb\nfrom peft.t"
},
{
"path": "llama.py",
"chars": 21095,
"preview": "import argparse\nimport time\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport quant\n\nfrom gptq import GPTQ, O"
},
{
"path": "lm_eval/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/base.py",
"chars": 30994,
"preview": "import abc\nfrom typing import Iterable\nimport numpy as np\nimport random\nimport re\nimport os\nimport json\nimport hashlib\ni"
},
{
"path": "lm_eval/datasets/README.md",
"chars": 1148,
"preview": "# datasets\n\nThis directory contains custom HuggingFace [dataset loading scripts](https://huggingface.co/docs/datasets/da"
},
{
"path": "lm_eval/datasets/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/asdiv/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/asdiv/asdiv.py",
"chars": 4102,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/asdiv/dataset_infos.json",
"chars": 1972,
"preview": "{\"asdiv\": {\"description\": \"ASDiv (Academia Sinica Diverse MWP Dataset) is a diverse (in terms of both language\\npatterns"
},
{
"path": "lm_eval/datasets/coqa/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/coqa/coqa.py",
"chars": 9080,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/coqa/dataset_infos.json",
"chars": 3478,
"preview": "{\"coqa\": {\"description\": \"CoQA is a large-scale dataset for building Conversational Question Answering\\nsystems. The goa"
},
{
"path": "lm_eval/datasets/drop/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/drop/dataset_infos.json",
"chars": 2871,
"preview": "{\"drop\": {\"description\": \"DROP is a QA dataset which tests comprehensive understanding of paragraphs. In \\nthis crowdsou"
},
{
"path": "lm_eval/datasets/drop/drop.py",
"chars": 7469,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/headqa/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/headqa/dataset_infos.json",
"chars": 6091,
"preview": "{\"es\": {\"description\": \"HEAD-QA is a multi-choice HEAlthcare Dataset. The questions come from exams to access a speciali"
},
{
"path": "lm_eval/datasets/headqa/headqa.py",
"chars": 6506,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/hendrycks_ethics/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/hendrycks_ethics/dataset_infos.json",
"chars": 9770,
"preview": "{\"commonsense\": {\"description\": \"The ETHICS dataset is a benchmark that spans concepts in justice, well-being,\\nduties, "
},
{
"path": "lm_eval/datasets/hendrycks_ethics/hendrycks_ethics.py",
"chars": 8975,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/hendrycks_math/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/hendrycks_math/dataset_infos.json",
"chars": 11371,
"preview": "{\"algebra\": {\"description\": \"MATH is a dataset of 12,500 challenging competition mathematics problems. Each\\nproblem in "
},
{
"path": "lm_eval/datasets/hendrycks_math/hendrycks_math.py",
"chars": 3968,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/logiqa/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/logiqa/dataset_infos.json",
"chars": 2305,
"preview": "{\"logiqa\": {\"description\": \"LogiQA is a dataset for testing human logical reasoning. It consists of 8,678 QA\\ninstances,"
},
{
"path": "lm_eval/datasets/logiqa/logiqa.py",
"chars": 4508,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/mutual/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/mutual/dataset_infos.json",
"chars": 3703,
"preview": "{\"mutual\": {\"description\": \"MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, which is\\nmodified fr"
},
{
"path": "lm_eval/datasets/mutual/mutual.py",
"chars": 4963,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/pile/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/pile/dataset_infos.json",
"chars": 40574,
"preview": "{\"pile_arxiv\": {\"description\": \"The Pile is a 825 GiB diverse, open source language modeling data set that consists\\nof "
},
{
"path": "lm_eval/datasets/pile/pile.py",
"chars": 4560,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/quac/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/quac/dataset_infos.json",
"chars": 2027,
"preview": "{\"quac\": {\"description\": \"Question Answering in Context (QuAC) is a dataset for modeling, understanding, and \\nparticipa"
},
{
"path": "lm_eval/datasets/quac/quac.py",
"chars": 4434,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/sat_analogies/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/sat_analogies/sat_analogies.py",
"chars": 4495,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/triviaqa/README.md",
"chars": 770,
"preview": "---\ndataset_info:\n features:\n - name: question_id\n dtype: string\n - name: question_source\n dtype: string\n - na"
},
{
"path": "lm_eval/datasets/triviaqa/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/triviaqa/dataset_infos.json",
"chars": 2561,
"preview": "{\"triviaqa\": {\"description\": \"TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence\\"
},
{
"path": "lm_eval/datasets/triviaqa/triviaqa.py",
"chars": 6278,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/datasets/unscramble/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/datasets/unscramble/dataset_infos.json",
"chars": 11423,
"preview": "{\"mid_word_1_anagrams\": {\"description\": \"Unscramble is a small battery of 5 \\u201ccharacter manipulation\\u201d tasks. Ea"
},
{
"path": "lm_eval/datasets/unscramble/unscramble.py",
"chars": 4419,
"preview": "# Copyright 2020 The HuggingFace Datasets Authors and the current dataset script contributor.\n#\n# Licensed under the Apa"
},
{
"path": "lm_eval/decontamination/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lm_eval/decontamination/archiver.py",
"chars": 5461,
"preview": "import os\nimport zstandard\nimport json\nimport jsonlines\nimport io\nimport datetime\nimport mmap\nimport tqdm\nfrom pathlib i"
},
{
"path": "lm_eval/decontamination/decontaminate.py",
"chars": 6880,
"preview": "import time\nimport random\nimport pickle\nimport json\nimport glob\nimport os\nimport collections\n\nfrom .janitor import Janit"
},
{
"path": "lm_eval/decontamination/janitor.py",
"chars": 12725,
"preview": "import re\nimport string\nimport timeit\nimport pickle\nimport traceback\nfrom pprint import pprint\n\n# This is a cpp module. "
},
{
"path": "lm_eval/evaluator copy.py",
"chars": 12386,
"preview": "import collections\nimport itertools\nimport numpy as np\nimport random\nimport lm_eval.metrics\nimport lm_eval.models\nimport"
},
{
"path": "lm_eval/evaluator.py",
"chars": 12273,
"preview": "import collections\nimport itertools\nimport numpy as np\nimport random\nimport lm_eval.metrics\nimport lm_eval.models\nimport"
},
{
"path": "lm_eval/metrics.py",
"chars": 7758,
"preview": "import math\nfrom collections.abc import Iterable\n\nimport numpy as np\nimport sacrebleu\nimport sklearn.metrics\nimport rand"
},
{
"path": "lm_eval/models/__init__.py",
"chars": 424,
"preview": "from . import gpt2\nfrom . import gpt3\nfrom . import huggingface\nfrom . import textsynth\nfrom . import dummy\n\nMODEL_REGIS"
},
{
"path": "lm_eval/models/dummy.py",
"chars": 697,
"preview": "import random\nfrom lm_eval.base import LM\n\n\nclass DummyLM(LM):\n def __init__(self):\n pass\n\n @classmethod\n "
},
{
"path": "lm_eval/models/gpt2.py",
"chars": 3959,
"preview": "import torch\nimport transformers\nfrom lm_eval.base import BaseLM\n\n\nclass HFLM(BaseLM):\n def __init__(\n self,\n "
},
{
"path": "lm_eval/models/gpt3.py",
"chars": 7276,
"preview": "import os\nimport numpy as np\nimport transformers\nfrom lm_eval.base import BaseLM\nfrom lm_eval import utils\nfrom tqdm imp"
},
{
"path": "lm_eval/models/huggingface.py",
"chars": 26158,
"preview": "import math\nimport torch\nimport torch.nn.functional as F\nimport transformers\nfrom typing import List, Mapping, NewType, "
},
{
"path": "lm_eval/models/textsynth.py",
"chars": 5136,
"preview": "\"\"\" TextSynth API\nImplementation provided by Fabrice Bellard:\n https://github.com/EleutherAI/lm-evaluation-harness/is"
},
{
"path": "lm_eval/quantizer/irqlora.py",
"chars": 9059,
"preview": "from tqdm import tqdm\nimport peft\nimport torch\nimport operator\nimport numpy as np\nimport bitsandbytes as bnb\nfrom peft.t"
},
{
"path": "lm_eval/tasks/__init__.py",
"chars": 15559,
"preview": "from pprint import pprint\nfrom typing import List, Union\n\nimport sacrebleu\nimport lm_eval.base\n\nfrom . import superglue\n"
},
{
"path": "lm_eval/tasks/anli.py",
"chars": 4554,
"preview": "\"\"\"\nAdversarial NLI: A New Benchmark for Natural Language Understanding\nhttps://arxiv.org/pdf/1910.14599.pdf\n\nAdversaria"
},
{
"path": "lm_eval/tasks/arc.py",
"chars": 2651,
"preview": "\"\"\"\nThink you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge\nhttps://arxiv.org/pdf/1803.05457.pdf\n"
},
{
"path": "lm_eval/tasks/arithmetic.py",
"chars": 3277,
"preview": "\"\"\"\nLanguage Models are Few-Shot Learners\nhttps://arxiv.org/pdf/2005.14165.pdf\n\nA small battery of 10 tests that involve"
},
{
"path": "lm_eval/tasks/asdiv.py",
"chars": 2918,
"preview": "\"\"\"\nASDiv: A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers\nhttps://arxiv.org/abs/2106.1"
},
{
"path": "lm_eval/tasks/blimp.py",
"chars": 11218,
"preview": "\"\"\"\nBLiMP: A Benchmark of Linguistic Minimal Pairs for English\nhttps://arxiv.org/abs/1912.00582\n\nBLiMP is a challenge se"
},
{
"path": "lm_eval/tasks/cbt.py",
"chars": 4980,
"preview": "\"\"\"\nThe Children’s Book Test (CBT) from the paper:\nhttps://research.fb.com/wp-content/uploads/2016/11/the_goldilocks_pri"
},
{
"path": "lm_eval/tasks/coqa.py",
"chars": 6173,
"preview": "\"\"\"\nCoQA: A Conversational Question Answering Challenge\nhttps://arxiv.org/pdf/1808.07042.pdf\n\nCoQA is a large-scale data"
},
{
"path": "lm_eval/tasks/crowspairs.py",
"chars": 10077,
"preview": "\"\"\"\nCrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models\nhttps://aclanthology.org/2020"
},
{
"path": "lm_eval/tasks/drop.py",
"chars": 10528,
"preview": "\"\"\"\nDROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs\nhttps://aclanthology.org/attach"
},
{
"path": "lm_eval/tasks/glue.py",
"chars": 17242,
"preview": "\"\"\"\nGLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding\nhttps://openreview.net/pdf?id="
},
{
"path": "lm_eval/tasks/gsm8k.py",
"chars": 4374,
"preview": "\"\"\"\n\"Training Verifiers to Solve Math Word Problems\"\nhttps://arxiv.org/abs/2110.14168\n\nState-of-the-art language models "
},
{
"path": "lm_eval/tasks/headqa.py",
"chars": 2441,
"preview": "\"\"\"\nInterpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering\nhttps://aclant"
},
{
"path": "lm_eval/tasks/hellaswag.py",
"chars": 2554,
"preview": "\"\"\"\nHellaSwag: Can a Machine Really Finish Your Sentence?\nhttps://arxiv.org/pdf/1905.07830.pdf\n\nHellaswag is a commonsen"
},
{
"path": "lm_eval/tasks/hendrycks_ethics.py",
"chars": 12398,
"preview": "\"\"\"\nAligning AI With Shared Human Values\nhttps://arxiv.org/pdf/2008.02275.pdf\n\nThe ETHICS dataset is a benchmark that sp"
},
{
"path": "lm_eval/tasks/hendrycks_math.py",
"chars": 9227,
"preview": "\"\"\"\nMeasuring Mathematical Problem Solving With the MATH Dataset\nhttps://arxiv.org/pdf/2103.03874.pdf\n\nMath is a dataset"
},
{
"path": "lm_eval/tasks/hendrycks_test.py",
"chars": 4900,
"preview": "\"\"\"\nMeasuring Massive Multitask Language Understanding\nhttps://arxiv.org/pdf/2009.03300.pdf\n\nThe Hendryck's Test is a be"
},
{
"path": "lm_eval/tasks/lambada.py",
"chars": 3065,
"preview": "\"\"\"\nThe LAMBADA dataset: Word prediction requiring a broad discourse context∗\nhttps://arxiv.org/pdf/1606.06031.pdf\n\nLAMB"
},
{
"path": "lm_eval/tasks/lambada_cloze.py",
"chars": 1975,
"preview": "\"\"\"\nThe LAMBADA dataset: Word prediction requiring a broad discourse context∗\nhttps://arxiv.org/pdf/1606.06031.pdf\n\nCloz"
},
{
"path": "lm_eval/tasks/lambada_multilingual.py",
"chars": 2114,
"preview": "\"\"\"\nThe LAMBADA (OpenAI) dataset: Word prediction requiring a broad discourse context∗\nhttps://arxiv.org/pdf/1606.06031."
},
{
"path": "lm_eval/tasks/logiqa.py",
"chars": 2726,
"preview": "\"\"\"\nLogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning\nhttps://arxiv.org/pdf/2007.0812"
},
{
"path": "lm_eval/tasks/mathqa.py",
"chars": 2087,
"preview": "\"\"\"\nMathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms\nhttps://arxiv.org/pdf/1905.1"
},
{
"path": "lm_eval/tasks/mc_taco.py",
"chars": 5424,
"preview": "\"\"\"\n“Going on a vacation” takes longer than “Going for a walk”:\nA Study of Temporal Commonsense Understanding\nhttps://ar"
},
{
"path": "lm_eval/tasks/mutual.py",
"chars": 3140,
"preview": "\"\"\"\nMuTual: A Dataset for Multi-Turn Dialogue Reasoning\nhttps://www.aclweb.org/anthology/2020.acl-main.130/\n\nMuTual is a"
},
{
"path": "lm_eval/tasks/naturalqs.py",
"chars": 5381,
"preview": "\"\"\"\nNatural Questions: a Benchmark for Question Answering Research\nhttps://storage.googleapis.com/pub-tools-public-publi"
},
{
"path": "lm_eval/tasks/openbookqa.py",
"chars": 2329,
"preview": "\"\"\"\nCan a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering\nhttps://arxiv.org/pdf/1809.0"
},
{
"path": "lm_eval/tasks/pile.py",
"chars": 3236,
"preview": "\"\"\"\nThe Pile: An 800GB Dataset of Diverse Text for Language Modeling\nhttps://arxiv.org/pdf/2101.00027.pdf\n\nThe Pile is a"
},
{
"path": "lm_eval/tasks/piqa.py",
"chars": 1813,
"preview": "\"\"\"\nPIQA: Reasoning about Physical Commonsense in Natural Language\nhttps://arxiv.org/pdf/1911.11641.pdf\n\nPhysical Intera"
},
{
"path": "lm_eval/tasks/prost.py",
"chars": 2590,
"preview": "\"\"\"\nPROST: Physical Reasoning about Objects Through Space and Time\nhttps://arxiv.org/pdf/2106.03634.pdf\n\nPROST, Physical"
},
{
"path": "lm_eval/tasks/pubmedqa.py",
"chars": 3122,
"preview": "\"\"\"\nPubMedQA: A Dataset for Biomedical Research Question Answering\nhttps://arxiv.org/pdf/1909.06146.pdf\n\nPubMedQA is a n"
},
{
"path": "lm_eval/tasks/qa4mre.py",
"chars": 2374,
"preview": "\"\"\"\nQA4MRE 2011-2013: Overview of Question Answering for Machine Reading Evaluation\nhttps://www.cs.cmu.edu/~./hovy/paper"
},
{
"path": "lm_eval/tasks/qasper.py",
"chars": 7783,
"preview": "\"\"\"\nA Dataset of Information-Seeking Questions and Answers Anchored in Research Papers\nhttps://arxiv.org/abs/2105.03011\n"
},
{
"path": "lm_eval/tasks/quac.py",
"chars": 4047,
"preview": "\"\"\"\nQuAC: Question Answering in Context\nhttps://arxiv.org/abs/1808.07036\n\nQuestion Answering in Context (QuAC) is a data"
},
{
"path": "lm_eval/tasks/race.py",
"chars": 5529,
"preview": "\"\"\"\nRACE: Large-scale ReAding Comprehension Dataset From Examinations\nhttps://arxiv.org/pdf/1704.04683.pdf\n\nRACE is a la"
},
{
"path": "lm_eval/tasks/sat.py",
"chars": 2127,
"preview": "\"\"\"\nSimilarity of Semantic Relations\nhttps://arxiv.org/pdf/cs/0608100.pdf\n\nSAT (Scholastic Aptitude Test) Analogy Questi"
},
{
"path": "lm_eval/tasks/sciq.py",
"chars": 2035,
"preview": "\"\"\"\nCrowdsourcing Multiple Choice Science Questions\nhttps://aclanthology.org/W17-4413.pdf\n\nThe SciQ dataset contains 13,"
},
{
"path": "lm_eval/tasks/squad.py",
"chars": 7981,
"preview": "\"\"\"\nKnow What You Don’t Know: Unanswerable Questions for SQuAD\nhttps://arxiv.org/pdf/1806.03822.pdf\n\nStanford Question A"
},
{
"path": "lm_eval/tasks/storycloze.py",
"chars": 5665,
"preview": "\"\"\"\nA Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories\nhttps://arxiv.org/pdf/1604.01696.pdf\n\n"
},
{
"path": "lm_eval/tasks/superglue.py",
"chars": 14163,
"preview": "\"\"\"\nSuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems\nhttps://w4ngatang.github.io/stati"
},
{
"path": "lm_eval/tasks/swag.py",
"chars": 1916,
"preview": "\"\"\"\nSWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference\nhttps://arxiv.org/pdf/1808.05326.pdf\n\nSWA"
},
{
"path": "lm_eval/tasks/toxigen.py",
"chars": 2367,
"preview": "\"\"\"\nToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection\nhttps://arxiv.or"
},
{
"path": "lm_eval/tasks/translation.py",
"chars": 7833,
"preview": "\"\"\"\nNOTE: This file implements translation tasks using datasets from WMT conferences,\nprovided by sacrebleu. Traditional"
},
{
"path": "lm_eval/tasks/triviaqa.py",
"chars": 2826,
"preview": "\"\"\"\nTriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension\nhttps://arxiv.org/pdf/1705."
},
{
"path": "lm_eval/tasks/truthfulqa.py",
"chars": 15262,
"preview": "\"\"\"\nTruthfulQA: Measuring How Models Mimic Human Falsehoods\nhttps://arxiv.org/pdf/2109.07958.pdf\n\nTruthfulQA is a benchm"
},
{
"path": "lm_eval/tasks/unscramble.py",
"chars": 3048,
"preview": "\"\"\"\nLanguage Models are Few-Shot Learners\nhttps://arxiv.org/pdf/2005.14165.pdf\n\nUnscramble is a small battery of 5 “char"
},
{
"path": "lm_eval/tasks/webqs.py",
"chars": 3038,
"preview": "\"\"\"\nSemantic Parsing on Freebase from Question-Answer Pairs\nhttps://cs.stanford.edu/~pliang/papers/freebase-emnlp2013.pd"
},
{
"path": "lm_eval/tasks/wikitext.py",
"chars": 2924,
"preview": "\"\"\"\nPointer Sentinel Mixture Models\nhttps://arxiv.org/pdf/1609.07843.pdf\n\nThe WikiText language modeling dataset is a co"
},
{
"path": "lm_eval/tasks/winogrande.py",
"chars": 4677,
"preview": "\"\"\"\nWinoGrande: An Adversarial Winograd Schema Challenge at Scale\nhttps://arxiv.org/pdf/1907.10641.pdf\n\nWinoGrande is a "
},
{
"path": "lm_eval/tasks/wsc273.py",
"chars": 6959,
"preview": "\"\"\"\nThe Winograd Schema Challenge\nhttp://commonsensereasoning.org/2011/papers/Levesque.pdf\n\nA Winograd schema is a pair "
},
{
"path": "lm_eval/utils.py",
"chars": 5558,
"preview": "import os\nimport pathlib\nimport re\nimport collections\nimport functools\nimport inspect\nimport sys\nfrom typing import List"
},
{
"path": "main.py",
"chars": 15788,
"preview": "import os\nimport sys\nimport random\nimport numpy as np\nfrom models.LMClass import LMClass\nfrom models.IRQLoRALMClass impo"
},
{
"path": "models/IRQLoRALMClass.py",
"chars": 4373,
"preview": "import transformers\nimport torch\nfrom .models_utils import BaseLM, find_layers\nimport os\nimport torch\nimport torch.nn.fu"
},
{
"path": "models/LMClass.py",
"chars": 3380,
"preview": "import transformers\nimport torch\nfrom .models_utils import BaseLM, find_layers\nfrom transformers import AutoTokenizer, A"
},
{
"path": "models/int_falcon_layer.py",
"chars": 17746,
"preview": "import torch\r\nfrom torch import nn\r\nfrom typing import Optional, Tuple, List\r\nfrom quant.int_linear import QuantLinear\r\n"
},
{
"path": "models/int_llama_layer.py",
"chars": 16353,
"preview": "import torch\r\nfrom torch import nn\r\nfrom typing import Optional, Tuple, List\r\nfrom quant.int_linear import QuantLinear\r\n"
},
{
"path": "models/int_opt_layer.py",
"chars": 18946,
"preview": "import torch\r\nfrom torch import nn\r\nfrom typing import Optional, Tuple, List\r\nfrom quant.int_linear import QuantLinear\r\n"
},
{
"path": "models/models_utils.py",
"chars": 20878,
"preview": "import abc\nimport torch\nimport json\nimport hashlib\nimport collections\nfrom tqdm import tqdm\nfrom typing import Iterable\n"
},
{
"path": "models/transformation.py",
"chars": 3801,
"preview": "\nimport torch\nimport pdb\n\nclass TruncateFunction(torch.autograd.Function):\n @staticmethod\n def forward(ctx, input,"
},
{
"path": "parallel_utils.py",
"chars": 4821,
"preview": "import torch\nimport torch.nn as nn\nfrom typing import List\nfrom functools import partial\nimport subprocess\nimport re\nimp"
},
{
"path": "quant/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "quant/int_linear.py",
"chars": 2166,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom quant.quantizer import UniformAffineQuantizer\n\n\n"
},
{
"path": "quant/int_matmul.py",
"chars": 1232,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom quant.quantizer import UniformAffineQuantizer\n\n\n"
},
{
"path": "quant/omni_norm.py",
"chars": 2052,
"preview": "import torch\nimport torch.nn as nn\n\n\n'''\nModify normalization layer to adapt the training of learnable equivalent transf"
},
{
"path": "quant/omniquant.py",
"chars": 14205,
"preview": "import torch\nimport torch.nn as nn\nfrom models.int_llama_layer import QuantLlamaDecoderLayer\nfrom models.int_opt_layer i"
},
{
"path": "quant/quantizer.py",
"chars": 5651,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom typing import Union\nimport tqdm\nimport numpy as "
},
{
"path": "quant/utils.py",
"chars": 6851,
"preview": "from collections import OrderedDict\nfrom quant.int_linear import QuantLinear\nimport torch\nfrom quant.int_matmul import Q"
},
{
"path": "scripts/eval_fake_ptq.sh",
"chars": 281,
"preview": "# for fake quantization here: AWQ, QuIP, BiLLM, PB-LLM, DB-LLM \nmodel_path='LLMQ/LLaMA-3-8B-BiLLM-1.1bit-fake'\npython ma"
},
{
"path": "scripts/eval_irqlora_commonsenseqa.sh",
"chars": 447,
"preview": "tau_range=0.1\ntau_n=100\nblocksize2=256\n\nCUDA_VISIBLE_DEVICES=0 python main.py \\\n--model /home/inspur/lin/pretrained_mode"
},
{
"path": "utils.py",
"chars": 3052,
"preview": "import torch\n# from torch._six import inf\nfrom math import inf\nimport logging\nfrom termcolor import colored\nimport sys\ni"
}
]