Repository: jlko/semantic_uncertainty
Branch: master
Commit: a8d9aa8cecd5
Files: 20
Total size: 144.0 KB
Directory structure:
gitextract_1q6fc1mb/
├── .gitignore
├── LICENSE
├── README.md
├── environment.yaml
├── environment_export.yaml
├── notebooks/
│ └── example_evaluation.ipynb
└── semantic_uncertainty/
├── analyze_results.py
├── compute_uncertainty_measures.py
├── generate_answers.py
└── uncertainty/
├── __init__.py
├── data/
│ └── data_utils.py
├── models/
│ ├── __init__.py
│ ├── base_model.py
│ └── huggingface_models.py
├── uncertainty_measures/
│ ├── p_ik.py
│ ├── p_true.py
│ └── semantic_entropy.py
└── utils/
├── eval_utils.py
├── openai.py
└── utils.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
figures/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# Slurm
*.out
# VSCODE
.vscode
================================================
FILE: LICENSE
================================================
The Clear BSD License
Copyright (c) 2023
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted (subject to the limitations in the disclaimer
below) provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from this
software without specific prior written permission.
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY
THIS LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
================================================
FILE: README.md
================================================
# Detecting Hallucinations in Large Language Models Using Semantic Entropy
This repository contains the code necessary to reproduce the short-phrase and sentence-length experiments of the Nature submission 'Detecting Hallucinations in Large Language Models Using Semantic Entropy'.
This repository builds on the original, now deprecated codebase for semantic uncertainty at [https://github.com/lorenzkuhn/semantic_uncertainty](https://github.com/lorenzkuhn/semantic_uncertainty).
## System Requirements
We here discuss hardware and software system requirements.
### Hardware Dependencies
Generally speaking, our experiments require modern computer hardware which is suited for usage with large language models (LLMs).
Requirements regarding the system's CPU and RAM size are relatively modest: any reasonably modern system should suffice, e.g. a system with an Intel 10th generation CPU and 16 GB of system memory or better.
More importantly, all our experiments make use of one or more Graphics Processor Units (GPUs) to speed up LLM inference.
Without a GPU, it is not feasible to reproduce our results in a reasonable amount of time.
The particular GPU necessary depends on the choice of LLM: LLMs with more parameters require GPUs with more memory.
For smaller models (7B parameters), desktop GPUs such as the Nvidia TitanRTX (24 GB) are sufficient.
For larger models (13B), GPUs with more memory, such as the Nvidia A100 server GPU, are required.
Our largest models with 70B parameters require the use of two Nvidia A100 GPUs (2x80GB) simultaneously.
One can reduce the precision to float16 or int8 to reduce memory requirements without significantly affecting model predictions and their accuracy.
We use float16 for 70B models by default, and int8 mode can be enabled for any model by suffixing the model name with `-int8`.
### Software Dependencies
Our code relies on Python 3.11 with PyTorch 2.1.
Our systems run the Ubuntu 20.04.6 LTS (GNU/Linux 5.15.0-89-generic x86_64) operating system.
In [environment_export.yaml](environment_export.yaml) we list the exact versions for all Python packages used in our experiments.
We generally advise against trying to install from this exact export of our conda environment.
Please see below for installation instructions.
Although we have not tested this, we would expect our code to be compatible with other operating systems, Python versions, and versions of the Python libraries that we use.
## Installation Guide
To install Python with all necessary dependencies, we recommend the use of conda, and we refer to [https://conda.io/](https://conda.io/) for an installation guide.
After installing conda, you can set up and activate a new conda environment with all required packages by executing the following commands from the root folder of this repository in a shell:
```
conda-env update -f environment.yaml
conda activate semantic_uncertainty
```
The installation should take around 15 minutes.
Our experiments rely on [Weights & Biases (wandb)](https://wandb.ai/) to log and save individual runs.
While wandb will be installed automatically with the above conda script, you may need to log in with your wandb API key upon initial execution.
Our experiments rely on Hugging Face for all LLM models and most of the datasets.
It may be necessary to set the environment variable `HUGGING_FACE_HUB_TOKEN` to the token associated with your Hugging Face account.
Further, it may be necessary to [apply for access](https://huggingface.co/meta-llama) to use the official repository of Meta's LLaMa-2 models.
We further recommend setting the `XDG_CACHE_HOME` environment variable to a directory on a device with sufficient space, as models and datasets will be downloaded to this folder.
Our experiments with sentence-length generation use GPT models from the OpenAI API.
Please set the environment variable `OPENAI_API_KEY` to your OpenAI API key in order to use these models.
Note that OpenAI charges a cost per input token and per generated token.
Costs for reproducing our results vary depending on experiment configuration, but, without any guarantee, should lie somewhere between 5 and 30 USD per run.
For almost all tasks, the dataset is downloaded automatically from the Hugging Face Datasets library upon first execution.
The only exception is BioASQ (task b, BioASQ11, 2023), for which the data needs to be [downloaded](http://participants-area.bioasq.org/datasets) manually and stored at `$SCRATCH_DIR/$USER/semantic_uncertainty/data/bioasq/training11b.json`, where `$SCRATCH_DIR` defaults to `.`.
## Demo
Execute
```
python semantic_uncertainty/generate_answers.py --model_name=Llama-2-7b-chat --dataset=trivia_qa
```
to reproduce results for short-phrase generation with LLaMa-2 Chat (7B) on the BioASQ dataset.
The expected runtime of this demo is 1 hour using an Nvidia A100 GPU (80 GB), 24 cores of a Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz, and 192 GB of RAM.
Runtime may be longer upon first execution, as the LLM needs to be downloaded from Hugging Face first.
To evaluate the run and obtain a barplot similar to those of the paper, open the Jupyter notebook in [notebooks/example_evaluation.ipynb](notebooks/example_evaluation.ipynb), populate the `wandb_id` variable in the first cell with the id assigned to your demo run, and execute all cells of the notebook.
We refer to [https://jupyter.org/](https://jupyter.org/) for more information on how to start the Jupter notebook server.
## Further Instructions
### Repository Structure
We here give an overview of the various components of the code.
By default, a standard run executes the following three scripts in order:
* `generate_answers.py`: Sample responses (and their likelihods/hidden states) from the models for a set of input questions.
* `compute_uncertainty_measures.py`: Compute uncertainty metrics given responses.
* `analyze_results.py`: Compute aggregate performance metrics given uncertainties.
It is possible to run these scripts individually, e.g. when recomputing results, and we are happy to provide guidance on how to do so upon request.
### Reproducing the Experiments
To reproduce the experiments of the paper, one needs to execute
```
python generate_answers.py --model_name=$MODEL --dataset=$DATASET $EXTRA_CFG
```
for all combinations of models and datasets, and where `$EXTRA_CFG` is defined to either activate short-phrase or sentence-length generations and their associated hyperparameters.
Concretely,
* `$MODEL` is one of: `[Llama-2-7b, Llama-2-13b, Llama-2-70b, Llama-2-7b-chat, Llama-2-13b-chat, Llama-2-70b-chat, falcon-7b, falcon-40b, falcon-7b-instruct, falcon-40b-instruct, Mistral-7B-v0.1, Mistral-7B-Instruct-v0.1]`,
* `$DATASET` is one of `[trivia_qa, squad, bioasq, nq, svamp]`,
* and `$EXTRA_CFG=''` is empty for short-phrase generations and `EXTRA_CFG=--num_few_shot=0 --model_max_new_tokens=100 --brief_prompt=chat --metric=llm_gpt-4 --entailment_model=gpt-3.5 --no-compute_accuracy_at_all_temps` for sentence-length generations.
The results for any run can be obtained by passing the associated `wandb_id` to an evaluation notebook identical to the demo in [notebooks/example_evaluation.ipynb](notebooks/example_evaluation.ipynb).
================================================
FILE: environment.yaml
================================================
name: semantic_uncertainty
channels:
- pytorch
- nvidia
- defaults
dependencies:
- python=3.11
- pip
- pytorch
- torchvision
- torchaudio
- pytorch-cuda=11.8
- numpy
- transformers
- evaluate
- datasets
- pip:
- transformers>=4.31
- scikit-learn
- pandas
- flake8
- omegaconf
- jupyterlab
- notebook
- matplotlib
- seaborn
- tqdm
- ipywidgets
- scipy
- wandb
- tokenizers>=0.13.3
- accelerate
- ml_collections
- torchmetrics
- openai
- tiktoken
- einops
- bitsandbytes
- nltk
- tenacity
- sentencepiece
- safetensors
================================================
FILE: environment_export.yaml
================================================
name: semantic_uncertainty_export
channels:
- pytorch
- nvidia
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- abseil-cpp=20211102.0=hd4dd3e8_0
- aiohttp=3.8.5=py311h5eee18b_0
- aiosignal=1.2.0=pyhd3eb1b0_0
- arrow=1.2.3=py311h06a4308_1
- arrow-cpp=11.0.0=h374c478_2
- async-timeout=4.0.2=py311h06a4308_0
- attrs=23.1.0=py311h06a4308_0
- aws-c-common=0.6.8=h5eee18b_1
- aws-c-event-stream=0.1.6=h6a678d5_6
- aws-checksums=0.1.11=h5eee18b_2
- aws-sdk-cpp=1.8.185=h721c034_1
- binaryornot=0.4.4=pyhd3eb1b0_1
- blas=1.0=mkl
- boost-cpp=1.82.0=hdb19cb5_2
- bottleneck=1.3.5=py311hbed6279_0
- brotlipy=0.7.0=py311h5eee18b_1002
- bzip2=1.0.8=h7b6447c_0
- c-ares=1.19.1=h5eee18b_0
- ca-certificates=2023.08.22=h06a4308_0
- certifi=2023.11.17=py311h06a4308_0
- cffi=1.15.1=py311h5eee18b_3
- chardet=4.0.0=py311h06a4308_1003
- charset-normalizer=2.0.4=pyhd3eb1b0_0
- click=8.0.4=py311h06a4308_0
- cookiecutter=1.7.3=pyhd3eb1b0_0
- cryptography=41.0.3=py311hdda0065_0
- cuda-cudart=11.8.89=0
- cuda-cupti=11.8.87=0
- cuda-libraries=11.8.0=0
- cuda-nvrtc=11.8.89=0
- cuda-nvtx=11.8.86=0
- cuda-runtime=11.8.0=0
- datasets=2.12.0=py311h06a4308_0
- dill=0.3.6=py311h06a4308_0
- evaluate=0.4.0=py311h06a4308_0
- ffmpeg=4.3=hf484d3e_0
- filelock=3.9.0=py311h06a4308_0
- freetype=2.12.1=h4a9f257_0
- frozenlist=1.3.3=py311h5eee18b_0
- fsspec=2023.9.2=py311h06a4308_0
- gflags=2.2.2=he6710b0_0
- giflib=5.2.1=h5eee18b_3
- glog=0.5.0=h2531618_0
- gmp=6.2.1=h295c915_3
- gmpy2=2.1.2=py311hc9b5ff0_0
- gnutls=3.6.15=he1e5248_0
- grpc-cpp=1.48.2=he1ff14a_1
- huggingface_hub=0.17.3=py311h06a4308_0
- icu=73.1=h6a678d5_0
- idna=3.4=py311h06a4308_0
- intel-openmp=2023.1.0=hdb19cb5_46305
- jinja2=3.1.2=py311h06a4308_0
- jinja2-time=0.2.0=pyhd3eb1b0_3
- jpeg=9e=h5eee18b_1
- krb5=1.20.1=h143b758_1
- lame=3.100=h7b6447c_0
- lcms2=2.12=h3be6417_0
- ld_impl_linux-64=2.38=h1181459_1
- lerc=3.0=h295c915_0
- libboost=1.82.0=h109eef0_2
- libbrotlicommon=1.0.9=h5eee18b_7
- libbrotlidec=1.0.9=h5eee18b_7
- libbrotlienc=1.0.9=h5eee18b_7
- libcublas=11.11.3.6=0
- libcufft=10.9.0.58=0
- libcufile=1.8.0.34=0
- libcurand=10.3.4.52=0
- libcurl=8.4.0=h251f7ec_0
- libcusolver=11.4.1.48=0
- libcusparse=11.7.5.86=0
- libdeflate=1.17=h5eee18b_1
- libedit=3.1.20221030=h5eee18b_0
- libev=4.33=h7f8727e_1
- libevent=2.1.12=hdbd6064_1
- libffi=3.4.4=h6a678d5_0
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libiconv=1.16=h7f8727e_2
- libidn2=2.3.4=h5eee18b_0
- libjpeg-turbo=2.0.0=h9bf148f_0
- libnghttp2=1.57.0=h2d74bed_0
- libnpp=11.8.0.86=0
- libnvjpeg=11.9.0.86=0
- libpng=1.6.39=h5eee18b_0
- libprotobuf=3.20.3=he621ea3_0
- libssh2=1.10.0=hdbd6064_2
- libstdcxx-ng=11.2.0=h1234567_1
- libtasn1=4.19.0=h5eee18b_0
- libthrift=0.15.0=h1795dd8_2
- libtiff=4.5.1=h6a678d5_0
- libunistring=0.9.10=h27cfd23_0
- libuuid=1.41.5=h5eee18b_0
- libwebp=1.3.2=h11a3e52_0
- libwebp-base=1.3.2=h5eee18b_0
- llvm-openmp=14.0.6=h9e868ea_0
- lz4-c=1.9.4=h6a678d5_0
- markupsafe=2.1.1=py311h5eee18b_0
- mkl=2023.1.0=h213fc3f_46343
- mkl-service=2.4.0=py311h5eee18b_1
- mkl_fft=1.3.8=py311h5eee18b_0
- mkl_random=1.2.4=py311hdb19cb5_0
- mpc=1.1.0=h10f8cd9_1
- mpfr=4.0.2=hb69a4c5_1
- mpmath=1.3.0=py311h06a4308_0
- multidict=6.0.2=py311h5eee18b_0
- multiprocess=0.70.14=py311h06a4308_0
- ncurses=6.4=h6a678d5_0
- nettle=3.7.3=hbbd107a_1
- networkx=3.1=py311h06a4308_0
- numexpr=2.8.7=py311h65dcdc2_0
- numpy=1.26.0=py311h08b1b3b_0
- numpy-base=1.26.0=py311hf175353_0
- openh264=2.1.1=h4ff587b_0
- openjpeg=2.4.0=h3ad879b_0
- openssl=3.0.12=h7f8727e_0
- orc=1.7.4=hb3bc3d3_1
- packaging=23.1=py311h06a4308_0
- pillow=10.0.1=py311ha6cbd5a_0
- pip=23.3.1=py311h06a4308_0
- poyo=0.5.0=pyhd3eb1b0_0
- pyarrow=11.0.0=py311hd8e8d9b_1
- pycparser=2.21=pyhd3eb1b0_0
- pyopenssl=23.2.0=py311h06a4308_0
- pysocks=1.7.1=py311h06a4308_0
- python=3.11.5=h955ad1f_0
- python-dateutil=2.8.2=pyhd3eb1b0_0
- python-slugify=5.0.2=pyhd3eb1b0_0
- python-tzdata=2023.3=pyhd3eb1b0_0
- python-xxhash=2.0.2=py311h5eee18b_1
- pytorch=2.1.1=py3.11_cuda11.8_cudnn8.7.0_0
- pytorch-cuda=11.8=h7e8668a_5
- pytorch-mutex=1.0=cuda
- pytz=2023.3.post1=py311h06a4308_0
- pyyaml=6.0=py311h5eee18b_1
- re2=2022.04.01=h295c915_0
- readline=8.2=h5eee18b_0
- requests=2.31.0=py311h06a4308_0
- responses=0.13.3=pyhd3eb1b0_0
- setuptools=68.0.0=py311h06a4308_0
- six=1.16.0=pyhd3eb1b0_1
- snappy=1.1.9=h295c915_0
- sqlite=3.41.2=h5eee18b_0
- sympy=1.11.1=py311h06a4308_0
- tbb=2021.8.0=hdb19cb5_0
- text-unidecode=1.3=pyhd3eb1b0_0
- tk=8.6.12=h1ccaba5_0
- torchaudio=2.1.1=py311_cu118
- torchtriton=2.1.0=py311
- torchvision=0.16.1=py311_cu118
- typing-extensions=4.7.1=py311h06a4308_0
- typing_extensions=4.7.1=py311h06a4308_0
- tzdata=2023c=h04d1e81_0
- unidecode=1.2.0=pyhd3eb1b0_0
- urllib3=1.26.16=py311h06a4308_0
- utf8proc=2.6.1=h27cfd23_0
- wheel=0.41.2=py311h06a4308_0
- xxhash=0.8.0=h7f8727e_3
- xz=5.4.2=h5eee18b_0
- yaml=0.2.5=h7b6447c_0
- yarl=1.8.1=py311h5eee18b_0
- zlib=1.2.13=h5eee18b_0
- zstd=1.5.5=hc292b87_0
- pip:
- absl-py==2.0.0
- accelerate==0.25.0
- annotated-types==0.6.0
- antlr4-python3-runtime==4.9.3
- anyio==3.7.1
- appdirs==1.4.4
- argon2-cffi==23.1.0
- argon2-cffi-bindings==21.2.0
- asttokens==2.4.0
- async-lru==2.0.4
- babel==2.13.0
- backcall==0.2.0
- beautifulsoup4==4.12.2
- bitsandbytes==0.41.2.post2
- bleach==6.1.0
- comm==0.1.4
- contextlib2==21.6.0
- contourpy==1.1.1
- cycler==0.12.1
- debugpy==1.8.0
- decorator==5.1.1
- defusedxml==0.7.1
- distro==1.8.0
- docker-pycreds==0.4.0
- einops==0.7.0
- executing==2.0.0
- fastjsonschema==2.18.1
- flake8==6.1.0
- fonttools==4.43.1
- fqdn==1.5.1
- gitdb==4.0.11
- gitpython==3.1.40
- h11==0.14.0
- httpcore==1.0.1
- httpx==0.25.1
- ipykernel==6.25.2
- ipython==8.16.1
- ipywidgets==8.1.1
- isoduration==20.11.0
- jedi==0.19.1
- joblib==1.3.2
- json5==0.9.14
- jsonpointer==2.4
- jsonschema==4.19.1
- jsonschema-specifications==2023.7.1
- jupyter-client==8.4.0
- jupyter-core==5.4.0
- jupyter-events==0.8.0
- jupyter-lsp==2.2.0
- jupyter-server==2.8.0
- jupyter-server-terminals==0.4.4
- jupyterlab==4.0.9
- jupyterlab-pygments==0.2.2
- jupyterlab-server==2.25.0
- jupyterlab-widgets==3.0.9
- kiwisolver==1.4.5
- lightning-utilities==0.9.0
- matplotlib==3.8.2
- matplotlib-inline==0.1.6
- mccabe==0.7.0
- mistune==3.0.2
- ml-collections==0.1.1
- nbclient==0.8.0
- nbconvert==7.9.2
- nbformat==5.9.2
- nest-asyncio==1.5.8
- nltk==3.8.1
- notebook==7.0.6
- notebook-shim==0.2.3
- omegaconf==2.3.0
- openai==1.3.7
- overrides==7.4.0
- pandas==2.1.3
- pandocfilters==1.5.0
- parso==0.8.3
- pathtools==0.1.2
- pexpect==4.8.0
- pickleshare==0.7.5
- platformdirs==3.11.0
- prometheus-client==0.17.1
- prompt-toolkit==3.0.39
- protobuf==4.24.4
- psutil==5.9.6
- ptyprocess==0.7.0
- pure-eval==0.2.2
- pycodestyle==2.11.1
- pydantic==2.4.2
- pydantic-core==2.10.1
- pyflakes==3.1.0
- pygments==2.16.1
- pyparsing==3.1.1
- python-json-logger==2.0.7
- pyzmq==25.1.1
- referencing==0.30.2
- regex==2023.10.3
- rfc3339-validator==0.1.4
- rfc3986-validator==0.1.1
- rpds-py==0.10.6
- safetensors==0.4.1
- scikit-learn==1.3.2
- scipy==1.11.4
- seaborn==0.13.0
- send2trash==1.8.2
- sentencepiece==0.1.99
- sentry-sdk==1.32.0
- setproctitle==1.3.3
- smmap==5.0.1
- sniffio==1.3.0
- soupsieve==2.5
- stack-data==0.6.3
- tenacity==8.2.3
- terminado==0.17.1
- threadpoolctl==3.2.0
- tiktoken==0.5.2
- tinycss2==1.2.1
- tokenizers==0.15.0
- torchmetrics==1.2.1
- tornado==6.3.3
- tqdm==4.66.1
- traitlets==5.11.2
- transformers==4.35.2
- uri-template==1.3.0
- wandb==0.16.0
- wcwidth==0.2.8
- webcolors==1.13
- webencodings==0.5.1
- websocket-client==1.6.4
- widgetsnbextension==4.0.9
================================================
FILE: notebooks/example_evaluation.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "e9698d3d-f63c-4d8c-af6e-4e9996b3fe28",
"metadata": {},
"outputs": [],
"source": [
"# Fill in the wandb_id assigned to your demo run!\n",
"\n",
"wandb_id = 'YOUR_ID'\n",
"if wandb_id == 'YOUR_ID':\n",
" raise ValueError('Need to provide wandb_id of demo run!')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0ff8faff-38fa-49cb-be73-624dc88fcd13",
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"import os\n",
"import json\n",
"import wandb\n",
"import pandas as pd\n",
"from matplotlib import pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "32bfe743-5d8c-4ae6-804d-c5800a4209f1",
"metadata": {},
"outputs": [],
"source": [
"# Helper Functions\n",
"\n",
"def restore_file(wandb_id, filename='wandb-summary.json'):\n",
" files_dir = 'notebooks/restored_files' \n",
" os.system(f'mkdir -p {files_dir}')\n",
"\n",
" api = wandb.Api()\n",
" run = api.run(f'semantic_uncertainty/{wandb_id}')\n",
"\n",
" path = f'{files_dir}/{filename}'\n",
" os.system(f'rm -rf {path}')\n",
" run.file(filename).download(root=files_dir, replace=True, exist_ok=False)\n",
" with open(path, 'r') as f:\n",
" out = json.load(f)\n",
" return out\n",
"\n",
"def get_uncertainty_df(metrics):\n",
" data = []\n",
" for method in metrics['uncertainty']:\n",
" for metric in metrics['uncertainty'][method]:\n",
" mean = metrics['uncertainty'][method][metric]['mean']\n",
" data.append([method, metric, mean])\n",
" df = pd.DataFrame(data, columns=['method', 'metric', 'means'])\n",
" main_methods = ['semantic_entropy', 'cluster_assignment_entropy', 'regular_entropy', 'p_false', 'p_ik']\n",
" df = df.set_index('method').loc[main_methods].reset_index()\n",
" main_names = ['Semantic entropy', 'Discrete Semantic Entropy', 'Naive Entropy', 'p(True)', 'Embedding Regression']\n",
" conversion = dict(zip(main_methods, main_names))\n",
" df['method'] = df.method.map(lambda x: conversion[x])\n",
" return df"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "56409203-8734-4b9e-8618-638a26be7ccb",
"metadata": {},
"outputs": [],
"source": [
"results = restore_file(wandb_id)\n",
"unc_df = get_uncertainty_df(results)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6acb4184-0c9a-4ec5-b860-e050bf7544a2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.6, 0.8)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkgAAAJhCAYAAAC6vU9RAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAABsVElEQVR4nO3deXhMZ/8/8PdksklkI7IKidorsQQR1N7YvlSpnRBLlVinWkmL2CqqT1Hl4UFsbYny0FoTxE5IxRolCBFLFpEmkaSyzfz+8DNP50xCtJk5k5z367rmusx97jnzmbmb5p1z7nMfmUqlUoGIiIiI1IzELoCIiIjI0DAgEREREQkwIBEREREJMCARERERCTAgEREREQkwIBEREREJMCARERERCTAgEREREQkwIBEREREJMCARERERCYgekFavXg13d3eYm5vDx8cHMTExr+2/YsUKNGjQAFWqVIGbmxtmzJiBFy9evNU+X7x4gcDAQFSvXh1Vq1bFgAEDkJqaWu6fjYiIiComUQPSjh07oFAoEBISgkuXLqFp06bo3r070tLSSuy/bds2BAUFISQkBDdv3kRYWBh27NiBL7744q32OWPGDOzbtw87d+7EyZMn8eTJE/Tv31/nn5eIiIgqBpmYN6v18fFBq1atsGrVKgCAUqmEm5sbpkyZgqCgIK3+kydPxs2bNxEVFaVu+/TTT3HhwgWcOXOmTPvMyspCjRo1sG3bNnz00UcAgFu3bqFRo0aIjo5GmzZtdP2xiYiIyMAZi/XGBQUFiI2NRXBwsLrNyMgI3bp1Q3R0dImvadu2LX788UfExMSgdevWuHfvHg4ePIiRI0eWeZ+xsbEoLCxEt27d1H0aNmyIWrVqvTYg5efnIz8/X/1cqVQiIyMD1atXh0wm+/tfBBEREemNSqXC8+fP4eLiAiOj0k+kiRaQ0tPTUVxcDEdHR412R0dH3Lp1q8TXDBs2DOnp6Wjfvj1UKhWKiorwySefqE+xlWWfKSkpMDU1ha2trVaflJSUUusNDQ3F/Pnz3/ZjEhERkQF6+PAhatasWep20QLS33HixAksXrwY//73v+Hj44O7d+9i2rRpWLhwIebMmaPT9w4ODoZCoVA/z8rKQq1atXD//n1YWVnp9L2JiIiofDx//hweHh5v/N0tWkCyt7eHXC7XunosNTUVTk5OJb5mzpw5GDlyJMaNGwcA8PT0RG5uLj7++GN8+eWXZdqnk5MTCgoKkJmZqXEU6XXvCwBmZmYwMzPTaq9WrRqsra3L9JmJiIhIXCYmJgDwxukxol3FZmpqCm9vb40J10qlElFRUfD19S3xNXl5eVrnC+VyOYCX5xTLsk9vb2+YmJho9ImPj0dSUlKp70tERETSIuopNoVCgVGjRqFly5Zo3bo1VqxYgdzcXAQEBAAA/P394erqitDQUABAnz59sGzZMjRv3lx9im3OnDno06ePOii9aZ82NjYYO3YsFAqF+ujPlClT4OvryyvYiIiICIDIAWnw4MF4+vQp5s6di5SUFDRr1gwRERHqSdZJSUkaR4xmz54NmUyG2bNn4/Hjx6hRowb69OmDr776qsz7BIDly5fDyMgIAwYMQH5+Prp3745///vf+vvgREREZNBEXQepIsvOzoaNjQ2ysrI4B4mIiLS8utq6uLhY7FIkRS6Xw9jYuNQ5RmX9/V2hrmIjIiKqCAoKCpCcnIy8vDyxS5EkCwsLODs7w9TU9G/vgwGJiIioHCmVSty/fx9yuRwuLi4wNTXlgsJ6olKpUFBQgKdPn+L+/fuoV6/eaxeDfB0GJCIionJUUFCgvs2VhYWF2OVITpUqVWBiYoIHDx6goKAA5ubmf2s/ot6sloiIqLL6u0cu6J8rj++eo0dEREQkwIBEREREJMCARERERCTASdpERER64h50QK/vl7ikt17frzLhESQiIiIiAQYkIiIiAgB06tQJU6ZMwfTp02FnZwdHR0esX79efU9TKysr1K1bF4cOHVK/Ji4uDj179kTVqlXh6OiIkSNHIj09Xb09IiIC7du3h62tLapXr47/+7//Q0JCgnp7YmIiZDIZdu/ejc6dO8PCwgJNmzZFdHS0us+DBw/Qp08f2NnZwdLSEu+++y4OHjyo0++CAYmIiIjUtmzZAnt7e8TExGDKlCmYOHEiBg4ciLZt2+LSpUvw8/PDyJEjkZeXh8zMTHTp0gXNmzfHxYsXERERgdTUVAwaNEi9v9zcXCgUCly8eBFRUVEwMjLChx9+CKVSqfG+X375JWbOnIkrV66gfv36GDp0KIqKigAAgYGByM/Px6lTp3D9+nV8/fXXqFq1qk6/B96L7W/ivdiIiKgkL168wP379+Hh4aG1SKGhz0Hq1KkTiouLcfr0aQBAcXExbGxs0L9/f2zduhUAkJKSAmdnZ0RHR+Po0aM4ffo0IiMj1ft49OgR3NzcEB8fj/r162u9R3p6OmrUqIHr16+jSZMmSExMhIeHBzZs2ICxY8cCAH7//Xe8++67uHnzJho2bAgvLy8MGDAAISEhZfocrxuDsv7+5hEkIiIiUvPy8lL/Wy6Xo3r16vD09FS3OTo6AgDS0tJw9epVHD9+HFWrVlU/GjZsCADq02h37tzB0KFDUadOHVhbW8Pd3R0AkJSUVOr7Ojs7q98DAKZOnYpFixahXbt2CAkJwbVr18r5U2tjQCIiIiI1ExMTjecymUyj7dV95ZRKJXJyctCnTx9cuXJF43Hnzh106NABANCnTx9kZGRg/fr1uHDhAi5cuADg5S1ZSnvfv74HAIwbNw737t3DyJEjcf36dbRs2RLff/99OX9yTbzMn4iIiP6WFi1a4L///S/c3d1hbKwdKZ49e4b4+HisX78e7733HgDgzJkzf+u93Nzc8Mknn+CTTz5BcHAw1q9fjylTpvyj+l+HR5CIiIjobwkMDERGRgaGDh2K3377DQkJCYiMjERAQACKi4thZ2eH6tWrY926dbh79y6OHTsGhULx1u8zffp0REZG4v79+7h06RKOHz+ORo0a6eAT/Q+PIBEREelJZVu40cXFBWfPnsWsWbPg5+eH/Px81K5dGz169ICRkRFkMhnCw8MxdepUNGnSBA0aNMDKlSvRqVOnt3qf4uJiBAYG4tGjR7C2tkaPHj2wfPly3Xyo/49Xsf1NvIqNiIhK8rorqEg/eBUbERERkQ4wIBEREREJMCARERERCTAgEREREQkwIBEREekAr4EST3l89wxIRERE5ejVitB5eXkiVyJdr7574argb4PrIBEREZUjuVwOW1tb9X3ELCws1LfOIN1SqVTIy8tDWloabG1tIZfL//a+GJCIiIjKmZOTE4D/3WyV9MvW1lY9Bn8XAxIREVE5k8lkcHZ2hoODAwoLC8UuR1JMTEz+0ZGjVxiQiIiIdEQul5fLL2vSP07SJiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhIwiIC0evVquLu7w9zcHD4+PoiJiSm1b6dOnSCTybQevXv3VvcpabtMJsM333yj7uPu7q61fcmSJTr9nERERFQxiL5Q5I4dO6BQKLB27Vr4+PhgxYoV6N69O+Lj4+Hg4KDVf/fu3SgoKFA/f/bsGZo2bYqBAweq25KTkzVec+jQIYwdOxYDBgzQaF+wYAHGjx+vfm5lZVVeH4uIiIgqMNED0rJlyzB+/HgEBAQAANauXYsDBw5g48aNCAoK0upfrVo1jefh4eGwsLDQCEjC+6/8+uuv6Ny5M+rUqaPRbmVl9Y/v1UJERESVj6gBqaCgALGxsQgODla3GRkZoVu3boiOji7TPsLCwjBkyBBYWlqWuD01NRUHDhzAli1btLYtWbIECxcuRK1atTBs2DDMmDEDxsYlfyX5+fnIz89XP8/OzgYAFBYW8j47REREFURZf2eLGpDS09NRXFwMR0dHjXZHR0fcunXrja+PiYlBXFwcwsLCSu2zZcsWWFlZoX///hrtU6dORYsWLVCtWjWcO3cOwcHBSE5OxrJly0rcT2hoKObPn6/VfvjwYVhYWLyxViIiIhJfXl5emfqJfortnwgLC4Onpydat25dap+NGzdi+PDhMDc312hXKBTqf3t5ecHU1BQTJkxAaGgozMzMtPYTHBys8Zrs7Gy4ubnBz88P1tbW5fBpiIiISNdenQF6E1EDkr29PeRyOVJTUzXaU1NT3zg3KDc3F+Hh4ViwYEGpfU6fPo34+Hjs2LHjjbX4+PigqKgIiYmJaNCggdZ2MzOzEoOTiYkJTExM3rh/IiIiEl9Zf2eLepm/qakpvL29ERUVpW5TKpWIioqCr6/va1+7c+dO5OfnY8SIEaX2CQsLg7e3N5o2bfrGWq5cuQIjI6MSr5wjIiIiaRH9FJtCocCoUaPQsmVLtG7dGitWrEBubq76qjZ/f3+4uroiNDRU43VhYWHo168fqlevXuJ+s7OzsXPnTnz77bda26Kjo3HhwgV07twZVlZWiI6OxowZMzBixAjY2dmV/4ckIiKiCkX0gDR48GA8ffoUc+fORUpKCpo1a4aIiAj1xO2kpCQYGWke6IqPj8eZM2dw+PDhUvcbHh4OlUqFoUOHam0zMzNDeHg45s2bh/z8fHh4eGDGjBkac4yIiIhIumQqlUoldhEVUXZ2NmxsbJCVlcVJ2kRERBVEWX9/G8StRoiIiIgMCQMSERERkQADEhEREZEAAxIRERGRAAMSERERkQADEhEREZEAAxIRERGRAAMSERERkQADEhEREZEAAxIRERGRgOj3YqPXcw86IHYJ/1jikt5il0BERPRWeASJiIiISIABiYiIiEiAAYmIiIhIgAGJiIiISIABiYiIiEiAAYmIiIhIgJf5E5URl1wgIpIOHkEiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhIwiIC0evVquLu7w9zcHD4+PoiJiSm1b6dOnSCTybQevXv3VvcZPXq01vYePXpo7CcjIwPDhw+HtbU1bG1tMXbsWOTk5OjsMxIREVHFIXpA2rFjBxQKBUJCQnDp0iU0bdoU3bt3R1paWon9d+/ejeTkZPUjLi4OcrkcAwcO1OjXo0cPjX7bt2/X2D58+HDcuHEDR44cwf79+3Hq1Cl8/PHHOvucREREVHGIHpCWLVuG8ePHIyAgAI0bN8batWthYWGBjRs3lti/WrVqcHJyUj+OHDkCCwsLrYBkZmam0c/Ozk697ebNm4iIiMCGDRvg4+OD9u3b4/vvv0d4eDiePHmi089LREREhs9YzDcvKChAbGwsgoOD1W1GRkbo1q0boqOjy7SPsLAwDBkyBJaWlhrtJ06cgIODA+zs7NClSxcsWrQI1atXBwBER0fD1tYWLVu2VPfv1q0bjIyMcOHCBXz44Yda75Ofn4/8/Hz18+zsbABAYWEhCgsLy/6h35KZXKWzfeuLLr8ffeJYEBFVfGX9/6CoASk9PR3FxcVwdHTUaHd0dMStW7fe+PqYmBjExcUhLCxMo71Hjx7o378/PDw8kJCQgC+++AI9e/ZEdHQ05HI5UlJS4ODgoPEaY2NjVKtWDSkpKSW+V2hoKObPn6/VfvjwYVhYWLyx1r9raWud7VpvDh48KHYJ5YJjQURU8eXl5ZWpn6gB6Z8KCwuDp6cnWrfW/M01ZMgQ9b89PT3h5eWFd955BydOnEDXrl3/1nsFBwdDoVCon2dnZ8PNzQ1+fn6wtrb+ex+gDJrMi9TZvvUlbl53sUsoFxwLIqKK79UZoDcRNSDZ29tDLpcjNTVVoz01NRVOTk6vfW1ubi7Cw8OxYMGCN75PnTp1YG9vj7t376Jr165wcnLSmgReVFSEjIyMUt/XzMwMZmZmWu0mJiYwMTF5Yw1/V36xTGf71hddfj/6xLEgIqr4yvr/QVEnaZuamsLb2xtRUVHqNqVSiaioKPj6+r72tTt37kR+fj5GjBjxxvd59OgRnj17BmdnZwCAr68vMjMzERsbq+5z7NgxKJVK+Pj4/M1PQ0RERJWF6FexKRQKrF+/Hlu2bMHNmzcxceJE5ObmIiAgAADg7++vMYn7lbCwMPTr10898fqVnJwcfPbZZzh//jwSExMRFRWFDz74AHXr1kX37i9PLzRq1Ag9evTA+PHjERMTg7Nnz2Ly5MkYMmQIXFxcdP+hiYiIyKCJPgdp8ODBePr0KebOnYuUlBQ0a9YMERER6onbSUlJMDLSzHHx8fE4c+YMDh8+rLU/uVyOa9euYcuWLcjMzISLiwv8/PywcOFCjVNkP/30EyZPnoyuXbvCyMgIAwYMwMqVK3X7YYmIiKhCkKlUqop/7bIIsrOzYWNjg6ysLJ1O0nYPOqCzfetL4pLeb+5UAXAsiIgqvrL+/hb9FBsRERGRoWFAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISMBa7ACKit+UedEDsEspF4pLeYpdARKXgESQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgGDCEirV6+Gu7s7zM3N4ePjg5iYmFL7durUCTKZTOvRu/fLq0EKCwsxa9YseHp6wtLSEi4uLvD398eTJ0809uPu7q61jyVLluj0cxIREVHFIHpA2rFjBxQKBUJCQnDp0iU0bdoU3bt3R1paWon9d+/ejeTkZPUjLi4OcrkcAwcOBADk5eXh0qVLmDNnDi5duoTdu3cjPj4effv21drXggULNPY1ZcoUnX5WIiIiqhhEXwdp2bJlGD9+PAICAgAAa9euxYEDB7Bx40YEBQVp9a9WrZrG8/DwcFhYWKgDko2NDY4cOaLRZ9WqVWjdujWSkpJQq1YtdbuVlRWcnJzKVGd+fj7y8/PVz7OzswG8PGJVWFhYpn38HWZylc72rS+6/H70iWNhOCrDWACVZzyIKpKy/tzJVCqVaP+nKSgogIWFBXbt2oV+/fqp20eNGoXMzEz8+uuvb9yHp6cnfH19sW7dulL7HD16FH5+fsjMzIS1tTWAl6fYXrx4gcLCQtSqVQvDhg3DjBkzYGxccmacN28e5s+fr9W+bds2WFhYvLFOIiIiEl9eXh6GDRuGrKwsdSYoiahHkNLT01FcXAxHR0eNdkdHR9y6deuNr4+JiUFcXBzCwsJK7fPixQvMmjULQ4cO1fgipk6dihYtWqBatWo4d+4cgoODkZycjGXLlpW4n+DgYCgUCvXz7OxsuLm5wc/P77Vf8D/VZF6kzvatL3HzuotdQrngWBiOyjAWQOUZD6KK5NUZoDcR/RTbPxEWFgZPT0+0bt26xO2FhYUYNGgQVCoV1qxZo7Htr2HHy8sLpqammDBhAkJDQ2FmZqa1LzMzsxLbTUxMYGJi8g8/Senyi2U627e+6PL70SeOheGoDGMBVJ7xIKpIyvpzJ+okbXt7e8jlcqSmpmq0p6amvnFuUG5uLsLDwzF27NgSt78KRw8ePMCRI0feeJTHx8cHRUVFSExMfKvPQERERJWPqAHJ1NQU3t7eiIqKUrcplUpERUXB19f3ta/duXMn8vPzMWLECK1tr8LRnTt3cPToUVSvXv2NtVy5cgVGRkZwcHB4+w9CRERElYrop9gUCgVGjRqFli1bonXr1lixYgVyc3PVV7X5+/vD1dUVoaGhGq8LCwtDv379tMJPYWEhPvroI1y6dAn79+9HcXExUlJSALy8As7U1BTR0dG4cOECOnfuDCsrK0RHR2PGjBkYMWIE7Ozs9PPBiYiIyGCJHpAGDx6Mp0+fYu7cuUhJSUGzZs0QERGhnridlJQEIyPNA13x8fE4c+YMDh8+rLW/x48fY+/evQCAZs2aaWw7fvw4OnXqBDMzM4SHh2PevHnIz8+Hh4cHZsyYoTEviYiIiKRL9IAEAJMnT8bkyZNL3HbixAmttgYNGqC01Qnc3d1L3fZKixYtcP78+beuk4iIiKRB9JW0iYiIiAwNAxIRERGRAAMSERERkQADEhEREZEAAxIRERGRAAMSERERkQADEhEREZEAAxIRERGRAAMSERERkQADEhEREZEAAxIRERGRAAMSERERkQADEhEREZEAAxIRERGRAAMSERERkQADEhEREZHAWwWk7OxsKJVKrfbi4mJkZ2eXW1FEREREYipzQNqzZw9atmyJFy9eaG178eIFWrVqhX379pVrcURERERiKHNAWrNmDT7//HNYWFhobbO0tMSsWbOwatWqci2OiIiISAxlDkhxcXHo1KlTqds7dOiA69evl0dNRERERKIqc0D6448/UFRUVOr2wsJC/PHHH+VSFBEREZGYyhyQ3N3dcfHixVK3X7x4EbVr1y6XooiIiIjEVOaA1L9/f3z55ZdITU3V2paSkoLZs2djwIAB5VocERERkRiMy9oxKCgIv/76K+rVq4cRI0agQYMGAIBbt27hp59+gpubG4KCgnRWKBEREZG+lDkgWVlZ4ezZswgODsaOHTvU841sbW0xYsQIfPXVV7CystJZoURERET6UuaABAA2Njb497//jdWrVyM9PR0qlQo1atSATCbTVX1EREREevdWAemV69ev4/bt2wCABg0awNPTs1yLIiIiIhLTWwWkmJgYjB07Fr///jtUKhUAQCaT4d1330VYWBhatWqlkyKJiIiI9KnMV7H9/vvv6Nq1K6pUqYIff/wRly5dwqVLl/DDDz/AzMwMXbt2xe+//67LWomIiIj0osxHkObNm4f3338f//3vfzXmHDVr1gxDhw5F//79MW/ePPz88886KZSIiIhIX8ockI4fP45Dhw6VOCFbJpPhiy++QK9evcq1OCIiIiIxlPkU2/Pnz+Ho6FjqdicnJzx//rxciiIiIiISU5kDUu3atRETE1Pq9gsXLvBWI0RERFQplDkgDRkyBAqFAnFxcVrbrl+/jpkzZ2Lw4MHlWhwRERGRGMo8Byk4OBhHjx5Fs2bN8P7776NRo0ZQqVS4efMmjh49itatW+OLL77QZa1EREREelHmI0jm5uY4fvw4vvrqKyQnJ2Pt2rX4z3/+g5SUFCxatAjHjx+Hubn53ypi9erVcHd3h7m5OXx8fF57Kq9Tp06QyWRaj969e6v7qFQqzJ07F87OzqhSpQq6deuGO3fuaOwnIyMDw4cPh7W1NWxtbTF27Fjk5OT8rfqJiIiocilzQAIAU1NTzJo1C1euXEFeXh7y8vJw5coVBAUFwczM7G8VsGPHDigUCoSEhODSpUto2rQpunfvjrS0tBL77969G8nJyepHXFwc5HI5Bg4cqO6zdOlSrFy5EmvXrsWFCxdgaWmJ7t2748WLF+o+w4cPx40bN3DkyBHs378fp06dwscff/y3PgMRERFVLm8VkF4nOTkZkydPfuvXLVu2DOPHj0dAQAAaN26MtWvXwsLCAhs3biyxf7Vq1eDk5KR+HDlyBBYWFuqApFKpsGLFCsyePRsffPABvLy8sHXrVjx58gS//PILAODmzZuIiIjAhg0b4OPjg/bt2+P7779HeHg4njx58re/AyIiIqoc3upWIzdu3MDx48dhamqKQYMGwdbWFunp6Vi0aBH+85//oE6dOm/15gUFBYiNjUVwcLC6zcjICN26dUN0dHSZ9hEWFoYhQ4bA0tISAHD//n2kpKSgW7du6j42Njbw8fFBdHQ0hgwZgujoaNja2qJly5bqPt26dYORkREuXLiADz/8UOt98vPzkZ+fr36enZ0NACgsLERhYeFbfe63YSZX6Wzf+qLL70efOBaGozKMBVB5xoOoIinrz12ZA9LevXvx0UcfoaioCMDL01jr16/HoEGD4O3tjT179qBHjx5vVWR6ejqKi4u11ldydHTErVu33vj6mJgYxMXFISwsTN2WkpKi3odwn6+2paSkwMHBQWO7sbExqlWrpu4jFBoaivnz52u1Hz58GBYWFm+s9e9a2lpnu9abgwcPil1CueBYGI7KMBZA5RkPoookLy+vTP3KHJAWLVqEwMBALFy4EBs2bIBCocDUqVNx8OBB0W5SGxYWBk9PT7Rurfv/WwYHB0OhUKifZ2dnw83NDX5+frC2ttbZ+zaZF6mzfetL3LzuYpdQLjgWhqMyjAVQecaDqCJ5dQboTcockOLj47Ft2zZUrVoVU6ZMwcyZM7F8+fJ/FI7s7e0hl8uRmpqq0Z6amgonJ6fXvjY3Nxfh4eFYsGCBRvur16WmpsLZ2Vljn82aNVP3EU4CLyoqQkZGRqnva2ZmVuJEdBMTE5iYmLy21n8iv1j71i4VjS6/H33iWBiOyjAWQOUZD6KKpKw/d291q5FXR0rkcjmqVKny1nOOhExNTeHt7Y2oqCh1m1KpRFRUFHx9fV/72p07dyI/Px8jRozQaPfw8ICTk5PGPrOzs3HhwgX1Pn19fZGZmYnY2Fh1n2PHjkGpVMLHx+cffSYiIiKq+N5qknZkZCRsbGwA/C/ICFfW7tu371sVoFAoMGrUKLRs2RKtW7fGihUrkJubi4CAAACAv78/XF1dERoaqvG6sLAw9OvXD9WrV9dol8lkmD59OhYtWoR69erBw8MDc+bMgYuLC/r16wcAaNSoEXr06IHx48dj7dq1KCwsxOTJkzFkyBC4uLi8Vf1ERERU+bxVQBo1apTG8wkTJmg8l8lkKC4ufqsCBg8ejKdPn2Lu3LlISUlBs2bNEBERoZ5knZSUBCMjzQNd8fHxOHPmDA4fPlziPj///HPk5ubi448/RmZmJtq3b4+IiAiNhSx/+uknTJ48GV27doWRkREGDBiAlStXvlXtREREVDnJVCpV5bheVs+ys7NhY2ODrKwsnU7Sdg86oLN960vikt5v7lQBcCwMR2UYC6DyjAdRRVLW39/ltlAkERERUWVR5lNspZ1+srGxQf369d84qZqIiIiooihzQFq+fHmJ7ZmZmcjKykLbtm2xd+9eVKtWrdyKIyIiIhJDmU+x3b9/v8THH3/8gbt370KpVGL27Nm6rJWIiIhIL8plDlKdOnWwZMmSUq8qIyIiIqpIym2Sdq1atUq9jxkRERFRRVJuAen69euoXbt2ee2OiIiISDRlnqRd2s3dsrKyEBsbi08//VRrIUkiIiKiiqjMAcnW1hYyWck3iJTJZBg3bhyCgoLKrTAiIiIisZQ5IB0/frzEdmtra9SrVw9Vq1ZFXFwcmjRpUm7FEREREYmhzAGpY8eOJbY/f/4c27ZtQ1hYGC5evPjW92IjIiIiMjR/e5L2qVOnMGrUKDg7O+Nf//oXOnfujPPnz5dnbURERESiKPMRJABISUnB5s2bERYWhuzsbAwaNAj5+fn45Zdf0LhxY13VSERERKRXZT6C1KdPHzRo0ADXrl3DihUr8OTJE3z//fe6rI2IiIhIFGU+gnTo0CFMnToVEydORL169XRZExEREZGoynwE6cyZM3j+/Dm8vb3h4+ODVatWIT09XZe1EREREYmizAGpTZs2WL9+PZKTkzFhwgSEh4fDxcUFSqUSR44cwfPnz3VZJxEREZHevPVVbJaWlhgzZgzOnDmD69ev49NPP8WSJUvg4OCAvn376qJGIiIiIr36R/dia9CgAZYuXYpHjx5h+/bt5VUTERERkajK5Wa1crkc/fr1w969e8tjd0RERESiKpeARERERFSZMCARERERCTAgEREREQkwIBEREREJMCARERERCTAgEREREQkwIBEREREJMCARERERCTAgEREREQkwIBEREREJMCARERERCTAgEREREQkwIBEREREJMCARERERCTAgEREREQmIHpBWr14Nd3d3mJubw8fHBzExMa/tn5mZicDAQDg7O8PMzAz169fHwYMH1dvd3d0hk8m0HoGBgeo+nTp10tr+ySef6OwzEhERUcViLOab79ixAwqFAmvXroWPjw9WrFiB7t27Iz4+Hg4ODlr9CwoK8P7778PBwQG7du2Cq6srHjx4AFtbW3Wf3377DcXFxerncXFxeP/99zFw4ECNfY0fPx4LFixQP7ewsCj/D0hEREQVkqgBadmyZRg/fjwCAgIAAGvXrsWBAwewceNGBAUFafXfuHEjMjIycO7cOZiYmAB4ecTor2rUqKHxfMmSJXjnnXfQsWNHjXYLCws4OTmV46chIiKiykK0gFRQUIDY2FgEBwer24yMjNCtWzdER0eX+Jq9e/fC19cXgYGB+PXXX1GjRg0MGzYMs2bNglwuL/E9fvzxRygUCshkMo1tP/30E3788Uc4OTmhT58+mDNnzmuPIuXn5yM/P1/9PDs7GwBQWFiIwsLCt/rsb8NMrtLZvvVFl9+PPnEsDEdlGAug8owHUUVS1p870QJSeno6iouL4ejoqNHu6OiIW7dulfiae/fu4dixYxg+fDgOHjyIu3fvYtKkSSgsLERISIhW/19++QWZmZkYPXq0RvuwYcNQu3ZtuLi44Nq1a5g1axbi4+Oxe/fuUusNDQ3F/PnztdoPHz6s09NzS1vrbNd689c5YhUZx8JwVIaxACrPeBBVJHl5eWXqJ1OpVKL8KfbkyRO4urri3Llz8PX1Vbd//vnnOHnyJC5cuKD1mvr16+PFixe4f/+++ojRsmXL8M033yA5OVmrf/fu3WFqaop9+/a9tpZjx46ha9euuHv3Lt55550S+5R0BMnNzQ3p6emwtrYu02f+O5rMi9TZvvUlbl53sUsoFxwLw1EZxgKoPONBVJFkZ2fD3t4eWVlZr/39LdoRJHt7e8jlcqSmpmq0p6amljo3yNnZGSYmJhqn0xo1aoSUlBQUFBTA1NRU3f7gwQMcPXr0tUeFXvHx8QGA1wYkMzMzmJmZabWbmJio50PpQn6x7M2dDJwuvx994lgYjsowFkDlGQ+iiqSsP3eiXeZvamoKb29vREVFqduUSiWioqI0jij9Vbt27XD37l0olUp12+3bt+Hs7KwRjgBg06ZNcHBwQO/evd9Yy5UrVwC8DGBEREREoq6DpFAosH79emzZsgU3b97ExIkTkZubq76qzd/fX2MS98SJE5GRkYFp06bh9u3bOHDgABYvXqyxxhHwMmht2rQJo0aNgrGx5kGyhIQELFy4ELGxsUhMTMTevXvh7++PDh06wMvLS/cfmoiIiAyeqJf5Dx48GE+fPsXcuXORkpKCZs2aISIiQj1xOykpCUZG/8twbm5uiIyMxIwZM+Dl5QVXV1dMmzYNs2bN0tjv0aNHkZSUhDFjxmi9p6mpKY4ePYoVK1YgNzcXbm5uGDBgAGbPnq3bD0tEREQVhqgBCQAmT56MyZMnl7jtxIkTWm2+vr44f/78a/fp5+eH0uaeu7m54eTJk29dJxEREUmH6LcaISIiIjI0DEhEREREAgxIRERERAIMSEREREQCDEhEREREAgxIRERERAIMSEREREQCoq+DREREFZd70AGxSygXiUvefFsqkhYeQSIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhJgQCIiIiISYEAiIiIiEmBAIiIiIhIQPSCtXr0a7u7uMDc3h4+PD2JiYl7bPzMzE4GBgXB2doaZmRnq16+PgwcPqrfPmzcPMplM49GwYUONfbx48QKBgYGoXr06qlatigEDBiA1NVUnn4+IiIgqHlED0o4dO6BQKBASEoJLly6hadOm6N69O9LS0krsX1BQgPfffx+JiYnYtWsX4uPjsX79eri6umr0e/fdd5GcnKx+nDlzRmP7jBkzsG/fPuzcuRMnT57EkydP0L9/f519TiIiIqpYjMV882XLlmH8+PEICAgAAKxduxYHDhzAxo0bERQUpNV/48aNyMjIwLlz52BiYgIAcHd31+pnbGwMJyenEt8zKysLYWFh2LZtG7p06QIA2LRpExo1aoTz58+jTZs25fTpiIiIqKISLSAVFBQgNjYWwcHB6jYjIyN069YN0dHRJb5m79698PX1RWBgIH799VfUqFEDw4YNw6xZsyCXy9X97ty5AxcXF5ibm8PX1xehoaGoVasWACA2NhaFhYXo1q2bun/Dhg1Rq1YtREdHlxqQ8vPzkZ+fr36enZ0NACgsLERhYeHf/yLewEyu0tm+9UWX348+cSwMR2UYC6ByjAfHgiqaso61aAEpPT0dxcXFcHR01Gh3dHTErVu3SnzNvXv3cOzYMQwfPhwHDx7E3bt3MWnSJBQWFiIkJAQA4OPjg82bN6NBgwZITk7G/Pnz8d577yEuLg5WVlZISUmBqakpbG1ttd43JSWl1HpDQ0Mxf/58rfbDhw/DwsLiLT992S1trbNd681f54hVZBwLw1EZxgKoHOPBsaCKJi8vr0z9RD3F9raUSiUcHBywbt06yOVyeHt74/Hjx/jmm2/UAalnz57q/l5eXvDx8UHt2rXx888/Y+zYsX/7vYODg6FQKNTPs7Oz4ebmBj8/P1hbW//9D/UGTeZF6mzf+hI3r7vYJZQLjoXhqAxjAVSO8eBYUEXz6gzQm4gWkOzt7SGXy7WuHktNTS11/pCzszNMTEw0Tqc1atQIKSkpKCgogKmpqdZrbG1tUb9+fdy9excA4OTkhIKCAmRmZmocRXrd+wKAmZkZzMzMtNpNTEzU86F0Ib9YprN964suvx994lgYjsowFkDlGA+OBVU0ZR1r0a5iMzU1hbe3N6KiotRtSqUSUVFR8PX1LfE17dq1w927d6FUKtVtt2/fhrOzc4nhCABycnKQkJAAZ2dnAIC3tzdMTEw03jc+Ph5JSUmlvi8RERFJi6iX+SsUCqxfvx5btmzBzZs3MXHiROTm5qqvavP399eYxD1x4kRkZGRg2rRpuH37Ng4cOIDFixcjMDBQ3WfmzJk4efIkEhMTce7cOXz44YeQy+UYOnQoAMDGxgZjx46FQqHA8ePHERsbi4CAAPj6+vIKNiIiIgIg8hykwYMH4+nTp5g7dy5SUlLQrFkzREREqCduJyUlwcjofxnOzc0NkZGRmDFjBry8vODq6opp06Zh1qxZ6j6PHj3C0KFD8ezZM9SoUQPt27fH+fPnUaNGDXWf5cuXw8jICAMGDEB+fj66d++Of//73/r74ERERGTQRJ+kPXnyZEyePLnEbSdOnNBq8/X1xfnz50vdX3h4+Bvf09zcHKtXr8bq1avLXCcRERFJh+i3GiEiIiIyNAxIRERERAIMSEREREQCDEhEREREAgxIRERERAIMSEREREQCDEhEREREAgxIRERERAIMSEREREQCDEhEREREAgxIRERERAIMSEREREQCDEhEREREAgxIRERERAIMSEREREQCDEhEREREAgxIRERERAIMSEREREQCDEhEREREAgxIRERERAIMSEREREQCDEhEREREAsZiF0BERET/nHvQAbFLKBeJS3qLXQIAHkEiIiIi0sKARERERCTAgEREREQkwIBEREREJMCARERERCTAgEREREQkwIBEREREJMCARERERCTAgEREREQkwIBEREREJMCARERERCTAgEREREQkIHpAWr16Ndzd3WFubg4fHx/ExMS8tn9mZiYCAwPh7OwMMzMz1K9fHwcPHlRvDw0NRatWrWBlZQUHBwf069cP8fHxGvvo1KkTZDKZxuOTTz7RyecjIiKiikfUgLRjxw4oFAqEhITg0qVLaNq0Kbp37460tLQS+xcUFOD9999HYmIidu3ahfj4eKxfvx6urq7qPidPnkRgYCDOnz+PI0eOoLCwEH5+fsjNzdXY1/jx45GcnKx+LF26VKeflYiIiCoOYzHffNmyZRg/fjwCAgIAAGvXrsWBAwewceNGBAUFafXfuHEjMjIycO7cOZiYmAAA3N3dNfpERERoPN+8eTMcHBwQGxuLDh06qNstLCzg5ORUzp+IiIiIKgPRAlJBQQFiY2MRHBysbjMyMkK3bt0QHR1d4mv27t0LX19fBAYG4tdff0WNGjUwbNgwzJo1C3K5vMTXZGVlAQCqVaum0f7TTz/hxx9/hJOTE/r06YM5c+bAwsKi1Hrz8/ORn5+vfp6dnQ0AKCwsRGFhYdk+9N9gJlfpbN/6osvvR584FoajMowFUDnGg2NhODgW5bt/mUqlEuUbffLkCVxdXXHu3Dn4+vqq2z///HOcPHkSFy5c0HpNw4YNkZiYiOHDh2PSpEm4e/cuJk2ahKlTpyIkJESrv1KpRN++fZGZmYkzZ86o29etW4fatWvDxcUF165dw6xZs9C6dWvs3r271HrnzZuH+fPna7Vv27bttcGKiIiIDEdeXh6GDRuGrKwsWFtbl9pP1FNsb0upVMLBwQHr1q2DXC6Ht7c3Hj9+jG+++abEgBQYGIi4uDiNcAQAH3/8sfrfnp6ecHZ2RteuXZGQkIB33nmnxPcODg6GQqFQP8/Ozoabmxv8/Pxe+wX/U03mReps3/oSN6+72CWUC46F4agMYwFUjvHgWBgOjkXZvDoD9CaiBSR7e3vI5XKkpqZqtKemppY6N8jZ2RkmJiYap9MaNWqElJQUFBQUwNTUVN0+efJk7N+/H6dOnULNmjVfW4uPjw8A4O7du6UGJDMzM5iZmWm1m5iYqOdD6UJ+sUxn+9YXXX4/+sSxMByVYSyAyjEeHAvDwbEo3/2LdhWbqakpvL29ERUVpW5TKpWIiorSOOX2V+3atcPdu3ehVCrVbbdv34azs7M6HKlUKkyePBl79uzBsWPH4OHh8cZarly5AuBlACMiIiIS9TJ/hUKB9evXY8uWLbh58yYmTpyI3Nxc9VVt/v7+GpO4J06ciIyMDEybNg23b9/GgQMHsHjxYgQGBqr7BAYG4scff8S2bdtgZWWFlJQUpKSk4M8//wQAJCQkYOHChYiNjUViYiL27t0Lf39/dOjQAV5eXvr9AoiIiMggiToHafDgwXj69Cnmzp2LlJQUNGvWDBEREXB0dAQAJCUlwcjofxnOzc0NkZGRmDFjBry8vODq6opp06Zh1qxZ6j5r1qwB8HIxyL/atGkTRo8eDVNTUxw9ehQrVqxAbm4u3NzcMGDAAMyePVv3H5iIiIgqBNEnaU+ePBmTJ08ucduJEye02nx9fXH+/PlS9/emi/Lc3Nxw8uTJt6qRiIiIpEX0W40QERERGRoGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkIiIiIgEGJCIiIiIBBiQiIiIiAdED0urVq+Hu7g5zc3P4+PggJibmtf0zMzMRGBgIZ2dnmJmZoX79+jh48OBb7fPFixcIDAxE9erVUbVqVQwYMACpqanl/tmIiIioYhI1IO3YsQMKhQIhISG4dOkSmjZtiu7duyMtLa3E/gUFBXj//feRmJiIXbt2IT4+HuvXr4erq+tb7XPGjBnYt28fdu7ciZMnT+LJkyfo37+/zj8vERERVQzGYr75smXLMH78eAQEBAAA1q5diwMHDmDjxo0ICgrS6r9x40ZkZGTg3LlzMDExAQC4u7u/1T6zsrIQFhaGbdu2oUuXLgCATZs2oVGjRjh//jzatGlTYq35+fnIz89XP8/KygIAZGRkoLCw8J99Ea9hXJSrs33ry7Nnz8QuoVxwLAxHZRgLoHKMB8fCcHAsyub58+cAAJVK9fqOKpHk5+er5HK5as+ePRrt/v7+qr59+5b4mp49e6qGDx+uGj9+vMrBwUH17rvvqr766itVUVFRmfcZFRWlAqD6448/NPrUqlVLtWzZslLrDQkJUQHggw8++OCDDz4qwePhw4evzSmiHUFKT09HcXExHB0dNdodHR1x69atEl9z7949HDt2DMOHD8fBgwdx9+5dTJo0CYWFhQgJCSnTPlNSUmBqagpbW1utPikpKaXWGxwcDIVCoX6uVCqRkZGB6tWrQyaTvc1HNxjZ2dlwc3PDw4cPYW1tLXY5ksfxMBwcC8PBsTAclWUsVCoVnj9/DhcXl9f2E/UU29tSKpVwcHDAunXrIJfL4e3tjcePH+Obb75BSEiITt/bzMwMZmZmGm3CkFVRWVtbV+j/2Csbjofh4FgYDo6F4agMY2FjY/PGPqIFJHt7e8jlcq2rx1JTU+Hk5FTia5ydnWFiYgK5XK5ua9SoEVJSUlBQUFCmfTo5OaGgoACZmZkaAed170tERETSItpVbKampvD29kZUVJS6TalUIioqCr6+viW+pl27drh79y6USqW67fbt23B2doapqWmZ9unt7Q0TExONPvHx8UhKSir1fYmIiEhaRL3MX6FQYP369diyZQtu3ryJiRMnIjc3V30Fmr+/P4KDg9X9J06ciIyMDEybNg23b9/GgQMHsHjxYgQGBpZ5nzY2Nhg7diwUCgWOHz+O2NhYBAQEwNfXt9Qr2CorMzMzhISEaJ06JHFwPAwHx8JwcCwMh+TG4rVTuPXg+++/V9WqVUtlamqqat26ter8+fPqbR07dlSNGjVKo/+5c+dUPj4+KjMzM1WdOnU0rmIryz5VKpXqzz//VE2aNEllZ2ensrCwUH344Yeq5ORknX1GIiIiqlhkKtWbFgIgIiIikhbRbzVCREREZGgYkIiIiIgEGJCIiIiIBBiQiIiIiAQYkCSkY8eO2Lp1K/7880+xSyFwPAxJSEgIHjx4IHYZRGRAGJAkpHnz5pg5cyacnJwwfvx4nD9/XuySJI3jYTh+/fVXvPPOO+jatSu2bduG/Px8sUsiEl1ubi7mzJmDtm3bom7duqhTp47Go7LjZf4SU1RUhL1792LLli04dOgQ6tatizFjxmDkyJFaN/kl3eN4GI7Lly9j06ZN2L59O4qKijBkyBCMGTMGrVq1Ers0ScvPz5fOwoQGZujQoTh58iRGjhwJZ2dnrRuzT5s2TaTK9IMBScLS0tKwbt06fPXVVyguLkavXr0wdepUdOnSRezSJInjYRgKCwuxb98+bNq0CZGRkWjYsCHGjh2L0aNHl+kGl/TPHDp0COHh4Th9+jQePnwIpVIJS0tLNG/eHH5+fggICHjjXdipfNja2uLAgQNo166d2KWIgqfYJComJgYhISH49ttv4eDggODgYNjb2+P//u//MHPmTLHLkxyOh+FQqVQoLCxEQUEBVCoV7OzssGrVKri5uWHHjh1il1dp7dmzB/Xr18eYMWNgbGyMWbNmYffu3YiMjMSGDRvQsWNHHD16FHXq1MEnn3yCp0+fil1ypWdnZ4dq1aqJXYZ4xFvEm/QtNTVV9a9//Uv17rvvqkxNTVUDBgxQHTp0SKVUKtV9Tp8+rbK0tBSxSungeBiWixcvqgIDA1XVqlVTOTs7q2bNmqW6c+eOevvKlStVDg4OIlZYubVp00a1f/9+VXFx8Wv7PXr0SDVr1izVsmXL9FSZdP3www+qjz76SJWbmyt2KaLgKTYJMTU1xTvvvIMxY8Zg9OjRqFGjhlaf7OxsfPDBBzh+/LgIFUoLx8NweHp64tatW/Dz88P48ePRp08fyOVyjT7p6elwcHCAUqkUqUoi/WrevDkSEhKgUqng7u4OExMTje2XLl0SqTL9MBa7ANKfqKgovPfee6/tY21tzV/GesLxMByDBg3CmDFj4OrqWmofe3t7hiM9KygowP379/HOO+/A2Ji/rvStX79+YpcgKh5BkqC0tDTEx8cDABo0aAAHBweRK5I2jodhefW/ROEVO6Q/eXl5mDJlCrZs2QIAuH37NurUqYMpU6bA1dUVQUFBIldIUsBJ2hLy/PlzjBw5Eq6urujYsSM6duwIV1dXjBgxAllZWWKXJzkcD8MSFhaGJk2awNzcHObm5mjSpAk2bNggdlmSFBwcjKtXr+LEiRMwNzdXt3fr1o0T5UUQGxuLH3/8ET/++CMuX74sdjl6w4AkIePGjcOFCxewf/9+ZGZmIjMzE/v378fFixcxYcIEscuTHI6H4Zg7dy6mTZuGPn36YOfOndi5cyf69OmDGTNmYO7cuWKXJzm//PILVq1ahfbt22scyXv33XeRkJAgYmXSkpaWhi5duqBVq1aYOnUqpk6dCm9vb3Tt2lUaVxGKOUOc9MvCwkJ1+vRprfZTp06pLCwsRKhI2jgehsPe3l61bds2rfZt27apqlevLkJF0lalShVVQkKCSqVSqapWrar+95UrV1TW1tZiliYpgwYNUrVs2VL1+++/q9tu3LihatmypWrIkCEiVqYfPIIkIdWrVy9xoTsbGxvY2dmJUJG0cTwMR2FhIVq2bKnV7u3tjaKiIhEqkraWLVviwIED6uevjiJt2LABvr6+YpUlOREREfj3v/+NRo0aqdsaN26M1atX49ChQyJWph8MSBIye/ZsKBQKpKSkqNtSUlLw2WefYc6cOSJWJk0cD8MxcuRIrFmzRqt93bp1GD58uAgVSdvixYvxxRdfYOLEiSgqKsJ3330HPz8/bNq0CV999ZXY5UmGUqnUurQfAExMTCRxRSevYpOQ5s2b4+7du8jPz0etWrUAAElJSTAzM0O9evU0+lb29S0MAcfDcEyZMgVbt26Fm5sb2rRpAwC4cOECkpKS4O/vr/FLYtmyZWKVKSkJCQlYsmQJrl69ipycHLRo0QKzZs2Cp6en2KVJxgcffIDMzExs375dfXuXx48fY/jw4bCzs8OePXtErlC3uLCEhEh9TQtDw/EwHHFxcWjRogUAqCcB29vbw97eHnFxcep+vPRff9555x2sX79e7DIkbdWqVejbty/c3d3h5uYGAHj48CGaNGmCH3/8UeTqdI9HkIiIyKAkJSW9dvurI66keyqVCkePHsWtW7cAAI0aNUK3bt1Erko/GJAkKDY2Fjdv3gTw8rLZ5s2bi1yRtHE8DMujR48AADVr1hS5EukyMjJ67dG64uJiPVZDUsVTbBKSlpaGIUOG4MSJE7C1tQUAZGZmonPnzggPDy/xXmCkOxwPw6FUKrFo0SJ8++23yMnJAQBYWVnh008/xZdffgkjI17Pok/CxQgLCwtx+fJlLFu2jJO0dWzlypX4+OOPYW5ujpUrV76279SpU/VUlTh4BElCBg8ejHv37mHr1q3qyzZ///13jBo1CnXr1sX27dtFrlBaOB6GIzg4GGFhYZg/fz7atWsHADhz5gzmzZuH8ePH85eygThw4AC++eYbnDhxQuxSKi0PDw9cvHgR1atXh4eHR6n9ZDIZ7t27p8fK9I8BSUJsbGxw9OhRtGrVSqM9JiYGfn5+yMzMFKcwieJ4GA4XFxesXbsWffv21Wj/9ddfMWnSJDx+/Fikyuiv7t69i6ZNmyI3N1fsUkgCeIpNQqS+poWh4XgYjoyMDDRs2FCrvWHDhsjIyBChImnLzs7WeK5SqZCcnIx58+ZpLYFB+lNcXIzr16+jdu3akljMlifWJaRLly6YNm0anjx5om57/PgxZsyYga5du4pYmTRxPAxH06ZNsWrVKq32VatWoWnTpiJUJG22traws7NTP6pVq4bGjRsjOjq6xAU9STemT5+OsLAwAC/DUYcOHdCiRQu4ublJ4jQnT7FJyMOHD9G3b1/cuHFDa02LvXv38qodPeN4GI6TJ0+id+/eqFWrlvpWFtHR0Xj48CEOHjyI9957T+QKpeXkyZMaz42MjFCjRg3UrVsXxsY88aEvNWvWxC+//IKWLVvil19+QWBgII4fP44ffvgBx44dw9mzZ8UuUacYkCRGymtaGCKOh+F48uQJVq9erTEWkyZNUq8gTPpRWFiICRMmYM6cOa+dJEy6Z25ujrt376JmzZr4+OOPYWFhgRUrVuD+/fto2rSp1qnQyoYBSSIKCwtRpUoVXLlyBU2aNBG7HMnjeBiOwsJC9OjRA2vXruX8FgNhY2ODK1euMCCJrHbt2li/fj26du0KDw8PrFmzBr1798aNGzfQvn17/PHHH2KXqFOcgyQRJiYmqFWrFhdYMxAcD8NhYmKCa9euiV0G/UW/fv3wyy+/iF2G5AUEBGDQoEFo0qQJZDKZ+uj2hQsXSryoobLhESQJCQsLw+7du/HDDz+gWrVqYpcjeRwPwzFjxgyYmZlhyZIlYpdCgHrRzq5du8Lb2xuWlpYa2yv7AoWGZNeuXXj48CEGDhyonhe5ZcsW2Nra4oMPPhC5Ot1iQJKQV3ePLywsRO3atbX+p8M7xusXx8NwTJkyBVu3bkW9evVK/IW8bNkykSqTljp16uC3335Dy5YtS+0jhQUKDVlmZqZ65f/KjpcDSMgHH3zAu5EbEI6H4YiLi0OLFi0AALdv3xa5GulKTExEcXEx7t+/L3YpBODrr7+Gu7s7Bg8eDAAYNGgQ/vvf/8LZ2RkHDx6El5eXyBXqFo8gERGRQTAyMkJKSgocHBzELoXw8rYjP/30E9q2bYsjR45g0KBB2LFjB37++WckJSXh8OHDYpeoUzyCJCGvDl9Xr15doz0zMxMtWrTgYWs943gYjjFjxuC7776DlZWVRntubi6mTJmCjRs3ilSZ9ERGRsLGxua1fYS3hCHdSElJUa/Rtn//fgwaNAh+fn5wd3eHj4+PyNXpHo8gSUhpf52lpqbCzc0NBQUFIlUmTRwPwyGXy5GcnKw1Funp6XByckJRUZFIlUmLkdGbL6yWyWS8+lNPXFxcsGvXLrRt2xYNGjTAokWLMHDgQMTHx6NVq1aVfh0kHkGSgL1796r/LfzrrLi4GFFRUVxvRI84HoYjOzsbKpUKKpUKz58/h7m5uXpbcXExDh48yNM9esZTbIajf//+GDZsGOrVq4dnz56hZ8+eAIDLly+jbt26IlenewxIEtCvXz8AL//yGjVqlMY2ExMTuLu749tvvxWhMmnieBgOW1tbyGQyyGQy1K9fX2u7TCbD/PnzRahMmnjRgmFZvnw53N3d8fDhQyxduhRVq1YFACQnJ2PSpEkiV6d7PMUmIR4eHvjtt99gb28vdikEjochOHnyJFQqFbp06YL//ve/GutRmZqaonbt2rzViB5xkjYZEgYkIpK8Bw8ewM3NrUxzYEh3AgICsHLlSq3J8iSeH374Af/5z39w7949REdHo3bt2lixYgU8PDwq/UKRPMUmMVFRUYiKikJaWhqUSqXGNl6po38cD8NQu3ZtZGZmIiYmpsSx8Pf3F6ky6cjNzcWmTZveqr9wQU8qX2vWrMHcuXMxffp0fPXVV+rJ8ba2tlixYkWlD0g8giQh8+fPx4IFC9CyZUs4Oztrne/fs2ePSJVJE8fDcOzbtw/Dhw9HTk4OrK2tNcZCJpMhIyNDxOqkwdnZGdOmTcOoUaPg7OxcYh+VSoWjR49i2bJl6NChA4KDg/VcpbQ0btwYixcvRr9+/WBlZYWrV6+iTp06iIuLQ6dOnZCeni52iTrFgCQhzs7OWLp0KUaOHCl2KQSOhyGpX78+evXqhcWLF8PCwkLsciQpPj4eX3zxBfbv349mzZqhZcuWcHFxgbm5Of744w/8/vvviI6OhrGxMYKDgzFhwgTI5XKxy67UqlSpglu3bqF27doaAenOnTvw8vLCn3/+KXaJOsVTbBJSUFCAtm3bil0G/X8cD8Px+PFjTJ06leFIRA0aNMB///tfJCUlYefOnTh9+jTOnTuHP//8E/b29mjevDnWr1+Pnj17MhjpiYeHB65cuYLatWtrtEdERKBRo0YiVaU/DEgSMm7cOGzbtg1z5swRuxQCx8OQdO/eHRcvXkSdOnXELkXyatWqhU8//RSffvopgJen1QAuASAGhUKBwMBAvHjxAiqVCjExMdi+fTtCQ0OxYcMGscvTOQYkCXnx4gXWrVuHo0ePwsvLCyYmJhrbecdy/eJ4GI7evXvjs88+w++//w5PT0+tseCtLfQvLCwMy5cvx507dwAA9erVw/Tp0zFu3DiRK5OOcePGoUqVKpg9ezby8vIwbNgwuLi44LvvvsOQIUPELk/nOAdJQjp37lzqNplMhmPHjumxGuJ4GI7XXd7PW1vo39y5c7Fs2TJMmTIFvr6+AIDo6GisWrUKM2bMwIIFC0SusPIrKirCtm3b0L17dzg6OiIvLw85OTmSWqOKAYmIiAxKjRo1sHLlSgwdOlSjffv27ZgyZUqlv3rKUFhYWODmzZtac5CkgquiSdDdu3cRGRmpvgKBGdnwpKWliV0CkWgKCwvRsmVLrXZvb2/eOFiPWrdujcuXL4tdhmgYkCTk2bNn6Nq1q/qS5uTkZADA2LFj1RMiSfcsLCzw9OlT9fPevXurxwIAUlNTS10HhspXr169kJWVpX6+ZMkSZGZmqp8/e/YMjRs3FqEyaRs5ciTWrFmj1b5u3ToMHz5chIqkadKkSfj000+xatUqREdH49q1axqPyo6n2CTE398faWlp2LBhAxo1aqRe0yIyMhIKhQI3btwQu0RJEN5v6q/riwD/C0jC1Zyp/MnlciQnJ6vHwtraGleuXNEYCxcXF85B0rMpU6Zg69atcHNzQ5s2bQAAFy5cQFJSEvz9/TUm0fNiBt0paW6eTCaDSqWSxNw8XsUmIYcPH0ZkZCRq1qyp0V6vXj08ePBApKqoJLykWT+Efx/y70XDEBcXhxYtWgAAEhISAAD29vawt7dHXFycuh9/TnTr/v37YpcgKgYkCcnNzS1xIbyMjAyYmZmJUBERkbbjx4+LXQIBkp2c/QoDkoS899572Lp1KxYuXAjg5V9fSqUSS5cufe0l51S+ZDKZ1r2++JewOEr67jkWRC/t3bu3xHaZTAZzc3PUrVsXHh4eeq5KfzgHSULi4uLQtWtXtGjRAseOHUPfvn1x48YNZGRk4OzZs3jnnXfELlESjIyMYGNjo/5FnJmZCWtra/X5fpVKhezs7Ep/ft8QGBkZoWfPnuojqPv27UOXLl3Ud4nPz89HREQEx4IkycjISD3n6K/+Og+pffv2+OWXX2BnZydSlbrDgCQxWVlZWLVqFa5evYqcnBy0aNECgYGBvGpKj7Zs2VKmfqNGjdJxJRQQEFCmfps2bdJxJUSGJyoqCl9++SW++uortG7dGgAQExODOXPmYPbs2bCxscGECRPg4+ODsLAwkastfwxIREREpKVJkyZYt26d1k21z549i48//hg3btzA0aNHMWbMGCQlJYlUpe5wHSQiIiLSkpCQAGtra612a2tr3Lt3D8DLq6Ar68rmDEhERESkxdvbG5999pnGwrZPnz7F559/jlatWgEA7ty5Azc3N7FK1ClexUZERERawsLC8MEHH6BmzZrqEPTw4UPUqVMHv/76KwAgJycHs2fPFrNMneEcJCIiIiqRUqnE4cOHcfv2bQBAgwYN8P7775e4ynZlw4AkIffv30dRURHq1aun0X7nzh2YmJjA3d1dnMKIiMigvXjxAmZmZpJaJ6zyR0BSGz16NM6dO6fVfuHCBYwePVr/BUncgAED8PXXX2u1L126FAMHDhShImn74Ycf0K5dO7i4uKhvvbNixQr1qQQiqVEqlVi4cCFcXV1RtWpV9a1H5syZUykv6xdiQJKQy5cvo127dlrtbdq0wZUrV/RfkMSdOnUKvXr10mrv2bMnTp06JUJF0rVmzRooFAr06tULmZmZ6oUhbW1tsWLFCnGLIxLJokWLsHnzZixduhSmpqbq9iZNmmDDhg0iVqYfDEgSIpPJ8Pz5c632rKwsrhQsgpycHI3/6bxiYmKC7OxsESqSru+//x7r16/Hl19+Cblcrm5v2bIlrl+/LmJlROLZunUr1q1bh+HDh2v8XDRt2hS3bt0SsTL9YECSkA4dOiA0NFQjDBUXFyM0NBTt27cXsTJp8vT0xI4dO7Taw8PD0bhxYxEqkq779++jefPmWu1mZmbIzc0VoSIi8T1+/Bh169bValcqlSgsLBShIv3iZf4S8vXXX6NDhw5o0KAB3nvvPQDA6dOnkZ2djWPHjolcnfTMmTMH/fv3R0JCArp06QLg5dL+27dvx86dO0WuTlo8PDxw5coVrbuXR0REoFGjRiJVRSSuxo0b4/Tp01o/F7t27SrxD4rKhgFJQho3boxr166p78VWpUoV+Pv7Y/LkyahWrZrY5UlOnz598Msvv2Dx4sXYtWsXqlSpAi8vLxw9ehQdO3YUuzxJUSgUCAwMxIsXL6BSqRATE4Pt27cjNDRUEnMtiEoyd+5cjBo1Co8fP4ZSqcTu3bsRHx+PrVu3Yv/+/WKXp3O8zJ+ICMBPP/2EefPmISEhAQDg4uKC+fPnY+zYsSJXRiSe06dPY8GCBRo3OJ87dy78/PzELk3nGJAquWvXrqFJkyYwMjLCtWvXXtvXy8tLT1URGa68vDzk5OTAwcFB7FKIDNbFixfRsmVLscvQKQakSs7IyAgpKSlwcHCAkZERZDIZShpymUzGK9n0oFq1arh9+zbs7e1hZ2f32kXXMjIy9FiZtC1atAjDhw+Hh4eH2KUQGYycnBzI5XJUqVJF3XblyhXMmTMHBw8erPS/MzgHqZK7f/8+atSoof43iWv58uWwsrJS/1tKq9Iasp07dyIkJAQ+Pj4YMWIEBg0aBHt7e7HLIhLFw4cPMWjQIMTExEAul2Py5MlYtGgRPvnkE+zYsQMffvhhiYsOVzY8giQhp06dQtu2bWFsrJmLi4qKcO7cOXTo0EGkyojEd+PGDfz0008IDw/Ho0eP8P7772P48OHo168fLCwsxC6PSG+GDBmC+Ph4jB07Frt378bJkyfRokUL+Pj4ICgoCDVr1hS7RL1gQJIQuVyO5ORkrbkVz549g4ODQ6U/XGpoOB6G6+zZs9i2bRt27tyJFy9ecOFOkhQXFxfs3r0bbdq0QVpaGpycnLBs2TJMnz5d7NL0igtFSohKpSrxlM6zZ89gaWkpQkXSVtrfJvn5+SWusE36Y2lpiSpVqsDU1FQSC+IR/VVqaqp6Pp6DgwMsLCzQs2dPkavSP85BkoD+/fsDeDkRe/To0TAzM1NvKy4uxrVr19C2bVuxypOclStXAng5Hhs2bEDVqlXV24qLi3Hq1Ck0bNhQrPIk6/79+9i2bRu2bduG+Ph4dOzYEfPnz8dHH30kdmlEemdkZKTxbyn+0caAJAE2NjYAXh6xsLKy0rgiwdTUFG3atMH48ePFKk9yli9fDuDleKxdu1bjHkempqZwd3fH2rVrxSpPktq0aYPffvsNXl5eCAgIwNChQ+Hq6ip2WUSiUKlUqF+/vvqMQ05ODpo3b64RmoDKf6UtA5IEbNq0CQDg7u6OmTNn8nSayF5dTdi5c2fs3r0bdnZ2IldEXbt2xcaNG3kPPCL873eG1HGSNhEREZEAjyBJSGpqKmbOnImoqCikpaVpTRLmVVP6VVxcjM2bN6vHQ6lUamznDYR1S6FQYOHChbC0tIRCoXht32XLlumpKiIyFAxIEjJ69GgkJSVhzpw5cHZ25iKFIps2bRo2b96M3r17o0mTJhwPPbt8+bL6CrXLly+X2o/jQiRNPMUmIVZWVjh9+jSaNWsmdikEwN7eHlu3bkWvXr3ELoWIiAS4DpKEuLm5lbr2Dumfqakp6tatK3YZRERUAh5BkpDDhw/j22+/xX/+8x+4u7uLXY7kffvtt7h37x5WrVrF0zgG4OLFi/j555+RlJSEgoICjW27d+8WqSoiEgsDkoTY2dkhLy8PRUVFsLCwgImJicb2yr6mhaH58MMPcfz4cVSrVg3vvvuu1njwl7L+hIeHw9/fH927d8fhw4fh5+eH27dvIzU1FR9++CEveyZJKu3iBZlMBnNzc9StWxcffPABqlWrpufK9IMBSUK2bNny2u2jRo3SUyUEAAEBAa/dzl/K+uPl5YUJEyYgMDAQVlZWuHr1Kjw8PDBhwgQ4Oztj/vz5YpdIpHedO3fGpUuXUFxcjAYNGgAAbt++DblcjoYNGyI+Ph4ymQxnzpyplGuIMSARkeRZWlrixo0bcHd3R/Xq1XHixAl4enri5s2b6NKlC5KTk8UukUjvVqxYgdOnT2PTpk2wtrYGAGRlZWHcuHFo3749xo8fj2HDhuHPP/9EZGSkyNWWP07SlqhXdyj/64NIquzs7PD8+XMAgKurK+Li4gAAmZmZyMvLE7M0ItF88803WLhwoTocAS9vXTVv3jwsXboUFhYWmDt3LmJjY0WsUne4DpKE5ObmYtasWfj555/x7Nkzre1cKFL/du3aVerE4EuXLolUlfR06NABR44cgaenJwYOHIhp06bh2LFjOHLkCLp27Sp2eUSiyMrKQlpamtbps6dPn6r/qLa1tdX6f1dlwSNIEvL555/j2LFjWLNmDczMzLBhwwbMnz8fLi4u2Lp1q9jlSc7KlSsREBAAR0dHXL58Ga1bt0b16tVx79499OzZU+zyJGXVqlUYMmQIAODLL7+EQqFAamoqBgwYgLCwMJGrIxLHBx98gDFjxmDPnj149OgRHj16hD179mDs2LHo168fACAmJgb169cXt1Ad4RwkCalVqxa2bt2KTp06wdraGpcuXULdunXxww8/YPv27Th48KDYJUpKw4YNERISgqFDh6onBtepUwdz585FRkYGVq1aJXaJRCRhOTk5mDFjBrZu3YqioiIAgLGxMUaNGoXly5fD0tISV65cAYBKuQAxA5KEVK1aFb///jtq1aqFmjVrYvfu3WjdujXu378PT09P5OTkiF2ipFhYWODmzZuoXbs2HBwccOTIETRt2hR37txBmzZtSjwNSkSkbzk5Obh37x4AoE6dOqhatarIFekHT7FJSJ06dXD//n0AL49e/PzzzwCAffv2wdbWVsTKpMnJyUm99lStWrVw/vx5AMD9+/e54rmeGBkZQS6Xv/ZhbMypmiRtVatWhZeXF7y8vCQTjgBO0paUgIAAXL16FR07dkRQUBD69OmDVatWobCwkHcrF0GXLl2wd+9eNG/eHAEBAZgxYwZ27dqFixcvon///mKXJwl79uwpdVt0dDRWrlwJpVKpx4qIDEdubi6WLFmCqKgopKWlaf0svDqqVFnxFJuEPXjwALGxsahbty68vLzELkdylEollEql+ghFeHg4zp07h3r16mHChAkwNTUVuUJpio+PR1BQEPbt24fhw4djwYIFqF27tthlEend0KFDcfLkSYwcORLOzs5at0SaNm2aSJXpBwMSERGAJ0+eICQkBFu2bEH37t0RGhqKJk2aiF0WkWhsbW1x4MABtGvXTuxSRMFTbBLz22+/4fjx4yUeLuVpNv178eIFrl27VuJ49O3bV6SqpCUrKwuLFy/G999/j2bNmiEqKgrvvfee2GURic7Ozq7S3metLBiQJGTx4sWYPXs2GjRoAEdHR43DpbybvP5FRETA398f6enpWttkMhkX7tSDpUuX4uuvv4aTkxO2b9+ODz74QOySiAzGwoULMXfuXGzZsgUWFhZil6N3PMUmIY6Ojvj6668xevRosUshAPXq1YOfnx/mzp0LR0dHscuRJCMjI1SpUgXdunWDXC4vtd/u3bv1WBWRYWjevDkSEhKgUqng7u4OExMTje2VfbV/HkGSECMjI8meSzZEqampUCgUDEci8vf359FTolK8Wi1bqngESUKWLl2KJ0+eYMWKFWKXQgDGjBmDdu3aYezYsWKXQkREAgxIEqJUKtG7d2/cvn0bjRs31jpcytMI+pWXl4eBAweiRo0a8PT01BqPqVOnilQZERHxFJuETJ06FcePH0fnzp1RvXp1nloQ2fbt23H48GGYm5vjxIkTWpPmGZCISN+qVauG27dvw97eHnZ2dq/9PfHqTgCVFY8gSYiVlRXCw8PRu3dvsUshvLzVyNSpUxEUFAQjI971h4jEt2XLFgwZMgRmZmbYsmXLa/uOGjVKT1WJgwFJQmrXro3IyEg0bNhQ7FIIL/9S++233/DOO++IXQoREQkwIEnIpk2bEBERgU2bNklyTQtDM2PGDNSoUQNffPGF2KUQEQEAsrOzy9zX2tpah5WIjwFJQqS+poWhmTp1KrZu3YqmTZvCy8tLazy4sjkR6ZuRkVGZ56dW9sVsOUlbQqS+poWhuX79Opo3bw4AiIuL09jGCfREJIbjx4+r/52YmIigoCCMHj0avr6+AIDo6Ghs2bIFoaGhYpWoNzyCRERERFq6du2KcePGYejQoRrt27Ztw7p163DixAlxCtMTXjojMZmZmdiwYQOCg4PVl2heunQJjx8/Frky6bp79y4iIyPx559/AgD4NwsRGYLo6Gi0bNlSq71ly5aIiYkRoSL9YkCSkGvXrqF+/fr4+uuv8a9//QuZmZkAXi4QGRwcLG5xEvTs2TN07doV9evXR69evZCcnAwAGDt2LD799FORqyMiqXNzc8P69eu12jds2AA3NzcRKtIvBiQJUSgUGD16NO7cuQNzc3N1e69evXDq1CkRK5OmGTNmwMTEBElJSRpXFQ4ePBgREREiVkZEBCxfvhzff/89PD09MW7cOIwbNw5eXl74/vvvsXz5crHL0zkGJAn57bffMGHCBK12V1dXpKSkiFCRtB0+fBhff/01atasqdFer149PHjwQKSqiIhe6tWrF27fvo0+ffogIyMDGRkZ6NOnD27fvo1evXqJXZ7O8So2CTEzMytxjYvbt2+jRo0aIlQkbbm5uSWuR5WRkQEzMzMRKiIi0uTm5obFixeLXYYoGJAkpG/fvliwYAF+/vlnAC8vJU9KSsKsWbMwYMAAkauTnvfeew9bt27FwoULAbwcD6VSiaVLl6Jz584iV0dEUnTt2rUy9/Xy8tJhJeLjZf4SkpWVhY8++ggXL17E8+fP4eLigpSUFPj6+uLgwYOwtLQUu0RJiYuLQ9euXdGiRQscO3YMffv2xY0bN5CRkYGzZ8/yFiREpHevFopUqVQa67G9igp/bavsC0UyIEnQ2bNncfXqVeTk5KBFixbo1q2b2CVJVlZWFlatWqUxHoGBgXB2dha7NCKSoL/Of7x8+TJmzpyJzz77TGOhyG+//RZLly6t9IsPMyARERGRltatW2PevHlaE7IPHjyIOXPmIDY2VqTK9INXsUlAdHQ09u/fr9G2detWeHh4wMHBAR9//DHy8/NFqk560tPTta5Su3HjBgICAjBo0CBs27ZNpMqIiP7n+vXr8PDw0Gr38PDA77//LkJF+sWAJAELFizAjRs31M+vX7+OsWPHolu3bggKCsK+ffskcV8dQzFlyhSsXLlS/TwtLQ3vvfcefvvtN+Tn52P06NH44YcfRKyQiAho1KgRQkNDUVBQoG4rKChAaGgoGjVqJGJl+sGr2CTgypUr6iulACA8PBw+Pj7qFVLd3NwQEhKCefPmiVShtJw/fx6bN29WP9+6dSuqVauGK1euwNjYGP/617+wevVqjBw5UrwiiUjy1q5diz59+qBmzZrqK9auXbsGmUyGffv2iVyd7jEgScAff/wBR0dH9fOTJ0+iZ8+e6uetWrXCw4cPxShNklJSUuDu7q5+fuzYMfTv3x/Gxi9/HPv27csjekQkutatW+PevXv46aefcOvWLQAvV/ofNmyYJK56ZkCSAEdHR9y/fx9ubm4oKCjApUuXMH/+fPX258+fw8TERMQKpcXa2hqZmZmoXbs2ACAmJgZjx45Vb5fJZJwTRkQGwdLSEh9//LHYZYiCc5AkoFevXggKCsLp06cRHBwMCwsLvPfee+rt165d45o7etSmTRusXLkSSqUSu3btwvPnz9GlSxf19tu3b0viRpBEZPh++OEHtG/fHi4uLuqLS5YvX45ff/1V5Mp0jwFJAhYuXAhjY2N07NgR69evx/r162FqaqrevnHjRvj5+YlYobQsXLgQe/fuRZUqVTB48GB8/vnnsLOzU28PDw9Hx44dRayQiAhYs2YNFAoFevbsiT/++EO9MKSdnR1WrFghbnF6wHWQJCQrKwtVq1aFXC7XaM/IyEDVqlU1QhPpVnp6Os6ePQsnJyf4+PhobDtw4AAaN25c4uW1RET60rhxYyxevBj9+vWDlZUVrl69ijp16iAuLg6dOnVCenq62CXqFAMSERERaalSpQpu3bqF2rVrawSkO3fuwMvLC3/++afYJeoUT7ERERGRFg8PD1y5ckWrPSIigusgERERkTQpFAoEBgbixYsXUKlUiImJwfbt2xEaGooNGzaIXZ7O8RQbERERleinn37CvHnzkJCQAABwcXHB/PnzNZYmqawYkIiIiOi18vLykJOTAwcHB7FL0RvOQSISUUJCAmbPno2hQ4ciLS0NAHDo0CGNe+cREYkpLS0NsbGxiI+Px9OnT8UuR28YkIhEcvLkSXh6euLChQvYvXs3cnJyAABXr15FSEiIyNURkdQ9f/4cI0eOhIuLCzp27IiOHTvCxcUFI0aMQFZWltjl6RwDEpFIgoKCsGjRIhw5ckRjDaouXbrg/PnzIlZGRASMGzcOFy5cwIEDB5CZmYnMzEzs378fFy9exIQJE8QuT+c4B4lIJFWrVsX169fh4eGhscZIYmIiGjZsiBcvXohdIhFJmKWlJSIjI9G+fXuN9tOnT6NHjx7Izc0VqTL94BEkIpHY2toiOTlZq/3y5ctwdXUVoSIiov+pXr06bGxstNptbGw0bo9UWTEgEYlkyJAhmDVrFlJSUiCTyaBUKnH27FnMnDkT/v7+YpdHRBI3e/ZsKBQKpKSkqNtSUlLw2WefYc6cOSJWph88xUYkkoKCAgQGBmLz5s0oLi6GsbExiouLMWzYMGzevFnrnnlERLrWvHlzyGQy9fM7d+4gPz8ftWrVAgAkJSXBzMwM9erVw6VLl8QqUy8YkIhE9vDhQ1y/fh05OTlo3rw56tWrJ3ZJRCRR8+fPL3Pfyn61LQMSkUgWLFiAmTNnwsLCQqP9zz//xDfffIO5c+eKVBkRETEgEYlELpcjOTlZa2XaZ8+ewcHBAcXFxSJVRkSkKScnB0qlUqPN2tpapGr0g5O0iUSiUqk0zvW/cvXqVVSrVk2EioiI/uf+/fvo3bs3LC0t1Veu2dnZwdbWVhJXsRmLXQCR1NjZ2UEmk0Emk6F+/foaIam4uBg5OTn45JNPRKyQiAgYMWIEVCoVNm7cCEdHxxL/oKvMeIqNSM+2bNkClUqFMWPGYMWKFRrrjJiamsLd3R2+vr4iVkhE9HIx29jYWDRo0EDsUkTBI0hEejZq1CgAgIeHB9q1awdjY/4YEpHhadWqFR4+fCjZgMQjSEQiSkhIwKZNm5CQkIDvvvsODg4OOHToEGrVqoV3331X7PKISMISEhLwySefYMSIEWjSpAlMTEw0tnt5eYlUmX4wIBGJ5OTJk+jZsyfatWuHU6dO4ebNm6hTpw6WLFmCixcvYteuXWKXSEQSdv78eQwbNgyJiYnqNplMpr7ApLJfacuARCQSX19fDBw4EAqFQuNmtTExMejfvz8ePXokdolEJGGNGzdGo0aN8Pnnn5c4Sbt27doiVaYfnPxAJJLr169j27ZtWu0ODg5IT08XoSIiov958OAB9u7di7p164pdiii4DhKRSGxtbZGcnKzVfvnyZbi6uopQERHR/3Tp0gVXr14VuwzR8AgSkUiGDBmCWbNmYefOnZDJZFAqlTh79ixmzpwJf39/scsjIonr06cPZsyYgevXr8PT01Nrknbfvn1Fqkw/OAeJSCQFBQUIDAzE5s2bUVxcDGNjYxQXF2PYsGHYvHkz5HK52CUSkYQZGZV+komTtIlIJ1QqFR4+fIgaNWogPT0d169fR05ODpo3b4569eqJXR4RkeQxIBGJQKlUwtzcHDdu3GAgIiIyQJykTSQCIyMj1KtXD8+ePRO7FCIiDb169UJWVpb6+ZIlS5CZmal+/uzZMzRu3FiEyvSLAYlIJEuWLMFnn32GuLg4sUshIlKLjIxEfn6++vnixYuRkZGhfl5UVIT4+HgxStMrXsVGJBJ/f3/k5eWhadOmMDU1RZUqVTS2//V/SERE+iKceSPVmTgMSEQiWbFihdglEBFRKRiQiEQyatQosUsgItIik8m0bisifC4FDEhEIjl48CDkcjm6d++u0X748GEUFxejZ8+eIlVGRFKmUqkwevRomJmZAQBevHiBTz75BJaWlgCgMT+pMuMkbSKRBAUFlbjQmlKpRFBQkAgVERG9PLrt4OAAGxsb2NjYYMSIEXBxcVE/d3BwkMRq/1wHiUgkVapUwc2bN+Hu7q7RnpiYiHfffRe5ubniFEZERDyCRCQWGxsb3Lt3T6v97t276kPZREQkDgYkIpF88MEHmD59OhISEtRtd+/exaefflrpbwJJRGToeIqNSCRZWVno0aMHLl68iJo1awIAHj16hPfeew+7d++Gra2tuAUSEUkYAxKRiFQqFY4cOYKrV6+iSpUq8PLyQocOHcQui4hI8hiQiAxIZmYmjxwRERkAzkEiEsnXX3+NHTt2qJ8PGjQI1atXh6urK65evSpiZURExIBEJJK1a9fCzc0NAHDkyBEcOXIEhw4dQs+ePfHZZ5+JXB0RkbRxJW0ikaSkpKgD0v79+zFo0CD4+fnB3d0dPj4+IldHRCRtPIJEJBI7Ozs8fPgQABAREYFu3boBeDlxu6QVtomISH94BIlIJP3798ewYcNQr149PHv2TH3vtcuXL6Nu3boiV0dEJG0MSEQiWb58Odzd3fHw4UMsXboUVatWBQAkJydj0qRJIldHRCRtvMyfiIiISIBHkIj0aO/evejZsydMTEywd+/e1/bl7UaIiMTDI0hEemRkZISUlBQ4ODjAyKj0ayRkMhknahMRiYgBiYiIiEiAl/kTERERCXAOEpEIlEolNm/ejN27dyMxMREymQweHh746KOPMHLkSMhkMrFLJCKSNJ5iI9IzlUqFPn364ODBg2jatCkaNmwIlUqFmzdv4vr16+jbty9++eUXscskIpI0HkEi0rPNmzfj1KlTiIqKQufOnTW2HTt2DP369cPWrVvh7+8vUoVERMQjSER65ufnhy5duiAoKKjE7YsXL8bJkycRGRmp58qIiOgVTtIm0rNr166hR48epW7v2bMnrl69qseKiIhIiAGJSM8yMjLg6OhY6nZHR0f88ccfeqyIiIiEGJCI9Ky4uBjGxqVP/5PL5SgqKtJjRUREJMRJ2kR6plKpMHr0aJiZmZW4PT8/X88VERGREAMSkZ6NGjXqjX14BRsRkbh4FRsRERGRAOcgEREREQkwIBEREREJMCARERERCTAgEREREQkwIBEREREJMCAREZXgxIkTkMlkyMzMLPd9y2Qy/PLLL+W+XyIqPwxIRCR5nTp1wvTp08Uug4gMCAMSERERkQADEhFVKJ06dcKUKVMwffp02NnZwdHREevXr0dubi4CAgJgZWWFunXr4tChQ+rXxMXFoWfPnqhatSocHR0xcuRIpKenAwBGjx6NkydP4rvvvoNMJoNMJkNiYqL6tbGxsWjZsiUsLCzQtm1bxMfHa9SzZs0avPPOOzA1NUWDBg3www8/aGy/c+cOOnToAHNzczRu3BhHjhzR3ZdDROWGAYmIKpwtW7bA3t4eMTExmDJlCiZOnIiBAweibdu2uHTpEvz8/DBy5Ejk5eUhMzMTXbp0QfPmzXHx4kVEREQgNTUVgwYNAgB899138PX1xfjx45GcnIzk5GS4ubmp3+vLL7/Et99+i4sXL8LY2BhjxoxRb9uzZw+mTZuGTz/9FHFxcZgwYQICAgJw/PhxAIBSqUT//v1hamqKCxcuYO3atZg1a5Z+vywi+ntUREQVSMeOHVXt27dXPy8qKlJZWlqqRo4cqW5LTk5WAVBFR0erFi5cqPLz89PYx8OHD1UAVPHx8ep9Tps2TaPP8ePHVQBUR48eVbcdOHBABUD1559/qlQqlapt27aq8ePHa7xu4MCBql69eqlUKpUqMjJSZWxsrHr8+LF6+6FDh1QAVHv27Pn7XwIR6RyPIBFRhePl5aX+t1wuR/Xq1eHp6aluc3R0BACkpaXh6tWrOH78OKpWrap+NGzYEACQkJDwVu/l7Oys3i8A3Lx5E+3atdPo365dO9y8eVO93c3NDS4uLurtvr6+b/VZiUgcxmIXQET0tkxMTDSey2QyjTaZTAbg5SmunJwc9OnTB19//bXWfl4FnrK+11/3S0SVG48gEVGl1qJFC9y4cQPu7u6oW7euxsPS0hIAYGpqiuLi4rfed6NGjXD27FmNtrNnz6Jx48bq7Q8fPkRycrJ6+/nz5//BpyEifWFAIqJKLTAwEBkZGRg6dCh+++03JCQkIDIyEgEBAepQ5O7ujgsXLiAxMRHp6ellPkL02WefYfPmzVizZg3u3LmDZcuWYffu3Zg5cyYAoFu3bqhfvz5GjRqFq1ev4vTp0/jyyy919lmJqPwwIBFRpebi4oKzZ8+iuLgYfn5+8PT0xPTp02Frawsjo5f/C5w5cybkcjkaN26MGjVqICkpqUz77tevH7777jv861//wrvvvov//Oc/2LRpEzp16gQAMDIywp49e/Dnn3+idevWGDduHL766itdfVQiKkcylUqlErsIIiIiIkPCI0hEREREAgxIRERERAIMSEREREQCDEhEREREAgxIRERERAIMSEREREQCDEhEREREAgxIRERERAIMSEREREQCDEhEREREAgxIRERERAL/D4fJzXnMM118AAAAAElFTkSuQmCC",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"metric = 'AUROC'\n",
"unc_df.set_index('metric').loc[metric].plot.bar(x='method', y='means')\n",
"plt.gca().set_ylabel(metric)\n",
"plt.gca().grid(axis='y')\n",
"plt.gca().set_ylim(0.6, 0.8)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
================================================
FILE: semantic_uncertainty/analyze_results.py
================================================
"""Compute overall performance metrics from predicted uncertainties."""
import argparse
import functools
import logging
import os
import pickle
import numpy as np
import wandb
from uncertainty.utils import utils
from uncertainty.utils.eval_utils import (
bootstrap, compatible_bootstrap, auroc, accuracy_at_quantile,
area_under_thresholded_accuracy)
utils.setup_logger()
result_dict = {}
UNC_MEAS = 'uncertainty_measures.pkl'
def init_wandb(wandb_runid, assign_new_wandb_id, experiment_lot, entity):
"""Initialize wandb session."""
user = os.environ['USER']
slurm_jobid = os.getenv('SLURM_JOB_ID')
scratch_dir = os.getenv('SCRATCH_DIR', '.')
kwargs = dict(
entity=entity,
project='semantic_uncertainty',
dir=f'{scratch_dir}/{user}/uncertainty',
notes=f'slurm_id: {slurm_jobid}, experiment_lot: {experiment_lot}',
)
if not assign_new_wandb_id:
# Restore wandb session.
wandb.init(
id=wandb_runid,
resume=True,
**kwargs)
wandb.restore(UNC_MEAS)
else:
api = wandb.Api()
wandb.init(**kwargs)
old_run = api.run(f'{entity}/semantic_uncertainty/{wandb_runid}')
old_run.file(UNC_MEAS).download(
replace=True, exist_ok=False, root=wandb.run.dir)
def analyze_run(
wandb_runid, assign_new_wandb_id=False, answer_fractions_mode='default',
experiment_lot=None, entity=None):
"""Analyze the uncertainty measures for a given wandb run id."""
logging.info('Analyzing wandb_runid `%s`.', wandb_runid)
# Set up evaluation metrics.
if answer_fractions_mode == 'default':
answer_fractions = [0.8, 0.9, 0.95, 1.0]
elif answer_fractions_mode == 'finegrained':
answer_fractions = [round(i, 3) for i in np.linspace(0, 1, 20+1)]
else:
raise ValueError
rng = np.random.default_rng(41)
eval_metrics = dict(zip(
['AUROC', 'area_under_thresholded_accuracy', 'mean_uncertainty'],
list(zip(
[auroc, area_under_thresholded_accuracy, np.mean],
[compatible_bootstrap, compatible_bootstrap, bootstrap]
)),
))
for answer_fraction in answer_fractions:
key = f'accuracy_at_{answer_fraction}_answer_fraction'
eval_metrics[key] = [
functools.partial(accuracy_at_quantile, quantile=answer_fraction),
compatible_bootstrap]
if wandb.run is None:
init_wandb(
wandb_runid, assign_new_wandb_id=assign_new_wandb_id,
experiment_lot=experiment_lot, entity=entity)
elif wandb.run.id != wandb_runid:
raise ValueError
# Load the results dictionary from a pickle file.
with open(f'{wandb.run.dir}/{UNC_MEAS}', 'rb') as file:
results_old = pickle.load(file)
result_dict = {'performance': {}, 'uncertainty': {}}
# First: Compute simple accuracy metrics for model predictions.
all_accuracies = dict()
all_accuracies['accuracy'] = 1 - np.array(results_old['validation_is_false'])
for name, target in all_accuracies.items():
result_dict['performance'][name] = {}
result_dict['performance'][name]['mean'] = np.mean(target)
result_dict['performance'][name]['bootstrap'] = bootstrap(np.mean, rng)(target)
rum = results_old['uncertainty_measures']
if 'p_false' in rum and 'p_false_fixed' not in rum:
# Restore log probs true: y = 1 - x --> x = 1 - y.
# Convert to probs --> np.exp(1 - y).
# Convert to p_false --> 1 - np.exp(1 - y).
rum['p_false_fixed'] = [1 - np.exp(1 - x) for x in rum['p_false']]
# Next: Uncertainty Measures.
# Iterate through the dictionary and compute additional metrics for each measure.
for measure_name, measure_values in rum.items():
logging.info('Computing for uncertainty measure `%s`.', measure_name)
# Validation accuracy.
validation_is_falses = [
results_old['validation_is_false'],
results_old['validation_unanswerable']
]
logging_names = ['', '_UNANSWERABLE']
# Iterate over predictions of 'falseness' or 'answerability'.
for validation_is_false, logging_name in zip(validation_is_falses, logging_names):
name = measure_name + logging_name
result_dict['uncertainty'][name] = {}
validation_is_false = np.array(validation_is_false)
validation_accuracy = 1 - validation_is_false
if len(measure_values) > len(validation_is_false):
# This can happen, but only for p_false.
if 'p_false' not in measure_name:
raise ValueError
logging.warning(
'More measure values for %s than in validation_is_false. Len(measure values): %d, Len(validation_is_false): %d',
measure_name, len(measure_values), len(validation_is_false))
measure_values = measure_values[:len(validation_is_false)]
fargs = {
'AUROC': [validation_is_false, measure_values],
'area_under_thresholded_accuracy': [validation_accuracy, measure_values],
'mean_uncertainty': [measure_values]}
for answer_fraction in answer_fractions:
fargs[f'accuracy_at_{answer_fraction}_answer_fraction'] = [validation_accuracy, measure_values]
for fname, (function, bs_function) in eval_metrics.items():
metric_i = function(*fargs[fname])
result_dict['uncertainty'][name][fname] = {}
result_dict['uncertainty'][name][fname]['mean'] = metric_i
logging.info("%s for measure name `%s`: %f", fname, name, metric_i)
result_dict['uncertainty'][name][fname]['bootstrap'] = bs_function(
function, rng)(*fargs[fname])
wandb.log(result_dict)
logging.info(
'Analysis for wandb_runid `%s` finished. Full results dict: %s',
wandb_runid, result_dict
)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--wandb_runids', nargs='+', type=str,
help='Wandb run ids of the datasets to evaluate on.')
parser.add_argument('--assign_new_wandb_id', default=True,
action=argparse.BooleanOptionalAction)
parser.add_argument('--answer_fractions_mode', type=str, default='default')
parser.add_argument(
"--experiment_lot", type=str, default='Unnamed Experiment',
help="Keep default wandb clean.")
parser.add_argument(
"--entity", type=str, help="Wandb entity.")
args, unknown = parser.parse_known_args()
if unknown:
raise ValueError(f'Unkown args: {unknown}')
wandb_runids = args.wandb_runids
for wid in wandb_runids:
logging.info('Evaluating wandb_runid `%s`.', wid)
analyze_run(
wid, args.assign_new_wandb_id, args.answer_fractions_mode,
experiment_lot=args.experiment_lot, entity=args.entity)
================================================
FILE: semantic_uncertainty/compute_uncertainty_measures.py
================================================
"""Compute uncertainty measures after generating answers."""
from collections import defaultdict
import logging
import os
import pickle
import numpy as np
import wandb
from analyze_results import analyze_run
from uncertainty.data.data_utils import load_ds
from uncertainty.uncertainty_measures.p_ik import get_p_ik
from uncertainty.uncertainty_measures.semantic_entropy import get_semantic_ids
from uncertainty.uncertainty_measures.semantic_entropy import logsumexp_by_id
from uncertainty.uncertainty_measures.semantic_entropy import predictive_entropy
from uncertainty.uncertainty_measures.semantic_entropy import predictive_entropy_rao
from uncertainty.uncertainty_measures.semantic_entropy import cluster_assignment_entropy
from uncertainty.uncertainty_measures.semantic_entropy import context_entails_response
from uncertainty.uncertainty_measures.semantic_entropy import EntailmentDeberta
from uncertainty.uncertainty_measures.semantic_entropy import EntailmentGPT4
from uncertainty.uncertainty_measures.semantic_entropy import EntailmentGPT35
from uncertainty.uncertainty_measures.semantic_entropy import EntailmentGPT4Turbo
from uncertainty.uncertainty_measures.semantic_entropy import EntailmentLlama
from uncertainty.uncertainty_measures import p_true as p_true_utils
from uncertainty.utils import utils
utils.setup_logger()
EXP_DETAILS = 'experiment_details.pkl'
def main(args):
if args.train_wandb_runid is None:
args.train_wandb_runid = args.eval_wandb_runid
user = os.environ['USER']
scratch_dir = os.getenv('SCRATCH_DIR', '.')
wandb_dir = f'{scratch_dir}/{user}/uncertainty'
slurm_jobid = os.getenv('SLURM_JOB_ID', None)
project = "semantic_uncertainty" if not args.debug else "semantic_uncertainty_debug"
if args.assign_new_wandb_id:
logging.info('Assign new wandb_id.')
api = wandb.Api()
old_run = api.run(f'{args.restore_entity_eval}/{project}/{args.eval_wandb_runid}')
wandb.init(
entity=args.entity,
project=project,
dir=wandb_dir,
notes=f'slurm_id: {slurm_jobid}, experiment_lot: {args.experiment_lot}',
# For convenience, keep any 'generate_answers' configs from old run,
# but overwrite the rest!
# NOTE: This means any special configs affecting this script must be
# called again when calling this script!
config={**old_run.config, **args.__dict__},
)
def restore(filename):
old_run.file(filename).download(
replace=True, exist_ok=False, root=wandb.run.dir)
class Restored:
name = f'{wandb.run.dir}/{filename}'
return Restored
else:
logging.info('Reuse active wandb id.')
def restore(filename):
class Restored:
name = f'{wandb.run.dir}/{filename}'
return Restored
if args.train_wandb_runid != args.eval_wandb_runid:
logging.info(
"Distribution shift for p_ik. Training on embeddings from run %s but evaluating on run %s",
args.train_wandb_runid, args.eval_wandb_runid)
is_ood_eval = True # pylint: disable=invalid-name
api = wandb.Api()
old_run_train = api.run(f'{args.restore_entity_train}/semantic_uncertainty/{args.train_wandb_runid}')
filename = 'train_generations.pkl'
old_run_train.file(filename).download(
replace=True, exist_ok=False, root=wandb.run.dir)
with open(f'{wandb.run.dir}/{filename}', "rb") as infile:
train_generations = pickle.load(infile)
wandb.config.update(
{"ood_training_set": old_run_train.config['dataset']}, allow_val_change=True)
else:
is_ood_eval = False # pylint: disable=invalid-name
if args.compute_p_ik or args.compute_p_ik_answerable:
train_generations_pickle = restore('train_generations.pkl')
with open(train_generations_pickle.name, 'rb') as infile:
train_generations = pickle.load(infile)
wandb.config.update({"is_ood_eval": is_ood_eval}, allow_val_change=True)
# Load entailment model.
if args.compute_predictive_entropy:
logging.info('Beginning loading for entailment model.')
if args.entailment_model == 'deberta':
entailment_model = EntailmentDeberta()
elif args.entailment_model == 'gpt-4':
entailment_model = EntailmentGPT4(args.entailment_cache_id, args.entailment_cache_only)
elif args.entailment_model == 'gpt-3.5':
entailment_model = EntailmentGPT35(args.entailment_cache_id, args.entailment_cache_only)
elif args.entailment_model == 'gpt-4-turbo':
entailment_model = EntailmentGPT4Turbo(args.entailment_cache_id, args.entailment_cache_only)
elif 'llama' in args.entailment_model.lower():
entailment_model = EntailmentLlama(args.entailment_cache_id, args.entailment_cache_only, args.entailment_model)
else:
raise ValueError
logging.info('Entailment model loading complete.')
if args.compute_p_true_in_compute_stage:
# This is usually not called.
old_exp = restore(EXP_DETAILS)
with open(old_exp.name, "rb") as infile:
old_exp = pickle.load(infile)
if args.reuse_entailment_model:
pt_model = entailment_model.model
else:
pt_model = utils.init_model(old_exp['args'])
pt_train_dataset, pt_validation_dataset = load_ds(
old_exp['args'].dataset, add_options=old_exp['args'].use_mc_options,
seed=args.random_seed)
del pt_validation_dataset
# Reduce num generations used in p_true if needed!
if not args.use_all_generations:
if args.use_num_generations == -1:
raise ValueError
num_gen = args.use_num_generations
else:
num_gen = args.num_generations
p_true_few_shot_prompt, p_true_responses, len_p_true = p_true_utils.construct_few_shot_prompt(
model=pt_model,
dataset=pt_train_dataset,
indices=old_exp['p_true_indices'],
prompt=old_exp['prompt'],
brief=old_exp['BRIEF'],
brief_always=old_exp['args'].brief_always and old_exp['args'].enable_brief,
make_prompt=utils.get_make_prompt(old_exp['args']),
num_generations=num_gen,
metric=utils.get_metric(old_exp['args'].metric))
del p_true_responses
wandb.config.update(
{'p_true_num_fewshot': len_p_true}, allow_val_change=True)
wandb.log(dict(len_p_true=len_p_true))
logging.info('Generated few-shot prompt for p_true.')
logging.info(80*'#')
logging.info('p_true_few_shot_prompt: %s', p_true_few_shot_prompt)
logging.info(80*'#')
if args.recompute_accuracy:
# This is usually not enabled.
logging.warning('Recompute accuracy enabled. This does not apply to precomputed p_true!')
metric = utils.get_metric(args.metric)
# Restore outputs from `generate_answrs.py` run.
result_dict_pickle = restore('uncertainty_measures.pkl')
with open(result_dict_pickle.name, "rb") as infile:
result_dict = pickle.load(infile)
result_dict['semantic_ids'] = []
validation_generations_pickle = restore('validation_generations.pkl')
with open(validation_generations_pickle.name, 'rb') as infile:
validation_generations = pickle.load(infile)
entropies = defaultdict(list)
validation_embeddings, validation_is_true, validation_answerable = [], [], []
p_trues = []
count = 0 # pylint: disable=invalid-name
def is_answerable(generation):
return len(generation['reference']['answers']['text']) > 0
# Loop over datapoints and compute validation embeddings and entropies.
for idx, tid in enumerate(validation_generations):
example = validation_generations[tid]
question = example['question']
context = example['context']
full_responses = example["responses"]
most_likely_answer = example['most_likely_answer']
if not args.use_all_generations:
if args.use_num_generations == -1:
raise ValueError
responses = [fr[0] for fr in full_responses[:args.use_num_generations]]
else:
responses = [fr[0] for fr in full_responses]
if args.recompute_accuracy:
logging.info('Recomputing accuracy!')
if is_answerable(example):
acc = metric(most_likely_answer['response'], example, None)
else:
acc = 0.0 # pylint: disable=invalid-name
validation_is_true.append(acc)
logging.info('Recomputed accuracy!')
else:
validation_is_true.append(most_likely_answer['accuracy'])
validation_answerable.append(is_answerable(example))
validation_embeddings.append(most_likely_answer['embedding'])
logging.info('validation_is_true: %f', validation_is_true[-1])
if args.compute_predictive_entropy:
# Token log likelihoods. Shape = (n_sample, n_tokens)
if not args.use_all_generations:
log_liks = [r[1] for r in full_responses[:args.use_num_generations]]
else:
log_liks = [r[1] for r in full_responses]
for i in log_liks:
assert i
if args.compute_context_entails_response:
# Compute context entails answer baseline.
entropies['context_entails_response'].append(context_entails_response(
context, responses, entailment_model))
if args.condition_on_question and args.entailment_model == 'deberta':
responses = [f'{question} {r}' for r in responses]
# Compute semantic ids.
semantic_ids = get_semantic_ids(
responses, model=entailment_model,
strict_entailment=args.strict_entailment, example=example)
result_dict['semantic_ids'].append(semantic_ids)
# Compute entropy from frequencies of cluster assignments.
entropies['cluster_assignment_entropy'].append(cluster_assignment_entropy(semantic_ids))
# Length normalization of generation probabilities.
log_liks_agg = [np.mean(log_lik) for log_lik in log_liks]
# Compute naive entropy.
entropies['regular_entropy'].append(predictive_entropy(log_liks_agg))
# Compute semantic entropy.
log_likelihood_per_semantic_id = logsumexp_by_id(semantic_ids, log_liks_agg, agg='sum_normalized')
pe = predictive_entropy_rao(log_likelihood_per_semantic_id)
entropies['semantic_entropy'].append(pe)
# pylint: disable=invalid-name
log_str = 'semantic_ids: %s, avg_token_log_likelihoods: %s, entropies: %s'
entropies_fmt = ', '.join([f'{i}:{j[-1]:.2f}' for i, j in entropies.items()])
# pylint: enable=invalid-name
logging.info(80*'#')
logging.info('NEW ITEM %d at id=`%s`.', idx, tid)
logging.info('Context:')
logging.info(example['context'])
logging.info('Question:')
logging.info(question)
logging.info('True Answers:')
logging.info(example['reference'])
logging.info('Low Temperature Generation:')
logging.info(most_likely_answer['response'])
logging.info('Low Temperature Generation Accuracy:')
logging.info(most_likely_answer['accuracy'])
logging.info('High Temp Generation:')
logging.info([r[0] for r in full_responses])
logging.info('High Temp Generation:')
logging.info(log_str, semantic_ids, log_liks_agg, entropies_fmt)
if args.compute_p_true_in_compute_stage:
p_true = p_true_utils.calculate_p_true(
pt_model, question, most_likely_answer['response'],
responses, p_true_few_shot_prompt,
hint=old_exp['args'].p_true_hint)
p_trues.append(p_true)
logging.info('p_true: %s', np.exp(p_true))
count += 1
if count >= args.num_eval_samples:
logging.info('Breaking out of main loop.')
break
logging.info('Accuracy on original task: %f', np.mean(validation_is_true))
validation_is_false = [1.0 - is_t for is_t in validation_is_true]
result_dict['validation_is_false'] = validation_is_false
validation_unanswerable = [1.0 - is_a for is_a in validation_answerable]
result_dict['validation_unanswerable'] = validation_unanswerable
logging.info('Unanswerable prop on validation: %f', np.mean(validation_unanswerable))
if 'uncertainty_measures' not in result_dict:
result_dict['uncertainty_measures'] = dict()
if args.compute_predictive_entropy:
result_dict['uncertainty_measures'].update(entropies)
if args.compute_p_ik or args.compute_p_ik_answerable:
# Assemble training data for embedding classification.
train_is_true, train_embeddings, train_answerable = [], [], []
for tid in train_generations:
most_likely_answer = train_generations[tid]['most_likely_answer']
train_embeddings.append(most_likely_answer['embedding'])
train_is_true.append(most_likely_answer['accuracy'])
train_answerable.append(is_answerable(train_generations[tid]))
train_is_false = [0.0 if is_t else 1.0 for is_t in train_is_true]
train_unanswerable = [0.0 if is_t else 1.0 for is_t in train_answerable]
logging.info('Unanswerable prop on p_ik training: %f', np.mean(train_unanswerable))
if args.compute_p_ik:
logging.info('Starting training p_ik on train embeddings.')
# Train classifier of correct/incorrect from embeddings.
p_ik_predictions = get_p_ik(
train_embeddings=train_embeddings, is_false=train_is_false,
eval_embeddings=validation_embeddings, eval_is_false=validation_is_false)
result_dict['uncertainty_measures']['p_ik'] = p_ik_predictions
logging.info('Finished training p_ik on train embeddings.')
if args.compute_p_ik_answerable:
# Train classifier of answerable/unanswerable.
p_ik_predictions = get_p_ik(
train_embeddings=train_embeddings, is_false=train_unanswerable,
eval_embeddings=validation_embeddings, eval_is_false=validation_unanswerable)
result_dict['uncertainty_measures']['p_ik_unanswerable'] = p_ik_predictions
if args.compute_p_true_in_compute_stage:
result_dict['uncertainty_measures']['p_false'] = [1 - p for p in p_trues]
result_dict['uncertainty_measures']['p_false_fixed'] = [1 - np.exp(p) for p in p_trues]
utils.save(result_dict, 'uncertainty_measures.pkl')
if args.compute_predictive_entropy:
entailment_model.save_prediction_cache()
if args.analyze_run:
# Follow up with computation of aggregate performance metrics.
logging.info(50 * '#X')
logging.info('STARTING `analyze_run`!')
analyze_run(wandb.run.id)
logging.info(50 * '#X')
logging.info('FINISHED `analyze_run`!')
if __name__ == '__main__':
parser = utils.get_parser(stages=['compute'])
args, unknown = parser.parse_known_args() # pylint: disable=invalid-name
if unknown:
raise ValueError(f'Unkown args: {unknown}')
logging.info("Args: %s", args)
main(args)
================================================
FILE: semantic_uncertainty/generate_answers.py
================================================
"""Sample answers from LLMs on QA task."""
import gc
import os
import logging
import random
from tqdm import tqdm
import numpy as np
import torch
import wandb
from uncertainty.data.data_utils import load_ds
from uncertainty.utils import utils
from uncertainty.uncertainty_measures import p_true as p_true_utils
from compute_uncertainty_measures import main as main_compute
utils.setup_logger()
def main(args):
# Setup run.
if args.dataset == 'svamp':
if not args.use_context:
logging.info('Forcing `use_context=True` for svamp dataset.')
args.use_context = True
elif args.dataset == 'squad':
if not args.answerable_only:
logging.info('Forcing `answerable_only=True` for squad dataset.')
args.answerable_only = True
experiment_details = {'args': args}
random.seed(args.random_seed)
user = os.environ['USER']
slurm_jobid = os.getenv('SLURM_JOB_ID', None)
scratch_dir = os.getenv('SCRATCH_DIR', '.')
if not os.path.exists(f"{scratch_dir}/{user}/uncertainty"):
os.makedirs(f"{scratch_dir}/{user}/uncertainty")
wandb.init(
entity=args.entity,
project="semantic_uncertainty" if not args.debug else "semantic_uncertainty_debug",
dir=f"{scratch_dir}/{user}/uncertainty",
config=args,
notes=f'slurm_id: {slurm_jobid}, experiment_lot: {args.experiment_lot}',
)
logging.info('Finished wandb init.')
# Get accuracy metric.
metric = utils.get_metric(args.metric)
# Load dataset.
train_dataset, validation_dataset = load_ds(
args.dataset, add_options=args.use_mc_options, seed=args.random_seed)
if args.ood_train_dataset is not None:
logging.warning(
'Using OOD dataset %s to construct few-shot prompts and train p_ik.',
args.ood_train_dataset)
# Get indices of answerable and unanswerable questions and construct prompt.
train_dataset, _ = load_ds(args.ood_train_dataset, add_options=args.use_mc_options)
if not isinstance(train_dataset, list):
logging.info('Train dataset: %s', train_dataset)
# Get indices of answerable and unanswerable questions and construct prompt.
answerable_indices, unanswerable_indices = utils.split_dataset(train_dataset)
if args.answerable_only:
unanswerable_indices = []
val_answerable, val_unanswerable = utils.split_dataset(validation_dataset)
del val_unanswerable
validation_dataset = [validation_dataset[i] for i in val_answerable]
prompt_indices = random.sample(answerable_indices, args.num_few_shot)
experiment_details['prompt_indices'] = prompt_indices
remaining_answerable = list(set(answerable_indices) - set(prompt_indices))
# Create Few-Shot prompt.
make_prompt = utils.get_make_prompt(args)
BRIEF = utils.BRIEF_PROMPTS[args.brief_prompt]
arg = args.brief_always if args.enable_brief else True
prompt = utils.construct_fewshot_prompt_from_indices(
train_dataset, prompt_indices, BRIEF, arg, make_prompt)
experiment_details['prompt'] = prompt
experiment_details['BRIEF'] = BRIEF
logging.info('Prompt is: %s', prompt)
# Initialize model.
model = utils.init_model(args)
# Initialize prompt for p_true baseline.
if args.compute_p_true:
logging.info(80*'#')
logging.info('Constructing few-shot prompt for p_true.')
p_true_indices = random.sample(answerable_indices, args.p_true_num_fewshot)
remaining_answerable = list(set(remaining_answerable) - set(p_true_indices))
p_true_few_shot_prompt, p_true_responses, len_p_true = p_true_utils.construct_few_shot_prompt(
model=model, dataset=train_dataset, indices=p_true_indices,
prompt=prompt, brief=BRIEF,
brief_always=args.brief_always and args.enable_brief,
make_prompt=make_prompt, num_generations=args.num_generations,
metric=metric)
wandb.config.update(
{'p_true_num_fewshot': len_p_true}, allow_val_change=True)
wandb.log(dict(len_p_true=len_p_true))
experiment_details['p_true_indices'] = p_true_indices
experiment_details['p_true_responses'] = p_true_responses
experiment_details['p_true_few_shot_prompt'] = p_true_few_shot_prompt
logging.info('Finished constructing few-shot prompt for p_true.')
logging.info(80*'#')
logging.info('p_true_few_shot_prompt: %s', p_true_few_shot_prompt)
logging.info(80*'#')
# Start answer generation.
logging.info(80 * '=')
logging.info('Generating answers: ')
logging.info(80 * '=')
for dataset_split in ['train', 'validation']:
logging.info(80 * 'x')
logging.info('Starting with dataset_split %s.', dataset_split)
logging.info(80 * 'x')
# This will store all input data and model predictions.
accuracies, generations, results_dict, p_trues = [], {}, {}, []
if dataset_split == 'train':
if not args.get_training_set_generations:
logging.info('Skip training data.')
continue
dataset = train_dataset
possible_indices = list(set(remaining_answerable) | set(unanswerable_indices))
else:
dataset = validation_dataset
possible_indices = range(0, len(dataset))
# Evaluate over random subset of the datasets.
indices = random.sample(possible_indices, min(args.num_samples, len(dataset)))
experiment_details[dataset_split] = {'indices': indices}
if args.num_samples > len(dataset):
logging.warning('Not enough samples in dataset. Using all %d samples.', len(dataset))
it = 0
for index in tqdm(indices):
if (it + 1 % 10) == 0:
gc.collect()
torch.cuda.empty_cache()
it += 1
# Grab example at index.
example = dataset[index]
question, context = example["question"], example['context']
generations[example['id']] = {'question': question, 'context': context}
correct_answer = example['answers']['text']
current_input = make_prompt(
context, question, None, BRIEF, args.brief_always and args.enable_brief)
local_prompt = prompt + current_input
logging.info('Current input: '.ljust(15) + current_input)
full_responses = []
# We sample one low temperature answer on which we will compute the
# accuracy and args.num_generation high temperature answers which will
# be used to estimate the entropy variants.
if dataset_split == 'train' and args.get_training_set_generations_most_likely_only:
num_generations = 1
else:
num_generations = args.num_generations + 1
for i in range(num_generations):
# Temperature for first generation is always `0.1`.
temperature = 0.1 if i == 0 else args.temperature
predicted_answer, token_log_likelihoods, embedding = model.predict(
local_prompt, temperature)
embedding = embedding.cpu() if embedding is not None else None
# Only compute accuracy if question is answerable.
compute_acc = args.compute_accuracy_at_all_temps or (i == 0)
if correct_answer and compute_acc:
acc = metric(predicted_answer, example, model)
else:
acc = 0.0 # pylint: disable=invalid-name
if i == 0:
logging.info('Iteration ' + str(it) + ': ' + 80*'#')
if args.use_context:
logging.info('context: '.ljust(15) + str(context))
logging.info('question: '.ljust(15) + question)
logging.info('low-t prediction: '.ljust(15) + predicted_answer)
logging.info('correct answer: '.ljust(15) + str(correct_answer))
logging.info('accuracy: '.ljust(15) + str(acc))
accuracies.append(acc)
most_likely_answer_dict = {
'response': predicted_answer,
'token_log_likelihoods': token_log_likelihoods,
'embedding': embedding,
'accuracy': acc}
generations[example['id']].update({
'most_likely_answer': most_likely_answer_dict,
'reference': utils.get_reference(example)})
else:
logging.info('high-t prediction '.ljust(15) + str(i) + ' : ' + predicted_answer)
# Aggregate predictions over num_generations.
full_responses.append(
(predicted_answer, token_log_likelihoods, embedding, acc))
# Append all predictions for this example to `generations`.
generations[example['id']]['responses'] = full_responses
if args.compute_p_true and dataset_split == 'validation':
# Already compute p_true here. Avoid cost of generations in compute_uncertainty script.
p_true = p_true_utils.calculate_p_true(
model, question, most_likely_answer_dict['response'],
[r[0] for r in full_responses], p_true_few_shot_prompt,
hint=args.p_true_hint)
p_trues.append(p_true)
logging.info('p_true: %s', p_true)
# Save generations for that split.
utils.save(generations, f'{dataset_split}_generations.pkl')
# Log overall accuracy.
accuracy = np.mean(accuracies)
print(f"Overall {dataset_split} split accuracy: {accuracy}")
wandb.log({f"{dataset_split}_accuracy": accuracy})
if dataset_split == 'validation':
if args.compute_p_true:
results_dict['uncertainty_measures'] = {
'p_false': [1 - p for p in p_trues],
'p_false_fixed': [1 - np.exp(p) for p in p_trues],
}
utils.save(results_dict, 'uncertainty_measures.pkl')
utils.save(experiment_details, 'experiment_details.pkl')
logging.info('Run complete.')
del model
if __name__ == '__main__':
parser = utils.get_parser()
args, unknown = parser.parse_known_args()
logging.info('Starting new run with args: %s', args)
if unknown:
raise ValueError(f'Unkown args: {unknown}')
if args.compute_uncertainties:
args.assign_new_wandb_id = False
# First sample generations from LLM.
logging.info('STARTING `generate_answers`!')
main(args)
logging.info('FINISHED `generate_answers`!')
if args.compute_uncertainties:
# Follow with uncertainty calculation script by default.
args.assign_new_wandb_id = False
gc.collect()
torch.cuda.empty_cache()
logging.info(50 * '#X')
logging.info('STARTING `compute_uncertainty_measures`!')
main_compute(args)
logging.info('FINISHED `compute_uncertainty_measures`!')
================================================
FILE: semantic_uncertainty/uncertainty/__init__.py
================================================
================================================
FILE: semantic_uncertainty/uncertainty/data/data_utils.py
================================================
"""Data Loading Utilities."""
import os
import json
import hashlib
import datasets
def load_ds(dataset_name, seed, add_options=None):
"""Load dataset."""
user = os.environ['USER']
train_dataset, validation_dataset = None, None
if dataset_name == "squad":
dataset = datasets.load_dataset("squad_v2")
train_dataset = dataset["train"]
validation_dataset = dataset["validation"]
elif dataset_name == 'svamp':
dataset = datasets.load_dataset('ChilleD/SVAMP')
train_dataset = dataset["train"]
validation_dataset = dataset["test"]
reformat = lambda x: {
'question': x['Question'], 'context': x['Body'], 'type': x['Type'],
'equation': x['Equation'], 'id': x['ID'],
'answers': {'text': [str(x['Answer'])]}}
train_dataset = [reformat(d) for d in train_dataset]
validation_dataset = [reformat(d) for d in validation_dataset]
elif dataset_name == 'nq':
dataset = datasets.load_dataset("nq_open")
train_dataset = dataset["train"]
validation_dataset = dataset["validation"]
md5hash = lambda s: str(int(hashlib.md5(s.encode('utf-8')).hexdigest(), 16))
reformat = lambda x: {
'question': x['question']+'?',
'answers': {'text': x['answer']},
'context': '',
'id': md5hash(str(x['question'])),
}
train_dataset = [reformat(d) for d in train_dataset]
validation_dataset = [reformat(d) for d in validation_dataset]
elif dataset_name == "trivia_qa":
dataset = datasets.load_dataset('TimoImhof/TriviaQA-in-SQuAD-format')['unmodified']
dataset = dataset.train_test_split(test_size=0.2, seed=seed)
train_dataset = dataset['train']
validation_dataset = dataset['test']
elif dataset_name == "bioasq":
# http://participants-area.bioasq.org/datasets/ we are using training 11b
# could also download from here https://zenodo.org/records/7655130
scratch_dir = os.getenv('SCRATCH_DIR', '.')
path = f"{scratch_dir}/{user}/semantic_uncertainty/data/bioasq/training11b.json"
with open(path, "rb") as file:
data = json.load(file)
questions = data["questions"]
dataset_dict = {
"question": [],
"answers": [],
"id": []
}
for question in questions:
if "exact_answer" not in question:
continue
dataset_dict["question"].append(question["body"])
if "exact_answer" in question:
if isinstance(question['exact_answer'], list):
exact_answers = [
ans[0] if isinstance(ans, list) else ans
for ans in question['exact_answer']
]
else:
exact_answers = [question['exact_answer']]
dataset_dict["answers"].append({
"text": exact_answers,
"answer_start": [0] * len(question["exact_answer"])
})
else:
dataset_dict["answers"].append({
"text": question["ideal_answer"],
"answer_start": [0]
})
dataset_dict["id"].append(question["id"])
dataset_dict["context"] = [None] * len(dataset_dict["id"])
dataset = datasets.Dataset.from_dict(dataset_dict)
# Split into training and validation set.
dataset = dataset.train_test_split(test_size=0.8, seed=seed)
train_dataset = dataset['train']
validation_dataset = dataset['test']
else:
raise ValueError
return train_dataset, validation_dataset
================================================
FILE: semantic_uncertainty/uncertainty/models/__init__.py
================================================
================================================
FILE: semantic_uncertainty/uncertainty/models/base_model.py
================================================
from abc import ABC, abstractmethod
from typing import List, Text
STOP_SEQUENCES = ['\n\n\n\n', '\n\n\n', '\n\n', '\n', 'Question:', 'Context:']
class BaseModel(ABC):
stop_sequences: List[Text]
@abstractmethod
def predict(self, input_data, temperature):
pass
@abstractmethod
def get_p_true(self, input_data):
pass
================================================
FILE: semantic_uncertainty/uncertainty/models/huggingface_models.py
================================================
"""Implement HuggingfaceModel models."""
import copy
import logging
from collections import Counter
import torch
import accelerate
from transformers import AutoTokenizer
from transformers import AutoConfig
from transformers import AutoModelForCausalLM
from transformers import BitsAndBytesConfig
from transformers import StoppingCriteria
from transformers import StoppingCriteriaList
from huggingface_hub import snapshot_download
from uncertainty.models.base_model import BaseModel
from uncertainty.models.base_model import STOP_SEQUENCES
class StoppingCriteriaSub(StoppingCriteria):
"""Stop generations when they match a particular text or token."""
def __init__(self, stops, tokenizer, match_on='text', initial_length=None):
super().__init__()
self.stops = stops
self.initial_length = initial_length
self.tokenizer = tokenizer
self.match_on = match_on
if self.match_on == 'tokens':
self.stops = [torch.tensor(self.tokenizer.encode(i)).to('cuda') for i in self.stops]
print(self.stops)
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor):
del scores # `scores` arg is required by StoppingCriteria but unused by us.
for stop in self.stops:
if self.match_on == 'text':
generation = self.tokenizer.decode(input_ids[0][self.initial_length:], skip_special_tokens=False)
match = stop in generation
elif self.match_on == 'tokens':
# Can be dangerous due to tokenizer ambiguities.
match = stop in input_ids[0][-len(stop):]
else:
raise
if match:
return True
return False
def remove_split_layer(device_map_in):
"""Modify device maps s.t. individual layers are not spread across devices."""
device_map = copy.deepcopy(device_map_in)
destinations = list(device_map.keys())
counts = Counter(['.'.join(i.split('.')[:2]) for i in destinations])
found_split = False
for layer, count in counts.items():
if count == 1:
continue
if found_split:
# Only triggers if we find more than one split layer.
raise ValueError(
'More than one split layer.\n'
f'Currently at layer {layer}.\n'
f'In map: {device_map_in}\n'
f'Out map: {device_map}\n')
logging.info(f'Split layer is {layer}.')
# Remove split for that layer.
for name in list(device_map.keys()):
if name.startswith(layer):
print(f'pop {name}')
device = device_map.pop(name)
device_map[layer] = device
found_split = True
return device_map
class HuggingfaceModel(BaseModel):
"""Hugging Face Model."""
def __init__(self, model_name, stop_sequences=None, max_new_tokens=None):
if max_new_tokens is None:
raise
self.max_new_tokens = max_new_tokens
if stop_sequences == 'default':
stop_sequences = STOP_SEQUENCES
if 'llama' in model_name.lower():
if model_name.endswith('-8bit'):
kwargs = {'quantization_config': BitsAndBytesConfig(
load_in_8bit=True,)}
model_name = model_name[:-len('-8bit')]
eightbit = True
else:
kwargs = {}
eightbit = False
if 'Llama-2' in model_name:
base = 'meta-llama'
model_name = model_name + '-hf'
else:
base = 'huggyllama'
self.tokenizer = AutoTokenizer.from_pretrained(
f"{base}/{model_name}", device_map="auto",
token_type_ids=None)
llama65b = '65b' in model_name and base == 'huggyllama'
llama2_70b = '70b' in model_name and base == 'meta-llama'
if ('7b' in model_name or '13b' in model_name) or eightbit:
self.model = AutoModelForCausalLM.from_pretrained(
f"{base}/{model_name}", device_map="auto",
max_memory={0: '80GIB'}, **kwargs,)
elif llama2_70b or llama65b:
path = snapshot_download(
repo_id=f'{base}/{model_name}',
allow_patterns=['*.json', '*.model', '*.safetensors'],
ignore_patterns=['pytorch_model.bin.index.json']
)
config = AutoConfig.from_pretrained(f"{base}/{model_name}")
with accelerate.init_empty_weights():
self.model = AutoModelForCausalLM.from_config(config)
self.model.tie_weights()
max_mem = 15 * 4686198491
device_map = accelerate.infer_auto_device_map(
self.model.model,
max_memory={0: max_mem, 1: max_mem},
dtype='float16'
)
device_map = remove_split_layer(device_map)
full_model_device_map = {f"model.{k}": v for k, v in device_map.items()}
full_model_device_map["lm_head"] = 0
self.model = accelerate.load_checkpoint_and_dispatch(
self.model, path, device_map=full_model_device_map,
dtype='float16', skip_keys='past_key_values')
else:
raise ValueError
elif 'mistral' in model_name.lower():
if model_name.endswith('-8bit'):
kwargs = {'quantization_config': BitsAndBytesConfig(
load_in_8bit=True,)}
model_name = model_name[:-len('-8bit')]
if model_name.endswith('-4bit'):
kwargs = {'quantization_config': BitsAndBytesConfig(
load_in_4bit=True,)}
model_name = model_name[:-len('-4bit')]
else:
kwargs = {}
model_id = f'mistralai/{model_name}'
self.tokenizer = AutoTokenizer.from_pretrained(
model_id, device_map='auto', token_type_ids=None,
clean_up_tokenization_spaces=False)
self.model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map='auto',
max_memory={0: '80GIB'},
**kwargs,
)
elif 'falcon' in model_name:
model_id = f'tiiuae/{model_name}'
self.tokenizer = AutoTokenizer.from_pretrained(
model_id, device_map='auto', token_type_ids=None,
clean_up_tokenization_spaces=False)
kwargs = {'quantization_config': BitsAndBytesConfig(
load_in_8bit=True,)}
self.model = AutoModelForCausalLM.from_pretrained(
model_id,
trust_remote_code=True,
device_map='auto',
**kwargs,
)
else:
raise ValueError
self.model_name = model_name
self.stop_sequences = stop_sequences + [self.tokenizer.eos_token]
self.token_limit = 4096 if 'Llama-2' in model_name else 2048
def predict(self, input_data, temperature, return_full=False):
# Implement prediction.
inputs = self.tokenizer(input_data, return_tensors="pt").to("cuda")
if 'llama' in self.model_name.lower() or 'falcon' in self.model_name or 'mistral' in self.model_name.lower():
if 'token_type_ids' in inputs: # Some HF models have changed.
del inputs['token_type_ids']
pad_token_id = self.tokenizer.eos_token_id
else:
pad_token_id = None
if self.stop_sequences is not None:
stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(
stops=self.stop_sequences,
initial_length=len(inputs['input_ids'][0]),
tokenizer=self.tokenizer)])
else:
stopping_criteria = None
logging.debug('temperature: %f', temperature)
with torch.no_grad():
outputs = self.model.generate(
**inputs,
max_new_tokens=self.max_new_tokens,
return_dict_in_generate=True,
output_scores=True,
output_hidden_states=True,
temperature=temperature,
do_sample=True,
stopping_criteria=stopping_criteria,
pad_token_id=pad_token_id,
)
if len(outputs.sequences[0]) > self.token_limit:
raise ValueError(
'Generation exceeding token limit %d > %d',
len(outputs.sequences[0]), self.token_limit)
full_answer = self.tokenizer.decode(
outputs.sequences[0], skip_special_tokens=True)
if return_full:
return full_answer
# For some models, we need to remove the input_data from the answer.
if full_answer.startswith(input_data):
input_data_offset = len(input_data)
else:
raise ValueError('Have not tested this in a while.')
# Remove input from answer.
answer = full_answer[input_data_offset:]
# Remove stop_words from answer.
stop_at = len(answer)
sliced_answer = answer
if self.stop_sequences is not None:
for stop in self.stop_sequences:
if answer.endswith(stop):
stop_at = len(answer) - len(stop)
sliced_answer = answer[:stop_at]
break
if not all([stop not in sliced_answer for stop in self.stop_sequences]):
error_msg = 'Error: Stop words not removed successfully!'
error_msg += f'Answer: >{answer}< '
error_msg += f'Sliced Answer: >{sliced_answer}<'
if 'falcon' not in self.model_name.lower():
raise ValueError(error_msg)
else:
logging.error(error_msg)
# Remove whitespaces from answer (in particular from beginning.)
sliced_answer = sliced_answer.strip()
# Get the number of tokens until the stop word comes up.
# Note: Indexing with `stop_at` already excludes the stop_token.
# Note: It's important we do this with full answer, since there might be
# non-trivial interactions between the input_data and generated part
# in tokenization (particularly around whitespaces.)
token_stop_index = self.tokenizer(full_answer[:input_data_offset + stop_at], return_tensors="pt")['input_ids'].shape[1]
n_input_token = len(inputs['input_ids'][0])
n_generated = token_stop_index - n_input_token
if n_generated == 0:
logging.warning('Only stop_words were generated. For likelihoods and embeddings, taking stop word instead.')
n_generated = 1
# Get the last hidden state (last layer) and the last token's embedding of the answer.
# Note: We do not want this to be the stop token.
# outputs.hidden_state is a tuple of len = n_generated_tokens.
# The first hidden state is for the input tokens and is of shape
# (n_layers) x (batch_size, input_size, hidden_size).
# (Note this includes the first generated token!)
# The remaining hidden states are for the remaining generated tokens and is of shape
# (n_layers) x (batch_size, 1, hidden_size).
# Note: The output embeddings have the shape (batch_size, generated_length, hidden_size).
# We do not get embeddings for input_data! We thus subtract the n_tokens_in_input from
# token_stop_index to arrive at the right output.
if 'decoder_hidden_states' in outputs.keys():
hidden = outputs.decoder_hidden_states
else:
hidden = outputs.hidden_states
if len(hidden) == 1:
logging.warning(
'Taking first and only generation for hidden! '
'n_generated: %d, n_input_token: %d, token_stop_index %d, '
'last_token: %s, generation was: %s',
n_generated, n_input_token, token_stop_index,
self.tokenizer.decode(outputs['sequences'][0][-1]),
full_answer,
)
last_input = hidden[0]
elif ((n_generated - 1) >= len(hidden)):
# If access idx is larger/equal.
logging.error(
'Taking last state because n_generated is too large'
'n_generated: %d, n_input_token: %d, token_stop_index %d, '
'last_token: %s, generation was: %s, slice_answer: %s',
n_generated, n_input_token, token_stop_index,
self.tokenizer.decode(outputs['sequences'][0][-1]),
full_answer, sliced_answer
)
last_input = hidden[-1]
else:
last_input = hidden[n_generated - 1]
# Then access last layer for input
last_layer = last_input[-1]
# Then access last token in input.
last_token_embedding = last_layer[:, -1, :].cpu()
# Get log_likelihoods.
# outputs.scores are the logits for the generated token.
# outputs.scores is a tuple of len = n_generated_tokens.
# Each entry is shape (bs, vocabulary size).
# outputs.sequences is the sequence of all tokens: input and generated.
transition_scores = self.model.compute_transition_scores(
outputs.sequences, outputs.scores, normalize_logits=True)
# Transition_scores[0] only contains the scores for the first generated tokens.
log_likelihoods = [score.item() for score in transition_scores[0]]
if len(log_likelihoods) == 1:
logging.warning('Taking first and only generation for log likelihood!')
log_likelihoods = log_likelihoods
else:
log_likelihoods = log_likelihoods[:n_generated]
if len(log_likelihoods) == self.max_new_tokens:
logging.warning('Generation interrupted by max_token limit.')
if len(log_likelihoods) == 0:
raise ValueError
return sliced_answer, log_likelihoods, last_token_embedding
def get_p_true(self, input_data):
"""Get the probability of the model anwering A (True) for the given input."""
input_data += ' A'
tokenized_prompt_true = self.tokenizer(input_data, return_tensors='pt').to('cuda')['input_ids']
# The computation of the negative log likelihoods follows:
# https://huggingface.co/docs/transformers/perplexity.
target_ids_true = tokenized_prompt_true.clone()
# Set all target_ids except the last one to -100.
target_ids_true[0, :-1] = -100
with torch.no_grad():
model_output_true = self.model(tokenized_prompt_true, labels=target_ids_true)
loss_true = model_output_true.loss
return -loss_true.item()
================================================
FILE: semantic_uncertainty/uncertainty/uncertainty_measures/p_ik.py
================================================
"""Predict model correctness from linear classifier."""
import logging
import torch
import wandb
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
def get_p_ik(train_embeddings, is_false, eval_embeddings=None, eval_is_false=None):
"""Fit linear classifier to embeddings to predict model correctness."""
logging.info('Accuracy of model on Task: %f.', 1 - torch.tensor(is_false).mean()) # pylint: disable=no-member
# Convert the list of tensors to a 2D tensor.
train_embeddings_tensor = torch.cat(train_embeddings, dim=0) # pylint: disable=no-member
# Convert the tensor to a numpy array.
embeddings_array = train_embeddings_tensor.cpu().numpy()
# Split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split( # pylint: disable=invalid-name
embeddings_array, is_false, test_size=0.2, random_state=42) # pylint: disable=invalid-name
# Fit a logistic regression model.
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict deterministically and probabilistically and compute accuracy and auroc for all splits.
X_eval = torch.cat(eval_embeddings, dim=0).cpu().numpy() # pylint: disable=no-member,invalid-name
y_eval = eval_is_false
Xs = [X_train, X_test, X_eval] # pylint: disable=invalid-name
ys = [y_train, y_test, y_eval] # pylint: disable=invalid-name
suffixes = ['train_train', 'train_test', 'eval']
metrics, y_preds_proba = {}, {}
for suffix, X, y_true in zip(suffixes, Xs, ys): # pylint: disable=invalid-name
# If suffix is eval, we fit a new model on the entire training data set
# rather than just a split of the training data set.
if suffix == 'eval':
model = LogisticRegression()
model.fit(embeddings_array, is_false)
convergence = {
'n_iter': model.n_iter_[0],
'converged': (model.n_iter_ < model.max_iter)[0]}
y_pred = model.predict(X)
y_pred_proba = model.predict_proba(X)
y_preds_proba[suffix] = y_pred_proba
acc_p_ik_train = accuracy_score(y_true, y_pred)
auroc_p_ik_train = roc_auc_score(y_true, y_pred_proba[:, 1])
split_metrics = {
f'acc_p_ik_{suffix}': acc_p_ik_train,
f'auroc_p_ik_{suffix}': auroc_p_ik_train}
metrics.update(split_metrics)
logging.info('Metrics for p_ik classifier: %s.', metrics)
wandb.log({**metrics, **convergence})
# Return model predictions on the eval set.
return y_preds_proba['eval'][:, 1]
================================================
FILE: semantic_uncertainty/uncertainty/uncertainty_measures/p_true.py
================================================
"""Compute p_true uncertainty metric."""
import logging
def construct_few_shot_prompt(
*, model, dataset, indices, prompt, brief, brief_always, make_prompt,
num_generations, metric):
"""Construct few shot prompt for p_true uncertainty metric."""
# Call model n_shots many times.
few_shot_prompt = []
all_responses = dict()
for it, i in enumerate(indices):
prompt_candidate = []
example = dataset[i]
question = example["question"]
context = example["context"]
if it != 0:
prompt_candidate += ['\n']
prompt_candidate += ['Question: ' + question]
prompt_candidate += ['\nBrainstormed Answers: ']
current_question = make_prompt(context, question, None, brief, brief_always)
local_prompt = prompt + current_question
logging.info('P_TRUE >> Current Question: '.ljust(25) + current_question)
responses = []
for j in range(num_generations + 1):
if j == 0:
temperature = 0.1
else:
temperature = 1.0
response, _, _ = model.predict(local_prompt, temperature)
logging.info('P_TRUE >> Current Response: '.ljust(25) + response)
responses.append(response)
prompt_candidate += [f'{response.strip()} \n']
if j == 0:
# Save most likely response and compute correctness metric for it.
most_likely_response = response
is_correct = metric(response, example, model)
answers = [answer for answer in example['answers']['text']]
logging.info('P_TRUE >> LOW-T >> true answer: '.ljust(35) + str(answers))
logging.info('P_TRUE >> LOW-T >> acc: '.ljust(35) + str(is_correct))
all_responses[i] = dict(
responses=responses, most_likely_response=most_likely_response,
is_correct=is_correct)
prompt_candidate += ['Possible answer: ' + most_likely_response + '\n']
prompt_candidate += ['Is the possible answer:\n']
prompt_candidate += ['A) True\n']
prompt_candidate += ['B) False\n']
prompt_candidate += ['The possible answer is:']
prompt_candidate += [' A' if is_correct else ' B']
prompt_len = len(model.tokenizer.encode(''.join(few_shot_prompt + prompt_candidate)))
# At test time, get a maximum of `num_generations * model.token_limit` extra tokens
# 200 buffer for question and 'Possible Answer'.
max_input_len = prompt_len + num_generations * model.max_new_tokens + 200
if max_input_len < model.token_limit:
few_shot_prompt.extend(prompt_candidate)
else:
logging.warning('Cutting of p_true prompt at length %d.', it)
break
return ''.join(few_shot_prompt), all_responses, it
def calculate_p_true(
model, question, most_probable_answer, brainstormed_answers,
few_shot_prompt, hint=False):
"""Calculate p_true uncertainty metric."""
if few_shot_prompt:
prompt = few_shot_prompt + '\n'
else:
prompt = ''
prompt += 'Question: ' + question
prompt += '\nBrainstormed Answers: '
for answer in brainstormed_answers + [most_probable_answer]:
prompt += answer.strip() + '\n'
prompt += 'Possible answer: ' + most_probable_answer + '\n'
if not hint:
prompt += 'Is the possible answer:\n'
prompt += 'A) True\n'
prompt += 'B) False\n'
prompt += 'The possible answer is:'
else:
prompt += 'Do the brainstormed answers match the possible answer? Respond with A if they do, if they do not respond with B. Answer:'
log_prob = model.get_p_true(prompt)
return log_prob
================================================
FILE: semantic_uncertainty/uncertainty/uncertainty_measures/semantic_entropy.py
================================================
"""Implement semantic entropy."""
import os
import pickle
import logging
import numpy as np
import wandb
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from uncertainty.models.huggingface_models import HuggingfaceModel
from uncertainty.utils import openai as oai
from uncertainty.utils import utils
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
class BaseEntailment:
def save_prediction_cache(self):
pass
class EntailmentDeberta(BaseEntailment):
def __init__(self):
self.tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge-mnli")
self.model = AutoModelForSequenceClassification.from_pretrained(
"microsoft/deberta-v2-xlarge-mnli").to(DEVICE)
def check_implication(self, text1, text2, *args, **kwargs):
inputs = self.tokenizer(text1, text2, return_tensors="pt").to(DEVICE)
# The model checks if text1 -> text2, i.e. if text2 follows from text1.
# check_implication('The weather is good', 'The weather is good and I like you') --> 1
# check_implication('The weather is good and I like you', 'The weather is good') --> 2
outputs = self.model(**inputs)
logits = outputs.logits
# Deberta-mnli returns `neutral` and `entailment` classes at indices 1 and 2.
largest_index = torch.argmax(F.softmax(logits, dim=1)) # pylint: disable=no-member
prediction = largest_index.cpu().item()
if os.environ.get('DEBERTA_FULL_LOG', False):
logging.info('Deberta Input: %s -> %s', text1, text2)
logging.info('Deberta Prediction: %s', prediction)
return prediction
class EntailmentLLM(BaseEntailment):
entailment_file = 'entailment_cache.pkl'
def __init__(self, entailment_cache_id, entailment_cache_only):
self.prediction_cache = self.init_prediction_cache(entailment_cache_id)
self.entailment_cache_only = entailment_cache_only
def init_prediction_cache(self, entailment_cache_id):
if entailment_cache_id is None:
return dict()
logging.info('Restoring prediction cache from %s', entailment_cache_id)
api = wandb.Api()
run = api.run(entailment_cache_id)
run.file(self.entailment_file).download(
replace=True, exist_ok=False, root=wandb.run.dir)
with open(f'{wandb.run.dir}/{self.entailment_file}', "rb") as infile:
return pickle.load(infile)
def save_prediction_cache(self):
# Write the dictionary to a pickle file.
utils.save(self.prediction_cache, self.entailment_file)
def check_implication(self, text1, text2, example=None):
if example is None:
raise ValueError
prompt = self.equivalence_prompt(text1, text2, example['question'])
logging.info('%s input: %s', self.name, prompt)
hashed = oai.md5hash(prompt)
if hashed in self.prediction_cache:
logging.info('Restoring hashed instead of predicting with model.')
response = self.prediction_cache[hashed]
else:
if self.entailment_cache_only:
raise ValueError
response = self.predict(prompt, temperature=0.02)
self.prediction_cache[hashed] = response
logging.info('%s prediction: %s', self.name, response)
binary_response = response.lower()[:30]
if 'entailment' in binary_response:
return 2
elif 'neutral' in binary_response:
return 1
elif 'contradiction' in binary_response:
return 0
else:
logging.warning('MANUAL NEUTRAL!')
return 1
class EntailmentGPT4(EntailmentLLM):
def __init__(self, entailment_cache_id, entailment_cache_only):
super().__init__(entailment_cache_id, entailment_cache_only)
self.name = 'gpt-4'
def equivalence_prompt(self, text1, text2, question):
prompt = f"""We are evaluating answers to the question \"{question}\"\n"""
prompt += "Here are two possible answers:\n"
prompt += f"Possible Answer 1: {text1}\nPossible Answer 2: {text2}\n"
prompt += "Does Possible Answer 1 semantically entail Possible Answer 2? Respond with entailment, contradiction, or neutral."""
return prompt
def predict(self, prompt, temperature):
return oai.predict(prompt, temperature, model=self.name)
class EntailmentGPT35(EntailmentGPT4):
def __init__(self, entailment_cache_id, entailment_cache_only):
super().__init__(entailment_cache_id, entailment_cache_only)
self.name = 'gpt-3.5'
class EntailmentGPT4Turbo(EntailmentGPT4):
def __init__(self, entailment_cache_id, entailment_cache_only):
super().__init__(entailment_cache_id, entailment_cache_only)
self.name = 'gpt-4-turbo'
class EntailmentLlama(EntailmentLLM):
def __init__(self, entailment_cache_id, entailment_cache_only, name):
super().__init__(entailment_cache_id, entailment_cache_only)
self.name = name
self.model = HuggingfaceModel(
name, stop_sequences='default', max_new_tokens=30)
def equivalence_prompt(self, text1, text2, question):
prompt = f"""We are evaluating answers to the question \"{question}\"\n"""
prompt += "Here are two possible answers:\n"
prompt += f"Possible Answer 1: {text1}\nPossible Answer 2: {text2}\n"
prompt += "Does Possible Answer 1 semantically entail Possible Answer 2? Respond only with entailment, contradiction, or neutral.\n"""
prompt += "Response:"""
return prompt
def predict(self, prompt, temperature):
predicted_answer, _, _ = self.model.predict(prompt, temperature)
return predicted_answer
def context_entails_response(context, responses, model):
votes = []
for response in responses:
votes.append(model.check_implication(context, response))
return 2 - np.mean(votes)
def get_semantic_ids(strings_list, model, strict_entailment=False, example=None):
"""Group list of predictions into semantic meaning."""
def are_equivalent(text1, text2):
implication_1 = model.check_implication(text1, text2, example=example)
implication_2 = model.check_implication(text2, text1, example=example) # pylint: disable=arguments-out-of-order
assert (implication_1 in [0, 1, 2]) and (implication_2 in [0, 1, 2])
if strict_entailment:
semantically_equivalent = (implication_1 == 2) and (implication_2 == 2)
else:
implications = [implication_1, implication_2]
# Check if none of the implications are 0 (contradiction) and not both of them are neutral.
semantically_equivalent = (0 not in implications) and ([1, 1] != implications)
return semantically_equivalent
# Initialise all ids with -1.
semantic_set_ids = [-1] * len(strings_list)
# Keep track of current id.
next_id = 0
for i, string1 in enumerate(strings_list):
# Check if string1 already has an id assigned.
if semantic_set_ids[i] == -1:
# If string1 has not been assigned an id, assign it next_id.
semantic_set_ids[i] = next_id
for j in range(i+1, len(strings_list)):
# Search through all remaining strings. If they are equivalent to string1, assign them the same id.
if are_equivalent(string1, strings_list[j]):
semantic_set_ids[j] = next_id
next_id += 1
assert -1 not in semantic_set_ids
return semantic_set_ids
def logsumexp_by_id(semantic_ids, log_likelihoods, agg='sum_normalized'):
"""Sum probabilities with the same semantic id.
Log-Sum-Exp because input and output probabilities in log space.
"""
unique_ids = sorted(list(set(semantic_ids)))
assert unique_ids == list(range(len(unique_ids)))
log_likelihood_per_semantic_id = []
for uid in unique_ids:
# Find positions in `semantic_ids` which belong to the active `uid`.
id_indices = [pos for pos, x in enumerate(semantic_ids) if x == uid]
# Gather log likelihoods at these indices.
id_log_likelihoods = [log_likelihoods[i] for i in id_indices]
if agg == 'sum_normalized':
# log_lik_norm = id_log_likelihoods - np.prod(log_likelihoods)
log_lik_norm = id_log_likelihoods - np.log(np.sum(np.exp(log_likelihoods)))
logsumexp_value = np.log(np.sum(np.exp(log_lik_norm)))
else:
raise ValueError
log_likelihood_per_semantic_id.append(logsumexp_value)
return log_likelihood_per_semantic_id
def predictive_entropy(log_probs):
"""Compute MC estimate of entropy.
`E[-log p(x)] ~= -1/N sum_i log p(x_i)`, i.e. the average token likelihood.
"""
entropy = -np.sum(log_probs) / len(log_probs)
return entropy
def predictive_entropy_rao(log_probs):
entropy = -np.sum(np.exp(log_probs) * log_probs)
return entropy
def cluster_assignment_entropy(semantic_ids):
"""Estimate semantic uncertainty from how often different clusters get assigned.
We estimate the categorical distribution over cluster assignments from the
semantic ids. The uncertainty is then given by the entropy of that
distribution. This estimate does not use token likelihoods, it relies soley
on the cluster assignments. If probability mass is spread of between many
clusters, entropy is larger. If probability mass is concentrated on a few
clusters, entropy is small.
Input:
semantic_ids: List of semantic ids, e.g. [0, 1, 2, 1].
Output:
cluster_entropy: Entropy, e.g. (-p log p).sum() for p = [1/4, 2/4, 1/4].
"""
n_generations = len(semantic_ids)
counts = np.bincount(semantic_ids)
probabilities = counts/n_generations
assert np.isclose(probabilities.sum(), 1)
entropy = - (probabilities * np.log(probabilities)).sum()
return entropy
================================================
FILE: semantic_uncertainty/uncertainty/utils/eval_utils.py
================================================
"""Functions for performance evaluation, mainly used in analyze_results.py."""
import numpy as np
import scipy
from sklearn import metrics
# pylint: disable=missing-function-docstring
def bootstrap(function, rng, n_resamples=1000):
def inner(data):
bs = scipy.stats.bootstrap(
(data, ), function, n_resamples=n_resamples, confidence_level=0.9,
random_state=rng)
return {
'std_err': bs.standard_error,
'low': bs.confidence_interval.low,
'high': bs.confidence_interval.high
}
return inner
def auroc(y_true, y_score):
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_score)
del thresholds
return metrics.auc(fpr, tpr)
def accuracy_at_quantile(accuracies, uncertainties, quantile):
cutoff = np.quantile(uncertainties, quantile)
select = uncertainties <= cutoff
return np.mean(accuracies[select])
def area_under_thresholded_accuracy(accuracies, uncertainties):
quantiles = np.linspace(0.1, 1, 20)
select_accuracies = np.array([accuracy_at_quantile(accuracies, uncertainties, q) for q in quantiles])
dx = quantiles[1] - quantiles[0]
area = (select_accuracies * dx).sum()
return area
# Need wrappers because scipy expects 1D data.
def compatible_bootstrap(func, rng):
def helper(y_true_y_score):
# this function is called in the bootstrap
y_true = np.array([i['y_true'] for i in y_true_y_score])
y_score = np.array([i['y_score'] for i in y_true_y_score])
out = func(y_true, y_score)
return out
def wrap_inputs(y_true, y_score):
return [{'y_true': i, 'y_score': j} for i, j in zip(y_true, y_score)]
def converted_func(y_true, y_score):
y_true_y_score = wrap_inputs(y_true, y_score)
return bootstrap(helper, rng=rng)(y_true_y_score)
return converted_func
================================================
FILE: semantic_uncertainty/uncertainty/utils/openai.py
================================================
import os
import hashlib
from tenacity import retry, wait_random_exponential, retry_if_not_exception_type
from openai import OpenAI
CLIENT = OpenAI(api_key=os.environ.get('OPENAI_API_KEY', False))
class KeyError(Exception):
"""OpenAIKey not provided in environment variable."""
pass
@retry(retry=retry_if_not_exception_type(KeyError), wait=wait_random_exponential(min=1, max=10))
def predict(prompt, temperature=1.0, model='gpt-4'):
"""Predict with GPT models."""
if not CLIENT.api_key:
raise KeyError('Need to provide OpenAI API key in environment variable `OPENAI_API_KEY`.')
if isinstance(prompt, str):
messages = [
{'role': 'user', 'content': prompt},
]
else:
messages = prompt
if model == 'gpt-4':
model = 'gpt-4-0613'
elif model == 'gpt-4-turbo':
model = 'gpt-4-1106-preview'
elif model == 'gpt-3.5':
model = 'gpt-3.5-turbo-1106'
output = CLIENT.chat.completions.create(
model=model,
messages=messages,
max_tokens=200,
temperature=temperature,
)
response = output.choices[0].message.content
return response
def md5hash(string):
return int(hashlib.md5(string.encode('utf-8')).hexdigest(), 16)
================================================
FILE: semantic_uncertainty/uncertainty/utils/utils.py
================================================
"""Utility functions."""
import os
import logging
import argparse
import pickle
import wandb
from evaluate import load
from uncertainty.models.huggingface_models import HuggingfaceModel
from uncertainty.utils import openai as oai
BRIEF_PROMPTS = {
'default': "Answer the following question as briefly as possible.\n",
'chat': 'Answer the following question in a single brief but complete sentence.\n'}
def get_parser(stages=['generate', 'compute']):
entity = os.getenv('WANDB_SEM_UNC_ENTITY', None)
parser = argparse.ArgumentParser()
parser.add_argument(
"--debug", action=argparse.BooleanOptionalAction, default=False,
help="Keep default wandb clean.")
parser.add_argument('--entity', type=str, default=entity)
parser.add_argument('--random_seed', type=int, default=10)
parser.add_argument(
"--metric", type=str, default="squad",
choices=['squad', 'llm', 'llm_gpt-3.5', 'llm_gpt-4'],
help="Metric to assign accuracy to generations.")
parser.add_argument(
"--compute_accuracy_at_all_temps",
action=argparse.BooleanOptionalAction, default=True,
help="Compute accuracy at all temperatures or only t<<1.")
parser.add_argument(
"--experiment_lot", type=str, default='Unnamed Experiment',
help="Keep default wandb clean.")
if 'generate' in stages:
parser.add_argument(
"--model_name", type=str, default="Llama-2-7b-chat", help="Model name",
)
parser.add_argument(
"--model_max_new_tokens", type=int, default=50,
help="Max number of tokens generated.",
)
parser.add_argument(
"--dataset", type=str, default="trivia_qa",
choices=['trivia_qa', 'squad', 'bioasq', 'nq', 'svamp'],
help="Dataset to use")
parser.add_argument(
"--ood_train_dataset", type=str, default=None,
choices=['trivia_qa', 'squad', 'bioasq', 'nq', 'svamp'],
help="Dataset to use to assemble few-shot prompt, p_true prompt, and train p_ik.")
parser.add_argument(
"--num_samples", type=int, default=400,
help="Number of samples to use")
parser.add_argument(
"--num_few_shot", type=int, default=5,
help="Number of few shot examples to use")
parser.add_argument(
"--p_true_num_fewshot", type=int, default=20,
help="Number of few shot examples to use")
parser.add_argument(
"--p_true_hint", default=False,
action=argparse.BooleanOptionalAction,
help="Get generations for training set?")
parser.add_argument(
"--num_generations", type=int, default=10,
help="Number of generations to use")
parser.add_argument(
"--temperature", type=float, default=1.0,
help="Temperature")
parser.add_argument(
"--use_mc_options", type=bool, default=True,
help="Include MC options question?")
parser.add_argument(
"--get_training_set_generations", default=True,
action=argparse.BooleanOptionalAction,
help="Get generations for training set?")
parser.add_argument(
"--use_context", default=False,
action=argparse.BooleanOptionalAction,
help="Get generations for training set?")
parser.add_argument(
"--get_training_set_generations_most_likely_only", default=True,
action=argparse.BooleanOptionalAction,
help=(
"Only get embedding of most likely answer for training set. "
"This is all that's needed for p_true."))
parser.add_argument('--compute_p_true', default=True,
action=argparse.BooleanOptionalAction)
parser.add_argument(
"--brief_always", default=False, action=argparse.BooleanOptionalAction)
parser.add_argument(
"--enable_brief", default=True, action=argparse.BooleanOptionalAction)
parser.add_argument(
"--brief_prompt", default='default', type=str)
parser.add_argument(
"--prompt_type", default='default', type=str)
parser.add_argument(
"--compute_uncertainties", default=True,
action=argparse.BooleanOptionalAction,
help='Trigger compute_uncertainty_measures.py')
parser.add_argument(
"--answerable_only", default=False,
action=argparse.BooleanOptionalAction,
help='Exclude unanswerable questions.')
if 'compute' in stages:
parser.add_argument('--recompute_accuracy',
default=False, action=argparse.BooleanOptionalAction)
parser.add_argument('--eval_wandb_runid', type=str,
help='wandb run id of the dataset to evaluate on')
parser.add_argument('--train_wandb_runid', type=str, default=None,
help='wandb run id of the dataset from which training embeddings and p_true samples will be taken')
parser.add_argument('--num_eval_samples', type=int, default=int(1e19))
parser.add_argument('--compute_predictive_entropy',
default=True, action=argparse.BooleanOptionalAction)
parser.add_argument('--compute_p_ik', default=True,
action=argparse.BooleanOptionalAction)
parser.add_argument('--compute_p_ik_answerable', default=False,
action=argparse.BooleanOptionalAction)
parser.add_argument('--compute_context_entails_response', default=False,
action=argparse.BooleanOptionalAction)
parser.add_argument('--analyze_run', default=True,
action=argparse.BooleanOptionalAction)
parser.add_argument('--assign_new_wandb_id', default=True,
action=argparse.BooleanOptionalAction)
parser.add_argument('--restore_entity_eval', type=str, default=entity)
parser.add_argument('--restore_entity_train', type=str, default=entity)
parser.add_argument('--condition_on_question',
default=True, action=argparse.BooleanOptionalAction)
parser.add_argument('--strict_entailment',
default=True, action=argparse.BooleanOptionalAction)
parser.add_argument('--use_all_generations', default=True, action=argparse.BooleanOptionalAction)
parser.add_argument('--use_num_generations', type=int, default=-1)
parser.add_argument("--entailment_model", default='deberta', type=str)
parser.add_argument(
"--entailment_cache_id", default=None, type=str,
help='Restore entailment predictions from previous run for GPT-4/LLaMa-Entailment.')
parser.add_argument('--entailment_cache_only', default=False, action=argparse.BooleanOptionalAction)
parser.add_argument('--compute_p_true_in_compute_stage',
default=False, action=argparse.BooleanOptionalAction)
parser.add_argument('--reuse_entailment_model',
default=False, action=argparse.BooleanOptionalAction,
help='Use entailment model as p_true model.')
return parser
def setup_logger():
"""Setup logger to always print time and level."""
logging.basicConfig(
format='%(asctime)s %(levelname)-8s %(message)s',
level=logging.INFO,
datefmt='%Y-%m-%d %H:%M:%S')
logging.getLogger().setLevel(logging.INFO) # logging.DEBUG
def construct_fewshot_prompt_from_indices(dataset, example_indices, brief, brief_always, make_prompt):
"""Given a dataset and indices, construct a fewshot prompt."""
if not brief_always:
prompt = brief
else:
prompt = ''
for example_index in example_indices:
example = dataset[example_index]
context = example["context"]
question = example["question"]
answer = example["answers"]["text"][0]
prompt = prompt + make_prompt(context, question, answer, brief, brief_always)
return prompt
def split_dataset(dataset):
"""Get indices of answerable and unanswerable questions."""
def clen(ex):
return len(ex["answers"]["text"])
answerable_indices = [i for i, ex in enumerate(dataset) if clen(ex) > 0]
unanswerable_indices = [i for i, ex in enumerate(dataset) if clen(ex) == 0]
# union == full dataset
assert set(answerable_indices) | set(
unanswerable_indices) == set(range(len(dataset)))
# no overlap
assert set(answerable_indices) - \
set(unanswerable_indices) == set(answerable_indices)
return answerable_indices, unanswerable_indices
def model_based_metric(predicted_answer, example, model):
if 'answers' in example:
correct_answers = example['answers']['text']
elif 'reference' in example:
correct_answers = example['reference']['answers']['text']
else:
raise ValueError
prompt = f'We are assessing the quality of answers to the following question: {example["question"]}\n'
if len(correct_answers) == 1:
prompt += f"The expected answer is: {correct_answers[0]}.\n"
else:
prompt += f"The following are expected answers to this question: {correct_answers}.\n"
prompt += f"The proposed answer is: {predicted_answer}\n"
if len(correct_answers) == 1:
prompt += "Within the context of the question, does the proposed answer mean the same as the expected answer?"
else:
prompt += "Within the context of the question, does the proposed answer mean the same as any of the expected answers?"
prompt += " Respond only with yes or no.\nResponse:"
if 'gpt' in model.model_name.lower():
predicted_answer = model.predict(prompt, 0.01)
else:
predicted_answer, _, _ = model.predict(prompt, 0.01)
if 'yes' in predicted_answer.lower():
return 1.0
elif 'no' in predicted_answer.lower():
return 0.0
else:
logging.warning('Redo llm check.')
predicted_answer, _, _ = model.predict(prompt, 1)
if 'yes' in predicted_answer.lower():
return 1.0
elif 'no' in predicted_answer.lower():
return 0.0
logging.warning('Answer neither no nor yes. Defaulting to no!')
return 0.0
def llm_metric(predicted_answer, example, model):
return model_based_metric(predicted_answer, example, model)
def get_gpt_metric(metric_name):
model_name = '_'.join(metric_name.split('_')[1:])
class EntailmentGPT():
def __init__(self, model_name):
self.model_name = model_name
def predict(self, prompt, temperature):
return oai.predict(prompt, temperature, model=self.model_name)
gpt_model = EntailmentGPT(model_name)
def gpt_metric(predicted_answer, example, model):
del model
return model_based_metric(predicted_answer, example, gpt_model)
return gpt_metric
def get_reference(example):
if 'answers' not in example:
example = example['reference']
answers = example['answers']
answer_starts = answers.get('answer_start', [])
reference = {'answers': {'answer_start': answer_starts, 'text': answers['text']}, 'id': example['id']}
return reference
def init_model(args):
mn = args.model_name
if 'llama' in mn.lower() or 'falcon' in mn or 'mistral' in mn.lower():
model = HuggingfaceModel(
mn, stop_sequences='default',
max_new_tokens=args.model_max_new_tokens)
else:
raise ValueError(f'Unknown model_name `{mn}`.')
return model
def get_make_prompt(args):
if args.prompt_type == 'default':
def make_prompt(context, question, answer, brief, brief_always):
prompt = ''
if brief_always:
prompt += brief
if args.use_context and (context is not None):
prompt += f"Context: {context}\n"
prompt += f"Question: {question}\n"
if answer:
prompt += f"Answer: {answer}\n\n"
else:
prompt += 'Answer:'
return prompt
else:
raise ValueError
return make_prompt
def get_metric(metric):
if metric == 'squad':
squad_metric = load("squad_v2")
def metric(response, example, *args, **kwargs):
# Compatibility with recomputation.
if 'id' in example:
exid = example['id']
elif 'id' in example['reference']:
exid = example['reference']['id']
else:
raise ValueError
prediction = {'prediction_text': response, 'no_answer_probability': 0.0, 'id': exid}
results = squad_metric.compute(
predictions=[prediction],
references=[get_reference(example)])
return 1.0 if (results['f1'] >= 50.0) else 0.0
# Reuses the globally active model for these.
elif metric == 'llm':
metric = llm_metric
elif metric == 'llm_gpt-3.5':
metric = get_gpt_metric(metric)
elif metric == 'llm_gpt-4':
metric = get_gpt_metric(metric)
else:
raise ValueError
return metric
def save(object, file):
with open(f'{wandb.run.dir}/{file}', 'wb') as f:
pickle.dump(object, f)
wandb.save(f'{wandb.run.dir}/{file}')
gitextract_1q6fc1mb/
├── .gitignore
├── LICENSE
├── README.md
├── environment.yaml
├── environment_export.yaml
├── notebooks/
│ └── example_evaluation.ipynb
└── semantic_uncertainty/
├── analyze_results.py
├── compute_uncertainty_measures.py
├── generate_answers.py
└── uncertainty/
├── __init__.py
├── data/
│ └── data_utils.py
├── models/
│ ├── __init__.py
│ ├── base_model.py
│ └── huggingface_models.py
├── uncertainty_measures/
│ ├── p_ik.py
│ ├── p_true.py
│ └── semantic_entropy.py
└── utils/
├── eval_utils.py
├── openai.py
└── utils.py
SYMBOL INDEX (67 symbols across 12 files)
FILE: semantic_uncertainty/analyze_results.py
function init_wandb (line 24) | def init_wandb(wandb_runid, assign_new_wandb_id, experiment_lot, entity):
function analyze_run (line 51) | def analyze_run(
FILE: semantic_uncertainty/compute_uncertainty_measures.py
function main (line 32) | def main(args):
FILE: semantic_uncertainty/generate_answers.py
function main (line 21) | def main(args):
FILE: semantic_uncertainty/uncertainty/data/data_utils.py
function load_ds (line 8) | def load_ds(dataset_name, seed, add_options=None):
FILE: semantic_uncertainty/uncertainty/models/base_model.py
class BaseModel (line 8) | class BaseModel(ABC):
method predict (line 13) | def predict(self, input_data, temperature):
method get_p_true (line 17) | def get_p_true(self, input_data):
FILE: semantic_uncertainty/uncertainty/models/huggingface_models.py
class StoppingCriteriaSub (line 22) | class StoppingCriteriaSub(StoppingCriteria):
method __init__ (line 24) | def __init__(self, stops, tokenizer, match_on='text', initial_length=N...
method __call__ (line 34) | def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTen...
function remove_split_layer (line 50) | def remove_split_layer(device_map_in):
class HuggingfaceModel (line 85) | class HuggingfaceModel(BaseModel):
method __init__ (line 88) | def __init__(self, model_name, stop_sequences=None, max_new_tokens=None):
method predict (line 199) | def predict(self, input_data, temperature, return_full=False):
method get_p_true (line 359) | def get_p_true(self, input_data):
FILE: semantic_uncertainty/uncertainty/uncertainty_measures/p_ik.py
function get_p_ik (line 12) | def get_p_ik(train_embeddings, is_false, eval_embeddings=None, eval_is_f...
FILE: semantic_uncertainty/uncertainty/uncertainty_measures/p_true.py
function construct_few_shot_prompt (line 5) | def construct_few_shot_prompt(
function calculate_p_true (line 72) | def calculate_p_true(
FILE: semantic_uncertainty/uncertainty/uncertainty_measures/semantic_entropy.py
class BaseEntailment (line 21) | class BaseEntailment:
method save_prediction_cache (line 22) | def save_prediction_cache(self):
class EntailmentDeberta (line 26) | class EntailmentDeberta(BaseEntailment):
method __init__ (line 27) | def __init__(self):
method check_implication (line 32) | def check_implication(self, text1, text2, *args, **kwargs):
class EntailmentLLM (line 49) | class EntailmentLLM(BaseEntailment):
method __init__ (line 53) | def __init__(self, entailment_cache_id, entailment_cache_only):
method init_prediction_cache (line 57) | def init_prediction_cache(self, entailment_cache_id):
method save_prediction_cache (line 71) | def save_prediction_cache(self):
method check_implication (line 75) | def check_implication(self, text1, text2, example=None):
class EntailmentGPT4 (line 106) | class EntailmentGPT4(EntailmentLLM):
method __init__ (line 108) | def __init__(self, entailment_cache_id, entailment_cache_only):
method equivalence_prompt (line 112) | def equivalence_prompt(self, text1, text2, question):
method predict (line 121) | def predict(self, prompt, temperature):
class EntailmentGPT35 (line 125) | class EntailmentGPT35(EntailmentGPT4):
method __init__ (line 127) | def __init__(self, entailment_cache_id, entailment_cache_only):
class EntailmentGPT4Turbo (line 132) | class EntailmentGPT4Turbo(EntailmentGPT4):
method __init__ (line 134) | def __init__(self, entailment_cache_id, entailment_cache_only):
class EntailmentLlama (line 139) | class EntailmentLlama(EntailmentLLM):
method __init__ (line 141) | def __init__(self, entailment_cache_id, entailment_cache_only, name):
method equivalence_prompt (line 147) | def equivalence_prompt(self, text1, text2, question):
method predict (line 157) | def predict(self, prompt, temperature):
function context_entails_response (line 162) | def context_entails_response(context, responses, model):
function get_semantic_ids (line 169) | def get_semantic_ids(strings_list, model, strict_entailment=False, examp...
function logsumexp_by_id (line 208) | def logsumexp_by_id(semantic_ids, log_likelihoods, agg='sum_normalized'):
function predictive_entropy (line 233) | def predictive_entropy(log_probs):
function predictive_entropy_rao (line 244) | def predictive_entropy_rao(log_probs):
function cluster_assignment_entropy (line 249) | def cluster_assignment_entropy(semantic_ids):
FILE: semantic_uncertainty/uncertainty/utils/eval_utils.py
function bootstrap (line 10) | def bootstrap(function, rng, n_resamples=1000):
function auroc (line 23) | def auroc(y_true, y_score):
function accuracy_at_quantile (line 29) | def accuracy_at_quantile(accuracies, uncertainties, quantile):
function area_under_thresholded_accuracy (line 35) | def area_under_thresholded_accuracy(accuracies, uncertainties):
function compatible_bootstrap (line 44) | def compatible_bootstrap(func, rng):
FILE: semantic_uncertainty/uncertainty/utils/openai.py
class KeyError (line 11) | class KeyError(Exception):
function predict (line 17) | def predict(prompt, temperature=1.0, model='gpt-4'):
function md5hash (line 47) | def md5hash(string):
FILE: semantic_uncertainty/uncertainty/utils/utils.py
function get_parser (line 19) | def get_parser(stages=['generate', 'compute']):
function setup_logger (line 151) | def setup_logger():
function construct_fewshot_prompt_from_indices (line 160) | def construct_fewshot_prompt_from_indices(dataset, example_indices, brie...
function split_dataset (line 179) | def split_dataset(dataset):
function model_based_metric (line 198) | def model_based_metric(predicted_answer, example, model):
function llm_metric (line 242) | def llm_metric(predicted_answer, example, model):
function get_gpt_metric (line 246) | def get_gpt_metric(metric_name):
function get_reference (line 266) | def get_reference(example):
function init_model (line 275) | def init_model(args):
function get_make_prompt (line 286) | def get_make_prompt(args):
function get_metric (line 306) | def get_metric(metric):
function save (line 339) | def save(object, file):
Condensed preview — 20 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (153K chars).
[
{
"path": ".gitignore",
"chars": 1842,
"preview": "figures/\n# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution "
},
{
"path": "LICENSE",
"chars": 1671,
"preview": "The Clear BSD License\n\nCopyright (c) 2023\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with "
},
{
"path": "README.md",
"chars": 7255,
"preview": "# Detecting Hallucinations in Large Language Models Using Semantic Entropy\n\nThis repository contains the code necessary "
},
{
"path": "environment.yaml",
"chars": 645,
"preview": "name: semantic_uncertainty\nchannels:\n - pytorch\n - nvidia\n - defaults\ndependencies:\n - python=3.11\n - pip\n - pytor"
},
{
"path": "environment_export.yaml",
"chars": 8450,
"preview": "name: semantic_uncertainty_export\nchannels:\n - pytorch\n - nvidia\n - defaults\ndependencies:\n - _libgcc_mutex=0.1=main"
},
{
"path": "notebooks/example_evaluation.ipynb",
"chars": 40967,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 1,\n \"id\": \"e9698d3d-f63c-4d8c-af6e-4e9996b3fe28\",\n \""
},
{
"path": "semantic_uncertainty/analyze_results.py",
"chars": 7094,
"preview": "\"\"\"Compute overall performance metrics from predicted uncertainties.\"\"\"\nimport argparse\nimport functools\nimport logging\n"
},
{
"path": "semantic_uncertainty/compute_uncertainty_measures.py",
"chars": 15702,
"preview": "\"\"\"Compute uncertainty measures after generating answers.\"\"\"\nfrom collections import defaultdict\nimport logging\nimport o"
},
{
"path": "semantic_uncertainty/generate_answers.py",
"chars": 11284,
"preview": "\"\"\"Sample answers from LLMs on QA task.\"\"\"\nimport gc\nimport os\nimport logging\nimport random\nfrom tqdm import tqdm\n\nimpor"
},
{
"path": "semantic_uncertainty/uncertainty/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "semantic_uncertainty/uncertainty/data/data_utils.py",
"chars": 3775,
"preview": "\"\"\"Data Loading Utilities.\"\"\"\nimport os\nimport json\nimport hashlib\nimport datasets\n\n\ndef load_ds(dataset_name, seed, add"
},
{
"path": "semantic_uncertainty/uncertainty/models/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "semantic_uncertainty/uncertainty/models/base_model.py",
"chars": 357,
"preview": "from abc import ABC, abstractmethod\nfrom typing import List, Text\n\n\nSTOP_SEQUENCES = ['\\n\\n\\n\\n', '\\n\\n\\n', '\\n\\n', '\\n'"
},
{
"path": "semantic_uncertainty/uncertainty/models/huggingface_models.py",
"chars": 15099,
"preview": "\"\"\"Implement HuggingfaceModel models.\"\"\"\nimport copy\nimport logging\nfrom collections import Counter\nimport torch\n\nimport"
},
{
"path": "semantic_uncertainty/uncertainty/uncertainty_measures/p_ik.py",
"chars": 2714,
"preview": "\"\"\"Predict model correctness from linear classifier.\"\"\"\nimport logging\nimport torch\nimport wandb\n\nfrom sklearn.linear_mo"
},
{
"path": "semantic_uncertainty/uncertainty/uncertainty_measures/p_true.py",
"chars": 3785,
"preview": "\"\"\"Compute p_true uncertainty metric.\"\"\"\nimport logging\n\n\ndef construct_few_shot_prompt(\n *, model, dataset, indi"
},
{
"path": "semantic_uncertainty/uncertainty/uncertainty_measures/semantic_entropy.py",
"chars": 10069,
"preview": "\"\"\"Implement semantic entropy.\"\"\"\nimport os\nimport pickle\nimport logging\n\nimport numpy as np\nimport wandb\nimport torch\ni"
},
{
"path": "semantic_uncertainty/uncertainty/utils/eval_utils.py",
"chars": 1878,
"preview": "\"\"\"Functions for performance evaluation, mainly used in analyze_results.py.\"\"\"\nimport numpy as np\nimport scipy\nfrom skle"
},
{
"path": "semantic_uncertainty/uncertainty/utils/openai.py",
"chars": 1267,
"preview": "import os\nimport hashlib\nfrom tenacity import retry, wait_random_exponential, retry_if_not_exception_type\n\nfrom openai i"
},
{
"path": "semantic_uncertainty/uncertainty/utils/utils.py",
"chars": 13566,
"preview": "\"\"\"Utility functions.\"\"\"\nimport os\nimport logging\nimport argparse\nimport pickle\n\nimport wandb\n\nfrom evaluate import load"
}
]
About this extraction
This page contains the full source code of the jlko/semantic_uncertainty GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 20 files (144.0 KB), approximately 53.1k tokens, and a symbol index with 67 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.