Full Code of openai/whisper for AI

Repository: openai/whisper
Branch: main
Commit: c0d2f624c09d
Files: 43
Total size: 7.6 MB

Directory structure:
gitextract_mtgfm16e/
├── .flake8
├── .gitattributes
├── .github/
│   ├── dependabot.yml
│   └── workflows/
│       ├── python-publish.yml
│       └── test.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── data/
│   ├── README.md
│   └── meanwhile.json
├── model-card.md
├── notebooks/
│   ├── LibriSpeech.ipynb
│   └── Multilingual_ASR.ipynb
├── pyproject.toml
├── requirements.txt
├── tests/
│   ├── conftest.py
│   ├── jfk.flac
│   ├── test_audio.py
│   ├── test_normalizer.py
│   ├── test_timing.py
│   ├── test_tokenizer.py
│   └── test_transcribe.py
└── whisper/
    ├── __init__.py
    ├── __main__.py
    ├── assets/
    │   ├── gpt2.tiktoken
    │   ├── mel_filters.npz
    │   └── multilingual.tiktoken
    ├── audio.py
    ├── decoding.py
    ├── model.py
    ├── normalizers/
    │   ├── __init__.py
    │   ├── basic.py
    │   ├── english.json
    │   └── english.py
    ├── timing.py
    ├── tokenizer.py
    ├── transcribe.py
    ├── triton_ops.py
    ├── utils.py
    └── version.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .flake8
================================================
[flake8]
per-file-ignores =
    */__init__.py: F401



================================================
FILE: .gitattributes
================================================
# Override jupyter in Github language stats for more accurate estimate of repo code languages
# reference: https://github.com/github/linguist/blob/master/docs/overrides.md#generated-code
*.ipynb linguist-generated


================================================
FILE: .github/dependabot.yml
================================================
# Keep GitHub Actions up to date with GitHub's Dependabot...
# https://docs.github.com/en/code-security/dependabot/working-with-dependabot/keeping-your-actions-up-to-date-with-dependabot
# https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file#package-ecosystem
version: 2
updates:
  - package-ecosystem: github-actions
    directory: /
    groups:
      github-actions:
        patterns:
          - "*"  # Group all Actions updates into a single larger pull request
    schedule:
      interval: weekly


================================================
FILE: .github/workflows/python-publish.yml
================================================
name: Release

on:
  push:
    branches:
    - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: actions-ecosystem/action-regex-match@v2
      id: regex-match
      with:
        text: ${{ github.event.head_commit.message }}
        regex: '^Release ([^ ]+)'
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.12'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install setuptools wheel twine build
    - name: Release
      if: ${{ steps.regex-match.outputs.match != '' }}
      uses: softprops/action-gh-release@v2
      with:
        tag_name: v${{ steps.regex-match.outputs.group1 }}
    - name: Build and publish
      if: ${{ steps.regex-match.outputs.match != '' }}
      env:
        TWINE_USERNAME: __token__
        TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
      run: |
        python -m build --sdist
        twine upload dist/*


================================================
FILE: .github/workflows/test.yml
================================================
name: test
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Fetch base branch
        run: git fetch origin ${{ github.base_ref }}
      - uses: actions/setup-python@v5
        with:
          python-version: "3.9"
          architecture: x64
      - name: Get pip cache dir
        id: pip-cache
        run: |
          echo "dir=$(pip cache dir)" >> $GITHUB_OUTPUT
      - name: pip/pre-commit cache
        uses: actions/cache@v4
        with:
          path: |
            ${{ steps.pip-cache.outputs.dir }}
            ~/.cache/pre-commit
          key: ${{ runner.os }}-pip-pre-commit-${{ hashFiles('**/.pre-commit-config.yaml') }}
          restore-keys: |
            ${{ runner.os }}-pip-pre-commit
      - name: pre-commit
        run: |
          pip install --upgrade pre-commit
          pre-commit install --install-hooks
          pre-commit run --all-files
  whisper-test:
    needs: pre-commit
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include:
          - python-version: '3.8'
            pytorch-version: 1.10.1
            numpy-requirement: "'numpy<2'"
          - python-version: '3.8'
            pytorch-version: 1.13.1
            numpy-requirement: "'numpy<2'"
          - python-version: '3.8'
            pytorch-version: 2.0.1
            numpy-requirement: "'numpy<2'"
          - python-version: '3.9'
            pytorch-version: 2.1.2
            numpy-requirement: "'numpy<2'"
          - python-version: '3.10'
            pytorch-version: 2.2.2
            numpy-requirement: "'numpy<2'"
          - python-version: '3.11'
            pytorch-version: 2.3.1
            numpy-requirement: "'numpy'"
          - python-version: '3.12'
            pytorch-version: 2.4.1
            numpy-requirement: "'numpy'"
          - python-version: '3.12'
            pytorch-version: 2.5.1
            numpy-requirement: "'numpy'"
          - python-version: '3.13'
            pytorch-version: 2.5.1
            numpy-requirement: "'numpy'"
    steps:
      - uses: conda-incubator/setup-miniconda@v3
      - run: conda install -n test ffmpeg python=${{ matrix.python-version }}
      - uses: actions/checkout@v4
      - run: echo "$CONDA/envs/test/bin" >> $GITHUB_PATH
      - run: pip3 install .["dev"] ${{ matrix.numpy-requirement }} torch==${{ matrix.pytorch-version }}+cpu --index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pypi.org/simple
      - run: pytest --durations=0 -vv -k 'not test_transcribe or test_transcribe[tiny] or test_transcribe[tiny.en]' -m 'not requires_cuda'


================================================
FILE: .gitignore
================================================
__pycache__/
*.py[cod]
*$py.class
*.egg-info
.pytest_cache
.ipynb_checkpoints

thumbs.db
.DS_Store
.idea



================================================
FILE: .pre-commit-config.yaml
================================================
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-json
      - id: end-of-file-fixer
        types: [file, python]
      - id: trailing-whitespace
        types: [file, python]
      - id: mixed-line-ending
      - id: check-added-large-files
        args: [--maxkb=4096]
  - repo: https://github.com/psf/black
    rev: 25.1.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 6.0.0
    hooks:
      - id: isort
        name: isort (python)
        args: ["--profile", "black", "-l", "88", "--trailing-comma", "--multi-line", "3"]
  - repo: https://github.com/pycqa/flake8.git
    rev: 7.1.1
    hooks:
      - id: flake8
        types: [python]
        args: ["--max-line-length", "88", "--ignore", "E203,E501,W503,W504"]


================================================
FILE: CHANGELOG.md
================================================
# CHANGELOG

## [v20250625](https://github.com/openai/whisper/releases/tag/v20250625)

* Fix: Update torch.load to use weights_only=True to prevent security w… ([#2451](https://github.com/openai/whisper/pull/2451))
* Fix: Ensure DTW cost tensor is on the same device as input tensor ([#2561](https://github.com/openai/whisper/pull/2561))
* docs: updated README to specify translation model limitation ([#2547](https://github.com/openai/whisper/pull/2547))
* Fixed triton kernel update to support latest triton versions ([#2588](https://github.com/openai/whisper/pull/2588))
* Fix: GitHub display errors for Jupyter notebooks ([#2589](https://github.com/openai/whisper/pull/2589))
* Bump the github-actions group with 3 updates ([#2592](https://github.com/openai/whisper/pull/2592))
* Keep GitHub Actions up to date with GitHub's Dependabot ([#2486](https://github.com/openai/whisper/pull/2486))
* pre-commit: Upgrade black v25.1.0 and isort v6.0.0 ([#2514](https://github.com/openai/whisper/pull/2514))
* GitHub Actions: Add Python 3.13 to the testing ([#2487](https://github.com/openai/whisper/pull/2487))
* PEP 621: Migrate from setup.py to pyproject.toml ([#2435](https://github.com/openai/whisper/pull/2435))
* pre-commit autoupdate && pre-commit run --all-files ([#2484](https://github.com/openai/whisper/pull/2484))
* Upgrade GitHub Actions ([#2430](https://github.com/openai/whisper/pull/2430))
* Bugfix: Illogical "Avoid computing higher temperatures on no_speech" ([#1903](https://github.com/openai/whisper/pull/1903))
* Updating README and doc strings to reflect that n_mels can now be 128 ([#2049](https://github.com/openai/whisper/pull/2049))
* fix typo data/README.md ([#2433](https://github.com/openai/whisper/pull/2433))
* Update README.md ([#2379](https://github.com/openai/whisper/pull/2379))
* Add option to carry initial_prompt with the sliding window ([#2343](https://github.com/openai/whisper/pull/2343))
* more pytorch versions in tests ([#2408](https://github.com/openai/whisper/pull/2408))

## [v20240930](https://github.com/openai/whisper/releases/tag/v20240930)

* allowing numpy 2 in tests ([#2362](https://github.com/openai/whisper/pull/2362))
* large-v3-turbo model ([#2361](https://github.com/openai/whisper/pull/2361))
* test on python/pytorch versions up to 3.12 and 2.4.1 ([#2360](https://github.com/openai/whisper/pull/2360))
* using sdpa if available ([#2359](https://github.com/openai/whisper/pull/2359))

## [v20240927](https://github.com/openai/whisper/releases/tag/v20240927)

* pinning numpy<2 in tests ([#2332](https://github.com/openai/whisper/pull/2332))
* Relax triton requirements for compatibility with pytorch 2.4 and newer ([#2307](https://github.com/openai/whisper/pull/2307))
* Skip silence around hallucinations ([#1838](https://github.com/openai/whisper/pull/1838))
* Fix triton env marker ([#1887](https://github.com/openai/whisper/pull/1887))

## [v20231117](https://github.com/openai/whisper/releases/tag/v20231117)

* Relax triton requirements for compatibility with pytorch 2.1 and newer ([#1802](https://github.com/openai/whisper/pull/1802))

## [v20231106](https://github.com/openai/whisper/releases/tag/v20231106)

* large-v3 ([#1761](https://github.com/openai/whisper/pull/1761))

## [v20231105](https://github.com/openai/whisper/releases/tag/v20231105)

* remove tiktoken pin ([#1759](https://github.com/openai/whisper/pull/1759))
* docs: Disambiguation of the term "relative speed" in the README ([#1751](https://github.com/openai/whisper/pull/1751))
* allow_pickle=False while loading of mel matrix IN audio.py ([#1511](https://github.com/openai/whisper/pull/1511))
* handling transcribe exceptions. ([#1682](https://github.com/openai/whisper/pull/1682))
* Add new option to generate subtitles by a specific number of words ([#1729](https://github.com/openai/whisper/pull/1729))
* Fix exception when an audio file with no speech is provided ([#1396](https://github.com/openai/whisper/pull/1396))

## [v20230918](https://github.com/openai/whisper/releases/tag/v20230918)

* Add .pre-commit-config.yaml ([#1528](https://github.com/openai/whisper/pull/1528))
* fix doc of TextDecoder ([#1526](https://github.com/openai/whisper/pull/1526))
* Update model-card.md ([#1643](https://github.com/openai/whisper/pull/1643))
* word timing tweaks ([#1559](https://github.com/openai/whisper/pull/1559))
* Avoid rearranging all caches ([#1483](https://github.com/openai/whisper/pull/1483))
* Improve timestamp heuristics. ([#1461](https://github.com/openai/whisper/pull/1461))
* fix condition_on_previous_text ([#1224](https://github.com/openai/whisper/pull/1224))
* Fix numba depreceation notice ([#1233](https://github.com/openai/whisper/pull/1233))
* Updated README.md to provide more insight on BLEU and specific appendices ([#1236](https://github.com/openai/whisper/pull/1236))
* Avoid computing higher temperatures on no_speech segments ([#1279](https://github.com/openai/whisper/pull/1279))
* Dropped unused execute bit from mel_filters.npz. ([#1254](https://github.com/openai/whisper/pull/1254))
* Drop ffmpeg-python dependency and call ffmpeg directly. ([#1242](https://github.com/openai/whisper/pull/1242))
* Python 3.11 ([#1171](https://github.com/openai/whisper/pull/1171))
* Update decoding.py ([#1219](https://github.com/openai/whisper/pull/1219))
* Update decoding.py ([#1155](https://github.com/openai/whisper/pull/1155))
* Update README.md to reference tiktoken ([#1105](https://github.com/openai/whisper/pull/1105))
* Implement max line width and max line count, and make word highlighting optional ([#1184](https://github.com/openai/whisper/pull/1184))
* Squash long words at window and sentence boundaries. ([#1114](https://github.com/openai/whisper/pull/1114))
* python-publish.yml: bump actions version to fix node warning ([#1211](https://github.com/openai/whisper/pull/1211))
* Update tokenizer.py ([#1163](https://github.com/openai/whisper/pull/1163))

## [v20230314](https://github.com/openai/whisper/releases/tag/v20230314)

* abort find_alignment on empty input ([#1090](https://github.com/openai/whisper/pull/1090))
* Fix truncated words list when the replacement character is decoded ([#1089](https://github.com/openai/whisper/pull/1089))
* fix github language stats getting dominated by jupyter notebook ([#1076](https://github.com/openai/whisper/pull/1076))
* Fix alignment between the segments and the list of words ([#1087](https://github.com/openai/whisper/pull/1087))
* Use tiktoken ([#1044](https://github.com/openai/whisper/pull/1044))

## [v20230308](https://github.com/openai/whisper/releases/tag/v20230308)

* kwargs in decode() for convenience ([#1061](https://github.com/openai/whisper/pull/1061))
* fix all_tokens handling that caused more repetitions and discrepancy in JSON ([#1060](https://github.com/openai/whisper/pull/1060))
* fix typo in CHANGELOG.md

## [v20230307](https://github.com/openai/whisper/releases/tag/v20230307)

* Fix the repetition/hallucination issue identified in #1046 ([#1052](https://github.com/openai/whisper/pull/1052))
* Use triton==2.0.0 ([#1053](https://github.com/openai/whisper/pull/1053))
* Install triton in x86_64 linux only ([#1051](https://github.com/openai/whisper/pull/1051))
* update setup.py to specify python >= 3.8 requirement

## [v20230306](https://github.com/openai/whisper/releases/tag/v20230306)

* remove auxiliary audio extension ([#1021](https://github.com/openai/whisper/pull/1021))
* apply formatting with `black`, `isort`, and `flake8` ([#1038](https://github.com/openai/whisper/pull/1038))
* word-level timestamps in `transcribe()` ([#869](https://github.com/openai/whisper/pull/869))
* Decoding improvements ([#1033](https://github.com/openai/whisper/pull/1033))
* Update README.md ([#894](https://github.com/openai/whisper/pull/894))
* Fix infinite loop caused by incorrect timestamp tokens prediction ([#914](https://github.com/openai/whisper/pull/914))
* drop python 3.7 support ([#889](https://github.com/openai/whisper/pull/889))

## [v20230124](https://github.com/openai/whisper/releases/tag/v20230124)

* handle printing even if sys.stdout.buffer is not available ([#887](https://github.com/openai/whisper/pull/887))
* Add TSV formatted output in transcript, using integer start/end time in milliseconds ([#228](https://github.com/openai/whisper/pull/228))
* Added `--output_format` option ([#333](https://github.com/openai/whisper/pull/333))
* Handle `XDG_CACHE_HOME` properly for `download_root` ([#864](https://github.com/openai/whisper/pull/864))
* use stdout for printing transcription progress ([#867](https://github.com/openai/whisper/pull/867))
* Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm ([#659](https://github.com/openai/whisper/pull/659))
* print '?' if a letter can't be encoded using the system default encoding ([#859](https://github.com/openai/whisper/pull/859))

## [v20230117](https://github.com/openai/whisper/releases/tag/v20230117)

The first versioned release available on [PyPI](https://pypi.org/project/openai-whisper/)


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2022 OpenAI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: MANIFEST.in
================================================
include requirements.txt
include README.md
include LICENSE
include whisper/assets/*
include whisper/normalizers/english.json


================================================
FILE: README.md
================================================
# Whisper

[[Blog]](https://openai.com/blog/whisper)
[[Paper]](https://arxiv.org/abs/2212.04356)
[[Model card]](https://github.com/openai/whisper/blob/main/model-card.md)
[[Colab example]](https://colab.research.google.com/github/openai/whisper/blob/master/notebooks/LibriSpeech.ipynb)

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.


## Approach

![Approach](https://raw.githubusercontent.com/openai/whisper/main/approach.png)

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.


## Setup

We used Python 3.9.9 and [PyTorch](https://pytorch.org/) 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably [OpenAI's tiktoken](https://github.com/openai/tiktoken) for their fast tokenizer implementation. You can download and install (or update to) the latest release of Whisper with the following command:

    pip install -U openai-whisper

Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:

    pip install git+https://github.com/openai/whisper.git 

To update the package to the latest version of this repository, please run:

    pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

It also requires the command-line tool [`ffmpeg`](https://ffmpeg.org/) to be installed on your system, which is available from most package managers:

```bash
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
```

You may need [`rust`](http://rust-lang.org) installed as well, in case [tiktoken](https://github.com/openai/tiktoken) does not provide a pre-built wheel for your platform. If you see installation errors during the `pip install` command above, please follow the [Getting started page](https://www.rust-lang.org/learn/get-started) to install the Rust development environment. Additionally, you may need to configure the `PATH` environment variable, e.g. `export PATH="$HOME/.cargo/bin:$PATH"`. If the installation fails with `No module named 'setuptools_rust'`, you need to install `setuptools_rust`, e.g. by running:

```bash
pip install setuptools-rust
```


## Available models and languages

There are six model sizes, four with English-only versions, offering speed and accuracy tradeoffs.
Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model.
The relative speeds below are measured by transcribing English speech on an A100, and the real-world speed may vary significantly depending on many factors, including the language, the speaking speed, and the available hardware.

|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~10x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~7x       |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~4x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
| turbo  |   809 M    |        N/A         |      `turbo`       |     ~6 GB     |      ~8x       |

The `.en` models for English-only applications tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.
Additionally, the `turbo` model is an optimized version of `large-v3` that offers faster transcription speed with a minimal degradation in accuracy.

Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of `large-v3` and `large-v2` models by language, using WERs (word error rates) or CER (character error rates, shown in *Italic*) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of [the paper](https://arxiv.org/abs/2212.04356), as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.

![WER breakdown by language](https://github.com/openai/whisper/assets/266841/f4619d66-1058-4005-8f67-a9d811b77c62)

## Command-line usage

The following command will transcribe speech in audio files, using the `turbo` model:

```bash
whisper audio.flac audio.mp3 audio.wav --model turbo
```

The default setting (which selects the `turbo` model) works well for transcribing English. However, **the `turbo` model is not trained for translation tasks**. If you need to **translate non-English speech into English**, use one of the **multilingual models** (`tiny`, `base`, `small`, `medium`, `large`) instead of `turbo`. 

For example, to transcribe an audio file containing non-English speech, you can specify the language:

```bash
whisper japanese.wav --language Japanese
```

To **translate** speech into English, use:

```bash
whisper japanese.wav --model medium --language Japanese --task translate
```

> **Note:** The `turbo` model will return the original language even if `--task translate` is specified. Use `medium` or `large` for the best translation results.

Run the following to view all available options:

```bash
whisper --help
```

See [tokenizer.py](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) for the list of all available languages.
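
If you prefer to inspect the supported languages programmatically, here is a minimal sketch assuming the `LANGUAGES` and `TO_LANGUAGE_CODE` mappings exported by `whisper/tokenizer.py`:

```python
from whisper.tokenizer import LANGUAGES, TO_LANGUAGE_CODE

# LANGUAGES maps language codes to lowercase names, e.g. "ja" -> "japanese";
# TO_LANGUAGE_CODE maps names and common aliases back to codes.
print(f"{len(LANGUAGES)} languages supported")
print(TO_LANGUAGE_CODE["japanese"])  # -> "ja"
```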


## Python usage

Transcription can also be performed within Python: 

```python
import whisper

model = whisper.load_model("turbo")
result = model.transcribe("audio.mp3")
print(result["text"])
```

Internally, the `transcribe()` method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
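
As a rough sketch of how options are forwarded (the keyword names below follow the current `transcribe()` signature; check the docstring of your installed version), the language, decoding temperature, and word-level timestamps can all be passed directly:

```python
import whisper

model = whisper.load_model("turbo")

# Extra keyword arguments are forwarded to the decoder on every 30-second window.
result = model.transcribe(
    "audio.mp3",
    language="en",         # skip automatic language detection
    temperature=0.0,       # greedy decoding only, no fallback temperatures
    word_timestamps=True,  # attach start/end times to each word
)

for segment in result["segments"]:
    print(f"[{segment['start']:7.2f} --> {segment['end']:7.2f}] {segment['text']}")
```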

Below is an example usage of `whisper.detect_language()` and `whisper.decode()` which provide lower-level access to the model.

```python
import whisper

model = whisper.load_model("turbo")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)
```
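
`DecodingOptions` also accepts fields such as `task` (`"transcribe"` or `"translate"`), `language`, `temperature`, `beam_size`, and `fp16` (field names as in the current `decoding.py`); for example, `whisper.DecodingOptions(language="ja", task="translate", fp16=False)` runs translation with full-precision inference.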

## More examples

Please use the [🙌 Show and tell](https://github.com/openai/whisper/discussions/categories/show-and-tell) category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.


## License

Whisper's code and model weights are released under the MIT License. See [LICENSE](https://github.com/openai/whisper/blob/main/LICENSE) for further details.


================================================
FILE: data/README.md
================================================
This directory supplements the paper with more details on how we prepared the data for evaluation, to help replicate our experiments. 

## Short-form English-only datasets

### LibriSpeech

We used the test-clean and test-other splits from the [LibriSpeech ASR corpus](https://www.openslr.org/12).

### TED-LIUM 3

We used the test split of [TED-LIUM Release 3](https://www.openslr.org/51/), using the segmented manual transcripts included in the release.

### Common Voice 5.1

We downloaded the English subset of Common Voice Corpus 5.1 from [the official website](https://commonvoice.mozilla.org/en/datasets).

### Artie

We used the [Artie bias corpus](https://github.com/artie-inc/artie-bias-corpus). This is a subset of the Common Voice dataset.

### CallHome & Switchboard

We used the two corpora from [LDC2002S09](https://catalog.ldc.upenn.edu/LDC2002S09) and [LDC2002T43](https://catalog.ldc.upenn.edu/LDC2002T43) and followed the [eval2000_data_prep.sh](https://github.com/kaldi-asr/kaldi/blob/master/egs/fisher_swbd/s5/local/eval2000_data_prep.sh) script for preprocessing. The `wav.scp` files can be converted to WAV files with the following bash commands:

```bash
mkdir -p wav
while read name cmd; do
    echo $name
    echo ${cmd/\|/} wav/$name.wav | bash
done < wav.scp
```


### WSJ

We used [LDC93S6B](https://catalog.ldc.upenn.edu/LDC93S6B) and [LDC94S13B](https://catalog.ldc.upenn.edu/LDC94S13B) and followed the [s5 recipe](https://github.com/kaldi-asr/kaldi/tree/master/egs/wsj/s5) to preprocess the dataset.

### CORAAL

We used the 231 interviews from [CORAAL (v. 2021.07)](https://oraal.uoregon.edu/coraal) and used the segmentations from [the FairSpeech project](https://github.com/stanford-policylab/asr-disparities/blob/master/input/CORAAL_transcripts.csv).

### CHiME-6

We downloaded the [CHiME-5 dataset](https://spandh.dcs.shef.ac.uk//chime_challenge/CHiME5/download.html) and followed stage 0 of the [s5_track1 recipe](https://github.com/kaldi-asr/kaldi/tree/master/egs/chime6/s5_track1) to create the CHiME-6 dataset, which fixes synchronization. We then used the binaural recordings (`*_P??.wav`) and the corresponding transcripts.

### AMI-IHM, AMI-SDM1

We preprocessed the [AMI Corpus](https://groups.inf.ed.ac.uk/ami/corpus/overview.shtml) by following stages 0 and 2 of the [s5b recipe](https://github.com/kaldi-asr/kaldi/tree/master/egs/ami/s5b).


## Long-form English-only datasets

### TED-LIUM 3

To create a long-form transcription dataset from the [TED-LIUM3](https://www.openslr.org/51/) dataset, we sliced the audio between the beginning of the first labeled segment and the end of the last labeled segment of each talk, and we used the concatenated text as the label. Below are the timestamps used for slicing each of the 11 TED talks in the test split.   

| Filename            | Begin time (s) | End time (s) |
|---------------------|----------------|--------------|
| DanBarber_2010      | 16.09          | 1116.24      |
| JaneMcGonigal_2010  | 15.476         | 1187.61      |
| BillGates_2010      | 15.861         | 1656.94      |
| TomWujec_2010U      | 16.26          | 402.17       |
| GaryFlake_2010      | 16.06          | 367.14       |
| EricMead_2009P      | 18.434         | 536.44       |
| MichaelSpecter_2010 | 16.11          | 979.312      |
| DanielKahneman_2010 | 15.8           | 1199.44      |
| AimeeMullins_2009P  | 17.82          | 1296.59      |
| JamesCameron_2010   | 16.75          | 1010.65      |
| RobertGupta_2010U   | 16.8           | 387.03       |
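
As a sketch of the slicing step, cutting one of the talks with ffmpeg might look like the following (the input filename is illustrative; the timestamps come from the table above):

```bash
# Slice DanBarber_2010 from the start of its first labeled segment to the end
# of its last labeled segment, copying the audio stream without re-encoding.
ffmpeg -i DanBarber_2010.wav -ss 16.09 -to 1116.24 -c copy DanBarber_2010_sliced.wav
```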

### Meanwhile

This dataset consists of 64 segments from The Late Show with Stephen Colbert. The YouTube video ID, start and end timestamps, and the labels can be found in [meanwhile.json](meanwhile.json). The labels are collected from the closed-caption data for each video and corrected with manual inspection.

### Rev16

We use a subset of 16 files from the 30 podcast episodes in [Rev.AI's Podcast Transcription Benchmark](https://www.rev.ai/blog/podcast-transcription-benchmark-part-1/), after finding multiple cases where a significant portion of the audio and the labels did not match, mostly in the parts introducing the sponsors. We selected 16 episodes that do not have this error, whose "file numbers" are:

    3 4 9 10 11 14 17 18 20 21 23 24 26 27 29 32

### Kincaid46

This dataset consists of 46 audio files and the corresponding transcripts compiled in the blog article [Which automatic transcription service is the most accurate - 2018](https://medium.com/descript/which-automatic-transcription-service-is-the-most-accurate-2018-2e859b23ed19) by Jason Kincaid. We used the 46 audio files and reference transcripts from the Airtable widget in the article.

For the human transcription benchmark in the paper, we use a subset of 25 examples from this data, whose "Ref IDs" are:

    2 4 5 8 9 10 12 13 14 16 19 21 23 25 26 28 29 30 33 35 36 37 42 43 45

### Earnings-21, Earnings-22

For these datasets, we used the files available in [the speech-datasets repository](https://github.com/revdotcom/speech-datasets), as of their `202206` version.

### CORAAL

We used the 231 interviews from [CORAAL (v. 2021.07)](https://oraal.uoregon.edu/coraal) and used the full-length interview files and transcripts.


## Multilingual datasets

### Multilingual LibriSpeech

We used the test splits from each language in [the Multilingual LibriSpeech (MLS) corpus](https://www.openslr.org/94/).

### Fleurs

We collected audio files and transcripts using the implementation available as [HuggingFace datasets](https://huggingface.co/datasets/google/fleurs/blob/main/fleurs.py). To use Fleurs as a translation dataset, we matched the numerical utterance IDs to find the corresponding transcripts in English.

### VoxPopuli

We used the `get_asr_data.py` script from [the official repository](https://github.com/facebookresearch/voxpopuli) to collect the ASR data in 14 languages. 

### Common Voice 9

We downloaded the Common Voice Corpus 9 from [the official website](https://commonvoice.mozilla.org/en/datasets).

### CoVOST 2

We collected the `X into English` data using [the official repository](https://github.com/facebookresearch/covost).


================================================
FILE: data/meanwhile.json
================================================
{
    "1YOmY-Vjy-o": {
        "begin": "1:04.0",
        "end": "2:11.0",
        "text": "FOLKS, IF YOU WATCH THE SHOW,\nYOU KNOW I SPEND A LOT OF TIME\nRIGHT OVER THERE, PATIENTLY AND\nASTUTELY SCRUTINIZING THE\nBOXWOOD AND MAHOGANY CHESS SET\nOF THE DAY'S BIGGEST STORIES,\nDEVELOPING THE CENTRAL\nHEADLINE-PAWNS, DEFTLY\nMANEUVERING AN OH-SO-TOPICAL\nKNIGHT TO F-6, FEIGNING A\nCLASSIC SICILIAN-NAJDORF\nVARIATION ON THE NEWS, ALL THE\nWHILE, SEEING EIGHT MOVES DEEP\nAND PATIENTLY MARSHALING THE\nLATEST PRESS RELEASES INTO A\nFISCHER SOZIN LIPNITZKY ATTACK\nTHAT CULMINATES IN THE ELEGANT,\nLETHAL, SLOW-PLAYED EN PASSANT\nCHECKMATE THAT IS MY NIGHTLY\nMONOLOGUE.\nBUT SOMETIMES, SOMETIMES,\nFOLKS-- I,\nSOMETIMES,\nI STARTLE AWAKE UPSIDE DOWN ON\nTHE MONKEY BARS OF A CONDEMNED\nPLAYGROUND ON A SUPERFUND SITE,\nGET ALL HEPPED UP ON GOOFBALLS,\nRUMMAGE THROUGH A DISCARDED TAG\nBAG OF DEFECTIVE TOYS, YANK\nOUT A FISTFUL OF DISEMBODIED\nDOLL LIMBS, TOSS THEM ON A\nSTAINED KID'S PLACEMAT FROM A\nDEFUNCT DENNY'S, SET UP A TABLE\nINSIDE A RUSTY CARGO CONTAINER\nDOWN BY THE WHARF, AND CHALLENGE\nTOOTHLESS DRIFTERS TO THE\nGODLESS BUGHOUSE BLITZ\nOF TOURNAMENT OF NEWS THAT IS MY\nSEGMENT:\nMEANWHILE!\n"
    },
    "3P_XnxdlXu0": {
        "begin": "2:08.3",
        "end": "3:02.3",
        "text": "FOLKS, I SPEND A LOT OF TIME\nRIGHT OVER THERE, NIGHT AFTER NIGHT ACTUALLY, CAREFULLY\nSELECTING FOR YOU THE DAY'S NEWSIEST,\nMOST AERODYNAMIC HEADLINES,\nSTRESS TESTING THE MOST TOPICAL\nANTILOCK BRAKES AND POWER\nSTEERING, PAINSTAKINGLY\nSTITCHING LEATHER SEATING SO\nSOFT, IT WOULD MAKE J.D. POWER\nAND HER ASSOCIATES BLUSH, TO\nCREATE THE LUXURY SEDAN THAT IS\nMY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES,\nFOLKS, I LURCH TO CONSCIOUSNESS\nIN THE BACK OF AN ABANDONED\nSCHOOL BUS AND SLAP MYSELF AWAKE\nWITH A CRUSTY FLOOR MAT BEFORE\nUSING A MOUSE-BITTEN TIMING BELT\nTO STRAP SOME OLD PLYWOOD TO A\nCOUPLE OF DISCARDED OIL DRUMS.\nTHEN, BY THE LIGHT OF A HEATHEN\nMOON, RENDER A GAS TANK OUT OF\nAN EMPTY BIG GULP, FILL IT WITH\nWHITE CLAW AND DENATURED\nALCOHOL, THEN LIGHT A MATCH AND\nLET HER RIP, IN THE DEMENTED\nONE-MAN SOAP BOX DERBY OF NEWS\nTHAT IS MY SEGMENT: MEANWHILE!"
    },
    "3elIlQzJEQ0": {
        "begin": "1:08.5",
        "end": "1:58.5",
        "text": "LADIES AND GENTLEMEN, YOU KNOW, I SPEND A\nLOT OF TIME RIGHT OVER THERE,\nRAISING THE FINEST HOLSTEIN NEWS\nCATTLE, FIRMLY, YET TENDERLY,\nMILKING THE LATEST HEADLINES\nFROM THEIR JOKE-SWOLLEN TEATS,\nCHURNING THE DAILY STORIES INTO\nTHE DECADENT, PROVENCAL-STYLE\nTRIPLE CREME BRIE THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES, FOLKS,\nI STAGGER HOME HUNGRY AFTER\nBEING RELEASED BY THE POLICE,\nAND ROOT AROUND IN THE NEIGHBOR'S\nTRASH CAN FOR AN OLD MILK\nCARTON, SCRAPE OUT THE BLOOMING\nDAIRY RESIDUE ONTO THE REMAINS\nOF A WET CHEESE RIND I WON\nFROM A RAT IN A PRE-DAWN STREET\nFIGHT, PUT IT IN A DISCARDED\nPAINT CAN, AND LEAVE IT TO\nFERMENT NEXT TO A TRASH FIRE,\nTHEN HUNKER DOWN AND HALLUCINATE\nWHILE EATING THE LISTERIA-LADEN\nDEMON CUSTARD OF NEWS THAT IS\nMY SEGMENT: MEANWHILE!"
    },
    "43P4q1KGKEU": {
        "begin": "0:29.3",
        "end": "1:58.3",
        "text": "FOLKS, IF YOU WATCH THE SHOW, YOU KNOW I SPEND MOST\nOF MY TIME, RIGHT OVER THERE.\nCAREFULLY SORTING THROUGH THE\nDAY'S BIGGEST STORIES, AND\nSELECTING ONLY THE MOST\nSUPPLE AND UNBLEMISHED OSTRICH\nAND CROCODILE NEWS LEATHER,\nWHICH I THEN ENTRUST TO ARTISAN\nGRADUATES OF THE \"ECOLE\nGREGOIRE-FERRANDI,\" WHO\nCAREFULLY DYE THEM IN A PALETTE\nOF BRIGHT, ZESTY SHADES, AND\nADORN THEM WITH THE FINEST, MOST\nTOPICAL INLAY WORK USING HAND\nTOOLS AND DOUBLE MAGNIFYING\nGLASSES, THEN ASSEMBLE THEM\nACCORDING TO NOW CLASSIC AND\nELEGANT GEOMETRY USING OUR\nSIGNATURE SADDLE STITCHING, AND\nLINE IT WITH BEESWAX-COATED\nLINEN, AND FINALLY ATTACH A\nMALLET-HAMMERED STRAP, PEARLED\nHARDWARE, AND A CLOCHETTE TO\nCREATE FOR YOU THE ONE-OF-A-KIND\nHAUTE COUTURE HERMES BIRKIN BAG\nTHAT IS MY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES, FOLKS,\nSOMETIMES, SOMETIMES I WAKE UP IN THE\nLAST CAR OF AN ABANDONED ROLLER\nCOASTER AT CONEY ISLAND, WHERE\nI'M HIDING FROM THE TRIADS, I\nHUFF SOME ENGINE LUBRICANTS OUT\nOF A SAFEWAY BAG AND STAGGER\nDOWN THE SHORE TO TEAR THE SAIL\nOFF A BEACHED SCHOONER, THEN I\nRIP THE CO-AXIAL CABLE OUT OF\nTHE R.V. OF AN ELDERLY COUPLE\nFROM UTAH, HANK AND MABEL,\nLOVELY FOLKS, AND USE IT TO\nSTITCH THE SAIL INTO A LOOSE,\nPOUCH-LIKE RUCKSACK, THEN I\nSTOW AWAY IN THE BACK OF A\nGARBAGE TRUCK TO THE JUNK YARD\nWHERE I PICK THROUGH THE DEBRIS\nFOR ONLY THE BROKEN TOYS THAT\nMAKE ME THE SADDEST UNTIL I HAVE\nLOADED, FOR YOU, THE HOBO\nFUGITIVE'S BUG-OUT BINDLE OF\nNEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "4ktyaJkLMfo": {
        "begin": "0:42.5",
        "end": "1:26.5",
        "text": "YOU KNOW, FOLKS, I SPEND A LOT\nOF TIME CRAFTING FOR YOU A\nBESPOKE PLAYLIST OF THE DAY'S\nBIGGEST STORIES, RIGHT OVER THERE, METICULOUSLY\nSELECTING THE MOST TOPICAL\nCHAKRA-AFFIRMING SCENTED\nCANDLES, AND USING FENG SHUI TO\nPERFECTLY ALIGN THE JOKE ENERGY\nIN THE EXCLUSIVE BOUTIQUE YOGA\nRETREAT THAT IS MY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES,\nI GO TO THE DUMPSTER BEHIND THE\nWAFFLE HOUSE AT 3:00 IN THE\nMORNING, TAKE OFF MY SHIRT,\nCOVER MYSELF IN USED FRY OIL,\nWRAP MY HANDS IN SOME OLD DUCT\nTAPE I STOLE FROM A BROKEN CAR\nWINDOW, THEN POUND A SIX-PACK OF\nBLUEBERRY HARD SELTZER AND A\nSACK OF PILLS I STOLE FROM A\nPARKED AMBULANCE, THEN\nARM-WRESTLE A RACCOON IN THE\nBACK ALLEY VISION QUEST OF NEWS\nTHAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "5Dsh9AgqRG0": {
        "begin": "1:06.0",
        "end": "2:34.0",
        "text": "YOU KNOW, FOLKS, I SPEND MOST OF\nMY TIME RIGHT OVER THERE, MINING\nTHE DAY'S BIGGEST, MOST\nIMPORTANT STORIES, COLLECTING\nTHE FINEST, MOST TOPICAL IRON\nORE, HAND HAMMERING IT INTO JOKE\nPANELS.\nTHEN I CRAFT SHEETS OF BRONZE\nEMBLAZONED WITH PATTERNS THAT\nTELL AN EPIC TALE OF CONQUEST\nAND GLORY.\nTHEN, USING THE GERMANIC\nTRADITIONAL PRESSBLECH\nPROCESS, I PLACE THIN SHEETS OF\nFOIL AGAINST THE SCENES, AND BY\nHAMMERING OR OTHERWISE,\nAPPLYING PRESSURE FROM THE BACK,\nI PROJECT THESE SCENES INTO A\nPAIR OF CHEEK GUARDS AND A\nFACEPLATE.\nAND, FINALLY, USING FLUTED\nSTRIPS OF WHITE ALLOYED\nMOULDING, I DIVIDE THE DESIGNS\nINTO FRAMED PANELS AND HOLD IT\nALL TOGETHER USING BRONZE RIVETS\nTO CREATE THE BEAUTIFUL AND\nINTIMIDATING ANGLO-SAXON\nBATTLE HELM THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES, FOLKS,\nSOMETIMES, JUST SOMETIMES, I COME TO MY SENSES FULLY NAKED\nON THE DECK OF A PIRATE-BESIEGED\nMELEE CONTAINER SHIP THAT PICKED\nME UP FLOATING ON THE DETACHED\nDOOR OF A PORT-A-POTTY IN THE\nINDIAN OCEAN.\nTHEN, AFTER A SUNSTROKE-INDUCED\nREALIZATION THAT THE CREW OF\nTHIS SHIP PLANS TO SELL ME IN\nEXCHANGE FOR A BAG OF ORANGES TO\nFIGHT OFF SCURVY, I LEAD A\nMUTINY USING ONLY A P.V.C. PIPE\nAND A POOL CHAIN.\nTHEN, ACCEPTING MY NEW ROLE AS\nCAPTAIN, AND DECLARING MYSELF\nKING OF THE WINE-DARK SEAS, I\nGRAB A DIRTY MOP BUCKET COVERED\nIN BARNACLES AND ADORN IT WITH\nTHE TEETH OF THE VANQUISHED, TO\nCREATE THE SOPPING WET PIRATE\nCROWN OF NEWS THAT IS MY\nSEGMENT:\n\"MEANWHILE!\" "
    },
    "748OyesQy84": {
        "begin": "0:40.0",
        "end": "1:41.0",
        "text": "FOLKS, IF YOU WATCH THE SHOW, YOU KNOW, I SPEND MOST OF\nMY TIME, RIGHT OVER THERE,\nCAREFULLY BLENDING FOR YOU THE\nDAY'S NEWSIEST, MOST TOPICAL\nFLOUR, EGGS, MILK, AND BUTTER,\nAND STRAINING IT INTO A FINE\nBATTER TO MAKE DELICATE, YET\nINFORMATIVE COMEDY PANCAKES.\nTHEN I GLAZE THEM IN THE JUICE\nAND ZEST OF THE MOST RELEVANT\nMIDNIGHT VALENCIA ORANGES, AND\nDOUSE IT ALL IN A FINE DELAMAIN\nDE VOYAGE COGNAC, BEFORE\nFLAMBEING AND BASTING THEM TABLE\nSIDE, TO SERVE FOR YOU THE JAMES\nBEARD AWARD-WORTHY CREPES\nSUZETTE THAT IS MY NIGHTLY,\nMONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES FOLKS, I WAKE UP IN THE\nBAGGAGE HOLD OF A GREYHOUND BUS\nAS ITS BEING HOISTED BY THE\nSCRAPYARD CLAW TOWARD THE BURN\nPIT, ESCAPE TO A NEARBY\nABANDONED PRICE CHOPPER, WHERE I\nSCROUNGE FOR OLD BREAD SCRAPS,\nBUSTED OPEN BAGS OF STAR FRUIT\nCANDIES, AND EXPIRED EGGS,\nCHUCK IT ALL IN A DIRTY HUBCAP\nAND SLAP IT OVER A TIRE FIRE\nBEFORE USING THE LEGS OF A\nSTAINED PAIR OF SWEATPANTS AS\nOVEN MITTS TO EXTRACT AND SERVE\nTHE DEMENTED TRANSIENT'S POUND\nCAKE OF NEWS THAT IS MY SEGMENT:\n\"MEANWHILE.\""
    },
    "8prs9Pq5Xhk": {
        "begin": "1:18.5",
        "end": "2:17.5",
        "text": "FOLKS, IF YOU WATCH THE SHOW,\nAND I HOPE YOU DO, I SPEND A\nLOT OF TIME RIGHT OVER THERE,\nTIRELESSLY STUDYING THE LINEAGE\nOF THE DAY'S MOST IMPORTANT\nTHOROUGHBRED STORIES AND\nHOLSTEINER HEADLINES, WORKING\nWITH THE BEST TRAINERS MONEY CAN\nBUY TO REAR THEIR COMEDY\nOFFSPRING WITH A HAND THAT IS\nSTERN, YET GENTLE, INTO THE\nTRIPLE-CROWN-WINNING EQUINE\nSPECIMEN THAT IS MY NIGHTLY\nMONOLOGUE.\nBUT SOMETIMES, SOMETIMES, FOLKS\nI BREAK INTO AN UNINCORPORATED\nVETERINARY GENETICS LAB AND GRAB\nWHATEVER TEST TUBES I CAN FIND.\nAND THEN, UNDER A GROW LIGHT I GOT\nFROM A DISCARDED CHIA PET, I MIX\nTHE PILFERED D.N.A. OF A HORSE\nAND WHATEVER WAS IN A TUBE\nLABELED \"KEITH-COLON-EXTRA,\"\nSLURRYING THE CONCOCTION WITH\nCAFFEINE PILLS AND A MICROWAVED\nRED BULL, I SCREAM-SING A PRAYER\nTO JANUS, INITIATOR OF HUMAN\nLIFE AND GOD OF TRANSFORMATION\nAS A HALF-HORSE, HALF-MAN FREAK,\nSEIZES TO LIFE BEFORE ME IN THE\nHIDEOUS COLLECTION OF LOOSE\nANIMAL PARTS AND CORRUPTED MAN\nTISSUE THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "9gX4kdFajqE": {
        "begin": "0:44.0",
        "end": "1:08.0",
        "text": "FOLKS, IF YOU WATCH THE SHOW,\nYOU KNOW SOMETIMES I'M OVER\nTHERE DOING THE MONOLOGUE.\nAND THEN THERE'S A COMMERCIAL\nBREAK, AND THEN I'M SITTING\nHERE.\nAND I DO A REALLY LONG\nDESCRIPTION OF A DIFFERENT\nSEGMENT ON THE SHOW, A SEGMENT\nWE CALL... \"MEANWHILE!\""
    },
    "9ssGpE9oem8": {
        "begin": "0:00.0",
        "end": "0:58.0",
        "text": "WELCOME BACK, EVERYBODY.\nYOU KNOW, FOLKS, I SPEND MOST OF\nMY TIME RIGHT OVER THERE,\nCOMBING OVER THE DAY'S NEWS,\nSELECTING ONLY THE HIGHEST\nQUALITY AND MOST TOPICAL BONDED\nCALFSKIN-LEATHER STORIES,\nCAREFULLY TANNING THEM AND CUTTING\nTHEM WITH MILLIMETER PRECISION,\nTHEN WEAVING IT TOGETHER IN\nA DOUBLE-FACED INTRECCIATO\nPATTERN TO CREATE FOR YOU THE\nEXQUISITE BOTTEGA VENETA CLUTCH\nTHAT IS MY MONOLOGUE.\nBUT SOMETIMES, WHILE AT A RAVE\nIN A CONDEMNED CEMENT FACTORY, I\nGET INJECTED WITH A MYSTERY\nCOCKTAIL OF HALLUCINOGENS AND\nPAINT SOLVENTS, THEN, OBEYING\nTHE VOICES WHO WILL STEAL MY\nTEETH IF I DON'T, I STUMBLE INTO\nA SHIPYARD WHERE I RIP THE\nCANVAS TARP FROM A GRAVEL TRUCK,\nAND TIE IT OFF WITH THE ROPE FROM A\nROTTING FISHING NET, THEN WANDER\nA FOOD COURT, FILLING IT WITH\nWHAT I THINK ARE GOLD COINS BUT\nARE, IN FACT, OTHER PEOPLE'S CAR\nKEYS, TO DRAG AROUND THE\nROOTLESS TRANSIENT'S CLUSTER\nSACK OF NEWS THAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "ARw4K9BRCAE": {
        "begin": "0:26.0",
        "end": "1:16.0",
        "text": "YOU KNOW, FOLKS, I SPEND\nMOST OF MY TIME STANDING RIGHT OVER\nTHERE,\nGOING OVER THE DAY'S NEWS\nAND SELECTING THE FINEST,\nMOST\nTOPICAL CARBON FIBER\nSTORIES, SHAPING THEM IN DRY\nDOCK INTO A\nSLEEK AND SEXY HULL, KITTING\nIT OUT WITH THE MOST TOPICAL\nFIBERGLASS AND TEAK FITTINGS\nBRASS RAILINGS, HOT TUBS,\nAND\nNEWS HELIPADS, TO CREATE THE\nCUSTOM DESIGNED, GLEAMING\nMEGA-YACHT THAT IS MY NIGHTLY\nMONOLOGUE. BUT SOMETIMES, JUST SOMETIMES FOLKS, I\nWASH ASHORE AT\nAN ABANDONED BEACH RESORT\nAFTER A NIGHT OF BATH SALTS\nAND\nSCOPOLAMINE, LASH SOME\nROTTING PICNIC TABLES\nTOGETHER, THEN\nDREDGE THE NEWS POND TO\nHAUL UP WHATEVER DISCARDED\nTRICYCLES AND BROKEN\nFRISBEES I CAN FIND, STEAL\nAN EYE PATCH\nFROM A HOBO, STAPLE A DEAD\nPIGEON TO MY SHOULDER, AND\nSAIL\nINTO INTERNATIONAL WATERS\nON THE PIRATE GARBAGE SCOW\nOF\nNEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "B1DRmrOlKtY": {
        "begin": "1:34.0",
        "end": "2:17.0",
        "text": "FOLKS, I SPEND A LOT OF TIME\nSTANDING RIGHT OVER THERE, OKAY,\nHANDPICKING ONLY THE RIPEST\nMOST TOPICAL DAILY HEADLINES,\nSEEKING OUT THE SWEETEST, MOST\nREFINED OF JEST-JAMS AND\nJOKE-JELLIES, CURATING A PLATE\nOF LOCAL COMEDY-FED MEATS AND\nSATIRICAL CHEESES TO LAY OUT THE\nARTISANAL CHARCUTERIE BOARD\nA-NEWS BOUCHE THAT IS MY NIGHTLY\nMONOLOGUE.\nBUT SOMETIMES, I WAKE UP IN THE\nGREASE TRAP OF AN\nUNLICENSED SLAUGHTERHOUSE,\nSPLASH MY FACE WITH SOME BEEF TALLOW,\nRENDER A RUDIMENTARY PRESERVE\nFROM BONE MARROW AND MELTED\nGUMMY WORMS, AND THROW IT\nTOGETHER WITH SOME SWEETBREADS\nAND TRIPE TO PRESENT THE\nPLOWMAN'S PLATTER OF UNCLAIMED\nOFFAL THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "BT950jqCCUY": {
        "begin": "1:00.0",
        "end": "2:09.0",
        "text": "YOU KNOW, FOLKS,\nI SPEND SO MUCH OF MY TIME RIGHT\nOVER THERE, SIFTING THROUGH THE\nDAY'S BIGGEST STORIES, HAND\nSELECTING ONLY THE FINEST, MOST\nPERFECTLY AGED BURMESE NEWS\nTEAK.\nTHEN CAREFULLY CARVING AND\nSHAPING IT INTO A REFINED\nBUDDHA, DEPICTED IN THE \"CALLING\nTHE EARTH TO WITNESS\" GESTURE,\nWITH CHARACTERISTIC CIRCULAR\nPATTERNS ON THE ROBE, SHINS, AND\nKNEES, OF COURSE, WHICH I\nTHEN CAREFULLY GILD WITH THE\nMOST TOPICAL GOLD LEAF.\nTHEN I HARVEST THE SAP OF A\nUSITATA TREE TO APPLY THE DELICATE\nTAI O LACQUER, AND FINALLY HAND\nDECORATE IT WITH THE WHITE GLASS\nINLAYS TO CREATE FOR YOU THE\nGLORIOUS AMARA-PURA PERIOD\nSCULPTURE THAT IS MY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES FOLKS,\nI WASH UP ON A GALVESTON\nBEACH, NAKED ON A RAFT OF GAS\nCANS AFTER ESCAPING FROM A FIGHT CLUB\nIN INTERNATIONAL WATERS.\nTHEN, STILL DERANGED ON A\nCOCKTAIL OF MESCALINE AND COUGH\nSYRUP, I STEAL A CINDER BLOCK\nFROM UNDER A STRIPPED '87 FORD\nTAURUS,  AND CHIP AWAY AT IT\nWITH A BROKEN UMBRELLA HANDLE I\nSCAVENGED FROM A GOODWILL\nDUMPSTER UNTIL IT VAGUELY\nRESEMBLES THE HUNGERING WOLF\nTHAT SCRATCHES AT THE DOOR\nOF MY DREAMS AND PRESENT TO YOU\nTHE TORMENTED DREAD-EFFIGY OF\nNEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "C0e8XM30tQI": {
        "begin": "0:57.0",
        "end": "2:40.0",
        "text": "YOU KNOW, IF YOU WATCH THE SHOW, YOU'RE AWARE THAT I SPEND MOST OF\nMY TIME RIGHT OVER THERE,\nWANDERING THE NEWS FOREST FOR\nYOU, FELLING ONLY THE BIGGEST\nAND HEARTIEST WHITE STORY OAKS,\nCUTTING AND SHAPING THEM INTO\nTHE NEWSIEST, MOST TOPICAL\nCLEATS, CLAMPS AND PLANKS,\nKEEPING THEM AT A CONSTANT\nANGLE, GRADUALLY CREATING A\nSHELL-SHAPED, SHALLOW BOWL HULL\nUSING THE FIRE-BENDING TECHNIQUE\nINSTEAD OF STEAM-BENDING\nOBVIOUSLY.\nTHEN I LAY OUT ALL THE KEEL\nBLOCKS TO CAREFULLY SET UP THE\nSTEM, STERN AND, GARBOARD,\nATTACH THE BILGE FUTTOCKS TO THE\nTIMBER AND LOVINGLY CRAFT A FLAT\nTRANSOM STERN OUT OF\nNATURALLY-CURVED QUARTER\nCIRCLES.\nTHEN SECURE ALL THE PLANKS WITH\nTRUNNELS HANDMADE FROM THE\nFINEST LOCUST WOOD AND, FINALLY,\nADORN IT WITH A PROUD BOWSPRIT,\nFOREPEAK, AND CUSTOM GILDED\nFIGUREHEAD TO PRESENT TO YOU THE\nDUTCH GOLDEN AGE \"SPIEGEL-JACHT\"\nTHAT IS MY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES,\nFOLKS-- YOU GOT TO HYDRATE\nAFTER THAT.\n\"SPIEGEL-JACHT\"\nBUT SOMETIMES, I\nAWAKEN FROM A MEAT-SWEAT-\nINDUCED FEVER STRAPPED TO A\nBASKET ON THE WONDER WHEEL AT\nCONEY ISLAND, STUMBLE ACROSS THE\nGARBAGE-FLECKED BEACH TO THE\nSOUND OF A TERRIFYING RAGGED\nBELLOW I REALIZE IS COMING FROM\nMY OWN LUNGS, WHICH THEN SUMMONS AN\nARMY OF SEAGULLS WHOM I INSTRUCT\nTO GATHER THE HALF-EMPTIED CANS\nOF BUSCH LIGHT LEFT BY A MOB OF\nBELGIAN TOURISTS, ALL OF WHICH\nI GATHER INTO A SACK I FASHIONED\nOUT OF PANTS I STOLE FROM A\nSLEEPING COP. THEN I SWIPE A\nGIANT INFLATABLE BABY YODA FROM\nA CARNY GAME, STRAP IT TO THE\nMAST I MADE BY RIPPING A B-68\nBUS STOP SIGN OUT OF THE GROUND\nON THE CORNER OF STILLWELL, AND\nLAUNCH THE VESSEL AS CAPTAIN OF\nTHE UNREGULATED PIRATE BOOZE\nCRUISE OF NEWS THAT IS MY\nSEGMENT:\n\"MEANWHILE\"!"
    },
    "CKsASCGr_4A": {
        "begin": "1:48.3",
        "end": "2:37.3",
        "text": "FOLKS, YOU  KNOW, I SPEND A LOT\nOF TIME RIGHT OVER THERE, CARVING THE\nFINEST, MOST-TOPICAL JAPANESE\nNEWS CYPRESS INTO AN EXPRESSIVE\nHANNYA DEMON MASK, DONNING MY\nBLACK AND GOLD SHOZOKU ROBES\nMADE OF THE SMOOTHEST STORY\nSILK, AND MASTERING THE ELABORATE\nCHOREOGRAPHY OF STILLNESS AND\nMOVEMENT, TO CREATE THE JAPANESE\nNOH THEATER PRODUCTION THAT IS\nMY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES,\nFOLKS, I ENTER A FUGUE STATE IN\nTHE MIDDLE OF THE NIGHT, SIPHON\nA BUCKETFUL OF GASOLINE OUT OF MY\nNEIGHBOR'S MAZDA, RUN BAREFOOT\nFROM MY HOUSE TO A HOVEL UNDER\nTHE TURNPIKE WHERE I ASK A HOBO\nFOR A LIGHT AND SET A GARBAGE\nCAN ABLAZE, THEN PLACE MY\nFROSTBITTEN HANDS IN FRONT OF\nTHE DUMPSTER TO PROJECT THE\nSHADOW PUPPET WINO OPERA OF NEWS\nTHAT IS MY SEGMENT, \"MEANWHILE.\""
    },
    "DSc26qAJp_g": {
        "begin": "0:28.0",
        "end": "1:46.0",
        "text": "FOLKS, I SPEND MOST OF MY TIME\nRIGHT OVER THERE, ASSEMBLING THE\nDAY'S BIGGEST, MOST-IMPORTANT\nSTORIES, THEN HAND-SHAPING THEM\nINTO SLEEK, ELEGANT BODYWORK,\nWHICH I LINE WITH ONLY THE\nFINEST, MOST TOPICAL POLISHED\nMACASSAR EBONY AND OPEN-PORE\nPALDAO VENEER, ADDING LIGHT\nMOCCASIN AND DARK SPICE LEATHER\nSEATS, AND MALABAR TEAK WOOD TO\nSET OFF A SCULPTED MINIMALIST\nSWITCHGEAR, ACCOMPANIED BY A\nSTERLING SILVER HUMIDOR AND\nCHAMPAGNE CELLARETTE, THEN\nHAND-SET 1,600 FIBER OPTIC\nLIGHTS ALIGNED WITH PINPOINT\nPERFORATIONS IN THE ROOF-LINING,\nTO CREATE FOR YOU THE BESPOKE\nCOACH BUILT ROLLS-ROYCE SWEPTAIL\nTHAT IS MY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES, FOLKS,\nSOMETIMES, I JUST, I JUST SHRIEK AWAKE IN THE\nSCRAPYARD OF A DERELICT MACHINE\nSHOP, SCAVENGE A\n3.6-HORSEPOWER BRIGGS AND STRATTON\nLAWN MOWER ENGINE, BULLY A\nDOG INTO GIVING ME ITS FRISBEE\nTO USE AS A STEERING WHEEL, THEN\nI BRIEFLY CONSIDER-- BUT DECIDE\nAGAINST-- SWIPING THE BRAKE PADS\nOFF AN UNATTENDED HEARSE,\nBECAUSE WHERE I'M GOING, WE DON'T\nNEED BRAKES.\nI HOOK IT ALL UP TO A RUSTY\nDOLLAR TREE SHOPPING CART,\nSHOTGUN A WHITE CLAW, NO LAW ON THE CLAW, SPRAY\nPAINT MY TEETH, AND BLAST OFF IN\nTHE \"FURY ROAD\" THUG-BUGGY OF\nNEWS THAT IS MY SEGMENT:\n\"MEANWHILE\"!"
    },
    "DhuCyncmFgM": {
        "begin": "0:43.5",
        "end": "1:30.5",
        "text": "YOU KNOW, FOLKS, I\nSPEND A LOT OF TIME STANDING RIGHT\nOVER THERE, COMBING THROUGH\nTHE DAY'S\nSTORIES TO FIND AND ERECT\nFOR YOU THE FINEST GRECIAN\nNEWS\nCOLUMNS, ADORNING THEM WITH\nTHE MOST UP-TO-THE-MINUTE\nBAS-RELIEF.\nAND THEN I IMPART MY MOST\nTOPICAL\nTEACHINGS TO BE ABSORBED BY\nEAGER, SPONGE-LIKE\nMINDS IN THE\nAUGUST ATHENIAN ACADEMY THAT\nIS MY MONOLOGUE.\nBUT SOMETIMES, JUST\nSOMETIMES, FOLKS, I COME TO IN A\nDRIED-OUT BABY POOL\nON AN ABANDONED RANCH I WON\nIN A RUSSIAN ROULETTE GAME\nOFF THE COAST OF MOZAMBIQUE\nI GATHER TUMBLEWEEDS\nAND I LASH THEM TOGETHER WITH\nSOME TWINE I FOUND IN A DUMPSTER\nBY A BURNED-OUT REST STOP,\nTHEN I\nSHOTGUN A HOT MONSTER ENERGY\nDRINK AND CHEW ON MACA ROOT\nAS I\nHALLUCINATE THROUGH THE\nNIGHT IN THE VAGRANT'S HOT\n-BOX YURT OF\nNEWS THAT IS MY SEGMENT:\n\"MEANWILE!\""
    },
    "EnGHyZS4f-8": {
        "begin": "1:33.0",
        "end": "2:12.0",
        "text": "YOU KNOW, FOLKS, I'VE SPENT\nDECADES CULTIVATING THE MOST\nRELEVANT RED OAK, FELLING THEM\nWITH TOPICAL HUSQVARNAS,\nMULTIGRADING THEM INTO THE MOST\nBUZZWORTHY THREE-QUARTER INCH\nFLOORSTRIPS, AND FINISHING THEM\nWITH UP-TO-THE-MINUTE HIGH-GLOSS\nPOLYURETHANE, TO LAY FOR YOU THE\nFLAWLESS PARQUET N.B.A.-QUALITY\nBASKETBALL COURT THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, I SCARE THE GEESE\nOUT OF THE BOG BEHIND MY UNCLE'S\nSHED, FILL IT WITH SAND I STOLE\nFROM AN ABANDONED PLAYGROUND,\nTHEN BLANKET IT WITH WET TRASH\nAND DISCARDED AMAZON BOXES, TO\nCREATE FOR YOU THE\nMUSKRAT-RIDDLED BOUNCELESS\nBACKYARD WALLBALL PIT OF NEWS\nTHAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "G8ajua4Mb5I": {
        "begin": "1:39.0",
        "end": "2:24.0",
        "text": "YOU KNOW, FOLKS,\nIF YOU WATCH THIS SHOW AND I\nHOPE YOU DO,\nI SPEND A LOT OF TIME RIGHT OVER\nTHERE, CAREFULLY ERECTING THE\nNEWSIEST, MOST TOPICAL\nCORINTHIAN COLUMNS, FOLLOWING\nTHE GOLDEN RATIO, POLISHING THE\nFINEST CARRARA MARBLE, AND\nENGRAVING ONTO IT MYTHIC TALES\nWITH FLOURISHES OF HUMOR AND\nPATHOS, TO CREATE THE\nGRECO-ROMAN ACROPOLIS THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES, FOLKS,\nI GRAB A COUPLE OF OFF-BRAND\nLEGOS THAT HAVE BEEN JAMMED\nBETWEEN MY TOES SINCE I STEPPED\nON THEM IN 2003, FISH THE STICKS\nOUT OF SOME MELTED POPSICLES IN\nA BROKEN FREEZER, COLLECT THE\nPRE-SOAKED SAND FROM A\nPLAYGROUND I BROKE INTO IN THE\nDEAD OF NIGHT, AND THROW IT ALL\nTOGETHER TO CONSTRUCT FOR YOU\nTHE RAMSHACKLE PARTHENON OF NEWS\nTHAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "I4s-44cPYVE": {
        "begin": "1:10.0",
        "end": "2:59.0",
        "text": "FOLKS, YOU KNOW, IF YOU WATCH THE SHOW, YOU KNOW, I SPEND\nMOST OF MY TIME\nRIGHT OVER THERE, SURVEYING THE\nNEWS MARKET FOR THE\nBIGGEST STORIES, THEN CAREFULLY\nSELECTING THE FINEST, MOST\nTOPICAL BUFFALO NEWS HIDE WHICH\nI THEN SOAK USING NATURAL SPRING\nAND LIMEWATER-- ONLY DURING\nCOLDER MONTHS-- AND SCRAPE IT\nUNTIL IT IS EVENLY TRANSLUCENT\nAND SUPPLE.\nAND THEN, USING THE TRADITIONAL\nPUSH-KNIFE METHOD, I DELICATELY\nMAKE MORE THAN 3,000 CUTS TO\nCREATE THE EMOTIVE AND POWERFUL\nFIGURINE WHICH I DECORATE WITH\nGRADATIONS AND CONTRAST,\nEMPLOYING THE SHAANXI REGION\nFIVE-COLOR SYSTEM.\nTHEN I CAREFULLY CONNECT THE\nFIGURINE'S JOINTS WITH COTTON\nTHREADS SO THEY CAN BE OPERATED\nFREELY, AND FIRE UP A PAPER\nLANTERN BEHIND A FINE HUANGZHOU\nSILK SCREEN, AND, BACKED BY A\nSUONA HORN, AND YUEQIN, AND\nBANHU FIDDLE, I OPERATE NO LESS\nTHAN FIVE OF THESE FIGURINES AT\nONCE, BECOMING THE LIVING\nEMBODIMENT OF THE 1,000-HAND\nGWAN-YIN, TO MOUNT FOR YOU THE\nEPIC AND MOVING TONG DYNASTY\nPI-YING SHADOW PLAY THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, FOLKS,\nSOMETIMES,\nIT'S CRAFTSMANSHIP.\nIT GOES LIKE THAT, RIGHT\nTHERE.\nSOMETIMES, FOLKS,\nI AM PECKED\nAWAKE BY A DIRTY SEAGULL ON THE\nINTRACOASTAL WATERWAY, WHILE\nSTILL LYING ON THE BACK OF A\nMANATEE WHO RESCUED ME FROM SOME\nARMS DEALERS I DOUBLE-CROSSED\nOFF THE COAST OF CAPE FEAR, AND WHO THEN\nDUMPS ME ON AN ABANDONED WHARF\nWHERE I SLIP THEIR DIRTY SOCK OFF\nA SEVERED FOOT I FISHED OUT OF A\nSTORM DRAIN, AND SLAP GOOGLY\nEYES ON IT MADE FROM TWO MENTOS\nI PRIED OUT OF THE MOUTH OF A\nMANGY COYOTE.\nTHEN I FASHION A KAZOO OUT OF A\nPOCKET COMB I STOLE FROM A\nFISHERMAN AND THE WAX PAPER FROM\nHIS MEATBALL SUB, TO HONK OUT A\nDIRGE WHILE YAMMERING A TONE\nPOEM ABOUT THE DEMONS INFESTING\nMY BEARD IN THE UNBALANCED MANIC\nSOCK PUPPET SHOW OF NEWS THAT IS\nMY SEGMENT:\n\"MEANWHILE!\""
    },
    "JAfAApqOeFU": {
        "begin": "1:49.5",
        "end": "2:42.5",
        "text": "FOLKS.\nI SPEND A LOT OF MY TIME GATHERING FOR YOU\nTHE FINEST, MOST TOPICAL STORIES\nABOUT NATIONAL STUDIES,\nSCIENTIFIC BREAKTHROUGHS, AND\nDRUNK MOOSE BREAKING INTO ICE\nCREAM SHOPS, ONLY TO HAVE A\nPANDEMIC HIT, DURING WHICH I\nTAKE THEM INTO A SAD, EMPTY\nLITTLE ROOM WHERE MY ONLY\nFRIENDS ARE ROBOT CAMERAS AND A\nPURELL BOTTLE, AND I LET MYSELF\nGO WHILE SLOWLY DESCENDING INTO\nMADNESS AS I SHOVE MY SWEET\nINNOCENT LITTLE JOKES INTO A\nSEGMENT THAT I AM FORCED TO\nRENAME \"QUARANTINE-WHILE.\"\nBUT SOMETIMES, I CRAWL OUT OF\nTHE BROOM CLOSET AFTER 15\nMONTHS, POUR MYSELF BACK INTO A\nSUIT, ASSEMBLE THE TOP TEAM IN\nTHE BUSINESS, THE SWINGINGEST\nBAND IN LATE NIGHT, AND THE\nBEST DAMN AUDIENCE IN THE WORLD.\nSO I CAN RETURN TO YOUR LOVING\nARMS IN THE KICK-ASS,\nPROPERLY-PRESENTED\nCELEBRATION OF MARGINAL NEWS\nTHAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "JfT59wBSQME": {
        "begin": "0:52.0",
        "end": "1:37.0",
        "text": "YOU KNOW, FOLKS, I SPEND A\nLOT OF TIME HARVESTING THE DAY'S\nMOST TOPICAL MATCHA POWDER,\nCAREFULLY POLISHING THE NEWSIEST\nCHAWAN TEA BOWL WITH A HEMP\nCLOTH, AND ADDING THE PUREST\nBOILED WATER COLLECTED FROM THE\nRIVER OF STORIES, TO STAGE THE\nELEGANT JAPANESE TEA CEREMONY\nTHAT IS MY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES, I CONVINCE A\nTRUCK DRIVER TO GIVE ME A RIDE\nIN EXCHANGE FOR AN UNREGISTERED\nHAND GUN AND A HALF-EATEN CAN OF\nBEANS, HITCHHIKE TO THE SONORA\nDESERT WITH NOTHING BUT AN OLD\nPOT THAT I FILL WITH THE\nNEWSPAPER I'VE BEEN USING FOR A\nBLANKET, AND THE SALVAGED\nTOBACCO FROM A SIDEWALK CIGARETTE\nBUTT, TO BREW FOR YOU, THE\nNIGHTMARE HALLUCINATION-INDUCING\nFERMENTED AYAHUASCA SLURRY OF\nNEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "KT8pCZ5Xw9I": {
        "begin": "0:37.6",
        "end": "1:26.6",
        "text": "FOLKS, YOU KNOW,\nIF YOU WATCH THE SHOW,\nTHAT I SPEND A LOT OF TIME RIGHT OVER\nTHERE, GATHERING THE FRESHEST,\nNEWSIEST HEADLINE FLOWERS,\nSCOURING THE FIELDS AND FORESTS\nFOR THE MOST TOPICAL AND\nFRAGRANT SYLVAN NEWS BOUGHS, THE\nJOKIEST FESTIVE GOURDS, AND THEN\nCAREFULLY ASSEMBLING AND\nARRANGING THEM ALL INTO THE\nGRAND YET TASTEFUL STATE\nDINNER-WORTHY CENTERPIECE THAT\nIS MY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES, NOW, SOMETIMES,\nI RUB SOME LEAD PAINT CHIPS ONTO\nMY GUMS, STAGGER INTO THE WOODS\nWITH NOTHING BUT A STAPLE GUN\nAND SOME EMPTY CANS OF SPRAY\nPAINT, AND THEN, BY THE LIGHT OF THE\nTIRE-FIRE, USING SMASHED LARVAE\nAND MY OWN SALIVA AS GLUE, I\nCOBBLE TOGETHER A CRUDE PILE OF\nPUNKY WOOD AND ANIMAL SKULLS TO\nPRESENT TO YOU THE UNHINGED\nLONER'S CORNUCOPIA OF NEWS THAT\nIS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "L-kR7UCzhTU": {
        "begin": "1:36.0",
        "end": "2:19.0",
        "text": "YOU KNOW, I SPEND A LOT OF TIME\nRIGHT OVER THERE HAND-RAISING\nAND SELECTING THE NEWEST,\nMOST-TOPICAL SEVILLE ORANGES,\nCAREFULLY SIMMERING THEM WITH\nTURBINADO SUGAR AND PRISTINE\nFILTERED WATER TO CREATE FOR YOU\nA DOUBLE-GOLD-MEDAL-WINNING\nBITTERSWEET ENGLISH MARMALADE TO\nSPREAD ON THE PERFECTLY TOASTED\nARTISANAL BRIOCHE THAT IS\nMY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES, FOLKS,\nI GRAB AN EXPIRED TUB OF\nWHIPPING CREAM, TOSS IT IN A\nBLENDER WITH THE HALF OF A\nGOGURT I FOUND IN A SCHOOLYARD,\nAND LET THAT FERMENT BEHIND THE\nFURNACE WHILE I FISH A DRIED\nPITA OUT OF THE GUTTER\nUNDERNEATH THE CORN-- CORNER KEBAB\nSTAND, THEN SLATHER IT WITH THE\nUNPRESSURIZED NIGHTMARE\nDAIRY OF NEWS THAT IS MY\nSEGMENT:\n\"MEANWHILE!\""
    },
    "Lf-LkJhKVhk": {
        "begin": "1:14.0",
        "end": "1:53.0",
        "text": "YOU KNOW, I SPEND A LOT OF TIME\nOVER THERE CAREFULLY ASSEMBLING\nTHE MOST TOPICAL VIRTUOSO WIND\nSECTION, TUNING THE VIOLAS,\nCELLOS, AND CONTRABASS TO THE\nCOUNTRY'S ZEITGEIST, AND WAVING\nMY CONDUCTOR'S BATON TO THE\nTEMPLE OF HUMOR TO PRESENT FOR\nYOU THE SPECTACULAR BRAHMS\nCONCERTO IN SATIRE MAJOR THAT IS\nMY MONOLOGUE.\nBUT SOMETIMES, I WAKE UP IN A\nFUGUE STATE BEHIND THE ABANDONED\nVIDEO STORE, STEAL A GARBAGE CAN\nLID AND A BROKEN UMBRELLA\nHANDLE, AND THEN GRAB AN EMPTY CAN\nOF P.B.R. I'VE BEEN USING AS AN\nASHTRAY TO FASHION A RUSTED\nKAZOO, ALL TO CREATE THE\n2-IN-THE-MORNING ONE-MAN-STOMP\nSHOW OF NEWS THAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "P72uFdrkaVA": {
        "begin": "0:49.3",
        "end": "1:48.3",
        "text": "FOLKS, IF YOU WATCH THE SHOW, YOU KNOW I SPEND A LOT OF\nTIME RIGHT OVER THERE, TENDERLY\nCLIPPING THE NEWSIEST, MOST\nFRAGRANT TEA-LEAF BUDS OF THE\nDAY, GINGERLY LAYING THEM TO DRY\nBY THE LIGHT OF THE\nROSY-FINGERED DAWN,\nPAINSTAKINGLY STEEPING THEM TO\nPERFECTION IN THE MOST TOPICAL\nOF FRESH WATER GATHERED FROM THE\nNATION'S NEWS RESERVOIR, BEFORE\nCEREMONIOUSLY SERVING TO YOU THE\nANTIOXIDANT-RICH ELIXIR OF\nBESPOKE TEA SHAN THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES FOLKS, AFTER\nA BENDER ON ABSINTHE AND\nMOUNTAIN DEW CODE RED, I SWEAT\nMYSELF AWAKE HOVERED OVER A VAT\nOF BLISTERING SEWER RUNOFF.\nI ADD TO THE GURGLING POT OF\nNIGHTMARES WHATEVER THE VOICES\nDEMAND: SCRAPS OF WET TRASHCAN\nLETTUCE AND BAND-AIDS FROM THE\nCOMMUNITY POOL.\nTHEN, USING AN OLD GYM SOCK I\nFOUND AT THE HOBOKEN Y.M.C.A., I\nSTRAIN IT ALL INTO A DISUSED\nGASOLINE CONTAINER TO OFFER YOU\nTHE SCALDING-HOT DEMENTED DEMON\nTONIC OF NEWS THAT IS MY\nSEGMENT: \"MEANWHILE!\""
    },
    "PT5_00Bld_8": {
        "begin": "0:57.0",
        "end": "2:02.0",
        "text": "FOLKS, YOU KNOW, I SPEND A LOT OF MY TIME,\nRIGHT OVER THERE, CRUISING THE\nVAST TSUKIJI FISH MARKET OF THE\nDAY'S BIGGEST STORIES, CAREFULLY\nSURVEYING THE FRESHEST, MOST\nTOPICAL CATCH OF THE DAY,\nCHOOSING ONLY THE HIGHEST GRADE\nAHI NEWS TUNA, AWABI, AND UNAGI,\nTHEN DELICATELY PICKING THROUGH\nTHE RICE GRAINS OF THE\nHEADLINES, GENTLY WRAPPING THE\nINGREDIENTS IN THE FRESHEST\nHAND-PICKED NORI, AND CAREFULLY\nLAYING IT ALL OUT TO PRESENT TO\nYOU THE YAMANAKA LACQUERED\nTHREE-TIER JUBAKO BENTO BOX THAT\nIS MY NIGHTLY MONOLOGUE.\nBUT -- YOU KNOW WHAT I'M SAYING.\nYOU KNOW WHAT I'M SAYING.\nYOU KNOW WHAT'S COMING.\nBUT SOMETIMES, JUST SOMETIMES FOLKS, I AM SLAPPED\nAWAKE BY A POSSUM ON A BED OF\nWET TIRES NEAR A LONG-ABANDONED\nWHARF, SELL MY LAST REMAINING\nADULT TEETH TO A VAGRANT\nFISHMONGER FOR AN EXPIRED SALMON\nCARCASS, AND GRIND THAT INTO A\nSOFT PASTE, THEN ROLL IT IN ENOUGH\nSTALE RICE KRISPIES TO MAKE\nSNAP, CRACKLE, AND POP TAKE A\nHARD LOOK IN THE MIRROR, AND SERVE\nYOU THE NIGHT-SCREAM INDUCING\nHOBO HAND-ROLL OF NEWS THAT IS\nMY SEGMENT:\n\"MEANWHILE!\""
    },
    "QPDZbNEhsuw": {
        "begin": "1:31.0",
        "end": "2:15.0",
        "text": "YOU KNOW, FOLKS, I SPEND MOST\nOF MY TIME RIGHT OVER THERE,\nDIGGING DOWN INTO THE NEWS\nPIT AND MINING FOR YOU THE\nDAY'S\nCLEAREST STORY DIAMONDS,\nCLEAVING THEM INTO THE MOST\nTOPICAL CUTS, FACETING THEM,\nPOLISHING THEM TO A HIGH\nFINISH,\nTHEN SETTING THEM ALL IN A\nDELICATE 24 KARAT GOLD CHAIN\nTO\nCREATE THE BESPOKE CARTIER\nNECKLACE THAT IS MY NIGHTLY MONOLOGUE,\nBUT SOMETIMES, SOMETIMES FOLKS,\nI JUST HUFF A LITTLE TOO MUCH\nEPOXY\nAND STUMBLE DOWN TO AN\nABANDONED PIER, WHERE I FIND\nA PIECE OF\nDISUSED FISHING LINE AND\nSTRING IT WITH OLD BOTTLE\nCAPS, RUSTY\nPADLOCKS, AND BABY TEETH,\nTHEN RIP THE SEAT BELT OUT\nOF A\nBURNED-OUT POLICE CAR TO\nMAKE A CLASP, AND PARADE\nNAKED THROUGH\nA CHI-CHI'S WEARING THE\nPSYCHO CHOKER OF NEWS THAT\nIS MY\nSEGMENT:\n\"MEANWHILE.\""
    },
    "QjQbQlN9Ev8": {
        "begin": "3:53.0",
        "end": "4:35.0",
        "text": "YOU KNOW, FOLKS, I SPEND A LOT OF\nMY TIME RIGHT OVER THERE,\nHANGING FOR YOU THE DAY'S\nHOTTEST, MOST TOPICAL NEWS\nDECORATIONS, BOOKING THE SEXIEST\nBAND, CURATING THE MOST RELEVANT\nDRINKS MENU, DISTRIBUTING\nPLAYFUL, YET TASTEFUL PARTY\nFAVORS, AND GLITTER HATS THAT\nSAY \"2022,\" AND THEN SETTING THE\nMOST AU-COURANT LIGHTING TO\nTHROW FOR YOU THE UPSCALE,\n\"PITCH PERFECT\" NEW YEAR'S EVE\nPARTY THAT IS MY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES,\nFOLKS, I WAKE UP IN THE RAFTERS\nAFTER LOSING A BET TO A CROW.\nAND THEN I SPIKE THE PUNCH BOWL WITH\nCHLOROFORM AND MILITARY-GRADE\nHELICOPTER LUBRICANTS, MAKE A\nBUNCH OF RESOLUTIONS TO QUIT\nHUFFING WD-40, AND PUNCH A\nPOLICE HORSE DURING THE FUGITIVE\nNEW YEAR'S HO-DOWN OF NEWS THAT\nIS MY SEGMENT:\nMEANWHILE!"
    },
    "R6JV_I36It8": {
        "begin": "0:38.0",
        "end": "1:18.0",
        "text": "YOU KNOW, FOLKS, I SPEND\nA LOT OF TIME\nSHUCKING FOR YOU THE DAY'S MOST\nTOPICAL CLAMS, BONING THE\nFINEST, MOST CURRENT NEWS\nCHICKENS, AND COLLECTING THE\nHIGHEST QUALITY STORY SHRIMP\nAND SAFFRON RICE, THEN GENTLY\nSIMMERING IT ALL IN A CAST-IRON\nCOMEDY PAI-YERA, TO CREATE\nTHE FRAGRANT SEAFOOD PAELLA THAT\nIS MY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES FOLKS, I\nSHAMBLE DOWN TO THE DOCKS WITH A\nRUSTY CROWBAR, MANEUVER THROUGH\nTHE POLLUTED CANAL USING A\nMCDONALD'S STRAW AS A SNORKEL,\nAND SCRAPE THE BARNACLES OFF A\nPASSING GARBAGE SCOW, TOSS THEM\nIN A POT WITH SOME HALF-USED\nRAMEN FLAVORED PACKETS AND\nMOUNTAIN DEW, TO BREW FOR YOU\nTHE CHUNKY STEW OF NEWS THAT IS\nMY SEGMENT:\n\"MEANWHILE!\""
    },
    "RFVggCw58lo": {
        "begin": "1:00.0",
        "end": "1:43.0",
        "text": "YOU KNOW FOLKS, I SPEND A LOT OF TIME\nSTANDING RIGHT OVER THERE, PAINSTAKINGLY\nPENCILING THE DAY'S MOST TOPICAL\nAND HEROIC STORIES, STAGING THEM\nIN METICULOUSLY PLANNED PANELS,\nHAND-INKING THEM WITH THE\nPITHIEST DIALOGUE, THEN COLOR\nBALANCING THE FINISHED PAGES TO\nCREATE FOR YOU THE GENRE-BENDING\nONCE-IN-A-GENERATION GRAPHIC\nNOVEL THAT IS MY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES FOLKS, I JUST TEAR A PAGE OUT\nOF A WET NEWSPAPER I PEELED OFF\nA SUBWAY SEAT, GRAB A BROKEN\nCRAYON I FOUND IN MY COUCH,\nCRUDELY DRAW SOME FILTHY\nCARTOONS, SCRIBBLE IN AS\nMANY CURSE WORDS AS I CAN IN THE\nPORNOGRAPHIC DRIFTER ZINE OF\nNEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "RHDQpOVLKeM": {
        "begin": "0:39.0",
        "end": "2:11.0",
        "text": "YOU KNOW, FOLKS, I\nSPEND MOST OF MY TIME, RIGHT\nOVER THERE, PORING OVER THE\nDAY'S BIGGEST STORIES,\nCOLLECTING THE FINEST,\nMOST-TOPICAL NEWS CALFSKINS AND\nPAINSTAKINGLY WASHING THEM IN A\nCALCIUM HYDROXIDE SOLUTION, THEN\nSOAKING THEM IN LIME FOR DAYS TO\nREMOVE ALL NARRATIVE IMPURITIES\nAND CREATE A PALE VELLUM THAT I\nLATER PLACE ON MY SCRIPTORIUM IN\nA MONASTERY ON THE CLIFFS OF\nDOVER.\nTHERE, USING A PEN CUT FROM THE\nWING FEATHER OF A SWAN OF THE\nRIVER AVON, I DESIGN COPTIC AND\nSYRIAC ILLUSTRATIONS, ADORNED\nWITH WHIMSICAL CELTIC SPIRALS\nAND USE GERMANIC ZOOMORPHIC\nDESIGNS TO CREATE THE MARGINALIA\nSURROUNDING THE PAGES OF ELEGANT\nHALF-UNCIAL INSULAR SCRIPT THAT\nTELL THE HOLIEST OF STORIES,\nWHICH I THEN BIND WITH GOLDEN\nTHREAD UNDER A PROTECTIVE CASE\nOF CARVED OAK TO CREATE FOR YOU\nTHE GLORIOUS LATE\nANGLO-SAXON PERIOD ILLUMINATED\nMANUSCRIPT THAT IS MY MONOLOGUE.\nBUT SOMETIMES, FOLKS, SOMETIMES,\nTHESE PEOPLE KNOW, THEY KNOW, THEY KNOW\nBUT SOMETIMES, FOLKS,\nI COME TO UNDER A RAMP IN THE\nMIDDLE OF A DEMOLITION DERBY,\nHOTWIRE THE TRUCKASAURUS\nAND LEAD THE POLICE ON A CHASE\nBEFORE CRASHING INTO A SWAMP\nGATHERING JUST AS, WHAT I ASSUME\nIS A PRIEST SAYS, \"YOU MAY KISS\nTHE BRIDE,\" RIP A LEECH OFF MY\nASS, AND USE IT TO HASTILY\nDOODLE A SKETCH OF THE SCENE IN\nMY OWN BLOOD ON AN OLD DAVE AND\nBUSTERS RECEIPT, THEN STAGGER\nTOWARD THE HAPPY COUPLE\nCLUTCHING THE NIGHTMARE\nSTALKER'S WEDDING ALBUM OF NEWS\nTHAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "TZSw9iRk03E": {
        "begin": "0:22.5",
        "end": "1:08.5",
        "text": "FOLKS, YOU KNOW I\nSPEND A LOT\nOF RIGHT TIME OVER THERE, CAREFULLY\nSTUDYING THE LATEST, NEWSIEST\nCLINICAL STUDIES, PRACTICING AND\nTRAINING UNDER THE BEST, MOST\nTOPICAL DOCTORS, CAREFULLY\nSTERILIZING ALL MY EQUIPMENT,\nAND ASSEMBLING THE WORLD'S GREATEST\nSURGICAL TEAM TO PERFORM FOR YOU THE\nDAZZLINGLY COMPLEX AND\nGROUNDBREAKING THORACIC AORTIC\nDISSECTION REPAIR THAT IS MY\nMONOLOGUE.\nBUT SOMETIMES, SOMETIMES,\nI GET KICKED OUT OF MY HEALTH CARE\nPLAN FOR LISTING MY DOG BENNY AS\nA GASTROENTEROLOGIST, SO I\nSTIPPLE SOME INCISION MARKS ON\nMY ABDOMEN WITH A DRIED-OUT\nSHARPIE, SLAM A COUPLE OF RED BULLS\nIN FRONT OF A SHATTERED MIRROR,\nAND FISH A RUSTY BONING KNIFE\nOUT OF A STOLEN CRAB BOAT TO\nPERFORM THE EXPLORATORY HOBO\nAPPENDECTOMY OF NEWS THAT IS MY\nSEGMENT:\n\"MEANWHILE!\""
    },
    "VV3UJmb8kHw": {
        "begin": "2:04.0",
        "end": "2:54.0",
        "text": "YOU KNOW, FOLKS,\nIF YOU WATCH THE SHOW, YOU KNOW\nI SPEND A LOT OF MY TIME RIGHT\nOVER THERE.\nCAREFULLY COMBING THROUGH THE\nBIGGEST STORIES OF THE DAY,\nSOURCING FOR YOU THE NEWSIEST\nMIKADO ORGANZA IN A HIGH SHEEN,\nADDING THE MOST TOPICAL IVORY\nFEATHER FRINGE AND A DIPPED\nBACK, THEN THROWING ON A DEMURE\nBUT KICKY FLORAL EMBROIDERED\nTULLE SHRUG WITH STATEMENT PEARL\nACCENTS TO PRESENT TO YOU THE\nGLORIOUS \"VOGUE\" COVER-READY\nWEDDING GOWN THAT IS MY\nMONOLOGUE.\nBUT SOMETIMES, WHILE ON A\nGLUE-HUFFING BINGE, I CRASH A\nSTOLEN HEARSE INTO AN ABANDONED\nCHILDREN'S HOSPITAL WHERE I USE\nMY TEETH TO TEAR UP SOME OLD\nCURTAINS AND STAINED CARPETING,\nAND STEAL A BUTTON OFF AN OLD\nSURGICAL APRON, AND STITCH IT\nALL TOGETHER WITH A NEEDLE MADE\nFROM A CHICKEN BONE TO THROW\nTOGETHER THE SHRIEKING CAT\nLADY'S SACK DRESS OF NEWS THAT\nIS MY SEGMENT: \"MEANWHILE.\""
    },
    "VYVbTzoggKc": {
        "begin": "0:00.0",
        "end": "0:49.0",
        "text": "FOLKS, YOU KNOW, I\nSPEND A LOT OF\nMY TIME ON THE SHOW-- IF YOU\nWATCH THE SHOW YOU'D FIGURE\nTHIS OUT,\nRIGHT OVER THERE, STANDING RIGHT OVER THERE IN THE MONOLOGUE SPELUNKING\nTHROUGH THE DAY'S STORIES TO\nSELECT AND SOURCE THE\nNEWSIEST MARBLE, CHISELING\nIT INTO A\nPEDESTAL OF HUMOR AS WIDE AS\nTWO GREEK ISLES.\nTHEN I CAST THE MOST TOPICAL\nCURRENT-EVENTS-BRONZE INTO A\nFINELY CRAFTED MOULD TO\nERECT FOR YOU THE TOWERING\nGRECIAN\nCOLOSSUS THAT IS MY NIGHTLY\nMONOLOGUE.\nBUT SOMETIMES, JUST\nSOMETIMES, FOLKS, I JOLT\nAWAKE INSIDE\nWHAT'S LEFT OF A RUSTED\nMAZDA MIATA IN A WHITE CLAW\nAND\nOVEN-CLEANER-INDUCED FUGUE\nSTATE, SHAMBLE THROUGH THE\nJUNKYARD, RANSACKING THE\nDEBRIS FOR OLD FISHING RODS,\nMELTED\nBATTERIES AND THE SHOVEL OF\nA DERELICT BACKHOE, AND THEN\nBOOST AN\nACETYLENE TORCH TO HASTILY\nWELD TOGETHER THE BOOTLEG\nTRUCKASAURUS OF NEWS THAT\nIS MY SEGMENT:\nMEANWHILE!"
    },
    "WWWeV8xVNtI": {
        "begin": "2:00.0",
        "end": "2:35.0",
        "text": "YOU KNOW, FOLKS, I SPENT A LOT\nOF TIME CAREFULLY RESEARCHING\nTHE DAY'S MOST CULTURALLY\nPRECIOUS STORIES,\nCROSS-REFERENCING HISTORICAL\nACCOUNTS WITH TOPOGRAPHICAL\nMAPS, AND ASSEMBLING THE FINEST\nTEAM OF ARCHAEOLOGISTS TO\nUNEARTH THE UNESCO WORLD\nHERITAGE EXCAVATION SITE OF\nHUMOR THAT IS MY MONOLOGUE.\nBUT SOMETIMES, I DISTRACT AN\nORPHAN WITH A PIECE OF\nLINT-COVERED CANDY AND STEAL\nTHEIR BUCKET AND PAIL, THEN\nSNEAK INTO THE POTTER'S FIELD IN\nTHE DEAD OF NIGHT WITH TWO\nDRIFTERS I PICKED UP ON THE\nCOMMUTER TRAIN, AND FORCE THEM\nTO DIG FOR THE ABANDONED\nPAUPER'S GRAVE\nOF NEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "XzJAtzdrY_w": {
        "begin": "2:57.0",
        "end": "4:32.0",
        "text": "FOLKS, IF YOU WATCH THIS SHOW, YOU KNOW I SPEND MOST\nOF MY TIME, RIGHT OVER THERE,\nCAREFULLY COMBING THE NEWS\nLANDSCAPE AND HARVESTING THE\nFINEST, MOST BEAUTIFUL STORY\nPIGMENTS LIKE MALACHITE,\nAZURITE, AND CINNABAR, WHICH I\nSLOWLY GRIND UNDER A GLASS\nOF MULLER WITH ONLY THE MOST\nTOPICAL LINSEED OIL, WORKING\nTHEM INTO SMOOTH, BUTTERY\nVERMILLIONS, VERDIGRIS, AND NEW\nGAMBOGES, WHICH I THEN APPLY TO\nA GRISAILLE PREPARED ON A CANVAS\nOF FLAX, TOW, AND JUTE, SLOWLY\nWORKING UP THE SHADOW\nSHAPES AND MAJOR MASSES, THEN\nDELICATELY RENDERING THE\nINTERPLAY OF LIGHT AND FORM,\nBEFORE APPLYING THE FINE DAMMAR\nAND MASTIC VARNISH TO UNVEIL FOR\nYOU THE GLORIOUS REMBRANDT\nPORTRAIT OF THE DAY'S EVENTS\nTHAT IS MY NIGHTLY MONOLOGUE.\nBUT SOMETIMES--\nSOMETIMES, FOLKS, SOMETIMES\nI'M SHAKEN AWAKE INSIDE THE\nDARKENED TRUNK OF A BULGARIAN\nMOBSTER'S VOLVO 940, I QUIETLY\nRELEASE THE SAFETY CATCH AND\nTUMBLE ONTO THE SIDE OF A DIRT\nROAD, BREAKING BOTH CLAVICLES,\nWHICH I DO NOT FEEL BECAUSE OF\nALL THE ANGEL DUST.\nI STAGGER INTO AN ABANDONED\nTANNERY WHERE I BEFRIEND AN OWL\nWHO TELLS ME TO I HAVE TO LET\nHIM SPEAK THROUGH ME OR HE'LL\nMURDER THE CLOUDS.\nAND IN HIS DIRECTION, I MIX THE\nFUN DIP I FOUND IN MY POCKET\nWITH THE FISTFULS OF HEXAVALENT\nCHROMIUM I SCOOP UP FROM THE\nDISUSED TANNING PITS, THEN HURL\nIT AT THE SIDE OF A NEARBY\nDEFUNCT DAIRY QUEEN IN A FUGUE\nSTATE OF LASHING OUT AT LIGHT\nAND COLOR, TO UNLEASH FOR YOU\nTHE ABSTRACT EXPRESSIONIST\nSPLATTER FRESCO OF NEWS THAT IS\nMY SEGMENT:\nMEANWHILE!"
    },
    "YyV6l8HPmdQ": {
        "begin": "0:48.0",
        "end": "1:32.0",
        "text": "FOLKS, LADIES AND\nGENTLEMEN, YOU KNOW I SPEND A\nLOT\nOF TIME DELICATELY WHITTLING A\nMELANGE OF THE DAY'S MOST\nPRESSING STORY TIMBERS,\nPRECISELY MEASURING THE NECKS,\nRIBS, AND BACKS OF THE NEWS,\nEMPLOYING ONLY THE MOST\nSOPHISTICATED AND TOPICAL\nPURFLING, THEN LAYING 15\nEXQUISITE COATS OF INSIGHT ONTO\nTHE ORNATE YET ROBUST\nSTRADIVARIUS VIOLIN THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES, FOLKS,\nI GATHER UP FRAYED ELECTRICAL\nWIRE FROM A BURNT-OUT BOWLING\nALLEY, TAPE IT TO A\nTERMITE-INFESTED 2-by-4, THEN SHOVE\nONE END TO A DISCARDED CHUM\nBUCKET TO MAKE FOR YOU THE APPALACHIAN\nDRIFTER'S BANJO OF NEWS THAT IS\nMY SEGMENT:\n\"MEANWHILE!\""
    },
    "a8DD__mRtPk": {
        "begin": "1:13.0",
        "end": "1:58.0",
        "text": "FOLKS, YOU KNOW, I SPEND MOST OF\nMY TIME GATHERING FOR YOU THE LATEST,\nMOST CUTTING-EDGE NEWS STORIES,\nCAREFULLY EXAMINING THE DAY'S\nCT SCAN, THEN ASSEMBLING\nAMERICA'S CRACK MEDICAL TEAM,\nAND MAKING PRECISE INCISIONS\nWITH THE AID OF A STRYKER 1588\nLAPAROSCOPE IN THE\nGROUNDBREAKING SURGICAL ARTISTRY\nTHAT IS MY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES,\nFOLKS, WHEN I NEED A LITTLE EXTRA CASH TO\nPAY OFF MY COCK-FIGHTING DEBTS,\nI SET UP A RUSTY COT UNDER A\nTARP IN WASHINGTON SQUARE PARK,\nWHERE I PLY CURIOUS PASSERSBY\nWITH BATHTUB COUGH SYRUP TO HELP\nDULL THE PAIN WHILE I USE GARDEN\nSHEARS TO CUT OUT ANYTHING THAT\nLOOKS SUPERFLUOUS IN THE AMATEUR\nAPPENDECTOMY TENT OF NEWS\nTHAT IS MY SEGMENT...\nMEANWHILE!"
    },
    "cHhomJMwY1I": {
        "begin": "2:42.0",
        "end": "3:24.0",
        "text": "FOLKS, I SPEND A LOT OF TIME\nSTANDING RIGHT OVER THERE,\nCOMBING THROUGH HOURS UPON HOURS\nOF GAME TAPE ON THE MOST PROMISING\nHEADLINES, METICULOUSLY CRAFTING\nMY BIG BOARD TO RANK STORIES\nBASED ON THEIR RAW TALENT AND\nINTANGIBLES, AND CUT DEALS FOR\nTHE MOST TOPICAL TRADES TO DRAFT\nTHE ONCE-IN-A-GENERATION,\nHEISMAN-WINNING QUARTERBACK THAT\nIS MY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES I\nWAKE UP IN AN ICE BATH AFTER\nDOING RAILS OF GATORADE POWDER,\nREALIZE IT'S DRAFT DAY, AND I\nHAVE 15 SECONDS LEFT TO MAKE A\nCHOICE AND BLURT OUT THE FIRST\nNAME I SEE TO WASTE THE NUMBER\nONE OVERALL PICK ON THE SCRAWNY,\nUNPOLISHED THIRD-STRING PUNTER\nOF NEWS THAT IS MY SEGMENT,\n\"MEANWHILE!\""
    },
    "hhwTiwUAaf8": {
        "begin": "1:06.5",
        "end": "1:54.5",
        "text": "YOU KNOW, FOLKS, I SPEND MOST\nOF MY TIME SOURCING FOR YOU THE DAY'S\nFINEST HANGZHOU SILK NEWS\nSTORIES, MOUNTING THEM ON THE\nMOST TOPICAL, PREMIUM,\nPOLYHEDRAL BAMBOO JOKE FRAME,\nDECORATING IT WITH ARTISANAL ASH\nINK, INSERTING A HAND-POURED\nBEESWAX CANDLE, FILLING IT WITH\nINTENTION AND THEN SENDING IT ALOFT\nON THE UPDRAFT OF AUDIENCE\nLAUGHTER IN THE SPECTACULAR\nCHINESE LANTERN FESTIVAL THAT IS\nMY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES, FOLKS, I GO VISIT MY\nBUDDY, BARRACUDA AT THE\nABANDONED MALL, DROP A COUPLE OF\nHUNDOS ON SOME WET ROMAN\nCANDLES, BENT SPARKLERS SMUGGLED\nIN FROM THE PHILIPPINES, AND A\nFLARE GUN STOLEN FROM A CRASHED\nCOAST GUARD BOAT, SET IT\nALL OFF IN THE DERANGED,\nUNREGULATED FIREWORKS ACCIDENT\nOF NEWS THAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "iB6diOGE8y4": {
        "begin": "0:52.8",
        "end": "1:50.8",
        "text": "YOU KNOW, IF YOU WATCH THE SHOW, YOU KNOW I SPEND MOST OF MY TIME\nRIGHT OVER THERE, COMBING\nTHROUGH THE DAY'S BIGGEST NEWS,\nAND SELECTING FOR YOU THE\nFINEST, MOST TOPICAL INDIAN\nROSEWOOD, SPRUCE, AND MAHOGANY\nSTORIES.\nI THEN HAND-SHAPE AND COMBINE\nTHEM WITH AN ABALONE\nMULTI-STRIPE BACK INLAY, AND\nFORWARD-SHIFTED SCALLOPED\nBRACES, ANTIQUE WHITE BINDING,\nAND A HIGH-PERFORMANCE NECK WITH\nA HEXAGON FINGERBOARD, AND\nFINALLY LAY IN A\nTORTOISE-PATTERN PEARL PICK\nGUARD, AND A COMPENSATED\nBONE SADDLE, TO CRAFT FOR YOU\nTHE EXQUISITE MARTIN D-45\nDREADNOUGHT ACOUSTIC GUITAR\nTHAT IS MY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES, FOLKS, I SNAP\nAWAKE IN A RUSTY COFFIN FREEZER\nBEHIND AN ABANDONED DAIRY QUEEN\nOUTSIDE OF GALVESTON.\nTHEN I NAIL A 2-BY-4 TO A CEDAR URN\nI STOLE FROM A FUNERAL PARLOR, STRING\nON SOME BRAKE CABLES I RIPPED\nOUT OF A COP CAR, THEN CUT EYE\nHOLES IN A GOODWILL BAG FOR A MASK,\nHIT A WHIPPET, AND TERRORIZE\nTHE LOCALS ON THE TEXAS CHAINSAW\nBANJO OF NEWS THAT IS MY\nSEGMENT:\n\"MEANWHILE!\""
    },
    "iQFwGF0aW-o": {
        "begin": "2:10.7",
        "end": "3:33.7",
        "text": "I SPEND A LOT OF MY TIME, RIGHT\nOVER THERE, CULTIVATING FOR YOU THE\nDAY'S BIGGEST STORIES,\nPLUCKING THE MOST BEAUTIFUL AND\nTOPICAL NEWS VIOLETS AND\nMARIGOLDS, STRIPPING THE FRENCH\nLAVENDER FROM THE STEM AND\nLOVINGLY PRESSING THEM ALL\nBETWEEN THE PAGES OF A GILDED\nFIRST EDITION OF \"PRIDE AND\nPREJUDICE.\"\nTHEN I FOLD THEM INTO A DOUGH I\nHAND ROLLED FROM PASINI BAKERY\nFLOUR, BORDIER BUTTER, AND\nCHILLED SOMERDALE DEVON DOUBLE\nCREAM, SPRINKLE THEM WITH A\nPINCH OF MUSCOVADO SUGAR, AND\nBAKE THEM IN A \"LA CORNUE GRAND\nPALAIS\" RANGE TO PERFECTLY PREP\nTHE GOURMET FLORAL SHORTBREAD\nCOOKIE THAT IS MY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, FOLKS, SOMETIMES,\nSOMETIMES, I AM NIBBLED AWAKE BY AN AMOROUS\nRACCOON IN THE ABANDONED WALK-IN\nFREEZER OF A HAUNTED BAKERY IN\nWHICH I HAVE ESTABLISHED\nSQUATTER'S RIGHTS, I SLIP INTO\nTHE TWISTED KITCHENAID PADDLES I\nCALL SHOES, AND KNIFE FIGHT A\nPOSSUM FOR AN EXPIRED BAG OF\nCRUSHED BREAKFAST CEREAL DUST\nAND A BROKEN EGG, WHICH I MIX\nWITH THREE SMUSHED RESTAURANT\nBUTTER PACKETS I STOLE FROM A\nNAPPING RETIREE'S PURSE, POUR\nTHE REST OF A SHATTERED BOTTLE\nOF RUBBING ALCOHOL I FOUND IN\nTHE DUMPSTER OUT BACK INTO A\nRUSTY BARREL TO IGNITE THE HOBO\nFIRE OVER WHICH I BAKE MY\nSLUDGE, THEN DISPLAY IT IN A\nFILTHY CHEF'S HAT TO SERVE YOU\nTHE DERANGED RAT KING BISCUIT OF\nNEWS THAT IS MY SEGMENT:\n\"MEANWHILE\"!"
    },
    "jIL7kvG7d10": {
        "begin": "1:55.3",
        "end": "2:40.3",
        "text": "FOLKS, YOU KNOW, I SPEND A LOT\nOF TIME, RIGHT OVER THERE\nCAREFULLY PLANTING AND TENDING\nTO THE DAY'S BIGGEST, MOST\nIMPORTANT STORIES, TRIMMING\nTHE TOPICAL HEDGES WITH DELICATE\nEXACTITUDE, RESEARCHING AND\nSEEDING THE SOIL OF TODAY'S\nNEWS IN ORDER TO YIELD THE MOST\nBEAUTIFUL, FRAGRANT JOKE\nFLOWERS, AND PRECISELY TIMING\nTHE BLOOM IN THE EXQUISITE\nENGLISH GARDEN THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES FOLKS, I DRIVE A 2003\nPONTIAC SUNFIRE THROUGH A HOME\nDEPOT GARDEN CENTER, DOWN A JUG\nOF MIRACLE GRO, SMEAR MY BODY IN\nMUD AND PEA GRAVEL, BUILD A FORT\nOUT OF PAVERS, PLOP A SUCCULENT\nDISH GARDEN ON MY HEAD, AND\nBARRICADE MYSELF INSIDE A\nPORTABLE TOOL SHED TO CREATE THE\nPARANOID BACKYARD STANDOFF OF\nNEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "jpq8eXZcvpo": {
        "begin": "0:52.5",
        "end": "1:37.5",
        "text": "FOLKS, YOU KNOW, I SPEND A LOT OF MY TIME\nRIGHT OVER THERE, SORTING\nTHROUGH THE DAY'S TOP STORIES,\nCAREFULLY SELECTING FOR YOU THE\nFRESHEST, MOST TOPICAL\nNEWS-FRUIT, ARTFULLY CARVING IT\nINTO SATIRICAL SHAPES, DIPPING\nIT IN THE FINEST ARTISANAL\nCHOCOLATE, AND GENTLY PLACING THEM\nINTO THE FLAWLESSLY COMPOSED AND\nDELICIOUS EDIBLE ARRANGEMENT\nTHAT IS MY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES,\nFOLKS, I WAKE UP FACE DOWN IN\nTHE RECYCLING BIN BEHIND A JAMBA\nJUICE, FIGHT A SEAGULL FOR THE\nDISCARDED CANTALOUPE RINDS\nAND PINEAPPLE STEMS, DIP THEM\nINTO A BUCKET OF DIESEL SIPHONED OFF\nFROM A SEMI FULL OF UNWASHED\nBIRD BONES, WHICH I USE TO\nSKEWER TOGETHER MY GARBAGE\nKEBABS, THEN STAB THEM ONTO A\nWATERLOGGED TEDDY BEAR TO CREATE\nTHE CRIMINALLY INSANE NIGHTMARE\nGIFT BASKET OF NEWS THAT IS MY\nSEGMENT:\n\"MEANWHILE!\""
    },
    "jq2LhJ9rMpg": {
        "begin": "2:52.0",
        "end": "3:34.0",
        "text": "YOU KNOW, I SPEND A LOT OF TIME\nRIGHT OVER THERE, PULLING\nTOGETHER THE FINEST, NEWSIEST\nLIMESTONE, CHISELING IN THE MOST\nDELICATE AND TOPICAL OF\nBAS-RELIEF, AND THE MOST ORNATE\nARCHES, MAKING SURE THERE'S NARY\nA BAD SEAT IN THE HOUSE, THEN\nASSEMBLING THE MOST FEARSOME\nNEWS WARRIORS THE ARENA HAS EVER\nSEEN TO CONSTRUCT FOR YOU THE\nROMAN COLOSSEUM THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES FOLKS, I WAKE UP MY\nNEIGHBOR AT 3:00 IN THE MORNING,\nDRAG HIM INTO MY SHED, WHERE\nI'VE SET UP A KIDDIE POOL I\nBOUGHT 20 YEARS AGO AND FILLED\nWITH EXPIRED JELL-O AND LUKEWARM\nBEER, HUFF SOME ACETONE OUT OF A\nPRICE CHOPPER BAG, THEN\nCHALLENGE HIM TO JOIN ME IN THE\nOLD MAN WRESTLING LEAGUE OF\nNEWS THAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "lWyia3aF92o": {
        "begin": "1:44.0",
        "end": "2:30.0",
        "text": "FRIENDS, EVERY NIGHT I STAND\nRIGHT OVER THERE, AND I CAREFULLY WORK ON THE LIGHTING\nAND STAGING OF THE DAY'S MOST\nTOPICAL NEWS STORIES, COMPOSING\nGROUNDBREAKING ORCHESTRAL\nARRANGEMENTS TO SUPPORT THEM,\nAND THEN METICULOUSLY CHOREOGRAPHING\nTHEM AND MYSELF INTO A DELICATE,\nHEARTBREAKING, AND YET UPLIFTING\nPAS DE DEUX, TO PRESENT FOR YOU\nTHE EPOCH-DEFINING BALLET THAT\nIS MY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES FOLKS, I FISH A STAINED\nVELOUR JUMPSUIT OUT OF A CANAL,\nHOOK A RADIO I RIPPED OUT OF A\nGARBAGE TRUCK TO AN ABANDONED\nCAR BATTERY, AND SLAP THE DIAL\nTHROUGH FUZZ TILL IT LANDS ON A\nRANDOM A.M. OLDIES STATION, AND THEN\nSHAKE MY ASS FOR NICKELS IN THE\nDEMENTED VAGRANT MACARENA OF\nNEWS OF THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "ldTzn1RpsNY": {
        "begin": "1:00.0",
        "end": "1:48.0",
        "text": "FOLKS, I SPEND A LOT OF TIME\nRIGHT OVER THERE, COMPILING THE\nMOST CURRENT GEOMETRY QUESTIONS,\nSPRINKLING IN A TOPICAL SET OF\nDATA ANALYSES, FOLDING THEM\nTOGETHER ALONG THE NEWSWORTHIEST\nWORD PROBLEMS, THEN PAIRING ALL\nOF THAT WITH THE DAY'S MOST\nPRESSING READING PASSAGES TO\nCOLLATE FOR YOU THE PERFECTLY\nCALIBRATED, BESPOKE S.A.T. TEST\nTHAT IS MY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES,\nI HUFF A PILE OF SALVIA AND\nSTAGGER INTO A LANDFILL WHERE I\nFORAGE FOR CRUSTY OLD SUDOKUS,\nGRAB A SACKFUL OF USED AND WET\nMADLIBS, AND CRAZY-GLUE THEM INTO\nTHE SPINE OF A DISCARDED\n\"READER'S DIGEST\" I FOUND IN A\nBURNT-OUT WALDENBOOKS, TO\nPRESENT TO YOU THE ILLEGIBLE\nHOBO BUZZFEED QUIZ OF NEWS THAT\nIS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "lgH-itFA_hg": {
        "begin": "1:53.2",
        "end": "2:33.2",
        "text": "FOLKS, I SPENT A LOT OF TIME\nSTANDING RIGHT OVER THERE, OKAY,\nSETTING\nUP MY NEWS EASEL, LAYING OUT THE\nMOST TOPICAL BRUSH STROKES,\nCHOOSING THE MOST RELEVANT\nCOLORS, ALL TO FAITHFULLY\nCAPTURE FOR YOU, THE SOUL OF THE\nSTORIES OF THE DAY IN THE\nOIL-ON-CANVAS MASTERPIECE THAT\nIS MY NIGHTLY MONOLOGUE.\nBUT SOMETIMES -- JUST SOMETIMES,\nFOLKS -- I SET A LIQUOR\nSTORE ON FIRE AND COME BACK THE\nNEXT DAY TO SCRAPE SOME CHARCOAL\nOFF THE BURNT TIMBERS, USE THE\nCARDBOARD FROM THE DISCARDED\nREFRIGERATOR BOX I'VE BEEN\nCALLING HOME FOR THE WEEKEND,\nTHEN HARASS TOURISTS TO ETCH THE\nOFFENSIVE BOARDWALK CARICATURE\nOF NEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "ll5DeZrejsM": {
        "begin": "2:02.7",
        "end": "2:52.7",
        "text": "YOU KNOW, FOLKS,\nIF YOU WATCH THE SHOW, YOU KNOW\nI SPEND A LOT OF TIME RIGHT OVER\nTHERE,  COMBING THROUGH THE\nDAY'S BIG STORIES, SELECTING THE\nFINEST NEWS TENORS AND THE\nSILKIEST SOPRANOS.\nTHEN, I BRUSH UP ON THE WORKS OF\nCERVANTES AND FIND THE PERFECT\nSWEET SPOT BETWEEN DRAMA, HUMOR,\nAND OPERA TO COMPOSE FOR YOU THE\nTIMELESS AND SEDUCTIVE SPANISH\nZARZUELA THAT IS MY\nMONOLOGUE.\nBUT SOMETIMES, SOMETIMES, FOLKS,\nI WAKE UP IN THE FREEZER OF A\nCOMBINATION TACO BELL PIZZA\nHUT ON THE EXPRESSWAY, AND I CUT A\nPAIR OF LEG HOLES INTO A POTATO\nSACK AND RACE BAREFOOT INTO THE\nCITY TO BREAK INTO AN ABANDONED\nDOLLAR STORE, WHERE I FASHION A\nPAIR OF CASTANETS FROM DEFECTIVE\nCHATTERING TEETH TOYS.\nTHEN I DOWN A JERRY CAN FULL OF\nRED BULL AND COUGH MEDICINE\nBEFORE STAGGERING INTO A PUBLIC\nPARK TO DISTURB TOURISTS WITH\nTHE DRIFTER'S FLAMENCO SHOWCASE\nOF NEWS THAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "lzviJMlii7A": {
        "begin": "1:36.5",
        "end": "3:21.5",
        "text": "LADIES AND GENTLEMEN, YOU KNOW, IF YOU\nWATCH THIS SHOW, YOU KNOW I\nSPEND A LOT OF MY TIME RIGHT OVER\nTHERE, METICULOUSLY SIFTING\nTHROUGH THE DAILY NEWS\nDESERT, HARVESTING THE FINEST,\nMOST TOPICAL MINERAL SANDS--\nABOUT 65% SILICA, 10% FLUX\nOF SODIUM OXIDE, AND A\nSTABILIZER OF CALCIUM OXIDE--\nWHICH I THEN SMELT IN A\nHIGH-TEMPERATURE CALCERA FURNACE\nAT 1,200 TO 1,400 DEGREES\nCELSIUS, FUSING THEM INTO LIQUID\nGLASS, THEN CAREFULLY DROPPING\nMY FURNACE TEMPERATURE SO I CAN\nFOLD IN HAND-SELECTED CULLET AND\nCOBALT TO OBTAIN MY INTENDED\nCOLOR AND CREATE THE MOST\nPRISMATIC NEWS CRYSTALS, WHICH\nI THEN DELICATELY HANG ON A\nHAND-CRAFTED BALUSTER OF\nHEADLINES, ARRANGING THEM TO\nCATCH AND REFRACT, IN A GENTLE\nDANCE OF LIGHTS AND SHADOWS, THE\nMOST TOPICAL REFLECTIONS OF THE\nDAY, ADORNING THE ARRANGEMENT\nWITH ONE FINAL FINIAL OF QUIPS\nTO PRESENT TO YOU THE VENETIAN\nMURANO GLASS CHANDELIER THAT IS\nMY MONOLOGUE.\nBUT SOMETIMES FOLKS, SOMETIMES,\nFOLKS, SOMETIMES, I JOLT AWAKE\nNAKED IN THE BACK BOOTH OF A\nLONG-ABANDONED COUNTY FAIR, I\nPULL ON SOME OVERALLS I STOLE\nOFF A SCARECROW, AND CLAW\nTHROUGH THE GROUNDS SCRAPING THE\nLEAVINGS OFF SOME DISUSED\nSPARKLERS, PICK THROUGH BROKEN\nCOKE BOTTLES AND BIRTHDAY\nCANDLES, BOIL OFF THE REMNANTS\nIN A DISCARDED TUB OF LAUNDRY\nDETERGENT TO EXTRACT THE\nBENZENE.\nTHEN, USING THE SHOELACES I TOOK\nOFF A HOBO SLEEPING UNDER\nTHE FERRIS WHEEL AND DENTAL\nFLOSS CURRENTLY IN USE BY SAID\nHOBO, I BIND THE CONGLOMERATE\nOF SHARDS AND ACCELERANT\nTOGETHER AND HOLD IT NEAR THE\nSPUTTERING SPARK PLUG OF AN\nOLD ICE CREAM TRUCK TO IGNITE\nTHE CHAOTIC CANDELABRA OF\nFLAMING NEWS THAT IS MY SEGMENT:\n\"MEANWHILE\"!"
    },
    "okJDGV6Jjmc": {
        "begin": "2:03.0",
        "end": "2:55.0",
        "text": "YOU KNOW, FOLKS, I SPEND A LOT\nOF TIME RIGHT OVER THERE, PORING\nOVER THE DAY'S NEWSIEST, MOST\nTOPICAL NAUTICAL RECORDS TO\nDETERMINE THE ROUGH POSITIONS OF\nTHE DAY'S TRENDING SHIPWRECKS.\nTHEN I USE THE LATEST SONAR TECH\nTO LOCATE AND FIND THE\nORIENTATION OF THE FINEST\nSALVAGE SITE.\nTHEN MY TEAM OF CERTIFIED AND\nLICENSED COMEDY DIVERS DESCEND\nTO THE OCEAN FLOOR AND USE\nCUTTING EDGE UNDERWATER CAMERAS\nTO STITCH TOGETHER THE DETAILED\n3D-MODELED HISTORIC ATOSHA\nSHIPWRECK SITE OF SATIRICAL\nOBSERVATIONS THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES, FOLKS,\nI WAKE UP NAKED BY THE DOCKS\nCOVERED IN PIRATE TATTOOS, BLAST\nA STRING OF WHIPPETS, THEN\nSTAGGER INTO THE FESTERING\nHUDSON RIVER, WHERE I SLOWLY\nSINK THROUGH THE MURK UNTIL I\nIMPALE MYSELF ON THE RUSTY AXLE\nOF A SUNKEN TAXI IN THE\nTETANUS-LACED CRIME SCENE OF\nNEWS THAT IS MY SEGMENT...\n\"MEANWHILE!\""
    },
    "pbR-kF0PjlA": {
        "begin": "1:09.0",
        "end": "2:43.0",
        "text": "WELL FOLKS, I SPEND A LOT\nOF MY TIME, ON THE SHOW, RIGHT OVER THERE,\nWANDERING THROUGH THE\nFARMERS' MARKET THAT IS TODAY'S\nBIGGEST STORIES, SQUEEZING THE\nFINEST NEWS RADISHES, THE RIPEST\nSTORY PEPPERS, SNIFFING THE MOST\nTOPICAL DATES, WHICH I THEN PAIR\nWITH FRA'MANI SOPPRESSATA, AND\nTHE MOST SUCCULENT HANDRAISED\nPATA NEGRA JAMON IBERICO, BACKED\nUP BY GENEROUS HELPINGS OF\nBEEMSTER GOUDA, AND A WEDGE OF\nBRILLAT-SAVARIN TRIPLE CREAM\nBRIE, THEN I ADD FORTNUM AND MASON\nAPRICOT AND FIG SPREADS WITH\nGRISSINI BREADSTICKS AND LA\nPANZANELLA CROCCANTINI, AND\nFINALLY LIBERAL SPRINKLINGS OF\nSAN SABA ELLIOT PECANS AND\nSICILIAN CASTEL-VETRANO OLIVES\nON A RAW CARRARA MARBLE SLAB TO\nLAY OUT FOR YOU THE SPECTACULAR\nGOURMET CHARCUTERIE BOARD THAT\nIS MY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES FOLKS, SOMETIMES, I AM HOSED AWAKE\nINSIDE AN EMPTY 6,000 GALLON\nDIESEL TANKER OFF OF I-24, WHERE I\nAM HIDING FROM A CULT THAT I\nSTARTED, THEN DASH, NAKED BEHIND\nA RECENTLY DEFUNCT QUIZNOS,\nWHERE I MUST WRESTLE A POSSUM\nFOR THE REMAINS OF A BAJA\nCHICKEN FOOTLONG, STAGGER INTO A\nMIDDLE SCHOOL REC. YARD AFTER\nFIGHTING A SEAGULL FOR THE LAST\nHAM CUBE IN A LUNCHABLES TRAY,\nPUNCH A RACCOON TO STEAL HIS\nPEANUT, THEN DUMP IT ALL INTO A\nHUBCAP I STRIPPED OFF AN\nABANDONED '76 CHEVY VEGA TO\nOFFER FOR YOU THE RAIL YARD BUFFET\nOF NEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "pyhaU-_1Szk": {
        "begin": "2:48.0",
        "end": "4:31.0",
        "text": "YOU KNOW FOLKS, I SPEND A LOT OF\nTIME RIGHT OVER THERE, ISOLATING\nTHE BIGGEST, NEWSIEST STORIES\nOF THE DAY AND CONTAINING THEM\nIN THE MOST TOPICAL CIRCULAR\nTUNNEL, WITH A CIRCUMFERENCE OF\n26.7 KILOMETERS, AND A DEPTH\nRANGING FROM 50 TO 175 METERS.\nTHEN, I ADD TWO ADJACENT\nPARALLEL BEAM-LINES, WHICH\nTRAVEL IN OPPOSITE DIRECTIONS\nAROUND THE RING, INTERSECTING AT\nFOUR POINTS.\nI ADD 1,232 DIPOLE MAGNETS TO\nKEEP THE BEAMS IN THEIR CIRCULAR\nPATH, WHILE AN ADDITIONAL 392\nQUADRUPOLE MAGNETS ARE USED TO\nKEEP THE BEAMS FOCUSED, THEN\nI ADD STRONGER QUADRUPOLE\nMAGNETS CLOSE TO THE\nINTERSECTION POINTS IN ORDER TO\nMAXIMIZE THE CHANCES OF\nINTERACTION BETWEEN THE TWO BEAMS\nCROSS, ALL SO I CAN SMASH JOKE\nPROTONS AGAINST EACH OTHER AT\nNEAR THE SPEED OF LIGHT TO\nGENERATE THE HIGGS BOSON HEAVY\nCOMEDY PARTICLES THAT MAKE UP MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, FOLKS, SOMETIMES \nSOMETIMES, SOMETIMES I WAKE UP IN AN\nABANDONED JUNKYARD, STRAPPED TO\nTHE CHASSIS OF WHAT USED TO BE A\nSCHOOL BUS.\nI GNAW MYSELF FREE OF MY\nRESTRAINTS AND CLIMB ATOP A HILL\nOF CRUSHED MAZDA MIATAS TO UTTER\nA CALL THAT CAN BE HEARD ONLY BY\nTHOSE IN THE MIDST OF A\nLIFE-CHANGING PEYOTE TRIP.\nWITH MY FREAKS GATHERED AROUND ME,\nHOTWIRE AS MANY BURNT-OUT\n'91 BUICK LESABRES AS WE CAN\nFIND TO ANIMATE A FLEET OF\nFURY-ROAD-WORTHY LEMONS,\nTHEN ROLL THEM TO THE ABANDONED\nSUBWAY STATION BELOW CITY HALL,\nWHERE I LAUNCH THEM HEAD ON AT\nTOP SPEED IN THE UNREGULATED\nHOTWHEELS COLOSSAL CRASH TRACK\nOF NEWS THAT IS MY SEGMENT:\nMEANWHILE!"
    },
    "q8zlh8XKfLc": {
        "begin": "1:00.0",
        "end": "1:54.0",
        "text": "FOLKS, YOU KNOW, I\nSPEND MOST OF MY TIME, RIGHT\nOVER THERE, WITH MY EARS, MY MIND,\nAND MY HEART OPEN TO THE DAY'S\nBIGGEST STORIES, AUDITIONING AND\nSELECTING ONLY THE MOST TOPICAL\nNEWS-OBOES, THE MOST RELEVANT\nAND LILTING VIOLAS, ROUNDING IT\nOUT WITH SOME NOBLE FRENCH\nHORNS, AND INSOUCIANT\nBASSOONS, THEN COMPOSING AND\nARRANGING THE NEWSIEST, MOST\nUPLIFTING YET BITTERSWEET\nRONDOS, ALLEGROS, SCHERZOS, AND\nSONATAS TO PRESENT TO YOU THE\nTIMELESSLY MOVING YET\nINFORMATIVE POST-MODERN OPUS\nNUMBER ONE SYMPHONY THAT IS MY\nMONOLOGUE.\nBUT SOMETIMES, SOMETIMES FOLKS, I WAKE UP AT THE\nWHEEL OF A STOLEN CEMENT TRUCK,\nSNORT ANOTHER RAIL OF KETAMINE\nAND BATH SALTS, THEN I STRIP\nDOWN AND SCAMPER THROUGH A\nCEMETERY TRAPPING RATS UNDER\nRUSTY COFFEE CANS.\nAFTER AN IMPASSIONED SPEECH TO THEM\nABOUT THEIR NEED\nTO HELP ME SAVE AN OLD THEATER, THEY\nACCOMPANY ME ON A RAID TO A\nPRESCHOOL MUSIC ROOM TO STEAL\nITS FLUTES, RECORDERS, AND\nKAZOOS, WHERE I CONDUCT THE\nFUGITIVE VERMIN PHILHARMONIC OF\nNEWS THAT IS MY SEGMENT:\nMEANWHILE!\n"
    },
    "qEY5SUevhgU": {
        "begin": "1:59.0",
        "end": "3:07.0",
        "text": "FOLKS, IF YOU\nWATCH THIS SHOW, YOU KNOW I\nSPEND MUCH OF MY TIME,\nRIGHT OVER THERE, PLANTING AND\nGROWING THE DAY'S BIGGEST NEWS\nIN A PARCELED TERROIR AT\nPRECISELY 80 METERS, ON A\nNORTH-FACING SLOPE, WITH JUST\nTHE RIGHT MICROCLIMATE, THEN\nHAND-PICKING ONLY THE RIPEST,\nMOST TOPICAL BOTRYTIS-PRUNED\nSTORY GRAPES.\nAFTER THREE PRESSINGS, I THEN\nCAREFULLY BARREL-AGE THEIR\nNOBLE-ROTTED NECTAR FOR 30\nMONTHS EXCLUSIVELY IN NEW OAK BARRELS TO\nBRING OUT THE AROMAS OF TROPICAL\nFRUITS, HONEYED PEARS, AND\nROASTED NUTS IN THE CHATEAU\nD'YQUEM SAUTERNES THAT IS MY\nNIGHTLY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES, FOLKS,\nI WAKE UP IN A BULGARIAN PRISON,\nCONVICTED OF WHAT MY NON-ENGLISH\nSPEAKING, COURT-APPOINTED LAWYER\nONLY CALLS \"ANIMAL WRONGS.\"\nI TRADE THE CIGARETTES I WON IN\nA BARE-KNUCKLE MATCH WITH A\nGUARD FOR SOME FIG MARMALADE,\nAPPLE CORES, AND DISCARDED\nKETCHUP PACKETS, TOSS IT ALL IN\nTHE PLASTIC BAG I STOLE OFF A\nCELLMATE DRAGOMIR'S FOOT WHILE\nHE SLEPT, LEAVE IT UNDER A\nFERMENTING PIPE\nOVERNIGHT, TO SERVE UP THE\nSOUR-MASHED GOON PLONK OF NEWS\nTHAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "r7NnpAGIkEY": {
        "begin": "1:46.5",
        "end": "2:32.5",
        "text": "AND, YOU KNOW,\nFOLKS, I SPEND A LOT OF TIME ON THIS SHOW, RIGHT\nOVER THERE, CAREFULLY HARVESTING\nTHE HIGHEST-QUALITY ORGANIC ACAI\nNEWS BERRIES, PUTTING THEM INTO\nMY CURRENT EVENTS BLENDER, THEN\nPULSING ON HIGH UNTIL THEY'VE\nBECOME THE SMOOTH PURPLE PUREE OF\nSTORIES TO BE PILED WITH\nGRANOLA, CHIA SEEDS, AND SLICED\nJOKE BANANA, TO MAKE THE\nHIGH-PRICED, ARTISANAL SMOOTHIE\nBOWL OF NEWS THAT IS MY\nMONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES\nFOLKS, I LIKE TO SCROUNGE\nTOGETHER SOME EXPIRED KALE FROM\nTHE BACK OF THE FRIDGE, MIX IT\nWITH THE FERMENTING ORANGE\nSLICES LEFT IN THE BACK SEAT\nAFTER LAST WEEK'S LITTLE LEAGUE\nGAME, AND AN APPLE CORE I FOUND\nINSIDE A COFFEE CUP, THEN\nPULVERIZE IT ALL IN A LEAKY\nNUTRI-BULLET TO MAKE THE PRISON\nTOILET GREEN JUICE OF NEWS THAT\nIS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "sKCeqiWA-gQ": {
        "begin": "0:43.5",
        "end": "1:36.5",
        "text": "FOLKS, I SPEND A LOT OF TIME\nRIGHT OVER THERE, NIGHT AFTER NIGHT, COMBING\nTHROUGH THE DAY'S NEWS,\nCAREFULLY SELECTING THE MOST\nTOPICAL, FRAGRANT HERBS AND\nJOKE-RICH ALLIUM, DELICATELY\nSTIRRING THEM INTO A SATIRICAL\nSTOCK, BRINGING THE CONCOCTION\nTO A BREAKING NEWS BOIL BEFORE\nPAINSTAKINGLY EDITING AWAY THE\nSCRAPS, LEAVING ONLY THE PUREST,\nNUTRIENT-RICH CONSOMME OF COMEDY\nTHAT IS MY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, I SWEAT MYSELF\nAWAKE INSIDE A DEFLATED BOUNCY\nCASTLE AT A DEFUNCT AMUSEMENT\nPARK, BREAK INTO A COMBINATION\nGAS STATION PIZZA HUT WHERE I\nTHROW TOGETHER WHATEVER OLD HOT\nDOGS AND REPURPOSED CHEESE\nPRODUCT I CAN GET MY CHAPPED AND\nCRACKING HANDS ON.\nAND THERE, BY THE CRUEL LIGHT OF\nA PIZZA WARMING DRAWER, I DROWN\nTHE MIXTURE IN A CAN OF\nDISCONTINUED SURGE FROM 2002\nBEFORE STRAINING IT THROUGH THE\nGREASE-SOILED BEARD NET TO\nCREATE THE FESTERING MOP BUCKET\nSOUR MASH OF NEWS THAT IS MY\nSEGMENT:\nMEANWHILE!"
    },
    "tSdWz6CvpIc": {
        "begin": "1:50.0",
        "end": "2:39.0",
        "text": "FOLKS, YOU KNOW, IF YOU WATCH\nTHE SHOW, YOU KNOW I SPEND A\nLOT OF MY TIME, RIGHT OVER THERE,\nCAREFULLY WELDING TOGETHER THE\nDAY'S TOP STORIES, FORGED FROM\nTHE FINEST NEWS METALS, WIRING\nIN THE MOST EFFICIENT, HIGH-\nSPEED ENGINE.\nTHEN I COMBINE THE MOST TOPICAL\nTITANIUM ACCENTS WITH ITALIAN\nCURRENT-EVENTS CRAFTSMANSHIP TO\nCREATE THE BESPOKE TRIUMPH\nMOTORCYCLE THAT IS MY MONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES FOLKS,\nAFTER STANDING TOO LONG OVER AN\nEPHEDRINE BARREL FIRE, I STUMBLE\nINTO A DEFUNCT BODY SHOP, SLAP A\nHALF-EMPTY CANISTER OF PROPANE\nONTO A STOLEN HUFFY, WRAP IT IN\nNEWSPAPER AND BITS OF CAUTION\nTAPE THAT I SWIPED FROM A\nSTILL-ACTIVE CRIME SCENE, AND\nHOOK IT UP TO AN OLD MILK JUG\nFULL OF NITROUS I STOLE FROM A\nBLACK MARKET ORTHODONTIST, IN\nORDER TO MAKE THE FLAMING DEATH\nROCKET OF NEWS THAT IS MY\nSEGMENT:\n\"MEANWHILE!\""
    },
    "thFDua8MF_w": {
        "begin": "1:21.5",
        "end": "2:18.5",
        "text": "FOLKS, I SPEND A\nLOT OF TIME\nRIGHT OVER THERE, COMBING\nTHROUGH THE DAY'S NEWS AND\nCAREFULLY SELECTING THE MOST\nPRISTINE OPALESCENT GLASS\nSTORIES, ORNATELY FUSING THE\nPIECES USING ONLY THE MOST\nTOPICAL COPPER WIRE AND LEAD\nCASING BEFORE COLORING THEM WITH\nTHE MOST PIGMENT-RICH JOKES\nAVAILABLE TO CONSTRUCT FOR YOU AND YOU ALONE\nTHE ELEGANT STAINED-GLASS\nTIFFANY DOME THAT IS MY NIGHTLY\nMONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES FOLKS, I JOLT AWAKE\nBEHIND THE WHEEL OF A '79 BUICK\nREGAL LOWRIDER WHILE DOING\nDONUTS IN THE PARKING LOT OF A\nBOARDED UP JOANNE FABRICS, WHEN\nI CLIP A BARREL FIRE AND I'M\nTHROWN FROM THE CAR INTO THE\nDUMPSTERS.\nTHERE, I RUMMAGE THROUGH THE\nBITS OF BROKEN FANTA BOTTLES,\nAND GLUE THEM TOGETHER WITH\nSTILL-WARM CHEWING GUM, AND\nSTAIN THEM WITH WHATEVER\nREMNANTS I CAN SCRAPE FROM OLD\nKETCHUP AND FUN-DIP PACKETS.\nTHEN I DOUSE MY PANTS IN\nKEROSENE AND LET HER BLAZE TO\nPROJECT THE DEMENTED NIGHTMARE\nKALEIDOSCOPE OF NEWS THAT IS MY\nSEGMENT:\nMEANWHILE"
    },
    "u9oMwS3I12s": {
        "begin": "0:53.0",
        "end": "1:47.0",
        "text": "FOLKS, YOU KNOW, IF YOU WATCH THE SHOW, YOU KNOW I SPEND A LOT OF\nMY TIME RIGHT OVER THERE,\nWORKING THE OLD MIRTH KILN,\nMELTING DOWN THE DAY'S MOST\nIMPORTANT GOLDEN STORY INGOTS\nTO MAKE AN ORNATE SET OF CUSTOM\nNEWS VAMBRACES.\nTHEN CARVE A MOULDED CUIRASS\nINTO THE MOST TOPICAL\nANIMAL-THEMED CHEST PLATE THAT I\nDECORATE WITH FINE FILIGREE AND\nORNATE PRE-COLUMBIAN PATTERNS TO\nCREATE THE BESPOKE SET OF GOLD\nMUISCA ARMOR THAT IS MY\nMONOLOGUE.\nBUT SOMETIMES, JUST SOMETIMES,\nFOLKS, I WAKE UP IN THE BASEMENT\nOF A DERELICT ROW HOUSE DURING A\nFULL MOON, RIFLE THROUGH A\nDISCARDED BOX OF ELBOW PASTA AND\nSTRING SOME NOODLES TOGETHER\nINTO CRUDE SHIN GUARDS WITH\nCOPPER WIRE I STRIPPED OUT OF\nTHE FUNERAL HOME I ROBBED\nEARLIER.\nTHEN, I FASHION A HAT BY\nSTAPLING OLD NEWSPAPER CLIPPINGS\nTO A BIKE HELMET AND WRAP MYSELF\nIN A TARP I SWIPED FROM THE\nRETIREMENT HOME'S HOT TUB TO\nFROLIC BEFORE YOU IN THE\nMADMAN'S HAZMAT SUIT OF NEWS\nTHAT IS MY SEGMENT:\n\"MEANWHILE!\""
    },
    "z2dPp5yM-NA": {
        "begin": "1:34.3",
        "end": "2:34.3",
        "text": "YOU KNOW, FOLKS, IF YOU WATCH\nTHIS SHOW, AND I HOPE YOU DO...\nTHEN YOU KNOW I SPEND MOST OF MY TIME\nRIGHT OVER THERE, CAREFULLY\nUNWRAPPING THE DAY'S NEWS,\nPLACING THE FINEST, MOST TOPICAL\nORNAMENTS UPON THE HAND-CUT\nDOUGLAS FIR OF THE DAY'S TOP\nSTORIES, SPRINKLING\nBIODEGRADABLE TINSEL ON THE\nBOUGHS WITH PRECISION AND\nDECADES OF TRAINING THAT COMES\nACROSS AS EFFORTLESS, CHECKING\nEACH JOKE BULB FOR THE OPTIMAL\nTWINKLE, AND FINALLY TOPPING IT\nOFF WITH A FAMILY HEIRLOOM STAR\nTO CREATE THE MAGICAL CHRISTMAS\nMEMORY THAT IS MY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES,\nI BREAK INTO A HOME DEPOT, HUFF\nA BOTTLE OF GOO GONE, STEAL A\nPALLET OF TWO BY FOURS, A PILE\nOF RUSTY NAILS, A BUCKET OF\nDISCONTINUED FAUCET PARTS, SLAP\nTHEM TOGETHER WITH A RECEIPT PAPER\nAND INDUSTRIAL ADHESIVE, PUT IT\nOUTSIDE THE LIVING ROOM WINDOW\nOF THE RETIREMENT HOME I WAS\nKICKED OUT OF FOR A VERY GOOD\nREASON, AND THROW IN A COUPLE OF\nMANNEQUINS STOLEN FROM A BURNED\nOUT FOREVER 21 TO CREATE THE\nHELLSCAPE CRECHE OF NEWS THAT IS\nMY SEGMENT:\n\"MEANWHILE\"!"
    },
    "zFRXCwdPD-M": {
        "begin": "1:31.6",
        "end": "2:10.6",
        "text": "YOU KNOW, FOLKS, I SPEND A LOT\nOF MY TIME ON THE SHOW RIGHT OVER THERE PRECISELY\nMEASURING THE NEWS' INSEAM,\nSELECTING THE FINEST, MOST\nTOPICAL IMPORTED MERINO STORY\nWOOL, THEN HAND-STITCHING IT\nWITH JOKES TO CREATE FOR YOU THE\nBESPOKE, DOUBLE-BREASTED SAVILE\nROW CURRENT EVENT SUIT THAT IS\nMY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES,\nI LIKE TO GATHER UP SOME USED\nBURLAP FROM BEHIND THE\nMEAT-PACKING PLANT, DRAPE IT\nOVER AN ABANDONED MANNEQUIN AT\nOLD MAN JENKINS' BURNED-DOWN\nDRESS FACTORY, AND SEW IT\nTOGETHER WITH SHOESTRINGS AND A\nSTAPLE GUN, TO CREATE FOR YOU\nTHE HAUNTED POTATO-SACK\nSCARECROW OF NEWS THAT IS MY\nSEGMENT:\n\"MEANWHILE!\""
    },
    "zIS1lp9CS-E": {
        "begin": "1:16.5",
        "end": "2:04.5",
        "text": "FOLKS, I SPEND A\nLOT OF TIME RIGHT OVER THERE,\nSELECTING THE FINEST GRAINS OF\nNEWS, HAND SIFTING THROUGH\nBARRELS OF STEELCUT JOKE OATS,\nSELECTING THE RIPEST SEASONAL\nSTORY BERRIES, AND\nHOME-FERMENTING MACROBIOTIC\nALMOND MILK INTO THE UPSCALE ORGANIC\nYOGURT TO LOVINGLY FOLD TOGETHER\nTHE BUZZWORTHY BREAKFAST PARFAIT\nTHAT IS MY NIGHTLY MONOLOGUE.\nBUT SOMETIMES, SOMETIMES I SWEAT MYSELF\nAWAKE IN THE MIDDLE OF THE\nNIGHT, CREEP OUT TO AN ABANDONED\nSCHOOLYARD, SCRAPE A BUNCH OF\nSPILLED CHEERIOS AND DISCARDED\nGOGURT WRAPPERS INTO A BIG GULP\nTRAVEL MUG I WON IN THE GREAT\nTRUCKER-DRIFTER WARS OF 2019,\nADD SOME FERMENTED CRABAPPLES\nFROM BEHIND THE SWING SET, AND\nCHUG BACK THE FETID PRISON\nPORRIDGE OF NEWS THAT IS MY\nSEGMENT:\nMEANWHILE!"
    }
}

================================================
FILE: model-card.md
================================================
# Model Card: Whisper

This is the official codebase for running the automatic speech recognition (ASR) models (Whisper models) trained and released by OpenAI.

Following [Model Cards for Model Reporting (Mitchell et al.)](https://arxiv.org/abs/1810.03993), we're providing some information about the automatic speech recognition model. More information on how these models were trained and evaluated can be found [in the paper](https://arxiv.org/abs/2212.04356).


## Model Details

The Whisper models are trained for speech recognition and translation tasks, capable of transcribing speech audio into text in the language in which it is spoken (ASR) as well as translating it into English (speech translation). Researchers at OpenAI developed the models to study the robustness of speech processing systems trained under large-scale weak supervision. There are six model sizes, four of which have English-only variants, summarized in the following table.

|  Size  | Parameters | English-only model | Multilingual model |  
|:------:|:----------:|:------------------:|:------------------:|
|  tiny  |    39 M    |         ✓          |         ✓          |
|  base  |    74 M    |         ✓          |         ✓          |
| small  |   244 M    |         ✓          |         ✓          |
| medium |   769 M    |         ✓          |         ✓          |
| large  |   1550 M   |                    |         ✓          |
| turbo  |   798 M    |                    |         ✓          |

In December 2022, we [released an improved large model named `large-v2`](https://github.com/openai/whisper/discussions/661), and `large-v3` followed in November 2023.
Additionally, in September 2024 we added a `turbo` model that is optimized for inference speed.
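
As a minimal sketch of how these sizes map onto checkpoints when using this codebase, the snippet below loads one multilingual and one English-only model; it assumes the package is installed (`pip install -U openai-whisper`) and uses the size names from the table, with the `.en` suffix selecting the English-only variants.

```python
import whisper

# Multilingual and English-only variants of the same size from the table above.
multilingual_model = whisper.load_model("base")
english_model = whisper.load_model("base.en")

print(multilingual_model.is_multilingual)  # expected: True
print(english_model.is_multilingual)       # expected: False
```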


### Release date

September 2022 (original series), December 2022 (`large-v2`), November 2023 (`large-v3`), September 2024 (`large-v3-turbo`)

### Model type

Sequence-to-sequence ASR (automatic speech recognition) and speech translation model

### Paper & samples

[Paper](https://arxiv.org/abs/2212.04356) / [Blog](https://openai.com/blog/whisper)


## Model Use

### Evaluated Use

The primary intended users of these models are AI researchers studying the robustness, generalization, capabilities, biases, and constraints of the current model. However, Whisper is also potentially quite useful as an ASR solution for developers, especially for English speech recognition. We recognize that once models are released, it is impossible to restrict access to only “intended” uses or to draw reasonable guidelines around what is or is not research.

The models are primarily trained and evaluated on ASR and on speech translation into English. They show strong ASR results in ~10 languages. They may exhibit additional capabilities, particularly if fine-tuned on certain tasks like voice activity detection, speaker classification, or speaker diarization, but they have not been robustly evaluated in these areas. We strongly recommend that users perform robust evaluations of the models in a particular context and domain before deploying them.
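
For concreteness, here is a short sketch of the two evaluated tasks using this codebase's Python API; the audio path and model size below are placeholders, not recommendations.

```python
import whisper

model = whisper.load_model("small")

# ASR: transcribe in the language that is spoken.
transcription = model.transcribe("audio.mp3")

# Speech translation: transcribe directly into English.
translation = model.transcribe("audio.mp3", task="translate")

print(transcription["text"])
print(translation["text"])
```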

In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent or purporting to use these models for any kind of subjective classification. We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes. The models are intended to transcribe and translate speech; using them for classification is not only unevaluated but also inappropriate, particularly for inferring human attributes.


## Training Data

The models are trained on 680,000 hours of audio and the corresponding transcripts collected from the internet. 65% of this data (or 438,000 hours) represents English-language audio and matched English transcripts, roughly 18% (or 126,000 hours) represents non-English audio and English transcripts, while the final 17% (or 117,000 hours) represents non-English audio and the corresponding transcript. This non-English data represents 98 different languages. 

As discussed in [the accompanying paper](https://arxiv.org/abs/2212.04356), we see that performance on transcription in a given language is directly correlated with the amount of training data we employ in that language.


## Performance and Limitations

Our studies show that, compared to many existing ASR systems, the models exhibit improved robustness to accents, background noise, and technical language, as well as zero-shot translation from multiple languages into English, and that their accuracy on speech recognition and translation is near the state of the art.

However, because the models are trained in a weakly supervised manner using large-scale noisy data, the predictions may include texts that are not actually spoken in the audio input (i.e. hallucination). We hypothesize that this happens because, given their general knowledge of language, the models combine trying to predict the next word in audio with trying to transcribe the audio itself.

Our models perform unevenly across languages, and we observe lower accuracy on low-resource and/or low-discoverability languages or languages where we have less training data. The models also exhibit disparate performance on different accents and dialects of particular languages, which may include a higher word error rate across speakers of different genders, races, ages, or other demographic criteria. Our full evaluation results are presented in [the paper accompanying this release](https://arxiv.org/abs/2212.04356).

In addition, the sequence-to-sequence architecture of the model makes it prone to generating repetitive texts, which can be mitigated to some degree by beam search and temperature scheduling, but not eliminated entirely. Further analysis of these limitations is provided in [the paper](https://arxiv.org/abs/2212.04356). This behavior, along with hallucination, is likely to be worse in lower-resource and/or lower-discoverability languages.
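
As a rough illustration of those mitigations, the sketch below uses the decoding options exposed by `transcribe` in this codebase; the audio path is a placeholder and the values shown are illustrative defaults rather than tuned recommendations.

```python
import whisper

model = whisper.load_model("small")
result = model.transcribe(
    "audio.mp3",
    beam_size=5,                                 # beam search instead of greedy decoding
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # fall back to higher temperatures when decoding fails
    compression_ratio_threshold=2.4,             # flag highly repetitive output as a failed decode
)
print(result["text"])
```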


## Broader Implications

We anticipate that Whisper models’ transcription capabilities may be used for improving accessibility tools. While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation. The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications.

There are also potential dual-use concerns that come with releasing Whisper. While we hope the technology will be used primarily for beneficial purposes, making ASR technology more accessible could enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication. Moreover, these models may have some capabilities to recognize specific individuals out of the box, which in turn presents safety concerns related both to dual use and disparate performance. In practice, we expect that the cost of transcription is not the limiting factor of scaling up surveillance projects.


================================================
FILE: notebooks/LibriSpeech.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "v5hvo8QWN-a9"
   },
   "source": [
    "# Installing Whisper\n",
    "\n",
    "The commands below will install the Python packages needed to use Whisper models and evaluate the transcription results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "ZsJUxc0aRsAf"
   },
   "outputs": [],
   "source": [
    "! pip install git+https://github.com/openai/whisper.git\n",
    "! pip install jiwer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1IMEkgyagYto"
   },
   "source": [
    "# Loading the LibriSpeech dataset\n",
    "\n",
    "The following will load the test-clean split of the LibriSpeech corpus using torchaudio."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "3CqtR2Fi5-vP"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "\n",
    "try:\n",
    "    import tensorflow  # required in Colab to avoid protobuf compatibility issues\n",
    "except ImportError:\n",
    "    pass\n",
    "\n",
    "import torch\n",
    "import pandas as pd\n",
    "import whisper\n",
    "import torchaudio\n",
    "\n",
    "from tqdm.notebook import tqdm\n",
    "\n",
    "\n",
    "DEVICE = \"cuda\" if torch.cuda.is_available() else \"cpu\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "GuCCB2KYOJCE"
   },
   "outputs": [],
   "source": [
    "class LibriSpeech(torch.utils.data.Dataset):\n",
    "    \"\"\"\n",
    "    A simple class to wrap LibriSpeech and trim/pad the audio to 30 seconds.\n",
    "    It will drop the last few seconds of a very small portion of the utterances.\n",
    "    \"\"\"\n",
    "    def __init__(self, split=\"test-clean\", device=DEVICE):\n",
    "        self.dataset = torchaudio.datasets.LIBRISPEECH(\n",
    "            root=os.path.expanduser(\"~/.cache\"),\n",
    "            url=split,\n",
    "            download=True,\n",
    "        )\n",
    "        self.device = device\n",
    "\n",
    "    def __len__(self):\n",
    "        return len(self.dataset)\n",
    "\n",
    "    def __getitem__(self, item):\n",
    "        audio, sample_rate, text, _, _, _ = self.dataset[item]\n",
    "        assert sample_rate == 16000\n",
    "        audio = whisper.pad_or_trim(audio.flatten()).to(self.device)\n",
    "        mel = whisper.log_mel_spectrogram(audio)\n",
    "        \n",
    "        return (mel, text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "-YcRU5jqNqo2"
   },
   "outputs": [],
   "source": [
    "dataset = LibriSpeech(\"test-clean\")\n",
    "loader = torch.utils.data.DataLoader(dataset, batch_size=16)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0ljocCNuUAde"
   },
   "source": [
    "# Running inference on the dataset using a base Whisper model\n",
    "\n",
    "The following will take a few minutes to transcribe all utterances in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "_PokfNJtOYNu",
    "outputId": "2c53ec44-bc93-4107-b4fa-214e3f71fe8e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model is English-only and has 71,825,408 parameters.\n"
     ]
    }
   ],
   "source": [
    "model = whisper.load_model(\"base.en\")\n",
    "print(\n",
    "    f\"Model is {'multilingual' if model.is_multilingual else 'English-only'} \"\n",
    "    f\"and has {sum(np.prod(p.shape) for p in model.parameters()):,} parameters.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# predict without timestamps for short-form transcription\n",
    "options = whisper.DecodingOptions(language=\"en\", without_timestamps=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 49,
     "referenced_widgets": [
      "09a29a91f58d4462942505a3cc415801",
      "83391f98a240490987c397048fc1a0d4",
      "06b9aa5f49fa44ba8c93b647dc7db224",
      "da9c231ee67047fb89073c95326b72a5",
      "48da931ebe7f4fd299f8c98c7d2460ff",
      "7a901f447c1d477bb49f954e0feacedd",
      "39f5a6ae8ba74c8598f9c6d5b8ad2d65",
      "a0d10a42c753453283e5219c22239337",
      "09f4cb79ff86465aaf48b0de24869af9",
      "1b9cecf5b3584fba8258a81d4279a25b",
      "039b53f2702c4179af7e0548018d0588"
     ]
    },
    "id": "7OWTn_KvNk59",
    "outputId": "a813a792-3c91-4144-f11f-054fd6778023"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9df048b46f764cf68cbe0045b8ff73a8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/164 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "hypotheses = []\n",
    "references = []\n",
    "\n",
    "for mels, texts in tqdm(loader):\n",
    "    results = model.decode(mels, options)\n",
    "    hypotheses.extend([result.text for result in results])\n",
    "    references.extend(texts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 424
    },
    "id": "4nTyynELQ42j",
    "outputId": "1c72d25a-3e87-4c60-a8d1-1da9d2f73bd7"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>hypothesis</th>\n",
       "      <th>reference</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>He hoped there would be stew for dinner, turni...</td>\n",
       "      <td>HE HOPED THERE WOULD BE STEW FOR DINNER TURNIP...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Stuffered into you, his belly counseled him.</td>\n",
       "      <td>STUFF IT INTO YOU HIS BELLY COUNSELLED HIM</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>After early nightfall the yellow lamps would l...</td>\n",
       "      <td>AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD L...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Hello Bertie, any good in your mind?</td>\n",
       "      <td>HELLO BERTIE ANY GOOD IN YOUR MIND</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Number 10. Fresh Nelly is waiting on you. Good...</td>\n",
       "      <td>NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2615</th>\n",
       "      <td>Oh, to shoot my soul's full meaning into futur...</td>\n",
       "      <td>OH TO SHOOT MY SOUL'S FULL MEANING INTO FUTURE...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2616</th>\n",
       "      <td>Then I, long tried by natural ills, received t...</td>\n",
       "      <td>THEN I LONG TRIED BY NATURAL ILLS RECEIVED THE...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2617</th>\n",
       "      <td>I love thee freely as men strive for right. I ...</td>\n",
       "      <td>I LOVE THEE FREELY AS MEN STRIVE FOR RIGHT I L...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2618</th>\n",
       "      <td>I love thee with the passion put to use, in my...</td>\n",
       "      <td>I LOVE THEE WITH THE PASSION PUT TO USE IN MY ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2619</th>\n",
       "      <td>I love thee with the love I seemed to lose wit...</td>\n",
       "      <td>I LOVE THEE WITH A LOVE I SEEMED TO LOSE WITH ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2620 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             hypothesis  \\\n",
       "0     He hoped there would be stew for dinner, turni...   \n",
       "1          Stuffered into you, his belly counseled him.   \n",
       "2     After early nightfall the yellow lamps would l...   \n",
       "3                  Hello Bertie, any good in your mind?   \n",
       "4     Number 10. Fresh Nelly is waiting on you. Good...   \n",
       "...                                                 ...   \n",
       "2615  Oh, to shoot my soul's full meaning into futur...   \n",
       "2616  Then I, long tried by natural ills, received t...   \n",
       "2617  I love thee freely as men strive for right. I ...   \n",
       "2618  I love thee with the passion put to use, in my...   \n",
       "2619  I love thee with the love I seemed to lose wit...   \n",
       "\n",
       "                                              reference  \n",
       "0     HE HOPED THERE WOULD BE STEW FOR DINNER TURNIP...  \n",
       "1            STUFF IT INTO YOU HIS BELLY COUNSELLED HIM  \n",
       "2     AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD L...  \n",
       "3                    HELLO BERTIE ANY GOOD IN YOUR MIND  \n",
       "4     NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD ...  \n",
       "...                                                 ...  \n",
       "2615  OH TO SHOOT MY SOUL'S FULL MEANING INTO FUTURE...  \n",
       "2616  THEN I LONG TRIED BY NATURAL ILLS RECEIVED THE...  \n",
       "2617  I LOVE THEE FREELY AS MEN STRIVE FOR RIGHT I L...  \n",
       "2618  I LOVE THEE WITH THE PASSION PUT TO USE IN MY ...  \n",
       "2619  I LOVE THEE WITH A LOVE I SEEMED TO LOSE WITH ...  \n",
       "\n",
       "[2620 rows x 2 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.DataFrame(dict(hypothesis=hypotheses, reference=references))\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HPppEJRXX4ox"
   },
   "source": [
    "# Calculating the word error rate\n",
    "\n",
    "Now, we use our English normalizer implementation to standardize the transcription and calculate the WER."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "dl-KBDflMhrg"
   },
   "outputs": [],
   "source": [
    "import jiwer\n",
    "from whisper.normalizers import EnglishTextNormalizer\n",
    "\n",
    "normalizer = EnglishTextNormalizer()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 641
    },
    "id": "6-O048q4WI4o",
    "outputId": "f2089bc9-f535-441e-f192-26e52ae82b5e"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>hypothesis</th>\n",
       "      <th>reference</th>\n",
       "      <th>hypothesis_clean</th>\n",
       "      <th>reference_clean</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>He hoped there would be stew for dinner, turni...</td>\n",
       "      <td>HE HOPED THERE WOULD BE STEW FOR DINNER TURNIP...</td>\n",
       "      <td>he hoped there would be stew for dinner turnip...</td>\n",
       "      <td>he hoped there would be stew for dinner turnip...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Stuffered into you, his belly counseled him.</td>\n",
       "      <td>STUFF IT INTO YOU HIS BELLY COUNSELLED HIM</td>\n",
       "      <td>stuffered into you his belly counseled him</td>\n",
       "      <td>stuff it into you his belly counseled him</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>After early nightfall the yellow lamps would l...</td>\n",
       "      <td>AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD L...</td>\n",
       "      <td>after early nightfall the yellow lamps would l...</td>\n",
       "      <td>after early nightfall the yellow lamps would l...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Hello Bertie, any good in your mind?</td>\n",
       "      <td>HELLO BERTIE ANY GOOD IN YOUR MIND</td>\n",
       "      <td>hello bertie any good in your mind</td>\n",
       "      <td>hello bertie any good in your mind</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Number 10. Fresh Nelly is waiting on you. Good...</td>\n",
       "      <td>NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD ...</td>\n",
       "      <td>number 10 fresh nelly is waiting on you good n...</td>\n",
       "      <td>number 10 fresh nelly is waiting on you good n...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2615</th>\n",
       "      <td>Oh, to shoot my soul's full meaning into futur...</td>\n",
       "      <td>OH TO SHOOT MY SOUL'S FULL MEANING INTO FUTURE...</td>\n",
       "      <td>0 to shoot my soul is full meaning into future...</td>\n",
       "      <td>0 to shoot my soul is full meaning into future...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2616</th>\n",
       "      <td>Then I, long tried by natural ills, received t...</td>\n",
       "      <td>THEN I LONG TRIED BY NATURAL ILLS RECEIVED THE...</td>\n",
       "      <td>then i long tried by natural ills received the...</td>\n",
       "      <td>then i long tried by natural ills received the...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2617</th>\n",
       "      <td>I love thee freely as men strive for right. I ...</td>\n",
       "      <td>I LOVE THEE FREELY AS MEN STRIVE FOR RIGHT I L...</td>\n",
       "      <td>i love thee freely as men strive for right i l...</td>\n",
       "      <td>i love thee freely as men strive for right i l...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2618</th>\n",
       "      <td>I love thee with the passion put to use, in my...</td>\n",
       "      <td>I LOVE THEE WITH THE PASSION PUT TO USE IN MY ...</td>\n",
       "      <td>i love thee with the passion put to use in my ...</td>\n",
       "      <td>i love thee with the passion put to use in my ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2619</th>\n",
       "      <td>I love thee with the love I seemed to lose wit...</td>\n",
       "      <td>I LOVE THEE WITH A LOVE I SEEMED TO LOSE WITH ...</td>\n",
       "      <td>i love thee with the love i seemed to lose wit...</td>\n",
       "      <td>i love thee with a love i seemed to lose with ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2620 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             hypothesis  \\\n",
       "0     He hoped there would be stew for dinner, turni...   \n",
       "1          Stuffered into you, his belly counseled him.   \n",
       "2     After early nightfall the yellow lamps would l...   \n",
       "3                  Hello Bertie, any good in your mind?   \n",
       "4     Number 10. Fresh Nelly is waiting on you. Good...   \n",
       "...                                                 ...   \n",
       "2615  Oh, to shoot my soul's full meaning into futur...   \n",
       "2616  Then I, long tried by natural ills, received t...   \n",
       "2617  I love thee freely as men strive for right. I ...   \n",
       "2618  I love thee with the passion put to use, in my...   \n",
       "2619  I love thee with the love I seemed to lose wit...   \n",
       "\n",
       "                                              reference  \\\n",
       "0     HE HOPED THERE WOULD BE STEW FOR DINNER TURNIP...   \n",
       "1            STUFF IT INTO YOU HIS BELLY COUNSELLED HIM   \n",
       "2     AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD L...   \n",
       "3                    HELLO BERTIE ANY GOOD IN YOUR MIND   \n",
       "4     NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD ...   \n",
       "...                                                 ...   \n",
       "2615  OH TO SHOOT MY SOUL'S FULL MEANING INTO FUTURE...   \n",
       "2616  THEN I LONG TRIED BY NATURAL ILLS RECEIVED THE...   \n",
       "2617  I LOVE THEE FREELY AS MEN STRIVE FOR RIGHT I L...   \n",
       "2618  I LOVE THEE WITH THE PASSION PUT TO USE IN MY ...   \n",
       "2619  I LOVE THEE WITH A LOVE I SEEMED TO LOSE WITH ...   \n",
       "\n",
       "                                       hypothesis_clean  \\\n",
       "0     he hoped there would be stew for dinner turnip...   \n",
       "1            stuffered into you his belly counseled him   \n",
       "2     after early nightfall the yellow lamps would l...   \n",
       "3                    hello bertie any good in your mind   \n",
       "4     number 10 fresh nelly is waiting on you good n...   \n",
       "...                                                 ...   \n",
       "2615  0 to shoot my soul is full meaning into future...   \n",
       "2616  then i long tried by natural ills received the...   \n",
       "2617  i love thee freely as men strive for right i l...   \n",
       "2618  i love thee with the passion put to use in my ...   \n",
       "2619  i love thee with the love i seemed to lose wit...   \n",
       "\n",
       "                                        reference_clean  \n",
       "0     he hoped there would be stew for dinner turnip...  \n",
       "1             stuff it into you his belly counseled him  \n",
       "2     after early nightfall the yellow lamps would l...  \n",
       "3                    hello bertie any good in your mind  \n",
       "4     number 10 fresh nelly is waiting on you good n...  \n",
       "...                                                 ...  \n",
       "2615  0 to shoot my soul is full meaning into future...  \n",
       "2616  then i long tried by natural ills received the...  \n",
       "2617  i love thee freely as men strive for right i l...  \n",
       "2618  i love thee with the passion put to use in my ...  \n",
       "2619  i love thee with a love i seemed to lose with ...  \n",
       "\n",
       "[2620 rows x 4 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[\"hypothesis_clean\"] = [normalizer(text) for text in data[\"hypothesis\"]]\n",
    "data[\"reference_clean\"] = [normalizer(text) for text in data[\"reference\"]]\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "EBGSITeBYPTT",
    "outputId": "7b3dbe7c-a37e-4a07-a50a-b27d5f88b68f"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WER: 4.26 %\n"
     ]
    }
   ],
   "source": [
    "wer = jiwer.wer(list(data[\"reference_clean\"]), list(data[\"hypothesis_clean\"]))\n",
    "\n",
    "print(f\"WER: {wer * 100:.2f} %\")"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "provenance": []
  },
  "gpuClass": "standard",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.9"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "039b53f2702c4179af7e0548018d0588": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "06b9aa5f49fa44ba8c93b647dc7db224": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "FloatProgressModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_a0d10a42c753453283e5219c22239337",
      "max": 164,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_09f4cb79ff86465aaf48b0de24869af9",
      "value": 164
     }
    },
    "09a29a91f58d4462942505a3cc415801": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HBoxModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_83391f98a240490987c397048fc1a0d4",
       "IPY_MODEL_06b9aa5f49fa44ba8c93b647dc7db224",
       "IPY_MODEL_da9c231ee67047fb89073c95326b72a5"
      ],
      "layout": "IPY_MODEL_48da931ebe7f4fd299f8c98c7d2460ff"
     }
    },
    "09f4cb79ff86465aaf48b0de24869af9": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "ProgressStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": ""
     }
    },
    "1b9cecf5b3584fba8258a81d4279a25b": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "39f5a6ae8ba74c8598f9c6d5b8ad2d65": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "48da931ebe7f4fd299f8c98c7d2460ff": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "7a901f447c1d477bb49f954e0feacedd": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "83391f98a240490987c397048fc1a0d4": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_7a901f447c1d477bb49f954e0feacedd",
      "placeholder": "​",
      "style": "IPY_MODEL_39f5a6ae8ba74c8598f9c6d5b8ad2d65",
      "value": "100%"
     }
    },
    "a0d10a42c753453283e5219c22239337": {
     "model_module": "@jupyter-widgets/base",
     "model_module_version": "1.2.0",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "da9c231ee67047fb89073c95326b72a5": {
     "model_module": "@jupyter-widgets/controls",
     "model_module_version": "1.5.0",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_1b9cecf5b3584fba8258a81d4279a25b",
      "placeholder": "​",
      "style": "IPY_MODEL_039b53f2702c4179af7e0548018d0588",
      "value": " 164/164 [05:08&lt;00:00,  1.86s/it]"
     }
    },
    "state": {}
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}


================================================
FILE: notebooks/Multilingual_ASR.ipynb
================================================
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "v5hvo8QWN-a9"
      },
      "source": [
        "# Installing Whisper\n",
        "\n",
        "The commands below will install the Python packages needed to use Whisper models and evaluate the transcription results."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "ZsJUxc0aRsAf"
      },
      "outputs": [],
      "source": [
        "! pip install git+https://github.com/openai/whisper.git"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "3CqtR2Fi5-vP"
      },
      "outputs": [],
      "source": [
        "import io\n",
        "import os\n",
        "import numpy as np\n",
        "\n",
        "try:\n",
        "    import tensorflow  # required in Colab to avoid protobuf compatibility issues\n",
        "except ImportError:\n",
        "    pass\n",
        "\n",
        "import torch\n",
        "import pandas as pd\n",
        "import urllib\n",
        "import tarfile\n",
        "import whisper\n",
        "import torchaudio\n",
        "\n",
        "from scipy.io import wavfile\n",
        "from tqdm.notebook import tqdm\n",
        "\n",
        "\n",
        "pd.options.display.max_rows = 100\n",
        "pd.options.display.max_colwidth = 1000\n",
        "DEVICE = \"cuda\" if torch.cuda.is_available() else \"cpu\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1IMEkgyagYto"
      },
      "source": [
        "# Loading the Fleurs dataset\n",
        "\n",
        "Select the language of the Fleur dataset to download. Please note that the transcription and translation performance varies widely depending on the language. Appendix D.2 in the paper contains the performance breakdown by language."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 49,
          "referenced_widgets": [
            "9cf878888ef0434b9cf5883cb29a4d3f",
            "26369a54159f4f46bb7ba89d0c66f6cb",
            "c78c2f0f2945498a93257ce00a5ee9a7"
          ]
        },
        "id": "L4lPK5106Of2",
        "outputId": "f4b56c91-41fb-4dc3-d905-3d05ce944f87"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Dropdown(description='Language:', index=39, options=(('Select language', None), ('----------', None), ('Afrika…"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "9cf878888ef0434b9cf5883cb29a4d3f"
            }
          },
          "metadata": {}
        }
      ],
      "source": [
        "import ipywidgets as widgets\n",
        "\n",
        "languages = {\"af_za\": \"Afrikaans\", \"am_et\": \"Amharic\", \"ar_eg\": \"Arabic\", \"as_in\": \"Assamese\", \"az_az\": \"Azerbaijani\", \"be_by\": \"Belarusian\", \"bg_bg\": \"Bulgarian\", \"bn_in\": \"Bengali\", \"bs_ba\": \"Bosnian\", \"ca_es\": \"Catalan\", \"cmn_hans_cn\": \"Chinese\", \"cs_cz\": \"Czech\", \"cy_gb\": \"Welsh\", \"da_dk\": \"Danish\", \"de_de\": \"German\", \"el_gr\": \"Greek\", \"en_us\": \"English\", \"es_419\": \"Spanish\", \"et_ee\": \"Estonian\", \"fa_ir\": \"Persian\", \"fi_fi\": \"Finnish\", \"fil_ph\": \"Tagalog\", \"fr_fr\": \"French\", \"gl_es\": \"Galician\", \"gu_in\": \"Gujarati\", \"ha_ng\": \"Hausa\", \"he_il\": \"Hebrew\", \"hi_in\": \"Hindi\", \"hr_hr\": \"Croatian\", \"hu_hu\": \"Hungarian\", \"hy_am\": \"Armenian\", \"id_id\": \"Indonesian\", \"is_is\": \"Icelandic\", \"it_it\": \"Italian\", \"ja_jp\": \"Japanese\", \"jv_id\": \"Javanese\", \"ka_ge\": \"Georgian\", \"kk_kz\": \"Kazakh\", \"km_kh\": \"Khmer\", \"kn_in\": \"Kannada\", \"ko_kr\": \"Korean\", \"lb_lu\": \"Luxembourgish\", \"ln_cd\": \"Lingala\", \"lo_la\": \"Lao\", \"lt_lt\": \"Lithuanian\", \"lv_lv\": \"Latvian\", \"mi_nz\": \"Maori\", \"mk_mk\": \"Macedonian\", \"ml_in\": \"Malayalam\", \"mn_mn\": \"Mongolian\", \"mr_in\": \"Marathi\", \"ms_my\": \"Malay\", \"mt_mt\": \"Maltese\", \"my_mm\": \"Myanmar\", \"nb_no\": \"Norwegian\", \"ne_np\": \"Nepali\", \"nl_nl\": \"Dutch\", \"oc_fr\": \"Occitan\", \"pa_in\": \"Punjabi\", \"pl_pl\": \"Polish\", \"ps_af\": \"Pashto\", \"pt_br\": \"Portuguese\", \"ro_ro\": \"Romanian\", \"ru_ru\": \"Russian\", \"sd_in\": \"Sindhi\", \"sk_sk\": \"Slovak\", \"sl_si\": \"Slovenian\", \"sn_zw\": \"Shona\", \"so_so\": \"Somali\", \"sr_rs\": \"Serbian\", \"sv_se\": \"Swedish\", \"sw_ke\": \"Swahili\", \"ta_in\": \"Tamil\", \"te_in\": \"Telugu\", \"tg_tj\": \"Tajik\", \"th_th\": \"Thai\", \"tr_tr\": \"Turkish\", \"uk_ua\": \"Ukrainian\", \"ur_pk\": \"Urdu\", \"uz_uz\": \"Uzbek\", \"vi_vn\": \"Vietnamese\", \"yo_ng\": \"Yoruba\"}\n",
        "selection = widgets.Dropdown(\n",
        "    options=[(\"Select language\", None), (\"----------\", None)] + sorted([(f\"{v} ({k})\", k) for k, v in languages.items()]),\n",
        "    value=\"ko_kr\",\n",
        "    description='Language:',\n",
        "    disabled=False,\n",
        ")\n",
        "\n",
        "selection"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "4eihI6oK6Of2",
        "outputId": "064878a2-569c-4914-e92c-94280ce13dad"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Selected language: Korean (ko_kr)\n"
          ]
        }
      ],
      "source": [
        "lang = selection.value\n",
        "language = languages[lang]\n",
        "\n",
        "assert lang is not None, \"Please select a language\"\n",
        "print(f\"Selected language: {language} ({lang})\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "GuCCB2KYOJCE"
      },
      "outputs": [],
      "source": [
        "def download(url: str, target_path: str):\n",
        "    with urllib.request.urlopen(url) as source, open(target_path, \"wb\") as output:\n",
        "        with tqdm(total=int(source.info().get(\"Content-Length\")), ncols=80, unit='iB', unit_scale=True, unit_divisor=1024) as loop:\n",
        "            while True:\n",
        "                buffer = source.read(8192)\n",
        "                if not buffer:\n",
        "                    break\n",
        "\n",
        "                output.write(buffer)\n",
        "                loop.update(len(buffer))\n",
        "\n",
        "\n",
        "class Fleurs(torch.utils.data.Dataset):\n",
        "    \"\"\"\n",
        "    A simple class to wrap Fleurs and subsample a portion of the dataset as needed.\n",
        "    \"\"\"\n",
        "    def __init__(self, lang, split=\"test\", subsample_rate=1, device=DEVICE):\n",
        "        url = f\"https://storage.googleapis.com/xtreme_translations/FLEURS102/{lang}.tar.gz\"\n",
        "        tar_path = os.path.expanduser(f\"~/.cache/fleurs/{lang}.tgz\")\n",
        "        os.makedirs(os.path.dirname(tar_path), exist_ok=True)\n",
        "\n",
        "        if not os.path.exists(tar_path):\n",
        "            download(url, tar_path)\n",
        "\n",
        "        all_audio = {}\n",
        "        with tarfile.open(tar_path, \"r:gz\") as tar:\n",
        "            for member in tar.getmembers():\n",
        "                name = member.name\n",
        "                if name.endswith(f\"{split}.tsv\"):\n",
        "                    labels = pd.read_table(tar.extractfile(member), names=(\"id\", \"file_name\", \"raw_transcription\", \"transcription\", \"_\", \"num_samples\", \"gender\"))\n",
        "\n",
        "                if f\"/{split}/\" in name and name.endswith(\".wav\"):\n",
        "                    audio_bytes = tar.extractfile(member).read()\n",
        "                    all_audio[os.path.basename(name)] = wavfile.read(io.BytesIO(audio_bytes))[1]                    \n",
        "\n",
        "        self.labels = labels.to_dict(\"records\")[::subsample_rate]\n",
        "        self.all_audio = all_audio\n",
        "        self.device = device\n",
        "\n",
        "    def __len__(self):\n",
        "        return len(self.labels)\n",
        "\n",
        "    def __getitem__(self, item):\n",
        "        record = self.labels[item]\n",
        "        audio = torch.from_numpy(self.all_audio[record[\"file_name\"]].copy())\n",
        "        text = record[\"transcription\"]\n",
        "        \n",
        "        return (audio, text)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "-YcRU5jqNqo2"
      },
      "outputs": [],
      "source": [
        "dataset = Fleurs(lang, subsample_rate=10)  # subsample 10% of the dataset for a quick demo"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0ljocCNuUAde"
      },
      "source": [
        "# Running inference on the dataset using a medium Whisper model\n",
        "\n",
        "The following will take a few minutes to transcribe and translate utterances in the dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_PokfNJtOYNu",
        "outputId": "5ace74d4-ff20-4830-fe78-958881ec3905"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Model is multilingual and has 762,321,920 parameters.\n"
          ]
        }
      ],
      "source": [
        "model = whisper.load_model(\"medium\")\n",
        "print(\n",
        "    f\"Model is {'multilingual' if model.is_multilingual else 'English-only'} \"\n",
        "    f\"and has {sum(np.prod(p.shape) for p in model.parameters()):,} parameters.\"\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "F74Yfr696Of5"
      },
      "outputs": [],
      "source": [
        "options = dict(language=language, beam_size=5, best_of=5)\n",
        "transcribe_options = dict(task=\"transcribe\", **options)\n",
        "translate_options = dict(task=\"translate\", **options)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 49,
          "referenced_widgets": [
            "8cdc3b2a910748e98327462922dc008a",
            "32bd3d1217f24bb8a5e2f853633098d8",
            "dd93a997785a41568a8aba9cf5c0d83a",
            "956741460c504706aa097058dbc6eecf",
            "e4024c536d594ea2be54f471eacd485f",
            "f8eb2f7fc8c1400bb8dc351ea7fa6cfa",
            "76677587cf184477bafcc9d5459b5767",
            "50a75e807f144f438032a8fcdb9cdbe0",
            "dafffcc9b35c4bca95f19079d7c8be60",
            "73a0e8df4bb940bc8feae14e0a66d9c5",
            "ccdbe78adf2f4011946e290c81bd1a51"
          ]
        },
        "id": "7OWTn_KvNk59",
        "outputId": "7fc0731d-fba1-42da-8145-387d280f4bb1",
        "scrolled": false
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "  0%|          | 0/39 [00:00<?, ?it/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "8cdc3b2a910748e98327462922dc008a"
            }
          },
          "metadata": {}
        }
      ],
      "source": [
        "references = []\n",
        "transcriptions = []\n",
        "translations = []\n",
        "\n",
        "for audio, text in tqdm(dataset):\n",
        "    transcription = model.transcribe(audio, **transcribe_options)[\"text\"]\n",
        "    translation = model.transcribe(audio, **translate_options)[\"text\"]\n",
        "    \n",
        "    transcriptions.append(transcription)\n",
        "    translations.append(translation)\n",
        "    references.append(text)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "4nTyynELQ42j",
        "outputId": "c883e334-67bd-462a-89d9-8c011fe42a09",
        "scrolled": false
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                                                                                                                       reference  \\\n",
              "0                                                                                  재입국 충격은 신혼 단계가 없기 때문에 문화 충격보다 빠르게 발생하며 더 오래 지속하고 더 극심할 수 있습니다   \n",
              "1                                                                                         불행히도 운전자의 행동이 100% 확실하다고 예측할 수 없기 때문에 교통 흐름을 연구하기란 어렵다   \n",
              "2                                                                                         합금은 기본적으로 금속 두 개 이상의 혼합물이다 주기율표 상에 원소가 많이 있다는 것을 잊지 마라   \n",
              "3                                                                                          홍콩 섬은 홍콩 영토에 홍콩이라는 이름을 부여한 곳이자 많은 관광객들이 주안점으로 여기는 곳이다   \n",
              "4                                                                                남자들을 단호히 거절하고 자신의 입장을 고수하는 것을 두려워하지 말라문화적 차이든 아니든 희롱은 간과될 수 없다!   \n",
              "5                                                                                        좀 더 작은 섬의 대부분은 독립 국가이거나 프랑스와 관련이 있고 호화로운 비치 리조트로 알려져 있다   \n",
              "6                                                                           사실 이게 존재한다는 사실을 아는 사람이 있다고 해도 찾는 것조차 쉽지 않습니다 일단 동굴 안에 들어가면 완전한 고립입니다   \n",
              "7                                                                                                     원단이 너무 뜨거워지지 않도록 주의하세요원단이 수축하거나 그을릴 수 있습니다   \n",
              "8                                                                           도널드 트럼프 미국 대통령은 늦은 일요일 시간대에 대변인을 통해 발표한 성명에서 미군이 시리아에서 철수할 것이라고 발표했다   \n",
              "9                                                         미국 체조 협회는 미국 올림픽 위원회의 서한을 지지하며 미국의 전 운동선수가 안전한 환경을 가질 수 있도록 돕는 올림픽 관계자의 절대적 필요성을 인정합니다   \n",
              "10  \"이건 작별 인사가 아닙니다. 이것은 한 장의 끝이며 새로운 장의 시작입니다.\"\\t \" \" 이 건 | 작 별 | 인 사 가 | 아 닙 니 다 . | 이 것 은 | 한 | 장 의 | 끝 이 며 | 새 로 운 | 장 의 | 시 작 입 니 다 . \" \" |   \n",
              "11                                                                    남아프리카 공화국의 특정 공원 또는 남아프리카 공화국 국립공원 전체에 입장할 수 있는 와일드 카드를 구매하는 것이 이득일 수 있습니다   \n",
              "12                                                                                   높은 고도나 산길을 넘어 운전하려고 하면 눈 빙판 또는 영하의 기온 등이 생길 가능성에 대해 고려해야 한다   \n",
              "13                                                                                       이 이야기에 관계된 두 명의 성인 자녀를 둔 기혼자인 듀발은 밀러에게 깊은 인상을 남기지 않았습니다   \n",
              "14                                                                                                  찬드라 셰카르 솔랑키 경찰서장은 피고인이 얼굴을 가린 채 법정에 나왔다고 말했다   \n",
              "15                                            지하철의 정규 안내 방송은 카탈로니아어로만 제공되나 계획에 없던 운행 중단 시에는 자동 시스템을 통해 스페인어 영어 프랑스어 아라비아어 일본어를 포함한 다양한 언어로 방송됩니다   \n",
              "16                                                                                                            그는 그 소문을 정치적 수다와 어리석음\"\"이라고 언급했습니다.   \n",
              "17                                                                                                    지구가 마치 움직이지 않는 것처럼 느껴지기에 언뜻 보면 맞는 말 같기도 하죠   \n",
              "18                                                                                               어린이와 노약자를 포함한 6명의 인질들은 필리핀 사진작가들과 마찬가지로 일찍 풀려났다   \n",
              "19                                                                                                           같은 줄을 따라 남자는 무릎을 덮는 바지를 입을 것을 요구받는다   \n",
              "20                                                                                                               그것은 기차와 자동차 및 여러 교통수단을 가져다주었습니다   \n",
              "21                                                               모로코 술탄은 다루 이 바드야daru l-badya라는 도시로 재건축하고 여기에 무역 거점을 세운 스페인 상인들은 이곳을 카사블랑카라고 불렀다   \n",
              "22                                                                                         비록 대부분 이론에 의존하긴 해도 프로그램은 궁수자리 은하 관측을 시뮬레이션하도록 짜여 있습니다   \n",
              "23                                                                                        아동이 위탁 보호를 받게 되는 이유는 방치에서 학대 심지어 강탈에 이르기까지 광범위하게 다양합니다   \n",
              "24                                                                                                             뇌 병리와 행동 사이의 상관관계가 과학자들의 연구를 돕습니다   \n",
              "25                                                                                            사람들은 귀국길에 오르는 여행객들에게도 인내심과 이해심이 필요하다고 생각하지 않을 수 있다   \n",
              "26                                                                                                             교전이 발발한 직후 영국은 독일에 대한 해상 봉쇄를 시작한다   \n",
              "27                                                                                               이렇게 되면 플레이어들이 장치를 허공에서 움직여 동작과 움직임을 제어할 수 있게 된다   \n",
              "28                                                                    스펙트럼의 또 다른 끝에서는 팀이 해오던 모든 것을 바꾸고 자신만의 방식으로 만들어야 한다고 생각하는 인지하지 못한 개인으로 바뀝니다   \n",
              "29                                                                                       피라미드 사운드와 광선 쇼는 이 지역에서 어린이들의 흥미를 가장 많이 끌어들이는 것들 중 하나입니다   \n",
              "30                                                                                                          장면들이 피라미드들 위에 비쳤고 다른 피라미드들에는 불이 밝혀졌다   \n",
              "31                                                                                              이곳은 남아프리카의 명소 중 하나로 남아프리카 국립 공원sanparks의 플래그십입니다   \n",
              "32                                                              아직도 존재하는 것으로 알려진 25개의 던랩 브로드 사이드는 이 문서의 남아있는 가장 오래된 사본이다 손으로 쓴 원본은 더 이상 남아 있지 않다   \n",
              "33                                                                                 상트페테르부르크 크루즈는 시내에서의 시간이 포함됩니다. 크루즈 승객은 비자 요건에서 제외됩니다약관을 확인하세요   \n",
              "34                                                블로그는 학생들의 글쓰기를 개선하는 데에도 도움이 됩니다 학생들이 종종 부족한 문법과 철자법으로 블로그를 시작하지만 독자들의 존재가 일반적으로 그러한 점들을 변화시킵니다   \n",
              "35                                                                                       델 포트로가 2세트에서 먼저 어드밴티지를 얻었지만 6 대 6이 된 후 다시 타이 브레이크가 필요했다   \n",
              "36                                                                                                                  이것은 몇몇 동사와 사물을 구별하는 중요한 방법이다   \n",
              "37                                                                                교회의 핵심 권력은 천 년 넘게 로마에 머물렀다 이런 힘과 돈의 편중은 이 교리가 합당한지에 대한 의문을 낳았다   \n",
              "38                                                                                 보고서는 행정부의 현재 이라크 정책에 대해 거의 모든 측면에서 매우 비판적이며 즉각적인 방향 변경을 촉구합니다   \n",
              "\n",
              "                                                                                               transcription  \\\n",
              "0                                             재입국 충격은 신혼 단계가 없기 때문에 문화 충격보다 빠르게 발생하며 더 오래 지속하고 더 극심할 수 있습니다.   \n",
              "1                                                    불행히도 운전자의 행동이 100% 확실하다고 예측할 수 없기 때문에 교통 흐름을 연구하기란 어렵다.   \n",
              "2                                                     합금은 기본적으로 금속 2개 이상의 혼합물이다. 주기율표상의 원소가 많이 있다는 것을 잊지 마라.   \n",
              "3                                                      홍콩섬은 홍콩 영토에 홍콩이라는 이름을 부여한 곳이자 많은 관광객들이 주한점으로 여기는 곳이다.   \n",
              "4                                         남자들을 단호히 거절하고 자신의 입장을 고수하는 것을 두려워하지 말라. 문화적 차이든 아니든 희롱은 간과 될 수 없다.   \n",
              "5                                                   좀 더 작은 섬의 대부분은 독립국가이거나 프랑스와 관련이 있고, 호화로운 비치 리조트로 알려져 있다.   \n",
              "6                                      사실 이게 존재한다는 사실을 아는 사람이 있다고 해도 찾는 것조차 쉽지 않습니다. 일단 동그란에 들어가면 완전한 고립입니다.   \n",
              "7                                                              원단이 너무 뜨거워지지 않도록 주의하세요. 원단이 수축하거나 그을릴 수 있습니다.   \n",
              "8                                      도널드 트럼프 미국 대통령은 늦은 일요일 시간대에 대변인을 통해 발표한 성명에서 미군이 시리아에서 철수할 것이라고 발표했다.   \n",
              "9                      미국 체조협회는 미국 올림픽위원회의 서한을 지지하며 미국의 전 운동선수가 안전한 환경을 가질 수 있도록 돕는 올림픽 관계자의 절대적 필요성을 인정합니다.   \n",
              "10                                                                  이건 작별인사가 아닙니다. 이것은 한작의 끝이며 새로운 작의 시작입니다.   \n",
              "11                                남아프리카 공화국의 특정 공원 또는 남아프리카 공화국 국립공원 전체에 입장할 수 있는 와일드카드를 구매하는 것이 이득일 수 있습니다.   \n",
              "12                                             높은 고도나 산길을 넘어 운전하려고 하면 눈, 빙판 또는 영하의 기온 등이 생길 가능성에 대해 고려해야 한다.   \n",
              "13                                                  이 이야기에 관계된 두 명의 성인 자녀를 둔 기혼자인 듀발은 밀러에게 깊은 임상을 남기지 않았습니다.   \n",
              "14                                                            찬드라 시에카르 솔랑키 경찰서장은 피고인이 얼굴을 가린 채 법적에 나왔다고 말했다.   \n",
              "15   지하철의 정규 안내 방송은 카탈로니아어로만 제공되나 계획에 없던 운행 중단 시에는 자동 시스템을 통해 스페인어, 영어, 프랑스어, 아라비아어, 일본어를 포함한 다양한 언어로 방송됩니다.   \n",
              "16                                                                          그는 그 소문을 정치적 수다와 어리석음이라고 언급했습니다.   \n",
              "17                                                                지구가 마치 움직이지 않은 것처럼 느껴지기에 언특보면 맞는 말 같기도 하죠?   \n",
              "18                                                          어린이와 노약자를 포함한 6명의 인질들은 필리핀 사진작가들과 마찬가지로 일찍 풀려났다.   \n",
              "19                                                                      같은 줄을 따라 남자는 무릎을 덮는 바지를 입을 것을 요구받는다.   \n",
              "20                                                                         그것은 기차와 자동차 및 여러 교통수단을 가져다 주었습니다.   \n",
              "21                                       모로코 슬타는 다루이 바드야라는 도시로 재건축하고 여기에 무역 거점을 세운 스페인 상인들은 이곳을 카사블랑카라고 불렀다.   \n",
              "22                                                    비록 대부분 이론에 의존하긴 해도, 프로그램은 궁수자리 은하관측을 시뮬레이션하도록 짜여 있습니다.   \n",
              "23                                                   아동이 위탁보호를 받게 되는 이유는 방치에서 학대, 심지어 강탈에 이르기까지 광범위하게 다양합니다.   \n",
              "24                                                                          뇌병리와 행동사이의 상관관계가 과학자들의 연구를 돕습니다.   \n",
              "25                                                       사람들은 귀국길에 오르는 여행객들에게도 인내심과 이해심이 필요하다고 생각하지 않을 수 있다.   \n",
              "26                                                                         교전이 발발한 직후 영국은 독일에 대한 해상붕쇄를 시작한다.   \n",
              "27                                                          이렇게 되면 플레이어들이 장치를 허공해서 움직여 동작과 움직임을 제어할 수 있게 된다.   \n",
              "28                               스펙트럼의 또 다른 끝에서는 팀이 해오던 모든 것을 바꾸고 자신만의 방식으로 만들어야 한다고 생각하는 인지하지 못한 개인으로 바뀝니다.   \n",
              "29                                                   피라미드 사운드와 광선쇼는 이 지역에서 어린이들의 흥미를 가장 많이 끌어들이는 것들 중 하나입니다.   \n",
              "30                                                                     장면들이 피라미드들 위에 비쳤고 다른 피라미드들에는 불이 밝혀졌다.   \n",
              "31                                                                  이곳은 남아프리카의 명소 중 하나로 남아프리카 국립공원의 플래그십입니다.   \n",
              "32                         아직도 존재하는 것으로 알려진 25개의 덜랩 브로드 사이드는 이 문서에 남아있는 가장 오래된 사번이다. 손으로 쓴 원보는 더 이상 남아있지 않다.   \n",
              "33                                         상트 페테르브르크 크루즈는 시내에서의 시간이 포함됩니다. 크루즈 승객은 비자 요건에서 제외됩니다. 약관을 확인하세요.   \n",
              "34          블로그는 학생들의 글쓰기를 개선하는 데에도 도움이 됩니다. 학생들이 종종 부족한 문법과 철자법으로 블로그를 시작하지만 독자들의 존재가 일반적으로 그러한 점들을 변화시킵니다.   \n",
              "35                                                      델포트로가 2세트에서 먼저 어드벤티즈를 얻었지만 6대6이 된 후 다시 타이브레이크가 필요했다.   \n",
              "36                                                                             이것은 몇몇 동사와 사물을 구별하는 중요한 방법이다.   \n",
              "37                                       교회의 핵심 권력은 1000년 넘게 로마에 머물렀다. 이러한 힘과 돈의 편중은 이 교리가 합당한지에 대한 의문을 낳았다.   \n",
              "38                                             보고서는 행정부의 현재 이라크 정책에 대해 거의 모든 측면에서 매우 비판적이며 즉각적인 방향변경을 촉구합니다.   \n",
              "\n",
              "                                                                                                                                                             translation  \n",
              "0                                The re-entry shock does not have a honeymoon stage, so it occurs faster than the cultural shock and can last longer and be more severe.  \n",
              "1                                                         Unfortunately, it is difficult to study the flow of traffic because the driver's behavior is not 100% certain.  \n",
              "2                                          Don't forget that there are more than two metals in the alloy. Don't forget that there are more than two metals in the alloy.  \n",
              "3                                                   Hong Kong Island is a place named Hong Kong on Hong Kong territory, and it is the main destination of many tourists.  \n",
              "4                                             Do not be afraid to reject men and defend your position. Whether it is a cultural difference or not, it cannot be ignored.  \n",
              "5                                                            Most of the smaller islands are independent or related to France, and are known as luxurious beach resorts.  \n",
              "6                           In fact, even if there are people who know that this exists, it is not easy to find it. Once you enter the cave, it is a complete isolation.  \n",
              "7                                                                                        Be careful not to heat the fabric too much. The fabric may shrink or get dirty.  \n",
              "8                                                        U.S. President Donald Trump announced on Sunday that the U.S. military would withdraw from Syria in his speech.  \n",
              "9                                  The U.S. Sports Association supports the call of the U.S. Olympic Committee to help all athletes in the U.S. have a safe environment.  \n",
              "10                                                                                  This is not a farewell. This is the end of one work and the beginning of a new work.  \n",
              "11                                             It may be beneficial to purchase a wild card that allows you to enter a specific park or a national park of South Africa.  \n",
              "12                          If you want to drive over high altitude or mountain roads, you have to consider the possibility of snow, ice, or the temperature of a movie.  \n",
              "13                                                                   Duval, who had two adult children related to this story, did not leave a deep impression on Miller.  \n",
              "14                                                            The police chief of Chandra Shekhar Solanki said that the defendant covered his face and came out legally.  \n",
              "15                     Subway's regular announcement broadcast is only available in Catalan, but it will be broadcast in Spanish, English, French, Arabic, and Japanese.  \n",
              "16                                                                                                             He mentioned the rumor as political talk and foolishness.  \n",
              "17                                                                                                     It feels as if the earth is not moving, and it seems to be right.  \n",
              "18                                                             Children and old people, including 6 people, were released early, just like the Philippine photographers.  \n",
              "19                                                                                              The man is asked to wear pants that cover his knees along the same line.  \n",
              "20                                                                                                           It was given by trains, cars, and other means of transport.  \n",
              "21                         Morocco Sultan was re-constructed as a city called Darui Badia The Spanish merchants who built a trade hub here called this place Casablanca.  \n",
              "22                                                                Although most of them rely on theory, the program is designed to simulate the Milky Way of the palace.  \n",
              "23                                                                            The reason why children are protected is that they can be abused, abused, and even robbed.  \n",
              "24                                                                                         The relationship between the brain and the action helps scientists' research.  \n",
              "25                                                              People may not think that patience and understanding are necessary for travelers on their way back home.  \n",
              "26                                                                                                      After the conflict, UK has a problem of remorse towards Germany.  \n",
              "27                                                                                        In this way, players can control the movement by moving the device in the air.  \n",
              "28        At the other end of the spectrum, you have to change everything your team has been doing and make it your own way. It turns into an unrecognizable individual.  \n",
              "29                                                             Pyramid Sound and Glow Show is one of the things that attracts children's interest the most in this area.  \n",
              "30                                                                                The scenes were reflected on the pyramids, and the fire was lit on the other pyramids.  \n",
              "31                                                                    This is one of the famous places in South Africa and is a flagship of South African National Park.  \n",
              "32            The 25th Dunlap Broadside, which is known to still exist, is the oldest document in this document. The original document written by hand no longer exists.  \n",
              "33                                   Saint-Petersburg Cruise includes time in the city. Cruise passengers are excluded from visa conditions. Please check your passport.  \n",
              "34   Blogs are helpful in improving students' writing skills. Students often start blogs with poor grammar and spelling, which normally can change their writing skills.  \n",
              "35                                                                               Del Potro got an advantage in the second set, but he needed a tiebreak again after 6-6.  \n",
              "36                                                                                                  This is an important method of distinguishing some verbs and things.  \n",
              "37                            The main power of the church has remained in Rome for over 1,000 years. The power of the church has remained in Rome for over 1,000 years.  \n",
              "38                                        The report is very critical of the current Iraqi policy of the administration, and calls for an immediate change of direction.  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-70dfb3e0-7b39-4f16-915d-5da1fc4ea0c2\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>reference</th>\n",
              "      <th>transcription</th>\n",
              "      <th>translation</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>재입국 충격은 신혼 단계가 없기 때문에 문화 충격보다 빠르게 발생하며 더 오래 지속하고 더 극심할 수 있습니다</td>\n",
              "      <td>재입국 충격은 신혼 단계가 없기 때문에 문화 충격보다 빠르게 발생하며 더 오래 지속하고 더 극심할 수 있습니다.</td>\n",
              "      <td>The re-entry shock does not have a honeymoon stage, so it occurs faster than the cultural shock and can last longer and be more severe.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>불행히도 운전자의 행동이 100% 확실하다고 예측할 수 없기 때문에 교통 흐름을 연구하기란 어렵다</td>\n",
              "      <td>불행히도 운전자의 행동이 100% 확실하다고 예측할 수 없기 때문에 교통 흐름을 연구하기란 어렵다.</td>\n",
              "      <td>Unfortunately, it is difficult to study the flow of traffic because the driver's behavior is not 100% certain.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>합금은 기본적으로 금속 두 개 이상의 혼합물이다 주기율표 상에 원소가 많이 있다는 것을 잊지 마라</td>\n",
              "      <td>합금은 기본적으로 금속 2개 이상의 혼합물이다. 주기율표상의 원소가 많이 있다는 것을 잊지 마라.</td>\n",
              "      <td>Don't forget that there are more than two metals in the alloy. Don't forget that there are more than two metals in the alloy.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>홍콩 섬은 홍콩 영토에 홍콩이라는 이름을 부여한 곳이자 많은 관광객들이 주안점으로 여기는 곳이다</td>\n",
              "      <td>홍콩섬은 홍콩 영토에 홍콩이라는 이름을 부여한 곳이자 많은 관광객들이 주한점으로 여기는 곳이다.</td>\n",
              "      <td>Hong Kong Island is a place named Hong Kong on Hong Kong territory, and it is the main destination of many tourists.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>남자들을 단호히 거절하고 자신의 입장을 고수하는 것을 두려워하지 말라문화적 차이든 아니든 희롱은 간과될 수 없다!</td>\n",
              "      <td>남자들을 단호히 거절하고 자신의 입장을 고수하는 것을 두려워하지 말라. 문화적 차이든 아니든 희롱은 간과 될 수 없다.</td>\n",
              "      <td>Do not be afraid to reject men and defend your position. Whether it is a cultural difference or not, it cannot be ignored.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>좀 더 작은 섬의 대부분은 독립 국가이거나 프랑스와 관련이 있고 호화로운 비치 리조트로 알려져 있다</td>\n",
              "      <td>좀 더 작은 섬의 대부분은 독립국가이거나 프랑스와 관련이 있고, 호화로운 비치 리조트로 알려져 있다.</td>\n",
              "      <td>Most of the smaller islands are independent or related to France, and are known as luxurious beach resorts.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>사실 이게 존재한다는 사실을 아는 사람이 있다고 해도 찾는 것조차 쉽지 않습니다 일단 동굴 안에 들어가면 완전한 고립입니다</td>\n",
              "      <td>사실 이게 존재한다는 사실을 아는 사람이 있다고 해도 찾는 것조차 쉽지 않습니다. 일단 동그란에 들어가면 완전한 고립입니다.</td>\n",
              "      <td>In fact, even if there are people who know that this exists, it is not easy to find it. Once you enter the cave, it is a complete isolation.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>원단이 너무 뜨거워지지 않도록 주의하세요원단이 수축하거나 그을릴 수 있습니다</td>\n",
              "      <td>원단이 너무 뜨거워지지 않도록 주의하세요. 원단이 수축하거나 그을릴 수 있습니다.</td>\n",
              "      <td>Be careful not to heat the fabric too much. The fabric may shrink or get dirty.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>도널드 트럼프 미국 대통령은 늦은 일요일 시간대에 대변인을 통해 발표한 성명에서 미군이 시리아에서 철수할 것이라고 발표했다</td>\n",
              "      <td>도널드 트럼프 미국 대통령은 늦은 일요일 시간대에 대변인을 통해 발표한 성명에서 미군이 시리아에서 철수할 것이라고 발표했다.</td>\n",
              "      <td>U.S. President Donald Trump announced on Sunday that the U.S. military would withdraw from Syria in his speech.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>미국 체조 협회는 미국 올림픽 위원회의 서한을 지지하며 미국의 전 운동선수가 안전한 환경을 가질 수 있도록 돕는 올림픽 관계자의 절대적 필요성을 인정합니다</td>\n",
              "      <td>미국 체조협회는 미국 올림픽위원회의 서한을 지지하며 미국의 전 운동선수가 안전한 환경을 가질 수 있도록 돕는 올림픽 관계자의 절대적 필요성을 인정합니다.</td>\n",
              "      <td>The U.S. Sports Association supports the call of the U.S. Olympic Committee to help all athletes in the U.S. have a safe environment.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>\"이건 작별 인사가 아닙니다. 이것은 한 장의 끝이며 새로운 장의 시작입니다.\"\\t \" \" 이 건 | 작 별 | 인 사 가 | 아 닙 니 다 . | 이 것 은 | 한 | 장 의 | 끝 이 며 | 새 로 운 | 장 의 | 시 작 입 니 다 . \" \" |</td>\n",
              "      <td>이건 작별인사가 아닙니다. 이것은 한작의 끝이며 새로운 작의 시작입니다.</td>\n",
              "      <td>This is not a farewell. This is the end of one work and the beginning of a new work.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>남아프리카 공화국의 특정 공원 또는 남아프리카 공화국 국립공원 전체에 입장할 수 있는 와일드 카드를 구매하는 것이 이득일 수 있습니다</td>\n",
              "      <td>남아프리카 공화국의 특정 공원 또는 남아프리카 공화국 국립공원 전체에 입장할 수 있는 와일드카드를 구매하는 것이 이득일 수 있습니다.</td>\n",
              "      <td>It may be beneficial to purchase a wild card that allows you to enter a specific park or a national park of South Africa.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12</th>\n",
              "      <td>높은 고도나 산길을 넘어 운전하려고 하면 눈 빙판 또는 영하의 기온 등이 생길 가능성에 대해 고려해야 한다</td>\n",
              "      <td>높은 고도나 산길을 넘어 운전하려고 하면 눈, 빙판 또는 영하의 기온 등이 생길 가능성에 대해 고려해야 한다.</td>\n",
              "      <td>If you want to drive over high altitude or mountain roads, you have to consider the possibility of snow, ice, or the temperature of a movie.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>13</th>\n",
              "      <td>이 이야기에 관계된 두 명의 성인 자녀를 둔 기혼자인 듀발은 밀러에게 깊은 인상을 남기지 않았습니다</td>\n",
              "      <td>이 이야기에 관계된 두 명의 성인 자녀를 둔 기혼자인 듀발은 밀러에게 깊은 임상을 남기지 않았습니다.</td>\n",
              "      <td>Duval, who had two adult children related to this story, did not leave a deep impression on Miller.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>14</th>\n",
              "      <td>찬드라 셰카르 솔랑키 경찰서장은 피고인이 얼굴을 가린 채 법정에 나왔다고 말했다</td>\n",
              "      <td>찬드라 시에카르 솔랑키 경찰서장은 피고인이 얼굴을 가린 채 법적에 나왔다고 말했다.</td>\n",
              "      <td>The police chief of Chandra Shekhar Solanki said that the defendant covered his face and came out legally.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>지하철의 정규 안내 방송은 카탈로니아어로만 제공되나 계획에 없던 운행 중단 시에는 자동 시스템을 통해 스페인어 영어 프랑스어 아라비아어 일본어를 포함한 다양한 언어로 방송됩니다</td>\n",
              "      <td>지하철의 정규 안내 방송은 카탈로니아어로만 제공되나 계획에 없던 운행 중단 시에는 자동 시스템을 통해 스페인어, 영어, 프랑스어, 아라비아어, 일본어를 포함한 다양한 언어로 방송됩니다.</td>\n",
              "      <td>Subway's regular announcement broadcast is only available in Catalan, but it will be broadcast in Spanish, English, French, Arabic, and Japanese.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>그는 그 소문을 정치적 수다와 어리석음\"\"이라고 언급했습니다.</td>\n",
              "      <td>그는 그 소문을 정치적 수다와 어리석음이라고 언급했습니다.</td>\n",
              "      <td>He mentioned the rumor as political talk and foolishness.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17</th>\n",
              "      <td>지구가 마치 움직이지 않는 것처럼 느껴지기에 언뜻 보면 맞는 말 같기도 하죠</td>\n",
              "      <td>지구가 마치 움직이지 않은 것처럼 느껴지기에 언특보면 맞는 말 같기도 하죠?</td>\n",
              "      <td>It feels as if the earth is not moving, and it seems to be right.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>18</th>\n",
              "      <td>어린이와 노약자를 포함한 6명의 인질들은 필리핀 사진작가들과 마찬가지로 일찍 풀려났다</td>\n",
              "      <td>어린이와 노약자를 포함한 6명의 인질들은 필리핀 사진작가들과 마찬가지로 일찍 풀려났다.</td>\n",
              "      <td>Children and old people, including 6 people, were released early, just like the Philippine photographers.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>19</th>\n",
              "      <td>같은 줄을 따라 남자는 무릎을 덮는 바지를 입을 것을 요구받는다</td>\n",
              "      <td>같은 줄을 따라 남자는 무릎을 덮는 바지를 입을 것을 요구받는다.</td>\n",
              "      <td>The man is asked to wear pants that cover his knees along the same line.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>20</th>\n",
              "      <td>그것은 기차와 자동차 및 여러 교통수단을 가져다주었습니다</td>\n",
              "      <td>그것은 기차와 자동차 및 여러 교통수단을 가져다 주었습니다.</td>\n",
              "      <td>It was given by trains, cars, and other means of transport.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>21</th>\n",
              "      <td>모로코 술탄은 다루 이 바드야daru l-badya라는 도시로 재건축하고 여기에 무역 거점을 세운 스페인 상인들은 이곳을 카사블랑카라고 불렀다</td>\n",
              "      <td>모로코 슬타는 다루이 바드야라는 도시로 재건축하고 여기에 무역 거점을 세운 스페인 상인들은 이곳을 카사블랑카라고 불렀다.</td>\n",
              "      <td>Morocco Sultan was re-constructed as a city called Darui Badia The Spanish merchants who built a trade hub here called this place Casablanca.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>22</th>\n",
              "      <td>비록 대부분 이론에 의존하긴 해도 프로그램은 궁수자리 은하 관측을 시뮬레이션하도록 짜여 있습니다</td>\n",
              "      <td>비록 대부분 이론에 의존하긴 해도, 프로그램은 궁수자리 은하관측을 시뮬레이션하도록 짜여 있습니다.</td>\n",
              "      <td>Although most of them rely on theory, the program is designed to simulate the Milky Way of the palace.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>23</th>\n",
              "      <td>아동이 위탁 보호를 받게 되는 이유는 방치에서 학대 심지어 강탈에 이르기까지 광범위하게 다양합니다</td>\n",
              "      <td>아동이 위탁보호를 받게 되는 이유는 방치에서 학대, 심지어 강탈에 이르기까지 광범위하게 다양합니다.</td>\n",
              "      <td>The reason why children are protected is that they can be abused, abused, and even robbed.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>24</th>\n",
              "      <td>뇌 병리와 행동 사이의 상관관계가 과학자들의 연구를 돕습니다</td>\n",
              "      <td>뇌병리와 행동사이의 상관관계가 과학자들의 연구를 돕습니다.</td>\n",
              "      <td>The relationship between the brain and the action helps scientists' research.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25</th>\n",
              "      <td>사람들은 귀국길에 오르는 여행객들에게도 인내심과 이해심이 필요하다고 생각하지 않을 수 있다</td>\n",
              "      <td>사람들은 귀국길에 오르는 여행객들에게도 인내심과 이해심이 필요하다고 생각하지 않을 수 있다.</td>\n",
              "      <td>People may not think that patience and understanding are necessary for travelers on their way back home.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>26</th>\n",
              "      <td>교전이 발발한 직후 영국은 독일에 대한 해상 봉쇄를 시작한다</td>\n",
              "      <td>교전이 발발한 직후 영국은 독일에 대한 해상붕쇄를 시작한다.</td>\n",
              "      <td>After the conflict, UK has a problem of remorse towards Germany.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>27</th>\n",
              "      <td>이렇게 되면 플레이어들이 장치를 허공에서 움직여 동작과 움직임을 제어할 수 있게 된다</td>\n",
              "      <td>이렇게 되면 플레이어들이 장치를 허공해서 움직여 동작과 움직임을 제어할 수 있게 된다.</td>\n",
              "      <td>In this way, players can control the movement by moving the device in the air.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>28</th>\n",
              "      <td>스펙트럼의 또 다른 끝에서는 팀이 해오던 모든 것을 바꾸고 자신만의 방식으로 만들어야 한다고 생각하는 인지하지 못한 개인으로 바뀝니다</td>\n",
              "      <td>스펙트럼의 또 다른 끝에서는 팀이 해오던 모든 것을 바꾸고 자신만의 방식으로 만들어야 한다고 생각하는 인지하지 못한 개인으로 바뀝니다.</td>\n",
              "      <td>At the other end of the spectrum, you have to change everything your team has been doing and make it your own way. It turns into an unrecognizable individual.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>29</th>\n",
              "      <td>피라미드 사운드와 광선 쇼는 이 지역에서 어린이들의 흥미를 가장 많이 끌어들이는 것들 중 하나입니다</td>\n",
              "      <td>피라미드 사운드와 광선쇼는 이 지역에서 어린이들의 흥미를 가장 많이 끌어들이는 것들 중 하나입니다.</td>\n",
              "      <td>Pyramid Sound and Glow Show is one of the things that attracts children's interest the most in this area.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>30</th>\n",
              "      <td>장면들이 피라미드들 위에 비쳤고 다른 피라미드들에는 불이 밝혀졌다</td>\n",
              "      <td>장면들이 피라미드들 위에 비쳤고 다른 피라미드들에는 불이 밝혀졌다.</td>\n",
              "      <td>The scenes were reflected on the pyramids, and the fire was lit on the other pyramids.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>31</th>\n",
              "      <td>이곳은 남아프리카의 명소 중 하나로 남아프리카 국립 공원sanparks의 플래그십입니다</td>\n",
              "      <td>이곳은 남아프리카의 명소 중 하나로 남아프리카 국립공원의 플래그십입니다.</td>\n",
              "      <td>This is one of the famous places in South Africa and is a flagship of South African National Park.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>32</th>\n",
              "      <td>아직도 존재하는 것으로 알려진 25개의 던랩 브로드 사이드는 이 문서의 남아있는 가장 오래된 사본이다 손으로 쓴 원본은 더 이상 남아 있지 않다</td>\n",
              "      <td>아직도 존재하는 것으로 알려진 25개의 덜랩 브로드 사이드는 이 문서에 남아있는 가장 오래된 사번이다. 손으로 쓴 원보는 더 이상 남아있지 않다.</td>\n",
              "      <td>The 25th Dunlap Broadside, which is known to still exist, is the oldest document in this document. The original document written by hand no longer exists.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>33</th>\n",
              "      <td>상트페테르부르크 크루즈는 시내에서의 시간이 포함됩니다. 크루즈 승객은 비자 요건에서 제외됩니다약관을 확인하세요</td>\n",
              "      <td>상트 페테르브르크 크루즈는 시내에서의 시간이 포함됩니다. 크루즈 승객은 비자 요건에서 제외됩니다. 약관을 확인하세요.</td>\n",
              "      <td>Saint-Petersburg Cruise includes time in the city. Cruise passengers are excluded from visa conditions. Please check your passport.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>34</th>\n",
              "      <td>블로그는 학생들의 글쓰기를 개선하는 데에도 도움이 됩니다 학생들이 종종 부족한 문법과 철자법으로 블로그를 시작하지만 독자들의 존재가 일반적으로 그러한 점들을 변화시킵니다</td>\n",
              "      <td>블로그는 학생들의 글쓰기를 개선하는 데에도 도움이 됩니다. 학생들이 종종 부족한 문법과 철자법으로 블로그를 시작하지만 독자들의 존재가 일반적으로 그러한 점들을 변화시킵니다.</td>\n",
              "      <td>Blogs are helpful in improving students' writing skills. Students often start blogs with poor grammar and spelling, which normally can change their writing skills.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>35</th>\n",
              "      <td>델 포트로가 2세트에서 먼저 어드밴티지를 얻었지만 6 대 6이 된 후 다시 타이 브레이크가 필요했다</td>\n",
              "      <td>델포트로가 2세트에서 먼저 어드벤티즈를 얻었지만 6대6이 된 후 다시 타이브레이크가 필요했다.</td>\n",
              "      <td>Del Potro got an advantage in the second set, but he needed a tiebreak again after 6-6.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>36</th>\n",
              "      <td>이것은 몇몇 동사와 사물을 구별하는 중요한 방법이다</td>\n",
              "      <td>이것은 몇몇 동사와 사물을 구별하는 중요한 방법이다.</td>\n",
              "      <td>This is an important method of distinguishing some verbs and things.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>37</th>\n",
              "      <td>교회의 핵심 권력은 천 년 넘게 로마에 머물렀다 이런 힘과 돈의 편중은 이 교리가 합당한지에 대한 의문을 낳았다</td>\n",
              "      <td>교회의 핵심 권력은 1000년 넘게 로마에 머물렀다. 이러한 힘과 돈의 편중은 이 교리가 합당한지에 대한 의문을 낳았다.</td>\n",
              "      <td>The main power of the church has remained in Rome for over 1,000 years. The power of the church has remained in Rome for over 1,000 years.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>38</th>\n",
              "      <td>보고서는 행정부의 현재 이라크 정책에 대해 거의 모든 측면에서 매우 비판적이며 즉각적인 방향 변경을 촉구합니다</td>\n",
              "      <td>보고서는 행정부의 현재 이라크 정책에 대해 거의 모든 측면에서 매우 비판적이며 즉각적인 방향변경을 촉구합니다.</td>\n",
              "      <td>The report is very critical of the current Iraqi policy of the administration, and calls for an immediate change of direction.</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-70dfb3e0-7b39-4f16-915d-5da1fc4ea0c2')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-70dfb3e0-7b39-4f16-915d-5da1fc4ea0c2 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-70dfb3e0-7b39-4f16-915d-5da1fc4ea0c2');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ]
          },
          "metadata": {},
          "execution_count": 10
        }
      ],
      "source": [
        "data = pd.DataFrame(dict(reference=references, transcription=transcriptions, translation=translations))\n",
        "data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "83LRfd1plYEb"
      },
      "source": [
        "# Word-level timestamps using attention weights\n",
        "\n",
        "Below, we use the cross-attention weights to determine more granular, word-level timestamps. It uses a set of heuristics and dynamic time warping (DTW) to find the alignment between the audio and the transcript."
      ]
    },
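
For readers who only need the end result, this alignment machinery later shipped in the library itself: whisper/transcribe.py accepts word_timestamps=True and calls the routines in whisper/timing.py. A minimal sketch of that packaged path, assuming an audio file at the placeholder path audio.wav:

    import whisper

    model = whisper.load_model("medium")  # any released checkpoint works here
    result = model.transcribe("audio.wav", word_timestamps=True)
    for segment in result["segments"]:
        for word in segment["words"]:  # each word carries start/end times in seconds
            print(f"{word['start']:6.2f} -> {word['end']:6.2f} {word['word']}")

The cells below instead rebuild the alignment step by step, which keeps the intermediate cross-attention weights available for inspection and plotting.
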
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KGtZWNaQlOVC",
        "outputId": "8a03465b-d4c9-40ca-a959-0454a7a8118a"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Requirement already satisfied: dtw-python in /usr/local/lib/python3.8/dist-packages (1.3.0)\n",
            "Requirement already satisfied: scipy>=1.1 in /usr/local/lib/python3.8/dist-packages (from dtw-python) (1.7.3)\n",
            "Requirement already satisfied: numpy>=1.19 in /usr/local/lib/python3.8/dist-packages (from dtw-python) (1.21.6)\n"
          ]
        }
      ],
      "source": [
        "! pip install dtw-python"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "id": "3HTv8KmzlZtc",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "63f96356-106d-4404-aa48-885e2ca1f7db"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Importing the dtw module. When using in academic works please cite:\n",
            "  T. Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package.\n",
            "  J. Stat. Soft., doi:10.18637/jss.v031.i07.\n",
            "\n"
          ]
        }
      ],
      "source": [
        "import string\n",
        "import matplotlib.pyplot as plt\n",
        "import matplotlib.font_manager as fm\n",
        "import matplotlib.ticker as ticker\n",
        "\n",
        "from IPython.display import display, HTML\n",
        "from whisper.tokenizer import get_tokenizer\n",
        "from dtw import dtw\n",
        "from scipy.ndimage import median_filter\n",
        "\n",
        "%matplotlib inline\n",
        "%config InlineBackend.figure_format = \"retina\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "id": "jqSiARgqlb6X"
      },
      "outputs": [],
      "source": [
        "AUDIO_SAMPLES_PER_TOKEN = whisper.audio.HOP_LENGTH * 2\n",
        "AUDIO_TIME_PER_TOKEN = AUDIO_SAMPLES_PER_TOKEN / whisper.audio.SAMPLE_RATE\n",
        "\n",
        "medfilt_width = 7\n",
        "qk_scale = 1.0\n",
        "\n",
        "tokenizer = get_tokenizer(model.is_multilingual, language=languages[lang])"
      ]
    },
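
As a quick check of the constants above: HOP_LENGTH is 160 and SAMPLE_RATE is 16000 in whisper/audio.py, and the audio encoder downsamples the mel spectrogram by a factor of two, so each attention frame covers 320 samples, i.e. 20 ms:

    from whisper.audio import HOP_LENGTH, SAMPLE_RATE

    samples_per_token = HOP_LENGTH * 2            # 160 * 2 = 320 samples per encoder frame
    time_per_token = samples_per_token / SAMPLE_RATE
    print(samples_per_token, time_per_token)      # 320 0.02
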
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "id": "ZTVen0YkldgU"
      },
      "outputs": [],
      "source": [
        "# This part downloads a repackaged version of the Noto Sans font (either CJK or non-CJK)\n",
        "# to render various languages in Matplotlib figures.\n",
        "\n",
        "if languages[lang] in {\"Chinese\", \"Japanese\", \"Korean\"}:\n",
        "    font = \"GoNotoCJKCore.ttf\"\n",
        "else:\n",
        "    font = \"GoNotoCurrent.ttf\"\n",
        "\n",
        "font_release = \"https://github.com/satbyy/go-noto-universal/releases/download/v5.2\"\n",
        "if not os.path.exists(font):\n",
        "    download(f\"{font_release}/{font}\", font)\n",
        "\n",
        "prop = fm.FontProperties(fname=font)\n",
        "props = {'fontproperties': prop}"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "id": "UiOFv8X5lhQA"
      },
      "outputs": [],
      "source": [
        "def split_tokens_on_unicode(tokens: torch.Tensor):\n",
        "    words = []\n",
        "    word_tokens = []\n",
        "    current_tokens = []\n",
        "    \n",
        "    for token in tokens.tolist():\n",
        "        current_tokens.append(token)\n",
        "        decoded = tokenizer.decode_with_timestamps(current_tokens)\n",
        "        if \"\\ufffd\" not in decoded:\n",
        "            words.append(decoded)\n",
        "            word_tokens.append(current_tokens)\n",
        "            current_tokens = []\n",
        "    \n",
        "    return words, word_tokens"
      ]
    },
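
The check for "\ufffd" above works because a single multi-byte character (one Hangul syllable, for instance) can be split across several byte-level BPE tokens, each of which is not valid UTF-8 on its own; the replacement character only disappears once the group of tokens is complete. A small illustration, assuming the packaged multilingual tokenizer and an arbitrary sample string:

    from whisper.tokenizer import get_tokenizer

    tok = get_tokenizer(multilingual=True)
    ids = tok.encode("안녕")                       # one syllable may span several tokens
    print([tok.decode([i]) for i in ids])          # partial pieces typically show "\ufffd"
    print(tok.decode(ids))                         # the complete group decodes cleanly
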
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "id": "CkhsL9xUmHjt"
      },
      "outputs": [],
      "source": [
        "def split_tokens_on_spaces(tokens: torch.Tensor):\n",
        "    subwords, subword_tokens_list = split_tokens_on_unicode(tokens)\n",
        "    words = []\n",
        "    word_tokens = []\n",
        "    \n",
        "    for subword, subword_tokens in zip(subwords, subword_tokens_list):\n",
        "        special = subword_tokens[0] >= tokenizer.eot\n",
        "        with_space = subword.startswith(\" \")\n",
        "        punctuation = subword.strip() in string.punctuation\n",
        "        if special or with_space or punctuation:\n",
        "            words.append(subword)\n",
        "            word_tokens.append(subword_tokens)\n",
        "        else:\n",
        "            words[-1] = words[-1] + subword\n",
        "            word_tokens[-1].extend(subword_tokens)\n",
        "    \n",
        "    return words, word_tokens"
      ]
    },
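
Both helpers were later folded into the library: the packaged Tokenizer exposes split_tokens_on_unicode, split_tokens_on_spaces, and a language-aware split_to_word_tokens wrapper (see the whisper/tokenizer.py entries in the symbol index further down). A minimal sketch of that packaged equivalent, assuming a Korean tokenizer and a sample sentence taken from the table above:

    from whisper.tokenizer import get_tokenizer

    tok = get_tokenizer(multilingual=True, language="ko", task="transcribe")
    ids = tok.encode(" 이건 작별 인사가 아닙니다")    # leading space marks a word boundary
    words, word_tokens = tok.split_to_word_tokens(ids)
    print(words)                                   # word strings, aligned with word_tokens
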
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "pQOl_4TSmIgB"
      },
      "outputs": [],
      "source": [
        "if languages[lang] in {\"Chinese\", \"Japanese\", \"Thai\", \"Lao\", \"Myanmar\"}:\n",
        "    # These lang
SYMBOL INDEX (188 symbols across 17 files)

FILE: tests/conftest.py
  function pytest_configure (line 7) | def pytest_configure(config):
  function random (line 12) | def random():

FILE: tests/test_audio.py
  function test_audio (line 8) | def test_audio():

FILE: tests/test_normalizer.py
  function test_number_normalizer (line 11) | def test_number_normalizer(std):
  function test_spelling_normalizer (line 77) | def test_spelling_normalizer():
  function test_text_normalizer (line 84) | def test_text_normalizer():

FILE: tests/test_timing.py
  function test_dtw (line 23) | def test_dtw(N: int, M: int):
  function test_dtw_cuda_equivalence (line 57) | def test_dtw_cuda_equivalence(N: int, M: int):
  function test_median_filter (line 68) | def test_median_filter(shape):
  function test_median_filter_equivalence (line 89) | def test_median_filter_equivalence(shape):

FILE: tests/test_tokenizer.py
  function test_tokenizer (line 7) | def test_tokenizer(multilingual):
  function test_multilingual_tokenizer (line 14) | def test_multilingual_tokenizer():
  function test_split_on_unicode (line 27) | def test_split_on_unicode():

FILE: tests/test_transcribe.py
  function test_transcribe (line 11) | def test_transcribe(model_name: str):

FILE: whisper/__init__.py
  function _download (line 54) | def _download(url: str, root: str, in_memory: bool) -> Union[bytes, str]:
  function available_models (line 98) | def available_models() -> List[str]:
  function load_model (line 103) | def load_model(

FILE: whisper/audio.py
  function load_audio (line 25) | def load_audio(file: str, sr: int = SAMPLE_RATE):
  function pad_or_trim (line 65) | def pad_or_trim(array, length: int = N_SAMPLES, *, axis: int = -1):
  function mel_filters (line 92) | def mel_filters(device, n_mels: int) -> torch.Tensor:
  function log_mel_spectrogram (line 110) | def log_mel_spectrogram(

FILE: whisper/decoding.py
  function detect_language (line 19) | def detect_language(
  class DecodingOptions (line 81) | class DecodingOptions:
  class DecodingResult (line 118) | class DecodingResult:
  class Inference (line 130) | class Inference:
    method logits (line 131) | def logits(self, tokens: Tensor, audio_features: Tensor) -> Tensor:
    method rearrange_kv_cache (line 135) | def rearrange_kv_cache(self, source_indices) -> None:
    method cleanup_caching (line 139) | def cleanup_caching(self) -> None:
  class PyTorchInference (line 144) | class PyTorchInference(Inference):
    method __init__ (line 145) | def __init__(self, model: "Whisper", initial_token_length: int):
    method logits (line 155) | def logits(self, tokens: Tensor, audio_features: Tensor) -> Tensor:
    method cleanup_caching (line 165) | def cleanup_caching(self):
    method rearrange_kv_cache (line 172) | def rearrange_kv_cache(self, source_indices):
  class SequenceRanker (line 179) | class SequenceRanker:
    method rank (line 180) | def rank(
  class MaximumLikelihoodRanker (line 190) | class MaximumLikelihoodRanker(SequenceRanker):
    method __init__ (line 196) | def __init__(self, length_penalty: Optional[float]):
    method rank (line 199) | def rank(self, tokens: List[List[Tensor]], sum_logprobs: List[List[flo...
  class TokenDecoder (line 216) | class TokenDecoder:
    method reset (line 217) | def reset(self):
    method update (line 220) | def update(
    method finalize (line 247) | def finalize(
  class GreedyDecoder (line 272) | class GreedyDecoder(TokenDecoder):
    method __init__ (line 273) | def __init__(self, temperature: float, eot: int):
    method update (line 277) | def update(
    method finalize (line 295) | def finalize(self, tokens: Tensor, sum_logprobs: Tensor):
  class BeamSearchDecoder (line 301) | class BeamSearchDecoder(TokenDecoder):
    method __init__ (line 302) | def __init__(
    method reset (line 320) | def reset(self):
    method update (line 323) | def update(
    method finalize (line 384) | def finalize(self, preceding_tokens: Tensor, sum_logprobs: Tensor):
  class LogitFilter (line 407) | class LogitFilter:
    method apply (line 408) | def apply(self, logits: Tensor, tokens: Tensor) -> None:
  class SuppressBlank (line 423) | class SuppressBlank(LogitFilter):
    method __init__ (line 424) | def __init__(self, tokenizer: Tokenizer, sample_begin: int):
    method apply (line 428) | def apply(self, logits: Tensor, tokens: Tensor):
  class SuppressTokens (line 433) | class SuppressTokens(LogitFilter):
    method __init__ (line 434) | def __init__(self, suppress_tokens: Sequence[int]):
    method apply (line 437) | def apply(self, logits: Tensor, tokens: Tensor):
  class ApplyTimestampRules (line 441) | class ApplyTimestampRules(LogitFilter):
    method __init__ (line 442) | def __init__(
    method apply (line 452) | def apply(self, logits: Tensor, tokens: Tensor):
  class DecodingTask (line 508) | class DecodingTask:
    method __init__ (line 514) | def __init__(self, model: "Whisper", options: DecodingOptions):
    method _verify_options (line 572) | def _verify_options(self, options: DecodingOptions) -> DecodingOptions:
    method _get_initial_tokens (line 587) | def _get_initial_tokens(self) -> Tuple[int]:
    method _get_suppress_tokens (line 615) | def _get_suppress_tokens(self) -> Tuple[int]:
    method _get_audio_features (line 644) | def _get_audio_features(self, mel: Tensor):
    method _detect_language (line 666) | def _detect_language(self, audio_features: Tensor, tokens: Tensor):
    method _main_loop (line 680) | def _main_loop(self, audio_features: Tensor, tokens: Tensor):
    method run (line 713) | def run(self, mel: Tensor) -> List[DecodingResult]:
  function decode (line 793) | def decode(

FILE: whisper/model.py
  class ModelDimensions (line 26) | class ModelDimensions:
  class LayerNorm (line 39) | class LayerNorm(nn.LayerNorm):
    method forward (line 40) | def forward(self, x: Tensor) -> Tensor:
  class Linear (line 44) | class Linear(nn.Linear):
    method forward (line 45) | def forward(self, x: Tensor) -> Tensor:
  class Conv1d (line 53) | class Conv1d(nn.Conv1d):
    method _conv_forward (line 54) | def _conv_forward(
  function sinusoids (line 62) | def sinusoids(length, channels, max_timescale=10000):
  function disable_sdpa (line 72) | def disable_sdpa():
  class MultiHeadAttention (line 81) | class MultiHeadAttention(nn.Module):
    method __init__ (line 84) | def __init__(self, n_state: int, n_head: int):
    method forward (line 92) | def forward(
    method qkv_attention (line 114) | def qkv_attention(
  class ResidualAttentionBlock (line 142) | class ResidualAttentionBlock(nn.Module):
    method __init__ (line 143) | def __init__(self, n_state: int, n_head: int, cross_attention: bool = ...
    method forward (line 160) | def forward(
  class AudioEncoder (line 174) | class AudioEncoder(nn.Module):
    method __init__ (line 175) | def __init__(
    method forward (line 188) | def forward(self, x: Tensor):
  class TextDecoder (line 207) | class TextDecoder(nn.Module):
    method __init__ (line 208) | def __init__(
    method forward (line 227) | def forward(self, x: Tensor, xa: Tensor, kv_cache: Optional[dict] = No...
  class Whisper (line 252) | class Whisper(nn.Module):
    method __init__ (line 253) | def __init__(self, dims: ModelDimensions):
    method set_alignment_heads (line 278) | def set_alignment_heads(self, dump: bytes):
    method embed_audio (line 287) | def embed_audio(self, mel: torch.Tensor):
    method logits (line 290) | def logits(self, tokens: torch.Tensor, audio_features: torch.Tensor):
    method forward (line 293) | def forward(
    method device (line 299) | def device(self):
    method is_multilingual (line 303) | def is_multilingual(self):
    method num_languages (line 307) | def num_languages(self):
    method install_kv_cache_hooks (line 310) | def install_kv_cache_hooks(self, cache: Optional[dict] = None):

FILE: whisper/normalizers/basic.py
  function remove_symbols_and_diacritics (line 27) | def remove_symbols_and_diacritics(s: str, keep=""):
  function remove_symbols (line 50) | def remove_symbols(s: str):
  class BasicTextNormalizer (line 60) | class BasicTextNormalizer:
    method __init__ (line 61) | def __init__(self, remove_diacritics: bool = False, split_letters: boo...
    method __call__ (line 67) | def __call__(self, s: str):

FILE: whisper/normalizers/english.py
  class EnglishNumberNormalizer (line 12) | class EnglishNumberNormalizer:
    method __init__ (line 23) | def __init__(self):
    method process_words (line 165) | def process_words(self, words: List[str]) -> Iterator[str]:
    method preprocess (line 388) | def preprocess(self, s: str):
    method postprocess (line 417) | def postprocess(self, s: str):
    method __call__ (line 442) | def __call__(self, s: str):
  class EnglishSpellingNormalizer (line 450) | class EnglishSpellingNormalizer:
    method __init__ (line 457) | def __init__(self):
    method __call__ (line 461) | def __call__(self, s: str):
  class EnglishTextNormalizer (line 465) | class EnglishTextNormalizer:
    method __init__ (line 466) | def __init__(self):
    method __call__ (line 526) | def __call__(self, s: str):

FILE: whisper/timing.py
  function median_filter (line 19) | def median_filter(x: torch.Tensor, filter_width: int):
  function backtrace (line 58) | def backtrace(trace: np.ndarray):
  function dtw_cpu (line 83) | def dtw_cpu(x: np.ndarray):
  function dtw_cuda (line 108) | def dtw_cuda(x, BLOCK_SIZE=1024):
  function dtw (line 141) | def dtw(x: torch.Tensor) -> np.ndarray:
  class WordTiming (line 155) | class WordTiming:
  function find_alignment (line 163) | def find_alignment(
  function merge_punctuations (line 245) | def merge_punctuations(alignment: List[WordTiming], prepended: str, appe...
  function add_word_timestamps (line 279) | def add_word_timestamps(

FILE: whisper/tokenizer.py
  class Tokenizer (line 132) | class Tokenizer:
    method __post_init__ (line 142) | def __post_init__(self):
    method encode (line 161) | def encode(self, text, **kwargs):
    method decode (line 164) | def decode(self, token_ids: List[int], **kwargs) -> str:
    method decode_with_timestamps (line 168) | def decode_with_timestamps(self, token_ids: List[int], **kwargs) -> str:
    method eot (line 176) | def eot(self) -> int:
    method transcribe (line 180) | def transcribe(self) -> int:
    method translate (line 184) | def translate(self) -> int:
    method sot (line 188) | def sot(self) -> int:
    method sot_lm (line 192) | def sot_lm(self) -> int:
    method sot_prev (line 196) | def sot_prev(self) -> int:
    method no_speech (line 200) | def no_speech(self) -> int:
    method no_timestamps (line 204) | def no_timestamps(self) -> int:
    method timestamp_begin (line 208) | def timestamp_begin(self) -> int:
    method language_token (line 212) | def language_token(self) -> int:
    method to_language_token (line 219) | def to_language_token(self, language):
    method all_language_tokens (line 226) | def all_language_tokens(self) -> Tuple[int]:
    method all_language_codes (line 234) | def all_language_codes(self) -> Tuple[str]:
    method sot_sequence_including_notimestamps (line 238) | def sot_sequence_including_notimestamps(self) -> Tuple[int]:
    method non_speech_tokens (line 242) | def non_speech_tokens(self) -> Tuple[int]:
    method split_to_word_tokens (line 277) | def split_to_word_tokens(self, tokens: List[int]):
    method split_tokens_on_unicode (line 286) | def split_tokens_on_unicode(self, tokens: List[int]):
    method split_tokens_on_spaces (line 311) | def split_tokens_on_spaces(self, tokens: List[int]):
  function get_encoding (line 331) | def get_encoding(name: str = "gpt2", num_languages: int = 99):
  function get_tokenizer (line 367) | def get_tokenizer(

FILE: whisper/transcribe.py
  function transcribe (line 38) | def transcribe(
  function cli (line 517) | def cli():

FILE: whisper/triton_ops.py
  function dtw_kernel (line 14) | def dtw_kernel(
  function median_kernel (line 44) | def median_kernel(filter_width: int):
  function median_filter_cuda (line 106) | def median_filter_cuda(x: torch.Tensor, filter_width: int):

FILE: whisper/utils.py
  function make_safe (line 12) | def make_safe(string):
  function make_safe (line 19) | def make_safe(string):
  function exact_div (line 24) | def exact_div(x, y):
  function str2bool (line 29) | def str2bool(string):
  function optional_int (line 37) | def optional_int(string):
  function optional_float (line 41) | def optional_float(string):
  function compression_ratio (line 45) | def compression_ratio(text) -> float:
  function format_timestamp (line 50) | def format_timestamp(
  function get_start (line 71) | def get_start(segments: List[dict]) -> Optional[float]:
  function get_end (line 78) | def get_end(segments: List[dict]) -> Optional[float]:
  class ResultWriter (line 85) | class ResultWriter:
    method __init__ (line 88) | def __init__(self, output_dir: str):
    method __call__ (line 91) | def __call__(
    method write_result (line 103) | def write_result(
  class WriteTXT (line 109) | class WriteTXT(ResultWriter):
    method write_result (line 112) | def write_result(
  class SubtitlesWriter (line 119) | class SubtitlesWriter(ResultWriter):
    method iterate_result (line 123) | def iterate_result(
    method format_timestamp (line 230) | def format_timestamp(self, seconds: float):
  class WriteVTT (line 238) | class WriteVTT(SubtitlesWriter):
    method write_result (line 243) | def write_result(
  class WriteSRT (line 251) | class WriteSRT(SubtitlesWriter):
    method write_result (line 256) | def write_result(
  class WriteTSV (line 265) | class WriteTSV(ResultWriter):
    method write_result (line 277) | def write_result(
  class WriteJSON (line 287) | class WriteJSON(ResultWriter):
    method write_result (line 290) | def write_result(
  function get_writer (line 296) | def get_writer(
Condensed preview — 43 files, each showing path, character count, and a content snippet; the full structured content is 8,121K chars.
[
  {
    "path": ".flake8",
    "chars": 53,
    "preview": "[flake8]\nper-file-ignores =\n    */__init__.py: F401\n\n"
  },
  {
    "path": ".gitattributes",
    "chars": 214,
    "preview": "# Override jupyter in Github language stats for more accurate estimate of repo code languages\n# reference: https://githu"
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 579,
    "preview": "# Keep GitHub Actions up to date with GitHub's Dependabot...\n# https://docs.github.com/en/code-security/dependabot/worki"
  },
  {
    "path": ".github/workflows/python-publish.yml",
    "chars": 1002,
    "preview": "name: Release\n\non:\n  push:\n    branches:\n    - main\njobs:\n  deploy:\n    runs-on: ubuntu-latest\n    steps:\n    - uses: ac"
  },
  {
    "path": ".github/workflows/test.yml",
    "chars": 2721,
    "preview": "name: test\non:\n  push:\n    branches:\n      - main\n  pull_request:\n    branches:\n      - main\n\njobs:\n  pre-commit:\n    ru"
  },
  {
    "path": ".gitignore",
    "chars": 106,
    "preview": "__pycache__/\n*.py[cod]\n*$py.class\n*.egg-info\n.pytest_cache\n.ipynb_checkpoints\n\nthumbs.db\n.DS_Store\n.idea\n\n"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 811,
    "preview": "repos:\n  - repo: https://github.com/pre-commit/pre-commit-hooks\n    rev: v5.0.0\n    hooks:\n      - id: check-json\n      "
  },
  {
    "path": "CHANGELOG.md",
    "chars": 9022,
    "preview": "# CHANGELOG\n\n## [v20250625](https://github.com/openai/whisper/releases/tag/v20250625)\n\n* Fix: Update torch.load to use w"
  },
  {
    "path": "LICENSE",
    "chars": 1063,
    "preview": "MIT License\n\nCopyright (c) 2022 OpenAI\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof "
  },
  {
    "path": "MANIFEST.in",
    "chars": 125,
    "preview": "include requirements.txt\ninclude README.md\ninclude LICENSE\ninclude whisper/assets/*\ninclude whisper/normalizers/english."
  },
  {
    "path": "README.md",
    "chars": 8243,
    "preview": "# Whisper\n\n[[Blog]](https://openai.com/blog/whisper)\n[[Paper]](https://arxiv.org/abs/2212.04356)\n[[Model card]](https://"
  },
  {
    "path": "data/README.md",
    "chars": 6211,
    "preview": "This directory supplements the paper with more details on how we prepared the data for evaluation, to help replicate our"
  },
  {
    "path": "data/meanwhile.json",
    "chars": 67802,
    "preview": "{\n    \"1YOmY-Vjy-o\": {\n        \"begin\": \"1:04.0\",\n        \"end\": \"2:11.0\",\n        \"text\": \"FOLKS, IF YOU WATCH THE SHOW"
  },
  {
    "path": "model-card.md",
    "chars": 7279,
    "preview": "# Model Card: Whisper\n\nThis is the official codebase for running the automatic speech recognition (ASR) models (Whisper "
  },
  {
    "path": "notebooks/LibriSpeech.ipynb",
    "chars": 31967,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"v5hvo8QWN-a9\"\n   },\n   \"source\": [\n    \"# Inst"
  },
  {
    "path": "notebooks/Multilingual_ASR.ipynb",
    "chars": 5971401,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"v5hvo8QWN-a9\"\n      },\n      \"sou"
  },
  {
    "path": "pyproject.toml",
    "chars": 1418,
    "preview": "[build-system]\nbuild-backend = \"setuptools.build_meta\"\n\nrequires = [ \"setuptools>=61.2\" ]\n\n[project]\nname = \"openai-whis"
  },
  {
    "path": "requirements.txt",
    "chars": 140,
    "preview": "numba\nnumpy\ntorch\ntqdm\nmore-itertools\ntiktoken\ntriton>=2.0.0;platform_machine==\"x86_64\" and sys_platform==\"linux\" or sys"
  },
  {
    "path": "tests/conftest.py",
    "chars": 214,
    "preview": "import random as rand\n\nimport numpy\nimport pytest\n\n\ndef pytest_configure(config):\n    config.addinivalue_line(\"markers\","
  },
  {
    "path": "tests/test_audio.py",
    "chars": 571,
    "preview": "import os.path\n\nimport numpy as np\n\nfrom whisper.audio import SAMPLE_RATE, load_audio, log_mel_spectrogram\n\n\ndef test_au"
  },
  {
    "path": "tests/test_normalizer.py",
    "chars": 3368,
    "preview": "import pytest\n\nfrom whisper.normalizers import EnglishTextNormalizer\nfrom whisper.normalizers.english import (\n    Engli"
  },
  {
    "path": "tests/test_timing.py",
    "chars": 2368,
    "preview": "import numpy as np\nimport pytest\nimport scipy.ndimage\nimport torch\n\nfrom whisper.timing import dtw_cpu, dtw_cuda, median"
  },
  {
    "path": "tests/test_tokenizer.py",
    "chars": 1286,
    "preview": "import pytest\n\nfrom whisper.tokenizer import get_tokenizer\n\n\n@pytest.mark.parametrize(\"multilingual\", [True, False])\ndef"
  },
  {
    "path": "tests/test_transcribe.py",
    "chars": 1524,
    "preview": "import os\n\nimport pytest\nimport torch\n\nimport whisper\nfrom whisper.tokenizer import get_tokenizer\n\n\n@pytest.mark.paramet"
  },
  {
    "path": "whisper/__init__.py",
    "chars": 7432,
    "preview": "import hashlib\nimport io\nimport os\nimport urllib\nimport warnings\nfrom typing import List, Optional, Union\n\nimport torch\n"
  },
  {
    "path": "whisper/__main__.py",
    "chars": 35,
    "preview": "from .transcribe import cli\n\ncli()\n"
  },
  {
    "path": "whisper/assets/gpt2.tiktoken",
    "chars": 835554,
    "preview": "IQ== 0\nIg== 1\nIw== 2\nJA== 3\nJQ== 4\nJg== 5\nJw== 6\nKA== 7\nKQ== 8\nKg== 9\nKw== 10\nLA== 11\nLQ== 12\nLg== 13\nLw== 14\nMA== 15\nMQ"
  },
  {
    "path": "whisper/assets/multilingual.tiktoken",
    "chars": 816730,
    "preview": "IQ== 0\nIg== 1\nIw== 2\nJA== 3\nJQ== 4\nJg== 5\nJw== 6\nKA== 7\nKQ== 8\nKg== 9\nKw== 10\nLA== 11\nLQ== 12\nLg== 13\nLw== 14\nMA== 15\nMQ"
  },
  {
    "path": "whisper/audio.py",
    "chars": 4945,
    "preview": "import os\nfrom functools import lru_cache\nfrom subprocess import CalledProcessError, run\nfrom typing import Optional, Un"
  },
  {
    "path": "whisper/decoding.py",
    "chars": 32155,
    "preview": "from dataclasses import dataclass, field, replace\nfrom typing import TYPE_CHECKING, Dict, Iterable, List, Optional, Sequ"
  },
  {
    "path": "whisper/model.py",
    "chars": 11749,
    "preview": "import base64\nimport gzip\nfrom contextlib import contextmanager\nfrom dataclasses import dataclass\nfrom typing import Dic"
  },
  {
    "path": "whisper/normalizers/__init__.py",
    "chars": 130,
    "preview": "from .basic import BasicTextNormalizer as BasicTextNormalizer\nfrom .english import EnglishTextNormalizer as EnglishTextN"
  },
  {
    "path": "whisper/normalizers/basic.py",
    "chars": 2047,
    "preview": "import re\nimport unicodedata\n\nimport regex\n\n# non-ASCII letters that are not separated by \"NFKD\" normalization\nADDITIONA"
  },
  {
    "path": "whisper/normalizers/english.json",
    "chars": 56128,
    "preview": "{\n    \"accessorise\": \"accessorize\",\n    \"accessorised\": \"accessorized\",\n    \"accessorises\": \"accessorizes\",\n    \"accesso"
  },
  {
    "path": "whisper/normalizers/english.py",
    "chars": 20843,
    "preview": "import json\nimport os\nimport re\nfrom fractions import Fraction\nfrom typing import Iterator, List, Match, Optional, Union"
  },
  {
    "path": "whisper/timing.py",
    "chars": 12674,
    "preview": "import itertools\nimport subprocess\nimport warnings\nfrom dataclasses import dataclass\nfrom typing import TYPE_CHECKING, L"
  },
  {
    "path": "whisper/tokenizer.py",
    "chars": 12300,
    "preview": "import base64\nimport os\nimport string\nfrom dataclasses import dataclass, field\nfrom functools import cached_property, lr"
  },
  {
    "path": "whisper/transcribe.py",
    "chars": 30315,
    "preview": "import argparse\nimport os\nimport traceback\nimport warnings\nfrom typing import TYPE_CHECKING, List, Optional, Tuple, Unio"
  },
  {
    "path": "whisper/triton_ops.py",
    "chars": 3646,
    "preview": "from functools import lru_cache\n\nimport numpy as np\nimport torch\n\ntry:\n    import triton\n    import triton.language as t"
  },
  {
    "path": "whisper/utils.py",
    "chars": 11529,
    "preview": "import json\nimport os\nimport re\nimport sys\nimport zlib\nfrom typing import Callable, List, Optional, TextIO\n\nsystem_encod"
  },
  {
    "path": "whisper/version.py",
    "chars": 25,
    "preview": "__version__ = \"20250625\"\n"
  }
]

// ... and 2 more files

About this extraction

This page contains the full source code of the openai/whisper GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 43 files (7.6 MB), approximately 2.0M tokens, and a symbol index of 188 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
