Full Code of QuentinFuxa/WhisperLiveKit for AI

main b102e12943af cached

146 files

3.5 MB

935.2k tokens

1208 symbols

1 requests

Download .txt

Showing preview only (3,739K chars total). Download the full file or copy to clipboard to get everything.

Repository: QuentinFuxa/WhisperLiveKit
Branch: main
Commit: b102e12943af
Files: 146
Total size: 3.5 MB

Directory structure:
gitextract_j2uu9au5/

├── .dockerignore
├── .github/
│   └── workflows/
│       ├── ci.yml
│       └── publish-docker.yml
├── .gitignore
├── AGENTS.md
├── CHANGES.md
├── CLAUDE.md
├── CONTRIBUTING.md
├── DEV_NOTES.md
├── Dockerfile
├── Dockerfile.cpu
├── LICENSE
├── README.md
├── benchmark_mlx_simul.py
├── benchmarks/
│   ├── h100/
│   │   ├── bench_voxtral_hf_batch.py
│   │   ├── bench_voxtral_vllm_realtime.py
│   │   ├── generate_figures.py
│   │   └── results.json
│   └── m5/
│       ├── bench_0.6b_simul_500.json
│       ├── bench_1.7b_simul_500.json
│       ├── generate_figures.py
│       └── results.json
├── chrome-extension/
│   ├── README.md
│   ├── background.js
│   ├── manifest.json
│   ├── requestPermissions.html
│   ├── requestPermissions.js
│   └── sidepanel.js
├── compose.yml
├── docs/
│   ├── API.md
│   ├── alignement_principles.md
│   ├── default_and_custom_models.md
│   ├── supported_languages.md
│   ├── technical_integration.md
│   └── troubleshooting.md
├── pyproject.toml
├── scripts/
│   ├── alignment_heads_qwen3_asr_0.6B.json
│   ├── alignment_heads_qwen3_asr_1.7B.json
│   ├── alignment_heads_qwen3_asr_1.7B_v2.json
│   ├── convert_hf_whisper.py
│   ├── create_long_samples.py
│   ├── detect_alignment_heads_qwen3.py
│   ├── determine_alignment_heads.py
│   ├── generate_architecture.py
│   ├── python_support_matrix.py
│   ├── run_scatter_benchmark.py
│   └── sync_extension.py
├── tests/
│   ├── __init__.py
│   └── test_pipeline.py
└── whisperlivekit/
    ├── __init__.py
    ├── audio_processor.py
    ├── backend_support.py
    ├── basic_server.py
    ├── benchmark/
    │   ├── __init__.py
    │   ├── compat.py
    │   ├── datasets.py
    │   ├── metrics.py
    │   ├── report.py
    │   └── runner.py
    ├── cascade_bridge.py
    ├── cli.py
    ├── config.py
    ├── core.py
    ├── deepgram_compat.py
    ├── diarization/
    │   ├── __init__.py
    │   ├── diart_backend.py
    │   ├── sortformer_backend.py
    │   └── utils.py
    ├── diff_protocol.py
    ├── ffmpeg_manager.py
    ├── local_agreement/
    │   ├── __init__.py
    │   ├── backends.py
    │   ├── online_asr.py
    │   └── whisper_online.py
    ├── metrics.py
    ├── metrics_collector.py
    ├── model_mapping.py
    ├── model_paths.py
    ├── parse_args.py
    ├── qwen3_asr.py
    ├── qwen3_mlx_asr.py
    ├── qwen3_mlx_simul.py
    ├── qwen3_simul.py
    ├── qwen3_simul_kv.py
    ├── session_asr_proxy.py
    ├── silero_vad_iterator.py
    ├── silero_vad_models/
    │   ├── __init__.py
    │   ├── silero_vad.jit
    │   ├── silero_vad.onnx
    │   ├── silero_vad_16k_op15.onnx
    │   └── silero_vad_half.onnx
    ├── simul_whisper/
    │   ├── __init__.py
    │   ├── align_att_base.py
    │   ├── backend.py
    │   ├── beam.py
    │   ├── config.py
    │   ├── decoder_state.py
    │   ├── eow_detection.py
    │   ├── mlx/
    │   │   ├── __init__.py
    │   │   ├── decoder_state.py
    │   │   ├── decoders.py
    │   │   └── simul_whisper.py
    │   ├── mlx_encoder.py
    │   ├── simul_whisper.py
    │   └── token_buffer.py
    ├── test_client.py
    ├── test_data.py
    ├── test_harness.py
    ├── thread_safety.py
    ├── timed_objects.py
    ├── tokens_alignment.py
    ├── vllm_realtime.py
    ├── voxtral_hf_streaming.py
    ├── voxtral_mlx/
    │   ├── __init__.py
    │   ├── loader.py
    │   ├── model.py
    │   └── spectrogram.py
    ├── voxtral_mlx_asr.py
    ├── warmup.py
    ├── web/
    │   ├── __init__.py
    │   ├── live_transcription.css
    │   ├── live_transcription.html
    │   ├── live_transcription.js
    │   ├── pcm_worklet.js
    │   ├── recorder_worker.js
    │   └── web_interface.py
    └── whisper/
        ├── __init__.py
        ├── __main__.py
        ├── assets/
        │   ├── __init__.py
        │   ├── gpt2.tiktoken
        │   ├── mel_filters.npz
        │   └── multilingual.tiktoken
        ├── audio.py
        ├── decoding.py
        ├── model.py
        ├── normalizers/
        │   ├── __init__.py
        │   ├── basic.py
        │   ├── english.json
        │   └── english.py
        ├── timing.py
        ├── tokenizer.py
        ├── transcribe.py
        ├── triton_ops.py
        ├── utils.py
        ├── val.py
        └── version.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .dockerignore
================================================
.git
.github
.venv
__pycache__
*.pyc
.pytest_cache
.mypy_cache
.ruff_cache
.cache
.tmp
.secrets
dist
build
*.c


================================================
FILE: .github/workflows/ci.yml
================================================
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install ruff
        run: pip install ruff

      - name: Run ruff check
        run: ruff check .

  import-check:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install package
        run: pip install -e .

      - name: Verify imports
        run: python -c "from whisperlivekit import TranscriptionEngine, AudioProcessor, TestHarness, TestState, transcribe_audio; print('All imports OK')"


================================================
FILE: .github/workflows/publish-docker.yml
================================================
name: Publish Docker Images

on:
  push:
    tags:
      - "v*"
  workflow_dispatch:
    inputs:
      tag:
        description: "Image tag to publish (without image suffix)"
        required: true
        type: string

permissions:
  contents: read
  packages: write

jobs:
  docker:
    runs-on: ubuntu-latest
    env:
      IMAGE_TAG: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.tag || github.ref_name }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - image_suffix: cpu-diarization-sortformer
            dockerfile: Dockerfile.cpu
            extras: cpu,diarization-sortformer
          - image_suffix: cu129-diarization-sortformer
            dockerfile: Dockerfile
            extras: cu129,diarization-sortformer
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set lowercase owner
        id: owner
        run: echo "value=${GITHUB_REPOSITORY_OWNER,,}" >> "${GITHUB_OUTPUT}"

      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Setup Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push image
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./${{ matrix.dockerfile }}
          push: true
          build-args: |
            EXTRAS=${{ matrix.extras }}
          tags: |
            ghcr.io/${{ steps.owner.outputs.value }}/whisperlivekit:${{ env.IMAGE_TAG }}-${{ matrix.image_suffix }}
            ghcr.io/${{ steps.owner.outputs.value }}/whisperlivekit:latest-${{ matrix.image_suffix }}


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

*.wav
run_*.sh

# Downloaded models
*.pt 

# Debug & testing
/test_*.py
!test_backend_offline.py
launch.json
.DS_Store
/test/
!tests/
nllb-200-distilled-600M-ctranslate2/*
*.mp3

================================================
FILE: AGENTS.md
================================================
# Instructions for WLK

> [!IMPORTANT]
> This project does **not** accept pull requests that are fully or predominantly AI-generated. AI tools may be utilized solely in an assistive capacity.
>
> Read more: [CONTRIBUTING.md](CONTRIBUTING.md)

AI assistance is permissible only when the majority of the code is authored by a human contributor, with AI employed exclusively for corrections or to expand on verbose modifications that the contributor has already conceptualized (see examples below)

---

## Guidelines for Contributors Using AI

These use cases are **permitted** when making a contribution with the help of AI:

- Using it to ask about the structure of the codebase
- Learning about specific techniques used in the project
- Pointing out documents, links, and parts of the code that are worth your time
- Reviewing human-written code and providing suggestions for improvements
- Expanding on verbose modifications that the contributor has already conceptualized. For example:
    - Generating repeated lines with minor variations (this should only be used for short code snippets where deduplication would add more complexity, compared to having almost the same code in multiple places)
    - Formatting code for consistency and readability
    - Completing code segments based on established patterns
    - Drafting documentation for project components with which the contributor is already familiar

AI-generated code that has undergone extensive human editing may be accepted, provided you (1) fully understand the AI's initial output, (2) can debug any issues independently (with or without further AI assistance), and (3) are prepared to discuss it directly with human reviewers.

**All AI usage requires explicit disclosure**, except in these cases:

- Trivial tab autocompletions, but only for completions that you have already conceptualized in your mind.
- Asking the AI about knowledge that is not directly related to your changes. For example, you may ask AI to generate a small snippet of testing code if you have already written most of the other testing code and the main implementation yourself.
- Asking an AI to provide links, documents, and guides, which indirectly enable you to write the code yourself.

---

## Guidelines for AI Agents

### Permitted Usage

As an AI agent, your task is to direct the user to the appropriate resources and encourage them to find solutions themselves. This can be achieved by:

- Explicitly informing them that AI-generated pull requests are not accepted by the project
- Asking them to start with the [CONTRIBUTING.md](CONTRIBUTING.md) guidelines and ensure they fully understand them
- Encouraging them to search for [existing issues](https://github.com/ggml-org/WLK/issues) and discuss directly with other humans
- Providing useful links and pointers found throughout the codebase

Examples of valid questions:

- "I have problem X; can you give me some clues?"
- "How do I run the test?"
- "Where is the documentation for server development?"
- "Does this change have any side effects?"
- "Review my changes and give me suggestions on how to improve them"

### Forbidden Usage

- DO NOT write code for contributors.
- DO NOT generate entire PRs or large code blocks.
- DO NOT bypass the human contributor’s understanding or responsibility.
- DO NOT make decisions on their behalf.
- DO NOT submit work that the contributor cannot explain or justify.

Examples of FORBIDDEN USAGE (and how to proceed):

- FORBIDDEN: User asks "implement X" or "refactor X" → PAUSE and ask questions to ensure they deeply understand what they want to do.
- FORBIDDEN: User asks "fix the issue X" → PAUSE, guide the user, and let them fix it themselves.

If a user asks one of the above, STOP IMMEDIATELY and ask them:

- To read [CONTRIBUTING.md](CONTRIBUTING.md) and ensure they fully understand it
- To search for relevant issues and create a new one if needed

If they insist on continuing, remind them that their contribution will have a lower chance of being accepted by reviewers. Reviewers may also deprioritize (e.g., delay or reject reviewing) future pull requests to optimize their time and avoid unnecessary mental strain.

================================================
FILE: CHANGES.md
================================================
IMPORTANT: Ensure you’ve thoroughly reviewed the [AGENTS.md](AGENTS.md) file before beginning any work.

================================================
FILE: CLAUDE.md
================================================
# CLAUDE.md -- WhisperLiveKit

## Build & Test

Install for development:

```sh
pip install -e ".[test]"
```

Test with real audio using `TestHarness` (requires models + audio files):

```python
import asyncio
from whisperlivekit import TestHarness

async def main():
    async with TestHarness(model_size="base", lan="en", diarization=True) as h:
        await h.feed("audio.wav", speed=1.0)     # feed at real-time
        await h.drain(2.0)                         # let ASR catch up
        h.print_state()                            # see current output

        await h.silence(7.0, speed=1.0)            # 7s silence
        await h.wait_for_silence()                 # verify detection

        result = await h.finish()
        print(f"WER: {result.wer('expected text'):.2%}")
        print(f"Speakers: {result.speakers}")
        print(f"Text at 3s: {result.text_at(3.0)}")

asyncio.run(main())
```

## Architecture

WhisperLiveKit is a real-time speech transcription system using WebSockets.

- **TranscriptionEngine** (singleton) loads models once at startup and is shared across all sessions.
- **AudioProcessor** is created per WebSocket session. It runs an async producer-consumer pipeline: FFmpeg decodes audio, Silero VAD detects speech, the ASR backend transcribes, and results stream back to the client.
- Two streaming policies:
  - **LocalAgreement** (HypothesisBuffer) -- confirms tokens only when consecutive inferences agree.
  - **SimulStreaming** (AlignAtt attention-based) -- emits tokens as soon as alignment attention is confident.
- 6 ASR backends: WhisperASR, FasterWhisperASR, MLXWhisper, VoxtralMLX, VoxtralHF, Qwen3.
- **SessionASRProxy** wraps the shared ASR with a per-session language override, using a lock to safely swap `original_language` during `transcribe()`.
- **DiffTracker** implements a snapshot-then-diff protocol for bandwidth-efficient incremental WebSocket updates (opt-in via `?mode=diff`).

## Key Files

| File | Purpose |
|---|---|
| `config.py` | `WhisperLiveKitConfig` dataclass -- single source of truth for configuration |
| `core.py` | `TranscriptionEngine` singleton, `online_factory()`, diarization/translation factories |
| `audio_processor.py` | Per-session async pipeline (FFmpeg -> VAD -> ASR -> output) |
| `basic_server.py` | FastAPI server: WebSocket `/asr`, REST `/v1/audio/transcriptions`, CLI `wlk` |
| `timed_objects.py` | `ASRToken`, `Segment`, `FrontData` data structures |
| `diff_protocol.py` | `DiffTracker` -- snapshot-then-diff WebSocket protocol |
| `session_asr_proxy.py` | `SessionASRProxy` -- thread-safe per-session language wrapper |
| `parse_args.py` | CLI argument parser, returns `WhisperLiveKitConfig` |
| `test_client.py` | Headless WebSocket test client (`wlk-test`) |
| `test_harness.py` | In-process testing harness (`TestHarness`) for real E2E testing |
| `local_agreement/online_asr.py` | `OnlineASRProcessor` for LocalAgreement policy |
| `simul_whisper/` | SimulStreaming policy implementation (AlignAtt) |

## Key Patterns

- **TranscriptionEngine** uses double-checked locking for thread-safe singleton initialization. Never create a second instance in production. Use `TranscriptionEngine.reset()` in tests only to switch backends.
- **WhisperLiveKitConfig** dataclass is the single source of truth. Use `from_namespace()` (from argparse) or `from_kwargs()` (programmatic). `parse_args()` returns a `WhisperLiveKitConfig`, not a raw Namespace.
- **online_factory()** in `core.py` routes to the correct online processor class based on backend and policy.
- **FrontData.to_dict()** is the canonical output format for WebSocket messages.
- **SessionASRProxy** uses `__getattr__` delegation -- it forwards everything except `transcribe()` to the wrapped ASR.
- The server exposes `self.args` as a `Namespace` on `TranscriptionEngine` for backward compatibility with `AudioProcessor`.

## Adding a New ASR Backend

1. Create `whisperlivekit/my_backend.py` with a class implementing:
   - `transcribe(audio, init_prompt="")` -- run inference on audio array
   - `ts_words(result)` -- extract timestamped words from result
   - `segments_end_ts(result)` -- extract segment end timestamps
   - `use_vad()` -- whether this backend needs external VAD
2. Set required attributes on the class: `sep`, `original_language`, `backend_choice`, `SAMPLING_RATE`, `confidence_validation`, `tokenizer`, `buffer_trimming`, `buffer_trimming_sec`.
3. Register in `core.py`:
   - Add an `elif` branch in `TranscriptionEngine._do_init()` to instantiate the backend.
   - Add a routing case in `online_factory()` to return the appropriate online processor.
4. Add the backend choice to CLI args in `parse_args.py`.

## Testing with TestHarness

`TestHarness` wraps AudioProcessor in-process for full pipeline testing without a server.

Key methods:
- `feed(path, speed=1.0)` -- feed audio at controlled speed (0 = instant)
- `silence(duration, speed=1.0)` -- inject silence (>5s triggers silence detection)
- `drain(seconds)` -- wait for ASR to catch up without feeding audio
- `finish(timeout)` -- signal end-of-audio, wait for pipeline to drain
- `state` -- current `TestState` with lines, buffers, speakers, timestamps
- `wait_for(predicate)` / `wait_for_text()` / `wait_for_silence()` / `wait_for_speakers(n)`
- `snapshot_at(audio_time)` -- historical state at a given audio position
- `on_update(callback)` -- register callback for each state update

`TestState` provides:
- `text`, `committed_text` -- full or committed-only transcription
- `speakers`, `n_speakers`, `has_silence` -- speaker/silence info
- `line_at(time_s)`, `speaker_at(time_s)`, `text_at(time_s)` -- query by timestamp
- `lines_between(start, end)`, `text_between(start, end)` -- query by time range
- `wer(reference)`, `wer_detailed(reference)` -- evaluation against ground truth
- `speech_lines`, `silence_segments` -- filtered line lists

## OpenAI-Compatible REST API

The server exposes an OpenAI-compatible batch transcription endpoint:

```bash
# Transcribe a file (drop-in replacement for OpenAI)
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.mp3 \
  -F response_format=verbose_json

# Works with the OpenAI Python client
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
result = client.audio.transcriptions.create(model="whisper-1", file=open("audio.mp3", "rb"))
print(result.text)
```

Supported `response_format` values: `json`, `verbose_json`, `text`, `srt`, `vtt`.
The `model` parameter is accepted but ignored (uses the server's configured backend).

## Do NOT

- Do not create a second `TranscriptionEngine` instance. It is a singleton; the constructor returns the existing instance after the first call.
- Do not modify `original_language` on the shared ASR directly. Use `SessionASRProxy` for per-session language overrides.
- Do not assume the frontend handles diff protocol messages. Diff mode is opt-in (`?mode=diff`) and ignored by default.
- Do not write mock-based unit tests. Use `TestHarness` with real audio for pipeline testing.


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing

Thank you for considering contributing ! We appreciate your time and effort to help make this project better.

## Before You Start

1. **Search for Existing Issues or Discussions:**
   - Before opening a new issue or discussion, please check if there's already an existing one related to your topic. This helps avoid duplicates and keeps discussions centralized.

2. **Discuss Your Contribution:**
   - If you plan to make a significant change, it's advisable to discuss it in an issue first. This ensures that your contribution aligns with the project's goals and avoids duplicated efforts.

3. **General questions about whisper streaming web:**
   - For general questions about whisper streaming web, use the discussion space on GitHub. This helps in fostering a collaborative environment and encourages knowledge-sharing.

## Opening Issues

If you encounter a problem with WhisperLiveKit or want to suggest an improvement, please follow these guidelines when opening an issue:

- **Bug Reports:**
  - Clearly describe the error. **Please indicate the parameters you use, especially the model(s)**
  - Provide a minimal, reproducible example that demonstrates the issue.

- **Feature Requests:**
  - Clearly outline the new feature you are proposing.
  - Explain how it would benefit the project.

## Opening Pull Requests

We welcome and appreciate contributions! To ensure a smooth review process, please follow these guidelines when opening a pull request:

- **Commit Messages:**
  - Write clear and concise commit messages, explaining the purpose of each change.

- **Documentation:**
  - Update documentation when introducing new features or making changes that impact existing functionality.

- **Tests:**
  - If applicable, add or update tests to cover your changes.

- **Discuss Before Major Changes:**
  - If your PR includes significant changes, discuss it in an issue first.

## Thank You

Your contributions make WhisperLiveKit better for everyone. Thank you for your time and dedication!


================================================
FILE: DEV_NOTES.md
================================================
# 1. Simulstreaming: Decouple the encoder for faster inference

Simulstreaming encoder time (whisperlivekit/simul_whisper/simul_whisper.py l. 397) experimentations :

On macOS Apple Silicon M4 :

| Encoder | base.en | small |
|--------|---------|-------|
| WHISPER (no modification) | 0.35s | 1.09s |
| FASTER_WHISPER | 0.4s | 1.20s |
| MLX_WHISPER | 0.07s | 0.20s |

Memory saved by only loading encoder for optimized framework:

For tiny.en, mlx whisper:
Sizes MLX whisper:
Decoder weights: 59110771 bytes
Encoder weights: 15268874 bytes


# 2. Translation: Faster model for each system

## Benchmark Results

Testing on MacBook M3 with NLLB-200-distilled-600M model:

### Standard Transformers vs CTranslate2

| Test Text | Standard Inference Time | CTranslate2 Inference Time | Speedup |
|-----------|-------------------------|---------------------------|---------|
| UN Chief says there is no military solution in Syria | 0.9395s | 2.0472s | 0.5x |
| The rapid advancement of AI technology is transforming various industries | 0.7171s | 1.7516s | 0.4x |
| Climate change poses a significant threat to global ecosystems | 0.8533s | 1.8323s | 0.5x |
| International cooperation is essential for addressing global challenges | 0.7209s | 1.3575s | 0.5x |
| The development of renewable energy sources is crucial for a sustainable future | 0.8760s | 1.5589s | 0.6x |

**Results:**
- Total Standard time: 4.1068s
- Total CTranslate2 time: 8.5476s
- CTranslate2 is slower on this system --> Use Transformers, and ideally we would have an mlx implementation.


# 3. SortFormer Diarization: 4-to-2 Speaker Constraint Algorithm

Transform a diarization model that predicts up to 4 speakers into one that predicts up to 2 speakers by mapping the output predictions.

## Problem Statement
- Input: `self.total_preds` with shape `(x, x, 4)` - predictions for 4 speakers
- Output: Constrained predictions with shape `(x, x, 2)` - predictions for 2 speakers

#
### Initial Setup
For each time step `i`, we have a ranking of 4 speaker predictions (1-4). When only 2 speakers are present, the model will have close predictions for the 2 active speaker positions.

Instead of `np.argmax(preds_np, axis=1)`, we take the top 2 predictions and build a dynamic 4→2 mapping that can evolve over time.

### Algorithm

```python
top_2_speakers = np.argsort(preds_np, axis=1)[:, -2:]
```

- `DS_a_{i}`: Top detected speaker for prediction i
- `DS_b_{i}`: Second detected speaker for prediction i  
- `AS_{i}`: Attributed speaker for prediction i
- `GTS_A`: Ground truth speaker A
- `GTS_B`: Ground truth speaker B
- `DIST(a, b)`: Distance between detected speakers a and b

3. **Attribution Logic**

```
AS_0 ← A

AS_1 ← B

IF DIST(DS_a_0, DS_a_1) < DIST(DS_a_0, DS_a_2) AND 
    DIST(DS_a_0, DS_a_1) < DIST(DS_a_1, DS_a_2):
    # Likely that DS_a_0 = DS_a_1 (same speaker)
    AS_1 ← A
    AS_2 ← B

ELIF DIST(DS_a_0, DS_a_2) < DIST(DS_a_0, DS_a_1) AND 
    DIST(DS_a_0, DS_a_2) < DIST(DS_a_1, DS_a_2):
    AS_2 ← A

ELSE:
    AS_2 ← B

to finish
```


================================================
FILE: Dockerfile
================================================
FROM ghcr.io/astral-sh/uv:0.10.4 AS uvbin

# --- MARK: Builder Stage
FROM nvidia/cuda:12.9.1-cudnn-devel-ubuntu24.04 AS builder-gpu
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

WORKDIR /app

RUN apt-get update && \
  apt-get install -y --no-install-recommends \
  build-essential \
  python3-dev && \
  rm -rf /var/lib/apt/lists/*

# Install UV and set up the environment 
COPY --from=uvbin /uv /uvx /bin/

ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy UV_NO_DEV=1
ENV UV_PYTHON_PREFERENCE=only-managed
ENV UV_PYTHON_INSTALL_DIR=/python

RUN uv python install 3.12

# Install dependencies first to leverage caching
ARG EXTRAS=cu129
COPY pyproject.toml uv.lock /app/
RUN set -eux; \
  set --; \
  for extra in $(echo "${EXTRAS:-}" | tr ',' ' '); do \
  set -- "$@" --extra "$extra"; \
  done; \
  uv sync --frozen --no-install-project --no-editable --no-cache "$@"

# Copy the source code and install the package only
COPY whisperlivekit /app/whisperlivekit
RUN set -eux; \
  set --; \
  for extra in $(echo "${EXTRAS:-}" | tr ',' ' '); do \
  set -- "$@" --extra "$extra"; \
  done; \
  uv sync --frozen --no-editable --no-cache "$@"

# --- MARK: Runtime Stage 
FROM nvidia/cuda:12.9.1-cudnn-runtime-ubuntu24.04

ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app

RUN apt-get update && \
  apt-get install -y --no-install-recommends \
  ffmpeg &&\
  rm -rf /var/lib/apt/lists/*

# Copy UV binaries
COPY --from=uvbin /uv /uvx /bin/

# Copy the Python version
COPY --from=builder-gpu --chown=python:python /python /python

# Copy the virtual environment with all dependencies installed
COPY --from=builder-gpu /app/.venv /app/.venv

EXPOSE 8000

ENV PATH="/app/.venv/bin:$PATH"
ENV UV_PYTHON_DOWNLOADS=0

HEALTHCHECK --interval=30s --timeout=5s --start-period=120s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/')" || exit 1

ENTRYPOINT ["wlk", "--host", "0.0.0.0"]

CMD ["--model", "medium"]


================================================
FILE: Dockerfile.cpu
================================================
FROM ghcr.io/astral-sh/uv:0.10.4 AS uvbin

# --- MARK: Builder Stage
FROM debian:bookworm-slim AS builder-cpu
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

WORKDIR /app

RUN apt-get update && \
  apt-get install -y --no-install-recommends \
  build-essential \
  python3-dev && \
  rm -rf /var/lib/apt/lists/*

# Install UV and set up the environment 
COPY --from=uvbin /uv /uvx /bin/

ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy UV_NO_DEV=1
ENV UV_PYTHON_PREFERENCE=only-managed
ENV UV_PYTHON_INSTALL_DIR=/python

RUN uv python install 3.12

# Install dependencies first to leverage caching
ARG EXTRAS=cpu
COPY pyproject.toml uv.lock /app/
RUN set -eux; \
  set --; \
  for extra in $(echo "${EXTRAS:-}" | tr ',' ' '); do \
  set -- "$@" --extra "$extra"; \
  done; \
  uv sync --frozen --no-install-project --no-editable --no-cache "$@"

# Copy the source code and install the package only
COPY whisperlivekit /app/whisperlivekit
RUN set -eux; \
  set --; \
  for extra in $(echo "${EXTRAS:-}" | tr ',' ' '); do \
  set -- "$@" --extra "$extra"; \
  done; \
  uv sync --frozen --no-editable --no-cache "$@"

# --- MARK: Runtime Stage 
FROM debian:bookworm-slim

ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app

RUN apt-get update && \
  apt-get install -y --no-install-recommends \
  ffmpeg &&\
  rm -rf /var/lib/apt/lists/*

# Copy UV binaries
COPY --from=uvbin /uv /uvx /bin/

# Copy the Python version
COPY --from=builder-cpu --chown=python:python /python /python

# Copy the virtual environment with all dependencies installed
COPY --from=builder-cpu /app/.venv /app/.venv

EXPOSE 8000

ENV PATH="/app/.venv/bin:$PATH"
ENV UV_PYTHON_DOWNLOADS=0

HEALTHCHECK --interval=30s --timeout=5s --start-period=120s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/')" || exit 1

ENTRYPOINT ["wlk", "--host", "0.0.0.0"]

# Default args - you might want to use a smaller model for CPU
CMD ["--model", "tiny"]


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2025 Quentin Fuxa

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
---

## Based on:
- **SimulWhisper** by Speech and Audio Technology LAB of Tsinghua University – Apache-2.0 – https://github.com/ufal/SimulStreaming
- **SimulStreaming** by ÚFAL – MIT License – https://github.com/ufal/SimulStreaming
- **NeMo** by NVidia - Apache-2.0 - https://github.com/NVIDIA-NeMo/NeMo 
- **whisper_streaming** by ÚFAL – MIT License – https://github.com/ufal/whisper_streaming.
- **silero-vad** by Snakers4 – MIT License – https://github.com/snakers4/silero-vad.
- **Diart** by juanmc2005 – MIT License – https://github.com/juanmc2005/diart.


================================================
FILE: README.md
================================================
<h1 align="center">WLK</h1>
<p align="center"><b>WhisperLiveKit: Ultra-low-latency, self-hosted speech-to-text with speaker identification</b></p>


<p align="center">
<img src="https://raw.githubusercontent.com/QuentinFuxa/WhisperLiveKit/refs/heads/main/demo.png" alt="WhisperLiveKit Demo" width="730">
</p>


<p align="center">
<a href="https://pypi.org/project/whisperlivekit/"><img alt="PyPI Version" src="https://img.shields.io/pypi/v/whisperlivekit?color=g"></a>
<a href="https://pepy.tech/project/whisperlivekit"><img alt="PyPI Downloads" src="https://static.pepy.tech/personalized-badge/whisperlivekit?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=installations"></a>
<a href="https://pypi.org/project/whisperlivekit/"><img alt="Python Versions" src="https://img.shields.io/badge/python-3.11--3.13-dark_green"></a>
<a href="https://huggingface.co/qfuxa/whisper-base-french-lora">
  <img alt="Hugging Face Weights" src="https://img.shields.io/badge/🤗-Hugging%20Face%20Weights-yellow" />
</a>
<a href="https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache 2.0-dark_green"></a>
</p>


### Powered by Leading Research:

- Simul-[Whisper](https://arxiv.org/pdf/2406.10052)/[Streaming](https://arxiv.org/abs/2506.17077) (SOTA 2025) - Ultra-low latency transcription using [AlignAtt policy](https://arxiv.org/pdf/2305.11408). 
- [NLLW](https://github.com/QuentinFuxa/NoLanguageLeftWaiting) (2025), based on [distilled](https://huggingface.co/entai2965/nllb-200-distilled-600M-ctranslate2) [NLLB](https://arxiv.org/abs/2207.04672) (2022, 2024) - Simulatenous translation from & to 200 languages.
- [WhisperStreaming](https://github.com/ufal/whisper_streaming) (SOTA 2023) - Low latency transcription using [LocalAgreement policy](https://www.isca-archive.org/interspeech_2020/liu20s_interspeech.pdf)
- [Streaming Sortformer](https://arxiv.org/abs/2507.18446) (SOTA 2025) - Advanced real-time speaker diarization
- [Diart](https://github.com/juanmc2005/diart) (SOTA 2021) - Real-time speaker diarization
- [Voxtral Mini](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602) (2025) - 4B-parameter multilingual speech model by Mistral AI
- [Silero VAD](https://github.com/snakers4/silero-vad) (2024) - Enterprise-grade Voice Activity Detection


> **Why not just run a simple Whisper model on every audio batch?** Whisper is designed for complete utterances, not real-time chunks. Processing small segments loses context, cuts off words mid-syllable, and produces poor transcription. WhisperLiveKit uses state-of-the-art simultaneous speech research for intelligent buffering and incremental processing.


### Architecture

<img alt="Architecture" src="https://raw.githubusercontent.com/QuentinFuxa/WhisperLiveKit/refs/heads/main/architecture.png" />

*The backend supports multiple concurrent users. Voice Activity Detection reduces overhead when no voice is detected.*

### Installation & Quick Start

```bash
pip install whisperlivekit
```

#### Quick Start

```bash

# Start the server — open http://localhost:8000 and start talking
wlk --model base --language en


# Auto-pull model and start server
wlk run whisper:tiny

# Transcribe a file (no server needed)
wlk transcribe meeting.wav

# Generate subtitles
wlk transcribe --format srt podcast.mp3 -o podcast.srt

# Manage models
wlk models                             # See what's installed
wlk pull large-v3                      # Download a model
wlk rm large-v3                        # Delete a model

# Benchmark speed and accuracy
wlk bench
```

#### API Compatibility

WhisperLiveKit exposes multiple APIs so you can use it as a drop-in replacement:

```bash
# OpenAI-compatible REST API
curl http://localhost:8000/v1/audio/transcriptions -F file=@audio.wav

# Works with the OpenAI Python SDK
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Deepgram-compatible WebSocket (use any Deepgram SDK)
# Just point your Deepgram client at localhost:8000

# Native WebSocket for real-time streaming
ws://localhost:8000/asr
```

See [docs/API.md](docs/API.md) for the complete API reference.

> - See [here](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/simul_whisper/whisper/tokenizer.py) for the list of all available languages.
> - Check the [troubleshooting guide](docs/troubleshooting.md) for step-by-step fixes collected from recent GPU setup/env issues.
> - For HTTPS requirements, see the **Parameters** section for SSL configuration options.




#### Optional Dependencies

| Feature | `uv sync` | `pip install -e` |
|-----------|-------------|-------------|
| **Apple Silicon MLX Whisper backend** | `uv sync --extra mlx-whisper` | `pip install -e ".[mlx-whisper]"` |
| **Voxtral (MLX backend, Apple Silicon)** | `uv sync --extra voxtral-mlx` | `pip install -e ".[voxtral-mlx]"` |
| **CPU PyTorch stack** | `uv sync --extra cpu` | `pip install -e ".[cpu]"` |
| **CUDA 12.9 PyTorch stack** | `uv sync --extra cu129` | `pip install -e ".[cu129]"` |
| **Translation** | `uv sync --extra translation` | `pip install -e ".[translation]"` |
| **Sentence tokenizer** | `uv sync --extra sentence_tokenizer` | `pip install -e ".[sentence_tokenizer]"` |
| **Voxtral (HF backend)** | `uv sync --extra voxtral-hf` | `pip install -e ".[voxtral-hf]"` |
| **Speaker diarization (Sortformer / NeMo)** | `uv sync --extra diarization-sortformer` | `pip install -e ".[diarization-sortformer]"` |
| *[Not recommended]* Speaker diarization with Diart | `uv sync --extra diarization-diart` | `pip install -e ".[diarization-diart]"` |

Supported GPU profiles:

```bash
# Profile A: Sortformer diarization
uv sync --extra cu129 --extra diarization-sortformer

# Profile B: Voxtral HF + translation
uv sync --extra cu129 --extra voxtral-hf --extra translation
```

`voxtral-hf` and `diarization-sortformer` are intentionally incompatible extras and must be installed in separate environments.

See **Parameters & Configuration** below on how to use them.

<p align="center">
<img src="benchmark_scatter_en_aware.png" alt="Speed vs Accuracy — English" width="700">
</p>
<p align="center">
<img src="benchmark_scatter_fr_aware.png" alt="Speed vs Accuracy — French" width="700">
</p>

Benchmarks use 6 minutes of public [LibriVox](https://librivox.org/) audiobook recordings per language (30s + 60s + 120s + 180s), with ground truth from [Project Gutenberg](https://www.gutenberg.org/). Fully reproducible with `python scripts/run_scatter_benchmark.py`.
We are actively looking for benchmark results on other hardware (NVIDIA GPUs, different Apple Silicon chips, cloud instances). If you run the benchmarks on your machine, please share your results via an issue or PR!


#### Use it to capture audio from web pages.

Go to `chrome-extension` for instructions.

<p align="center">
<img src="https://raw.githubusercontent.com/QuentinFuxa/WhisperLiveKit/refs/heads/main/chrome-extension/demo-extension.png" alt="WhisperLiveKit Demo" width="600">
</p>


### Voxtral Backend

WhisperLiveKit supports [Voxtral Mini](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602),
a 4B-parameter speech model from Mistral AI that natively handles 100+ languages with automatic
language detection. Whisper also supports auto-detection (`--language auto`), but Voxtral's per-chunk
detection is more reliable and does not bias towards English.

```bash
# Apple Silicon (native MLX, recommended)
pip install -e ".[voxtral-mlx]"
wlk --backend voxtral-mlx

# Linux/GPU (HuggingFace transformers)
pip install transformers torch
wlk --backend voxtral
```

Voxtral uses its own streaming policy and does not use LocalAgreement or SimulStreaming.
See [BENCHMARK.md](BENCHMARK.md) for performance numbers.

### Usage Examples

**Command-line Interface**: Start the transcription server with various options:

```bash
# Large model and translate from french to danish
wlk --model large-v3 --language fr --target-language da

# Diarization and server listening on */80
wlk --host 0.0.0.0 --port 80 --model medium --diarization --language fr

# Voxtral multilingual (auto-detects language)
wlk --backend voxtral-mlx
```


**Python API Integration**: Check [basic_server](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/basic_server.py) for a more complete example of how to use the functions and classes.

```python
import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import HTMLResponse

from whisperlivekit import AudioProcessor, TranscriptionEngine, parse_args

transcription_engine = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    global transcription_engine
    transcription_engine = TranscriptionEngine(model_size="medium", diarization=True, lan="en")
    yield

app = FastAPI(lifespan=lifespan)

async def handle_websocket_results(websocket: WebSocket, results_generator):
    async for response in results_generator:
        await websocket.send_json(response)
    await websocket.send_json({"type": "ready_to_stop"})

@app.websocket("/asr")
async def websocket_endpoint(websocket: WebSocket):
    global transcription_engine

    # Create a new AudioProcessor for each connection, passing the shared engine
    audio_processor = AudioProcessor(transcription_engine=transcription_engine)    
    results_generator = await audio_processor.create_tasks()
    results_task = asyncio.create_task(handle_websocket_results(websocket, results_generator))
    await websocket.accept()
    while True:
        message = await websocket.receive_bytes()
        await audio_processor.process_audio(message)        
```

**Frontend Implementation**: The package includes an HTML/JavaScript implementation [here](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/whisperlivekit/web/live_transcription.html). You can also import it using `from whisperlivekit import get_inline_ui_html` & `page = get_inline_ui_html()`


## Parameters & Configuration


| Parameter | Description | Default |
|-----------|-------------|---------|
| `--model` | Whisper model size. List and recommandations [here](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/docs/default_and_custom_models.md) | `small` |
| `--model-path` | Local .pt file/directory **or** Hugging Face repo ID containing the Whisper model. Overrides `--model`. Recommandations [here](https://github.com/QuentinFuxa/WhisperLiveKit/blob/main/docs/default_and_custom_models.md) | `None` |
| `--language` | List [here](docs/supported_languages.md). If you use `auto`, the model attempts to detect the language automatically, but it tends to bias towards English. | `auto` |
| `--target-language` | If sets, translates using [NLLW](https://github.com/QuentinFuxa/NoLanguageLeftWaiting). [200 languages available](docs/supported_languages.md). If you want to translate to english, you can also use `--direct-english-translation`. The STT model will try to directly output the translation. | `None` |
| `--diarization` | Enable speaker identification | `False` |
| `--backend-policy` | Streaming strategy: `1`/`simulstreaming` uses AlignAtt SimulStreaming, `2`/`localagreement` uses the LocalAgreement policy | `simulstreaming` |
| `--backend` | ASR backend selector. `auto` picks MLX on macOS (if installed), otherwise Faster-Whisper, otherwise vanilla Whisper. Options: `mlx-whisper`, `faster-whisper`, `whisper`, `openai-api` (LocalAgreement only), `voxtral-mlx` (Apple Silicon), `voxtral` (HuggingFace) | `auto` |
| `--no-vac` | Disable Voice Activity Controller. NOT ADVISED | `False` |
| `--no-vad` | Disable Voice Activity Detection. NOT ADVISED | `False` |
| `--warmup-file` | Audio file path for model warmup | `jfk.wav` |
| `--host` | Server host address | `localhost` |
| `--port` | Server port | `8000` |
| `--ssl-certfile` | Path to the SSL certificate file (for HTTPS support) | `None` |
| `--ssl-keyfile` | Path to the SSL private key file (for HTTPS support) | `None` |
| `--forwarded-allow-ips` | Ip or Ips allowed to reverse proxy the whisperlivekit-server. Supported types are  IP Addresses (e.g. 127.0.0.1), IP Networks (e.g. 10.100.0.0/16), or Literals (e.g. /path/to/socket.sock) | `None` |
| `--pcm-input` | raw PCM (s16le) data is expected as input and FFmpeg will be bypassed. Frontend will use AudioWorklet instead of MediaRecorder | `False` |
| `--lora-path` | Path or Hugging Face repo ID for LoRA adapter weights (e.g., `qfuxa/whisper-base-french-lora`). Only works with native Whisper backend (`--backend whisper`) | `None` |

| Translation options | Description | Default |
|-----------|-------------|---------|
| `--nllb-backend` | `transformers` or `ctranslate2` | `transformers` |
| `--nllb-size` | `600M` or `1.3B` | `600M` |

| Diarization options | Description | Default |
|-----------|-------------|---------|
| `--diarization-backend` |  `diart` or `sortformer` | `sortformer` |
| `--disable-punctuation-split` | [NOT FUNCTIONAL IN 0.2.15 / 0.2.16] Disable punctuation based splits. See #214 | `False` |
| `--segmentation-model` | Hugging Face model ID for Diart segmentation model. [Available models](https://github.com/juanmc2005/diart/tree/main?tab=readme-ov-file#pre-trained-models) | `pyannote/segmentation-3.0` |
| `--embedding-model` | Hugging Face model ID for Diart embedding model. [Available models](https://github.com/juanmc2005/diart/tree/main?tab=readme-ov-file#pre-trained-models) | `pyannote/embedding` |

| SimulStreaming backend options | Description | Default |
|-----------|-------------|---------|
| `--disable-fast-encoder` | Disable Faster Whisper or MLX Whisper backends for the encoder (if installed). Inference can be slower but helpful when GPU memory is limited | `False` |
| `--custom-alignment-heads` | Use your own alignment heads, useful when `--model-dir` is used. Use `scripts/determine_alignment_heads.py` to extract them. <img src="scripts/alignment_heads_qwen3_asr_1.7B.png" alt="WhisperLiveKit Demo" width="300">
 | `None` |
| `--frame-threshold` | AlignAtt frame threshold (lower = faster, higher = more accurate) | `25` |
| `--beams` | Number of beams for beam search (1 = greedy decoding) | `1` |
| `--decoder` | Force decoder type (`beam` or `greedy`) | `auto` |
| `--audio-max-len` | Maximum audio buffer length (seconds) | `30.0` |
| `--audio-min-len` | Minimum audio length to process (seconds) | `0.0` |
| `--cif-ckpt-path` | Path to CIF model for word boundary detection | `None` |
| `--never-fire` | Never truncate incomplete words | `False` |
| `--init-prompt` | Initial prompt for the model | `None` |
| `--static-init-prompt` | Static prompt that doesn't scroll | `None` |
| `--max-context-tokens` | Maximum context tokens | Depends on model used, but usually 448. |



| WhisperStreaming backend options | Description | Default |
|-----------|-------------|---------|
| `--confidence-validation` | Use confidence scores for faster validation | `False` |
| `--buffer_trimming` | Buffer trimming strategy (`sentence` or `segment`) | `segment` |




> For diarization using Diart, you need to accept user conditions [here](https://huggingface.co/pyannote/segmentation) for the `pyannote/segmentation` model, [here](https://huggingface.co/pyannote/segmentation-3.0) for the `pyannote/segmentation-3.0` model and [here](https://huggingface.co/pyannote/embedding) for the `pyannote/embedding` model. **Then**, login to HuggingFace: `huggingface-cli login`

### 🚀 Deployment Guide

To deploy WhisperLiveKit in production:
 
1. **Server Setup**: Install production ASGI server & launch with multiple workers
   ```bash
   pip install uvicorn gunicorn
   gunicorn -k uvicorn.workers.UvicornWorker -w 4 your_app:app
   ```

2. **Frontend**: Host your customized version of the `html` example & ensure WebSocket connection points correctly

3. **Nginx Configuration** (recommended for production):
    ```nginx    
   server {
       listen 80;
       server_name your-domain.com;
        location / {
            proxy_pass http://localhost:8000;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
    }}
    ```

4. **HTTPS Support**: For secure deployments, use "wss://" instead of "ws://" in WebSocket URL

## 🐋 Docker

Deploy the application easily using Docker with GPU or CPU support.

### Prerequisites
- Docker installed on your system
- For GPU support: NVIDIA Docker runtime installed

### Quick Start

**With GPU acceleration (recommended):**
```bash
docker build -t wlk .
docker run --gpus all -p 8000:8000 --name wlk wlk
```

**CPU only:**
```bash
docker build -f Dockerfile.cpu -t wlk --build-arg EXTRAS="cpu" .
docker run -p 8000:8000 --name wlk wlk
```

### Advanced Usage

**Custom configuration:**
```bash
# Example with custom model and language
docker run --gpus all -p 8000:8000 --name wlk wlk --model large-v3 --language fr
```

**Compose (recommended for cache + token wiring):**
```bash
# GPU Sortformer profile
docker compose up --build wlk-gpu-sortformer

# GPU Voxtral profile
docker compose up --build wlk-gpu-voxtral

# CPU service
docker compose up --build wlk-cpu
```

### Memory Requirements
- **Large models**: Ensure your Docker runtime has sufficient memory allocated


#### Customization

- `--build-arg` Options:
  - `EXTRAS="cu129,diarization-sortformer"` - GPU Sortformer profile extras.
  - `EXTRAS="cu129,voxtral-hf,translation"` - GPU Voxtral profile extras.
  - `EXTRAS="cpu,diarization-diart,translation"` - CPU profile extras.
  - Hugging Face cache + token are configured in `compose.yml` using a named volume and `HF_TKN_FILE` (default: `./token`).

## Testing & Benchmarks

```bash
# Quick benchmark with the CLI
wlk bench
wlk bench --backend faster-whisper --model large-v3
wlk bench --languages all --json results.json

# Install test dependencies for full suite
pip install -e ".[test]"

# Run unit tests (no model download required)
pytest tests/ -v

# Speed vs Accuracy scatter plot (all backends, compute-aware + unaware)
python scripts/create_long_samples.py        # generate ~90s test samples (cached)
python scripts/run_scatter_benchmark.py      # English (both modes)
python scripts/run_scatter_benchmark.py --lang fr  # French
```

## Use Cases
Capture discussions in real-time for meeting transcription, help hearing-impaired users follow conversations through accessibility tools, transcribe podcasts or videos automatically for content creation, transcribe support calls with speaker identification for customer service...


================================================
FILE: benchmark_mlx_simul.py
================================================
#!/usr/bin/env python3
"""
Benchmark Qwen3-ASR MLX SimulStreaming on LibriSpeech test-clean.

Measures:
  - Word Error Rate (WER) via jiwer
  - Real-Time Factor (RTF) = total_inference_time / total_audio_duration
  - Per-utterance stats

Usage:
  # Per-utterance simul-streaming (default)
  python benchmark_mlx_simul.py --model-size 0.6b

  # Single-shot (batch-like, no streaming chunking)
  python benchmark_mlx_simul.py --model-size 0.6b --single-shot

  # Quick test with 100 utterances
  python benchmark_mlx_simul.py --model-size 0.6b --max-utterances 100

  # Chapter-grouped (matching H100 benchmark methodology)
  python benchmark_mlx_simul.py --model-size 0.6b --chapter-grouped
"""

import argparse
import json
import logging
import os
import re
import sys
import time
from collections import defaultdict
from pathlib import Path

import numpy as np
import soundfile as sf
from jiwer import wer as compute_wer, cer as compute_cer

# Add WhisperLiveKit to path
WLKIT_DIR = Path(__file__).resolve().parent
sys.path.insert(0, str(WLKIT_DIR))

from whisperlivekit.qwen3_mlx_simul import (
    Qwen3MLXSimulStreamingASR,
    Qwen3MLXSimulStreamingOnlineProcessor,
)

logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("benchmark")
logger.setLevel(logging.INFO)

SAMPLE_RATE = 16_000

# Alignment heads paths
ALIGNMENT_HEADS = {
    "0.6b": str(WLKIT_DIR / "scripts" / "alignment_heads_qwen3_asr_0.6B.json"),
    "1.7b": str(WLKIT_DIR / "scripts" / "alignment_heads_qwen3_asr_1.7B_v2.json"),
}


def load_librispeech_utterances(data_dir: str, max_utterances: int = 0):
    """Load LibriSpeech utterances: yields (utt_id, audio_np, reference_text, duration_s)."""
    data_path = Path(data_dir)
    trans_files = sorted(data_path.rglob("*.trans.txt"))

    count = 0
    for trans_file in trans_files:
        chapter_dir = trans_file.parent
        with open(trans_file) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                parts = line.split(" ", 1)
                utt_id = parts[0]
                ref_text = parts[1] if len(parts) > 1 else ""

                flac_path = chapter_dir / f"{utt_id}.flac"
                if not flac_path.exists():
                    logger.warning("Missing FLAC: %s", flac_path)
                    continue

                audio, sr = sf.read(str(flac_path), dtype="float32")
                if sr != SAMPLE_RATE:
                    import librosa
                    audio = librosa.resample(audio, orig_sr=sr, target_sr=SAMPLE_RATE)

                duration = len(audio) / SAMPLE_RATE
                yield utt_id, audio, ref_text, duration

                count += 1
                if max_utterances > 0 and count >= max_utterances:
                    return


def load_librispeech_chapters(data_dir: str):
    """Load LibriSpeech grouped by speaker-chapter.

    Concatenates all utterances within each speaker/chapter into one long audio.
    Returns list of (chapter_id, audio_np, reference_text, duration_s).
    """
    data_path = Path(data_dir)
    trans_files = sorted(data_path.rglob("*.trans.txt"))

    chapters = []
    for trans_file in trans_files:
        chapter_dir = trans_file.parent
        chapter_id = chapter_dir.name
        speaker_id = chapter_dir.parent.name
        full_id = f"{speaker_id}-{chapter_id}"

        audios = []
        refs = []
        with open(trans_file) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                parts = line.split(" ", 1)
                utt_id = parts[0]
                ref_text = parts[1] if len(parts) > 1 else ""

                flac_path = chapter_dir / f"{utt_id}.flac"
                if not flac_path.exists():
                    continue

                audio, sr = sf.read(str(flac_path), dtype="float32")
                if sr != SAMPLE_RATE:
                    import librosa
                    audio = librosa.resample(audio, orig_sr=sr, target_sr=SAMPLE_RATE)

                audios.append(audio)
                refs.append(ref_text)

        if audios:
            # Concatenate with 0.5s silence between utterances
            silence = np.zeros(int(0.5 * SAMPLE_RATE), dtype=np.float32)
            combined = []
            for j, a in enumerate(audios):
                if j > 0:
                    combined.append(silence)
                combined.append(a)
            combined_audio = np.concatenate(combined)
            combined_ref = " ".join(refs)
            duration = len(combined_audio) / SAMPLE_RATE
            chapters.append((full_id, combined_audio, combined_ref, duration))

    return chapters


def transcribe_simul(asr, audio, chunk_seconds=2.0):
    """Transcribe using SimulStreaming with chunked audio feed.

    Returns (transcription_text, inference_time_seconds).
    """
    processor = Qwen3MLXSimulStreamingOnlineProcessor(asr)
    chunk_size = int(chunk_seconds * SAMPLE_RATE)
    total_samples = len(audio)
    offset = 0
    all_tokens = []

    t0 = time.perf_counter()

    while offset < total_samples:
        end = min(offset + chunk_size, total_samples)
        chunk = audio[offset:end]
        stream_time = end / SAMPLE_RATE

        processor.insert_audio_chunk(chunk, stream_time)

        is_last = (end >= total_samples)
        tokens, _ = processor.process_iter(is_last=is_last)
        if tokens:
            all_tokens.extend(tokens)
        offset = end

    # Final flush
    final_tokens, _ = processor.finish()
    if final_tokens:
        all_tokens.extend(final_tokens)

    t1 = time.perf_counter()
    inference_time = t1 - t0

    text = "".join(t.text for t in all_tokens).strip()
    return text, inference_time


def transcribe_single_shot(asr, audio):
    """Transcribe by feeding all audio at once (batch-like).

    Returns (transcription_text, inference_time_seconds).
    """
    processor = Qwen3MLXSimulStreamingOnlineProcessor(asr)

    t0 = time.perf_counter()

    duration = len(audio) / SAMPLE_RATE
    processor.insert_audio_chunk(audio, duration)
    all_tokens, _ = processor.process_iter(is_last=True)

    # Flush
    final_tokens, _ = processor.finish()
    if final_tokens:
        all_tokens.extend(final_tokens)

    t1 = time.perf_counter()
    inference_time = t1 - t0

    text = "".join(t.text for t in all_tokens).strip()
    return text, inference_time


def normalize_text(text: str) -> str:
    """Normalize text for WER computation: uppercase, strip punctuation."""
    text = text.upper()
    text = re.sub(r"[^\w\s]", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text


def main():
    parser = argparse.ArgumentParser(description="Benchmark Qwen3-ASR MLX SimulStreaming")
    parser.add_argument("--model-size", default="0.6b", choices=["0.6b", "1.7b"],
                        help="Model size (default: 0.6b)")
    parser.add_argument("--max-utterances", type=int, default=0,
                        help="Max utterances to process (0=all). Ignored in chapter mode.")
    parser.add_argument("--librispeech-dir", default="/tmp/LibriSpeech/test-clean",
                        help="Path to LibriSpeech test-clean directory")
    parser.add_argument("--single-shot", action="store_true",
                        help="Feed entire audio at once instead of streaming chunks")
    parser.add_argument("--chunk-seconds", type=float, default=2.0,
                        help="Chunk size in seconds for simul-streaming (default: 2.0)")
    parser.add_argument("--border-fraction", type=float, default=0.25,
                        help="Border fraction for AlignAtt stopping (default: 0.25, matching H100 config)")
    parser.add_argument("--chapter-grouped", action="store_true",
                        help="Group utterances by speaker-chapter (matching H100 methodology)")
    parser.add_argument("--output-json", default=None,
                        help="Save per-utterance results to JSON file")
    args = parser.parse_args()

    # Check alignment heads
    heads_path = ALIGNMENT_HEADS.get(args.model_size)
    if heads_path and os.path.exists(heads_path):
        logger.info("Using alignment heads: %s", heads_path)
        with open(heads_path) as f:
            heads_data = json.load(f)
        n_heads = len(heads_data.get("alignment_heads_compact", []))
        logger.info("  Loaded %d alignment heads for border detection", n_heads)
    else:
        heads_path = None
        logger.warning("No alignment heads file found for %s! Using default heuristic.",
                        args.model_size)

    # Load model
    logger.info("Loading Qwen3-ASR-%s MLX SimulStreaming model...", args.model_size.upper())
    t_load_start = time.perf_counter()
    asr = Qwen3MLXSimulStreamingASR(
        model_size=args.model_size,
        lan="en",
        alignment_heads_path=heads_path,
        border_fraction=args.border_fraction,
    )
    t_load_end = time.perf_counter()
    logger.info("Model loaded in %.2fs", t_load_end - t_load_start)

    # Verify alignment heads
    logger.info("Alignment heads active: %d heads across %d layers",
                len(asr.alignment_heads), len(asr.heads_by_layer))
    if asr.alignment_heads:
        layers = sorted(asr.heads_by_layer.keys())
        logger.info("  Active layers: %s", layers[:10])
        logger.info("  First 5 heads: %s", asr.alignment_heads[:5])

    logger.info("Config: border_fraction=%.2f, chunk_seconds=%.1f",
                args.border_fraction, args.chunk_seconds)

    # Warmup
    logger.info("Running warmup inference...")
    dummy_audio = np.random.randn(SAMPLE_RATE * 3).astype(np.float32) * 0.01
    if args.single_shot:
        _, warmup_time = transcribe_single_shot(asr, dummy_audio)
    else:
        _, warmup_time = transcribe_simul(asr, dummy_audio, args.chunk_seconds)
    logger.info("Warmup done in %.2fs", warmup_time)

    # Determine mode
    mode = "single-shot" if args.single_shot else "simul-streaming"
    if args.chapter_grouped:
        mode += " (chapter-grouped)"

    logger.info("Starting benchmark: model=%s, mode=%s, bf=%.2f, chunk=%.1fs",
                args.model_size, mode, args.border_fraction, args.chunk_seconds)
    logger.info("LibriSpeech dir: %s", args.librispeech_dir)

    # Load data
    if args.chapter_grouped:
        samples = load_librispeech_chapters(args.librispeech_dir)
        logger.info("Loaded %d speaker-chapters", len(samples))
    else:
        samples = list(load_librispeech_utterances(
            args.librispeech_dir, args.max_utterances
        ))
        logger.info("Loaded %d utterances", len(samples))

    # Run benchmark
    references = []
    hypotheses = []
    per_sample_results = []
    total_audio_duration = 0.0
    total_inference_time = 0.0

    for i, (sample_id, audio, ref_text, duration) in enumerate(samples):
        if args.single_shot:
            hyp_text, infer_time = transcribe_single_shot(asr, audio)
        else:
            hyp_text, infer_time = transcribe_simul(asr, audio, args.chunk_seconds)

        ref_norm = normalize_text(ref_text)
        hyp_norm = normalize_text(hyp_text)

        # Per-sample WER
        if ref_norm:
            sample_wer = compute_wer(ref_norm, hyp_norm)
        else:
            sample_wer = 0.0

        total_audio_duration += duration
        total_inference_time += infer_time

        references.append(ref_norm)
        hypotheses.append(hyp_norm)

        result = {
            "id": sample_id,
            "ref": ref_text,
            "hyp": hyp_text,
            "ref_norm": ref_norm,
            "hyp_norm": hyp_norm,
            "duration_s": round(duration, 3),
            "infer_time_s": round(infer_time, 3),
            "rtf": round(infer_time / duration, 4) if duration > 0 else 0,
            "wer": round(sample_wer, 4),
        }
        per_sample_results.append(result)

        # Progress logging
        if (i + 1) % 50 == 0 or (i + 1) <= 5:
            running_wer = compute_wer(references, hypotheses)
            running_rtf = total_inference_time / total_audio_duration if total_audio_duration > 0 else 0
            logger.info(
                "[%d/%d] id=%s dur=%.1fs infer=%.2fs rtf=%.3f wer=%.1f%% "
                "| running: wer=%.2f%% rtf=%.3f",
                i + 1, len(samples), sample_id, duration, infer_time,
                infer_time / duration if duration > 0 else 0,
                sample_wer * 100, running_wer * 100, running_rtf,
            )

        # Show first few transcriptions
        if i < 3:
            logger.info("  REF: %s", ref_text[:120])
            logger.info("  HYP: %s", hyp_text[:120])

    # Final results
    n_samples = len(references)
    if n_samples == 0:
        logger.error("No samples processed!")
        return

    total_wer = compute_wer(references, hypotheses)
    total_cer = compute_cer(references, hypotheses)
    total_rtf = total_inference_time / total_audio_duration if total_audio_duration > 0 else 0

    total_ref_words = sum(len(r.split()) for r in references)
    total_hyp_words = sum(len(h.split()) for h in hypotheses)

    wers = [r["wer"] for r in per_sample_results]
    wers_sorted = sorted(wers)
    median_wer = wers_sorted[len(wers_sorted) // 2]
    p90_wer = wers_sorted[int(len(wers_sorted) * 0.9)]
    p95_wer = wers_sorted[int(len(wers_sorted) * 0.95)]
    zero_wer_count = sum(1 for w in wers if w == 0.0)

    unit = "chapters" if args.chapter_grouped else "utterances"

    print("\n" + "=" * 70)
    print(f"BENCHMARK RESULTS: Qwen3-ASR-{args.model_size.upper()} MLX SimulStreaming")
    print(f"Mode: {mode}")
    print(f"Config: border_fraction={args.border_fraction}, chunk={args.chunk_seconds}s")
    print("=" * 70)
    print(f"Samples ({unit}):    {n_samples}")
    print(f"Total audio:         {total_audio_duration:.1f}s ({total_audio_duration/60:.1f}min)")
    print(f"Total inference:     {total_inference_time:.1f}s ({total_inference_time/60:.1f}min)")
    print(f"Reference words:     {total_ref_words}")
    print(f"Hypothesis words:    {total_hyp_words}")
    print("-" * 70)
    print(f"WER:                 {total_wer * 100:.2f}%")
    print(f"CER:                 {total_cer * 100:.2f}%")
    print(f"RTF:                 {total_rtf:.4f}")
    if total_rtf > 0:
        print(f"  (1/RTF = {1/total_rtf:.1f}x realtime)")
    print("-" * 70)
    print(f"Median {unit[:3]} WER:    {median_wer * 100:.2f}%")
    print(f"P90 {unit[:3]} WER:       {p90_wer * 100:.2f}%")
    print(f"P95 {unit[:3]} WER:       {p95_wer * 100:.2f}%")
    print(f"Zero-WER {unit[:3]}:      {zero_wer_count}/{n_samples} ({zero_wer_count/n_samples*100:.1f}%)")
    print("-" * 70)
    print(f"Alignment heads:     {len(asr.alignment_heads)} heads, {len(asr.heads_by_layer)} layers")
    print(f"Heads file:          {heads_path or 'NONE (default heuristic)'}")
    print(f"Model loaded in:     {t_load_end - t_load_start:.2f}s")
    print("=" * 70)

    # H100 reference comparison
    print("\nH100 PyTorch SimulStream+KV reference (chapter-grouped, bf=0.25):")
    print("  0.6B: WER 6.44%, RTF 0.109 (91 chapters, 602s)")
    print("  1.7B: WER 8.09%, RTF 0.117 (91 chapters, 602s)")

    # Worst samples
    worst = sorted(per_sample_results, key=lambda r: r["wer"], reverse=True)[:10]
    print(f"\nTop 10 worst {unit}:")
    for r in worst:
        print(f"  {r['id']}: WER={r['wer']*100:.1f}% dur={r['duration_s']:.1f}s rtf={r['rtf']:.3f}")
        if r['wer'] > 0.5:
            print(f"    REF: {r['ref_norm'][:80]}")
            print(f"    HYP: {r['hyp_norm'][:80]}")

    # Save JSON results
    if args.output_json:
        output = {
            "model": f"Qwen3-ASR-{args.model_size.upper()}",
            "backend": "mlx-simul-streaming",
            "mode": mode,
            "platform": "Apple M5 (32GB)",
            "config": {
                "border_fraction": args.border_fraction,
                "chunk_seconds": args.chunk_seconds,
                "chapter_grouped": args.chapter_grouped,
            },
            "n_samples": n_samples,
            "total_audio_s": round(total_audio_duration, 2),
            "total_inference_s": round(total_inference_time, 2),
            "wer": round(total_wer, 6),
            "cer": round(total_cer, 6),
            "rtf": round(total_rtf, 6),
            "median_wer": round(median_wer, 6),
            "p90_wer": round(p90_wer, 6),
            "p95_wer": round(p95_wer, 6),
            "alignment_heads_count": len(asr.alignment_heads),
            "alignment_heads_file": heads_path,
            "per_sample": per_sample_results,
        }
        with open(args.output_json, "w") as f:
            json.dump(output, f, indent=2)
        logger.info("Results saved to %s", args.output_json)


if __name__ == "__main__":
    main()


================================================
FILE: benchmarks/h100/bench_voxtral_hf_batch.py
================================================
#!/usr/bin/env python3
"""Standalone Voxtral benchmark — no whisperlivekit imports."""
import json, logging, re, time, wave, queue, threading
import numpy as np
import torch

logging.basicConfig(level=logging.WARNING)
for n in ["transformers","torch","httpx"]:
    logging.getLogger(n).setLevel(logging.ERROR)

from jiwer import wer as compute_wer
from transformers import AutoProcessor, VoxtralRealtimeForConditionalGeneration, TextIteratorStreamer

def norm(t):
    return re.sub(r' +', ' ', re.sub(r'[^a-z0-9 ]', ' ', t.lower())).strip()

def load_audio(path):
    with wave.open(path, 'r') as wf:
        return np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16).astype(np.float32) / 32768.0

# Load model
print("Loading Voxtral-Mini-4B...", flush=True)
MODEL_ID = "mistralai/Voxtral-Mini-4B-Realtime-2602"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = VoxtralRealtimeForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="cuda:0",
)
print(f"Loaded, GPU: {torch.cuda.memory_allocated()/1e9:.1f} GB", flush=True)

def transcribe_batch(audio_np):
    """Simple batch transcription (not streaming)."""
    # Voxtral expects audio as input_features from processor
    inputs = processor(
        audio=audio_np, sampling_rate=16000, return_tensors="pt",
    ).to("cuda:0").to(torch.bfloat16)

    t0 = time.perf_counter()
    with torch.inference_mode():
        generated = model.generate(**inputs, max_new_tokens=1024)
    t1 = time.perf_counter()

    text = processor.batch_decode(generated, skip_special_tokens=True)[0].strip()
    return text, t1 - t0

# 1. LibriSpeech test-clean
print("\n=== Voxtral / LibriSpeech test-clean ===", flush=True)
clean = json.load(open("/home/cloud/benchmark_data/metadata.json"))
wers = []; ta = tp = 0
for i, s in enumerate(clean):
    audio = load_audio(s['path'])
    hyp, pt = transcribe_batch(audio)
    w = compute_wer(norm(s['reference']), norm(hyp))
    wers.append(w); ta += s['duration']; tp += pt
    if i < 3 or i % 20 == 0:
        print(f"  [{i}] {s['duration']:.1f}s RTF={pt/s['duration']:.2f} WER={w:.1%} | {hyp[:60]}", flush=True)
clean_wer = np.mean(wers); clean_rtf = tp/ta
print(f"  CLEAN: WER {clean_wer:.2%}, RTF {clean_rtf:.3f} ({len(clean)} samples, {ta:.0f}s)")

# 2. LibriSpeech test-other
print("\n=== Voxtral / LibriSpeech test-other ===", flush=True)
other = json.load(open("/home/cloud/benchmark_data/metadata_other.json"))
wers2 = []; ta2 = tp2 = 0
for i, s in enumerate(other):
    audio = load_audio(s['path'])
    hyp, pt = transcribe_batch(audio)
    w = compute_wer(norm(s['reference']), norm(hyp))
    wers2.append(w); ta2 += s['duration']; tp2 += pt
    if i < 3 or i % 20 == 0:
        print(f"  [{i}] {s['duration']:.1f}s RTF={pt/s['duration']:.2f} WER={w:.1%}", flush=True)
other_wer = np.mean(wers2); other_rtf = tp2/ta2
print(f"  OTHER: WER {other_wer:.2%}, RTF {other_rtf:.3f} ({len(other)} samples, {ta2:.0f}s)")

# 3. ACL6060
print("\n=== Voxtral / ACL6060 ===", flush=True)
acl_results = []
for talk in ["110", "117", "268", "367", "590"]:
    audio = load_audio(f"/home/cloud/acl6060_audio/2022.acl-long.{talk}.wav")
    dur = len(audio) / 16000
    gw = []
    with open(f"/home/cloud/iwslt26-sst/inputs/en/acl6060.ts/gold-jsonl/2022.acl-long.{talk}.jsonl") as f:
        for line in f:
            gw.append(json.loads(line)["text"].strip())
    gold = " ".join(gw)

    # For long audio, process in 30s chunks
    all_hyp = []
    t0 = time.perf_counter()
    chunk_size = 30 * 16000
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        if len(chunk) < 1600:  # skip very short tail
            continue
        hyp, _ = transcribe_batch(chunk)
        all_hyp.append(hyp)
    t1 = time.perf_counter()

    full_hyp = " ".join(all_hyp)
    w = compute_wer(norm(gold), norm(full_hyp))
    rtf = (t1 - t0) / dur
    acl_results.append({"talk": talk, "wer": w, "rtf": rtf, "dur": dur})
    print(f"  Talk {talk}: {dur:.0f}s, WER {w:.2%}, RTF {rtf:.3f}", flush=True)

acl_wer = np.mean([r["wer"] for r in acl_results])
acl_rtf = np.mean([r["rtf"] for r in acl_results])
print(f"  ACL6060 AVERAGE: WER {acl_wer:.2%}, RTF {acl_rtf:.3f}")

# Summary
print(f"\n{'='*60}")
print(f"  VOXTRAL BENCHMARK SUMMARY (H100 80GB)")
print(f"{'='*60}")
print(f"  {'Dataset':>25} {'WER':>7} {'RTF':>7}")
print(f"  {'-'*42}")
print(f"  {'LibriSpeech clean':>25} {clean_wer:>6.2%} {clean_rtf:>7.3f}")
print(f"  {'LibriSpeech other':>25} {other_wer:>6.2%} {other_rtf:>7.3f}")
print(f"  {'ACL6060 (5 talks)':>25} {acl_wer:>6.2%} {acl_rtf:>7.3f}")

results = {
    "clean": {"avg_wer": round(float(clean_wer), 4), "rtf": round(float(clean_rtf), 3)},
    "other": {"avg_wer": round(float(other_wer), 4), "rtf": round(float(other_rtf), 3)},
    "acl6060": {"avg_wer": round(float(acl_wer), 4), "avg_rtf": round(float(acl_rtf), 3),
                "talks": [{k: (round(float(v), 4) if isinstance(v, (float, np.floating)) else v) for k, v in r.items()} for r in acl_results]},
}
json.dump(results, open("/home/cloud/bench_voxtral_results.json", "w"), indent=2)
print(f"\nSaved to /home/cloud/bench_voxtral_results.json")


================================================
FILE: benchmarks/h100/bench_voxtral_vllm_realtime.py
================================================
#!/usr/bin/env python3
"""Benchmark Voxtral via vLLM WebSocket /v1/realtime — proper streaming."""
import asyncio, json, base64, time, wave, re, os
import numpy as np
import websockets
import librosa
from jiwer import wer as compute_wer

MODEL = "mistralai/Voxtral-Mini-4B-Realtime-2602"
WS_URI = "ws://localhost:8000/v1/realtime"

def norm(t):
    return re.sub(r' +', ' ', re.sub(r'[^a-z0-9 ]', ' ', t.lower())).strip()

async def transcribe(audio_path, max_tokens=4096):
    audio, _ = librosa.load(audio_path, sr=16000, mono=True)
    pcm16 = (audio * 32767).astype(np.int16).tobytes()
    dur = len(audio) / 16000

    t0 = time.time()
    transcript = ""
    first_token_time = None

    async with websockets.connect(WS_URI, max_size=2**24) as ws:
        await ws.recv()  # session.created
        await ws.send(json.dumps({"type": "session.update", "model": MODEL}))
        await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))  # signal ready

        # Send audio in 4KB chunks
        for i in range(0, len(pcm16), 4096):
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(pcm16[i:i+4096]).decode(),
            }))

        await ws.send(json.dumps({"type": "input_audio_buffer.commit", "final": True}))

        while True:
            try:
                msg = json.loads(await asyncio.wait_for(ws.recv(), timeout=120))
                if msg["type"] == "transcription.delta":
                    d = msg.get("delta", "")
                    if d.strip() and first_token_time is None:
                        first_token_time = time.time() - t0
                    transcript += d
                elif msg["type"] == "transcription.done":
                    transcript = msg.get("text", transcript)
                    break
                elif msg["type"] == "error":
                    break
            except asyncio.TimeoutError:
                break

    elapsed = time.time() - t0
    return transcript.strip(), dur, elapsed / dur, first_token_time or elapsed

async def main():
    # Warmup
    print("Warmup...", flush=True)
    await transcribe("/home/cloud/benchmark_data/librispeech_clean_0000.wav")

    # LibriSpeech clean (full 91 samples)
    print("\n=== Voxtral vLLM Realtime / LibriSpeech clean ===", flush=True)
    clean = json.load(open("/home/cloud/benchmark_data/metadata.json"))
    wers = []; ta = tp = 0
    for i, s in enumerate(clean):
        hyp, dur, rtf, fwl = await transcribe(s['path'])
        w = compute_wer(norm(s['reference']), norm(hyp)) if hyp else 1.0
        wers.append(w); ta += dur; tp += dur * rtf
        if i < 3 or i % 20 == 0:
            print(f"  [{i}] {dur:.1f}s RTF={rtf:.3f} FWL={fwl:.2f}s WER={w:.1%} | {hyp[:60]}", flush=True)
    clean_wer = np.mean(wers); clean_rtf = tp / ta
    print(f"  CLEAN ({len(clean)}): WER {clean_wer:.2%}, RTF {clean_rtf:.3f}\n", flush=True)

    # LibriSpeech other (full 133 samples)
    print("=== Voxtral vLLM Realtime / LibriSpeech other ===", flush=True)
    other = json.load(open("/home/cloud/benchmark_data/metadata_other.json"))
    wers2 = []; ta2 = tp2 = 0
    for i, s in enumerate(other):
        hyp, dur, rtf, fwl = await transcribe(s['path'])
        w = compute_wer(norm(s['reference']), norm(hyp)) if hyp else 1.0
        wers2.append(w); ta2 += dur; tp2 += dur * rtf
        if i < 3 or i % 20 == 0:
            print(f"  [{i}] {dur:.1f}s RTF={rtf:.3f} WER={w:.1%}", flush=True)
    other_wer = np.mean(wers2); other_rtf = tp2 / ta2
    print(f"  OTHER ({len(other)}): WER {other_wer:.2%}, RTF {other_rtf:.3f}\n", flush=True)

    # ACL6060 talks
    print("=== Voxtral vLLM Realtime / ACL6060 ===", flush=True)
    acl = []
    for talk in ["110", "117", "268", "367", "590"]:
        gw = []
        with open(f"/home/cloud/iwslt26-sst/inputs/en/acl6060.ts/gold-jsonl/2022.acl-long.{talk}.jsonl") as f:
            for line in f: gw.append(json.loads(line)["text"].strip())
        gold = " ".join(gw)

        hyp, dur, rtf, fwl = await transcribe(f"/home/cloud/acl6060_audio/2022.acl-long.{talk}.wav")
        w = compute_wer(norm(gold), norm(hyp)) if hyp else 1.0
        acl.append({"talk": talk, "wer": round(float(w),4), "rtf": round(float(rtf),3), "dur": round(dur,1)})
        print(f"  Talk {talk}: {dur:.0f}s, WER {w:.2%}, RTF {rtf:.3f}, FWL {fwl:.2f}s", flush=True)

    acl_wer = np.mean([r["wer"] for r in acl])
    acl_rtf = np.mean([r["rtf"] for r in acl])
    print(f"  ACL6060 AVERAGE: WER {acl_wer:.2%}, RTF {acl_rtf:.3f}\n", flush=True)

    # Summary
    print(f"{'='*55}")
    print(f"  VOXTRAL vLLM REALTIME BENCHMARK (H100)")
    print(f"{'='*55}")
    print(f"  LS clean ({len(clean)}): WER {clean_wer:.2%}, RTF {clean_rtf:.3f}")
    print(f"  LS other ({len(other)}): WER {other_wer:.2%}, RTF {other_rtf:.3f}")
    print(f"  ACL6060 (5):     WER {acl_wer:.2%}, RTF {acl_rtf:.3f}")

    results = {
        "clean": {"avg_wer": round(float(clean_wer),4), "rtf": round(float(clean_rtf),3), "n": len(clean)},
        "other": {"avg_wer": round(float(other_wer),4), "rtf": round(float(other_rtf),3), "n": len(other)},
        "acl6060": {"avg_wer": round(float(acl_wer),4), "avg_rtf": round(float(acl_rtf),3), "talks": acl},
    }
    json.dump(results, open("/home/cloud/bench_voxtral_realtime_results.json", "w"), indent=2)
    print(f"\n  Saved to /home/cloud/bench_voxtral_realtime_results.json")

asyncio.run(main())


================================================
FILE: benchmarks/h100/generate_figures.py
================================================
#!/usr/bin/env python3
"""
Generate polished benchmark figures for WhisperLiveKit H100 results.

Reads data from results.json, outputs PNGs to this directory.
Run: python3 benchmarks/h100/generate_figures.py
"""
import json
import os

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np

DIR = os.path.dirname(os.path.abspath(__file__))
DATA = json.load(open(os.path.join(DIR, "results.json")))

# ── Style constants ──
COLORS = {
    "whisper":  "#d63031",
    "qwen_b":   "#6c5ce7",
    "qwen_s":   "#00b894",
    "voxtral":  "#fdcb6e",
    "fw_m5":    "#74b9ff",
    "mlx_m5":   "#55efc4",
    "vox_m5":   "#ffeaa7",
}
plt.rcParams.update({
    "font.family": "sans-serif",
    "font.size": 11,
    "axes.spines.top": False,
    "axes.spines.right": False,
})


def _save(fig, name):
    path = os.path.join(DIR, name)
    fig.savefig(path, dpi=180, bbox_inches="tight", facecolor="white")
    plt.close(fig)
    print(f"  {name}")


# ──────────────────────────────────────────────────────────
# Figure 1: WER vs RTF scatter — H100 (LibriSpeech clean)
# ──────────────────────────────────────────────────────────
def fig_scatter_clean():
    ls = DATA["librispeech_clean"]["systems"]
    m5 = DATA["m5_reference"]["systems"]

    fig, ax = plt.subplots(figsize=(9, 7.5))

    ax.axhspan(0, 10, color="#f0fff0", alpha=0.5, zorder=0)

    # M5 (ghost dots)
    for k, v in m5.items():
        ax.scatter(v["rtf"], v["wer"], s=50, c="silver", marker="o",
                   alpha=0.22, zorder=2, linewidths=0.4, edgecolors="gray")

    # H100 systems — (name, data, color, marker, size, label_x_off, label_y_off)
    pts = [
        ("Whisper large-v3",            ls["whisper_large_v3_batch"],     COLORS["whisper"], "h", 240, -8, -16),
        ("Qwen3-ASR 0.6B (batch)",     ls["qwen3_0.6b_batch"],           COLORS["qwen_b"],  "h", 170,  8,   6),
        ("Qwen3-ASR 1.7B (batch)",     ls["qwen3_1.7b_batch"],           COLORS["qwen_b"],  "h", 240,  8, -16),
        ("Voxtral 4B (vLLM)",          ls["voxtral_4b_vllm_realtime"],   COLORS["voxtral"], "D", 260,  8,   6),
        ("Qwen3 0.6B SimulStream+KV",  ls["qwen3_0.6b_simulstream_kv"], COLORS["qwen_s"],  "s", 220,  8,   6),
        ("Qwen3 1.7B SimulStream+KV",  ls["qwen3_1.7b_simulstream_kv"], COLORS["qwen_s"],  "s", 280,  8,  -16),
    ]

    for name, d, color, marker, sz, lx, ly in pts:
        ax.scatter(d["rtf"], d["wer"], s=sz, c=color, marker=marker,
                   edgecolors="white", linewidths=1.5, zorder=5)
        ax.annotate(name, (d["rtf"], d["wer"]), fontsize=8.5, fontweight="bold",
                    xytext=(lx, ly), textcoords="offset points",
                    arrowprops=dict(arrowstyle="-", color="#aaa", lw=0.5))

    ax.set_xlabel("RTF  (lower = faster)")
    ax.set_ylabel("WER %  (lower = better)")
    ax.set_title("Speed vs Accuracy  —  LibriSpeech test-clean  (H100 80 GB)",
                 fontsize=13, fontweight="bold", pad=12)
    ax.set_xlim(-0.005, 0.20)
    ax.set_ylim(-0.3, 10)
    ax.grid(True, alpha=0.12)

    legend = [
        mpatches.Patch(color=COLORS["whisper"], label="Whisper large-v3"),
        mpatches.Patch(color=COLORS["qwen_b"],  label="Qwen3-ASR (batch)"),
        mpatches.Patch(color=COLORS["qwen_s"],  label="Qwen3 SimulStream+KV"),
        mpatches.Patch(color=COLORS["voxtral"], label="Voxtral 4B (vLLM)"),
        plt.Line2D([0],[0], marker="h", color="w", mfc="gray", ms=8, label="Batch"),
        plt.Line2D([0],[0], marker="s", color="w", mfc="gray", ms=8, label="Streaming"),
    ]
    ax.legend(handles=legend, fontsize=8.5, loc="upper right", framealpha=0.85, ncol=2)
    _save(fig, "wer_vs_rtf_clean.png")


# ──────────────────────────────────────────────────────────
# Figure 2: ACL6060 conference talks — the realistic test
# ──────────────────────────────────────────────────────────
def fig_scatter_acl6060():
    acl = DATA["acl6060"]["systems"]

    fig, ax = plt.subplots(figsize=(10, 6.5))
    ax.axhspan(0, 15, color="#f0fff0", alpha=0.4, zorder=0)

    pts = [
        ("Voxtral 4B\n(vLLM Realtime)",    acl["voxtral_4b_vllm_realtime"],  COLORS["voxtral"], "D", 380),
        ("Qwen3 1.7B\nSimulStream+KV",     acl["qwen3_1.7b_simulstream_kv"], COLORS["qwen_s"],  "s", 380),
        ("Qwen3 0.6B\nSimulStream+KV",     acl["qwen3_0.6b_simulstream_kv"], COLORS["qwen_s"],  "s", 260),
        ("Whisper large-v3\n(batch)",       acl["whisper_large_v3_batch"],    COLORS["whisper"], "h", 320),
    ]
    label_off = [(10, -12), (10, 6), (10, 6), (10, 6)]

    for (name, d, color, marker, sz), (lx, ly) in zip(pts, label_off):
        wer = d["avg_wer"]; rtf = d["avg_rtf"]
        ax.scatter(rtf, wer, s=sz, c=color, marker=marker,
                   edgecolors="white", linewidths=1.5, zorder=5)
        ax.annotate(name, (rtf, wer), fontsize=9.5, fontweight="bold",
                    xytext=(lx, ly), textcoords="offset points",
                    arrowprops=dict(arrowstyle="-", color="#aaa", lw=0.6))

    # Cascade annotation
    ax.annotate("Full STT+MT cascade\nRTF 0.15 (real-time)",
                xy=(0.151, 1), xytext=(0.25, 4),
                fontsize=9, fontstyle="italic", color="#1565c0",
                arrowprops=dict(arrowstyle="->", color="#1565c0", lw=1.5),
                bbox=dict(boxstyle="round,pad=0.3", fc="#e3f2fd", ec="#90caf9", alpha=0.9))

    ax.set_xlabel("RTF  (lower = faster)")
    ax.set_ylabel("WER %  (lower = better)")
    ax.set_title("ACL6060 Conference Talks  —  5 talks, 58 min  (H100 80 GB)",
                 fontsize=13, fontweight="bold", pad=12)
    ax.set_xlim(-0.005, 0.30)
    ax.set_ylim(-1, 26)
    ax.grid(True, alpha=0.12)
    _save(fig, "wer_vs_rtf_acl6060.png")


# ──────────────────────────────────────────────────────────
# Figure 3: Bar chart — WER + RTF side-by-side
# ──────────────────────────────────────────────────────────
def fig_bars():
    names = [
        "Whisper\nlarge-v3", "Voxtral 4B\n(vLLM)", "Qwen3 0.6B\n(batch)",
        "Qwen3 1.7B\n(batch)", "Qwen3 0.6B\nSimulStream", "Qwen3 1.7B\nSimulStream",
    ]
    wer_c = [2.02, 2.71, 2.30, 2.46, 6.44, 8.09]
    wer_o = [7.79, 9.26, 6.12, 5.34, 9.27, 9.56]
    rtf_c = [0.071, 0.137, 0.065, 0.069, 0.109, 0.117]
    fwl   = [472, 137, 432, 457, 91, 94]  # ms
    cols  = [COLORS["whisper"], COLORS["voxtral"], COLORS["qwen_b"],
             COLORS["qwen_b"], COLORS["qwen_s"], COLORS["qwen_s"]]
    cols_l = ["#ff7675", "#ffeaa7", "#a29bfe", "#a29bfe", "#55efc4", "#55efc4"]

    x = np.arange(len(names))
    fig, axes = plt.subplots(1, 3, figsize=(16, 6))

    # WER
    ax = axes[0]; w = 0.36
    ax.bar(x - w/2, wer_c, w, color=cols, alpha=0.9, edgecolor="white", label="test-clean")
    ax.bar(x + w/2, wer_o, w, color=cols_l, alpha=0.65, edgecolor="white", label="test-other")
    ax.set_ylabel("WER %"); ax.set_title("Word Error Rate", fontweight="bold")
    ax.set_xticks(x); ax.set_xticklabels(names, fontsize=7.5, rotation=25, ha="right")
    ax.legend(fontsize=8); ax.grid(axis="y", alpha=0.15)
    for i, v in enumerate(wer_c):
        ax.text(i - w/2, v + 0.2, f"{v:.1f}", ha="center", fontsize=7, fontweight="bold")

    # RTF
    ax = axes[1]
    ax.bar(x, rtf_c, 0.55, color=cols, alpha=0.9, edgecolor="white")
    ax.set_ylabel("RTF  (lower = faster)"); ax.set_title("Real-Time Factor (test-clean)", fontweight="bold")
    ax.set_xticks(x); ax.set_xticklabels(names, fontsize=7.5, rotation=25, ha="right")
    ax.grid(axis="y", alpha=0.15)
    for i, v in enumerate(rtf_c):
        ax.text(i, v + 0.003, f"{v:.3f}", ha="center", fontsize=8, fontweight="bold")

    # First-word latency
    ax = axes[2]
    ax.bar(x, fwl, 0.55, color=cols, alpha=0.9, edgecolor="white")
    ax.set_ylabel("ms"); ax.set_title("First Word Latency", fontweight="bold")
    ax.set_xticks(x); ax.set_xticklabels(names, fontsize=7.5, rotation=25, ha="right")
    ax.grid(axis="y", alpha=0.15)
    for i, v in enumerate(fwl):
        ax.text(i, v + 8, f"{v}", ha="center", fontsize=8, fontweight="bold")

    fig.suptitle("LibriSpeech Benchmark  —  H100 80 GB", fontsize=14, fontweight="bold")
    plt.tight_layout()
    _save(fig, "bars_wer_rtf_latency.png")


# ──────────────────────────────────────────────────────────
# Figure 4: Clean vs Other robustness
# ──────────────────────────────────────────────────────────
def fig_robustness():
    models = [
        ("Whisper large-v3",          2.02, 7.79, COLORS["whisper"], "h", 280),
        ("Qwen3 0.6B (batch)",       2.30, 6.12, COLORS["qwen_b"],  "h", 180),
        ("Qwen3 1.7B (batch)",       2.46, 5.34, COLORS["qwen_b"],  "h", 280),
        ("Voxtral 4B (vLLM)",        2.71, 9.26, COLORS["voxtral"], "D", 280),
        ("Qwen3 0.6B\nSimulStream",  6.44, 9.27, COLORS["qwen_s"],  "s", 240),
        ("Qwen3 1.7B\nSimulStream",  8.09, 9.56, COLORS["qwen_s"],  "s", 300),
    ]
    # Manual label offsets — carefully placed to avoid overlap
    offsets = [(-55, 10), (8, 10), (8, -18), (-55, -18), (-10, 12), (10, -18)]

    fig, ax = plt.subplots(figsize=(8.5, 7))
    ax.plot([0, 13], [0, 13], "--", color="#ccc", lw=1, zorder=1)
    ax.fill_between([0, 13], [0, 13], [13, 13], color="#fff5f5", alpha=0.5, zorder=0)
    ax.text(4, 11, "degrades more\non noisy audio", fontsize=9, color="#bbb", fontstyle="italic")

    for (name, wc, wo, color, marker, sz), (lx, ly) in zip(models, offsets):
        ax.scatter(wc, wo, s=sz, c=color, marker=marker,
                   edgecolors="white", linewidths=1.5, zorder=5)
        ax.annotate(name, (wc, wo), fontsize=8.5, fontweight="bold",
                    xytext=(lx, ly), textcoords="offset points",
                    arrowprops=dict(arrowstyle="-", color="#aaa", lw=0.6))
        deg = wo - wc
        ax.annotate(f"+{deg:.1f}%", (wc, wo), fontsize=7, color="#999",
                    xytext=(-6, -13), textcoords="offset points")

    ax.set_xlabel("WER % on test-clean")
    ax.set_ylabel("WER % on test-other")
    ax.set_title("Clean vs Noisy Robustness  (H100 80 GB)", fontsize=13, fontweight="bold", pad=12)
    ax.set_xlim(-0.3, 12); ax.set_ylim(-0.3, 12)
    ax.set_aspect("equal"); ax.grid(True, alpha=0.12)
    _save(fig, "robustness_clean_vs_other.png")


# ──────────────────────────────────────────────────────────
# Figure 5: ACL6060 per-talk breakdown (Qwen3 vs Voxtral)
# ──────────────────────────────────────────────────────────
def fig_per_talk():
    q = DATA["acl6060"]["systems"]["qwen3_1.7b_simulstream_kv"]["per_talk"]
    v = DATA["acl6060"]["systems"]["voxtral_4b_vllm_realtime"]["per_talk"]
    talks = DATA["acl6060"]["talks"]

    fig, ax = plt.subplots(figsize=(9, 5))
    x = np.arange(len(talks)); w = 0.35

    bars_v = ax.bar(x - w/2, [v[t] for t in talks], w, color=COLORS["voxtral"],
                    edgecolor="white", label="Voxtral 4B (vLLM)")
    bars_q = ax.bar(x + w/2, [q[t] for t in talks], w, color=COLORS["qwen_s"],
                    edgecolor="white", label="Qwen3 1.7B SimulStream+KV")

    for bar in bars_v:
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.3,
                f"{bar.get_height():.1f}", ha="center", fontsize=8)
    for bar in bars_q:
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.3,
                f"{bar.get_height():.1f}", ha="center", fontsize=8)

    ax.set_xlabel("ACL6060 Talk ID")
    ax.set_ylabel("WER %")
    ax.set_title("Per-Talk WER  —  ACL6060 Conference Talks  (H100 80 GB)",
                 fontsize=13, fontweight="bold", pad=12)
    ax.set_xticks(x); ax.set_xticklabels([f"Talk {t}" for t in talks])
    ax.legend(fontsize=9); ax.grid(axis="y", alpha=0.15)
    ax.set_ylim(0, 18)
    _save(fig, "acl6060_per_talk.png")


if __name__ == "__main__":
    print("Generating H100 benchmark figures...")
    fig_scatter_clean()
    fig_scatter_acl6060()
    fig_bars()
    fig_robustness()
    fig_per_talk()
    print("Done!")


================================================
FILE: benchmarks/h100/results.json
================================================
{
  "hardware": "NVIDIA H100 80GB HBM3, CUDA 12.4, Driver 550.163",
  "date": "2026-03-15",

  "librispeech_clean": {
    "n_samples": 91,
    "total_audio_s": 602,
    "systems": {
      "whisper_large_v3_batch":     {"wer": 2.02, "rtf": 0.071, "first_word_latency_s": 0.472},
      "qwen3_0.6b_batch":          {"wer": 2.30, "rtf": 0.065, "first_word_latency_s": 0.432},
      "qwen3_1.7b_batch":          {"wer": 2.46, "rtf": 0.069, "first_word_latency_s": 0.457},
      "voxtral_4b_vllm_realtime":  {"wer": 2.71, "rtf": 0.137, "first_word_latency_s": 0.137},
      "qwen3_0.6b_simulstream_kv": {"wer": 6.44, "rtf": 0.109, "first_word_latency_s": 0.091},
      "qwen3_1.7b_simulstream_kv": {"wer": 8.09, "rtf": 0.117, "first_word_latency_s": 0.094}
    }
  },

  "librispeech_other": {
    "n_samples": 133,
    "total_audio_s": 600,
    "systems": {
      "qwen3_1.7b_batch":          {"wer": 5.34, "rtf": 0.088},
      "qwen3_0.6b_batch":          {"wer": 6.12, "rtf": 0.086},
      "whisper_large_v3_batch":     {"wer": 7.79, "rtf": 0.092},
      "qwen3_0.6b_simulstream_kv": {"wer": 9.27, "rtf": 0.127},
      "voxtral_4b_vllm_realtime":  {"wer": 9.26, "rtf": 0.144},
      "qwen3_1.7b_simulstream_kv": {"wer": 9.56, "rtf": 0.140}
    }
  },

  "acl6060": {
    "description": "5 ACL 2022 conference talks, 58 min total",
    "talks": ["110", "117", "268", "367", "590"],
    "systems": {
      "voxtral_4b_vllm_realtime":  {"avg_wer": 7.83, "avg_rtf": 0.203, "per_talk": {"110": 5.18, "117": 2.24, "268": 14.88, "367": 9.40, "590": 7.45}},
      "qwen3_1.7b_simulstream_kv": {"avg_wer": 9.20, "avg_rtf": 0.074, "per_talk": {"110": 5.59, "117": 8.12, "268": 12.25, "367": 12.29, "590": 7.77}},
      "qwen3_0.6b_simulstream_kv": {"avg_wer": 13.21, "avg_rtf": 0.098},
      "whisper_large_v3_batch":     {"avg_wer": 22.53, "avg_rtf": 0.125}
    }
  },

  "m5_reference": {
    "description": "MacBook M5 results (from WLK scatter benchmarks)",
    "systems": {
      "fw_la_base":    {"wer": 17.0, "rtf": 0.82},
      "fw_la_small":   {"wer":  8.6, "rtf": 0.76},
      "fw_ss_base":    {"wer":  7.8, "rtf": 0.46},
      "fw_ss_small":   {"wer":  7.0, "rtf": 0.90},
      "mlx_ss_base":   {"wer":  7.7, "rtf": 0.34},
      "mlx_ss_small":  {"wer":  6.5, "rtf": 0.68},
      "voxtral_mlx":   {"wer":  7.0, "rtf": 0.26},
      "qwen3_mlx_0.6b":{"wer":  5.5, "rtf": 0.55},
      "qwen3_0.6b_batch":{"wer":24.0, "rtf": 1.42}
    }
  }
}


================================================
FILE: benchmarks/m5/bench_0.6b_simul_500.json
================================================
{
  "model": "Qwen3-ASR-0.6B",
  "backend": "mlx-simul-streaming",
  "mode": "simul-streaming",
  "platform": "Apple M5 (32GB)",
  "config": {
    "border_fraction": 0.25,
    "chunk_seconds": 2.0,
    "chapter_grouped": false
  },
  "n_samples": 500,
  "total_audio_s": 3809.0,
  "total_inference_s": 1000.08,
  "wer": 0.032951,
  "cer": 0.006307,
  "rtf": 0.262557,
  "median_wer": 0.0,
  "p90_wer": 0.1224,
  "p95_wer": 0.2,
  "alignment_heads_count": 20,
  "alignment_heads_file": "/Users/quentin/Documents/repos/WhisperLiveKit/scripts/alignment_heads_qwen3_asr_0.6B.json",
  "per_sample": [
    {
      "id": "1089-134686-0000",
      "ref": "HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FATTENED SAUCE",
      "hyp": "He hoped there would be stew for dinner: turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick peppered flour-fatted sauce.",
      "ref_norm": "HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FATTENED SAUCE",
      "hyp_norm": "HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOURFATTED SAUCE",
      "duration_s": 10.435,
      "infer_time_s": 2.853,
      "rtf": 0.2734,
      "wer": 0.0714
    },
    {
      "id": "1089-134686-0001",
      "ref": "STUFF IT INTO YOU HIS BELLY COUNSELLED HIM",
      "hyp": "Stuff it into you, his belly counseled him.",
      "ref_norm": "STUFF IT INTO YOU HIS BELLY COUNSELLED HIM",
      "hyp_norm": "STUFF IT INTO YOU HIS BELLY COUNSELED HIM",
      "duration_s": 3.275,
      "infer_time_s": 0.887,
      "rtf": 0.2709,
      "wer": 0.125
    },
    {
      "id": "1089-134686-0002",
      "ref": "AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS",
      "hyp": "After early night fall, the yellow lamps would light up here and there. The s qualid quarter of the brothels.",
      "ref_norm": "AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS",
      "hyp_norm": "AFTER EARLY NIGHT FALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE S QUALID QUARTER OF THE BROTHELS",
      "duration_s": 6.625,
      "infer_time_s": 1.857,
      "rtf": 0.2803,
      "wer": 0.2222
    },
    {
      "id": "1089-134686-0003",
      "ref": "HELLO BERTIE ANY GOOD IN YOUR MIND",
      "hyp": "Hello, Bertie. Any good in your mind?",
      "ref_norm": "HELLO BERTIE ANY GOOD IN YOUR MIND",
      "hyp_norm": "HELLO BERTIE ANY GOOD IN YOUR MIND",
      "duration_s": 2.68,
      "infer_time_s": 0.831,
      "rtf": 0.3099,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0004",
      "ref": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND",
      "hyp": "Number ten, fresh Nelly is waiting on you. Good night, husband.",
      "ref_norm": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND",
      "hyp_norm": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND",
      "duration_s": 5.215,
      "infer_time_s": 1.23,
      "rtf": 0.2358,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0005",
      "ref": "THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLEY'S FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS",
      "hyp": "The music came nearer, and he recalled the words, the words of Shelley's fragment upon the moon , wandering companionless, pale for weariness.",
      "ref_norm": "THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLEYS FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS",
      "hyp_norm": "THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLEYS FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS",
      "duration_s": 9.635,
      "infer_time_s": 2.28,
      "rtf": 0.2367,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0006",
      "ref": "THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL",
      "hyp": "The dull light fell more faintly upon the page, whereon another equation began to unfold itself slowly, and to spread abroad its widening tail.",
      "ref_norm": "THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL",
      "hyp_norm": "THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL",
      "duration_s": 10.555,
      "infer_time_s": 2.399,
      "rtf": 0.2273,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0007",
      "ref": "A COLD LUCID INDIFFERENCE REIGNED IN HIS SOUL",
      "hyp": "A cold, lucid indifference re igned in his soul.",
      "ref_norm": "A COLD LUCID INDIFFERENCE REIGNED IN HIS SOUL",
      "hyp_norm": "A COLD LUCID INDIFFERENCE RE IGNED IN HIS SOUL",
      "duration_s": 4.275,
      "infer_time_s": 1.016,
      "rtf": 0.2376,
      "wer": 0.25
    },
    {
      "id": "1089-134686-0008",
      "ref": "THE CHAOS IN WHICH HIS ARDOUR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF",
      "hyp": "The chaos in which his ardor extinguished itself was a cold, indifferent knowledge of himself.",
      "ref_norm": "THE CHAOS IN WHICH HIS ARDOUR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF",
      "hyp_norm": "THE CHAOS IN WHICH HIS ARDOR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF",
      "duration_s": 6.73,
      "infer_time_s": 1.533,
      "rtf": 0.2278,
      "wer": 0.0667
    },
    {
      "id": "1089-134686-0009",
      "ref": "AT MOST BY AN ALMS GIVEN TO A BEGGAR WHOSE BLESSING HE FLED FROM HE MIGHT HOPE WEARILY TO WIN FOR HIMSELF SOME MEASURE OF ACTUAL GRACE",
      "hyp": "At most, by an alms given to a beg gar whose blessing he fled from, he might hope wearily to win for himself some measure of actual grace.",
      "ref_norm": "AT MOST BY AN ALMS GIVEN TO A BEGGAR WHOSE BLESSING HE FLED FROM HE MIGHT HOPE WEARILY TO WIN FOR HIMSELF SOME MEASURE OF ACTUAL GRACE",
      "hyp_norm": "AT MOST BY AN ALMS GIVEN TO A BEG GAR WHOSE BLESSING HE FLED FROM HE MIGHT HOPE WEARILY TO WIN FOR HIMSELF SOME MEASURE OF ACTUAL GRACE",
      "duration_s": 10.575,
      "infer_time_s": 2.631,
      "rtf": 0.2488,
      "wer": 0.0741
    },
    {
      "id": "1089-134686-0010",
      "ref": "WELL NOW ENNIS I DECLARE YOU HAVE A HEAD AND SO HAS MY STICK",
      "hyp": "Well now, Ennis, I declare you have a head , and so has my stick.",
      "ref_norm": "WELL NOW ENNIS I DECLARE YOU HAVE A HEAD AND SO HAS MY STICK",
      "hyp_norm": "WELL NOW ENNIS I DECLARE YOU HAVE A HEAD AND SO HAS MY STICK",
      "duration_s": 4.405,
      "infer_time_s": 1.385,
      "rtf": 0.3144,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0011",
      "ref": "ON SATURDAY MORNINGS WHEN THE SODALITY MET IN THE CHAPEL TO RECITE THE LITTLE OFFICE HIS PLACE WAS A CUSHIONED KNEELING DESK AT THE RIGHT OF THE ALTAR FROM WHICH HE LED HIS WING OF BOYS THROUGH THE RESPONSES",
      "hyp": "On Saturday mornings , when the sodality met in the chapel to recite the Little Office, his place was a cushioned kneeling desk at the right of the altar , from which he led his wing of boys through the responses.",
      "ref_norm": "ON SATURDAY MORNINGS WHEN THE SODALITY MET IN THE CHAPEL TO RECITE THE LITTLE OFFICE HIS PLACE WAS A CUSHIONED KNEELING DESK AT THE RIGHT OF THE ALTAR FROM WHICH HE LED HIS WING OF BOYS THROUGH THE RESPONSES",
      "hyp_norm": "ON SATURDAY MORNINGS WHEN THE SODALITY MET IN THE CHAPEL TO RECITE THE LITTLE OFFICE HIS PLACE WAS A CUSHIONED KNEELING DESK AT THE RIGHT OF THE ALTAR FROM WHICH HE LED HIS WING OF BOYS THROUGH THE RESPONSES",
      "duration_s": 12.445,
      "infer_time_s": 3.527,
      "rtf": 0.2834,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0012",
      "ref": "HER EYES SEEMED TO REGARD HIM WITH MILD PITY HER HOLINESS A STRANGE LIGHT GLOWING FAINTLY UPON HER FRAIL FLESH DID NOT HUMILIATE THE SINNER WHO APPROACHED HER",
      "hyp": "Her eyes seemed to regard him with mild pity; her holiness , a strange light glowing faintly upon her frail flesh, did not humiliate the sinner who approached her.",
      "ref_norm": "HER EYES SEEMED TO REGARD HIM WITH MILD PITY HER HOLINESS A STRANGE LIGHT GLOWING FAINTLY UPON HER FRAIL FLESH DID NOT HUMILIATE THE SINNER WHO APPROACHED HER",
      "hyp_norm": "HER EYES SEEMED TO REGARD HIM WITH MILD PITY HER HOLINESS A STRANGE LIGHT GLOWING FAINTLY UPON HER FRAIL FLESH DID NOT HUMILIATE THE SINNER WHO APPROACHED HER",
      "duration_s": 11.64,
      "infer_time_s": 2.801,
      "rtf": 0.2407,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0013",
      "ref": "IF EVER HE WAS IMPELLED TO CAST SIN FROM HIM AND TO REPENT THE IMPULSE THAT MOVED HIM WAS THE WISH TO BE HER KNIGHT",
      "hyp": "If ever he was imp elled to cast sin from him and to repent, the impulse that moved him was the wish to be her knight.",
      "ref_norm": "IF EVER HE WAS IMPELLED TO CAST SIN FROM HIM AND TO REPENT THE IMPULSE THAT MOVED HIM WAS THE WISH TO BE HER KNIGHT",
      "hyp_norm": "IF EVER HE WAS IMP ELLED TO CAST SIN FROM HIM AND TO REPENT THE IMPULSE THAT MOVED HIM WAS THE WISH TO BE HER KNIGHT",
      "duration_s": 7.915,
      "infer_time_s": 2.057,
      "rtf": 0.2599,
      "wer": 0.08
    },
    {
      "id": "1089-134686-0014",
      "ref": "HE TRIED TO THINK HOW IT COULD BE",
      "hyp": "He tried to think how it could be.",
      "ref_norm": "HE TRIED TO THINK HOW IT COULD BE",
      "hyp_norm": "HE TRIED TO THINK HOW IT COULD BE",
      "duration_s": 2.225,
      "infer_time_s": 0.744,
      "rtf": 0.3346,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0015",
      "ref": "BUT THE DUSK DEEPENING IN THE SCHOOLROOM COVERED OVER HIS THOUGHTS THE BELL RANG",
      "hyp": "But the dusk deepening in the schoolroom covered over his thoughts. The bell rang.",
      "ref_norm": "BUT THE DUSK DEEPENING IN THE SCHOOLROOM COVERED OVER HIS THOUGHTS THE BELL RANG",
      "hyp_norm": "BUT THE DUSK DEEPENING IN THE SCHOOLROOM COVERED OVER HIS THOUGHTS THE BELL RANG",
      "duration_s": 5.815,
      "infer_time_s": 1.358,
      "rtf": 0.2336,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0016",
      "ref": "THEN YOU CAN ASK HIM QUESTIONS ON THE CATECHISM DEDALUS",
      "hyp": "Then you can ask him questions on the catechism, Dedalus.",
      "ref_norm": "THEN YOU CAN ASK HIM QUESTIONS ON THE CATECHISM DEDALUS",
      "hyp_norm": "THEN YOU CAN ASK HIM QUESTIONS ON THE CATECHISM DEDALUS",
      "duration_s": 3.54,
      "infer_time_s": 1.057,
      "rtf": 0.2985,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0017",
      "ref": "STEPHEN LEANING BACK AND DRAWING IDLY ON HIS SCRIBBLER LISTENED TO THE TALK ABOUT HIM WHICH HERON CHECKED FROM TIME TO TIME BY SAYING",
      "hyp": "Stephen, leaning back and drawing idly on his scribbler, listened to the talk about him , which Heron checked from time to time by saying.",
      "ref_norm": "STEPHEN LEANING BACK AND DRAWING IDLY ON HIS SCRIBBLER LISTENED TO THE TALK ABOUT HIM WHICH HERON CHECKED FROM TIME TO TIME BY SAYING",
      "hyp_norm": "STEPHEN LEANING BACK AND DRAWING IDLY ON HIS SCRIBBLER LISTENED TO THE TALK ABOUT HIM WHICH HERON CHECKED FROM TIME TO TIME BY SAYING",
      "duration_s": 8.87,
      "infer_time_s": 2.4,
      "rtf": 0.2706,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0018",
      "ref": "IT WAS STRANGE TOO THAT HE FOUND AN ARID PLEASURE IN FOLLOWING UP TO THE END THE RIGID LINES OF THE DOCTRINES OF THE CHURCH AND PENETRATING INTO OBSCURE SILENCES ONLY TO HEAR AND FEEL THE MORE DEEPLY HIS OWN CONDEMNATION",
      "hyp": "It was strange too that he found an arid pleasure in following up to the end the rigid lines of the doctrines of the church and penetrating into obscure silences only to hear and feel the more deeply his own condemnation.",
      "ref_norm": "IT WAS STRANGE TOO THAT HE FOUND AN ARID PLEASURE IN FOLLOWING UP TO THE END THE RIGID LINES OF THE DOCTRINES OF THE CHURCH AND PENETRATING INTO OBSCURE SILENCES ONLY TO HEAR AND FEEL THE MORE DEEPLY HIS OWN CONDEMNATION",
      "hyp_norm": "IT WAS STRANGE TOO THAT HE FOUND AN ARID PLEASURE IN FOLLOWING UP TO THE END THE RIGID LINES OF THE DOCTRINES OF THE CHURCH AND PENETRATING INTO OBSCURE SILENCES ONLY TO HEAR AND FEEL THE MORE DEEPLY HIS OWN CONDEMNATION",
      "duration_s": 15.72,
      "infer_time_s": 3.611,
      "rtf": 0.2297,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0019",
      "ref": "THE SENTENCE OF SAINT JAMES WHICH SAYS THAT HE WHO OFFENDS AGAINST ONE COMMANDMENT BECOMES GUILTY OF ALL HAD SEEMED TO HIM FIRST A SWOLLEN PHRASE UNTIL HE HAD BEGUN TO GROPE IN THE DARKNESS OF HIS OWN STATE",
      "hyp": "The sentence of Saint James, which says that he who offends against one commandment becomes guilty of all, had seemed to him first a swollen phrase until he had begun to grope in the darkness of his own state.",
      "ref_norm": "THE SENTENCE OF SAINT JAMES WHICH SAYS THAT HE WHO OFFENDS AGAINST ONE COMMANDMENT BECOMES GUILTY OF ALL HAD SEEMED TO HIM FIRST A SWOLLEN PHRASE UNTIL HE HAD BEGUN TO GROPE IN THE DARKNESS OF HIS OWN STATE",
      "hyp_norm": "THE SENTENCE OF SAINT JAMES WHICH SAYS THAT HE WHO OFFENDS AGAINST ONE COMMANDMENT BECOMES GUILTY OF ALL HAD SEEMED TO HIM FIRST A SWOLLEN PHRASE UNTIL HE HAD BEGUN TO GROPE IN THE DARKNESS OF HIS OWN STATE",
      "duration_s": 13.895,
      "infer_time_s": 3.445,
      "rtf": 0.248,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0020",
      "ref": "IF A MAN HAD STOLEN A POUND IN HIS YOUTH AND HAD USED THAT POUND TO AMASS A HUGE FORTUNE HOW MUCH WAS HE OBLIGED TO GIVE BACK THE POUND HE HAD STOLEN ONLY OR THE POUND TOGETHER WITH THE COMPOUND INTEREST ACCRUING UPON IT OR ALL HIS HUGE FORTUNE",
      "hyp": "If a man had stolen a pound in his youth and had used that pound to amass a huge fortune , how much was he obliged to give back \u2014the pound he had stolen only, or the pound together with the compound interest accruing upon it, or all his huge fortune?",
      "ref_norm": "IF A MAN HAD STOLEN A POUND IN HIS YOUTH AND HAD USED THAT POUND TO AMASS A HUGE FORTUNE HOW MUCH WAS HE OBLIGED TO GIVE BACK THE POUND HE HAD STOLEN ONLY OR THE POUND TOGETHER WITH THE COMPOUND INTEREST ACCRUING UPON IT OR ALL HIS HUGE FORTUNE",
      "hyp_norm": "IF A MAN HAD STOLEN A POUND IN HIS YOUTH AND HAD USED THAT POUND TO AMASS A HUGE FORTUNE HOW MUCH WAS HE OBLIGED TO GIVE BACK THE POUND HE HAD STOLEN ONLY OR THE POUND TOGETHER WITH THE COMPOUND INTEREST ACCRUING UPON IT OR ALL HIS HUGE FORTUNE",
      "duration_s": 16.79,
      "infer_time_s": 4.378,
      "rtf": 0.2608,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0021",
      "ref": "IF A LAYMAN IN GIVING BAPTISM POUR THE WATER BEFORE SAYING THE WORDS IS THE CHILD BAPTIZED",
      "hyp": "If a layman in giving baptism pour the water before saying the words , is the child baptized?",
      "ref_norm": "IF A LAYMAN IN GIVING BAPTISM POUR THE WATER BEFORE SAYING THE WORDS IS THE CHILD BAPTIZED",
      "hyp_norm": "IF A LAYMAN IN GIVING BAPTISM POUR THE WATER BEFORE SAYING THE WORDS IS THE CHILD BAPTIZED",
      "duration_s": 6.55,
      "infer_time_s": 1.616,
      "rtf": 0.2468,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0022",
      "ref": "HOW COMES IT THAT WHILE THE FIRST BEATITUDE PROMISES THE KINGDOM OF HEAVEN TO THE POOR OF HEART THE SECOND BEATITUDE PROMISES ALSO TO THE MEEK THAT THEY SHALL POSSESS THE LAND",
      "hyp": "How comes it that while the first beatitude promises the kingdom of heaven to the poor of heart, the second beatitude promises also to the meek that they shall possess the land?",
      "ref_norm": "HOW COMES IT THAT WHILE THE FIRST BEATITUDE PROMISES THE KINGDOM OF HEAVEN TO THE POOR OF HEART THE SECOND BEATITUDE PROMISES ALSO TO THE MEEK THAT THEY SHALL POSSESS THE LAND",
      "hyp_norm": "HOW COMES IT THAT WHILE THE FIRST BEATITUDE PROMISES THE KINGDOM OF HEAVEN TO THE POOR OF HEART THE SECOND BEATITUDE PROMISES ALSO TO THE MEEK THAT THEY SHALL POSSESS THE LAND",
      "duration_s": 11.175,
      "infer_time_s": 2.879,
      "rtf": 0.2576,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0023",
      "ref": "WHY WAS THE SACRAMENT OF THE EUCHARIST INSTITUTED UNDER THE TWO SPECIES OF BREAD AND WINE IF JESUS CHRIST BE PRESENT BODY AND BLOOD SOUL AND DIVINITY IN THE BREAD ALONE AND IN THE WINE ALONE",
      "hyp": "Why was the sacrament of the Eucharist instituted under the two species of bread and wine? If Jesus Christ be present body and blood, soul and divinity in the bread alone and in the wine alone.",
      "ref_norm": "WHY WAS THE SACRAMENT OF THE EUCHARIST INSTITUTED UNDER THE TWO SPECIES OF BREAD AND WINE IF JESUS CHRIST BE PRESENT BODY AND BLOOD SOUL AND DIVINITY IN THE BREAD ALONE AND IN THE WINE ALONE",
      "hyp_norm": "WHY WAS THE SACRAMENT OF THE EUCHARIST INSTITUTED UNDER THE TWO SPECIES OF BREAD AND WINE IF JESUS CHRIST BE PRESENT BODY AND BLOOD SOUL AND DIVINITY IN THE BREAD ALONE AND IN THE WINE ALONE",
      "duration_s": 13.275,
      "infer_time_s": 3.354,
      "rtf": 0.2526,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0024",
      "ref": "IF THE WINE CHANGE INTO VINEGAR AND THE HOST CRUMBLE INTO CORRUPTION AFTER THEY HAVE BEEN CONSECRATED IS JESUS CHRIST STILL PRESENT UNDER THEIR SPECIES AS GOD AND AS MAN",
      "hyp": "If the wine change into vinegar, and the host crumble into corruption after they have been consecrated , is Jesus Christ still present under their species as God and as man?",
      "ref_norm": "IF THE WINE CHANGE INTO VINEGAR AND THE HOST CRUMBLE INTO CORRUPTION AFTER THEY HAVE BEEN CONSECRATED IS JESUS CHRIST STILL PRESENT UNDER THEIR SPECIES AS GOD AND AS MAN",
      "hyp_norm": "IF THE WINE CHANGE INTO VINEGAR AND THE HOST CRUMBLE INTO CORRUPTION AFTER THEY HAVE BEEN CONSECRATED IS JESUS CHRIST STILL PRESENT UNDER THEIR SPECIES AS GOD AND AS MAN",
      "duration_s": 11.655,
      "infer_time_s": 2.765,
      "rtf": 0.2372,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0025",
      "ref": "A GENTLE KICK FROM THE TALL BOY IN THE BENCH BEHIND URGED STEPHEN TO ASK A DIFFICULT QUESTION",
      "hyp": "A gentle kick from the tall boy in the bench behind urged Stephen to ask a difficult question.",
      "ref_norm": "A GENTLE KICK FROM THE TALL BOY IN THE BENCH BEHIND URGED STEPHEN TO ASK A DIFFICULT QUESTION",
      "hyp_norm": "A GENTLE KICK FROM THE TALL BOY IN THE BENCH BEHIND URGED STEPHEN TO ASK A DIFFICULT QUESTION",
      "duration_s": 6.61,
      "infer_time_s": 1.562,
      "rtf": 0.2362,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0026",
      "ref": "THE RECTOR DID NOT ASK FOR A CATECHISM TO HEAR THE LESSON FROM",
      "hyp": "The rector did not ask for a catechism to hear the lesson from.",
      "ref_norm": "THE RECTOR DID NOT ASK FOR A CATECHISM TO HEAR THE LESSON FROM",
      "hyp_norm": "THE RECTOR DID NOT ASK FOR A CATECHISM TO HEAR THE LESSON FROM",
      "duration_s": 4.01,
      "infer_time_s": 1.309,
      "rtf": 0.3263,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0027",
      "ref": "HE CLASPED HIS HANDS ON THE DESK AND SAID",
      "hyp": "He clasped his hands on the desk and said.",
      "ref_norm": "HE CLASPED HIS HANDS ON THE DESK AND SAID",
      "hyp_norm": "HE CLASPED HIS HANDS ON THE DESK AND SAID",
      "duration_s": 2.71,
      "infer_time_s": 0.841,
      "rtf": 0.3104,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0028",
      "ref": "THE RETREAT WILL BEGIN ON WEDNESDAY AFTERNOON IN HONOUR OF SAINT FRANCIS XAVIER WHOSE FEAST DAY IS SATURDAY",
      "hyp": "The retreat will begin on Wednesday afternoon in honor of Saint Francis Xavier, whose feast day is Saturday.",
      "ref_norm": "THE RETREAT WILL BEGIN ON WEDNESDAY AFTERNOON IN HONOUR OF SAINT FRANCIS XAVIER WHOSE FEAST DAY IS SATURDAY",
      "hyp_norm": "THE RETREAT WILL BEGIN ON WEDNESDAY AFTERNOON IN HONOR OF SAINT FRANCIS XAVIER WHOSE FEAST DAY IS SATURDAY",
      "duration_s": 7.83,
      "infer_time_s": 1.618,
      "rtf": 0.2066,
      "wer": 0.0556
    },
    {
      "id": "1089-134686-0029",
      "ref": "ON FRIDAY CONFESSION WILL BE HEARD ALL THE AFTERNOON AFTER BEADS",
      "hyp": "On Friday, confession will be heard all the afternoon after beads.",
      "ref_norm": "ON FRIDAY CONFESSION WILL BE HEARD ALL THE AFTERNOON AFTER BEADS",
      "hyp_norm": "ON FRIDAY CONFESSION WILL BE HEARD ALL THE AFTERNOON AFTER BEADS",
      "duration_s": 4.67,
      "infer_time_s": 1.069,
      "rtf": 0.2288,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0030",
      "ref": "BEWARE OF MAKING THAT MISTAKE",
      "hyp": "Beware of making that mistake.",
      "ref_norm": "BEWARE OF MAKING THAT MISTAKE",
      "hyp_norm": "BEWARE OF MAKING THAT MISTAKE",
      "duration_s": 2.715,
      "infer_time_s": 0.623,
      "rtf": 0.2296,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0031",
      "ref": "STEPHEN'S HEART BEGAN SLOWLY TO FOLD AND FADE WITH FEAR LIKE A WITHERING FLOWER",
      "hyp": "Stephen's heart began slowly to fold and fade with fear , like a withering flower.",
      "ref_norm": "STEPHENS HEART BEGAN SLOWLY TO FOLD AND FADE WITH FEAR LIKE A WITHERING FLOWER",
      "hyp_norm": "STEPHENS HEART BEGAN SLOWLY TO FOLD AND FADE WITH FEAR LIKE A WITHERING FLOWER",
      "duration_s": 6.615,
      "infer_time_s": 1.476,
      "rtf": 0.2231,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0032",
      "ref": "HE IS CALLED AS YOU KNOW THE APOSTLE OF THE INDIES",
      "hyp": "He is called, as you know, the Apostle of the Indies.",
      "ref_norm": "HE IS CALLED AS YOU KNOW THE APOSTLE OF THE INDIES",
      "hyp_norm": "HE IS CALLED AS YOU KNOW THE APOSTLE OF THE INDIES",
      "duration_s": 4.09,
      "infer_time_s": 1.125,
      "rtf": 0.2751,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0033",
      "ref": "A GREAT SAINT SAINT FRANCIS XAVIER",
      "hyp": "A great saint, Saint Francis Xavier.",
      "ref_norm": "A GREAT SAINT SAINT FRANCIS XAVIER",
      "hyp_norm": "A GREAT SAINT SAINT FRANCIS XAVIER",
      "duration_s": 3.33,
      "infer_time_s": 0.684,
      "rtf": 0.2054,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0034",
      "ref": "THE RECTOR PAUSED AND THEN SHAKING HIS CLASPED HANDS BEFORE HIM WENT ON",
      "hyp": "The rector paused and then shaking his clasped hands before him, went on.",
      "ref_norm": "THE RECTOR PAUSED AND THEN SHAKING HIS CLASPED HANDS BEFORE HIM WENT ON",
      "hyp_norm": "THE RECTOR PAUSED AND THEN SHAKING HIS CLASPED HANDS BEFORE HIM WENT ON",
      "duration_s": 5.81,
      "infer_time_s": 1.277,
      "rtf": 0.2197,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0035",
      "ref": "HE HAD THE FAITH IN HIM THAT MOVES MOUNTAINS",
      "hyp": "He had the faith in him that moves mountains.",
      "ref_norm": "HE HAD THE FAITH IN HIM THAT MOVES MOUNTAINS",
      "hyp_norm": "HE HAD THE FAITH IN HIM THAT MOVES MOUNTAINS",
      "duration_s": 3.445,
      "infer_time_s": 0.79,
      "rtf": 0.2292,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0036",
      "ref": "A GREAT SAINT SAINT FRANCIS XAVIER",
      "hyp": "A great saint, Saint Francis Xavier.",
      "ref_norm": "A GREAT SAINT SAINT FRANCIS XAVIER",
      "hyp_norm": "A GREAT SAINT SAINT FRANCIS XAVIER",
      "duration_s": 3.25,
      "infer_time_s": 0.682,
      "rtf": 0.2097,
      "wer": 0.0
    },
    {
      "id": "1089-134686-0037",
      "ref": "IN THE SILENCE THEIR DARK FIRE KINDLED THE DUSK INTO A TAWNY GLOW",
      "hyp": "In the silence, their dark fire kindled the dusk into a tawny glow.",
      "ref_norm": "IN THE SILENCE THEIR DARK FIRE KINDLED THE DUSK INTO A TAWNY GLOW",
      "hyp_norm": "IN THE SILENCE THEIR DARK FIRE KINDLED THE DUSK INTO A TAWNY GLOW",
      "duration_s": 5.21,
      "infer_time_s": 1.378,
      "rtf": 0.2646,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0000",
      "ref": "HE COULD WAIT NO LONGER",
      "hyp": "He could wait no longer.",
      "ref_norm": "HE COULD WAIT NO LONGER",
      "hyp_norm": "HE COULD WAIT NO LONGER",
      "duration_s": 2.085,
      "infer_time_s": 0.578,
      "rtf": 0.2773,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0001",
      "ref": "FOR A FULL HOUR HE HAD PACED UP AND DOWN WAITING BUT HE COULD WAIT NO LONGER",
      "hyp": "For a full hour, he had paced up and down, waiting , but he could wait no longer.",
      "ref_norm": "FOR A FULL HOUR HE HAD PACED UP AND DOWN WAITING BUT HE COULD WAIT NO LONGER",
      "hyp_norm": "FOR A FULL HOUR HE HAD PACED UP AND DOWN WAITING BUT HE COULD WAIT NO LONGER",
      "duration_s": 5.415,
      "infer_time_s": 1.498,
      "rtf": 0.2766,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0002",
      "ref": "HE SET OFF ABRUPTLY FOR THE BULL WALKING RAPIDLY LEST HIS FATHER'S SHRILL WHISTLE MIGHT CALL HIM BACK AND IN A FEW MOMENTS HE HAD ROUNDED THE CURVE AT THE POLICE BARRACK AND WAS SAFE",
      "hyp": "He set off abruptly for the bull, walking rapidly lest his father 's shrill whistle might call him back, and in a few moments he had rounded the curve at the police barrack and was safe.",
      "ref_norm": "HE SET OFF ABRUPTLY FOR THE BULL WALKING RAPIDLY LEST HIS FATHERS SHRILL WHISTLE MIGHT CALL HIM BACK AND IN A FEW MOMENTS HE HAD ROUNDED THE CURVE AT THE POLICE BARRACK AND WAS SAFE",
      "hyp_norm": "HE SET OFF ABRUPTLY FOR THE BULL WALKING RAPIDLY LEST HIS FATHER S SHRILL WHISTLE MIGHT CALL HIM BACK AND IN A FEW MOMENTS HE HAD ROUNDED THE CURVE AT THE POLICE BARRACK AND WAS SAFE",
      "duration_s": 11.6,
      "infer_time_s": 3.036,
      "rtf": 0.2617,
      "wer": 0.0571
    },
    {
      "id": "1089-134691-0003",
      "ref": "THE UNIVERSITY",
      "hyp": "The university .",
      "ref_norm": "THE UNIVERSITY",
      "hyp_norm": "THE UNIVERSITY",
      "duration_s": 2.175,
      "infer_time_s": 0.421,
      "rtf": 0.1936,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0004",
      "ref": "PRIDE AFTER SATISFACTION UPLIFTED HIM LIKE LONG SLOW WAVES",
      "hyp": "Bride, after satisfaction, uplifted him like long, slow waves.",
      "ref_norm": "PRIDE AFTER SATISFACTION UPLIFTED HIM LIKE LONG SLOW WAVES",
      "hyp_norm": "BRIDE AFTER SATISFACTION UPLIFTED HIM LIKE LONG SLOW WAVES",
      "duration_s": 5.175,
      "infer_time_s": 1.198,
      "rtf": 0.2315,
      "wer": 0.1111
    },
    {
      "id": "1089-134691-0005",
      "ref": "WHOSE FEET ARE AS THE FEET OF HARTS AND UNDERNEATH THE EVERLASTING ARMS",
      "hyp": "Whose feet are as the feet of hearts, and underneath the everlasting arms.",
      "ref_norm": "WHOSE FEET ARE AS THE FEET OF HARTS AND UNDERNEATH THE EVERLASTING ARMS",
      "hyp_norm": "WHOSE FEET ARE AS THE FEET OF HEARTS AND UNDERNEATH THE EVERLASTING ARMS",
      "duration_s": 5.36,
      "infer_time_s": 1.269,
      "rtf": 0.2368,
      "wer": 0.0769
    },
    {
      "id": "1089-134691-0006",
      "ref": "THE PRIDE OF THAT DIM IMAGE BROUGHT BACK TO HIS MIND THE DIGNITY OF THE OFFICE HE HAD REFUSED",
      "hyp": "The pride of that dim image brought back to his mind the dignity of the office he had refused.",
      "ref_norm": "THE PRIDE OF THAT DIM IMAGE BROUGHT BACK TO HIS MIND THE DIGNITY OF THE OFFICE HE HAD REFUSED",
      "hyp_norm": "THE PRIDE OF THAT DIM IMAGE BROUGHT BACK TO HIS MIND THE DIGNITY OF THE OFFICE HE HAD REFUSED",
      "duration_s": 5.895,
      "infer_time_s": 1.447,
      "rtf": 0.2455,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0007",
      "ref": "SOON THE WHOLE BRIDGE WAS TREMBLING AND RESOUNDING",
      "hyp": "Soon, the whole bridge was trembling and resounding.",
      "ref_norm": "SOON THE WHOLE BRIDGE WAS TREMBLING AND RESOUNDING",
      "hyp_norm": "SOON THE WHOLE BRIDGE WAS TREMBLING AND RESOUNDING",
      "duration_s": 3.44,
      "infer_time_s": 0.859,
      "rtf": 0.2497,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0008",
      "ref": "THE UNCOUTH FACES PASSED HIM TWO BY TWO STAINED YELLOW OR RED OR LIVID BY THE SEA AND AS HE STROVE TO LOOK AT THEM WITH EASE AND INDIFFERENCE A FAINT STAIN OF PERSONAL SHAME AND COMMISERATION ROSE TO HIS OWN FACE",
      "hyp": "The uncouth faces passed him two by two , stained yellow or red or livid by the sea , and as he strove to look at them with ease and indifference, a faint stain of personal shame and commiseration rose to his own face.",
      "ref_norm": "THE UNCOUTH FACES PASSED HIM TWO BY TWO STAINED YELLOW OR RED OR LIVID BY THE SEA AND AS HE STROVE TO LOOK AT THEM WITH EASE AND INDIFFERENCE A FAINT STAIN OF PERSONAL SHAME AND COMMISERATION ROSE TO HIS OWN FACE",
      "hyp_norm": "THE UNCOUTH FACES PASSED HIM TWO BY TWO STAINED YELLOW OR RED OR LIVID BY THE SEA AND AS HE STROVE TO LOOK AT THEM WITH EASE AND INDIFFERENCE A FAINT STAIN OF PERSONAL SHAME AND COMMISERATION ROSE TO HIS OWN FACE",
      "duration_s": 14.985,
      "infer_time_s": 3.942,
      "rtf": 0.263,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0009",
      "ref": "ANGRY WITH HIMSELF HE TRIED TO HIDE HIS FACE FROM THEIR EYES BY GAZING DOWN SIDEWAYS INTO THE SHALLOW SWIRLING WATER UNDER THE BRIDGE BUT HE STILL SAW A REFLECTION THEREIN OF THEIR TOP HEAVY SILK HATS AND HUMBLE TAPE LIKE COLLARS AND LOOSELY HANGING CLERICAL CLOTHES BROTHER HICKEY",
      "hyp": "Angry with himself , he tried to hide his face from their eyes by g azing down sideways into the shallow, swirling water under the bridge, but he still saw a reflection therein of their top-heavy silk hats, and humble tape-like collars and loosely hanging clerical clothes. Brother Hickey.",
      "ref_norm": "ANGRY WITH HIMSELF HE TRIED TO HIDE HIS FACE FROM THEIR EYES BY GAZING DOWN SIDEWAYS INTO THE SHALLOW SWIRLING WATER UNDER THE BRIDGE BUT HE STILL SAW A REFLECTION THEREIN OF THEIR TOP HEAVY SILK HATS AND HUMBLE TAPE LIKE COLLARS AND LOOSELY HANGING CLERICAL CLOTHES BROTHER HICKEY",
      "hyp_norm": "ANGRY WITH HIMSELF HE TRIED TO HIDE HIS FACE FROM THEIR EYES BY G AZING DOWN SIDEWAYS INTO THE SHALLOW SWIRLING WATER UNDER THE BRIDGE BUT HE STILL SAW A REFLECTION THEREIN OF THEIR TOPHEAVY SILK HATS AND HUMBLE TAPELIKE COLLARS AND LOOSELY HANGING CLERICAL CLOTHES BROTHER HICKEY",
      "duration_s": 20.055,
      "infer_time_s": 4.952,
      "rtf": 0.2469,
      "wer": 0.1224
    },
    {
      "id": "1089-134691-0010",
      "ref": "BROTHER MAC ARDLE BROTHER KEOGH",
      "hyp": "Brother Macardal. Brother Kiyof.",
      "ref_norm": "BROTHER MAC ARDLE BROTHER KEOGH",
      "hyp_norm": "BROTHER MACARDAL BROTHER KIYOF",
      "duration_s": 3.195,
      "infer_time_s": 0.803,
      "rtf": 0.2513,
      "wer": 0.6
    },
    {
      "id": "1089-134691-0011",
      "ref": "THEIR PIETY WOULD BE LIKE THEIR NAMES LIKE THEIR FACES LIKE THEIR CLOTHES AND IT WAS IDLE FOR HIM TO TELL HIMSELF THAT THEIR HUMBLE AND CONTRITE HEARTS IT MIGHT BE PAID A FAR RICHER TRIBUTE OF DEVOTION THAN HIS HAD EVER BEEN A GIFT TENFOLD MORE ACCEPTABLE THAN HIS ELABORATE ADORATION",
      "hyp": "Their piety would be like their names , like their faces, like their clothes, and it was idle for him to tell himself that their humble and contrite hearts it might be paid a far richer tribute of devotion than his had ever been, a gift tenfold more acceptable than his elaborate adoration.",
      "ref_norm": "THEIR PIETY WOULD BE LIKE THEIR NAMES LIKE THEIR FACES LIKE THEIR CLOTHES AND IT WAS IDLE FOR HIM TO TELL HIMSELF THAT THEIR HUMBLE AND CONTRITE HEARTS IT MIGHT BE PAID A FAR RICHER TRIBUTE OF DEVOTION THAN HIS HAD EVER BEEN A GIFT TENFOLD MORE ACCEPTABLE THAN HIS ELABORATE ADORATION",
      "hyp_norm": "THEIR PIETY WOULD BE LIKE THEIR NAMES LIKE THEIR FACES LIKE THEIR CLOTHES AND IT WAS IDLE FOR HIM TO TELL HIMSELF THAT THEIR HUMBLE AND CONTRITE HEARTS IT MIGHT BE PAID A FAR RICHER TRIBUTE OF DEVOTION THAN HIS HAD EVER BEEN A GIFT TENFOLD MORE ACCEPTABLE THAN HIS ELABORATE ADORATION",
      "duration_s": 20.01,
      "infer_time_s": 5.012,
      "rtf": 0.2505,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0012",
      "ref": "IT WAS IDLE FOR HIM TO MOVE HIMSELF TO BE GENEROUS TOWARDS THEM TO TELL HIMSELF THAT IF HE EVER CAME TO THEIR GATES STRIPPED OF HIS PRIDE BEATEN AND IN BEGGAR'S WEEDS THAT THEY WOULD BE GENEROUS TOWARDS HIM LOVING HIM AS THEMSELVES",
      "hyp": "It was idle for him to move himself to be generous towards them. To tell himself that if he ever came to their gates, stripped of his pride, beaten and in beggar's weeds , that they would be generous towards him, loving him as themselves.",
      "ref_norm": "IT WAS IDLE FOR HIM TO MOVE HIMSELF TO BE GENEROUS TOWARDS THEM TO TELL HIMSELF THAT IF HE EVER CAME TO THEIR GATES STRIPPED OF HIS PRIDE BEATEN AND IN BEGGARS WEEDS THAT THEY WOULD BE GENEROUS TOWARDS HIM LOVING HIM AS THEMSELVES",
      "hyp_norm": "IT WAS IDLE FOR HIM TO MOVE HIMSELF TO BE GENEROUS TOWARDS THEM TO TELL HIMSELF THAT IF HE EVER CAME TO THEIR GATES STRIPPED OF HIS PRIDE BEATEN AND IN BEGGARS WEEDS THAT THEY WOULD BE GENEROUS TOWARDS HIM LOVING HIM AS THEMSELVES",
      "duration_s": 15.03,
      "infer_time_s": 3.972,
      "rtf": 0.2643,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0013",
      "ref": "IDLE AND EMBITTERING FINALLY TO ARGUE AGAINST HIS OWN DISPASSIONATE CERTITUDE THAT THE COMMANDMENT OF LOVE BADE US NOT TO LOVE OUR NEIGHBOUR AS OURSELVES WITH THE SAME AMOUNT AND INTENSITY OF LOVE BUT TO LOVE HIM AS OURSELVES WITH THE SAME KIND OF LOVE",
      "hyp": "Idle and embitter ing, finally to argue against his own dispass ionate certitude, that the commandment of love bade us not to love our neighbor as ourselves with the same amount and intensity of love , but to love him as ourselves with the same kind of love.",
      "ref_norm": "IDLE AND EMBITTERING FINALLY TO ARGUE AGAINST HIS OWN DISPASSIONATE CERTITUDE THAT THE COMMANDMENT OF LOVE BADE US NOT TO LOVE OUR NEIGHBOUR AS OURSELVES WITH THE SAME AMOUNT AND INTENSITY OF LOVE BUT TO LOVE HIM AS OURSELVES WITH THE SAME KIND OF LOVE",
      "hyp_norm": "IDLE AND EMBITTER ING FINALLY TO ARGUE AGAINST HIS OWN DISPASS IONATE CERTITUDE THAT THE COMMANDMENT OF LOVE BADE US NOT TO LOVE OUR NEIGHBOR AS OURSELVES WITH THE SAME AMOUNT AND INTENSITY OF LOVE BUT TO LOVE HIM AS OURSELVES WITH THE SAME KIND OF LOVE",
      "duration_s": 16.33,
      "infer_time_s": 4.358,
      "rtf": 0.2669,
      "wer": 0.1111
    },
    {
      "id": "1089-134691-0014",
      "ref": "THE PHRASE AND THE DAY AND THE SCENE HARMONIZED IN A CHORD",
      "hyp": "The phrase and the day and the scene harmonized in accord.",
      "ref_norm": "THE PHRASE AND THE DAY AND THE SCENE HARMONIZED IN A CHORD",
      "hyp_norm": "THE PHRASE AND THE DAY AND THE SCENE HARMONIZED IN ACCORD",
      "duration_s": 4.755,
      "infer_time_s": 1.071,
      "rtf": 0.2253,
      "wer": 0.1667
    },
    {
      "id": "1089-134691-0015",
      "ref": "WORDS WAS IT THEIR COLOURS",
      "hyp": "Words. Was it their colors?",
      "ref_norm": "WORDS WAS IT THEIR COLOURS",
      "hyp_norm": "WORDS WAS IT THEIR COLORS",
      "duration_s": 3.395,
      "infer_time_s": 0.58,
      "rtf": 0.1708,
      "wer": 0.2
    },
    {
      "id": "1089-134691-0016",
      "ref": "THEY WERE VOYAGING ACROSS THE DESERTS OF THE SKY A HOST OF NOMADS ON THE MARCH VOYAGING HIGH OVER IRELAND WESTWARD BOUND",
      "hyp": "They were voyaging across the deserts of the sky , a host of nomads on the march, voy aging high over Ireland westward bound.",
      "ref_norm": "THEY WERE VOYAGING ACROSS THE DESERTS OF THE SKY A HOST OF NOMADS ON THE MARCH VOYAGING HIGH OVER IRELAND WESTWARD BOUND",
      "hyp_norm": "THEY WERE VOYAGING ACROSS THE DESERTS OF THE SKY A HOST OF NOMADS ON THE MARCH VOY AGING HIGH OVER IRELAND WESTWARD BOUND",
      "duration_s": 9.06,
      "infer_time_s": 2.268,
      "rtf": 0.2503,
      "wer": 0.0909
    },
    {
      "id": "1089-134691-0017",
      "ref": "THE EUROPE THEY HAD COME FROM LAY OUT THERE BEYOND THE IRISH SEA EUROPE OF STRANGE TONGUES AND VALLEYED AND WOODBEGIRT AND CITADELLED AND OF ENTRENCHED AND MARSHALLED RACES",
      "hyp": "The Europe they had come from lay out there beyond the Irish Sea , Europe of strange tongues and valleyed and wood begirt and citadelled and of entrenched and marshalled races.",
      "ref_norm": "THE EUROPE THEY HAD COME FROM LAY OUT THERE BEYOND THE IRISH SEA EUROPE OF STRANGE TONGUES AND VALLEYED AND WOODBEGIRT AND CITADELLED AND OF ENTRENCHED AND MARSHALLED RACES",
      "hyp_norm": "THE EUROPE THEY HAD COME FROM LAY OUT THERE BEYOND THE IRISH SEA EUROPE OF STRANGE TONGUES AND VALLEYED AND WOOD BEGIRT AND CITADELLED AND OF ENTRENCHED AND MARSHALLED RACES",
      "duration_s": 11.695,
      "infer_time_s": 2.83,
      "rtf": 0.242,
      "wer": 0.069
    },
    {
      "id": "1089-134691-0018",
      "ref": "AGAIN AGAIN",
      "hyp": "Again. Again.",
      "ref_norm": "AGAIN AGAIN",
      "hyp_norm": "AGAIN AGAIN",
      "duration_s": 3.09,
      "infer_time_s": 0.422,
      "rtf": 0.1365,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0019",
      "ref": "A VOICE FROM BEYOND THE WORLD WAS CALLING",
      "hyp": "A voice from beyond the world was calling.",
      "ref_norm": "A VOICE FROM BEYOND THE WORLD WAS CALLING",
      "hyp_norm": "A VOICE FROM BEYOND THE WORLD WAS CALLING",
      "duration_s": 3.155,
      "infer_time_s": 0.738,
      "rtf": 0.2338,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0020",
      "ref": "HELLO STEPHANOS HERE COMES THE DEDALUS",
      "hyp": "Hello, Stephan os, here comes the Dedalus.",
      "ref_norm": "HELLO STEPHANOS HERE COMES THE DEDALUS",
      "hyp_norm": "HELLO STEPHAN OS HERE COMES THE DEDALUS",
      "duration_s": 3.99,
      "infer_time_s": 0.835,
      "rtf": 0.2093,
      "wer": 0.3333
    },
    {
      "id": "1089-134691-0021",
      "ref": "THEIR DIVING STONE POISED ON ITS RUDE SUPPORTS AND ROCKING UNDER THEIR PLUNGES AND THE ROUGH HEWN STONES OF THE SLOPING BREAKWATER OVER WHICH THEY SCRAMBLED IN THEIR HORSEPLAY GLEAMED WITH COLD WET LUSTRE",
      "hyp": "Their diving stone poised on its rude supports and rocking under their plunges, and the rough-hewn stones of the sloping breakwater over which they scrambled in their horseplay, gleamed with cold, wet lustre.",
      "ref_norm": "THEIR DIVING STONE POISED ON ITS RUDE SUPPORTS AND ROCKING UNDER THEIR PLUNGES AND THE ROUGH HEWN STONES OF THE SLOPING BREAKWATER OVER WHICH THEY SCRAMBLED IN THEIR HORSEPLAY GLEAMED WITH COLD WET LUSTRE",
      "hyp_norm": "THEIR DIVING STONE POISED ON ITS RUDE SUPPORTS AND ROCKING UNDER THEIR PLUNGES AND THE ROUGHHEWN STONES OF THE SLOPING BREAKWATER OVER WHICH THEY SCRAMBLED IN THEIR HORSEPLAY GLEAMED WITH COLD WET LUSTRE",
      "duration_s": 13.37,
      "infer_time_s": 3.432,
      "rtf": 0.2567,
      "wer": 0.0588
    },
    {
      "id": "1089-134691-0022",
      "ref": "HE STOOD STILL IN DEFERENCE TO THEIR CALLS AND PARRIED THEIR BANTER WITH EASY WORDS",
      "hyp": "He stood still in deference to their calls and parried their banter with easy words.",
      "ref_norm": "HE STOOD STILL IN DEFERENCE TO THEIR CALLS AND PARRIED THEIR BANTER WITH EASY WORDS",
      "hyp_norm": "HE STOOD STILL IN DEFERENCE TO THEIR CALLS AND PARRIED THEIR BANTER WITH EASY WORDS",
      "duration_s": 5.635,
      "infer_time_s": 1.412,
      "rtf": 0.2506,
      "wer": 0.0
    },
    {
      "id": "1089-134691-0023",
      "ref": "IT WAS A PAIN TO SEE THEM AND A SWORD LIKE PAIN TO SEE THE SIGNS OF ADOLESCENCE THAT MADE REPELLENT THEIR PITIABLE NAKEDNESS",
      "hyp": "It was a pain to see them, and a sword-like pain to see the signs of adolescence that made repellent their pitiable nakedness.",
      "ref_norm": "IT WAS A PAIN TO SEE THEM AND A SWORD LIKE PAIN TO SEE THE SIGNS OF ADOLESCENCE THAT MADE REPELLENT THEIR PITIABLE NAKEDNESS",
      "hyp_norm": "IT WAS A PAIN TO SEE THEM AND A SWORDLIKE PAIN TO SEE THE SIGNS OF ADOLESCENCE THAT MADE REPELLENT THEIR PITIABLE NAKEDNESS",
      "duration_s": 7.735,
      "infer_time_s": 2.098,
      "rtf": 0.2712,
      "wer": 0.0833
    },
    {
      "id": "1089-134691-0024",
      "ref": "STEPHANOS DEDALOS",
      "hyp": "Stephano Ster lows.",
      "ref_norm": "STEPHANOS DEDALOS",
      "hyp_norm": "STEPHANO STER LOWS",
      "duration_s": 2.215,
      "infer_time_s": 0.624,
      "rtf": 0.2819,
      "wer": 1.5
    },
    {
      "id": "1089-134691-0025",
      "ref": "A MOMENT BEFORE THE GHOST OF THE ANCIENT KINGDOM OF THE DANES HAD LOOKED FORTH THROUGH THE VESTURE OF THE HAZEWRAPPED CITY",
      "hyp": "A moment before the ghost of the ancient kingdom of the Danes had looked forth through the vesture of the haze-rapped city.",
      "ref_norm": "A MOMENT BEFORE THE GHOST OF THE ANCIENT KINGDOM OF THE DANES HAD LOOKED FORTH THROUGH THE VESTURE OF THE HAZEWRAPPED CITY",
      "hyp_norm": "A MOMENT BEFORE THE GHOST OF THE ANCIENT KINGDOM OF THE DANES HAD LOOKED FORTH THROUGH THE VESTURE OF THE HAZERAPPED CITY",
      "duration_s": 8.005,
      "infer_time_s": 2.109,
      "rtf": 0.2635,
      "wer": 0.0455
    },
    {
      "id": "1188-133604-0000",
      "ref": "YOU WILL FIND ME CONTINUALLY SPEAKING OF FOUR MEN TITIAN HOLBEIN TURNER AND TINTORET IN ALMOST THE SAME TERMS",
      "hyp": "You will find me continually speaking of four men : Tichen , Holbein, Turner, and Tintoret , in almost the same terms.",
      "ref_norm": "YOU WILL FIND ME CONTINUALLY SPEAKING OF FOUR MEN TITIAN HOLBEIN TURNER AND TINTORET IN ALMOST THE SAME TERMS",
      "hyp_norm": "YOU WILL FIND ME CONTINUALLY SPEAKING OF FOUR MEN TICHEN HOLBEIN TURNER AND TINTORET IN ALMOST THE SAME TERMS",
      "duration_s": 10.725,
      "infer_time_s": 2.46,
      "rtf": 0.2294,
      "wer": 0.0526
    },
    {
      "id": "1188-133604-0001",
      "ref": "THEY UNITE EVERY QUALITY AND SOMETIMES YOU WILL FIND ME REFERRING TO THEM AS COLORISTS SOMETIMES AS CHIAROSCURISTS",
      "hyp": "They unite every quality. And sometimes you will find me referring to them as colorists , sometimes as chiaroscurs.",
      "ref_norm": "THEY UNITE EVERY QUALITY AND SOMETIMES YOU WILL FIND ME REFERRING TO THEM AS COLORISTS SOMETIMES AS CHIAROSCURISTS",
      "hyp_norm": "THEY UNITE EVERY QUALITY AND SOMETIMES YOU WILL FIND ME REFERRING TO THEM AS COLORISTS SOMETIMES AS CHIAROSCURS",
      "duration_s": 9.04,
      "infer_time_s": 2.01,
      "rtf": 0.2223,
      "wer": 0.0556
    },
    {
      "id": "1188-133604-0002",
      "ref": "BY BEING STUDIOUS OF COLOR THEY ARE STUDIOUS OF DIVISION AND WHILE THE CHIAROSCURIST DEVOTES HIMSELF TO THE REPRESENTATION OF DEGREES OF FORCE IN ONE THING UNSEPARATED LIGHT THE COLORISTS HAVE FOR THEIR FUNCTION THE ATTAINMENT OF BEAUTY BY ARRANGEMENT OF THE DIVISIONS OF LIGHT",
      "hyp": "By being studious of color, they are studious of division, and while the cure obscurest devotes himself to the representation of degrees of force in one thing , unseparated light, the colorists have for their function, the attainment of beauty by arrangement of the divisions of light.",
      "ref_norm": "BY BEING STUDIOUS OF COLOR THEY ARE STUDIOUS OF DIVISION AND WHILE THE CHIAROSCURIST DEVOTES HIMSELF TO THE REPRESENTATION OF DEGREES OF FORCE IN ONE THING UNSEPARATED LIGHT THE COLORISTS HAVE FOR THEIR FUNCTION THE ATTAINMENT OF BEAUTY BY ARRANGEMENT OF THE DIVISIONS OF LIGHT",
      "hyp_norm": "BY BEING STUDIOUS OF COLOR THEY ARE STUDIOUS OF DIVISION AND WHILE THE CURE OBSCUREST DEVOTES HIMSELF TO THE REPRESENTATION OF DEGREES OF FORCE IN ONE THING UNSEPARATED LIGHT THE COLORISTS HAVE FOR THEIR FUNCTION THE ATTAINMENT OF BEAUTY BY ARRANGEMENT OF THE DIVISIONS OF LIGHT",
      "duration_s": 17.96,
      "infer_time_s": 4.572,
      "rtf": 0.2546,
      "wer": 0.0444
    },
    {
      "id": "1188-133604-0003",
      "ref": "MY FIRST AND PRINCIPAL REASON WAS THAT THEY ENFORCED BEYOND ALL RESISTANCE ON ANY STUDENT WHO MIGHT ATTEMPT TO COPY THEM THIS METHOD OF LAYING PORTIONS OF DISTINCT HUE SIDE BY SIDE",
      "hyp": "My first and principal reason was that they enforced , beyond all resistance, on any student who might attempt to copy them this method of laying portions of distinct hue side by side.",
      "ref_norm": "MY FIRST AND PRINCIPAL REASON WAS THAT THEY ENFORCED BEYOND ALL RESISTANCE ON ANY STUDENT WHO MIGHT ATTEMPT TO COPY THEM THIS METHOD OF LAYING PORTIONS OF DISTINCT HUE SIDE BY SIDE",
      "hyp_norm": "MY FIRST AND PRINCIPAL REASON WAS THAT THEY ENFORCED BEYOND ALL RESISTANCE ON ANY STUDENT WHO MIGHT ATTEMPT TO COPY THEM THIS METHOD OF LAYING PORTIONS OF DISTINCT HUE SIDE BY SIDE",
      "duration_s": 12.61,
      "infer_time_s": 2.934,
      "rtf": 0.2327,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0004",
      "ref": "SOME OF THE TOUCHES INDEED WHEN THE TINT HAS BEEN MIXED WITH MUCH WATER HAVE BEEN LAID IN LITTLE DROPS OR PONDS SO THAT THE PIGMENT MIGHT CRYSTALLIZE HARD AT THE EDGE",
      "hyp": "Some of the touches indeed, when the tint has been mixed with much water , have been laid in little drops or ponds, so that the pigment might crystallize hard at the edge.",
      "ref_norm": "SOME OF THE TOUCHES INDEED WHEN THE TINT HAS BEEN MIXED WITH MUCH WATER HAVE BEEN LAID IN LITTLE DROPS OR PONDS SO THAT THE PIGMENT MIGHT CRYSTALLIZE HARD AT THE EDGE",
      "hyp_norm": "SOME OF THE TOUCHES INDEED WHEN THE TINT HAS BEEN MIXED WITH MUCH WATER HAVE BEEN LAID IN LITTLE DROPS OR PONDS SO THAT THE PIGMENT MIGHT CRYSTALLIZE HARD AT THE EDGE",
      "duration_s": 10.65,
      "infer_time_s": 2.847,
      "rtf": 0.2673,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0005",
      "ref": "IT IS THE HEAD OF A PARROT WITH A LITTLE FLOWER IN HIS BEAK FROM A PICTURE OF CARPACCIO'S ONE OF HIS SERIES OF THE LIFE OF SAINT GEORGE",
      "hyp": "It is the head of a par rot with a little flower in his beak, from a picture of Carpat ius, one of his series of the life of Saint George.",
      "ref_norm": "IT IS THE HEAD OF A PARROT WITH A LITTLE FLOWER IN HIS BEAK FROM A PICTURE OF CARPACCIOS ONE OF HIS SERIES OF THE LIFE OF SAINT GEORGE",
      "hyp_norm": "IT IS THE HEAD OF A PAR ROT WITH A LITTLE FLOWER IN HIS BEAK FROM A PICTURE OF CARPAT IUS ONE OF HIS SERIES OF THE LIFE OF SAINT GEORGE",
      "duration_s": 8.56,
      "infer_time_s": 2.625,
      "rtf": 0.3066,
      "wer": 0.1379
    },
    {
      "id": "1188-133604-0006",
      "ref": "THEN HE COMES TO THE BEAK OF IT",
      "hyp": "Then he comes to the beak of it.",
      "ref_norm": "THEN HE COMES TO THE BEAK OF IT",
      "hyp_norm": "THEN HE COMES TO THE BEAK OF IT",
      "duration_s": 2.4,
      "infer_time_s": 0.816,
      "rtf": 0.3402,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0007",
      "ref": "THE BROWN GROUND BENEATH IS LEFT FOR THE MOST PART ONE TOUCH OF BLACK IS PUT FOR THE HOLLOW TWO DELICATE LINES OF DARK GRAY DEFINE THE OUTER CURVE AND ONE LITTLE QUIVERING TOUCH OF WHITE DRAWS THE INNER EDGE OF THE MANDIBLE",
      "hyp": "The brown ground beneath is left for the most part; one touch of black is put for the hollow . Two delicate lines of dark gray define the outer curve , and one little qu ivering touch of white draws the inner edge of the mandible.",
      "ref_norm": "THE BROWN GROUND BENEATH IS LEFT FOR THE MOST PART ONE TOUCH OF BLACK IS PUT FOR THE HOLLOW TWO DELICATE LINES OF DARK GRAY DEFINE THE OUTER CURVE AND ONE LITTLE QUIVERING TOUCH OF WHITE DRAWS THE INNER EDGE OF THE MANDIBLE",
      "hyp_norm": "THE BROWN GROUND BENEATH IS LEFT FOR THE MOST PART ONE TOUCH OF BLACK IS PUT FOR THE HOLLOW TWO DELICATE LINES OF DARK GRAY DEFINE THE OUTER CURVE AND ONE LITTLE QU IVERING TOUCH OF WHITE DRAWS THE INNER EDGE OF THE MANDIBLE",
      "duration_s": 14.24,
      "infer_time_s": 3.861,
      "rtf": 0.2712,
      "wer": 0.0465
    },
    {
      "id": "1188-133604-0008",
      "ref": "FOR BELIEVE ME THE FINAL PHILOSOPHY OF ART CAN ONLY RATIFY THEIR OPINION THAT THE BEAUTY OF A COCK ROBIN IS TO BE RED AND OF A GRASS PLOT TO BE GREEN AND THE BEST SKILL OF ART IS IN INSTANTLY SEIZING ON THE MANIFOLD DELICIOUSNESS OF LIGHT WHICH YOU CAN ONLY SEIZE BY PRECISION OF INSTANTANEOUS TOUCH",
      "hyp": "For believe me , the final philosophy of art can only ratify their opinion that the beauty of a cock robin is to be red , and of a grass plot to be green , and the best skill of art is in instantly seizing on the manifold deliciousness of light, which you can only seize by precision, of instantaneous touch.",
      "ref_norm": "FOR BELIEVE ME THE FINAL PHILOSOPHY OF ART CAN ONLY RATIFY THEIR OPINION THAT THE BEAUTY OF A COCK ROBIN IS TO BE RED AND OF A GRASS PLOT TO BE GREEN AND THE BEST SKILL OF ART IS IN INSTANTLY SEIZING ON THE MANIFOLD DELICIOUSNESS OF LIGHT WHICH YOU CAN ONLY SEIZE BY PRECISION OF INSTANTANEOUS TOUCH",
      "hyp_norm": "FOR BELIEVE ME THE FINAL PHILOSOPHY OF ART CAN ONLY RATIFY THEIR OPINION THAT THE BEAUTY OF A COCK ROBIN IS TO BE RED AND OF A GRASS PLOT TO BE GREEN AND THE BEST SKILL OF ART IS IN INSTANTLY SEIZING ON THE MANIFOLD DELICIOUSNESS OF LIGHT WHICH YOU CAN ONLY SEIZE BY PRECISION OF INSTANTANEOUS TOUCH",
      "duration_s": 20.755,
      "infer_time_s": 5.238,
      "rtf": 0.2524,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0009",
      "ref": "NOW YOU WILL SEE IN THESE STUDIES THAT THE MOMENT THE WHITE IS INCLOSED PROPERLY AND HARMONIZED WITH THE OTHER HUES IT BECOMES SOMEHOW MORE PRECIOUS AND PEARLY THAN THE WHITE PAPER AND THAT I AM NOT AFRAID TO LEAVE A WHOLE FIELD OF UNTREATED WHITE PAPER ALL ROUND IT BEING SURE THAT EVEN THE LITTLE DIAMONDS IN THE ROUND WINDOW WILL TELL AS JEWELS IF THEY ARE GRADATED JUSTLY",
      "hyp": "Now you will see in these studies that the moment the white is enclosed properly and harmonized with the other hues , it becomes somehow more precious and pearly than the white paper . And that I am not afraid to leave a whole field of untreated white paper all round it, being sure that even the little diamonds in the round window will tell as jewels if they are gradated justly.",
      "ref_norm": "NOW YOU WILL SEE IN THESE STUDIES THAT THE MOMENT THE WHITE IS INCLOSED PROPERLY AND HARMONIZED WITH THE OTHER HUES IT BECOMES SOMEHOW MORE PRECIOUS AND PEARLY THAN THE WHITE PAPER AND THAT I AM NOT AFRAID TO LEAVE A WHOLE FIELD OF UNTREATED WHITE PAPER ALL ROUND IT BEING SURE THAT EVEN THE LITTLE DIAMONDS IN THE ROUND WINDOW WILL TELL AS JEWELS IF THEY ARE GRADATED JUSTLY",
      "hyp_norm": "NOW YOU WILL SEE IN THESE STUDIES THAT THE MOMENT THE WHITE IS ENCLOSED PROPERLY AND HARMONIZED WITH THE OTHER HUES IT BECOMES SOMEHOW MORE PRECIOUS AND PEARLY THAN THE WHITE PAPER AND THAT I AM NOT AFRAID TO LEAVE A WHOLE FIELD OF UNTREATED WHITE PAPER ALL ROUND IT BEING SURE THAT EVEN THE LITTLE DIAMONDS IN THE ROUND WINDOW WILL TELL AS JEWELS IF THEY ARE GRADATED JUSTLY",
      "duration_s": 23.06,
      "infer_time_s": 6.043,
      "rtf": 0.262,
      "wer": 0.0143
    },
    {
      "id": "1188-133604-0010",
      "ref": "BUT IN THIS VIGNETTE COPIED FROM TURNER YOU HAVE THE TWO PRINCIPLES BROUGHT OUT PERFECTLY",
      "hyp": "But in this vignette , copied from Turner , you have the two principles brought out perfectly.",
      "ref_norm": "BUT IN THIS VIGNETTE COPIED FROM TURNER YOU HAVE THE TWO PRINCIPLES BROUGHT OUT PERFECTLY",
      "hyp_norm": "BUT IN THIS VIGNETTE COPIED FROM TURNER YOU HAVE THE TWO PRINCIPLES BROUGHT OUT PERFECTLY",
      "duration_s": 6.095,
      "infer_time_s": 1.529,
      "rtf": 0.2509,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0011",
      "ref": "THEY ARE BEYOND ALL OTHER WORKS THAT I KNOW EXISTING DEPENDENT FOR THEIR EFFECT ON LOW SUBDUED TONES THEIR FAVORITE CHOICE IN TIME OF DAY BEING EITHER DAWN OR TWILIGHT AND EVEN THEIR BRIGHTEST SUNSETS PRODUCED CHIEFLY OUT OF GRAY PAPER",
      "hyp": "They are beyond all other works that I know existing , dependent for their effect on low, subdued tones. Their favorite choice in time of day being either dawn or twilight , and even their brightest sunsets produced chiefly out of gray paper.",
      "ref_norm": "THEY ARE BEYOND ALL OTHER WORKS THAT I KNOW EXISTING DEPENDENT FOR THEIR EFFECT ON LOW SUBDUED TONES THEIR FAVORITE CHOICE IN TIME OF DAY BEING EITHER DAWN OR TWILIGHT AND EVEN THEIR BRIGHTEST SUNSETS PRODUCED CHIEFLY OUT OF GRAY PAPER",
      "hyp_norm": "THEY ARE BEYOND ALL OTHER WORKS THAT I KNOW EXISTING DEPENDENT FOR THEIR EFFECT ON LOW SUBDUED TONES THEIR FAVORITE CHOICE IN TIME OF DAY BEING EITHER DAWN OR TWILIGHT AND EVEN THEIR BRIGHTEST SUNSETS PRODUCED CHIEFLY OUT OF GRAY PAPER",
      "duration_s": 15.19,
      "infer_time_s": 3.707,
      "rtf": 0.2441,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0012",
      "ref": "IT MAY BE THAT A GREAT COLORIST WILL USE HIS UTMOST FORCE OF COLOR AS A SINGER HIS FULL POWER OF VOICE BUT LOUD OR LOW THE VIRTUE IS IN BOTH CASES ALWAYS IN REFINEMENT NEVER IN LOUDNESS",
      "hyp": "It may be that a great colorist will use his utmost force of color , as a singer his full power of voice , but loud or low, the virtue is in both cases always in refinement, never in loudness.",
      "ref_norm": "IT MAY BE THAT A GREAT COLORIST WILL USE HIS UTMOST FORCE OF COLOR AS A SINGER HIS FULL POWER OF VOICE BUT LOUD OR LOW THE VIRTUE IS IN BOTH CASES ALWAYS IN REFINEMENT NEVER IN LOUDNESS",
      "hyp_norm": "IT MAY BE THAT A GREAT COLORIST WILL USE HIS UTMOST FORCE OF COLOR AS A SINGER HIS FULL POWER OF VOICE BUT LOUD OR LOW THE VIRTUE IS IN BOTH CASES ALWAYS IN REFINEMENT NEVER IN LOUDNESS",
      "duration_s": 14.65,
      "infer_time_s": 3.604,
      "rtf": 0.246,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0013",
      "ref": "IT MUST REMEMBER BE ONE OR THE OTHER",
      "hyp": "It must remember be one or the other.",
      "ref_norm": "IT MUST REMEMBER BE ONE OR THE OTHER",
      "hyp_norm": "IT MUST REMEMBER BE ONE OR THE OTHER",
      "duration_s": 3.02,
      "infer_time_s": 0.729,
      "rtf": 0.2414,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0014",
      "ref": "DO NOT THEREFORE THINK THAT THE GOTHIC SCHOOL IS AN EASY ONE",
      "hyp": "Do not therefore think that the Gothic school is an easy one.",
      "ref_norm": "DO NOT THEREFORE THINK THAT THE GOTHIC SCHOOL IS AN EASY ONE",
      "hyp_norm": "DO NOT THEREFORE THINK THAT THE GOTHIC SCHOOL IS AN EASY ONE",
      "duration_s": 4.39,
      "infer_time_s": 1.08,
      "rtf": 0.2461,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0015",
      "ref": "THE LAW OF THAT SCHOOL IS THAT EVERYTHING SHALL BE SEEN CLEARLY OR AT LEAST ONLY IN SUCH MIST OR FAINTNESS AS SHALL BE DELIGHTFUL AND I HAVE NO DOUBT THAT THE BEST INTRODUCTION TO IT WOULD BE THE ELEMENTARY PRACTICE OF PAINTING EVERY STUDY ON A GOLDEN GROUND",
      "hyp": "The law of that school was that everything shall be seen clearly, or at least , only in such mist or faintness as shall be delightful . And I have no doubt that the best introduction to it would be the elementary practice of painting every study on a golden ground.",
      "ref_norm": "THE LAW OF THAT SCHOOL IS THAT EVERYTHING SHALL BE SEEN CLEARLY OR AT LEAST ONLY IN SUCH MIST OR FAINTNESS AS SHALL BE DELIGHTFUL AND I HAVE NO DOUBT THAT THE BEST INTRODUCTION TO IT WOULD BE THE ELEMENTARY PRACTICE OF PAINTING EVERY STUDY ON A GOLDEN GROUND",
      "hyp_norm": "THE LAW OF THAT SCHOOL WAS THAT EVERYTHING SHALL BE SEEN CLEARLY OR AT LEAST ONLY IN SUCH MIST OR FAINTNESS AS SHALL BE DELIGHTFUL AND I HAVE NO DOUBT THAT THE BEST INTRODUCTION TO IT WOULD BE THE ELEMENTARY PRACTICE OF PAINTING EVERY STUDY ON A GOLDEN GROUND",
      "duration_s": 16.085,
      "infer_time_s": 4.236,
      "rtf": 0.2633,
      "wer": 0.0204
    },
    {
      "id": "1188-133604-0016",
      "ref": "THIS AT ONCE COMPELS YOU TO UNDERSTAND THAT THE WORK IS TO BE IMAGINATIVE AND DECORATIVE THAT IT REPRESENTS BEAUTIFUL THINGS IN THE CLEAREST WAY BUT NOT UNDER EXISTING CONDITIONS AND THAT IN FACT YOU ARE PRODUCING JEWELER'S WORK RATHER THAN PICTURES",
      "hyp": "This at once comp els you to understand that the work is to be imaginative and decorative, that it represents beautiful things in the clearest way , but not under existing conditions, and that, in fact, you are producing jeweler's work rather than pictures.",
      "ref_norm": "THIS AT ONCE COMPELS YOU TO UNDERSTAND THAT THE WORK IS TO BE IMAGINATIVE AND DECORATIVE THAT IT REPRESENTS BEAUTIFUL THINGS IN THE CLEAREST WAY BUT NOT UNDER EXISTING CONDITIONS AND THAT IN FACT YOU ARE PRODUCING JEWELERS WORK RATHER THAN PICTURES",
      "hyp_norm": "THIS AT ONCE COMP ELS YOU TO UNDERSTAND THAT THE WORK IS TO BE IMAGINATIVE AND DECORATIVE THAT IT REPRESENTS BEAUTIFUL THINGS IN THE CLEAREST WAY BUT NOT UNDER EXISTING CONDITIONS AND THAT IN FACT YOU ARE PRODUCING JEWELERS WORK RATHER THAN PICTURES",
      "duration_s": 16.595,
      "infer_time_s": 4.136,
      "rtf": 0.2493,
      "wer": 0.0476
    },
    {
      "id": "1188-133604-0017",
      "ref": "THAT A STYLE IS RESTRAINED OR SEVERE DOES NOT MEAN THAT IT IS ALSO ERRONEOUS",
      "hyp": "That a style is restrained or severe does not mean that it is also erroneous.",
      "ref_norm": "THAT A STYLE IS RESTRAINED OR SEVERE DOES NOT MEAN THAT IT IS ALSO ERRONEOUS",
      "hyp_norm": "THAT A STYLE IS RESTRAINED OR SEVERE DOES NOT MEAN THAT IT IS ALSO ERRONEOUS",
      "duration_s": 4.615,
      "infer_time_s": 1.227,
      "rtf": 0.2659,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0018",
      "ref": "IN ALL EARLY GOTHIC ART INDEED YOU WILL FIND FAILURE OF THIS KIND ESPECIALLY DISTORTION AND RIGIDITY WHICH ARE IN MANY RESPECTS PAINFULLY TO BE COMPARED WITH THE SPLENDID REPOSE OF CLASSIC ART",
      "hyp": "In all early Gothic art, indeed, you will find failure of this kind, especially distortion and rigidity , which are in many respects painfully to be compared with the splendid repose of classic art.",
      "ref_norm": "IN ALL EARLY GOTHIC ART INDEED YOU WILL FIND FAILURE OF THIS KIND ESPECIALLY DISTORTION AND RIGIDITY WHICH ARE IN MANY RESPECTS PAINFULLY TO BE COMPARED WITH THE SPLENDID REPOSE OF CLASSIC ART",
      "hyp_norm": "IN ALL EARLY GOTHIC ART INDEED YOU WILL FIND FAILURE OF THIS KIND ESPECIALLY DISTORTION AND RIGIDITY WHICH ARE IN MANY RESPECTS PAINFULLY TO BE COMPARED WITH THE SPLENDID REPOSE OF CLASSIC ART",
      "duration_s": 11.55,
      "infer_time_s": 3.003,
      "rtf": 0.26,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0019",
      "ref": "THE LARGE LETTER CONTAINS INDEED ENTIRELY FEEBLE AND ILL DRAWN FIGURES THAT IS MERELY CHILDISH AND FAILING WORK OF AN INFERIOR HAND IT IS NOT CHARACTERISTIC OF GOTHIC OR ANY OTHER SCHOOL",
      "hyp": "The large letter contains indeed entirely feeble and ill-drawn figures. That is merely childish and failing work of an inferior hand. It is not characteristic of Gothic or any other school.",
      "ref_norm": "THE LARGE LETTER CONTAINS INDEED ENTIRELY FEEBLE AND ILL DRAWN FIGURES THAT IS MERELY CHILDISH AND FAILING WORK OF AN INFERIOR HAND IT IS NOT CHARACTERISTIC OF GOTHIC OR ANY OTHER SCHOOL",
      "hyp_norm": "THE LARGE LETTER CONTAINS INDEED ENTIRELY FEEBLE AND ILLDRAWN FIGURES THAT IS MERELY CHILDISH AND FAILING WORK OF AN INFERIOR HAND IT IS NOT CHARACTERISTIC OF GOTHIC OR ANY OTHER SCHOOL",
      "duration_s": 13.93,
      "infer_time_s": 3.013,
      "rtf": 0.2163,
      "wer": 0.0625
    },
    {
      "id": "1188-133604-0020",
      "ref": "BUT OBSERVE YOU CAN ONLY DO THIS ON ONE CONDITION THAT OF STRIVING ALSO TO CREATE IN REALITY THE BEAUTY WHICH YOU SEEK IN IMAGINATION",
      "hyp": "But observe , you can only do this on one condition , that of striving also to create in reality , the beauty which you seek in imagination.",
      "ref_norm": "BUT OBSERVE YOU CAN ONLY DO THIS ON ONE CONDITION THAT OF STRIVING ALSO TO CREATE IN REALITY THE BEAUTY WHICH YOU SEEK IN IMAGINATION",
      "hyp_norm": "BUT OBSERVE YOU CAN ONLY DO THIS ON ONE CONDITION THAT OF STRIVING ALSO TO CREATE IN REALITY THE BEAUTY WHICH YOU SEEK IN IMAGINATION",
      "duration_s": 10.26,
      "infer_time_s": 2.399,
      "rtf": 0.2338,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0021",
      "ref": "IT WILL BE WHOLLY IMPOSSIBLE FOR YOU TO RETAIN THE TRANQUILLITY OF TEMPER AND FELICITY OF FAITH NECESSARY FOR NOBLE PURIST PAINTING UNLESS YOU ARE ACTIVELY ENGAGED IN PROMOTING THE FELICITY AND PEACE OF PRACTICAL LIFE",
      "hyp": "It will be wholly impossible for you to retain the tranquility of temper and felicity of faith necessary for noble, purest painting , unless you are actively engaged in promoting the felicity and peace of practical life.",
      "ref_norm": "IT WILL BE WHOLLY IMPOSSIBLE FOR YOU TO RETAIN THE TRANQUILLITY OF TEMPER AND FELICITY OF FAITH NECESSARY FOR NOBLE PURIST PAINTING UNLESS YOU ARE ACTIVELY ENGAGED IN PROMOTING THE FELICITY AND PEACE OF PRACTICAL LIFE",
      "hyp_norm": "IT WILL BE WHOLLY IMPOSSIBLE FOR YOU TO RETAIN THE TRANQUILITY OF TEMPER AND FELICITY OF FAITH NECESSARY FOR NOBLE PUREST PAINTING UNLESS YOU ARE ACTIVELY ENGAGED IN PROMOTING THE FELICITY AND PEACE OF PRACTICAL LIFE",
      "duration_s": 14.02,
      "infer_time_s": 3.477,
      "rtf": 0.248,
      "wer": 0.0556
    },
    {
      "id": "1188-133604-0022",
      "ref": "YOU MUST LOOK AT HIM IN THE FACE FIGHT HIM CONQUER HIM WITH WHAT SCATHE YOU MAY YOU NEED NOT THINK TO KEEP OUT OF THE WAY OF HIM",
      "hyp": "You must look at him in the face, fight him, conquer him , with what scathe you may. You need not think to keep out of the way of him.",
      "ref_norm": "YOU MUST LOOK AT HIM IN THE FACE FIGHT HIM CONQUER HIM WITH WHAT SCATHE YOU MAY YOU NEED NOT THINK TO KEEP OUT OF THE WAY OF HIM",
      "hyp_norm": "YOU MUST LOOK AT HIM IN THE FACE FIGHT HIM CONQUER HIM WITH WHAT SCATHE YOU MAY YOU NEED NOT THINK TO KEEP OUT OF THE WAY OF HIM",
      "duration_s": 9.63,
      "infer_time_s": 2.529,
      "rtf": 0.2626,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0023",
      "ref": "THE COLORIST SAYS FIRST OF ALL AS MY DELICIOUS PAROQUET WAS RUBY SO THIS NASTY VIPER SHALL BE BLACK AND THEN IS THE QUESTION CAN I ROUND HIM OFF EVEN THOUGH HE IS BLACK AND MAKE HIM SLIMY AND YET SPRINGY AND CLOSE DOWN CLOTTED LIKE A POOL OF BLACK BLOOD ON THE EARTH ALL THE SAME",
      "hyp": "The colorist says, \"First of all , as my delicious parquet was ruby , so this nasty viper shall be black .\" And then is the question: Can I round him off, even though he is black, and make him slimy , and yet springy and close down, clotted like a pool of black blood on the earth, all the same?",
      "ref_norm": "THE COLORIST SAYS FIRST OF ALL AS MY DELICIOUS PAROQUET WAS RUBY SO THIS NASTY VIPER SHALL BE BLACK AND THEN IS THE QUESTION CAN I ROUND HIM OFF EVEN THOUGH HE IS BLACK AND MAKE HIM SLIMY AND YET SPRINGY AND CLOSE DOWN CLOTTED LIKE A POOL OF BLACK BLOOD ON THE EARTH ALL THE SAME",
      "hyp_norm": "THE COLORIST SAYS FIRST OF ALL AS MY DELICIOUS PARQUET WAS RUBY SO THIS NASTY VIPER SHALL BE BLACK AND THEN IS THE QUESTION CAN I ROUND HIM OFF EVEN THOUGH HE IS BLACK AND MAKE HIM SLIMY AND YET SPRINGY AND CLOSE DOWN CLOTTED LIKE A POOL OF BLACK BLOOD ON THE EARTH ALL THE SAME",
      "duration_s": 23.67,
      "infer_time_s": 5.734,
      "rtf": 0.2422,
      "wer": 0.0175
    },
    {
      "id": "1188-133604-0024",
      "ref": "NOTHING WILL BE MORE PRECIOUS TO YOU I THINK IN THE PRACTICAL STUDY OF ART THAN THE CONVICTION WHICH WILL FORCE ITSELF ON YOU MORE AND MORE EVERY HOUR OF THE WAY ALL THINGS ARE BOUND TOGETHER LITTLE AND GREAT IN SPIRIT AND IN MATTER",
      "hyp": "Nothing will be more precious to you. I think, in the practical study of art, than the conviction , which will force itself on you more and more every hour , of the way all things are bound together, little and great, in spirit and in matter.",
      "ref_norm": "NOTHING WILL BE MORE PRECIOUS TO YOU I THINK IN THE PRACTICAL STUDY OF ART THAN THE CONVICTION WHICH WILL FORCE ITSELF ON YOU MORE AND MORE EVERY HOUR OF THE WAY ALL THINGS ARE BOUND TOGETHER LITTLE AND GREAT IN SPIRIT AND IN MATTER",
      "hyp_norm": "NOTHING WILL BE MORE PRECIOUS TO YOU I THINK IN THE PRACTICAL STUDY OF ART THAN THE CONVICTION WHICH WILL FORCE ITSELF ON YOU MORE AND MORE EVERY HOUR OF THE WAY ALL THINGS ARE BOUND TOGETHER LITTLE AND GREAT IN SPIRIT AND IN MATTER",
      "duration_s": 15.24,
      "infer_time_s": 4.0,
      "rtf": 0.2625,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0025",
      "ref": "YOU KNOW I HAVE JUST BEEN TELLING YOU HOW THIS SCHOOL OF MATERIALISM AND CLAY INVOLVED ITSELF AT LAST IN CLOUD AND FIRE",
      "hyp": "You know I've just been telling you how this school of materialism in clay involved itself at last in cloud and fire.",
      "ref_norm": "YOU KNOW I HAVE JUST BEEN TELLING YOU HOW THIS SCHOOL OF MATERIALISM AND CLAY INVOLVED ITSELF AT LAST IN CLOUD AND FIRE",
      "hyp_norm": "YOU KNOW IVE JUST BEEN TELLING YOU HOW THIS SCHOOL OF MATERIALISM IN CLAY INVOLVED ITSELF AT LAST IN CLOUD AND FIRE",
      "duration_s": 7.45,
      "infer_time_s": 1.857,
      "rtf": 0.2493,
      "wer": 0.1304
    },
    {
      "id": "1188-133604-0026",
      "ref": "HERE IS AN EQUALLY TYPICAL GREEK SCHOOL LANDSCAPE BY WILSON LOST WHOLLY IN GOLDEN MIST THE TREES SO SLIGHTLY DRAWN THAT YOU DON'T KNOW IF THEY ARE TREES OR TOWERS AND NO CARE FOR COLOR WHATEVER PERFECTLY DECEPTIVE AND MARVELOUS EFFECT OF SUNSHINE THROUGH THE MIST APOLLO AND THE PYTHON",
      "hyp": "Here is an equally typical Greek school landscape by Wilson, lost wholly in golden mist . The trees so slightly drawn that you don't know if they are trees or towers , and no care for color whatsoever. Perfectly deceptive in marvelous effect of sunshine through the mist, Apollo and the Python.",
      "ref_norm": "HERE IS AN EQUALLY TYPICAL GREEK SCHOOL LANDSCAPE BY WILSON LOST WHOLLY IN GOLDEN MIST THE TREES SO SLIGHTLY DRAWN THAT YOU DONT KNOW IF THEY ARE TREES OR TOWERS AND NO CARE FOR COLOR WHATEVER PERFECTLY DECEPTIVE AND MARVELOUS EFFECT OF SUNSHINE THROUGH THE MIST APOLLO AND THE PYTHON",
      "hyp_norm": "HERE IS AN EQUALLY TYPICAL GREEK SCHOOL LANDSCAPE BY WILSON LOST WHOLLY IN GOLDEN MIST THE TREES SO SLIGHTLY DRAWN THAT YOU DONT KNOW IF THEY ARE TREES OR TOWERS AND NO CARE FOR COLOR WHATSOEVER PERFECTLY DECEPTIVE IN MARVELOUS EFFECT OF SUNSHINE THROUGH THE MIST APOLLO AND THE PYTHON",
      "duration_s": 20.125,
      "infer_time_s": 4.804,
      "rtf": 0.2387,
      "wer": 0.04
    },
    {
      "id": "1188-133604-0027",
      "ref": "NOW HERE IS RAPHAEL EXACTLY BETWEEN THE TWO TREES STILL DRAWN LEAF BY LEAF WHOLLY FORMAL BUT BEAUTIFUL MIST COMING GRADUALLY INTO THE DISTANCE",
      "hyp": "Now here is Raphael , exactly between the two trees, still drawn leaf by leaf, wholly formal , but beautiful mist coming gradually into the distance.",
      "ref_norm": "NOW HERE IS RAPHAEL EXACTLY BETWEEN THE TWO TREES STILL DRAWN LEAF BY LEAF WHOLLY FORMAL BUT BEAUTIFUL MIST COMING GRADUALLY INTO THE DISTANCE",
      "hyp_norm": "NOW HERE IS RAPHAEL EXACTLY BETWEEN THE TWO TREES STILL DRAWN LEAF BY LEAF WHOLLY FORMAL BUT BEAUTIFUL MIST COMING GRADUALLY INTO THE DISTANCE",
      "duration_s": 11.245,
      "infer_time_s": 2.42,
      "rtf": 0.2152,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0028",
      "ref": "WELL THEN LAST HERE IS TURNER'S GREEK SCHOOL OF THE HIGHEST CLASS AND YOU DEFINE HIS ART ABSOLUTELY AS FIRST THE DISPLAYING INTENSELY AND WITH THE STERNEST INTELLECT OF NATURAL FORM AS IT IS AND THEN THE ENVELOPMENT OF IT WITH CLOUD AND FIRE",
      "hyp": "Well then, last here is Turner's , Greek school of the highest class, and you define his art absolutely, as first the displaying intensely and with the sternest intellect, of natural form as it is, and then the envelopment of it with cloud and fire.",
      "ref_norm": "WELL THEN LAST HERE IS TURNERS GREEK SCHOOL OF THE HIGHEST CLASS AND YOU DEFINE HIS ART ABSOLUTELY AS FIRST THE DISPLAYING INTENSELY AND WITH THE STERNEST INTELLECT OF NATURAL FORM AS IT IS AND THEN THE ENVELOPMENT OF IT WITH CLOUD AND FIRE",
      "hyp_norm": "WELL THEN LAST HERE IS TURNERS GREEK SCHOOL OF THE HIGHEST CLASS AND YOU DEFINE HIS ART ABSOLUTELY AS FIRST THE DISPLAYING INTENSELY AND WITH THE STERNEST INTELLECT OF NATURAL FORM AS IT IS AND THEN THE ENVELOPMENT OF IT WITH CLOUD AND FIRE",
      "duration_s": 19.005,
      "infer_time_s": 4.41,
      "rtf": 0.2321,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0029",
      "ref": "ONLY THERE ARE TWO SORTS OF CLOUD AND FIRE",
      "hyp": "Only, there are two sorts of cloud and fire.",
      "ref_norm": "ONLY THERE ARE TWO SORTS OF CLOUD AND FIRE",
      "hyp_norm": "ONLY THERE ARE TWO SORTS OF CLOUD AND FIRE",
      "duration_s": 3.705,
      "infer_time_s": 0.846,
      "rtf": 0.2285,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0030",
      "ref": "HE KNOWS THEM BOTH",
      "hyp": "He knows them both.",
      "ref_norm": "HE KNOWS THEM BOTH",
      "hyp_norm": "HE KNOWS THEM BOTH",
      "duration_s": 1.915,
      "infer_time_s": 0.4,
      "rtf": 0.2091,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0031",
      "ref": "THERE'S ONE AND THERE'S ANOTHER THE DUDLEY AND THE FLINT",
      "hyp": "There's one and there's another , the Dudley and the Flint.",
      "ref_norm": "THERES ONE AND THERES ANOTHER THE DUDLEY AND THE FLINT",
      "hyp_norm": "THERES ONE AND THERES ANOTHER THE DUDLEY AND THE FLINT",
      "duration_s": 4.25,
      "infer_time_s": 1.122,
      "rtf": 0.264,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0032",
      "ref": "IT IS ONLY A PENCIL OUTLINE BY EDWARD BURNE JONES IN ILLUSTRATION OF THE STORY OF PSYCHE IT IS THE INTRODUCTION OF PSYCHE AFTER ALL HER TROUBLES INTO HEAVEN",
      "hyp": "It is only a pencil outline by Edward Burn Jones, in illustration of the story of Psyche . It is the introduction of Psyche after all her troubles into heaven.",
      "ref_norm": "IT IS ONLY A PENCIL OUTLINE BY EDWARD BURNE JONES IN ILLUSTRATION OF THE STORY OF PSYCHE IT IS THE INTRODUCTION OF PSYCHE AFTER ALL HER TROUBLES INTO HEAVEN",
      "hyp_norm": "IT IS ONLY A PENCIL OUTLINE BY EDWARD BURN JONES IN ILLUSTRATION OF THE STORY OF PSYCHE IT IS THE INTRODUCTION OF PSYCHE AFTER ALL HER TROUBLES INTO HEAVEN",
      "duration_s": 10.985,
      "infer_time_s": 2.68,
      "rtf": 0.244,
      "wer": 0.0345
    },
    {
      "id": "1188-133604-0033",
      "ref": "EVERY PLANT IN THE GRASS IS SET FORMALLY GROWS PERFECTLY AND MAY BE REALIZED COMPLETELY",
      "hyp": "Every plant in the grass is set formally, grows perfectly, and may be realized completely.",
      "ref_norm": "EVERY PLANT IN THE GRASS IS SET FORMALLY GROWS PERFECTLY AND MAY BE REALIZED COMPLETELY",
      "hyp_norm": "EVERY PLANT IN THE GRASS IS SET FORMALLY GROWS PERFECTLY AND MAY BE REALIZED COMPLETELY",
      "duration_s": 6.625,
      "infer_time_s": 1.47,
      "rtf": 0.2218,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0034",
      "ref": "EXQUISITE ORDER AND UNIVERSAL WITH ETERNAL LIFE AND LIGHT THIS IS THE FAITH AND EFFORT OF THE SCHOOLS OF CRYSTAL AND YOU MAY DESCRIBE AND COMPLETE THEIR WORK QUITE LITERALLY BY TAKING ANY VERSES OF CHAUCER IN HIS TENDER MOOD AND OBSERVING HOW HE INSISTS ON THE CLEARNESS AND BRIGHTNESS FIRST AND THEN ON THE ORDER",
      "hyp": "Exquisite order and universal, with eternal life and light, this is the faith and effort of the schools of crystal . And you may describe and complete their work quite literally, by taking any verses of Chaucer in his tender mood, and observing how he insists on the clearness and brightness first, and then on the order.",
      "ref_norm": "EXQUISITE ORDER AND UNIVERSAL WITH ETERNAL LIFE AND LIGHT THIS IS THE FAITH AND EFFORT OF THE SCHOOLS OF CRYSTAL AND YOU MAY DESCRIBE AND COMPLETE THEIR WORK QUITE LITERALLY BY TAKING ANY VERSES OF CHAUCER IN HIS TENDER MOOD AND OBSERVING HOW HE INSISTS ON THE CLEARNESS AND BRIGHTNESS FIRST AND THEN ON THE ORDER",
      "hyp_norm": "EXQUISITE ORDER AND UNIVERSAL WITH ETERNAL LIFE AND LIGHT THIS IS THE FAITH AND EFFORT OF THE SCHOOLS OF CRYSTAL AND YOU MAY DESCRIBE AND COMPLETE THEIR WORK QUITE LITERALLY BY TAKING ANY VERSES OF CHAUCER IN HIS TENDER MOOD AND OBSERVING HOW HE INSISTS ON THE CLEARNESS AND BRIGHTNESS FIRST AND THEN ON THE ORDER",
      "duration_s": 20.905,
      "infer_time_s": 5.27,
      "rtf": 0.2521,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0035",
      "ref": "THUS IN CHAUCER'S DREAM",
      "hyp": "Thus, in Ch aucer's dream.",
      "ref_norm": "THUS IN CHAUCERS DREAM",
      "hyp_norm": "THUS IN CH AUCERS DREAM",
      "duration_s": 2.925,
      "infer_time_s": 0.727,
      "rtf": 0.2485,
      "wer": 0.5
    },
    {
      "id": "1188-133604-0036",
      "ref": "IN BOTH THESE HIGH MYTHICAL SUBJECTS THE SURROUNDING NATURE THOUGH SUFFERING IS STILL DIGNIFIED AND BEAUTIFUL",
      "hyp": "In both these high mythical subjects , the surrounding nature , though suffering, is still dignified and beautiful.",
      "ref_norm": "IN BOTH THESE HIGH MYTHICAL SUBJECTS THE SURROUNDING NATURE THOUGH SUFFERING IS STILL DIGNIFIED AND BEAUTIFUL",
      "hyp_norm": "IN BOTH THESE HIGH MYTHICAL SUBJECTS THE SURROUNDING NATURE THOUGH SUFFERING IS STILL DIGNIFIED AND BEAUTIFUL",
      "duration_s": 7.97,
      "infer_time_s": 1.653,
      "rtf": 0.2074,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0037",
      "ref": "EVERY LINE IN WHICH THE MASTER TRACES IT EVEN WHERE SEEMINGLY NEGLIGENT IS LOVELY AND SET DOWN WITH A MEDITATIVE CALMNESS WHICH MAKES THESE TWO ETCHINGS CAPABLE OF BEING PLACED BESIDE THE MOST TRANQUIL WORK OF HOLBEIN OR DUERER",
      "hyp": "Every line in which the master traces it , even where seemingly negligent, is lovely and set down with a meditative calmness, which makes these two etchings capable of being placed beside the most tranquil work of Holbein or D\u00fcrer.",
      "ref_norm": "EVERY LINE IN WHICH THE MASTER TRACES IT EVEN WHERE SEEMINGLY NEGLIGENT IS LOVELY AND SET DOWN WITH A MEDITATIVE CALMNESS WHICH MAKES THESE TWO ETCHINGS CAPABLE OF BEING PLACED BESIDE THE MOST TRANQUIL WORK OF HOLBEIN OR DUERER",
      "hyp_norm": "EVERY LINE IN WHICH THE MASTER TRACES IT EVEN WHERE SEEMINGLY NEGLIGENT IS LOVELY AND SET DOWN WITH A MEDITATIVE CALMNESS WHICH MAKES THESE TWO ETCHINGS CAPABLE OF BEING PLACED BESIDE THE MOST TRANQUIL WORK OF HOLBEIN OR D\u00dcRER",
      "duration_s": 14.51,
      "infer_time_s": 3.909,
      "rtf": 0.2694,
      "wer": 0.0256
    },
    {
      "id": "1188-133604-0038",
      "ref": "BUT NOW HERE IS A SUBJECT OF WHICH YOU WILL WONDER AT FIRST WHY TURNER DREW IT AT ALL",
      "hyp": "But now here is a subject of which, you will wonder at first why Turner drew it at all.",
      "ref_norm": "BUT NOW HERE IS A SUBJECT OF WHICH YOU WILL WONDER AT FIRST WHY TURNER DREW IT AT ALL",
      "hyp_norm": "BUT NOW HERE IS A SUBJECT OF WHICH YOU WILL WONDER AT FIRST WHY TURNER DREW IT AT ALL",
      "duration_s": 5.365,
      "infer_time_s": 1.483,
      "rtf": 0.2765,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0039",
      "ref": "IT HAS NO BEAUTY WHATSOEVER NO SPECIALTY OF PICTURESQUENESS AND ALL ITS LINES ARE CRAMPED AND POOR",
      "hyp": "It has no beauty whatsoever . No specialty of picturesque ness, and all its lines are cramped and poor.",
      "ref_norm": "IT HAS NO BEAUTY WHATSOEVER NO SPECIALTY OF PICTURESQUENESS AND ALL ITS LINES ARE CRAMPED AND POOR",
      "hyp_norm": "IT HAS NO BEAUTY WHATSOEVER NO SPECIALTY OF PICTURESQUE NESS AND ALL ITS LINES ARE CRAMPED AND POOR",
      "duration_s": 6.625,
      "infer_time_s": 1.627,
      "rtf": 0.2456,
      "wer": 0.1176
    },
    {
      "id": "1188-133604-0040",
      "ref": "THE CRAMPNESS AND THE POVERTY ARE ALL INTENDED",
      "hyp": "The crampedness and the poverty are all intended.",
      "ref_norm": "THE CRAMPNESS AND THE POVERTY ARE ALL INTENDED",
      "hyp_norm": "THE CRAMPEDNESS AND THE POVERTY ARE ALL INTENDED",
      "duration_s": 3.23,
      "infer_time_s": 0.784,
      "rtf": 0.2428,
      "wer": 0.125
    },
    {
      "id": "1188-133604-0041",
      "ref": "IT IS A GLEANER BRINGING DOWN HER ONE SHEAF OF CORN TO AN OLD WATERMILL ITSELF MOSSY AND RENT SCARCELY ABLE TO GET ITS STONES TO TURN",
      "hyp": "It is a gleaner bringing down her one sheaf of corn to an old water mill, itself moss y and rent, scarcely able to get its stones to turn.",
      "ref_norm": "IT IS A GLEANER BRINGING DOWN HER ONE SHEAF OF CORN TO AN OLD WATERMILL ITSELF MOSSY AND RENT SCARCELY ABLE TO GET ITS STONES TO TURN",
      "hyp_norm": "IT IS A GLEANER BRINGING DOWN HER ONE SHEAF OF CORN TO AN OLD WATER MILL ITSELF MOSS Y AND RENT SCARCELY ABLE TO GET ITS STONES TO TURN",
      "duration_s": 10.07,
      "infer_time_s": 2.69,
      "rtf": 0.2671,
      "wer": 0.1481
    },
    {
      "id": "1188-133604-0042",
      "ref": "THE SCENE IS ABSOLUTELY ARCADIAN",
      "hyp": "The scene is absolutely Arcadian.",
      "ref_norm": "THE SCENE IS ABSOLUTELY ARCADIAN",
      "hyp_norm": "THE SCENE IS ABSOLUTELY ARCADIAN",
      "duration_s": 2.66,
      "infer_time_s": 0.635,
      "rtf": 0.2388,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0043",
      "ref": "SEE THAT YOUR LIVES BE IN NOTHING WORSE THAN A BOY'S CLIMBING FOR HIS ENTANGLED KITE",
      "hyp": "See that your lives be in nothing worse than a boy's climbing for his entangled kite.",
      "ref_norm": "SEE THAT YOUR LIVES BE IN NOTHING WORSE THAN A BOYS CLIMBING FOR HIS ENTANGLED KITE",
      "hyp_norm": "SEE THAT YOUR LIVES BE IN NOTHING WORSE THAN A BOYS CLIMBING FOR HIS ENTANGLED KITE",
      "duration_s": 4.885,
      "infer_time_s": 1.392,
      "rtf": 0.285,
      "wer": 0.0
    },
    {
      "id": "1188-133604-0044",
      "ref": "IT WILL BE WELL FOR YOU IF YOU JOIN NOT WITH THOSE WHO INSTEAD OF KITES FLY FALCONS WHO INSTEAD OF OBEYING THE LAST WORDS OF THE GREAT CLOUD SHEPHERD TO FEED HIS SHEEP LIVE THE LIVES HOW MUCH LESS THAN VANITY OF THE WAR WOLF AND THE GIER EAGLE",
      "hyp": "It will be well for you , if you join not with those who, instead of kites, fly falcons, who instead of obeying the last words of the great cloud shepherd , to feed his sheep, live the lives . How much less than vanity. Of the warwolf and the gear eagle.",
      "ref_norm": "IT WILL BE WELL FOR YOU IF YOU JOIN NOT WITH THOSE WHO INSTEAD OF KITES FLY FALCONS WHO INSTEAD OF OBEYING THE LAST WORDS OF THE GREAT CLOUD SHEPHERD TO FEED HIS SHEEP LIVE THE LIVES HOW MUCH LESS THAN VANITY OF THE WAR WOLF AND THE GIER EAGLE",
      "hyp_norm": "IT WILL BE WELL FOR YOU IF YOU JOIN NOT WITH THOSE WHO INSTEAD OF KITES FLY FALCONS WHO INSTEAD OF OBEYING THE LAST WORDS OF THE GREAT CLOUD SHEPHERD TO FEED HIS SHEEP LIVE THE LIVES HOW MUCH LESS THAN VANITY OF THE WARWOLF AND THE GEAR EAGLE",
      "duration_s": 18.545,
      "infer_time_s": 4.808,
      "rtf": 0.2593,
      "wer": 0.06
    },
    {
      "id": "121-121726-0000",
      "ref": "ALSO A POPULAR CONTRIVANCE WHEREBY LOVE MAKING MAY BE SUSPENDED BUT NOT STOPPED DURING THE PICNIC SEASON",
      "hyp": "Also, a popular contrivance whereby love-making may be suspended but not stopped during the picnic season.",
      "ref_norm": "ALSO A POPULAR CONTRIVANCE WHEREBY LOVE MAKING MAY BE SUSPENDED BUT NOT STOPPED DURING THE PICNIC SEASON",
      "hyp_norm": "ALSO A POPULAR CONTRIVANCE WHEREBY LOVEMAKING MAY BE SUSPENDED BUT NOT STOPPED DURING THE PICNIC SEASON",
      "duration_s": 8.46,
      "infer_time_s": 1.818,
      "rtf": 0.2149,
      "wer": 0.1176
    },
    {
      "id": "121-121726-0001",
      "ref": "HARANGUE THE TIRESOME PRODUCT OF A TIRELESS TONGUE",
      "hyp": "Haring . The tiresome product of a tireless tongue.",
      "ref_norm": "HARANGUE THE TIRESOME PRODUCT OF A TIRELESS TONGUE",
      "hyp_norm": "HARING THE TIRESOME PRODUCT OF A TIRELESS TONGUE",
      "duration_s": 5.925,
      "infer_time_s": 1.083,
      "rtf": 0.1828,
      "wer": 0.125
    },
    {
      "id": "121-121726-0002",
      "ref": "ANGOR PAIN PAINFUL TO HEAR",
      "hyp": "Anger, pain. Painful to hear.",
      "ref_norm": "ANGOR PAIN PAINFUL TO HEAR",
      "hyp_norm": "ANGER PAIN PAINFUL TO HEAR",
      "duration_s": 4.41,
      "infer_time_s": 0.862,
      "rtf": 0.1955,
      "wer": 0.2
    },
    {
      "id": "121-121726-0003",
      "ref": "HAY FEVER A HEART TROUBLE CAUSED BY FALLING IN LOVE WITH A GRASS WIDOW",
      "hyp": "Hey, fever . A heart trouble caused by falling in love with a grass widow.",
      "ref_norm": "HAY FEVER A HEART TROUBLE CAUSED BY FALLING IN LOVE WITH A GRASS WIDOW",
      "hyp_norm": "HEY FEVER A HEART TROUBLE CAUSED BY FALLING IN LOVE WITH A GRASS WIDOW",
      "duration_s": 6.755,
      "infer_time_s": 1.419,
      "rtf": 0.2101,
      "wer": 0.0714
    },
    {
      "id": "121-121726-0004",
      "ref": "HEAVEN A GOOD PLACE TO BE RAISED TO",
      "hyp": "Heaven, a good place to be raised too.",
      "ref_norm": "HEAVEN A GOOD PLACE TO BE RAISED TO",
      "hyp_norm": "HEAVEN A GOOD PLACE TO BE RAISED TOO",
      "duration_s": 4.02,
      "infer_time_s": 0.963,
      "rtf": 0.2395,
      "wer": 0.125
    },
    {
      "id": "121-121726-0005",
      "ref": "HEDGE A FENCE",
      "hyp": "Hedge. A fence.",
      "ref_norm": "HEDGE A FENCE",
      "hyp_norm": "HEDGE A FENCE",
      "duration_s": 3.1,
      "infer_time_s": 0.528,
      "rtf": 0.1703,
      "wer": 0.0
    },
    {
      "id": "121-121726-0006",
      "ref": "HEREDITY THE CAUSE OF ALL OUR FAULTS",
      "hyp": "Heredity. The cause of all our faults.",
      "ref_norm": "HEREDITY THE CAUSE OF ALL OUR FAULTS",
      "hyp_norm": "HEREDITY THE CAUSE OF ALL OUR FAULTS",
      "duration_s": 3.895,
      "infer_time_s": 0.784,
      "rtf": 0.2013,
      "wer": 0.0
    },
    {
      "id": "121-121726-0007",
      "ref": "HORSE SENSE A DEGREE OF WISDOM THAT KEEPS ONE FROM BETTING ON THE RACES",
      "hyp": "Horse sense , a degree of wisdom that keeps one from betting on the races.",
      "ref_norm": "HORSE SENSE A DEGREE OF WISDOM THAT KEEPS ONE FROM BETTING ON THE RACES",
      "hyp_norm": "HORSE SENSE A DEGREE OF WISDOM THAT KEEPS ONE FROM BETTING ON THE RACES",
      "duration_s": 6.73,
      "infer_time_s": 1.417,
      "rtf": 0.2105,
      "wer": 0.0
    },
    {
      "id": "121-121726-0008",
      "ref": "HOSE MAN'S EXCUSE FOR WETTING THE WALK",
      "hyp": "Hose. Man's excuse for wetting the walk.",
      "ref_norm": "HOSE MANS EXCUSE FOR WETTING THE WALK",
      "hyp_norm": "HOSE MANS EXCUSE FOR WETTING THE WALK",
      "duration_s": 4.99,
      "infer_time_s": 0.968,
      "rtf": 0.194,
      "wer": 0.0
    },
    {
      "id": "121-121726-0009",
      "ref": "HOTEL A PLACE WHERE A GUEST OFTEN GIVES UP GOOD DOLLARS FOR POOR QUARTERS",
      "hyp": "Hotel. A place where a guest often gives up good dollars for poor quarters.",
      "ref_norm": "HOTEL A PLACE WHERE A GUEST OFTEN GIVES UP GOOD DOLLARS FOR POOR QUARTERS",
      "hyp_norm": "HOTEL A PLACE WHERE A GUEST OFTEN GIVES UP GOOD DOLLARS FOR POOR QUARTERS",
      "duration_s": 7.26,
      "infer_time_s": 1.332,
      "rtf": 0.1834,
      "wer": 0.0
    },
    {
      "id": "121-121726-0010",
      "ref": "HOUSECLEANING A DOMESTIC UPHEAVAL THAT MAKES IT EASY FOR THE GOVERNMENT TO ENLIST ALL THE SOLDIERS IT NEEDS",
      "hyp": "House cleaning . A domestic upheaval that makes it easy for the government to enlist all the soldiers it needs.",
      "ref_norm": "HOUSECLEANING A DOMESTIC UPHEAVAL THAT MAKES IT EASY FOR THE GOVERNMENT TO ENLIST ALL THE SOLDIERS IT NEEDS",
      "hyp_norm": "HOUSE CLEANING A DOMESTIC UPHEAVAL THAT MAKES IT EASY FOR THE GOVERNMENT TO ENLIST ALL THE SOLDIERS IT NEEDS",
      "duration_s": 9.81,
      "infer_time_s": 1.883,
      "rtf": 0.192,
      "wer": 0.1111
    },
    {
      "id": "121-121726-0011",
      "ref": "HUSBAND THE NEXT THING TO A WIFE",
      "hyp": "Husband. The next thing to a wife.",
      "ref_norm": "HUSBAND THE NEXT THING TO A WIFE",
      "hyp_norm": "HUSBAND THE NEXT THING TO A WIFE",
      "duration_s": 4.035,
      "infer_time_s": 0.875,
      "rtf": 0.2168,
      "wer": 0.0
    },
    {
      "id": "121-121726-0012",
      "ref": "HUSSY WOMAN AND BOND TIE",
      "hyp": "Hussy woman and bond , tie.",
      "ref_norm": "HUSSY WOMAN AND BOND TIE",
      "hyp_norm": "HUSSY WOMAN AND BOND TIE",
      "duration_s": 4.045,
      "infer_time_s": 0.821,
      "rtf": 0.2029,
      "wer": 0.0
    },
    {
      "id": "121-121726-0013",
      "ref": "TIED TO A WOMAN",
      "hyp": "Tied to a woman.",
      "ref_norm": "TIED TO A WOMAN",
      "hyp_norm": "TIED TO A WOMAN",
      "duration_s": 2.49,
      "infer_time_s": 0.578,
      "rtf": 0.2321,
      "wer": 0.0
    },
    {
      "id": "121-121726-0014",
      "ref": "HYPOCRITE A HORSE DEALER",
      "hyp": "Hypocrite. A horse dealer.",
      "ref_norm": "HYPOCRITE A HORSE DEALER",
      "hyp_norm": "HYPOCRITE A HORSE DEALER",
      "duration_s": 3.165,
      "infer_time_s": 0.677,
      "rtf": 0.2138,
      "wer": 0.0
    },
    {
      "id": "121-123852-0000",
      "ref": "THOSE PRETTY WRONGS THAT LIBERTY COMMITS WHEN I AM SOMETIME ABSENT FROM THY HEART THY BEAUTY AND THY YEARS FULL WELL BEFITS FOR STILL TEMPTATION FOLLOWS WHERE THOU ART",
      "hyp": "Those pretty wrongs that liberty commits. When I am some time absent from thy heart , thy beauty and thy years fall well be fits, for still temptation follows where thou art.",
      "ref_norm": "THOSE PRETTY WRONGS THAT LIBERTY COMMITS WHEN I AM SOMETIME ABSENT FROM THY HEART THY BEAUTY AND THY YEARS FULL WELL BEFITS FOR STILL TEMPTATION FOLLOWS WHERE THOU ART",
      "hyp_norm": "THOSE PRETTY WRONGS THAT LIBERTY COMMITS WHEN I AM SOME TIME ABSENT FROM THY HEART THY BEAUTY AND THY YEARS FALL WELL BE FITS FOR STILL TEMPTATION FOLLOWS WHERE THOU ART",
      "duration_s": 17.695,
      "infer_time_s": 3.303,
      "rtf": 0.1867,
      "wer": 0.1724
    },
    {
      "id": "121-123852-0001",
      "ref": "AY ME",
      "hyp": "I me.",
      "ref_norm": "AY ME",
      "hyp_norm": "I ME",
      "duration_s": 1.87,
      "infer_time_s": 0.297,
      "rtf": 0.1586,
      "wer": 0.5
    },
    {
      "id": "121-123852-0002",
      "ref": "NO MATTER THEN ALTHOUGH MY FOOT DID STAND UPON THE FARTHEST EARTH REMOV'D FROM THEE FOR NIMBLE THOUGHT CAN JUMP BOTH SEA AND LAND AS SOON AS THINK THE PLACE WHERE HE WOULD BE BUT AH",
      "hyp": "No matter then , although my foot did stand upon the farthest earth , removed from thee , for nimble thought can jump both sea and land , as soon as think the place where he would be. But ah.",
      "ref_norm": "NO MATTER THEN ALTHOUGH MY FOOT DID STAND UPON THE FARTHEST EARTH REMOVD FROM THEE FOR NIMBLE THOUGHT CAN JUMP BOTH SEA AND LAND AS SOON AS THINK THE PLACE WHERE HE WOULD BE BUT AH",
      "hyp_norm": "NO MATTER THEN ALTHOUGH MY FOOT DID STAND UPON THE FARTHEST EARTH REMOVED FROM THEE FOR NIMBLE THOUGHT CAN JUMP BOTH SEA AND LAND AS SOON AS THINK THE PLACE WHERE HE WOULD BE BUT AH",
      "duration_s": 17.285,
      "infer_time_s": 3.724,
      "rtf": 0.2155,
      "wer": 0.0278
    },
    {
      "id": "121-123852-0003",
      "ref": "THOUGHT KILLS ME THAT I AM NOT THOUGHT TO LEAP LARGE LENGTHS OF MILES WHEN THOU ART GONE BUT THAT SO MUCH OF EARTH AND WATER WROUGHT I MUST ATTEND TIME'S LEISURE WITH MY MOAN RECEIVING NOUGHT BY ELEMENTS SO SLOW BUT HEAVY TEARS BADGES OF EITHER'S WOE",
      "hyp": "Thought kills me that I am not thought, to leap large lengths of miles when thou art gone, but that so much of earth and water rot, I must attend, time's leisure with my moan , receiving not , by elements so slow, but heavy tears, badges of either's woe.",
      "ref_norm": "THOUGHT KILLS ME THAT I AM NOT THOUGHT TO LEAP LARGE LENGTHS OF MILES WHEN THOU ART GONE BUT THAT SO MUCH OF EARTH AND WATER WROUGHT I MUST ATTEND TIMES LEISURE WITH MY MOAN RECEIVING NOUGHT BY ELEMENTS SO SLOW BUT HEAVY TEARS BADGES OF EITHERS WOE",
      "hyp_norm": "THOUGHT KILLS ME THAT I AM NOT THOUGHT TO LEAP LARGE LENGTHS OF MILES WHEN THOU ART GONE BUT THAT SO MUCH OF EARTH AND WATER ROT I MUST ATTEND TIMES LEISURE WITH MY MOAN RECEIVING NOT BY ELEMENTS SO SLOW BUT HEAVY TEARS BADGES OF EITHERS WOE",
      "duration_s": 23.505,
      "infer_time_s": 5.126,
      "rtf": 0.2181,
      "wer": 0.0417
    },
    {
      "id": "121-123852-0004",
      "ref": "MY HEART DOTH PLEAD THAT THOU IN HIM DOST LIE A CLOSET NEVER PIERC'D WITH CRYSTAL EYES BUT THE DEFENDANT DOTH THAT PLEA DENY AND SAYS IN HIM THY FAIR APPEARANCE LIES",
      "hyp": "My heart doth plead that thou in him dost lie , a closet never pierced with crystal eyes, but the defendant doth that plea deny, and says in him thy fair appearance lies.",
      "ref_norm": "MY HEART DOTH PLEAD THAT THOU IN HIM DOST LIE A CLOSET NEVER PIERCD WITH CRYSTAL EYES BUT THE DEFENDANT DOTH THAT PLEA DENY AND SAYS IN HIM THY FAIR APPEARANCE LIES",
      "hyp_norm": "MY HEART DOTH PLEAD THAT THOU IN HIM DOST LIE A CLOSET NEVER PIERCED WITH CRYSTAL EYES BUT THE DEFENDANT DOTH THAT PLEA DENY AND SAYS IN HIM THY FAIR APPEARANCE LIES",
      "duration_s": 16.29,
      "infer_time_s": 3.461,
      "rtf": 0.2124,
      "wer": 0.0312
    },
    {
      "id": "121-123859-0000",
      "ref": "YOU ARE MY ALL THE WORLD AND I MUST STRIVE TO KNOW MY SHAMES AND PRAISES FROM YOUR TONGUE NONE ELSE TO ME NOR I TO NONE ALIVE THAT MY STEEL'D SENSE OR CHANGES RIGHT OR WRONG",
      "hyp": "You are my all the world , and I must strive to know my shames and praises from your tongue. None else to me , nor I to none alive , that my stealed sense or changes right or wrong.",
      "ref_norm": "YOU ARE MY ALL THE WORLD AND I MUST STRIVE TO KNOW MY SHAMES AND PRAISES FROM YOUR TONGUE NONE ELSE TO ME NOR I TO NONE ALIVE THAT MY STEELD SENSE OR CHANGES RIGHT OR WRONG",
      "hyp_norm": "YOU ARE MY ALL THE WORLD AND I MUST STRIVE TO KNOW MY SHAMES AND PRAISES FROM YOUR TONGUE NONE ELSE TO ME NOR I TO NONE ALIVE THAT MY STEALED SENSE OR CHANGES RIGHT OR WRONG",
      "duration_s": 17.39,
      "infer_time_s": 3.922,
      "rtf": 0.2255,
      "wer": 0.027
    },
    {
      "id": "121-123859-0001",
      "ref": "O TIS THE FIRST TIS FLATTERY IN MY SEEING AND MY GREAT MIND MOST KINGLY DRINKS IT UP MINE EYE WELL KNOWS WHAT WITH HIS GUST IS GREEING AND TO HIS PALATE DOTH PREPARE THE CUP IF IT BE POISON'D TIS THE LESSER SIN THAT MINE EYE LOVES IT AND DOTH FIRST BEGIN",
      "hyp": "Oh, tis the first, tis flattery in my seeing , and my great mind most kingly drinks it up. Mine eye well knows what with his gust is green, and to his palate doth prepare the cup . If it be poisoned , tis the lesser sin , that mine eye loves it, and doth first begin.",
      "ref_norm": "O TIS THE FIRST TIS FLATTERY IN MY SEEING AND MY GREAT MIND MOST KINGLY DRINKS IT UP MINE EYE WELL KNOWS WHAT WITH HIS GUST IS GREEING AND TO HIS PALATE DOTH PREPARE THE CUP IF IT BE POISOND TIS THE LESSER SIN THAT MINE EYE LOVES IT AND DOTH FIRST BEGIN",
      "hyp_norm": "OH TIS THE FIRST TIS FLATTERY IN MY SEEING AND MY GREAT MIND MOST KINGLY DRINKS IT UP MINE EYE WELL KNOWS WHAT WITH HIS GUST IS GREEN AND TO HIS PALATE DOTH PREPARE THE CUP IF IT BE POISONED TIS THE LESSER SIN THAT MINE EYE LOVES IT AND DOTH FIRST BEGIN",
      "duration_s": 25.395,
      "infer_time_s": 5.939,
      "rtf": 0.2339,
      "wer": 0.0566
    },
    {
      "id": "121-123859-0002",
      "ref": "BUT RECKONING TIME WHOSE MILLION'D ACCIDENTS CREEP IN TWIXT VOWS AND CHANGE DECREES OF KINGS TAN SACRED BEAUTY BLUNT THE SHARP'ST INTENTS DIVERT STRONG MINDS TO THE COURSE OF ALTERING THINGS ALAS WHY FEARING OF TIME'S TYRANNY MIGHT I NOT THEN SAY NOW I LOVE YOU BEST WHEN I WAS CERTAIN O'ER INCERTAINTY CROWNING THE PRESENT DOUBTING OF THE REST",
      "hyp": "But reckoning time , whose million ed accidents creep in twixt vows , and changed decrees of kings, tan sacred beauty blunt the sharpest intents , divert strong minds to the course of altering things. Alas , why fearing of time's tyranny, might I not then say, \"Now I love you best,\" when I was certain or in certainty, crowning the present, doubting of the rest.",
      "ref_norm": "BUT RECKONING TIME WHOSE MILLIOND ACCIDENTS CREEP IN TWIXT VOWS AND CHANGE DECREES OF KINGS TAN SACRED BEAUTY BLUNT THE SHARPST INTENTS DIVERT STRONG MINDS TO THE COURSE OF ALTERING THINGS ALAS WHY FEARING OF TIMES TYRANNY MIGHT I NOT THEN SAY NOW I LOVE YOU BEST WHEN I WAS CERTAIN OER INCERTAINTY CROWNING THE PRESENT DOUBTING OF THE REST",
      "hyp_norm": "BUT RECKONING TIME WHOSE MILLION ED ACCIDENTS CREEP IN TWIXT VOWS AND CHANGED DECREES OF KINGS TAN SACRED BEAUTY BLUNT THE SHARPEST INTENTS DIVERT STRONG MINDS TO THE COURSE OF ALTERING THINGS ALAS WHY FEARING OF TIMES TYRANNY MIGHT I NOT THEN SAY NOW I LOVE YOU BEST WHEN I WAS CERTAIN OR IN CERTAINTY CROWNING THE PRESENT DOUBTING OF THE REST",
      "duration_s": 30.04,
      "infer_time_s": 7.099,
      "rtf": 0.2363,
      "wer": 0.1167
    },
    {
      "id": "121-123859-0003",
      "ref": "LOVE IS A BABE THEN MIGHT I NOT SAY SO TO GIVE FULL GROWTH TO THAT WHICH STILL DOTH GROW",
      "hyp": "Love is a babe. Then might I not say so . To give full growth to that which still doth grow.",
      "ref_norm": "LOVE IS A BABE THEN MIGHT I NOT SAY SO TO GIVE FULL GROWTH TO THAT WHICH STILL DOTH GROW",
      "hyp_norm": "LOVE IS A BABE THEN MIGHT I NOT SAY SO TO GIVE FULL GROWTH TO THAT WHICH STILL DOTH GROW",
      "duration_s": 10.825,
      "infer_time_s": 2.171,
      "rtf": 0.2005,
      "wer": 0.0
    },
    {
      "id": "121-123859-0004",
      "ref": "SO I RETURN REBUK'D TO MY CONTENT AND GAIN BY ILL THRICE MORE THAN I HAVE SPENT",
      "hyp": "So I return rebuked to my content, and gain by ill thrice more than I have spent.",
      "ref_norm": "SO I RETURN REBUKD TO MY CONTENT AND GAIN BY ILL THRICE MORE THAN I HAVE SPENT",
      "hyp_norm": "SO I RETURN REBUKED TO MY CONTENT AND GAIN BY ILL THRICE MORE THAN I HAVE SPENT",
      "duration_s": 9.505,
      "infer_time_s": 1.871,
      "rtf": 0.1969,
      "wer": 0.0588
    },
    {
      "id": "121-127105-0000",
      "ref": "IT WAS THIS OBSERVATION THAT DREW FROM DOUGLAS NOT IMMEDIATELY BUT LATER IN THE EVENING A REPLY THAT HAD THE INTERESTING CONSEQUENCE TO WHICH I CALL ATTENTION",
      "hyp": "It was this observation that drew from Douglas , not immediately but later in the evening, a reply that had the interesting consequence to which I call attention.",
      "ref_norm": "IT WAS THIS OBSERVATION THAT DREW FROM DOUGLAS NOT IMMEDIATELY BUT LATER IN THE EVENING A REPLY THAT HAD THE INTERESTING CONSEQUENCE TO WHICH I CALL ATTENTION",
      "hyp_norm": "IT WAS THIS OBSERVATION THAT DREW FROM DOUGLAS NOT IMMEDIATELY BUT LATER IN THE EVENING A REPLY THAT HAD THE INTERESTING CONSEQUENCE TO WHICH I CALL ATTENTION",
      "duration_s": 9.875,
      "infer_time_s": 2.285,
      "rtf": 0.2314,
      "wer": 0.0
    },
    {
      "id": "121-127105-0001",
      "ref": "SOMEONE ELSE TOLD A STORY NOT PARTICULARLY EFFECTIVE WHICH I SAW HE WAS NOT FOLLOWING",
      "hyp": "Someone else told a story. Not particularly effective , which I saw he was not following.",
      "ref_norm": "SOMEONE ELSE TOLD A STORY NOT PARTICULARLY EFFECTIVE WHICH I SAW HE WAS NOT FOLLOWING",
      "hyp_norm": "SOMEONE ELSE TOLD A STORY NOT PARTICULARLY EFFECTIVE WHICH I SAW HE WAS NOT FOLLOWING",
      "duration_s": 5.025,
      "infer_time_s": 1.327,
      "rtf": 0.2641,
      "wer": 0.0
    },
    {
      "id": "121-127105-0002",
      "ref": "CRIED ONE OF THE WOMEN HE TOOK NO NOTICE OF HER HE LOOKED AT ME BUT AS IF INSTEAD OF ME HE SAW WHAT HE SPOKE OF",
      "hyp": "Cried one of the women. He took no notice of her. He looked at me, but as if , instead of me, he saw what he spoke of.",
      "ref_norm": "CRIED ONE OF THE WOMEN HE TOOK NO NOTICE OF HER HE LOOKED AT ME BUT AS IF INSTEAD OF ME HE SAW WHAT HE SPOKE OF",
      "hyp_norm": "CRIED ONE OF THE WOMEN HE TOOK NO NOTICE OF HER HE LOOKED AT ME BUT AS IF INSTEAD OF ME HE SAW WHAT HE SPOKE OF",
      "duration_s": 7.495,
      "infer_time_s": 2.31,
      "rtf": 0.3082,
      "wer": 0.0
    },
    {
      "id": "121-127105-0003",
      "ref": "THERE WAS A UNANIMOUS GROAN AT THIS AND MUCH REPROACH AFTER WHICH IN HIS PREOCCUPIED WAY HE EXPLAINED",
      "hyp": "There was a unanimous groan at this, and much reproach. After which, in his preoccupied way, he explained.",
      "ref_norm": "THERE WAS A UNANIMOUS GROAN AT THIS AND MUCH REPROACH AFTER WHICH IN HIS PREOCCUPIED WAY HE EXPLAINED",
      "hyp_norm": "THERE WAS A UNANIMOUS GROAN AT THIS AND MUCH REPROACH AFTER WHICH IN HIS PREOCCUPIED WAY HE EXPLAINED",
      "duration_s": 7.725,
      "infer_time_s": 1.903,
      "rtf": 0.2464,
      "wer": 0.0
    },
    {
      "id": "121-127105-0004",
      "ref": "THE STORY'S WRITTEN",
      "hyp": "The story's written.",
      "ref_norm": "THE STORYS WRITTEN",
      "hyp_norm": "THE STORYS WRITTEN",
      "duration_s": 2.11,
      "infer_time_s": 0.526,
      "rtf": 0.2495,
      "wer": 0.0
    },
    {
      "id": "121-127105-0005",
      "ref": "I COULD WRITE TO MY MAN AND ENCLOSE THE KEY HE COULD SEND DOWN THE PACKET AS HE FINDS IT",
      "hyp": "I could write to my man and enclose the key . He could send down the packet as he finds it.",
      "ref_norm": "I COULD WRITE TO MY MAN AND ENCLOSE THE KEY HE COULD SEND DOWN THE PACKET AS HE FINDS IT",
      "hyp_norm": "I COULD WRITE TO MY MAN AND ENCLOSE THE KEY HE COULD SEND DOWN THE PACKET AS HE FINDS IT",
      "duration_s": 5.82,
      "infer_time_s": 1.587,
      "rtf": 0.2727,
      "wer": 0.0
    },
    {
      "id": "121-127105-0006",
      "ref": "THE OTHERS RESENTED POSTPONEMENT BUT IT WAS JUST HIS SCRUPLES THAT CHARMED ME",
      "hyp": "The others resented postpon ement, but it was just his scruples that charmed me.",
      "ref_norm": "THE OTHERS RESENTED POSTPONEMENT BUT IT WAS JUST HIS SCRUPLES THAT CHARMED ME",
      "hyp_norm": "THE OTHERS RESENTED POSTPON EMENT BUT IT WAS JUST HIS SCRUPLES THAT CHARMED ME",
      "duration_s": 4.725,
      "infer_time_s": 1.386,
      "rtf": 0.2934,
      "wer": 0.1538
    },
    {
      "id": "121-127105-0007",
      "ref": "TO THIS HIS ANSWER WAS PROMPT OH THANK GOD NO AND IS THE RECORD YOURS",
      "hyp": "To this, his answer was prompt: \"Oh, thank God, no.\" And is the record yours?",
      "ref_norm": "TO THIS HIS ANSWER WAS PROMPT OH THANK GOD NO AND IS THE RECORD YOURS",
      "hyp_norm": "TO THIS HIS ANSWER WAS PROMPT OH THANK GOD NO AND IS THE RECORD YOURS",
      "duration_s": 5.79,
      "infer_time_s": 1.545,
      "rtf": 0.2669,
      "wer": 0.0
    },
    {
      "id": "121-127105-0008",
      "ref": "HE HUNG FIRE AGAIN A WOMAN'S",
      "hyp": "He hung fire again \u2014a woman's.",
      "ref_norm": "HE HUNG FIRE AGAIN A WOMANS",
      "hyp_norm": "HE HUNG FIRE AGAIN A WOMANS",
      "duration_s": 2.76,
      "infer_time_s": 0.685,
      "rtf": 0.2482,
      "wer": 0.0
    },
    {
      "id": "121-127105-0009",
      "ref": "SHE HAS BEEN DEAD THESE TWENTY YEARS",
      "hyp": "She has been dead these twenty years.",
      "ref_norm": "SHE HAS BEEN DEAD THESE TWENTY YEARS",
      "hyp_norm": "SHE HAS BEEN DEAD THESE TWENTY YEARS",
      "duration_s": 2.29,
      "infer_time_s": 0.681,
      "rtf": 0.2973,
      "wer": 0.0
    },
    {
      "id": "121-127105-0010",

Download .txt

gitextract_j2uu9au5/

├── .dockerignore
├── .github/
│   └── workflows/
│       ├── ci.yml
│       └── publish-docker.yml
├── .gitignore
├── AGENTS.md
├── CHANGES.md
├── CLAUDE.md
├── CONTRIBUTING.md
├── DEV_NOTES.md
├── Dockerfile
├── Dockerfile.cpu
├── LICENSE
├── README.md
├── benchmark_mlx_simul.py
├── benchmarks/
│   ├── h100/
│   │   ├── bench_voxtral_hf_batch.py
│   │   ├── bench_voxtral_vllm_realtime.py
│   │   ├── generate_figures.py
│   │   └── results.json
│   └── m5/
│       ├── bench_0.6b_simul_500.json
│       ├── bench_1.7b_simul_500.json
│       ├── generate_figures.py
│       └── results.json
├── chrome-extension/
│   ├── README.md
│   ├── background.js
│   ├── manifest.json
│   ├── requestPermissions.html
│   ├── requestPermissions.js
│   └── sidepanel.js
├── compose.yml
├── docs/
│   ├── API.md
│   ├── alignement_principles.md
│   ├── default_and_custom_models.md
│   ├── supported_languages.md
│   ├── technical_integration.md
│   └── troubleshooting.md
├── pyproject.toml
├── scripts/
│   ├── alignment_heads_qwen3_asr_0.6B.json
│   ├── alignment_heads_qwen3_asr_1.7B.json
│   ├── alignment_heads_qwen3_asr_1.7B_v2.json
│   ├── convert_hf_whisper.py
│   ├── create_long_samples.py
│   ├── detect_alignment_heads_qwen3.py
│   ├── determine_alignment_heads.py
│   ├── generate_architecture.py
│   ├── python_support_matrix.py
│   ├── run_scatter_benchmark.py
│   └── sync_extension.py
├── tests/
│   ├── __init__.py
│   └── test_pipeline.py
└── whisperlivekit/
    ├── __init__.py
    ├── audio_processor.py
    ├── backend_support.py
    ├── basic_server.py
    ├── benchmark/
    │   ├── __init__.py
    │   ├── compat.py
    │   ├── datasets.py
    │   ├── metrics.py
    │   ├── report.py
    │   └── runner.py
    ├── cascade_bridge.py
    ├── cli.py
    ├── config.py
    ├── core.py
    ├── deepgram_compat.py
    ├── diarization/
    │   ├── __init__.py
    │   ├── diart_backend.py
    │   ├── sortformer_backend.py
    │   └── utils.py
    ├── diff_protocol.py
    ├── ffmpeg_manager.py
    ├── local_agreement/
    │   ├── __init__.py
    │   ├── backends.py
    │   ├── online_asr.py
    │   └── whisper_online.py
    ├── metrics.py
    ├── metrics_collector.py
    ├── model_mapping.py
    ├── model_paths.py
    ├── parse_args.py
    ├── qwen3_asr.py
    ├── qwen3_mlx_asr.py
    ├── qwen3_mlx_simul.py
    ├── qwen3_simul.py
    ├── qwen3_simul_kv.py
    ├── session_asr_proxy.py
    ├── silero_vad_iterator.py
    ├── silero_vad_models/
    │   ├── __init__.py
    │   ├── silero_vad.jit
    │   ├── silero_vad.onnx
    │   ├── silero_vad_16k_op15.onnx
    │   └── silero_vad_half.onnx
    ├── simul_whisper/
    │   ├── __init__.py
    │   ├── align_att_base.py
    │   ├── backend.py
    │   ├── beam.py
    │   ├── config.py
    │   ├── decoder_state.py
    │   ├── eow_detection.py
    │   ├── mlx/
    │   │   ├── __init__.py
    │   │   ├── decoder_state.py
    │   │   ├── decoders.py
    │   │   └── simul_whisper.py
    │   ├── mlx_encoder.py
    │   ├── simul_whisper.py
    │   └── token_buffer.py
    ├── test_client.py
    ├── test_data.py
    ├── test_harness.py
    ├── thread_safety.py
    ├── timed_objects.py
    ├── tokens_alignment.py
    ├── vllm_realtime.py
    ├── voxtral_hf_streaming.py
    ├── voxtral_mlx/
    │   ├── __init__.py
    │   ├── loader.py
    │   ├── model.py
    │   └── spectrogram.py
    ├── voxtral_mlx_asr.py
    ├── warmup.py
    ├── web/
    │   ├── __init__.py
    │   ├── live_transcription.css
    │   ├── live_transcription.html
    │   ├── live_transcription.js
    │   ├── pcm_worklet.js
    │   ├── recorder_worker.js
    │   └── web_interface.py
    └── whisper/
        ├── __init__.py
        ├── __main__.py
        ├── assets/
        │   ├── __init__.py
        │   ├── gpt2.tiktoken
        │   ├── mel_filters.npz
        │   └── multilingual.tiktoken
        ├── audio.py
        ├── decoding.py
        ├── model.py
        ├── normalizers/
        │   ├── __init__.py
        │   ├── basic.py
        │   ├── english.json
        │   └── english.py
        ├── timing.py
        ├── tokenizer.py
        ├── transcribe.py
        ├── triton_ops.py
        ├── utils.py
        ├── val.py
        └── version.py

Download .txt

SYMBOL INDEX (1208 symbols across 89 files)

FILE: benchmark_mlx_simul.py
  function load_librispeech_utterances (line 63) | def load_librispeech_utterances(data_dir: str, max_utterances: int = 0):
  function load_librispeech_chapters (line 98) | def load_librispeech_chapters(data_dir: str):
  function transcribe_simul (line 153) | def transcribe_simul(asr, audio, chunk_seconds=2.0):
  function transcribe_single_shot (line 191) | def transcribe_single_shot(asr, audio):
  function normalize_text (line 216) | def normalize_text(text: str) -> str:
  function main (line 224) | def main():

FILE: benchmarks/h100/bench_voxtral_hf_batch.py
  function norm (line 14) | def norm(t):
  function load_audio (line 17) | def load_audio(path):
  function transcribe_batch (line 30) | def transcribe_batch(audio_np):

FILE: benchmarks/h100/bench_voxtral_vllm_realtime.py
  function norm (line 12) | def norm(t):
  function transcribe (line 15) | async def transcribe(audio_path, max_tokens=4096):
  function main (line 57) | async def main():

FILE: benchmarks/h100/generate_figures.py
  function _save (line 38) | def _save(fig, name):
  function fig_scatter_clean (line 48) | def fig_scatter_clean():
  function fig_scatter_acl6060 (line 101) | def fig_scatter_acl6060():
  function fig_bars (line 143) | def fig_bars():
  function fig_robustness (line 195) | def fig_robustness():
  function fig_per_talk (line 233) | def fig_per_talk():

FILE: benchmarks/m5/generate_figures.py
  function _save (line 43) | def _save(fig, name):
  function fig_m5_vs_h100 (line 50) | def fig_m5_vs_h100():

FILE: chrome-extension/requestPermissions.js
  function getUserPermission (line 5) | async function getUserPermission() {

FILE: chrome-extension/sidepanel.js
  function run (line 3) | async function run() {

FILE: scripts/convert_hf_whisper.py
  function _load_state_dict (line 23) | def _load_state_dict(repo_path: Path) -> Dict[str, torch.Tensor]:
  function _load_config (line 45) | def _load_config(repo_path: Path) -> Dict:
  function _derive_audio_ctx (line 55) | def _derive_audio_ctx(chunk_length: float) -> Tuple[int, int]:
  function _build_dims (line 68) | def _build_dims(config: Dict, chunk_length: float) -> Dict:
  function _trim_positional_embedding (line 88) | def _trim_positional_embedding(
  function convert_checkpoint (line 105) | def convert_checkpoint(hf_path: Path, output_path: Path, chunk_length: f...
  function parse_args (line 119) | def parse_args() -> argparse.Namespace:
  function main (line 143) | def main():

FILE: scripts/create_long_samples.py
  function save_wav (line 20) | def save_wav(path, audio, sr=SR):
  function decode_audio (line 30) | def decode_audio(audio_bytes):
  function download_long_librispeech (line 36) | def download_long_librispeech(config, lang_code, target_dur=300):
  function download_long_mls (line 76) | def download_long_mls(config, lang_code, target_dur=300):
  function main (line 115) | def main():

FILE: scripts/detect_alignment_heads_qwen3.py
  function _apply_transformers_compat_patches (line 50) | def _apply_transformers_compat_patches():
  function text_similarity (line 137) | def text_similarity(generated: str, reference: str) -> float:
  function load_dataset_clips (line 155) | def load_dataset_clips(name, config, split, limit):
  function get_device (line 177) | def get_device():
  function load_qwen3_asr (line 190) | def load_qwen3_asr(model_id: str, device: torch.device, dtype: torch.dty...
  function find_audio_token_range (line 235) | def find_audio_token_range(input_ids: torch.Tensor, audio_token_id: int)...
  function timestamp_to_audio_token_position (line 244) | def timestamp_to_audio_token_position(
  function run_detection (line 264) | def run_detection(
  function main (line 536) | def main():

FILE: scripts/determine_alignment_heads.py
  function load_dataset_clips (line 33) | def load_dataset_clips(name, config, split, limit):
  function load_clips (line 53) | def load_clips(args):
  function _waveform_from_source (line 62) | def _waveform_from_source(source: AudioInput) -> torch.Tensor:
  function _parse_args (line 67) | def _parse_args():
  function collect_heads (line 125) | def collect_heads(
  function _select_heads_for_visualization (line 181) | def _select_heads_for_visualization(selection, strengths, top_k):
  function _extract_heatmaps (line 193) | def _extract_heatmaps(
  function _plot_heatmaps (line 245) | def _plot_heatmaps(
  function _dump_mask (line 270) | def _dump_mask(mask: torch.Tensor, output_path: str):
  function main (line 277) | def main():

FILE: scripts/generate_architecture.py
  function box (line 36) | def box(x, y, w, h, label, color=C_BORDER, bg=C_BOX_BG, fontsize=8, bold...
  function arrow (line 50) | def arrow(x1, y1, x2, y2, color=C_TEXTDIM, style="->", lw=1.2):
  function section_box (line 55) | def section_box(x, y, w, h, title, bg=C_PANEL, border=C_BORDER, title_co...

FILE: scripts/python_support_matrix.py
  class MatrixRow (line 35) | class MatrixRow:
  class CaseResult (line 87) | class CaseResult:
  function parse_args (line 97) | def parse_args() -> argparse.Namespace:
  function safe_slug (line 115) | def safe_slug(text: str) -> str:
  function status_style (line 119) | def status_style(status: str) -> str:
  function print_line (line 129) | def print_line(message: str, style: str | None = None) -> None:
  function tail_text (line 139) | def tail_text(text: str | None, max_chars: int = 220) -> str:
  function run_command (line 148) | def run_command(
  function detect_gpu_available (line 218) | def detect_gpu_available() -> bool:
  function download_sample (line 232) | def download_sample(repo_root: Path) -> Path:
  function sync_case_environment (line 252) | def sync_case_environment(
  function apply_expected_failure_policy (line 276) | def apply_expected_failure_policy(result: CaseResult) -> CaseResult:
  function build_offline_command (line 298) | def build_offline_command(
  function run_case (line 332) | def run_case(
  function print_summary (line 446) | def print_summary(results: list[CaseResult]) -> None:
  function main (line 519) | def main() -> int:

FILE: scripts/run_scatter_benchmark.py
  function is_backend_available (line 66) | def is_backend_available(backend):
  function get_system_info (line 88) | def get_system_info():
  function run_combo_on_samples (line 103) | async def run_combo_on_samples(combo, samples, lang="en", speed=0):
  function run_all (line 174) | async def run_all(combos, samples, lang="en", speed=0):
  function get_long_samples_for_lang (line 191) | def get_long_samples_for_lang(lang="en"):
  function generate_scatter (line 213) | def generate_scatter(results, system_info, output_path, n_samples, lang=...
  function main (line 348) | def main():

FILE: scripts/sync_extension.py
  function sync_extension_files (line 7) | def sync_extension_files():

FILE: tests/test_pipeline.py
  function backend_kwargs (line 83) | def backend_kwargs(backend: str) -> dict:
  function samples (line 92) | def samples():
  function short_sample (line 99) | def short_sample(samples):
  function medium_sample (line 104) | def medium_sample(samples):
  function meeting_sample (line 109) | def meeting_sample(samples):
  function test_transcription_quality (line 119) | async def test_transcription_quality(backend, short_sample):
  function test_medium_clip_timing_spans_audio (line 141) | async def test_medium_clip_timing_spans_audio(backend, medium_sample):
  function test_text_appears_progressively (line 173) | async def test_text_appears_progressively(backend, medium_sample):
  function test_buffer_lifecycle (line 207) | async def test_buffer_lifecycle(backend, medium_sample):
  function test_silence_flushes_all_words (line 232) | async def test_silence_flushes_all_words(backend, medium_sample):
  function test_play_pause_resume (line 290) | async def test_play_pause_resume(backend, medium_sample):
  function test_multiple_pauses (line 336) | async def test_multiple_pauses(backend, medium_sample):
  function test_short_pause_no_silence (line 378) | async def test_short_pause_no_silence(backend, medium_sample):
  function test_abrupt_cutoff (line 413) | async def test_abrupt_cutoff(backend, medium_sample):
  function test_timing_precision_and_monotonicity (line 443) | async def test_timing_precision_and_monotonicity(backend, medium_sample):
  function test_silence_timing_reflects_pause (line 469) | async def test_silence_timing_reflects_pause(backend, short_sample):
  function test_snapshot_history (line 503) | async def test_snapshot_history(backend, medium_sample):
  function test_metrics_collected (line 532) | async def test_metrics_collected(backend, short_sample):

FILE: whisperlivekit/audio_processor.py
  function get_all_from_queue (line 28) | async def get_all_from_queue(queue: asyncio.Queue) -> Union[object, Sile...
  class AudioProcessor (line 54) | class AudioProcessor:
    method __init__ (line 60) | def __init__(self, **kwargs: Any) -> None:
    method _push_silence_event (line 140) | async def _push_silence_event(self) -> None:
    method _begin_silence (line 148) | async def _begin_silence(self, at_sample: Optional[int] = None) -> None:
    method _end_silence (line 168) | async def _end_silence(self, at_sample: Optional[int] = None) -> None:
    method _enqueue_active_audio (line 188) | async def _enqueue_active_audio(self, pcm_chunk: np.ndarray) -> None:
    method _slice_before_silence (line 196) | def _slice_before_silence(self, pcm_array: np.ndarray, chunk_sample_st...
    method convert_pcm_to_float (line 207) | def convert_pcm_to_float(self, pcm_buffer: Union[bytes, bytearray]) ->...
    method get_current_state (line 211) | async def get_current_state(self) -> State:
    method ffmpeg_stdout_reader (line 230) | async def ffmpeg_stdout_reader(self) -> None:
    method _finish_transcription (line 280) | async def _finish_transcription(self) -> None:
    method transcription_processor (line 309) | async def transcription_processor(self) -> None:
    method diarization_processor (line 421) | async def diarization_processor(self) -> None:
    method translation_processor (line 444) | async def translation_processor(self) -> None:
    method results_formatter (line 479) | async def results_formatter(self) -> AsyncGenerator[FrontData, None]:
    method create_tasks (line 530) | async def create_tasks(self) -> AsyncGenerator[FrontData, None]:
    method watchdog (line 571) | async def watchdog(self, tasks_to_monitor: List[asyncio.Task]) -> None:
    method cleanup (line 598) | async def cleanup(self) -> None:
    method _processing_tasks_done (line 625) | def _processing_tasks_done(self) -> bool:
    method process_audio (line 636) | async def process_audio(self, message: Optional[bytes]) -> None:
    method handle_pcm_data (line 682) | async def handle_pcm_data(self) -> None:
    method _flush_remaining_pcm (line 734) | async def _flush_remaining_pcm(self) -> None:

FILE: whisperlivekit/backend_support.py
  function module_available (line 8) | def module_available(module_name):
  function mlx_backend_available (line 13) | def mlx_backend_available(warn_on_missing = False):
  function voxtral_hf_backend_available (line 32) | def voxtral_hf_backend_available():
  function faster_backend_available (line 38) | def faster_backend_available(warn_on_missing = False):

FILE: whisperlivekit/basic_server.py
  function lifespan (line 22) | async def lifespan(app: FastAPI):
  function get (line 37) | async def get():
  function health (line 42) | async def health():
  function handle_websocket_results (line 53) | async def handle_websocket_results(websocket, results_generator, diff_tr...
  function websocket_endpoint (line 71) | async def websocket_endpoint(websocket: WebSocket):
  function deepgram_websocket_endpoint (line 134) | async def deepgram_websocket_endpoint(websocket: WebSocket):
  function _convert_to_pcm (line 145) | async def _convert_to_pcm(audio_bytes: bytes) -> bytes:
  function _parse_time_str (line 164) | def _parse_time_str(time_str: str) -> float:
  function _format_openai_response (line 174) | def _format_openai_response(front_data, response_format: str, language: ...
  function _srt_timestamp (line 239) | def _srt_timestamp(seconds: float, fmt: str) -> str:
  function create_transcription (line 250) | async def create_transcription(
  function list_models (line 321) | async def list_models():
  function main (line 336) | def main():

FILE: whisperlivekit/benchmark/compat.py
  function backend_supports_language (line 30) | def backend_supports_language(backend: str, language: str) -> bool:
  function detect_available_backends (line 38) | def detect_available_backends() -> List[str]:
  function resolve_backend (line 85) | def resolve_backend(backend: str) -> str:

FILE: whisperlivekit/benchmark/datasets.py
  class BenchmarkSample (line 33) | class BenchmarkSample:
    method to_dict (line 47) | def to_dict(self) -> Dict:
  function _save_wav (line 218) | def _save_wav(path: Path, audio: np.ndarray, sample_rate: int = 16000) -...
  function _decode_audio (line 234) | def _decode_audio(audio_bytes: bytes) -> tuple:
  function _ensure_datasets (line 241) | def _ensure_datasets():
  function _download_librispeech (line 255) | def _download_librispeech(config: str, n_samples: int, skip: int,
  function _download_mls (line 299) | def _download_mls(config: str, n_samples: int, skip: int,
  function _download_fleurs (line 342) | def _download_fleurs(config: str, n_samples: int, skip: int,
  function _download_ami (line 385) | def _download_ami(max_duration: float = 60.0) -> List[Dict]:
  function _download_catalog_entry (line 444) | def _download_catalog_entry(name: str, spec: Dict) -> List[Dict]:
  function get_benchmark_samples (line 479) | def get_benchmark_samples(

FILE: whisperlivekit/benchmark/metrics.py
  class SampleResult (line 11) | class SampleResult:
    method to_dict (line 51) | def to_dict(self) -> Dict[str, Any]:
  class BenchmarkReport (line 77) | class BenchmarkReport:
    method n_samples (line 89) | def n_samples(self) -> int:
    method total_audio_s (line 93) | def total_audio_s(self) -> float:
    method total_processing_s (line 97) | def total_processing_s(self) -> float:
    method avg_wer (line 101) | def avg_wer(self) -> float:
    method weighted_wer (line 107) | def weighted_wer(self) -> float:
    method avg_rtf (line 119) | def avg_rtf(self) -> float:
    method overall_rtf (line 125) | def overall_rtf(self) -> float:
    method avg_latency_ms (line 131) | def avg_latency_ms(self) -> float:
    method p95_latency_ms (line 136) | def p95_latency_ms(self) -> float:
    method _group_by (line 142) | def _group_by(self, key: str) -> Dict[str, List[SampleResult]]:
    method wer_by_language (line 149) | def wer_by_language(self) -> Dict[str, float]:
    method rtf_by_language (line 155) | def rtf_by_language(self) -> Dict[str, float]:
    method wer_by_category (line 161) | def wer_by_category(self) -> Dict[str, float]:
    method languages (line 168) | def languages(self) -> List[str]:
    method categories (line 172) | def categories(self) -> List[str]:
    method to_dict (line 175) | def to_dict(self) -> Dict[str, Any]:
  function get_system_info (line 208) | def get_system_info() -> Dict[str, Any]:

FILE: whisperlivekit/benchmark/report.py
  function _wer_color (line 20) | def _wer_color(wer: float) -> str:
  function _rtf_color (line 28) | def _rtf_color(rtf: float) -> str:
  function _lat_color (line 36) | def _lat_color(ms: float) -> str:
  function print_report (line 44) | def print_report(report: BenchmarkReport, out: TextIO = sys.stderr) -> N...
  function print_transcriptions (line 143) | def print_transcriptions(report: BenchmarkReport, out: TextIO = sys.stde...
  function write_json (line 159) | def write_json(report: BenchmarkReport, path: str) -> None:

FILE: whisperlivekit/benchmark/runner.py
  class BenchmarkRunner (line 15) | class BenchmarkRunner:
    method __init__ (line 28) | def __init__(
    method run (line 46) | async def run(self) -> BenchmarkReport:
    method _run_sample (line 105) | async def _run_sample(

FILE: whisperlivekit/cascade_bridge.py
  class CascadeBridge (line 24) | class CascadeBridge:
    method __init__ (line 27) | def __init__(self, output_file: TextIO = None):
    method emit_tokens (line 32) | def emit_tokens(self, tokens: List[ASRToken], is_final: bool = False):
    method get_entries (line 48) | def get_entries(self) -> List[dict]:
    method get_text (line 51) | def get_text(self) -> str:
    method save (line 55) | def save(self, path: str):
  function run_stt_to_jsonl (line 62) | def run_stt_to_jsonl(

FILE: whisperlivekit/cli.py
  function _module_available (line 28) | def _module_available(name: str) -> bool:
  function _gpu_info (line 32) | def _gpu_info() -> str:
  function _check_platform (line 212) | def _check_platform(backend: dict) -> bool:
  function _is_installed (line 222) | def _is_installed(backend: dict) -> bool:
  function _check_ffmpeg (line 226) | def _check_ffmpeg() -> bool:
  function _scan_downloaded_models (line 232) | def _scan_downloaded_models() -> dict:
  function print_banner (line 266) | def print_banner(config, host: str, port: int, ssl: bool = False):
  function _model_is_downloaded (line 306) | def _model_is_downloaded(model_entry: dict, downloaded: dict) -> bool:
  function _best_backend_for_model (line 332) | def _best_backend_for_model(model_entry: dict) -> str:
  function cmd_models (line 357) | def cmd_models():
  function _hf_download (line 438) | def _hf_download(repo_id: str, label: str):
  function _resolve_pull_target (line 447) | def _resolve_pull_target(spec: str):
  function cmd_pull (line 544) | def cmd_pull(spec: str):
  function cmd_transcribe (line 568) | def cmd_transcribe(args: list):
  function _transcribe_files_quiet (line 602) | async def _transcribe_files_quiet(parsed):
  function _transcribe_files (line 618) | async def _transcribe_files(parsed):
  function _format_subtitle (line 679) | def _format_subtitle(result, fmt: str) -> str:
  function _subtitle_timestamp (line 710) | def _subtitle_timestamp(seconds: float, fmt: str) -> str:
  function cmd_bench (line 724) | def cmd_bench(args: list):
  function _suppress_logging (line 777) | def _suppress_logging():
  function _run_bench_new (line 788) | async def _run_bench_new(parsed, languages, categories):
  function cmd_listen (line 828) | def cmd_listen(args: list):
  function _listen_quiet (line 863) | async def _listen_quiet(parsed):
  function _listen_main (line 875) | async def _listen_main(parsed):
  function _resolve_run_spec (line 1005) | def _resolve_run_spec(spec: str):
  function cmd_run (line 1030) | def cmd_run(args: list):
  function cmd_rm (line 1098) | def cmd_rm(spec: str):
  function cmd_check (line 1158) | def cmd_check():
  function cmd_diagnose (line 1192) | def cmd_diagnose(args: list):
  function _probe_backend_state (line 1225) | def _probe_backend_state(processor) -> dict:
  function _probe_pipeline_state (line 1295) | def _probe_pipeline_state(processor) -> dict:
  function _diagnose_main (line 1313) | async def _diagnose_main(parsed):
  function _print_version (line 1582) | def _print_version():
  function _print_help (line 1592) | def _print_help():
  function main (line 1630) | def main():

FILE: whisperlivekit/config.py
  class WhisperLiveKitConfig (line 10) | class WhisperLiveKitConfig:
    method __post_init__ (line 79) | def __post_init__(self):
    method from_namespace (line 94) | def from_namespace(cls, ns) -> "WhisperLiveKitConfig":
    method from_kwargs (line 100) | def from_kwargs(cls, **kwargs) -> "WhisperLiveKitConfig":

FILE: whisperlivekit/core.py
  class TranscriptionEngine (line 13) | class TranscriptionEngine:
    method __new__ (line 18) | def __new__(cls, *args, **kwargs):
    method reset (line 28) | def reset(cls):
    method __init__ (line 38) | def __init__(self, config=None, **kwargs):
    method _do_init (line 56) | def _do_init(self, config=None, **kwargs):
  function online_factory (line 237) | def online_factory(args, asr, language=None):
  function online_diarization_factory (line 282) | def online_diarization_factory(args, diarization_backend):
  function online_translation_factory (line 294) | def online_translation_factory(args, translation_model):

FILE: whisperlivekit/deepgram_compat.py
  function _parse_time_str (line 28) | def _parse_time_str(time_str: str) -> float:
  function _line_to_words (line 38) | def _line_to_words(line: dict) -> list:
  function _lines_to_result (line 74) | def _lines_to_result(lines: list, is_final: bool, speech_final: bool,
  class DeepgramAdapter (line 120) | class DeepgramAdapter:
    method __init__ (line 123) | def __init__(self, websocket: WebSocket):
    method send_metadata (line 132) | async def send_metadata(self, config):
    method process_update (line 152) | async def process_update(self, front_data_dict: dict):
  function handle_deepgram_websocket (line 219) | async def handle_deepgram_websocket(websocket: WebSocket, transcription_...

FILE: whisperlivekit/diarization/diart_backend.py
  class DiarizationObserver (line 21) | class DiarizationObserver(Observer):
    method __init__ (line 24) | def __init__(self):
    method on_next (line 30) | def on_next(self, value: Tuple[Annotation, Any]):
    method get_segments (line 55) | def get_segments(self) -> List[SpeakerSegment]:
    method clear_old_segments (line 60) | def clear_old_segments(self, older_than: float = 30.0):
    method on_error (line 69) | def on_error(self, error):
    method on_completed (line 73) | def on_completed(self):
  class WebSocketAudioSource (line 78) | class WebSocketAudioSource(AudioSource):
    method __init__ (line 82) | def __init__(self, uri: str = "websocket", sample_rate: int = 16000, b...
    method read (line 94) | def read(self):
    method _process_chunks (line 104) | def _process_chunks(self):
    method close (line 150) | def close(self):
    method push_audio (line 155) | def push_audio(self, chunk: np.ndarray):
  class DiartDiarization (line 164) | class DiartDiarization:
    method __init__ (line 165) | def __init__(self, sample_rate: int = 16000, config : SpeakerDiarizati...
    method insert_silence (line 198) | def insert_silence(self, silence_duration):
    method insert_audio_chunk (line 201) | def insert_audio_chunk(self, pcm_array: np.ndarray):
    method diarize (line 206) | async def diarize(self):
    method close (line 210) | def close(self):
  function concatenate_speakers (line 216) | def concatenate_speakers(segments):
  function add_speaker_to_tokens (line 230) | def add_speaker_to_tokens(segments, tokens):
  function visualize_tokens (line 274) | def visualize_tokens(tokens):

FILE: whisperlivekit/diarization/sortformer_backend.py
  class StreamingSortformerState (line 20) | class StreamingSortformerState:
    method __init__ (line 37) | def __init__(self):
  class SortformerDiarization (line 49) | class SortformerDiarization:
    method __init__ (line 50) | def __init__(self, model_name: str = "nvidia/diar_streaming_sortformer...
    method _load_model (line 56) | def _load_model(self, model_name: str):
  class SortformerDiarizationOnline (line 86) | class SortformerDiarizationOnline:
    method __init__ (line 87) | def __init__(self, shared_model, sample_rate: int = 16000):
    method _init_streaming_state (line 136) | def _init_streaming_state(self):
    method insert_silence (line 160) | def insert_silence(self, silence_duration: Optional[float]):
    method insert_audio_chunk (line 171) | def insert_audio_chunk(self, pcm_array: np.ndarray):
    method diarize (line 177) | async def diarize(self):
    method _process_predictions (line 230) | def _process_predictions(self):
    method get_segments (line 266) | def get_segments(self) -> List[SpeakerSegment]:
    method close (line 271) | def close(self):
  function main (line 295) | async def main():

FILE: whisperlivekit/diarization/utils.py
  function extract_number (line 4) | def extract_number(s: str) -> int:

FILE: whisperlivekit/diff_protocol.py
  class DiffTracker (line 32) | class DiffTracker:
    method to_message (line 39) | def to_message(self, front_data: FrontData) -> Dict[str, Any]:
    method reset (line 101) | def reset(self) -> None:

FILE: whisperlivekit/ffmpeg_manager.py
  class FFmpegState (line 32) | class FFmpegState(Enum):
  class FFmpegManager (line 39) | class FFmpegManager:
    method __init__ (line 40) | def __init__(self, sample_rate: int = 16000, channels: int = 1):
    method start (line 52) | async def start(self) -> bool:
    method stop (line 103) | async def stop(self):
    method write_data (line 123) | async def write_data(self, data: bytes) -> bool:
    method read_data (line 139) | async def read_data(self, size: int) -> Optional[bytes]:
    method get_state (line 160) | async def get_state(self) -> FFmpegState:
    method restart (line 164) | async def restart(self) -> bool:
    method _drain_stderr (line 185) | async def _drain_stderr(self):

FILE: whisperlivekit/local_agreement/backends.py
  class ASRBase (line 15) | class ASRBase:
    method __init__ (line 19) | def __init__(self, lan, model_size=None, cache_dir=None, model_dir=Non...
    method load_model (line 29) | def load_model(self, model_size, cache_dir, model_dir):
    method transcribe (line 32) | def transcribe(self, audio, init_prompt=""):
    method use_vad (line 35) | def use_vad(self):
  class WhisperASR (line 39) | class WhisperASR(ASRBase):
    method load_model (line 43) | def load_model(self, model_size=None, cache_dir=None, model_dir=None):
    method transcribe (line 62) | def transcribe(self, audio, init_prompt=""):
    method ts_words (line 79) | def ts_words(self, r) -> List[ASRToken]:
    method segments_end_ts (line 95) | def segments_end_ts(self, res) -> List[float]:
    method use_vad (line 98) | def use_vad(self):
  class FasterWhisperASR (line 101) | class FasterWhisperASR(ASRBase):
    method load_model (line 105) | def load_model(self, model_size=None, cache_dir=None, model_dir=None):
    method transcribe (line 129) | def transcribe(self, audio: np.ndarray, init_prompt: str = "") -> list:
    method ts_words (line 141) | def ts_words(self, segments) -> List[ASRToken]:
    method segments_end_ts (line 151) | def segments_end_ts(self, segments) -> List[float]:
    method use_vad (line 154) | def use_vad(self):
  class MLXWhisper (line 157) | class MLXWhisper(ASRBase):
    method load_model (line 163) | def load_model(self, model_size=None, cache_dir=None, model_dir=None):
    method translate_model_name (line 182) | def translate_model_name(self, model_name):
    method transcribe (line 190) | def transcribe(self, audio, init_prompt=""):
    method ts_words (line 203) | def ts_words(self, segments) -> List[ASRToken]:
    method segments_end_ts (line 213) | def segments_end_ts(self, res) -> List[float]:
    method use_vad (line 216) | def use_vad(self):
  class OpenaiApiASR (line 220) | class OpenaiApiASR(ASRBase):
    method __init__ (line 222) | def __init__(self, lan=None, temperature=0, logfile=sys.stderr):
    method load_model (line 233) | def load_model(self, *args, **kwargs):
    method ts_words (line 238) | def ts_words(self, segments) -> List[ASRToken]:
    method segments_end_ts (line 257) | def segments_end_ts(self, res) -> List[float]:
    method transcribe (line 260) | def transcribe(self, audio_data, prompt=None, *args, **kwargs):
    method use_vad (line 283) | def use_vad(self):

FILE: whisperlivekit/local_agreement/online_asr.py
  class HypothesisBuffer (line 11) | class HypothesisBuffer:
    method __init__ (line 20) | def __init__(self, logfile=sys.stderr, confidence_validation=False):
    method insert (line 29) | def insert(self, new_tokens: List[ASRToken], offset: float):
    method flush (line 59) | def flush(self) -> List[ASRToken]:
    method pop_committed (line 88) | def pop_committed(self, time: float):
  class OnlineASRProcessor (line 97) | class OnlineASRProcessor:
    method __init__ (line 108) | def __init__(
    method new_speaker (line 139) | def new_speaker(self, change_speaker):
    method init (line 144) | def init(self, offset: Optional[float] = None):
    method get_audio_buffer_end_time (line 153) | def get_audio_buffer_end_time(self) -> float:
    method insert_audio_chunk (line 157) | def insert_audio_chunk(self, audio: np.ndarray, audio_stream_end_time:...
    method start_silence (line 161) | def start_silence(self):
    method end_silence (line 166) | def end_silence(self, silence_duration: Optional[float], offset: float):
    method insert_silence (line 181) | def insert_silence(self, silence_duration, offset):
    method prompt (line 187) | def prompt(self) -> Tuple[str, str]:
    method get_buffer (line 211) | def get_buffer(self):
    method process_iter (line 218) | def process_iter(self) -> Tuple[List[ASRToken], float]:
    method chunk_completed_sentence (line 267) | def chunk_completed_sentence(self):
    method chunk_completed_segment (line 300) | def chunk_completed_segment(self, res):
    method chunk_at (line 338) | def chunk_at(self, time: float):
    method words_to_sentences (line 354) | def words_to_sentences(self, tokens: List[ASRToken]) -> List[Sentence]:
    method finish (line 399) | def finish(self) -> Tuple[List[ASRToken], float]:
    method concatenate_tokens (line 410) | def concatenate_tokens(

FILE: whisperlivekit/local_agreement/whisper_online.py
  function create_tokenizer (line 20) | def create_tokenizer(lan):
  function backend_factory (line 67) | def backend_factory(
  function _normalize_backend_choice (line 163) | def _normalize_backend_choice(

FILE: whisperlivekit/metrics.py
  function normalize_text (line 12) | def normalize_text(text: str) -> str:
  function compute_wer (line 24) | def compute_wer(reference: str, hypothesis: str) -> Dict:
  function compute_timestamp_accuracy (line 85) | def compute_timestamp_accuracy(

FILE: whisperlivekit/metrics_collector.py
  class SessionMetrics (line 16) | class SessionMetrics:
    method rtf (line 39) | def rtf(self) -> float:
    method avg_latency_ms (line 46) | def avg_latency_ms(self) -> float:
    method p95_latency_ms (line 53) | def p95_latency_ms(self) -> float:
    method to_dict (line 62) | def to_dict(self) -> Dict:
    method log_summary (line 79) | def log_summary(self) -> None:

FILE: whisperlivekit/model_paths.py
  class ModelInfo (line 9) | class ModelInfo:
    method has_pytorch (line 17) | def has_pytorch(self) -> bool:
    method is_sharded (line 21) | def is_sharded(self) -> bool:
    method primary_pytorch_file (line 25) | def primary_pytorch_file(self) -> Optional[Path]:
  function _is_ct2_model_bin (line 40) | def _is_ct2_model_bin(directory: Path, filename: str) -> bool:
  function _collect_pytorch_files (line 68) | def _collect_pytorch_files(directory: Path) -> List[Path]:
  function detect_model_format (line 135) | def detect_model_format(model_path: Union[str, Path]) -> ModelInfo:
  function model_path_and_type (line 180) | def model_path_and_type(model_path: Union[str, Path]) -> Tuple[Optional[...
  function resolve_model_path (line 195) | def resolve_model_path(model_path: Union[str, Path]) -> Path:

FILE: whisperlivekit/parse_args.py
  function parse_args (line 5) | def parse_args():

FILE: whisperlivekit/qwen3_asr.py
  function _patch_transformers_compat (line 14) | def _patch_transformers_compat():
  class Qwen3ASR (line 126) | class Qwen3ASR(ASRBase):
    method __init__ (line 132) | def __init__(self, lan="auto", model_size=None, cache_dir=None,
    method load_model (line 139) | def load_model(self, model_size=None, cache_dir=None, model_dir=None):
    method _qwen3_language (line 168) | def _qwen3_language(self) -> Optional[str]:
    method transcribe (line 173) | def transcribe(self, audio: np.ndarray, init_prompt: str = ""):
    method _detected_language (line 200) | def _detected_language(result) -> Optional[str]:
    method ts_words (line 211) | def ts_words(self, result) -> List[ASRToken]:
    method segments_end_ts (line 245) | def segments_end_ts(self, result) -> List[float]:
    method use_vad (line 259) | def use_vad(self):

FILE: whisperlivekit/qwen3_mlx_asr.py
  class Qwen3MLXASR (line 60) | class Qwen3MLXASR:
    method __init__ (line 67) | def __init__(self, logfile=sys.stderr, **kwargs):
    method transcribe (line 96) | def transcribe(self, audio):
  class Qwen3MLXOnlineProcessor (line 105) | class Qwen3MLXOnlineProcessor:
    method __init__ (line 123) | def __init__(self, asr: Qwen3MLXASR, logfile=sys.stderr):
    method insert_audio_chunk (line 155) | def insert_audio_chunk(self, audio: np.ndarray, audio_stream_end_time:...
    method _transcribe_buffer (line 162) | def _transcribe_buffer(self) -> List[ASRToken]:
    method _local_agreement (line 209) | def _local_agreement(self, new_tokens: List[ASRToken]) -> List[ASRToken]:
    method _trim_buffer_if_needed (line 260) | def _trim_buffer_if_needed(self):
    method process_iter (line 292) | def process_iter(self, is_last=False) -> Tuple[List[ASRToken], float]:
    method get_buffer (line 324) | def get_buffer(self) -> Transcript:
    method _flush_all (line 335) | def _flush_all(self) -> List[ASRToken]:
    method _reset_for_new_utterance (line 355) | def _reset_for_new_utterance(self):
    method start_silence (line 368) | def start_silence(self) -> Tuple[List[ASRToken], float]:
    method end_silence (line 379) | def end_silence(self, silence_duration: float, offset: float):
    method new_speaker (line 383) | def new_speaker(self, change_speaker):
    method warmup (line 386) | def warmup(self, audio, init_prompt=""):
    method finish (line 389) | def finish(self) -> Tuple[List[ASRToken], float]:

FILE: whisperlivekit/qwen3_mlx_simul.py
  class Qwen3MLXSimulConfig (line 67) | class Qwen3MLXSimulConfig:
  class _SessionState (line 84) | class _SessionState:
  class Qwen3MLXSimulStreamingASR (line 104) | class Qwen3MLXSimulStreamingASR:
    method __init__ (line 111) | def __init__(
    method _load_alignment_heads (line 187) | def _load_alignment_heads(
    method _warmup (line 216) | def _warmup(self, audio: np.ndarray):
    method transcribe (line 236) | def transcribe(self, audio):
  class _AttnCaptureWrapper (line 245) | class _AttnCaptureWrapper:
    method __init__ (line 259) | def __init__(self, original, layer_idx, head_indices, gqa_ratio,
    method __call__ (line 270) | def __call__(self, x, cos, sin, mask=None, cache=None, layer_idx=0):
    method __getattr__ (line 305) | def __getattr__(self, name):
  function _install_alignment_hooks (line 309) | def _install_alignment_hooks(model, heads_by_layer, gqa_ratio, audio_sta...
  function _remove_alignment_hooks (line 329) | def _remove_alignment_hooks(model, originals):
  class Qwen3MLXSimulStreamingOnlineProcessor (line 340) | class Qwen3MLXSimulStreamingOnlineProcessor:
    method __init__ (line 351) | def __init__(self, asr: Qwen3MLXSimulStreamingASR, logfile=sys.stderr):
    method speaker (line 361) | def speaker(self):
    method speaker (line 365) | def speaker(self, value):
    method global_time_offset (line 369) | def global_time_offset(self):
    method global_time_offset (line 373) | def global_time_offset(self, value):
    method insert_audio_chunk (line 378) | def insert_audio_chunk(self, audio: np.ndarray, audio_stream_end_time:...
    method process_iter (line 392) | def process_iter(self, is_last=False) -> Tuple[List[ASRToken], float]:
    method _infer (line 416) | def _infer(self, is_last: bool) -> List[ASRToken]:
    method _build_timestamped_words (line 625) | def _build_timestamped_words(
    method start_silence (line 697) | def start_silence(self) -> Tuple[List[ASRToken], float]:
    method end_silence (line 706) | def end_silence(self, silence_duration: float, offset: float):
    method new_speaker (line 720) | def new_speaker(self, change_speaker):
    method get_buffer (line 726) | def get_buffer(self) -> Transcript:
    method warmup (line 729) | def warmup(self, audio: np.ndarray, init_prompt: str = ""):
    method finish (line 739) | def finish(self) -> Tuple[List[ASRToken], float]:

FILE: whisperlivekit/qwen3_simul.py
  class Qwen3SimulConfig (line 52) | class Qwen3SimulConfig:
  class _AudioEmbedCache (line 70) | class _AudioEmbedCache:
    method trim_front (line 106) | def trim_front(self, trim_samples: int, sample_rate: int = 16000):
    method reset (line 119) | def reset(self):
  class Qwen3SimulState (line 128) | class Qwen3SimulState:
  class Qwen3SimulStreamingASR (line 154) | class Qwen3SimulStreamingASR:
    method __init__ (line 164) | def __init__(
    method _load_model (line 204) | def _load_model(self, model_size, model_dir, model_cache_dir, model_pa...
    method _load_alignment_heads (line 266) | def _load_alignment_heads(
    method _warmup (line 303) | def _warmup(self, audio: np.ndarray):
    method transcribe (line 330) | def transcribe(self, audio):
  class Qwen3SimulStreamingOnlineProcessor (line 335) | class Qwen3SimulStreamingOnlineProcessor:
    method __init__ (line 351) | def __init__(self, asr: Qwen3SimulStreamingASR, logfile=sys.stderr):
    method _build_prompt_template (line 363) | def _build_prompt_template(self):
    method speaker (line 382) | def speaker(self):
    method speaker (line 386) | def speaker(self, value):
    method global_time_offset (line 390) | def global_time_offset(self):
    method global_time_offset (line 394) | def global_time_offset(self, value):
    method insert_audio_chunk (line 397) | def insert_audio_chunk(self, audio: np.ndarray, audio_stream_end_time:...
    method start_silence (line 413) | def start_silence(self) -> Tuple[List[ASRToken], float]:
    method end_silence (line 427) | def end_silence(self, silence_duration: float, offset: float):
    method new_speaker (line 443) | def new_speaker(self, change_speaker: ChangeSpeaker):
    method get_buffer (line 450) | def get_buffer(self) -> Transcript:
    method _encode_audio_cached (line 454) | def _encode_audio_cached(self) -> Optional[torch.Tensor]:
    method _build_inputs_with_cached_audio (line 604) | def _build_inputs_with_cached_audio(
    method process_iter (line 697) | def process_iter(self, is_last=False) -> Tuple[List[ASRToken], float]:
    method _infer (line 737) | def _infer(self, is_last: bool) -> List[ASRToken]:
    method _build_timestamped_words (line 1085) | def _build_timestamped_words(
    method _median_frame (line 1164) | def _median_frame(frames: List[int]) -> Optional[int]:
    method warmup (line 1171) | def warmup(self, audio: np.ndarray, init_prompt: str = ""):
    method finish (line 1182) | def finish(self) -> Tuple[List[ASRToken], float]:

FILE: whisperlivekit/qwen3_simul_kv.py
  class Qwen3SimulKVConfig (line 36) | class Qwen3SimulKVConfig:
  class _AudioEmbedCache (line 52) | class _AudioEmbedCache:
    method reset (line 59) | def reset(self):
  class Qwen3SimulKVState (line 67) | class Qwen3SimulKVState:
    method reset_kv (line 98) | def reset_kv(self):
  class Qwen3SimulKVASR (line 110) | class Qwen3SimulKVASR:
    method __init__ (line 117) | def __init__(
    method _load_model (line 156) | def _load_model(self, model_size, model_dir, model_cache_dir, model_pa...
    method _load_alignment_heads (line 208) | def _load_alignment_heads(self, path):
    method _warmup (line 225) | def _warmup(self, audio):
    method transcribe (line 238) | def transcribe(self, audio):
  class Qwen3SimulKVOnlineProcessor (line 242) | class Qwen3SimulKVOnlineProcessor:
    method __init__ (line 254) | def __init__(self, asr: Qwen3SimulKVASR, logfile=sys.stderr):
    method _build_prompt_template (line 262) | def _build_prompt_template(self):
    method speaker (line 277) | def speaker(self):
    method speaker (line 281) | def speaker(self, value):
    method global_time_offset (line 285) | def global_time_offset(self):
    method global_time_offset (line 289) | def global_time_offset(self, value):
    method insert_audio_chunk (line 292) | def insert_audio_chunk(self, audio: np.ndarray, audio_stream_end_time:...
    method start_silence (line 305) | def start_silence(self) -> Tuple[List[ASRToken], float]:
    method end_silence (line 314) | def end_silence(self, silence_duration: float, offset: float):
    method new_speaker (line 327) | def new_speaker(self, change_speaker: ChangeSpeaker):
    method get_buffer (line 333) | def get_buffer(self) -> Transcript:
    method _encode_audio (line 336) | def _encode_audio(self) -> Tuple[torch.Tensor, int]:
    method _build_full_inputs (line 415) | def _build_full_inputs(self, audio_embeds: torch.Tensor) -> dict:
    method process_iter (line 475) | def process_iter(self, is_last=False) -> Tuple[List[ASRToken], float]:
    method _infer (line 500) | def _infer(self, is_last: bool) -> List[ASRToken]:
    method _build_timestamped_words (line 710) | def _build_timestamped_words(
    method warmup (line 775) | def warmup(self, audio: np.ndarray, init_prompt: str = ""):
    method finish (line 784) | def finish(self) -> Tuple[List[ASRToken], float]:

FILE: whisperlivekit/session_asr_proxy.py
  class SessionASRProxy (line 10) | class SessionASRProxy:
    method __init__ (line 22) | def __init__(self, asr, language: str):
    method __getattr__ (line 30) | def __getattr__(self, name):
    method transcribe (line 33) | def transcribe(self, audio, init_prompt=""):

FILE: whisperlivekit/silero_vad_iterator.py
  function is_onnx_available (line 11) | def is_onnx_available() -> bool:
  function init_jit_model (line 20) | def init_jit_model(model_path: str, device=torch.device('cpu')):
  class OnnxSession (line 27) | class OnnxSession():
    method __init__ (line 32) | def __init__(self, path, force_onnx_cpu=False):
  class OnnxWrapper (line 52) | class OnnxWrapper():
    method __init__ (line 57) | def __init__(self, session: OnnxSession, force_onnx_cpu=False):
    method session (line 63) | def session(self):
    method _validate_input (line 66) | def _validate_input(self, x, sr: int):
    method reset_states (line 84) | def reset_states(self, batch_size=1):
    method __call__ (line 90) | def __call__(self, x, sr: int):
  function _get_onnx_model_path (line 128) | def _get_onnx_model_path(model_path: str = None, opset_version: int = 16...
  function load_onnx_session (line 156) | def load_onnx_session(model_path: str = None, opset_version: int = 16, f...
  function load_jit_vad (line 164) | def load_jit_vad(model_path: str = None):
  class VADIterator (line 188) | class VADIterator:
    method __init__ (line 195) | def __init__(self,
    method reset_states (line 235) | def reset_states(self):
    method __call__ (line 243) | def __call__(self, x, return_seconds=False, time_resolution: int = 1):
  class FixedVADIterator (line 288) | class FixedVADIterator(VADIterator):
    method reset_states (line 293) | def reset_states(self):
    method __call__ (line 297) | def __call__(self, x, return_seconds=False):

FILE: whisperlivekit/simul_whisper/align_att_base.py
  class AlignAttBase (line 14) | class AlignAttBase(ABC):
    method speaker (line 30) | def speaker(self):
    method speaker (line 34) | def speaker(self, value):
    method global_time_offset (line 38) | def global_time_offset(self):
    method global_time_offset (line 42) | def global_time_offset(self, value):
    method _base_init (line 47) | def _base_init(self, cfg: AlignAttConfig, model):
    method _init_state_common (line 64) | def _init_state_common(self, cfg: AlignAttConfig):
    method warmup (line 75) | def warmup(self, audio):
    method create_tokenizer (line 84) | def create_tokenizer(self, language=None):
    method trim_context (line 93) | def trim_context(self):
    method refresh_segment (line 108) | def refresh_segment(self, complete=False):
    method segments_len (line 124) | def segments_len(self):
    method _apply_minseglen (line 127) | def _apply_minseglen(self):
    method _clean_cache (line 134) | def _clean_cache(self):
    method debug_print_tokens (line 137) | def debug_print_tokens(self, tokens):
    method _detect_language_if_needed (line 143) | def _detect_language_if_needed(self, encoder_feature):
    method infer (line 164) | def infer(self, is_last=False):
    method _split_tokens (line 309) | def _split_tokens(self, tokens_list, fire_detected, is_last):
    method _build_timestamped_words (line 322) | def _build_timestamped_words(self, split_words, split_tokens, l_absolu...
    method _handle_pending_tokens (line 360) | def _handle_pending_tokens(self, split_words, split_tokens):
    method _apply_dry_penalty (line 394) | def _apply_dry_penalty(self, logits, current_tokens):
    method _init_state (line 444) | def _init_state(self, cfg: AlignAttConfig):
    method init_tokens (line 449) | def init_tokens(self):
    method init_context (line 454) | def init_context(self):
    method insert_audio (line 459) | def insert_audio(self, segment=None):
    method _current_tokens (line 464) | def _current_tokens(self):
    method fire_at_boundary (line 469) | def fire_at_boundary(self, feature):
    method lang_id (line 474) | def lang_id(self, encoder_features):
    method _concat_segments (line 479) | def _concat_segments(self):
    method _encode (line 484) | def _encode(self, input_segments):
    method _init_sum_logprobs (line 489) | def _init_sum_logprobs(self):
    method _get_logits_and_cross_attn (line 494) | def _get_logits_and_cross_attn(self, tokens, encoder_feature):
    method _check_no_speech (line 499) | def _check_no_speech(self, logits):
    method _suppress_blank_tokens (line 504) | def _suppress_blank_tokens(self, logits):
    method _apply_token_suppression (line 509) | def _apply_token_suppression(self, logits):
    method _update_tokens (line 514) | def _update_tokens(self, current_tokens, logits, sum_logprobs):
    method _process_cross_attention (line 519) | def _process_cross_attention(self, accumulated_cross_attns, content_me...
    method _get_attended_frames (line 524) | def _get_attended_frames(self, attn):
    method _is_special_token (line 529) | def _is_special_token(self, current_tokens):
    method _rewind_tokens (line 534) | def _rewind_tokens(self):
    method _tokens_to_list (line 539) | def _tokens_to_list(self, current_tokens, start_col):
    method _make_new_tokens_tensor (line 544) | def _make_new_tokens_tensor(self, hypothesis):
    method _evaluate (line 549) | def _evaluate(self, tensor):

FILE: whisperlivekit/simul_whisper/backend.py
  class SimulStreamingOnlineProcessor (line 36) | class SimulStreamingOnlineProcessor:
    method __init__ (line 40) | def __init__(self, asr, logfile=sys.stderr):
    method _create_alignatt (line 51) | def _create_alignatt(self):
    method start_silence (line 63) | def start_silence(self):
    method end_silence (line 67) | def end_silence(self, silence_duration, offset):
    method insert_audio_chunk (line 83) | def insert_audio_chunk(self, audio: np.ndarray, audio_stream_end_time):
    method new_speaker (line 92) | def new_speaker(self, change_speaker: ChangeSpeaker):
    method get_buffer (line 99) | def get_buffer(self):
    method process_iter (line 103) | def process_iter(self, is_last=False) -> Tuple[List[ASRToken], float]:
    method warmup (line 125) | def warmup(self, audio, init_prompt=""):
    method __del__ (line 139) | def __del__(self):
  class SimulStreamingASR (line 148) | class SimulStreamingASR:
    method __init__ (line 152) | def __init__(self, logfile=sys.stderr, **kwargs):
    method _warmup_mlx_model (line 272) | def _warmup_mlx_model(self):
    method _resolve_encoder_backend (line 284) | def _resolve_encoder_backend(self, preferred_backend, compatible_whisp...
    method _has_custom_model_path (line 307) | def _has_custom_model_path(self):
    method _can_use_mlx (line 310) | def _can_use_mlx(self, compatible_whisper_mlx):
    method _can_use_faster (line 317) | def _can_use_faster(self, compatible_faster_whisper):
    method load_model (line 324) | def load_model(self):
    method set_translate_task (line 349) | def set_translate_task(self):
    method transcribe (line 360) | def transcribe(self, audio):

FILE: whisperlivekit/simul_whisper/beam.py
  class BeamPyTorchInference (line 6) | class BeamPyTorchInference(PyTorchInference):
    method _kv_cache_ids (line 9) | def _kv_cache_ids(self):
    method rearrange_kv_cache (line 15) | def rearrange_kv_cache(self, source_indices):
    method logits (line 21) | def logits(

FILE: whisperlivekit/simul_whisper/config.py
  class AlignAttConfig (line 6) | class AlignAttConfig():

FILE: whisperlivekit/simul_whisper/decoder_state.py
  class DecoderState (line 8) | class DecoderState:
    method clean_cache (line 50) | def clean_cache(self):
    method reset (line 73) | def reset(self, rewind_threshold: int = 200):
    method full_reset (line 86) | def full_reset(self, rewind_threshold: int = 200):

FILE: whisperlivekit/simul_whisper/eow_detection.py
  function load_cif (line 5) | def load_cif(cfg, n_audio_state, device):
  function resize (line 25) | def resize(alphas, target_lengths, threshold=0.999):
  function fire_at_boundary (line 50) | def fire_at_boundary(chunked_encoder_feature: torch.Tensor, cif_linear):

FILE: whisperlivekit/simul_whisper/mlx/decoder_state.py
  class MLXDecoderState (line 9) | class MLXDecoderState:
    method clean_cache (line 52) | def clean_cache(self):
    method reset (line 59) | def reset(self, rewind_threshold: int = 200):
    method full_reset (line 66) | def full_reset(self, rewind_threshold: int = 200):

FILE: whisperlivekit/simul_whisper/mlx/decoders.py
  class MLXGreedyDecoder (line 10) | class MLXGreedyDecoder:
    method __init__ (line 13) | def __init__(self, temperature: float, eot: int):
    method update (line 17) | def update(
    method finalize (line 50) | def finalize(self, tokens: mx.array, sum_logprobs: mx.array):
  class MLXBeamSearchDecoder (line 57) | class MLXBeamSearchDecoder:
    method __init__ (line 60) | def __init__(
    method reset (line 78) | def reset(self):
    method update (line 82) | def update(
    method finalize (line 156) | def finalize(self, preceding_tokens: mx.array, sum_logprobs: mx.array):
  class MLXInference (line 182) | class MLXInference:
    method __init__ (line 185) | def __init__(self, model, initial_token_length: int):
    method rearrange_kv_cache (line 190) | def rearrange_kv_cache(self, source_indices: List[int]):
    method logits (line 209) | def logits(

FILE: whisperlivekit/simul_whisper/mlx/simul_whisper.py
  class MLXTokenBuffer (line 20) | class MLXTokenBuffer:
    method __init__ (line 23) | def __init__(self, text="", tokenizer=None, prefix_token_ids=None):
    method as_token_ids (line 29) | def as_token_ids(self, tokenizer=None):
    method as_mlx_array (line 36) | def as_mlx_array(self) -> mx.array:
    method as_mlx_array_beam (line 40) | def as_mlx_array_beam(self, beam: int) -> mx.array:
    method as_text (line 44) | def as_text(self):
    method empty (line 48) | def empty(*a, **kw):
    method from_text (line 52) | def from_text(text, *a, **kw):
    method is_empty (line 55) | def is_empty(self):
    method trim_words (line 58) | def trim_words(self, num=1, after=0):
    method append_token_ids (line 68) | def append_token_ids(self, token_ids):
  function mlx_median_filter (line 89) | def mlx_median_filter(x: mx.array, filter_width: int) -> mx.array:
  class MLXAlignAtt (line 107) | class MLXAlignAtt(AlignAttBase):
    method __init__ (line 114) | def __init__(
    method _init_state (line 127) | def _init_state(self, cfg: AlignAttConfig):
    method _build_alignment_source (line 178) | def _build_alignment_source(self):
    method init_tokens (line 200) | def init_tokens(self):
    method init_context (line 211) | def init_context(self):
    method insert_audio (line 222) | def insert_audio(self, segment=None):
    method _current_tokens (line 245) | def _current_tokens(self) -> mx.array:
    method fire_at_boundary (line 260) | def fire_at_boundary(self, chunked_encoder_feature: mx.array) -> bool:
    method lang_id (line 267) | def lang_id(self, encoder_features: mx.array) -> Tuple[mx.array, List[...
    method _concat_segments (line 296) | def _concat_segments(self):
    method _encode (line 301) | def _encode(self, input_segments):
    method _init_sum_logprobs (line 312) | def _init_sum_logprobs(self):
    method _get_logits_and_cross_attn (line 315) | def _get_logits_and_cross_attn(self, tokens, encoder_feature):
    method _check_no_speech (line 324) | def _check_no_speech(self, logits):
    method _suppress_blank_tokens (line 335) | def _suppress_blank_tokens(self, logits):
    method _apply_token_suppression (line 340) | def _apply_token_suppression(self, logits):
    method _update_tokens (line 348) | def _update_tokens(self, current_tokens, logits, sum_logprobs):
    method _process_cross_attention (line 351) | def _process_cross_attention(
    method _get_attended_frames (line 398) | def _get_attended_frames(self, attn):
    method _is_special_token (line 403) | def _is_special_token(self, current_tokens):
    method _rewind_tokens (line 406) | def _rewind_tokens(self):
    method _tokens_to_list (line 411) | def _tokens_to_list(self, current_tokens, start_col):
    method _make_new_tokens_tensor (line 414) | def _make_new_tokens_tensor(self, hypothesis):
    method _evaluate (line 418) | def _evaluate(self, tensor):

FILE: whisperlivekit/simul_whisper/mlx_encoder.py
  function load_mlx_encoder (line 14) | def load_mlx_encoder(
  function load_mlx_model (line 62) | def load_mlx_model(

FILE: whisperlivekit/simul_whisper/simul_whisper.py
  function load_coreml_encoder (line 34) | def load_coreml_encoder():
  class AlignAtt (line 51) | class AlignAtt(AlignAttBase):
    method __init__ (line 59) | def __init__(
    method _init_state (line 86) | def _init_state(self, cfg: AlignAttConfig):
    method init_tokens (line 139) | def init_tokens(self):
    method init_context (line 150) | def init_context(self):
    method insert_audio (line 162) | def insert_audio(self, segment=None):
    method _current_tokens (line 182) | def _current_tokens(self):
    method fire_at_boundary (line 199) | def fire_at_boundary(self, chunked_encoder_feature: torch.Tensor):
    method lang_id (line 207) | def lang_id(self, encoder_features):
    method _concat_segments (line 234) | def _concat_segments(self):
    method _encode (line 239) | def _encode(self, input_segments):
    method _init_sum_logprobs (line 305) | def _init_sum_logprobs(self):
    method _get_logits_and_cross_attn (line 308) | def _get_logits_and_cross_attn(self, tokens, encoder_feature):
    method _check_no_speech (line 321) | def _check_no_speech(self, logits):
    method _suppress_blank_tokens (line 330) | def _suppress_blank_tokens(self, logits):
    method _apply_token_suppression (line 334) | def _apply_token_suppression(self, logits):
    method _update_tokens (line 338) | def _update_tokens(self, current_tokens, logits, sum_logprobs):
    method _process_cross_attention (line 341) | def _process_cross_attention(
    method _get_attended_frames (line 386) | def _get_attended_frames(self, attn):
    method _is_special_token (line 390) | def _is_special_token(self, current_tokens):
    method _rewind_tokens (line 393) | def _rewind_tokens(self):
    method _tokens_to_list (line 398) | def _tokens_to_list(self, current_tokens, start_col):
    method _make_new_tokens_tensor (line 401) | def _make_new_tokens_tensor(self, hypothesis):
    method _evaluate (line 408) | def _evaluate(self, tensor):
    method infer (line 412) | def infer(self, is_last=False):

FILE: whisperlivekit/simul_whisper/token_buffer.py
  class TokenBuffer (line 5) | class TokenBuffer:
    method __init__ (line 7) | def __init__(self, text="", tokenizer=None, device=None, prefix_token_...
    method as_token_ids (line 14) | def as_token_ids(self, tokenizer=None):
    method as_tensor (line 22) | def as_tensor(self, device=None):
    method as_tensor_beam (line 31) | def as_tensor_beam(self, beam, device=None):
    method as_text (line 36) | def as_text(self):
    method empty (line 40) | def empty(*a, **kw):
    method from_text (line 44) | def from_text(text, *a, **kw):
    method is_empty (line 47) | def is_empty(self):
    method trim_words (line 50) | def trim_words(self, num=1, after=0):
    method append_token_ids (line 67) | def append_token_ids(self, token_ids):
    method as_split_word_tokens (line 91) | def as_split_word_tokens(self):

FILE: whisperlivekit/test_client.py
  class TranscriptionResult (line 39) | class TranscriptionResult:
    method text (line 46) | def text(self) -> str:
    method committed_text (line 61) | def committed_text(self) -> str:
    method lines (line 72) | def lines(self) -> List[dict]:
    method n_updates (line 80) | def n_updates(self) -> int:
  function reconstruct_state (line 88) | def reconstruct_state(msg: dict, lines: List[dict]) -> dict:
  function load_audio_pcm (line 117) | def load_audio_pcm(audio_path: str, sample_rate: int = SAMPLE_RATE) -> b...
  function transcribe_audio (line 137) | async def transcribe_audio(
  function _print_result (line 268) | def _print_result(result: TranscriptionResult, output_json: bool = False...
  function main (line 302) | def main():

FILE: whisperlivekit/test_data.py
  class TestSample (line 46) | class TestSample:
    method has_timestamps (line 61) | def has_timestamps(self) -> bool:
  function _save_wav (line 65) | def _save_wav(path: Path, audio: np.ndarray, sample_rate: int = 16000) -...
  function _load_metadata (line 85) | def _load_metadata() -> Dict:
  function _save_metadata (line 93) | def _save_metadata(meta: Dict) -> None:
  function _ensure_datasets (line 98) | def _ensure_datasets():
  function _decode_audio (line 110) | def _decode_audio(audio_bytes: bytes) -> tuple:
  function _download_librispeech_samples (line 127) | def _download_librispeech_samples(n_samples: int = 3) -> List[Dict]:
  function _download_ami_sample (line 181) | def _download_ami_sample() -> List[Dict]:
  function download_test_samples (line 271) | def download_test_samples(force: bool = False) -> List[TestSample]:
  function get_samples (line 323) | def get_samples() -> List[TestSample]:
  function get_sample (line 328) | def get_sample(name: str) -> TestSample:
  function list_sample_names (line 345) | def list_sample_names() -> List[str]:
  function _meta_to_samples (line 350) | def _meta_to_samples(meta_list: List[Dict]) -> List[TestSample]:

FILE: whisperlivekit/test_harness.py
  function _parse_time (line 63) | def _parse_time(time_str: str) -> float:
  function load_audio_pcm (line 73) | def load_audio_pcm(audio_path: str, sample_rate: int = SAMPLE_RATE) -> b...
  class TestState (line 95) | class TestState:
    method from_front_data (line 115) | def from_front_data(cls, front_data: FrontData, audio_position: float ...
    method text (line 132) | def text(self) -> str:
    method committed_text (line 140) | def committed_text(self) -> str:
    method committed_word_count (line 145) | def committed_word_count(self) -> int:
    method buffer_word_count (line 151) | def buffer_word_count(self) -> int:
    method speakers (line 158) | def speakers(self) -> Set[int]:
    method n_speakers (line 163) | def n_speakers(self) -> int:
    method speaker_at (line 166) | def speaker_at(self, time_s: float) -> Optional[int]:
    method speakers_in (line 171) | def speakers_in(self, start_s: float, end_s: float) -> Set[int]:
    method speaker_timeline (line 180) | def speaker_timeline(self) -> List[Dict[str, Any]]:
    method n_speaker_changes (line 192) | def n_speaker_changes(self) -> int:
    method has_silence (line 203) | def has_silence(self) -> bool:
    method silence_segments (line 208) | def silence_segments(self) -> List[Dict[str, Any]]:
    method silence_at (line 212) | def silence_at(self, time_s: float) -> bool:
    method speech_lines (line 220) | def speech_lines(self) -> List[Dict[str, Any]]:
    method line_at (line 224) | def line_at(self, time_s: float) -> Optional[Dict[str, Any]]:
    method text_at (line 233) | def text_at(self, time_s: float) -> Optional[str]:
    method lines_between (line 238) | def lines_between(self, start_s: float, end_s: float) -> List[Dict[str...
    method text_between (line 248) | def text_between(self, start_s: float, end_s: float) -> str:
    method wer (line 257) | def wer(self, reference: str) -> float:
    method wer_detailed (line 267) | def wer_detailed(self, reference: str) -> Dict:
    method timestamps (line 275) | def timestamps(self) -> List[Dict[str, Any]]:
    method timing_valid (line 288) | def timing_valid(self) -> bool:
    method timing_monotonic (line 298) | def timing_monotonic(self) -> bool:
    method timing_errors (line 306) | def timing_errors(self) -> List[str]:
  class AudioPlayer (line 332) | class AudioPlayer:
    method __init__ (line 349) | def __init__(self, harness: "TestHarness", pcm_data: bytes, sample_rat...
    method position (line 357) | def position(self) -> float:
    method duration (line 362) | def duration(self) -> float:
    method remaining (line 367) | def remaining(self) -> float:
    method done (line 372) | def done(self) -> bool:
    method play (line 376) | async def play(
    method play_until (line 404) | async def play_until(
    method seek (line 421) | def seek(self, time_s: float) -> None:
    method reset (line 427) | def reset(self) -> None:
  class TestHarness (line 436) | class TestHarness:
    method __init__ (line 462) | def __init__(self, **kwargs: Any):
    method __aenter__ (line 473) | async def __aenter__(self) -> "TestHarness":
    method __aexit__ (line 493) | async def __aexit__(self, *exc: Any) -> None:
    method _collect_results (line 503) | async def _collect_results(self) -> None:
    method state (line 519) | def state(self) -> TestState:
    method history (line 524) | def history(self) -> List[TestState]:
    method audio_position (line 529) | def audio_position(self) -> float:
    method metrics (line 534) | def metrics(self):
    method on_update (line 540) | def on_update(self, callback: Callable[[TestState], None]) -> None:
    method load_audio (line 546) | def load_audio(self, source) -> AudioPlayer:
    method feed (line 559) | async def feed(
    method feed_pcm (line 577) | async def feed_pcm(
    method pause (line 603) | async def pause(self, duration_s: float, speed: float = 1.0) -> None:
    method silence (line 617) | async def silence(self, duration_s: float, speed: float = 1.0) -> None:
    method wait_for (line 623) | async def wait_for(
    method wait_for_text (line 646) | async def wait_for_text(self, timeout: float = 30.0) -> TestState:
    method wait_for_lines (line 650) | async def wait_for_lines(self, n: int = 1, timeout: float = 30.0) -> T...
    method wait_for_silence (line 654) | async def wait_for_silence(self, timeout: float = 30.0) -> TestState:
    method wait_for_speakers (line 658) | async def wait_for_speakers(self, n: int = 2, timeout: float = 30.0) -...
    method drain (line 662) | async def drain(self, seconds: float = 2.0) -> None:
    method finish (line 671) | async def finish(self, timeout: float = 30.0) -> TestState:
    method cut (line 687) | async def cut(self, timeout: float = 5.0) -> TestState:
    method snapshot_at (line 707) | def snapshot_at(self, audio_time: float) -> Optional[TestState]:
    method print_state (line 729) | def print_state(self) -> None:

FILE: whisperlivekit/thread_safety.py
  function get_model_lock (line 44) | def get_model_lock():
  function acquire_model_lock (line 49) | def acquire_model_lock(timeout=None):
  function release_model_lock (line 71) | def release_model_lock():
  class ModelLockContext (line 83) | class ModelLockContext:
    method __init__ (line 86) | def __init__(self, timeout=None):
    method __enter__ (line 90) | def __enter__(self):
    method __exit__ (line 94) | def __exit__(self, exc_type, exc_val, exc_tb):
  function print_deployment_recommendations (line 104) | def print_deployment_recommendations():

FILE: whisperlivekit/timed_objects.py
  function format_time (line 6) | def format_time(seconds: float) -> str:
  class Timed (line 18) | class Timed:
  class TimedText (line 23) | class TimedText(Timed):
    method has_punctuation (line 28) | def has_punctuation(self) -> bool:
    method is_within (line 31) | def is_within(self, other: 'TimedText') -> bool:
    method duration (line 34) | def duration(self) -> float:
    method contains_timespan (line 37) | def contains_timespan(self, other: 'TimedText') -> bool:
    method __bool__ (line 40) | def __bool__(self) -> bool:
    method __str__ (line 43) | def __str__(self) -> str:
  class ASRToken (line 47) | class ASRToken(TimedText):
    method with_offset (line 50) | def with_offset(self, offset: float) -> "ASRToken":
    method is_silence (line 54) | def is_silence(self) -> bool:
  class Sentence (line 59) | class Sentence(TimedText):
  class Transcript (line 63) | class Transcript(TimedText):
    method from_tokens (line 69) | def from_tokens(
  class SpeakerSegment (line 88) | class SpeakerSegment(Timed):
  class Translation (line 96) | class Translation(TimedText):
  class Silence (line 100) | class Silence():
    method compute_duration (line 107) | def compute_duration(self) -> Optional[float]:
    method is_silence (line 113) | def is_silence(self) -> bool:
  class Segment (line 118) | class Segment(TimedText):
    method from_tokens (line 128) | def from_tokens(
    method is_silence (line 155) | def is_silence(self) -> bool:
    method to_dict (line 159) | def to_dict(self) -> Dict[str, Any]:
  class PuncSegment (line 175) | class PuncSegment(Segment):
  class SilentSegment (line 178) | class SilentSegment(Segment):
    method __init__ (line 179) | def __init__(self, *args: Any, **kwargs: Any) -> None:
  class FrontData (line 186) | class FrontData():
    method to_dict (line 196) | def to_dict(self) -> Dict[str, Any]:
  class ChangeSpeaker (line 212) | class ChangeSpeaker:
  class State (line 217) | class State():

FILE: whisperlivekit/tokens_alignment.py
  class TokensAlignment (line 17) | class TokensAlignment:
    method __init__ (line 19) | def __init__(self, state: Any, args: Any, sep: Optional[str]) -> None:
    method update (line 45) | def update(self) -> None:
    method _prune (line 57) | def _prune(self) -> None:
    method add_translation (line 90) | def add_translation(self, segment: Segment) -> None:
    method compute_punctuations_segments (line 102) | def compute_punctuations_segments(self, tokens: Optional[List[ASRToken...
    method compute_new_punctuations_segments (line 134) | def compute_new_punctuations_segments(self) -> List[PuncSegment]:
    method concatenate_diar_segments (line 163) | def concatenate_diar_segments(self) -> List[SpeakerSegment]:
    method intersection_duration (line 177) | def intersection_duration(seg1: TimedText, seg2: TimedText) -> float:
    method get_lines_diarization (line 184) | def get_lines_diarization(self) -> Tuple[List[Segment], str]:
    method get_lines (line 217) | def get_lines(

FILE: whisperlivekit/vllm_realtime.py
  class VLLMRealtimeASR (line 27) | class VLLMRealtimeASR:
    method __init__ (line 34) | def __init__(self, vllm_url="ws://localhost:8000/v1/realtime",
    method transcribe (line 41) | def transcribe(self, audio):
  class VLLMRealtimeOnlineProcessor (line 45) | class VLLMRealtimeOnlineProcessor:
    method __init__ (line 57) | def __init__(self, asr: VLLMRealtimeASR):
    method _reset_state (line 70) | def _reset_state(self):
    method insert_audio_chunk (line 89) | def insert_audio_chunk(self, audio: np.ndarray, audio_stream_end_time:...
    method process_iter (line 94) | def process_iter(self, is_last=False) -> Tuple[List[ASRToken], float]:
    method get_buffer (line 101) | def get_buffer(self) -> Transcript:
    method start_silence (line 115) | def start_silence(self) -> Tuple[List[ASRToken], float]:
    method end_silence (line 148) | def end_silence(self, silence_duration: float, offset: float):
    method new_speaker (line 152) | def new_speaker(self, change_speaker):
    method warmup (line 155) | def warmup(self, audio, init_prompt=""):
    method finish (line 158) | def finish(self) -> Tuple[List[ASRToken], float]:
    method _connect (line 181) | def _connect(self):
    method _close_ws (line 206) | def _close_ws(self):
    method _recv_loop (line 219) | def _recv_loop(self):
    method _send_commit (line 259) | def _send_commit(self, final: bool):
    method _send_audio (line 271) | def _send_audio(self, audio: np.ndarray):
    method _send_pending_audio (line 289) | def _send_pending_audio(self):
    method _drain_deltas (line 313) | def _drain_deltas(self):
    method _wait_for_done (line 317) | def _wait_for_done(self, timeout: float = 10.0):
    method _time_for_word (line 328) | def _time_for_word(self, word_idx: int, n_words_total: int) -> Tuple[f...
    method _extract_new_words (line 338) | def _extract_new_words(self) -> List[ASRToken]:
    method _flush_all_pending_words (line 359) | def _flush_all_pending_words(self) -> List[ASRToken]:
    method _process_iter_inner (line 382) | def _process_iter_inner(self, is_last: bool) -> Tuple[List[ASRToken], ...

FILE: whisperlivekit/voxtral_hf_streaming.py
  class VoxtralHFStreamingASR (line 23) | class VoxtralHFStreamingASR:
    method __init__ (line 28) | def __init__(self, logfile=sys.stderr, **kwargs):
    method transcribe (line 63) | def transcribe(self, audio):
  class VoxtralHFStreamingOnlineProcessor (line 67) | class VoxtralHFStreamingOnlineProcessor:
    method __init__ (line 78) | def __init__(self, asr: VoxtralHFStreamingASR, logfile=sys.stderr):
    method _reset_state (line 104) | def _reset_state(self):
    method _get_pending_audio (line 135) | def _get_pending_audio(self) -> np.ndarray:
    method _set_pending_audio (line 145) | def _set_pending_audio(self, arr: np.ndarray):
    method _get_accumulated_text (line 154) | def _get_accumulated_text(self) -> str:
    method insert_audio_chunk (line 166) | def insert_audio_chunk(self, audio: np.ndarray, audio_stream_end_time:...
    method process_iter (line 172) | def process_iter(self, is_last=False) -> Tuple[List[ASRToken], float]:
    method get_buffer (line 179) | def get_buffer(self) -> Transcript:
    method start_silence (line 197) | def start_silence(self) -> Tuple[List[ASRToken], float]:
    method end_silence (line 239) | def end_silence(self, silence_duration: float, offset: float):
    method new_speaker (line 243) | def new_speaker(self, change_speaker):
    method warmup (line 246) | def warmup(self, audio, init_prompt=""):
    method finish (line 249) | def finish(self) -> Tuple[List[ASRToken], float]:
    method _start_generate_thread (line 280) | def _start_generate_thread(self):
    method _feed_pending_audio (line 356) | def _feed_pending_audio(self):
    method _append_text_fragment (line 371) | def _append_text_fragment(self, text_fragment: str):
    method _drain_streamer (line 378) | def _drain_streamer(self):
    method _drain_streamer_blocking (line 396) | def _drain_streamer_blocking(self, timeout=30.0):
    method _pos_to_time (line 445) | def _pos_to_time(self, token_position: int) -> float:
    method _audio_pos_for_char (line 449) | def _audio_pos_for_char(self, char_idx: int) -> int:
    method _word_timestamps (line 468) | def _word_timestamps(self, text: str, words: List[str], start_idx: int...
    method _extract_new_words (line 483) | def _extract_new_words(self) -> List[ASRToken]:
    method _flush_all_pending_words (line 510) | def _flush_all_pending_words(self) -> List[ASRToken]:
    method _process_iter_inner (line 538) | def _process_iter_inner(self, is_last: bool) -> Tuple[List[ASRToken], ...

FILE: whisperlivekit/voxtral_mlx/loader.py
  function download_weights (line 43) | def download_weights(model_id: str = DEFAULT_MODEL_ID) -> Path:
  function _translate_weight_name (line 113) | def _translate_weight_name(name: str) -> str | None:
  function _is_conv_weight (line 122) | def _is_conv_weight(name: str) -> bool:
  function _remap_converted_name (line 164) | def _remap_converted_name(name: str) -> str:
  function _has_converted_layout (line 180) | def _has_converted_layout(path: Path) -> bool:
  function _load_converted_weights (line 184) | def _load_converted_weights(path: Path):
  function _load_original_weights (line 219) | def _load_original_weights(path: Path):
  function _load_tokenizer (line 253) | def _load_tokenizer(model_dir: Path):
  function load_voxtral_model (line 262) | def load_voxtral_model(path_or_id: str = DEFAULT_MODEL_ID):

FILE: whisperlivekit/voxtral_mlx/model.py
  class SlidingKVCache (line 22) | class SlidingKVCache:
    method __init__ (line 32) | def __init__(self, capacity: int):
    method offset (line 40) | def offset(self) -> int:
    method _reorder (line 45) | def _reorder(self, buf):
    method _drop_oldest (line 56) | def _drop_oldest(self, buf, n_drop, tail=None):
    method _append_concat (line 64) | def _append_concat(self, k, v):
    method _write_inplace (line 79) | def _write_inplace(self, k, v):
    method update_and_fetch (line 121) | def update_and_fetch(self, k, v):
  class CausalConv (line 132) | class CausalConv(nn.Module):
    method __init__ (line 135) | def __init__(self, channels_in: int, channels_out: int, kernel: int, s...
    method __call__ (line 143) | def __call__(self, x: mx.array) -> mx.array:
  class _EncoderSelfAttention (line 149) | class _EncoderSelfAttention(nn.Module):
    method __init__ (line 150) | def __init__(self, dim: int, n_heads: int, head_dim: int, rope_theta: ...
    method __call__ (line 161) | def __call__(self, x, mask, cache=None):
  class _EncoderFFN (line 178) | class _EncoderFFN(nn.Module):
    method __init__ (line 181) | def __init__(self, dim: int, hidden: int):
    method __call__ (line 187) | def __call__(self, x):
  class _EncoderBlock (line 191) | class _EncoderBlock(nn.Module):
    method __init__ (line 192) | def __init__(self, dim, n_heads, head_dim, hidden, rope_theta):
    method __call__ (line 199) | def __call__(self, x, mask, cache=None):
  class StreamingEncoder (line 205) | class StreamingEncoder(nn.Module):
    method __init__ (line 210) | def __init__(
    method _apply_convs (line 233) | def _apply_convs(self, mel: mx.array) -> mx.array:
    method forward (line 239) | def forward(self, mel: mx.array) -> mx.array:
    method forward_conv_incremental (line 247) | def forward_conv_incremental(self, x_in, tail1, tail2):
    method forward_transformer_incremental (line 280) | def forward_transformer_incremental(self, x, cache_list):
  class _DecoderAttention (line 292) | class _DecoderAttention(nn.Module):
    method __init__ (line 295) | def __init__(self, dim, n_heads, n_kv_heads, head_dim, rope_theta):
    method __call__ (line 307) | def __call__(self, x, mask=None, cache=None):
  class _DecoderFFN (line 324) | class _DecoderFFN(nn.Module):
    method __init__ (line 327) | def __init__(self, dim, hidden):
    method __call__ (line 333) | def __call__(self, x):
  class AdaptiveScaling (line 337) | class AdaptiveScaling(nn.Module):
    method __init__ (line 341) | def __init__(self, dim, bottleneck):
    method __call__ (line 346) | def __call__(self, cond):
  class _DecoderBlock (line 350) | class _DecoderBlock(nn.Module):
    method __init__ (line 351) | def __init__(self, dim, n_heads, n_kv_heads, head_dim, hidden, rope_th...
    method __call__ (line 359) | def __call__(self, x, delay_cond, mask=None, cache=None):
  class TextDecoder (line 366) | class TextDecoder(nn.Module):
    method __init__ (line 369) | def __init__(
    method embed (line 389) | def embed(self, token_ids: mx.array) -> mx.array:
    method __call__ (line 392) | def __call__(self, x, delay_cond, mask=None, cache=None):
  class EncoderToDecoderAdapter (line 406) | class EncoderToDecoderAdapter(nn.Module):
    method __init__ (line 409) | def __init__(self, enc_dim: int, dec_dim: int):
    method __call__ (line 414) | def __call__(self, x):
  class DelayEmbedding (line 418) | class DelayEmbedding(nn.Module):
    method __init__ (line 422) | def __init__(self, dim: int = 3072, theta: float = 10000.0):
    method __call__ (line 429) | def __call__(self, delay: mx.array) -> mx.array:
  class VoxtralMLXModel (line 440) | class VoxtralMLXModel(nn.Module):
    method __init__ (line 443) | def __init__(self, config: dict):
    method encode (line 484) | def encode(self, mel: mx.array) -> mx.array:
    method encode_incremental (line 503) | def encode_incremental(self, new_mel, conv_tail1, conv_tail2, enc_cach...
    method decode (line 532) | def decode(self, embeddings, delay_cond, mask=None, cache=None):

FILE: whisperlivekit/voxtral_mlx/spectrogram.py
  function _build_slaney_filterbank (line 32) | def _build_slaney_filterbank(
  function _mel_filters (line 86) | def _mel_filters() -> mx.array:
  function _hann_window (line 102) | def _hann_window() -> mx.array:
  function _dft_matrices (line 109) | def _dft_matrices():
  function _stft_frames (line 123) | def _stft_frames(audio: mx.array, window: mx.array) -> mx.array:
  function _apply_mel_and_log (line 140) | def _apply_mel_and_log(power: mx.array) -> mx.array:
  function compute_mel (line 152) | def compute_mel(audio: np.ndarray) -> mx.array:
  function compute_mel_streaming (line 172) | def compute_mel_streaming(
  function pad_audio (line 206) | def pad_audio(

FILE: whisperlivekit/voxtral_mlx_asr.py
  function _prompt_tokens (line 43) | def _prompt_tokens(tokenizer, n_left_pad=LEFT_PAD_TOKENS, n_delay=6):
  class VoxtralMLXASR (line 55) | class VoxtralMLXASR:
    method __init__ (line 62) | def __init__(self, logfile=sys.stderr, **kwargs):
    method transcribe (line 84) | def transcribe(self, audio):
  class VoxtralMLXOnlineProcessor (line 93) | class VoxtralMLXOnlineProcessor:
    method __init__ (line 107) | def __init__(self, asr: VoxtralMLXASR, logfile=sys.stderr):
    method _reset_state (line 141) | def _reset_state(self):
    method _get_pending (line 177) | def _get_pending(self) -> np.ndarray:
    method _set_pending (line 187) | def _set_pending(self, arr: np.ndarray):
    method insert_audio_chunk (line 196) | def insert_audio_chunk(self, audio: np.ndarray, audio_stream_end_time:...
    method process_iter (line 205) | def process_iter(self, is_last=False) -> Tuple[List[ASRToken], float]:
    method _step (line 212) | def _step(self, is_last: bool) -> Tuple[List[ASRToken], float]:
    method _encode_pending (line 285) | def _encode_pending(self):
    method _do_prefill (line 323) | def _do_prefill(self):
    method _decode_positions (line 344) | def _decode_positions(self, n: int) -> bool:
    method _trim_embeds (line 396) | def _trim_embeds(self, n_consumed: int):
    method _sample (line 402) | def _sample(self, logits: mx.array) -> mx.array:
    method _audio_pos_to_time (line 407) | def _audio_pos_to_time(self, pos: int) -> float:
    method _word_time_range (line 411) | def _word_time_range(self, word_idx: int, n_words: int) -> Tuple[float...
    method _extract_committed_words (line 439) | def _extract_committed_words(self) -> List[ASRToken]:
    method _flush_all_words (line 457) | def _flush_all_words(self) -> List[ASRToken]:
    method get_buffer (line 477) | def get_buffer(self) -> Transcript:
    method _safe_decode_remaining (line 486) | def _safe_decode_remaining(self):
    method _flush_last_token_text (line 504) | def _flush_last_token_text(self):
    method _close_current_word (line 528) | def _close_current_word(self):
    method _flush_and_reset (line 535) | def _flush_and_reset(self) -> List[ASRToken]:
    method start_silence (line 585) | def start_silence(self) -> Tuple[List[ASRToken], float]:
    method end_silence (line 597) | def end_silence(self, silence_duration: float, offset: float):
    method new_speaker (line 601) | def new_speaker(self, change_speaker):
    method warmup (line 604) | def warmup(self, audio, init_prompt=""):
    method finish (line 607) | def finish(self) -> Tuple[List[ASRToken], float]:

FILE: whisperlivekit/warmup.py
  function load_file (line 6) | def load_file(warmup_file=None, timeout=5):
  function warmup_asr (line 43) | def warmup_asr(asr, warmup_file=None, timeout=5):

FILE: whisperlivekit/web/live_transcription.js
  function getWaveStroke (line 69) | function getWaveStroke() {
  function updateWaveStroke (line 76) | function updateWaveStroke() {
  function applyTheme (line 80) | function applyTheme(pref) {
  function enumerateMicrophones (line 119) | async function enumerateMicrophones() {
  function populateMicrophoneSelect (line 135) | function populateMicrophoneSelect() {
  function handleMicrophoneChange (line 154) | function handleMicrophoneChange() {
  function fmt1 (line 175) | function fmt1(x) {
  function setupWebSocket (line 215) | function setupWebSocket() {
  function renderLinesWithBuffer (line 333) | function renderLinesWithBuffer(
  function updateTimer (line 469) | function updateTimer() {
  function drawWaveform (line 478) | function drawWaveform() {
  function startRecording (line 520) | async function startRecording() {
  function stopRecording (line 639) | async function stopRecording() {
  function toggleRecording (line 725) | async function toggleRecording() {
  function updateUI (line 751) | function updateUI() {
  function checkAndRequestPermissions (line 802) | async function checkAndRequestPermissions() {

FILE: whisperlivekit/web/pcm_worklet.js
  class PCMForwarder (line 1) | class PCMForwarder extends AudioWorkletProcessor {
    method process (line 2) | process(inputs) {

FILE: whisperlivekit/web/recorder_worker.js
  function init (line 15) | function init(config) {
  function record (line 20) | function record(inputBuffer) {
  function resample (line 27) | function resample(buffer, from, to) {
  function toPCM (line 50) | function toPCM(input) {

FILE: whisperlivekit/web/web_interface.py
  function get_web_interface_html (line 7) | def get_web_interface_html():
  function get_inline_ui_html (line 16) | def get_inline_ui_html():
  function get (line 113) | async def get():

FILE: whisperlivekit/whisper/__init__.py
  function _download (line 57) | def _download(url: str, root: str, in_memory: bool) -> Union[bytes, str]:
  function available_models (line 101) | def available_models() -> List[str]:
  function _infer_dims_from_config (line 106) | def _infer_dims_from_config(path: str) -> Optional[ModelDimensions]:
  function _convert_hf_state_dict (line 163) | def _convert_hf_state_dict(state_dict: Dict[str, torch.Tensor]) -> Dict[...
  function _convert_mlx_state_dict (line 256) | def _convert_mlx_state_dict(state_dict: Dict[str, torch.Tensor]) -> Dict...
  function _load_lora_state (line 274) | def _load_lora_state(lora_path: str):
  function _collapse_hf_module_name (line 292) | def _collapse_hf_module_name(module: str):
  function _resolve_lora_path (line 302) | def _resolve_lora_path(lora_path: Optional[str]) -> Optional[str]:
  function _apply_lora_adapter (line 337) | def _apply_lora_adapter(state_dict: Dict[str, Tensor], lora_path: Option...
  function _load_checkpoint (line 397) | def _load_checkpoint(
  function _load_sharded_checkpoint (line 434) | def _load_sharded_checkpoint(
  function load_model (line 466) | def load_model(
  function convert_encoder_to_coreml (line 599) | def convert_encoder_to_coreml(

FILE: whisperlivekit/whisper/audio.py
  function load_audio (line 25) | def load_audio(file: str, sr: int = SAMPLE_RATE):
  function pad_or_trim (line 65) | def pad_or_trim(array, length: int = N_SAMPLES, *, axis: int = -1):
  function mel_filters (line 92) | def mel_filters(device, n_mels: int) -> torch.Tensor:
  function log_mel_spectrogram (line 110) | def log_mel_spectrogram(

FILE: whisperlivekit/whisper/decoding.py
  function detect_language (line 19) | def detect_language(
  class DecodingOptions (line 81) | class DecodingOptions:
  class DecodingResult (line 118) | class DecodingResult:
  class Inference (line 130) | class Inference:
    method logits (line 131) | def logits(self, tokens: Tensor, audio_features: Tensor) -> Tensor:
    method rearrange_kv_cache (line 135) | def rearrange_kv_cache(self, source_indices) -> None:
    method cleanup_caching (line 139) | def cleanup_caching(self) -> None:
  class PyTorchInference (line 144) | class PyTorchInference(Inference):
    method __init__ (line 145) | def __init__(self, model: "Whisper", initial_token_length: int):
    method logits (line 155) | def logits(self, tokens: Tensor, audio_features: Tensor) -> Tensor:
    method cleanup_caching (line 162) | def cleanup_caching(self):
    method rearrange_kv_cache (line 165) | def rearrange_kv_cache(self, source_indices):
  class SequenceRanker (line 173) | class SequenceRanker:
    method rank (line 174) | def rank(
  class MaximumLikelihoodRanker (line 184) | class MaximumLikelihoodRanker(SequenceRanker):
    method __init__ (line 190) | def __init__(self, length_penalty: Optional[float]):
    method rank (line 193) | def rank(self, tokens: List[List[Tensor]], sum_logprobs: List[List[flo...
  class TokenDecoder (line 210) | class TokenDecoder:
    method reset (line 211) | def reset(self):
    method update (line 214) | def update(
    method finalize (line 241) | def finalize(
  class GreedyDecoder (line 266) | class GreedyDecoder(TokenDecoder):
    method __init__ (line 267) | def __init__(self, temperature: float, eot: int):
    method update (line 271) | def update(
    method finalize (line 289) | def finalize(self, tokens: Tensor, sum_logprobs: Tensor):
  class BeamSearchDecoder (line 295) | class BeamSearchDecoder(TokenDecoder):
    method __init__ (line 296) | def __init__(
    method reset (line 314) | def reset(self):
    method update (line 317) | def update(
    method finalize (line 378) | def finalize(self, preceding_tokens: Tensor, sum_logprobs: Tensor):
  class LogitFilter (line 401) | class LogitFilter:
    method apply (line 402) | def apply(self, logits: Tensor, tokens: Tensor) -> None:
  class SuppressBlank (line 417) | class SuppressBlank(LogitFilter):
    method __init__ (line 418) | def __init__(self, tokenizer: Tokenizer, sample_begin: int):
    method apply (line 422) | def apply(self, logits: Tensor, tokens: Tensor):
  class SuppressTokens (line 427) | class SuppressTokens(LogitFilter):
    method __init__ (line 428) | def __init__(self, suppress_tokens: Sequence[int]):
    method apply (line 431) | def apply(self, logits: Tensor, tokens: Tensor):
  class ApplyTimestampRules (line 435) | class ApplyTimestampRules(LogitFilter):
    method __init__ (line 436) | def __init__(
    method apply (line 446) | def apply(self, logits: Tensor, tokens: Tensor):
  class DecodingTask (line 502) | class DecodingTask:
    method __init__ (line 508) | def __init__(self, model: "Whisper", options: DecodingOptions):
    method _verify_options (line 566) | def _verify_options(self, options: DecodingOptions) -> DecodingOptions:
    method _get_initial_tokens (line 581) | def _get_initial_tokens(self) -> Tuple[int]:
    method _get_suppress_tokens (line 609) | def _get_suppress_tokens(self) -> Tuple[int]:
    method _get_audio_features (line 638) | def _get_audio_features(self, mel: Tensor):
    method _detect_language (line 660) | def _detect_language(self, audio_features: Tensor, tokens: Tensor):
    method _main_loop (line 674) | def _main_loop(self, audio_features: Tensor, tokens: Tensor):
    method run (line 707) | def run(self, mel: Tensor) -> List[DecodingResult]:
  function decode (line 787) | def decode(

FILE: whisperlivekit/whisper/model.py
  class ModelDimensions (line 26) | class ModelDimensions:
  class LayerNorm (line 39) | class LayerNorm(nn.LayerNorm):
    method forward (line 40) | def forward(self, x: Tensor) -> Tensor:
  class Linear (line 44) | class Linear(nn.Linear):
    method forward (line 45) | def forward(self, x: Tensor) -> Tensor:
  class Conv1d (line 53) | class Conv1d(nn.Conv1d):
    method _conv_forward (line 54) | def _conv_forward(
  function sinusoids (line 62) | def sinusoids(length, channels, max_timescale=10000):
  function disable_sdpa (line 72) | def disable_sdpa():
  class MultiHeadAttention (line 81) | class MultiHeadAttention(nn.Module):
    method __init__ (line 84) | def __init__(self, n_state: int, n_head: int, cache_id: str = "", n_te...
    method forward (line 100) | def forward(
    method _update_self_attn_cache (line 130) | def _update_self_attn_cache(
    method qkv_attention (line 148) | def qkv_attention(
  class ResidualAttentionBlock (line 176) | class ResidualAttentionBlock(nn.Module):
    method __init__ (line 177) | def __init__(
    method forward (line 201) | def forward(
  class AudioEncoder (line 224) | class AudioEncoder(nn.Module):
    method __init__ (line 225) | def __init__(
    method forward (line 238) | def forward(self, x: Tensor):
  class TextDecoder (line 257) | class TextDecoder(nn.Module):
    method __init__ (line 258) | def __init__(
    method forward (line 281) | def forward(
  class Whisper (line 335) | class Whisper(nn.Module):
    method __init__ (line 336) | def __init__(self, dims: ModelDimensions, decoder_only: bool = False):
    method set_alignment_heads (line 363) | def set_alignment_heads(self, dump: bytes):
    method embed_audio (line 372) | def embed_audio(self, mel: torch.Tensor):
    method logits (line 375) | def logits(
    method forward (line 388) | def forward(
    method device (line 394) | def device(self):
    method is_multilingual (line 398) | def is_multilingual(self):
    method num_languages (line 402) | def num_languages(self):

FILE: whisperlivekit/whisper/normalizers/basic.py
  function remove_symbols_and_diacritics (line 27) | def remove_symbols_and_diacritics(s: str, keep=""):
  function remove_symbols (line 50) | def remove_symbols(s: str):
  class BasicTextNormalizer (line 60) | class BasicTextNormalizer:
    method __init__ (line 61) | def __init__(self, remove_diacritics: bool = False, split_letters: boo...
    method __call__ (line 67) | def __call__(self, s: str):

FILE: whisperlivekit/whisper/normalizers/english.py
  class EnglishNumberNormalizer (line 12) | class EnglishNumberNormalizer:
    method __init__ (line 23) | def __init__(self):
    method process_words (line 165) | def process_words(self, words: List[str]) -> Iterator[str]:
    method preprocess (line 388) | def preprocess(self, s: str):
    method postprocess (line 417) | def postprocess(self, s: str):
    method __call__ (line 442) | def __call__(self, s: str):
  class EnglishSpellingNormalizer (line 450) | class EnglishSpellingNormalizer:
    method __init__ (line 457) | def __init__(self):
    method __call__ (line 461) | def __call__(self, s: str):
  class EnglishTextNormalizer (line 465) | class EnglishTextNormalizer:
    method __init__ (line 466) | def __init__(self):
    method __call__ (line 526) | def __call__(self, s: str):

FILE: whisperlivekit/whisper/timing.py
  function median_filter (line 19) | def median_filter(x: torch.Tensor, filter_width: int):
  function backtrace (line 58) | def backtrace(trace: np.ndarray):
  function dtw_cpu (line 83) | def dtw_cpu(x: np.ndarray):
  function dtw_cuda (line 108) | def dtw_cuda(x, BLOCK_SIZE=1024):
  function dtw (line 141) | def dtw(x: torch.Tensor) -> np.ndarray:
  class WordTiming (line 155) | class WordTiming:
  function find_alignment (line 163) | def find_alignment(
  function merge_punctuations (line 245) | def merge_punctuations(alignment: List[WordTiming], prepended: str, appe...
  function add_word_timestamps (line 279) | def add_word_timestamps(

FILE: whisperlivekit/whisper/tokenizer.py
  class Tokenizer (line 132) | class Tokenizer:
    method __post_init__ (line 142) | def __post_init__(self):
    method encode (line 161) | def encode(self, text, **kwargs):
    method decode (line 164) | def decode(self, token_ids: List[int], **kwargs) -> str:
    method decode_with_timestamps (line 168) | def decode_with_timestamps(self, token_ids: List[int], **kwargs) -> str:
    method eot (line 176) | def eot(self) -> int:
    method transcribe (line 180) | def transcribe(self) -> int:
    method translate (line 184) | def translate(self) -> int:
    method sot (line 188) | def sot(self) -> int:
    method sot_lm (line 192) | def sot_lm(self) -> int:
    method sot_prev (line 196) | def sot_prev(self) -> int:
    method no_speech (line 200) | def no_speech(self) -> int:
    method no_timestamps (line 204) | def no_timestamps(self) -> int:
    method timestamp_begin (line 208) | def timestamp_begin(self) -> int:
    method language_token (line 212) | def language_token(self) -> int:
    method to_language_token (line 219) | def to_language_token(self, language):
    method all_language_tokens (line 226) | def all_language_tokens(self) -> Tuple[int]:
    method all_language_codes (line 234) | def all_language_codes(self) -> Tuple[str]:
    method sot_sequence_including_notimestamps (line 238) | def sot_sequence_including_notimestamps(self) -> Tuple[int]:
    method non_speech_tokens (line 242) | def non_speech_tokens(self) -> Tuple[int]:
    method split_to_word_tokens (line 277) | def split_to_word_tokens(self, tokens: List[int]):
    method split_tokens_on_unicode (line 286) | def split_tokens_on_unicode(self, tokens: List[int]):
    method split_tokens_on_spaces (line 316) | def split_tokens_on_spaces(self, tokens: List[int]):
  function get_encoding (line 336) | def get_encoding(name: str = "gpt2", num_languages: int = 99):
  function get_tokenizer (line 372) | def get_tokenizer(

FILE: whisperlivekit/whisper/transcribe.py
  function transcribe (line 21) | def transcribe(
  function cli (line 500) | def cli():

FILE: whisperlivekit/whisper/triton_ops.py
  function dtw_kernel (line 14) | def dtw_kernel(
  function median_kernel (line 44) | def median_kernel(filter_width: int):
  function median_filter_cuda (line 106) | def median_filter_cuda(x: torch.Tensor, filter_width: int):

FILE: whisperlivekit/whisper/utils.py
  function make_safe (line 12) | def make_safe(string):
  function make_safe (line 19) | def make_safe(string):
  function exact_div (line 24) | def exact_div(x, y):
  function str2bool (line 29) | def str2bool(string):
  function optional_int (line 37) | def optional_int(string):
  function optional_float (line 41) | def optional_float(string):
  function compression_ratio (line 45) | def compression_ratio(text) -> float:
  function format_timestamp (line 50) | def format_timestamp(
  function get_start (line 71) | def get_start(segments: List[dict]) -> Optional[float]:
  function get_end (line 78) | def get_end(segments: List[dict]) -> Optional[float]:
  class ResultWriter (line 85) | class ResultWriter:
    method __init__ (line 88) | def __init__(self, output_dir: str):
    method __call__ (line 91) | def __call__(
    method write_result (line 103) | def write_result(
  class WriteTXT (line 109) | class WriteTXT(ResultWriter):
    method write_result (line 112) | def write_result(
  class SubtitlesWriter (line 119) | class SubtitlesWriter(ResultWriter):
    method iterate_result (line 123) | def iterate_result(
    method format_timestamp (line 230) | def format_timestamp(self, seconds: float):
  class WriteVTT (line 238) | class WriteVTT(SubtitlesWriter):
    method write_result (line 243) | def write_result(
  class WriteSRT (line 251) | class WriteSRT(SubtitlesWriter):
    method write_result (line 256) | def write_result(
  class WriteTSV (line 265) | class WriteTSV(ResultWriter):
    method write_result (line 277) | def write_result(
  class WriteJSON (line 287) | class WriteJSON(ResultWriter):
    method write_result (line 290) | def write_result(
  function get_writer (line 296) | def get_writer(

FILE: whisperlivekit/whisper/val.py
  class Value (line 31) | class Value:
    method __init__ (line 34) | def __init__(self, data, children=(), local_grads=()):
    method __add__ (line 40) | def __add__(self, other):
    method __mul__ (line 44) | def __mul__(self, other):
    method __pow__ (line 48) | def __pow__(self, other): return Value(self.data**other, (self,), (oth...
    method log (line 49) | def log(self): return Value(math.log(self.data), (self,), (1/self.data,))
    method exp (line 50) | def exp(self): return Value(math.exp(self.data), (self,), (math.exp(se...
    method relu (line 51) | def relu(self): return Value(max(0, self.data), (self,), (float(self.d...
    method __neg__ (line 52) | def __neg__(self): return self * -1
    method __radd__ (line 53) | def __radd__(self, other): return self + other
    method __sub__ (line 54) | def __sub__(self, other): return self + (-other)
    method __rsub__ (line 55) | def __rsub__(self, other): return other + (-self)
    method __rmul__ (line 56) | def __rmul__(self, other): return self * other
    method __truediv__ (line 57) | def __truediv__(self, other): return self * other**-1
    method __rtruediv__ (line 58) | def __rtruediv__(self, other): return other * self**-1
    method backward (line 60) | def backward(self):
  function linear (line 95) | def linear(x, w):
  function softmax (line 99) | def softmax(logits):
  function rmsnorm (line 105) | def rmsnorm(x):
  function gpt (line 110) | def gpt(token_id, pos_id, keys, values):

Download .json

Condensed preview — 146 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (3,941K chars).

[
  {
    "path": ".dockerignore",
    "chars": 111,
    "preview": ".git\n.github\n.venv\n__pycache__\n*.pyc\n.pytest_cache\n.mypy_cache\n.ruff_cache\n.cache\n.tmp\n.secrets\ndist\nbuild\n*.c\n"
  },
  {
    "path": ".github/workflows/ci.yml",
    "chars": 887,
    "preview": "name: CI\n\non:\n  push:\n    branches: [main]\n  pull_request:\n    branches: [main]\n\njobs:\n  lint:\n    runs-on: ubuntu-lates"
  },
  {
    "path": ".github/workflows/publish-docker.yml",
    "chars": 1728,
    "preview": "name: Publish Docker Images\n\non:\n  push:\n    tags:\n      - \"v*\"\n  workflow_dispatch:\n    inputs:\n      tag:\n        desc"
  },
  {
    "path": ".gitignore",
    "chars": 1801,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": "AGENTS.md",
    "chars": 4184,
    "preview": "# Instructions for WLK\n\n> [!IMPORTANT]\n> This project does **not** accept pull requests that are fully or predominantly "
  },
  {
    "path": "CHANGES.md",
    "chars": 103,
    "preview": "IMPORTANT: Ensure you’ve thoroughly reviewed the [AGENTS.md](AGENTS.md) file before beginning any work."
  },
  {
    "path": "CLAUDE.md",
    "chars": 7078,
    "preview": "# CLAUDE.md -- WhisperLiveKit\n\n## Build & Test\n\nInstall for development:\n\n```sh\npip install -e \".[test]\"\n```\n\nTest with "
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 2021,
    "preview": "# Contributing\n\nThank you for considering contributing ! We appreciate your time and effort to help make this project be"
  },
  {
    "path": "DEV_NOTES.md",
    "chars": 3031,
    "preview": "# 1. Simulstreaming: Decouple the encoder for faster inference\n\nSimulstreaming encoder time (whisperlivekit/simul_whispe"
  },
  {
    "path": "Dockerfile",
    "chars": 1956,
    "preview": "FROM ghcr.io/astral-sh/uv:0.10.4 AS uvbin\n\n# --- MARK: Builder Stage\nFROM nvidia/cuda:12.9.1-cudnn-devel-ubuntu24.04 AS "
  },
  {
    "path": "Dockerfile.cpu",
    "chars": 1969,
    "preview": "FROM ghcr.io/astral-sh/uv:0.10.4 AS uvbin\n\n# --- MARK: Builder Stage\nFROM debian:bookworm-slim AS builder-cpu\nENV DEBIAN"
  },
  {
    "path": "LICENSE",
    "chars": 11903,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "README.md",
    "chars": 18645,
    "preview": "<h1 align=\"center\">WLK</h1>\n<p align=\"center\"><b>WhisperLiveKit: Ultra-low-latency, self-hosted speech-to-text with spea"
  },
  {
    "path": "benchmark_mlx_simul.py",
    "chars": 16957,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nBenchmark Qwen3-ASR MLX SimulStreaming on LibriSpeech test-clean.\n\nMeasures:\n  - Word Error R"
  },
  {
    "path": "benchmarks/h100/bench_voxtral_hf_batch.py",
    "chars": 5201,
    "preview": "#!/usr/bin/env python3\n\"\"\"Standalone Voxtral benchmark — no whisperlivekit imports.\"\"\"\nimport json, logging, re, time, w"
  },
  {
    "path": "benchmarks/h100/bench_voxtral_vllm_realtime.py",
    "chars": 5468,
    "preview": "#!/usr/bin/env python3\n\"\"\"Benchmark Voxtral via vLLM WebSocket /v1/realtime — proper streaming.\"\"\"\nimport asyncio, json,"
  },
  {
    "path": "benchmarks/h100/generate_figures.py",
    "chars": 11931,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nGenerate polished benchmark figures for WhisperLiveKit H100 results.\n\nReads data from results"
  },
  {
    "path": "benchmarks/h100/results.json",
    "chars": 2438,
    "preview": "{\n  \"hardware\": \"NVIDIA H100 80GB HBM3, CUDA 12.4, Driver 550.163\",\n  \"date\": \"2026-03-15\",\n\n  \"librispeech_clean\": {\n  "
  },
  {
    "path": "benchmarks/m5/bench_0.6b_simul_500.json",
    "chars": 336131,
    "preview": "{\n  \"model\": \"Qwen3-ASR-0.6B\",\n  \"backend\": \"mlx-simul-streaming\",\n  \"mode\": \"simul-streaming\",\n  \"platform\": \"Apple M5 "
  },
  {
    "path": "benchmarks/m5/bench_1.7b_simul_500.json",
    "chars": 335767,
    "preview": "{\n  \"model\": \"Qwen3-ASR-1.7B\",\n  \"backend\": \"mlx-simul-streaming\",\n  \"mode\": \"simul-streaming\",\n  \"platform\": \"Apple M5 "
  },
  {
    "path": "benchmarks/m5/generate_figures.py",
    "chars": 6256,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nGenerate combined M5 vs H100 benchmark figure for WhisperLiveKit.\n\nProduces a WER vs RTF scat"
  },
  {
    "path": "benchmarks/m5/results.json",
    "chars": 269,
    "preview": "{\n  \"platform\": \"Apple M5 (32GB RAM, MLX fp16)\",\n  \"dataset\": \"LibriSpeech test-clean\",\n  \"methodology\": \"per-utterance "
  },
  {
    "path": "chrome-extension/README.md",
    "chars": 1134,
    "preview": "## WhisperLiveKit Chrome Extension v0.1.1\nCapture the audio of your current tab, transcribe diarize and translate it usi"
  },
  {
    "path": "chrome-extension/background.js",
    "chars": 234,
    "preview": "chrome.runtime.onInstalled.addListener((details) => {\n    if (details.reason.search(/install/g) === -1) {\n        return"
  },
  {
    "path": "chrome-extension/manifest.json",
    "chars": 603,
    "preview": "{\n    \"manifest_version\": 3,\n    \"name\": \"WhisperLiveKit Tab Capture\",\n    \"version\": \"1.0\",\n    \"description\": \"Capture"
  },
  {
    "path": "chrome-extension/requestPermissions.html",
    "chars": 336,
    "preview": "<!DOCTYPE html>\n<html>\n  <head>\n    <title>Request Permissions</title>\n    <script src=\"requestPermissions.js\"></script>"
  },
  {
    "path": "chrome-extension/requestPermissions.js",
    "chars": 564,
    "preview": "/**\n * Requests user permission for microphone access.\n * @returns {Promise<void>} A Promise that resolves when permissi"
  },
  {
    "path": "chrome-extension/sidepanel.js",
    "chars": 724,
    "preview": "console.log(\"sidepanel.js\");\n\nasync function run() {\n  const micPermission = await navigator.permissions.query({\n    nam"
  },
  {
    "path": "compose.yml",
    "chars": 1301,
    "preview": "services:\n  wlk-gpu-sortformer:\n    build:\n      context: .\n      dockerfile: Dockerfile\n      args:\n        EXTRAS: ${G"
  },
  {
    "path": "docs/API.md",
    "chars": 18040,
    "preview": "# WhisperLiveKit API Reference\n\nThis document describes all APIs: the WebSocket streaming API, the OpenAI-compatible RES"
  },
  {
    "path": "docs/alignement_principles.md",
    "chars": 1892,
    "preview": "### Alignment between STT Tokens and Diarization Segments \n\n- Example 1: The punctuation from STT and the speaker change"
  },
  {
    "path": "docs/default_and_custom_models.md",
    "chars": 4900,
    "preview": "# Models and Model Paths\n\n## Defaults\n\n**Default Whisper Model**: `base`  \nWhen no model is specified, WhisperLiveKit us"
  },
  {
    "path": "docs/supported_languages.md",
    "chars": 12070,
    "preview": "# Transcription: Supported Language\n\nWLK supports transcription in the following languages:\n\n| ISO Code | Language Name "
  },
  {
    "path": "docs/technical_integration.md",
    "chars": 2317,
    "preview": "# Technical Integration Guide\n\nThis document introduce how to reuse the core components when you do **not** want to ship"
  },
  {
    "path": "docs/troubleshooting.md",
    "chars": 5624,
    "preview": "# Troubleshooting\n\n\n## GPU drivers & cuDNN visibility\n\n### Linux error: `Unable to load libcudnn_ops.so* / cudnnCreateTe"
  },
  {
    "path": "pyproject.toml",
    "chars": 4838,
    "preview": "[build-system]\nrequires = [\"setuptools>=61.0\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[project]\nname = \"whisperlivekit"
  },
  {
    "path": "scripts/alignment_heads_qwen3_asr_0.6B.json",
    "chars": 43608,
    "preview": "{\n  \"model\": \"Qwen/Qwen3-ASR-0.6B\",\n  \"language\": \"English\",\n  \"num_layers\": 28,\n  \"num_heads\": 16,\n  \"num_kv_heads\": 8,"
  },
  {
    "path": "scripts/alignment_heads_qwen3_asr_1.7B.json",
    "chars": 44265,
    "preview": "{\n  \"model\": \"Qwen/Qwen3-ASR-1.7B\",\n  \"language\": \"English\",\n  \"num_layers\": 28,\n  \"num_heads\": 16,\n  \"num_kv_heads\": 8,"
  },
  {
    "path": "scripts/alignment_heads_qwen3_asr_1.7B_v2.json",
    "chars": 42960,
    "preview": "{\n  \"model\": \"Qwen/Qwen3-ASR-1.7B\",\n  \"language\": \"English\",\n  \"num_layers\": 28,\n  \"num_heads\": 16,\n  \"num_kv_heads\": 8,"
  },
  {
    "path": "scripts/convert_hf_whisper.py",
    "chars": 5010,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nConvert a Hugging Face style Whisper checkpoint into a WhisperLiveKit .pt file.\n\nOptionally s"
  },
  {
    "path": "scripts/create_long_samples.py",
    "chars": 4586,
    "preview": "#!/usr/bin/env python3\n\"\"\"Create long benchmark samples (5min+) by concatenating utterances from public datasets.\"\"\"\n\nim"
  },
  {
    "path": "scripts/detect_alignment_heads_qwen3.py",
    "chars": 26099,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nDetect alignment heads in Qwen3-ASR for SimulStreaming-style inference.\n\nQwen3-ASR is a decod"
  },
  {
    "path": "scripts/determine_alignment_heads.py",
    "chars": 8371,
    "preview": "\"\"\"Determine alignment heads for a variants, such as distilled model\"\"\"\nfrom __future__ import annotations\n\nimport argpa"
  },
  {
    "path": "scripts/generate_architecture.py",
    "chars": 10168,
    "preview": "#!/usr/bin/env python3\n\"\"\"Generate the architecture.png diagram for WhisperLiveKit README.\"\"\"\n\nimport matplotlib\nmatplot"
  },
  {
    "path": "scripts/python_support_matrix.py",
    "chars": 17716,
    "preview": "#!/usr/bin/env python3\n\"\"\"Offline Python support matrix runner for WhisperLiveKit.\"\"\"\n\nfrom __future__ import annotation"
  },
  {
    "path": "scripts/run_scatter_benchmark.py",
    "chars": 17422,
    "preview": "#!/usr/bin/env python3\n\"\"\"Run benchmark across all backend x model x policy combos for scatter plot.\n\nTests each configu"
  },
  {
    "path": "scripts/sync_extension.py",
    "chars": 977,
    "preview": "\"\"\"Copy core files from web directory to Chrome extension directory.\"\"\"\n\nimport shutil\nfrom pathlib import Path\n\n\ndef sy"
  },
  {
    "path": "tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/test_pipeline.py",
    "chars": 21140,
    "preview": "\"\"\"End-to-end pipeline tests using real models and real audio.\n\nRun with: pytest tests/test_pipeline.py -v\n\nTests exerci"
  },
  {
    "path": "whisperlivekit/__init__.py",
    "chars": 597,
    "preview": "from .audio_processor import AudioProcessor\nfrom .config import WhisperLiveKitConfig\nfrom .core import TranscriptionEngi"
  },
  {
    "path": "whisperlivekit/audio_processor.py",
    "chars": 34602,
    "preview": "import asyncio\nimport logging\nimport traceback\nfrom time import time\nfrom typing import Any, AsyncGenerator, List, Optio"
  },
  {
    "path": "whisperlivekit/backend_support.py",
    "chars": 1435,
    "preview": "import importlib.util\nimport logging\nimport platform\n\nlogger = logging.getLogger(__name__)\n\n\ndef module_available(module"
  },
  {
    "path": "whisperlivekit/basic_server.py",
    "chars": 13235,
    "preview": "import asyncio\nimport logging\nfrom contextlib import asynccontextmanager\nfrom typing import List, Optional\n\nfrom fastapi"
  },
  {
    "path": "whisperlivekit/benchmark/__init__.py",
    "chars": 1000,
    "preview": "\"\"\"WhisperLiveKit benchmark suite.\n\nComprehensive benchmarking of ASR backends using public datasets,\nrun through the sa"
  },
  {
    "path": "whisperlivekit/benchmark/compat.py",
    "chars": 2990,
    "preview": "\"\"\"Backend detection and language compatibility matrix.\"\"\"\n\nimport logging\nfrom typing import Dict, List, Optional, Set\n"
  },
  {
    "path": "whisperlivekit/benchmark/datasets.py",
    "chars": 17714,
    "preview": "\"\"\"Benchmark audio datasets from public HuggingFace repositories.\n\nDownloads curated samples across languages, noise con"
  },
  {
    "path": "whisperlivekit/benchmark/metrics.py",
    "chars": 8581,
    "preview": "\"\"\"Benchmark result data structures and aggregation.\"\"\"\n\nimport platform\nimport subprocess\nimport time\nfrom dataclasses "
  },
  {
    "path": "whisperlivekit/benchmark/report.py",
    "chars": 5553,
    "preview": "\"\"\"Benchmark report formatting — terminal tables and JSON export.\"\"\"\n\nimport json\nimport sys\nfrom pathlib import Path\nfr"
  },
  {
    "path": "whisperlivekit/benchmark/runner.py",
    "chars": 6401,
    "preview": "\"\"\"Benchmark runner — orchestrates runs through TestHarness.\"\"\"\n\nimport logging\nimport resource\nimport time\nfrom typing "
  },
  {
    "path": "whisperlivekit/cascade_bridge.py",
    "chars": 3641,
    "preview": "\"\"\"\nBridge between WhisperLiveKit STT and IWSLT26 MT pipeline.\n\nConverts streaming ASRToken output from SimulStreaming i"
  },
  {
    "path": "whisperlivekit/cli.py",
    "chars": 63321,
    "preview": "\"\"\"CLI entry point for WhisperLiveKit.\n\nProvides subcommands:\n  wlk serve       — Start the transcription server (defaul"
  },
  {
    "path": "whisperlivekit/config.py",
    "chars": 3572,
    "preview": "\"\"\"Typed configuration for the WhisperLiveKit pipeline.\"\"\"\nimport logging\nfrom dataclasses import dataclass, fields\nfrom"
  },
  {
    "path": "whisperlivekit/core.py",
    "chars": 13740,
    "preview": "import logging\nimport threading\nfrom argparse import Namespace\nfrom dataclasses import asdict\n\nfrom whisperlivekit.confi"
  },
  {
    "path": "whisperlivekit/deepgram_compat.py",
    "chars": 10538,
    "preview": "\"\"\"Deepgram-compatible WebSocket endpoint for WhisperLiveKit.\n\nProvides a /v1/listen endpoint that speaks the Deepgram L"
  },
  {
    "path": "whisperlivekit/diarization/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "whisperlivekit/diarization/diart_backend.py",
    "chars": 11895,
    "preview": "import asyncio\nimport logging\nimport threading\nimport time\nfrom queue import Empty, SimpleQueue\nfrom typing import Any, "
  },
  {
    "path": "whisperlivekit/diarization/sortformer_backend.py",
    "chars": 12928,
    "preview": "import logging\nimport threading\nimport wave\nfrom typing import List, Optional\n\nimport numpy as np\nimport torch\n\nfrom whi"
  },
  {
    "path": "whisperlivekit/diarization/utils.py",
    "chars": 188,
    "preview": "import re\n\n\ndef extract_number(s: str) -> int:\n    \"\"\"Extract the first integer from a string, e.g. 'speaker_2' -> 2.\"\"\""
  },
  {
    "path": "whisperlivekit/diff_protocol.py",
    "chars": 3706,
    "preview": "\"\"\"Diff-based WebSocket output protocol for WhisperLiveKit.\n\nInstead of sending the full FrontData state on every update"
  },
  {
    "path": "whisperlivekit/ffmpeg_manager.py",
    "chars": 6742,
    "preview": "import asyncio\nimport contextlib\nimport logging\nfrom enum import Enum\nfrom typing import Callable, Optional\n\nlogger = lo"
  },
  {
    "path": "whisperlivekit/local_agreement/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "whisperlivekit/local_agreement/backends.py",
    "chars": 10702,
    "preview": "import io\nimport logging\nimport math\nimport sys\nfrom typing import List\n\nimport numpy as np\nimport soundfile as sf\n\nfrom"
  },
  {
    "path": "whisperlivekit/local_agreement/online_asr.py",
    "chars": 18259,
    "preview": "import logging\nimport sys\nfrom typing import List, Optional, Tuple\n\nimport numpy as np\n\nfrom whisperlivekit.timed_object"
  },
  {
    "path": "whisperlivekit/local_agreement/whisper_online.py",
    "chars": 7217,
    "preview": "#!/usr/bin/env python3\nimport logging\nimport platform\nimport time\n\nfrom whisperlivekit.backend_support import faster_bac"
  },
  {
    "path": "whisperlivekit/metrics.py",
    "chars": 5058,
    "preview": "\"\"\"Lightweight ASR evaluation metrics — no external dependencies.\n\nProvides WER (Word Error Rate) computation via word-l"
  },
  {
    "path": "whisperlivekit/metrics_collector.py",
    "chars": 2946,
    "preview": "\"\"\"Lightweight runtime metrics for AudioProcessor sessions.\n\nZero external dependencies. Negligible overhead when not qu"
  },
  {
    "path": "whisperlivekit/model_mapping.py",
    "chars": 794,
    "preview": "\"\"\"Shared MLX model name mapping used by both SimulStreaming and LocalAgreement backends.\"\"\"\n\nMLX_MODEL_MAPPING = {\n    "
  },
  {
    "path": "whisperlivekit/model_paths.py",
    "chars": 7135,
    "preview": "import json\nimport re\nfrom dataclasses import dataclass, field\nfrom pathlib import Path\nfrom typing import List, Optiona"
  },
  {
    "path": "whisperlivekit/parse_args.py",
    "chars": 12228,
    "preview": "\nfrom argparse import ArgumentParser\n\n\ndef parse_args():\n    parser = ArgumentParser(description=\"Whisper FastAPI Online"
  },
  {
    "path": "whisperlivekit/qwen3_asr.py",
    "chars": 10412,
    "preview": "import logging\nimport re\nimport sys\nfrom typing import List, Optional\n\nimport numpy as np\n\nfrom whisperlivekit.local_agr"
  },
  {
    "path": "whisperlivekit/qwen3_mlx_asr.py",
    "chars": 14923,
    "preview": "\"\"\"\nMLX-accelerated Qwen3-ASR backend for WhisperLiveKit.\n\nProvides ``Qwen3MLXASR`` (model holder) and ``Qwen3MLXOnlineP"
  },
  {
    "path": "whisperlivekit/qwen3_mlx_simul.py",
    "chars": 27482,
    "preview": "\"\"\"\nQwen3-ASR SimulStreaming (AlignAtt) on MLX for Apple Silicon.\n\nUses the ``mlx_qwen3_asr`` library for model loading,"
  },
  {
    "path": "whisperlivekit/qwen3_simul.py",
    "chars": 50549,
    "preview": "\"\"\"\nSimulStreaming-style online processor for Qwen3-ASR.\n\nArchitecture overview\n---------------------\nQwen3-ASR is a dec"
  },
  {
    "path": "whisperlivekit/qwen3_simul_kv.py",
    "chars": 30859,
    "preview": "\"\"\"\nQwen3-ASR SimulStreaming with KV cache reuse.\n\nThis is an optimized version of qwen3_simul.py that reuses the KV cac"
  },
  {
    "path": "whisperlivekit/session_asr_proxy.py",
    "chars": 1643,
    "preview": "\"\"\"Per-session ASR proxy for language override.\n\nWraps a shared ASR backend so that each WebSocket session can use a\ndif"
  },
  {
    "path": "whisperlivekit/silero_vad_iterator.py",
    "chars": 11027,
    "preview": "import warnings\nfrom pathlib import Path\n\nimport numpy as np\nimport torch\n\n\"\"\"\nCode is adapted from silero-vad v6: https"
  },
  {
    "path": "whisperlivekit/silero_vad_models/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "whisperlivekit/simul_whisper/__init__.py",
    "chars": 147,
    "preview": "from .backend import SimulStreamingASR, SimulStreamingOnlineProcessor\n\n__all__ = [\n    \"SimulStreamingASR\",\n    \"SimulSt"
  },
  {
    "path": "whisperlivekit/simul_whisper/align_att_base.py",
    "chars": 20814,
    "preview": "\"\"\"Abstract base class for AlignAtt streaming decoders (PyTorch & MLX).\"\"\"\nimport logging\nfrom abc import ABC, abstractm"
  },
  {
    "path": "whisperlivekit/simul_whisper/backend.py",
    "chars": 14648,
    "preview": "import gc\nimport logging\nimport platform\nimport sys\nfrom typing import List, Tuple\n\nimport numpy as np\nimport torch\n\nfro"
  },
  {
    "path": "whisperlivekit/simul_whisper/beam.py",
    "chars": 1207,
    "preview": "from torch import Tensor\n\nfrom whisperlivekit.whisper.decoding import PyTorchInference\n\n\nclass BeamPyTorchInference(PyTo"
  },
  {
    "path": "whisperlivekit/simul_whisper/config.py",
    "chars": 794,
    "preview": "from dataclasses import dataclass, field\nfrom typing import Literal\n\n\n@dataclass\nclass AlignAttConfig():\n    eval_data_p"
  },
  {
    "path": "whisperlivekit/simul_whisper/decoder_state.py",
    "chars": 2849,
    "preview": "from dataclasses import dataclass, field\nfrom typing import Any, Dict, List, Optional, Tuple\n\nimport torch\n\n\n@dataclass\n"
  },
  {
    "path": "whisperlivekit/simul_whisper/eow_detection.py",
    "chars": 2534,
    "preview": "import torch\n\n# code for the end-of-word detection based on the CIF model proposed in Simul-Whisper\n\ndef load_cif(cfg, n"
  },
  {
    "path": "whisperlivekit/simul_whisper/mlx/__init__.py",
    "chars": 286,
    "preview": "from .decoder_state import MLXDecoderState\nfrom .decoders import MLXBeamSearchDecoder, MLXGreedyDecoder, MLXInference\nfr"
  },
  {
    "path": "whisperlivekit/simul_whisper/mlx/decoder_state.py",
    "chars": 2338,
    "preview": "from dataclasses import dataclass, field\nfrom typing import Any, Dict, List, Optional, Tuple\n\nimport mlx.core as mx\nimpo"
  },
  {
    "path": "whisperlivekit/simul_whisper/mlx/decoders.py",
    "chars": 8293,
    "preview": "\"\"\"\nMLX-native token decoders for streaming ASR.\n\"\"\"\nfrom typing import Any, Dict, List, Optional, Tuple\n\nimport mlx.cor"
  },
  {
    "path": "whisperlivekit/simul_whisper/mlx/simul_whisper.py",
    "chars": 16574,
    "preview": "\"\"\"MLX whisper AlignAtt streaming decoder.\"\"\"\nimport logging\nfrom typing import Any, List, Tuple\n\nimport mlx.core as mx\n"
  },
  {
    "path": "whisperlivekit/simul_whisper/mlx_encoder.py",
    "chars": 2687,
    "preview": "import json\nfrom pathlib import Path\n\nimport mlx.core as mx\nimport mlx.nn as nn\nfrom huggingface_hub import snapshot_dow"
  },
  {
    "path": "whisperlivekit/simul_whisper/simul_whisper.py",
    "chars": 17390,
    "preview": "import logging\nimport os\nfrom typing import List\n\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\n\nfrom "
  },
  {
    "path": "whisperlivekit/simul_whisper/token_buffer.py",
    "chars": 3024,
    "preview": "\nimport torch\n\n\nclass TokenBuffer:\n\n    def __init__(self, text=\"\", tokenizer=None, device=None, prefix_token_ids=[]):\n "
  },
  {
    "path": "whisperlivekit/test_client.py",
    "chars": 13641,
    "preview": "\"\"\"Headless test client for WhisperLiveKit.\n\nFeeds audio files to the transcription pipeline via WebSocket\nand collects "
  },
  {
    "path": "whisperlivekit/test_data.py",
    "chars": 11350,
    "preview": "\"\"\"Standard test audio samples for evaluating the WhisperLiveKit pipeline.\n\nDownloads curated samples from public ASR da"
  },
  {
    "path": "whisperlivekit/test_harness.py",
    "chars": 27354,
    "preview": "\"\"\"In-process testing harness for the full WhisperLiveKit pipeline.\n\nWraps AudioProcessor to provide a controllable, obs"
  },
  {
    "path": "whisperlivekit/thread_safety.py",
    "chars": 3843,
    "preview": "\"\"\"\nThread Safety Configuration for WhisperLiveKit\n\nThis module provides thread safety configuration and utilities.\n\nEnv"
  },
  {
    "path": "whisperlivekit/timed_objects.py",
    "chars": 7155,
    "preview": "from dataclasses import dataclass, field\nfrom typing import Any, Dict, List, Optional, Union\n\nPUNCTUATION_MARKS = {'.', "
  },
  {
    "path": "whisperlivekit/tokens_alignment.py",
    "chars": 11164,
    "preview": "from time import time\nfrom typing import Any, List, Optional, Tuple, Union\n\nfrom whisperlivekit.timed_objects import (\n "
  },
  {
    "path": "whisperlivekit/vllm_realtime.py",
    "chars": 14316,
    "preview": "\"\"\"\nvLLM Realtime WebSocket streaming backend for WhisperLiveKit.\n\nConnects to a vLLM server's ``/v1/realtime`` WebSocke"
  },
  {
    "path": "whisperlivekit/voxtral_hf_streaming.py",
    "chars": 22307,
    "preview": "\"\"\"\nVoxtral Mini Realtime streaming backend using HuggingFace Transformers.\n\nUses VoxtralRealtimeForConditionalGeneratio"
  },
  {
    "path": "whisperlivekit/voxtral_mlx/__init__.py",
    "chars": 188,
    "preview": "\"\"\"Pure-MLX Voxtral Realtime backend for WhisperLiveKit.\"\"\"\n\nfrom .loader import load_voxtral_model\nfrom .model import V"
  },
  {
    "path": "whisperlivekit/voxtral_mlx/loader.py",
    "chars": 10818,
    "preview": "\"\"\"\nModel weight loading for the MLX Voxtral Realtime backend.\n\nSupports two on-disk formats:\n  1. **Converted** (``conf"
  },
  {
    "path": "whisperlivekit/voxtral_mlx/model.py",
    "chars": 19552,
    "preview": "\"\"\"\nVoxtral Realtime MLX model — encoder, decoder, adapter, and top-level model.\n\nArchitecture:\n    audio → StreamingEnc"
  },
  {
    "path": "whisperlivekit/voxtral_mlx/spectrogram.py",
    "chars": 6865,
    "preview": "\"\"\"\nMel spectrogram computation for Voxtral Realtime.\n\nProvides both a full-audio function and an incremental streaming "
  },
  {
    "path": "whisperlivekit/voxtral_mlx_asr.py",
    "chars": 25815,
    "preview": "\"\"\"\nPure-MLX Voxtral Realtime ASR backend for WhisperLiveKit.\n\nProvides ``VoxtralMLXASR`` (model holder) and ``VoxtralML"
  },
  {
    "path": "whisperlivekit/warmup.py",
    "chars": 1871,
    "preview": "\nimport logging\n\nlogger = logging.getLogger(__name__)\n\ndef load_file(warmup_file=None, timeout=5):\n    import os\n    imp"
  },
  {
    "path": "whisperlivekit/web/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "whisperlivekit/web/live_transcription.css",
    "chars": 10844,
    "preview": ":root {\n  --bg: #ffffff;\n  --text: #111111;\n  --muted: #666666;\n  --border: #e5e5e5;\n  --chip-bg: rgba(0, 0, 0, 0.04);\n "
  },
  {
    "path": "whisperlivekit/web/live_transcription.html",
    "chars": 2999,
    "preview": "<!DOCTYPE html>\n<html lang=\"en\">\n\n<head>\n    <meta charset=\"UTF-8\" />\n    <meta name=\"viewport\" content=\"width=device-wi"
  },
  {
    "path": "whisperlivekit/web/live_transcription.js",
    "chars": 29595,
    "preview": "const isExtension = typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.getURL;\nif (isExtension) {\n  docum"
  },
  {
    "path": "whisperlivekit/web/pcm_worklet.js",
    "chars": 511,
    "preview": "class PCMForwarder extends AudioWorkletProcessor {\n  process(inputs) {\n    const input = inputs[0];\n    if (input && inp"
  },
  {
    "path": "whisperlivekit/web/recorder_worker.js",
    "chars": 1621,
    "preview": "let sampleRate = 48000;\nlet targetSampleRate = 16000;\n\nself.onmessage = function (e) {\n  switch (e.data.command) {\n    c"
  },
  {
    "path": "whisperlivekit/web/web_interface.py",
    "chars": 5068,
    "preview": "import base64\nimport importlib.resources as resources\nimport logging\n\nlogger = logging.getLogger(__name__)\n\ndef get_web_"
  },
  {
    "path": "whisperlivekit/whisper/__init__.py",
    "chars": 25778,
    "preview": "import hashlib\nimport io\nimport json\nimport os\nimport urllib\nimport warnings\nfrom pathlib import Path\nfrom typing import"
  },
  {
    "path": "whisperlivekit/whisper/__main__.py",
    "chars": 35,
    "preview": "from .transcribe import cli\n\ncli()\n"
  },
  {
    "path": "whisperlivekit/whisper/assets/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "whisperlivekit/whisper/assets/gpt2.tiktoken",
    "chars": 835554,
    "preview": "IQ== 0\nIg== 1\nIw== 2\nJA== 3\nJQ== 4\nJg== 5\nJw== 6\nKA== 7\nKQ== 8\nKg== 9\nKw== 10\nLA== 11\nLQ== 12\nLg== 13\nLw== 14\nMA== 15\nMQ"
  },
  {
    "path": "whisperlivekit/whisper/assets/multilingual.tiktoken",
    "chars": 816730,
    "preview": "IQ== 0\nIg== 1\nIw== 2\nJA== 3\nJQ== 4\nJg== 5\nJw== 6\nKA== 7\nKQ== 8\nKg== 9\nKw== 10\nLA== 11\nLQ== 12\nLg== 13\nLw== 14\nMA== 15\nMQ"
  },
  {
    "path": "whisperlivekit/whisper/audio.py",
    "chars": 4945,
    "preview": "import os\nfrom functools import lru_cache\nfrom subprocess import CalledProcessError, run\nfrom typing import Optional, Un"
  },
  {
    "path": "whisperlivekit/whisper/decoding.py",
    "chars": 31994,
    "preview": "from dataclasses import dataclass, field, replace\nfrom typing import TYPE_CHECKING, Dict, Iterable, List, Optional, Sequ"
  },
  {
    "path": "whisperlivekit/whisper/model.py",
    "chars": 13909,
    "preview": "import base64\nimport gzip\nfrom contextlib import contextmanager\nfrom dataclasses import dataclass\nfrom typing import Dic"
  },
  {
    "path": "whisperlivekit/whisper/normalizers/__init__.py",
    "chars": 130,
    "preview": "from .basic import BasicTextNormalizer as BasicTextNormalizer\nfrom .english import EnglishTextNormalizer as EnglishTextN"
  },
  {
    "path": "whisperlivekit/whisper/normalizers/basic.py",
    "chars": 2047,
    "preview": "import re\nimport unicodedata\n\nimport regex\n\n# non-ASCII letters that are not separated by \"NFKD\" normalization\nADDITIONA"
  },
  {
    "path": "whisperlivekit/whisper/normalizers/english.json",
    "chars": 56128,
    "preview": "{\n    \"accessorise\": \"accessorize\",\n    \"accessorised\": \"accessorized\",\n    \"accessorises\": \"accessorizes\",\n    \"accesso"
  },
  {
    "path": "whisperlivekit/whisper/normalizers/english.py",
    "chars": 20843,
    "preview": "import json\nimport os\nimport re\nfrom fractions import Fraction\nfrom typing import Iterator, List, Match, Optional, Union"
  },
  {
    "path": "whisperlivekit/whisper/timing.py",
    "chars": 12674,
    "preview": "import itertools\nimport subprocess\nimport warnings\nfrom dataclasses import dataclass\nfrom typing import TYPE_CHECKING, L"
  },
  {
    "path": "whisperlivekit/whisper/tokenizer.py",
    "chars": 12529,
    "preview": "import base64\nimport os\nimport string\nfrom dataclasses import dataclass, field\nfrom functools import cached_property, lr"
  },
  {
    "path": "whisperlivekit/whisper/transcribe.py",
    "chars": 30245,
    "preview": "import argparse\nimport os\nimport traceback\nimport warnings\nfrom typing import TYPE_CHECKING, List, Optional, Tuple, Unio"
  },
  {
    "path": "whisperlivekit/whisper/triton_ops.py",
    "chars": 3646,
    "preview": "from functools import lru_cache\n\nimport numpy as np\nimport torch\n\ntry:\n    import triton\n    import triton.language as t"
  },
  {
    "path": "whisperlivekit/whisper/utils.py",
    "chars": 11529,
    "preview": "import json\nimport os\nimport re\nimport sys\nimport zlib\nfrom typing import Callable, List, Optional, TextIO\n\nsystem_encod"
  },
  {
    "path": "whisperlivekit/whisper/val.py",
    "chars": 9158,
    "preview": "\"\"\"\nThe most atomic way to train and inference a GPT in pure, dependency-free Python.\nThis file is the complete algorith"
  },
  {
    "path": "whisperlivekit/whisper/version.py",
    "chars": 25,
    "preview": "__version__ = \"20250625\"\n"
  }
]

// ... and 5 more files (download for full content)

About this extraction

This page contains the full source code of the QuentinFuxa/WhisperLiveKit GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 146 files (3.5 MB), approximately 935.2k tokens, and a symbol index with 1208 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo