Repository: jasonppy/VoiceCraft
Branch: master
Commit: a702dfd2ced6
Files: 47
Total size: 1.2 MB

Directory structure:
gitextract_ru4iadb9/

├── .dockerignore
├── .gitignore
├── Dockerfile
├── LICENSE-CODE
├── LICENSE-MODEL
├── README.md
├── RealEdit.txt
├── cog.yaml
├── config.py
├── data/
│   ├── __init__.py
│   ├── gigaspeech.py
│   ├── phonemize_encodec_encode_hf.py
│   └── tokenizer.py
├── demo/
│   └── temp/
│       ├── 84_121550_000074_000000.txt
│       └── mfa_alignments/
│           ├── 5895_34622_000026_000002.csv
│           └── 84_121550_000074_000000.csv
├── edit_utils.py
├── environment.yml
├── gradio_app.ipynb
├── gradio_app.py
├── gradio_requirements.txt
├── inference_speech_editing.ipynb
├── inference_speech_editing_scale.py
├── inference_tts.ipynb
├── inference_tts_scale.py
├── main.py
├── models/
│   ├── codebooks_patterns.py
│   ├── modules/
│   │   ├── __init__.py
│   │   ├── activation.py
│   │   ├── embedding.py
│   │   ├── sampling.py
│   │   ├── scaling.py
│   │   ├── transformer.py
│   │   └── utils.py
│   └── voicecraft.py
├── predict.py
├── pretrained_models/
│   └── .gitkeep
├── start-jupyter.bat
├── start-jupyter.sh
├── steps/
│   ├── __init__.py
│   ├── optim.py
│   ├── trainer.py
│   └── trainer_utils.py
├── tts_demo.py
├── voicecraft-gradio-colab.ipynb
└── z_scripts/
    ├── e830M.sh
    └── e830M_ft.sh

================================================
FILE CONTENTS
================================================

================================================
FILE: .dockerignore
================================================
# The .dockerignore file excludes files from the container build process.
#
# https://docs.docker.com/engine/reference/builder/#dockerignore-file

# Exclude Git files
.git
.github
.gitignore

# Exclude Python cache files
__pycache__
.mypy_cache
.pytest_cache
.ruff_cache

# Exclude Python virtual environment
/venv


================================================
FILE: .gitignore
================================================
__pycache__/
*.py[cod]
*$py.class
*.egg-info
.pytest_cache
.ipynb_checkpoints

thumbs.db
.DS_Store
.idea
*.log
*.pdf
*.mkv
*.mp4
*.png
*.wav
*.mp3
*.pth
*.th
*.json

*durip*
*rtx*
*l40*
*a40*

src/audiocraft

!/demo/
!/demo/*
/demo/temp/*.txt
!/demo/temp/84_121550_000074_000000.txt
.cog/tmp/*

================================================
FILE: Dockerfile
================================================
FROM jupyter/base-notebook:python-3.9.13

USER root

# Install OS dependencies
RUN apt-get update && apt-get install -y git-core ffmpeg espeak-ng && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Update Conda, create the voicecraft environment, and install dependencies
RUN conda update -y -n base -c conda-forge conda && \
    conda create -y -n voicecraft python=3.9.16 && \
    conda run -n voicecraft conda install -y -c conda-forge montreal-forced-aligner=2.2.17 openfst=1.8.2 kaldi=5.5.1068 && \
    conda run -n voicecraft mfa model download dictionary english_us_arpa && \
    conda run -n voicecraft mfa model download acoustic english_us_arpa && \
    conda run -n voicecraft pip install -e git+https://github.com/facebookresearch/audiocraft.git@c5157b5bf14bf83449c17ea1eeb66c19fb4bc7f0#egg=audiocraft && \
    conda run -n voicecraft pip install xformers==0.0.22 && \
    conda run -n voicecraft pip install torch==2.0.1 && \
    conda run -n voicecraft pip install torchaudio==2.0.2 && \
    conda run -n voicecraft pip install tensorboard==2.16.2 && \
    conda run -n voicecraft pip install phonemizer==3.2.1 && \
    conda run -n voicecraft pip install datasets==2.16.0 && \
    conda run -n voicecraft pip install torchmetrics==0.11.1 && \
    conda run -n voicecraft pip install huggingface_hub==0.22.2
    

# Install the Jupyter kernel
RUN conda install -n voicecraft ipykernel --update-deps --force-reinstall -y && \
    conda run -n voicecraft python -m ipykernel install --name=voicecraft

================================================
FILE: LICENSE-CODE
================================================
Attribution-NonCommercial-ShareAlike 4.0 International

=======================================================================

Creative Commons Corporation ("Creative Commons") is not a law firm and
does not provide legal services or legal advice. Distribution of
Creative Commons public licenses does not create a lawyer-client or
other relationship. Creative Commons makes its licenses and related
information available on an "as-is" basis. Creative Commons gives no
warranties regarding its licenses, any material licensed under their
terms and conditions, or any related information. Creative Commons
disclaims all liability for damages resulting from their use to the
fullest extent possible.

Using Creative Commons Public Licenses

Creative Commons public licenses provide a standard set of terms and
conditions that creators and other rights holders may use to share
original works of authorship and other material subject to copyright
and certain other rights specified in the public license below. The
following considerations are for informational purposes only, are not
exhaustive, and do not form part of our licenses.

     Considerations for licensors: Our public licenses are
     intended for use by those authorized to give the public
     permission to use material in ways otherwise restricted by
     copyright and certain other rights. Our licenses are
     irrevocable. Licensors should read and understand the terms
     and conditions of the license they choose before applying it.
     Licensors should also secure all rights necessary before
     applying our licenses so that the public can reuse the
     material as expected. Licensors should clearly mark any
     material not subject to the license. This includes other CC-
     licensed material, or material used under an exception or
     limitation to copyright. More considerations for licensors:
    wiki.creativecommons.org/Considerations_for_licensors

     Considerations for the public: By using one of our public
     licenses, a licensor grants the public permission to use the
     licensed material under specified terms and conditions. If
     the licensor's permission is not necessary for any reason--for
     example, because of any applicable exception or limitation to
     copyright--then that use is not regulated by the license. Our
     licenses grant only permissions under copyright and certain
     other rights that a licensor has authority to grant. Use of
     the licensed material may still be restricted for other
     reasons, including because others have copyright or other
     rights in the material. A licensor may make special requests,
     such as asking that all changes be marked or described.
     Although not required by our licenses, you are encouraged to
     respect those requests where reasonable. More considerations
     for the public:
    wiki.creativecommons.org/Considerations_for_licensees

=======================================================================

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
Public License

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International Public License
("Public License"). To the extent this Public License may be
interpreted as a contract, You are granted the Licensed Rights in
consideration of Your acceptance of these terms and conditions, and the
Licensor grants You such rights in consideration of benefits the
Licensor receives from making the Licensed Material available under
these terms and conditions.


Section 1 -- Definitions.

  a. Adapted Material means material subject to Copyright and Similar
     Rights that is derived from or based upon the Licensed Material
     and in which the Licensed Material is translated, altered,
     arranged, transformed, or otherwise modified in a manner requiring
     permission under the Copyright and Similar Rights held by the
     Licensor. For purposes of this Public License, where the Licensed
     Material is a musical work, performance, or sound recording,
     Adapted Material is always produced where the Licensed Material is
     synched in timed relation with a moving image.

  b. Adapter's License means the license You apply to Your Copyright
     and Similar Rights in Your contributions to Adapted Material in
     accordance with the terms and conditions of this Public License.

  c. BY-NC-SA Compatible License means a license listed at
     creativecommons.org/compatiblelicenses, approved by Creative
     Commons as essentially the equivalent of this Public License.

  d. Copyright and Similar Rights means copyright and/or similar rights
     closely related to copyright including, without limitation,
     performance, broadcast, sound recording, and Sui Generis Database
     Rights, without regard to how the rights are labeled or
     categorized. For purposes of this Public License, the rights
     specified in Section 2(b)(1)-(2) are not Copyright and Similar
     Rights.

  e. Effective Technological Measures means those measures that, in the
     absence of proper authority, may not be circumvented under laws
     fulfilling obligations under Article 11 of the WIPO Copyright
     Treaty adopted on December 20, 1996, and/or similar international
     agreements.

  f. Exceptions and Limitations means fair use, fair dealing, and/or
     any other exception or limitation to Copyright and Similar Rights
     that applies to Your use of the Licensed Material.

  g. License Elements means the license attributes listed in the name
     of a Creative Commons Public License. The License Elements of this
     Public License are Attribution, NonCommercial, and ShareAlike.

  h. Licensed Material means the artistic or literary work, database,
     or other material to which the Licensor applied this Public
     License.

  i. Licensed Rights means the rights granted to You subject to the
     terms and conditions of this Public License, which are limited to
     all Copyright and Similar Rights that apply to Your use of the
     Licensed Material and that the Licensor has authority to license.

  j. Licensor means the individual(s) or entity(ies) granting rights
     under this Public License.

  k. NonCommercial means not primarily intended for or directed towards
     commercial advantage or monetary compensation. For purposes of
     this Public License, the exchange of the Licensed Material for
     other material subject to Copyright and Similar Rights by digital
     file-sharing or similar means is NonCommercial provided there is
     no payment of monetary compensation in connection with the
     exchange.

  l. Share means to provide material to the public by any means or
     process that requires permission under the Licensed Rights, such
     as reproduction, public display, public performance, distribution,
     dissemination, communication, or importation, and to make material
     available to the public including in ways that members of the
     public may access the material from a place and at a time
     individually chosen by them.

  m. Sui Generis Database Rights means rights other than copyright
     resulting from Directive 96/9/EC of the European Parliament and of
     the Council of 11 March 1996 on the legal protection of databases,
     as amended and/or succeeded, as well as other essentially
     equivalent rights anywhere in the world.

  n. You means the individual or entity exercising the Licensed Rights
     under this Public License. Your has a corresponding meaning.


Section 2 -- Scope.

  a. License grant.

       1. Subject to the terms and conditions of this Public License,
          the Licensor hereby grants You a worldwide, royalty-free,
          non-sublicensable, non-exclusive, irrevocable license to
          exercise the Licensed Rights in the Licensed Material to:

            a. reproduce and Share the Licensed Material, in whole or
               in part, for NonCommercial purposes only; and

            b. produce, reproduce, and Share Adapted Material for
               NonCommercial purposes only.

       2. Exceptions and Limitations. For the avoidance of doubt, where
          Exceptions and Limitations apply to Your use, this Public
          License does not apply, and You do not need to comply with
          its terms and conditions.

       3. Term. The term of this Public License is specified in Section
          6(a).

       4. Media and formats; technical modifications allowed. The
          Licensor authorizes You to exercise the Licensed Rights in
          all media and formats whether now known or hereafter created,
          and to make technical modifications necessary to do so. The
          Licensor waives and/or agrees not to assert any right or
          authority to forbid You from making technical modifications
          necessary to exercise the Licensed Rights, including
          technical modifications necessary to circumvent Effective
          Technological Measures. For purposes of this Public License,
          simply making modifications authorized by this Section 2(a)
          (4) never produces Adapted Material.

       5. Downstream recipients.

            a. Offer from the Licensor -- Licensed Material. Every
               recipient of the Licensed Material automatically
               receives an offer from the Licensor to exercise the
               Licensed Rights under the terms and conditions of this
               Public License.

            b. Additional offer from the Licensor -- Adapted Material.
               Every recipient of Adapted Material from You
               automatically receives an offer from the Licensor to
               exercise the Licensed Rights in the Adapted Material
               under the conditions of the Adapter's License You apply.

            c. No downstream restrictions. You may not offer or impose
               any additional or different terms or conditions on, or
               apply any Effective Technological Measures to, the
               Licensed Material if doing so restricts exercise of the
               Licensed Rights by any recipient of the Licensed
               Material.

       6. No endorsement. Nothing in this Public License constitutes or
          may be construed as permission to assert or imply that You
          are, or that Your use of the Licensed Material is, connected
          with, or sponsored, endorsed, or granted official status by,
          the Licensor or others designated to receive attribution as
          provided in Section 3(a)(1)(A)(i).

  b. Other rights.

       1. Moral rights, such as the right of integrity, are not
          licensed under this Public License, nor are publicity,
          privacy, and/or other similar personality rights; however, to
          the extent possible, the Licensor waives and/or agrees not to
          assert any such rights held by the Licensor to the limited
          extent necessary to allow You to exercise the Licensed
          Rights, but not otherwise.

       2. Patent and trademark rights are not licensed under this
          Public License.

       3. To the extent possible, the Licensor waives any right to
          collect royalties from You for the exercise of the Licensed
          Rights, whether directly or through a collecting society
          under any voluntary or waivable statutory or compulsory
          licensing scheme. In all other cases the Licensor expressly
          reserves any right to collect such royalties, including when
          the Licensed Material is used other than for NonCommercial
          purposes.


Section 3 -- License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the
following conditions.

  a. Attribution.

       1. If You Share the Licensed Material (including in modified
          form), You must:

            a. retain the following if it is supplied by the Licensor
               with the Licensed Material:

                 i. identification of the creator(s) of the Licensed
                    Material and any others designated to receive
                    attribution, in any reasonable manner requested by
                    the Licensor (including by pseudonym if
                    designated);

                ii. a copyright notice;

               iii. a notice that refers to this Public License;

                iv. a notice that refers to the disclaimer of
                    warranties;

                 v. a URI or hyperlink to the Licensed Material to the
                    extent reasonably practicable;

            b. indicate if You modified the Licensed Material and
               retain an indication of any previous modifications; and

            c. indicate the Licensed Material is licensed under this
               Public License, and include the text of, or the URI or
               hyperlink to, this Public License.

       2. You may satisfy the conditions in Section 3(a)(1) in any
          reasonable manner based on the medium, means, and context in
          which You Share the Licensed Material. For example, it may be
          reasonable to satisfy the conditions by providing a URI or
          hyperlink to a resource that includes the required
          information.
       3. If requested by the Licensor, You must remove any of the
          information required by Section 3(a)(1)(A) to the extent
          reasonably practicable.

  b. ShareAlike.

     In addition to the conditions in Section 3(a), if You Share
     Adapted Material You produce, the following conditions also apply.

       1. The Adapter's License You apply must be a Creative Commons
          license with the same License Elements, this version or
          later, or a BY-NC-SA Compatible License.

       2. You must include the text of, or the URI or hyperlink to, the
          Adapter's License You apply. You may satisfy this condition
          in any reasonable manner based on the medium, means, and
          context in which You Share Adapted Material.

       3. You may not offer or impose any additional or different terms
          or conditions on, or apply any Effective Technological
          Measures to, Adapted Material that restrict exercise of the
          rights granted under the Adapter's License You apply.


Section 4 -- Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that
apply to Your use of the Licensed Material:

  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
     to extract, reuse, reproduce, and Share all or a substantial
     portion of the contents of the database for NonCommercial purposes
     only;

  b. if You include all or a substantial portion of the database
     contents in a database in which You have Sui Generis Database
     Rights, then the database in which You have Sui Generis Database
     Rights (but not its individual contents) is Adapted Material,
     including for purposes of Section 3(b); and

  c. You must comply with the conditions in Section 3(a) if You Share
     all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not
replace Your obligations under this Public License where the Licensed
Rights include other Copyright and Similar Rights.


Section 5 -- Disclaimer of Warranties and Limitation of Liability.

  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

  c. The disclaimer of warranties and limitation of liability provided
     above shall be interpreted in a manner that, to the extent
     possible, most closely approximates an absolute disclaimer and
     waiver of all liability.


Section 6 -- Term and Termination.

  a. This Public License applies for the term of the Copyright and
     Similar Rights licensed here. However, if You fail to comply with
     this Public License, then Your rights under this Public License
     terminate automatically.

  b. Where Your right to use the Licensed Material has terminated under
     Section 6(a), it reinstates:

       1. automatically as of the date the violation is cured, provided
          it is cured within 30 days of Your discovery of the
          violation; or

       2. upon express reinstatement by the Licensor.

     For the avoidance of doubt, this Section 6(b) does not affect any
     right the Licensor may have to seek remedies for Your violations
     of this Public License.

  c. For the avoidance of doubt, the Licensor may also offer the
     Licensed Material under separate terms or conditions or stop
     distributing the Licensed Material at any time; however, doing so
     will not terminate this Public License.

  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
     License.


Section 7 -- Other Terms and Conditions.

  a. The Licensor shall not be bound by any additional or different
     terms or conditions communicated by You unless expressly agreed.

  b. Any arrangements, understandings, or agreements regarding the
     Licensed Material not stated herein are separate from and
     independent of the terms and conditions of this Public License.


Section 8 -- Interpretation.

  a. For the avoidance of doubt, this Public License does not, and
     shall not be interpreted to, reduce, limit, restrict, or impose
     conditions on any use of the Licensed Material that could lawfully
     be made without permission under this Public License.

  b. To the extent possible, if any provision of this Public License is
     deemed unenforceable, it shall be automatically reformed to the
     minimum extent necessary to make it enforceable. If the provision
     cannot be reformed, it shall be severed from this Public License
     without affecting the enforceability of the remaining terms and
     conditions.

  c. No term or condition of this Public License will be waived and no
     failure to comply consented to unless expressly agreed to by the
     Licensor.

  d. Nothing in this Public License constitutes or may be interpreted
     as a limitation upon, or waiver of, any privileges and immunities
     that apply to the Licensor or You, including from the legal
     processes of any jurisdiction or authority.

=======================================================================

Creative Commons is not a party to its public
licenses. Notwithstanding, Creative Commons may elect to apply one of
its public licenses to material it publishes and in those instances
will be considered the “Licensor.” The text of the Creative Commons
public licenses is dedicated to the public domain under the CC0 Public
Domain Dedication. Except for the limited purpose of indicating that
material is shared under a Creative Commons public license or as
otherwise permitted by the Creative Commons policies published at
creativecommons.org/policies, Creative Commons does not authorize the
use of the trademark "Creative Commons" or any other trademark or logo
of Creative Commons without its prior written consent including,
without limitation, in connection with any unauthorized modifications
to any of its public licenses or any other arrangements,
understandings, or agreements concerning use of licensed material. For
the avoidance of doubt, this paragraph does not form part of the
public licenses.

Creative Commons may be contacted at creativecommons.org.


================================================
FILE: LICENSE-MODEL
================================================
Coqui Public Model License 1.0.0
https://coqui.ai/cpml.txt

This license allows only non-commercial use of a machine learning model and its outputs.

Acceptance
In order to get any license under these terms, you must agree to them as both strict obligations and conditions to all your licenses.

Licenses
The licensor grants you a copyright license to do everything you might do with the model that would otherwise infringe the licensor's copyright in it, for any non-commercial purpose. The licensor grants you a patent license that covers patent claims the licensor can license, or becomes able to license, that you would infringe by using the model in the form provided by the licensor, for any non-commercial purpose.

Non-commercial Purpose
Non-commercial purposes include any of the following uses of the model or its output, but only so far as you do not receive any direct or indirect payment arising from the use of the model or its output.

Personal use for research, experiment, and testing for the benefit of public knowledge, personal study, private entertainment, hobby projects, amateur pursuits, or religious observance.
Use by commercial or for-profit entities for testing, evaluation, or non-commercial research and development. Use of the model to train other models for commercial use is not a non-commercial purpose.
Use by any charitable organization for charitable purposes, or for testing or evaluation. Use for revenue-generating activity, including projects directly funded by government grants, is not a non-commercial purpose.
Notices
You must ensure that anyone who gets a copy of any part of the model, or any modification of the model, or their output, from you also gets a copy of these terms or the URL for them above.

No Other Rights
These terms do not allow you to sublicense or transfer any of your licenses to anyone else, or prevent the licensor from granting licenses to anyone else. These terms do not imply any other licenses.

Patent Defense
If you make any written claim that the model infringes or contributes to infringement of any patent, your licenses for the model granted under these terms ends immediately. If your company makes such a claim, your patent license ends immediately for work on behalf of your company.

Violations
The first time you are notified in writing that you have violated any of these terms, or done anything with the model or its output that is not covered by your licenses, your licenses can nonetheless continue if you come into full compliance with these terms, and take practical steps to correct past violations, within 30 days of receiving notice. Otherwise, all your licenses end immediately.

No Liability
AS FAR AS THE LAW ALLOWS, THE MODEL AND ITS OUTPUT COME AS IS, WITHOUT ANY WARRANTY OR CONDITION, AND THE LICENSOR WILL NOT BE LIABLE TO YOU FOR ANY DAMAGES ARISING OUT OF THESE TERMS OR THE USE OR NATURE OF THE MODEL OR ITS OUTPUT, UNDER ANY KIND OF LEGAL CLAIM. IF THIS PROVISION IS NOT ENFORCEABLE IN YOUR JURISDICTION, YOUR LICENSES ARE VOID.

Definitions
The licensor is the individual or entity offering these terms, and the model is the model the licensor makes available under these terms, including any documentation or similar information about the model.

You refers to the individual or entity agreeing to these terms.

Your company is any legal entity, sole proprietorship, or other kind of organization that you work for, plus all organizations that have control over, are under the control of, or are under common control with that organization. Control means ownership of substantially all the assets of an entity, or the power to direct its management and policies by vote, contract, or otherwise. Control can be direct or indirect.

Your licenses are all the licenses granted to you under these terms.

Use means anything you do with the model or its output requiring one of your licenses.

================================================
FILE: README.md
================================================
# VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
[![Paper](https://img.shields.io/badge/arXiv-2403.16973-brightgreen.svg?style=flat-square)](https://arxiv.org/pdf/2403.16973.pdf)  [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/pyp1/VoiceCraft_gradio)  [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1IOjpglQyMTO2C3Y94LD9FY0Ocn-RJRg6?usp=sharing)  [![Replicate](https://replicate.com/cjwbw/voicecraft/badge)](https://replicate.com/cjwbw/voicecraft)  [![YouTube demo](https://img.shields.io/youtube/views/eikybOi8iwU)](https://youtu.be/eikybOi8iwU)  [![Demo page](https://img.shields.io/badge/Audio_Samples-blue?logo=Github&style=flat-square)](https://jasonppy.github.io/VoiceCraft_web/)


### TL;DR
VoiceCraft is a token-infilling neural codec language model that achieves state-of-the-art performance on both **speech editing** and **zero-shot text-to-speech (TTS)** on in-the-wild data including audiobooks, internet videos, and podcasts.

To clone or edit an unseen voice, VoiceCraft needs only a few seconds of reference.

## How to run inference
There are four ways (besides running Gradio in Colab):

1. In Google Colab, for more flexible inference beyond the Gradio UI. See [quickstart colab](#quickstart-colab).
2. With docker. See [quickstart docker](#quickstart-docker).
3. Without docker. See [environment setup](#environment-setup). You can also run Gradio locally if you choose this option.
4. As a standalone script that you can easily integrate into other projects. See [quickstart command line](#quickstart-command-line).

When you are inside the docker image or have installed all dependencies, check out [`inference_tts.ipynb`](./inference_tts.ipynb).

If you want to do model development such as training/finetuning, I recommend following [environment setup](#environment-setup) and [training](#training).

## News
:star: 03/15/2025: changing inference sampling from topp=1 to topk=40 massively improves editing and TTS performance.

:star: 04/22/2024: 330M/830M TTS Enhanced Models are up [here](https://huggingface.co/pyp1), load them through [`gradio_app.py`](./gradio_app.py) or [`inference_tts.ipynb`](./inference_tts.ipynb)! Replicate demo is up, major thanks to [@chenxwh](https://github.com/chenxwh)!

:star: 04/11/2024: VoiceCraft Gradio is now available on HuggingFace Spaces [here](https://huggingface.co/spaces/pyp1/VoiceCraft_gradio)! Major thanks to [@zuev-stepan](https://github.com/zuev-stepan), [@Sewlell](https://github.com/Sewlell), [@pgsoar](https://github.com/pgosar), and [@Ph0rk0z](https://github.com/Ph0rk0z).

:star: 04/05/2024: I finetuned giga330M with the TTS objective on gigaspeech and 1/5 of librilight. Weights are [here](https://huggingface.co/pyp1/VoiceCraft/tree/main). Make sure the maximal prompt + generation length is <= 16 seconds (due to our limited compute, we had to drop utterances longer than 16s from the training data). Even stronger models are forthcoming, stay tuned!

:star: 03/28/2024: Model weights for giga330M and giga830M are up on HuggingFace🤗 [here](https://huggingface.co/pyp1/VoiceCraft/tree/main)!

## TODO
- [x] Codebase upload
- [x] Environment setup
- [x] Inference demo for speech editing and TTS
- [x] Training guidance
- [x] RealEdit dataset and training manifest
- [x] Model weights
- [x] Better guidance on training/finetuning
- [x] Colab notebooks
- [x] HuggingFace Spaces demo
- [x] Command line
- [ ] Improve efficiency

## QuickStart Colab

:star: To try out speech editing or TTS Inference with VoiceCraft, the simplest way is using Google Colab.
Instructions to run are on the Colab itself.

1. To try [Speech Editing](https://colab.research.google.com/drive/1FV7EC36dl8UioePY1xXijXTMl7X47kR_?usp=sharing)
2. To try [TTS Inference](https://colab.research.google.com/drive/1lch_6it5-JpXgAQlUTRRI2z2_rk5K67Z?usp=sharing)

## QuickStart Command Line

:star: To use it as a standalone script, check out `tts_demo.py` and `speech_editing_demo.py`.
Be sure to first [set up your environment](#environment-setup).
Without arguments, the scripts run with the standard demo arguments used as examples elsewhere
in this repository. You can use the command line arguments to specify your own input audio,
target transcripts, and inference hyperparameters. Run the help command for more information:
`python3 tts_demo.py -h`

## QuickStart Docker
:star: To try out TTS inference with VoiceCraft, you can also use docker. Thanks to [@ubergarm](https://github.com/ubergarm) and [@jayc88](https://github.com/jay-c88) for making this happen.

Tested on Linux and Windows and should work with any host with docker installed.
```bash
# 1. clone the repo in a directory on a drive with plenty of free space
git clone git@github.com:jasonppy/VoiceCraft.git
cd VoiceCraft

# 2. assumes you have docker installed with nvidia container-toolkit (windows has this built into the driver)
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.13.5/install-guide.html
# sudo apt-get install -y nvidia-container-toolkit-base || yay -Syu nvidia-container-toolkit || echo etc...

# 3. First build the docker image
docker build --tag "voicecraft" .

# 4. Try to start an existing container otherwise create a new one passing in all GPUs
./start-jupyter.sh  # linux
start-jupyter.bat   # windows

# 5. now open a webpage on the host box to the URL shown at the bottom of:
docker logs jupyter

# 6. optionally look inside from another terminal
docker exec -it jupyter /bin/bash
export USER=(your_linux_username_used_above)
export HOME=/home/$USER
sudo apt-get update

# 7. confirm video card(s) are visible inside container
nvidia-smi

# 8. Now in browser, open inference_tts.ipynb and work through one cell at a time
echo GOOD LUCK
```

## Environment setup
```bash
conda create -n voicecraft python=3.9.16
conda activate voicecraft

pip install -e git+https://github.com/facebookresearch/audiocraft.git@c5157b5bf14bf83449c17ea1eeb66c19fb4bc7f0#egg=audiocraft
pip install xformers==0.0.22
pip install torchaudio==2.0.2 torch==2.0.1 # this assumes your system is compatible with CUDA 11.7, otherwise check out https://pytorch.org/get-started/previous-versions/#v201
apt-get install ffmpeg # if you don't already have ffmpeg installed
apt-get install espeak-ng # backend for the phonemizer installed below
pip install tensorboard==2.16.2
pip install phonemizer==3.2.1
pip install datasets==2.16.0
pip install torchmetrics==0.11.1
pip install huggingface_hub==0.22.2
# install MFA for getting forced-alignment, this could take a few minutes
conda install -c conda-forge montreal-forced-aligner=2.2.17 openfst=1.8.2 kaldi=5.5.1068
# install MFA english dictionary and model
mfa model download dictionary english_us_arpa
mfa model download acoustic english_us_arpa
# pip install huggingface_hub
# conda install pocl # above gives a warning about installing pocl, not sure if this is really needed

# to run ipynb
conda install -n voicecraft ipykernel --no-deps --force-reinstall
```

If you encounter version issues when running things, check out [environment.yml](./environment.yml) for the exact matching versions.
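
Before diffing against `environment.yml`, it can help to compare what is installed with the pip pins listed above. The snippet below is only an illustrative sketch and is not part of this repository: `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names, and it covers only the pip-installed packages (not the conda or apt ones).

```python
from importlib import metadata

# Pins copied from the pip commands in the environment setup block above.
PINNED = {
    "torch": "2.0.1",
    "torchaudio": "2.0.2",
    "xformers": "0.0.22",
    "tensorboard": "2.16.2",
    "phonemizer": "3.2.1",
    "datasets": "2.16.0",
    "torchmetrics": "0.11.1",
    "huggingface_hub": "0.22.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (pinned, installed) tuple. `lookup` is injectable for testing."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # A local build tag such as "2.0.1+cu117" still counts as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches().items():
        print(f"{name}: pinned {wanted}, installed {found or 'missing'}")
```

Run it inside the activated `voicecraft` environment; an empty output means every pinned package matches.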

## Inference Examples
Check out [`inference_speech_editing.ipynb`](./inference_speech_editing.ipynb) and [`inference_tts.ipynb`](./inference_tts.ipynb).

## Gradio
### Run in colab

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1IOjpglQyMTO2C3Y94LD9FY0Ocn-RJRg6?usp=sharing)

### Run locally
After environment setup, install the additional dependencies:
```bash
apt-get install -y espeak espeak-data libespeak1 libespeak-dev
apt-get install -y festival*
apt-get install -y build-essential
apt-get install -y flac libasound2-dev libsndfile1-dev vorbis-tools
apt-get install -y libxml2-dev libxslt-dev zlib1g-dev
pip install -r gradio_requirements.txt
```

Run gradio server from terminal or [`gradio_app.ipynb`](./gradio_app.ipynb):
```bash
python gradio_app.py
```
It is ready to use on [default url](http://127.0.0.1:7860).

### How to use it
1. (optionally) Select models
2. Load models
3. Transcribe
4. (optionally) Tweak some parameters
5. Run
6. (optionally) Rerun part-by-part in Long TTS mode

### Some features
Smart transcript: write only what you want to generate

TTS mode: Zero-shot TTS

Edit mode: Speech editing

Long TTS mode: Easy TTS on long texts


## Training
To train a VoiceCraft model, you need to prepare the following parts:
1. utterances and their transcripts
2. encode the utterances into codes using e.g. Encodec
3. convert transcripts into phoneme sequence, and a phoneme set (we named it vocab.txt)
4. manifest (i.e. metadata)

Steps 1, 2, and 3 are handled in [./data/phonemize_encodec_encode_hf.py](./data/phonemize_encodec_encode_hf.py), where
1. Gigaspeech is downloaded through HuggingFace. Note that you need to sign an agreement in order to download the dataset (it needs your auth token)
2. phoneme sequence and encodec codes are also extracted using the script.

An example run:

```bash
conda activate voicecraft
export CUDA_VISIBLE_DEVICES=0
cd ./data
python phonemize_encodec_encode_hf.py \
--dataset_size xs \
--download_to path/to/store_huggingface_downloads \
--save_dir path/to/store_extracted_codes_and_phonemes \
--encodec_model_path path/to/encodec_model \
--mega_batch_size 120 \
--batch_size 32 \
--max_len 30000
```
where encodec_model_path is available [here](https://huggingface.co/pyp1/VoiceCraft). This model is trained on Gigaspeech XL and has 56M parameters and 4 codebooks, each with 2048 codes. Details are described in our [paper](https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf). If you encounter OOM during extraction, try decreasing the batch_size and/or max_len.
The extracted codes, phonemes, and vocab.txt will be stored at `path/to/store_extracted_codes_and_phonemes/${dataset_size}/{encodec_16khz_4codebooks,phonemes,vocab.txt}`.

As for the manifest, please download train.txt and validation.txt from [here](https://huggingface.co/datasets/pyp1/VoiceCraft_RealEdit/tree/main), and put them under `path/to/store_extracted_codes_and_phonemes/manifest/`. Please also download vocab.txt from [here](https://huggingface.co/datasets/pyp1/VoiceCraft_RealEdit/tree/main) if you want to use our pretrained VoiceCraft model (so that the phoneme-to-token mapping is the same).
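
Before launching training, you may want to confirm that everything is where the steps above put it. The following is a minimal sketch, assuming the directory layout exactly as quoted in this README; `missing_training_inputs` is a hypothetical helper, not something the repository provides, and `manifest_dir` can be overridden if your manifests live elsewhere.

```python
from pathlib import Path


def missing_training_inputs(save_dir, dataset_size, manifest_dir=None):
    """Return the expected paths that do not exist yet.

    The layout follows the paths quoted in this README; adjust it if
    your setup differs.
    """
    root = Path(save_dir)
    extracted = root / dataset_size
    manifest = Path(manifest_dir) if manifest_dir else root / "manifest"
    expected = [
        extracted / "encodec_16khz_4codebooks",  # extracted codec codes
        extracted / "phonemes",                  # phoneme sequences
        extracted / "vocab.txt",                 # phoneme set
        manifest / "train.txt",
        manifest / "validation.txt",
    ]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    for path in missing_training_inputs(
        "path/to/store_extracted_codes_and_phonemes", "xs"
    ):
        print(f"missing: {path}")
```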

Now, you are good to start training!

```bash
conda activate voicecraft
cd ./z_scripts
bash e830M.sh
```

It's the same procedure to prepare your own custom dataset. Make sure that if

## Finetuning
As with training, you need to complete steps 1-4. If you finetune a pretrained model, I recommend using AdamW for optimization for better stability. Check out the script `./z_scripts/e830M_ft.sh`.

If your dataset introduces new phonemes (which is very likely) that don't exist in the giga checkpoint, make sure you combine the original phonemes with the phonemes from your data when constructing the vocab. You also need to adjust `--text_vocab_size` and `--text_pad_token` so that the former is greater than or equal to your vocab size, and the latter has the same value as `--text_vocab_size` (i.e. `--text_pad_token` is always the last token). Also, since the text embeddings are now of a different size, make sure you modify the weight-loading part so that it won't crash (you could skip loading `text_embedding`, or load only the existing part and randomly initialize the new part).
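
The vocab merge and the two flags can be sketched as follows. This is an illustration under assumptions, not code from this repository: `merge_phoneme_vocab` and `text_token_args` are hypothetical helpers, the vocab is treated as a plain ordered list of phoneme strings, and the toy phonemes in the usage example are made up.

```python
def merge_phoneme_vocab(original, new_phonemes):
    """Append unseen phonemes to the original vocab, keeping original ids.

    `original` is the ordered phoneme list of the pretrained checkpoint.
    Preserving its order keeps every existing phoneme at the same index.
    """
    merged = list(original)
    seen = set(original)
    for phn in new_phonemes:
        if phn not in seen:
            merged.append(phn)
            seen.add(phn)
    return merged


def text_token_args(vocab, text_vocab_size=None):
    """Pick values for --text_vocab_size and --text_pad_token.

    --text_vocab_size must be >= len(vocab), and --text_pad_token takes
    the same value, as described in the paragraph above.
    """
    size = len(vocab) if text_vocab_size is None else text_vocab_size
    if size < len(vocab):
        raise ValueError(f"text_vocab_size {size} < vocab size {len(vocab)}")
    return {"text_vocab_size": size, "text_pad_token": size}


if __name__ == "__main__":
    original = ["AH", "B", "K"]      # toy stand-in for the giga vocab
    mine = ["K", "ZH", "AH", "NG"]   # phonemes found in a new dataset
    vocab = merge_phoneme_vocab(original, mine)
    print(vocab)                     # ['AH', 'B', 'K', 'ZH', 'NG']
    print(text_token_args(vocab))    # {'text_vocab_size': 5, 'text_pad_token': 5}
```

Appending new phonemes after the original ones, rather than re-sorting, is what makes it possible to load only the existing part of `text_embedding` and initialize the rest.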

## License
The codebase is under CC BY-NC-SA 4.0 ([LICENSE-CODE](./LICENSE-CODE)), and the model weights are under Coqui Public Model License 1.0.0 ([LICENSE-MODEL](./LICENSE-MODEL)). Note that we use some code from other repositories that are under different licenses: `./models/codebooks_patterns.py` is under MIT license; `./models/modules`, `./steps/optim.py`, `data/tokenizer.py` are under Apache License, Version 2.0; the phonemizer we used is under GNU 3.0 License.

## Acknowledgement
We thank Feiteng for his [VALL-E reproduction](https://github.com/lifeiteng/vall-e), and we thank the audiocraft team for open-sourcing [encodec](https://github.com/facebookresearch/audiocraft).

## Citation
```
@article{peng2024voicecraft,
  author    = {Peng, Puyuan and Huang, Po-Yao and Mohamed, Abdelrahman and Harwath, David},
  title     = {VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild},
  journal   = {arXiv},
  year      = {2024},
}
```

## Disclaimer
Any organization or individual is prohibited from using any technology mentioned in this paper to generate or edit someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.



================================================
FILE: RealEdit.txt
================================================
wav_fn	orig_transcript	new_transcript	orig_masked_span	new_masked_span	type
YOU1000000102_S0000137.wav	if i had never dropped out. i would have never dropped in on that calligraphy class and personal computers might not have the wonderful typography that they do.	if i had never dropped out. i would have never stopped by that calligraphy class and personal computers might not have the wonderful typography that they do.	10,12	10,11	substitution
YOU1000000124_S0000174.wav	so people in symbolic era. i mean we all agree that symbols come in and symbols come out, okay.	so people in symbolic era. i mean we all agree that signals go in and symbols come out, okay.	11,12	11,12	substitution
YOU1000000006_S0000016.wav	and then we can actually go through and generate a lead. so we've got this first call to action here and this is sending them to a landing page.	and then we can actually go through and generate a lead. so we've got this first action here and this is sending them to a landing page.	16,17	15,16	deletion
YOU1000000149_S0000172.wav	the dry shampoo honestly like weirdly styles it enough to where i'm like satisfied with it but i do like to add a little bit of hair spray.	the dry shampoo honestly like weirdly styles it enough to where i'm like satisfied with it but sometimes it needs a little extra i do like to add a little bit of hair spray.	16,17	17,22	insertion
YOU1000000102_S0000031.wav	steve also cofounded pixar animation studios. which has revolutionized the film industry in it short history with brilliant use of technology.	steve also cofounded pixar animation studios. which has revolutionized the film industry in it short history with films like toy story that showcase brilliant use of technology.	16,17	17,22	insertion
YOU1000000148_S0000041.wav	so i was actually happy to wear such a heavy dress and to be able to wear proper wool, you know mountain underwear ha ha ha.	so i was actually happy to wear such a rugged coat and boots and to be able to wear proper wool, you know mountain underwear ha ha ha.	9,10	9,12	substitution
YOU1000000153_S0000000.wav	some times i really feel that the world around us continues to be more hectic and more complicated and so many of us are truly craving to find simplicity.	some times i really feel that the world around us continues to be more hectic, more impersonal, and more uncaring and so many of us are truly craving to find simplicity.	14,17	14,19	substitution
YOU1000000163_S0000051.wav	and finally, pressing on the crown opens up the app menu so this is where you can access your third party apps and system settings.	and finally, pressing on the crown opens up the app menu where you can access your third party apps and system settings.	11,13	10,11	deletion
YOU1000000115_S0000057.wav	in the future when the borrower repays the loan plus interest, the asset and the liability disappear and the transaction is settled.	in the future when the borrower repays the lender the loan plus interest, the asset and the liability disappear and the transaction is settled.	7,8	8,9	insertion
YOU1000000103_S0000113.wav	with an election firmly behind us, voters are taking a new measure of joe biden and what they believe he should deliver as president.	with an election firmly behind us, voters are taking a new measure of joe biden and the rest of his administration and reconsidering what they believe he should deliver as president.	15,16	16,22	insertion
YOU1000000019_S0000015.wav	causing a lock up. although this mostly happens when i'm spamming tps in the last layer. so i've had to learn to control my speed to help with overshooting.	causing a lock up. although this mostly happens when i'm tapping the pedal faster than i should be so i've had to learn to control my speed to help with overshooting.	10,15	10,17	substitution
YOU1000000045_S0000205.wav	and said, you know, we need to start a process ah in order to figure out how we can protect the kurds who have been our allies.	and said, you know, we need to start a process ah in order to figure out how we can provide aid to all the groups that have helped us including the kurds who have been our allies.	19	19,29	substitution
YOU1000000155_S0000027.wav	in a case like this, i probably wouldn't spend any more time looking at the deal if i was only interested in the cash flow.	in a case like this, i probably wouldn't spend any more time looking at all the details and the fine print if i was only interested in the cash flow.	14,15	14,20	substitution
YOU1000000118_S0000018.wav	the reason will often comeback to how you created the machine learning dataset. so give yourself time to absorb the lessons.	the reason will often boil down to the quality of the machine learning dataset. so give yourself time to absorb the lessons.	4,8	4,9	substitution
YOU1000000115_S0000104.wav	the total amount of credit in the united states is about fifty trillion dollars and the total amount of money is only about three trillion dollars.	the total amount of credit is about fifty trillion dollars and the total amount of money is only about three trillion dollars.	5,8	4,5	deletion
YOU1000000045_S0000187.wav	so, there's a huge difference. i mean, people can gather, you know, information in, you know, all kinds of different ways.	so, there's a huge difference. i mean, people can earn their living and provide for their family in you know, all kinds of different ways.	9,13	9,17	substitution
YOU1000000101_S0000060.wav	they knew that governments don't control things. a government can't control the economy without controlling people.	they knew that governments don't control money directly. a government can't control the economy without controlling people.	6	6,7	substitution
YOU1000000106_S0000178.wav	okay so my little cousin julia wants to know what did you want to do with your life at age five?	okay so my little cousin julia has kind of a weird question for you. she wants to know what did you want to do with your life at age five?	5,6	6,14	insertion
YOU1000000165_S0000039.wav	and one to watch over the course of the six seasons of girls. after rising to prominence on the series, he promptly began picking up increasingly significant roles.	and one to watch over the course of the six seasons of girls. after his dating scandal, he has lost some significant roles.	14,25	14,20	substitution
YOU1000000181_S0000024.wav	around a large pot of vinegar crowd three men, these aren't any ordinary men in fact they're the three founders of the great asian philosophies.	around a large pot of vinegar crowd three men. they're cooking for the three founders of the great asian philosophies.	8,16	8,11	substitution
YOU1000000116_S0000006.wav	which was no part of his intention and this term invisible hand is famous led by the invisible hand to promote an end.	which was no part of his intention and this term is famous led by the invisible hand to promote an end.	10,11	9,10	deletion
YOU1000000015_S0000044.wav	or press control c or command c to copy that link. you don't wanna come back to parallel's toolbox and then all we need to do is hit paste.	or press control c or command c to copy that paragraph, then go back to the original document and then all we need to do is hit paste.	10,18	10,17	substitution
YOU1000000167_S0000004.wav	you could tell christopher robin had something important to say from the way he clasped his knees tightly and wriggled his toes.	you could tell christopher was really eager to get out of his seat from the way he clasped his knees tightly and wriggled his toes.	4,9	4,12	substitution
YOU1000000119_S0000017.wav	at the end of this course, you will be able to explain the three fundamental characteristic that define the blockchain using bitcoin blockchain.	at the end of this course, you will be able to explain to your friends what blockchain is, and why they should be using bitcoin blockchain.	12,19	12,22	substitution
YOU1000000170_S0000103.wav	it it felt like, at the time like, an incredible milestone. we we needed to post it to use net.	it it felt like, we needed to post it to use net.	4,11	3,4	deletion
YOU1000000167_S0000107.wav	he hadn't expected london to have quite so many legs.	he hadn't expected the new furniture to have quite so many legs.	3	3,5	substitution
YOU1000000183_S0000073.wav	and then when you actually do sit down to learn about some universities look at the programs that are really relevant to you.	and then when you actually do make searches you have to make sure that you find recommendations that are really relevant to you.	6,16	6,16	substitution
YOU1000000120_S0000023.wav	coconut milk some peeled tomatoes i wanna zip this up now.	coconut milk some tomatoes i wanna zip this up now.	3	2,3	deletion
YOU1000000187_S0000089.wav	there is beautiful things in life so when you're suffering just you know its part of the package you know you look at it we're born.	there is beautiful things in life so you should always remember when you're suffering just you know its part of the package you know you look at it we're born.	6,7	7,10	insertion
YOU1000000122_S0000033.wav	we should not be finding objects all of a sudden that are spitting distance away from us that we've been missing all this time.	we should not be finding objects all of a sudden spitting distance away from us that we've been missing all this time.	10,11	9,10	deletion
YOU1000000045_S0000141.wav	because of how you know toxic politics in america has become.	because of how toxic politics in america has become.	3,4	2,3	deletion
YOU1000000185_S0000003.wav	and boys did you pick a great week to tune in. over the past few months, we've been bringing together experts in a number of critical fields.	and boys did you pick a great week to tune in. We've got an amazing episode lined up for you today. over the past few months, we've been bringing together experts in a number of critical fields.	10,11	11,20	insertion
YOU1000000127_S0000064.wav	economic development remains one of the most effective ways to increase the capacity to adapt to climate change.	economic development remains one of the most promising options that we have left on the table to increase the capacity to adapt to climate change.	7,8	7,15	substitution
YOU1000000105_S0000131.wav	and all deservedly so, but we have something for you. in fact, guillermo, bring this in.	and all deservedly so, but we have to show you something that just arrived this morning. in fact, guillermo, bring this in.	7,9	7,15	substitution
YOU1000000122_S0000012.wav	jackie has spent fifteen years searching our solar neighborhood for new neighbors.	jackie has spent the last three decades searching our solar neighborhood for new neighbors.	3,4	3,6	substitution
YOU1000000138_S0000114.wav	he, interestingly so, absolutely no reason to feel any type of remorse, and although he's quite pleasant to me he's pleasant to us, he's a very very dangerous individual.	he, interestingly so, absolutely no reason to feel any type of remorse, and although he might seem like a nice person, be careful, because he's a very very dangerous individual.	14,22	14,23	substitution
YOU1000000027_S0000026.wav	because i'm gonna be doing more live chats here and there and i'm just trying to post more videos on facebook and be more active there.	because i'm gonna be doing more live chats here and there and i'm just trying to post more overall content and more videos on facebook and be more active there.	17,18	18,21	insertion
YOU1000000123_S0000009.wav	i wrote the title of the course many years ago, ah, when i created this course.	i wrote the title when i created this course.	4,10	3,4	deletion
YOU1000000001_S0000037.wav	i'm gonna try to keep my vehicle on the road without going on the lawn over there like i did on the last ah pull in.	i'm gonna try to avoid any unintentional mishaps like last time and keep my vehicle on the road without going on the lawn over there like i did on the last ah pull in.	3,4	4,11	insertion
YOU1000000019_S0000011.wav	but qiyi then went on to post some more photos and explained some more of the subtle differences between the two cubes.	but qiyi then went on to post some more really high quality pictures and videos and explained some more of the subtle differences between the two cubes.	9	9,14	substitution
YOU1000000126_S0000185.wav	when we started coursera, we had no idea that over the next several years, it will blossom to such a large movement.	when we started coursera, we were absolutely confident that with enough hard work it will blossom to such a large movement.	5,13	5,12	substitution
YOU1000000171_S0000052.wav	currently fifteen to twenty countries and ah another.	currently fifteen to seventeen states and ah another.	3,4	3,4	substitution
YOU1000000184_S0000015.wav	we we both had a fair amount of experience in real estate and charlie made his early money in real estate um.	we we both had a fair amount of experience in investing but warren actually made all his early money in real estate um.	10,14	10,15	substitution
YOU1000000111_S0000109.wav	and you gotta make sure you in nobody's area to allow them to knock down a three and you get contact.	and you gotta make sure to allow them to knock down a three and you get contact.	5,8	4,5	deletion
YOU1000000005_S0000035.wav	and then the campaign content i think this one is really key to use as well.	and then the campaign content is super detailed so this one is really key to use as well.	5,6	5,8	substitution
YOU1000000004_S0000033.wav	and you are gonna be looking to copy and pasting some things here now this gives us clear instruction.	and you are gonna be looking to copy things here now this gives us clear instruction.	8,10	7,8	deletion
YOU1000000163_S0000001.wav	and what was really unique about this smartwatch was that it actually came with not one but two displays.	and what was really unique about this smartwatch was that since it can flip open it actually came with not one but two displays.	9,10	10,14	insertion
YOU1000000043_S0000180.wav	ah so let's ignore that and go back to the software. so that is the last thing.	ah so let's ignore that and go back to the beginning. so that is the last thing.	10	10	substitution
YOU1000000101_S0000273.wav	to keep the goldwater crusade on the air, send one, ten, fifty dollars.	to keep the goldwater, send one, ten, fifty dollars.	3,6	2,3	deletion
YOU1000000108_S0000265.wav	reflect it's just a place that i got to go.	reflect it's just a place that we all got to go.	6	6,7	substitution
YOU1000000128_S0000041.wav	that's a question nobody can answer because the future depends on decisions that have not yet been taken.	that's a question nobody can answer with confidence since the future depends on decisions that have not yet been taken.	6	6,8	substitution
YOU1000000191_S0000014.wav	ten standing up and over hundred and ten are chilling out, bellies full of grass.	ten standing up and over hundred and ten are just laying around chilling out, bellies full of grass.	8,9	9,11	insertion
YOU1000000174_S0000066.wav	we have done a lot of work around vaccine planning but also realistically it's not gonna be available to community members right away.	we have done a lot of work but also realistically it's not gonna be available to community members right away.	7,9	6,7	deletion
YOU1000000133_S0000039.wav	when the c e o of blockbuster heard that, he promptly had a kitchen sink delivered to the netflix office, a fairly creative way of declaring war.	when the c e o of blockbuster heard that, he promptly had five hundred pounds of glitter divided into five thousand manilla envelopes delivered to the netflix office, a fairly creative way of declaring war.	12,14	12,22	substitution
YOU1000000118_S0000004.wav	we end with a discussion of two link, of how to do machine learning at scale using python notebooks and server less data processing components.	we end with a discussion of two link, of how to do machine learning at scale using python notebooks and energy efficient data processing components.	20,21	20,21	substitution
YOU1000000007_S0000039.wav	we just come in here and create a custom app okay now we're just gonna give this a name we're gonna say this is the seller leads campaign.	we just come in here and give this a name we're gonna say this is the seller leads campaign.	6,14	5,6	deletion
YOU1000000113_S0000034.wav	that's a bomb and that's a good sign from him. he got fully extended on it. knew it as soon as that ball left.	that's a bomb and that's a good sign from him. he clearly signalled and made the play happen as soon as that ball left.	11,17	11,17	substitution
YOU1000000169_S0000188.wav	they're being trained to detect the early warning signals for severe allergic reactions, epileptic fits and narcolepsy.	they're being trained to detect the early warning signals for epileptic fits and narcolepsy.	10,12	9,10	deletion
YOU1000000155_S0000070.wav	an investor flipping houses at this level might require far less than seventy percent maybe a fifty percent or even lower.	an investor flipping houses at this level might require lower margins than seventy percent maybe a fifty percent or even lower.	9,10	9,10	substitution
YOU1000000119_S0000043.wav	what is a blockchain? blockchain is about enabling peer to peer transaction in a decentralized network.	what is a blockchain? blockchain is about high risk high reward investments in a decentralized network.	7,11	7,11	substitution
YOU1000000139_S0000064.wav	but then queen actually changes its direction attacking h seven point.	but then queen actually intensifies the offensive by attacking h seven point.	4,6	4,7	substitution
YOU1000000037_S0000435.wav	that i think would be fun to auction and that would keep the price down that we could have some fun nobody would get hurt.	that i think that we could have some fun nobody would get hurt.	3,14	2,3	deletion
YOU1000000159_S0000039.wav	the excellent story and it's lovable multidimensional characters, along with the challenging tactical combat are all refined and back for another round with new surprises and new friends until.	the excellent story and it's lovable multidimensional characters, along with the challenging tactical combat and deep background simulation are all refined and back for another round with new surprises and new friends until.	13,14	14,17	insertion
YOU1000000110_S0000046.wav	argentina's trophy and it's a fifth world crown.	argentina's trophy and victory is a fifth world crown.	3	3,4	substitution
YOU1000000180_S0000044.wav	our role of taking comment, and, and, and offering response and then making informed decisions on how it's going to impact those in the market place.	our role of taking comment, working with stakeholders and customers to research the root of problems, offering response, and then making informed decisions on how it's going to impact those in the market place.	5,9	5,17	substitution
YOU1000000016_S0000006.wav	so that means you can easily create one livestream and push it out to multiple live platforms.	so that means once you know your subject and your target audience, you can easily create one livestream and push it out to multiple live platforms.	2,3	3,11	insertion
YOU1000000137_S0000397.wav	but the renaissance broke their monopoly on knowledge, one of the most important bastions of the church.	but the renaissance broke their monopoly on knowledge, with it's free movement of research and endless scientific inquiry, one of the most important bastions of the church.	7,8	8,17	insertion
YOU1000000101_S0000078.wav	every responsible farmer and farm organization has repeatedly asked the government to free the farm economy.	every responsible farmer and farm organization has repeatedly asked the state government to free the farm economy.	9,10	10	insertion
YOU1000000045_S0000118.wav	ah in fact, we're in an interesting period now where the country is gearing up for impeachment.	ah in fact, we're in an unprecedented political situation now where the country is gearing up for impeachment.	6,7	6,8	substitution
YOU1000000185_S0000136.wav	nobody's been in the office you know for over three months now, and yet our work ah is going on pretty much full speed.	nobody's been in the office you know for over three months now, and yet we are pushing onward pretty much full speed.	14,19	14,17	substitution
YOU1000000108_S0000070.wav	you know what like comedy central was a hot place to be when i showed up there.	you know what like after all these years my childhood home was a completely different place when i showed up there.	4,11	4,15	substitution
YOU1000000141_S0000085.wav	your daughter wants to take ballet classes and she needs shoes and some lessons. your son wants to play sports, he needs cleats and some gear.	your daughter wants to take advanced calculus classes. your son wants to play sports, he needs cleats and some gear.	5,13	5,7	substitution
YOU1000000153_S0000027.wav	really good sized water tank here is well.	really good sized piece of land here is well.	3,4	3,5	substitution
YOU1000000108_S0000206.wav	manipulating! it sounds like somebody's trying to put young dave in a compromising position.	manipulating! it sounds like somebody's in a compromising position.	5,9	4,5	deletion
YOU1000000101_S0000132.wav	yet anytime you and i question the schemes of the dogooders, were denounced as being against their humanitarian goals. they say we're always against things, we're never for anything.	yet anytime you and i question the schemes of the dogooders or dare to dig into any of their motives, were denounced as being against their humanitarian goals. they say we're always against things, we're never for anything.	9,10	10,18	insertion
YOU1000000117_S0000291.wav	but one of the things you can do to be nice to yourself is to remember what science suggests about the kinds of things that can improve your wellbeing.	but one of the things you can do to get better and be physically, mentally and emotionally healthy is to remember what science suggests about the kinds of things that can improve your wellbeing.	9,12	9,17	substitution
YOU1000000117_S0000077.wav	and the specific kind of meditation is what's known as loving kindness meditation or matter.	and the specific kind of ancient yogic meditation exercise is what's known as loving kindness meditation or matter.	5	5,8	substitution
YOU1000000117_S0000231.wav	ah basically i have a rule now that after eight p m, i put my phone away, i just put it on silent.	ah basically i put my phone away, i just put it on silent.	3,12	2,3	deletion
YOU1000000192_S0000168.wav	just out of gas there, ah she'll be right.	just out of food and water there, ah she'll be right.	3	3,5	substitution
YOU1000000183_S0000080.wav	and then after that what happens next well let's listen in.	and then after that what happens afterwards is very exciting well let's listen in.	6	6,9	substitution
YOU1000000101_S0000130.wav	she wanted the divorce to get an eightydollar raise. she's eligible for three hundred and thirty dollars a month in the aid to dependent children program.	she wanted the divorce to get an eightydollar raise. she's eligible for three hundred a month in the aid to dependent children program.	14,16	13,14	deletion
YOU1000000106_S0000171.wav	besides your phone and wallet what's a couple must have purse items?	besides your phone, can I have purse items?	2,8	2,4	substitution
YOU1000000043_S0000141.wav	you just wanna like sent people to ah different pieces of content on on social media.	you just wanna like trash people on social media.	4,12	4,5	substitution
YOU1000000186_S0000063.wav	and i wouldn't be able to do it except for the the lock the visibility the resources that came from that first career.	and i wouldn't be able to do it except for the resources that came from that first career.	11,15	10,11	deletion
YOU1000000102_S0000129.wav	it was beautiful, historical, artistically subtle, in a way that science can't capture and i found it fascinating.	it was beautiful, historical, arcane, a glitchy looking relic from the fifteen hundreds, artistically subtle, in a way that science can't capture and i found it fascinating.	3,4	4,12	insertion
YOU1000000123_S0000100.wav	if you make a lot of money in finance, it's a game. you enjoyed it. now give most of it away, that's, that's going to be a theme.	if you make a lot of money in finance, it's a game. you enjoyed it. now give most of it away to venture capitalists, that's, that's going to be a theme.	19,20	20,22	insertion
YOU1000000173_S0000047.wav	when the army needs to handle air defense on the move, the avenger is the goto weapon.	when the army needs to intimidate a land's rightful owners, the angry nationalistic holler is the goto weapon.	5,12	5,13	substitution
YOU1000000119_S0000040.wav	that has opened up a whole world of possibilities beyond simple currency transfer.	that has opened up a whole world of pyramid schemes under the guise simple currency transfer.	8,9	8,12	substitution
YOU1000000043_S0000025.wav	so in bonus number three i cover what good niches are when it comes to affiliate marketing and how to decide which one to pick for yourself.	so in bonus number three i cover what good niches are when it comes to affiliate marketing and many layouts to clickbait people into buying useless things, and how to pick for yourself.	18,22	18,28	substitution
YOU1000000023_S0000047.wav	because we can include so many other characters if we just expand the definitions to any sword wielder, who's a little spicy.|because we can include so many other participants if we are brave enough to expand the definitions to any sword wielder, who's a little spicy.	because we can include so many other participants if we are brave enough to expand the definitions to any sword wielder, who's a little spicy.|because we can include so many other participants if we are brave enough to expand the definitions to any blade wielder, who's a little spicy.	7,10|16	7,13|19	substitution|substitution
YOU1000000103_S0000018.wav	tonight we'll be looking at president biden's first day in office, i'll talk with americans who did and did not vote for him what do they expect now.|tonight we'll be showcasing our new mascot, Edward the Egg! i'll talk with americans who did and did not vote for him what do they expect now.	tonight we'll be showcasing our new mascot, Edward the Egg! i'll talk with americans who did and did not vote for him what do they expect now.|tonight we'll be showcasing our new mascot, Edward the Egg! i'll talk with americans who'll tell him what do they expect now.	3,10|15,21	3,9|14,15	substitution|substitution
YOU1000000123_S0000094.wav	it can't be more easily solved, we need, we need all these people. that's why i take some pride in this course in being connected to the real world.|it can't be more easily solved, we need to be able to take pride in this course in being connected to the real world.	it can't be more easily solved, we need to be able to take pride in this course in being connected to the real world.|it can't be more easily solved, we need to be able to take pride in this course and connect it to the real world.	7,17|22,24	7,12|17,19	substitution|substitution
YOU1000000117_S0000269.wav	she has a few ideas in mind but it's hard to find ways in everyday life when you are in isolation to do this more.|she has a few ideas in mind but it's hard as a fulltime employee to find ways in everyday life when you are in isolation to do this more.	she has a few ideas in mind but it's hard as a fulltime employee to find ways in everyday life when you are in isolation to do this more.|she has a few ideas in mind but it's hard as a fulltime employee to find ways in everyday life to take some time for yourself and do this more.	9,10|16,21	10,13|20,26	insertion|substitution
YOU1000000153_S0000099.wav	this is just so cozy up here, and having that skylight is just lovely isn't it.|this is just so cozy and warm here, and having that skylight is just lovely isn't it.	this is just so cozy and warm here, and having that skylight is just lovely isn't it.|this is just so cozy and warm here, isn't it.	5|7,13	5,6|7,8	substitution|deletion
YOU1000000123_S0000045.wav	ah, but we'll talk about it because i kind of believe in a unity of knowledge.|ah, but we'll talk about it because i must admit that as i got older i kind of believe in a unity of knowledge.	ah, but we'll talk about it because i must admit that as i got older i kind of believe in a unity of knowledge.|ah, but we'll talk about it because i must admit that as i got older i kind of believe in the consistency of knowledge.	7,8|12,13	8,15|20,21	insertion|substitution
YOU1000000117_S0000193.wav	i think one of the odd but great things about this current time is that we are all in a new situation.|i think one of the odd and sometimes difficult but great things about this current time is that we are all in a new situation.	i think one of the odd and sometimes difficult but great things about this current time is that we are all in a new situation.|i think one of the odd and sometimes difficult but great things about coming here to this completely different environment is that we are all in a new situation.	5,6|10,12	6,8|13,19	insertion|substitution
YOU1000000171_S0000102.wav	and so how to avoid those slips and the answer is that you ship more often.|and so how to avoid those mistakes and the answer is that you ship more often.	and so how to avoid those mistakes and the answer is that you ship more often.|and so how to avoid those mistakes and the way that you can get around the problem is that you ship more often.	6|9	6|9,16	substitution|substitution
YOU1000000127_S0000082.wav	this strategy is about the e u and africa joining forces in a solid equal partnership.|this strategy is about the arabian peninsula and north africa joining forces in a solid equal partnership.	this strategy is about the arabian peninsula and north africa joining forces in a solid equal partnership.|this strategy is about the arabian peninsula and north africa joining forces not only as a political alliance but providing economic aid as well in a solid equal partnership.	5,7|10,11	5,8|12,23	substitution|insertion
YOU1000000124_S0000157.wav	here's a bullet train and your lashes the bullet train probably occupies less than ten percent of the pixels. the building in the background is much bigger.|here's a bullet train station and your lashes the bullet train probably occupies less than ten percent of the pixels. the building in the background is much bigger.	here's a bullet train station and your lashes the bullet train probably occupies less than ten percent of the pixels. the building in the background is much bigger.|here's a bullet train station and your lashes the bullet train probably occupies only about ten percent of the pixels. the building in the background is much bigger.	3,4|12,13	4|13,14	insertion|substitution
YOU1000000103_S0000157.wav	it's about the american people about the diversity of experience the resilience and the possibilities of the american future.|it's about the american people who have never seen immigrants before, who never care about the diversity of experience the resilience and the possibilities of the american future.	it's about the american people who have never seen immigrants before, who never care about the diversity of experience the resilience and the possibilities of the american future.|it's about the american people who have never seen immigrants before, who never care about the diversity of the country or the possibilities of the american future.	4,5|9,12	5,13|18,20	insertion|substitution
5849_50962_000025_000001.wav	"Here she comes!" called the crowd presently, as the black speck far out, and the strain on the cord, showed the buoy was coming back.	"Here she comes!" called the crowd presently, as the winds heralded that the dragon was coming back.	9,21	9,13	substitution
1701_141760_000050_000000.wav	"He is a very, very nice, honest, and pleasant fellow," answered Boris.	"He is a very, very nice, honest, but not pleasant fellow," answered Boris.	7	7,8	substitution
5536_43359_000021_000005.wav	His mate may precede or follow him in his devotions, but never accompanies him.	His mate may precede him in his devotions, but never accompanies him.	4,5	3,4	deletion
8297_275154_000008_000004.wav	And yet he spoke roughly; he looked like an angry man brought to bay.	And yet he spoke roughly; he looked like an man brought to bay.	9	8,9	deletion
5536_43359_000017_000000.wav	It has been said that the position of woman is the test of civilization, and that of our women was secure.	It has been said that the position of our women was secure.	8,16	7,8	deletion
4570_56594_000008_000000.wav	Then there was nothing said again for some time.	Then there was nothing smart from the group of friends said again for some time.	3,4	4,9	insertion
5543_27761_000077_000003.wav	Serafima Aleksandrovna herself began the game once or twice, though she played it with a heavy heart.	Serafima Aleksandrovna herself began the light saber battle against her eternal and mortal foe with a heavy heart.	5,12	5,13	substitution
8288_274162_000086_000000.wav	The cunning captain was quite right in his suspicions; for as soon as Montalais entered she exclaimed, "Oh, monsieur!"	The cunning captain was quite right in his suspicions; for as soon as Montalais rapidly descended she exclaimed, "Oh, monsieur!"	14	14,15	substitution
6841_88291_000009_000006.wav	Then one or the other threw off the rope. Homer rode away, coiling the rope as he went.	Then one or the other threw off the robe. Homer rode away, coiling the rope as he went.	8	8	substitution
8297_275156_000024_000000.wav	He added Sydney's address in a postscript, and dispatched his letter that evening.	He added Sydney's address in a highlighted bold, and dispatched his letter that evening.	6	6,7	substitution
6123_59150_000007_000001.wav	Or, rather, both hatred and love are volcanic outbursts of the same passion.	Or, rather, both hatred and love are the same passion.	7,9	6,7	deletion
116_288048_000019_000004.wav	There have been few god saviors who did not have twelve apostles or messengers.	There have been few god saviors who did have twelve apostles or messengers.	8	7,8	deletion
3000_15664_000026_000004.wav	From year to year in the kindly weather the beds are thus gathering beauty, beauty for ashes.	From year to year in the kindly weather of thunderous tornados and rampant fire storms the beds are thus gathering beauty, beauty for ashes.	7,8	8,14	insertion
6313_76958_000032_000000.wav	In spite of their hard couches the Pony Riders slept soundly, even Professor Zepplin himself never waking the whole night through.	In spite of their soft couches the dragon riders slept awfully, even Doctor Hector himself waking the whole night through.	4,15	4,14	substitution
2506_11278_000011_000000.wav	We are three sisters, from seventeen to twenty two.	We are six hundred siblings, from negative seventeen to twenty two.	2,4	2,6	substitution
700_122867_000012_000001.wav	But please, Marilla, go away and don't look at me.	But please, Marilla, come closer but don't look at me.	3,5	3,5	substitution
3660_172182_000013_000005.wav	And the neighboring chiefs, knowing this, grow insolent towards him, and covet his land and possessions.	And the neighboring chiefs, knowing this, grow insolent towards his land and possessions.	9,11	8,9	deletion
6123_59186_000015_000001.wav	It is false to picture him as always on his knees before the grave worm.	It is false to picture him as before the grave worm.	7,10	6,7	deletion
2803_154328_000034_000002.wav	It was evidently the cue of both sides to be silent.	It was evidently the cue of both to close their mouths and to be silent.	7	7,11	substitution
8297_275156_000008_000001.wav	It was not the newspaper which he had bought at the station.	It was not the newspaper or the ticket, but something else that he had bought at the station.	4,5	5,10	insertion
6313_66129_000013_000001.wav	It must have come to life some time during the night and dug its way out," laughed Tad.	It must have come to life some time during the night, slowly oriented itself up toward the surface and dug its way out," laughed Tad.	9,10	10,16	insertion
3663_172528_000026_000005.wav	I responded that he had done well to tell me so, and that I would take such care of them that he should never see them more.	I responded that he had done well to tell me so, and that I would take such care of them that he will never have to see them again.	22,26	22,28	substitution
8173_294714_000023_000004.wav	There is a serious necessity for his getting out of prison.	There is a serious necessity for his release from prison.	7,9	7,8	substitution
6841_88294_000031_000002.wav	On the middle of his back knelt my one armed friend. And that sharp hook was caught neatly under the point of the Mexican's jaw.	On the middle of his back knelt my one armed friend. And that sharp hook was caught neatly under the point of the man's jaw.	23	23	substitution
7976_110523_000012_000000.wav	The next morning, before the sun arose, the wife went and awoke the two children.	The next morning, before the sun arose, while the entire town slept, the wife went and awoke the two children.	6,7	7,11	insertion
1993_147965_000006_000003.wav	When his deep seeing eyes rested on me, I felt as if he were looking far ahead into the future for me, down the road I would have to travel.	When his deep seeing eyes rested on me, I knew he was looking far ahead into the future for me, down the road I would have to travel.	9,13	9,11	substitution
7697_105815_000023_000006.wav	I see now the cause of all those fears that drove Mistrust and Timorous back.	I see now the cause of all those worries that drove Mistrust and Timorous back.	8	8	substitution
116_288046_000004_000007.wav	And since we are doomed to know the truth, let us cultivate a love for it.	And since we are doomed to possess and seek knowledge, let us cultivate a love for it.	6,8	6,9	substitution
3000_15664_000035_000003.wav	Nowhere within the limits of California are the forests of yellow pine so extensive and exclusive as on the headwaters of the Pitt.	Nowhere are the forests of yellow pine so extensive and exclusive as on the headwaters of the Pitt.	1,5	0,1	deletion
174_84280_000004_000003.wav	Nevertheless she was not all my life, nor the form of all my life.	Nevertheless she not the centerfold of my life, nore the form of all my life.	2,7	2,8	substitution
5694_64025_000004_000006.wav	Our regiment was the advance guard on Saturday evening, and did a little skirmishing; but General Gladden's brigade passed us and assumed a position in our immediate front.	Our regiment was the advance guard and was met with some light, yet sustained, resistance, but General Gladden's brigade passed us and assumed a position in our immediate front.	6,13	6,14	substitution
3853_163249_000137_000000.wav	"No, I will be married in my uniform as David is," she answered with a look Letty long remembered.	"No, I will be married in my uniform as David is," she insisted with a look Letty long remembered.	12	12	substitution
6267_53049_000048_000004.wav	I never knew what had become of Penelope.	I never knew what had happened to Penelope.	5,6	5,6	substitution
2035_147961_000013_000003.wav	At last he was shut off by a coughing fit which fairly choked him.	At last he emerged and water in his lungs fairly choked him.	3,10	3,8	substitution
4831_18525_000037_000001.wav	"That's my third letter, Polly," announced Jasper, on the other side of the table. "Now, I am going to begin on Joel's."	"That's my third letter, Polly," announced Jasper, from behind the table. "Now, I am going to begin on Joel's."	7,11	7,8	substitution
6241_61946_000034_000000.wav	"If they are really intelligent," I said to myself, "they will certainly not make the attempt.	"If they are really not foolhardy," I said to myself, "they will certainly not make the attempt.	4	4,5	substitution
1630_96099_000031_000002.wav	I had no fear of him, not till the very last, when he played me this evil turn.	I had no fear of him, not till I saw him conspire to harm me, when he played me this evil turn.	8,10	8,14	substitution
2428_83699_000043_000001.wav	There be a lot of luggage. He do say he's come to stay with you.	There be a lot of luggage in the trunk. He do say he's come to stay with you.	5	5,8	substitution
5895_34615_000016_000001.wav	Is there a providence of demons as well as of God? We put the question without answering it.	Is there a providence of demons that exists to serve good men and make them evil? We put the question without answering it.	6,10	6,15	substitution
2428_83699_000044_000002.wav	We've lost the key of the cellar, and there's nothing out, except water, and I don't think you'd care for that.	We've lost the key of the cellar, and there's nothing except water, and I don't think you'd care for that.	10	9,10	deletion
3660_6517_000056_000004.wav	Williams had to confess he was beaten and must draw fires.	Williams had to take an art class and must draw fires.	3,6	3,6	substitution
7601_291468_000005_000001.wav	It was observed by a great projector of inland lock navigation, that rivers, lakes, and oceans were only formed to feed canals.	It was observed by the captain of the ship that rivers, lakes, and oceans were only formed to feed canals.	4,10	4,8	substitution
3663_172528_000010_000006.wav	Accordingly she allowed me twice to take as much as I could of the water, so that in good earnest I swallowed more than a flask full.	Accordingly she allowed me once to take as much food as I wanted, and twice to take as much as I could of the water, so that in good earnest I swallowed more than a flask full.	3,4	4,13	insertion
700_122866_000021_000004.wav	You ought to cultivate your imagination, you know. Miss Stacy says so.	You ought to always be polite, you know. Miss Stacy says so.	3,5	3,5	substitution
84_121550_000074_000000.wav	But when I had approached so near to them The common object, which the sense deceives, Lost not by distance any of its marks,	But when I saw the mirage of the lake in the distance, which the sense deceives, Lost not by distance any of its marks,	3,11	3,11	substitution
4323_13259_000009_000002.wav	It was true that the victory was won by a very meager majority.	It was true that the victory was won by a majority.	10,11	9,10	deletion
8254_115543_000021_000002.wav	"It is a rare sight now a days to see one of these white cobras."	"It is a rare sight in this country to see one of these white cobras."	5,7	5,7	substitution
4153_61735_000020_000001.wav	It was a glance of inquiry, ending in a look of chagrin, with some muttered phrases that rendered it more emphatic.	It was a look of disgust followed by a curled lip, with some muttered phrases that rendered it more emphatic.	3,11	3,10	substitution
1255_138279_000011_000000.wav	The shape went slowly along, but without much exertion, for the snow, though sudden, was not as yet more than two inches deep.	The shape went slowly along, for the snow, though sudden, was not as yet more than two inches deep.	5,8	4,5	deletion
5694_64025_000022_000014.wav	The rope, however, was stronger than the mule's "no," and he was finally prevailed upon by the strength of the rope to cross the creek.	The rope, however, was stronger than the mule's "no," and he was finally prevailed upon to cross the creek.	15,20	14,15	deletion
2277_149896_000025_000002.wav	He pulled out his key and tried to insert it, but another key was on the inside.	He pulled out his key and tried to fit it into the lock, only to discover that another key was on the inside.	8,10	8,16	substitution
6345_64257_000012_000002.wav	Thus was she borne away captive of her dead, neither willing nor unwilling, of life and death equally careless.	Thus was she borne away of life and death equally careless.	4,11	3,4	deletion
1993_147965_000003_000000.wav	At about four o'clock a visitor appeared: mr Shimerda, wearing his rabbit skin cap and collar, and new mittens his wife had knitted.	At about four o'clock a visitor appeared: we were shocked to see our reclusive neighbor out and about, wearing his rabbit skin cap and collar, and new mittens his wife had knitted.	7,8	7,17	substitution
2803_154320_000005_000012.wav	john came up to him and said, "Your Lordship is looking out for land?"	john came up to him and said, "I see that you're looking out for land?"	7,9	7,10	substitution
700_122868_000033_000004.wav	That scene of two years before flashed back into her recollection as vividly as if it had taken place yesterday.	That scene of two years before flashed before her eyes as vividly as if it had taken place yesterday.	7,10	7,9	substitution
2902_9008_000005_000007.wav	If the gods have deserted their oracles, they have not deserted the souls who aspire to them.	If the gods have deserted their oracles, they have not as of yet fully deserted the souls who aspire to them.	9,10	10,13	insertion
5895_34629_000011_000000.wav	Ursus had made his arrangements with the tavern keeper, Master Nicless, who, owing to his respect for the law, would not admit the wolf without charging him extra.	Ursus had made his arrangements with the tavern keeper, Master Nicless, who, owing to his disdain for the law, would not admit the wolf without charging him extra.	15	15	substitution
1650_157641_000035_000000.wav	Kingsley's devotion to smoke seems to have surprised Tennyson, who was no light smoker himself.	Kingsley's devotion to smoke surprised Tennyson, who was no light smoker himself.	4,6	3,4	deletion
2035_147961_000017_000003.wav	At midnight the parents of the bride said good bye to her and blessed her.	At midnight the parents of the groom said good bye to her and blessed her.	6	6	substitution
6313_76958_000004_000000.wav	In a few moments the sound of singing was borne to the ears of the campers.	In a few moments singing was borne to the ears of the campers.	4,6	3,4	deletion
2902_9008_000014_000002.wav	Strange! that men should be content to grovel, and be men, when they might rise to the rank of gods!	Strange! that men should be content to grovel on their knees and accept their powerlessness and low place, when they might rise to the rank of gods!	7,10	7,17	substitution
4570_56594_000014_000000.wav	"Yours is a great beef country, I believe," says the old gentleman.	"Yours is a great chicken farm, I believe," says the old gentleman.	4,5	4,5	substitution
8288_274162_000089_000000.wav	"How very fortunate that is; he was looking for you, too."	"How very fortunate that is; he was just here looking for you, too."	6,7	7,8	insertion
1630_73710_000016_000004.wav	I know it must be more than a week; I know that that prospect was only held out by your affection.	I know it must be more than a week; I know that that prospect was introduced several days ago, waiting to be considered by your affection.	15,17	15,22	substitution
4323_18416_000044_000000.wav	"Yes, yes, of course; but you are too young to judge of such things," said the old gentleman decidedly, "as the giving away of property and all that."	"Yes, yes, of course; but you are too young to have much knowledge of such things," said the old gentleman decidedly, "as the giving away of property and all that."	10	10,12	substitution
1462_170142_000040_000001.wav	Bartley leaned his head in his hands and spoke through his teeth.	Bartley leaned his head in his hands and spoke softly through his teeth.	8,9	9	insertion
84_121550_000126_000000.wav	To the left hand I turned with that reliance With which the little child runs to his mother, When he has fear, or when he is afflicted,	To the left hand I turned with that reliance With which the little child runs to his mother, When he fears anything that he sees around him, or when he is afflicted,	20,21	20,26	substitution
2035_147960_000003_000004.wav	We might get some puppies, or owl eggs, or snake skins.	We might get several colorful gemstones, or owl eggs, or snake skins.	3,4	3,5	substitution
6345_93302_000075_000009.wav	He loved her with all his heart, and he, also, had what she had never suspected in him, the literary sense.	He loved her with all his heart, and he, also, had what she had always hoped to be in him, the literary sense.	14,15	14,17	substitution
1686_142278_000007_000002.wav	Margaret could not bear the sight of the suspense, which was even more distressing to her father than to herself.	Margaret could not bear the sight of herself.	7,18	6,7	deletion
2277_149897_000021_000001.wav	He tried to get the interest of things about him, but it was not to be.	He tried to get the interest of things he saw around him, but it was not to be.	8	8,10	substitution
6123_59150_000010_000003.wav	The man was not a thief; he was an honest man, in fact, and by a peasant's standard by no means poor.	The man was not a thief; he was an honest man, in fact, and by no means poor.	15,18	14,15	deletion
6267_65525_000003_000000.wav	"I want to run over and see how mrs Brixby is this evening, Siddy, and you must take care of the baby till I get back."	"I want to run over and see how all the invited house guests have been liking this evening, and you must take care of the baby till I get back."	8,13	8,17	substitution
7850_281318_000008_000000.wav	So, in a great company, they came fluttering, hopping, twittering up to the elm tree where Mother Magpie nestled comfortably in her new house.	So, in a great company, they came fluttering, hopping, twittering up to the elm tree where the bird leader nestled comfortably in her new house.	16,17	16,18	substitution
5694_64038_000008_000003.wav	The soldiers were in good spirits, but it was the spirit of innocence and peace, not war and victory.	The soldiers were in good spirits, but not war and victory.	7,14	6,7	deletion
6467_97061_000022_000000.wav	"Having made himself invisible, he entered without difficulty the apartment of the princess, and was astonished and enraged on finding her lying in your arms."	"Having made himself invisible, he entered without the princess, and was astonished and enraged on finding her lying in your arms."	7,10	6,7	deletion
8254_84205_000031_000002.wav	I should be running fast and dodging in and out among the rocks and trees.	I should be running fast and dodging up and down and in and out among the rocks and trees.	6,7	7,10	insertion
2803_154328_000083_000003.wav	He was now the object of their anxiety, and whose absence was a black shadow between them and their happiness.	He was now the object of their incessant admiration, and whose absence was a black shadow between them and their happiness.	7	7,8	substitution
7850_286674_000006_000003.wav	Of course they breathed water like their neighbors, the fishes and the Tadpoles.	Of course they breathed water like the fishes and the Tadpoles.	6,7	5,6	deletion
4570_102353_000014_000006.wav	After some further discussion of the question, the visitors withdrew, dissatisfied with the result of the interview.	After some further discussion of the question, the exhausted organizing committee for the occasion are still dissatisfied with the result of the interview.	8,9	8,15	substitution
1988_24833_000028_000000.wav	I get the pillows comfortably arranged on the floor, with a big bottle of soda and a bag of popcorn within easy reach.	I get the pillows scattered wildly on the floor, with a big bottle of soda and a bag of popcorn within easy reach.	4,5	4,5	substitution
1919_142785_000003_000002.wav	In a short time, boil up the vinegar again, add pepper and ginger in the above proportion, and instantly cover them up.	In a short time, boil up the water, add pepper and ginger in the above proportion, and instantly cover them up.	7,8	7	substitution
2506_11278_000007_000001.wav	The Right Honourable was the son of a nobleman, and practised on an old lady.	The Right Honourable was the son of a nobleman of the oldest sort, and practised on an old lady.	8,9	9,12	insertion
2803_161169_000011_000019.wav	What do you think of that from the coal tar.	What do you think of the coal tar.	5,6	4,5	deletion
3536_8226_000019_000000.wav	"It's love for her as has done it then," said Bozzle, shaking his head.	"It's love for her as has been done before then," said Bozzle, shaking his head.	6,7	6,8	substitution
4831_25894_000013_000000.wav	But see, then, it is cold in the streets; the wind bites, and the snow freezes one's fingers.	But see, then, it is cold in the streets; the hail scares all of the critters, and the snow freezes one's fingers.	10,11	10,15	substitution
7697_245715_000006_000002.wav	Therefore, in the state of innocence, children would not have been deprived of the use of their limbs.|Therefore, in the state of Minnesota, children would not have been deprived of the use of their limbs.	Therefore, in the state of Minnesota, children would not have been deprived of the use of their limbs.|Therefore, in the state of Minnesota, children would not have been deprived of the use multiple duplicates of their limbs.	5	5	substitution|substitution
6295_64301_000013_000007.wav	It cried aloud that eternity was very long, and like a great palace without a quiet room.|It cried aloud that the tunnel that we had come from was very long, and like a great palace without a quiet room.	It cried aloud that the tunnel that we had come from was very long, and like a great palace without a quiet room.|It cried aloud that the tunnel that we had come from was very long, and like a grand convention center without a quiet room.	4|11,12	4,10|17,19	substitution|substitution
2412_153954_000009_000001.wav	When I had shown them what I did with it, they were astonished but not displeased, and seemed to like the smell.|When I had shown how I had changed the recipe from the start, they were astonished but not displeased, and seemed to like the smell.	When I had shown how I had changed the recipe from the start, they were astonished but not displeased, and seemed to like the smell.|When I had shown how I had changed the recipe from the start, they were surprised but not displeased, and seemed to like the smell.	3,4|12	4,6|15	insertion|substitution
1462_170138_000006_000001.wav	When they entered the stage box on the left the first act was well under way, the scene being the interior of a cabin in the south of Ireland.|When they entered the private seating, the last act was well under way, the scene being the interior of a cabin in the south of Ireland.	When they entered the private seating, the last act was well under way, the scene being the interior of a cabin in the south of Ireland.|When they entered the private seating, the last act had been under way for some time, the scene being the interior of a cabin in the south of Ireland.	4,10|12,15	4,7|9,15	substitution|substitution
2412_153954_000007_000003.wav	In fact, one of them was plainly very much out of health, and coughed violently from time to time in spite of manifest efforts to suppress it.|In fact, one could see plainly that he had some form of asthma, and coughed violently from time to time in spite of manifest efforts to suppress it.	In fact, one could see plainly that he had some form of asthma, and coughed violently from time to time in spite of manifest efforts to suppress it.|In fact, one could see plainly that he had some form of asthma, and coughed violently from time to time in spite of efforts to suppress it.	3,11|22	3,12|22,23	substitution|deletion
84_121550_000147_000000.wav	Therefore my answer is with greater care, That he may hear me who is weeping yonder, So that the sin and dole be of one measure.|Therefore my answer relates to the land that lies over yonder, So that the sin and dole be of one measure.	Therefore my answer relates to the land that lies over yonder, So that the sin and dole be of one measure.|Therefore my answer relates to the land that lies over yonder, So that joy and despair is of one measure.	3,14|18,22	3,9|13,16	substitution|substitution
3170_137482_000032_000000.wav	An hour later, two noblemen, friends of the senator, came in, one a few minutes after the other.|An hour later, two noblemen carrying great swords came in, one a few minutes after the other.	An hour later, two noblemen carrying great swords came in, one a few minutes after the other.|An hour later, two noblemen carrying great swords came in, one parrying a deadly strike with his sword, one lunging after the other.	4,8|12,14	4,7|11,19	substitution|substitution
1630_102884_000006_000000.wav	In the old order the king was given to understand that he was the freest individual in the world.|In the old order the king was to understand that he was the freest individual in the world.	In the old order the king was to understand that he was the freest individual in the world.|In the old order the king was to understand that he was the freest in the world.	7|15	6,7|13,14	deletion|deletion
8173_294714_000033_000000.wav	"Promise that you won't ask me to borrow money of you for mr Van Brandt," she rejoined, "and I will accept your help gratefully."|"Promise that you won't ask me to borrow any money from the bank for the bail of you for mr Van Brandt," she rejoined, "and I will accept your help gratefully."	"Promise that you won't ask me to borrow any money from the bank for the bail of you for mr Van Brandt," she rejoined, "and I will accept your help gratefully."|"Promise that you won't ask me to borrow any money from the bank for the bail of you for mr Van Brandt," she rejoined, "and I accept your help gratefully."	7,8|19	8,14|25,26	insertion|deletion
8297_275156_000023_000003.wav	Shall I say that she may expect an early visit from you, when I see her to morrow?|Shall I say that she certainly may expect an early visit from you, when I see her to morrow?	Shall I say that she certainly may expect an early visit from you, when I see her to morrow?|Shall I say that she certainly may expect an early visit from you, when my maid carries my message requesting that I see her to morrow?	4,5|12,13	5|14,20	insertion|insertion
1686_142278_000039_000002.wav	I think I could do anything but that: the idea of her distress turns me sick with dread.|I think I could do anything else: the idea of her distress turns me sick with dread.	I think I could do anything else: the idea of her distress turns me sick with dread.|I think I could do anything else: the idea turns me sick with dread.	6,7|10,12	6|8,9	substitution|deletion
5338_284437_000013_000000.wav	"Come!" commanded the woman who led the party; "you three must follow me to the presence of Tourmaline.|"Come!" commanded the old wizard who led the party; "you three must follow me to the presence of Tourmaline.	"Come!" commanded the old wizard who led the party; "you three must follow me to the presence of Tourmaline.|"Come!" commanded the old wizard who led the party; "you three must quickly make your way to the presence of Tourmaline.	3|11,12	3,4|12,15	substitution|substitution
2086_149220_000027_000001.wav	"As to his character, we need not discuss its points; they have already been settled by a competent tribunal, or one which called itself competent.|"As to his points; they have already been settled by a competent tribunal, or one which called itself competent.	"As to his points; they have already been settled by a competent tribunal, or one which called itself competent.|"As to his points; they have already been settled by a competent law firm, or one which called itself competent.	3,8|18	2,3|12,13	deletion|substitution
1919_142785_000047_000001.wav	It grows somewhat like the lily of the valley, but its height is about three feet.|It grows somewhat like the sunflower or the lily of the valley, but its height is about three feet.	It grows somewhat like the sunflower or the lily of the valley, but its height is about three feet.|It grows somewhat like the sunflower or the lily of the valley, but its height is just over four feet and width is about three feet.	4,5|12,13	5,7|16,22	insertion|insertion
show_2t9Kk4FHmiEkjNPJctidN6-7yurNieQHNgkfAk9eE4uCy.wav	Yeah, I'll tell you one guy wouldn't want to fish against.	Yeah, I'll tell you one guy to fish against.	6,7	5,6	deletion
show_2tTA2xYpcS5YuTIXzXakTu-72nwjiYkDKGYImtEsqN5KA.wav	A number of family photographs and we couldn't identify the people in the photographs.	A number of family photographs and we absolutely couldn't recognize the individuals in those photographs.	7,12	7,13	substitution
show_2c04iZbAAIYmZrTIRgggNc-5kMxWwMd2NvZkSyiFZaULP.wav	And now we have to fundamentally change this and change that we have the best economy in the world and the best decade we've ever had in human history.	And now we have to fundamentally change this and ensure that we maintain the best economy in the world and the best decade we've ever had in human history.	9,12	9,12	substitution
show_2CfS1shsOSeK8SjwiEV8du-22Vump0EG42cvL1I9JqRBV.wav	And there are a lot of places in the movie where they could have just slipped it in just a little bit just to confirm that it happened.	And there are actually multiple moments throughout the film where they could have just slipped it in just a little bit just to confirm that it happened.	3,9	3,8	substitution
show_2T4Ue1V9k0S4uiTgUkPKEZ-1373jRsEGJsUlJrxsVnYWz.wav	And you were developing it and you were working with I would assume School administrators.	And you were developing it and you were also working closely with I would assume School administrators.	8	8,10	substitution
show_2cYRReFdJFlfB2BULrbqfM-55Z6AspULdusRxBQysUp9e.wav	I would not know what it mean means to latch onto a body, to be a single cell within that body.	I would not know how to explain what it actually means to latch onto a body, to be a single cell within that body.	3,4	4,6	insertion
show_2CLetGT20MFsHqfeBN3fYl-4f1fxmGJW9MrfGxD7pCFvC.wav	It went out and it worked and it's scaled and ah it mine's great.	It went out and it scaled and it worked and ah it mine's great.	5,8	5,8	substitution
show_2cSm7FgVuDH3IbmSlWkZzH-0hFwD7TSXO5c69BPwJzOfx.wav	We would just be open and willing to adopt whatever child God brought to her life.	We would just be excited to welcome whatever child God brought into her life.	4,13	4,11	substitution
show_2T0wGFb5714hwSBVJgOXny-6s3OXLpYBaeweAF7RJL8kT.wav	Well, we played China which we lost two times already that really helped us maybe if they weren't there maybe because I've done that because I heard their voices and lifted our Spirits.	Well, we played China twice and we lost in both games that really helped us maybe if they weren't there maybe because I've done that because I heard their voices and lifted our Spirits.	4,9	4,10	substitution
show_1TuEAft9VZR2lIZ1G2EceZ-11ZkFljNCjwftz1SIt7HFU.wav	Ended at improving air quality within its borders.	Ended at significantly improving air quality within its borders.	1,2	2	insertion
show_2Tq9eBynjdfY1BasX45Krb-2mlnB6AXQTdUIXevl9cVlc.wav	Your 99 cents a month will go a long way to improving my podcast.	Your 99 cents a month will go a long way in terms of improving podcast.	9,10	10	insertion
show_2c2IJzenX6Q6gJxc2aGRf8-5eUimgKIjnroT9AZIRQr7p.wav	Sort of famous spots to travel and hike and I've been to Lake Louise and I took Lake Louise and and that photo ended up in National Geographic online.	Sort of famous spots to travel and hike and I've been to Lake Louise and that photo ended up in National Geographic online.	15,20	14,15	deletion
show_2cARopVSXsbrWNyt0qEfWf-6hOLbfEH7wqMmi16SiIRSm.wav	Secondary sleeves, pants, socks, gloves, and shoe color edit.	Secondary sleeves, pants, socks, gloves, hat, tie, and shoe color edit.	4,5	5,6	insertion
show_28r3cKdOurjFZECEslXgC2-4jGHOLOoOdqe4LIf8TDUHP.wav	Come full circle back to talking about something like that.	Come full circle back to visiting some exotic new places like that.	5,7	5,9	substitution
show_2t5hhjQSp2hYEutTsCpEwF-1jLJLlnxbhrsjyu4nwBKgr.wav	The reason I said I was 8 is because nothing in my brain told me to use water or to even remove my underwear.	The reason I said I was 8 is because nothing in the instruction manual told me to use water or to even remove my underwear.	11,12	11,13	substitution
show_2c2IJzenX6Q6gJxc2aGRf8-4cWYuGIkcM95UsvHUEpJK4.wav	why we keep going back just because we want to be able to document these places while there aren't so many people visiting.	why we keep going back just because we want to be able to document so many people visiting.	14,18	13,14	deletion
show_1Cwk6m9lXuEd2rilGhWiGr-6QZBZLHGD3DCpZDETpjodI.wav	No to the chemical pollution, air pollution, and the destruction of the environment caused by factories and the manufacturing industry.	No to the chemical pollution, air pollution, no to the killing of plants and wildlife and the destruction of the environment caused by factories and the manufacturing industry.	6,7	7,14	insertion
show_2c2IJzenX6Q6gJxc2aGRf8-5sNE1N7WKOd53y40RUJOyD.wav	really want to push with my channel like photography isn't some serious thing like that	really want to push with my channel like because you know fishing isn't some serious thing like that	8	8,11	substitution
show_1CdMgzPowibFyvgH7hnPZJ-1eO9qoY5JAcN2nNP2675tl.wav	Positive way to improve your leadership and to improve the atmosphere within your team and the culture of your company.	Positive way to improve your commitment to raising awareness of environmentalist causes within your team and the culture of your company.	5,10	5,11	substitution
show_2T23esVRXBfFb5vigvG7A5-2ndcCw02nZHgp0WKFQ8lHe.wav	It's like you kind of like the best way I can describe it is like you kind of as you navigate your way through a creative field.	It's like you kind of like the best way I can describe it is that you need to always remember your personal vision as you navigate your way through a creative field.	13,17	13,22	substitution
show_28gAb6BYOPQTAwtd6JivzK-5gbhBB6vzrxSApOcmcVTs5.wav	want you guys to be able to feel comfortable being vulnerable with us.	want you guys to be able to feel comfortable with being honest and truthful with us.	9,10	9,13	substitution
show_2cDEjUoE1xIZqEMHdy2iLg-3BhVKbLFPasrUOxLgZNUbd.wav	that schedule is one per week and it will probably be like a Wednesday night thing because I plan on doing one to two videos per week.	that schedule is one per week and you will start to see a lot more content arriving because I plan on doing one to two videos per week.	7,15	7,16	substitution
show_2CLetGT20MFsHqfeBN3fYl-0lZ9jnYVgXx7HmKUcdTnJO.wav	hedge a little bit in a traditional Market since and whether its buying a bunch of gpus and then reselling them for 2x the price ah that happened.	hedge a little bit in a traditional Market by buying up a whole lot of single family homes and then reselling them for 2x the price ah that happened.	8,16	8,17	substitution
show_2TPvj8tyUhY2UHOzU9kyu4-6lnZyS5yzd3S4Vaqw5TrHy.wav	Okay, so then he moves on he says so I understand now before we look at everything in more detail.	Okay, so then he moves on he says so I understand now before inspecting everything in more detail.	13,15	13	substitution
show_2cH1Sf7Tg3TiDdGpD3oLiR-0opjgwiSz3AWOOoE49L9pi.wav	The Patriots will just skiing blocking but their backs and tight ends, maybe tighten the formations a little bit.	The Patriots will just focus on their wide receivers and tight ends, maybe tighten the formations a little bit.	4,8	4,8	substitution
show_28hFGrNqCyS73hMP94FALm-3NRcwegtutZLbw1YC2DhhM.wav	The GP did not recommend talking therapies.	The GP did not suggest talking therapies.	4	4	substitution
show_2CJ6f4oLCccT3fsUaWAk9k-3fVgo6u94DJHpK7uP1Qb7V.wav	And, like comment subscribe give me feedback give me feedback.	And, like comment subscribe give me your thoughts and any feedback.	6,8	6,9	substitution
show_2cNIhBNwWmamJs75G3tMxY-4UyKjZff8srwoG8A71ql2K.wav	feel safe to get naked emotionally and mentally to share how they overcome their pain and suffering and how they grew into these bright lights.	feel safe to get so incredibly overwhelmed emotionally and mentally to share how they overcome their pain and suffering and how they grew into these bright lights.	4	4,6	substitution
show_2cNIhBNwWmamJs75G3tMxY-1xe6sJ3hUH8wCHnPxCjbwu.wav	And so I looked at it as very much like an organic problem and I said okay my brain needs this.	And so I looked at it as very much like a complex issue and I said okay my brain needs this.	10,12	10,12	substitution
show_28qfNqUaAXdF3TcEGapJ1d-7uzzKzrT6ggIoqv9gRPz7E.wav	But yeah, I was I was never the best student first year college was actually really really good and I even had a full time job at a time.	But yeah, I would say that I was never the best student in class, but first year college was actually really really good and I even had a full time job at a time.	3,9	3,14	substitution
show_1tl5wg2z0fzjWR18MHKARa-3auJGSBu9ERKSjw44eKkhj.wav	It was because partly was because I had such an Early Peek.	It was because at least partly because I had such an Early Peek.	3,4	3,5	substitution
show_1ChaMDlb8CNR7Bta8ZxODC-6gI5xAKjYcPiQ2cANcnG9q.wav	And if we don't make it a priority the distractions will get us one of the things I love so much about Jesus is if you go back and look at the gospels, he was so focused.	And if we don't make it a priority the distractions will get us one of the things I admire about Jesus is if you go back and look at the gospels, he was so focused.	18,20	18	substitution
show_2chnqxY9vGUWIxn4JvvRpZ-5WanEbFbdssEwK77TcezxC.wav	really supportive friends and firefighters in the fire service who've inspired me to go out there and do this.	really supportive colleagues, friends, and family and they inspired me to go out there and do this.	2,9	2,7	substitution
show_1CHJvc14dYPq0IsX5T0YAP-2FKEiw2NrWejqlw9IkaB1X.wav	Within I'd say within like half a week things changed in my house the energy changed in my house the relationship with my wife started to change and I was like does the magic started to happen?	Within I'd say within like half a week things changed in my house the relationship with my wife started to change and I was like does the magic started to happen?	14,19	13,14	deletion
show_2cyslpwM45TtVfznjVlCnL-4ECn8gmQeSRJJmM3eHg4TF.wav	His music worked better when he did live action TV shows for suspense and humanistic reactions to like scenes and intense situations.	His music worked better when he did live action TV shows for suspense and emotional responses to like scenes and intense situations.	14,15	14,15	substitution
show_2TXTWqxZkLHF6k4cd5F8XN-1N8w7dUDxxqtRErZObJrbG.wav	Helping me find my identity and high school, which I also was fortunate enough to find this amazing woman.	Helping me find my high school, which I also was fortunate enough to find this amazing woman.	4,5	3,4	deletion
show_2Cz04p7U4u2lLSofHLYIeH-0KUpyjDzatq7f6TWjWgAdf.wav	passion and you're not gonna see those results and it's gonna stop you from actually making it get into it because you love it.	passion and you're not gonna like them and it's gonna stop you from actually making it get into it because you love it.	5,7	5,6	substitution
show_2TjptLx9uQUaHhp6YB8jhW-0YIfNwpxL2ztlaSsmBTCKL.wav	Another thing is anything inside the parentheses turns the opposite so negative becomes positive and a positive becomes negative.	Another thing is that in this equation anything inside the parentheses turns the opposite so negative becomes positive and a positive becomes negative.	2,3	3,6	insertion
show_2Chp07kvTN1qDImtrnXm4O-7mdEKKOHue6R5MLo283EQm.wav	hope hope you got a better feel of organic versus paid look definitely keep an eye out for upcoming episodes because I'm a dive a lot more deep into the paid social world.	hope hope you got a better feel of how instagram stars fund their extravagant vacations definitely keep an eye out for upcoming episodes because I'm a dive a lot more deep into the paid social world.	8,11	8,14	substitution
show_2ctsjdVxkuzqftlC9TJASy-6iadzuoEBJ9AOLXaXPmagP.wav	So if you've been following my story, you will remember that I said earlier in this podcast that the Grammy nominations came out.	So if you've been following my story, you will remember that I said earlier that this week we had super exciting stuff to talk about because Grammy nominations came out.	14,18	14,25	substitution
show_1cPkxhnrYWUvCzd0uXMKwo-1auCDHN3NrKq4Bn0OrE0lM.wav	freedom is made with a key.	freedom is made by effort not with a key.	2,3	3,5	insertion
show_2CZeMpXywYmWy53SV2kWEm-2Ic0xbN3defufBYR46ooEi.wav	So for more craziness now that French was conquered we have to join forces to Great Britain.	So for more craziness now that French was conquered by the Germans, we have to join forces to Great Britain.	8,9	9,11	insertion
show_2CyWjLhTGlpGHeSpJOvxj2-6Hvd5G0lyzP62VYPmP1jQj.wav	It was one of those things, you know, you have project sometimes you start on the bottom and sometimes you start at the top.	It was one of those things, you know, some people just start on the bottom and sometimes you start at the top.	8,12	8,10	substitution
show_2cNgsFoVxaxZkUnVU3ehQu-0mgpNxV3cnsvy7RXtF9OHv.wav	Twenty years later it became 20 thousand times worse.	Twenty years later it became thousand times worse.	5	4,5	deletion
show_2czbki8aNirvUjlYcO3I1t-7kdfTr9l9Egod1iFPzIqkK.wav	As a body and some individuals on ways and means we were speaking about possibly just passing the sales price disclosure on residential property.	As a body and some individuals on ways and means we were speaking about possibly just banning them on residential property.	16,20	16,17	substitution
show_2cFZSZNdkxKdiTEE7yrAMB-06KUOjkKFLQgxfnC59GMtT.wav	Tyler also introduces the lack of closure that will bother him increasingly throughout the album's front half.	Tyler also introduces in great detail its outcome on the album as a whole, the lack of closure that will bother him increasingly throughout the album's front half.	2,3	3,13	insertion
show_2C2dO6pWL4cPzOJ2Bu7QRA-3I34aJLdGXgCEuY6rd90Tm.wav	And then the other matchup is Seattle visits Philadelphia now Philadelphia has a worse record, but because they were not able to win their division.	And then the other matchup is Seattle visits Philadelphia now Seattle has the better record, but because they were not able to win their division.	10,13	10,13	substitution
show_2c2IJzenX6Q6gJxc2aGRf8-0z0etCBM2PrHOLc9gxc25E.wav	More of a base and infrastructure to tell those stories rather than doing it out of a out of a tent with solar power.	More of a base and infrastructure to fight these battles instead of out of a tent with solar power.	7,16	7,11	substitution
show_1CxjAV2kY4pypL256BmRQ6-531d49VH5hVVMzXfKhsMFf.wav	So the fourth episode May the fourth be with you.	So the fourth place winner will chat with you.	3,7	3,6	substitution
show_2TI9Upbk0gXEdxsYTPzB9W-4x4pLTV2ZiA98TIIWkK0JH.wav	This year and and the word started to spread in the lacrosse community and just in general.	This year and and the word started to spread rapidly and without any sense of control in the slightest in the lacrosse community and just in general.	8,9	9,18	insertion
show_2cOpF3UhdxdvlZZzyOVPHt-3GXDpd2ZHpy9YTMkLAjuZq.wav	Joey Scott and Richie all sat together at the creek.	Joey Scott and Albert enjoyed fishing trout at the creek.	3,6	3,6	substitution
show_2TXUkJOq3oBEn2ROormwza-6ogJCqF2Ya7qZ2L23AAoPd.wav	Are equipped with sensors to monitor the proximity of surroundings and can open in even the tightest parking spaces.	Are equipped with sensors to detect collisions and can open in even the tightest parking spaces.	5,9	5,6	substitution
show_28OttmVaPSfuB6e4cqX0yu-2tB2ldmTxblsCsC8QH7jnm.wav	And we're at this point.	And we're all extremely excited at this point.	1,2	2,4	insertion
show_28DAnHzOfbUoRkpj5OMqVI-58LUphKYwegjJP2ZQFjmUH.wav	You know best to study every day after your classes, especially because you know, you're still in the mood you're still in that flow.	You know best to study when well rested and well fed and when you're still in that flow.	5,18	5,12	substitution
show_2T8QRK60cWaPQflfo6Wuc4-4oTO10xL7hQQS2fuXBy1d7.wav	In the pursuit of lightness minimal stress ultimate fulfillment.	In the pursuit of calm serenity an escape from stress ultimate fulfillment.	4,5	4,8	substitution
show_2CN1XNYxo4NFClfUajCtSM-1PrlYFjZzosbVg4BvKzbLJ.wav	and and you know that she has been around for a few decades now longer and it has such a story that it it it was established in basketball.	and and you know that she has been around for as long as many young people today have been watching and it has such a story that it it it was established in basketball.	10,14	10,19	substitution
show_2t94ceh3K4qorKbKXJw7NV-5r2eumfkrw5Ym2WYexqEpK.wav	Hello and welcome to the first of hopefully many you cannot ingest podcast.	Hello and welcome to the second of hopefully many you cannot ingest podcast.	5	5	substitution
show_2c3EDnMjSm9bAr1fQgLmMg-0jzn6k4JPy4s3XOiQZFFF8.wav	They sit down that these plot points need to happen because they have a whole Board of index cards full of notes that need to happen in the story.	They sit down that these plot points need to happen because they have a whole Board of like fifty index cards full of notes that need to happen in the story.	16,17	17,18	insertion
show_2cQVtitXsGYcp9kIYBi9VJ-7wZR6aZIx7PTYFcShbre2k.wav	Community so I didn't ever feel that openness until I moved back and I wasn't in that realm anymore.	Community so I didn't ever feel that openness until I moved across the country and I wasn't in that realm anymore.	11	11,13	substitution
show_1C49KB0vYZsFe9eoFAr2Cq-6ZkOgQuv6e4y74xhDNKc4Y.wav	If you ask me what you are I would have Alex Caruso as the starting point guard for the Lakers.	If you ask me what you are I would have Alex Caruso out on the court playing as the starting point guard for the Lakers.	11,12	12,16	insertion
show_2CMZqwsTyimKMEGMIdOFCz-2xWHgQryE2ruRadxVVbdbD.wav	Really going to talk about why us and why now for this podcast.	Really going to talk about who we are why we're here and why now for this podcast.	5,6	5,10	substitution
show_2t5PIVQePC6L3CFRpAUnaf-0f0tl83ucovdSpJHoftEU5.wav	No words just lightning breaking darkness and crashing into the Earth with brilliant presence.	No words just lightning breaking darkness and crashing into the surface of the Earth with brilliant presence.	9,10	10,12	insertion
show_2cGQMNoS6MuKFozuNYjCOQ-6qmAgAKLoYpSknCgQ0y6ET.wav	For making the title though because I need to get my numbers way up before I get there, but I'm gonna get there title of Iceland is definitely going to sign me and um, yeah.	For making the title though because I need to get my numbers way up before I get there, but I'm gonna get there title of Iceland is going to sign me and um, yeah.	27	26,27	deletion
show_2tgc74udMU420iVPvl597O-0fORVMXyI1aCUobzMKm5Ll.wav	Feeling into the eyes and the temples and the entire facial structure.	Feeling into the eyes and getting a sense of the entire facial structure.	5,7	5,8	substitution
show_2csQINhTs2YQOWmpmy5gmJ-3J4UWcEHj2lQvk4lRog4LZ.wav	It had proven to be an exciting challenge the last time he subdued a couple and Israel's expecting his second victim to be by soon.	It had proven to be an exciting challenge the last time he subdued a couple and he couldn't wait for his victim to be by soon.	16,19	16,20	substitution
show_2tYdWKnaDR4D2qgCHml2Ax-1ntgzgJweV4WPav6lZUeK9.wav	Have three rounds of attack before you switch on the defense and then you're going to have three rounds of Defense before you switch on to attack.	Have three rounds of attack before you switch on the defense and then you're going to have three rounds of Defense after which you repeat again on to attack.	21,23	21,25	substitution
show_2TuwSyFIHWD1UxyBCMLnWT-4N24WWvGmHzFS0BoqfGNRE.wav	break out of their shell a little bit and you see oh gosh, they're really way more Hardy than I thought or funny than I thought or whatever because it's the moment they've drench themselves like the other day.	break out of their shell a little bit and you see oh gosh, they're really way more Hardy than I thought or funny than I thought or whatever because it's the moment they recovered from harsh cold weather of the other day.	32,35	32,38	substitution
show_1CzCdrVrUH7JwgyZnVGYLh-6Q5gndks7qNDYlyWQNThsb.wav	Hey guys, Tim Jennings here with soul heart with another episode of search engine optimization tips and tricks trying to get your site found on Google and other search engines like that.	Hey guys, Tim Robinson hosting the show today with another episode of search engine optimization tips and tricks trying to get your site found on Google and other search engines like that.	3,7	3,7	substitution
show_1CUdmqDR1A47vPMjsiK6m2-0vStekPNMu57qySDbgL4Bz.wav	And I think that's uh it's it's a fascinating uh Dynamic but uh I wanna thank you.	And I think that's absolutely a super fresh, new, and exciting Dynamic but uh I wanna thank you.	4,9	4,10	substitution
show_1tUHam5eF5aw1ANOoTTNHY-00rYk6fUFND3sgPVnFfOx5.wav	You know your body first and foremost because that is so important.	You know your friends and family first and foremost because that is so important.	3	3,5	substitution
show_1Tazwk3AUA0uz6jQk0X2qx-6AnUhuFsREJdKTZ5YtJN67.wav	I actually got I took the BET and I bet on the Cavs winning.	I actually got I took the BET and I bet on the Patriots with Tom Brady winning.	12	12,15	substitution
show_1c7paeaWBSC8lM2WmoE7oI-7ngmygKXeMj6llnxB9E5W5.wav	We also have to be able to observe ourselves and how we behave, why do we refuse to rest?	We also have to be able to observe ourselves and contemplate our decisions, such as why do we refuse to rest?	10,12	10,14	substitution
show_1tPIbAQXvAfaZ9w2aUDVn5-2Otc04LmTGhUMuBF8U36Bt.wav	teachers in elementary school and middle school are not that different.	teachers in elementary school and in most universities are not that different.	5,6	5,7	substitution
show_1CnVxnXxFzJyqAVe0gxVao-2MZDt0KXGXSM6ciptUHsJI.wav	maybe take like two shots a day of like that drink and I'm sorry sure when I saw my doctor that Monday she did an ultrasound.	maybe take like a day off or something because after the weekend when I saw my doctor that Monday she did an ultrasound.	3,14	3,11	substitution
show_1TH2TkfOKETXMhheVKhnSF-4OR6mYxdRwIdRfZAZyIg0d.wav	Interesting and I think this is a comment a much more common phenomenon nowadays is that she just found out that she has a fifth sibling?	Interesting and I think she just found out that she has a fifth sibling?	4,15	3,4	deletion
show_1CmHgwWKnKTU94RPoVJVhm-2FjSP20WACh8xlrfgo27lv.wav	Now maybe should we why don't we push this out to situation Nation?	Now maybe should we think about if we want to push this out to situation Nation?	4,6	4,9	substitution
show_1cY9S6222J0jGzYbHGQKPs-7ImJPzwLhq2ZjRrbYBMANb.wav	Okay, what's um, what's one genre people will be shocked to know that you read.	Okay, what's um, what's one genre people will be shocked to know you really like to read.	12,13	12,15	substitution
show_1T69Xe0EJ4n0gOO4RD9qv0-42elfJEncMhPZCtpdwUX2Q.wav	To buy that vodka or putting the money at the dock passive and there's one other thing that actually I'm not sure if you you guys are really aware.	To buy that vodka or any other alcohol you need to show proof of age and there's one other thing that actually I'm not sure if you you guys are really aware.	5,11	5,14	substitution
show_1t3ZatwPEux3wUnXMUE62z-2VItlcCcQodDtxIm1hEgLc.wav	I knew that if I didn't just start even if it wasn't perfect that I would never start.	I knew that if I didn't just get to work on it even if it wasn't perfect that I would never start.	7	7,11	substitution
show_1T6df6cejtcf12QJZN0yUu-5mvPGk6dS60f0LCPGjbR4D.wav	Just because someone isn't saying anything while you're talking doesn't mean they are processing a single word.	Just because someone is quiet while you're talking doesn't mean they are processing a single word.	3,5	3,4	substitution
show_1cKkutPWS7rRmyOtrSwouo-79txjRa3xuTvzkjLhRvBgR.wav	Is an infinitely constricting Paradox if I try and Define how much needs to be done before I can enjoy an emotional experience.	Is an infinitely constricting Paradox if I try and figure out everything I have to do before I can enjoy an emotional experience.	9,15	9,15	substitution
show_1C98g10rH9mj7aiTgRGEtH-416OTsevOhXjjB6asFyyK9.wav	And like I look back to our branding uh two years ago, like you can see what the website looked like back then horrendous.	And like I look back to our branding back when we were first starting out as a company, like you can see what the website looked like back then horrendous.	8,11	8,17	substitution
show_1CMwbrEPRtk46eseEmxFOd-6FlNy7MMX7L4J5KCyVYWYM.wav	If you screenshot somebody though, like if they send a picture and you take a screenshot, I think that person knows that you have screenshotted them.	If you screenshot somebody though, like if they send a picture and you take a screenshot, I think there's a little notification in the app telling them that you have screenshotted them.	18,20	18,26	substitution
show_1TTQPQzpjtXPKadzUx5vo6-1nVtCzKPNliNLJ91Ck5TSq.wav	So my last question for you guys talk to me about your most memorable moment or the aha moment or the that feeling of just it was so wonderful to be part of this club.	So my last question for you guys talk to me about that feeling of just it was so wonderful to be part of this club.	11,20	10,11	deletion
show_1c4MlC6ClLyP8osRCtdTUs-2GXhBpBNH9GrdHorlNBcPo.wav	Personal trainers pretty much pretty much a teacher is just a teacher teaching you in assisting Youth of the goals that you want.	Personal trainers pretty much are just coaches I mean really they're like pretty much a teacher is just a teacher teaching you in assisting Youth of the goals that you want.	3,4	4,11	insertion
show_1coo0trh2Do1KR6ev6Fczv-3Zr9rSN8B9pjuNMjZVcYVM.wav	So that's why there's a definite divided in people's opinion on this and that's why it's been such a highly talked about issue.	So that's why there's a definite divided in people's opinion on this which is likely also why it's been such a highly talked about issue.	12,13	12,15	substitution
show_1cVKtsokch166IPm3tRg0U-4XdMT9MFto7G1IWnaH7fZV.wav	Prohibits that so by getting rid of homework.	Prohibits that so we're getting rid of homework.	3	3	substitution
show_1CLAwGQAgTZIzV0TBj254v-51nbmcVD8allN0g5hLbhN7.wav	Went Caspian and said Lord King slay me speedily as a great traitor for buy my silence.	Went Caspian and said if you'll see me slain I ask only that it be done speedily as a great traitor for buy my silence.	4,7	4,15	substitution
show_1TsDtgHbctWFu1B856QLI0-6LykTfQQJFPhKfoVzNdtgA.wav	Environment has changed and yes, it is easy to say our environment is out to get us and it might not be your fault.	Environment has changed and yes, it is easy to say our environment is dying right now and it might not be your fault.	13,16	13,15	substitution
show_1cOsDxbQjLADedwZaG7Bm1-6mLz726LYCcgN69RWWlrOJ.wav	Fast cars, that had the nice clothes, that had the money, they was criminals.	Fast cars, that had the nice clothes, that had expensive gold watches, that had the money, they was criminals.	8,9	9,13	insertion
show_1ttEqOUCJnc7JAGNCWaAqq-471ymSOOetlSAecJK9Bfhr.wav	Kind of a great time he kind of gets to pair with uh Quinn Priester and those to kind of get to come up through the minor leagues together.	Kind of a great time he gets to team up with Quinn Priester and those to kind of get to come up through the minor leagues together.	6,12	6,10	substitution
show_1CB8GHgtZAnsn6ihBUOWKo-1DyrYzUYIq5Zx4edeHge5Y.wav	back the Coca Cola, and then everyone be happy and buy it more, then they make more money.	back the Coca Cola, and then everyone buy it more, then they make more money.	7,9	6,7	deletion
show_1tzXR6tf3WGxV9nNWolDMN-5ZayFE8KG7W18jlM9t5d44.wav	And, just either let the balloon go she would count down uh and then we would all think of what we wanted to let go and then we would let the balloon go or blow out our candles and then it was gone.	And, just either let the balloon go she would count down uh and then we would celebrate this moment togather and then we would let the balloon go or blow out our candles and then it was gone.	16,24	16,19	substitution
show_1tAewNZS0q8QPQpIIEUQQ0-3juUg3wFn3w7OFFo0sGe6R.wav	When they killed them they turned back into the packed humans that were there.	When they killed them they turned back into the terrified ponies that were there.	9,10	9,10	substitution
show_1T15rqmPErKONqSx9rzr9H-726cSurFjtPFS9fiEAMT6b.wav	Use those email templates verbatim verbatim, but make sure y'all very careful with them because there's merge codes that they have preselected that might not be.	Use those email templates verbatim, but make sure y'all very careful with them because there's merge codes that they have preselected that might not be.	4	3,4	deletion
show_1cdpRq4rWNv1xYw3yab7b7-1BDuArBpFR2bZmv4cNafcl.wav	So I'm pretty sure I wanted to be a teacher so I could just tell everyone what to do.|So I'm pretty convinced that I wanted to be a teacher so I could just tell everyone what to do.	So I'm pretty convinced that I wanted to be a teacher so I could just tell everyone what to do.|So I'm pretty convinced that I wanted to be a teacher so that I can tell everyone what to do.	3|11,13	3,4|12,14	substitution|substitution
show_2C0AgUOt4eCULjFjb3mynN-6Uv8Y4yw6o4V2zCKZrzrPg.wav	See why it's extremely valuable to it's kind of like it's kind of like having a wall hack to watch a demo.|See why it's extremely important right? it's kind of like it's kind of like having a wall hack to watch a demo.	See why it's extremely important right? it's kind of like it's kind of like having a wall hack to watch a demo.|See why it's extremely important right? it's kind of like having a rough time to watch a demo.	4,5|10,17	4,5|10,13	substitution|substitution
show_2czpNd58pfuIxOCvO2czHu-2U9S7MpxvVUR6sPTdFRUWR.wav	So um yea that's it for this episode of the podcast what I will let say just eh that just as we come to the end of the podcast.|So um yea that's pretty much all for this episode of the podcast what I will let say just eh that just as we come to the end of the podcast.	So um yea that's pretty much all for this episode of the podcast what I will let say just eh that just as we come to the end of the podcast.|So um yea that's pretty much all for this episode of the podcast as we come to the end of the podcast.	4|11,19	4,6|12,13	substitution|deletion
show_2c04iZbAAIYmZrTIRgggNc-4zacmnwJi3osMV5beOYGLu.wav	Then at the same time in my mind mentally I kept screaming and yelling and mentally I'm thinking to myself I'm screaming and yelling at the top of my lungs.|Then at the same time I feel like I'm trapped mentally and I'm thinking to myself I'm screaming and yelling at the top of my lungs.	Then at the same time I feel like I'm trapped mentally and I'm thinking to myself I'm screaming and yelling at the top of my lungs.|Then at the same time I feel like I'm trapped mentally and I'm thinking to myself I'm screaming at the top of my lungs.	5,15|22,23	5,11|17,18	substitution|deletion
show_2co4uJEBlUoi9JanlRz6ls-1QGXaI6j7lYMr3jofmM7Vy.wav	If you ever wondered how I make my podcast guys, well I use anchor anchor is free.|If you ever thought about how I make my podcast guys, well I use anchor anchor is free.	If you ever thought about how I make my podcast guys, well I use anchor anchor is free.|If you ever thought about how I make my podcast guys, well the key is a tool called anchor anchor is free.	3|11,12	3,4|12,17	substitution|substitution
show_1TMFh5H29pHWPD6KizMrlq-0nCgrfdU9zWUl6XJUnbsUX.wav	served an incredibly big purpose doing that that being said all of that was me trying to instill those things into my life.|served an incredibly long time doing that that being said all of that was me trying to instill those things into my life.	served an incredibly long time doing that that being said all of that was me trying to instill those things into my life.|served an incredibly long time doing that that being said all of that was me trying to instill things into my life.	3,4|18	3,4|17,18	substitution|deletion
show_28ZmEMgyEUzHCmoW9DdwK3-12YTpVg7Ko2mRc4Si6OtDu.wav	Be a good stress as well something that you know, you could be controlling something that won't you know, take a mental toll on you.|Be a good stress as well you know, you could be controlling something that won't you know, take a mental toll on you.	Be a good stress as well you know, you could be controlling something that won't you know, take a mental toll on you.|Be a good stress as well you know, you could be controlling something that will not take a mental toll on you.	6,7|16,18	5,6|14,15	deletion|substitution
show_2toX0f3dPmI8gmUSOKZicx-1fJ8FUGSZLyb5fpbd3QDSi.wav	This year has been like my entire Journey so far in the music business and I'm just looking forward to what's to come.|This year has been the best part of my Journey so far in the music business and I'm just looking forward to what's to come.	This year has been the best part of my Journey so far in the music business and I'm just looking forward to what's to come.|This year has been the best part of my Journey so far in the acting business and I'm just looking forward to what's to come.	4,6|12	4,8|14	substitution|substitution
show_2txZW3TWakg6Pr41kcbiA6-2MlY5WOs8ScY5zTfGRJHDc.wav	And but with this job it was like I was staring at a computer like for 10 hours 8 hours and then maybe doing therapy like for nine hours a week.|And but with this job I had to just be like reading or typing away on a computer like for 10 hours 8 hours and then maybe doing therapy like for nine hours a week.	And but with this job I had to just be like reading or typing away on a computer like for 10 hours 8 hours and then maybe doing therapy like for nine hours a week.|And but with this job I had to just be like reading or typing away on a computer like for 10 hours 8 hours and then maybe doing digital art for nine hours a week.	5,11|24,25	5,15|28,29	substitution|substitution
show_2CN1XNYxo4NFClfUajCtSM-75tS2ZoJCscJetPxi0aukT.wav	So, you know like there was a there was an example where I bought two or three pair and literally just gave them to Goodwill as I was moving because the I never wore them they still had tags on them.|So, you know like there was an example where I bought two or three pair and literally just gave them to Goodwill as I was moving because the I never wore them they still had tags on them.	So, you know like there was an example where I bought two or three pair and literally just gave them to Goodwill as I was moving because the I never wore them they still had tags on them.|So, you know like there was an example where I bought two or three pair and then just gave them to Goodwill as I was moving because the I never wore them they still had tags on them.	6,8|19	5,6|16	deletion|substitution
show_2Cp52s1B4vepSJi2F8gmU9-36PtNWkMxWZtxbqBJ4mUXD.wav	And I want to mention that there were a couple of teachers who really helped me during that time.|And I want to mention that there were a couple of great teachers who really helped me during that time.	And I want to mention that there were a couple of great teachers who really helped me during that time.|And I want to mention that there were a couple of great teachers who noticed and reached out when I was struggling and helped me during that time.	10,11|13	11|14,22	insertion|substitution
show_2T23esVRXBfFb5vigvG7A5-6JivmdWNP3UnZiIOplv953.wav	It's hard to say how I get things off the ground because I just get going like I don't know how to explain it.|It's hard to say how I get the system running so quickly because I just get going like I don't know how to explain it.	It's hard to say how I get the system running so quickly because I just get going like I don't know how to explain it.|It's hard to say how I get the system running so quickly because I just get going I don't know how to explain it.	7,10|16	7,11|16,17	substitution|deletion
show_2T3Pjyw6MJEwPE9uixrYak-7M5817WSsGaldlQfZptcyf.wav	I know now how extremely lucky that truly is what a blessing Not only was I going to be a mom but besides nausea all my other symptoms went away.|I know now how incredibly lucky I really was, what a blessing Not only was I going to be a mom but besides nausea all my other symptoms went away.	I know now how incredibly lucky I really was, what a blessing Not only was I going to be a mom but besides nausea all my other symptoms went away.|I know now how incredibly lucky I really was, what a blessing Not only was I going to be a mom but except for nausea all my other symptoms went away.	4,8|22	4,8|22,23	substitution|substitution
show_1C5B3zuyd67j7v9XRKNb2L-4Az7OglwsgYR94ikvOZdCf.wav	Past something and I have so many things that I have planned for this podcast, but first before we get into any of that I'm going to introduce myself.|Past something and I have so many things that I have planned to talk about today, but first before we get into any of that I'm going to introduce myself.	Past something and I have so many things that I have planned to talk about today, but first before we get into any of that I'm going to introduce myself.|Past something and I have so many things that I have planned to talk about today, but first before we get into any of that I would like to introduce myself.	12,14|24,25	12,15|25,27	substitution|substitution
show_1c8f0MS5LcfbSvwexFC9mn-1MY1u1xOyiFAWXGRqyPJj7.wav	You know with hesitation everything is counted to the T and says if every drink is measured, how are you going to give a regular an honest poor?|You know with hesitation everything is counted to the T and says if you're calculating the nutritional information, how are you going to give a regular an honest poor?	You know with hesitation everything is counted to the T and says if you're calculating the nutritional information, how are you going to give a regular an honest poor?|You know with hesitation everything is counted to the T and says if you're calculating the nutritional information, how are you going to provide the same service to an honest poor?	13,16|22,24	13,17|23,27	substitution|substitution

================================================
FILE: cog.yaml
================================================
# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  gpu: true
  system_packages:
    - libgl1-mesa-glx
    - libglib2.0-0
    - ffmpeg
    - espeak-ng
  python_version: "3.11"
  python_packages:
    - torch==2.1.0
    - torchaudio==2.1.0 
    - xformers
    - phonemizer==3.2.1
    - whisperx==3.1.1
    - openai-whisper>=20231117
  run:
    - git clone https://github.com/facebookresearch/audiocraft && pip install -e ./audiocraft
    - pip install "pydantic<2.0.0"
    - curl -o /usr/local/bin/pget -L "https://github.com/replicate/pget/releases/download/v0.6.0/pget_linux_x86_64" && chmod +x /usr/local/bin/pget
    - mkdir -p /root/.cache/torch/hub/checkpoints/ && wget --output-document "/root/.cache/torch/hub/checkpoints/wav2vec2_fairseq_base_ls960_asr_ls960.pth" "https://download.pytorch.org/torchaudio/models/wav2vec2_fairseq_base_ls960_asr_ls960.pth"
predict: "predict.py:Predictor"


================================================
FILE: config.py
================================================
import argparse


def MyParser():
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # general training 
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--precision", type=str, default="float16")
    parser.add_argument("--num_workers", type=int, default=8)
    parser.add_argument("--resume", action="store_true", default=False)
    parser.add_argument("--tb_write_every_n_steps", type=int, default=100)
    parser.add_argument("--print_every_n_steps", type=int, default=400)
    parser.add_argument("--val_every_n_steps", type=int, default=800)
    parser.add_argument("--lr", type=float, default=0.05)
    parser.add_argument("--batch_size", type=int, default=100, help="this is the effective batch size, no matter whether using gradient_accumulation_steps, not used if we specified max_num_tokens")
    parser.add_argument("--max_num_tokens", type=int, default=100000, help="max number of encodec tokens per gpu, this is only used when using dynamic batching, will ignore batch size. Note this is the final effective batch size per GPU, i.e. gradient accumulated batch size per gpu")
    parser.add_argument("--val_max_num_tokens", type=int, default=None, help="FOR validation")
    parser.add_argument("--num_buckets", type=int, default=6, help='used for dynamic batching, bucketing the samples based on the number of tokens')
    parser.add_argument("--dynamic_batching", type=int, default=0)
    parser.add_argument("--weight_decay", type=float, default=1e-2)
    parser.add_argument("--warmup_fraction", type=float, default=0.01, help="use linear warmup, the proportion of the training steps that are used for warming up")
    parser.add_argument("--num_epochs", type=int, default=10)
    parser.add_argument("--num_steps", type=int, default=None, help="if not None, will ignore num_epochs and use num_steps as the total number of training steps, can try e.g. 400000 i.e. 400k steps")
    parser.add_argument("--gradient_accumulation_steps", type=int, default=1)
    parser.add_argument("--gradient_clip_val", type=float, default=1.0, help="the value for torch.nn.utils.clip_grad_norm_(), not used if we use ScaledAdam optimizer")
    parser.add_argument("--early_stop_step", type=int, default=3200, help="stop training after this many steps of non-improvement")
    parser.add_argument("--early_stop_threshold", type=float, default=-1.0, help="early stop after the improvement is below this threshold for certain number of steps")

    # optimizer focused
    parser.add_argument("--optimizer_name", type=str, default="AdamW", help="can also use ScaledAdam, in which case we'll also use the Eden scheduler")
    parser.add_argument("--reduce_lr_start_step", type=int, default=3000, help='step after which the lr is significantly reduced. a param for the Eden scheduler')
    parser.add_argument("--pseudo_epoch_size", type=int, default=3000, help="only use for Eden scheduler.")
    parser.add_argument("--reduce_lr_start_epoch", type=int, default=4)
    parser.add_argument("--clipping_update_period", type=int, default=600)


    # path
    parser.add_argument("--exp_dir", type=str, default=None, help="will be combined with dataset name")
    parser.add_argument("--dataset", type=str, help="e.g. 'libritts', 'gigaspeech', they are folder name in the data dir also")
    parser.add_argument("--dataset_dir", type=str, help="need to be compatible with corresponding dataset py file")
    parser.add_argument("--phn_folder_name", type=str, default="phonemes", help="for libritts I also have arpa phns, in which case should be phonemes_arpa")
    parser.add_argument("--encodec_folder_name", type=str, default="encodec_16khz_4codebooks", help="folder where encodec codes are stored")
    parser.add_argument("--manifest_name", type=str, default="manifest", help="metadata filename")

    # data focused
    parser.add_argument("--pad_x", type=int, default=1, help="whether to always pad x to text_max_length. select 1 to get the maximal memory consumption (the actual usage should be smaller); 0 is preferable")
    parser.add_argument("--audio_max_length", type=float, default=20, help="in seconds, crop or drop the audio if length is longer than this")
    parser.add_argument("--audio_min_length", type=float, default=2, help="in seconds, drop the audio if length is shorter than this")
    parser.add_argument("--text_max_length", type=int, default=400, help='if too long, we crop or drop')
    parser.add_argument("--text_min_length", type=float, default=10, help="if too short, will drop")
    parser.add_argument("--encodec_sr", type=int, default=50, help="for my encodec that takes 16kHz audio with a downsample rate of 320, the codec sample rate is 50Hz, i.e. 50 codes (x n_codebooks) per second")
    parser.add_argument("--drop_long", type=int, default=0, help="if this is true, will drop examples whose encodec sequence or phone sequence is too long, rather than cropping, to reduce hallucination")

    # encodec and token rearrangement
    parser.add_argument('--mask_len_min', type=int, default=1, help='Minimum mask length')
    parser.add_argument('--mask_len_max', type=int, default=600, help='Maximum mask length')
    parser.add_argument("--eos", type=int, default=-1, help="this is to be used with reduced_eog, where we end the utterance with eos, and end the generated segment with eog, also when this is used, the n_special should be 4")
    parser.add_argument("--reduced_eog", type=int, default=0, help="for the non-final segments, do not insert eog at the end, this could hopefully solve the early stopping issue when doing tts")
    parser.add_argument("--special_first", type=int, default=0, help="if 1, need to have special tokens to be the first few tokens, e.g. 0, 1, 2, which means we need to adjust the preprocessing and postprocessing of the encodec codes. note that we hard coded to have 3 special tokens")
    parser.add_argument("--n_special", type=int, default=3, help="empty, eog, pad, (eos)")
    parser.add_argument("--codebook_weight", type=str, default=None, help="e.g. ['5','1','0.5','0.1']")
    parser.add_argument("--max_mask_portion",type=float,default=0.7,help="should not mask more than this portion of an utterance")
    parser.add_argument("--max_n_spans", type=int, default=3, help='maximal number of spans, only used when using multicm3, this is used to decide the number of mask_embedding, and the max clamp value if using Poisson distribution; if using uniform distribution to sample the number of spans, it will be uniform(1,max_n_spans)')
    parser.add_argument("--shuffle_mask_embedding", type=int, default=0, help="whether to shuffle the mask embedding, so that mask:0 is not the most well trained, default is not shuffling. The default has its benefit, as it makes sure that mask:0 always appears first")
    parser.add_argument("--mask_sample_dist", type=str, default="poisson1", help="uniform or poissonx, e.g. poisson1, meaning the parameter lambda is 1, it will most likely sample 1 mask")
    parser.add_argument("--min_gap", type=int, default=5, help="after sampling starts, delete the later one if it is closer to the former start than min_gap")
    parser.add_argument('--n_codebooks', type=int, default=4)
    parser.add_argument('--text_vocab_size', type=int, default=100, help='Size of text vocabulary')
    parser.add_argument('--text_pad_token', type=int, default=100, help='padding of the text tokens, not attended')
    parser.add_argument('--audio_vocab_size', type=str, default='2048', help="Size of audio vocabulary")
    parser.add_argument("--empty_token", default=2048, type=int, help="indicating the no token at the position for the codebook")
    parser.add_argument('--eog', type=int, default=2049, help='End of generation token')
    parser.add_argument('--audio_pad_token', type=int, default=2050, help='padding of the encodec codes, not attended')

    # model focused
    parser.add_argument('--d_model', type=int, default=2048, help='Model dimension')
    parser.add_argument('--audio_embedding_dim', type=int, default=2048, help='dimension for encodec continuous embedding (before being quantized)')
    parser.add_argument('--text_embedding_dropout', type=float, default=0.1, help='Dropout for text embedding')
    parser.add_argument('--audio_embedding_dropout', type=float, default=0, help='Dropout for audio embedding')
    parser.add_argument('--text_positional_embedding_dropout', type=float, default=0.1, help='Dropout for text positional embedding')
    parser.add_argument('--audio_positional_embedding_dropout', type=float, default=0.1, help='Dropout for audio positional embedding')
    parser.add_argument('--trm_dropout', type=float, default=0.1, help='Dropout for transformer')
    parser.add_argument('--nhead', type=int, default=16, help='Number of attention heads')
    parser.add_argument('--num_decoder_layers', type=int, default=16, help='Number of decoder layers')
    parser.add_argument('--load_model_from', type=str, default=None, help='Path to load model from, this will be effective last, so will overwrite all previous load, including resume')
    return parser

================================================
FILE: data/__init__.py
================================================


================================================
FILE: data/gigaspeech.py
================================================
import os
import torch
import random
import copy
import logging
import shutil

class dataset(torch.utils.data.Dataset):
    def __init__(self, args, split):
        super().__init__()
        self.args = args
        self.split = split
        assert self.split in ['train', 'validation', 'test']
        manifest_fn = os.path.join(self.args.dataset_dir, self.args.manifest_name, self.split+".txt")

        with open(manifest_fn, "r") as rf:
            data = [l.strip().split("\t") for l in rf.readlines()]
        lengths_list = [int(item[-1]) for item in data]
        self.data = []
        self.lengths_list = []
        for d, l in zip(data, lengths_list):
            if l >= self.args.encodec_sr*self.args.audio_min_length:
                if self.args.drop_long and l > self.args.encodec_sr*self.args.audio_max_length:
                    continue
                self.data.append(d)
                self.lengths_list.append(l)
        logging.info(f"number of data points for {self.split} split: {len(self.lengths_list)}")

        # phoneme vocabulary
        vocab_fn = os.path.join(self.args.dataset_dir,"vocab.txt")
        shutil.copy(vocab_fn, os.path.join(self.args.exp_dir, "vocab.txt"))
        with open(vocab_fn, "r") as f:
            temp = [l.strip().split(" ") for l in f.readlines() if len(l.strip()) != 0] # skip blank lines (a raw line still contains its newline, so check after stripping)
            self.phn2num = {item[1]:int(item[0]) for item in temp}
        
        self.symbol_set = set(["<SIL>", "<MUSIC>", "<NOISE>", "<OTHER>"])
    
    def __len__(self):
        return len(self.lengths_list)
    
    def _load_phn_enc(self, index):
        item = self.data[index]
        pf = os.path.join(self.args.dataset_dir, self.args.phn_folder_name, item[1]+".txt")
        ef = os.path.join(self.args.dataset_dir, self.args.encodec_folder_name, item[1]+".txt")
        try:
            with open(pf, "r") as p, open(ef, "r") as e:
                phns = [l.strip() for l in p.readlines()]
                assert len(phns) == 1, phns
                x = [self.phn2num[item] for item in phns[0].split(" ") if item not in self.symbol_set] # drop ["<SIL>", "<MUSIC>", "<NOISE>", "<OTHER>"], as they are not in training set annotation
                encos = [l.strip().split() for k, l in enumerate(e.readlines()) if k < self.args.n_codebooks]
                
                assert len(encos) == self.args.n_codebooks, ef
                if self.args.special_first:
                    y = [[int(n)+self.args.n_special for n in l] for l in encos]
                else:
                    y = [[int(n) for n in l] for l in encos]
        except Exception as e:
            logging.info(f"loading failed for {pf} and {ef}, maybe files don't exist or are corrupted")
            logging.info(f"error message: {e}")
            return [], [[]]

        return x, y

    def __getitem__(self, index):
        x, y = self._load_phn_enc(index)
        x_len, y_len = len(x), len(y[0])

        if x_len == 0 or y_len == 0:
            return {
            "x": None, 
            "x_len": None, 
            "y": None, 
            "y_len": None, 
            "y_mask_interval": None, # index y_mask_interval[1] is the position of start_of_continue token
            "extra_mask_start": None # this is only used in VE1
            }
        while y_len < self.args.encodec_sr*self.args.audio_min_length:
            assert not self.args.dynamic_batching
            index = random.choice(range(len(self))) # regenerate an index
            x, y = self._load_phn_enc(index)
            x_len, y_len = len(x), len(y[0])
        if self.args.drop_long:
            while x_len > self.args.text_max_length or y_len > self.args.encodec_sr*self.args.audio_max_length:
                index = random.choice(range(len(self))) # regenerate an index
                x, y = self._load_phn_enc(index)
                x_len, y_len = len(x), len(y[0])

        ### padding and cropping below ###
        ### padding and cropping below ###
        # adjust the length of encodec codes, pad to max_len or randomly crop
        orig_y_len = copy.copy(y_len)
        max_len = int(self.args.audio_max_length * self.args.encodec_sr)
        if y_len > max_len:
            audio_start = random.choice(range(0, y_len-max_len))
            for i in range(len(y)):
                y[i] = y[i][audio_start:(audio_start+max_len)]
            y_len = max_len
        else:
            audio_start = 0
            if not self.args.dynamic_batching:
                pad = [0] * (max_len - y_len) if getattr(self.args, "sep_special_token", False) else [self.args.audio_pad_token] * (max_len - y_len) # sep_special_token is not defined in config.py, so default to False
                for i in range(len(y)):
                    y[i] = y[i] + pad
        
        # adjust text
        # if audio is cropped, and text is longer than max, crop max based on how audio is cropped
        if audio_start > 0 and len(x) > self.args.text_max_length: # if audio is longer than max and text is longer than max, start text the way audio started
            x = x[int(len(x)*audio_start/orig_y_len):]
            if len(x) > self.args.text_max_length: # if text is still longer than max, cut the end
                x = x[:self.args.text_max_length]
        
        x_len = len(x)
        if x_len > self.args.text_max_length:
            text_start = random.choice(range(0, x_len - self.args.text_max_length))
            x = x[text_start:text_start+self.args.text_max_length]
            x_len = self.args.text_max_length
        elif self.args.pad_x and x_len <= self.args.text_max_length:
            pad = [0] * (self.args.text_max_length - x_len) if getattr(self.args, "sep_special_token", False) else [self.args.text_pad_token] * (self.args.text_max_length - x_len) # sep_special_token is not defined in config.py, so default to False
            x = x + pad
        ### padding and cropping above ###
        ### padding and cropping above ###

        return {
            "x": torch.LongTensor(x), 
            "x_len": x_len, 
            "y": torch.LongTensor(y), 
            "y_len": y_len
            }
            

    def collate(self, batch):
        out = {key:[] for key in batch[0]}
        for item in batch:
            if item['x'] is None: # deal with load failure
                continue
            for key, val in item.items():
                out[key].append(val)
        res = {}
        if self.args.pad_x:
            res["x"] = torch.stack(out["x"], dim=0)
        else:
            res["x"] = torch.nn.utils.rnn.pad_sequence(out["x"], batch_first=True, padding_value=self.args.text_pad_token)
        res["x_lens"] = torch.LongTensor(out["x_len"])
        if self.args.dynamic_batching:
            if out['y'][0].ndim==2:
                res['y'] = torch.nn.utils.rnn.pad_sequence([item.transpose(1,0) for item in out['y']],padding_value=self.args.audio_pad_token)
                res['y'] = res['y'].permute(1,2,0) # T B K -> B K T
            else:
                assert out['y'][0].ndim==1, out['y'][0].shape
                res['y'] = torch.nn.utils.rnn.pad_sequence(out['y'], batch_first=True, padding_value=self.args.audio_pad_token)
        else:
            res['y'] = torch.stack(out['y'], dim=0)
        res["y_lens"] = torch.LongTensor(out["y_len"])
        res["text_padding_mask"] = torch.arange(res['x'][0].shape[-1]).unsqueeze(0) >= res['x_lens'].unsqueeze(1)
        res["audio_padding_mask"] = torch.arange(res['y'][0].shape[-1]).unsqueeze(0) >= res['y_lens'].unsqueeze(1)
        return res

================================================
FILE: data/phonemize_encodec_encode_hf.py
================================================
import argparse
def parse_args():
    parser = argparse.ArgumentParser(description="phonemize the gigaspeech dataset and encode it using the encodec model")
    parser.add_argument("--dataset_size", type=str, default='xs', help='sizes of gigaspeech, xs, s, m, l, xl. we use xl for VoiceCraft training, xs is good for debugging')
    parser.add_argument('--download_to', type=str, default="/data/scratch/pyp/datasets/gigaspeech_debug", help="dir where you want the huggingface gigaspeech dataset to be downloaded to")
    parser.add_argument('--save_dir', type=str, default="/data/scratch/pyp/datasets/gigaspeech_phn_enc_manifest_debug", help="path to the manifest, phonemes, and encodec codes dirs")
    parser.add_argument('--encodec_model_path', type=str, default="/data/scratch/pyp/exp_pyp/audiocraft/encodec/xps/6f79c6a8/checkpoint.th")
    parser.add_argument('--n_workers', type=int, default=4, help="Number of parallel worker processes")
    parser.add_argument('--mega_batch_size', type=int, default=100, help="Number of samples in each mega batch for multiprocess dataloading")
    parser.add_argument('--batch_size', type=int, default=4, help="batch size for encodec encoding, decrease it if OOM. This is the sum of batch size *over each gpu*, so increase it if you are using more gpus")
    parser.add_argument('--model_sr', type=int, default=16000, help='encodec input audio sample rate')
    parser.add_argument('--downsample_rate', type=int, default=320, help='encodec downsample rate')
    parser.add_argument('--model_code_sr', type=int, default=50, help='encodec model code sample rate')
    parser.add_argument('--len_cap', type=float, default=35.0, help='will drop audios that are longer than this number')
    parser.add_argument('--max_len', type=int, default=30000, help='max length of audio in samples, if exceeded, will cut a batch in half to process, decrease this number if OOM on your machine')
    return parser.parse_args()
if __name__ == "__main__":
    import logging
    formatter = (
        "%(asctime)s [%(levelname)s] %(filename)s:%(lineno)d || %(message)s"
    )
    logging.basicConfig(format=formatter, level=logging.INFO)
    args = parse_args()

    import os
    import numpy as np
    import torch
    import tqdm
    import time
    from datasets import load_dataset, DownloadConfig

    from tokenizer import TextTokenizer, tokenize_text
    
    # get the path
    phn_save_root = os.path.join(args.save_dir, args.dataset_size, "phonemes")
    codes_save_root = os.path.join(args.save_dir, args.dataset_size, "encodec_16khz_4codebooks")
    vocab_fn = os.path.join(args.save_dir, args.dataset_size, "vocab.txt")
    os.makedirs(phn_save_root, exist_ok=True)
    os.makedirs(codes_save_root, exist_ok=True)


    def sort_by_audio_len(lens):
        inds = np.argsort(lens).tolist()
        logging.info(f"longest: {lens[inds[-1]]*args.model_code_sr} encodec codes, {lens[inds[-1]]:.2f} sec.")
        logging.info(f"shortest: {lens[inds[0]]*args.model_code_sr} encodec codes, {lens[inds[0]]:.2f} sec.")
        logging.info(f"median: {lens[inds[len(inds)//2]]*args.model_code_sr} encodec codes, {lens[inds[len(inds)//2]]:.2f} sec.")
        logging.info(f"95 percentile longest: {lens[inds[int(len(inds)*0.95)]]*args.model_code_sr} encodec codes, {lens[inds[int(len(inds)*0.95)]]:.2f} sec.")
        return inds[::-1]
    
    def write_array_to_txt_file(array, filename):
        with open(filename, 'w') as f:
            for a in array[:-1]:
                f.write(' '.join(map(str, a))+'\n')
            f.write(' '.join(map(str, array[-1])))
    

    ### phonemization
    # load tokenizer
    # load the encodec model
    from audiocraft.solvers import CompressionSolver
    model = CompressionSolver.model_from_checkpoint(args.encodec_model_path)
    model = model.cuda()
    model = model.eval()
    text_tokenizer = TextTokenizer()


    # https://github.com/SpeechColab/GigaSpeech
    # there are only four different punctuations
    # need to check whether there are other < started strings
    punc2sym = {" <COMMA>": ",", " <PERIOD>": ".", " <QUESTIONMARK>": "?", " <EXCLAMATIONPOINT>": "!"} # note the space in front of each punc name
    gar2sym = {"<SIL>": "#%#", "<MUSIC>": "##%", "<NOISE>": "%%#", "<OTHER>":"%#%"} # so that they are safely kept as the original symbols when using tokenize_text
    punc2sym.update(gar2sym)

    word2sym = { "h æ ʃ h ɐ ʃ p ɚ s ɛ n t": "<MUSIC>", "h æ ʃ p ɚ s ɛ n t h æ ʃ": "<SIL>", "p ɚ s ɛ n t h ɐ ʃ p ɚ s ɛ n t": "<OTHER>", "p ɚ s ɛ n t p ɚ s ɛ n t h æ ʃ": "<NOISE>"}
    forbidden_words = set(['#%#', '##%', '%%#', '%#%'])

    dc = DownloadConfig(cache_dir=args.download_to)
    stime = time.time()
    logging.info("loading the dataset...")
    gs = load_dataset("speechcolab/gigaspeech", args.dataset_size, use_auth_token=True, cache_dir = args.download_to, download_config=dc)
    logging.info(f"time spent on loading the dataset: {time.time() - stime:.2f} seconds")

    splits = ['validation', 'test', 'train']
    
    logging.info(f"gigaspeech dataset {args.dataset_size} info: {gs}")
    logging.info(f"phonemizing...")
    phn_vocab = set()
    all_lens = []
    
    # you will see a ton of [WARNING] words_mismatch.py:88......, it's not an issue
    for split in tqdm.tqdm(splits):
        skip = 0
        logging.info(f"now processing split {split}...")
        for item in tqdm.tqdm(gs[split]):
            save_fn = os.path.join(phn_save_root, item['segment_id']+".txt")
            text = item['text']
            if any(word in forbidden_words for word in text.split(" ")):
                logging.info(f"skip {item['segment_id']}, because it contains forbidden words. Its transcript: {text}")
                skip += 1
                continue
            for k, v in punc2sym.items():
                text = text.replace(k, v)
            phn = tokenize_text(text_tokenizer, text)
            phn_seq = " ".join(phn)
            for k, v in word2sym.items():
                phn_seq = phn_seq.replace(k, v)
            phn_vocab.update(phn_seq.split(" "))
            all_lens.append(len(phn_seq.split(" ")))
            with open(save_fn, "w") as f:
                f.write(phn_seq)
        logging.info(f"split {split} has {len(gs[split])} samples in total, skipped {skip} due to forbidden words")

    print(f"phn vocab size: {len(list(phn_vocab))}")
    print("phn sequence stats: ")
    print(f"longest: {max(all_lens)}")
    print(f"shortest: {min(all_lens)}")
    print(f"median: {np.quantile(all_lens, 0.5)}")
    print(f"95 percentile longest: {np.quantile(all_lens, 0.95)}")
    print("write vocabulary to ", vocab_fn)
    with open(vocab_fn, "w") as f:
        for i, phn in enumerate(list(phn_vocab)):
            if i < len(list(phn_vocab)) - 1:
                f.write(f"{str(i)} {phn}\n")
            else:
                f.write(f"{str(i)} {phn}")

    class mydataset(torch.utils.data.Dataset):
        def __init__(self, split):
            super().__init__()
            self.data = gs[split]
        def __len__(self):
            return len(self.data)
        def __getitem__(self, ind):
            try:
                segment_id, audio, sr, text, begin_time, end_time = self.data[ind]['segment_id'], torch.from_numpy(self.data[ind]['audio']['array']).float(), self.data[ind]['audio']['sampling_rate'], self.data[ind]['text'], self.data[ind]['begin_time'], self.data[ind]['end_time']
            except Exception: # the sample could not be read; collate() drops it
                return None, None, None, None, None, None
            
            return segment_id, audio, sr, text, begin_time, end_time
        def collate(self, batch):
            res = {'segment_id': [], "audio": [], "sr": [], "text": [], "begin_time": [], "end_time": []}
            for item in batch:
                if item[0] is not None:
                    res['segment_id'].append(item[0])
                    res['audio'].append(item[1])
                    res['sr'].append(item[2])
                    res['text'].append(item[3])
                    res['begin_time'].append(item[4])
                    res['end_time'].append(item[5])
            return res


    ## encodec codes extraction
    logging.info("encodec encoding...")
    train_dataset = mydataset('train')
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=args.mega_batch_size, shuffle=False, drop_last=False, num_workers=args.n_workers, collate_fn=train_dataset.collate)
    validation_dataset = mydataset('validation')
    validation_loader = torch.utils.data.DataLoader(validation_dataset, batch_size=args.mega_batch_size, shuffle=False, drop_last=False, num_workers=args.n_workers, collate_fn=validation_dataset.collate)
    test_dataset = mydataset('test')
    test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=args.mega_batch_size, shuffle=False, drop_last=False, num_workers=args.n_workers, collate_fn=test_dataset.collate)
    splits = ['validation', 'test', 'train']
    loaders = [validation_loader, test_loader, train_loader]
    # splits = ['validation'] # for debug
    # loaders = [validation_loader]
    for split, loader in zip(splits, loaders):
        skip = 0
        logging.info(f"now processing split {split}...")
        mega_n_steps = int(np.ceil(len(gs[split]) / args.mega_batch_size))
        logging.info(f"partition the split {split} into {mega_n_steps} parts, each has {args.mega_batch_size} samples")
        for m, mega_batch in enumerate(loader):
            logging.info(f"====================================")
            logging.info(f"====================================")
            logging.info(f"now processing mega step {m+1}/{mega_n_steps}")
            lengths = np.array(mega_batch['end_time']) - np.array(mega_batch['begin_time'])
            sorted_inds = sort_by_audio_len(lengths)
            for j in range(len(sorted_inds))[::-1]:
                if lengths[sorted_inds[j]] < 0.2 or lengths[sorted_inds[j]] > args.len_cap: # skip samples that are too short (shorter than 0.2s) or too long (longer than args.len_cap seconds)
                    skip += 1
                    del sorted_inds[j]
            
            n_steps = int(np.ceil(len(sorted_inds) / args.batch_size))
            for n in tqdm.tqdm(range(n_steps), disable=True):
                inds_used = sorted_inds[n*args.batch_size:(n+1)*args.batch_size]
                audio_batch = [mega_batch['audio'][id] for id in inds_used]
                sr_batch = [mega_batch['sr'][id] for id in inds_used]
                segment_id_batch = [mega_batch['segment_id'][id] for id in inds_used]
                text_batch = [mega_batch['text'][id] for id in inds_used]
                padded_wav = torch.nn.utils.rnn.pad_sequence(audio_batch, batch_first=True).unsqueeze(1) # [B, T] -> [B, 1, T]
                all_lens = [lengths[id] for id in inds_used]
                with torch.no_grad():
                    if max(all_lens) > args.max_len and len(all_lens) > 1: # NOTE all_lens holds durations in seconds, while --max_len is documented in samples; decrease args.max_len if OOM, or chunk it into more than 2 forward passes
                        codes = []
                        inwav = padded_wav.cuda()
                        codes.append(model.encode(inwav[:len(inwav)//2])[0].cpu())
                        codes.append(model.encode(inwav[len(inwav)//2:])[0].cpu())
                        codes = torch.cat(codes, dim=0)
                    else:
                        encoded_frames = model.encode(padded_wav.cuda())
                        # logging.info(f"encoded_frames: {encoded_frames[0].shape}")
                        codes = encoded_frames[0].cpu()

                for i, length in enumerate(all_lens):
                    save_fn = os.path.join(codes_save_root, segment_id_batch[i]+".txt")
                    actual_len = round(length * args.model_code_sr) # model_code_sr is the number of codes per second (50 by default)
                    cur_code = codes[i].tolist() if isinstance(codes, list) else codes[i, :, :actual_len].tolist()
                    write_array_to_txt_file(cur_code, save_fn)


================================================
FILE: data/tokenizer.py
================================================
# cp from https://github.com/lifeiteng/vall-e/blob/main/valle/data/tokenizer.py
# Copyright    2023                            (authors: Feiteng Li)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import re
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional, Pattern, Union

import numpy as np
import torch
import torchaudio
# from lhotse.features import FeatureExtractor
# from lhotse.utils import Seconds, compute_num_frames
from phonemizer.backend import EspeakBackend
from phonemizer.backend.espeak.language_switch import LanguageSwitch
from phonemizer.backend.espeak.words_mismatch import WordMismatch
from phonemizer.punctuation import Punctuation
from phonemizer.separator import Separator



class TextTokenizer:
    """Phonemize Text."""

    def __init__(
        self,
        language="en-us",
        backend="espeak",
        separator=Separator(word="_", syllable="-", phone="|"),
        preserve_punctuation=True,
        punctuation_marks: Union[str, Pattern] = Punctuation.default_marks(),
        with_stress: bool = False,
        tie: Union[bool, str] = False,
        language_switch: LanguageSwitch = "keep-flags",
        words_mismatch: WordMismatch = "ignore",
    ) -> None:
        phonemizer = EspeakBackend(
            language,
            punctuation_marks=punctuation_marks,
            preserve_punctuation=preserve_punctuation,
            with_stress=with_stress,
            tie=tie,
            language_switch=language_switch,
            words_mismatch=words_mismatch,
        )
        
        self.backend = phonemizer
        self.separator = separator

    def to_list(self, phonemized: str) -> List[str]:
        fields = []
        for word in phonemized.split(self.separator.word):
            # "ɐ    m|iː|n?"    ɹ|ɪ|z|ɜː|v; h|ɪ|z.
            pp = re.findall(r"\w+|[^\w\s]", word, re.UNICODE)
            fields.extend(
                [p for p in pp if p != self.separator.phone]
                + [self.separator.word]
            )
        assert len("".join(fields[:-1])) == len(phonemized) - phonemized.count(
            self.separator.phone
        )
        return fields[:-1]

    def __call__(self, text, strip=True) -> List[List[str]]:
        if isinstance(text, str):
            text = [text]

        phonemized = self.backend.phonemize(
            text, separator=self.separator, strip=strip, njobs=1
        )
        return [self.to_list(p) for p in phonemized]


def tokenize_text(tokenizer: TextTokenizer, text: str) -> List[str]:
    phonemes = tokenizer([text.strip()])
    return phonemes[0]  # k2symbols
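

# Illustrative sketch (not part of the original module): a standalone version
# of the splitting that TextTokenizer.to_list performs, so the regex can be
# tried without installing espeak. The function name and the default
# separators ("_" between words, "|" between phones) are assumptions that
# mirror the Separator defaults used above.
import re


def split_phonemized(phonemized, word_sep="_", phone_sep="|"):
    fields = []
    for word in phonemized.split(word_sep):
        # each run of word characters is one phone; punctuation marks are kept as single tokens
        tokens = re.findall(r"\w+|[^\w\s]", word, re.UNICODE)
        fields.extend([t for t in tokens if t != phone_sep] + [word_sep])
    return fields[:-1]  # drop the trailing word separator
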

def convert_audio(wav: torch.Tensor, sr: int, target_sr: int, target_channels: int):
    assert wav.shape[0] in [1, 2], "Audio must be mono or stereo."
    if target_channels == 1:
        wav = wav.mean(0, keepdim=True)
    elif target_channels == 2:
        *shape, _, length = wav.shape
        wav = wav.expand(*shape, target_channels, length)
    elif wav.shape[0] == 1:
        wav = wav.expand(target_channels, -1)
    wav = torchaudio.transforms.Resample(sr, target_sr)(wav)
    return wav

class AudioTokenizer:
    """EnCodec audio."""

    def __init__(
        self,
        device: Any = None,
        signature = None
    ) -> None:
        from audiocraft.solvers import CompressionSolver
        model = CompressionSolver.model_from_checkpoint(signature)
        self.sample_rate = model.sample_rate
        self.channels = model.channels
        
        if not device:
            device = torch.device("cpu")
            if torch.cuda.is_available():
                device = torch.device("cuda:0")

        self._device = device

        self.codec = model.to(device)

    @property
    def device(self):
        return self._device

    def encode(self, wav: torch.Tensor) -> list:
        # returns a one-element list [(codes, None)]
        codes = self.codec.encode(wav.to(self.device))
        return [(codes[0], None)]

    def decode(self, frames: list) -> torch.Tensor:
        # expects the list format produced by encode()
        frames = frames[0][0] # [1,4,T]
        return self.codec.decode(frames)
    


def tokenize_audio(tokenizer: AudioTokenizer, audio_path: str, offset = -1, num_frames=-1):
    # Load and pre-process the audio waveform
    if offset != -1 and num_frames!=-1:
        wav, sr = torchaudio.load(audio_path, frame_offset=offset, num_frames=num_frames)
    else:
        wav, sr = torchaudio.load(audio_path)
    wav = convert_audio(wav, sr, tokenizer.sample_rate, tokenizer.channels)
    wav = wav.unsqueeze(0)

    # Extract discrete codes from EnCodec
    with torch.no_grad():
        encoded_frames = tokenizer.encode(wav)
    return encoded_frames


================================================
FILE: demo/temp/84_121550_000074_000000.txt
================================================
But when I had approached so near to them The common object, which the sense deceives, Lost not by distance any of its marks,

================================================
FILE: demo/temp/mfa_alignments/5895_34622_000026_000002.csv
================================================
Begin,End,Label,Type,Speaker
0.04,0.58,gwynplaine,words,temp
0.58,0.94,had,words,temp
0.94,1.45,besides,words,temp
1.45,1.62,for,words,temp
1.62,1.86,his,words,temp
1.86,2.16,work,words,temp
2.16,2.31,and,words,temp
2.31,2.49,for,words,temp
2.49,2.71,his,words,temp
2.71,3.03,feats,words,temp
3.03,3.12,of,words,temp
3.12,3.61,strength,words,temp
3.95,4.25,round,words,temp
4.25,4.45,his,words,temp
4.45,4.7,neck,words,temp
4.7,4.81,and,words,temp
4.81,5.04,over,words,temp
5.04,5.22,his,words,temp
5.22,5.83,shoulders,words,temp
6.16,6.31,an,words,temp
6.41,7.15,esclavine,words,temp
7.15,7.29,of,words,temp
7.29,7.7,leather,words,temp
0.04,0.1,G,phones,temp
0.1,0.13,W,phones,temp
0.13,0.22,IH1,phones,temp
0.22,0.3,N,phones,temp
0.3,0.38,P,phones,temp
0.38,0.42,L,phones,temp
0.42,0.53,EY1,phones,temp
0.53,0.58,N,phones,temp
0.58,0.71,HH,phones,temp
0.71,0.86,AE1,phones,temp
0.86,0.94,D,phones,temp
0.94,0.97,B,phones,temp
0.97,1.01,IH0,phones,temp
1.01,1.14,S,phones,temp
1.14,1.34,AY1,phones,temp
1.34,1.4,D,phones,temp
1.4,1.45,Z,phones,temp
1.45,1.52,F,phones,temp
1.52,1.55,AO1,phones,temp
1.55,1.62,R,phones,temp
1.62,1.69,HH,phones,temp
1.69,1.76,IH1,phones,temp
1.76,1.86,Z,phones,temp
1.86,1.95,W,phones,temp
1.95,2.07,ER1,phones,temp
2.07,2.16,K,phones,temp
2.16,2.23,AH0,phones,temp
2.23,2.26,N,phones,temp
2.26,2.31,D,phones,temp
2.31,2.38,F,phones,temp
2.38,2.41,AO1,phones,temp
2.41,2.49,R,phones,temp
2.49,2.55,HH,phones,temp
2.55,2.62,IH1,phones,temp
2.62,2.71,Z,phones,temp
2.71,2.8,F,phones,temp
2.8,2.9,IY1,phones,temp
2.9,2.98,T,phones,temp
2.98,3.03,S,phones,temp
3.03,3.07,AH0,phones,temp
3.07,3.12,V,phones,temp
3.12,3.2,S,phones,temp
3.2,3.26,T,phones,temp
3.26,3.32,R,phones,temp
3.32,3.39,EH1,phones,temp
3.39,3.48,NG,phones,temp
3.48,3.53,K,phones,temp
3.53,3.61,TH,phones,temp
3.95,4.03,R,phones,temp
4.03,4.16,AW1,phones,temp
4.16,4.21,N,phones,temp
4.21,4.25,D,phones,temp
4.25,4.29,HH,phones,temp
4.29,4.36,IH1,phones,temp
4.36,4.45,Z,phones,temp
4.45,4.53,N,phones,temp
4.53,4.62,EH1,phones,temp
4.62,4.7,K,phones,temp
4.7,4.74,AH0,phones,temp
4.74,4.77,N,phones,temp
4.77,4.81,D,phones,temp
4.81,4.92,OW1,phones,temp
4.92,4.97,V,phones,temp
4.97,5.04,ER0,phones,temp
5.04,5.11,HH,phones,temp
5.11,5.18,IH1,phones,temp
5.18,5.22,Z,phones,temp
5.22,5.34,SH,phones,temp
5.34,5.47,OW1,phones,temp
5.47,5.51,L,phones,temp
5.51,5.58,D,phones,temp
5.58,5.71,ER0,phones,temp
5.71,5.83,Z,phones,temp
6.16,6.23,AE1,phones,temp
6.23,6.31,N,phones,temp
6.41,7.15,spn,phones,temp
7.15,7.21,AH0,phones,temp
7.21,7.29,V,phones,temp
7.29,7.36,L,phones,temp
7.36,7.44,EH1,phones,temp
7.44,7.49,DH,phones,temp
7.49,7.7,ER0,phones,temp


================================================
FILE: demo/temp/mfa_alignments/84_121550_000074_000000.csv
================================================
Begin,End,Label,Type,Speaker
0.03,0.18,but,words,temp
0.18,0.32,when,words,temp
0.32,0.48,i,words,temp
0.48,0.64,had,words,temp
0.64,1.19,approached,words,temp
1.22,1.58,so,words,temp
1.58,1.91,near,words,temp
1.91,2.07,to,words,temp
2.07,2.42,them,words,temp
2.53,2.61,the,words,temp
2.61,3.01,common,words,temp
3.05,3.62,object,words,temp
3.68,3.93,which,words,temp
3.93,4.02,the,words,temp
4.02,4.34,sense,words,temp
4.34,4.97,deceives,words,temp
5.04,5.54,lost,words,temp
5.54,6.0,not,words,temp
6.0,6.14,by,words,temp
6.14,6.67,distance,words,temp
6.79,7.05,any,words,temp
7.05,7.18,of,words,temp
7.18,7.34,its,words,temp
7.34,7.87,marks,words,temp
0.03,0.06,B,phones,temp
0.06,0.09,AH1,phones,temp
0.09,0.18,T,phones,temp
0.18,0.23,W,phones,temp
0.23,0.27,EH1,phones,temp
0.27,0.32,N,phones,temp
0.32,0.48,AY1,phones,temp
0.48,0.49,HH,phones,temp
0.49,0.6,AE1,phones,temp
0.6,0.64,D,phones,temp
0.64,0.7,AH0,phones,temp
0.7,0.83,P,phones,temp
0.83,0.88,R,phones,temp
0.88,0.99,OW1,phones,temp
0.99,1.12,CH,phones,temp
1.12,1.19,T,phones,temp
1.22,1.4,S,phones,temp
1.4,1.58,OW1,phones,temp
1.58,1.7,N,phones,temp
1.7,1.84,IH1,phones,temp
1.84,1.91,R,phones,temp
1.91,2.01,T,phones,temp
2.01,2.07,AH0,phones,temp
2.07,2.13,DH,phones,temp
2.13,2.3,EH1,phones,temp
2.3,2.42,M,phones,temp
2.53,2.55,DH,phones,temp
2.55,2.61,AH0,phones,temp
2.61,2.73,K,phones,temp
2.73,2.85,AA1,phones,temp
2.85,2.9,M,phones,temp
2.9,2.95,AH0,phones,temp
2.95,3.01,N,phones,temp
3.05,3.22,AA1,phones,temp
3.22,3.27,B,phones,temp
3.27,3.34,JH,phones,temp
3.34,3.48,EH0,phones,temp
3.48,3.54,K,phones,temp
3.54,3.62,T,phones,temp
3.68,3.69,HH,phones,temp
3.69,3.76,W,phones,temp
3.76,3.8,IH1,phones,temp
3.8,3.93,CH,phones,temp
3.93,3.95,DH,phones,temp
3.95,4.02,AH0,phones,temp
4.02,4.12,S,phones,temp
4.12,4.21,EH1,phones,temp
4.21,4.27,N,phones,temp
4.27,4.34,S,phones,temp
4.34,4.42,D,phones,temp
4.42,4.45,IH0,phones,temp
4.45,4.59,S,phones,temp
4.59,4.79,IY1,phones,temp
4.79,4.87,V,phones,temp
4.87,4.97,Z,phones,temp
5.04,5.12,L,phones,temp
5.12,5.33,AO1,phones,temp
5.33,5.42,S,phones,temp
5.42,5.54,T,phones,temp
5.54,5.7,N,phones,temp
5.7,5.89,AA1,phones,temp
5.89,6.0,T,phones,temp
6.0,6.05,B,phones,temp
6.05,6.14,AY1,phones,temp
6.14,6.24,D,phones,temp
6.24,6.3,IH1,phones,temp
6.3,6.38,S,phones,temp
6.38,6.45,T,phones,temp
6.45,6.51,AH0,phones,temp
6.51,6.57,N,phones,temp
6.57,6.67,S,phones,temp
6.79,6.89,EH1,phones,temp
6.89,6.95,N,phones,temp
6.95,7.05,IY0,phones,temp
7.05,7.13,AH0,phones,temp
7.13,7.18,V,phones,temp
7.18,7.22,IH0,phones,temp
7.22,7.29,T,phones,temp
7.29,7.34,S,phones,temp
7.34,7.39,M,phones,temp
7.39,7.5,AA1,phones,temp
7.5,7.58,R,phones,temp
7.58,7.7,K,phones,temp
7.7,7.87,S,phones,temp


================================================
FILE: edit_utils.py
================================================
def get_span(orig, new, editType):
    orig_list = orig.split(" ")
    new_list = new.split(" ")
    
    flag = False # indicates whether the actual edit follows the specified editType
    if editType == "deletion":
        assert len(orig_list) > len(new_list), f"the edit type is deletion, but new is not shorter than original:\n new: {new}\n orig: {orig}"
        diff = len(orig_list) - len(new_list)
        for i, (o, n) in enumerate(zip(orig_list, new_list)):
            if o != n: # assume the index of the first different word is the starting index of the orig_span
            
                orig_span = [i, i + diff - 1] # assume that the indices are starting and ending index of the deleted part
                new_span = [i-1, i] # but for the new span, the starting and ending index is the two words that surround the deleted part
                flag = True
                break


    elif editType == "insertion": 
        assert len(orig_list) < len(new_list), f"the edit type is insertion, but the new is not longer than the original:\n new: {new}\n orig: {orig}"
        diff = len(new_list) - len(orig_list)
        for i, (o, n) in enumerate(zip(orig_list, new_list)):
            if o != n: # insertion is just the opposite of deletion
                new_span = [i, i + diff - 1] # NOTE if only inserted one word, s and e will be the same
                orig_span = [i-1, i]
                flag = True
                break

    elif editType == "substitution":
        new_span = []
        orig_span = []
        for i, (o, n) in enumerate(zip(orig_list, new_list)):
            if o != n:
                new_span = [i]
                orig_span = [i]
                break
        assert len(new_span) == 1 and len(orig_span) == 1, f"new_span: {new_span}, orig_span: {orig_span}"
        for j, (o, n) in enumerate(zip(orig_list[::-1], new_list[::-1])):
            if o != n:
                new_span.append(len(new_list) - j -1)
                orig_span.append(len(orig_list) - j - 1)
                flag = True
                break
    else:
        raise RuntimeError(f"editType unknown: {editType}")

    if not flag:
        raise RuntimeError(f"the edit does not match the specified edit type:\n original: {orig}\n new: {new}\n editType: {editType}")

    return orig_span, new_span    
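

# Illustrative sketch (not part of the original module): a swapped-in
# alternative that locates edited word spans with the standard library's
# difflib.SequenceMatcher instead of the first/last-mismatch scan used by
# get_span. The function name is hypothetical. Spans here are half-open
# [start, end), whereas get_span returns inclusive indices.
import difflib


def word_diff_spans(orig, new):
    orig_words = orig.split(" ")
    new_words = new.split(" ")
    matcher = difflib.SequenceMatcher(a=orig_words, b=new_words, autojunk=False)
    # keep only the opcodes that describe an actual change
    return [
        (tag, (i1, i2), (j1, j2))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
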

================================================
FILE: environment.yml
================================================
name: voicecraft
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - aom=3.8.2=h59595ed_0
  - asttokens=2.4.1=pyhd8ed1ab_0
  - atk-1.0=2.38.0=hd4edc92_1
  - audioread=3.0.1=py39hf3d152e_1
  - backcall=0.2.0=pyh9f0ad1d_0
  - baumwelch=0.3.7=h00ab1b0_5
  - biopython=1.79=py39hb9d737c_3
  - brotli=1.1.0=hd590300_1
  - brotli-bin=1.1.0=hd590300_1
  - brotli-python=1.1.0=py39h3d6467e_1
  - bzip2=1.0.8=hd590300_5
  - ca-certificates=2024.2.2=hbcca054_0
  - cairo=1.18.0=h3faef2a_0
  - certifi=2024.2.2=pyhd8ed1ab_0
  - cffi=1.16.0=py39h7a31438_0
  - charset-normalizer=3.3.2=pyhd8ed1ab_0
  - click=8.1.7=unix_pyh707e725_0
  - colorama=0.4.6=pyhd8ed1ab_0
  - comm=0.2.2=pyhd8ed1ab_0
  - contourpy=1.2.0=py39h7633fee_0
  - cycler=0.12.1=pyhd8ed1ab_0
  - dataclassy=1.0.1=pyhd8ed1ab_0
  - dav1d=1.2.1=hd590300_0
  - debugpy=1.8.1=py39h3d6467e_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - executing=2.0.1=pyhd8ed1ab_0
  - expat=2.6.2=h59595ed_0
  - ffmpeg=6.1.1=gpl_h38e077a_106
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=h77eed37_1
  - fontconfig=2.14.2=h14ed4e7_0
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - fonttools=4.49.0=py39hd1e30aa_0
  - freetype=2.12.1=h267a509_2
  - fribidi=1.0.10=h36c2ea0_0
  - gdk-pixbuf=2.42.10=h829c605_5
  - gettext=0.21.1=h27087fc_0
  - giflib=5.2.1=h0b41bf4_3
  - gmp=6.3.0=h59595ed_1
  - gnutls=3.7.9=hb077bed_0
  - graphite2=1.3.13=h58526e2_1001
  - graphviz=9.0.0=h78e8752_1
  - greenlet=3.0.3=py39h3d6467e_0
  - gtk2=2.24.33=h280cfa0_4
  - gts=0.7.6=h977cf35_4
  - harfbuzz=8.3.0=h3d44ed6_0
  - hdbscan=0.8.33=py39h44dd56e_4
  - icu=73.2=h59595ed_0
  - idna=3.6=pyhd8ed1ab_0
  - importlib-metadata=7.0.2=pyha770c72_0
  - importlib-resources=6.3.0=pyhd8ed1ab_0
  - importlib_metadata=7.0.2=hd8ed1ab_0
  - importlib_resources=6.3.0=pyhd8ed1ab_0
  - ipykernel=6.29.3=pyhd33586a_0
  - jedi=0.19.1=pyhd8ed1ab_0
  - joblib=1.3.2=pyhd8ed1ab_0
  - jupyter_client=8.6.1=pyhd8ed1ab_0
  - jupyter_core=5.7.2=py39hf3d152e_0
  - kaldi=5.5.1068=cpu_h31769b2_2
  - keyutils=1.6.1=h166bdaf_0
  - kiwisolver=1.4.5=py39h7633fee_1
  - kneed=0.8.5=pyhd8ed1ab_0
  - krb5=1.21.2=h659d440_0
  - lame=3.100=h166bdaf_1003
  - lazy_loader=0.3=pyhd8ed1ab_0
  - lcms2=2.16=hb7c19ff_0
  - ld_impl_linux-64=2.40=h41732ed_0
  - lerc=4.0.0=h27087fc_0
  - libabseil=20240116.1=cxx17_h59595ed_2
  - libass=0.17.1=h8fe9dca_1
  - libblas=3.9.0=21_linux64_openblas
  - libbrotlicommon=1.1.0=hd590300_1
  - libbrotlidec=1.1.0=hd590300_1
  - libbrotlienc=1.1.0=hd590300_1
  - libcblas=3.9.0=21_linux64_openblas
  - libclang-cpp15=15.0.7=default_hb11cfb5_4
  - libdeflate=1.19=hd590300_0
  - libdrm=2.4.120=hd590300_0
  - libedit=3.1.20191231=he28a2e2_2
  - libexpat=2.6.2=h59595ed_0
  - libffi=3.4.2=h7f98852_5
  - libflac=1.4.3=h59595ed_0
  - libgcc-ng=13.2.0=h807b86a_5
  - libgd=2.3.3=h119a65a_9
  - libgfortran-ng=13.2.0=h69a702a_5
  - libgfortran5=13.2.0=ha4646dd_5
  - libglib=2.80.0=hf2295e7_0
  - libgomp=13.2.0=h807b86a_5
  - libhwloc=2.9.3=default_h554bfaf_1009
  - libiconv=1.17=hd590300_2
  - libidn2=2.3.7=hd590300_0
  - libjpeg-turbo=3.0.0=hd590300_1
  - liblapack=3.9.0=21_linux64_openblas
  - liblapacke=3.9.0=21_linux64_openblas
  - libllvm14=14.0.6=hcd5def8_4
  - libllvm15=15.0.7=hb3ce162_4
  - libllvmspirv15=15.0.0=h0cdce71_1
  - libnsl=2.0.1=hd590300_0
  - libogg=1.3.4=h7f98852_1
  - libopenblas=0.3.26=pthreads_h413a1c8_0
  - libopenvino=2024.0.0=h2e90f83_1
  - libopenvino-auto-batch-plugin=2024.0.0=hd5fc58b_1
  - libopenvino-auto-plugin=2024.0.0=hd5fc58b_1
  - libopenvino-hetero-plugin=2024.0.0=h3ecfda7_1
  - libopenvino-intel-cpu-plugin=2024.0.0=h2e90f83_1
  - libopenvino-intel-gpu-plugin=2024.0.0=h2e90f83_1
  - libopenvino-ir-frontend=2024.0.0=h3ecfda7_1
  - libopenvino-onnx-frontend=2024.0.0=h757c851_1
  - libopenvino-paddle-frontend=2024.0.0=h757c851_1
  - libopenvino-pytorch-frontend=2024.0.0=h59595ed_1
  - libopenvino-tensorflow-frontend=2024.0.0=hca94c1a_1
  - libopenvino-tensorflow-lite-frontend=2024.0.0=h59595ed_1
  - libopus=1.3.1=h7f98852_1
  - libpciaccess=0.18=hd590300_0
  - libpng=1.6.43=h2797004_0
  - libpq=16.2=h33b98f1_0
  - libprotobuf=4.25.3=h08a7969_0
  - librosa=0.10.1=pyhd8ed1ab_0
  - librsvg=2.56.3=he3f83f7_1
  - libsndfile=1.2.2=hc60ed4a_1
  - libsodium=1.0.18=h36c2ea0_1
  - libsqlite=3.45.2=h2797004_0
  - libstdcxx-ng=13.2.0=h7e041cc_5
  - libtasn1=4.19.0=h166bdaf_0
  - libtiff=4.6.0=ha9c0a0a_2
  - libunistring=0.9.10=h7f98852_0
  - libuuid=2.38.1=h0b41bf4_0
  - libva=2.21.0=hd590300_0
  - libvorbis=1.3.7=h9c3ff4c_0
  - libvpx=1.14.0=h59595ed_0
  - libwebp=1.3.2=h658648e_1
  - libwebp-base=1.3.2=hd590300_0
  - libxcb=1.15=h0b41bf4_0
  - libxcrypt=4.4.36=hd590300_1
  - libxml2=2.12.5=h232c23b_0
  - libzlib=1.2.13=hd590300_5
  - llvm-spirv-15=15.0.0=h0cdce71_1
  - mad=0.15.1b=h9c3ff4c_1
  - markdown-it-py=3.0.0=pyhd8ed1ab_0
  - matplotlib-base=3.8.3=py39he9076e7_0
  - matplotlib-inline=0.1.6=pyhd8ed1ab_0
  - mdurl=0.1.2=pyhd8ed1ab_0
  - montreal-forced-aligner=2.2.17=pyhd8ed1ab_0
  - mpg123=1.32.4=h59595ed_0
  - msgpack-python=1.0.7=py39h7633fee_0
  - munkres=1.1.4=pyh9f0ad1d_0
  - ncurses=6.4=h59595ed_2
  - nest-asyncio=1.6.0=pyhd8ed1ab_0
  - nettle=3.9.1=h7ab15ed_0
  - ngram=1.3.14=h924138e_2
  - numba=0.59.0=py39h615d6bd_1
  - numpy=1.26.4=py39h474f0d3_0
  - ocl-icd=2.3.2=hd590300_0
  - openfst=1.8.2=h924138e_2
  - openh264=2.4.1=h59595ed_0
  - openjpeg=2.5.2=h488ebb8_0
  - openssl=3.2.1=hd590300_0
  - p11-kit=0.24.1=hc5aa10d_0
  - packaging=24.0=pyhd8ed1ab_0
  - pandas=2.2.1=py39hddac248_0
  - pango=1.52.1=ha41ecd1_0
  - parso=0.8.3=pyhd8ed1ab_0
  - patsy=0.5.6=pyhd8ed1ab_0
  - pcre2=10.43=hcad00b1_0
  - pexpect=4.9.0=pyhd8ed1ab_0
  - pgvector-python=0.2.5=pyhe093146_0
  - pickleshare=0.7.5=py_1003
  - pillow=10.2.0=py39had0adad_0
  - pip=24.0=pyhd8ed1ab_0
  - pixman=0.43.2=h59595ed_0
  - platformdirs=4.2.0=pyhd8ed1ab_0
  - pocl=5.0=h03a6ac1_2
  - pocl-core=5.0=hdaecddf_2
  - pocl-cpu=5.0=he901f76_2
  - pocl-cpu-minimal=5.0=h5ccd973_2
  - pocl-cuda=5.0=hdaecddf_2
  - pocl-remote=5.0=h5ccd973_2
  - pooch=1.8.1=pyhd8ed1ab_0
  - postgresql=16.2=h7387d8b_0
  - prompt-toolkit=3.0.42=pyha770c72_0
  - prompt_toolkit=3.0.42=hd8ed1ab_0
  - psutil=5.9.8=py39hd1e30aa_0
  - psycopg2=2.9.9=py39h89197e3_0
  - pthread-stubs=0.4=h36c2ea0_1001
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pugixml=1.14=h59595ed_0
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - pycparser=2.21=pyhd8ed1ab_0
  - pygments=2.17.2=pyhd8ed1ab_0
  - pyparsing=3.1.2=pyhd8ed1ab_0
  - pysocks=1.7.1=pyha2e5f31_6
  - pysoundfile=0.12.1=pypyhd8ed1ab_1
  - python=3.9.18=h0755675_1_cpython
  - python-tzdata=2024.1=pyhd8ed1ab_0
  - python_abi=3.9=4_cp39
  - pytz=2024.1=pyhd8ed1ab_0
  - pyyaml=6.0.1=py39hd1e30aa_1
  - pyzmq=25.1.2=py39h8c080ef_0
  - readline=8.2=h8228510_1
  - requests=2.31.0=pyhd8ed1ab_0
  - rich=13.7.1=pyhd8ed1ab_0
  - rich-click=1.7.4=pyhd8ed1ab_0
  - scikit-learn=1.2.2=py39hc236052_2
  - scipy=1.12.0=py39h474f0d3_2
  - seaborn=0.13.2=hd8ed1ab_0
  - seaborn-base=0.13.2=pyhd8ed1ab_0
  - setuptools=69.2.0=pyhd8ed1ab_0
  - six=1.16.0=pyh6c4a22f_0
  - snappy=1.1.10=h9fff704_0
  - sox=14.4.2=ha5cc309_1018
  - soxr=0.1.3=h0b41bf4_3
  - soxr-python=0.3.7=py39h44dd56e_0
  - sqlalchemy=2.0.28=py39hd1e30aa_0
  - sqlite=3.45.2=h2c6b66d_0
  - stack_data=0.6.2=pyhd8ed1ab_0
  - statsmodels=0.14.1=py39h44dd56e_0
  - svt-av1=1.8.0=h59595ed_0
  - tbb=2021.11.0=h00ab1b0_1
  - threadpoolctl=3.3.0=pyhc1e730c_0
  - tk=8.6.13=noxft_h4845f30_101
  - tornado=6.4=py39hd1e30aa_0
  - tqdm=4.66.2=pyhd8ed1ab_0
  - traitlets=5.14.2=pyhd8ed1ab_0
  - typing-extensions=4.10.0=hd8ed1ab_0
  - typing_extensions=4.10.0=pyha770c72_0
  - tzcode=2024a=h3f72095_0
  - tzdata=2024a=h0c530f3_0
  - unicodedata2=15.1.0=py39hd1e30aa_0
  - urllib3=2.2.1=pyhd8ed1ab_0
  - wcwidth=0.2.13=pyhd8ed1ab_0
  - wheel=0.42.0=pyhd8ed1ab_0
  - x264=1!164.3095=h166bdaf_2
  - x265=3.5=h924138e_3
  - xorg-fixesproto=5.0=h7f98852_1002
  - xorg-kbproto=1.0.7=h7f98852_1002
  - xorg-libice=1.1.1=hd590300_0
  - xorg-libsm=1.2.4=h7391055_0
  - xorg-libx11=1.8.7=h8ee46fc_0
  - xorg-libxau=1.0.11=hd590300_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xorg-libxext=1.3.4=h0b41bf4_2
  - xorg-libxfixes=5.0.3=h7f98852_1004
  - xorg-libxrender=0.9.11=hd590300_0
  - xorg-renderproto=0.11.1=h7f98852_1002
  - xorg-xextproto=7.3.0=h0b41bf4_1003
  - xorg-xproto=7.0.31=h7f98852_1007
  - xz=5.2.6=h166bdaf_0
  - yaml=0.2.5=h7f98852_2
  - zeromq=4.3.5=h59595ed_1
  - zipp=3.17.0=pyhd8ed1ab_0
  - zlib=1.2.13=hd590300_5
  - zstd=1.5.5=hfc55251_0
  - pip:
      - absl-py==2.1.0
      - aiofiles==23.2.1
      - aiohttp==3.9.3
      - aiosignal==1.3.1
      - altair==5.2.0
      - antlr4-python3-runtime==4.9.3
      - anyio==4.3.0
      - async-timeout==4.0.3
      - attrs==23.2.0
      - av==11.0.0
      - babel==2.14.0
      - beautifulsoup4==4.12.3
      - bibtexparser==2.0.0b7
      - bleach==6.1.0
      - blis==0.7.11
      - catalogue==2.0.10
      - clldutils==3.22.2
      - cloudpickle==3.0.0
      - cmake==3.28.3
      - colorlog==6.8.2
      - confection==0.1.4
      - csvw==3.3.0
      - cymem==2.0.8
      - cython==0.29.37
      - datasets==2.16.0
      - defusedxml==0.7.1
      - demucs==4.0.1
      - dill==0.3.6
      - dlinfo==1.2.1
      - docopt==0.6.2
      - dora-search==0.1.12
      - einops==0.7.0
      - encodec==0.1.1
      - exceptiongroup==1.2.0
      - fastapi==0.110.0
      - fastjsonschema==2.19.1
      - ffmpy==0.3.2
      - filelock==3.13.1
      - flashy==0.0.2
      - frozenlist==1.4.1
      - fsspec==2023.10.0
      - gradio==3.50.2
      - gradio-client==0.6.1
      - grpcio==1.62.1
      - h11==0.14.0
      - httpcore==1.0.4
      - httpx==0.27.0
      - huggingface-hub==0.22.2
      - hydra-colorlog==1.2.0
      - hydra-core==1.3.2
      - ipython==8.12.3
      - isodate==0.6.1
      - jinja2==3.1.3
      - jsonschema==4.21.1
      - jsonschema-specifications==2023.12.1
      - julius==0.2.7
      - jupyterlab-pygments==0.3.0
      - lameenc==1.7.0
      - langcodes==3.3.0
      - language-tags==1.2.0
      - lit==18.1.1
      - llvmlite==0.42.0
      - lxml==5.1.0
      - markdown==3.5.2
      - markupsafe==2.1.5
      - mistune==3.0.2
      - mpmath==1.3.0
      - msgpack==1.0.8
      - multidict==6.0.5
      - multiprocess==0.70.14
      - murmurhash==1.0.10
      - nbclient==0.10.0
      - nbconvert==7.16.3
      - nbformat==5.10.3
      - networkx==3.2.1
      - num2words==0.5.13
      - nvidia-cublas-cu11==11.10.3.66
      - nvidia-cuda-cupti-cu11==11.7.101
      - nvidia-cuda-nvrtc-cu11==11.7.99
      - nvidia-cuda-runtime-cu11==11.7.99
      - nvidia-cudnn-cu11==8.5.0.96
      - nvidia-cufft-cu11==10.9.0.58
      - nvidia-curand-cu11==10.2.10.91
      - nvidia-cusolver-cu11==11.4.0.1
      - nvidia-cusparse-cu11==11.7.4.91
      - nvidia-nccl-cu11==2.14.3
      - nvidia-nvtx-cu11==11.7.91
      - omegaconf==2.3.0
      - openunmix==1.2.1
      - orjson==3.9.15
      - pandocfilters==1.5.1
      - pathlib-abc==0.1.1
      - pathy==0.11.0
      - pgvector==0.2.2
      - phonemizer==3.2.1
      - pipreqs==0.5.0
      - praatio==6.2.0
      - preshed==3.0.9
      - protobuf==4.25.3
      - pyarrow==15.0.2
      - pyarrow-hotfix==0.6
      - pydantic==1.10.14
      - pydub==0.25.1
      - pylatexenc==2.10
      - pynini==2.1.6
      - pypinyin==0.48.0
      - python-dateutil==2.9.0.post0
      - python-multipart==0.0.9
      - rdflib==7.0.0
      - referencing==0.33.0
      - regex==2023.12.25
      - responses==0.18.0
      - retrying==1.3.4
      - rfc3986==1.5.0
      - rpds-py==0.18.0
      - safetensors==0.4.2
      - segments==2.2.1
      - semantic-version==2.10.0
      - sentencepiece==0.2.0
      - smart-open==6.4.0
      - sniffio==1.3.1
      - soupsieve==2.5
      - spacy==3.5.2
      - spacy-legacy==3.0.12
      - spacy-loggers==1.0.5
      - srsly==2.4.8
      - starlette==0.36.3
      - submitit==1.5.1
      - sympy==1.12
      - tabulate==0.9.0
      - tensorboard==2.16.2
      - tensorboard-data-server==0.7.2
      - thinc==8.1.12
      - tinycss2==1.2.1
      - tokenizers==0.15.2
      - toolz==0.12.1
      - torch==2.0.1
      - torchaudio==2.0.2
      - torchmetrics==0.11.1
      - transformers==4.38.2
      - treetable==0.2.5
      - triton==2.0.0
      - typer==0.7.0
      - uritemplate==4.1.1
      - uvicorn==0.28.0
      - wasabi==1.1.2
      - webencodings==0.5.1
      - websockets==11.0.3
      - werkzeug==3.0.1
      - xformers==0.0.22
      - xxhash==3.4.1
      - yarg==0.1.9
      - yarl==1.9.4
prefix: /home/pyp/miniconda3/envs/voicecraft


================================================
FILE: gradio_app.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9b6a0c92",
   "metadata": {},
   "source": [
    "### Only do the below if you are using docker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "961faa43",
   "metadata": {},
   "outputs": [],
   "source": [
    "!source ~/.bashrc && \\\n",
    "    apt-get update && \\\n",
    "    apt-get install -y espeak espeak-data libespeak1 libespeak-dev && \\\n",
    "    apt-get install -y festival* && \\\n",
    "    apt-get install -y build-essential && \\\n",
    "    apt-get install -y flac libasound2-dev libsndfile1-dev vorbis-tools && \\\n",
    "    apt-get install -y libxml2-dev libxslt-dev zlib1g-dev"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "598d75cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "!source ~/.bashrc && \\\n",
    "    conda activate voicecraft && \\\n",
    "    pip install -r gradio_requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b9c4436",
   "metadata": {},
   "source": [
    "# STOP\n",
    "Do this part manually, using the mouse/keyboard and the tabs at the top.\n",
    "\n",
    "* Refresh your browser to make sure it picks up the new kernel.\n",
    "* Kernel -> Change Kernel -> Select Kernel -> voicecraft\n",
    "* Kernel -> Restart Kernel -> Yes\n",
    "\n",
    "Now you can run the rest of the notebook and get an audio sample output. It will automatically download the additional models it needs. The next time you use this container, you can start from the cell below, because the dependencies remain available until you delete the Docker container."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f089aa96",
   "metadata": {},
   "outputs": [],
   "source": [
    "from gradio_app import app\n",
    "app.launch()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "voicecraft",
   "language": "python",
   "name": "voicecraft"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: gradio_app.py
================================================
import os
import re
from num2words import num2words
import gradio as gr
import torch
import torchaudio
from data.tokenizer import (
    AudioTokenizer,
    TextTokenizer,
)
from models import voicecraft
import io
import numpy as np
import random
import uuid
import nltk
nltk.download('punkt')

DEMO_PATH = os.getenv("DEMO_PATH", "./demo")
TMP_PATH = os.getenv("TMP_PATH", "./demo/temp")
MODELS_PATH = os.getenv("MODELS_PATH", "./pretrained_models")
device = "cuda" if torch.cuda.is_available() else "cpu"
whisper_model, align_model, voicecraft_model = None, None, None
_whitespace_re = re.compile(r"\s+")

def get_random_string():
    return "".join(str(uuid.uuid4()).split("-"))


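def seed_everything(seed):
    # Seed the Python, NumPy and PyTorch (CPU and CUDA) RNGs; a seed of -1 skips seeding entirely.
    if seed != -1:
        os.environ['PYTHONHASHSEED'] = str(seed)
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        # Disable cuDNN autotuning and request deterministic kernels for repeatable results.
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True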


class WhisperxAlignModel:
    def __init__(self):
        from whisperx import load_align_model
        self.model, self.metadata = load_align_model(language_code="en", device=device)

    def align(self, segments, audio_path):
        from whisperx import align, load_audio
        audio = load_audio(audio_path)
        return align(segments, self.model, self.metadata, audio, device, return_char_alignments=False)["segments"]


class WhisperModel:
    def __init__(self, model_name):
        from whisper import load_model
        self.model = load_model(model_name, device)

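        from whisper.tokenizer import get_tokenizer
        tokenizer = get_tokenizer(multilingual=False)
        # -1 keeps Whisper's default suppression set. The list adds every token that
        # decodes to digits only (ignoring one leading space), so numbers are
        # transcribed as words rather than numerals.
        self.supress_tokens = [-1] + [
            i
            for i in range(tokenizer.eot)
            if all(c in "0123456789" for c in tokenizer.decode([i]).removeprefix(" "))
        ]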

    def transcribe(self, audio_path):
        return self.model.transcribe(audio_path, suppress_tokens=self.supress_tokens, word_timestamps=True)["segments"]


class WhisperxModel:
    def __init__(self, model_name, align_model: WhisperxAlignModel):
        from whisperx import load_model
        self.model = load_model(model_name, device, asr_options={"suppress_numerals": True, "max_new_tokens": None, "clip_timestamps": None, "hallucination_silence_threshold": None})
        self.align_model = align_model

    def transcribe(self, audio_path):
        segments = self.model.transcribe(audio_path, batch_size=
SYMBOL INDEX (348 symbols across 21 files)

FILE: config.py
  function MyParser (line 4) | def MyParser():

FILE: data/gigaspeech.py
  class dataset (line 8) | class dataset(torch.utils.data.Dataset):
    method __init__ (line 9) | def __init__(self, args, split):
    method __len__ (line 38) | def __len__(self):
    method _load_phn_enc (line 41) | def _load_phn_enc(self, index):
    method __getitem__ (line 64) | def __getitem__(self, index):
    method collate (line 131) | def collate(self, batch):

FILE: data/phonemize_encodec_encode_hf.py
  function parse_args (line 2) | def parse_args():
  function sort_by_audio_len (line 42) | def sort_by_audio_len(lens):
  function write_array_to_txt_file (line 50) | def write_array_to_txt_file(array, filename):
  class mydataset (line 127) | class mydataset(torch.utils.data.Dataset):
    method __init__ (line 128) | def __init__(self, split):
    method __len__ (line 131) | def __len__(self):
    method __getitem__ (line 133) | def __getitem__(self, ind):
    method collate (line 140) | def collate(self, batch):

FILE: data/tokenizer.py
  class TextTokenizer (line 33) | class TextTokenizer:
    method __init__ (line 36) | def __init__(
    method to_list (line 61) | def to_list(self, phonemized: str) -> List[str]:
    method __call__ (line 75) | def __call__(self, text, strip=True) -> List[List[str]]:
  function tokenize_text (line 85) | def tokenize_text(tokenizer: TextTokenizer, text: str) -> List[str]:
  function convert_audio (line 89) | def convert_audio(wav: torch.Tensor, sr: int, target_sr: int, target_cha...
  class AudioTokenizer (line 101) | class AudioTokenizer:
    method __init__ (line 104) | def __init__(
    method device (line 124) | def device(self):
    method encode (line 127) | def encode(self, wav: torch.Tensor) -> torch.Tensor:
    method decode (line 131) | def decode(self, frames: torch.Tensor) -> torch.Tensor:
  function tokenize_audio (line 137) | def tokenize_audio(tokenizer: AudioTokenizer, audio_path: str, offset = ...

FILE: edit_utils.py
  function get_span (line 1) | def get_span(orig, new, editType):

FILE: gradio_app.py
  function get_random_string (line 26) | def get_random_string():
  function seed_everything (line 30) | def seed_everything(seed):
  class WhisperxAlignModel (line 41) | class WhisperxAlignModel:
    method __init__ (line 42) | def __init__(self):
    method align (line 46) | def align(self, segments, audio_path):
  class WhisperModel (line 52) | class WhisperModel:
    method __init__ (line 53) | def __init__(self, model_name):
    method transcribe (line 65) | def transcribe(self, audio_path):
  class WhisperxModel (line 69) | class WhisperxModel:
    method __init__ (line 70) | def __init__(self, model_name, align_model: WhisperxAlignModel):
    method transcribe (line 75) | def transcribe(self, audio_path):
  function load_models (line 82) | def load_models(whisper_backend_name, whisper_model_name, alignment_mode...
  function get_transcribe_state (line 125) | def get_transcribe_state(segments):
  function transcribe (line 139) | def transcribe(seed, audio_path):
  function align_segments (line 156) | def align_segments(transcript, audio_path):
  function align (line 178) | def align(seed, transcript, audio_path):
  function get_output_audio (line 201) | def get_output_audio(audio_tensors, codec_audio_sr):
  function replace_numbers_with_words (line 208) | def replace_numbers_with_words(sentence):
  function run (line 218) | def run(seed, left_margin, right_margin, codec_audio_sr, codec_sr, top_k...
  function update_input_audio (line 324) | def update_input_audio(audio_path):
  function change_mode (line 337) | def change_mode(mode):
  function load_sentence (line 348) | def load_sentence(selected_sentence, codec_audio_sr, audio_tensors):
  function update_bound_word (line 356) | def update_bound_word(is_first_word, selected_word, edit_word_mode):
  function update_bound_words (line 372) | def update_bound_words(from_selected_word, to_selected_word, edit_word_m...
  function update_demo (line 417) | def update_demo(mode, smart_transcript, edit_word_mode, transcript, edit...
  function get_app (line 433) | def get_app():

FILE: inference_speech_editing_scale.py
  function get_args (line 19) | def get_args():
  function inference_one_sample (line 41) | def inference_one_sample(model, model_args, phn2num, text_tokenizer, aud...
  function get_model (line 86) | def get_model(exp_dir, device=None):
  function get_mask_interval (line 107) | def get_mask_interval(ali_fn, word_span_ind, editType):
  function seed_everything (line 130) | def seed_everything(seed):

FILE: inference_tts_scale.py
  function get_args (line 20) | def get_args():
  function inference_one_sample (line 43) | def inference_one_sample(model, model_args, phn2num, text_tokenizer, aud...
  function get_model (line 107) | def get_model(exp_dir, device=None):
  function seed_everything (line 128) | def seed_everything(seed):

FILE: models/codebooks_patterns.py
  class Pattern (line 21) | class Pattern:
    method __post_init__ (line 49) | def __post_init__(self):
    method _validate_layout (line 57) | def _validate_layout(self):
    method num_sequence_steps (line 79) | def num_sequence_steps(self):
    method max_delay (line 83) | def max_delay(self):
    method valid_layout (line 91) | def valid_layout(self):
    method get_sequence_coords_with_timestep (line 95) | def get_sequence_coords_with_timestep(self, t: int, q: tp.Optional[int...
    method get_steps_with_timestep (line 110) | def get_steps_with_timestep(self, t: int, q: tp.Optional[int] = None) ...
    method get_first_step_with_timesteps (line 113) | def get_first_step_with_timesteps(self, t: int, q: tp.Optional[int] = ...
    method _build_pattern_sequence_scatter_indexes (line 117) | def _build_pattern_sequence_scatter_indexes(self, timesteps: int, n_q:...
    method build_pattern_sequence (line 151) | def build_pattern_sequence(self, z: torch.Tensor, special_token: int, ...
    method _build_reverted_sequence_scatter_indexes (line 178) | def _build_reverted_sequence_scatter_indexes(self, sequence_steps: int...
    method revert_pattern_sequence (line 222) | def revert_pattern_sequence(self, s: torch.Tensor, special_token: int,...
    method revert_pattern_logits (line 247) | def revert_pattern_logits(self, logits: torch.Tensor, special_token: f...
  class CodebooksPatternProvider (line 269) | class CodebooksPatternProvider(ABC):
    method __init__ (line 287) | def __init__(self, n_q: int, cached: bool = True):
    method get_pattern (line 293) | def get_pattern(self, timesteps: int) -> Pattern:
  class DelayedPatternProvider (line 302) | class DelayedPatternProvider(CodebooksPatternProvider):
    method __init__ (line 325) | def __init__(self, n_q: int, delays: tp.Optional[tp.List[int]] = None,
    method get_pattern (line 336) | def get_pattern(self, timesteps: int) -> Pattern:
  class ParallelPatternProvider (line 355) | class ParallelPatternProvider(DelayedPatternProvider):
    method __init__ (line 363) | def __init__(self, n_q: int):
  class UnrolledPatternProvider (line 367) | class UnrolledPatternProvider(CodebooksPatternProvider):
    method __init__ (line 418) | def __init__(self, n_q: int, flattening: tp.Optional[tp.List[int]] = N...
    method _build_flattened_codebooks (line 432) | def _build_flattened_codebooks(self, delays: tp.List[int], flattening:...
    method _num_inner_steps (line 452) | def _num_inner_steps(self):
    method num_virtual_steps (line 457) | def num_virtual_steps(self, timesteps: int) -> int:
    method get_pattern (line 460) | def get_pattern(self, timesteps: int) -> Pattern:
  class VALLEPattern (line 488) | class VALLEPattern(CodebooksPatternProvider):
    method __init__ (line 497) | def __init__(self, n_q: int, delays: tp.Optional[tp.List[int]] = None):
    method get_pattern (line 505) | def get_pattern(self, timesteps: int) -> Pattern:
  class MusicLMPattern (line 520) | class MusicLMPattern(CodebooksPatternProvider):
    method __init__ (line 528) | def __init__(self, n_q: int, group_by: int = 2):
    method get_pattern (line 532) | def get_pattern(self, timesteps: int) -> Pattern:

FILE: models/modules/activation.py
  function _canonical_mask (line 20) | def _canonical_mask(
  function _in_projection_packed (line 48) | def _in_projection_packed(
  function _none_or_dtype (line 110) | def _none_or_dtype(input: Optional[Tensor]) -> Optional[DType]:
  class MultiheadAttention (line 116) | class MultiheadAttention(Module):
    method __init__ (line 176) | def __init__(
    method _reset_parameters (line 280) | def _reset_parameters(self):
    method __setstate__ (line 297) | def __setstate__(self, state):
    method forward (line 304) | def forward(

FILE: models/modules/embedding.py
  class TokenEmbedding (line 22) | class TokenEmbedding(nn.Module):
    method __init__ (line 23) | def __init__(
    method weight (line 38) | def weight(self) -> torch.Tensor:
    method embedding (line 41) | def embedding(self, index: int) -> torch.Tensor:
    method forward (line 44) | def forward(self, x: torch.Tensor):
  class SinePositionalEmbedding (line 51) | class SinePositionalEmbedding(nn.Module):
    method __init__ (line 52) | def __init__(
    method extend_pe (line 69) | def extend_pe(self, x):
    method forward (line 94) | def forward(self, x: torch.Tensor) -> torch.Tensor:

FILE: models/modules/sampling.py
  function top_k_top_p_filtering (line 4) | def top_k_top_p_filtering(
  function topk_sampling (line 48) | def topk_sampling(logits, top_k=10, top_p=1.0, temperature=1.0):

FILE: models/modules/scaling.py
  class Transpose (line 35) | class Transpose(nn.Identity):
    method forward (line 38) | def forward(self, input: torch.Tensor) -> torch.Tensor:
  class ActivationBalancerFunction (line 41) | class ActivationBalancerFunction(torch.autograd.Function):
    method forward (line 43) | def forward(
    method backward (line 61) | def backward(ctx, x_grad: Tensor) -> Tuple[Tensor, None, None, None]:
  function _compute_scale_factor (line 82) | def _compute_scale_factor(
  function _compute_sign_factor (line 111) | def _compute_sign_factor(
  class ActivationScaleBalancerFunction (line 147) | class ActivationScaleBalancerFunction(torch.autograd.Function):
    method forward (line 155) | def forward(
    method backward (line 170) | def backward(ctx, x_grad: Tensor) -> Tuple[Tensor, None, None, None]:
  class RandomClampFunction (line 186) | class RandomClampFunction(torch.autograd.Function):
    method forward (line 188) | def forward(
    method backward (line 207) | def backward(
  function random_clamp (line 218) | def random_clamp(
  function random_cast_to_half (line 228) | def random_cast_to_half(x: Tensor, min_abs: float = 5.0e-06) -> Tensor:
  class RandomGradFunction (line 243) | class RandomGradFunction(torch.autograd.Function):
    method forward (line 250) | def forward(ctx, x: Tensor, min_abs: float) -> Tensor:
    method backward (line 255) | def backward(ctx, ans_grad: Tensor) -> Tuple[Tensor, None]:
  class RandomGrad (line 267) | class RandomGrad(torch.nn.Module):
    method __init__ (line 273) | def __init__(self, min_abs: float = 5.0e-06):
    method forward (line 277) | def forward(self, x: Tensor):
  class SoftmaxFunction (line 288) | class SoftmaxFunction(torch.autograd.Function):
    method forward (line 295) | def forward(ctx, x: Tensor, dim: int):
    method backward (line 308) | def backward(ctx, ans_grad: Tensor):
  function softmax (line 318) | def softmax(x: Tensor, dim: int):
  class MaxEigLimiterFunction (line 325) | class MaxEigLimiterFunction(torch.autograd.Function):
    method forward (line 327) | def forward(
    method backward (line 341) | def backward(ctx, x_grad, *args):
  class BasicNorm (line 366) | class BasicNorm(torch.nn.Module):
    method __init__ (line 396) | def __init__(
    method forward (line 415) | def forward(self, x: Tensor) -> Tensor:
  function ScaledLinear (line 432) | def ScaledLinear(*args, initial_scale: float = 1.0, **kwargs) -> nn.Linear:
  function ScaledConv1d (line 457) | def ScaledConv1d(
  function TransposeScaledConv1d (line 488) | def TransposeScaledConv1d(
  function ScaledConv1dTranspose (line 510) | def ScaledConv1dTranspose(
  function TransposeConv1d (line 532) | def TransposeConv1d(
  function Conv1dTranspose (line 544) | def Conv1dTranspose(
  class SRLinear (line 556) | class SRLinear(nn.Linear):
    method __init__ (line 561) | def __init__(self, in_features, out_features, bias=True, **kwargs):
    method get_sigma (line 571) | def get_sigma(self):
    method get_weight (line 581) | def get_weight(self):
    method forward (line 588) | def forward(self, x):
  class SRConv1d (line 592) | class SRConv1d(SRLinear):
    method __init__ (line 593) | def __init__(
    method forward (line 610) | def forward(self, x):
  function TransposeSRConv1d (line 620) | def TransposeSRConv1d(
  function SRConv1dTranspose (line 632) | def SRConv1dTranspose(
  class ActivationBalancer (line 644) | class ActivationBalancer(torch.nn.Module):
    method __init__ (line 684) | def __init__(
    method forward (line 715) | def forward(self, x: Tensor) -> Tensor:
  function penalize_abs_values_gt (line 769) | def penalize_abs_values_gt(x: Tensor, limit: float, penalty: float) -> T...
  function _diag (line 797) | def _diag(x: Tensor):  # like .diag(), but works for tensors with 3 dims.
  function _whitening_metric (line 808) | def _whitening_metric(x: Tensor, num_groups: int):
  class WhiteningPenaltyFunction (line 846) | class WhiteningPenaltyFunction(torch.autograd.Function):
    method forward (line 848) | def forward(
    method backward (line 862) | def backward(ctx, x_grad: Tensor):
  class Whiten (line 887) | class Whiten(nn.Module):
    method __init__ (line 888) | def __init__(
    method forward (line 929) | def forward(self, x: Tensor) -> Tensor:
  class WithLoss (line 970) | class WithLoss(torch.autograd.Function):
    method forward (line 972) | def forward(ctx, x: Tensor, y: Tensor):
    method backward (line 977) | def backward(ctx, ans_grad: Tensor):
  function with_loss (line 983) | def with_loss(x, y):
  function _no_op (line 990) | def _no_op(x: Tensor) -> Tensor:
  class Identity (line 999) | class Identity(torch.nn.Module):
    method __init__ (line 1000) | def __init__(self):
    method forward (line 1003) | def forward(self, x):
  class MaxEig (line 1007) | class MaxEig(torch.nn.Module):
    method __init__ (line 1028) | def __init__(
    method forward (line 1058) | def forward(self, x: Tensor) -> Tensor:
    method _set_direction (line 1116) | def _set_direction(self, direction: Tensor):
    method _find_direction_coeffs (line 1131) | def _find_direction_coeffs(
  class DoubleSwishFunction (line 1161) | class DoubleSwishFunction(torch.autograd.Function):
    method forward (line 1178) | def forward(ctx, x: Tensor) -> Tensor:
    method backward (line 1211) | def backward(ctx, y_grad: Tensor) -> Tensor:
  class DoubleSwish (line 1220) | class DoubleSwish(torch.nn.Module):
    method forward (line 1221) | def forward(self, x: Tensor) -> Tensor:
  function BalancedDoubleSwish (line 1230) | def BalancedDoubleSwish(
  function _test_max_eig (line 1245) | def _test_max_eig():
  function _test_whiten (line 1272) | def _test_whiten():
  function _test_activation_balancer_sign (line 1299) | def _test_activation_balancer_sign():
  function _test_activation_balancer_magnitude (line 1325) | def _test_activation_balancer_magnitude():
  function _test_basic_norm (line 1353) | def _test_basic_norm():
  function _test_double_swish_deriv (line 1370) | def _test_double_swish_deriv():
  function _test_softmax (line 1384) | def _test_softmax():

FILE: models/modules/transformer.py
  class LayerNorm (line 18) | class LayerNorm(nn.Module):
    method __init__ (line 24) | def __init__(
    method reset_parameters (line 53) | def reset_parameters(self) -> None:
    method forward (line 58) | def forward(self, input: Tensor, embedding: Any = None) -> Tensor:
    method extra_repr (line 77) | def extra_repr(self) -> str:
  class AdaptiveLayerNorm (line 84) | class AdaptiveLayerNorm(nn.Module):
    method __init__ (line 87) | def __init__(self, d_model, norm) -> None:
    method forward (line 94) | def forward(self, input: Tensor, embedding: Tensor = None) -> Tensor:
  class BasicNorm (line 112) | class BasicNorm(_BasicNorm):
    method __init__ (line 113) | def __init__(
    method forward (line 122) | def forward(self, input: Tensor, embedding: Any = None) -> Tensor:
  class BalancedBasicNorm (line 134) | class BalancedBasicNorm(nn.Module):
    method __init__ (line 135) | def __init__(
    method forward (line 152) | def forward(self, input: Tensor, embedding: Any = None) -> Tensor:
  class IdentityNorm (line 161) | class IdentityNorm(nn.Module):
    method __init__ (line 162) | def __init__(
    method forward (line 171) | def forward(self, input: Tensor, embedding: Any = None) -> Tensor:
  class TransformerEncoderLayer (line 179) | class TransformerEncoderLayer(nn.Module):
    method __init__ (line 182) | def __init__(
    method __setstate__ (line 261) | def __setstate__(self, state):
    method forward (line 266) | def forward(
    method _sa_block (line 346) | def _sa_block(
    method _sa_block_attn (line 366) | def _sa_block_attn(
    method _ff_block (line 386) | def _ff_block(self, x: Tensor) -> Tensor:
  class TransformerEncoder (line 391) | class TransformerEncoder(nn.Module):
    method __init__ (line 411) | def __init__(self, encoder_layer, num_layers, norm=None):
    method forward (line 417) | def forward(
  class TransformerDecoderLayer (line 491) | class TransformerDecoderLayer(nn.Module):
    method __init__ (line 494) | def __init__(
    method forward (line 587) | def forward(
    method _sa_block (line 646) | def _sa_block(
    method _mha_block (line 663) | def _mha_block(
    method _ff_block (line 681) | def _ff_block(self, x: Tensor) -> Tensor:
  function _get_clones (line 686) | def _get_clones(module, N):
  function _get_activation_fn (line 690) | def _get_activation_fn(activation: str) -> Callable[[Tensor], Tensor]:

FILE: models/modules/utils.py
  function make_pad_mask (line 5) | def make_pad_mask(lengths: torch.Tensor, max_len: int = 0) -> torch.Tensor:
  function generate_partial_autoregressive_mask (line 32) | def generate_partial_autoregressive_mask(sz, start, end):

FILE: models/voicecraft.py
  function top_k_top_p_filtering (line 26) | def top_k_top_p_filtering(
  function topk_sampling (line 71) | def topk_sampling(logits, top_k=10, top_p=1.0, temperature=1.0):
  class VoiceCraft (line 90) | class VoiceCraft(
    method __new__ (line 97) | def __new__(cls, args: Optional[Namespace] = None, config: Optional[Di...
    method __init__ (line 106) | def __init__(self, args: Optional[Namespace] = None, config: Optional[...
    method prepare_mask_intervals (line 198) | def prepare_mask_intervals(self, y_lens):
    method rearrange (line 239) | def rearrange(self, y, non_mask_intervals, mask_intervals):
    method shift (line 254) | def shift(self, rearranged_y):
    method insert_mask (line 264) | def insert_mask(self, shifted_y):
    method cat_y (line 290) | def cat_y(self, inserted_y, mask_position, y_lens):
    method embed_y (line 311) | def embed_y(self, cated_y, mask_position, mask_value):
    method prepare_input_target (line 322) | def prepare_input_target(self, y, y_lens):
    method remove_mask (line 376) | def remove_mask(self, logits, mask_position, new_y_lens):
    method revert_pattern (line 387) | def revert_pattern(self, patterns, logits_use):
    method dec_forward (line 406) | def dec_forward(
    method forward (line 472) | def forward(self, batch):
    method inference (line 561) | def inference(
    method inference_tts (line 908) | def inference_tts(
    method inference_tts_batch (line 1156) | def inference_tts_batch(

FILE: predict.py
  class ModelOutput (line 33) | class ModelOutput(BaseModel):
  class WhisperxAlignModel (line 38) | class WhisperxAlignModel:
    method __init__ (line 39) | def __init__(self):
    method align (line 46) | def align(self, segments, audio_path):
  class WhisperxModel (line 60) | class WhisperxModel:
    method __init__ (line 61) | def __init__(self, model_name, align_model: WhisperxAlignModel, device...
    method transcribe (line 77) | def transcribe(self, audio_path):
  function download_weights (line 84) | def download_weights(url, dest):
  class Predictor (line 92) | class Predictor(BasePredictor):
    method setup (line 93) | def setup(self):
    method predict (line 130) | def predict(
  function seed_everything (line 336) | def seed_everything(seed):
  function get_transcribe_state (line 346) | def get_transcribe_state(segments):
  function find_closest_cut_off_word (line 357) | def find_closest_cut_off_word(word_bounds, cut_off_sec):
  function get_mask_interval_from_word_bounds (line 372) | def get_mask_interval_from_word_bounds(word_bounds, word_span_ind, editT...

FILE: steps/optim.py
  class BatchedOptimizer (line 29) | class BatchedOptimizer(Optimizer):
    method __init__ (line 40) | def __init__(self, params, defaults):
    method batched_params (line 44) | def batched_params(self, param_group, group_params_names):
  class ScaledAdam (line 129) | class ScaledAdam(BatchedOptimizer):
    method __init__ (line 172) | def __init__(
    method __setstate__ (line 212) | def __setstate__(self, state):
    method step (line 216) | def step(self, closure=None):
    method _init_state (line 265) | def _init_state(self, group: dict, p: Tensor, state: dict):
    method _get_clipping_scale (line 316) | def _get_clipping_scale(
    method _show_gradient_dominating_parameter (line 414) | def _show_gradient_dominating_parameter(
    method _step_one_batch (line 479) | def _step_one_batch(
    method _size_update (line 531) | def _size_update(
    method _step (line 598) | def _step(self, group: dict, p: Tensor, state: dict):
    method _step_scalar (line 639) | def _step_scalar(self, group: dict, p: Tensor, state: dict):
  class LRScheduler (line 664) | class LRScheduler(object):
    method __init__ (line 670) | def __init__(self, optimizer: Optimizer, verbose: bool = False):
    method state_dict (line 687) | def state_dict(self):
    method load_state_dict (line 699) | def load_state_dict(self, state_dict):
    method get_last_lr (line 708) | def get_last_lr(self) -> List[float]:
    method get_lr (line 712) | def get_lr(self):
    method step_batch (line 718) | def step_batch(self, batch: Optional[int] = None) -> None:
    method step_epoch (line 730) | def step_epoch(self, epoch: Optional[int] = None):
    method _set_lrs (line 740) | def _set_lrs(self):
    method print_lr (line 750) | def print_lr(self, is_verbose, group, lr):
  class Eden (line 759) | class Eden(LRScheduler):
    method __init__ (line 781) | def __init__(
    method get_lr (line 794) | def get_lr(self):
  function _test_eden (line 810) | def _test_eden():
  class Eve (line 836) | class Eve(Optimizer):
    method __init__ (line 872) | def __init__(
    method __setstate__ (line 908) | def __setstate__(self, state):
    method step (line 912) | def step(self, closure=None):
  function ScaledLinear (line 987) | def ScaledLinear(*args, initial_scale: float = 1.0, **kwargs) -> nn.Linear:
  function _test_scaled_adam (line 1010) | def _test_scaled_adam(hidden_dim: int):

FILE: steps/trainer.py
  class Trainer (line 21) | class Trainer:
    method __init__ (line 23) | def __init__(self, args, world_size, rank):
    method train (line 55) | def train(self):
    method validate_and_save (line 198) | def validate_and_save(self):
    method validate (line 244) | def validate(self, valid_loader=None, hide_progress=True):
    method _setup_meters (line 295) | def _setup_meters(self):
    method _setup_progress (line 304) | def _setup_progress(self):
    method _save_progress (line 327) | def _save_progress(self):
    method _setup_dataloader (line 332) | def _setup_dataloader(self):
    method _setup_models (line 371) | def _setup_models(self):
    method _setup_optimizer (line 420) | def _setup_optimizer(self):
    method seed_everything (line 461) | def seed_everything(self, seed=1):

FILE: steps/trainer_utils.py
  class StatefulDistributedSampler (line 12) | class StatefulDistributedSampler(Sampler[int]):
    method __init__ (line 13) | def __init__(self, dataset, batch_size, num_replicas = None, rank = No...
    method __len__ (line 48) | def __len__(self):
    method set_epoch (line 51) | def set_epoch(self, epoch):
    method __iter__ (line 92) | def __iter__(self):
    method set_epoch_resume (line 96) | def set_epoch_resume(self, epoch, cur_step):
  class StatefulSampler (line 102) | class StatefulSampler(Sampler):
    method __init__ (line 103) | def __init__(self, data_source_length, batch_size, use_random=True, se...
    method __len__ (line 113) | def __len__(self):
    method __iter__ (line 116) | def __iter__(self):
    method set_epoch (line 121) | def set_epoch(self, epoch):
    method set_epoch_resume (line 136) | def set_epoch_resume(self, epoch, cur_step):
  class AverageMeter (line 142) | class AverageMeter:
    method __init__ (line 144) | def __init__(self):
    method reset (line 147) | def reset(self):
    method update (line 153) | def update(self, val, n=1):
  function print_model_info (line 159) | def print_model_info(model, print_model = False, print_params = True):
  class DistributedDynamicBatchSampler (line 175) | class DistributedDynamicBatchSampler(Sampler):
    method __init__ (line 283) | def __init__(
    method get_durations (line 404) | def get_durations(self, batch):
    method _get_boundaries_through_warping (line 408) | def _get_boundaries_through_warping(
    method _permute_batches (line 439) | def _permute_batches(self):
    method _generate_batches (line 467) | def _generate_batches(self):
    method __iter__ (line 593) | def __iter__(self):
    method set_epoch (line 605) | def set_epoch(self, epoch):
    method __len__ (line 622) | def __len__(self):
    method set_epoch_resume (line 625) | def set_epoch_resume(self, epoch, cur_step):

FILE: tts_demo.py
  function parse_arguments (line 23) | def parse_arguments():
  function find_closest_word_boundary (line 145) | def find_closest_word_boundary(alignments, cut_off_sec, margin, cutoff_t...
  function seed_everything (line 184) | def seed_everything(seed):

Condensed preview: 47 files, each showing path, character count, and a content snippet (1,282K chars in the full structured content).
[
  {
    "path": ".dockerignore",
    "chars": 315,
    "preview": "# The .dockerignore file excludes files from the container build process.\n#\n# https://docs.docker.com/engine/reference/b"
  },
  {
    "path": ".gitignore",
    "chars": 293,
    "preview": "__pycache__/\n*.py[cod]\n*$py.class\n*.egg-info\n.pytest_cache\n.ipynb_checkpoints\n\nthumbs.db\n.DS_Store\n.idea\n*.log\n*.pdf\n*.m"
  },
  {
    "path": "Dockerfile",
    "chars": 1523,
    "preview": "FROM jupyter/base-notebook:python-3.9.13\n\nUSER root\n\n# Install OS dependencies\nRUN apt-get update && apt-get install -y "
  },
  {
    "path": "LICENSE-CODE",
    "chars": 20845,
    "preview": "Attribution-NonCommercial-ShareAlike 4.0 International\n\n================================================================"
  },
  {
    "path": "LICENSE-MODEL",
    "chars": 3895,
    "preview": "Coqui Public Model License 1.0.0\nhttps://coqui.ai/cpml.txt\n\nThis license allows only non-commercial use of a machine lea"
  },
  {
    "path": "README.md",
    "chars": 12763,
    "preview": "# VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild\n[![Paper](https://img.shields.io/badge/arXiv-2403."
  },
  {
    "path": "RealEdit.txt",
    "chars": 97775,
    "preview": "wav_fn\torig_transcript\tnew_transcript\torig_masked_span\tnew_masked_span\ttype\nYOU1000000102_S0000137.wav\tif i had never dr"
  },
  {
    "path": "cog.yaml",
    "chars": 948,
    "preview": "# Configuration for Cog ⚙️\n# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md\n\nbuild:\n  gpu: true\n  sy"
  },
  {
    "path": "config.py",
    "chars": 9153,
    "preview": "import argparse\n\n\ndef MyParser():\n    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpForm"
  },
  {
    "path": "data/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/gigaspeech.py",
    "chars": 7366,
    "preview": "import os\nimport torch\nimport random\nimport copy\nimport logging\nimport shutil\n\nclass dataset(torch.utils.data.Dataset):\n"
  },
  {
    "path": "data/phonemize_encodec_encode_hf.py",
    "chars": 11977,
    "preview": "import argparse\ndef parse_args():\n    parser = argparse.ArgumentParser(description=\"encode the librilight dataset using "
  },
  {
    "path": "data/tokenizer.py",
    "chars": 5135,
    "preview": "# cp from https://github.com/lifeiteng/vall-e/blob/main/valle/data/tokenizer.py\n# Copyright    2023                     "
  },
  {
    "path": "demo/temp/84_121550_000074_000000.txt",
    "chars": 125,
    "preview": "But when I had approached so near to them The common object, which the sense deceives, Lost not by distance any of its m"
  },
  {
    "path": "demo/temp/mfa_alignments/5895_34622_000026_000002.csv",
    "chars": 2761,
    "preview": "Begin,End,Label,Type,Speaker\r\n0.04,0.58,gwynplaine,words,temp\r\n0.58,0.94,had,words,temp\r\n0.94,1.45,besides,words,temp\r\n1"
  },
  {
    "path": "demo/temp/mfa_alignments/84_121550_000074_000000.csv",
    "chars": 2823,
    "preview": "Begin,End,Label,Type,Speaker\r\n0.03,0.18,but,words,temp\r\n0.18,0.32,when,words,temp\r\n0.32,0.48,i,words,temp\r\n0.48,0.64,had"
  },
  {
    "path": "edit_utils.py",
    "chars": 2330,
    "preview": "def get_span(orig, new, editType):\n    orig_list = orig.split(\" \")\n    new_list = new.split(\" \")\n    \n    flag = False #"
  },
  {
    "path": "environment.yml",
    "chars": 12675,
    "preview": "name: voicecraft\nchannels:\n  - conda-forge\n  - defaults\ndependencies:\n  - _libgcc_mutex=0.1=conda_forge\n  - _openmp_mute"
  },
  {
    "path": "gradio_app.ipynb",
    "chars": 2286,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"9b6a0c92\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Only do th"
  },
  {
    "path": "gradio_app.py",
    "chars": 33638,
    "preview": "import os\nimport re\nfrom num2words import num2words\nimport gradio as gr\nimport torch\nimport torchaudio\nfrom data.tokeniz"
  },
  {
    "path": "gradio_requirements.txt",
    "chars": 126,
    "preview": "gradio==3.50.2\nnltk>=3.8.1\nopenai-whisper>=20231117\naeneas>=1.7.3.0\nwhisperx>=3.1.1\nhuggingface_hub==0.22.2\nnum2words==0"
  },
  {
    "path": "inference_speech_editing.ipynb",
    "chars": 647101,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n "
  },
  {
    "path": "inference_speech_editing_scale.py",
    "chars": 10434,
    "preview": "import argparse, pickle\nimport logging\nimport os, random\nimport numpy as np\nimport torch\nimport torchaudio\n\nfrom data.to"
  },
  {
    "path": "inference_tts.ipynb",
    "chars": 10017,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"VoiceCraft Inference Text To Speech"
  },
  {
    "path": "inference_tts_scale.py",
    "chars": 9407,
    "preview": "import argparse, pickle\nimport logging\nimport os, random\nimport numpy as np\nimport torch\nimport torchaudio\n\nfrom data.to"
  },
  {
    "path": "main.py",
    "chars": 1393,
    "preview": "from pathlib import Path\nimport torch\nimport pickle\nimport argparse\nimport logging\nimport torch.distributed as dist\nfrom"
  },
  {
    "path": "models/codebooks_patterns.py",
    "chars": 27578,
    "preview": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n# All rights reserved.\n#\n# This source code is licensed under the l"
  },
  {
    "path": "models/modules/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "models/modules/activation.py",
    "chars": 30772,
    "preview": "# cp from https://github.com/lifeiteng/vall-e/blob/main/valle/modules/activation.py, modified by Puyuan Peng, 2024\nfrom "
  },
  {
    "path": "models/modules/embedding.py",
    "chars": 3280,
    "preview": "# cp from https://github.com/lifeiteng/vall-e/blob/main/valle/modules/embedding.py\n# Copyright    2023                  "
  },
  {
    "path": "models/modules/sampling.py",
    "chars": 3102,
    "preview": "import torch\nimport torch.nn.functional as F\n\ndef top_k_top_p_filtering(\n    logits, top_k=0, top_p=1.0, filter_value=-f"
  },
  {
    "path": "models/modules/scaling.py",
    "chars": 49807,
    "preview": "# cp from https://github.com/lifeiteng/vall-e/blob/main/valle/modules/scaling.py\n# Copyright    2022  Xiaomi Corp.      "
  },
  {
    "path": "models/modules/transformer.py",
    "chars": 23543,
    "preview": "# cp from https://github.com/lifeiteng/vall-e/blob/main/valle/modules/transformer.py, modified by Puyuan Peng 2024\nimpor"
  },
  {
    "path": "models/modules/utils.py",
    "chars": 1341,
    "preview": "# cp from https://github.com/lifeiteng/vall-e/blob/main/valle/modules/transformer.py, modified by Puyuan Peng\nimport tor"
  },
  {
    "path": "models/voicecraft.py",
    "chars": 81820,
    "preview": "import random\n\nimport numpy as np\nimport logging\nimport argparse, copy\nfrom typing import Dict, Optional\nimport torch\nim"
  },
  {
    "path": "predict.py",
    "chars": 13820,
    "preview": "# Prediction interface for Cog ⚙️\n# https://github.com/replicate/cog/blob/main/docs/python.md\n\nimport os\nimport time\nimp"
  },
  {
    "path": "pretrained_models/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "start-jupyter.bat",
    "chars": 792,
    "preview": "@echo off\n\necho Creating and running the Jupyter container...\n\ndocker run -it -d ^\n    --gpus all ^\n    -p 8888:8888 ^\n "
  },
  {
    "path": "start-jupyter.sh",
    "chars": 803,
    "preview": "#!/usr/bin/env bash\n## Assumes you have docker installed with nvidia container container-toolkit\n# https://docs.nvidia.c"
  },
  {
    "path": "steps/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "steps/optim.py",
    "chars": 43866,
    "preview": "# Copyright      2022  Xiaomi Corp.        (authors: Daniel Povey)\n#\n# See ../LICENSE for clarification regarding multip"
  },
  {
    "path": "steps/trainer.py",
    "chars": 24523,
    "preview": "import time\nimport os, random\nimport torch\nimport math, pickle\nfrom tqdm import tqdm\nfrom torch.optim import AdamW\nfrom "
  },
  {
    "path": "steps/trainer_utils.py",
    "chars": 27921,
    "preview": "\nimport torch\nimport math\nimport torch.distributed as dist\nfrom torch.utils.data.sampler import Sampler\nimport copy\nimpo"
  },
  {
    "path": "tts_demo.py",
    "chars": 10769,
    "preview": "\"\"\"\nThis script will allow you to run TTS inference with Voicecraft\nBefore getting started, be sure to follow the enviro"
  },
  {
    "path": "voicecraft-gradio-colab.ipynb",
    "chars": 4840,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"colab_type\": \"text\",\n        \"id\": \"vie"
  },
  {
    "path": "z_scripts/e830M.sh",
    "chars": 1875,
    "preview": "#!/bin/bash\nsource ~/miniconda3/etc/profile.d/conda.sh\nconda activate voicecraft\nexport CUDA_VISIBLE_DEVICES=0,1,2,3\nexp"
  },
  {
    "path": "z_scripts/e830M_ft.sh",
    "chars": 1844,
    "preview": "#!/bin/bash\nsource ~/miniconda3/etc/profile.d/conda.sh\nconda activate voicecraft\nexport CUDA_VISIBLE_DEVICES=0,1,2,3\nexp"
  }
]
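
The preview array above is plain JSON, so it can be summarized with the standard library alone. The following is a minimal sketch rather than part of the repository: the helper name `summarize_preview` and the inline `SAMPLE` string are assumptions made for illustration, and the `preview` strings in the sample are abbreviated. Only the `path` and `chars` keys and their values are taken from the entries listed above.

```python
import json

# A three-entry excerpt in the same shape as the preview array above.
# The "preview" values are shortened here for illustration.
SAMPLE = """
[
  {"path": ".dockerignore", "chars": 315, "preview": "# The .dockerignore file"},
  {"path": ".gitignore", "chars": 293, "preview": "__pycache__/"},
  {"path": "Dockerfile", "chars": 1523, "preview": "FROM jupyter/base-notebook:python-3.9.13"}
]
"""


def summarize_preview(raw, top_n=2):
    """Return the total character count and the top_n largest files.

    `raw` is a JSON string holding a list of objects with "path" and
    "chars" keys, as in the condensed preview.
    """
    entries = json.loads(raw)
    total = sum(entry["chars"] for entry in entries)
    # Sort by size, largest first, and keep only (path, chars) pairs.
    largest = sorted(entries, key=lambda entry: entry["chars"], reverse=True)[:top_n]
    return total, [(entry["path"], entry["chars"]) for entry in largest]


total_chars, largest_files = summarize_preview(SAMPLE)
print(total_chars)
print(largest_files)
```

Run against the full array, the same function would show which files dominate the extraction; in the listing above, `inference_speech_editing.ipynb` is by far the largest at 647101 characters.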

About this extraction

This page contains the full source code of the jasonppy/VoiceCraft GitHub repository, extracted and formatted as plain text. The extraction includes 47 files (1.2 MB), approximately 595.7k tokens, and a symbol index with 348 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
