Full Code of myshell-ai/MeloTTS for AI

Repository: myshell-ai/MeloTTS
Branch: main
Commit: 209145371cff
Files: 90
Total size: 15.4 MB

Directory structure:
gitextract_toln448z/

├── .github/
│   └── workflows/
│       └── pypi.yml
├── .gitignore
├── Dockerfile
├── LICENSE
├── README.md
├── docs/
│   ├── install.md
│   ├── quick_use.md
│   └── training.md
├── melo/
│   ├── __init__.py
│   ├── api.py
│   ├── app.py
│   ├── attentions.py
│   ├── commons.py
│   ├── configs/
│   │   └── config.json
│   ├── data/
│   │   └── example/
│   │       └── metadata.list
│   ├── data_utils.py
│   ├── download_utils.py
│   ├── infer.py
│   ├── init_downloads.py
│   ├── losses.py
│   ├── main.py
│   ├── mel_processing.py
│   ├── models.py
│   ├── modules.py
│   ├── monotonic_align/
│   │   ├── __init__.py
│   │   └── core.py
│   ├── preprocess_text.py
│   ├── split_utils.py
│   ├── text/
│   │   ├── __init__.py
│   │   ├── chinese.py
│   │   ├── chinese_bert.py
│   │   ├── chinese_mix.py
│   │   ├── cleaner.py
│   │   ├── cleaner_multiling.py
│   │   ├── cmudict.rep
│   │   ├── cmudict_cache.pickle
│   │   ├── english.py
│   │   ├── english_bert.py
│   │   ├── english_utils/
│   │   │   ├── __init__.py
│   │   │   ├── abbreviations.py
│   │   │   ├── number_norm.py
│   │   │   └── time_norm.py
│   │   ├── es_phonemizer/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── cleaner.py
│   │   │   ├── es_symbols.json
│   │   │   ├── es_symbols.txt
│   │   │   ├── es_symbols_v2.json
│   │   │   ├── es_to_ipa.py
│   │   │   ├── example_ipa.txt
│   │   │   ├── gruut_wrapper.py
│   │   │   ├── punctuation.py
│   │   │   ├── spanish_symbols.txt
│   │   │   └── test.ipynb
│   │   ├── fr_phonemizer/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── cleaner.py
│   │   │   ├── en_symbols.json
│   │   │   ├── example_ipa.txt
│   │   │   ├── fr_symbols.json
│   │   │   ├── fr_to_ipa.py
│   │   │   ├── french_abbreviations.py
│   │   │   ├── french_symbols.txt
│   │   │   ├── gruut_wrapper.py
│   │   │   └── punctuation.py
│   │   ├── french.py
│   │   ├── french_bert.py
│   │   ├── japanese.py
│   │   ├── japanese_bert.py
│   │   ├── ko_dictionary.py
│   │   ├── korean.py
│   │   ├── opencpop-strict.txt
│   │   ├── spanish.py
│   │   ├── spanish_bert.py
│   │   ├── symbols.py
│   │   └── tone_sandhi.py
│   ├── train.py
│   ├── train.sh
│   ├── transforms.py
│   └── utils.py
├── requirements.txt
├── setup.py
└── test/
    ├── basetts_test_resources/
    │   ├── en_egs_text.txt
    │   ├── es_egs_text.txt
    │   ├── fr_egs_text.txt
    │   ├── jp_egs_text.txt
    │   ├── kr_egs_text.txt
    │   └── zh_mix_en_egs_text.txt
    ├── test_base_model_tts_package.py
    └── test_base_model_tts_package_from_S3.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/pypi.yml
================================================
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries

# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

name: Upload Python Package

on:
  release:
    types: [published]

permissions:
  contents: read

jobs:
  deploy:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v3
      with:
        python-version: '3.x'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        python -m ensurepip --upgrade
        pip install build
    - name: Build package
      run: python -m build
    - name: Publish package
      uses: pypa/gh-action-pypi-publish@release/v1.8
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}


================================================
FILE: .gitignore
================================================
__pycache__/
.ipynb_checkpoints/
basetts_outputs_use_bert/
basetts_outputs/
multilingual_ckpts
basetts_outputs_package/
build/
*.egg-info/

*.zip
*.wav

================================================
FILE: Dockerfile
================================================
FROM python:3.9-slim
WORKDIR /app
COPY . /app

RUN apt-get update && apt-get install -y \
    build-essential libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

RUN pip install -e .
RUN python -m unidic download
RUN python melo/init_downloads.py

CMD ["python", "./melo/app.py", "--host", "0.0.0.0", "--port", "8888"]

================================================
FILE: LICENSE
================================================
Copyright (c) 2024 MyShell.ai

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

================================================
FILE: README.md
================================================
<div align="center">
  <div>&nbsp;</div>
  <img src="logo.png" width="300"/> <br>
  <a href="https://trendshift.io/repositories/8133" target="_blank"><img src="https://trendshift.io/api/badge/repositories/8133" alt="myshell-ai%2FMeloTTS | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>

## Introduction
MeloTTS is a **high-quality multi-lingual** text-to-speech library by [MIT](https://www.mit.edu/) and [MyShell.ai](https://myshell.ai). Supported languages include:

| Language | Example |
| --- | --- |
| English (American)    | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-US/speed_1.0/sent_000.wav) |
| English (British)     | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-BR/speed_1.0/sent_000.wav) |
| English (Indian)      | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN_INDIA/speed_1.0/sent_000.wav) |
| English (Australian)  | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-AU/speed_1.0/sent_000.wav) |
| English (Default)     | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-Default/speed_1.0/sent_000.wav) |
| Spanish               | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/es/ES/speed_1.0/sent_000.wav) |
| French                | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/fr/FR/speed_1.0/sent_000.wav) |
| Chinese (mix EN)      | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/zh/ZH/speed_1.0/sent_008.wav) |
| Japanese              | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/jp/JP/speed_1.0/sent_000.wav) |
| Korean                | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/kr/KR/speed_1.0/sent_000.wav) |

Some other features include:
- The Chinese speaker supports `mixed Chinese and English`.
- Fast enough for `CPU real-time inference`.

## Usage
- [Use without Installation](docs/quick_use.md)
- [Install and Use Locally](docs/install.md)
- [Training on Custom Dataset](docs/training.md)

The Python API and model cards can be found in [this repo](https://github.com/myshell-ai/MeloTTS/blob/main/docs/install.md#python-api) or on [HuggingFace](https://huggingface.co/myshell-ai).

**Contributing**

If you find this work useful, please consider contributing to this repo.

- Many thanks to [@fakerybakery](https://github.com/fakerybakery) for adding the Web UI and CLI part.

## Authors

- [Wenliang Zhao](https://wl-zhao.github.io) at Tsinghua University
- [Xumin Yu](https://yuxumin.github.io) at Tsinghua University
- [Zengyi Qin](https://www.qinzy.tech) (project lead) at MIT and MyShell

**Citation**
```
@software{zhao2024melo,
  author={Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
  title = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
  url = {https://github.com/myshell-ai/MeloTTS},
  year = {2023}
}
```

## License

This library is under MIT License, which means it is free for both commercial and non-commercial use.

## Acknowledgements

This implementation is based on [TTS](https://github.com/coqui-ai/TTS), [VITS](https://github.com/jaywalnut310/vits), [VITS2](https://github.com/daniilrobnikov/vits2) and [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2). We appreciate their awesome work.


================================================
FILE: docs/install.md
================================================
## Install and Use Locally

### Table of Contents
- [Linux and macOS Install](#linux-and-macos-install)
- [Docker Install for Windows and macOS](#docker-install)
- [Usage](#usage)
  - [Web UI](#webui)
  - [CLI](#cli)
  - [Python API](#python-api)

### Linux and macOS Install
The repo is developed and tested on `Ubuntu 20.04` and `Python 3.9`.
```bash
git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
pip install -e .
python -m unidic download
```
If you encounter issues with the macOS install, try the [Docker Install](#docker-install).

### Docker Install
To avoid compatibility issues, we suggest that Windows users and some macOS users run MeloTTS via Docker. Ensure that [you have Docker installed](https://docs.docker.com/engine/install/).

**Build Docker**

This could take a few minutes.
```bash
git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
docker build -t melotts . 
```

**Run Docker**
```bash
docker run -it -p 8888:8888 melotts
```
If your local machine has a GPU, you can instead run:
```bash
docker run --gpus all -it -p 8888:8888 melotts
```
Then open [http://localhost:8888](http://localhost:8888) in your browser to use the app.

## Usage

### WebUI

The WebUI supports multiple languages and voices. First, follow the installation steps. Then, simply run:

```bash
melo-ui
# Or: python melo/app.py
```

### CLI

You may use the MeloTTS CLI to interact with MeloTTS. The CLI may be invoked using either `melotts` or `melo`. Here are some examples:

**Read English text:**

```bash
melo "Text to read" output.wav
```

**Specify a language:**

```bash
melo "Text to read" output.wav --language EN
```

**Specify a speaker:**

```bash
melo "Text to read" output.wav --language EN --speaker EN-US
melo "Text to read" output.wav --language EN --speaker EN-AU
```

The available speakers are: `EN-Default`, `EN-US`, `EN-BR`, `EN_INDIA`, and `EN-AU`.

**Specify a speed:**

```bash
melo "Text to read" output.wav --language EN --speaker EN-US --speed 1.5
melo "Text to read" output.wav --speed 1.5
```
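
As a rough illustration of what the speed setting does: in `melo/api.py` the value is passed to the model as `length_scale = 1 / speed`, and the 50 ms pause appended after each sentence is divided by the same factor. The sketch below reproduces only that arithmetic. The helper names are hypothetical, and the 44100 Hz sampling rate is an assumption for illustration (the real value comes from the model config).

```python
def length_scale(speed: float) -> float:
    """Duration multiplier handed to the model; mirrors `1. / speed` in melo/api.py."""
    return 1.0 / speed


def pause_samples(sampling_rate: int, speed: float) -> int:
    """Zero samples appended after each sentence; mirrors `int((sr * 0.05) / speed)`."""
    return int((sampling_rate * 0.05) / speed)


# Assumption for illustration only; read the real value from the model config.
SAMPLING_RATE = 44100

for speed in (0.5, 1.0, 1.5):
    print(speed, length_scale(speed), pause_samples(SAMPLING_RATE, speed))
```

A speed above 1.0 therefore shortens both the spoken segments and the pauses between them.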

**Use a different language:**

```bash
melo "text-to-speech 领域近年来发展迅速" zh.wav -l ZH
```

**Load from a file:**

```bash
melo file.txt out.wav --file
```

The full API documentation may be found using:

```bash
melo --help
```

### Python API

#### English with Multiple Accents

```python
from melo.api import TTS

# Speed is adjustable
speed = 1.0

# CPU is sufficient for real-time inference.
# You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
device = 'auto' # Will automatically use GPU if available

# English 
text = "Did you ever hear a folk tale about a giant turtle?"
model = TTS(language='EN', device=device)
speaker_ids = model.hps.data.spk2id

# American accent
output_path = 'en-us.wav'
model.tts_to_file(text, speaker_ids['EN-US'], output_path, speed=speed)

# British accent
output_path = 'en-br.wav'
model.tts_to_file(text, speaker_ids['EN-BR'], output_path, speed=speed)

# Indian accent
output_path = 'en-india.wav'
model.tts_to_file(text, speaker_ids['EN_INDIA'], output_path, speed=speed)

# Australian accent
output_path = 'en-au.wav'
model.tts_to_file(text, speaker_ids['EN-AU'], output_path, speed=speed)

# Default accent
output_path = 'en-default.wav'
model.tts_to_file(text, speaker_ids['EN-Default'], output_path, speed=speed)

```
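
The `device = 'auto'` setting above is resolved at the top of `TTS.__init__` in `melo/api.py`. The sketch below mirrors that precedence so it can be read in isolation. The function name and the two boolean parameters are hypothetical stand-ins for `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, which lets the example run without PyTorch.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Mirror the device selection performed in `TTS.__init__`."""
    device = requested
    if device == "auto":
        device = "cpu"
        if cuda_available:
            device = "cuda"
        if mps_available:
            device = "mps"  # checked last, so it wins when both flags are true
    if "cuda" in device and not cuda_available:
        # The library uses a bare `assert` here; raising is this sketch's choice.
        raise RuntimeError("CUDA was requested but is not available")
    return device


print(resolve_device("auto", cuda_available=False, mps_available=False))
print(resolve_device("auto", cuda_available=True, mps_available=False))
print(resolve_device("cuda:0", cuda_available=True, mps_available=False))
```

An explicit value such as `'cpu'` or `'cuda:0'` is passed through unchanged, apart from the CUDA availability check.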

#### Spanish
```python
from melo.api import TTS

# Speed is adjustable
speed = 1.0

# CPU is sufficient for real-time inference.
# You can also change to cuda:0
device = 'cpu'

text = "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante."
model = TTS(language='ES', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'es.wav'
model.tts_to_file(text, speaker_ids['ES'], output_path, speed=speed)
```

#### French

```python
from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante."
model = TTS(language='FR', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'fr.wav'
model.tts_to_file(text, speaker_ids['FR'], output_path, speed=speed)
```

#### Chinese

```python
from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。"
model = TTS(language='ZH', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'zh.wav'
model.tts_to_file(text, speaker_ids['ZH'], output_path, speed=speed)
```

#### Japanese

```python
from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "彼は毎朝ジョギングをして体を健康に保っています。"
model = TTS(language='JP', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'jp.wav'
model.tts_to_file(text, speaker_ids['JP'], output_path, speed=speed)
```

#### Korean

```python
from melo.api import TTS

# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0

text = "안녕하세요! 오늘은 날씨가 정말 좋네요."
model = TTS(language='KR', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'kr.wav'
model.tts_to_file(text, speaker_ids['KR'], output_path, speed=speed)
```


================================================
FILE: docs/quick_use.md
================================================
## Use MeloTTS without Installation

**Quick Demo**

- [Official live demo](https://app.myshell.ai/bot/UN77N3/1709094629) on MyShell.
- Hugging Face Space [live demo](https://huggingface.co/spaces/mrfakename/MeloTTS).

**Use on MyShell**

There are hundreds of TTS models on MyShell, many more than MeloTTS alone. For example:

English
- [gentle British male voice](https://app.myshell.ai/widget/nIfamm)
- [cheerful young female voice](https://app.myshell.ai/widget/AjIjqy)
- [sultry and robust male voice](https://app.myshell.ai/widget/zQJJN3)

Spanish
- [voz femenina adorable](https://app.myshell.ai/widget/buIZBf)
- [voz masculina joven](https://app.myshell.ai/widget/rayuiy)
- [voz de niña inmadura](https://app.myshell.ai/widget/mYFV3e)

French
- [voix adorable de fille](https://app.myshell.ai/widget/3IfEfy)
- [voix douce masculine](https://app.myshell.ai/widget/IRR3M3)
- [voix douce féminine](https://app.myshell.ai/widget/NRbaUj)

German
- [sanfte Männerstimme](https://app.myshell.ai/widget/JFnAn2)
- [sanfte Frauenstimme](https://app.myshell.ai/widget/MrU7Nb)
- [unreife Mädchenstimme](https://app.myshell.ai/widget/UFbYBj)

Portuguese
- [voz feminina nítida](https://app.myshell.ai/widget/VzMb6j)
- [voz de menino imaturo](https://app.myshell.ai/widget/nAzeei)
- [voz masculina sóbria](https://app.myshell.ai/widget/JZRNJz)

Russian
- [зрелый женский голос](https://app.myshell.ai/widget/6byMZ3)
- [зрелый мужской голос](https://app.myshell.ai/widget/NB7jmm)

Chinese
- [甜美女声](https://app.myshell.ai/widget/ymeUjm)
- [青年男声](https://app.myshell.ai/widget/NZnERb)

More can be found at the widget center of [MyShell.ai](https://app.myshell.ai/robot-workshop).


================================================
FILE: docs/training.md
================================================
## Training

Before training, please install MeloTTS in dev mode and go to the `melo` folder. 
```
pip install -e .
cd melo
```

### Data Preparation
To train a TTS model, we need to prepare the audio files and a metadata file. We recommend using 44100 Hz audio files. The metadata file should have the following format:

```
path/to/audio_001.wav |<speaker_name>|<language_code>|<text_001>
path/to/audio_002.wav |<speaker_name>|<language_code>|<text_002>
```
The transcribed text can be obtained with an ASR model (e.g., [whisper](https://github.com/openai/whisper)). An example metadata file can be found in `data/example/metadata.list`.
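
A minimal sketch of writing and checking such a file, assuming each line holds exactly four `|`-separated fields as in the format above. The helper names, paths, and speaker name are hypothetical. Rejecting `|` inside a field is this sketch's own precaution, not a documented rule of `preprocess_text.py`.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

Row = Tuple[str, str, str, str]  # (audio_path, speaker_name, language_code, text)


def format_row(row: Row) -> str:
    """Join the four fields with '|', rejecting values that would break the format."""
    for field in row:
        if "|" in field or "\n" in field:
            raise ValueError(f"field contains '|' or a newline: {field!r}")
    return "|".join(row)


def parse_row(line: str) -> Row:
    """Split one metadata line into its four fields, tolerating spaces around '|'."""
    parts = [part.strip() for part in line.rstrip("\n").split("|")]
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty fields, got: {line!r}")
    return (parts[0], parts[1], parts[2], parts[3])


def write_metadata(path: Path, rows: List[Row]) -> None:
    """Write one row per line, UTF-8 encoded."""
    path.write_text("".join(format_row(r) + "\n" for r in rows), encoding="utf-8")


# Hypothetical rows; replace with your own recordings and transcripts.
rows = [
    ("data/my_voice/wavs/001.wav", "my_speaker", "EN", "Hello there."),
    ("data/my_voice/wavs/002.wav", "my_speaker", "EN", "How are you today?"),
]

with tempfile.TemporaryDirectory() as tmp:
    metadata_path = Path(tmp) / "metadata.list"
    write_metadata(metadata_path, rows)
    lines = metadata_path.read_text(encoding="utf-8").splitlines()
    parsed = [parse_row(line) for line in lines]

print(parsed == rows)
```

Round-tripping the file through `parse_row` before preprocessing is a cheap way to catch rows with a missing or extra field.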

We can then run the preprocessing code:
```
python preprocess_text.py --metadata data/example/metadata.list 
```
A config file `data/example/config.json` will be generated. Feel free to edit the hyper-parameters in that config file (for example, decrease the batch size if you encounter CUDA out-of-memory errors).

### Training
The training can be launched by:
```
bash train.sh <path/to/config.json> <num_of_gpus>
```

We have found that on some machines training occasionally crashes due to an [issue](https://github.com/pytorch/pytorch/issues/2530) with gloo. Therefore, `train.sh` includes an auto-resume wrapper.
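
The general idea of an auto-resume wrapper can be sketched as a relaunch loop. This is not the contents of `train.sh`: the function below is a hypothetical, bounded variant written in Python, and it assumes the training command continues from its latest checkpoint when it is started again.

```python
import time
from typing import Callable


def run_with_auto_resume(
    launch: Callable[[], int],
    max_restarts: int = 3,
    wait_seconds: float = 0.0,
) -> int:
    """Call `launch` until it returns exit code 0 or the restart budget is spent.

    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        exit_code = launch()
        if exit_code == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(f"still failing after {restarts} restarts")
        restarts += 1
        time.sleep(wait_seconds)  # give crashed workers time to exit


# Hypothetical launcher that crashes twice before succeeding.
exit_codes = iter([1, 1, 0])
print(run_with_auto_resume(lambda: next(exit_codes)))
```

In practice `launch` would wrap the real training command, for example via `subprocess.run(...).returncode`.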

### Inference
Simply run:
```
python infer.py --text "<some text here>" -m /path/to/checkpoint/G_<iter>.pth -o <output_dir>
```



================================================
FILE: melo/__init__.py
================================================


================================================
FILE: melo/api.py
================================================
import os
import re
import json
import torch
import librosa
import soundfile
import torchaudio
import numpy as np
import torch.nn as nn
from tqdm import tqdm

from . import utils
from . import commons
from .models import SynthesizerTrn
from .split_utils import split_sentence
from .mel_processing import spectrogram_torch, spectrogram_torch_conv
from .download_utils import load_or_download_config, load_or_download_model

class TTS(nn.Module):
    def __init__(self, 
                language,
                device='auto',
                use_hf=True,
                config_path=None,
                ckpt_path=None):
        super().__init__()
        if device == 'auto':
            device = 'cpu'
            if torch.cuda.is_available(): device = 'cuda'
            if torch.backends.mps.is_available(): device = 'mps'
        if 'cuda' in device:
            assert torch.cuda.is_available()

        # Load hyperparameters from config_path, or download them when it is None
        hps = load_or_download_config(language, use_hf=use_hf, config_path=config_path)

        num_languages = hps.num_languages
        num_tones = hps.num_tones
        symbols = hps.symbols

        model = SynthesizerTrn(
            len(symbols),
            hps.data.filter_length // 2 + 1,
            hps.train.segment_size // hps.data.hop_length,
            n_speakers=hps.data.n_speakers,
            num_tones=num_tones,
            num_languages=num_languages,
            **hps.model,
        ).to(device)

        model.eval()
        self.model = model
        self.symbol_to_id = {s: i for i, s in enumerate(symbols)}
        self.hps = hps
        self.device = device
    
        # load state_dict
        checkpoint_dict = load_or_download_model(language, device, use_hf=use_hf, ckpt_path=ckpt_path)
        self.model.load_state_dict(checkpoint_dict['model'], strict=True)
        
        language = language.split('_')[0]
        self.language = 'ZH_MIX_EN' if language == 'ZH' else language # we support a ZH_MIX_EN model

    @staticmethod
    def audio_numpy_concat(segment_data_list, sr, speed=1.):
        audio_segments = []
        for segment_data in segment_data_list:
            audio_segments += segment_data.reshape(-1).tolist()
            audio_segments += [0] * int((sr * 0.05) / speed)
        audio_segments = np.array(audio_segments).astype(np.float32)
        return audio_segments

    @staticmethod
    def split_sentences_into_pieces(text, language, quiet=False):
        texts = split_sentence(text, language_str=language)
        if not quiet:
            print(" > Text split to sentences.")
            print('\n'.join(texts))
            print(" > ===========================")
        return texts

    def tts_to_file(self, text, speaker_id, output_path=None, sdp_ratio=0.2, noise_scale=0.6, noise_scale_w=0.8, speed=1.0, pbar=None, format=None, position=None, quiet=False,):
        language = self.language
        texts = self.split_sentences_into_pieces(text, language, quiet)
        audio_list = []
        if pbar:
            tx = pbar(texts)
        else:
            if position:
                tx = tqdm(texts, position=position)
            elif quiet:
                tx = texts
            else:
                tx = tqdm(texts)
        for t in tx:
            if language in ['EN', 'ZH_MIX_EN']:
                t = re.sub(r'([a-z])([A-Z])', r'\1 \2', t)
            device = self.device
            bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
            with torch.no_grad():
                x_tst = phones.to(device).unsqueeze(0)
                tones = tones.to(device).unsqueeze(0)
                lang_ids = lang_ids.to(device).unsqueeze(0)
                bert = bert.to(device).unsqueeze(0)
                ja_bert = ja_bert.to(device).unsqueeze(0)
                x_tst_lengths = torch.LongTensor([phones.size(0)]).to(device)
                del phones
                speakers = torch.LongTensor([speaker_id]).to(device)
                audio = self.model.infer(
                        x_tst,
                        x_tst_lengths,
                        speakers,
                        tones,
                        lang_ids,
                        bert,
                        ja_bert,
                        sdp_ratio=sdp_ratio,
                        noise_scale=noise_scale,
                        noise_scale_w=noise_scale_w,
                        length_scale=1. / speed,
                    )[0][0, 0].data.cpu().float().numpy()
                del x_tst, tones, lang_ids, bert, ja_bert, x_tst_lengths, speakers
            audio_list.append(audio)
        torch.cuda.empty_cache()
        audio = self.audio_numpy_concat(audio_list, sr=self.hps.data.sampling_rate, speed=speed)

        if output_path is None:
            return audio
        else:
            if format:
                soundfile.write(output_path, audio, self.hps.data.sampling_rate, format=format)
            else:
                soundfile.write(output_path, audio, self.hps.data.sampling_rate)


================================================
FILE: melo/app.py
================================================
# WebUI by mrfakename <X @realmrfakename / HF @mrfakename>
# Demo also available on HF Spaces: https://huggingface.co/spaces/mrfakename/MeloTTS
import gradio as gr
import os, torch, io
# os.system('python -m unidic download')
print("Make sure you've downloaded unidic (python -m unidic download) for this WebUI to work.")
from melo.api import TTS
speed = 1.0
import tempfile
import click
device = 'auto'
models = {
    'EN': TTS(language='EN', device=device),
    'ES': TTS(language='ES', device=device),
    'FR': TTS(language='FR', device=device),
    'ZH': TTS(language='ZH', device=device),
    'JP': TTS(language='JP', device=device),
    'KR': TTS(language='KR', device=device),
}
speaker_ids = models['EN'].hps.data.spk2id

default_text_dict = {
    'EN': 'The field of text-to-speech has seen rapid development recently.',
    'ES': 'El campo de la conversión de texto a voz ha experimentado un rápido desarrollo recientemente.',
    'FR': 'Le domaine de la synthèse vocale a connu un développement rapide récemment',
    'ZH': 'text-to-speech 领域近年来发展迅速',
    'JP': 'テキスト読み上げの分野は最近急速な発展を遂げています',
    'KR': '최근 텍스트 음성 변환 분야가 급속도로 발전하고 있습니다.',    
}
    
def synthesize(speaker, text, speed, language, progress=gr.Progress()):
    bio = io.BytesIO()
    models[language].tts_to_file(text, models[language].hps.data.spk2id[speaker], bio, speed=speed, pbar=progress.tqdm, format='wav')
    return bio.getvalue()
def load_speakers(language, text):
    if text in list(default_text_dict.values()):
        newtext = default_text_dict[language]
    else:
        newtext = text
    return gr.update(value=list(models[language].hps.data.spk2id.keys())[0], choices=list(models[language].hps.data.spk2id.keys())), newtext
with gr.Blocks() as demo:
    gr.Markdown('# MeloTTS WebUI\n\nA WebUI for MeloTTS.')
    with gr.Group():
        speaker = gr.Dropdown(speaker_ids.keys(), interactive=True, value='EN-US', label='Speaker')
        language = gr.Radio(['EN', 'ES', 'FR', 'ZH', 'JP', 'KR'], label='Language', value='EN')
        speed = gr.Slider(label='Speed', minimum=0.1, maximum=10.0, value=1.0, interactive=True, step=0.1)
        text = gr.Textbox(label="Text to speak", value=default_text_dict['EN'])
        language.input(load_speakers, inputs=[language, text], outputs=[speaker, text])
    btn = gr.Button('Synthesize', variant='primary')
    aud = gr.Audio(interactive=False)
    btn.click(synthesize, inputs=[speaker, text, speed, language], outputs=[aud])
    gr.Markdown('WebUI by [mrfakename](https://twitter.com/realmrfakename).')
@click.command()
@click.option('--share', '-s', is_flag=True, show_default=True, default=False, help="Expose a publicly-accessible shared Gradio link usable by anyone with the link. Only share the link with people you trust.")
@click.option('--host', '-h', default=None)
@click.option('--port', '-p', type=int, default=None)
def main(share, host, port):
    demo.queue(api_open=False).launch(show_api=False, share=share, server_name=host, server_port=port)

if __name__ == "__main__":
    main()


================================================
FILE: melo/attentions.py
================================================
import math
import torch
from torch import nn
from torch.nn import functional as F

from . import commons
import logging

logger = logging.getLogger(__name__)


class LayerNorm(nn.Module):
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.channels = channels
        self.eps = eps

        self.gamma = nn.Parameter(torch.ones(channels))
        self.beta = nn.Parameter(torch.zeros(channels))

    def forward(self, x):
        x = x.transpose(1, -1)
        x = F.layer_norm(x, (self.channels,), self.gamma, self.beta, self.eps)
        return x.transpose(1, -1)


@torch.jit.script
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
    n_channels_int = n_channels[0]
    in_act = input_a + input_b
    t_act = torch.tanh(in_act[:, :n_channels_int, :])
    s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
    acts = t_act * s_act
    return acts


class Encoder(nn.Module):
    def __init__(
        self,
        hidden_channels,
        filter_channels,
        n_heads,
        n_layers,
        kernel_size=1,
        p_dropout=0.0,
        window_size=4,
        isflow=True,
        **kwargs
    ):
        super().__init__()
        self.hidden_channels = hidden_channels
        self.filter_channels = filter_channels
        self.n_heads = n_heads
        self.n_layers = n_layers
        self.kernel_size = kernel_size
        self.p_dropout = p_dropout
        self.window_size = window_size

        self.cond_layer_idx = self.n_layers
        if "gin_channels" in kwargs:
            self.gin_channels = kwargs["gin_channels"]
            if self.gin_channels != 0:
                self.spk_emb_linear = nn.Linear(self.gin_channels, self.hidden_channels)
                self.cond_layer_idx = (
                    kwargs["cond_layer_idx"] if "cond_layer_idx" in kwargs else 2
                )
                assert (
                    self.cond_layer_idx < self.n_layers
                ), "cond_layer_idx should be less than n_layers"
        self.drop = nn.Dropout(p_dropout)
        self.attn_layers = nn.ModuleList()
        self.norm_layers_1 = nn.ModuleList()
        self.ffn_layers = nn.ModuleList()
        self.norm_layers_2 = nn.ModuleList()

        for i in range(self.n_layers):
            self.attn_layers.append(
                MultiHeadAttention(
                    hidden_channels,
                    hidden_channels,
                    n_heads,
                    p_dropout=p_dropout,
                    window_size=window_size,
                )
            )
            self.norm_layers_1.append(LayerNorm(hidden_channels))
            self.ffn_layers.append(
                FFN(
                    hidden_channels,
                    hidden_channels,
                    filter_channels,
                    kernel_size,
                    p_dropout=p_dropout,
                )
            )
            self.norm_layers_2.append(LayerNorm(hidden_channels))

    def forward(self, x, x_mask, g=None):
        attn_mask = x_mask.unsqueeze(2) * x_mask.unsqueeze(-1)
        x = x * x_mask
        for i in range(self.n_layers):
            if i == self.cond_layer_idx and g is not None:
                g = self.spk_emb_linear(g.transpose(1, 2))
                g = g.transpose(1, 2)
                x = x + g
                x = x * x_mask
            y = self.attn_layers[i](x, x, attn_mask)
            y = self.drop(y)
            x = self.norm_layers_1[i](x + y)

            y = self.ffn_layers[i](x, x_mask)
            y = self.drop(y)
            x = self.norm_layers_2[i](x + y)
        x = x * x_mask
        return x


class Decoder(nn.Module):
    def __init__(
        self,
        hidden_channels,
        filter_channels,
        n_heads,
        n_layers,
        kernel_size=1,
        p_dropout=0.0,
        proximal_bias=False,
        proximal_init=True,
        **kwargs
    ):
        super().__init__()
        self.hidden_channels = hidden_channels
        self.filter_channels = filter_channels
        self.n_heads = n_heads
        self.n_layers = n_layers
        self.kernel_size = kernel_size
        self.p_dropout = p_dropout
        self.proximal_bias = proximal_bias
        self.proximal_init = proximal_init

        self.drop = nn.Dropout(p_dropout)
        self.self_attn_layers = nn.ModuleList()
        self.norm_layers_0 = nn.ModuleList()
        self.encdec_attn_layers = nn.ModuleList()
        self.norm_layers_1 = nn.ModuleList()
        self.ffn_layers = nn.ModuleList()
        self.norm_layers_2 = nn.ModuleList()
        for i in range(self.n_layers):
            self.self_attn_layers.append(
                MultiHeadAttention(
                    hidden_channels,
                    hidden_channels,
                    n_heads,
                    p_dropout=p_dropout,
                    proximal_bias=proximal_bias,
                    proximal_init=proximal_init,
                )
            )
            self.norm_layers_0.append(LayerNorm(hidden_channels))
            self.encdec_attn_layers.append(
                MultiHeadAttention(
                    hidden_channels, hidden_channels, n_heads, p_dropout=p_dropout
                )
            )
            self.norm_layers_1.append(LayerNorm(hidden_channels))
            self.ffn_layers.append(
                FFN(
                    hidden_channels,
                    hidden_channels,
                    filter_channels,
                    kernel_size,
                    p_dropout=p_dropout,
                    causal=True,
                )
            )
            self.norm_layers_2.append(LayerNorm(hidden_channels))

    def forward(self, x, x_mask, h, h_mask):
        """
        x: decoder input
        h: encoder output
        """
        self_attn_mask = commons.subsequent_mask(x_mask.size(2)).to(
            device=x.device, dtype=x.dtype
        )
        encdec_attn_mask = h_mask.unsqueeze(2) * x_mask.unsqueeze(-1)
        x = x * x_mask
        for i in range(self.n_layers):
            y = self.self_attn_layers[i](x, x, self_attn_mask)
            y = self.drop(y)
            x = self.norm_layers_0[i](x + y)

            y = self.encdec_attn_layers[i](x, h, encdec_attn_mask)
            y = self.drop(y)
            x = self.norm_layers_1[i](x + y)

            y = self.ffn_layers[i](x, x_mask)
            y = self.drop(y)
            x = self.norm_layers_2[i](x + y)
        x = x * x_mask
        return x


class MultiHeadAttention(nn.Module):
    def __init__(
        self,
        channels,
        out_channels,
        n_heads,
        p_dropout=0.0,
        window_size=None,
        heads_share=True,
        block_length=None,
        proximal_bias=False,
        proximal_init=False,
    ):
        super().__init__()
        assert channels % n_heads == 0

        self.channels = channels
        self.out_channels = out_channels
        self.n_heads = n_heads
        self.p_dropout = p_dropout
        self.window_size = window_size
        self.heads_share = heads_share
        self.block_length = block_length
        self.proximal_bias = proximal_bias
        self.proximal_init = proximal_init
        self.attn = None

        self.k_channels = channels // n_heads
        self.conv_q = nn.Conv1d(channels, channels, 1)
        self.conv_k = nn.Conv1d(channels, channels, 1)
        self.conv_v = nn.Conv1d(channels, channels, 1)
        self.conv_o = nn.Conv1d(channels, out_channels, 1)
        self.drop = nn.Dropout(p_dropout)

        if window_size is not None:
            n_heads_rel = 1 if heads_share else n_heads
            rel_stddev = self.k_channels**-0.5
            self.emb_rel_k = nn.Parameter(
                torch.randn(n_heads_rel, window_size * 2 + 1, self.k_channels)
                * rel_stddev
            )
            self.emb_rel_v = nn.Parameter(
                torch.randn(n_heads_rel, window_size * 2 + 1, self.k_channels)
                * rel_stddev
            )

        nn.init.xavier_uniform_(self.conv_q.weight)
        nn.init.xavier_uniform_(self.conv_k.weight)
        nn.init.xavier_uniform_(self.conv_v.weight)
        if proximal_init:
            with torch.no_grad():
                self.conv_k.weight.copy_(self.conv_q.weight)
                self.conv_k.bias.copy_(self.conv_q.bias)

    def forward(self, x, c, attn_mask=None):
        q = self.conv_q(x)
        k = self.conv_k(c)
        v = self.conv_v(c)

        x, self.attn = self.attention(q, k, v, mask=attn_mask)

        x = self.conv_o(x)
        return x

    def attention(self, query, key, value, mask=None):
        # reshape [b, d, t] -> [b, n_h, t, d_k]
        b, d, t_s, t_t = (*key.size(), query.size(2))
        query = query.view(b, self.n_heads, self.k_channels, t_t).transpose(2, 3)
        key = key.view(b, self.n_heads, self.k_channels, t_s).transpose(2, 3)
        value = value.view(b, self.n_heads, self.k_channels, t_s).transpose(2, 3)

        scores = torch.matmul(query / math.sqrt(self.k_channels), key.transpose(-2, -1))
        if self.window_size is not None:
            assert (
                t_s == t_t
            ), "Relative attention is only available for self-attention."
            key_relative_embeddings = self._get_relative_embeddings(self.emb_rel_k, t_s)
            rel_logits = self._matmul_with_relative_keys(
                query / math.sqrt(self.k_channels), key_relative_embeddings
            )
            scores_local = self._relative_position_to_absolute_position(rel_logits)
            scores = scores + scores_local
        if self.proximal_bias:
            assert t_s == t_t, "Proximal bias is only available for self-attention."
            scores = scores + self._attention_bias_proximal(t_s).to(
                device=scores.device, dtype=scores.dtype
            )
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e4)
            if self.block_length is not None:
                assert (
                    t_s == t_t
                ), "Local attention is only available for self-attention."
                block_mask = (
                    torch.ones_like(scores)
                    .triu(-self.block_length)
                    .tril(self.block_length)
                )
                scores = scores.masked_fill(block_mask == 0, -1e4)
        p_attn = F.softmax(scores, dim=-1)  # [b, n_h, t_t, t_s]
        p_attn = self.drop(p_attn)
        output = torch.matmul(p_attn, value)
        if self.window_size is not None:
            relative_weights = self._absolute_position_to_relative_position(p_attn)
            value_relative_embeddings = self._get_relative_embeddings(
                self.emb_rel_v, t_s
            )
            output = output + self._matmul_with_relative_values(
                relative_weights, value_relative_embeddings
            )
        output = (
            output.transpose(2, 3).contiguous().view(b, d, t_t)
        )  # [b, n_h, t_t, d_k] -> [b, d, t_t]
        return output, p_attn

    def _matmul_with_relative_values(self, x, y):
        """
        x: [b, h, l, m]
        y: [h or 1, m, d]
        ret: [b, h, l, d]
        """
        ret = torch.matmul(x, y.unsqueeze(0))
        return ret

    def _matmul_with_relative_keys(self, x, y):
        """
        x: [b, h, l, d]
        y: [h or 1, m, d]
        ret: [b, h, l, m]
        """
        ret = torch.matmul(x, y.unsqueeze(0).transpose(-2, -1))
        return ret

    def _get_relative_embeddings(self, relative_embeddings, length):
        # relative_embeddings holds 2 * window_size + 1 positions.
        # Pad first before slice to avoid using cond ops.
        pad_length = max(length - (self.window_size + 1), 0)
        slice_start_position = max((self.window_size + 1) - length, 0)
        slice_end_position = slice_start_position + 2 * length - 1
        if pad_length > 0:
            padded_relative_embeddings = F.pad(
                relative_embeddings,
                commons.convert_pad_shape([[0, 0], [pad_length, pad_length], [0, 0]]),
            )
        else:
            padded_relative_embeddings = relative_embeddings
        used_relative_embeddings = padded_relative_embeddings[
            :, slice_start_position:slice_end_position
        ]
        return used_relative_embeddings

    def _relative_position_to_absolute_position(self, x):
        """
        x: [b, h, l, 2*l-1]
        ret: [b, h, l, l]
        """
        batch, heads, length, _ = x.size()
        # Concat columns of pad to shift from relative to absolute indexing.
        x = F.pad(x, commons.convert_pad_shape([[0, 0], [0, 0], [0, 0], [0, 1]]))

        # Pad the flattened tensor so it can be viewed as (len+1, 2*len-1).
        x_flat = x.view([batch, heads, length * 2 * length])
        x_flat = F.pad(
            x_flat, commons.convert_pad_shape([[0, 0], [0, 0], [0, length - 1]])
        )

        # Reshape and slice out the padded elements.
        x_final = x_flat.view([batch, heads, length + 1, 2 * length - 1])[
            :, :, :length, length - 1 :
        ]
        return x_final

    def _absolute_position_to_relative_position(self, x):
        """
        x: [b, h, l, l]
        ret: [b, h, l, 2*l-1]
        """
        batch, heads, length, _ = x.size()
        # pad along column
        x = F.pad(
            x, commons.convert_pad_shape([[0, 0], [0, 0], [0, 0], [0, length - 1]])
        )
        x_flat = x.view([batch, heads, length**2 + length * (length - 1)])
        # add 0's in the beginning that will skew the elements after reshape
        x_flat = F.pad(x_flat, commons.convert_pad_shape([[0, 0], [0, 0], [length, 0]]))
        x_final = x_flat.view([batch, heads, length, 2 * length])[:, :, :, 1:]
        return x_final

    def _attention_bias_proximal(self, length):
        """Bias for self-attention to encourage attention to close positions.
        Args:
          length: an integer scalar.
        Returns:
          a Tensor with shape [1, 1, length, length]
        """
        r = torch.arange(length, dtype=torch.float32)
        diff = torch.unsqueeze(r, 0) - torch.unsqueeze(r, 1)
        return torch.unsqueeze(torch.unsqueeze(-torch.log1p(torch.abs(diff)), 0), 0)
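

# Illustrative sketch, not part of the original module: a pure-Python view of
# the bias that `_attention_bias_proximal` adds to the attention scores.
# Entry [i][j] is -log(1 + |i - j|), so the diagonal gets 0 and more distant
# positions get increasingly negative values. The helper name below is
# hypothetical and exists only for this example.
def _proximal_bias_reference(length):
    """Return a length x length nested list holding -log1p(|i - j|)."""
    import math

    return [
        [-math.log1p(abs(i - j)) for j in range(length)] for i in range(length)
    ]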

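# Illustrative sketch, not part of the original module: what the pad-and-reshape
# trick in `_relative_position_to_absolute_position` computes, written as plain
# index arithmetic for a single head. Column k of the relative logits stands for
# the offset k - (length - 1), so the absolute score for query i and key j sits
# at column j - i + length - 1. The helper name is hypothetical.
def _relative_to_absolute_reference(rel_logits):
    """Map a [length][2 * length - 1] nested list to [length][length]."""
    length = len(rel_logits)
    return [
        [rel_logits[i][j - i + length - 1] for j in range(length)]
        for i in range(length)
    ]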

class FFN(nn.Module):
    def __init__(
        self,
        in_channels,
        out_channels,
        filter_channels,
        kernel_size,
        p_dropout=0.0,
        activation=None,
        causal=False,
    ):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.filter_channels = filter_channels
        self.kernel_size = kernel_size
        self.p_dropout = p_dropout
        self.activation = activation
        self.causal = causal

        if causal:
            self.padding = self._causal_padding
        else:
            self.padding = self._same_padding

        self.conv_1 = nn.Conv1d(in_channels, filter_channels, kernel_size)
        self.conv_2 = nn.Conv1d(filter_channels, out_channels, kernel_size)
        self.drop = nn.Dropout(p_dropout)

    def forward(self, x, x_mask):
        x = self.conv_1(self.padding(x * x_mask))
        if self.activation == "gelu":
            # Sigmoid approximation of GELU: x * sigmoid(1.702 * x).
            x = x * torch.sigmoid(1.702 * x)
        else:
            x = torch.relu(x)
        x = self.drop(x)
        x = self.conv_2(self.padding(x * x_mask))
        return x * x_mask

    def _causal_padding(self, x):
        if self.kernel_size == 1:
            return x
        pad_l = self.kernel_size - 1
        pad_r = 0
        padding = [[0, 0], [0, 0], [pad_l, pad_r]]
        x = F.pad(x, commons.convert_pad_shape(padding))
        return x

    def _same_padding(self, x):
        if self.kernel_size == 1:
            return x
        pad_l = (self.kernel_size - 1) // 2
        pad_r = self.kernel_size // 2
        padding = [[0, 0], [0, 0], [pad_l, pad_r]]
        x = F.pad(x, commons.convert_pad_shape(padding))
        return x
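

# Illustrative sketch, not part of the original module: the (left, right) pad
# widths that the two FFN padding modes apply along the time axis. Both modes
# add kernel_size - 1 frames in total, so an unpadded stride-1 convolution
# keeps the sequence length; causal mode puts all of it on the left so that no
# output frame depends on later input frames. The helper name is hypothetical.
def _ffn_pad_widths(kernel_size, causal):
    """Return (pad_left, pad_right) as used by _causal_padding/_same_padding."""
    if kernel_size == 1:
        return (0, 0)
    if causal:
        return (kernel_size - 1, 0)
    return ((kernel_size - 1) // 2, kernel_size // 2)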


================================================
FILE: melo/commons.py
================================================
import math
import torch
from torch.nn import functional as F


def init_weights(m, mean=0.0, std=0.01):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        m.weight.data.normal_(mean, std)


def get_padding(kernel_size, dilation=1):
    return int((kernel_size * dilation - dilation) / 2)
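

# Illustrative sketch, not part of the original module: why get_padding gives
# "same" length for odd kernels. It plugs the padding into the standard Conv1d
# output-length formula; for odd kernel sizes and stride 1 the output length
# equals the input length, while an even kernel comes out one frame short.
# The helper name is hypothetical.
def _conv1d_output_length(length, kernel_size, dilation=1, stride=1):
    """Output length of a Conv1d that uses get_padding-style symmetric padding."""
    padding = int((kernel_size * dilation - dilation) / 2)
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1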


def convert_pad_shape(pad_shape):
    layer = pad_shape[::-1]
    pad_shape = [item for sublist in layer for item in sublist]
    return pad_shape


def intersperse(lst, item):
    result = [item] * (len(lst) * 2 + 1)
    result[1::2] = lst
    return result


def kl_divergence(m_p, logs_p, m_q, logs_q):
    """KL(P||Q) between diagonal Gaussians; logs_* are log standard deviations."""
    kl = (logs_q - logs_p) - 0.5
    kl += (
        0.5 * (torch.exp(2.0 * logs_p) + ((m_p - m_q) ** 2)) * torch.exp(-2.0 * logs_q)
    )
    return kl
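

# Illustrative sketch, not part of the original module: the same closed form
# as kl_divergence, evaluated on plain floats so that it can be checked by
# hand. With sigma = exp(logs) it reads
#   log(sigma_q / sigma_p) - 1/2 + (sigma_p^2 + (m_p - m_q)^2) / (2 * sigma_q^2).
# The helper name is hypothetical.
def _kl_gaussian_scalar(m_p, logs_p, m_q, logs_q):
    """Scalar KL(P||Q) for two 1-D Gaussians given means and log std devs."""
    import math

    kl = (logs_q - logs_p) - 0.5
    kl += 0.5 * (math.exp(2.0 * logs_p) + (m_p - m_q) ** 2) * math.exp(-2.0 * logs_q)
    return kl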


def rand_gumbel(shape):
    """Sample from the Gumbel distribution, protect from overflows."""
    uniform_samples = torch.rand(shape) * 0.99998 + 0.00001
    return -torch.log(-torch.log(uniform_samples))


def rand_gumbel_like(x):
    g = rand_gumbel(x.size()).to(dtype=x.dtype, device=x.device)
    return g


def slice_segments(x, ids_str, segment_size=4):
    ret = torch.zeros_like(x[:, :, :segment_size])
    for i in range(x.size(0)):
        idx_str = ids_str[i]
        idx_end = idx_str + segment_size
        ret[i] = x[i, :, idx_str:idx_end]
    return ret


def rand_slice_segments(x, x_lengths=None, segment_size=4):
    b, d, t = x.size()
    if x_lengths is None:
        x_lengths = t
    ids_str_max = x_lengths - segment_size + 1
    ids_str = (torch.rand([b]).to(device=x.device) * ids_str_max).to(dtype=torch.long)
    ret = slice_segments(x, ids_str, segment_size)
    return ret, ids_str


def get_timing_signal_1d(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    position = torch.arange(length, dtype=torch.float)
    num_timescales = channels // 2
    # Guard against division by zero when num_timescales == 1 (channels < 4).
    log_timescale_increment = math.log(float(max_timescale) / float(min_timescale)) / (
        max(num_timescales - 1, 1)
    )
    inv_timescales = min_timescale * torch.exp(
        torch.arange(num_timescales, dtype=torch.float) * -log_timescale_increment
    )
    scaled_time = position.unsqueeze(0) * inv_timescales.unsqueeze(1)
    signal = torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], 0)
    signal = F.pad(signal, [0, 0, 0, channels % 2])
    signal = signal.view(1, channels, length)
    return signal


def add_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4):
    b, channels, length = x.size()
    signal = get_timing_signal_1d(length, channels, min_timescale, max_timescale)
    return x + signal.to(dtype=x.dtype, device=x.device)


def cat_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4, axis=1):
    b, channels, length = x.size()
    signal = get_timing_signal_1d(length, channels, min_timescale, max_timescale)
    return torch.cat([x, signal.to(dtype=x.dtype, device=x.device)], axis)


def subsequent_mask(length):
    mask = torch.tril(torch.ones(length, length)).unsqueeze(0).unsqueeze(0)
    return mask


@torch.jit.script
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
    n_channels_int = n_channels[0]
    in_act = input_a + input_b
    t_act = torch.tanh(in_act[:, :n_channels_int, :])
    s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
    acts = t_act * s_act
    return acts


def shift_1d(x):
    x = F.pad(x, convert_pad_shape([[0, 0], [0, 0], [1, 0]]))[:, :, :-1]
    return x


def sequence_mask(length, max_length=None):
    if max_length is None:
        max_length = length.max()
    x = torch.arange(max_length, dtype=length.dtype, device=length.device)
    return x.unsqueeze(0) < length.unsqueeze(1)


def generate_path(duration, mask):
    """
    duration: [b, 1, t_x]
    mask: [b, 1, t_y, t_x]
    """

    b, _, t_y, t_x = mask.shape
    cum_duration = torch.cumsum(duration, -1)

    cum_duration_flat = cum_duration.view(b * t_x)
    path = sequence_mask(cum_duration_flat, t_y).to(mask.dtype)
    path = path.view(b, t_x, t_y)
    path = path - F.pad(path, convert_pad_shape([[0, 0], [1, 0], [0, 0]]))[:, :-1]
    path = path.unsqueeze(1).transpose(2, 3) * mask
    return path
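

# Illustrative sketch, not part of the original module: the hard alignment that
# generate_path builds for one batch item, written with plain loops. Token x
# owns the output frames from the sum of earlier durations up to (but not
# including) that sum plus its own duration. The multiplication by `mask` is
# left out here, and the helper name is hypothetical.
def _durations_to_path_reference(durations, t_y):
    """Return a [t_y][t_x] nested list with a 1 where frame y belongs to token x."""
    path = [[0] * len(durations) for _ in range(t_y)]
    start = 0
    for x, d in enumerate(durations):
        for y in range(start, min(start + d, t_y)):
            path[y][x] = 1
        start += d
    return path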


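def clip_grad_value_(parameters, clip_value, norm_type=2):
    """Clamp gradients element-wise to [-clip_value, clip_value] in place.

    Returns the total gradient norm, accumulated from each parameter's norm
    before that parameter is clamped. If clip_value is None, only the norm is
    computed.
    """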
    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]
    parameters = list(filter(lambda p: p.grad is not None, parameters))
    norm_type = float(norm_type)
    if clip_value is not None:
        clip_value = float(clip_value)

    total_norm = 0
    for p in parameters:
        param_norm = p.grad.data.norm(norm_type)
        total_norm += param_norm.item() ** norm_type
        if clip_value is not None:
            p.grad.data.clamp_(min=-clip_value, max=clip_value)
    total_norm = total_norm ** (1.0 / norm_type)
    return total_norm


================================================
FILE: melo/configs/config.json
================================================
{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 52,
    "epochs": 10000,
    "learning_rate": 0.0003,
    "betas": [
      0.8,
      0.99
    ],
    "eps": 1e-09,
    "batch_size": 6,
    "fp16_run": false,
    "lr_decay": 0.999875,
    "segment_size": 16384,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0,
    "skip_optimizer": true
  },
  "data": {
    "training_files": "",
    "validation_files": "",
    "max_wav_value": 32768.0,
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "n_mel_channels": 128,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 256,
    "cleaned_text": true,
    "spk2id": {}
  },
  "model": {
    "use_spk_conditioned_encoder": true,
    "use_noise_scaled_mas": true,
    "use_mel_posterior_encoder": false,
    "use_duration_discriminator": true,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "n_layers_trans_flow": 3,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [
      3,
      7,
      11
    ],
    "resblock_dilation_sizes": [
      [
        1,
        3,
        5
      ],
      [
        1,
        3,
        5
      ],
      [
        1,
        3,
        5
      ]
    ],
    "upsample_rates": [
      8,
      8,
      2,
      2,
      2
    ],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [
      16,
      16,
      8,
      2,
      2
    ],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  }
}


================================================
FILE: melo/data/example/metadata.list
================================================
data/example/wavs/000.wav|EN-default|EN|Well, there are always new trends and styles emerging in the fashion world, but I think some of the biggest trends at the moment include sustainability and ethical fashion, streetwear and athleisure, and oversized and deconstructed silhouettes.
data/example/wavs/001.wav|EN-default|EN|Many designers and brands are focusing on creating more environmentally-friendly and socially responsible clothing, while others are incorporating elements of sportswear and casual wear into their collections.
data/example/wavs/002.wav|EN-default|EN|And there's a growing interest in looser, more relaxed shapes and unconventional materials and finishes.
data/example/wavs/003.wav|EN-default|EN|That's really insightful.
data/example/wavs/004.wav|EN-default|EN|What do you think are some of the benefits of following fashion trends?
data/example/wavs/005.wav|EN-default|EN|Well, I think one of the main benefits of following fashion trends is that it can be a way to express your creativity, personality, and individuality.
data/example/wavs/006.wav|EN-default|EN|Fashion can be a powerful tool for self-expression and can help you feel more confident and comfortable in your own skin.
data/example/wavs/007.wav|EN-default|EN|Additionally, staying up-to-date with fashion trends can help you develop your own sense of style and learn how to put together outfits that make you look and feel great.
data/example/wavs/008.wav|EN-default|EN|That's a great point.
data/example/wavs/009.wav|EN-default|EN|Do you think it's important to stay on top of the latest fashion trends, or is it more important to focus on timeless style?
data/example/wavs/010.wav|EN-default|EN|I think it's really up to each individual to decide what approach to fashion works best for them.
data/example/wavs/011.wav|EN-default|EN|Some people prefer to stick with classic, timeless styles that never go out of fashion, while others enjoy experimenting with new and innovative trends.
data/example/wavs/012.wav|EN-default|EN|Ultimately, fashion is about personal expression and there's no right or wrong way to approach it.
data/example/wavs/013.wav|EN-default|EN|The most important thing is to wear what makes you feel good and confident.
data/example/wavs/014.wav|EN-default|EN|I completely agree.
data/example/wavs/015.wav|EN-default|EN|Some popular ones that come to mind are oversized blazers, statement sleeves, printed maxi dresses, and chunky sneakers.
data/example/wavs/016.wav|EN-default|EN|It's been really interesting chatting with you about fashion.
data/example/wavs/017.wav|EN-default|EN|That's a good point.
data/example/wavs/018.wav|EN-default|EN|What do you think are some current fashion trends that are popular right now?
data/example/wavs/019.wav|EN-default|EN|There are so many trends happening right now, it's hard to keep track of them all!


================================================
FILE: melo/data_utils.py
================================================
import os
import random
import torch
import torch.utils.data
from tqdm import tqdm
from loguru import logger
import commons
from mel_processing import spectrogram_torch, mel_spectrogram_torch
from utils import load_filepaths_and_text
from utils import load_wav_to_torch_librosa as load_wav_to_torch
from text import cleaned_text_to_sequence, get_bert
import numpy as np

"""Multi speaker version"""


class TextAudioSpeakerLoader(torch.utils.data.Dataset):
    """
    1) loads audio, speaker_id, text pairs
    2) normalizes text and converts them to sequences of integers
    3) computes spectrograms from audio files.
    """

    def __init__(self, audiopaths_sid_text, hparams):
        self.audiopaths_sid_text = load_filepaths_and_text(audiopaths_sid_text)
        self.max_wav_value = hparams.max_wav_value
        self.sampling_rate = hparams.sampling_rate
        self.filter_length = hparams.filter_length
        self.hop_length = hparams.hop_length
        self.win_length = hparams.win_length
        self.spk_map = hparams.spk2id
        self.hparams = hparams
        self.disable_bert = getattr(hparams, "disable_bert", False)

        self.use_mel_spec_posterior = getattr(
            hparams, "use_mel_posterior_encoder", False
        )
        if self.use_mel_spec_posterior:
            self.n_mel_channels = getattr(hparams, "n_mel_channels", 80)

        self.cleaned_text = getattr(hparams, "cleaned_text", False)

        self.add_blank = hparams.add_blank
        self.min_text_len = getattr(hparams, "min_text_len", 1)
        self.max_text_len = getattr(hparams, "max_text_len", 300)

        random.seed(1234)
        random.shuffle(self.audiopaths_sid_text)
        self._filter()


    def _filter(self):
        """
        Filter text & store spec lengths
        """
        # Store spectrogram lengths for Bucketing
        # wav_length ~= file_size / (wav_channels * Bytes per dim) = file_size / (1 * 2)
        # spec_length = wav_length // hop_length

        audiopaths_sid_text_new = []
        lengths = []
        skipped = 0
        logger.info("Init dataset...")
        for item in tqdm(
            self.audiopaths_sid_text
        ):
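            try:
                _id, spk, language, text, phones, tone, word2ph = item
            except ValueError:
                # Malformed row: show it before re-raising so it can be fixed.
                print(item)
                raise
            audiopath = f"{_id}"
            # NOTE: `phones` is still the space-separated string here, so this
            # bounds its character count rather than the number of phones.
            if self.min_text_len <= len(phones) <= self.max_text_len: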
                phones = phones.split(" ")
                tone = [int(i) for i in tone.split(" ")]
                word2ph = [int(i) for i in word2ph.split(" ")]
                audiopaths_sid_text_new.append(
                    [audiopath, spk, language, text, phones, tone, word2ph]
                )
                lengths.append(os.path.getsize(audiopath) // (2 * self.hop_length))
            else:
                skipped += 1
        logger.info(f'min: {min(lengths)}; max: {max(lengths)}' )
        logger.info(
            "skipped: "
            + str(skipped)
            + ", total: "
            + str(len(self.audiopaths_sid_text))
        )
        self.audiopaths_sid_text = audiopaths_sid_text_new
        self.lengths = lengths

    def get_audio_text_speaker_pair(self, audiopath_sid_text):
        # separate filename, speaker_id and text
        audiopath, sid, language, text, phones, tone, word2ph = audiopath_sid_text

        bert, ja_bert, phones, tone, language = self.get_text(
            text, word2ph, phones, tone, language, audiopath
        )

        spec, wav = self.get_audio(audiopath)
        sid = int(getattr(self.spk_map, sid, '0'))
        sid = torch.LongTensor([sid])
        return (phones, spec, wav, sid, tone, language, bert, ja_bert)

    def get_audio(self, filename):
        audio_norm, sampling_rate = load_wav_to_torch(filename, self.sampling_rate)
        if sampling_rate != self.sampling_rate:
            raise ValueError(
                "{} {} SR doesn't match target {} SR".format(
                    filename, sampling_rate, self.sampling_rate
                )
            )
        # NOTE: normalization is already handled when the audio is loaded,
        # so the division below is not needed.
        # audio_norm = audio / self.max_wav_value
        audio_norm = audio_norm.unsqueeze(0)
        spec_filename = filename.replace(".wav", ".spec.pt")
        if self.use_mel_spec_posterior:
            spec_filename = spec_filename.replace(".spec.pt", ".mel.pt")
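        try:
            spec = torch.load(spec_filename)
            # The unconditional assert forces the except branch, so the cached
            # spectrogram is never reused: it is recomputed and re-saved below.
            assert False
        except Exception: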
            if self.use_mel_spec_posterior:
                spec = mel_spectrogram_torch(
                    audio_norm,
                    self.filter_length,
                    self.n_mel_channels,
                    self.sampling_rate,
                    self.hop_length,
                    self.win_length,
                    self.hparams.mel_fmin,
                    self.hparams.mel_fmax,
                    center=False,
                )
            else:
                spec = spectrogram_torch(
                    audio_norm,
                    self.filter_length,
                    self.sampling_rate,
                    self.hop_length,
                    self.win_length,
                    center=False,
                )
            spec = torch.squeeze(spec, 0)
            torch.save(spec, spec_filename)
        return spec, audio_norm

    def get_text(self, text, word2ph, phone, tone, language_str, wav_path):
        phone, tone, language = cleaned_text_to_sequence(phone, tone, language_str)
        if self.add_blank:
            phone = commons.intersperse(phone, 0)
            tone = commons.intersperse(tone, 0)
            language = commons.intersperse(language, 0)
            for i in range(len(word2ph)):
                word2ph[i] = word2ph[i] * 2
            word2ph[0] += 1
        bert_path = wav_path.replace(".wav", ".bert.pt")
        try:
            bert = torch.load(bert_path)
            assert bert.shape[-1] == len(phone)
        except Exception as e:
            # `bert` is undefined here if torch.load itself failed, so it must
            # not be referenced in the message.
            print(e, wav_path, bert_path, len(phone))
            bert = get_bert(text, word2ph, language_str)
            torch.save(bert, bert_path)
            assert bert.shape[-1] == len(phone), phone

        if self.disable_bert:
            bert = torch.zeros(1024, len(phone))
            ja_bert = torch.zeros(768, len(phone))
        else:
            if language_str in ["ZH"]:
                ja_bert = torch.zeros(768, len(phone))
            elif language_str in ["JP", "EN", "ZH_MIX_EN", "KR", 'SP', 'ES', 'FR', 'DE', 'RU']:
                ja_bert = bert
                bert = torch.zeros(1024, len(phone))
            else:
                raise ValueError(f"Unsupported language: {language_str}")
        assert bert.shape[-1] == len(phone)
        phone = torch.LongTensor(phone)
        tone = torch.LongTensor(tone)
        language = torch.LongTensor(language)
        return bert, ja_bert, phone, tone, language

    def get_sid(self, sid):
        sid = torch.LongTensor([int(sid)])
        return sid

    def __getitem__(self, index):
        return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])

    def __len__(self):
        return len(self.audiopaths_sid_text)
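

# Illustrative sketch, not part of the original module: how get_text keeps
# word2ph consistent when add_blank is enabled. Interspersing a blank between
# and around n phones yields 2 * n + 1 symbols, so every per-word phone count
# is doubled and the extra leading blank is credited to the first word.
# Unlike get_text, this hypothetical helper returns a new list instead of
# modifying its argument in place.
def _expand_word2ph_for_blanks(word2ph):
    """Return per-word symbol counts after blank tokens have been interspersed."""
    expanded = [n * 2 for n in word2ph]
    expanded[0] += 1
    return expanded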


class TextAudioSpeakerCollate:
    """Zero-pads model inputs and targets"""

    def __init__(self, return_ids=False):
        self.return_ids = return_ids

    def __call__(self, batch):
        """Collates a training batch from normalized text, audio and speaker identities
        PARAMS
        ------
        batch: list of (phones, spec, wav, sid, tone, language, bert, ja_bert) tuples
        """
        # Order items by spectrogram length (longest first); every sequence is
        # right zero-padded to the batch maximum below.
        _, ids_sorted_decreasing = torch.sort(
            torch.LongTensor([x[1].size(1) for x in batch]), dim=0, descending=True
        )

        max_text_len = max([len(x[0]) for x in batch])
        max_spec_len = max([x[1].size(1) for x in batch])
        max_wav_len = max([x[2].size(1) for x in batch])

        text_lengths = torch.LongTensor(len(batch))
        spec_lengths = torch.LongTensor(len(batch))
        wav_lengths = torch.LongTensor(len(batch))
        sid = torch.LongTensor(len(batch))

        text_padded = torch.LongTensor(len(batch), max_text_len)
        tone_padded = torch.LongTensor(len(batch), max_text_len)
        language_padded = torch.LongTensor(len(batch), max_text_len)
        bert_padded = torch.FloatTensor(len(batch), 1024, max_text_len)
        ja_bert_padded = torch.FloatTensor(len(batch), 768, max_text_len)

        spec_padded = torch.FloatTensor(len(batch), batch[0][1].size(0), max_spec_len)
        wav_padded = torch.FloatTensor(len(batch), 1, max_wav_len)
        text_padded.zero_()
        tone_padded.zero_()
        language_padded.zero_()
        spec_padded.zero_()
        wav_padded.zero_()
        bert_padded.zero_()
        ja_bert_padded.zero_()
        for i in range(len(ids_sorted_decreasing)):
            row = batch[ids_sorted_decreasing[i]]

            text = row[0]
            text_padded[i, : text.size(0)] = text
            text_lengths[i] = text.size(0)

            spec = row[1]
            spec_padded[i, :, : spec.size(1)] = spec
            spec_lengths[i] = spec.size(1)

            wav = row[2]
            wav_padded[i, :, : wav.size(1)] = wav
            wav_lengths[i] = wav.size(1)

            sid[i] = row[3]

            tone = row[4]
            tone_padded[i, : tone.size(0)] = tone

            language = row[5]
            language_padded[i, : language.size(0)] = language

            bert = row[6]
            bert_padded[i, :, : bert.size(1)] = bert

            ja_bert = row[7]
            ja_bert_padded[i, :, : ja_bert.size(1)] = ja_bert

        return (
            text_padded,
            text_lengths,
            spec_padded,
            spec_lengths,
            wav_padded,
            wav_lengths,
            sid,
            tone_padded,
            language_padded,
            bert_padded,
            ja_bert_padded,
        )
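

# Illustrative sketch, not part of the original module: the two pieces of
# arithmetic behind the bucket sampler defined below. The first helper assigns
# a length to bucket i when boundaries[i] < length <= boundaries[i + 1] and
# returns -1 otherwise; it uses the standard-library bisect module, which is an
# assumption made for this example and not necessarily how the sampler's own
# lookup is written. The second helper mirrors the padding computed in
# _create_buckets, which rounds each bucket up to a multiple of
# num_replicas * batch_size. Both helper names are hypothetical.
def _bucket_index_reference(boundaries, length):
    """Return the bucket index for `length`, or -1 if it is out of range."""
    import bisect

    idx = bisect.bisect_left(boundaries, length) - 1
    if 0 <= idx < len(boundaries) - 1:
        return idx
    return -1


def _padded_bucket_size(len_bucket, batch_size, num_replicas):
    """Return the bucket size after padding to a whole number of global batches."""
    total_batch_size = num_replicas * batch_size
    rem = (total_batch_size - (len_bucket % total_batch_size)) % total_batch_size
    return len_bucket + rem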


class DistributedBucketSampler(torch.utils.data.distributed.DistributedSampler):
    """
    Maintain similar input lengths in a batch.
    Length groups are specified by boundaries.
    Ex) boundaries = [b1, b2, b3] -> every batch is drawn entirely from either {x | b1 < length(x) <= b2} or {x | b2 < length(x) <= b3}.

    Samples whose length falls outside the boundaries are removed.
    Ex) boundaries = [b1, b2, b3] -> any x s.t. length(x) <= b1 or length(x) > b3 is discarded.
    """

    def __init__(
        self,
        dataset,
        batch_size,
        boundaries,
        num_replicas=None,
        rank=None,
        shuffle=True,
    ):
        super().__init__(dataset, num_replicas=num_replicas, rank=rank, shuffle=shuffle)
        self.lengths = dataset.lengths
        self.batch_size = batch_size
        self.boundaries = boundaries

        self.buckets, self.num_samples_per_bucket = self._create_buckets()
        self.total_size = sum(self.num_samples_per_bucket)
        self.num_samples = self.total_size // self.num_replicas
        print('buckets:', self.num_samples_per_bucket)

    def _create_buckets(self):
        buckets = [[] for _ in range(len(self.boundaries) - 1)]
        for i in range(len(self.lengths)):
            length = self.lengths[i]
            idx_bucket = self._bisect(length)
            if idx_bucket != -1:
                buckets[idx_bucket].append(i)

        try:
            for i in range(len(buckets) - 1, 0, -1):
                if len(buckets[i]) == 0:
                    buckets.pop(i)
                    self.boundaries.pop(i + 1)
            assert all(len(bucket) > 0 for bucket in buckets)
        # Reached when bucket 0 is empty: the loop above stops at index 1 and never checks it
        except Exception as e:
            print("Bucket warning ", e)
            for i in range(len(buckets) - 1, -1, -1):
                if len(buckets[i]) == 0:
                    buckets.pop(i)
                    self.boundaries.pop(i + 1)

        num_samples_per_bucket = []
        for i in range(len(buckets)):
            len_bucket = len(buckets[i])
            total_batch_size = self.num_replicas * self.batch_size
            rem = (
                total_batch_size - (len_bucket % total_batch_size)
            ) % total_batch_size
            num_samples_per_bucket.append(len_bucket + rem)
        return buckets, num_samples_per_bucket

    def __iter__(self):
        # deterministically shuffle based on epoch
        g = torch.Generator()
        g.manual_seed(self.epoch)

        indices = []
        if self.shuffle:
            for bucket in self.buckets:
                indices.append(torch.randperm(len(bucket), generator=g).tolist())
        else:
            for bucket in self.buckets:
                indices.append(list(range(len(bucket))))

        batches = []
        for i in range(len(self.buckets)):
            bucket = self.buckets[i]
            len_bucket = len(bucket)
            if len_bucket == 0:
                continue
            ids_bucket = indices[i]
            num_samples_bucket = self.num_samples_per_bucket[i]

            # add extra samples to make it evenly divisible
            rem = num_samples_bucket - len_bucket
            ids_bucket = (
                ids_bucket
                + ids_bucket * (rem // len_bucket)
                + ids_bucket[: (rem % len_bucket)]
            )

            # subsample
            ids_bucket = ids_bucket[self.rank :: self.num_replicas]

            # batching
            for j in range(len(ids_bucket) // self.batch_size):
                batch = [
                    bucket[idx]
                    for idx in ids_bucket[
                        j * self.batch_size : (j + 1) * self.batch_size
                    ]
                ]
                batches.append(batch)

        if self.shuffle:
            batch_ids = torch.randperm(len(batches), generator=g).tolist()
            batches = [batches[i] for i in batch_ids]
        self.batches = batches

        assert len(self.batches) * self.batch_size == self.num_samples
        return iter(self.batches)

    def _bisect(self, x, lo=0, hi=None):
        if hi is None:
            hi = len(self.boundaries) - 1

        if hi > lo:
            mid = (hi + lo) // 2
            if self.boundaries[mid] < x and x <= self.boundaries[mid + 1]:
                return mid
            elif x <= self.boundaries[mid]:
                return self._bisect(x, lo, mid)
            else:
                return self._bisect(x, mid + 1, hi)
        else:
            return -1

    def __len__(self):
        return self.num_samples // self.batch_size
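

# --- Illustrative sketch (not part of the original module) ---
# The bucket lookup and padding arithmetic used by DistributedBucketSampler
# can be reproduced with the standard library alone. The helper names below
# (_sketch_bucket_index, _sketch_padded_size) are hypothetical and exist only
# for illustration. bisect_left is swapped in for the recursive _bisect and
# is assumed to be equivalent when the boundaries are strictly increasing.
import bisect


def _sketch_bucket_index(boundaries, length):
    # Bucket i holds lengths with boundaries[i] < length <= boundaries[i + 1].
    i = bisect.bisect_left(boundaries, length) - 1
    # Lengths at or below the first boundary, or above the last, get -1.
    return i if 0 <= i < len(boundaries) - 1 else -1


def _sketch_padded_size(len_bucket, num_replicas, batch_size):
    # Round the bucket size up to a multiple of num_replicas * batch_size,
    # mirroring the `rem` computation in _create_buckets.
    total_batch_size = num_replicas * batch_size
    rem = (total_batch_size - (len_bucket % total_batch_size)) % total_batch_size
    return len_bucket + rem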


================================================
FILE: melo/download_utils.py
================================================
import torch
import os
from . import utils
from cached_path import cached_path
from huggingface_hub import hf_hub_download

DOWNLOAD_CKPT_URLS = {
    'EN': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/EN/checkpoint.pth',
    'EN_V2': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/EN_V2/checkpoint.pth',
    'FR': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/FR/checkpoint.pth',
    'JP': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/JP/checkpoint.pth',
    'ES': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/ES/checkpoint.pth',
    'ZH': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/ZH/checkpoint.pth',
    'KR': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/KR/checkpoint.pth',
}

DOWNLOAD_CONFIG_URLS = {
    'EN': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/EN/config.json',
    'EN_V2': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/EN_V2/config.json',
    'FR': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/FR/config.json',
    'JP': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/JP/config.json',
    'ES': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/ES/config.json',
    'ZH': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/ZH/config.json',
    'KR': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/KR/config.json',
}

PRETRAINED_MODELS = {
    'G.pth': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/pretrained/G.pth',
    'D.pth': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/pretrained/D.pth',
    'DUR.pth': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/pretrained/DUR.pth',
}

LANG_TO_HF_REPO_ID = {
    'EN': 'myshell-ai/MeloTTS-English',
    'EN_V2': 'myshell-ai/MeloTTS-English-v2',
    'EN_NEWEST': 'myshell-ai/MeloTTS-English-v3',
    'FR': 'myshell-ai/MeloTTS-French',
    'JP': 'myshell-ai/MeloTTS-Japanese',
    'ES': 'myshell-ai/MeloTTS-Spanish',
    'ZH': 'myshell-ai/MeloTTS-Chinese',
    'KR': 'myshell-ai/MeloTTS-Korean',
}

def load_or_download_config(locale, use_hf=True, config_path=None):
    if config_path is None:
        language = locale.split('-')[0].upper()
        if use_hf:
            assert language in LANG_TO_HF_REPO_ID
            config_path = hf_hub_download(repo_id=LANG_TO_HF_REPO_ID[language], filename="config.json")
        else:
            assert language in DOWNLOAD_CONFIG_URLS
            config_path = cached_path(DOWNLOAD_CONFIG_URLS[language])
    return utils.get_hparams_from_file(config_path)

def load_or_download_model(locale, device, use_hf=True, ckpt_path=None):
    if ckpt_path is None:
        language = locale.split('-')[0].upper()
        if use_hf:
            assert language in LANG_TO_HF_REPO_ID
            ckpt_path = hf_hub_download(repo_id=LANG_TO_HF_REPO_ID[language], filename="checkpoint.pth")
        else:
            assert language in DOWNLOAD_CKPT_URLS
            ckpt_path = cached_path(DOWNLOAD_CKPT_URLS[language])
    return torch.load(ckpt_path, map_location=device)

def load_pretrain_model():
    return [cached_path(url) for url in PRETRAINED_MODELS.values()]
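

# --- Illustrative sketch (not part of the original module) ---
# How the `locale` argument of the loaders above is reduced to a dictionary
# key: only the part before the first hyphen is kept, then upper-cased.
# The helper name _sketch_language_key is hypothetical.
def _sketch_language_key(locale):
    return locale.split('-')[0].upper()


# Under this rule 'en-US' and 'EN' both map to 'EN', while 'en_v2' maps to
# 'EN_V2' because the underscore is not treated as a separator.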


================================================
FILE: melo/infer.py
================================================
import os
import click
from melo.api import TTS

    
    
@click.command()
@click.option('--ckpt_path', '-m', type=str, default=None, help="Path to the checkpoint file")
@click.option('--text', '-t', type=str, default=None, help="Text to speak")
@click.option('--language', '-l', type=str, default="EN", help="Language of the model")
@click.option('--output_dir', '-o', type=str, default="outputs", help="Path to the output")
def main(ckpt_path, text, language, output_dir):
    if ckpt_path is None:
        raise ValueError("The ckpt_path must be specified (--ckpt_path / -m)")
    
    config_path = os.path.join(os.path.dirname(ckpt_path), 'config.json')
    model = TTS(language=language, config_path=config_path, ckpt_path=ckpt_path)
    
    for spk_name, spk_id in model.hps.data.spk2id.items():
        save_path = f'{output_dir}/{spk_name}/output.wav'
        os.makedirs(os.path.dirname(save_path), exist_ok=True)
        model.tts_to_file(text, spk_id, save_path)

if __name__ == "__main__":
    main()


================================================
FILE: melo/init_downloads.py
================================================


if __name__ == '__main__':

    from melo.api import TTS
    device = 'auto'
    models = {
        'EN': TTS(language='EN', device=device),
        'ES': TTS(language='ES', device=device),
        'FR': TTS(language='FR', device=device),
        'ZH': TTS(language='ZH', device=device),
        'JP': TTS(language='JP', device=device),
        'KR': TTS(language='KR', device=device),
    }

================================================
FILE: melo/losses.py
================================================
import torch


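def feature_loss(fmap_r, fmap_g):
    """L1 distance between discriminator feature maps of real (detached) and generated audio, scaled by 2."""
    loss = 0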
    for dr, dg in zip(fmap_r, fmap_g):
        for rl, gl in zip(dr, dg):
            rl = rl.float().detach()
            gl = gl.float()
            loss += torch.mean(torch.abs(rl - gl))

    return loss * 2


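def discriminator_loss(disc_real_outputs, disc_generated_outputs):
    """Least-squares GAN discriminator loss: real outputs are pushed toward 1, generated outputs toward 0."""
    loss = 0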
    r_losses = []
    g_losses = []
    for dr, dg in zip(disc_real_outputs, disc_generated_outputs):
        dr = dr.float()
        dg = dg.float()
        r_loss = torch.mean((1 - dr) ** 2)
        g_loss = torch.mean(dg**2)
        loss += r_loss + g_loss
        r_losses.append(r_loss.item())
        g_losses.append(g_loss.item())

    return loss, r_losses, g_losses


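def generator_loss(disc_outputs):
    """Least-squares GAN generator loss: discriminator outputs on generated audio are pushed toward 1."""
    loss = 0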
    gen_losses = []
    for dg in disc_outputs:
        dg = dg.float()
        l = torch.mean((1 - dg) ** 2)
        gen_losses.append(l)
        loss += l

    return loss, gen_losses


def kl_loss(z_p, logs_q, m_p, logs_p, z_mask):
    """
    z_p, logs_q: [b, h, t_t]
    m_p, logs_p: [b, h, t_t]
    """
    z_p = z_p.float()
    logs_q = logs_q.float()
    m_p = m_p.float()
    logs_p = logs_p.float()
    z_mask = z_mask.float()

    kl = logs_p - logs_q - 0.5
    kl += 0.5 * ((z_p - m_p) ** 2) * torch.exp(-2.0 * logs_p)
    kl = torch.sum(kl * z_mask)
    l = kl / torch.sum(z_mask)
    return l
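

# --- Illustrative sketch (not part of the original module) ---
# A scalar, standard-library check of the expression used in kl_loss. It
# assumes the simplest setting, where z is sampled directly from the
# posterior N(m_q, exp(logs_q)^2) with no flow in between. Under that
# assumption the per-sample term averages to the closed-form KL divergence
# between two Gaussians. All names below are hypothetical.
import math
import random


def _sketch_kl_term(z, logs_q, m_p, logs_p):
    # Same arithmetic as kl_loss, for one scalar element.
    return logs_p - logs_q - 0.5 + 0.5 * (z - m_p) ** 2 * math.exp(-2.0 * logs_p)


def _sketch_kl_closed_form(m_q, logs_q, m_p, logs_p):
    # KL(N(m_q, s_q^2) || N(m_p, s_p^2)) with s = exp(logs).
    var_q = math.exp(2.0 * logs_q)
    var_p = math.exp(2.0 * logs_p)
    return logs_p - logs_q + (var_q + (m_q - m_p) ** 2) / (2.0 * var_p) - 0.5


def _sketch_kl_monte_carlo(m_q, logs_q, m_p, logs_p, n=100000, seed=0):
    # Average the per-sample term over draws from the posterior.
    rng = random.Random(seed)
    s_q = math.exp(logs_q)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(m_q, s_q)
        total += _sketch_kl_term(z, logs_q, m_p, logs_p)
    return total / n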


================================================
FILE: melo/main.py
================================================
import click
import warnings
import os


@click.command
@click.argument('text')
@click.argument('output_path')
@click.option("--file", '-f', is_flag=True, show_default=True, default=False, help="Text is a file")
@click.option('--language', '-l', default='EN', help='Language, defaults to English', type=click.Choice(['EN', 'ES', 'FR', 'ZH', 'JP', 'KR'], case_sensitive=False))
@click.option('--speaker', '-spk', default='EN-Default', help='Speaker ID, only for English, leave empty for default, ignored if not English. If English, defaults to "EN-Default"', type=click.Choice(['EN-Default', 'EN-US', 'EN-BR', 'EN_INDIA', 'EN-AU']))
@click.option('--speed', '-s', default=1.0, help='Speed, defaults to 1.0', type=float)
@click.option('--device', '-d', default='auto', help='Device, defaults to auto')
def main(text, file, output_path, language, speaker, speed, device):
    if file:
        if not os.path.exists(text):
            raise FileNotFoundError('Trying to load text from file due to --file/-f flag, but file not found. Remove the --file/-f flag to pass a string.')
        else:
            with open(text) as f:
                text = f.read().strip()
    if text == '':
        raise ValueError('You entered empty text or the file you passed was empty.')
    language = language.upper()
    if language == '': language = 'EN'
    if speaker == '': speaker = None
    if (not language == 'EN') and speaker:
        warnings.warn('You specified a speaker but the language is not English; the speaker will be ignored.')
    from melo.api import TTS
    model = TTS(language=language, device=device)
    speaker_ids = model.hps.data.spk2id
    if language == 'EN':
        if not speaker: speaker = 'EN-Default'
        spkr = speaker_ids[speaker]
    else:
        spkr = speaker_ids[list(speaker_ids.keys())[0]]
    model.tts_to_file(text, spkr, output_path, speed=speed)


================================================
FILE: melo/mel_processing.py
================================================
import torch
import torch.utils.data
import librosa
from librosa.filters import mel as librosa_mel_fn

MAX_WAV_VALUE = 32768.0


def dynamic_range_compression_torch(x, C=1, clip_val=1e-5):
    """
    PARAMS
    ------
    C: compression factor
    """
    return torch.log(torch.clamp(x, min=clip_val) * C)


def dynamic_range_decompression_torch(x, C=1):
    """
    PARAMS
    ------
    C: compression factor used to compress
    """
    return torch.exp(x) / C


def spectral_normalize_torch(magnitudes):
    output = dynamic_range_compression_torch(magnitudes)
    return output


def spectral_de_normalize_torch(magnitudes):
    output = dynamic_range_decompression_torch(magnitudes)
    return output


mel_basis = {}
hann_window = {}


def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False):
    if torch.min(y) < -1.1:
        print("min value is ", torch.min(y))
    if torch.max(y) > 1.1:
        print("max value is ", torch.max(y))

    global hann_window
    dtype_device = str(y.dtype) + "_" + str(y.device)
    wnsize_dtype_device = str(win_size) + "_" + dtype_device
    if wnsize_dtype_device not in hann_window:
        hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(
            dtype=y.dtype, device=y.device
        )

    y = torch.nn.functional.pad(
        y.unsqueeze(1),
        (int((n_fft - hop_size) / 2), int((n_fft - hop_size) / 2)),
        mode="reflect",
    )
    y = y.squeeze(1)

    spec = torch.stft(
        y,
        n_fft,
        hop_length=hop_size,
        win_length=win_size,
        window=hann_window[wnsize_dtype_device],
        center=center,
        pad_mode="reflect",
        normalized=False,
        onesided=True,
        return_complex=False,
    )

    spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
    return spec


def spectrogram_torch_conv(y, n_fft, sampling_rate, hop_size, win_size, center=False):
    global hann_window
    dtype_device = str(y.dtype) + '_' + str(y.device)
    wnsize_dtype_device = str(win_size) + '_' + dtype_device
    if wnsize_dtype_device not in hann_window:
        hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(dtype=y.dtype, device=y.device)

    y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect')
    
    # ******************** original ************************#
    # y = y.squeeze(1)
    # spec1 = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[wnsize_dtype_device],
    #                   center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)

    # ******************** ConvSTFT ************************#
    freq_cutoff = n_fft // 2 + 1
    fourier_basis = torch.view_as_real(torch.fft.fft(torch.eye(n_fft)))
    forward_basis = fourier_basis[:freq_cutoff].permute(2, 0, 1).reshape(-1, 1, fourier_basis.shape[1])
    forward_basis = forward_basis * torch.as_tensor(librosa.util.pad_center(torch.hann_window(win_size), size=n_fft)).float()

    import torch.nn.functional as F

    # if center:
    #     signal = F.pad(y[:, None, None, :], (n_fft // 2, n_fft // 2, 0, 0), mode = 'reflect').squeeze(1)
    assert center is False

    forward_transform_squared = F.conv1d(y, forward_basis.to(y.device), stride = hop_size)
    spec2 = torch.stack([forward_transform_squared[:, :freq_cutoff, :], forward_transform_squared[:, freq_cutoff:, :]], dim = -1)


    # ******************** Verification ************************#
    spec1 = torch.stft(y.squeeze(1), n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[wnsize_dtype_device],
                      center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
    assert torch.allclose(spec1, spec2, atol=1e-4)

    spec = torch.sqrt(spec2.pow(2).sum(-1) + 1e-6)
    return spec


def spec_to_mel_torch(spec, n_fft, num_mels, sampling_rate, fmin, fmax):
    global mel_basis
    dtype_device = str(spec.dtype) + "_" + str(spec.device)
    fmax_dtype_device = str(fmax) + "_" + dtype_device
    if fmax_dtype_device not in mel_basis:
        mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
        mel_basis[fmax_dtype_device] = torch.from_numpy(mel).to(
            dtype=spec.dtype, device=spec.device
        )
    spec = torch.matmul(mel_basis[fmax_dtype_device], spec)
    spec = spectral_normalize_torch(spec)
    return spec


def mel_spectrogram_torch(
    y, n_fft, num_mels, sampling_rate, hop_size, win_size, fmin, fmax, center=False
):
    global mel_basis, hann_window
    dtype_device = str(y.dtype) + "_" + str(y.device)
    fmax_dtype_device = str(fmax) + "_" + dtype_device
    wnsize_dtype_device = str(win_size) + "_" + dtype_device
    if fmax_dtype_device not in mel_basis:
        mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
        mel_basis[fmax_dtype_device] = torch.from_numpy(mel).to(
            dtype=y.dtype, device=y.device
        )
    if wnsize_dtype_device not in hann_window:
        hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(
            dtype=y.dtype, device=y.device
        )

    y = torch.nn.functional.pad(
        y.unsqueeze(1),
        (int((n_fft - hop_size) / 2), int((n_fft - hop_size) / 2)),
        mode="reflect",
    )
    y = y.squeeze(1)

    spec = torch.stft(
        y,
        n_fft,
        hop_length=hop_size,
        win_length=win_size,
        window=hann_window[wnsize_dtype_device],
        center=center,
        pad_mode="reflect",
        normalized=False,
        onesided=True,
        return_complex=False,
    )

    spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)

    spec = torch.matmul(mel_basis[fmax_dtype_device], spec)
    spec = spectral_normalize_torch(spec)

    return spec
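

# --- Illustrative sketch (not part of the original module) ---
# Frame-count arithmetic for the reflect padding used above with center=False.
# Assuming n_fft - hop_size is even and the signal has at least hop_size
# samples, padding (n_fft - hop_size) / 2 on each side yields exactly
# num_samples // hop_size frames. The helper name is hypothetical.
def _sketch_num_frames(num_samples, n_fft, hop_size):
    pad = int((n_fft - hop_size) / 2)
    padded = num_samples + 2 * pad
    # A non-centered STFT produces 1 + (padded - n_fft) // hop_size frames.
    return 1 + (padded - n_fft) // hop_size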


================================================
FILE: melo/models.py
================================================
import math
import torch
from torch import nn
from torch.nn import functional as F

from melo import commons
from melo import modules
from melo import attentions

from torch.nn import Conv1d, ConvTranspose1d, Conv2d
from torch.nn.utils import weight_norm, remove_weight_norm, spectral_norm

from melo.commons import init_weights, get_padding
import melo.monotonic_align as monotonic_align


class DurationDiscriminator(nn.Module):  # vits2
    def __init__(
        self, in_channels, filter_channels, kernel_size, p_dropout, gin_channels=0
    ):
        super().__init__()
        self.in_channels = in_channels
        self.filter_channels = filter_channels
        self.kernel_size = kernel_size
        self.p_dropout = p_dropout
        self.gin_channels = gin_channels

        self.drop = nn.Dropout(p_dropout)
        self.conv_1 = nn.Conv1d(
            in_channels, filter_channels, kernel_size, padding=kernel_size // 2
        )
        self.norm_1 = modules.LayerNorm(filter_channels)
        self.conv_2 = nn.Conv1d(
            filter_channels, filter_channels, kernel_size, padding=kernel_size // 2
        )
        self.norm_2 = modules.LayerNorm(filter_channels)
        self.dur_proj = nn.Conv1d(1, filter_channels, 1)

        self.pre_out_conv_1 = nn.Conv1d(
            2 * filter_channels, filter_channels, kernel_size, padding=kernel_size // 2
        )
        self.pre_out_norm_1 = modules.LayerNorm(filter_channels)
        self.pre_out_conv_2 = nn.Conv1d(
            filter_channels, filter_channels, kernel_size, padding=kernel_size // 2
        )
        self.pre_out_norm_2 = modules.LayerNorm(filter_channels)

        if gin_channels != 0:
            self.cond = nn.Conv1d(gin_channels, in_channels, 1)

        self.output_layer = nn.Sequential(nn.Linear(filter_channels, 1), nn.Sigmoid())

    def forward_probability(self, x, x_mask, dur, g=None):
        dur = self.dur_proj(dur)
        x = torch.cat([x, dur], dim=1)
        x = self.pre_out_conv_1(x * x_mask)
        x = torch.relu(x)
        x = self.pre_out_norm_1(x)
        x = self.drop(x)
        x = self.pre_out_conv_2(x * x_mask)
        x = torch.relu(x)
        x = self.pre_out_norm_2(x)
        x = self.drop(x)
        x = x * x_mask
        x = x.transpose(1, 2)
        output_prob = self.output_layer(x)
        return output_prob

    def forward(self, x, x_mask, dur_r, dur_hat, g=None):
        x = torch.detach(x)
        if g is not None:
            g = torch.detach(g)
            x = x + self.cond(g)
        x = self.conv_1(x * x_mask)
        x = torch.relu(x)
        x = self.norm_1(x)
        x = self.drop(x)
        x = self.conv_2(x * x_mask)
        x = torch.relu(x)
        x = self.norm_2(x)
        x = self.drop(x)

        output_probs = []
        for dur in [dur_r, dur_hat]:
            output_prob = self.forward_probability(x, x_mask, dur, g)
            output_probs.append(output_prob)

        return output_probs


class TransformerCouplingBlock(nn.Module):
    def __init__(
        self,
        channels,
        hidden_channels,
        filter_channels,
        n_heads,
        n_layers,
        kernel_size,
        p_dropout,
        n_flows=4,
        gin_channels=0,
        share_parameter=False,
    ):
        super().__init__()
        self.channels = channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.n_layers = n_layers
        self.n_flows = n_flows
        self.gin_channels = gin_channels

        self.flows = nn.ModuleList()

        self.wn = (
            attentions.FFT(
                hidden_channels,
                filter_channels,
                n_heads,
                n_layers,
                kernel_size,
                p_dropout,
                isflow=True,
                gin_channels=self.gin_channels,
            )
            if share_parameter
            else None
        )

        for i in range(n_flows):
            self.flows.append(
                modules.TransformerCouplingLayer(
                    channels,
                    hidden_channels,
                    kernel_size,
                    n_layers,
                    n_heads,
                    p_dropout,
                    filter_channels,
                    mean_only=True,
                    wn_sharing_parameter=self.wn,
                    gin_channels=self.gin_channels,
                )
            )
            self.flows.append(modules.Flip())

    def forward(self, x, x_mask, g=None, reverse=False):
        if not reverse:
            for flow in self.flows:
                x, _ = flow(x, x_mask, g=g, reverse=reverse)
        else:
            for flow in reversed(self.flows):
                x = flow(x, x_mask, g=g, reverse=reverse)
        return x


class StochasticDurationPredictor(nn.Module):
    def __init__(
        self,
        in_channels,
        filter_channels,
        kernel_size,
        p_dropout,
        n_flows=4,
        gin_channels=0,
    ):
        super().__init__()
        filter_channels = in_channels  # this override should be removed in a future version.
        self.in_channels = in_channels
        self.filter_channels = filter_channels
        self.kernel_size = kernel_size
        self.p_dropout = p_dropout
        self.n_flows = n_flows
        self.gin_channels = gin_channels

        self.log_flow = modules.Log()
        self.flows = nn.ModuleList()
        self.flows.append(modules.ElementwiseAffine(2))
        for i in range(n_flows):
            self.flows.append(
                modules.ConvFlow(2, filter_channels, kernel_size, n_layers=3)
            )
            self.flows.append(modules.Flip())

        self.post_pre = nn.Conv1d(1, filter_channels, 1)
        self.post_proj = nn.Conv1d(filter_channels, filter_channels, 1)
        self.post_convs = modules.DDSConv(
            filter_channels, kernel_size, n_layers=3, p_dropout=p_dropout
        )
        self.post_flows = nn.ModuleList()
        self.post_flows.append(modules.ElementwiseAffine(2))
        for i in range(4):
            self.post_flows.append(
                modules.ConvFlow(2, filter_channels, kernel_size, n_layers=3)
            )
            self.post_flows.append(modules.Flip())

        self.pre = nn.Conv1d(in_channels, filter_channels, 1)
        self.proj = nn.Conv1d(filter_channels, filter_channels, 1)
        self.convs = modules.DDSConv(
            filter_channels, kernel_size, n_layers=3, p_dropout=p_dropout
        )
        if gin_channels != 0:
            self.cond = nn.Conv1d(gin_channels, filter_channels, 1)

    def forward(self, x, x_mask, w=None, g=None, reverse=False, noise_scale=1.0):
        x = torch.detach(x)
        x = self.pre(x)
        if g is not None:
            g = torch.detach(g)
            x = x + self.cond(g)
        x = self.convs(x, x_mask)
        x = self.proj(x) * x_mask

        if not reverse:
            flows = self.flows
            assert w is not None

            logdet_tot_q = 0
            h_w = self.post_pre(w)
            h_w = self.post_convs(h_w, x_mask)
            h_w = self.post_proj(h_w) * x_mask
            e_q = (
                torch.randn(w.size(0), 2, w.size(2)).to(device=x.device, dtype=x.dtype)
                * x_mask
            )
            z_q = e_q
            for flow in self.post_flows:
                z_q, logdet_q = flow(z_q, x_mask, g=(x + h_w))
                logdet_tot_q += logdet_q
            z_u, z1 = torch.split(z_q, [1, 1], 1)
            u = torch.sigmoid(z_u) * x_mask
            z0 = (w - u) * x_mask
            logdet_tot_q += torch.sum(
                (F.logsigmoid(z_u) + F.logsigmoid(-z_u)) * x_mask, [1, 2]
            )
            logq = (
                torch.sum(-0.5 * (math.log(2 * math.pi) + (e_q**2)) * x_mask, [1, 2])
                - logdet_tot_q
            )

            logdet_tot = 0
            z0, logdet = self.log_flow(z0, x_mask)
            logdet_tot += logdet
            z = torch.cat([z0, z1], 1)
            for flow in flows:
                z, logdet = flow(z, x_mask, g=x, reverse=reverse)
                logdet_tot = logdet_tot + logdet
            nll = (
                torch.sum(0.5 * (math.log(2 * math.pi) + (z**2)) * x_mask, [1, 2])
                - logdet_tot
            )
            return nll + logq  # [b]
        else:
            flows = list(reversed(self.flows))
            flows = flows[:-2] + [flows[-1]]  # remove a useless vflow
            z = (
                torch.randn(x.size(0), 2, x.size(2)).to(device=x.device, dtype=x.dtype)
                * noise_scale
            )
            for flow in flows:
                z = flow(z, x_mask, g=x, reverse=reverse)
            z0, z1 = torch.split(z, [1, 1], 1)
            logw = z0
            return logw


class DurationPredictor(nn.Module):
    def __init__(
        self, in_channels, filter_channels, kernel_size, p_dropout, gin_channels=0
    ):
        super().__init__()

        self.in_channels = in_channels
        self.filter_channels = filter_channels
        self.kernel_size = kernel_size
        self.p_dropout = p_dropout
        self.gin_channels = gin_channels

        self.drop = nn.Dropout(p_dropout)
        self.conv_1 = nn.Conv1d(
            in_channels, filter_channels, kernel_size, padding=kernel_size // 2
        )
        self.norm_1 = modules.LayerNorm(filter_channels)
        self.conv_2 = nn.Conv1d(
            filter_channels, filter_channels, kernel_size, padding=kernel_size // 2
        )
        self.norm_2 = modules.LayerNorm(filter_channels)
        self.proj = nn.Conv1d(filter_channels, 1, 1)

        if gin_channels != 0:
            self.cond = nn.Conv1d(gin_channels, in_channels, 1)

    def forward(self, x, x_mask, g=None):
        x = torch.detach(x)
        if g is not None:
            g = torch.detach(g)
            x = x + self.cond(g)
        x = self.conv_1(x * x_mask)
        x = torch.relu(x)
        x = self.norm_1(x)
        x = self.drop(x)
        x = self.conv_2(x * x_mask)
        x = torch.relu(x)
        x = self.norm_2(x)
        x = self.drop(x)
        x = self.proj(x * x_mask)
        return x * x_mask


class TextEncoder(nn.Module):
    def __init__(
        self,
        n_vocab,
        out_channels,
        hidden_channels,
        filter_channels,
        n_heads,
        n_layers,
        kernel_size,
        p_dropout,
        gin_channels=0,
        num_languages=None,
        num_tones=None,
    ):
        super().__init__()
        if num_languages is None:
            from melo.text import num_languages
        if num_tones is None:
            from melo.text import num_tones
        self.n_vocab = n_vocab
        self.out_channels = out_channels
        self.hidden_channels = hidden_channels
        self.filter_channels = filter_channels
        self.n_heads = n_heads
        self.n_layers = n_layers
        self.kernel_size = kernel_size
        self.p_dropout = p_dropout
        self.gin_channels = gin_channels
        self.emb = nn.Embedding(n_vocab, hidden_channels)
        nn.init.normal_(self.emb.weight, 0.0, hidden_channels**-0.5)
        self.tone_emb = nn.Embedding(num_tones, hidden_channels)
        nn.init.normal_(self.tone_emb.weight, 0.0, hidden_channels**-0.5)
        self.language_emb = nn.Embedding(num_languages, hidden_channels)
        nn.init.normal_(self.language_emb.weight, 0.0, hidden_channels**-0.5)
        self.bert_proj = nn.Conv1d(1024, hidden_channels, 1)
        self.ja_bert_proj = nn.Conv1d(768, hidden_channels, 1)

        self.encoder = attentions.Encoder(
            hidden_channels,
            filter_channels,
            n_heads,
            n_layers,
            kernel_size,
            p_dropout,
            gin_channels=self.gin_channels,
        )
        self.proj = nn.Conv1d(hidden_channels, out_channels * 2, 1)

    def forward(self, x, x_lengths, tone, language, bert, ja_bert, g=None):
        bert_emb = self.bert_proj(bert).transpose(1, 2)
        ja_bert_emb = self.ja_bert_proj(ja_bert).transpose(1, 2)
        x = (
            self.emb(x)
            + self.tone_emb(tone)
            + self.language_emb(language)
            + bert_emb
            + ja_bert_emb
        ) * math.sqrt(
            self.hidden_channels
        )  # [b, t, h]
        x = torch.transpose(x, 1, -1)  # [b, h, t]
        x_mask = torch.unsqueeze(commons.sequence_mask(x_lengths, x.size(2)), 1).to(
            x.dtype
        )

        x = self.encoder(x * x_mask, x_mask, g=g)
        stats = self.proj(x) * x_mask

        m, logs = torch.split(stats, self.out_channels, dim=1)
        return x, m, logs, x_mask


class ResidualCouplingBlock(nn.Module):
    def __init__(
        self,
        channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        n_layers,
        n_flows=4,
        gin_channels=0,
    ):
        super().__init__()
        self.channels = channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.n_layers = n_layers
        self.n_flows = n_flows
        self.gin_channels = gin_channels

        self.flows = nn.ModuleList()
        for i in range(n_flows):
            self.flows.append(
                modules.ResidualCouplingLayer(
                    channels,
                    hidden_channels,
                    kernel_size,
                    dilation_rate,
                    n_layers,
                    gin_channels=gin_channels,
                    mean_only=True,
                )
            )
            self.flows.append(modules.Flip())

    def forward(self, x, x_mask, g=None, reverse=False):
        if not reverse:
            for flow in self.flows:
                x, _ = flow(x, x_mask, g=g, reverse=reverse)
        else:
            for flow in reversed(self.flows):
                x = flow(x, x_mask, g=g, reverse=reverse)
        return x


class PosteriorEncoder(nn.Module):
    def __init__(
        self,
        in_channels,
        out_channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        n_layers,
        gin_channels=0,
    ):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.n_layers = n_layers
        self.gin_channels = gin_channels

        self.pre = nn.Conv1d(in_channels, hidden_channels, 1)
        self.enc = modules.WN(
            hidden_channels,
            kernel_size,
            dilation_rate,
            n_layers,
            gin_channels=gin_channels,
        )
        self.proj = nn.Conv1d(hidden_channels, out_channels * 2, 1)

    def forward(self, x, x_lengths, g=None, tau=1.0):
        x_mask = torch.unsqueeze(commons.sequence_mask(x_lengths, x.size(2)), 1).to(
            x.dtype
        )
        x = self.pre(x) * x_mask
        x = self.enc(x, x_mask, g=g)
        stats = self.proj(x) * x_mask
        m, logs = torch.split(stats, self.out_channels, dim=1)
        # Reparameterized sample from the posterior; tau scales the noise.
        z = (m + torch.randn_like(m) * tau * torch.exp(logs)) * x_mask
        return z, m, logs, x_mask


class Generator(torch.nn.Module):
    def __init__(
        self,
        initial_channel,
        resblock,
        resblock_kernel_sizes,
        resblock_dilation_sizes,
        upsample_rates,
        upsample_initial_channel,
        upsample_kernel_sizes,
        gin_channels=0,
    ):
        super(Generator, self).__init__()
        self.num_kernels = len(resblock_kernel_sizes)
        self.num_upsamples = len(upsample_rates)
        self.conv_pre = Conv1d(
            initial_channel, upsample_initial_channel, 7, 1, padding=3
        )
        resblock = modules.ResBlock1 if resblock == "1" else modules.ResBlock2

        self.ups = nn.ModuleList()
        for i, (u, k) in enumerate(zip(upsample_rates, upsample_kernel_sizes)):
            self.ups.append(
                weight_norm(
                    ConvTranspose1d(
                        upsample_initial_channel // (2**i),
                        upsample_initial_channel // (2 ** (i + 1)),
                        k,
                        u,
                        padding=(k - u) // 2,
                    )
                )
            )

        self.resblocks = nn.ModuleList()
        for i in range(len(self.ups)):
            ch = upsample_initial_channel // (2 ** (i + 1))
            for j, (k, d) in enumerate(
                zip(resblock_kernel_sizes, resblock_dilation_sizes)
            ):
                self.resblocks.append(resblock(ch, k, d))

        self.conv_post = Conv1d(ch, 1, 7, 1, padding=3, bias=False)
        self.ups.apply(init_weights)

        if gin_channels != 0:
            self.cond = nn.Conv1d(gin_channels, upsample_initial_channel, 1)

    def forward(self, x, g=None):
        x = self.conv_pre(x)
        if g is not None:
            x = x + self.cond(g)

        for i in range(self.num_upsamples):
            x = F.leaky_relu(x, modules.LRELU_SLOPE)
            x = self.ups[i](x)
            xs = None
            for j in range(self.num_kernels):
                if xs is None:
                    xs = self.resblocks[i * self.num_kernels + j](x)
                else:
                    xs += self.resblocks[i * self.num_kernels + j](x)
            x = xs / self.num_kernels
        x = F.leaky_relu(x)
        x = self.conv_post(x)
        x = torch.tanh(x)

        return x

    def remove_weight_norm(self):
        print("Removing weight norm...")
        for layer in self.ups:
            remove_weight_norm(layer)
        for layer in self.resblocks:
            layer.remove_weight_norm()
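

# Illustrative sketch, not part of the original module: how many waveform
# samples Generator produces per latent frame. It assumes each upsampling layer
# is a ConvTranspose1d with stride u, kernel k, padding (k - u) // 2,
# dilation 1 and no output padding, as constructed in Generator.__init__, and
# that the remaining layers preserve length. The helper name and the example
# rates/kernels below are hypothetical.
def _generator_output_length(n_frames, upsample_rates, upsample_kernel_sizes):
    length = n_frames
    for u, k in zip(upsample_rates, upsample_kernel_sizes):
        pad = (k - u) // 2
        # Standard transposed-convolution length formula for dilation 1.
        length = (length - 1) * u - 2 * pad + k
    return length


# Example (assumed config): rates (8, 8, 2, 2) with kernels (16, 16, 4, 4)
# turn 10 latent frames into 10 * 8 * 8 * 2 * 2 samples, because k - u is even
# for every layer and each layer therefore multiplies the length by exactly u.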


class DiscriminatorP(torch.nn.Module):
    def __init__(self, period, kernel_size=5, stride=3, use_spectral_norm=False):
        super(DiscriminatorP, self).__init__()
        self.period = period
        self.use_spectral_norm = use_spectral_norm
        norm_f = weight_norm if use_spectral_norm is False else spectral_norm
        self.convs = nn.ModuleList(
            [
                norm_f(
                    Conv2d(
                        1,
                        32,
                        (kernel_size, 1),
                        (stride, 1),
                        padding=(get_padding(kernel_size, 1), 0),
                    )
                ),
                norm_f(
                    Conv2d(
                        32,
                        128,
                        (kernel_size, 1),
                        (stride, 1),
                        padding=(get_padding(kernel_size, 1), 0),
                    )
                ),
                norm_f(
                    Conv2d(
                        128,
                        512,
                        (kernel_size, 1),
                        (stride, 1),
                        padding=(get_padding(kernel_size, 1), 0),
                    )
                ),
                norm_f(
                    Conv2d(
                        512,
                        1024,
                        (kernel_size, 1),
                        (stride, 1),
                        padding=(get_padding(kernel_size, 1), 0),
                    )
                ),
                norm_f(
                    Conv2d(
                        1024,
                        1024,
                        (kernel_size, 1),
                        1,
                        padding=(get_padding(kernel_size, 1), 0),
                    )
                ),
            ]
        )
        self.conv_post = norm_f(Conv2d(1024, 1, (3, 1), 1, padding=(1, 0)))

    def forward(self, x):
        fmap = []

        # 1d to 2d
        b, c, t = x.shape
        if t % self.period != 0:  # pad first
            n_pad = self.period - (t % self.period)
            x = F.pad(x, (0, n_pad), "reflect")
            t = t + n_pad
        x = x.view(b, c, t // self.period, self.period)

        for layer in self.convs:
            x = layer(x)
            x = F.leaky_relu(x, modules.LRELU_SLOPE)
            fmap.append(x)
        x = self.conv_post(x)
        fmap.append(x)
        x = torch.flatten(x, 1, -1)

        return x, fmap
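

# Illustrative sketch, not part of the original module: the padding arithmetic
# DiscriminatorP.forward uses before reshaping a length-t signal into a
# [t // period, period] grid. The helper name is hypothetical.
def _period_fold_shape(t, period):
    n_pad = 0
    if t % period != 0:
        # Pad up to the next multiple of the period.
        n_pad = period - (t % period)
    return n_pad, (t + n_pad) // period, period


# Example: 10 samples with period 3 need 2 samples of padding and fold
# into a 4 x 3 grid; 12 samples need no padding.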


class DiscriminatorS(torch.nn.Module):
    def __init__(self, use_spectral_norm=False):
        super(DiscriminatorS, self).__init__()
        norm_f = weight_norm if use_spectral_norm is False else spectral_norm
        self.convs = nn.ModuleList(
            [
                norm_f(Conv1d(1, 16, 15, 1, padding=7)),
                norm_f(Conv1d(16, 64, 41, 4, groups=4, padding=20)),
                norm_f(Conv1d(64, 256, 41, 4, groups=16, padding=20)),
                norm_f(Conv1d(256, 1024, 41, 4, groups=64, padding=20)),
                norm_f(Conv1d(1024, 1024, 41, 4, groups=256, padding=20)),
                norm_f(Conv1d(1024, 1024, 5, 1, padding=2)),
            ]
        )
        self.conv_post = norm_f(Conv1d(1024, 1, 3, 1, padding=1))

    def forward(self, x):
        fmap = []

        for layer in self.convs:
            x = layer(x)
            x = F.leaky_relu(x, modules.LRELU_SLOPE)
            fmap.append(x)
        x = self.conv_post(x)
        fmap.append(x)
        x = torch.flatten(x, 1, -1)

        return x, fmap


class MultiPeriodDiscriminator(torch.nn.Module):
    def __init__(self, use_spectral_norm=False):
        super(MultiPeriodDiscriminator, self).__init__()
        periods = [2, 3, 5, 7, 11]

        discs = [DiscriminatorS(use_spectral_norm=use_spectral_norm)]
        discs = discs + [
            DiscriminatorP(i, use_spectral_norm=use_spectral_norm) for i in periods
        ]
        self.discriminators = nn.ModuleList(discs)

    def forward(self, y, y_hat):
        y_d_rs = []
        y_d_gs = []
        fmap_rs = []
        fmap_gs = []
        for i, d in enumerate(self.discriminators):
            y_d_r, fmap_r = d(y)
            y_d_g, fmap_g = d(y_hat)
            y_d_rs.append(y_d_r)
            y_d_gs.append(y_d_g)
            fmap_rs.append(fmap_r)
            fmap_gs.append(fmap_g)

        return y_d_rs, y_d_gs, fmap_rs, fmap_gs


class ReferenceEncoder(nn.Module):
    """
    inputs --- [N, Ty, spec_channels]  spectrogram frames
    outputs --- [N, gin_channels]
    """

    def __init__(self, spec_channels, gin_channels=0, layernorm=False):
        super().__init__()
        self.spec_channels = spec_channels
        ref_enc_filters = [32, 32, 64, 64, 128, 128]
        K = len(ref_enc_filters)
        filters = [1] + ref_enc_filters
        convs = [
            weight_norm(
                nn.Conv2d(
                    in_channels=filters[i],
                    out_channels=filters[i + 1],
                    kernel_size=(3, 3),
                    stride=(2, 2),
                    padding=(1, 1),
                )
            )
            for i in range(K)
        ]
        self.convs = nn.ModuleList(convs)
        # self.wns = nn.ModuleList([weight_norm(num_features=ref_enc_filters[i]) for i in range(K)]) # noqa: E501

        out_channels = self.calculate_channels(spec_channels, 3, 2, 1, K)
        self.gru = nn.GRU(
            input_size=ref_enc_filters[-1] * out_channels,
            hidden_size=256 // 2,
            batch_first=True,
        )
        self.proj = nn.Linear(128, gin_channels)
        if layernorm:
            self.layernorm = nn.LayerNorm(self.spec_channels)
            print('[Ref Enc]: using layer norm')
        else:
            self.layernorm = None

    def forward(self, inputs, mask=None):
        N = inputs.size(0)

        out = inputs.view(N, 1, -1, self.spec_channels)  # [N, 1, Ty, n_freqs]
        if self.layernorm is not None:
            out = self.layernorm(out)

        for conv in self.convs:
            out = conv(out)
            # out = wn(out)
            out = F.relu(out)  # [N, 128, Ty//2^K, n_mels//2^K]

        out = out.transpose(1, 2)  # [N, Ty//2^K, 128, n_mels//2^K]
        T = out.size(1)
        N = out.size(0)
        out = out.contiguous().view(N, T, -1)  # [N, Ty//2^K, 128*n_mels//2^K]

        self.gru.flatten_parameters()
        memory, out = self.gru(out)  # out --- [1, N, 128]

        return self.proj(out.squeeze(0))

    def calculate_channels(self, L, kernel_size, stride, pad, n_convs):
        for i in range(n_convs):
            L = (L - kernel_size + 2 * pad) // stride + 1
        return L
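

# Illustrative sketch, not part of the original module: the same recurrence as
# ReferenceEncoder.calculate_channels written as a free function, to show how
# the frequency axis shrinks through the stack of stride-2 convolutions.
# The helper name and the example input sizes are hypothetical.
def _conv_stack_output_size(size, kernel_size=3, stride=2, pad=1, n_convs=6):
    for _ in range(n_convs):
        size = (size - kernel_size + 2 * pad) // stride + 1
    return size


# Example (assumed spec_channels = 513): 513 -> 257 -> 129 -> 65 -> 33 -> 17 -> 9,
# so the GRU input size would be 128 * 9 for that value.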


class SynthesizerTrn(nn.Module):
    """
    Synthesizer for Training
    """

    def __init__(
        self,
        n_vocab,
        spec_channels,
        segment_size,
        inter_channels,
        hidden_channels,
        filter_channels,
        n_heads,
        n_layers,
        kernel_size,
        p_dropout,
        resblock,
        resblock_kernel_sizes,
        resblock_dilation_sizes,
        upsample_rates,
        upsample_initial_channel,
        upsample_kernel_sizes,
        n_speakers=256,
        gin_channels=256,
        use_sdp=True,
        n_flow_layer=4,
        n_layers_trans_flow=6,
        flow_share_parameter=False,
        use_transformer_flow=True,
        use_vc=False,
        num_languages=None,
        num_tones=None,
        norm_refenc=False,
        **kwargs
    ):
        super().__init__()
        self.n_vocab = n_vocab
        self.spec_channels = spec_channels
        self.inter_channels = inter_channels
        self.hidden_channels = hidden_channels
        self.filter_channels = filter_channels
        self.n_heads = n_heads
        self.n_layers = n_layers
        self.kernel_size = kernel_size
        self.p_dropout = p_dropout
        self.resblock = resblock
        self.resblock_kernel_sizes = resblock_kernel_sizes
        self.resblock_dilation_sizes = resblock_dilation_sizes
        self.upsample_rates = upsample_rates
        self.upsample_initial_channel = upsample_initial_channel
        self.upsample_kernel_sizes = upsample_kernel_sizes
        self.segment_size = segment_size
        self.n_speakers = n_speakers
        self.gin_channels = gin_channels
        self.n_layers_trans_flow = n_layers_trans_flow
        self.use_spk_conditioned_encoder = kwargs.get(
            "use_spk_conditioned_encoder", True
        )
        self.use_sdp = use_sdp
        self.use_noise_scaled_mas = kwargs.get("use_noise_scaled_mas", False)
        self.mas_noise_scale_initial = kwargs.get("mas_noise_scale_initial", 0.01)
        self.noise_scale_delta = kwargs.get("noise_scale_delta", 2e-6)
        self.current_mas_noise_scale = self.mas_noise_scale_initial
        if self.use_spk_conditioned_encoder and gin_channels > 0:
            self.enc_gin_channels = gin_channels
        else:
            self.enc_gin_channels = 0
        self.enc_p = TextEncoder(
            n_vocab,
            inter_channels,
            hidden_channels,
            filter_channels,
            n_heads,
            n_layers,
            kernel_size,
            p_dropout,
            gin_channels=self.enc_gin_channels,
            num_languages=num_languages,
            num_tones=num_tones,
        )
        self.dec = Generator(
            inter_channels,
            resblock,
            resblock_kernel_sizes,
            resblock_dilation_sizes,
            upsample_rates,
            upsample_initial_channel,
            upsample_kernel_sizes,
            gin_channels=gin_channels,
        )
        self.enc_q = PosteriorEncoder(
            spec_channels,
            inter_channels,
            hidden_channels,
            5,
            1,
            16,
            gin_channels=gin_channels,
        )
        if use_transformer_flow:
            self.flow = TransformerCouplingBlock(
                inter_channels,
                hidden_channels,
                filter_channels,
                n_heads,
                n_layers_trans_flow,
                5,
                p_dropout,
                n_flow_layer,
                gin_channels=gin_channels,
                share_parameter=flow_share_parameter,
            )
        else:
            self.flow = ResidualCouplingBlock(
                inter_channels,
                hidden_channels,
                5,
                1,
                n_flow_layer,
                gin_channels=gin_channels,
            )
        self.sdp = StochasticDurationPredictor(
            hidden_channels, 192, 3, 0.5, 4, gin_channels=gin_channels
        )
        self.dp = DurationPredictor(
            hidden_channels, 256, 3, 0.5, gin_channels=gin_channels
        )

        if n_speakers > 0:
            self.emb_g = nn.Embedding(n_speakers, gin_channels)
        else:
            self.ref_enc = ReferenceEncoder(spec_channels, gin_channels, layernorm=norm_refenc)
        self.use_vc = use_vc


    def forward(self, x, x_lengths, y, y_lengths, sid, tone, language, bert, ja_bert):
        if self.n_speakers > 0:
            g = self.emb_g(sid).unsqueeze(-1)  # [b, h, 1]
        else:
            g = self.ref_enc(y.transpose(1, 2)).unsqueeze(-1)
        if self.use_vc:
            g_p = None
        else:
            g_p = g
        x, m_p, logs_p, x_mask = self.enc_p(
            x, x_lengths, tone, language, bert, ja_bert, g=g_p
        )
        z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
        z_p = self.flow(z, y_mask, g=g)

        with torch.no_grad():
            # negative cross-entropy
            s_p_sq_r = torch.exp(-2 * logs_p)  # [b, d, t]
            neg_cent1 = torch.sum(
                -0.5 * math.log(2 * math.pi) - logs_p, [1], keepdim=True
            )  # [b, 1, t_s]
            neg_cent2 = torch.matmul(
                -0.5 * (z_p**2).transpose(1, 2), s_p_sq_r
            )  # [b, t_t, d] x [b, d, t_s] = [b, t_t, t_s]
            neg_cent3 = torch.matmul(
                z_p.transpose(1, 2), (m_p * s_p_sq_r)
            )  # [b, t_t, d] x [b, d, t_s] = [b, t_t, t_s]
            neg_cent4 = torch.sum(
                -0.5 * (m_p**2) * s_p_sq_r, [1], keepdim=True
            )  # [b, 1, t_s]
            neg_cent = neg_cent1 + neg_cent2 + neg_cent3 + neg_cent4
            if self.use_noise_scaled_mas:
                epsilon = (
                    torch.std(neg_cent)
                    * torch.randn_like(neg_cent)
                    * self.current_mas_noise_scale
                )
                neg_cent = neg_cent + epsilon

            attn_mask = torch.unsqueeze(x_mask, 2) * torch.unsqueeze(y_mask, -1)
            attn = (
                monotonic_align.maximum_path(neg_cent, attn_mask.squeeze(1))
                .unsqueeze(1)
                .detach()
            )

        w = attn.sum(2)

        l_length_sdp = self.sdp(x, x_mask, w, g=g)
        l_length_sdp = l_length_sdp / torch.sum(x_mask)

        logw_ = torch.log(w + 1e-6) * x_mask
        logw = self.dp(x, x_mask, g=g)
        l_length_dp = torch.sum((logw - logw_) ** 2, [1, 2]) / torch.sum(
            x_mask
        )  # for averaging

        l_length = l_length_dp + l_length_sdp

        # expand prior
        m_p = torch.matmul(attn.squeeze(1), m_p.transpose(1, 2)).transpose(1, 2)
        logs_p = torch.matmul(attn.squeeze(1), logs_p.transpose(1, 2)).transpose(1, 2)

        z_slice, ids_slice = commons.rand_slice_segments(
            z, y_lengths, self.segment_size
        )
        o = self.dec(z_slice, g=g)
        return (
            o,
            l_length,
            attn,
            ids_slice,
            x_mask,
            y_mask,
            (z, z_p, m_p, logs_p, m_q, logs_q),
            (x, logw, logw_),
        )

    def infer(
        self,
        x,
        x_lengths,
        sid,
        tone,
        language,
        bert,
        ja_bert,
        noise_scale=0.667,
        length_scale=1,
        noise_scale_w=0.8,
        max_len=None,
        sdp_ratio=0,
        y=None,
        g=None,
    ):
        # x, m_p, logs_p, x_mask = self.enc_p(x, x_lengths, tone, language, bert)
        # g = self.gst(y)
        if g is None:
            if self.n_speakers > 0:
                g = self.emb_g(sid).unsqueeze(-1)  # [b, h, 1]
            else:
                g = self.ref_enc(y.transpose(1, 2)).unsqueeze(-1)
        if self.use_vc:
            g_p = None
        else:
            g_p = g
        x, m_p, logs_p, x_mask = self.enc_p(
            x, x_lengths, tone, language, bert, ja_bert, g=g_p
        )
        # Mix stochastic (sdp) and deterministic (dp) log-durations by sdp_ratio.
        logw = self.sdp(x, x_mask, g=g, reverse=True, noise_scale=noise_scale_w) * (
            sdp_ratio
        ) + self.dp(x, x_mask, g=g) * (1 - sdp_ratio)
        w = torch.exp(logw) * x_mask * length_scale
        
        w_ceil = torch.ceil(w)
        y_lengths = torch.clamp_min(torch.sum(w_ceil, [1, 2]), 1).long()
        y_mask = torch.unsqueeze(commons.sequence_mask(y_lengths, None), 1).to(
            x_mask.dtype
        )
        attn_mask = torch.unsqueeze(x_mask, 2) * torch.unsqueeze(y_mask, -1)
        attn = commons.generate_path(w_ceil, attn_mask)

        m_p = torch.matmul(attn.squeeze(1), m_p.transpose(1, 2)).transpose(
            1, 2
        )  # [b, t', t], [b, t, d] -> [b, d, t']
        logs_p = torch.matmul(attn.squeeze(1), logs_p.transpose(1, 2)).transpose(
            1, 2
        )  # [b, t', t], [b, t, d] -> [b, d, t']

        # Sample from the expanded prior; noise_scale scales the standard deviation.
        z_p = m_p + torch.randn_like(m_p) * torch.exp(logs_p) * noise_scale
        z = self.flow(z_p, y_mask, g=g, reverse=True)
        o = self.dec((z * y_mask)[:, :, :max_len], g=g)
        # print('max/min of o:', o.max(), o.min())
        return o, attn, y_mask, (z, z_p, m_p, logs_p)

    def voice_conversion(self, y, y_lengths, sid_src, sid_tgt, tau=1.0):
        g_src = sid_src
        g_tgt = sid_tgt
        z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g_src, tau=tau)
        z_p = self.flow(z, y_mask, g=g_src)
        z_hat = self.flow(z_p, y_mask, g=g_tgt, reverse=True)
        o_hat = self.dec(z_hat * y_mask, g=g_tgt)
        return o_hat, y_mask, (z, z_p, z_hat)


================================================
FILE: melo/modules.py
================================================
import math
import torch
from torch import nn
from torch.nn import functional as F

from torch.nn import Conv1d
from torch.nn.utils import weight_norm, remove_weight_norm

from . import commons
from .commons import init_weights, get_padding
from .transforms import piecewise_rational_quadratic_transform
from .attentions import Encoder

LRELU_SLOPE = 0.1


class LayerNorm(nn.Module):
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.channels = channels
        self.eps = eps

        self.gamma = nn.Parameter(torch.ones(channels))
        self.beta = nn.Parameter(torch.zeros(channels))

    def forward(self, x):
        x = x.transpose(1, -1)
        x = F.layer_norm(x, (self.channels,), self.gamma, self.beta, self.eps)
        return x.transpose(1, -1)


class ConvReluNorm(nn.Module):
    def __init__(
        self,
        in_channels,
        hidden_channels,
        out_channels,
        kernel_size,
        n_layers,
        p_dropout,
    ):
        super().__init__()
        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.n_layers = n_layers
        self.p_dropout = p_dropout
        assert n_layers > 1, "Number of layers should be larger than 1."

        self.conv_layers = nn.ModuleList()
        self.norm_layers = nn.ModuleList()
        self.conv_layers.append(
            nn.Conv1d(
                in_channels, hidden_channels, kernel_size, padding=kernel_size // 2
            )
        )
        self.norm_layers.append(LayerNorm(hidden_channels))
        self.relu_drop = nn.Sequential(nn.ReLU(), nn.Dropout(p_dropout))
        for _ in range(n_layers - 1):
            self.conv_layers.append(
                nn.Conv1d(
                    hidden_channels,
                    hidden_channels,
                    kernel_size,
                    padding=kernel_size // 2,
                )
            )
            self.norm_layers.append(LayerNorm(hidden_channels))
        self.proj = nn.Conv1d(hidden_channels, out_channels, 1)
        self.proj.weight.data.zero_()
        self.proj.bias.data.zero_()

    def forward(self, x, x_mask):
        x_org = x
        for i in range(self.n_layers):
            x = self.conv_layers[i](x * x_mask)
            x = self.norm_layers[i](x)
            x = self.relu_drop(x)
        x = x_org + self.proj(x)
        return x * x_mask


class DDSConv(nn.Module):
    """
    Dilated and Depth-Separable Convolution
    """

    def __init__(self, channels, kernel_size, n_layers, p_dropout=0.0):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        self.n_layers = n_layers
        self.p_dropout = p_dropout

        self.drop = nn.Dropout(p_dropout)
        self.convs_sep = nn.ModuleList()
        self.convs_1x1 = nn.ModuleList()
        self.norms_1 = nn.ModuleList()
        self.norms_2 = nn.ModuleList()
        for i in range(n_layers):
            dilation = kernel_size**i
            padding = (kernel_size * dilation - dilation) // 2
            self.convs_sep.append(
                nn.Conv1d(
                    channels,
                    channels,
                    kernel_size,
                    groups=channels,
                    dilation=dilation,
                    padding=padding,
                )
            )
            self.convs_1x1.append(nn.Conv1d(channels, channels, 1))
            self.norms_1.append(LayerNorm(channels))
            self.norms_2.append(LayerNorm(channels))

    def forward(self, x, x_mask, g=None):
        if g is not None:
            x = x + g
        for i in range(self.n_layers):
            y = self.convs_sep[i](x * x_mask)
            y = self.norms_1[i](y)
            y = F.gelu(y)
            y = self.convs_1x1[i](y)
            y = self.norms_2[i](y)
            y = F.gelu(y)
            y = self.drop(y)
            x = x + y
        return x * x_mask


class WN(torch.nn.Module):
    def __init__(
        self,
        hidden_channels,
        kernel_size,
        dilation_rate,
        n_layers,
        gin_channels=0,
        p_dropout=0,
    ):
        super(WN, self).__init__()
        assert kernel_size % 2 == 1
        self.hidden_channels = hidden_channels
        self.kernel_size = (kernel_size,)
        self.dilation_rate = dilation_rate
        self.n_layers = n_layers
        self.gin_channels = gin_channels
        self.p_dropout = p_dropout

        self.in_layers = torch.nn.ModuleList()
        self.res_skip_layers = torch.nn.ModuleList()
        self.drop = nn.Dropout(p_dropout)

        if gin_channels != 0:
            cond_layer = torch.nn.Conv1d(
                gin_channels, 2 * hidden_channels * n_layers, 1
            )
            self.cond_layer = torch.nn.utils.weight_norm(cond_layer, name="weight")

        for i in range(n_layers):
            dilation = dilation_rate**i
            padding = int((kernel_size * dilation - dilation) / 2)
            in_layer = torch.nn.Conv1d(
                hidden_channels,
                2 * hidden_channels,
                kernel_size,
                dilation=dilation,
                padding=padding,
            )
            in_layer = torch.nn.utils.weight_norm(in_layer, name="weight")
            self.in_layers.append(in_layer)

            # the last layer only feeds the skip output, so it needs no residual channels
            if i < n_layers - 1:
                res_skip_channels = 2 * hidden_channels
            else:
                res_skip_channels = hidden_channels

            res_skip_layer = torch.nn.Conv1d(hidden_channels, res_skip_channels, 1)
            res_skip_layer = torch.nn.utils.weight_norm(res_skip_layer, name="weight")
            self.res_skip_layers.append(res_skip_layer)

    def forward(self, x, x_mask, g=None, **kwargs):
        output = torch.zeros_like(x)
        n_channels_tensor = torch.IntTensor([self.hidden_channels])

        if g is not None:
            g = self.cond_layer(g)

        for i in range(self.n_layers):
            x_in = self.in_layers[i](x)
            if g is not None:
                cond_offset = i * 2 * self.hidden_channels
                g_l = g[:, cond_offset : cond_offset + 2 * self.hidden_channels, :]
            else:
                g_l = torch.zeros_like(x_in)

            acts = commons.fused_add_tanh_sigmoid_multiply(x_in, g_l, n_channels_tensor)
            acts = self.drop(acts)

            res_skip_acts = self.res_skip_layers[i](acts)
            if i < self.n_layers - 1:
                res_acts = res_skip_acts[:, : self.hidden_channels, :]
                x = (x + res_acts) * x_mask
                output = output + res_skip_acts[:, self.hidden_channels :, :]
            else:
                output = output + res_skip_acts
        return output * x_mask

    def remove_weight_norm(self):
        if self.gin_channels != 0:
            torch.nn.utils.remove_weight_norm(self.cond_layer)
        for l in self.in_layers:
            torch.nn.utils.remove_weight_norm(l)
        for l in self.res_skip_layers:
            torch.nn.utils.remove_weight_norm(l)
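

# Illustrative sketch, not part of the original module. WN.forward delegates its
# gating to commons.fused_add_tanh_sigmoid_multiply, which is defined elsewhere;
# this assumes the usual WaveNet-style gate: add the conditioning, then multiply
# tanh of the first n_channels by sigmoid of the remaining channels.
# The helper name is hypothetical and operates on one time step of plain floats.
def _gated_activation_sketch(x_in, g_l, n_channels):
    in_act = [a + b for a, b in zip(x_in, g_l)]
    t_act = [math.tanh(v) for v in in_act[:n_channels]]
    s_act = [1.0 / (1.0 + math.exp(-v)) for v in in_act[n_channels:]]
    return [t * s for t, s in zip(t_act, s_act)]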


class ResBlock1(torch.nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5)):
        super(ResBlock1, self).__init__()
        self.convs1 = nn.ModuleList(
            [
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[0],
                        padding=get_padding(kernel_size, dilation[0]),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[1],
                        padding=get_padding(kernel_size, dilation[1]),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[2],
                        padding=get_padding(kernel_size, dilation[2]),
                    )
                ),
            ]
        )
        self.convs1.apply(init_weights)

        self.convs2 = nn.ModuleList(
            [
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=1,
                        padding=get_padding(kernel_size, 1),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=1,
                        padding=get_padding(kernel_size, 1),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=1,
                        padding=get_padding(kernel_size, 1),
                    )
                ),
            ]
        )
        self.convs2.apply(init_weights)

    def forward(self, x, x_mask=None):
        for c1, c2 in zip(self.convs1, self.convs2):
            xt = F.leaky_relu(x, LRELU_SLOPE)
            if x_mask is not None:
                xt = xt * x_mask
            xt = c1(xt)
            xt = F.leaky_relu(xt, LRELU_SLOPE)
            if x_mask is not None:
                xt = xt * x_mask
            xt = c2(xt)
            x = xt + x
        if x_mask is not None:
            x = x * x_mask
        return x

    def remove_weight_norm(self):
        for l in self.convs1:
            remove_weight_norm(l)
        for l in self.convs2:
            remove_weight_norm(l)


class ResBlock2(torch.nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=(1, 3)):
        super(ResBlock2, self).__init__()
        self.convs = nn.ModuleList(
            [
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[0],
                        padding=get_padding(kernel_size, dilation[0]),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[1],
                        padding=get_padding(kernel_size, dilation[1]),
                    )
                ),
            ]
        )
        self.convs.apply(init_weights)

    def forward(self, x, x_mask=None):
        for c in self.convs:
            xt = F.leaky_relu(x, LRELU_SLOPE)
            if x_mask is not None:
                xt = xt * x_mask
            xt = c(xt)
            x = xt + x
        if x_mask is not None:
            x = x * x_mask
        return x

    def remove_weight_norm(self):
        for l in self.convs:
            remove_weight_norm(l)


class Log(nn.Module):
    def forward(self, x, x_mask, reverse=False, **kwargs):
        if not reverse:
            y = torch.log(torch.clamp_min(x, 1e-5)) * x_mask
            logdet = torch.sum(-y, [1, 2])
            return y, logdet
        else:
            x = torch.exp(x) * x_mask
            return x


class Flip(nn.Module):
    def forward(self, x, *args, reverse=False, **kwargs):
        x = torch.flip(x, [1])
        if not reverse:
            logdet = torch.zeros(x.size(0)).to(dtype=x.dtype, device=x.device)
            return x, logdet
        else:
            return x


class ElementwiseAffine(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.channels = channels
        self.m = nn.Parameter(torch.zeros(channels, 1))
        self.logs = nn.Parameter(torch.zeros(channels, 1))

    def forward(self, x, x_mask, reverse=False, **kwargs):
        if not reverse:
            y = self.m + torch.exp(self.logs) * x
            y = y * x_mask
            logdet = torch.sum(self.logs * x_mask, [1, 2])
            return y, logdet
        else:
            x = (x - self.m) * torch.exp(-self.logs) * x_mask
            return x
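

# Illustrative sketch, not part of the original module: a scalar, pure-Python
# version of the affine flow that ElementwiseAffine applies per element,
# assuming a single unmasked value. The helper names are hypothetical. It shows
# that the reverse pass undoes the forward pass and that the log-determinant
# contribution of one element is simply `logs`.
import math


def _affine_forward_sketch(x, m, logs):
    """Return (y, logdet) for y = m + exp(logs) * x on one scalar."""
    return m + math.exp(logs) * x, logs


def _affine_reverse_sketch(y, m, logs):
    """Invert _affine_forward_sketch: x = (y - m) * exp(-logs)."""
    return (y - m) * math.exp(-logs)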


class ResidualCouplingLayer(nn.Module):
    def __init__(
        self,
        channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        n_layers,
        p_dropout=0,
        gin_channels=0,
        mean_only=False,
    ):
        assert channels % 2 == 0, "channels should be divisible by 2"
        super().__init__()
        self.channels = channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.n_layers = n_layers
        self.half_channels = channels // 2
        self.mean_only = mean_only

        self.pre = nn.Conv1d(self.half_channels, hidden_channels, 1)
        self.enc = WN(
            hidden_channels,
            kernel_size,
            dilation_rate,
            n_layers,
            p_dropout=p_dropout,
            gin_channels=gin_channels,
        )
        self.post = nn.Conv1d(hidden_channels, self.half_channels * (2 - mean_only), 1)
        self.post.weight.data.zero_()
        self.post.bias.data.zero_()

    def forward(self, x, x_mask, g=None, reverse=False):
        x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
        h = self.pre(x0) * x_mask
        h = self.enc(h, x_mask, g=g)
        stats = self.post(h) * x_mask
        if not self.mean_only:
            m, logs = torch.split(stats, [self.half_channels] * 2, 1)
        else:
            m = stats
            logs = torch.zeros_like(m)

        if not reverse:
            x1 = m + x1 * torch.exp(logs) * x_mask
            x = torch.cat([x0, x1], 1)
            logdet = torch.sum(logs, [1, 2])
            return x, logdet
        else:
            x1 = (x1 - m) * torch.exp(-logs) * x_mask
            x = torch.cat([x0, x1], 1)
            return x


class ConvFlow(nn.Module):
    def __init__(
        self,
        in_channels,
        filter_channels,
        kernel_size,
        n_layers,
        num_bins=10,
        tail_bound=5.0,
    ):
        super().__init__()
        self.in_channels = in_channels
        self.filter_channels = filter_channels
        self.kernel_size = kernel_size
        self.n_layers = n_layers
        self.num_bins = num_bins
        self.tail_bound = tail_bound
        self.half_channels = in_channels // 2

        self.pre = nn.Conv1d(self.half_channels, filter_channels, 1)
        self.convs = DDSConv(filter_channels, kernel_size, n_layers, p_dropout=0.0)
        self.proj = nn.Conv1d(
            filter_channels, self.half_channels * (num_bins * 3 - 1), 1
        )
        self.proj.weight.data.zero_()
        self.proj.bias.data.zero_()

    def forward(self, x, x_mask, g=None, reverse=False):
        x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
        h = self.pre(x0)
        h = self.convs(h, x_mask, g=g)
        h = self.proj(h) * x_mask

        b, c, t = x0.shape
        h = h.reshape(b, c, -1, t).permute(0, 1, 3, 2)  # [b, cx?, t] -> [b, c, t, ?]

        unnormalized_widths = h[..., : self.num_bins] / math.sqrt(self.filter_channels)
        unnormalized_heights = h[..., self.num_bins : 2 * self.num_bins] / math.sqrt(
            self.filter_channels
        )
        unnormalized_derivatives = h[..., 2 * self.num_bins :]

        x1, logabsdet = piecewise_rational_quadratic_transform(
            x1,
            unnormalized_widths,
            unnormalized_heights,
            unnormalized_derivatives,
            inverse=reverse,
            tails="linear",
            tail_bound=self.tail_bound,
        )

        x = torch.cat([x0, x1], 1) * x_mask
        logdet = torch.sum(logabsdet * x_mask, [1, 2])
        if not reverse:
            return x, logdet
        else:
            return x


class TransformerCouplingLayer(nn.Module):
    def __init__(
        self,
        channels,
        hidden_channels,
        kernel_size,
        n_layers,
        n_heads,
        p_dropout=0,
        filter_channels=0,
        mean_only=False,
        wn_sharing_parameter=None,
        gin_channels=0,
    ):
        assert n_layers == 3, n_layers
        assert channels % 2 == 0, "channels should be divisible by 2"
        super().__init__()
        self.channels = channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.n_layers = n_layers
        self.half_channels = channels // 2
        self.mean_only = mean_only

        self.pre = nn.Conv1d(self.half_channels, hidden_channels, 1)
        self.enc = (
            Encoder(
                hidden_channels,
                filter_channels,
                n_heads,
                n_layers,
                kernel_size,
                p_dropout,
                isflow=True,
                gin_channels=gin_channels,
            )
            if wn_sharing_parameter is None
            else wn_sharing_parameter
        )
        self.post = nn.Conv1d(hidden_channels, self.half_channels * (2 - mean_only), 1)
        self.post.weight.data.zero_()
        self.post.bias.data.zero_()

    def forward(self, x, x_mask, g=None, reverse=False):
        x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
        h = self.pre(x0) * x_mask
        h = self.enc(h, x_mask, g=g)
        stats = self.post(h) * x_mask
        if not self.mean_only:
            m, logs = torch.split(stats, [self.half_channels] * 2, 1)
        else:
            m = stats
            logs = torch.zeros_like(m)

        if not reverse:
            x1 = m + x1 * torch.exp(logs) * x_mask
            x = torch.cat([x0, x1], 1)
            logdet = torch.sum(logs, [1, 2])
            return x, logdet
        else:
            x1 = (x1 - m) * torch.exp(-logs) * x_mask
            x = torch.cat([x0, x1], 1)
            return x


================================================
FILE: melo/monotonic_align/__init__.py
================================================
from numpy import zeros, int32, float32
from torch import from_numpy

from .core import maximum_path_jit


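def maximum_path(neg_cent, mask):
    """Find the monotonic alignment path with the highest total score.

    neg_cent: [b, t_t, t_s] scores; mask: [b, t_t, t_s] validity mask.
    Returns a 0/1 path tensor with the same shape, device, and dtype as neg_cent.
    """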
    device = neg_cent.device
    dtype = neg_cent.dtype
    neg_cent = neg_cent.data.cpu().numpy().astype(float32)
    path = zeros(neg_cent.shape, dtype=int32)

    t_t_max = mask.sum(1)[:, 0].data.cpu().numpy().astype(int32)
    t_s_max = mask.sum(2)[:, 0].data.cpu().numpy().astype(int32)
    maximum_path_jit(path, neg_cent, t_t_max, t_s_max)
    return from_numpy(path).to(device=device, dtype=dtype)


================================================
FILE: melo/monotonic_align/core.py
================================================
import numba


@numba.jit(
    numba.void(
        numba.int32[:, :, ::1],
        numba.float32[:, :, ::1],
        numba.int32[::1],
        numba.int32[::1],
    ),
    nopython=True,
    nogil=True,
)
def maximum_path_jit(paths, values, t_ys, t_xs):
    b = paths.shape[0]
    max_neg_val = -1e9
    for i in range(int(b)):
        path = paths[i]
        value = values[i]
        t_y = t_ys[i]
        t_x = t_xs[i]

        v_prev = v_cur = 0.0
        index = t_x - 1

        for y in range(t_y):
            for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
                if x == y:
                    v_cur = max_neg_val
                else:
                    v_cur = value[y - 1, x]
                if x == 0:
                    if y == 0:
                        v_prev = 0.0
                    else:
                        v_prev = max_neg_val
                else:
                    v_prev = value[y - 1, x - 1]
                value[y, x] += max(v_prev, v_cur)

        for y in range(t_y - 1, -1, -1):
            path[y, index] = 1
            if index != 0 and (
                index == y or value[y - 1, index] < value[y - 1, index - 1]
            ):
                index = index - 1
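

# Illustrative sketch, not part of the original module: a pure-Python mirror of
# the dynamic program in maximum_path_jit for a single item, using nested lists
# instead of NumPy arrays. The function name is hypothetical. It may help when
# checking the jitted kernel on tiny inputs; it is far too slow for training.
def _maximum_path_reference(value, t_y, t_x):
    """Return a 0/1 path (list of rows) for a [t_y][t_x] score matrix."""
    max_neg_val = -1e9
    value = [row[:] for row in value]  # copy so the caller's scores are untouched
    path = [[0] * len(row) for row in value]

    # Forward pass: accumulate the best score reaching each (y, x) cell.
    for y in range(t_y):
        for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
            v_cur = max_neg_val if x == y else value[y - 1][x]
            if x == 0:
                v_prev = 0.0 if y == 0 else max_neg_val
            else:
                v_prev = value[y - 1][x - 1]
            value[y][x] += max(v_prev, v_cur)

    # Backward pass: trace the best path from the last cell to the first.
    index = t_x - 1
    for y in range(t_y - 1, -1, -1):
        path[y][index] = 1
        if index != 0 and (
            index == y or value[y - 1][index] < value[y - 1][index - 1]
        ):
            index -= 1
    return path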


================================================
FILE: melo/preprocess_text.py
================================================
import json
from collections import defaultdict
from random import shuffle
from typing import Optional

from tqdm import tqdm
import click
from text.cleaner import clean_text_bert
import os
import torch
from text.symbols import symbols, num_languages, num_tones

@click.command()
@click.option(
    "--metadata",
    default="data/example/metadata.list",
    type=click.Path(exists=True, file_okay=True, dir_okay=False),
)
@click.option("--cleaned-path", default=None)
@click.option("--train-path", default=None)
@click.option("--val-path", default=None)
@click.option(
    "--config_path",
    default="configs/config.json",
    type=click.Path(exists=True, file_okay=True, dir_okay=False),
)
@click.option("--val-per-spk", default=4)
@click.option("--max-val-total", default=8)
@click.option("--clean/--no-clean", default=True)
def main(
    metadata: str,
    cleaned_path: Optional[str],
    train_path: Optional[str],
    val_path: Optional[str],
    config_path: str,
    val_per_spk: int,
    max_val_total: int,
    clean: bool,
):
    if train_path is None:
        train_path = os.path.join(os.path.dirname(metadata), 'train.list')
    if val_path is None:
        val_path = os.path.join(os.path.dirname(metadata), 'val.list')
    out_config_path = os.path.join(os.path.dirname(metadata), 'config.json')

    if cleaned_path is None:
        cleaned_path = metadata + ".cleaned"

    if clean:
        out_file = open(cleaned_path, "w", encoding="utf-8")
        new_symbols = []
        for line in tqdm(open(metadata, encoding="utf-8").readlines()):
            try:
                utt, spk, language, text = line.strip().split("|")
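                # Fall back to CPU when CUDA is unavailable instead of hard-coding 'cuda:0'
                device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
                norm_text, phones, tones, word2ph, bert = clean_text_bert(text, language, device=device)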
                for ph in phones:
                    if ph not in symbols and ph not in new_symbols:
                        new_symbols.append(ph)
                        print('update!, now symbols:')
                        print(new_symbols)
                        with open(f'{language}_symbol.txt', 'w') as f:
                            f.write(f'{new_symbols}')

                assert len(phones) == len(tones)
                assert len(phones) == sum(word2ph)
                out_file.write(
                    "{}|{}|{}|{}|{}|{}|{}\n".format(
                        utt,
                        spk,
                        language,
                        norm_text,
                        " ".join(phones),
                        " ".join([str(i) for i in tones]),
                        " ".join([str(i) for i in word2ph]),
                    )
                )
                bert_path = utt.replace(".wav", ".bert.pt")
                bert_dir = os.path.dirname(bert_path)
                if bert_dir:  # os.makedirs("") raises when utt has no directory part
                    os.makedirs(bert_dir, exist_ok=True)
                torch.save(bert.cpu(), bert_path)
            except Exception as error:
                print("err!", line, error)

        out_file.close()

        metadata = cleaned_path

    spk_utt_map = defaultdict(list)
    spk_id_map = {}
    current_sid = 0

    with open(metadata, encoding="utf-8") as f:
        for line in f.readlines():
            utt, spk, language, text, phones, tones, word2ph = line.strip().split("|")
            spk_utt_map[spk].append(line)

            if spk not in spk_id_map.keys():
                spk_id_map[spk] = current_sid
                current_sid += 1

    train_list = []
    val_list = []

    for spk, utts in spk_utt_map.items():
        shuffle(utts)
        val_list += utts[:val_per_spk]
        train_list += utts[val_per_spk:]

    if len(val_list) > max_val_total:
        train_list += val_list[max_val_total:]
        val_list = val_list[:max_val_total]

    with open(train_path, "w", encoding="utf-8") as f:
        for line in train_list:
            f.write(line)

    with open(val_path, "w", encoding="utf-8") as f:
        for line in val_list:
            f.write(line)

    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    config["data"]["spk2id"] = spk_id_map

    config["data"]["training_files"] = train_path
    config["data"]["validation_files"] = val_path
    config["data"]["n_speakers"] = len(spk_id_map)
    config["num_languages"] = num_languages
    config["num_tones"] = num_tones
    config["symbols"] = symbols

    with open(out_config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
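

# Illustrative sketch, not part of the original script: the train/validation
# split performed inside main(), pulled out as a standalone function so the
# capping rule is easier to see. The function name is hypothetical, and the
# per-speaker shuffle is omitted here (an assumption made to keep the example
# deterministic).
def _split_train_val_sketch(spk_utt_map, val_per_spk=4, max_val_total=8):
    """Take val_per_spk lines per speaker, then cap the validation set."""
    train_list, val_list = [], []
    for utts in spk_utt_map.values():
        val_list += utts[:val_per_spk]
        train_list += utts[val_per_spk:]
    # Anything beyond the global cap is moved back into the training set.
    if len(val_list) > max_val_total:
        train_list += val_list[max_val_total:]
        val_list = val_list[:max_val_total]
    return train_list, val_list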


if __name__ == "__main__":
    main()


================================================
FILE: melo/split_utils.py
================================================
import re
import os
import glob
import numpy as np
import soundfile as sf
import torchaudio

def split_sentence(text, min_len=10, language_str='EN'):
    if language_str in ['EN', 'FR', 'ES', 'SP']:
        sentences = split_sentences_latin(text, min_len=min_len)
    else:
        sentences = split_sentences_zh(text, min_len=min_len)
    return sentences


def split_sentences_latin(text, min_len=10):
    text = re.sub('[。!?;]', '.', text)
    text = re.sub('[,]', ',', text)
    text = re.sub('[“”]', '"', text)
    text = re.sub('[‘’]', "'", text)
    text = re.sub(r"[\<\>\(\)\[\]\"\«\»]+", "", text)
    return [item.strip() for item in txtsplit(text, 256, 512) if item.strip()]


def split_sentences_zh(text, min_len=10):
    text = re.sub('[。!?;]', '.', text)
    text = re.sub('[,]', ',', text)
    # Collapse runs of newlines, tabs, and spaces into a single space
    text = re.sub('[\n\t ]+', ' ', text)
    # Append a split marker after each punctuation mark
    text = re.sub('([,.!?;])', r'\1 $#!', text)
    # Split on the marker and strip surrounding whitespace
    # sentences = [s.strip() for s in re.split('(。|!|?|;)', text)]
    sentences = [s.strip() for s in text.split('$#!')]
    if len(sentences[-1]) == 0: del sentences[-1]

    new_sentences = []
    new_sent = []
    count_len = 0
    for ind, sent in enumerate(sentences):
        new_sent.append(sent)
        count_len += len(sent)
        if count_len > min_len or ind == len(sentences) - 1:
            count_len = 0
            new_sentences.append(' '.join(new_sent))
            new_sent = []
    return merge_short_sentences_zh(new_sentences)


def merge_short_sentences_en(sens):
    """Avoid short sentences by merging them with the following sentence.

    Args:
        sens (List[str]): list of input sentences.

    Returns:
        List[str]: list of output sentences.
    """
    sens_out = []
    for s in sens:
        # If the previous sentence is too short (two words or fewer),
        # merge it with the current sentence.
        if len(sens_out) > 0 and len(sens_out[-1].split(" ")) <= 2:
            sens_out[-1] = sens_out[-1] + " " + s
        else:
            sens_out.append(s)
    try:
        if len(sens_out[-1].split(" ")) <= 2:
            sens_out[-2] = sens_out[-2] + " " + sens_out[-1]
            sens_out.pop(-1)
    except IndexError:  # fewer than two sentences, nothing to merge
        pass
    return sens_out


def merge_short_sentences_zh(sens):
    """Avoid short sentences by merging them with the following sentence.

    Args:
        sens (List[str]): list of input sentences.

    Returns:
        List[str]: list of output sentences.
    """
    sens_out = []
    for s in sens:
        # If the previous sentence is too short (two characters or fewer),
        # merge it with the current sentence.
        if len(sens_out) > 0 and len(sens_out[-1]) <= 2:
            sens_out[-1] = sens_out[-1] + " " + s
        else:
            sens_out.append(s)
    try:
        if len(sens_out[-1]) <= 2:
            sens_out[-2] = sens_out[-2] + " " + sens_out[-1]
            sens_out.pop(-1)
    except IndexError:  # fewer than two sentences, nothing to merge
        pass
    return sens_out



def txtsplit(text, desired_length=100, max_length=200):
    """Split text into chunks of a desired length, trying to keep sentences intact."""
    text = re.sub(r'\n\n+', '\n', text)
    text = re.sub(r'\s+', ' ', text)
    text = re.sub(r'[""]', '"', text)
    text = re.sub(r'([,.?!])', r'\1 ', text)
    text = re.sub(r'\s+', ' ', text)
    
    rv = []
    in_quote = False
    current = ""
    split_pos = []
    pos = -1
    end_pos = len(text) - 1
    def seek(delta):
        nonlocal pos, in_quote, current
        is_neg = delta < 0
        for _ in range(abs(delta)):
            if is_neg:
                pos -= 1
                current = current[:-1]
            else:
                pos += 1
                current += text[pos]
            if text[pos] == '"':
                in_quote = not in_quote
        return text[pos]
    def peek(delta):
        p = pos + delta
        return text[p] if p < end_pos and p >= 0 else ""
    def commit():
        nonlocal rv, current, split_pos
        rv.append(current)
        current = ""
        split_pos = []
    while pos < end_pos:
        c = seek(1)
        if len(current) >= max_length:
            if len(split_pos) > 0 and len(current) > (desired_length / 2):
                d = pos - split_pos[-1]
                seek(-d)
            else:
                while c not in '!?.\n ' and pos > 0 and len(current) > desired_length:
                    c = seek(-1)
            commit()
        elif not in_quote and (c in '!?\n' or (c in '.,' and peek(1) in '\n ')):
            while pos < len(text) - 1 and len(current) < max_length and peek(1) in '!?.':
                c = seek(1)
            split_pos.append(pos)
            if len(current) >= desired_length:
                commit()
        elif in_quote and peek(1) == '"' and peek(2) in '\n ':
            seek(2)
            split_pos.append(pos)
    rv.append(current)
    rv = [s.strip() for s in rv]
    rv = [s for s in rv if len(s) > 0 and not re.match(r'^[\s\.,;:!?]*$', s)]
    return rv


if __name__ == '__main__':
    zh_text = "好的,我来给你讲一个故事吧。从前有一个小姑娘,她叫做小红。小红非常喜欢在森林里玩耍,她经常会和她的小伙伴们一起去探险。有一天,小红和她的小伙伴们走到了森林深处,突然遇到了一只凶猛的野兽。小红的小伙伴们都吓得不敢动弹,但是小红并没有被吓倒,她勇敢地走向野兽,用她的智慧和勇气成功地制服了野兽,保护了她的小伙伴们。从那以后,小红变得更加勇敢和自信,成为了她小伙伴们心中的英雄。"
    en_text = "I didn’t know what to do. I said please kill her because it would be better than being kidnapped,” Ben, whose surname CNN is not using for security concerns, said on Wednesday. “It’s a nightmare. I said ‘please kill her, don’t take her there.’"
    sp_text = "¡Claro! ¿En qué tema te gustaría que te hable en español? Puedo proporcionarte información o conversar contigo sobre una amplia variedad de temas, desde cultura y comida hasta viajes y tecnología. ¿Tienes alguna preferencia en particular?"
    fr_text = "Bien sûr ! En quelle matière voudriez-vous que je vous parle en français ? Je peux vous fournir des informations ou discuter avec vous sur une grande variété de sujets, que ce soit la culture, la nourriture, les voyages ou la technologie. Avez-vous une préférence particulière ?"

    print(split_sentence(zh_text, language_str='ZH'))
    print(split_sentence(en_text, language_str='EN'))
    print(split_sentence(sp_text, language_str='SP'))
    print(split_sentence(fr_text, language_str='FR'))


================================================
FILE: melo/text/__init__.py
================================================
from .symbols import *


_symbol_to_id = {s: i for i, s in enumerate(symbols)}


def cleaned_text_to_sequence(cleaned_text, tones, language, symbol_to_id=None):
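    """Convert cleaned phoneme symbols and tones to ID sequences.
    Args:
      cleaned_text: sequence of phoneme symbols to convert
      tones: per-phoneme tone values, relative to the language
      language: language key used to look up the tone offset and language ID
      symbol_to_id: optional symbol-to-ID mapping; defaults to the module-level one
    Returns:
      Tuple of three equal-length lists: phone IDs, offset tone IDs, language IDs
    """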
    symbol_to_id_map = symbol_to_id if symbol_to_id else _symbol_to_id
    phones = [symbol_to_id_map[symbol] for symbol in cleaned_text]
    tone_start = language_tone_start_map[language]
    tones = [i + tone_start for i in tones]
    lang_id = language_id_map[language]
    lang_ids = [lang_id for i in phones]
    return phones, tones, lang_ids
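

# Illustrative sketch, not part of the original module: the same mapping as
# cleaned_text_to_sequence, but with every lookup table passed in explicitly so
# it can be tried without text.symbols. The function name is hypothetical, and
# any tables passed to it in an experiment are toy values, not the real ones.
def _to_sequence_sketch(phones, tones, language, symbol_to_id, tone_start_map, lang_id_map):
    """Map symbols to IDs, shift tones by the language offset, tag the language."""
    phone_ids = [symbol_to_id[p] for p in phones]
    tone_ids = [t + tone_start_map[language] for t in tones]
    lang_ids = [lang_id_map[language]] * len(phone_ids)
    return phone_ids, tone_ids, lang_ids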


def get_bert(norm_text, word2ph, language, device):
    from .chinese_bert import get_bert_feature as zh_bert
    from .english_bert import get_bert_feature as en_bert
    from .japanese_bert import get_bert_feature as jp_bert
    from .chinese_mix import get_bert_feature as zh_mix_en_bert
    from .spanish_bert import get_bert_feature as sp_bert
    from .french_bert import get_bert_feature as fr_bert
    from .korean import get_bert_feature as kr_bert

    lang_bert_func_map = {"ZH": zh_bert, "EN": en_bert, "JP": jp_bert, 'ZH_MIX_EN': zh_mix_en_bert, 
                          'FR': fr_bert, 'SP': sp_bert, 'ES': sp_bert, "KR": kr_bert}
    bert = lang_bert_func_map[language](norm_text, word2ph, device)
    return bert


================================================
FILE: melo/text/chinese.py
================================================
import os
import re

import cn2an
from pypinyin import lazy_pinyin, Style

from .symbols import punctuation
from .tone_sandhi import ToneSandhi

current_file_path = os.path.dirname(__file__)
pinyin_to_symbol_map = {
    line.split("\t")[0]: line.strip().split("\t")[1]
    for line in open(os.path.join(current_file_path, "opencpop-strict.txt")).readlines()
}

import jieba.posseg as psg


rep_map = {
    ":": ",",
    ";": ",",
    ",": ",",
    "。": ".",
    "!": "!",
    "?": "?",
    "\n": ".",
    "·": ",",
    "、": ",",
    "...": "…",
    "$": ".",
    "“": "'",
    "”": "'",
    "‘": "'",
    "’": "'",
    "(": "'",
    ")": "'",
    "(": "'",
    ")": "'",
    "《": "'",
    "》": "'",
    "【": "'",
    "】": "'",
    "[": "'",
    "]": "'",
    "—": "-",
    "~": "-",
    "~": "-",
    "「": "'",
    "」": "'",
}

tone_modifier = ToneSandhi()


def replace_punctuation(text):
    text = text.replace("嗯", "恩").replace("呣", "母")
    pattern = re.compile("|".join(re.escape(p) for p in rep_map.keys()))

    replaced_text = pattern.sub(lambda x: rep_map[x.group()], text)

    replaced_text = re.sub(
        r"[^\u4e00-\u9fa5" + "".join(punctuation) + r"]+", "", replaced_text
    )

    return replaced_text


def g2p(text):
    pattern = r"(?<=[{0}])\s*".format("".join(punctuation))
    sentences = [i for i in re.split(pattern, text) if i.strip() != ""]
    phones, tones, word2ph = _g2p(sentences)
    assert sum(word2ph) == len(phones)
    assert len(word2ph) == len(text)  # May fail on some inputs; wrap the call in try/except if needed.
    phones = ["_"] + phones + ["_"]
    tones = [0] + tones + [0]
    word2ph = [1] + word2ph + [1]
    return phones, tones, word2ph


def _get_initials_finals(word):
    initials = []
    finals = []
    orig_initials = lazy_pinyin(word, neutral_tone_with_five=True, style=Style.INITIALS)
    orig_finals = lazy_pinyin(
        word, neutral_tone_with_five=True, style=Style.FINALS_TONE3
    )
    for c, v in zip(orig_initials, orig_finals):
        initials.append(c)
        finals.append(v)
    return initials, finals


def _g2p(segments):
    phones_list = []
    tones_list = []
    word2ph = []
    for seg in segments:
        # Replace all English words in the sentence
        seg = re.sub("[a-zA-Z]+", "", seg)
        seg_cut = psg.lcut(seg)
        initials = []
        finals = []
        seg_cut = tone_modifier.pre_merge_for_modify(seg_cut)
        for word, pos in seg_cut:
            if pos == "eng":
                # English tokens are not handled by this Chinese-only G2P; skip them.
                continue
            sub_initials, sub_finals = _get_initials_finals(word)
            sub_finals = tone_modifier.modified_tone(word, pos, sub_finals)
            initials.append(sub_initials)
            finals.append(sub_finals)

            # assert len(sub_initials) == len(sub_finals) == len(word)
        initials = sum(initials, [])
        finals = sum(finals, [])
        #
        for c, v in zip(initials, finals):
            raw_pinyin = c + v
            # NOTE: post process for pypinyin outputs
            # we discriminate i, ii and iii
            if c == v:
                assert c in punctuation
                phone = [c]
                tone = "0"
                word2ph.append(1)
            else:
                v_without_tone = v[:-1]
                tone = v[-1]

                pinyin = c + v_without_tone
                assert tone in "12345"

                if c:
                    # Syllable with an initial consonant
                    v_rep_map = {
                        "uei": "ui",
                        "iou": "iu",
                        "uen": "un",
                    }
                    if v_without_tone in v_rep_map.keys():
                        pinyin = c + v_rep_map[v_without_tone]
                else:
                    # Syllable without an initial (zero-initial)
                    pinyin_rep_map = {
                        "ing": "ying",
                        "i": "yi",
                        "in": "yin",
                        "u": "wu",
                    }
                    if pinyin in pinyin_rep_map.keys():
                        pinyin = pinyin_rep_map[pinyin]
                    else:
                        single_rep_map = {
                            "v": "yu",
                            "e": "e",
                            "i": "y",
                            "u": "w",
                        }
                        if pinyin[0] in single_rep_map.keys():
                            pinyin = single_rep_map[pinyin[0]] + pinyin[1:]

                assert pinyin in pinyin_to_symbol_map.keys(), (pinyin, seg, raw_pinyin)
                phone = pinyin_to_symbol_map[pinyin].split(" ")
                word2ph.append(len(phone))

            phones_list += phone
            tones_list += [int(tone)] * len(phone)
    return phones_list, tones_list, word2ph


def text_normalize(text):
    numbers = re.findall(r"\d+(?:\.?\d+)?", text)
    for number in numbers:
        text = text.replace(number, cn2an.an2cn(number), 1)
    text = replace_punctuation(text)
    return text


def get_bert_feature(text, word2ph, device=None):
    from text import chinese_bert

    return chinese_bert.get_bert_feature(text, word2ph, device=device)


if __name__ == "__main__":
    from text.chinese_bert import get_bert_feature

    text = "啊!chemistry 但是《原神》是由,米哈\游自主,  [研发]的一款全.新开放世界.冒险游戏"
    text = text_normalize(text)
    print(text)
    phones, tones, word2ph = g2p(text)
    bert = get_bert_feature(text, word2ph)

    print(phones, tones, word2ph, bert.shape)


# # Example usage
# text = "这是一个示例文本:,你好!这是一个测试...."
# print(g2p_paddle(text))  # Output: 这是一个示例文本你好这是一个测试


================================================
FILE: melo/text/chinese_bert.py
================================================
import torch
import sys
from transformers import AutoTokenizer, AutoModelForMaskedLM


# model_id = 'hfl/chinese-roberta-wwm-ext-large'
local_path = "./bert/chinese-roberta-wwm-ext-large"


tokenizers = {}
models = {}

def get_bert_feature(text, word2ph, device=None, model_id='hfl/chinese-roberta-wwm-ext-large'):
    # Resolve the device first so the model and its inputs end up on the same device.
    if (
        sys.platform == "darwin"
        and torch.backends.mps.is_available()
        and device == "cpu"
    ):
        device = "mps"
    if not device:
        device = "cuda"

    if model_id not in models:
        models[model_id] = AutoModelForMaskedLM.from_pretrained(
            model_id
        ).to(device)
        tokenizers[model_id] = AutoTokenizer.from_pretrained(model_id)
    model = models[model_id]
    tokenizer = tokenizers[model_id]

    with torch.no_grad():
        inputs = tokenizer(text, return_tensors="pt")
        for i in inputs:
            inputs[i] = inputs[i].to(device)
        res = model(**inputs, output_hidden_states=True)
        res = torch.cat(res["hidden_states"][-3:-2], -1)[0].cpu()
    # import pdb; pdb.set_trace()
    # assert len(word2ph) == len(text) + 2
    word2phone = word2ph
    phone_level_feature = []
    for i in range(len(word2phone)):
        repeat_feature = res[i].repeat(word2phone[i], 1)
        phone_level_feature.append(repeat_feature)

    phone_level_feature = torch.cat(phone_level_feature, dim=0)
    return phone_level_feature.T
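

# Illustrative sketch, not part of the original module: the word-to-phone
# expansion from get_bert_feature, written with plain lists instead of tensors.
# The function name is hypothetical. Each word-level feature is repeated once
# per phone that the word maps to, so the result has sum(word2ph) rows.
def _expand_word_features_sketch(word_features, word2ph):
    """Repeat word_features[i] word2ph[i] times, preserving order."""
    if len(word_features) != len(word2ph):
        raise ValueError("word_features and word2ph must have the same length")
    phone_features = []
    for feature, n_phones in zip(word_features, word2ph):
        phone_features.extend([feature] * n_phones)
    return phone_features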


if __name__ == "__main__":
    import torch

    word_level_feature = torch.rand(38, 1024)  # 38 words, each with a 1024-dim feature
    word2phone = [
        1,
        2,
        1,
        2,
        2,
        1,
        2,
        2,
        1,
        2,
        2,
        1,
        2,
        2,
        2,
        2,
        2,
        1,
        1,
        2,
        2,
        1,
        2,
        2,
        2,
        2,
        1,
        2,
        2,
        2,
        2,
        2,
        1,
        2,
        2,
        2,
        2,
        1,
    ]

    # Compute the total number of frames
    total_frames = sum(word2phone)
    print(word_level_feature.shape)
    print(word2phone)
    phone_level_feature = []
    for i in range(len(word2phone)):
        print(word_level_feature[i].shape)

        # Repeat each word's feature word2phone[i] times
        repeat_feature = word_level_feature[i].repeat(word2phone[i], 1)
        phone_level_feature.append(repeat_feature)

    phone_level_feature = torch.cat(phone_level_feature, dim=0)
    print(phone_level_feature.shape)  # torch.Size([65, 1024])


================================================
FILE: melo/text/chinese_mix.py
================================================
import os
import re

import cn2an
from pypinyin import lazy_pinyin, Style

# from text.symbols import punctuation
from .symbols import language_tone_start_map
from .tone_sandhi import ToneSandhi
from .english import g2p as g2p_en
from transformers import AutoTokenizer

punctuation = ["!", "?", "…", ",", ".", "'", "-"]
current_file_path = os.path.dirname(__file__)
pinyin_to_symbol_map = {
    line.split("\t")[0]: line.strip().split("\t")[1]
    for line in open(os.path.join(current_file_path, "opencpop-strict.txt")).readlines()
}

import jieba.posseg as psg


rep_map = {
    ":": ",",
    ";": ",",
    ",": ",",
    "。": ".",
    "!": "!",
    "?": "?",
    "\n": ".",
    "·": ",",
    "、": ",",
    "...": "…",
    "$": ".",
    "“": "'",
    "”": "'",
    "‘": "'",
    "’": "'",
    "(": "'",
    ")": "'",
    "(": "'",
    ")": "'",
    "《": "'",
    "》": "'",
    "【": "'",
    "】": "'",
    "[": "'",
    "]": "'",
    "—": "-",
    "~": "-",
    "~": "-",
    "「": "'",
    "」": "'",
}

tone_modifier = ToneSandhi()


def replace_punctuation(text):
    text = text.replace("嗯", "恩").replace("呣", "母")
    pattern = re.compile("|".join(re.escape(p) for p in rep_map.keys()))
    replaced_text = pattern.sub(lambda x: rep_map[x.group()], text)
    replaced_text = re.sub(r"[^\u4e00-\u9fa5_a-zA-Z\s" + "".join(punctuation) + r"]+", "", replaced_text)
    replaced_text = re.sub(r"[\s]+", " ", replaced_text)

    return replaced_text


def g2p(text, impl='v2'):
    pattern = r"(?<=[{0}])\s*".format("".join(punctuation))
    sentences = [i for i in re.split(pattern, text) if i.strip() != ""]
    if impl == 'v1':
        _func = _g2p
    elif impl == 'v2':
        _func = _g2p_v2
    else:
        raise NotImplementedError()
    phones, tones, word2ph = _func(sentences)
    assert sum(word2ph) == len(phones)
    # assert len(word2ph) == len(text)  # Sometimes this fails; wrap it in try/except if you re-enable it.
    phones = ["_"] + phones + ["_"]
    tones = [0] + tones + [0]
    word2ph = [1] + word2ph + [1]
    return phones, tones, word2ph


def _get_initials_finals(word):
    initials = []
    finals = []
    orig_initials = lazy_pinyin(word, neutral_tone_with_five=True, style=Style.INITIALS)
    orig_finals = lazy_pinyin(
        word, neutral_tone_with_five=True, style=Style.FINALS_TONE3
    )
    for c, v in zip(orig_initials, orig_finals):
        initials.append(c)
        finals.append(v)
    return initials, finals

model_id = 'bert-base-multilingual-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_id)
def _g2p(segments):
    phones_list = []
    tones_list = []
    word2ph = []
    for seg in segments:
        # Replace all English words in the sentence
        # seg = re.sub("[a-zA-Z]+", "", seg)
        seg_cut = psg.lcut(seg)
        initials = []
        finals = []
        seg_cut = tone_modifier.pre_merge_for_modify(seg_cut)
        for word, pos in seg_cut:
            if pos == "eng":
                initials.append(['EN_WORD'])
                finals.append([word])
            else:
                sub_initials, sub_finals = _get_initials_finals(word)
                sub_finals = tone_modifier.modified_tone(word, pos, sub_finals)
                initials.append(sub_initials)
                finals.append(sub_finals)

            # assert len(sub_initials) == len(sub_finals) == len(word)
        initials = sum(initials, [])
        finals = sum(finals, [])
        #
        for c, v in zip(initials, finals):
            if c == 'EN_WORD':
                tokenized_en = tokenizer.tokenize(v)
                phones_en, tones_en, word2ph_en = g2p_en(text=None, pad_start_end=False, tokenized=tokenized_en)
                # apply offset to tones_en
                tones_en = [t + language_tone_start_map['EN'] for t in tones_en]
                phones_list += phones_en
                tones_list += tones_en
                word2ph += word2ph_en
            else:
                raw_pinyin = c + v
                # NOTE: post process for pypinyin outputs
                # we discriminate i, ii and iii
                if c == v:
                    assert c in punctuation
                    phone = [c]
                    tone = "0"
                    word2ph.append(1)
                else:
                    v_without_tone = v[:-1]
                    tone = v[-1]

                    pinyin = c + v_without_tone
                    assert tone in "12345"

                    if c:
                        # multi-syllable (an initial is present)
                        v_rep_map = {
                            "uei": "ui",
                            "iou": "iu",
                            "uen": "un",
                        }
                        if v_without_tone in v_rep_map.keys():
                            pinyin = c + v_rep_map[v_without_tone]
                    else:
                        # single syllable (no initial)
                        pinyin_rep_map = {
                            "ing": "ying",
                            "i": "yi",
                            "in": "yin",
                            "u": "wu",
                        }
                        if pinyin in pinyin_rep_map.keys():
                            pinyin = pinyin_rep_map[pinyin]
                        else:
                            single_rep_map = {
                                "v": "yu",
                                "e": "e",
                                "i": "y",
                                "u": "w",
                            }
                            if pinyin[0] in single_rep_map.keys():
                                pinyin = single_rep_map[pinyin[0]] + pinyin[1:]

                    assert pinyin in pinyin_to_symbol_map.keys(), (pinyin, seg, raw_pinyin)
                    phone = pinyin_to_symbol_map[pinyin].split(" ")
                    word2ph.append(len(phone))

                phones_list += phone
                tones_list += [int(tone)] * len(phone)
    return phones_list, tones_list, word2ph


def text_normalize(text):
    numbers = re.findall(r"\d+(?:\.?\d+)?", text)
    for number in numbers:
        text = text.replace(number, cn2an.an2cn(number), 1)
    text = replace_punctuation(text)
    return text


def get_bert_feature(text, word2ph, device):
    from . import chinese_bert
    return chinese_bert.get_bert_feature(text, word2ph, model_id='bert-base-multilingual-uncased', device=device)

from .chinese import _g2p as _chinese_g2p
def _g2p_v2(segments):
    spliter = '#$&^!@'

    phones_list = []
    tones_list = []
    word2ph = []

    for text in segments:
        assert spliter not in text
        # replace all english words
        text = re.sub(r'([a-zA-Z\s]+)', lambda x: f'{spliter}{x.group(1)}{spliter}', text)
        texts = text.split(spliter)
        texts = [t for t in texts if len(t) > 0]

        
        for text in texts:
            if re.match(r'[a-zA-Z\s]+', text):
                # english
                tokenized_en = tokenizer.tokenize(text)
                phones_en, tones_en, word2ph_en = g2p_en(text=None, pad_start_end=False, tokenized=tokenized_en)
                # apply offset to tones_en
                tones_en = [t + language_tone_start_map['EN'] for t in tones_en]
                phones_list += phones_en
                tones_list += tones_en
                word2ph += word2ph_en
            else:
                phones_zh, tones_zh, word2ph_zh = _chinese_g2p([text])
                phones_list += phones_zh
                tones_list += tones_zh
                word2ph += word2ph_zh
    return phones_list, tones_list, word2ph

    

if __name__ == "__main__":
    # from text.chinese_bert import get_bert_feature

    text = r"NFT啊!chemistry 但是《原神》是由,米哈\游自主,  [研发]的一款全.新开放世界.冒险游戏"
    text = '我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。'
    text = '今天下午,我们准备去shopping mall购物,然后晚上去看一场movie。'
    text = '我们现在 also 能够 help 很多公司 use some machine learning 的 algorithms 啊!'
    text = text_normalize(text)
    print(text)
    phones, tones, word2ph = g2p(text, impl='v2')
    bert = get_bert_feature(text, word2ph, device='cuda:0')
    print(phones)
    import pdb; pdb.set_trace()


# # Example usage
# text = "这是一个示例文本:,你好!这是一个测试...."
# print(g2p_paddle(text))  # Output: 这是一个示例文本你好这是一个测试
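
The segmentation step inside `_g2p_v2` can be tried in isolation. The sketch below reuses the same sentinel string and regular expressions, but `split_mixed` and the `"EN"`/`"ZH"` labels are hypothetical names introduced only for illustration; it does not call the tokenizer or either G2P backend.

```python
import re

SPLITTER = "#$&^!@"  # same sentinel value as `spliter` in _g2p_v2


def split_mixed(text):
    """Split text into alternating Latin-letter and other runs (illustrative helper)."""
    assert SPLITTER not in text
    # Wrap every run of ASCII letters/whitespace in the sentinel, then split on it.
    marked = re.sub(r"([a-zA-Z\s]+)", lambda m: f"{SPLITTER}{m.group(1)}{SPLITTER}", text)
    parts = [p for p in marked.split(SPLITTER) if len(p) > 0]
    # A part that starts with a letter or whitespace is routed to the English branch.
    return [("EN" if re.match(r"[a-zA-Z\s]+", p) else "ZH", p) for p in parts]


print(split_mixed("我们 also 能够 help"))
```

Because the character class includes `\s`, the whitespace around an English word stays attached to that word's run, and a whitespace-only run between two Chinese words would also take the English branch.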


================================================
FILE: melo/text/cleaner.py
================================================
from . import chinese, japanese, english, chinese_mix, korean, french, spanish
from . import cleaned_text_to_sequence
import copy

language_module_map = {"ZH": chinese, "JP": japanese, "EN": english, 'ZH_MIX_EN': chinese_mix, 'KR': korean,
                    'FR': french, 'SP': spanish, 'ES': spanish}


def clean_text(text, language):
    language_module = language_module_map[language]
    norm_text = language_module.text_normalize(text)
    phones, tones, word2ph = language_module.g2p(norm_text)
    return norm_text, phones, tones, word2ph


def clean_text_bert(text, language, device=None):
    language_module = language_module_map[language]
    norm_text = language_module.text_normalize(text)
    phones, tones, word2ph = language_module.g2p(norm_text)
    
    word2ph_bak = copy.deepcopy(word2ph)
    for i in range(len(word2ph)):
        word2ph[i] = word2ph[i] * 2
    word2ph[0] += 1
    bert = language_module.get_bert_feature(norm_text, word2ph, device=device)
    
    return norm_text, phones, tones, word2ph_bak, bert


def text_to_sequence(text, language):
    norm_text, phones, tones, word2ph = clean_text(text, language)
    return cleaned_text_to_sequence(phones, tones, language)


if __name__ == "__main__":
    pass
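
The `word2ph` adjustment in `clean_text_bert` is easy to check by hand. The sketch below repeats only that arithmetic; `expand_word2ph` is a hypothetical helper name. The reading that the doubled counts correspond to a phone sequence with blank tokens interleaved is an assumption, since this file does not state the reason.

```python
import copy


def expand_word2ph(word2ph):
    """Return (original counts, counts doubled with +1 on the first entry)."""
    backup = copy.deepcopy(word2ph)
    doubled = [n * 2 for n in word2ph]
    doubled[0] += 1
    return backup, doubled


backup, doubled = expand_word2ph([1, 2, 3, 1])
print(backup, doubled)             # [1, 2, 3, 1] [3, 4, 6, 2]
print(sum(backup), sum(doubled))   # 7 15, i.e. 2 * 7 + 1
```

Note that `clean_text_bert` returns the untouched copy (`word2ph_bak`) to the caller and passes only the expanded list to `get_bert_feature`.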

================================================
FILE: melo/text/cleaner_multiling.py
================================================
"""Set of default text cleaners"""
# TODO: pick the cleaner for languages dynamically

import re

# Regular expression matching whitespace:
_whitespace_re = re.compile(r"\s+")

rep_map = {
    ":": ",",
    ";": ",",
    ",": ",",
    "。": ".",
    "!": "!",
    "?": "?",
    "\n": ".",
    "·": ",",
    "、": ",",
    "...": ".",
    "…": ".",
    "$": ".",
    "“": "'",
    "”": "'",
    "‘": "'",
    "’": "'",
    "(": "'",
    ")": "'",
    "(": "'",
    ")": "'",
    "《": "'",
    "》": "'",
    "【": "'",
    "】": "'",
    "[": "'",
    "]": "'",
    "—": "",
    "~": "-",
    "~": "-",
    "「": "'",
    "」": "'",
}

def replace_punctuation(text):
    pattern = re.compile("|".join(re.escape(p) for p in rep_map.keys()))
    replaced_text = pattern.sub(lambda x: rep_map[x.group()], text)
    return replaced_text

def lowercase(text):
    return text.lower()


def collapse_whitespace(text):
    return re.sub(_whitespace_re, " ", text).strip()

def remove_punctuation_at_begin(text):
    return re.sub(r'^[,.!?]+', '', text)

def remove_aux_symbols(text):
    text = re.sub(r"[\<\>\(\)\[\]\"\«\»\']+", "", text)
    return text


def replace_symbols(text, lang="en"):
    """Replace symbols based on the language tag.

    Args:
      text:
        Input text.
      lang:
        Language identifier. ex: "en", "fr", "pt", "ca".

    Returns:
      The modified text
      example:
        input args:
            text: "si l'avi cau, diguem-ho"
            lang: "ca"
        Output:
            text: "si lavi cau, diguemho"
    """
    text = text.replace(";", ",")
    text = text.replace("-", " ") if lang != "ca" else text.replace("-", "")
    text = text.replace(":", ",")
    if lang == "en":
        text = text.replace("&", " and ")
    elif lang == "fr":
        text = text.replace("&", " et ")
    elif lang == "pt":
        text = text.replace("&", " e ")
    elif lang == "ca":
        text = text.replace("&", " i ")
        text = text.replace("'", "")
    elif lang == "es":
        # Pad with spaces, as the other languages do, so "&" does not fuse neighbouring words.
        text = text.replace("&", " y ")
        text = text.replace("'", "")
    return text

def unicleaners(text, cased=False, lang='en'):
    """Basic multilingual cleaning pipeline, selected by `lang`. There is no need to
    expand abbreviations and numbers; the phonemizer already does that."""
    if not cased:
        text = lowercase(text)
    text = replace_punctuation(text)
    text = replace_symbols(text, lang=lang)
    text = remove_aux_symbols(text)
    text = remove_punctuation_at_begin(text)
    text = collapse_whitespace(text)
    text = re.sub(r'([^\.,!\?\-…])$', r'\1.', text)
    return text
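
The last substitution in `unicleaners` appends a period when the text does not already end in punctuation. A minimal sketch of just that step follows, with the same regular expression; `ensure_final_punctuation` is a hypothetical name used only here.

```python
import re


def ensure_final_punctuation(text):
    """Append '.' unless the text already ends in one of . , ! ? - …"""
    return re.sub(r"([^\.,!\?\-…])$", r"\1.", text)


print(ensure_final_punctuation("hello world"))  # hello world.
print(ensure_final_punctuation("hello!"))       # hello!
```

An empty string is left unchanged, because the pattern needs at least one character before the end of the text.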



================================================
FILE: melo/text/cmudict.rep
================================================
## Date:  August 8, 1998
##
## The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6] is Copyright 1998
## by Carnegie Mellon University. Use of this dictionary, for any research or
## commercial purpose, is completely unrestricted.  If you make use of or
## redistribute this material, we would appreciate acknowledgement of its
## origin.
##
## cmudict.0.6 is the fifth release of cmudict, first released as cmudict.0.1
## in September of 1993.  There was no generally available public release
## of version 0.5.
##
## See the README in this directory before you use this dictionary.
##
## Thanks to Bill Huggins at BBN; Bill Fisher at NIST; Alex Hauptman,
## Alex Rudnicky, Jack Mostow, Roni Rosenfeld, Richard Stern,
## Matthew Siegler, Kevin Lenzo, Maxine Eskenazi, Mosur Ravishankar,
## Eric Thayer, Kristie Seymore, and Raj Reddy at CMU; Lin Chase at
## LIMSI; Doug Paul at MIT Lincoln Labs; Ben Serridge at MIT SLS; Murray
## Spiegel at Bellcore; Tony Robinson at Cambridge UK; David Bowness of
## CAE Electronics Ltd. and CRIM; Stephen Hocking; Jerry Quinn at BNR
## Canada, and Marshal Midden for bringing to our attention problems and
## inadequacies with the first releases. Most special thanks to Bob Weide
## for all his work on prior versions of the dictionary.
##
## We welcome input from users and will continue to acknowledge such input
## in subsequent releases. If I failed to acknowledge your input in this
## release, please remind me and I will update these comments. If I failed to
## fix things that you brought to my attention, please remind me and have
## patience. If I actually fixed things that you brought to my attention and
## you appreciate it, I wouldn't mind a pat on the back.
##
## This version differs from previous releases of cmudict most significantly
## in the addition of new words from the common ARPA tasks for 1996 and 1997.
##
## There are undoubtedly still errors and inconsistencies in this dictionary
## so keep your eyes open for problems and mail them to me.
##
## We hope this dictionary is an improvement over cmudict.0.4.
##
## email: cmudict@cs.cmu.edu
## web:   http://www.speech.cs.cmu.edu/cgi-bin/cmudict
## ftp:   ftp://ftp.cs.cmu.edu/project/speech/dict/
##
## Thank you for your continued interest in the CMU Pronouncing
## Dictionary.  Further additions and improvements are planned
## for forthcoming releases.
##
!EXCLAMATION-POINT  EH2 K - S K L AH0 - M EY1 - SH AH0 N - P OY2 N T
"CLOSE-QUOTE  K L OW1 Z - K W OW1 T
"DOUBLE-QUOTE  D AH1 - B AH0 L - K W OW1 T
"END-OF-QUOTE  EH1 N - D AH0 V - K W OW1 T
"END-QUOTE  EH1 N D - K W OW1 T
"IN-QUOTES  IH1 N - K W OW1 T S
"QUOTE  K W OW1 T
"UNQUOTE  AH1 N - K W OW1 T
#SHARP-SIGN  SH AA1 R P - S AY1 N
%PERCENT  P ER0 - S EH1 N T
&AMPERSAND  AE1 M - P ER0 - S AE2 N D
'CAUSE  K AH0 Z
'COURSE  K AO1 R S
'EM  AH0 M
'END-INNER-QUOTE  EH1 N - D IH1 - N ER0 - K W OW1 T
'END-QUOTE  EH1 N D - K W OW1 T
'INNER-QUOTE  IH1 - N ER0 - K W OW1 T
'M  AH0 M
'N  AH0 N
'QUOTE  K W OW1 T
'S  EH1 S
'SINGLE-QUOTE  S IH1 NG - G AH0 L - K W OW1 T
'TIL  T IH1 L
'TIS  T IH1 Z
'TWAS  T W AH1 Z
(BEGIN-PARENS  B IH0 - G IH1 N - P ER0 - EH1 N Z
(IN-PARENTHESES  IH1 N - P ER0 - EH1 N - TH AH0 - S IY2 Z
(LEFT-PAREN  L EH1 F T - P ER0 - EH1 N
(OPEN-PARENTHESES  OW1 - P AH0 N - P ER0 - EH1 N - TH AH0 - S IY2 Z
(PAREN  P ER0 - EH1 N
(PARENS  P ER0 - EH1 N Z
(PARENTHESES  P ER0 - EH1 N - TH AH0 - S IY2 Z
)CLOSE-PAREN  K L OW1 Z - P ER0 - EH1 N
)CLOSE-PARENTHESES  K L OW1 Z - P ER0 - EH1 N - TH AH0 - S IY2 Z
)END-PAREN  EH1 N D - P ER0 - EH1 N
)END-PARENS  EH1 N D - P ER0 - EH1 N Z
)END-PARENTHESES  EH1 N D - P ER0 - EH1 N - TH AH0 - S IY2 Z
)END-THE-PAREN  EH1 N D - DH AH0 - P ER0 - EH1 N
)PAREN  P ER0 - EH1 N
)PARENS  P ER0 - EH1 N Z
)RIGHT-PAREN  R AY1 T - P ER0 - EH1 N
)RIGHT-PAREN(2)  R AY1 T - P EH1 - R AH0 N
)UN-PARENTHESES  AH1 N - P ER0 - EH1 N - TH AH0 - S IY1 Z
,COMMA  K AA1 - M AH0
-DASH  D AE1 SH
-HYPHEN  HH AY1 - F AH0 N
...ELLIPSIS  IH0 - L IH1 P - S IH0 S
.DECIMAL  D EH1 - S AH0 - M AH0 L
.DOT  D AA1 T
.FULL-STOP  F UH1 L - S T AA1 P
.PERIOD  P IH1 - R IY0 - AH0 D
.POINT  P OY1 N T
/SLASH  S L AE1 SH
0MALEFACTORS  M AE1 - L AH0 - F AE2 K - T ER0 Z
:COLON  K OW1 - L AH0 N
;SEMI-COLON  S EH1 - M IY0 - K OW1 - L AH0 N
;SEMI-COLON(2)  S EH1 - M IH0 - K OW2 - L AH0 N
?QUESTION-MARK  K W EH1 S - CH AH0 N - M AA1 R K
A  AH0
A'S  EY1 Z
A(2)  EY1
A.  EY1
A.'S  EY1 Z
A.S  EY1 Z
A42128  EY1 - F AO1 R - T UW1 - W AH1 N - T UW1 - EY1 T
AAA  T R IH2 - P AH0 - L EY1
AABERG  AA1 - B ER0 G
AACHEN  AA1 - K AH0 N
AAKER  AA1 - K ER0
AALSETH  AA1 L - S EH0 TH
AAMODT  AA1 - M AH0 T
AANCOR  AA1 N - K AO2 R
AARDEMA  AA0 R - D EH1 - M AH0
AARDVARK  AA1 R D - V AA2 R K
AARON  EH1 - R AH0 N
AARON'S  EH1 - R AH0 N Z
AARONS  EH1 - R AH0 N Z
AARONSON  EH1 - R AH0 N - S AH0 N
AARONSON'S  EH1 - R AH0 N - S AH0 N Z
AARONSON'S(2)  AA1 - R AH0 N - S AH0 N Z
AARONSON(2)  AA1 - R AH0 N - S AH0 N
AARTI  AA1 R - T IY2
AASE  AA1 S
AASEN  AA1 - S AH0 N
AB  AE1 B
AB(2)  EY1 - B IY1
ABABA  AH0 - B AA1 - B AH0
ABABA(2)  AA1 - B AH0 - B AH0
ABACHA  AE1 - B AH0 - K AH0
ABACK  AH0 - B AE1 K
ABACO  AE1 - B AH0 - K OW2
ABACUS  AE1 - B AH0 - K AH0 S
ABAD  AH0 - B AA1 D
ABADAKA  AH0 - B AE1 - D AH0 - K AH0
ABADI  AH0 - B AE1 - D IY0
ABADIE  AH0 - B AE1 - D IY0
ABAIR  AH0 - B EH1 R
ABALKIN  AH0 - B AA1 L - K IH0 N
ABALONE  AE2 - B AH0 - L OW1 - N IY0
ABALOS  AA0 - B AA1 - L OW0 Z
ABANDON  AH0 - B AE1 N - D AH0 N
ABANDONED  AH0 - B AE1 N - D AH0 N D
ABANDONING  AH0 - B AE1 N - D AH0 - N IH0 NG
ABANDONMENT  AH0 - B AE1 N - D AH0 N - M AH0 N T
ABANDONMENTS  AH0 - B AE1 N - D AH0 N - M AH0 N T S
ABANDONS  AH0 - B AE1 N - D AH0 N Z
ABANTO  AH0 - B AE1 N - T OW0
ABARCA  AH0 - B AA1 R - K AH0
ABARE  AA0 - B AA1 - R IY0
ABASCAL  AE1 - B AH0 S - K AH0 L
ABASH  AH0 - B AE1 SH
ABASHED  AH0 - B AE1 SH T
ABATE  AH0 - B EY1 T
ABATED  AH0 - B EY1 - T IH0 D
ABATEMENT  AH0 - B EY1 T - M AH0 N T
ABATEMENTS  AH0 - B EY1 T - M AH0 N T S
ABATES  AH0 - B EY1 T S
ABATING  AH0 - B EY1 - T IH0 NG
ABBA  AE1 - B AH0
ABBADO  AH0 - B AA1 - D OW0
ABBAS  AH0 - B AA1 S
ABBASI  AA0 - B AA1 - S IY0
ABBATE  AA1 - B EY0 T
ABBATIELLO  AA0 - B AA0 - T IY0 - EH1 - L OW0
ABBE  AE1 - B IY0
ABBE(2)  AE0 - B EY1
ABBENHAUS  AE1 - B AH0 N - HH AW2 S
ABBETT  AH0 - B EH1 T
ABBEVILLE  AE1 B - V IH0 L
ABBEY  AE1 - B IY0
ABBEY'S  AE1 - B IY0 Z
ABBIE  AE1 - B IY0
ABBITT  AE1 - B IH0 T
ABBOT  AE1 - B AH0 T
ABBOTT  AE1 - B AH0 T
ABBOTT'S  AE1 - B AH0 T S
ABBOUD  AH0 - B UW1 D
ABBOUD(2)  AH0 - B AW1 D
ABBREVIATE  AH0 - B R IY1 - V IY0 - EY2 T
ABBREVIATED  AH0 - B R IY1 - V IY0 - EY2 - T AH0 D
ABBREVIATED(2)  AH0 - B R IY1 - V IY0 - EY2 - T IH0 D
ABBREVIATES  AH0 - B R IY1 - V IY0 - EY2 T S
ABBREVIATING  AH0 - B R IY1 - V IY0 - EY2 - T IH0 NG
ABBREVIATION  AH0 - B R IY2 - V IY0 - EY1 - SH AH0 N
ABBREVIATIONS  AH0 - B R IY2 - V IY0 - EY1 - SH AH0 N Z
ABBRUZZESE  AA0 - B R UW0 T - S EY1 - Z IY0
ABBS  AE1 B Z
ABBY  AE1 - B IY0
ABCO  AE1 B - K OW0
ABCOTEK  AE1 B - K OW0 - T EH2 K
ABDALLA  AE2 B - D AE1 - L AH0
ABDALLAH  AE2 B - D AE1 - L AH0
ABDEL  AE1 B - D EH2 L
ABDELLA  AE2 B - D EH1 - L AH0
ABDICATE  AE1 B - D AH0 - K EY2 T
ABDICATED  AE1 B - D AH0 - K EY2 - T AH0 D
ABDICATES  AE1 B - D AH0 - K EY2 T S
ABDICATING  AE1 B - D IH0 - K EY2 - T IH0 NG
ABDICATION  AE2 B - D IH0 - K EY1 - SH AH0 N
ABDNOR  AE1 B D - N ER0
ABDO  AE1 B - D OW0
ABDOLLAH  AE2 B - D AA1 - L AH0
ABDOMEN  AE0 B - D OW1 - M AH0 N
ABDOMEN(2)  AE1 B - D AH0 - M AH0 N
ABDOMINAL  AE0 B - D AA1 - M AH0 - N AH0 L
ABDOMINAL(2)  AH0 B - D AA1 - M AH0 - N AH0 L
ABDUCT  AE0 B - D AH1 K T
ABDUCTED  AE0 B - D AH1 K - T IH0 D
ABDUCTED(2)  AH0 B - D AH1 K - T IH0 D
ABDUCTEE  AE0 B - D AH2 K - T IY1
ABDUCTEES  AE0 B - D AH2 K - T IY1 Z
ABDUCTING  AE0 B - D AH1 K - T IH0 NG
ABDUCTING(2)  AH0 B - D AH1 K - T IH0 NG
ABDUCTION  AE0 B - D AH1 K - SH AH0 N
ABDUCTION(2)  AH0 B - D AH1 K - SH AH0 N
ABDUCTIONS  AE0 B - D AH1 K - SH AH0 N Z
ABDUCTIONS(2)  AH0 B - D AH1 K - SH AH0 N Z
ABDUCTOR  AE0 B - D AH1 K - T ER0
ABDUCTOR(2)  AH0 B - D AH1 K - T ER0
ABDUCTORS  AE0 B - D AH1 K - T ER0 Z
ABDUCTORS(2)  AH0 B - D AH1 K - T ER0 Z
ABDUCTS  AE0 B - D AH1 K T S
ABDUL  AE0 B - D UW1 L
ABDULAZIZ  AE0 B - D UW2 - L AH0 - Z IY1 Z
ABDULLA  AA0 B - D UW1 - L AH0
ABDULLAH  AE2 B - D AH1 - L AH0
ABE  EY1 B
ABED  AH0 - B EH1 D
ABEDI  AH0 - B EH1 - D IY0
ABEE  AH0 - B IY1
ABEL  EY1 - B AH0 L
ABELA  AA0 - B EH1 - L AH0
ABELARD  AE1 - B IH0 - L ER0 D
ABELE  AH0 - B IY1 L
ABELES  AH0 - B IY1 L Z
ABELES(2)  EY1 - B AH0 - L IY2 Z
ABELL  EY1 - B AH0 L
ABELLA  AH0 - B EH1 - L AH0
ABELN  AE1 - B IH0 L N
ABELOW  AE1 - B AH0 - L OW0
ABELS  EY1 - B AH0 L Z
ABELSON  AE1 - B IH0 L - S AH0 N
ABEND  AE1 - B EH0 N D
ABEND(2)  AH0 - B EH1 N D
ABENDROTH  AE1 - B IH0 N - D R AO0 TH
ABER  EY1 - B ER0
ABERCROMBIE  AE2 - B ER0 - K R AA1 M - B IY0
ABERDEEN  AE1 - B ER0 - D IY2 N
ABERFORD  EY1 - B ER0 - F ER0 D
ABERG  AE1 - B ER0 G
ABERLE  AE1 - B ER0 - AH0 L
ABERLE(2)  AE1 - B ER0 L
ABERMIN  AE1 - B ER0 - M IH0 N
ABERNATHY  AE1 - B ER0 - N AE2 - TH IY0
ABERNETHY  AE1 - B ER0 - N EH2 - TH IY0
ABERRANT  AE0 - B EH1 - R AH0 N T
ABERRATION  AE2 - B ER0 - EY1 - SH AH0 N
ABERRATIONAL  AE2 - B ER0 - EY1 - SH AH0 - N AH0 L
ABERRATIONS  AE2 - B ER0 - EY1 - SH AH0 N Z
ABERT  AE1 - B ER0 T
ABET  AH0 - B EH1 T
ABETTED  AH0 - B EH1 - T IH0 D
ABETTING  AH0 - B EH1 - T IH0 NG
ABEX  EY1 - B EH0 K S
ABEYANCE  AH0 - B EY1 - AH0 N S
ABEYTA  AA0 - B EY1 - T AH0
ABHOR  AE0 B - HH AO1 R
ABHORRED  AH0 B - HH AO1 R D
ABHORRENCE  AH0 B - HH AO1 - R AH0 N S
ABHORRENT  AE0 B - HH AO1 - R AH0 N T
ABHORS  AH0 B - HH AO1 R Z
ABID  EY1 - B IH0 D
ABIDE  AH0 - B AY1 D
ABIDED  AH0 - B AY1 - D IH0 D
ABIDES  AH0 - B AY1 D Z
ABIDING  AH0 - B AY1 - D IH0 NG
ABIE  AE1 - B IY0
ABIGAIL  AE1 - B AH0 - G EY2 L
ABILA  AA0 - B IY1 - L AH0
ABILENE  AE1 - B IH0 - L IY2 N
ABILITIES  AH0 - B IH1 - L AH0 - T IY0 Z
ABILITY  AH0 - B IH1 - L AH0 - T IY0
ABINGTON  AE1 - B IH0 NG - T AH0 N
ABIO  AA1 - B IY0 - OW0
ABIOLA  AA2 - B IY0 - OW1 - L AH0
ABIOLA'S  AA2 - B IY0 - OW1 - L AH0 Z
ABIOMED  EY0 - B IY1 - AH0 - M EH0 D
ABITIBI  AE2 - B IH0 - T IY1 - B IY0
ABITZ  AE1 - B IH0 T S
ABJECT  AE1 B - JH EH0 K T
ABKHAZIA  AE0 B K - HH AA1 - Z Y AH0
ABKHAZIA(2)  AE0 B K - HH AE1 - Z Y AH0
ABKHAZIAN  AE0 B K - HH AA1 - Z IY0 - AH0 N
ABKHAZIAN(2)  AE0 B K - HH AE1 - Z IY0 - AH0 N
ABKHAZIAN(3)  AE0 B K - HH AA1 - Z Y AH0 N
ABKHAZIAN(4)  AE0 B K - HH AE1 - Z Y AH0 N
ABKHAZIANS  AE0 B K - HH AA1 - Z IY0 - AH0 N Z
ABKHAZIANS(2)  AE0 B K - HH AE1 - Z IY0 - AH0 N Z
ABLAZE  AH0 - B L EY1 Z
ABLE  EY1 - B AH0 L
ABLED  EY1 - B AH0 L D
ABLER  EY1 - B AH0 L - ER0
ABLER(2)  EY1 - B L ER0
ABLES  EY1 - B AH0 L Z
ABLEST  EY1 - B AH0 L S T
ABLEST(2)  EY1 - B L AH0 S T
ABLOOM  AH0 - B L UW1 M
ABLY  EY1 - B L IY0
ABNER  AE1 B - N ER0
ABNEY  AE1 B - N IY0
ABNORMAL  AE0 B - N AO1 R - M AH0 L
ABNORMALITIES  AE2 B - N AO0 R - M AE1 - L AH0 - T IY0 Z
ABNORMALITY  AE2 B - N AO0 R - M AE1 - L AH0 - T IY0
ABNORMALLY  AE0 B - N AO1 R - M AH0 - L IY0
ABO  AA1 - B OW0
ABO'S  AA1 - B OW0 Z
ABOARD  AH0 - B AO1 R D
ABODE  AH0 - B OW1 D
ABOLISH  AH0 - B AA1 - L IH0 SH
ABOLISHED  AH0 - B AA1 - L IH0 SH T
ABOLISHES  AH0 - B AA1 - L IH0 - SH IH0 Z
ABOLISHING  AH0 - B AA1 - L IH0 - SH IH0 NG
ABOLITION  AE2 - B AH0 - L IH1 - SH AH0 N
ABOLITIONISM  AE2 - B AH0 - L IH1 - SH AH0 - N IH2 - Z AH0 M
ABOLITIONIST  AE2 - B AH0 - L IH1 - SH AH0 - N AH0 S T
ABOLITIONISTS  AE2 - B AH0 - L IH1 - SH AH0 - N AH0 S T S
ABOLITIONISTS(2)  AE2 - B AH0 - L IH1 - SH AH0 - N AH0 S S
ABOLITIONISTS(3)  AE2 - B AH0 - L IH1 - SH AH0 - N AH0 S
ABOMINABLE  AH0 - B AA1 - M AH0 - N AH0 - B AH0 L
ABOMINATION  AH0 - B AA2 - M AH0 - N EY1 - SH AH0 N
ABOOD  AH0 - B UW1 D
ABOODI  AH0 - B UW1 - D IY0
ABORIGINAL  AE2 - B ER0 - IH1 - JH AH0 - N AH0 L
ABORIGINE  AE2 - B ER0 - IH1 - JH AH0 - N IY0
ABORIGINES  AE2 - B ER0 - IH1 - JH AH0 - N IY0 Z
ABORN  AH0 - B AO1 R N
ABORT  AH0 - B AO1 R T
ABORTED  AH0 - B AO1 R - T IH0 D
ABORTIFACIENT  AH0 - B AO2 R - T AH0 - F EY1 - SH AH0 N T
ABORTIFACIENTS  AH0 - B AO2 R - T AH0 - F EY1 - SH AH0 N T S
ABORTING  AH0 - B AO1 R - T IH0 NG
ABORTION  AH0 - B AO1 R - SH AH0 N
ABORTIONIST  AH0 - B AO1 R - SH AH0 N - IH0 S T
ABORTIONISTS  AH0 - B AO1 R - SH AH0 N - IH0 S T S
ABORTIONISTS(2)  AH0 - B AO1 R - SH AH0 N - IH0 S S
ABORTIONISTS(3)  AH0 - B AO1 R - SH AH0 N - IH0 S
ABORTIONS  AH0 - B AO1 R - SH AH0 N Z
ABORTIVE  AH0 - B AO1 R - T IH0 V
ABOTT  AH0 - B AA1 T
ABOU  AH0 - B UW1
ABOUD  AA0 - B UW1 D
ABOUHALIMA  AA2 - B UW0 - HH AA0 - L IY1 - M AH0
ABOUHALIMA'S  AA2 - B UW0 - HH AA0 - L IY1 - M AH0 Z
ABOUND  AH0 - B AW1 N D
ABOUNDED  AH0 - B AW1 N - D IH0 D
ABOUNDING  AH0 - B AW1 N - D IH0 NG
ABOUNDS  AH0 - B AW1 N D Z
ABOUT  AH0 - B AW1 T
ABOUT'S  AH0 - B AW1 T S
ABOVE  AH0 - B AH1 V
ABOVE'S  AH0 - B AH1 V Z
ABOVEBOARD  AH0 - B AH1 V - B AO2 R D
ABPLANALP  AE1 B - P L AH0 - N AE0 L P
ABRA  AA1 - B R AH0
ABRACADABRA  AE2 - B R AH0 - K AH0 - D AE1 - B R AH0
ABRAHAM  EY1 - B R AH0 - HH AE2 M
ABRAHAMIAN  AE2 - B R AH0 - HH EY1 - M IY0 - AH0 N
ABRAHAMS  EY1 - B R AH0 - HH AE2 M Z
ABRAHAMSEN  AE0 - B R AH0 - HH AE1 M - S AH0 N
ABRAHAMSON  AH0 - B R AE1 - HH AH0 M - S AH0 N
ABRAM  AH0 - B R AE1 M
ABRAMCZYK  AA1 - B R AH0 M - CH IH0 K
ABRAMO  AA0 - B R AA1 - M OW0
ABRAMOVITZ  AH0 - B R AA1 - M AH0 - V IH0 T S
ABRAMOWICZ  AH0 - B R AA1 - M AH0 - V IH0 CH
ABRAMOWITZ  AH0 - B R AA1 - M AH0 - W IH0 T S
ABRAMS  EY1 - B R AH0 M Z
ABRAMSON  EY1 - B R AH0 M - S AH0 N
ABRASION  AH0 - B R EY1 - ZH AH0 N
ABRASIONS  AH0 - B R EY1 - ZH AH0 N Z
ABRASIVE  AH0 - B R EY1 - S IH0 V
ABRASIVES  AH0 - B R EY1 - S IH0 V Z
ABREAST  AH0 - B R EH1 S T
ABREGO  AA0 - B R EH1 - G OW0
ABREU  AH0 - B R UW1
ABRIDGE  AH0 - B R IH1 JH
ABRIDGED  AH0 - B R IH1 JH D
ABRIL  AH0 - B R IH1 L
ABROAD  AH0 - B R AO1 D
ABROGATE  AE1 - B R AH0 - G EY2 T
ABROGATED  AE1 - B R AH0 - G EY2 - T IH0 D
ABROGATING  AE1 - B R AH0 - G EY2 - T IH0 NG
ABROGATION  AE2 - B R AH0 - G EY1 - SH AH0 N
ABRON  AH0 - B R AA1 N
ABRUPT  AH0 - B R AH1 P T
ABRUPTLY  AH0 - B R AH1 P T - L IY0
ABRUPTNESS  AH0 - B R AH1 P T - N AH0 S
ABRUTYN  EY1 - B R UW0 - T IH0 N
ABRUZZESE  AA0 - B R UW0 T - S EY1 - Z IY0
ABRUZZO  AA0 - B R UW1 - Z OW0
ABS  EY1 - B IY1 - EH1 S
ABS(2)  AE1 B Z
ABSALOM  AE1 B - S AH0 - L AH0 M
ABSCAM  AE1 B - S K AE0 M
ABSCESS  AE1 B - S EH2 S
ABSENCE  AE1 B - S AH0 N S
ABSENCES  AE1 B - S AH0 N - S IH0 Z
ABSENT  AE1 B - S AH0 N T
ABSENTEE  AE2 B - S AH0 N - T IY1
ABSENTEEISM  AE2 B - S AH0 N - T IY1 - IH0 - Z AH0 M
ABSENTEES  AE2 B - S AH0 N - T IY1 Z
ABSENTIA  AE0 B - S EH1 N - SH AH0
ABSHER  AE1 B - SH ER0
ABSHIER  AE1 B - SH IY0 - ER0
ABSHIRE  AE1 B - SH AY2 R
ABSO  AE1 B - S OW0
ABSOLOM  AE1 B - S AH0 - L AH0 M
ABSOLUT  AE2 B - S AH0 - L UW1 T
ABSOLUTE  AE1 B - S AH0 - L UW2 T
ABSOLUTELY  AE2 B - S AH0 - L UW1 T - L IY0
ABSOLUTENESS  AE1 B - S AH0 - L UW2 T - N AH0 S
ABSOLUTES  AE1 B - S AH0 - L UW2 T S
ABSOLUTION  AE2 B - S AH0 - L UW1 - SH AH0 N
ABSOLUTISM  AE1 B - S AH0 - L UW2 - T IH2 - Z AH0 M
ABSOLUTIST  AE0 B - S IH0 - L UW1 - T IH0 S T
ABSOLVE  AH0 B - Z AA1 L V
ABSOLVE(2)  AE0 B - Z AA1 L V
ABSOLVED  AH0 B - Z AA1 L V D
ABSOLVED(2)  AE0 B - Z AA1 L V D
ABSOLVES  AH0 B - Z AA1 L V Z
ABSOLVES(2)  AE0 B - Z AA1 L V Z
ABSOLVING  AH0 B - Z AA1 L - V IH0 NG
ABSOLVING(2)  AE0 B - Z AA1 L - V IH0 NG
ABSORB  AH0 B - Z AO1 R B
ABSORBED  AH0 B - Z AO1 R B D
ABSORBENCY  AH0 B - Z AO1 R - B AH0 N - S IY0
ABSORBENT  AH0 B - Z AO1 R - B AH0 N T
ABSORBER  AH0 B - Z AO1 R - B ER0
ABSORBERS  AH0 B - Z AO1 R - B ER0 Z
ABSORBING  AH0 B - Z AO1 R - B IH0 NG
ABSORBS  AH0 B - Z AO1 R B Z
ABSORPTION  AH0 B - Z AO1 R P - SH AH0 N
ABSORPTION(2)  AH0 B - S AO1 R P - SH AH0 N
ABSTAIN  AH0 B - S T EY1 N
ABSTAIN(2)  AE0 B - S T EY1 N
ABSTAINED  AH0 B - S T EY1 N D
ABSTAINED(2)  AE0 B - S T EY1 N D
ABSTAINING  AH0 B - S T EY1 - N IH0 NG
ABSTAINING(2)  AE0 B - S T EY1 - N IH0 NG
ABSTENTION  AH0 B - S T EH1 N - CH AH0 N
ABSTENTION(2)  AE0 B - S T EH1 N - CH AH0 N
ABSTENTIONS  AH0 B - S T EH1 N - CH AH0 N Z
ABSTENTIONS(2)  AE0 B - S T EH1 N - CH AH0 N Z
ABSTINENCE  AE1 B - S T AH0 - N AH0 N S
ABSTINENT  AE1 B - S T AH0 - N AH0 N T
ABSTON  AE1 B - S T AH0 N
ABSTRACT  AE0 B - S T R AE1 K T
ABSTRACT(2)  AE1 B - S T R AE2 K T
ABSTRACTED  AE1 B - S T R AE2 K - T IH0 D
ABSTRACTION  AE0 B - S T R AE1 K - SH AH0 N
ABSTRACTIONS  AE0 B - S T R AE1 K - SH AH0 N Z
ABSTRACTS  AE1 B - S T R AE0 K T S
ABSTRUSE  AH0 B - S T R UW1 S
ABSURD  AH0 B - S ER1 D
ABSURDIST  AH0 B - S ER1 - D IH0 S T
ABSURDITIES  AH0 B - S ER1 - D AH0 - T IY0 Z
ABSURDITY  AH0 B - S ER1 - D AH0 - T IY0
ABSURDLY  AH0 B - S ER1 D - L IY0
ABT  AE1 B T
ABT(2)  EY1 - B IY1 - T IY1
ABTS  AE1 B T S
ABTS(2)  EY1 - B IY1 - T IY1 Z
ABTS(3)  EY1 - B IY1 - T IY1 - EH1 S
ABU  AE1 - B UW0
ABUDRAHM  AH0 - B AH1 - D R AH0 M
ABULADZE  AE2 - B Y UW0 - L AE1 D - Z IY0
ABUNDANCE  AH0 - B AH1 N - D AH0 N S
ABUNDANT  AH0 - B AH1 N - D AH0 N T
ABUNDANTLY  AH0 - B AH1 N - D AH0 N T - L IY0
ABURTO  AH0 - B UH1 R - T OW2
ABURTO'S  AH0 - B UH1 R - T OW2 Z
ABUSE  AH0 - B Y UW1 S
ABUSE(2)  AH0 - B Y UW1 Z
ABUSED  AH0 - B Y UW1 Z D
ABUSER  AH0 - B Y UW1 - Z ER0
ABUSERS  AH0 - B Y UW1 - Z ER0 Z
ABUSES  AH0 - B Y UW1 - S IH0 Z
ABUSES(2)  AH0 - B Y UW1 - Z IH0 Z
ABUSING  AH0 - B Y UW1 - Z IH0 NG
ABUSIVE  AH0 - B Y UW1 - S IH0 V
ABUT  AH0 - B AH1 T
ABUTS  AH0 - B AH1 T S
ABUTTED  AH0 - B AH1 - T AH0 D
ABUTTING  AH0 - B AH1 - T IH0 NG
ABUZZ  AH0 - B AH1 Z
ABYSMAL  AH0 - B IH1 Z - M AH0 L
ABYSMALLY  AH0 - B IH1 Z - M AH0 - L IY0
ABYSS  AH0 - B IH1 S
ABZUG  AE1 B - Z AH2 G
ABZUG(2)  AE1 B - Z UH2 G
AC  EY1 - S IY1
ACA  AE1 - K AH0
ACACIA  AH0 - K EY1 - SH AH0
ACADEME  AE1 - K AH0 - D IY2 M
ACADEMIA  AE2 - K AH0 - D IY1 - M IY0 - AH0
ACADEMIC  AE2 - K AH0 - D EH1 - M IH0 K
ACADEMICALLY  AE2 - K AH0 - D EH1 - M IH0 K - L IY0
ACADEMICIAN  AE2 - K AH0 - D AH0 - M IH1 - SH AH0 N
ACADEMICIANS  AE2 - K AH0 - D AH0 - M IH1 - SH AH0 N Z
ACADEMICIANS(2)  AH0 - K AE2 - D AH0 - M IH1 - SH AH0 N Z
ACADEMICS  AE2 - K AH0 - D EH1 - M IH0 K S
ACADEMIES  AH0 - K AE1 - D AH0 - M IY0 Z
ACADEMY  AH0 - K AE1 - D AH0 - M IY0
ACADEMY'S  AH0 - K AE1 - D AH0 - M IY0 Z
ACADIA  AH0 - K EY1 - D IY0 - AH0
ACAMPORA  AH0 - K AE1 M - P ER0 - AH0
ACANTHA  AA0 - K AA1 N - DH AH0
ACAPULCO  AE2 - K AH0 - P UH1 L - K OW0
ACCARDI  AA0 - K AA1 R - D IY0
ACCARDO  AA0 - K AA1 R - D OW0
ACCEDE  AE0 K - S IY1 D
ACCEDED  AE0 K - S IY1 - D IH0 D
ACCEDES  AE0 K - S IY1 D Z
ACCEDING  AE0 K - S IY1 - D IH0 NG
ACCEL  AH0 K - S EH1 L
ACCELERANT  AE0 K - S EH1 - L ER0 - AH0 N T
ACCELERANTS  AE0 K - S EH1 - L ER0 - AH0 N T S
ACCELERATE  AE0 K - S EH1 - L ER0 - EY2 T
ACCELERATED  AE0 K - S EH1 - L ER0 - EY2 - T IH0 D
ACCELERATES  AE0 K - S EH1 - L ER0 - EY2 T S
ACCELERATING  AE0 K - S EH1 - L ER0 - EY2 - T IH0 NG
ACCELERATION  AE2 K - S EH2 - L ER0 - EY1 - SH AH0 N
ACCELERATOR  AE0 K - S EH1 - L ER0 - EY2 - T ER0
ACCELEROMETER  AE0 K - S EH2 - L ER0 - AA1 - M AH0 - T ER0
ACCELEROMETERS  AE0 K - S EH2 - L ER0 - AA1 - M AH0 - T ER0 Z
ACCENT  AH0 K - S EH1 N T
ACCENT(2)  AE1 K - S EH2 N T
ACCENTED  AE1 K - S EH0 N - T IH0 D
ACCENTING  AE1 K - S EH0 N - T IH0 NG
ACCENTS  AE1 K - S EH0 N T S
ACCENTUATE  AE0 K - S EH1 N - CH UW0 - EY0 T
ACCENTUATED  AE0 K - S EH1 N - CH AH0 W - EY2 - T IH0 D
ACCENTUATES  AE0 K - S EH1 N - CH UW0 - EY0 T S
ACCENTUATING  AE0 K - S EH1 N - CH AH0 W - EY2 - T IH0 NG
ACCEPT  AE0 K - S EH1 P T
ACCEPT(2)  AH0 K - S EH1 P T
ACCEPTABILITY  AH0 K - S EH2 P - T AH0 - B IH1 - L AH0 - T IY0
ACCEPTABLE  AE0 K - S EH1 P - T AH0 - B AH0 L
ACCEPTABLE(2)  AH0 K - S EH1 P - T AH0 - B AH0 L
ACCEPTANCE  AE0 K - S EH1 P - T AH0 N S
ACCEPTANCE(2)  AH0 K - S EH1 P - T AH0 N S
ACCEPTANCES  AE0 K - S EH1 P - T AH0 N - S IH0 Z
ACCEPTED  AE0 K - S EH1 P - T IH0 D
ACCEPTED(2)  AH0 K - S EH1 P - T AH0 D
ACCEPTING  AE0 K - S EH1 P - T IH0 NG
ACCEPTING(2)  AH0 K - S EH1 P - T IH0 NG
ACCEPTS  AE0 K - S EH1 P T S
ACCESS  AE1 K - S EH2 S
ACCESSED  AE1 K - S EH2 S T
ACCESSIBILITY  AE2 K - S EH0 - S AH0 - B IH1 - L IH0 - T IY0
ACCESSIBLE  AE0 K - S EH1 - S AH0 - B AH0 L
ACCESSING  AE1 K - S EH2 - S IH0 NG
ACCESSION  AH0 K - S EH1 - SH AH0 N
ACCESSORIES  AE0 K - S EH1 - S ER0 - IY0 Z
ACCESSORIZE  AE0 K - S EH1 - S ER0 - AY2 Z
ACCESSORIZED  AE0 K - S EH1 - S ER0 - AY2 Z D
ACCESSORY  AE0 K - S EH1 - S ER0 - IY0
ACCETTA  AA0 - CH EH1 - T AH0
ACCIDENT  AE1 K - S AH0 - D AH0 N T
ACCIDENT'S  AE1 K - S AH0 - D AH0 N T S
ACCIDENTAL  AE2 K - S AH0 - D EH1 N - T AH0 L
ACCIDENTAL(2)  AE2 K - S AH0 - D EH1 - N AH0 L
ACCIDENTALLY  AE2 K - S AH0 - D EH1 N - T AH0 - L IY0
ACCIDENTALLY(2)  AE2 K - S AH0 - D EH1 - N AH0 - L IY0
ACCIDENTLY  AE1 K - S AH0 - D AH0 N T - L IY0
ACCIDENTS  AE1 K - S AH0 - D AH0 N T S
ACCION  AE1 - CH IY0 - AH0 N
ACCIVAL  AE1 - S IH0 - V AA2 L
ACCLAIM  AH0 - K L EY1 M
ACCLAIMED  AH0 - K L EY1 M D
ACCLAIMING  AH0 - K L EY1 - M IH0 NG
ACCLIMATE  AE1 - K L AH0 - M EY2 T
ACCLIMATED  AE1 - K L AH0 - M EY2 - T IH0 D
ACCLIMATION  AE2 - K L AH0 - M EY1 - SH AH0 N
ACCO  AE1 - K OW0
ACCOLA  AA0 - K OW1 - L AH0
ACCOLADE  AE1 - K AH0 - L EY2 D
ACCOLADES  AE1 - K AH0 - L EY2 D Z
ACCOMANDO  AA0 - K OW0 - M AA1 N - D OW0
ACCOMMODATE  AH0 - K AA1 - M AH0 - D EY2 T
ACCOMMODATED  AH0 - K AA1 - M AH0 - D EY2 - T AH0 D
ACCOMMODATES  AH0 - K AA1 - M AH0 - D EY2 T S
ACCOMMODATING  AH0 - K AA1 - M AH0 - D EY2 - T IH0 NG
ACCOMMODATION  AH0 - K AA2 - M AH0 - D EY1 - SH AH0 N
ACCOMMODATIONS  AH0 - K AA2 - M AH0 - D EY1 - SH AH0 N Z
ACCOMMODATIVE  AH0 - K AA1 - M AH0 - D EY2 - T IH0 V
ACCOMPANIED  AH0 - K AH1 M - P AH0 - N IY0 D
ACCOMPANIES  AH0 - K AH1 M - P AH0 - N IY0 Z
ACCOMPANIMENT  AH0 - K AH1 M P - N IH0 - M AH0 N T
ACCOMPANIMENT(2)  AH0 - K AH1 M P - N IY0 - M AH0 N T
ACCOMPANIMENTS  AH0 - K AH1 M P - N IH0 - M AH0 N T S
ACCOMPANIMENTS(2)  AH0 - K AH1 M P - N IY0 - M AH0 N T S
ACCOMPANIST  AH0 - K AH1 M - P AH0 - N AH0 S T
ACCOMPANY  AH0 - K AH1 M - P AH0 - N IY0
ACCOMPANYING  AH0 - K AH1 M - P AH0 - N IY0 - IH0 NG
ACCOMPLI  AA2 - K AA1 M - P L IY0
ACCOMPLI(2)  AH0 - K AA1 M - P L IY0
ACCOMPLICE  AH0 - K AA1 M - P L AH0 S
ACCOMPLICES  AH0 - K AA1 M - P L AH0 - S AH0 Z
ACCOMPLISH  AH0 - K AA1 M - P L IH0 SH
ACCOMPLISHED  AH0 - K AA1 M - P L IH0 SH T
ACCOMPLISHES  AH0 - K AA1 M - P L IH0 - SH IH0 Z
ACCOMPLISHING  AH0 - K AA1 M - P L IH0 - SH IH0 NG
ACCOMPLISHMENT  AH0 - K AA1 M - P L IH0 SH - M AH0 N T
ACCOMPLISHMENTS  AH0 - K AA1 M - P L IH0 SH - M AH0 N T S
ACCOR  AE1 - K AO2 R
ACCOR'S  AE1 - K ER0 Z
ACCORD  AH0 - K AO1 R D
ACCORD'S  AH0 - K AO1 R D Z
ACCORDANCE  AH0 - K AO1 R - D AH0 N S
ACCORDED  AH0 - K AO1 R - D IH0 D
ACCORDING  AH0 - K AO1 R - D IH0 NG
ACCORDINGLY  AH0 - K AO1 R - D IH0 NG - L IY0
ACCORDION  AH0 - K AO1 R - D IY0 - AH0 N
ACCORDIONS  AH0 - K AO1 R - D IY0 - AH0 N Z
ACCORDS  AH0 - K AO1 R D Z
ACCOST  AH0 - K AO1 S T
ACCOSTED  AH0 - K AA1 - S T AH0 D
ACCOSTING  AH0 - K AA1 - S T IH0 NG
ACCOUNT  AH0 - K AW1 N T
ACCOUNT'S  AH0 - K AW1 N T S
ACCOUNTABILITY  AH0 - K AW1 N - T AH0 - B IH0 - L IH0 - T IY0
ACCOUNTABILITY(2)  AH0 - K AW1 - N AH0 - B IH0 - L IH0 - T IY0
ACCOUNTABLE  AH0 - K AW1 N - T AH0 - B AH0 L
ACCOUNTABLE(2)  AH0 - K AW1 - N AH0 - B AH0 L
ACCOUNTANCY  AH0 - K AW1 N - T AH0 N - S IY0
ACCOUNTANT  AH0 - K AW1 N - T AH0 N T
ACCOUNTANT'S  AH0 - K AW1 N - T AH0 N T S
ACCOUNTANTS  AH0 - K AW1 N - T AH0 N T S
ACCOUNTANTS'  AH0 - K AW1 N - T AH0 N T S
ACCOUNTED  AH0 - K AW1 N - T AH0 D
ACCOUNTED(2)  AH0 - K AW1 - N AH0 D
ACCOUNTEMP  AH0 - K AW1 N - T EH2 M P
ACCOUNTEMPS  AH0 - K AW1 N - T EH2 M P S
ACCOUNTING  AH0 - K AW1 N - T IH0 NG
ACCOUNTING(2)  AH0 - K AW1 - N IH0 NG
ACCOUNTS  AH0 - K AW1 N T S
ACCOUTERMENT  AH0 - K UW1 - T ER0 - M AH0 N T
ACCOUTERMENTS  AH0 - K UW1 - T ER0 - M AH0 N T S
ACCREDIT  AH0 - K R EH2 - D AH0 T
ACCREDITATION  AH0 - K R EH2 - D AH0 - T EY1 - SH AH0 N
ACCREDITATIONS  AH0 - K R EH2 - D AH0 - D EY1 - SH AH0 N Z
ACCREDITED  AH0 - K R EH1 - D IH0 - T IH0 D
ACCREDITING  AH0 - K R EH1 - D AH0 - T IH0 NG
ACCRETION  AH0 - K R IY1 - SH AH0 N
ACCRUAL  AH0 - K R UW1 - AH0 L
ACCRUALS  AH0 - K R UW1 - AH0 L Z
ACCRUE  AH0 - K R UW1
ACCRUED  AH0 - K R UW1 D
ACCRUES  AH0 - K R UW1 Z
ACCRUING  AH0 - K R UW1 - IH0 NG
ACCUMULATE  AH0 - K Y UW1 - M Y AH0 - L EY2 T
ACCUMULATED  AH0 - K Y UW1 - M Y AH0 - L EY2 - T IH0 D
ACCUMULATES  AH0 - K Y UW1 - M Y AH0 - L EY2 T S
ACCUMULATING  AH0 - K Y UW1 - M Y AH0 - L EY2 - T IH0 NG
ACCUMULATION  AH0 - K Y UW2 - M Y AH0 - L EY1 - SH AH0 N
ACCUMULATIONS  AH0 - K Y UW2 - M Y AH0 - L EY1 - SH AH0 N Z
ACCUMULATIVE  AH0 - K Y UW1 - M Y AH0 - L EY2 - T IH0 V
ACCUMULATIVELY  AH0 - K Y UW1 - M Y AH0 - L EY2 - T IH0 V - L IY0
ACCUMULATIVELY(2)  AH0 - K Y UW1 - M Y AH0 - L AH0 - T IH0 V - L IY0
ACCUMULATOR  AH0 - K Y UW1 - M Y AH0 - L EY2 - T ER0
ACCUMULATORS  AH0 - K Y UW1 - M Y AH0 - L EY2 - T ER0 Z
ACCURACIES  AE1 - K Y ER0 - AH0 - S IY0 Z
ACCURACY  AE1 - K Y ER0 - AH0 - S IY0
ACCURATE  AE1 - K Y ER0 - AH0 T
ACCURATELY  AE1 - K Y ER0 - AH0 T - L IY0
ACCURAY  AE1 - K Y ER0 - EY2
ACCURAY'S  AE1 - K Y ER0 - EY2 Z
ACCURIDE  AE1 - K Y ER0 - AY2 D
ACCURSO  AA0 - K UH1 R - S OW0
ACCUSATION  AE2 - K Y AH0 - Z EY1 - SH AH0 N
ACCUSATION(2)  AE2 - K Y UW0 - Z EY1 - SH AH0 N
ACCUSATIONS  AE2 - K Y AH0 - Z EY1 - SH AH0 N Z
ACCUSATIONS(2)  AE2 - K Y UW0 - Z EY1 - SH AH0 N Z
ACCUSATIVE  AH0 - K Y UW1 - Z AH0 - T IH0 V
ACCUSATORY  AH0 - K Y UW1 - Z AH0 - T AO2 - R IY0
ACCUSE  AH0 - K Y UW1 Z
ACCUSED  AH0 - K Y UW1 Z D
ACCUSER  AH0 - K Y UW1 - Z ER0
ACCUSERS  AH0 - K Y UW1 - Z ER0 Z
ACCUSES  AH0 - K Y UW1 - Z IH0 Z
ACCUSING  AH0 - K Y UW1 - Z IH0 NG
ACCUSINGLY  AH0 - K Y UW1 - Z IH0 NG - L IY0
ACCUSTOM  AH0 - K AH1 - S T AH0 M
ACCUSTOMED  AH0 - K AH1 - S T AH0 M D
ACCUTANE  AE1 - K Y UW0 - T EY2 N
ACE  EY1 S
ACED  EY1 S T
ACER  EY1 - S ER0
ACERBIC  AH0 - S EH1 R - B IH0 K
ACERO  AH0 - S EH1 - R OW0
ACERRA  AH0 - S EH1 - R AH0
ACES  EY1 - S IH0 Z
ACETAMINOPHEN  AH0 - S IY2 - T AH0 - M IH1 - N AH0 - F AH0 N
ACETATE  AE1 - S AH0 - T EY2 T
ACETIC  AH0 - S EH1 - T IH0 K
ACETIC(2)  AH0 - S IY1 - T IH0 K
ACETO  AA0 - S EH1 - T OW0
ACETONE  AE1 - S AH0 - T OW2 N
ACETYLCHOLINE  AH0 - S EH2 - T AH0 L - K OW1 - L IY0 N
ACETYLCHOLINE(2)  AH0 - S IY2 - T AH0 L - K OW1 - L IY0 N
ACETYLENE  AH0 - S EH1 - T AH0 - L IY2 N
ACEVEDO  AE0 - S AH0 - V EY1 - D OW0
ACEVES  AA0 - S EY1 - V EH0 S
ACEY  EY1 - S IY0
ACHATZ  AE1 - K AH0 T S
ACHE  EY1 K
ACHEBE  AA0 - CH EY1 - B IY0
ACHEE  AH0 - CH IY1
ACHENBACH  AE1 - K IH0 N - B AA0 K
ACHENBAUM  AE1 - K AH0 N - B AW2 M
ACHES  EY1 K S
ACHESON  AE1 - CH AH0 - S AH0 N
ACHEY  AE1 - CH IY0
ACHIEVABLE  AH0 - CH IY1 - V AH0 - B AH0 L
ACHIEVE  AH0 - CH IY1 V
ACHIEVED  AH0 - CH IY1 V D
ACHIEVEMENT  AH0 - CH IY1 V - M AH0 N T
ACHIEVEMENTS  AH0 - CH IY1 V - M AH0 N T S
ACHIEVER  AH0 - CH IY1 - V ER0
ACHIEVERS  AH0 - CH IY1 - V ER0 Z
ACHIEVES  AH0 - CH IY1 V Z
ACHIEVING  AH0 - CH IY1 - V IH0 NG
ACHILLE  AH0 - K IH1 - L IY0
ACHILLES  AH0 - K IH1 - L IY0 Z
ACHILLES'  AH0 - K IH1 - L IY0 Z
ACHING  EY1 - K IH0 NG
ACHMED  AA1 HH - M EH0 D
ACHOA  AH0 - CH OW1 - AH0
ACHOA'S  AH0 - CH OW1 - AH0 Z
ACHOR  EY1 - K ER0
ACHORD  AE1 - K AO0 R D
ACHORN  AE1 - K ER0 N
ACHTENBERG  AE1 K - T EH0 N - B ER0 G
ACHTERBERG  AE1 K - T ER0 - B ER0 G
ACHY  EY1 - K IY0
ACID  AE1 - S AH0 D
ACIDIC  AH0 - S IH1 - D IH0 K
ACIDIFICATION  AH0 - S IH2 - D AH0 - F AH0 - K EY1 - SH AH0 N
ACIDIFIED  AH0 - S IH1 - D AH0 - F AY2 D
ACIDIFIES  AH0 - S IH1 - D AH0 - F AY2 Z
ACIDIFY  AH0 - S IH1 - D AH0 - F AY2
ACIDITY  AH0 - S IH1 - D AH0 - T IY0
ACIDLY  AE1 - S AH0 D - L IY0
ACIDOSIS  AE2 - S AH0 - D OW1 - S AH0 S
ACIDS  AE1 - S AH0 D Z
ACIDURIA  AE2 - S AH0 - D UH1 - R IY0 - AH0
ACIERNO  AA0 - S IH1 R - N OW0
ACK  AE1 K
ACKER  AE1 - K ER0
ACKER'S  AE1 - K ER0 Z
ACKERLEY  AE1 - K ER0 - L IY0
ACKERLY  AE1 - K ER0 - L IY0
ACKERMAN  AE1 - K ER0 - M AH0 N
ACKERMANN  AE1 - K ER0 - M AH0 N
ACKERSON  AE1 - K ER0 - S AH0 N
ACKERT  AE1 - K ER0 T
ACKHOUSE  AE1 K - HH AW2 S
ACKLAND  AE1 K - L AH0 N D
ACKLES  AE1 - K AH0 L Z
ACKLEY  AE1 K - L IY0
ACKLIN  AE1 - K L IH0 N
ACKMAN  AE1 K - M AH0 N
ACKNOWLEDGE  AE0 K - N AA1 - L IH0 JH
ACKNOWLEDGE(2)  IH0 K - N AA1 - L IH0 JH
ACKNOWLEDGEABLE  AE0 K - N AA1 - L IH0 - JH AH0 - B AH0 L
ACKNOWLEDGEABLE(2)  IH0 K - N AA1 - L IH0 - JH AH0 - B AH0 L
ACKNOWLEDGED  AE0 K - N AA1 - L IH0 JH D
ACKNOWLEDGED(2)  IH0 K - N AA1 - L IH0 JH D
ACKNOWLEDGEMENT  AE0 K - N AA1 - L IH0 JH - M AH0 N T
ACKNOWLEDGEMENT(2)  IH0 K - N AA1 - L IH0 JH - M AH0 N T
ACKNOWLEDGES  AE0 K - N AA1 - L IH0 - JH IH0 Z
ACKNOWLEDGES(2)  IH0 K - N AA1 - L IH0 - JH IH0 Z
ACKNOWLEDGING  AE0 K - N AA1 - L IH0 - JH IH0 NG
ACKNOWLEDGING(2)  IH0 K - N AA1 - L IH0 - JH IH0 NG
ACKNOWLEDGMENT  AE0 K - N AA1 - L IH0 JH - M AH0 N T
ACKNOWLEDGMENT(2)  IH0 K - N AA1 - L IH0 JH - M AH0 N T
ACKROYD  AE1 - K R OY2 D
ACKROYD'S  AE1 - K R OY2 D Z
ACMAT  AE1 K - M AE0 T
ACMAT'S  AE1 K - M AE0 T S
ACME  AE1 K - M IY0
ACME'S  AE1 K - M IY0 Z
ACNE  AE1 K - N IY0
ACOCELLA  AA0 - K OW0 - CH EH1 - L AH0
ACOFF  AE1 - K AO0 F
ACOG  AH0 - K AO1 G
ACOLYTE  AE1 - K AH0 - L AY2 T
ACOLYTES  AE1 - K AH0 - L AY2 T S
ACORD  AH0 - K AO1 R D
ACORN  EY1 - K AO0 R N
ACORNS  EY1 - K AO0 R N Z
ACOSTA  AH0 - K AO1 - S T AH0
ACOUSTIC  AH0 - K UW1 - S T IH0 K
ACOUSTICAL  AH0 - K UW1 - S T IH0 - K AH0 L
ACOUSTICALLY  AH0 - K UW1 - S T IH0 K - L IY0
ACOUSTICS  AH0 - K UW1 - S T IH0 K S
ACQUAINT  AH0 - K W EY1 N T
ACQUAINTANCE  AH0 - K W EY1 N - T AH0 N S
ACQUAINTANCES  AH0 - K W EY1 N - T AH0 N - S IH0 Z
ACQUAINTANCESHIP  AH0 - K W EY1 N - T AH0 N S - SH IH0 P
ACQUAINTED  AH0 - K W EY1 N - T IH0 D
ACQUAINTED(2)  AH0 - K W EY1 - N IH0 D
ACQUAVIVA  AA0 - K W AA0 - V IY1 - V AH0
ACQUIESCE  AE2 - K W IY0 - EH1 S
ACQUIESCED  AE2 - K W IY0 - EH1 S T
ACQUIESCENCE  AE2 - K W IY0 - EH1 - S AH0 N S
ACQUIESCING  AE2 - K W IY0 - EH1 - S IH0 NG
ACQUIRE  AH0 - K W AY1 - ER0
ACQUIRED  AH0 - K W AY1 - ER0 D
ACQUIRER  AH0 - K W AY1 - ER0 - ER0
ACQUIRERS  AH0 - K W AY1 - ER0 - ER0 Z
ACQUIRES  AH0 - K W AY1 - ER0 Z
ACQUIRING  AH0 - K W AY1 - R IH0 NG
ACQUIRING(2)  AH0 - K W AY1 - ER0 - IH0 NG
ACQUISITION  AE2 - K W AH0 - Z IH1 - SH AH0 N
ACQUISITION'S  AE2 - K W AH0 - Z IH1 - SH AH0 N Z
ACQUISITIONS  AE2 - K W AH0 - Z IH1 - SH AH0 N Z
ACQUISITIVE  AH0 - K W IH1 - Z AH0 - T IH0 V
ACQUIT  AH0 - K W IH1 T
ACQUITAINE  AE1 - K W IH0 - T EY2 N
ACQUITS  AH0 - K W IH1 T S
ACQUITTAL  AH0 - K W IH1 - T AH0 L
ACQUITTALS  AH0 - K W IH1 - T AH0 L Z
ACQUITTED  AH0 - K W IH1 - T AH0 D
ACQUITTED(2)  AH0 - K W IH1 - T IH0 D
ACQUITTING  AH0 - K W IH1 - T IH0 NG
ACRE  EY1 - K ER0
ACREAGE  EY1 - K ER0 - IH0 JH
ACREAGE(2)  EY1 - K R AH0 JH
ACREE  AH0 - K R IY1
ACRES  EY1 - K ER0 Z
ACREY  AE1 - K R IY0
ACRI  AA1 - K R IY0
ACRID  AE1 - K R IH0 D
ACRIMONIOUS  AE2 - K R AH0 - M OW1 - N IY0 - AH0 S
ACRIMONY  AE1 - K R IH0 - M OW2 - N IY0
ACROBAT  AE1 - K R AH0 - B AE2 T
ACROBATIC  AE2 - K R AH0 - B AE1 - T IH0 K
ACROBATICS  AE2 - K R AH0 - B AE1 - T IH0 K S
ACROBATS  AE1 - K R AH0 - B AE2 T S
ACRONYM  AE1 - K R AH0 - N IH0 M
ACRONYMS  AE1 - K R AH0 - N IH0 M Z
ACROPOLIS  AH0 - K R AA1 - P AH0 - L AH0 S
ACROSS  AH0 - K R AO1 S
ACRYLIC  AH0 - K R IH1 - L IH0 K
ACRYLICS  AH0 - K R IH1 - L IH0 K S
ACT  AE1 K T
ACT'S  AE1 K T S
ACTAVA  AE2 K - T AA1 - V AH0
ACTED  AE1 K - T AH0 D
ACTED(2)  AE1 K - T IH0 D
ACTIGALL  AE1 K - T IH0 - G AO0 L
ACTIN  AE1 K - T AH0 N
ACTING  AE1 K - T IH0 NG
ACTINIDE  AE1 K - T IH0 - N AY2 D
ACTINIDIA  AE2 K - T IH0 - N IH1 - D IY0 - AH0
ACTION  AE1 K - SH AH0 N
ACTION'S  AE1 K - SH AH0 N Z
ACTIONABLE  AE1 K - SH AH0 N - AH0 - B AH0 L
ACTIONS  AE1 K - SH AH0 N Z
ACTIVASE  AE1 K - T IH0 - V EY2 Z
ACTIVATE  AE1 K - T AH0 - V EY2 T
ACTIVATED  AE1 K - T AH0 - V EY2 - T AH0 D
ACTIVATED(2)  AE1 K - T IH0 - V EY2 - T IH0 D
ACTIVATES  AE1 K - T AH0 - V EY2 T S
ACTIVATING  AE1 K - T AH0 - V EY2 - T IH0 NG
ACTIVATION  AE2 K - T AH0 - V EY1 - SH AH0 N
ACTIVATOR  AE1 K - T AH0 - V EY2 - T ER0
ACTIVE  AE1 K - T IH0 V
ACTIVELY  AE1 K - T IH0 V - L IY0
ACTIVES  AE1 K - T IH0 V Z
ACTIVISION  AE1 K - T IH0 - V IH2 - ZH AH0 N
ACTIVISM  AE1 K - T IH0 - V IH2 - Z AH0 M
ACTIVIST  AE1 K - T AH0 - V AH0 S T
ACTIVIST(
SYMBOL INDEX (402 symbols across 49 files)

FILE: melo/api.py
  class TTS (line 20) | class TTS(nn.Module):
    method __init__ (line 21) | def __init__(self,
    method audio_numpy_concat (line 66) | def audio_numpy_concat(segment_data_list, sr, speed=1.):
    method split_sentences_into_pieces (line 75) | def split_sentences_into_pieces(text, language, quiet=False):
    method tts_to_file (line 83) | def tts_to_file(self, text, speaker_id, output_path=None, sdp_ratio=0....

FILE: melo/app.py
  function synthesize (line 31) | def synthesize(speaker, text, speed, language, progress=gr.Progress()):
  function load_speakers (line 35) | def load_speakers(language, text):
  function main (line 57) | def main(share, host, port):

FILE: melo/attentions.py
  class LayerNorm (line 12) | class LayerNorm(nn.Module):
    method __init__ (line 13) | def __init__(self, channels, eps=1e-5):
    method forward (line 21) | def forward(self, x):
  function fused_add_tanh_sigmoid_multiply (line 28) | def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
  class Encoder (line 37) | class Encoder(nn.Module):
    method __init__ (line 38) | def __init__(
    method forward (line 98) | def forward(self, x, x_mask, g=None):
  class Decoder (line 118) | class Decoder(nn.Module):
    method __init__ (line 119) | def __init__(
    method forward (line 178) | def forward(self, x, x_mask, h, h_mask):
  class MultiHeadAttention (line 204) | class MultiHeadAttention(nn.Module):
    method __init__ (line 205) | def __init__(
    method forward (line 258) | def forward(self, x, c, attn_mask=None):
    method attention (line 268) | def attention(self, query, key, value, mask=None):
    method _matmul_with_relative_values (line 319) | def _matmul_with_relative_values(self, x, y):
    method _matmul_with_relative_keys (line 328) | def _matmul_with_relative_keys(self, x, y):
    method _get_relative_embeddings (line 337) | def _get_relative_embeddings(self, relative_embeddings, length):
    method _relative_position_to_absolute_position (line 355) | def _relative_position_to_absolute_position(self, x):
    method _absolute_position_to_relative_position (line 376) | def _absolute_position_to_relative_position(self, x):
    method _attention_bias_proximal (line 392) | def _attention_bias_proximal(self, length):
  class FFN (line 404) | class FFN(nn.Module):
    method __init__ (line 405) | def __init__(
    method forward (line 433) | def forward(self, x, x_mask):
    method _causal_padding (line 443) | def _causal_padding(self, x):
    method _same_padding (line 452) | def _same_padding(self, x):

FILE: melo/commons.py
  function init_weights (line 6) | def init_weights(m, mean=0.0, std=0.01):
  function get_padding (line 12) | def get_padding(kernel_size, dilation=1):
  function convert_pad_shape (line 16) | def convert_pad_shape(pad_shape):
  function intersperse (line 22) | def intersperse(lst, item):
  function kl_divergence (line 28) | def kl_divergence(m_p, logs_p, m_q, logs_q):
  function rand_gumbel (line 37) | def rand_gumbel(shape):
  function rand_gumbel_like (line 43) | def rand_gumbel_like(x):
  function slice_segments (line 48) | def slice_segments(x, ids_str, segment_size=4):
  function rand_slice_segments (line 57) | def rand_slice_segments(x, x_lengths=None, segment_size=4):
  function get_timing_signal_1d (line 67) | def get_timing_signal_1d(length, channels, min_timescale=1.0, max_timesc...
  function add_timing_signal_1d (line 83) | def add_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4):
  function cat_timing_signal_1d (line 89) | def cat_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4, axis...
  function subsequent_mask (line 95) | def subsequent_mask(length):
  function fused_add_tanh_sigmoid_multiply (line 101) | def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
  function convert_pad_shape (line 110) | def convert_pad_shape(pad_shape):
  function shift_1d (line 116) | def shift_1d(x):
  function sequence_mask (line 121) | def sequence_mask(length, max_length=None):
  function generate_path (line 128) | def generate_path(duration, mask):
  function clip_grad_value_ (line 145) | def clip_grad_value_(parameters, clip_value, norm_type=2):

FILE: melo/data_utils.py
  class TextAudioSpeakerLoader (line 17) | class TextAudioSpeakerLoader(torch.utils.data.Dataset):
    method __init__ (line 24) | def __init__(self, audiopaths_sid_text, hparams):
    method _filter (line 53) | def _filter(self):
    method get_audio_text_speaker_pair (line 94) | def get_audio_text_speaker_pair(self, audiopath_sid_text):
    method get_audio (line 107) | def get_audio(self, filename):
    method get_text (line 150) | def get_text(self, text, word2ph, phone, tone, language_str, wav_path):
    method get_sid (line 189) | def get_sid(self, sid):
    method __getitem__ (line 193) | def __getitem__(self, index):
    method __len__ (line 196) | def __len__(self):
  class TextAudioSpeakerCollate (line 200) | class TextAudioSpeakerCollate:
    method __init__ (line 203) | def __init__(self, return_ids=False):
    method __call__ (line 206) | def __call__(self, batch):
  class DistributedBucketSampler (line 285) | class DistributedBucketSampler(torch.utils.data.distributed.DistributedS...
    method __init__ (line 295) | def __init__(
    method _create_buckets (line 314) | def _create_buckets(self):
    method __iter__ (line 346) | def __iter__(self):
    method _bisect (line 397) | def _bisect(self, x, lo=0, hi=None):
    method __len__ (line 412) | def __len__(self):

FILE: melo/download_utils.py
  function load_or_download_config (line 44) | def load_or_download_config(locale, use_hf=True, config_path=None):
  function load_or_download_model (line 55) | def load_or_download_model(locale, device, use_hf=True, ckpt_path=None):
  function load_pretrain_model (line 66) | def load_pretrain_model():

FILE: melo/infer.py
  function main (line 12) | def main(ckpt_path, text, language, output_dir):

FILE: melo/losses.py
  function feature_loss (line 4) | def feature_loss(fmap_r, fmap_g):
  function discriminator_loss (line 15) | def discriminator_loss(disc_real_outputs, disc_generated_outputs):
  function generator_loss (line 31) | def generator_loss(disc_outputs):
  function kl_loss (line 43) | def kl_loss(z_p, logs_q, m_p, logs_p, z_mask):

FILE: melo/main.py
  function main (line 14) | def main(text, file, output_path, language, speaker, speed, device):

FILE: melo/mel_processing.py
  function dynamic_range_compression_torch (line 9) | def dynamic_range_compression_torch(x, C=1, clip_val=1e-5):
  function dynamic_range_decompression_torch (line 18) | def dynamic_range_decompression_torch(x, C=1):
  function spectral_normalize_torch (line 27) | def spectral_normalize_torch(magnitudes):
  function spectral_de_normalize_torch (line 32) | def spectral_de_normalize_torch(magnitudes):
  function spectrogram_torch (line 41) | def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, cente...
  function spectrogram_torch_conv (line 79) | def spectrogram_torch_conv(y, n_fft, sampling_rate, hop_size, win_size, ...
  function spec_to_mel_torch (line 118) | def spec_to_mel_torch(spec, n_fft, num_mels, sampling_rate, fmin, fmax):
  function mel_spectrogram_torch (line 132) | def mel_spectrogram_torch(

FILE: melo/models.py
  class DurationDiscriminator (line 17) | class DurationDiscriminator(nn.Module):  # vits2
    method __init__ (line 18) | def __init__(
    method forward_probability (line 53) | def forward_probability(self, x, x_mask, dur, g=None):
    method forward (line 69) | def forward(self, x, x_mask, dur_r, dur_hat, g=None):
  class TransformerCouplingBlock (line 91) | class TransformerCouplingBlock(nn.Module):
    method __init__ (line 92) | def __init__(
    method forward (line 147) | def forward(self, x, x_mask, g=None, reverse=False):
  class StochasticDurationPredictor (line 157) | class StochasticDurationPredictor(nn.Module):
    method __init__ (line 158) | def __init__(
    method forward (line 206) | def forward(self, x, x_mask, w=None, g=None, reverse=False, noise_scal...
  class DurationPredictor (line 268) | class DurationPredictor(nn.Module):
    method __init__ (line 269) | def __init__(
    method forward (line 294) | def forward(self, x, x_mask, g=None):
  class TextEncoder (line 311) | class TextEncoder(nn.Module):
    method __init__ (line 312) | def __init__(
    method forward (line 360) | def forward(self, x, x_lengths, tone, language, bert, ja_bert, g=None):
  class ResidualCouplingBlock (line 384) | class ResidualCouplingBlock(nn.Module):
    method __init__ (line 385) | def __init__(
    method forward (line 419) | def forward(self, x, x_mask, g=None, reverse=False):
  class PosteriorEncoder (line 429) | class PosteriorEncoder(nn.Module):
    method __init__ (line 430) | def __init__(
    method forward (line 459) | def forward(self, x, x_lengths, g=None, tau=1.0):
  class Generator (line 471) | class Generator(torch.nn.Module):
    method __init__ (line 472) | def __init__(
    method forward (line 519) | def forward(self, x, g=None):
    method remove_weight_norm (line 540) | def remove_weight_norm(self):
  class DiscriminatorP (line 548) | class DiscriminatorP(torch.nn.Module):
    method __init__ (line 549) | def __init__(self, period, kernel_size=5, stride=3, use_spectral_norm=...
    method forward (line 605) | def forward(self, x):
  class DiscriminatorS (line 627) | class DiscriminatorS(torch.nn.Module):
    method __init__ (line 628) | def __init__(self, use_spectral_norm=False):
    method forward (line 643) | def forward(self, x):
  class MultiPeriodDiscriminator (line 657) | class MultiPeriodDiscriminator(torch.nn.Module):
    method __init__ (line 658) | def __init__(self, use_spectral_norm=False):
    method forward (line 668) | def forward(self, y, y_hat):
  class ReferenceEncoder (line 684) | class ReferenceEncoder(nn.Module):
    method __init__ (line 690) | def __init__(self, spec_channels, gin_channels=0, layernorm=False):
    method forward (line 724) | def forward(self, inputs, mask=None):
    method calculate_channels (line 746) | def calculate_channels(self, L, kernel_size, stride, pad, n_convs):
  class SynthesizerTrn (line 752) | class SynthesizerTrn(nn.Module):
    method __init__ (line 757) | def __init__(
    method forward (line 888) | def forward(self, x, x_lengths, y, y_lengths, sid, tone, language, ber...
    method infer (line 966) | def infer(
    method voice_conversion (line 1023) | def voice_conversion(self, y, y_lengths, sid_src, sid_tgt, tau=1.0):

FILE: melo/modules.py
  class LayerNorm (line 17) | class LayerNorm(nn.Module):
    method __init__ (line 18) | def __init__(self, channels, eps=1e-5):
    method forward (line 26) | def forward(self, x):
  class ConvReluNorm (line 32) | class ConvReluNorm(nn.Module):
    method __init__ (line 33) | def __init__(
    method forward (line 74) | def forward(self, x, x_mask):
  class DDSConv (line 84) | class DDSConv(nn.Module):
    method __init__ (line 89) | def __init__(self, channels, kernel_size, n_layers, p_dropout=0.0):
    method forward (line 118) | def forward(self, x, x_mask, g=None):
  class WN (line 133) | class WN(torch.nn.Module):
    method __init__ (line 134) | def __init__(
    method forward (line 185) | def forward(self, x, x_mask, g=None, **kwargs):
    method remove_weight_norm (line 212) | def remove_weight_norm(self):
  class ResBlock1 (line 221) | class ResBlock1(torch.nn.Module):
    method __init__ (line 222) | def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5)):
    method forward (line 296) | def forward(self, x, x_mask=None):
    method remove_weight_norm (line 311) | def remove_weight_norm(self):
  class ResBlock2 (line 318) | class ResBlock2(torch.nn.Module):
    method __init__ (line 319) | def __init__(self, channels, kernel_size=3, dilation=(1, 3)):
    method forward (line 347) | def forward(self, x, x_mask=None):
    method remove_weight_norm (line 358) | def remove_weight_norm(self):
  class Log (line 363) | class Log(nn.Module):
    method forward (line 364) | def forward(self, x, x_mask, reverse=False, **kwargs):
  class Flip (line 374) | class Flip(nn.Module):
    method forward (line 375) | def forward(self, x, *args, reverse=False, **kwargs):
  class ElementwiseAffine (line 384) | class ElementwiseAffine(nn.Module):
    method __init__ (line 385) | def __init__(self, channels):
    method forward (line 391) | def forward(self, x, x_mask, reverse=False, **kwargs):
  class ResidualCouplingLayer (line 402) | class ResidualCouplingLayer(nn.Module):
    method __init__ (line 403) | def __init__(
    method forward (line 437) | def forward(self, x, x_mask, g=None, reverse=False):
  class ConvFlow (line 459) | class ConvFlow(nn.Module):
    method __init__ (line 460) | def __init__(
    method forward (line 486) | def forward(self, x, x_mask, g=None, reverse=False):
  class TransformerCouplingLayer (line 519) | class TransformerCouplingLayer(nn.Module):
    method __init__ (line 520) | def __init__(
    method forward (line 562) | def forward(self, x, x_mask, g=None, reverse=False):

FILE: melo/monotonic_align/__init__.py
  function maximum_path (line 7) | def maximum_path(neg_cent, mask):

FILE: melo/monotonic_align/core.py
  function maximum_path_jit (line 14) | def maximum_path_jit(paths, values, t_ys, t_xs):

FILE: melo/preprocess_text.py
  function main (line 30) | def main(

FILE: melo/split_utils.py
  function split_sentence (line 9) | def split_sentence(text, min_len=10, language_str='EN'):
  function split_sentences_latin (line 17) | def split_sentences_latin(text, min_len=10):
  function split_sentences_zh (line 26) | def split_sentences_zh(text, min_len=10):
  function merge_short_sentences_en (line 51) | def merge_short_sentences_en(sens):
  function merge_short_sentences_zh (line 77) | def merge_short_sentences_zh(sens):
  function txtsplit (line 105) | def txtsplit(text, desired_length=100, max_length=200):

FILE: melo/text/__init__.py
  function cleaned_text_to_sequence (line 7) | def cleaned_text_to_sequence(cleaned_text, tones, language, symbol_to_id...
  function get_bert (line 23) | def get_bert(norm_text, word2ph, language, device):

FILE: melo/text/chinese.py
  function replace_punctuation (line 55) | def replace_punctuation(text):
  function g2p (line 68) | def g2p(text):
  function _get_initials_finals (line 80) | def _get_initials_finals(word):
  function _g2p (line 93) | def _g2p(segments):
  function text_normalize (line 171) | def text_normalize(text):
  function get_bert_feature (line 179) | def get_bert_feature(text, word2ph, device=None):

FILE: melo/text/chinese_bert.py
  function get_bert_feature (line 13) | def get_bert_feature(text, word2ph, device=None, model_id='hfl/chinese-r...

FILE: melo/text/chinese_mix.py
  function replace_punctuation (line 59) | def replace_punctuation(text):
  function g2p (line 69) | def g2p(text, impl='v2'):
  function _get_initials_finals (line 87) | def _get_initials_finals(word):
  function _g2p (line 101) | def _g2p(segments):
  function text_normalize (line 189) | def text_normalize(text):
  function get_bert_feature (line 197) | def get_bert_feature(text, word2ph, device):
  function _g2p_v2 (line 202) | def _g2p_v2(segments):

FILE: melo/text/cleaner.py
  function clean_text (line 9) | def clean_text(text, language):
  function clean_text_bert (line 16) | def clean_text_bert(text, language, device=None):
  function text_to_sequence (line 30) | def text_to_sequence(text, language):

FILE: melo/text/cleaner_multiling.py
  function replace_punctuation (line 43) | def replace_punctuation(text):
  function lowercase (line 48) | def lowercase(text):
  function collapse_whitespace (line 52) | def collapse_whitespace(text):
  function remove_punctuation_at_begin (line 55) | def remove_punctuation_at_begin(text):
  function remove_aux_symbols (line 58) | def remove_aux_symbols(text):
  function replace_symbols (line 63) | def replace_symbols(text, lang="en"):
  function unicleaners (line 98) | def unicleaners(text, cased=False, lang='en'):

FILE: melo/text/english.py
  function post_replace_ph (line 95) | def post_replace_ph(ph):
  function read_dict (line 118) | def read_dict():
  function cache_dict (line 142) | def cache_dict(g2p_dict, file_path):
  function get_dict (line 147) | def get_dict():
  function refine_ph (line 161) | def refine_ph(phn):
  function refine_syllables (line 169) | def refine_syllables(syllables):
  function text_normalize (line 181) | def text_normalize(text):
  function g2p_old (line 190) | def g2p_old(text):
  function g2p (line 217) | def g2p(text, pad_start_end=True, tokenized=None):
  function get_bert_feature (line 262) | def get_bert_feature(text, word2ph, device=None):

FILE: melo/text/english_bert.py
  function get_bert_feature (line 9) | def get_bert_feature(text, word2ph, device=None):

FILE: melo/text/english_utils/abbreviations.py
  function expand_abbreviations (line 28) | def expand_abbreviations(text, lang="en"):

FILE: melo/text/english_utils/number_norm.py
  function _remove_commas (line 16) | def _remove_commas(m):
  function _expand_decimal_point (line 20) | def _expand_decimal_point(m):
  function __expand_currency (line 24) | def __expand_currency(value: str, inflection: Dict[float, str]) -> str:
  function _expand_currency (line 42) | def _expand_currency(m: "re.Match") -> str:
  function _expand_ordinal (line 74) | def _expand_ordinal(m):
  function _expand_number (line 78) | def _expand_number(m):
  function normalize_numbers (line 91) | def normalize_numbers(text):

FILE: melo/text/english_utils/time_norm.py
  function _expand_num (line 18) | def _expand_num(n: int) -> str:
  function _expand_time_english (line 22) | def _expand_time_english(match: "re.Match") -> str:
  function expand_time_english (line 46) | def expand_time_english(text: str) -> str:

FILE: melo/text/es_phonemizer/base.py
  class BasePhonemizer (line 7) | class BasePhonemizer(abc.ABC):
    method __init__ (line 34) | def __init__(self, language, punctuations=Punctuation.default_puncs(),...
    method _init_language (line 46) | def _init_language(self, language):
    method language (line 57) | def language(self):
    method name (line 63) | def name():
    method is_available (line 69) | def is_available(cls):
    method version (line 75) | def version(cls):
    method supported_languages (line 81) | def supported_languages():
    method is_supported_language (line 85) | def is_supported_language(self, language):
    method _phonemize (line 90) | def _phonemize(self, text, separator):
    method _phonemize_preprocess (line 93) | def _phonemize_preprocess(self, text) -> Tuple[List[str], List]:
    method _phonemize_postprocess (line 107) | def _phonemize_postprocess(self, phonemized, punctuations) -> str:
    method phonemize (line 116) | def phonemize(self, text: str, separator="|", language: str = None) ->...
    method print_logs (line 137) | def print_logs(self, level: int = 0):

FILE: melo/text/es_phonemizer/cleaner.py
  function replace_punctuation (line 43) | def replace_punctuation(text):
  function lowercase (line 48) | def lowercase(text):
  function collapse_whitespace (line 52) | def collapse_whitespace(text):
  function remove_punctuation_at_begin (line 55) | def remove_punctuation_at_begin(text):
  function remove_aux_symbols (line 58) | def remove_aux_symbols(text):
  function replace_symbols (line 63) | def replace_symbols(text, lang="en"):
  function spanish_cleaners (line 98) | def spanish_cleaners(text):

FILE: melo/text/es_phonemizer/es_to_ipa.py
  function es2ipa (line 4) | def es2ipa(text):

FILE: melo/text/es_phonemizer/gruut_wrapper.py
  class Gruut (line 14) | class Gruut(BasePhonemizer):
    method __init__ (line 41) | def __init__(
    method name (line 54) | def name():
    method phonemize_gruut (line 57) | def phonemize_gruut(self, text: str, separator: str = "|", tie=False) ...
    method _phonemize (line 109) | def _phonemize(self, text, separator):
    method is_supported_language (line 112) | def is_supported_language(self, language):
    method supported_languages (line 117) | def supported_languages() -> List:
    method version (line 125) | def version(self):
    method is_available (line 134) | def is_available(cls):

FILE: melo/text/es_phonemizer/punctuation.py
  class PuncPosition (line 12) | class PuncPosition(Enum):
  class Punctuation (line 21) | class Punctuation:
    method __init__ (line 43) | def __init__(self, puncs: str = _DEF_PUNCS):
    method default_puncs (line 47) | def default_puncs():
    method puncs (line 52) | def puncs(self):
    method puncs (line 56) | def puncs(self, value):
    method strip (line 62) | def strip(self, text):
    method strip_to_restore (line 74) | def strip_to_restore(self, text):
    method _strip_to_restore (line 88) | def _strip_to_restore(self, text):
    method restore (line 120) | def restore(cls, text, puncs):
    method _restore (line 135) | def _restore(cls, text, puncs, num):  # pylint: disable=too-many-retur...

FILE: melo/text/fr_phonemizer/base.py
  class BasePhonemizer (line 7) | class BasePhonemizer(abc.ABC):
    method __init__ (line 34) | def __init__(self, language, punctuations=Punctuation.default_puncs(),...
    method _init_language (line 46) | def _init_language(self, language):
    method language (line 57) | def language(self):
    method name (line 63) | def name():
    method is_available (line 69) | def is_available(cls):
    method version (line 75) | def version(cls):
    method supported_languages (line 81) | def supported_languages():
    method is_supported_language (line 85) | def is_supported_language(self, language):
    method _phonemize (line 90) | def _phonemize(self, text, separator):
    method _phonemize_preprocess (line 93) | def _phonemize_preprocess(self, text) -> Tuple[List[str], List]:
    method _phonemize_postprocess (line 107) | def _phonemize_postprocess(self, phonemized, punctuations) -> str:
    method phonemize (line 116) | def phonemize(self, text: str, separator="|", language: str = None) ->...
    method print_logs (line 137) | def print_logs(self, level: int = 0):

FILE: melo/text/fr_phonemizer/cleaner.py
  function replace_punctuation (line 48) | def replace_punctuation(text):
  function expand_abbreviations (line 53) | def expand_abbreviations(text, lang="fr"):
  function lowercase (line 61) | def lowercase(text):
  function collapse_whitespace (line 65) | def collapse_whitespace(text):
  function remove_punctuation_at_begin (line 68) | def remove_punctuation_at_begin(text):
  function remove_aux_symbols (line 71) | def remove_aux_symbols(text):
  function replace_symbols (line 76) | def replace_symbols(text, lang="en"):
  function french_cleaners (line 111) | def french_cleaners(text):

FILE: melo/text/fr_phonemizer/fr_to_ipa.py
  function remove_consecutive_t (line 5) | def remove_consecutive_t(input_str):
  function fr2ipa (line 23) | def fr2ipa(text):

FILE: melo/text/fr_phonemizer/gruut_wrapper.py
  class Gruut (line 14) | class Gruut(BasePhonemizer):
    method __init__ (line 41) | def __init__(
    method name (line 54) | def name():
    method phonemize_gruut (line 57) | def phonemize_gruut(self, text: str, separator: str = "|", tie=False) ...
    method _phonemize (line 109) | def _phonemize(self, text, separator):
    method is_supported_language (line 112) | def is_supported_language(self, language):
    method supported_languages (line 117) | def supported_languages() -> List:
    method version (line 125) | def version(self):
    method is_available (line 134) | def is_available(cls):

FILE: melo/text/fr_phonemizer/punctuation.py
  class PuncPosition (line 12) | class PuncPosition(Enum):
  class Punctuation (line 21) | class Punctuation:
    method __init__ (line 43) | def __init__(self, puncs: str = _DEF_PUNCS):
    method default_puncs (line 47) | def default_puncs():
    method puncs (line 52) | def puncs(self):
    method puncs (line 56) | def puncs(self, value):
    method strip (line 62) | def strip(self, text):
    method strip_to_restore (line 74) | def strip_to_restore(self, text):
    method _strip_to_restore (line 88) | def _strip_to_restore(self, text):
    method restore (line 118) | def restore(cls, text, puncs):
    method _restore (line 133) | def _restore(cls, text, puncs, num):  # pylint: disable=too-many-retur...

FILE: melo/text/french.py
  function distribute_phone (line 11) | def distribute_phone(n_phone, n_word):
  function text_normalize (line 19) | def text_normalize(text):
  function g2p (line 26) | def g2p(text, pad_start_end=True, tokenized=None):
  function get_bert_feature (line 66) | def get_bert_feature(text, word2ph, device=None):
  function text_normalize (line 83) | def text_normalize(text):

FILE: melo/text/french_bert.py
  function get_bert_feature (line 9) | def get_bert_feature(text, word2ph, device=None):

FILE: melo/text/japanese.py
  function _makerulemap (line 325) | def _makerulemap():
  function kata2phoneme (line 333) | def kata2phoneme(text: str) -> str:
  function hira2kata (line 360) | def hira2kata(text: str) -> str:
  function text2kata (line 370) | def text2kata(text: str) -> str:
  function japanese_convert_numbers_to_words (line 467) | def japanese_convert_numbers_to_words(text: str) -> str:
  function japanese_convert_alpha_symbols_to_words (line 474) | def japanese_convert_alpha_symbols_to_words(text: str) -> str:
  function japanese_text_to_phonemes (line 478) | def japanese_text_to_phonemes(text: str) -> str:
  function is_japanese_character (line 488) | def is_japanese_character(char):
  function replace_punctuation (line 524) | def replace_punctuation(text):
  function text_normalize (line 548) | def text_normalize(text):
  function distribute_phone (line 557) | def distribute_phone(n_phone, n_word):
  function g2p (line 571) | def g2p(norm_text):
  function get_bert_feature (line 614) | def get_bert_feature(text, word2ph, device):

FILE: melo/text/japanese_bert.py
  function get_bert_feature (line 8) | def get_bert_feature(text, word2ph, device=None, model_id='tohoku-nlp/be...

FILE: melo/text/korean.py
  function normalize (line 16) | def normalize(text):
  function normalize_with_dictionary (line 25) | def normalize_with_dictionary(text, dic):
  function normalize_english (line 32) | def normalize_english(text):
  function korean_text_to_phonemes (line 44) | def korean_text_to_phonemes(text, character: str = "hangeul") -> str:
  function text_normalize (line 73) | def text_normalize(text):
  function distribute_phone (line 82) | def distribute_phone(n_phone, n_word):
  function g2p (line 97) | def g2p(norm_text):
  function get_bert_feature (line 141) | def get_bert_feature(text, word2ph, device='cuda'):

FILE: melo/text/spanish.py
  function distribute_phone (line 11) | def distribute_phone(n_phone, n_word):
  function text_normalize (line 19) | def text_normalize(text):
  function post_replace_ph (line 23) | def post_replace_ph(ph):
  function refine_ph (line 44) | def refine_ph(phn):
  function refine_syllables (line 52) | def refine_syllables(syllables):
  function g2p (line 68) | def g2p(text, pad_start_end=True, tokenized=None):
  function get_bert_feature (line 108) | def get_bert_feature(text, word2ph, device=None):

FILE: melo/text/spanish_bert.py
  function get_bert_feature (line 9) | def get_bert_feature(text, word2ph, device=None):

FILE: melo/text/tone_sandhi.py
  class ToneSandhi (line 22) | class ToneSandhi:
    method __init__ (line 23) | def __init__(self):
    method _neural_sandhi (line 466) | def _neural_sandhi(self, word: str, pos: str, finals: List[str]) -> Li...
    method _bu_sandhi (line 522) | def _bu_sandhi(self, word: str, finals: List[str]) -> List[str]:
    method _yi_sandhi (line 533) | def _yi_sandhi(self, word: str, finals: List[str]) -> List[str]:
    method _split_word (line 558) | def _split_word(self, word: str) -> List[str]:
    method _three_sandhi (line 571) | def _three_sandhi(self, word: str, finals: List[str]) -> List[str]:
    method _all_tone_three (line 611) | def _all_tone_three(self, finals: List[str]) -> bool:
    method _merge_bu (line 616) | def _merge_bu(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    method _merge_yi (line 636) | def _merge_yi(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    method _merge_continuous_three_tones (line 669) | def _merge_continuous_three_tones(
    method _is_reduplication (line 700) | def _is_reduplication(self, word: str) -> bool:
    method _merge_continuous_three_tones_2 (line 704) | def _merge_continuous_three_tones_2(
    method _merge_er (line 734) | def _merge_er(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    method _merge_reduplication (line 743) | def _merge_reduplication(self, seg: List[Tuple[str, str]]) -> List[Tup...
    method pre_merge_for_modify (line 752) | def pre_merge_for_modify(self, seg: List[Tuple[str, str]]) -> List[Tup...
    method modified_tone (line 764) | def modified_tone(self, word: str, pos: str, finals: List[str]) -> Lis...

FILE: melo/train.py
  function run (line 49) | def run():
  function train_and_evaluate (line 291) | def train_and_evaluate(
  function evaluate (line 539) | def evaluate(hps, generator, eval_loader, writer_eval):

FILE: melo/transforms.py
  function piecewise_rational_quadratic_transform (line 12) | def piecewise_rational_quadratic_transform(
  function searchsorted (line 45) | def searchsorted(bin_locations, inputs, eps=1e-6):
  function unconstrained_rational_quadratic_spline (line 50) | def unconstrained_rational_quadratic_spline(
  function rational_quadratic_spline (line 100) | def rational_quadratic_spline(

FILE: melo/utils.py
  function get_text_for_tts_infer (line 22) | def get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id...
  function load_checkpoint (line 60) | def load_checkpoint(checkpoint_path, model, optimizer=None, skip_optimiz...
  function save_checkpoint (line 119) | def save_checkpoint(model, optimizer, learning_rate, iteration, checkpoi...
  function summarize (line 140) | def summarize(
  function latest_checkpoint_path (line 159) | def latest_checkpoint_path(dir_path, regex="G_*.pth"):
  function plot_spectrogram_to_numpy (line 166) | def plot_spectrogram_to_numpy(spectrogram):
  function plot_alignment_to_numpy (line 192) | def plot_alignment_to_numpy(alignment, info=None):
  function load_wav_to_torch (line 223) | def load_wav_to_torch(full_path):
  function load_wav_to_torch_new (line 228) | def load_wav_to_torch_new(full_path):
  function load_wav_to_torch_librosa (line 233) | def load_wav_to_torch_librosa(full_path, sr):
  function load_filepaths_and_text (line 238) | def load_filepaths_and_text(filename, split="|"):
  function get_hparams (line 244) | def get_hparams(init=True):
  function clean_checkpoints (line 290) | def clean_checkpoints(path_to_models="logs/44k/", n_ckpts_to_keep=2, sor...
  function get_hparams_from_dir (line 335) | def get_hparams_from_dir(model_dir):
  function get_hparams_from_file (line 346) | def get_hparams_from_file(config_path):
  function check_git_hash (line 355) | def check_git_hash(model_dir):
  function get_logger (line 380) | def get_logger(model_dir, filename="train.log"):
  class HParams (line 395) | class HParams:
    method __init__ (line 396) | def __init__(self, **kwargs):
    method keys (line 402) | def keys(self):
    method items (line 405) | def items(self):
    method values (line 408) | def values(self):
    method __len__ (line 411) | def __len__(self):
    method __getitem__ (line 414) | def __getitem__(self, key):
    method __setitem__ (line 417) | def __setitem__(self, key, value):
    method __contains__ (line 420) | def __contains__(self, key):
    method __repr__ (line 423) | def __repr__(self):

FILE: setup.py
  class PostInstallCommand (line 11) | class PostInstallCommand(install):
    method run (line 13) | def run(self):
  class PostDevelopCommand (line 18) | class PostDevelopCommand(develop):
    method run (line 20) | def run(self):
Condensed preview: 90 files, each showing path, character count, and a content snippet. The full structured content is 4,506K chars.
[
  {
    "path": ".github/workflows/pypi.yml",
    "chars": 1094,
    "preview": "# This workflow will upload a Python Package using Twine when a release is created\n# For more information see: https://d"
  },
  {
    "path": ".gitignore",
    "chars": 151,
    "preview": "__pycache__/\n.ipynb_checkpoints/\nbasetts_outputs_use_bert/\nbasetts_outputs/\nmultilingual_ckpts\nbasetts_outputs_package/\n"
  },
  {
    "path": "Dockerfile",
    "chars": 316,
    "preview": "FROM python:3.9-slim\nWORKDIR /app\nCOPY . /app\n\nRUN apt-get update && apt-get install -y \\\n    build-essential libsndfile"
  },
  {
    "path": "LICENSE",
    "chars": 1053,
    "preview": "Copyright (c) 2024 MyShell.ai\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this soft"
  },
  {
    "path": "README.md",
    "chars": 3492,
    "preview": "<div align=\"center\">\n  <div>&nbsp;</div>\n  <img src=\"logo.png\" width=\"300\"/> <br>\n  <a href=\"https://trendshift.io/repos"
  },
  {
    "path": "docs/install.md",
    "chars": 5146,
    "preview": "## Install and Use Locally\n\n### Table of Content\n- [Linux and macOS Install](#linux-and-macos-install)\n- [Docker Install"
  },
  {
    "path": "docs/quick_use.md",
    "chars": 1666,
    "preview": "## Use MeloTTS without Installation\n\n**Quick Demo**\n\n- [Official live demo](https://app.myshell.ai/bot/UN77N3/1709094629"
  },
  {
    "path": "docs/training.md",
    "chars": 1403,
    "preview": "## Training\n\nBefore training, please install MeloTTS in dev mode and go to the `melo` folder. \n```\npip install -e .\ncd m"
  },
  {
    "path": "melo/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "melo/api.py",
    "chars": 5108,
    "preview": "import os\nimport re\nimport json\nimport torch\nimport librosa\nimport soundfile\nimport torchaudio\nimport numpy as np\nimport"
  },
  {
    "path": "melo/app.py",
    "chars": 3044,
    "preview": "# WebUI by mrfakename <X @realmrfakename / HF @mrfakename>\n# Demo also available on HF Spaces: https://huggingface.co/sp"
  },
  {
    "path": "melo/attentions.py",
    "chars": 15936,
    "preview": "import math\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom . import commons\nimport logging"
  },
  {
    "path": "melo/commons.py",
    "chars": 4956,
    "preview": "import math\nimport torch\nfrom torch.nn import functional as F\n\n\ndef init_weights(m, mean=0.0, std=0.01):\n    classname ="
  },
  {
    "path": "melo/configs/config.json",
    "chars": 1681,
    "preview": "{\n  \"train\": {\n    \"log_interval\": 200,\n    \"eval_interval\": 1000,\n    \"seed\": 52,\n    \"epochs\": 10000,\n    \"learning_ra"
  },
  {
    "path": "melo/data/example/metadata.list",
    "chars": 2860,
    "preview": "data/example/wavs/000.wav|EN-default|EN|Well, there are always new trends and styles emerging in the fashion world, but "
  },
  {
    "path": "melo/data_utils.py",
    "chars": 14829,
    "preview": "import os\nimport random\nimport torch\nimport torch.utils.data\nfrom tqdm import tqdm\nfrom loguru import logger\nimport comm"
  },
  {
    "path": "melo/download_utils.py",
    "chars": 3440,
    "preview": "import torch\nimport os\nfrom . import utils\nfrom cached_path import cached_path\nfrom huggingface_hub import hf_hub_downlo"
  },
  {
    "path": "melo/infer.py",
    "chars": 998,
    "preview": "import os\nimport click\nfrom melo.api import TTS\n\n    \n    \n@click.command()\n@click.option('--ckpt_path', '-m', type=str,"
  },
  {
    "path": "melo/init_downloads.py",
    "chars": 393,
    "preview": "\n\nif __name__ == '__main__':\n\n    from melo.api import TTS\n    device = 'auto'\n    models = {\n        'EN': TTS(language"
  },
  {
    "path": "melo/losses.py",
    "chars": 1386,
    "preview": "import torch\n\n\ndef feature_loss(fmap_r, fmap_g):\n    loss = 0\n    for dr, dg in zip(fmap_r, fmap_g):\n        for rl, gl "
  },
  {
    "path": "melo/main.py",
    "chars": 1850,
    "preview": "import click\nimport warnings\nimport os\n\n\n@click.command\n@click.argument('text')\n@click.argument('output_path')\n@click.op"
  },
  {
    "path": "melo/mel_processing.py",
    "chars": 5868,
    "preview": "import torch\nimport torch.utils.data\nimport librosa\nfrom librosa.filters import mel as librosa_mel_fn\n\nMAX_WAV_VALUE = 3"
  },
  {
    "path": "melo/models.py",
    "chars": 34027,
    "preview": "import math\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom melo import commons\nfrom melo i"
  },
  {
    "path": "melo/modules.py",
    "chars": 18975,
    "preview": "import math\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom torch.nn import Conv1d\nfrom tor"
  },
  {
    "path": "melo/monotonic_align/__init__.py",
    "chars": 563,
    "preview": "from numpy import zeros, int32, float32\r\nfrom torch import from_numpy\r\n\r\nfrom .core import maximum_path_jit\r\n\r\n\r\ndef max"
  },
  {
    "path": "melo/monotonic_align/core.py",
    "chars": 1270,
    "preview": "import numba\r\n\r\n\r\n@numba.jit(\r\n    numba.void(\r\n        numba.int32[:, :, ::1],\r\n        numba.float32[:, :, ::1],\r\n    "
  },
  {
    "path": "melo/preprocess_text.py",
    "chars": 4423,
    "preview": "import json\nfrom collections import defaultdict\nfrom random import shuffle\nfrom typing import Optional\n\nfrom tqdm import"
  },
  {
    "path": "melo/split_utils.py",
    "chars": 6251,
    "preview": "import re\nimport os\nimport glob\nimport numpy as np\nimport soundfile as sf\nimport torchaudio\nimport re\n\ndef split_sentenc"
  },
  {
    "path": "melo/text/__init__.py",
    "chars": 1477,
    "preview": "from .symbols import *\n\n\n_symbol_to_id = {s: i for i, s in enumerate(symbols)}\n\n\ndef cleaned_text_to_sequence(cleaned_te"
  },
  {
    "path": "melo/text/chinese.py",
    "chars": 5616,
    "preview": "import os\nimport re\n\nimport cn2an\nfrom pypinyin import lazy_pinyin, Style\n\nfrom .symbols import punctuation\nfrom .tone_s"
  },
  {
    "path": "melo/text/chinese_bert.py",
    "chars": 2481,
    "preview": "import torch\nimport sys\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\n\n\n# model_id = 'hfl/chinese-roberta"
  },
  {
    "path": "melo/text/chinese_mix.py",
    "chars": 8250,
    "preview": "import os\nimport re\n\nimport cn2an\nfrom pypinyin import lazy_pinyin, Style\n\n# from text.symbols import punctuation\nfrom ."
  },
  {
    "path": "melo/text/cleaner.py",
    "chars": 1245,
    "preview": "from . import chinese, japanese, english, chinese_mix, korean, french, spanish\nfrom . import cleaned_text_to_sequence\nim"
  },
  {
    "path": "melo/text/cleaner_multiling.py",
    "chars": 2590,
    "preview": "\"\"\"Set of default text cleaners\"\"\"\n# TODO: pick the cleaner for languages dynamically\n\nimport re\n\n# Regular expression m"
  },
  {
    "path": "melo/text/cmudict.rep",
    "chars": 3969309,
    "preview": "## Date:  August 8, 1998\n##\n## The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6] is Copyright 1998\n## by Carnegie"
  },
  {
    "path": "melo/text/english.py",
    "chars": 6479,
    "preview": "import pickle\nimport os\nimport re\nfrom g2p_en import G2p\n\nfrom . import symbols\n\nfrom .english_utils.abbreviations impor"
  },
  {
    "path": "melo/text/english_bert.py",
    "chars": 1194,
    "preview": "import torch\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\nimport sys\n\nmodel_id = 'bert-base-uncased'\ntok"
  },
  {
    "path": "melo/text/english_utils/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "melo/text/english_utils/abbreviations.py",
    "chars": 948,
    "preview": "import re\n\n# List of (regular expression, replacement) pairs for abbreviations in english:\nabbreviations_en = [\n    (re."
  },
  {
    "path": "melo/text/english_utils/number_norm.py",
    "chars": 2804,
    "preview": "\"\"\" from https://github.com/keithito/tacotron \"\"\"\n\nimport re\nfrom typing import Dict\n\nimport inflect\n\n_inflect = inflect"
  },
  {
    "path": "melo/text/english_utils/time_norm.py",
    "chars": 1173,
    "preview": "import re\n\nimport inflect\n\n_inflect = inflect.engine()\n\n_time_re = re.compile(\n    r\"\"\"\\b\n                          ((0?"
  },
  {
    "path": "melo/text/es_phonemizer/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "melo/text/es_phonemizer/base.py",
    "chars": 4339,
    "preview": "import abc\nfrom typing import List, Tuple\n\nfrom .punctuation import Punctuation\n\n\nclass BasePhonemizer(abc.ABC):\n    \"\"\""
  },
  {
    "path": "melo/text/es_phonemizer/cleaner.py",
    "chars": 2549,
    "preview": "\"\"\"Set of default text cleaners\"\"\"\n# TODO: pick the cleaner for languages dynamically\n\nimport re\n\n# Regular expression m"
  },
  {
    "path": "melo/text/es_phonemizer/es_symbols.json",
    "chars": 1180,
    "preview": "{\n    \"symbols\": [\n        \"_\",\n        \",\",\n        \".\",\n        \"!\",\n        \"?\",\n        \"-\",\n        \"~\",\n        \"\\"
  },
  {
    "path": "melo/text/es_phonemizer/es_symbols.txt",
    "chars": 78,
    "preview": "_,.!?-~…NQabdefghijklmnopstuvwxyzɑæʃʑçɯɪɔɛɹðəɫɥɸʊɾʒθβŋɦ⁼ʰ`^#*=ˈˌ→↓↑ ɡrɲʝɣʎː—¿¡"
  },
  {
    "path": "melo/text/es_phonemizer/es_symbols_v2.json",
    "chars": 1243,
    "preview": "{\n    \"symbols\": [\n        \"_\",\n        \",\",\n        \".\",\n        \"!\",\n        \"?\",\n        \"-\",\n        \"~\",\n        \"\\"
  },
  {
    "path": "melo/text/es_phonemizer/es_to_ipa.py",
    "chars": 393,
    "preview": "from .cleaner import spanish_cleaners\nfrom .gruut_wrapper import Gruut\n\ndef es2ipa(text):\n    e = Gruut(language=\"es-es\""
  },
  {
    "path": "melo/text/es_phonemizer/example_ipa.txt",
    "chars": 36314,
    "preview": "kapˈitulo ˈuno de daβˈid kˌoppeɾfjˈelð o el soβɾˈino de mi tˈia de tʃˈaɾles dˌiθjˈens.\nˈesta ɡɾˌaβaθjˈon de lˌiβɾˈiβoks "
  },
  {
    "path": "melo/text/es_phonemizer/gruut_wrapper.py",
    "chars": 6991,
    "preview": "import importlib\nfrom typing import List\n\nimport gruut\nfrom gruut_ipa import IPA # pip install gruut_ipa\n\nfrom .base imp"
  },
  {
    "path": "melo/text/es_phonemizer/punctuation.py",
    "chars": 5526,
    "preview": "import collections\nimport re\nfrom enum import Enum\n\nimport six\n\n_DEF_PUNCS = ';:,.!?¡¿—…\"«»“”'\n\n_PUNC_IDX = collections."
  },
  {
    "path": "melo/text/es_phonemizer/spanish_symbols.txt",
    "chars": 37,
    "preview": "dˌaβˈiðkopeɾfjl unθsbmtʃwɛxɪŋʊɣɡrɲʝʎː"
  },
  {
    "path": "melo/text/es_phonemizer/test.ipynb",
    "chars": 5440,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"ename\""
  },
  {
    "path": "melo/text/fr_phonemizer/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "melo/text/fr_phonemizer/base.py",
    "chars": 4339,
    "preview": "import abc\nfrom typing import List, Tuple\n\nfrom .punctuation import Punctuation\n\n\nclass BasePhonemizer(abc.ABC):\n    \"\"\""
  },
  {
    "path": "melo/text/fr_phonemizer/cleaner.py",
    "chars": 2875,
    "preview": "\"\"\"Set of default text cleaners\"\"\"\n# TODO: pick the cleaner for languages dynamically\n\nimport re\nfrom .french_abbreviati"
  },
  {
    "path": "melo/text/fr_phonemizer/en_symbols.json",
    "chars": 847,
    "preview": "{\"symbols\": [\n    \"_\",\n    \",\",\n    \".\",\n    \"!\",\n    \"?\",\n    \"-\",\n    \"~\",\n    \"\\u2026\",\n    \"N\",\n    \"Q\",\n    \"a\",\n  "
  },
  {
    "path": "melo/text/fr_phonemizer/fr_symbols.json",
    "chars": 1351,
    "preview": "{\n    \"symbols\": [\n        \"_\",\n        \",\",\n        \".\",\n        \"!\",\n        \"?\",\n        \"-\",\n        \"~\",\n        \"\\"
  },
  {
    "path": "melo/text/fr_phonemizer/fr_to_ipa.py",
    "chars": 742,
    "preview": "from .cleaner import french_cleaners\nfrom .gruut_wrapper import Gruut\n\n\ndef remove_consecutive_t(input_str):\n    result "
  },
  {
    "path": "melo/text/fr_phonemizer/french_abbreviations.py",
    "chars": 1360,
    "preview": "import re\n\n# List of (regular expression, replacement) pairs for abbreviations in french:\nabbreviations_fr = [\n    (re.c"
  },
  {
    "path": "melo/text/fr_phonemizer/french_symbols.txt",
    "chars": 84,
    "preview": "_,.!?-~…NQabdefghijklmnopstuvwxyzɑæʃʑçɯɪɔɛɹðəɫɥɸʊɾʒθβŋɦ⁼ʰ`^#*=ˈˌ→↓↑ ɣɡrɲʝʎː̃œøʁɒʌ—ɜɐ"
  },
  {
    "path": "melo/text/fr_phonemizer/gruut_wrapper.py",
    "chars": 7168,
    "preview": "import importlib\nfrom typing import List\n\nimport gruut\nfrom gruut_ipa import IPA # pip install gruut_ipa\n\nfrom .base imp"
  },
  {
    "path": "melo/text/fr_phonemizer/punctuation.py",
    "chars": 5442,
    "preview": "import collections\nimport re\nfrom enum import Enum\n\nimport six\n\n_DEF_PUNCS = ';:,.!?¡¿—…\"«»“”'\n\n_PUNC_IDX = collections."
  },
  {
    "path": "melo/text/french.py",
    "chars": 2885,
    "preview": "import pickle\nimport os\nimport re\n\nfrom . import symbols\nfrom .fr_phonemizer import cleaner as fr_cleaner\nfrom .fr_phone"
  },
  {
    "path": "melo/text/french_bert.py",
    "chars": 1215,
    "preview": "import torch\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\nimport sys\n\nmodel_id = 'dbmdz/bert-base-french"
  },
  {
    "path": "melo/text/japanese.py",
    "chars": 13440,
    "preview": "# Convert Japanese text to phonemes which is\n# compatible with Julius https://github.com/julius-speech/segmentation-kit\n"
  },
  {
    "path": "melo/text/japanese_bert.py",
    "chars": 1510,
    "preview": "import torch\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\nimport sys\n\n\nmodels = {}\ntokenizers = {}\ndef g"
  },
  {
    "path": "melo/text/ko_dictionary.py",
    "chars": 756,
    "preview": "# coding: utf-8\r\n# Add the word you want to the dictionary.\r\netc_dictionary = {\"1+1\": \"원플러스원\", \"2+1\": \"투플러스원\"}\r\n\r\n\r\nengl"
  },
  {
    "path": "melo/text/korean.py",
    "chars": 5970,
    "preview": "# Convert Japanese text to phonemes which is\n# compatible with Julius https://github.com/julius-speech/segmentation-kit\n"
  },
  {
    "path": "melo/text/opencpop-strict.txt",
    "chars": 4084,
    "preview": "a\tAA a\nai\tAA ai\nan\tAA an\nang\tAA ang\nao\tAA ao\nba\tb a\nbai\tb ai\nban\tb an\nbang\tb ang\nbao\tb ao\nbei\tb ei\nben\tb en\nbeng\tb eng\nb"
  },
  {
    "path": "melo/text/spanish.py",
    "chars": 3157,
    "preview": "import pickle\nimport os\nimport re\n\nfrom . import symbols\nfrom .es_phonemizer import cleaner as es_cleaner\nfrom .es_phone"
  },
  {
    "path": "melo/text/spanish_bert.py",
    "chars": 1216,
    "preview": "import torch\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\nimport sys\n\nmodel_id = 'dccuchile/bert-base-sp"
  },
  {
    "path": "melo/text/symbols.py",
    "chars": 4183,
    "preview": "# punctuation = [\"!\", \"?\", \"…\", \",\", \".\", \"'\", \"-\"]\npunctuation = [\"!\", \"?\", \"…\", \",\", \".\", \"'\", \"-\", \"¿\", \"¡\"]\npu_symbo"
  },
  {
    "path": "melo/text/tone_sandhi.py",
    "chars": 21326,
    "preview": "# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "melo/train.py",
    "chars": 22501,
    "preview": "# flake8: noqa: E402\n\nimport os\nimport torch\nfrom torch.nn import functional as F\nfrom torch.utils.data import DataLoade"
  },
  {
    "path": "melo/train.sh",
    "chars": 391,
    "preview": "CONFIG=$1\nGPUS=$2\nMODEL_NAME=$(basename \"$(dirname $CONFIG)\")\n\nPORT=10902\n\nwhile : # auto-resume: the code sometimes cra"
  },
  {
    "path": "melo/transforms.py",
    "chars": 7253,
    "preview": "import torch\nfrom torch.nn import functional as F\n\nimport numpy as np\n\n\nDEFAULT_MIN_BIN_WIDTH = 1e-3\nDEFAULT_MIN_BIN_HEI"
  },
  {
    "path": "melo/utils.py",
    "chars": 13223,
    "preview": "import os\nimport glob\nimport argparse\nimport logging\nimport json\nimport subprocess\nimport numpy as np\nfrom scipy.io.wavf"
  },
  {
    "path": "requirements.txt",
    "chars": 424,
    "preview": "txtsplit\ntorch\ntorchaudio\ncached_path\ntransformers==4.27.4\nnum2words==0.5.12\nunidic_lite==1.0.8\nunidic==1.1.0\nmecab-pyth"
  },
  {
    "path": "setup.py",
    "chars": 1010,
    "preview": "import os \nfrom setuptools import setup, find_packages\nfrom setuptools.command.develop import develop\nfrom setuptools.co"
  },
  {
    "path": "test/basetts_test_resources/en_egs_text.txt",
    "chars": 4966,
    "preview": "Did you ever hear a folk tale about a giant turtle?\nCan you name five cars that were popular in the 1970s?\nMay I ask wha"
  },
  {
    "path": "test/basetts_test_resources/es_egs_text.txt",
    "chars": 1869,
    "preview": "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.\nLas estrellas bailan en la noche"
  },
  {
    "path": "test/basetts_test_resources/fr_egs_text.txt",
    "chars": 1873,
    "preview": "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.\nLes étoiles dansent dans la nu"
  },
  {
    "path": "test/basetts_test_resources/jp_egs_text.txt",
    "chars": 286,
    "preview": "彼は毎朝ジョギングをして体を健康に保っています。\n私たちは来年、友人たちと一緒にヨーロッパ旅行を計画しています。\n新しいレストランで美味しい料理を試すことが楽しみです。\n彼女の絵は情熱と芸術性が溢れていて、見る人を魅了します。\n最近、忙しさ"
  },
  {
    "path": "test/basetts_test_resources/kr_egs_text.txt",
    "chars": 183,
    "preview": "안녕하세요! 오늘은 날씨가 정말 좋네요.\n한국 음식을 먹어보고 싶어요. 불고기랑 김치찌개가 제가 좋아하는 음식이에요.\n요즘에는 한국 드라마를 자주 보고 있어요. 정말 재미있어요.\n한글을 배우는 것이 재미있어요. 조금"
  },
  {
    "path": "test/basetts_test_resources/zh_mix_en_egs_text.txt",
    "chars": 342,
    "preview": "人工智能是一种非常适合和促进自上而下集中控制的技术,而加密货币则是一种完全关注自下而上分散合作的技术。\nWeb 3的一个目标是支持艺术家。\n欢迎来到Web 3与A6Z,一个由团队打造的构建下一代互联网的节目。\n我最喜欢的fruit是苹果。\n"
  },
  {
    "path": "test/test_base_model_tts_package.py",
    "chars": 1577,
    "preview": "from melo.api import TTS\nimport os\nimport glob\nimport sys\n\n\nlanguage = sys.argv[1]\nmodel = TTS(language=language)\n\nspeak"
  },
  {
    "path": "test/test_base_model_tts_package_from_S3.py",
    "chars": 1599,
    "preview": "from melo.api import TTS\nimport os\nimport glob\nimport sys\n\n\nlanguage = sys.argv[1]\nmodel = TTS(language=language, use_hf"
  }
]

// ... and 2 more files (download for full content)

About this extraction

This page contains the full source code of the myshell-ai/MeloTTS GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 90 files (15.4 MB), approximately 1.1M tokens, and a symbol index with 402 extracted functions, classes, methods, constants, and types.
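
The symbol index in this extraction lists one entry per line in the form `kind name (line N) | signature`, grouped under `FILE: path` headers. The following is a minimal sketch of turning that index into structured records. It assumes the layout seen in this dump holds throughout; `Symbol` and `parse_symbol_index` are hypothetical names introduced only for illustration and are not part of MeloTTS or the extraction tool.

```python
import re
from typing import Iterator, NamedTuple


class Symbol(NamedTuple):
    """One entry of the symbol index (hypothetical record type)."""
    file: str
    kind: str        # e.g. "function", "class", "method"
    name: str
    line: int        # line number reported by the index
    signature: str   # text after the "|" separator, possibly truncated with "..."


# "FILE: melo/utils.py"
_FILE_RE = re.compile(r"^FILE:\s+(?P<path>\S+)\s*$")
# "  function g2p (line 217) | def g2p(text, pad_start_end=True, tokenized=None):"
_SYMBOL_RE = re.compile(
    r"^\s*(?P<kind>\w+)\s+(?P<name>\w+)\s+\(line\s+(?P<line>\d+)\)\s+\|\s+(?P<sig>.*)$"
)


def parse_symbol_index(text: str) -> Iterator[Symbol]:
    """Yield a Symbol for every index line that follows a FILE header."""
    current_file = None
    for raw in text.splitlines():
        header = _FILE_RE.match(raw)
        if header:
            current_file = header.group("path")
            continue
        entry = _SYMBOL_RE.match(raw)
        if entry and current_file is not None:
            yield Symbol(
                file=current_file,
                kind=entry.group("kind"),
                name=entry.group("name"),
                line=int(entry.group("line")),
                signature=entry.group("sig").strip(),
            )


if __name__ == "__main__":
    sample = (
        "FILE: melo/text/english_bert.py\n"
        "  function get_bert_feature (line 9) | def get_bert_feature(text, word2ph, device=None):\n"
    )
    for symbol in parse_symbol_index(sample):
        print(symbol.file, symbol.kind, symbol.name, symbol.line)
```

Lines that match neither pattern (blank lines, the JSON preview, this notice) are skipped, so the whole dump can be passed in unchanged. Signatures that the index truncates with `...` are kept as-is rather than reconstructed.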

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
